Table of contents

Summary
Abbreviations
Gene symbols I - Incidentals
Gene symbols II - Target Gene Names
1. Introduction
1.1. The Pathos of the Crab
1.2. Cancer in the Molecular Age - a genetic and epigenetic disease
1.3. Tumoral evolution
1.4. DNA repair systems
1.4.1. DNA Mismatch Repair
1.4.2. Double-strand Break Repair
1.4.3. Direct Repair
1.4.4. Nucleotide Excision Repair
1.4.5. Base Excision Repair
1.5. The Cell Cycle and Apoptosis
1.6. Mutator Phenotypes
1.7. Colorectal cancer
1.7.1. Signaling pathways critical to colorectal carcinogenesis
1.7.2. Colorectal cancer with microsatellite instability
1.7.2.1. MMR deficiency and replication slippage
1.7.2.2. Other non-slippage-induced alterations in MSI tumors
1.7.2.3. Concepts of downstream MSI target
1.7.2.4. Nonsense-mediated decay and immunogenicity in MSI cancers
1.8. Objectives
2. Materials and Methods
2.1. Materials
2.1.1. MSI series
2.1.2. AUS series
2.1.3. Cell lines
2.2. Methods
2.2.1. DNA isolation
2.2.2. MSI status
2.2.3. Literature survey of putative target genes
2.2.4. Target genes
2.2.4.1. Target gene selection
2.2.4.2. Mutation analysis
2.2.4.3. Clustering analysis and survival correlation of target genes
2.2.4.4. In silico assessment of frameshift consequences
3. Results
3.1. MSI series
3.1.1. MSI status
3.1.2. Literature survey
3.1.3. Target gene mutation
3.2. AUS series
3.2.1. DNA quantity and quality
3.2.2. MSI status – AUS series
4. Discussion
Perspectives
Reference List

Appendix 1 - Analysis of MSI status in MSI series and AUS series
Appendix 2 - Mutation analyses of MSI target genes
Appendix 3 - Primer sequences for Bethesda markers and MSI target genes
Appendix 4 - Reference List for Table 2: Literature Survey
Appendix 5 - PCR conditions for Bethesda marker MSI-testing
Appendix 6 - Review of MSI target genes

Summary

Colorectal cancer is one of the most common cancer types, and a leading cause of cancer-related deaths. Cases can be divided into two main molecular phenotypes: those with chromosomal instability (CIN) and those with microsatellite instability (MSI), which occur at a frequency of 85% and 15%, respectively. The focus of this study was on MSI tumors, which have defective mismatch repair systems, and on the numerous insertions and deletions in MSI tumor DNA which result from this defect. The primary aim of the thesis was to serve as a pilot project for later work on a consecutive, clinically representative tumor series. Firstly, we wished to establish a set of genes which could reasonably be assumed to be targets of mismatch repair deficiency, i.e. genes that were more frequently subject to insertions or deletions than comparable sequences. Secondly, we wanted to see whether there were, among these targets, genes which either singly or in combination appeared to affect patient outcome depending on their mutational status. A search of the available scientific literature for genes which had been analyzed for, almost exclusively, frameshift mutations in MSI tumors yielded 162 candidates. Forty-three of these were selected for laboratory analysis. The results confirmed the target gene status of many of these, among them certain well-known genes such as TGFβRII, MSH3, E2F4 and CASP5. The histone acetyl transferase EP300 had never before been examined for this type of mutation in primary tumor DNA, and proved to be a low-frequency, but nevertheless noteworthy, target in MSI colorectal cancers. No significant covariance of genes was found which did not depend on mutation frequency alone. A single gene of intermediate mutation frequency showed a robust association with long-term patient survival. SLC23A2, which encodes a sodium/vitamin C cotransporter, was significantly associated with poor prognosis when mutated, and showed indications of being additionally informative with regard to clinical staging.


Abbreviations

ATP  Adenosine triphosphate
BER  Base excision repair
CDK  Cyclin-dependent kinase
CIMP  CpG island methylator phenotype
CIN  Chromosomal instability
CML  Chronic myelogenous leukemia
cMNR  Coding mononucleotide repeat
CpG  Cytosine-phosphate-guanine
CRC  Colorectal cancer
CSC  Cancer stem cell
DNA  Deoxyribonucleic acid
GEF  Guanine nucleotide exchange factor
GJIC  Gap junction intercellular communication
HNPCC  Hereditary non-polyposis colorectal cancer
MMR  Mismatch repair
MSI  Microsatellite instability
MSI-H  High microsatellite instability
MSI-L  Low microsatellite instability
MSS  Microsatellite stability
NER  Nucleotide excision repair
NHEJ  Non-homologous end joining
NMD  Nonsense-mediated decay
PCR  Polymerase chain reaction
RNA  Ribonucleic acid
UTR  Untranslated region

Gene Symbols I - Incidentals

ABL  Abelson murine leukemia viral (v-abl) oncogene homolog
AKT  Protein kinase B
APC  Adenomatous polyposis coli
APE  Apurinic/apyrimidinic endonuclease
ARF  Alternative reading frame = p14
BCL2  B-cell chronic lymphatic leukemia/lymphoma 2
BCR  Breakpoint cluster region
BECN1  Beclin 1
BIRC5  Baculoviral IAP repeat-containing protein 5
BRAF  v-raf murine sarcoma viral oncogene homolog B1
BRCA1  Breast cancer gene 1
CCND1  Cyclin D1
CDH1  E-cadherin
CTNNB  β-catenin
ERCC1  Excision repair cross-complementing rodent repair deficiency, complementation group 1 (includes overlapping antisense sequence)
ERK  Extracellular signal-regulated kinase (MAPK)
GSK3B  Glycogen synthase kinase 3 beta
INK4  Cyclin-dependent kinase inhibitor 2A
KIT  v-kit Hardy-Zuckerman 4 feline sarcoma viral oncogene homolog
KRAS  v-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog
MAPK  Mitogen-activated protein kinase
MEK  Mitogen-activated protein kinase kinase
MGMT  O6-methylguanine-DNA-methyltransferase
MLH1  Mut L homolog 1
MYC  Myelocytomatosis viral oncogene homolog
PDK1  Pyruvate dehydrogenase kinase, isozyme 1
PI3K  Phosphatidylinositol 3-kinase
PIK3C3  Phosphoinositide-3-kinase, class 3
RASSF1  Ras association (RalGDS/AF-6) domain family 1
RB1  Retinoblastoma 1
SMAD  Sma- and Mad (mothers against decapentaplegic)-related protein
SP1  Specificity protein 1 (transcription factor)
TNFR  Tumor necrosis factor receptor
TP53  Tumor protein 53
WNT  Wingless-type protein

Gene symbols II - Target Gene Names*

ABCF1  ATP-binding cassette 50
AC1  4 open reading frame 6
ACTRII  Activin receptor type IIA
AD7c-NTP  Neuronal thread protein AD7c-NTP
AIM2  Absent in melanoma 2
AMYB  V-myb myeloblastosis viral oncogene homolog (avian)-like 1
ANG2  Angiopoietin 2
APAF-1  Apoptotic protease-activating factor 1
ATM  Ataxia telangiectasia mutated
ATR  Ataxia telangiectasia and Rad3-related
AXIN2  Axin 2 (conductin, axil)
BAT1  HLA-B associated transcript 1
BAX  BCL2-associated X protein
BCL10  B-cell CLL/lymphoma 10
BLM  Bloom syndrome
BLYM  Very hypothetical BLYM-1 proto-oncogene
BRCA1  Breast cancer 1, early onset
BRCA2  Breast cancer 2, early onset

* HGNC where one exists

CANX  Calnexin
CASP1  Caspase 1
CASP4  Caspase 4
CASP5  Caspase 5
CBL  CBL E3 ubiquitin protein ligase
CBP  CREB binding protein (Rubinstein-Taybi syndrome)
CCDC28A  Coiled-coil domain-containing protein 28A
CCKBR  Cholecystokinin-2 receptor
CDC25C  Cell division cycle 25C
CDX2  Caudal type homeobox transcription factor 2
CEPBZ/CBF2  CCAAT/enhancer-binding protein zeta
CHD2  Chromodomain-helicase-DNA-binding protein 2
CHK1  CHK1 (checkpoint, S. pombe) homolog
CRSP3  Cofactor required for Sp1 transcriptional activation subunit 3
CYSLT1  Cysteinyl leukotriene receptor 1
DD5  E3 ubiquitin protein ligase, HECT domain containing, 1
Doc-1  Downregulated in ovarian cancer 1
DRP  Inositol polyphosphate phosphatase-like 1
DSTN  Destrin (actin depolymerizing factor)
E2F-4  E2F transcription factor 4, p107/p130-binding
EIF5  Eukaryotic translation initiation factor 5
ELAVL3  ELAV (embryonic lethal, abnormal vision, Drosophila)-like 3
EP300  E1A binding protein p300
EPHB2  EPH receptor B2
ERCC5  Excision repair cross-complementing rodent repair deficiency, complementation group 5 (xeroderma pigmentosum complementation group G (Cockayne syndrome))
F8  Coagulation factor VIII, procoagulant component (hemophilia A)
FACE-1  Zinc metallopeptidase (STE24 homolog, yeast)
FAS  Fas (TNF receptor superfamily, member 6)
FLASH  CASP8 associated protein 2

FLJ11186  Chromosome 14 open reading frame 106
FLJ11222  Meiosis-specific nuclear structural protein 1
FLJ11383  Pecanex-like 2 (Drosophila)
FLJ11712  Deleted in leukemia 8 protein
FLJ13615  Centrosome protein cep290
FLJ20139  Hypothetical protein FLJ20139
FLT3LG  Fms-related tyrosine kinase 3 ligand
FTO  Fatso
GART  Trifunctional purine biosynthetic protein adenosine-3
GR6  Chromosome 3 open reading frame 27
GRB-14  Growth factor receptor-bound protein 14
GRK4  G protein-coupled receptor kinase 4
HBP17  Fibroblast growth factor binding protein 1
HDCMA18P  La ribonucleoprotein domain family, member 7
hnRNP  Heterogeneous nuclear ribonucleoproteins C1/C2
HPDMPK  F-box protein 46
HT001  Asteroid homolog 1 (Drosophila)
HTF34  Zinc finger protein 93 (HTF34)
IDN3  Nipped-B homolog (Drosophila)
IGF IIR  Insulin-like growth factor II receptor
KIAA0092  Centrosomal protein of 57 kDa
KIAA0295  Zinc finger protein 609
KIAA0335  Zinc finger protein 518
KIAA0336  GRIP and coiled-coil domain containing 2
KIAA0355  KIAA0355
KIAA0530  Zinc finger protein 292
KIAA0595  Peroxisome proliferative activated receptor, gamma, coactivator-related 1
KIAA0754  KIAA0754 protein
KIAA0844  Zinc finger protein 365
KIAA0905  Sec31 homolog A (S. cerevisiae)
KIAA0943  ATG4 autophagy related 4 homolog B (S. cerevisiae)

KIAA0977  COBL-like 1
KIAA1052  Centrosomal protein 164kDa
KIAA1268  Poly (ADP-ribose) polymerase family, member 14
KIAA1333  KIAA1333
KIAA1470  Regulator of chromosome condensation 2
KKIAMRE  Cyclin-dependent kinase-like 2 (CDC2-related kinase)
MAC30  Hypothetical protein MAC30
MARCKS  Myristoylated alanine-rich protein kinase C substrate
MAZ  MYC-associated zinc finger protein (purine-binding transcription factor)
MBD4  Methyl-CpG-binding domain protein 4
MCT4  Solute carrier family 16 (monocarboxylic acid transporters), member 3
MKI67  Antigen identified by monoclonal antibody Ki-67
MLH3  MutL homolog 3 (E. coli)
MRE11  MRE11 meiotic recombination 11 homolog A (S. cerevisiae)
MRP2  ATP-binding cassette, sub-family C (CFTR/MRP), member 2
MSH2  MutS homolog 2, colon cancer, nonpolyposis type 1 (E. coli)
MSH3  MutS homolog 3 (E. coli)
MSH6  MutS homolog 6 (E. coli)
MYO10  Myosin X
NBS1  Nibrin
NDUFC2  NADH dehydrogenase (ubiquinone) 1, subcomplex unknown, 2, 14.5kDa
NKTR  Natural killer-tumor recognition sequence
NSEP  Y box binding protein 1
OGT  O-linked N-acetylglucosamine (GlcNAc) transferase
P4HB  Procollagen-proline, 2-oxoglutarate 4-dioxygenase (proline 4-hydroxylase), beta polypeptide
PA2G4  Proliferation-associated protein 2G4
PMS2  PMS2 postmeiotic segregation increased 2 (S. cerevisiae)
POLA  Polymerase (DNA directed), alpha
PRCC  Papillary renal cell carcinoma (translocation-associated)
PRKCI  Protein kinase C, iota

PRKDC  DNA-dependent protein kinase catalytic subunit
PRKWNK1  WNK lysine deficient protein kinase 1
PRRG1  Proline rich Gla (G-carboxyglutamic acid) 1
PTEN  Phosphatase and tensin homolog (mutated in multiple advanced cancers 1)
PTHLH  Parathyroid hormone-like hormone
PTPN21  Tyrosine-protein phosphatase non-receptor type 21
RAB2L  ral guanine nucleotide dissociation stimulator-like 2
RACK7  Protein kinase C-binding protein 1
RAD50  RAD50 homolog (S. cerevisiae)
RBBP2  Jumonji, AT rich interactive domain 1A (RBBP2-like)
RBBP8  Retinoblastoma-binding protein 8
RECQL  RecQ protein-like (DNA helicase Q1-like)
RFC3  Replication factor C (activator 1) 3, 38kDa
RGS12  Regulator of G-protein signaling 12
RHAMM  Hyaluronan-mediated motility receptor (RHAMM)
RIP140  Nuclear receptor-interacting protein 1
RIS1  Ras-induced senescence 1
RIZ  PR domain containing 2, with ZNF domain
SEC63  SEC63-like (S. cerevisiae)
SEMG1  Semenogelin-1
SEX  Plexin A3
SHC1  SHC (Src homology 2 domain-containing) transforming protein 1
SLC17A2  Solute carrier family 17 (sodium phosphate), member 2
SLC23A2  Solute carrier family 23 (nucleobase transporters), member 2
SLC4A3  Solute carrier family 4, anion exchanger, member 3
SPINK5  Serine peptidase inhibitor, Kazal type 5
SREBP2  Sterol regulatory element-binding transcription factor 2
ß2m  Beta-2-microglobulin
STK11  Serine/threonine-protein kinase 11
SYCP1  Synaptonemal complex protein 1

TAF-1B  TATA box binding protein-associated factor, RNA polymerase I, B, 63kDa
TAN-1  Notch homolog 1, translocation-associated (Drosophila)
TAP1  Transporter 1, ATP-binding cassette, sub-family B (MDR/TAP)
TAP2  Transporter 2, ATP-binding cassette, sub-family B (MDR/TAP)
TCF1  Transcription factor 1, hepatic; LF-B1, hepatic nuclear factor, albumin proximal factor
TCF4  Transcription factor 7-like 2 (T-cell specific, HMG-box)
TCF6L1  Transcription factor 6-like 1 (mitochondrial transcription factor 1-like)
TEF4  Transcriptional enhancer factor TEF-4
TFDP2  Transcription factor Dp-2 (E2F dimerization partner 2)
TFE3  Transcription factor binding to IGHM enhancer 3
TGFßRII  Transforming growth factor, beta receptor II (70/80kDa)
TLOC1  Translocation protein 1
TPRDI  Tetratricopeptide repeat protein 3
TSC1  Tuberous sclerosis 1
TTK  TTK protein kinase
USP-1  Ubiquitin specific protease 1
UVRAG  UV radiation resistance associated gene
VRK2  Vaccinia-related kinase 2
WBP1  WW domain binding protein 1
WISP3  WNT1 inducible signaling pathway protein 3
WRN  Werner's syndrome
XPOT  Exportin, tRNA
ZFP103  Ring finger protein 103

1. Introduction

1.1. The Pathos of the Crab

The development of cancer is inextricably linked to aging and, unlike various infectious diseases, must be seen to be inherent to the human – even metazoan – condition. Parasitic pathogens come, change, and go, but tumors, it seems, are eternal – physical evidence of all types of osseous tumors exists in skeletons from the Neolithic period, and soft tissue tumors are present in Egyptian mummies.1 Arguably the first written reference to cancerous growths occurs in a quasi-medical Egyptian treatise, which, composed about 5000 years ago, is the oldest surgical text known. The ambiguous word bn·wt, which designates a swelling, can in certain cases likely be equated with what a modern pathologist would call a malignant tumor – the recommended treatment, in so far as any existed outside prayer, was cauterization.2 In later, classical medicine, according to the humoral theory, cancer was caused by an excess of black bile, a view which persisted until the advent of modern medicine in the 17th-18th centuries. However, the essentially devouring and expansive nature of a carcinoma was recognized by Galen as early as the second century AD.3 Although the medical paradigms of antiquity had to give way, the roots of standard modern terminology survived. Καρκινος, καρκινώμα (karkinos, karkinóma) from Greek, and the Latin cognate cancer, both of which mean ‘crab’ and likely refer to the appearance of an epithelial tumor cross section, and ογκος (ongkos), meaning a mass or bulk, are directly inherited from Hippocrates and Galen.3-5

1.2. Cancer in the Molecular Age - a genetic and epigenetic disease

A major breakthrough – one might say the major breakthrough – in the understanding of cancer was the realization that it was, to all intents and purposes, an endogenous, genetic disease. As late as the 1970s, the idea that tumors with few exceptions were induced by tumor viruses had become widespread, but the discovery of cellular oncogenes equivalent to those encoded by viral nucleic acids shifted the focus to non-parasitic processes.6 A hallmark of normal tissue is its essential regularity of structure and growth, and a hallmark of cancerous tissue is that it is quite the opposite; as early as 1890 aberrant cell divisions in carcinomas were observed7 and a gross chromosomal aberration, the Philadelphia chromosome, was linked to chronic myelogenous leukemia in 1960 (see Box 1.).8 There can be two truly basal, internal culprits for the chaotic states endemic to cancer cells. One is the loss of integrity of genetic information; the other is disruption of the epigenetic mechanisms which largely control the accessibility of this genetic information; that is, they can grant or withhold permission to transcribe a given segment of DNA.

Box 1. From Genetic Aberration to Clinical Treatment

The idea that a chromosomal defect could be instrumental in promoting abnormal proliferation was put forth by Theodor Boveri in 1914, but was not met with universal approval.9 The discovery of the Philadelphia chromosome vindicated him. It is a minute chromosome not found in normal cells, but found in the majority of chronic myelogenous leukemia (CML) cells.8 Later it was found that the Philadelphia chromosome was the smaller product of a reciprocal translocation between chromosomes 9 and 22,10 and nearly a decade later the relevant genes were identified.11 One is the proto-oncogene ABL, a protein kinase from chromosome 9, the other BCR from chromosome 22. A pharmaceutical was developed in the early 1990s to target the ABL-BCR fusion protein, which affected only few of the many kinases homologous to ABL, and clinical trials of this drug were started barely ten years ago.6 Imatinib mesylate, commonly known as Gleevec, has a marked negative effect on the cancer cells, but relapses are common. The inhibiting effect of Gleevec on at least one other tyrosine kinase, KIT, which is involved in certain gastrointestinal tumors, shows that the value of the drug is not limited to CML patients.

1.3. Tumoral evolution

Tumorigenesis has long been considered a multistage process12, and most cancer types appear to be subject to sequential alterations.13 As colon cancer is the standard example used for models of tumoral progression, it will be treated in depth below (section 1.7.). Here, the common ground of the more than one hundred cancer types14 will be examined. Tumors are believed to be typically monoclonal, and to arise through waves of clonal expansion, in which each progenitor cell of one expansion possesses a selective, Darwinian advantage over the rest of the cell mass it originated in (see Fig.1).15 Estimates of the minimum number of mutations (and in this sense we must include relevant epigenetic alterations) required for a tumor to develop vary. The number is generally given in single digits, with an estimated 4-7 rate-limiting steps to explain why cancer development can stretch over decades.14,16 In the course of tumorigenesis cancers are thought to require the acquisition of six distinct capabilities which distinguish cancer cells from those in healthy adult tissue.14 These are: (i) self-sufficiency with respect to external growth signals, or the ability to stimulate neighboring cells to produce these, (ii) insensitivity to antigrowth signals, (iii) evasion of apoptosis, (iv) limitless replicative potential, (v) angiogenic abilities for solid tumors, seeing as no cell can be further from a capillary than 100 μm, and finally (vi) the ability to invade neighboring tissues and colonize distant organs (metastasis). Though alterations are largely successive in tumorigenesis, it is widely acknowledged that the chronology of events is usually less important than the occurrence of the events themselves.14,16

The apparent ‘immortality’ of tumors is one of their leading characteristics and, analogous to tissues with a high turnover, stem cells are thought to play a key role in carcinogenesis. These cancer stem cells may arise from normal stem cells – solid tumors develop in organs containing stem cell populations – or from differentiated cells or possibly both.17 A very attractive aspect of a direct normal stem cell to cancer stem cell progression is that the only essential difference in capabilities between them is the invasive ability of the latter.17 A normal stem cell already has such neoplastic-like qualities as high replicative potential and long-term survival. The existence of tumor stem cells is supported by the fact that not all cells in a tumor are capable of seeding a new one, implying that the cancer initiating and propagating abilities lie in a subpopulation of cells. An emerging view of a tumor is that of a distinct, rogue organ perpetuated by its own contingent of cancer stem cells.17,18

Figure 1. Clonal expansion and the role of a cancer stem cell (CSC) Waves of clonal expansion with increasingly fit (and malignant) cells in the redder end of the spectrum; the cancer stem cell initiating and perpetuating the tumoral mass. Figure a) shows the scenario of a cancer stem cell originating from a differentiated normal cell; b) directly from a normal stem cell (SC).

1.3. Oncogenes and tumor suppressor genes

The two groups of genes, or rather gene products, which are heavily involved in cancer development are conceptually different both in their cellular function and in what types of anomalies affect them in the tumorigenic process. Oncogenes have products which upon deregulation or activating mutation drive tumorigenesis19, i.e. are involved in promoting cellular growth and proliferation, and suppressing apoptosis. Strictly speaking, these genes are proto-oncogenes in their normal state, and are only upgraded to full oncogenes when they attain a hyperactive state. Many of the most crucial oncogenes are involved in cellular signaling pathways. Any activating component of a mitogenic signaling pathway can theoretically be a proto-oncogene, such as a growth factor receptor, a cytoplasmic signal transducer (more often than not a kinase) or a transcription factor. Canonical examples of each would be, in the above order, the epidermal growth factor receptor (EGFR), the proteins of the RAS family and several of their downstream effectors, and the MYC transcription factors.6 Tumor suppressor genes, in contrast, do exactly what their name implies. In their normal state they inhibit such processes as oncogenes promote, and a loss of functional tumor suppressor genes releases an important brake on cellular expansion. Their existence was first suspected through normal cell/cancer cell fusion studies in which it was discovered that the cancer phenotype could be recessive, as opposed to the largely dominant way in which an oncogene would act. Based on an epidemiological study of retinoblastoma cases, Alfred Knudson put forward his by now famous “two-hit hypothesis”20, which postulates that two mutational events are required in the same cell, each one compromising a different allele of a tumor suppressor gene, to eliminate tumor suppression. In the case of the retinoblastoma protein (pRB), a single functional copy is indeed sufficient to inhibit improper progression through the cell cycle (see chapter 1.4). Tumor suppressor genes can nonetheless also be subject to haploinsufficiency, one example being the unequalled tumor suppressor gene TP53.21,22

The above paragraphs have dealt with genetic mutations: classical point mutations, insertions and deletions, and chromosomal alterations. Oncogenes and tumor suppressor genes are equally susceptible to changes in levels of transcriptional expression, ranging from a fully permissive state to complete silencing, by epigenetic mechanisms. DNA methylation, i.e. the addition of a methyl group to cytosine, is the most thoroughly researched of these. In short, DNA methylation is chiefly correlated with lack of gene expression, while unmethylated DNA is associated with active transcription. The main route by which DNA methylation represses the possibility of transcription is by the recruitment of histone methyl transferases and histone deacetylases which combine to induce tighter chromatin packing, thus limiting transcription factor access to DNA.23 The occurrence of hypomethylation and activation of oncogenes and of promoter hypermethylation and silencing of tumor suppressor genes can be as involved in carcinogenesis as any comparable DNA lesion.24-26 In addition, a cancer cell genome as a whole is hypomethylated as compared to normal tissue, which appears to have a positive effect on tumorigenesis as well as being linked to chromosomal instability (see below).24,25

1.4. DNA repair systems

As the integrity of genomic information is so crucial to the proper functioning of an organism, and genomic instability is often cited as a prerequisite for cancer development (see below), a cell has a number of systems for fixing lesions in its DNA. Many types of DNA damage or errors occur during the synthesis of DNA before mitosis; these are simple incorporations of non-complementary bases, leading to a point mutation, or a slippage event causing a frameshift mutation. Post-replicative events can be divided into three main types: DNA adducts of bulky molecules, cleavage of the bond between the deoxyribose and the base creating abasic sites, and chemical changes in a base which will lead to mispairing in the subsequent round of replication and thus a point mutation. As DNA mismatch repair is inseparable from the cancer phenotype which is the focus of this thesis, this DNA repair system will be treated in some detail. Components of the double-strand break repair interact with certain mismatch repair proteins, and several of these components are found mutated in colorectal tumors with mismatch repair deficiency; therefore, the double-strand break repair system will also be more thoroughly explored. The remaining DNA repair mechanisms will be described briefly at the end of the section. Of note, many of the mechanistic aspects of the mammalian repair systems have been extrapolated from the discoveries made in homologous systems in prokaryotes and simple eukaryotic model organisms such as yeast.
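The link between a slippage event and a frameshift mutation, as described above, can be illustrated with a minimal Python sketch. The coding sequence, the A8 tract and the function names are invented for illustration only and do not correspond to any gene analyzed in this thesis; the sketch assumes a single-base deletion in a coding mononucleotide repeat and simply compares the codon partition before and after the deletion.

```python
def codons(seq):
    """Split a coding sequence into consecutive triplets, dropping any incomplete tail."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def delete_one_from_repeat(seq, base, min_len):
    """Remove one base from the first run of `base` that is at least `min_len` long."""
    start = seq.find(base * min_len)
    if start == -1:
        raise ValueError("no repeat of the requested length found")
    return seq[:start] + seq[start + 1:]


# Hypothetical coding sequence containing an A8 mononucleotide repeat (illustrative only)
wild_type = "ATGGC" + "A" * 8 + "GGCTTCGATGA"

# Simulate a one-base contraction of the repeat, as left behind by unrepaired slippage
mutant = delete_one_from_repeat(wild_type, "A", 8)

print(codons(wild_type))  # reading frame ends in the stop codon TGA
print(codons(mutant))     # codons downstream of the repeat are shifted; TGA is no longer in frame
```

Every codon downstream of the shortened repeat changes, and the original stop codon falls out of frame, which is why such deletions usually yield a truncated or otherwise aberrant protein.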

1.4.1. DNA Mismatch Repair

The targets of DNA mismatch repair (MMR) are primarily insertion or deletion loops in the DNA caused by replication slippage (see 1.5.1.2. below); simple incorporation of an incorrect base may also be recognized, but the proofreading mechanisms of DNA polymerases are thought to be much more important in such cases.27 Eukaryotic MMR proteins are homologs of those in the prokaryotic systems, whence the MSH (MutS Homolog) and MLH (MutL Homolog) nomenclature. The protein which initially recognizes a mismatch in the DNA is an MSH2* heterodimer, partnered by either MSH3 or MSH6 – specificity factors for loops of 2-8 nucleotides or loops of single nucleotides and mismatches, respectively. The MutS heterodimer will attach itself to a mismatch while carrying a bound molecule of ADP. When the ADP is exchanged for an ATP, MSH2:MSH3/6 transforms into a sliding clamp. It is likely that MutS recognizes the nascent, and by definition mistaken, DNA strand by the existence of the single strand nicks inherent to Okazaki fragments. The entire DNA-MutS-ATP complex recruits another heterodimer, MLH1:PMS2, which in turn displaces the main processive DNA polymerase, DNA pol δ, and its sliding clamp PCNA from the DNA. The MutL dimer also recruits exonuclease 1 and associated proteins to remove a long stretch of DNA (up to 1kb) containing the error. One of the numerous eukaryotic DNA polymerases, possibly DNA pol δ, resynthesizes the excised stretch.28,29

* In order to highlight the main subject of this thesis (the target genes of MSI colorectal cancer) the target genes that have been subject to laboratory analysis will be marked in bold the first time they appear in the introductory text.


Figure 2. A simplified overview of the MMR process

Both the MutS and the MutL homolog groups have one primary dimerization partner each; MSH2 and MLH1. As much as 90% of MSH2 molecules are complexed with MSH6, and MSH3 does not appear to be critical to functional MMR, but neither MSH3 nor MSH6 would have any function at all in the absence of MSH2.29 The known MLH1 partners are PMS2, PMS1 and MLH3, but the two latter are not known to have any function in MMR. Interestingly, the expression of PMS2 is downregulated in the absence of MLH1, as are MSH3 and MSH6 in the absence of MSH228, confirming MSH2 and MLH1 as master movers of the MMR system.

1.4.2. Double-strand Break Repair

Double-strand breaks can result from direct fracture of the DNA duplex or from single strand breakage at a replication fork. The former type of lesion is normally mended by non-homologous end joining, while the latter is most accurately repaired by homology-directed repair from a sister chromatid.30 Non-homologous end joining (NHEJ) is a simple fix-it mechanism, but as it cannot distinguish between chromosomal sections which belong together and those which do not, the intended repair can cause translocations. In NHEJ, a Ku70/80 heterodimer binds each free DNA end, to be joined by PRKDC (DNA-PKcs) which attracts the ligase IV complex to seal the ends. A complex consisting of MRE11, RAD50 and NBS1 may enable a stable juxtaposition of DNA ends in situations where these cannot be directly ligated.30

Figure 3. Double-strand break repair, a model for repair mediated by the MRE11/RAD50/NBS1 complex and DNA-PK; for the role of double-strand breakage and ATM/ATR in the cell cycle and apoptosis, see section 1.4. (Adapted from Assenmacher & Hopfner 2004, and Helleday 2007 30,31)

Homology directed repair involves using the undamaged sister chromatid sequence as a template for DNA synthesis beyond the break point, and all subpathways of this type of recombination repair are facilitated by the MRE11/RAD50/NBS1 complex. While the subtypes are too numerous to detail, it is worth noting that MSH2:MSH6 and BLM bind to, and appear to modulate, Holliday Junctions. BLM in particular may be involved in their resolution. Gap filling and nick ligation are thought to be performed by the standard accoutrement of proteins for most repair (DNA pol δ, PCNA, DNA ligase).30


It is important to note that the above outlines give an exaggerated impression of the isolation of any given system, and in truth there is a certain amount of interplay between them, at least on the level of protein components.29,32 MRE11, for example, may be involved in 3’ nick-directed MMR with MLH1.33 Also, there are indications that MMR proteins, notably MSH2, are mechanistically involved in processes removed from the repair of DNA such as cell cycle progression and mitotic spindle organization.34,35

1.4.3. Direct Repair

Few types of DNA damage are directly reversible. In humans, the notable case is removal of alkyl (methyl) groups from O6-methylguanine to avoid O6-metG:T mispairing, by O6-methylguanine-DNA-methyltransferase (MGMT). Despite its name, MGMT is not a true enzyme, in that it transfers the methyl group to itself in a covalent and irreversible manner.36

1.4.4. Nucleotide Excision Repair

Nucleotide excision repair (NER) removes bulky, helix-distorting adducts such as pyrimidine dimers or polycyclic hydrocarbons which are induced by ultraviolet radiation and chemical mutagens. The main repair machinery consists of a large, multisubunit protein complex which removes a short stretch of about thirty nucleotides in the area of the adduct before the proofreading polymerases δ and ε resynthesize the patch. Key proteins of human NER are XPA, XPD, XPF, XPG and ERCC1, the “XP” nomenclature pertaining to the skin disease xeroderma pigmentosum, in which NER is faulty and sufferers are particularly sensitive to ultraviolet radiation.29,37

1.4.5. Base Excision Repair

Base excision repair (BER) primarily removes chemically altered bases which have little helix-distorting effect, such as deaminated, oxidized or alkylated bases, but it is also involved in repairing certain specific mismatches, some of which may not be substrates for the mismatch repair system. An altered base is cleaved from its deoxyribose by a DNA glycosylase, before the abasic pentose is removed by APE (apurinic/apyrimidinic endonuclease) 5’ and AP lyase 3’. This leaves a single nucleotide gap to be filled by the error-prone DNA pol β in short patch repair or pol δ/ε in long patch repair, before the remaining nick is sealed by a ligase.29,38

1.5. The Cell Cycle and Apoptosis Many of the genes which are mutated, or over- or under-expressed in tumor cells are involved in cell cycle control and by extension proliferation, and the stubborn evasion of apoptosis is one of the most problematical features of said cells.

The cell cycle is divided into four phases: growth phase 1 (G1), synthesis (S), growth phase 2 (G2), and mitosis. Cells enter the cycle at G1 from G0, the quiescent or resting state, following mitogenic signaling from positive growth factors such as EGF and IGF. Signal transduction cascades, notably in the MAPK and PI3K pathways (see section 1.7.1.), lead to the initiation of G1 and expression of approximately 100 genes. The signaling must be upheld throughout most of the phase, else the cycle will be aborted and the cell will return to G0.39 Key to cell cycle control are the cyclins and their partner cyclin-dependent kinases (CDKs), with each CDK-cyclin complex appearing in a given order (see Fig. 4). Cyclins D and E and their CDKs phosphorylate the tumor suppressor pRB, precipitating the release of activating E2Fs and, through the activation of genes with E2F-responsive promoters, the transition from G1 to S phase. Cyclin E appears to be crucial in proceeding past the point of no return par excellence, the restriction point, and can be found upregulated in cancers.40 There are several checkpoints in the progression of the cell cycle, whose raisons d’être are to alert the cell to irregularities of mitosis, sister chromatid separation and DNA fidelity, and in these cases arrest or retard the cell cycle, leaving time for DNA repair or initiating apoptosis41 (see Fig. 4 for proteins involved in these checkpoints, either as effectors or transducers).

Figure 4. Cell cycle: R – restriction point; 1 – DNA damage checkpoint 1; 2 – replication checkpoint; 3 – intra-S replication checkpoint; 4 – DNA damage checkpoint 2; 5 – mitotic checkpoint. Proteins involved in these checkpoints are in yellow, proteins promoting cell cycle progression are in green, and those in red are involved in retarding or preventing cell cycle progression.

Apoptosis, i.e. regulated cell death, is the self-destruction of a single cell following an extracellular ‘death’ signal or excessive intracellular damage. Apoptosis and proliferation are linked, not in a mechanistic fashion, but rather through a web of control pathways, with no single pathway able to trump through any given cellular response.42 Cell death can be induced through two separate pathways, the exogenous death receptor pathway and the endogenous mitochondrial pathway.43 The first relies on the activation of receptors such as FAS and TNFR (tumor necrosis factor receptor) by their extracellular ligands to effect a caspase cascade which ends in the destruction of the cell.

The second pathway is primarily activated by DNA damage and other events which activate p53, such as hypoxia, cellular stress and oncogenes.42,44 The response to double-strand DNA breaks is detailed in the upper half of Figure 5. Double-strand breaks are sensed by the ATM and ATR proteins, kinases which phosphorylate a number of targets including p53. Two tumor suppressive proteins that are transcriptional targets of p53 are the cyclin-dependent kinase inhibitor p21, which causes cell cycle arrest (see Fig. 4), and the pro-apoptotic BAX protein. BAX functions by entering the mitochondrion and permeabilizing its membrane to release sequestered effectors, which in turn initiate the aforementioned caspase cascade leading to cell skeleton and chromosome degradation, membrane disruption, and absorption of these remains by surrounding cells.42,43,45

Figure 5. Apoptosis pathways activated by DNA damage or extracellular signaling

1.6. Mutator Phenotypes Another significant matter related to tumor evolution is whether or not the mutation rate inherent in normal cells is great enough to account for the accumulation of neoplastic alterations, even over a time span of decades. Also, assuming a normal mutation rate were enough, in an unlucky individual, to promote cancerous properties, is it enough to account for the surfeit of alterations in cancer cells? These questions sparked the idea that cancers, at least by and large, require a mutator phenotype to drive their progression.13,15,41,46,47 There are arguments against such a phenotype being a tumorigenic prerequisite. A normal mutation rate can be enough to generate the required mutations in cancer, especially when coupled to clonal expansion and an increased proliferation rate.13,48 Also, the presumed driving instability cannot always be found in all the cells constituting a tumor.47 However, the rampant abnormalities of most cancers, be they at chromosomal, sequence or single nucleotide level, do strengthen the case for a general flaw in the maintenance of genomic stability rather than simply alterations ‘directed’ at oncogenes or tumor suppressor genes. Nor would a mutator phenotype be excessively difficult to induce by few mutations, as nearly a third of human genes are involved in preserving genomic integrity.13,41 Types of instability are likewise an area of uncertainty, with a chromosomal instability (CIN) versus microsatellite instability (MSI or MIN) dichotomy commonly being applied; widespread instability at the single nucleotide level has also been proposed, however hard it is to demonstrate.46,47 Epigenetic ‘mutator’ phenotypes may be equally valid for inducing extensive changes in gene expression but, as for all mutator phenotypes barring MSI, firm evidence of both existence and mechanistic aspects is wanting.47,49

1.7. Colorectal cancer Colorectal cancer (CRC) is, when gender-specific tumor types are excluded, the most common neoplastic disease, with nearly one million new cases worldwide each year50,51 and around 3500 in Norway alone.52 It is also one of the leading causes of cancer-related deaths.53 The ailment is most common in elderly people in developed countries, and risk factors include smoking, a high-fat diet, and a sedentary lifestyle.51

Colorectal cancers are often classified according to the Dukes’ system, named after an early twentieth-century pathologist, whose goal was to provide a convenient system for comparison and prognosis of rectal cancers. This type of classification, however, did not originate with him, and it has also been reworked several times.54 One of the less involved descendants of Dukes’ tripartite A-B-C categorization is Whittaker and Goligher’s:54,55
A – tumor is confined to the wall of the rectum
B – tumor is extrarectal, and shows invasion of the submucosa, muscularis propria
C – spreading to regional lymph nodes
D – extensive local spreading or metastasis
(Though these classes were initially developed for rectal cases they may be applied to all intestinal carcinomas.)

Tumor staging, even on the crude Dukes’ scale whose very simplicity is nevertheless one of its main benefits, remains the most important prognostic predictor of colorectal cancers.56-58 Dukes’ A tumors usually have the most favorable prognosis, Dukes’ D are generally fatal, and B-C are of intermediate and more uncertain prognosis.56,58 There are genetic markers which have been shown to have a prognostic impact (see below), but the value of these is still subservient to the extent of tumor invasion, and may serve mostly to subdivide the Dukes’ categories.

Tumors of the colon are thought to originate in colonic crypts, in stem or progenitor cells.16,59,60 The crypts house these stem cells which replenish the epithelial sheet of the colon, which has a very high cellular turnover.
This naturally elevated proliferation is seen as predisposing the tissue to cancer; however, the small intestine likewise has a high turnover rate but rarely gives rise to tumors.13,61 An early key molecular step in colorectal carcinogenesis is often, if not always, the constitutive activation of the Wnt signaling pathway, most frequently by mutation of the APC ‘gatekeeper’ gene which is mutated in up to 80% of CRCs.16,62,63 In fact, APC loss may be the only required step in the initiation of an adenoma.64 Mutant APC, or an equivalent activating mutation in β-catenin, prevents cell migration out of the crypt, so ‘trapping’ these highly proliferative cells to create an adenomatous crypt (see Fig. 6).65,66

Figure 6. Colonic crypt Showing two halves in which one is wild type and the other has aberrant Wnt signaling caused by mutated APC; the latter showing a precursor lesion. (Figure from The Biology of Cancer6, © Garland Science)

Most colon cancers appear to develop from adenomas, though only a small percentage of adenomas actually become malignant.60,67 A second frequently seen early molecular change in colon tumor initiation is the mutation of KRAS or BRAF16,68, oncogenes activating the MAP kinase pathway. Other typical alterations in the progression of colorectal tumors are illustrated in Figure 7. These include the loss of genetic stability or tumor suppressor genes by mutation or epigenetic silencing.

Figure 7. The Adenoma-Carcinoma sequence (adjusted) – Alterations relating to CIN tumors in grey; in MSI tumors in black; alterations common to both types in pink; photographs from gihealth.com

Two general types of colorectal cancer are currently recognized, typified by the kind of genetic instability, or mutator phenotype, they exhibit (see section 1.6). The majority of tumors, around 85%, belong to the chromosomal instability group (CIN) which is characterized by extensive aneuploidy that is thought to be a result of this instability.69-71 The second group, separately identified much later72-74, displays instability at microsatellites (MSI) and is normally diploid or nearly so. MSI tumors have a higher degree of lymphocyte infiltration and poorer differentiation than microsatellite stable (MSS) tumors – which are by and large CIN – and have a predilection for proximal location in the colon.72-79 Also, apoptosis is more frequently seen in MSI than MSS neoplastic cells.80 Patients with MSI tumors have a noticeably improved prognosis with respect to those with CIN, by as much as 15%.74,76,77,81 Some of the main differences between MSI and CIN cancers are illustrated in Figure 8.

The preferential right-sided (proximal) location of MSI tumors may be linked to the embryological provenance of this section of the colon. Approximately two-thirds of the way across the transversum the colon switches from being derived from the embryonic midgut to the embryonic hindgut.82 These two sections have a different blood supply, and different metabolism, gene expression and antigenic profiles.82-84 As seen in Figure 7, CIN tumors often display loss of chromosome arms which contain tumor suppressor genes, such as TP53 on 17p, while MSI tumors lack such gross aberrations, and have a higher incidence of singly mutated genes. Loss of 17p/TP53 and 18q in CIN cancers has been correlated with inferior clinical outcome,85-92 while none of the common non-causative MSI-specific mutations have been shown to significantly affect patient survival.93-97 All of these traits indicate the existence of a dichotomized molecular path to colorectal cancers, but there is a third pathway, the CpG island methylator pathway (CIMP), proposed on the grounds of promoter hypermethylation of multiple genes in the same tumor.98 The existence of this pathway, however, is a matter of contention, and because promoter hypermethylation also occurs in the two ‘classical’ pathways, one would have to deal with the question of significant pathway overlap should CIMP be admitted.49,99 Another, histologically different, pathway proposed as an adjunct to the canonical adenoma-carcinoma sequence is the serrated pathway, which is characterized by CpG island methylation and microsatellite instability (Fig. 7, top).100-102

Figure 8. The Colon – a) Highlighting the differences between the proximal and distal portions, and the MSI-CIN cancer types; b) Showing the clinicopathological characteristics of certain tumors from a consecutive series (see 2.1.1; Fig.8b courtesy of T. Ahlquist).

1.7.1. Signaling pathways critical to colorectal carcinogenesis In reference to Figure 7, which shows several of the most commonly altered genes in colorectal cancer, the basic features of the pathways to which many of the said genes belong will be canvassed below.

The TGFβ pathway – TGFβ is a cytokine which affects different tissues in different ways. Its function as a tumor suppressant in colonic epithelium is due to the induction of cell cycle arrest and apoptosis. It exerts its influence through two homodimers of TGFβ receptors I and II, transmembrane serine-threonine kinases which phosphorylate SMAD2/3, thereby releasing it from cytoplasmic anchors. The free SMAD2/3 then complexes with SMAD4, enters the nucleus and modulates the expression of over 300 target genes. Among these are the cyclin-dependent kinase inhibitors p15 and p21. In addition, the SMAD complex associates with E2F4 to repress the transcription of the oncogene MYC, thus increasing the anti-proliferative effect of TGFβ signaling. TGFβ can also induce apoptosis, though the exact components of this system remain unidentified. TGFβRII and SMAD4 are both recognized as tumor suppressor genes in colorectal cancer, the former in MSI and the latter in CIN phenotypes. However, should the TGFβ pathway survive uncorrupted through the early stages of a carcinoma, it can later promote the tumor through encouraging angiogenesis and enabling metastasis.103-105

Figure 9. The TGFβ pathway Showing both the transcriptional activation and repression effects of SMAD2/3 and cofactors.

The Wnt pathway – The Wnt signaling pathway is one of the most important morphogenetic pathways, and it is activated by the Wnt protein through a heptahelical transmembrane receptor called Frizzled. Active Wnt signaling allows β-catenin to escape a sequence of phosphorylation, ubiquitylation and destruction, and to accumulate in the cytoplasm. The stabilized β-catenin then translocates to the nucleus, where it may complex with a TCF transcription factor to effect the transcription of Wnt target genes. These include such positive cell cycle regulators as MYC and cyclin D1 (CCND1), a fact which at least in part explains why constitutively active Wnt signaling is common in cancers.106 The proteins that, in the absence of Wnt, keep the cytoplasmic concentration of β-catenin low are the kinase GSK-3β, which marks β-catenin for proteolysis, and the scaffold proteins AXIN2 and APC.106 A great many components of the Wnt pathway can be implicated in colorectal carcinogenesis; some, such as APC and β-catenin, have been treated above, others such as AXIN2 and TCF4 will be considered in the following chapters.

Figure 10. The Wnt pathway Showing the escape of β-catenin to the nucleus in the presence of Wnt.

The MAP kinase pathway – The mitogen-activated protein kinase pathway is activated upon the binding of a signal molecule to receptor tyrosine kinases in the cell membrane. These receptors then autophosphorylate, and via adaptor proteins activate a RAS guanine nucleotide exchange factor (GEF) which stimulates RAS to exchange a bound GDP for a GTP. This leaves RAS activated and in a position to initiate a phosphorylation cascade through RAF → MEK → ERK, the final one being the eponymous protein of the pathway, a MAP kinase. This pathway initiates a wide range of changes, both in gene expression and protein activity. Key to its role in carcinogenesis, one of the most important effects of MAP kinase activation is cellular proliferation and differentiation.107,108 The MAP kinase pathway can also receive inhibitory signals, exemplified by the action of the activated EPH receptor B2 (EPHB2) in Figure 11.100

In the context of the MAP kinase pathway one may mention the phosphatidylinositol 3-kinase (PI3K) signaling pathway. It can also be activated by receptor tyrosine kinases, or through RAS-GTP. Both these mechanisms activate PI3K to phosphorylate phosphatidylinositol (4,5) bisphosphate (PIP2) to phosphatidylinositol (3,4,5) trisphosphate (PIP3). PIP3 in turn promotes the phosphorylation of AKT by PDK1. Signaling downstream of AKT is multipartite, but, like MAPK signaling, the overall effect is of cell proliferation and depression of apoptosis. Deactivation of the pathway is mediated by PTEN, a phosphatase which converts PIP3 back into PIP2.109 Mutation of PTEN therefore results in the constitutive activation of the PI3K pathway, and this gene is mutated in nearly one fifth of colorectal tumors.110,111 PTEN has an additional tumor suppressive function in its involvement in the Wnt pathway, where it contributes to keeping the level of free β-catenin low.112 After TP53, PTEN is the most commonly mutated gene in human cancers.113

Figure 11. The MAP kinase pathway -includes cross-talk to the PI3K pathway; RTK – receptor tyrosine kinase

(Note that only the transcriptional effects of each pathway are shown in Figures 9-11.)

Finally, I would repeat the caveat from section 1.4, that the ‘effects’ of these pathways are gross oversimplifications, that few if any pathways do not induce self-opposing effects, and that it is the cumulative weight of signaling for any given response which tips the scales in favor of that response.

1.7.2. Colorectal cancer with microsatellite instability Cancers with microsatellite instability, colon cancer being prime among them, have the distinction of being the only ones with a well-elucidated mutator phenotype. As with much of the corpus of molecular knowledge about colon cancers, a good deal of what is known of MSI tumors comes from the study of a hereditary syndrome, in this case hereditary non-polyposis colorectal cancer (HNPCC). Nevertheless, in light of the fact that the hereditary cases constitute a minority, where there are discrepancies between hereditary and sporadic types it is the situation pertaining to sporadic cases which is presented below.

1.7.2.1. MMR deficiency and replication slippage The underlying cause of the MSI phenotype and most of the mutated genes in MSI cancer is defective mismatch repair. In sporadic CRCs the most common defect is promoter hypermethylation of MLH1 which effectively silences the transcription of the gene, though some tumors are due to mutant MSH2.114-120 With reference to section 1.4.1 it may be seen that a functional elimination of either of these two gene products will abolish MutS/MutL homolog mismatch repair. As there are other mechanisms which can remove certain single mismatches and some point mutations are synonymous or of marginal functional significance, the dramatic impact of lost MMR is perpetuation of insertions and deletions. Insertion and deletion loops are caused by slippage during replication. Replication slippage can occur at microsatellites, simple sequence repeat stretches of DNA with a repetitive unit of 1-6 nucleotides.27 The error is introduced when there is a transient, local dissociation of the nascent and parental DNA strands in a microsatellite and the two strands subsequently undergo re-annealing between misaligned repeat units.121-123 The result of this is a lengthening or shortening of the daughter strand. The liability of different kinds of microsatellites to replication slippage varies greatly, and as a general rule the shorter the repeat unit and the longer the repeat, the more mutable the microsatellite is. For the homopolymer tracts, those longer than seven base pairs are considered much less stable than those of seven or fewer units.27 In addition to slippage as the origin of insertions and deletions, unorthodox types of recombination have been suggested, though this is speculative.124 As the MMR deficient phenotype targets microsatellites it is worthwhile to examine where in the genome they are found and what kind of functions they can have. Microsatellites are unequally distributed in the genome, with most being in non-coding areas, probably due to selection against easily disrupted sequences.27 Nevertheless, many proteins contain microsatellites in their coding sequences.125 The commonest types of repeats are A/T mononucleotide tracts, a feature particular to primates as dinucleotide runs are most common in most other lineages.126 Functional aspects of non-coding microsatellites may include regulating transcription, either by providing protein binding sites in upstream promoter elements or by sequence-dependent conformational effects on DNA. Insertion and deletion events in these non-coding tracts can also affect their function, though perhaps not in as dramatic a manner as frameshift mutations.27

1.7.2.2. Other non-slippage-induced alterations in MSI tumors Although such mutations as are caused by mismatch repair deficiency are rife they are by no means the only types of alterations that occur in MSI tumors. Methylation of MLH1 and point mutations in APC, KRAS and BRAF have been mentioned above. Point mutations in TP53, while much rarer than in MSS tumors, do occur fairly frequently.41 Gene silencing by hypermethylation is also common in MSI tumors, as intimated by the coalescence seen between MSI and the putative CpG island methylator phenotype. Among the methylated genes are the tumor suppressors ARF (p14) and INK4a (p16) and others whose products have a conceivable tumor suppressive effect: EPHB2, RASSF1 and CDH1, encoding E-cadherin.111,127-131

1.7.2.3. Concepts of downstream MSI target genes In theory, the random nature of mutation would mean that each microsatellite has the same likelihood of being hit, provided there are no sequence-dependent structural features that affect the basal replication error rate. So, given MMR deficiency, there will be a background rate of microsatellite mutation, whereas those short tandem repeat-containing genes that are truly involved in tumorigenesis should be found to have a significantly higher mutation rate.79 The background level has been estimated to be in the area of 10-15%.132,133 It is also generally assumed that the carcinogenic potential of MSI target genes depends on said genes having the relevant oligonucleotide repeats in a coding sequence, though microsatellites in introns near exon boundaries and in UTRs have also been examined for elevated mutation frequencies due to their putative roles in splicing and transcriptional regulation, respectively.134 Several schemes have been put forward for the definition and classification of target genes of mismatch repair defective cancers, initially by the 1997 National Cancer Institute meeting69 and subsequently by Duval and Hamelin, and Woerner et al.135,136 According to the NCI criteria, a true target gene must (i) have a high frequency of inactivation, (ii) be subject to biallelic inactivation, and (iii) take part in a defined growth suppressor pathway. In addition, (iv) the same growth suppressor pathway as above must exhibit inactivation in MSS tumors, and finally (v) the gene must be validated by functional suppressor studies in in vivo or in vitro models. These criteria have been criticized as excessively narrow.137 For (ii), biallelic inactivation need not be a requirement in the case of haploinsufficiency, cf. the dearth of biallelic TAF1B inactivation.138 To be truly useful, (iii) and (iv) would have to entail complete knowledge both of all possible growth suppressor pathways and of every gene or pathway involved in mismatch repair proficient cancers, and (iv) includes the assumption that the molecular pathways to MSS and MSI tumors are essentially equivalent, which is not necessarily the case.
Nor do all acknowledged target genes participate in growth suppressor pathways, most notably MSH3/6.135 As regards criterion (v), such studies are lacking for most target genes, and furthermore, it is unsuitable for several types of potential target genes. The transformed phenotype, for example, will not be reversed in the event of reintroduction of a wild-type mutator target gene to a system,137 and a gene with no known functional significance in tumor progression may yet be of prognostic clinical significance. Among the genes which have been subject to functional studies are TGFβRII, BAX, AXIN2, ACVR2, RIZ, and E2F4.94,139-143 Duval and Hamelin135 proposed a fourfold, functional classification of affected genes into survivor genes, hibernator genes, cooperator genes and transformator genes.

Survivor genes encode vital products, whose inactivation should exert a negative selection pressure. Hibernator genes are non-vital and down-regulated, and should have a mutation rate in the background range. Cooperator genes designate sets of genes with the same terminal effect, e.g. promotion of apoptosis, which have a synergistic effect without any one gene requiring a high mutation frequency. Transformator genes are those which upon mutation independently confer a selection advantage to the cells concerned, and therefore should have the highest mutation frequency. Genes in these categories are thought to be mutated in a preferential order, with the transformator TGFβRII being among the earliest.132,144 The statistical regression model presented by Woerner et al.136 takes into consideration the fact that longer repetitive tracts are more mutable, i.e. the background rate for them is higher, and a gene with a mutation frequency above the 95% prediction interval for any given repeat length is considered a real target gene. TGFβRII, BAX, TCF4, MSH3, ACVR2, PTHL3, HT001 and SLC23A2 are by this method considered genuine positive targets for MSI colorectal cancer, while the authors acknowledge the inapplicability of the model as regards target pathways, cf. cooperator genes above. Due to the difficulties of implementing clear-cut qualitative criteria (functional aspects of single genetic products and their interactions are often insufficiently elucidated, likewise signaling pathways and cascades) it has been most common to use the unmanipulated mutation frequency as the primary or even sole criterion for target gene detection and to treat any involvement of a frequently mutated gene in, e.g., apoptosis or cell cycle control as a bonus. Another potentially complicating factor is that, however detrimental frameshifts usually are, mutations in the repetitive tracts of target genes do not invariably cause complete inactivation.
This appears to be the case for AXIN2, where the mutated product is more stable than wild type and may have a dominant negative effect,140 and for the mutated isoform of TCF4.145 Both mutated gene products encourage inappropriate Wnt signaling activation, which is cancer promoting.106 The very fact that so many different mutational constellations exist suggests that there are few, if any, truly key genes for carcinogenesis among the target genes, TGFβRII being the only one to have been accorded such status.135,138,146 Rather, the cumulative effect of many different and interchangeable mutations may drive tumorigenesis, with very few of the total being decisive in themselves.135 Nor is it likely that all the relevant microsatellite-containing genes have as yet been tested for mutations in MSI tumors, or even necessarily characterized in any DNA sequence database. Several studies have used genome-wide sequence database searches for genes containing coding mononucleotide repeats (cMNRs) as a basis for potential target gene selection.138,146-149 Such a search currently yields well over one thousand protein coding genes containing the most promising (N)≥8 repeats, the figure rising more than ten-fold when the range is expanded to include (N)6-7. Scanning for monorepeats in human genes, courtesy of Torbjørn Rognes, was conducted as follows: The 41,030 coding sequences in the transcripts of 20,484 human protein-coding genes were downloaded using the BioMart service (www.biomart.org)150 on 13 March 2007. A Perl script was written to scan the sequences for repeats. For each gene, only the longest coding sequence was considered. The script identified all mononucleotide repeats of length six and over, and also produced summary information about the repeats in each gene (longest repeat, number of repeats, sum of length of repeats). The pool of potential target genes would be further inflated should one take into account non-coding repeats. At least one gene, MRE11, has a frequently mutated intronic repeat which causes aberrant splicing of the pre-mRNA.151
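The monorepeat scan described above can be sketched in a few lines. The original was a Perl script that is not reproduced in this thesis; the version below is only an illustrative reconstruction in Python, and the function names, the dictionary keys and the input format (a list of coding sequences per gene) are assumptions rather than the script's actual interface.

```python
import re
from typing import Dict, List, Tuple


def find_mono_repeats(cds: str, min_len: int = 6) -> List[Tuple[int, str, int]]:
    """Return (start, base, length) for every single-base run of at least min_len."""
    # One base followed by min_len - 1 or more copies of itself.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), len(m.group(0)))
            for m in pattern.finditer(cds.upper())]


def summarize_gene(coding_sequences: List[str], min_len: int = 6) -> Dict[str, int]:
    """Summarize the repeats in the longest coding sequence of one gene."""
    # Only the longest coding sequence of each gene is considered.
    longest_cds = max(coding_sequences, key=len)
    lengths = [length for _, _, length in find_mono_repeats(longest_cds, min_len)]
    return {
        "longest_repeat": max(lengths, default=0),
        "n_repeats": len(lengths),
        "total_repeat_length": sum(lengths),
    }
```

Calling summarize_gene once per gene, with all of that gene's coding sequences, would give the three summary figures named in the text; restricting the hits to runs of eight or more bases corresponds to the (N)≥8 subset.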

1.7.2.4. Nonsense-mediated decay and immunogenicity in MSI cancers Nonsense-mediated decay (NMD) is a system which targets mRNAs with premature stop codons for degradation, and is important in hindering the production of abnormal proteins which may be toxic through gain-of-function or dominant negative effects. NMD is induced by termination codons 5’ of any exon-exon boundary in the mRNA.107,152 As the frameshift mutations caused by MMR defects are liable to create targets for NMD, the latter is suggested to be involved in MSI tumorigenesis.153 There is a significant amount of degradation of MSI target mRNAs, TGFβRII, MSH3 and GRK4 being among them, whereas the mRNAs of TCF4 (owing to the 3’ location of its frameshift) and BAX are not degraded.154 A phenomenon which can result from the successful translation of a frameshift-affected mRNA is the creation of carboxyterminal neopeptides. These can then be presented on the cell surface and provide antigens for recognition by the immune system.155 TGFβRII, OGT and CASP5 have all been shown to produce epitopes which can be recognized by cytotoxic T lymphocytes.155-158 That MSI tumor cells can be recognized and destroyed by the immune system may contribute to the improved prognosis MSI patients enjoy. However, the mutation of β2m, involved in antigen presentation and itself a target of MMR deficiency (see Table 2), allows numerous MSI tumors to escape immunosurveillance.159
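To make the link between a slipped repeat, a shifted reading frame and an altered carboxyterminus concrete, the following minimal sketch deletes one base from a coding mononucleotide tract and translates both versions. The sequence is a made-up toy example, not taken from any real target gene, and the function names are hypothetical; only the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate from the first base up to (not including) the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def delete_base(cds: str, position: int) -> str:
    """Mimic a one-unit contraction of a mononucleotide repeat."""
    return cds[:position] + cds[position + 1:]


# Toy coding sequence with an (A)9 tract starting at index 6 (hypothetical).
wild_type = "ATGGCC" + "A" * 9 + "CTGACCGGTTAA"
mutant = delete_base(wild_type, 6)

print(translate(wild_type))  # MAKKKLTG
print(translate(mutant))     # MAKKN
```

In this toy case the shifted frame runs into a stop codon after a single novel residue, so the product is truncated; in a real transcript the length of the neopeptide, and whether the new stop lies upstream of an exon-exon boundary and so invites NMD, depends on the sequence downstream of the repeat.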

1.8. Objectives The main goals of this project were to investigate the representativity of putative target gene mutation frequencies in the corpus of cancer literature as compared to a single large cohort of tumor samples, and to identify among these target genes, or among combinations of them, potential prognostic molecular markers. The continuation of the project will include an attempt to replicate the findings in a separate, clinically representative series of MSI colorectal tumors.

2. Materials and Methods

2.1. Materials Two series of colorectal carcinomas were involved in this project: one to function as a training set (see 2.1.1 MSI series), and another to furnish a test set of MSI samples (see 2.1.2 AUS series) in which to assess the impact of the various target genes of MMR deficiency.

2.1.1. MSI series A total of 144 tumor samples from Scandinavian hospitals, previously typed as MSI, had been selected. The Norwegian cohort of 83 samples was from an unselected series of primary tumors collected between 1987 and 1989 in the Oslo/Akershus region. Fifty-six Swedish samples and six Danish samples were donated by collaborators A. Lindblom, X.F. Sun and G. Bardi. Depending on the place of origin, different microsatellite markers had been used to assess instability: for the Norwegian samples, BAT25, BAT26 and 19 dinucleotide markers were used,76,85,160 whereas only BAT25 and BAT26 were used for the Danish and Swedish samples. All samples in the series were therefore retested to ensure consistent MSI classification (section 2.2.2.). All samples are from sporadic carcinomas, as far as is known.

2.1.2. AUS series An unselected prospective series of ~950 colorectal tumor samples was collected between 1993 and 2003 at Aker University Hospital in Oslo. Samples were formalin-fixed and embedded in paraffin. All tumors were subject to a formal resection, and extensive clinical data were kept for each patient.

2.1.3. Cell lines

DNA from MSI cell lines has often been used in the scientific corpus as a proxy for DNA from primary MSI-H tumors. For this reason nine MSI colorectal cancer cell lines were included in our analyses: Co115, HCT116, HCT15, LoVo, LS174T, RKO, SW48, TC7 and TC71.


2.2. Methods

2.2.1. DNA isolation

DNA from the Norwegian samples of the MSI series was previously extracted using a 340A Nucleic Acid Extractor (Applied Biosystems, Foster City, CA, U.S.A.). DNA from the Swedish and Danish samples was extracted on site. DNA from 182 tumor samples of the AUS series was isolated using the QIAamp DNA Mini Kit from Qiagen (GmbH, Hilden, Germany) according to a modified protocol for formalin-fixed, paraffin-embedded tissue samples optimized by Randi Otterstad (pers. comm. Stephan Brackman). In brief, four* approximately 6×25 μm sections of the tumor blocks were lysed in a proteinase K digestion step, and the lysate was applied to Qiagen spin columns, where DNA binds specifically to a silica-gel membrane and is eluted after two washing steps. The quantity and quality of the extracted DNA were ascertained with a NanoDrop ND-1000 Spectrophotometer (NanoDrop Technologies, Wilmington, DE, U.S.A.).

* In some cases only two sections were available.

2.2.2. MSI status

MSI status was ascertained for 185 of the samples from the AUS series, and either confirmed or disproved for the MSI series, according to the guidelines given by the National Cancer Institute in Bethesda, MD.161 The following markers were therefore evaluated: BAT25, BAT26, D2S123, D5S346 and D17S250. The first two are mononucleotide runs, and the last three are dinucleotide microsatellites. When more than one marker exhibited instability, i.e. insertions or deletions, the tumor was classified as MSI-H; tumors exhibiting instability at one marker only were classified as MSI-L. (Although MSI-L tumors are usually acknowledged to have a phenotype analogous to MSS69,76, these were included in our MSI cohort for downstream analyses on the premise that it is better to err on the side of caution.) Tumor DNA was amplified by PCR using fluorescent forward primers (see Appendix 3 for sequences and below for PCR protocol), and the resulting fragments were analyzed on a 3730 DNA Analyzer from Applied Biosystems using default Microsatellite Analysis settings in the GeneMapper3.7 software. For the fresh-frozen samples the PCRs were pentaplexes, while for the formalin-fixed samples the mononucleotide and the dinucleotide markers were amplified in separate reactions. All PCRs were carried out on a RoboCycler® 96 Gradient Cycler (Stratagene, La Jolla, California). Electropherograms were visually scored for mutation by two independent assessors: E.C. Røyrvik and T. Ahlquist.

An MSI status laboratory protocol for the Bethesda markers had to be optimized both for fresh-frozen tissue and for DNA from formalin-fixed samples. This involved around 20 test runs, with 35 sets of experimental conditions – varying gradients of primer amount, template amount, PCR cycles and assays in mono-, di-, tri- and pentaplexes (see Appendix 5).
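The classification rule stated in this section can be written out as a short sketch. The function name and the input format (a mapping from marker name to a scored instability flag) are illustrative assumptions, as is the treatment of an unscored marker as stable; the thresholds themselves follow the text (more than one unstable marker gives MSI-H, exactly one gives MSI-L).

```python
# The five Bethesda markers evaluated in this study.
BETHESDA_MARKERS = ("BAT25", "BAT26", "D2S123", "D5S346", "D17S250")


def classify_msi(unstable):
    """Classify a tumor from a dict of marker name -> True if instability was scored.

    Assumption: a marker missing from the dict is counted as stable.
    """
    n_unstable = sum(1 for marker in BETHESDA_MARKERS if unstable.get(marker, False))
    if n_unstable > 1:
        return "MSI-H"   # instability at more than one marker
    if n_unstable == 1:
        return "MSI-L"   # instability at exactly one marker
    return "MSS"         # no instability at any marker


if __name__ == "__main__":
    print(classify_msi({"BAT25": True, "BAT26": True, "D2S123": False}))
```

In practice the flag for each marker would come from the consensus of the two assessors' visual scoring.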

PCR - Bethesda marker pentaplex: 0.06μl of BAT25 and D5S346 forward and reverse primers, 0.08μl of BAT26 and D2S123 primers, 0.15μl of D17S250 primers and 5μl QIAGEN Multiplex PCR Master Mix (GmbH, Hilden, Germany) constitute the active components, in addition to the 37ng of template DNA. (The Multiplex PCR Master Mix contains pre-optimized concentrations of HotStarTaq DNA Polymerase and MgCl2, dNTPs and buffer.) Distilled water was added for a final volume of 10μl.

Denaturing - 95ºC, 15 min 0 sec
27 cycles of:
    Denaturing - 95ºC, 0 min 30 sec
    Annealing - 55ºC, 1 min 15 sec
    Elongation - 72ºC, 0 min 15 sec
Final elongation - 72ºC, 6 min 0 sec

PCR – BAT marker diplex: 0.06μl of BAT25 and 0.08μl of BAT26 forward and reverse primers, 0.80μl dNTPs, 1μl 10X PCR buffer (15mM MgCl2) and 0.08μl HotStar Taq DNA polymerase (5U/µl) were added to the 37ng of template DNA. The PCR buffer and HotStar polymerase were both from QIAGEN (GmbH, Hilden, Germany). Distilled water was added for a final volume of 10μl.

Denaturing - 95ºC, 15 min 0 sec
30 cycles of:
    Denaturing - 95ºC, 0 min 30 sec
    Annealing - 55ºC, 1 min 15 sec
    Elongation - 72ºC, 0 min 15 sec
Final elongation - 72ºC, 6 min 0 sec

PCR – dinucleotide marker triplex: 0.15μl of D2S123 and D17S250, and 0.04μl D5S346 forward and reverse primers and 5μl QIAGEN Multiplex PCR Master Mix constitute the active components, in addition to the 37ng of template DNA. Distilled water was added for a final volume of 10μl.

Denaturing - 95ºC, 15 min 0 sec
35 cycles of:
    Denaturing - 95ºC, 0 min 30 sec
    Annealing - 55ºC, 1 min 15 sec
    Elongation - 72ºC, 0 min 15 sec
Final elongation - 72ºC, 6 min 0 sec

2.2.3. Literature survey of putative target genes

In order to survey known putative target genes, a search was performed in the PubMed database; search terms were combinations of mononucleotide, microsatellites, repeats, genes, frameshift, and the MeSH term neoplasms. From nearly 140 relevant articles, with December 2006 as the cut-off date for inclusion, mutation frequency data for genes containing microsatellites – largely in coding regions, but also some in UTRs and introns – were pooled. Included were studies on MSI tumors and cell lines of all tissue types in which the phenotype occurs, the overwhelming majority being gastrointestinal and endometrial. Inclusion of mutation data on cell lines was kept to a minimum, as cell lines cannot be used to represent the situation in primary tumors as regards frameshift mutations of this type.132,138 In total, 162 genes had related data considered suitable for pooling. The following information for each gene and each instance of said gene was entered into separate tables in the FileMaker7.0 software:

• MSI tumor type (colorectal, gastric, endometrial, etc.)
• Primary tumor vs. cell line
• Sporadic vs. hereditary cases
• Repeat type; unit and number
• Mutation frequency in each separate study
• Mutation frequency across studies and tissue types
• Mutation frequency across studies of colorectal carcinomas
• Number of samples in each study
• Chromosomal locus of the gene
• Location of the repeat within the gene (exon, intron, UTR)
• Gene product function, where known
• Human Genome Nomenclature Committee (HGNC) symbol and official name
• terms
• Involvement in biological pathway
• and Ensembl accession numbers
• Reference (article) for each instance*

These tables were subsequently cross-linked through the corresponding data between tables, which was usually the colloquial gene symbol, to create a database containing all available information in an easily accessible form.
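The cross-linking of tables by gene symbol, and the pooling of mutation frequencies across studies, can be illustrated with a small relational sketch. This is not the FileMaker database itself: the table and column names are loosely modelled on Figure 12, the gene `GENE_A` and its counts are invented purely for illustration, and pooling is assumed here to mean summed mutated samples over summed total samples.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two tables linked by gene symbol."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE gene_list (
            symbol      TEXT PRIMARY KEY,   -- colloquial gene symbol (link key)
            hgnc_symbol TEXT,
            repeat_unit TEXT
        );
        CREATE TABLE gene_table_reference (
            symbol       TEXT REFERENCES gene_list(symbol),
            tumor_type   TEXT,
            n_mutated    INTEGER,
            n_total      INTEGER,
            reference_id INTEGER
        );
        """
    )
    # Hypothetical gene and hypothetical study counts, for illustration only.
    conn.execute("INSERT INTO gene_list VALUES ('GENE_A', 'GENEA1', '(A)10')")
    conn.executemany(
        "INSERT INTO gene_table_reference VALUES (?, ?, ?, ?, ?)",
        [("GENE_A", "colorectal", 14, 20, 1), ("GENE_A", "gastric", 6, 20, 2)],
    )
    return conn


def pooled_frequency(conn, symbol, tumor_type=None):
    """Pooled mutation frequency for a gene, optionally restricted to one tumor type."""
    query = (
        "SELECT SUM(r.n_mutated), SUM(r.n_total) "
        "FROM gene_list g JOIN gene_table_reference r ON r.symbol = g.symbol "
        "WHERE g.symbol = ?"
    )
    params = [symbol]
    if tumor_type is not None:
        query += " AND r.tumor_type = ?"
        params.append(tumor_type)
    mutated, total = conn.execute(query, params).fetchone()
    return None if not total else mutated / total


if __name__ == "__main__":
    db = build_demo_db()
    print(pooled_frequency(db, "GENE_A"))                # across all tumor types
    print(pooled_frequency(db, "GENE_A", "colorectal"))  # colorectal studies only
```

The two calls correspond to the "across studies and tissue types" and "across studies of colorectal carcinomas" columns listed above.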

* Gene product function was standardly taken from GeneCards.org; HGNC symbols were confirmed at the HGNC website, gene ontology mining performed with DAVID Bioinformatics Resources (david.abcc.ncifcrf.go), and biological pathways taken from KEGG (/www.genome.jp/keg)


Figure 12. Linkage map from FileMaker7.0 database. The central table, ‘Gene_List’, details specifics for each gene and the repeat(s) involved. This table is linked, by either the colloquial gene name or the HGNC symbol, to entries for each given gene in the surrounding tables. Every occurrence of a gene in the scientific corpus is entered in ‘Gene_Table_Reference’ (upper right), which in turn is linked to a reference database (not shown).

This database was then used as a basis to select strong candidate target genes for laboratory mutation analysis.

2.2.4. Target genes

2.2.4.1. Target gene selection

For the initial selection a convenient minimum cut-off mutation frequency of 15%* across all tumor types was chosen, following an assessed background mutation level of 12-13%.132,133 To ensure, as far as possible, that the given mutation frequency was representative, this criterion was coupled to a requirement of at least two studies of minimum twenty samples each, and a minimum of one hundred samples across all studies. These criteria returned 23 genes. In order to include more potential molecular markers, and possibly to avoid some of the bias that the above set of criteria had towards long-established target genes, a second set was defined. This included a minimum observed mutation frequency for the gene of 30%, a minimum of one study of twenty tumor samples, and involvement in one of the following categories, which were in part based on Hanahan and Weinberg’s acquired capabilities of cancer14: DNA repair, cell signaling, apoptosis, cell cycle, transcription or angiogenesis. On the informed assumption that DNA repair and cell cycle genes significantly influence the development and prognosis of MSI tumors, genes falling into these categories were sorted according to mutation frequency, and those displaying mutations in over 15% of samples were selected, regardless of sample number. Finally, genes were included that are considered by Woerner et al. to be true target genes in MSI CRC.136 To recapitulate the criteria:

1. ≥15% mutated, ≥20 tumors in 2 studies, ≥100 tumors total; or
2. ≥30% mutated, ≥20 tumors in 1 study, included in cancer-relevant category; or
3. ≥15% mutated, affecting cell cycle or DNA repair; or
4. One of the following: TGFBR2, ACVR2, BAX, TCF4, MSH3, PTHL3, HT001, AC1 and SLC23A2.136

The above criteria were not strictly adhered to where a gene that appeared to be promising was just shy of a single cut-off value. The final number of target genes was 43; these are given in Table 1.

* All numerical values, i.e. for mutation frequencies, number of samples etc., are drawn from the aforementioned target gene database, see 2.2.3.
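The four selection criteria can be expressed as a small filter, sketched below under stated assumptions. The `GeneSummary` record and its field names are hypothetical stand-ins for values drawn from the target gene database, and the thresholds follow the recapitulated list (using ≥ throughout). The sketch does not reproduce the manual exceptions made for genes just shy of a cut-off.

```python
from dataclasses import dataclass, field

CANCER_CATEGORIES = {
    "DNA repair", "cell signaling", "apoptosis",
    "cell cycle", "transcription", "angiogenesis",
}
# Genes listed under criterion 4 in the text.
WOERNER_GENES = {
    "TGFBR2", "ACVR2", "BAX", "TCF4", "MSH3", "PTHL3", "HT001", "AC1", "SLC23A2",
}


@dataclass
class GeneSummary:
    symbol: str
    mut_freq: float                 # pooled mutation frequency, as a fraction 0-1
    study_sizes: list               # number of tumor samples in each study
    categories: set = field(default_factory=set)


def matched_criteria(gene):
    """Return the numbers of the selection criteria (1-4) that a gene satisfies."""
    hits = []
    large_studies = [n for n in gene.study_sizes if n >= 20]
    if gene.mut_freq >= 0.15 and len(large_studies) >= 2 and sum(gene.study_sizes) >= 100:
        hits.append(1)
    if gene.mut_freq >= 0.30 and large_studies and gene.categories & CANCER_CATEGORIES:
        hits.append(2)
    if gene.mut_freq >= 0.15 and gene.categories & {"DNA repair", "cell cycle"}:
        hits.append(3)
    if gene.symbol in WOERNER_GENES:
        hits.append(4)
    return hits


if __name__ == "__main__":
    # Hypothetical gene, for illustration only.
    example = GeneSummary("GENE_Y", 0.35, [20], {"cell cycle"})
    print(matched_criteria(example))
```

A gene would be a candidate if the returned list is non-empty.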

Table 1. Target genes chosen for laboratory analysis
Gene symbol – the most common name for any given gene in the scientific corpus surveyed; HGNC symbol – the name approved by the Human Genome Nomenclature Committee, updated to February 2007; Repeat – the repeat unit and number of units of microsatellites which were tested; Gene location – the location of the repeat within the gene, taken when possible from the scientific corpus, otherwise from Ensembl Exon View of a representative transcript; Chr. Location – the location of the gene on a chromosome, taken when possible from the scientific corpus, otherwise from Ensembl or GeneCards.org; Mut.freq. – the mutation frequency across studies and tissue types; Mutated S/C – the number of mutated samples across studies of colorectal carcinomas (mostly sporadic, but with HNPCC cases and cell lines where these could not be separated) over the total number of samples; Mut.freq. S/C – the mutation frequency across studies of colorectal carcinomas (mostly sporadic, but with HNPCC cases and cell lines where these could not be separated).

Gene symbol  HGNC symbol  Repeat                  Gene location  Chr. location   Mut.freq.  Mutated S/C  Mut.freq. S/C
AC1          C4orf6       (T)10                   exon1          4p16.2          68 %       14/20        70 %
ACTRII       ACVR2A       (A)8                    exon10         2q22.3          70 %       95/140       68 %
AIM2         AIM2         (A)10                   exon6          1q22            52 %       46/81        57 %
ATR          ATR          (A)10                   exon10         3q23            23 %       55/252       22 %
AXIN2        AXIN2        (G)7, (C)6, (A)6, (C)5  exon7          17q24.1         20 %       18/81        22 %
BAX          BAX          (G)8                    exon3          19q13.3-q13.45  42 %       359/773      45 %
BLM          BLM          (A)9                    exon7          15q26.1         16 %       51/373       14 %
CASP5        CASP5        (A)10                   exon2          11q22.2-q22.3   43 %       94/207       45 %
DNA-PKcs     PRKDC        (A)10                   exon5          8q11.21         22 %       50/228       22 %
E2F-4        E2F4         (CAG)13                 exon8          16q21-22        47 %       48/109       44 %
EBP1/PA2G4   PA2G4        (A)8                    exon12         12q13.2         18 %       9/43         21 %
EP300        EP300        (A)5, (A)7              exon3, 27      22q13.2         57 %       4/7          57 %
EPHB2        EPHB2        (A)9                    exon17         1p36.1-p35      41 %       101/246      41 %
FLJ11383     PCNXL2       (A)10                   exon8          1q42.2          74 %       29/39        74 %
GRB-14       GRB14        (A)9                    exon7          2q24.3          30 %       17/57        30 %
GRK4         GRK4         (A)9                    exon8          4p16.3          13 %       19/148       13 %
HT001        ASTE1        (A)11                   exon6          3q21.3          86 %       17/20        85 %
IGF IIR      IGF2R        (G)8                    exon28         6q25.3          22 %       120/530      23 %
KIAA1470     RCC2         (A)10                   5' UTR         1p36.13         46 %       18/39        46 %
MARCKS       MARCKS       (A)11                   exon2          6q22.2          74 %       42/58        72 %
MBD4         MBD4         (A)10                   exon3          3q21.3          23 %       76/384       20 %
MRE11        MRE11A       (T)11                   intron4-5      11q21           70 %       55/64        86 %
MSH2         MSH2         (A)27                   intron5-6      2p21            63 %       22/35        63 %
MSH3         MSH3         (A)8                    exon8          5q14.1          40 %       337/831      41 %
MSH6         MSH6         (C)8                    exon5          2p16.3          25 %       168/712      24 %
OGT          OGT          (T)10                   exon5          Xq13.1          22 %                    22 %
PTEN         PTEN         (A)6 * 2                exon7, exon8   10q23.31        17 %       26/138       19 %
PTHL3        PTHLH        (A)11                   exon4          12p11.22        91 %       18/20        90 %
RACK7        PRKCBP1      (A)8                    exon14         20q13.12        15 %       20/135       15 %
RAD50        RAD50        (A)9                    exon13         5q23.3          32 %       42/148       28 %
RBBP8        RBBP8        (A)9                    exon11         18q11.2         17 %       30/179       17 %
RIS1         RIS1         (GCN)14                 exon 3         3p21.31         44 %       7/16         44 %
RIZ          PRDM2        (A)9                    exon8          1p36.21         35 %       24/83        29 %
SEC63        SEC63        (A)10, (A)9             exon16         6q16-22         54 %       58/103       56 %
SEMG1        SEMG1        (T)9                    intron2-3      20q13.12        51 %       74/146       51 %
SLC23A1      SLC23A2      (C)9                    exon12         20p13           45 %       9/20         45 %
SPINK5       SPINK5       (A)10                   exon26         5q32            31 %       12/39        31 %
SYCP1        SYCP1        (A)10                   exon32         1p13-12         17 %       11/60        18 %
TAF-1B       TAF1B        (A)11                   exon3          2p25            78 %       45/58        78 %
TCF-4        TCF7L2       (A)9                    exon15         10q25.2         36 %       126/307      41 %
TGF-ßRII     TGFBR2       (A)10                   exon3          3p24.1          71 %       759/951      80 %
UVRAG        UVRAG        (A)10                   exon8          11q13.5         35 %       20/57        35 %
WISP3        WISP3        (A)9                    exon4          6q21            22 %       11/36        31 %

2.2.4.2. Mutation analysis

Mutation analysis of the selected target genes was performed by fragment analysis of the microsatellite-containing regions of each gene. With the exception of PTEN and EP300, which both had repeats of interest in two separate exons, only one fragment was investigated per gene, yielding a total of 45 fragments. These fragments, ranging from ca. 60-250bp in size, were amplified in multiplex PCRs averaging five genes per reaction (see Appendix 3 for primer sequences). When possible, the primer sequences were those that had been used in previous studies (by now some gene fragments have canonical primer sequences); the remainder were designed for this study using the Primer3 program (frodo.wi.mit.edu/cgi-bin/primer3). Default settings were used, except for occasional adjustment of melting temperatures. All primer pairs were assessed for specificity, i.e. that they only amplify unique sequences of the human genome, by in silico PCR at genome.ucsc.edu, and for hairpin and/or primer-dimer formation at NetPrimer (www.premierbiosoft.com). Target gene fragments were divided into the following groups for multiplex PCR:

1. TGFBR2, SLC23A2, BAX, IGF2R and PRKDC
2. GRK4, SPINK5, PTHL3, AXIN2, SEC63 and MSH2
3. ATR, RAD50, HT001 and EP300 – exon 27
4. TCF4, MBD4, ACVR2, OGT and SYCP1
5. EPHB2, PA2G4, EP300 – exon 3, CASP5 and SEMG1
6. RACK7, TAF1B, FLJ11383 and KIAA1470
7. GRB14, MRE11, RIZ, and PTEN – exons 7 and 8
8. AIM2, RBBP8, MSH6, and E2F4
9. BLM, AC1 and MSH3
10. MARCKS, UVRAG and WISP3

(RIS1 was excluded as it was problematic to amplify and the primers proved to be non-specific.)

37ng of template DNA was used in each assay, and all primers were of a 20pmol/μl concentration.
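Because all primer stocks were 20pmol/μl, the primer volumes quoted in the PCR mixes translate directly into amounts and final concentrations. The following sketch does that arithmetic; the function names are illustrative, and it assumes the stated volume applies to each primer (forward and reverse) individually.

```python
STOCK_PMOL_PER_UL = 20.0  # primer stock concentration stated in the text


def primer_amount_pmol(volume_ul):
    """Amount of one primer added to the reaction, in pmol."""
    return volume_ul * STOCK_PMOL_PER_UL


def final_concentration_um(volume_ul, reaction_volume_ul=15.0):
    """Final concentration of one primer in the reaction, in µM.

    1 pmol/µl equals 1 µM, so pmol divided by the reaction volume in µl gives µM.
    """
    return primer_amount_pmol(volume_ul) / reaction_volume_ul


if __name__ == "__main__":
    # Example: 0.05 µl of primer stock in a 15 µl target gene multiplex reaction.
    print(primer_amount_pmol(0.05), "pmol")
    print(round(final_concentration_um(0.05), 4), "µM")
```

For the 10μl Bethesda marker reactions the same functions apply with `reaction_volume_ul=10.0`.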

PCR mix for Group 1: 0.05μl of TGFBR2, SLC23A2 and IGF2R forward and reverse primers, 0.04μl of BAX primers, 0.08μl of PRKDC primers and 7.5μl QIAGEN Multiplex PCR Master Mix (GmbH, Hilden, Germany) constitute the active components, in addition to the template DNA. (The Multiplex PCR Master Mix contains pre-optimized concentrations of HotStarTaq DNA Polymerase and MgCl2, dNTPs and buffer.) Distilled water was added for a final volume of 15μl.

PCR mix for Group 2: 0.08μl GRK4, PTHL3, AXIN2, SEC63 and MSH2 primers, 0.06μl SPINK5 primers and 7.5μl QIAGEN Multiplex PCR Master Mix constitute the active components, in addition to the template DNA. Distilled water was added for a final volume of 15μl.

PCR mix for Group 3: 0.10μl ATR primers, 0.05μl RAD50, HT001 and EP300-exon27 primers and 7.5μl QIAGEN Multiplex PCR Master Mix constitute the active components, in addition to the template DNA. Distilled water was added for a final volume of 15μl.

PCR mix for Group 4: 0.08μl TCF4, ACVR2 and SYCP1 primers, 0.06μl OGT primers, 0.10μl MBD4 primers and 7.5μl QIAGEN Multiplex PCR Master Mix constitute the active components, in addition to the template DNA. Distilled water was added for a final volume of 15μl.

PCR mix for Group 5: 0.06μl EPHB2, PA2G4, EP300-exon3, CASP5 and SEMG1 primers and 7.5μl QIAGEN Multiplex PCR Master Mix constitute the active components, in addition to the template DNA. Distilled water was added for a final volume of 15μl.

PCR mix for Group 6: 0.08μl KIAA1470 primers, 0.06μl RACK7 and TAF1B primers, 0.5μl FLJ11383 primers and 7.5μl QIAGEN Multiplex PCR Master Mix constitute the active components, in addition to the template DNA. Distilled water was added for a final volume of 15μl.

PCR mix for Group 7: 0.08μl GRB14, MRE11, PTEN-exon8 and PTEN-exon7 primers, 0.06μl RIZ primers, and 7.5μl QIAGEN Multiplex PCR Master Mix constitute the active components, in addition to the template DNA. Distilled water was added for a final volume of 15μl.

PCR mix for Group 8: 0.08μl AIM2 primers, 0.06μl MSH6 primers, 0.10μl RBBP8 primers, 0.12μl E2F4 primers and 7.5μl QIAGEN Multiplex PCR Master Mix constitute the active components, in addition to the template DNA. Distilled water was added for a final volume of 15μl.

PCR mix for Group 9: 0.06μl AC1, BLM and MSH3 primers and 7.5μl QIAGEN Multiplex PCR Master Mix constitute the active components, in addition to the template DNA. Distilled water was added for a final volume of 15μl.

PCR mix for Group 10: 0.350μl WISP3 primers, 0.225μl UVRAG primers, 0.175μl MARCKS primers and 7.5μl QIAGEN Multiplex PCR Master Mix constitute the active components, in addition to the template DNA. Distilled water was added for a final volume of 15μl.

PCR conditions for Groups 1, 4, 6 and 9:
Denaturing - 95ºC, 15 min 0 sec
27 cycles of:
    Denaturing - 95ºC, 0 min 30 sec
    Annealing - 58ºC, 1 min 15 sec
    Elongation - 72ºC, 0 min 15 sec
Final elongation - 72ºC, 6 min 0 sec

PCR conditions for Groups 2, 3, 5 and 8:
Denaturing - 95ºC, 15 min 0 sec
27 cycles of:
    Denaturing - 95ºC, 0 min 30 sec
    Annealing - 60ºC, 1 min 15 sec
    Elongation - 72ºC, 0 min 15 sec
Final elongation - 72ºC, 6 min 0 sec

PCR conditions for Group 7:
Denaturing - 95ºC, 15 min 0 sec
27 cycles of:
    Denaturing - 95ºC, 0 min 30 sec
    Annealing - 53ºC, 1 min 15 sec
    Elongation - 72ºC, 0 min 15 sec
Final elongation - 72ºC, 6 min 0 sec

PCR conditions for Group 10:
Denaturing - 95ºC, 15 min 0 sec
30 cycles of:
    Denaturing - 95ºC, 0 min 30 sec
    Annealing - 58ºC, 1 min 15 sec
    Elongation - 72ºC, 0 min 30 sec
Final elongation - 72ºC, 6 min 0 sec

The fragments were labeled with the G5 dye set from Applied Biosystems; PET – red, NED – yellow, VIC – green, and 6-FAM – blue. The size standard used was GS500 LIZ (orange). All fragments were analyzed on a 3730 DNA Analyzer from Applied Biosystems using default Microsatellite Analysis settings in the GeneMapper3.7 software. Electropherograms were visually examined for insertions/deletions by two independent assessors, E.C. Røyrvik and T. Ahlquist, against corresponding fragments from DNA from four different disease-free individuals. All assays were duplicated in tandem runs using different PCR machines to ensure the robustness of the results.

A representative view of the fragment analysis output for several genes is given in Figure 13.

Figure 13. Electropherograms of DNA from normal blood (top) and two MSI-H tumors for PCR group 3. Red – ATR; black (yellow dye) – RAD50; blue – CDC25C (not included in the final analysis); green – HT001; orange – size standard; size in base pairs on the x-axis (top).

2.2.4.3. Clustering analysis and survival correlation of target genes

Clustering of genes according to mutation states by sample was done using a hierarchical clustering algorithm (UPGMA – unweighted pair group method with arithmetic mean) in Spotfire®. A simple measure of correlation between mutation state for each gene and survival was achieved using SAM (Significance Analysis of Microarrays) in Microsoft Excel (www-stat.stanford.edu/~tibs/SAM), and this analysis was further visualized by hierarchical clustering in Spotfire®. In this case, the measure of survival was simply a living/dead dichotomy, using 10-year follow-up data on the included patients (n=34).
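For readers unfamiliar with UPGMA, the following is a minimal pure-Python sketch of the algorithm applied to binary mutation profiles (one tuple of 0/1 values per gene, one position per sample). It is not the Spotfire® implementation: the distance measure (fraction of samples in which two genes differ) is an assumption, as the metric used in the software is not stated here, and the gene names and profiles in the example are invented.

```python
from itertools import combinations


def hamming(a, b):
    """Fraction of samples in which two mutation profiles differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(profiles):
    """Average-linkage (UPGMA) clustering of a dict: name -> tuple of 0/1 states.

    Returns the merges in order as (members_a, members_b, distance).
    """
    clusters = {name: (name,) for name in profiles}
    dist = {
        frozenset((a, b)): hamming(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }
    merges = []
    while len(clusters) > 1:
        # Closest pair of clusters; ties are broken by name for reproducibility.
        pair = min(dist, key=lambda p: (dist[p], sorted(p)))
        a, b = sorted(pair)
        d = dist.pop(pair)
        size_a, size_b = len(clusters[a]), len(clusters[b])
        new_id = a + "+" + b
        for c in list(clusters):
            if c in (a, b):
                continue
            d_ac = dist.pop(frozenset((a, c)))
            d_bc = dist.pop(frozenset((b, c)))
            # UPGMA update: mean over all original members, i.e. size-weighted.
            dist[frozenset((new_id, c))] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        members_a, members_b = clusters.pop(a), clusters.pop(b)
        merges.append((members_a, members_b, d))
        clusters[new_id] = members_a + members_b
    return merges


if __name__ == "__main__":
    # Hypothetical mutation states (1 = mutated) for three genes in four samples.
    demo = {"G1": (1, 1, 0, 0), "G2": (1, 1, 0, 1), "G3": (0, 0, 1, 1)}
    for merge in upgma(demo):
        print(merge)
```

The sequence of merges and their distances is what a dendrogram displays.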

Subsequently, several analyses for patients with MSI-H tumors for whom we had clinical data (survival and Dukes’ stage) were performed in SPSS 13.0 (Statistical Package for the Social Sciences). Firstly, a χ2-test was run for each gene to see if its mutation status showed any trend with regard to Dukes’ stage. This was done using all four Dukes’ stages (A-B-C-D), then using only A+B and C+D groupings, and finally using Dukes’ B and C. Secondly, a Kaplan-Meier (univariate and time-dependent) estimator was calculated for each gene, the variable being wild type versus mutated, for a ten-year follow-up period. A five-year follow-up Kaplan-Meier estimator was calculated for nine genes (BAX, SLC23A2, AXIN2, GRK4, OGT, MRE11, BLM, EP300, PTEN) which showed a collapse or divergence of wild type and mutated graphs in the 10-year Kaplan-Meier plot after 60 months, in order to examine if these genes had any significant effect limited to the first five years after diagnosis. Finally, genes which showed significant or near-significant results in some of the above analyses were included in logistic regression and Cox regression analyses (multivariate, and multivariate and time-dependent, respectively) with Dukes’ staging, for five- and ten-year follow-up. Survival data were based on the Norwegian cohort of the MSI series, for which clinicopathological variables and follow-up were available.
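The Kaplan-Meier estimator mentioned above can be sketched in a few lines. This is a generic textbook version, not the SPSS procedure; the input format (follow-up time in months plus an event flag, 1 for an observed death and 0 for censoring) is an assumption, and the data in the example are invented.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  : follow-up time per patient (e.g. months)
    events : 1 if death was observed at that time, 0 if the patient was censored
    Returns a list of (time, survival probability) at each time with a death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients both leave the risk set
    return curve


if __name__ == "__main__":
    # Hypothetical follow-up data for five patients.
    for t, s in kaplan_meier([5, 10, 10, 20, 30], [1, 0, 1, 1, 0]):
        print(t, round(s, 3))
```

Running the function separately on patients with wild type and mutated copies of a gene gives the two curves compared in each plot.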

2.2.4.4. In silico assessment of frameshift consequences

For certain genes (SLC23A2, EP300, UVRAG and OGT among them) which proved interesting subsequent to the above analyses, it was decided to assess what kind of functional consequence might be expected following a frameshift mutation. The mononucleotide repeat was located in the gene’s coding sequence at EMBL (http://www.ebi.ac.uk/), one unit was inserted or deleted in the repeat, and the sequence was then translated using the Expasy Translate tool (http://www.expasy.org/tools/dna.html). Which functional domains of the wild type protein were affected was judged from SwissProt entries (http://expasy.org/sprot/). (Results are not shown, simply commented on in chapter 4.)
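The procedure described in this section (alter the repeat by one unit, then translate) can be mimicked locally with a short sketch. This is not the Expasy tool; the helper names are illustrative, the standard genetic code is used, and the coding sequence in the example is an invented toy sequence, not that of any gene studied here.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a coding sequence up to the first stop codon or the last full codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def shift_repeat(cds, base, length, delta):
    """Insert (delta=+1) or delete (delta=-1) one unit in the first run of `base` x `length`."""
    start = cds.find(base * length)
    if start < 0:
        raise ValueError("repeat not found in sequence")
    return cds[:start] + base * (length + delta) + cds[start + length:]


if __name__ == "__main__":
    wild_type = "ATGGCCAAAAAAAAGGTTGACTAA"  # toy sequence containing an (A)8 tract
    print(translate(wild_type))
    print(translate(shift_repeat(wild_type, "A", 8, -1)))  # one-base deletion
    print(translate(shift_repeat(wild_type, "A", 8, +1)))  # one-base insertion
```

Comparing the wild type and shifted translations shows where the reading frame diverges and where a premature stop codon arises, which is the information then checked against the annotated protein domains.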

3. Results

3.1. MSI series

3.1.1. MSI status

The Scandinavian series (45 Norwegian, 5 Danish and 47 Swedish tumor samples; Ntotal = 94) were classified as MSI according to the Bethesda criteria; 87 of these were MSI-H (see Appendix 1).

Figure 14. Electropherograms of the mononucleotide markers used to assess microsatellite instability; a) BAT25; b) BAT26. In both a) and b) the top electropherogram is from a microsatellite stable tumor sample and has the characteristic pattern of the wild type microsatellites, and the two lower are from MSI tumors with profuse deletions.


Figure 15. Electropherograms of dinucleotide Bethesda markers: a) D2S123; b) D5S346; and c) D17S250. In all cases the normal pattern is given on top, and two examples from mutated tumors are given below.

3.1.2 Literature survey

The essential results of the literature survey are given in Table 2. Of these genes, those containing coding mononucleotide repeats (cMNRs) of (N)≥8 outnumber the others six to one, with (A)8-10 being the most common. (A/T) cMNRs of over 16bp are exceptionally rare in the genome; (C) and (G) cMNRs do not exceed 16bp and 13bp, respectively. Certain genes contain more than one cMNR, and many contain multiple (N)6s, but there are indications that only one repeat is subject to most of the mutational events.136 Di- and trinucleotide repeats are infrequently represented among the target genes. This may be attributed to a lower inherent propensity for replication slippage and, for the trinucleotide tracts, to the assumption that the insertion or deletion of a full unit will make only a potentially marginal difference in the protein product as it will be in-frame. However, the addition or deletion of an amino acid can have a profound effect on a protein, depending on size, charge, location et cetera, and not all the indels of trinucleotide coding repeats are of one repeat unit. E2F4, for example, the gene of which contains a (CAG) repeat which is polymorphic in normal tissue162,163, appears to enhance proliferation when it contains inserted or deleted codons142, and the imperfect triplet repeat in RIS1 is subject to frameshift mutations which interrupt its product’s polyalanine domain.164

Table 2
For explanation of headings see Table 1. Ref – Articles from which mutation frequency information was pooled. For a bibliography specific to this table, see Appendix 4.

Gene      HGNC       Repeat                  Chr. location    Mut %(tot) Mut. S/C  Mut %(S/C)  Ref
ABCF1     ABCF1      (A)10                   6p21.33          29 %       17/58     29 %        14 55
AC1       C4orf6     (T)10                   4p16.2           68 %       14/20     70 %        35 56
ACTRII    ACVR2A     (A)8                    2q22.3           70 %       95/140    68 %        4 67 115 118 136
AD7c-NTP  AD7c-NTP   (T)8                    1p36             6 %        2/35      6 %         67
AIM2      AIM2       (A)10                   1q22             52 %       46/81     57 %        14 67 118
AMYB      MYBL1      (A)8                    8q22             11 %                             118
ANG2      ANGPT2     (A)9                    8p23.1           4 %        2/57      4 %         22
APAF-1    APAF1      (A)8                    12q23            8 %        5/79      6 %         19 30 63
ATM       ATM        (T)7                    11q22-23         13 %       4/44      9 %         1 36 42
ATR       ATR        (A)10                   3q23             23 %       55/252    22 %        3 22 10 14 35 105
AXIN2     AXIN2      (G)7, (C)6, (A)6, (C)5  17q24.1          20 %       18/81     22 %        21 64 136
BAT1      BAT1       (T)8                    6p21.3           16 %                             118
BAX       BAX        (G)8                    19q13.3-q13.45   42 %       359/773   45 %        3 9 12 18 22 24 25 27 28 30 31 33 35 37 40 44 47 48 49 50 51 52 51 52 53 67 68 68 69 70 71 72 73 74 75 76 77 79 80 82 108 110 111 114 118 119 128 130 132 134 136
BCL10     BCL10      (A)8                    1p22             8 %        14/172    8 %         12 63 105 130
BLM       BLM        (A)9                    15q26.1          16 %       51/373    14 %        3 10 22 26 36 66 67 72 77 105 136
BLYM      BLYM       (A)8                    4q28.1           5 %        5/96      5 %         105
BRCA1     BRCA1      (A)8                    17q21            2 %        3/126     2 %         10 26 36 53
BRCA2     BRCA2      (A)8                    13q12.3          2 %        6/191     3 %         10 22 26 53 119
CANX      CANX       (T)8                    5q35.3           21 %                             118
CASP1     CASP1      (A)8                    11q23            4 %        0/78      0 %         30 53 118
CASP4     CASP4                              11q22.3          0 %        0/9       0 %         82
CASP5     CASP5      (A)10                   11q22.2-q22.3    43 %       94/207    45 %        10 14 22 27 30 35 110 114 128 136
CBL       CBL        (ATG)6                  11q.23.3         12 %       1/11      9 %         26 77
CBP*      CREBBP     (C)5                    16p13.3          86 %                 86 %        8
CCDC28A   CCDC28A    (A)8                    6q24.1           10 %       3/41      7 %         67 118
CCKBR     CCKBR      (T)8                    11p15.4-p15.5    19 %       2/15      13 %        15
CDC25C    CDC25C     (A)8                    5q31.2           11 %       10/93     11 %        105
CDX2      CDX2       (G)7                    13q12.2          2 %        1/81      1 %         30 62 136
CEPBZ     CEBPZ      (A)9                    2p22.2           14 %       20/148    14 %        22 105 116
CHD2      CHD2       (A)10                   15q26            12 %       7/58      12 %        14 55
CHK1      CHEK1      (A)9                    11q24.2          9 %        9/68      13 %        28 82 119
CRSP3     CRSP3      (T)8                    6q22.33-q24.1    3 %        1/38      3 %         12
CYSLT1    CYSLTR1    (A)8                    Xq21.1           9 %        4/44      9 %         67
DD5       EDD1       (A)8                    8q22             25 %                             56 118
DNA-PKcs  PRKDC      (A)10                   8q11.21          22 %       50/228    22 %        3 14 19 22 35 105
Doc-1                                        3q12.1           2 %        1/57      2 %         22
DRP       INPPL1     (C)8                    11q13.4          4 %        2/42      5 %         53
DSTN      DSTN       (T)8                    20p12.1          12 %       4/42      10 %        67 118
E2F-4     E2F4       (CAG)13*                16q21-22         47 %       50/111    45 %        24 31 79 82 85 111 128 139
EIF5      EIF5       (CAC)7                  14q32.32         0 %        0/11      0 %         26
ELAVL3    ELAVL3     (G)9                    19p13.2          37 %       7/19      37 %        55 56
EP300     EP300      (A)5, (A)7              22q13.2          57 %       4/7       57 %        8
EPHB2     EPHB2      (A)9                    1p36.1-p35       41 %       101/246   41 %        104
ERCC5     ERCC5      (A)9                    13q33.1          9 %        8/93      9 %         19 105
F8        F8         (A)8 * 2                Xq28             15 %       6/41      15 %        67
FACE-1    ZMPSTE24   (T)9                    1p34.2           8 %        3/37      8 %         67
FAS       FAS        (T)7                    10q23.31         7 %        3/30      10 %        63 110
FLASH     CASP8AP2   (A)9                    6q15             0 %        0/13      0 %         19
FLJ11186  C14orf106  (A)11                   14q13.1-14q21.3  64 %       25/39     64 %        14
FLJ11222  MNS1       (A)10                   15q11.2          28 %       11/39     27 %        14
FLJ11383  PCNXL2     (A)10                   1q42.2           74 %       29/39     74 %        14
FLJ11712  RNASEH2B   (A)10                   13q14.3          18 %       7/39      18 %        14
FLJ13615  CEP290     (A)11                   12q21.33         28 %       11/39     28 %        14
FLJ20139  FLJ20139   (A)10                   1p21.2           31 %       12/39     31 %        14
FLT3LG    FLT3LG     (C)9                    19q13.3          36 %       7/20      35 %        35 56
FTO                  (T)14                   16q12.2          80 %       16/20     80 %        35
GART      GART       (A)10                   21q22.11         22 %       13/60     22 %        14 35
GR6       C3orf27    (GA)9                   3q21.3           17 %       3/18      17 %        109
GRB-14    GRB14      (A)9                    2q24.3           30 %       17/57     30 %        22
GRK4      GRK4       (A)9                    4p16.3           13 %       19/148    13 %        22 105
HBP17     FGFBP1     (A)8                    4p15.32          8 %        3/38      8 %         12
HDCMA18P  LARP7      (A)8                    4q25             16 %       3/44      7 %         67 118
MSH2      MSH2       (A)27                   2p21             63 %       22/35     63 %        54
MSH3      MSH3       (A)8                    5q14.1           40 %       337/831   41 %        3 9 10 18 22 26 27 30 33 34 35 37 40 44 50 51 52 53 67 71 72 73 76 77 79 82 84 85 87 105 111 118 119 124 128 132 133 134 136
MSH6      MSH6       (C)8                    2p16.3           25 %       168/712   24 %        3 9 18 22 26 27 30 34 35 36 37 40 44 50 51 52 53 67 71 72 73 77 79 82 84 111 118 119 124 128 132 133 134 136
hnRNP     HNRPH1     (T)8                    5q35.5           22 %                             118
HPDMPK    FBXO46     (T)14                   19q13.32         95 %       19/20     95 %        35
RAD50     RAD50      (A)9                    5q23.3           32 %       42/148    28 %        3 10 19 22 36 136
HT001     ASTE1      (A)11                   3q21.3           86 %       17/20     85 %        55 56
HTF34     ZNF93      (A)8                    19p13.1-p12      7 %        9/124     7 %         12 105
IDN3      NIPBL      (A)8                    5p13.2           7 %        3/44      7 %         67
IGF IIR   IGF2R      (G)8                    6q25.3           22 %       120/530   23 %        9 12 18 22 25 26 28 31 33 35 40 44 47 51 60 67 72 73 74 75 76 77 79 82 83 110 119 128 132 134 136
KIAA0092  CEP57      (A)8                    11q21            7 %        3/43      7 %         67
KIAA0295  ZNF609     (A)8                    15q22.31         8 %        3/39      8 %         67
KIAA0335  ZNF518     (A)9                    10q24.1          7 %        3/43      7 %         67
KIAA0336  GCC2       (A)8                    2q12.3           8 %        3/43      7 %         67 118
KIAA0355  KIAA0355   (A)9                    19q13.11         4 %        1/24      4 %         66
KIAA0530  ZNF292     (A)9                    6q15             7 %        3/44      7 %         67
KIAA0595  PPRC1      (C)8                    10q24.32         7 %        3/43      7 %         67
KIAA0754  KIAA0754   (A)8                    1p34.3           10 %       4/41      10 %        67
KIAA0844  ZNF365     (A)8                    10q21.2          9 %        4/44      9 %         67
KIAA0905  SEC31A     (A)9                    4q21.22          17 %       6/43      14 %        67 118
KIAA0943  ATG4B      (T)9                    2q37.3           11 %       4/44      9 %         67 118
KIAA0977  COBLL1     (T)9                    2q24.3           20 %       10/42     24 %        67 118
KIAA1052  CEP164     (A)11                   11q23.3          31 %       12/39     31 %        14
KIAA1268  PARP14     (A)10                   3q21.1           23 %       9/39      23 %        14
KIAA1333  KIAA1333   (A)10                   14q12            21 %       8/39      21 %        14
KIAA1470  RCC2       (A)10                   1p36.13          46 %       18/39     46 %        14
KKIAMRE   CDKL2      (A)9                    4q21.1           4 %        2/57      4 %         22
MAC30     TMEM97     (A)10                   17q11.2          17 %       9/60      15 %        14 35 118
MARCKS    MARCKS     (A)11                   6q22.2           74 %       42/58     72 %        14 55 56 136
MAZ       MAZ        (C)8                    16p11.2          8 %        3/38      8 %         12
MBD4      MBD4       (A)10                   3q21.3           24 %       76/384    20 %        3 10 14 29 30 35 43 67 112 118 121 128 136
MCT4      SLC16A4    (T)9                    1p12             15 %       4/36      11 %        67 118
MKI67     MKI67      (A)8                    10q26.2          18 %                             118
MLH3      MLH3       (A)9                    14q24.3          8 %        4/27      15 %        84 38
MRE11     MRE11A     (T)11                   11q21            75 %       55/64     86 %        1 3 32 140 141
MRP2      ABCC2      (A)8                    10q24            8 %        3/38      8 %         12
MYO10     MYO10      (G)8                    5p15.1           11 %       4/38      11 %        12
NBS1      NBN        (A)7                    8q21.3           0 %        0/39      0 %         36
NDUFC2    NDUFC2     (T)9                    11q14.1          31 %       12/43     28 %        56 67 118
NKTR      NKTR       (C)8                    4q32.1           7 %        3/43      7 %         67
NSEP      YBX1       (C)8                    1p34.2           0 %        0/82      0 %         25 27 53
OGT       OGT        (T)10                   Xq13.1           22 %       26/116    22 %        14 22 35
P4HB      P4HB       (A)8                    17q25.3          10 %       3/42      7 %         67 118
PA2G4     PA2G4      (A)8                    12q13.2          18 %       9/43      21 %        67 118
PMS2      PMS2       (A)8                    7p22.1           2 %        5/207     2 %         27 29 53 66 105
POLA      POLA       (A)8                    Xp22.11          0 %        0/66      0 %         53 66
PRCC      PRCC       (C)8                    1q21.1           12 %                             118
PRKCI     PRKCI      (A)8                    3q26.2           11 %                             118
PRKWNK1   WNK1       (A)10                   12p13.3          23 %       9/39      23 %        14
PRRG1     PRRG1      (C)8                    Xp21.1           9 %        4/43      9 %         67
PTEN      PTEN       (A)6 * 2                10q23.31         17 %       26/138    19 %        18 21 65 88 89 106 135 136
PTHL3     PTHLH      (A)11                   12p11.22         91 %       18/20     90 %        35 56 136
PTPN21    PTPN21     (A)8                    14q31.3          13 %       5/43      12 %        67 118
RAB2L     RGL2       (G)8                    6p21.3           12 %       5/43      12 %        67
RACK7     PRKCBP1    (A)8                    20q13.12         15 %       20/135    15 %        67 105
RBBP2     JARID1A    (A)8                    12p11            17 %                             118
RBBP8     RBBP8      (A)9                    18q11.2          17 %       30/179    17 %        13 19 22 116
RECQL     RECQL      (A)9                    12p12.1          8 %        19/213    9 %         10 22 66 105
RFC3      RFC3       (A)10                   13q12.3-13       21 %       8/39      21 %        14
RGS12     RGS12      (C)8                    4p16.3           29 %       11/38     29 %        12
RHAMM     HMMR       (A)9                    5q34             16 %       9/57      16 %        22
RIP140    NRIP1      (A)9                    21q11.2          9 %        3/42      7 %         67 118
RIS1      TMEM158    (GCN)14                 3p21.31          44 %       7/16      44 %        122
RIZ       PRDM2      (A)8, (A)9              1p36.21          35 %       24/83     29 %        7 66 90 91 118 136
SEC63     SEC63      (A)10, (A)9             6q16-22          54 %       58/103    56 %        14 35 67 118
SEMG1     SEMG1      (T)9                    20q13.12         51 %       74/146    51 %        5
SEX       PLXNA3     (G)8                    Xq28             14 %       5/35      14 %        67
SHC1      SHC1       (G)8                    1q22             0 %        0/6       0 %         26
SLC17A2   SLC17A2    (A)8                    6p21.3           12 %                             118
SLC23A1   SLC23A2    (C)9                    20p13            45 %       9/20      45 %        35
SLC4A3    SLC4A3     (C)9                    2q35             33 %       7/21      33 %        35 56
SPINK5    SPINK5     (A)10                   5q32             31 %       12/39     31 %        14
SREBP2    SREBF2     (CAG)12                 22q13.2          6 %        1/18      6 %         109
ß2m       B2M        (CT)4, 2*(A)6           15q21.1          29 %       5/17      29 %        86 137
STK11     STK11      (C)6                    19p13.3          8 %        6/80      8 %         126
SYCP1     SYCP1      (A)10                   1p13-12          17 %       11/60     18 %        14 35 118
TAF-1B    TAF1B      (A)11                   2p25             78 %       45/58     78 %        14 55
TAN-1     NOTCH1     (ACC)6                  9q34.3           11 %       2/18      11 %        109
TAP1      TAP1       (G)6                    6p21.32          11 %       2/18      11 %        137
TAP2      TAP2       (C)6                    6p21.32          12 %       1/17      6 %         137
TCF1      TCF1       (C)8                    12q24.3          32 %       12/38     32 %        12
TCF-4     TCF7L2     (A)9                    10q25.2          36 %       126/307   41 %        3 10 12 17 22 30 44 136
TCF6L1    TCF6L1     (A)10                   7pter-cen        47 %       27/57     47 %        14 55
TEF4      TEAD2      (C)8                    19q13.33         32 %       12/38     32 %        12 56
TFDP2     TFDP2      (A)8                    3q23             3 %        0/57      0 %         116 118
TFE3      TFE3       (C)8                    Xp11.22          24 %       9/38      24 %        12
TGF-ßRII  TGFBR2     (A)10                   3p24.1           71 %       759/951   80 %        3 9 10 12 14 18 22 24 28 30 31 35 37 40 44 47 50 51 58 67 69 71 72 73 74 75 76 77 79 82 92 93 95 96 97 108 111 118 119 125 128 132 134 135 136
TLOC1     TLOC1      (A)9                    3q26.2           7 %        4/57      7 %         22
TPRDI     TTC3       (A)8                    21q22-13         11 %       4/44      9 %         67 118
TSC1      TSC1       (GCA)6                  9q34.13          0 %        0/6       0 %         26
TTK       TTK        (A)9                    6q14.1           28 %                             118
USP-1     USP1       (A)8                    1p31.3           17 %                             118
UVRAG     UVRAG      (A)10                   11q13.5          35 %       20/57     35 %        14 55
VRK2      VRK2       (A)8                    2p16.1           11 %                             118
WBP1      WBP1       (C)9                    2p13.1           9 %        3/43      7 %         67 118
WISP3     WISP3      (A)9                    6q21             22 %       11/36     31 %        30 136
WRN       WRN        (A)8                    8p12             0 %        0/6       0 %         26
XPOT      XPOT       (T)9                    12q14.2          14 %       6/43      14 %        67
ZFP103    RNF103     (A)8                    2p11.2           20 %                             118

* The mutation frequency for CBP is pooled with that of EP300; the dimorphic trinucleotide repeat of E2F4 is (CAG)12-13.

3.1.3. Target gene mutation

Table 3 gives the results of the mutation analyses for the 43 chosen target genes, divided into MSI-H tumor samples, MSI-L tumor samples and MSI cell lines. The most important column is that of mutation frequencies for MSI-H tumor samples, which is in bold; the adjacent column contains the corresponding frequencies from the scientific literature. For the mutation results of each sample, see Appendix 2. By far the most common mutation type across genes, though not shown, is a single base pair deletion. In the case of E2F4 it is the deletion of a single repeat unit, i.e. a base triplet.

Table 3. N-mutated: the number of mutated samples over the total number of samples; Mut. Freq.: mutation frequency; Mut. Freq. Lit.: corresponds to Mut % (S/C) of Table 2, and so largely relates to sporadic, primary MSI-H tumors in the scientific corpus.

Column order: Gene symbol; MSI-H N-mutated; MSI-H Mut. Freq.; Mut. Freq. Lit.; MSI-L N-mutated; MSI-L Mut. Freq.; MSI cell line N-mutated; MSI cell line Mut. Freq.

AC1 39/77 51 % 70 % 0/7 0 % 6/9 67 %
ACVR2 73/80 91 % 68 % 2/7 29 % 9/9 100 %
AIM2 41/74 55 % 57 % 1/7 14 % 7/9 78 %
ATR 12/84 14 % 22 % 0/7 0 % 2/9 22 %
AXIN2 13/85 15 % 22 % 0/7 0 % 1/9 11 %
BAX 38/87 44 % 45 % 1/7 14 % 3/9 33 %
BLM 12/77 16 % 14 % 0/7 0 % 3/9 33 %
CASP5 49/83 59 % 45 % 1/7 14 % 7/9 78 %
E2F-4 44/74 60 % 44 % 2/7 29 % 7/9 78 %
EP300 13/76 17 % 57 %¹ 0/7 0 % 3/9 33 %
EPHB2 19/79 24 % 41 % 0/7 0 % 4/9 44 %
FLJ11383 53/81 65 % 74 % 1/7 14 % 8/9 89 %
GRB-14 31/78 40 % 30 % 1/7 14 % 5/9 56 %
GRK4 17/85 20 % 13 % 1/7 14 % 2/9 22 %
HT001 58/84 69 % 85 % 2/7 29 % 8/9 89 %
IGF2R 26/87 30 % 23 % 1/7 14 % 3/9 33 %
KIAA1470 41/81 51 % 46 % 0/7 0 % 4/9 44 %
MARCKS 46/72 64 % 72 % 1/7 14 % 9/9 100 %
MBD4 24/80 30 % 20 % 0/7 0 % 3/9 33 %
MRE11 59/78 76 % 86 % 2/7 29 % 9/9 100 %
MSH2 79/85 93 % 63 % 1/7 14 % 7/8 88 %
MSH3 39/77 51 % 41 % 0/7 0 % 5/9 56 %
MSH6 17/77 22 % 24 % 0/7 0 % 5/9 56 %
OGT 29/81 36 % 22 % 0/7 0 % 2/9 22 %
PA2G4 7/83 8 % 21 % 0/7 0 % 0/9 0 %
PRKDC 19/87 22 % 22 % 0/7 0 % 2/9 22 %
PTEN 15/78 19 % 19 % 0/7 0 % 0/9 0 %
PTHL3 64/85 75 % 90 % 2/7 29 % 9/9 100 %
RACK7 6/81 7 % 15 % 0/7 0 % 1/9 11 %
RAD50 41/84 49 % 28 % 1/7 14 % 6/9 67 %
RBBP8 10/74 14 % 17 % 1/7 14 % 1/9 11 %
RIS1 - - 44 % - - - -
RIZ 23/78 30 % 29 % 1/7 14 % 6/9 67 %
SEC63 34/85 40 % 56 % 1/7 14 % 5/9 56 %
SEMG1 23/72 32 % 51 % 0/7 0 % 6/9 67 %
SLC23A2 34/87 39 % 45 % 0/7 0 % 7/9 78 %
SPINK5 11/85 13 % 31 % 1/7 14 % 1/8 13 %
SYCP1 16/81 20 % 18 % 0/7 0 % 4/9 44 %
TAF-1B 62/81 77 % 78 % 1/7 14 % 8/9 89 %
TCF-4 33/81 41 % 41 % 1/7 14 % 4/9 44 %
TGFBR2 65/87 75 % 80 % 1/7 14 % 9/9 100 %
UVRAG 26/68 38 % 35 % 1/7 14 % 4/8 50 %
WISP3 18/72 25 % 31 % 0/7 0 % 1/9 11 %

¹ This mutation frequency is derived solely from the study of seven cell lines, and EP300 was included because of its vital role in both chromatin organization and relationship to p53 function (see below).

In order to ensure that gene mutation frequencies were not simply a function of repeat length, the bar chart in Figure 16 shows the mutation frequencies ordered by increasing repeat length (where a gene had been tested for several repeats, the longest was chosen). With the exception of the three leftmost and the two rightmost genes (PTEN, EP300 and AXIN2, and E2F4 and MSH2, respectively), all genes have an (N)8-11 repeat. Although a slight tendency towards higher mutation frequency with increasing repeat length may be discerned, it is not at all a marked one. ACVR2, for example, has one of the highest mutation frequencies (91%) with only an (A)8 repeat.


Figure 16. Mutation frequency by repeat size. Mutation frequency is on the y axis, and along the x axis are genes arranged by increasing repeat length, left to right.
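The visual impression from Figure 16 can also be checked with a rank correlation. The following is a minimal sketch and not the analysis performed in this study: it computes Spearman's rho for a subset of 14 genes whose repeat length appears in Table 2 or in the text, paired with the MSI-H mutation frequencies of Table 3. The choice of Spearman's rho, the subset of genes and all function names are assumptions made for illustration.

```python
# Repeat length (longest tract tested) and MSI-H mutation frequency in percent,
# copied from Tables 2 and 3 for an illustrative subset of genes.
GENES = {
    "ACVR2": (8, 91), "RACK7": (8, 7), "RBBP8": (9, 14), "RIZ": (9, 30),
    "SEMG1": (9, 32), "SLC23A2": (9, 39), "TCF-4": (9, 41), "WISP3": (9, 25),
    "SEC63": (10, 40), "SPINK5": (10, 13), "SYCP1": (10, 20),
    "TGFBR2": (10, 75), "UVRAG": (10, 38), "TAF-1B": (11, 77),
}


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the tie-corrected ranks."""
    return pearson(average_ranks(x), average_ranks(y))


lengths = [length for length, _ in GENES.values()]
freqs = [freq for _, freq in GENES.values()]
rho = spearman(lengths, freqs)
print(f"Spearman rho for {len(GENES)} genes: {rho:.2f}")
```

A rho close to zero would agree with the reading that repeat length alone does not explain the mutation frequencies; the subset is too small for more than a rough check.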

Hierarchical clustering of genes according to mutated versus wild type status was conducted to see whether there was covariance between any groups of genes. The results are presented in Figure 17. When one disregards the clustering which is due to very high or very low mutation frequency alone, such as one sees in the two uppermost and at least seven bottommost genes, there are no very obvious clusters. One may, however, note some association between AC1 and E2F4, as well as within a group containing OGT, UVRAG and TCF4. At the tumor sample level, most of the MSI-L tumors cluster to the left and show very few frameshift mutations, even among the highly mutated genes, which can be taken as some confirmation that there is a real dichotomy between the MSI-L and MSI-H groups. Also, five of the nine cell lines cluster together, while the other four are dispersed among the MSI-H samples. The lack of a single cluster containing all cell lines strengthens the case for colon carcinoma cell lines as a model system; however, the clustering of many of the most commonly used cell lines indicates that there are real differences, usually in the form of higher mutation frequencies in the cell lines.

Figure 17. Hierarchical clustering of target genes. Samples are given on the x-axis, genes on the y-axis. Red cells indicate an insertion/deletion, black cells are wild type, and grey are missing data. The shorter the branches of the cladogram to the left, the more covariance the genes exhibit.
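The principle behind such a clustering can be sketched as follows. The distance measure (fraction of discordant calls among samples scored in both genes) and the average-linkage rule are assumptions, since the settings used for Figure 17 are not restated here, and the gene names and mutation calls are hypothetical.

```python
from itertools import combinations

# Hypothetical mutation calls per gene across six samples:
# 1 = insertion/deletion, 0 = wild type, None = missing data.
PROFILES = {
    "GENE_A": [1, 1, 0, 1, 0, None],
    "GENE_B": [1, 1, 0, 1, 0, 0],
    "GENE_C": [0, 0, 1, 0, 1, 1],
    "GENE_D": [0, 1, 1, 0, 1, 1],
}


def mismatch_distance(a, b):
    """Fraction of samples, scored in both genes, whose status differs."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return 1.0
    return sum(x != y for x, y in pairs) / len(pairs)


def average_linkage(profiles):
    """Merge the two closest clusters until one remains; return the merges."""

    def linkage(pair):
        # Mean pairwise distance between the members of two clusters.
        first, second = pair
        dists = [mismatch_distance(profiles[g], profiles[h])
                 for g in first for h in second]
        return sum(dists) / len(dists)

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        first, second = min(combinations(clusters, 2), key=linkage)
        merges.append((first, second, linkage((first, second))))
        clusters = [c for c in clusters if c not in (first, second)]
        clusters.append(first + second)
    return merges


merges = average_linkage(PROFILES)
for first, second, distance in merges:
    print(first, "+", second, f"at distance {distance:.2f}")
```

Genes merged at a small distance correspond to the short branches of the cladogram, i.e. genes that tend to be mutated in the same samples.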

The mutation status of only one gene showed a significant association with Dukes’ stage which was not attributable to mutation frequency alone. This was MSH3, which had a non-random mutated/wild type distribution (p=0.01) according to whether the tumor was Dukes’ B (80% mutated) or Dukes’ C (30% mutated). GRB14 exhibited a similar tendency, with 40% mutation in Dukes’ B versus 10% mutation in Dukes’ C (p=0.06). No directional trends, for example increasing mutation frequency from Dukes’ A → D, were discovered.

In order to ascertain whether there was any association between the mutation status of a given gene and the long-term survival of a patient, a significance analysis was run. The results are presented graphically in Figure 18. The only gene which, when mutated, was significantly correlated with patient mortality was SLC23A2. Conversely, OGT in a wild type state shows a tendency to be associated with long-term patient survival (p=0.06).
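A comparison of mutated versus wild type counts between two Dukes' stages, as described above for MSH3, amounts to a test on a 2×2 table. The sketch below uses Fisher's exact test, which is an assumption, as the test behind the reported p-values is not named here. The counts are hypothetical, chosen only to match the reported proportions (80% and 30% mutated), and are not expected to reproduce the reported p-value.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are at most as probable as the observed one.
    return sum(prob(x) for x in range(low, high + 1)
               if prob(x) <= observed * (1 + 1e-9))


# Hypothetical counts: rows are Dukes' B and Dukes' C,
# columns are mutated and wild type.
dukes_b = (12, 3)   # 80% mutated
dukes_c = (3, 7)    # 30% mutated
p_value = fisher_exact_two_sided(*dukes_b, *dukes_c)
print(f"two-sided p = {p_value:.3f}")
```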

[SAM plot sheet: observed score (y axis) plotted against expected score (x axis). Significant: 1; tail strength: -44.4 %; se: 24.4 %; median number of false positives: 0; false discovery rate: 0 %.]

Figure 18. Plot of 10 year survival according to gene mutation status. The upper right corner signifies mutated gene status with non-survival, the lower left a wild type gene status with survival. The blue lines represent the expected distribution of genes when there is no correlation between their mutation status and patient survival. The only outlier considered statistically significant, SLC23A2, is marked in red.

SLC23A2 mutation was highly significantly correlated with poor long-term, i.e. ten-year, prognosis (p=0.006; see Figure 19), but the correlation was far weaker after a follow-up period of only five years (p=0.7). After the standard five-year follow-up the mutated and wild type graphs in the Kaplan-Meier plots for several of the genes appeared to collapse, and for three of these the difference was significant. For 60-month survival, mutated AXIN2 and EP300 were related to poorer prognosis (p=0.048 and p=0.049, respectively), whereas mutated MRE11 was an indicator of a more favorable prognosis (p=0.047).

Figure 19. Kaplan-Meier plot of survival according to SLC23A2 mutation status. Blue line (0) indicates wild type; green line (1) indicates mutant; n(total)=35.
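For readers unfamiliar with the estimator behind plots such as Figure 19, the following minimal sketch computes a Kaplan-Meier survival curve. The follow-up times and event indicators are hypothetical and do not represent patients of the MSI series; the function name is likewise an assumption.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each time with at least one death.

    times  -- follow-up in months
    events -- 1 if the patient died at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(1 for time, event in data if time == t and event == 1)
        leaving = sum(1 for time, _ in data if time == t)
        if deaths:
            # Multiply by the conditional probability of surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
        i += leaving
    return curve


# Hypothetical follow-up data for five patients.
curve = kaplan_meier([10, 20, 20, 30, 40], [1, 1, 0, 0, 1])
for month, s in curve:
    print(f"month {month}: S(t) = {s:.2f}")
```

Censored patients leave the risk set without lowering the curve, which is why long follow-up with few remaining patients produces large steps at late events.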

Both logistic regressions and Cox regressions returned SLC23A2 mutation status as a variable which could improve upon the prognostic value of Dukes’ staging alone. This was borderline significant for Dukes’ A-B-C-D (p=0.052), and significant for an AB-CD division (p=0.01). In the latter case, correct prediction of patient death increased from 69% using only Dukes’ stage to 94% when SLC23A2 status was included. See Figure 20 for a comparison of survival according to Dukes’ AB-CD alone and combined with SLC23A2 mutated/wild type status.

Figure 20. Survival for Dukes’ stage + SLC23A2 status versus Dukes’ stage alone (ntotal=35) – a) The blue line represents a combined Dukes’ AB and wild type SLC23A2, and the green line is a combined Dukes’ CD and mutant SLC23A2. b) The blue line represents Dukes’ AB, and the green line Dukes’ CD.

When only Dukes’ A-B-C tumors were considered, there was also an unequal distribution of survival according to SLC23A2 status (p=0.02), and despite the small sample size, SLC23A2 status appeared to be able to distinguish between two groups of good or poor prognosis within the Dukes’ B group (see Fig. 21).

Figure 21. Survival according to SLC23A2 mutation status in Dukes’ B patients (n=14)

3.2. AUS series

3.2.1. DNA quantity and quality

The amount of DNA extracted varied between 10 mg and 272 mg, and its purity between 1.50 and 1.97. The value distributions are given in Figure 22. The purity is defined as the absorption of the DNA solution at 260 nm (the peak absorption wavelength of nucleotides) over the absorption at 280 nm (the peak absorption for proteins). Any A260/A280 ratio of over 1.80 is considered indicative of a comparatively uncontaminated DNA sample.
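The purity criterion can be written out as a short calculation. This is a sketch only: the absorbance readings are hypothetical, and the function names are assumptions; the 1.80 threshold is the one stated above.

```python
def purity_ratio(a260, a280):
    """A260/A280 ratio of a DNA solution."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


def is_comparatively_pure(a260, a280, threshold=1.80):
    """True if the ratio exceeds the threshold stated in the text."""
    return purity_ratio(a260, a280) > threshold


# Hypothetical absorbance readings for two samples.
for a260, a280 in [(0.95, 0.50), (0.75, 0.50)]:
    ratio = purity_ratio(a260, a280)
    print(f"A260/A280 = {ratio:.2f}, pure: {is_comparatively_pure(a260, a280)}")
```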

[Two histograms: count (y axis) against purity, and count (y axis) against quantity in mg.]

Figure 22. Distribution of purity and quantity of DNA from the AUS tumor series

3.2.2. MSI status – AUS series

A total of 188 tumors of the AUS series were tested for microsatellite instability using the Bethesda markers. Of these, 131 were MSS, 6 were MSI-L, 19 were indeterminate and 32 were MSI-H (see Appendix 1 for a full account). Excluding the 19 indeterminate cases, the MSI-H group constitutes 19% of the series so far, a figure which is consistent with the average 15% MSI-H component postulated for any unselected colorectal tumor series.69
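The arithmetic behind the 19% figure is shown below as a short check; the variable names are illustrative, and the counts are those given in the paragraph above.

```python
# MSI status counts for the AUS series, as reported above.
counts = {"MSS": 131, "MSI-L": 6, "indeterminate": 19, "MSI-H": 32}

tested = sum(counts.values())                  # all tumors tested
classified = tested - counts["indeterminate"]  # tumors with a definite status
msi_h_percent = 100 * counts["MSI-H"] / classified

print(f"{counts['MSI-H']}/{classified} = {msi_h_percent:.1f} % MSI-H")
```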

4. Discussion

As regards the experimental results of the target gene mutation analyses, the frequencies are, overall, not significantly different from those found in the corpus, and in general the discrepancy is smaller the more tumors have been tested. Cases where our values are considerably higher than the pooled mutation frequency are those of ACVR2, CASP5, MSH2 and RAD50, all of which, with the exception of MSH2, are acknowledged MSI target genes. The (A)27 repeat is in an intron of MSH2, as is the BAT26 marker, and would therefore logically have a high mutation rate. Considering, however, that the former repeat is so close to an exon, it is not impossible that a large insertion or deletion could affect splicing of the gene and further reduce MMR. The opposite situation, where the mutation frequency is lower than the value given in Table 2, is found for AC1, EPHB2, SEMG1 and SPINK5. All of these have been the subject of few or single studies, often with a very restricted sample size, and discrepancies were therefore not unexpected. Taken as a whole, these analyses confirm the target status of many of the 43 genes, and indicate that the MSI series is a representative tumor cohort.

Taken together, MRE11 and/or RAD50 exhibit mutations in 81% of MSI tumors, and in these it is to be expected that functional MRE11-complexes, and thereby much double-strand break repair, are reduced; the expression of MRE11 and RAD50 has been shown to be eradicated or severely reduced in such tumors.165

ACVR2 (ACTRII, an activin receptor) and TGFβRII are two of the genes with the highest mutation frequencies in MSI colon cancers, both in this and previous studies. It is interesting that the most high-profile tumor suppressor in MSI cancers, TGFβRII, is joined by this other member of its gene family, which also activates SMAD2 and SMAD3 and affects many of the same downstream target genes. Activin signaling is thought to be an alternative route for effects of TGFβ signaling, and it induces apoptosis in several tumor cell types, upregulates pro-apoptotic genes and downregulates the anti-apoptotic BIRC5.143,166 In addition, activin signaling can inhibit angiogenesis.166 Over 70% of the MSI tumors tested exhibited mutations in both ACVR2 and TGFβRII, and one must assume that there are, overall, few MSI tumors which escape inactivation of one or the other receptor (in our series, the seven MSI-H tumors which were wild type for both harbored almost no frameshift mutations at all). This would seem to confirm the preeminent role of SMAD-mediated activities in the prevention of carcinogenesis.

EP300, encoding the histone acetyltransferase p300, had not previously been analyzed for frameshifts in primary tumor DNA, and exhibited a 17% mutation frequency in its (A)7 tract*. Considering the small size of this repeat, and keeping in mind that repeats of this size are much less prone to slippage than even octohomopolymer tracts, a mutation frequency of nearly one in every five tumors can be supposed to be well over the background mutation level for such repeats, and is comparable to the 2×(A)6 mutation frequency of the tumor suppressor PTEN. The study by Ionov et al., which analyzed only seven MSI cell lines for frameshifts in the (A)5 and (A)7 repeats, found alterations in four of these.167 Of the seven, five overlapped with the nine cell lines used in our study: LoVo, HCT15, HCT116, SW48 and RKO. Ionov et al. found HCT15 to be mutated, while in our study it was deemed to be wild type, and the reverse was true for SW48. There can be several explanations for these observations. One is that, as the MMR system in these cell lines is defective, it is only to be expected that new mutations will surface from time to time in the separate cell stocks. Another is that technical differences may distort the analyses; the primers used to amplify the regions in question, for example, were not the same.

A 1bp deletion in the (A)7 tract results in a premature termination codon. This eliminates functional p300; however, the transcript would still be long enough to create a potentially disruptive truncated protein if it escapes nonsense-mediated decay. The peptide would, for example, retain a bromodomain, which allows binding to acetyl-lysines168 and could interfere with wild type functions. In cells with such a mutation one could imagine a decrease of normal histone acetylation, but the most tantalizing p300 function in the context of cancer is its interaction with and influence on p53. p300 (and possibly its close relative CBP) can acetylate p53 to contribute to its activation and, more importantly, take part in the context-dependent control of p53 stability.169,170

* An (A)5 in exon 3 was also investigated, but of the very few tumors which exhibited a frameshift all had an additional mutation in the (A)7 tract of exon 27.
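How a 1bp deletion in a mononucleotide tract brings a stop codon into frame can be illustrated with a toy example. The coding sequence below is hypothetical and is not the EP300 sequence; it merely contains an (A)7 tract followed by bases that read as TGA once the frame shifts. The function names are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon_index(cds):
    """0-based codon index of the first in-frame stop codon, or None."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def delete_one_from_tract(cds, base, length):
    """Remove one base from the first run of `length` copies of `base`."""
    position = cds.find(base * length)
    if position < 0:
        raise ValueError("tract not found")
    return cds[:position] + cds[position + 1:]


# Hypothetical open reading frame: ATG GCC AAA AAA ATG ACC TGG GCT TAA
WILD_TYPE = "ATGGCCAAAAAAATGACCTGGGCTTAA"
mutant = delete_one_from_tract(WILD_TYPE, "A", 7)

print("wild type stop at codon", first_stop_codon_index(WILD_TYPE))
print("mutant stop at codon", first_stop_codon_index(mutant))
```

In this toy sequence the wild type reads through to its terminal TAA, whereas the deletion shifts the frame so that TGA is read directly after the shortened tract.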

Of the two groups we identified as demonstrating some degree of covariance, one is AC1 and E2F4. The tumor suppressor qualities of the latter have been demonstrated, while the former is uncharacterized. With the considerably lower mutation frequency found in our study as opposed to the only other examination of AC1 in colon carcinomas (n=20), the “Real Common Target gene”136 status of AC1 could be called into question. If the association is real, however, it may have prognostic significance beyond strictly functional considerations.

The second group, which includes OGT, UVRAG and TCF4, is an example of the type of genetic constellation which may prove to be pertinent for the further elucidation of MSI-CRC development. It contains one gene, OGT, which has multiple modification targets and may be related to patient survival (see below). The frameshift in TCF4 likely causes the loss of a C-terminal domain which can repress its transcriptional function145, and this mutated TCF4 would then function as an oncogene in promoting inappropriate Wnt signaling. UVRAG has a tumor suppressor function as a positive regulator of the BECN1-PIK3C3 complex, which induces autophagy.171 Autophagy, though a process that tumor cells can use to survive metabolic stress, has been shown to promote tumorigenesis when defective, particularly in association with impaired apoptosis.172 Beclin1 (BECN1) is frequently subject to heterozygous loss in breast, ovarian and prostate cancers172, and the notable anti-apoptotic oncogene BCL2 is a negative regulator of the BECN1-PIK3C3 complex.171 These facts suggest that the regulation of autophagy is of considerable importance in tumorigenesis, and that the UVRAG frameshift mutation, which abrogates the Beclin1-binding domain, may be the MSI colonic equivalent of Beclin1 loss in other tumor types. In brief, this group includes one proto-oncogene, one tumor suppressor gene, and one gene which may modify the actions of many different target proteins in the cell; a model in miniature of the types of genes affected in carcinogenesis.

OGT (O-linked N-acetylglucosamine transferase), the gene which showed the clearest trend towards being wild type in patients with full recovery, modifies a wide range of proteins, all of which form reversible multimeric complexes.173 On the face of it, OGT could appear to have a cancer-promoting effect in that it likely prevents c-MYC and Sp1 degradation.173 Sp1 is a transcriptional activator of many genes that control growth and cell cycle progression, and its overexpression is a negative prognostic factor in many cancers.174 β-catenin is also subject to O-GlcNAc modification, but the functional importance of this modification is uncertain. O-GlcNAc modifications may well play a role in carcinogenesis, especially as there is considerable interplay between glycosylation and phosphorylation173, but the picture is murky. Another complication with regard to the role of OGT in MSI carcinomas arises from the discovery that the gene encodes three separate isoforms.175 Only one of these contains the (T)10 repeat which is subject to frameshift mutation; the frameshift produces a stop codon after only 46 of 894 amino acids. All the isoforms have a full catalytic C-terminal domain; they differ mainly in the N-terminal region, which mediates protein-protein interaction, and in their cellular localization. The MSI-relevant isoform is localized to the mitochondria175, which implies that its impact on nucleocytoplasmic proteins such as c-MYC and β-catenin should be negligible. To add to the confusion, in vitro studies prior to 2006 did not distinguish between the separate isoforms, and the isoform that was used to identify OGT substrates was precisely this mitochondrial one. It is hard to see, in light of what we currently know, exactly how a wild type mitochondrial isoform of OGT would contribute to patient survival. However, considering the multitude of potential modification targets of OGT, the possibility cannot be completely discounted.

The survival versus gene mutation significance test, as well as the Kaplan-Meier estimator, held up SLC23A2 as the only gene which, when mutated, was significantly correlated with patient mortality, albeit at a ten-year follow-up period. Its previously approved symbol was SLC23A1, and the protein functions as a sodium/ascorbate cotransporter known as SVCT2. SLC23A2 encodes a predicted twelve-pass transmembrane protein176,177, but four of these transmembrane domains would be eliminated by a premature termination codon caused by a 1bp insertion/deletion in the (C)9 tract (see Fig. 23), thus rendering the transporter inactive. The possibility that the truncated protein itself could have an adverse effect cannot be discounted in the absence of functional assays. In many of our samples showing an SLC23A2 mutation, it would appear that over half of the alleles harbor an insertion or deletion.


Figure 23 a) SVCT2 embedded in the membrane – The site of frameshift-induced truncation is marked by the transition from dark green to light green amino acids, represented as circles. b) Ascorbate and sodium entering the cell via the hypothetical transmembrane channel created by SVCT2.

SVCT2 is involved in regulating the intracellular concentration of ascorbate (vitamin C)176, and as ascorbate is a free-radical scavenger178 it is often cited as a cancer-preventative micronutrient179,180 that averts oxidative DNA damage. Cancer patients exhibit vitamin C depletion181, and ascorbate has been shown to induce apoptosis in cancer cells by p53 stabilization and by p21 and BAX modulation.182,183 It also has a growth inhibitory effect on several tumor types.181 The anti- or pro-oxidant properties of ascorbate, depending on its concentration and the redox potential of the microenvironment, as well as the linkage between ascorbate and hydrogen peroxide concentrations181, are particularly interesting in respect of tumor growth. Ascorbate, as a pro-oxidant, can generate hydrogen peroxide, which in turn can drive carcinogenic PI3K signaling by the inhibition of PTEN.181,184 H2O2 is also known to inhibit gap junction intercellular communication (GJIC) and cause hyperphosphorylation of the crucial gap junction component and putative tumor suppressor connexin43*; ascorbate prevents this downregulation of GJIC, which is thought to be important in carcinogenesis.185,186 However, very high levels of H2O2 are cytotoxic and induce apoptosis.184 Therefore, high ascorbate concentrations can be potent growth inhibitors, while moderate levels may on occasion promote growth and differentiation. Furthermore, ascorbate can depress cancer by bolstering immune system efficiency and by hindering metastasis through preserving a tight extracellular matrix.181 The decrease in functional sodium/ascorbate cotransporters caused by an SLC23A2 frameshift mutation may well disturb intracellular ascorbate concentrations enough to attenuate or eliminate any tumor-inhibitory effect of ascorbate, and possibly also place its concentration at a level where it is growth-promoting enough to be linked to patient mortality.

Apart from a purely functional role in carcinogenesis, SLC23A2 looks to be a promising prognostic indicator for the risk stratification of MSI patients, both alone and by adding resolution to the prognosis based on Dukes’ staging. It is perhaps most interesting in this regard for the suggestion that it may be able to differentiate between groups within Dukes’ B, one of the intermediate stages whose prognosis is less clear-cut than for A or D, and to predict relapse beyond the standard five-year frame of reference. Due to the low number of patients on which these results are based, they must be interpreted as hypothesis-generating only, and validation analyses are necessary. Nevertheless, the relationship of SLC23A2 status to survival is striking in view of the restricted sample size.

* The hypothetical role of connexins as tumor suppressors is currently the subject of a parallel study at the Dept. of Cancer Prevention.

Perspectives

To restate the overarching goal of this project, it is to enable subdivision of the existing clinical staging by molecular markers in order to improve the prediction of tumor behavior. In the continuation of the project, the retrospective clinical AUS tumor series will be the primary sample set, containing an estimated 140 MSI-H cancers, all of which have associated clinical data. Initially we will focus on those target genes which stood out in this pilot study, and seek validation e.g. for SLC23A2 and OGT in relation to survival and tumor stage, and for the mutation frequency of EP300. We will also revisit the question of whether any combinations of our 42 target genes co-operate to influence patient survival. We may perhaps also be able to determine conclusively whether or not TGFβRII, BAX and ACVR2, the subjects of several conflicting studies93-97, affect prognosis. Subject to feasibility, the MSI series from this study could be combined with the AUS series to improve the statistical strength of the results. An attempt will also be made to identify novel MSI target genes by searching for genes that are involved in known cancer-related pathways among those that are present in the survey of genomic cMNRs (see 1.7.2.3). Furthermore, the 43 genes in Table 1 are to be included on a microarray platform with custom-made oligos to target a variety of aberrant cancer transcripts; these are alternative splice variants and fusion genes in addition to unorthodox microsatellites.

Should any of the above microsatellite-containing genes prove significant in both retrospective studies, they will be subject to a prospective study and so be tested concurrently with clinical treatment. Finally, it is to be hoped that MSI target genes validated as prognostic markers can be included in clinical testing to enable high-resolution risk stratification and treatment more tailored to the needs of the individual cancer patient.

Reference List

1. Thillaud, P. L., Paléopathologie du cancer, continuité ou rupture? [in French], Bull. Cancer., 93, 767, 2006.

2. Breasted, J., The Edwin Smith Surgical Papyrus, University of Chicago Press, Chicago, 1930.

3. Galen, Galeni - De proprium animi cuiuslibet affectum dignotione et curatione/De animi cuiuslibet peccatorum dignotione et curatione/De Atra Bile [in Greek], Lipsiae et Berolini in aedibus B.G.Teubneri, Berlin, 1937.

4. Hippocrates, Indices librorum ; Iusiurandum ; Lex ; De arte ; De medico ; De decente habitu praeceptiones ; De prisca medicina ; Deaere locis aquis ; De alimento ; De liquidorum usu ; De flatibus [in Greek], Lipsiae et Berolini in aedibus B.G.Teubneri, Berlin, 1927.

5. Liddell, H. G., Scott, R., Greek-English Lexicon, Oxford University Press, Oxford, 1891.

6. Weinberg, R. A., The Biology of Cancer, Garland Science, New York, 2007.

7. Hansemann, D., Ueber asymmetrische Zelltheilung in Epithelkrebsen und deren biologische Bedeutung, Virchows Arch. Path. Anat., 119, 299, 1890.

8. Nowell, P. C., Hungerford, D., A Minute Chromosome in Human Chronic Myelogenous Leukemia, Science, 132, 1497, 1960.

9. Ribbert, H., Zur Frage der Entstehung Maligner Tumoren, Naturwissenschaften, 2, 676, 1914.

10. Rowley, J. D., Letter: A new consistent chromosomal abnormality in chronic myelogenous leukaemia identified by quinacrine fluorescence and Giemsa staining, Nature., 243, 290, 1973.

11. de Klein, A., van Kessel, A. G., Grosveld, G., Bartram, C. R., Hagemeijer, A., Bootsma, D., Spurr, N. K., Heisterkamp, N., Groffen, J., Stephenson, J. R., A cellular oncogene is translocated to the Philadelphia chromosome in chronic myelocytic leukaemia, Nature., 300, 765, 1982.

12. Foulds, L., The natural history of cancer, J. Chronic. Dis., 8, 2, 1958.

13. Loeb, L. A., Mutator phenotype may be required for multistage carcinogenesis, Cancer Res., 51, 3075, 1991.

14. Hanahan, D., Weinberg, R. A., The hallmarks of cancer, Cell., 100, 57, 2000.

15. Nowell, P. C., The clonal evolution of tumor cell populations, Science., 194, 23, 1976.

16. Fearon, E. R., Vogelstein, B., A genetic model for colorectal tumorigenesis, Cell., 61, 759, 1990.

17. Al-Hajj, M., Clarke, M. F., Self-renewal and solid tumor stem cells, Oncogene., 20;23, 7274, 2004.

18. Bapat, S. A., Evolution of cancer stem cells, Semin. Cancer Biol., 17, 204, 2007.

19. Stehelin, D., Varmus, H. E., Bishop, J. M., Vogt, P. K., DNA related to the transforming gene(s) of avian sarcoma viruses is present in normal avian DNA, Nature., 260, 170, 1976.

20. Knudson, A. G., Jr., Mutation and cancer: statistical study of retinoblastoma, Proc. Natl. Acad. Sci. U. S. A., 68, 820, 1971.

21. Cook, W. D., McCaw, B. J., Accommodating haploinsufficient tumor suppressor genes in Knudson's model, Oncogene., 19, 3434, 2000.

22. Santarosa, M., Ashworth, A., Haploinsufficiency for tumour suppressor genes: when you don't need to go all the way, Biochim. Biophys. Acta., 1654, 105, 2004.

23. D'Alessio, A. C., Szyf, M., Epigenetic tete-a-tete: the bilateral relationship between chromatin modifications and DNA methylation, Biochem. Cell Biol., 84, 463, 2006.

24. Ehrlich, M., DNA methylation in cancer: too much, but also too little, Oncogene., 21, 5400, 2002.

25. Esteller, M., Epigenetics provides a new generation of oncogenes and tumour- suppressor genes, Br. J. Cancer., 94, 179, 2006.

26. Feinberg, A. P., Vogelstein, B., Hypomethylation of ras oncogenes in primary human cancers, Biochem. Biophys. Res. Commun., 111, 47, 1983.

27. Li, Y. C., Korol, A. B., Fahima, T., Beiles, A., Nevo, E., Microsatellites: genomic distribution, putative functions and mutational mechanisms: a review, Mol. Ecol., 11, 2453, 2002.

28. Jascur, T., Boland, C. R., Structure and function of the components of the human DNA mismatch repair system, Int. J. Cancer., 119, 2030, 2006.

29. Marti, T. M., Kunz, C., Fleck, O., DNA mismatch repair and mutation avoidance pathways, J. Cell Physiol., 191, 28, 2002.

30. Helleday, T., Lo, J., van Gent, D. C., Engelward, B. P., DNA double-strand break repair: From mechanistic understanding to cancer treatment, DNA Repair (Amst)., ., 2007.

31. Assenmacher, N., Hopfner, K. P., MRE11/RAD50/NBS1: complex activities, Chromosoma., 113, 157, 2004.

32. Kolodner, R. D., Marsischky, G. T., Eukaryotic DNA mismatch repair, Curr. Opin. Genet. Dev., 9, 89, 1999.

33. Vo, A. T., Zhu, F., Wu, X., Yuan, F., Gao, Y., Gu, L., Li, G. M., Lee, T. H., Her, C., hMRE11 deficiency leads to microsatellite instability and defective DNA mismatch repair, EMBO Rep., 6, 438, 2005.

34. Campbell, M. R., Wang, Y., Andrew, S. E., Liu, Y., Msh2 deficiency leads to chromosomal abnormalities, centrosome amplification, and telomere capping defect, Oncogene., 20;25, 2531, 2006.

35. Wang, Y., Qin, J., MSH2 and ATR form a signaling module and regulate two branches of the damage response to DNA methylation, Proc. Natl. Acad. Sci. U. S. A., 100, 15387, 2003.

36. Sabharwal, A., Middleton, M. R., Exploiting the role of O6-methylguanine-DNA- methyltransferase (MGMT) in cancer therapy, Curr. Opin. Pharmacol., 6, 355, 2006.

37. de Laat, W. L., Jaspers, N. G., Hoeijmakers, J. H., Molecular mechanism of nucleotide excision repair, Genes Dev., 13, 768, 1999.

38. Almeida, K. H., Sobol, R. W., A unified view of base excision repair: Lesion- dependent protein complexes regulated by post-translational modification, DNA Repair (Amst)., ., 2007.

39. Ford, H. L., Pardee, A. B., Cancer and the cell cycle, J. Cell Biochem., Suppl 32- 33:166-72., 166, 1999.

40. Trimarchi, J. M., Lees, J. A., Sibling rivalry in the E2F family, Nat. Rev. Mol. Cell Biol., 3, 11, 2002.

41. Grady, W. M., Genomic instability and colon cancer, Cancer Metastasis Rev., 23, 11, 2004.

42. Lowe, S. W., Cepero, E., Evan, G., Intrinsic tumour suppression, Nature., 432, 307, 2004.

43. Roos, W. P., Kaina, B., DNA damage-induced cell death by apoptosis, Trends Mol. Med., 12, 440, 2006.

44. Vogelstein, B., Lane, D., Levine, A. J., Surfing the p53 network, Nature., 408, 307, 2000.

45. Wyllie, A. H., Kerr, J. F., Currie, A. R., Cell death: the significance of apoptosis, Int. Rev. Cytol., 68:251-306., 251, 1980.

46. Lengauer, C., Kinzler, K. W., Vogelstein, B., Genetic instabilities in human cancers, Nature., 396, 643, 1998.

47. Loeb, L. A., Loeb, K. R., Anderson, J. P., Multiple mutations and cancer, Proc. Natl. Acad. Sci. U. S. A., 100, 776, 2003.

48. Tomlinson, I. P., Novelli, M. R., Bodmer, W. F., The mutation rate and cancer, Proc. Natl. Acad. Sci. U. S. A., 93, 14800, 1996.

49. Anacleto, C., Leopoldino, A. M., Rossi, B., Soares, F. A., Lopes, A., Rocha, J. C., Caballero, O., Camargo, A. A., Simpson, A. J., Pena, S. D., Colorectal cancer "methylator phenotype": fact or artifact?, Neoplasia., 7, 331, 2005.

50. Jemal, A., Siegel, R., Ward, E., Murray, T., Xu, J., Smigal, C., Thun, M. J., Cancer statistics, 2006, CA Cancer J. Clin., 56, 106, 2006.

51. Weitz, J., Koch, M., Debus, J., Hohler, T., Galle, P. R., Buchler, M. W., Colorectal cancer, Lancet., 365, 153, 2005.

52. Cancer in Norway 2005, Kreftregisteret, 2006.

53. United States Cancer Statistics 2003 Incidence and Mortality, Dept. of Health and Human Services, 2007.

54. Fitzgerald, R. H., Jr., What is the Dukes' system for carcinoma of the rectum?, Dis. Colon Rectum., 25, 474, 1982.

55. Whittaker, M., Goligher, J. C., The prognosis after surgical treatment for carcinoma of the rectum, Br. J. Surg., 63, 384, 1976.

56. Compton, C. C., Greene, F. L., The staging of colorectal cancer: 2004 and beyond, CA Cancer J. Clin., 54, 295, 2004.

57. Deans, G. T., Parks, T. G., Rowlands, B. J., Spence, R. A., Prognostic factors in colorectal cancer, Br. J. Surg., 79, 608, 1992.

58. Klump, B., Nehls, O., Okech, T., Hsieh, C. J., Gaco, V., Gittinger, F. S., Sarbia, M., Borchard, F., Greschniok, A., Gruenagel, H. H., Porschen, R., Gregor, M., Molecular lesions in colorectal cancer: impact on prognosis? Original data and review of the literature, Int. J. Colorectal Dis., 19, 23, 2004.

59. Calabrese, P., Tavare, S., Shibata, D., Pretumor progression: clonal evolution of human stem cell populations, Am. J. Pathol., 164, 1337, 2004.

60. Preston, S. L., Wong, W. M., Chan, A. O., Poulsom, R., Jeffery, R., Goodlad, R. A., Mandir, N., Elia, G., Novelli, M., Bodmer, W. F., Tomlinson, I. P., Wright, N. A., Bottom-up histogenesis of colorectal adenomas: origin in the monocryptal adenoma and initial expansion by crypt fission, Cancer Res., 63, 3819, 2003.

61. Radtke, F., Clevers, H., Self-renewal and cancer of the gut: two sides of a coin, Science., 307, 1904, 2005.

62. Bodmer, W. F., Cancer genetics: colorectal cancer as a model, J. Hum. Genet., 51, 391, 2006.

63. Powell, S. M., Zilz, N., Beazer-Barclay, Y., Bryan, T. M., Hamilton, S. R., Thibodeau, S. N., Vogelstein, B., Kinzler, K. W., APC mutations occur early during colorectal tumorigenesis, Nature., 359, 235, 1992.

64. Shibata, H., Toyama, K., Shioya, H., Ito, M., Hirota, M., Hasegawa, S., Matsumoto, H., Takano, H., Akiyama, T., Toyoshima, K., Kanamaru, R., Kanegae, Y., Saito, I., Nakamura, Y., Shiba, K., Noda, T., Rapid colorectal adenoma formation initiated by conditional targeting of the Apc gene, Science., 278, 120, 1997.

65. van den Brink, G. R., Offerhaus, G. J., The morphogenetic code and colon cancer development, Cancer Cell., 11, 109, 2007.

66. Nathke, I. S., Adams, C. L., Polakis, P., Sellin, J. H., Nelson, W. J., The adenomatous polyposis coli tumor suppressor protein localizes to plasma membrane sites involved in active cell migration, J. Cell Biol., 134, 165, 1996.

67. Pollock, A. M., Quirke, P., Adenoma screening and colorectal cancer, BMJ., 303, 3, 1991.

68. Beach, R., Chan, A. O., Wu, T. T., White, J. A., Morris, J. S., Lunagomez, S., Broaddus, R. R., Issa, J. P., Hamilton, S. R., Rashid, A., BRAF mutations in aberrant crypt foci and hyperplastic polyposis, Am. J. Pathol., 166, 1069, 2005.

69. Boland, C. R., Thibodeau, S. N., Hamilton, S. R., Sidransky, D., Eshleman, J. R., Burt, R. W., Meltzer, S. J., Rodriguez-Bigas, M. A., Fodde, R., Ranzani, G. N., Srivastava, S., A National Cancer Institute Workshop on Microsatellite Instability for cancer detection and familial predisposition: development of international criteria for the determination of microsatellite instability in colorectal cancer, Cancer Res., 58, 5248, 1998.

70. Lengauer, C., Kinzler, K. W., Vogelstein, B., Genetic instability in colorectal cancers, Nature., 386, 623, 1997.

71. Rajagopalan, H., Nowak, M. A., Vogelstein, B., Lengauer, C., The significance of unstable chromosomes in colorectal cancer, Nat. Rev. Cancer., 3, 695, 2003.

72. Aaltonen, L. A., Peltomaki, P., Leach, F. S., Sistonen, P., Pylkkanen, L., Mecklin, J. P., Jarvinen, H., Powell, S. M., Jen, J., Hamilton, S. R., et al., Clues to the pathogenesis of familial colorectal cancer, Science., 260, 812, 1993.

73. Ionov, Y., Peinado, M. A., Malkhosyan, S., Shibata, D., Perucho, M., Ubiquitous somatic mutations in simple repeated sequences reveal a new mechanism for colonic carcinogenesis, Nature., 363, 558, 1993.

74. Thibodeau, S. N., Bren, G., Schaid, D., Microsatellite instability in cancer of the proximal colon, Science., 260, 816, 1993.

75. Kim, H., Jen, J., Vogelstein, B., Hamilton, S. R., Clinical and pathological characteristics of sporadic colorectal carcinomas with DNA replication errors in microsatellite sequences, Am. J. Pathol., 145, 148, 1994.

76. Lothe, R. A., Peltomaki, P., Meling, G. I., Aaltonen, L. A., Nystrom-Lahti, M., Pylkkanen, L., Heimdal, K., Andersen, T. I., Moller, P., Rognum, T. O., et al., Genomic instability in colorectal cancer: relationship to clinicopathological variables and family history, Cancer Res., 53, 5849, 1993.

77. Bubb, V. J., Curtis, L. J., Cunningham, C., Dunlop, M. G., Carothers, A. D., Morris, R. G., White, S., Bird, C. C., Wyllie, A. H., Microsatellite instability and the role of hMSH2 in sporadic colorectal cancer, Oncogene., 12, 2641, 1996.

78. Chung, K. Y., Kim, N. G., Li, L. S., Kim, H., Kim, H., Nam, C. M., Kim, H., Shin, D. H., Clinicopathologic characteristics related to the high variability of coding mononucleotide repeat sequences in tumors with high-microsatellite instability, Oncol. Rep., 10, 439, 2003.

79. Duval, A., Hamelin, R., Genetic instability in human mismatch repair deficient cancers, Ann. Genet., 45, 71, 2002.

80. Dolcetti, R., Viel, A., Doglioni, C., Russo, A., Guidoboni, M., Capozzi, E., Vecchiato, N., Macri, E., Fornasarig, M., Boiocchi, M., High prevalence of activated intraepithelial cytotoxic T lymphocytes and increased neoplastic cell apoptosis in colorectal carcinomas with microsatellite instability, Am. J. Pathol., 154, 1805, 1999.

81. Popat, S., Hubner, R., Houlston, R. S., Systematic review of microsatellite instability and colorectal cancer prognosis, J. Clin. Oncol., 23, 609, 2005.

82. Glebov, O. K., Rodriguez, L. M., Nakahara, K., Jenkins, J., Cliatt, J., Humbyrd, C. J., DeNobile, J., Soballe, P., Simon, R., Wright, G., Lynch, P., Patterson, S., Lynch, H., Gallinger, S., Buchbinder, A., Gordon, G., Hawk, E., Kirsch, I. R., Distinguishing right from left colon by the pattern of gene expression, Cancer Epidemiol. Biomarkers Prev., 12, 755, 2003.

83. Bufill, J. A., Colorectal cancer: evidence for distinct genetic categories based on proximal or distal tumor location, Ann. Intern. Med., 113, 779, 1990.

84. Pocard, M., Salmon, R. J., Muleris, M., Remvikos, Y., Bara, J., Dutrillaux, B., Poupon, M. F., Two colons--two cancers? Proximal or distal adenocarcinoma: arguments for a different carcinogenesis [article in French], Bull. Cancer., 82, 10, 1995.

85. Diep, C. B., Thorstensen, L., Meling, G. I., Skovlund, E., Rognum, T. O., Lothe, R. A., Genetic tumor markers with prognostic impact in Dukes' stages B and C colorectal cancer patients, J. Clin. Oncol., 21, 820, 2003.

86. Borresen-Dale, A. L., Lothe, R. A., Meling, G. I., Hainaut, P., Rognum, T. O., Skovlund, E., TP53 and long-term prognosis in colorectal cancer: mutations in the L3 zinc-binding domain predict poor survival, Clin. Cancer Res., 4, 203, 1998.

87. Laurent-Puig, P., Olschwang, S., Delattre, O., Remvikos, Y., Asselain, B., Melot, T., Validire, P., Muleris, M., Girodet, J., Salmon, R. J., et al., Survival and acquired genetic alterations in colorectal cancer, Gastroenterology., 102, 1136, 1992.

88. Gerdes, H., Chen, Q., Elahi, A. H., Sircar, A., Goldberg, E., Winawer, D., Urmacher, C., Winawer, S. J., Jhanwar, S. C., Recurrent deletions involving chromosomes 1, 5, 17, and 18 in colorectal carcinoma: possible role in biological and clinical behavior of tumors, Anticancer Res., 15, 13, 1995.

89. Petersen, S., Thames, H. D., Nieder, C., Petersen, C., Baumann, M., The results of colorectal cancer treatment by p53 status: treatment-specific overview, Dis. Colon Rectum., 44, 322, 2001.

90. Ogunbiyi, O. A., Goodfellow, P. J., Herfarth, K., Gagliardi, G., Swanson, P. E., Birnbaum, E. H., Read, T. E., Fleshman, J. W., Kodner, I. J., Moley, J. F., Confirmation that chromosome 18q allelic loss in colon cancer is a prognostic indicator, J. Clin. Oncol., 16, 427, 1998.

91. Lanza, G., Matteuzzi, M., Gafa, R., Orvieto, E., Maestri, I., Santini, A., del Senno, L., Chromosome 18q allelic loss and prognosis in stage II and III colon cancer, Int. J. Cancer., 79, 390, 1998.

92. Font, A., Abad, A., Monzo, M., Sanchez, J. J., Guillot, M., Manzano, J. L., Pinol, M., Ojanguren, I., Rosell, R., Prognostic value of K-ras mutations and allelic imbalance on chromosome 18q in patients with resected colorectal cancer, Dis. Colon Rectum., 44, 549, 2001.

93. Fernandez-Peralta, A. M., Nejda, N., Oliart, S., Medina, V., Azcoita, M. M., Gonzalez-Aguilera, J. J., Significance of mutations in TGFBR2 and BAX in neoplastic progression and patient outcome in sporadic colorectal tumors with high-frequency microsatellite instability, Cancer Genet. Cytogenet., 157, 18, 2005.

94. Ionov, Y., Yamamoto, H., Krajewski, S., Reed, J. C., Perucho, M., Mutational inactivation of the proapoptotic gene BAX confers selective advantage during tumor clonal evolution, Proc. Natl. Acad. Sci. U. S. A., 97, 10872, 2000.

95. Watanabe, T., Wu, T. T., Catalano, P. J., Ueki, T., Satriano, R., Haller, D. G., Benson, A. B., III, Hamilton, S. R., Molecular predictors of survival after adjuvant chemotherapy for colon cancer, N. Engl. J. Med., 344, 1196, 2001.

96. Jung, B., Smith, E. J., Doctolero, R. T., Gervaz, P., Alonso, J. C., Miyai, K., Keku, T., Sandler, R. S., Carethers, J. M., Influence of target gene mutations on survival, stage and histology in sporadic microsatellite unstable colon cancers, Int. J. Cancer., 118, 2509, 2006.

97. Samowitz, W. S., Curtin, K., Neuhausen, S., Schaffer, D., Slattery, M. L., Prognostic implications of BAX and TGFBRII mutations in colon cancers with microsatellite instability, Genes Chromosomes Cancer., 35, 368, 2002.

98. Toyota, M., Ahuja, N., Ohe-Toyota, M., Herman, J. G., Baylin, S. B., Issa, J. P., CpG island methylator phenotype in colorectal cancer, Proc. Natl. Acad. Sci. U. S. A., 96, 8681, 1999.

99. Jass, J. R., Pathogenesis of colorectal cancer, Surg. Clin. North Am., 82, 891, 2002.

100. Makinen, M. J., Colorectal serrated adenocarcinoma, Histopathology., 50, 131, 2007.

101. Jass, J. R., Hyperplastic polyps and colorectal cancer: is there a link?, Clin. Gastroenterol. Hepatol., 2, 1, 2004.

102. Torlakovic, E., Snover, D. C., Serrated adenomatous polyposis in humans, Gastroenterology., 110, 748, 1996.

103. Fodde, R., Smits, R., Clevers, H., APC, signal transduction and genetic instability in colorectal cancer, Nat. Rev. Cancer., 1, 55, 2001.

104. Padgett, R. W., Das, P., Krishna, S., TGF-beta signaling, Smads, and tumor suppressors, Bioessays., 20, 382, 1998.

105. Siegel, P. M., Massague, J., Cytostatic and apoptotic actions of TGF-beta in homeostasis and cancer, Nat. Rev. Cancer., 3, 807, 2003.

106. Logan, C. Y., Nusse, R., The Wnt signaling pathway in development and disease, Annu. Rev. Cell Dev. Biol., 20, 781, 2004.

107. Alberts, B., Johnson, A., Lewis, J., Raff, M., Roberts, K., Walter, P., Molecular Biology of the Cell, Garland Science, New York, 2002.

108. Fang, J. Y., Richardson, B. C., The MAPK signalling pathways and colorectal cancer, Lancet Oncol., 6, 322, 2005.

109. Cully, M., You, H., Levine, A. J., Mak, T. W., Beyond PTEN mutations: the PI3K pathway as an integrator of multiple inputs during tumorigenesis, Nat. Rev. Cancer., 6, 184, 2006.

110. Guanti, G., Resta, N., Simone, C., Cariola, F., Demma, I., Fiorente, P., Gentile, M., Involvement of PTEN mutations in the genetic pathways of colorectal cancerogenesis, Hum. Mol. Genet., 9, 283, 2000.

111. Thorstensen, L., Lind, G. E., Lovig, T., Diep, C. B., Meling, G. I., Rognum, T. O., Lothe, R. A., Genetic and epigenetic changes of components affecting the WNT pathway in colorectal carcinomas stratified by microsatellite instability, Neoplasia., 7, 99, 2005.

112. Persad, S., Troussard, A. A., McPhee, T. R., Mulholland, D. J., Dedhar, S., Tumor suppressor PTEN inhibits nuclear accumulation of beta-catenin and T cell/lymphoid enhancer factor 1-mediated transcriptional activation, J. Cell Biol., 153, 1161, 2001.

113. Shaw, R. J., Cantley, L. C., Ras, PI(3)K and mTOR signalling controls tumour cell growth, Nature., 441, 424, 2006.

114. Kane, M. F., Loda, M., Gaida, G. M., Lipman, J., Mishra, R., Goldman, H., Jessup, J. M., Kolodner, R., Methylation of the hMLH1 promoter correlates with lack of expression of hMLH1 in sporadic colon tumors and mismatch repair-defective human tumor cell lines, Cancer Res., 57, 808, 1997.

115. Borresen, A. L., Lothe, R. A., Meling, G. I., Lystad, S., Morrison, P., Lipford, J., Kane, M. F., Rognum, T. O., Kolodner, R. D., Somatic mutations in the hMSH2 gene in microsatellite unstable colorectal carcinomas, Hum. Mol. Genet., 4, 2065, 1995.

116. Cunningham, J. M., Christensen, E. R., Tester, D. J., Kim, C. Y., Roche, P. C., Burgart, L. J., Thibodeau, S. N., Hypermethylation of the hMLH1 promoter in colon cancer with microsatellite instability, Cancer Res., 58, 3455, 1998.

117. Herman, J. G., Umar, A., Polyak, K., Graff, J. R., Ahuja, N., Issa, J. P., Markowitz, S., Willson, J. K., Hamilton, S. R., Kinzler, K. W., Kane, M. F., Kolodner, R. D., Vogelstein, B., Kunkel, T. A., Baylin, S. B., Incidence and functional consequences of hMLH1 promoter hypermethylation in colorectal carcinoma, Proc. Natl. Acad. Sci. U. S. A., 95, 6870, 1998.

118. Kuismanen, S. A., Holmberg, M. T., Salovaara, R., de la Chapelle, A., Peltomaki, P., Genetic and epigenetic modification of MLH1 accounts for a major share of microsatellite-unstable colorectal cancers, Am. J. Pathol., 156, 1773, 2000.

119. Liu, B., Nicolaides, N. C., Markowitz, S., Willson, J. K., Parsons, R. E., Jen, J., Papadopoulos, N., Peltomaki, P., de la Chapelle, A., Hamilton, S. R., et al., Mismatch repair gene defects in sporadic colorectal cancers with microsatellite instability, Nat. Genet., 9, 48, 1995.

120. Thibodeau, S. N., French, A. J., Cunningham, J. M., Tester, D., Burgart, L. J., Roche, P. C., McDonnell, S. K., Schaid, D. J., Vockley, C. W., Michels, V. V., Farr, G. H., Jr., O'Connell, M. J., Microsatellite instability in colorectal cancer: different mutator phenotypes and the principal involvement of hMLH1, Cancer Res., 58, 1713, 1998.

121. Kunkel, T. A., Misalignment-mediated DNA synthesis errors, Biochemistry., 29, 8003, 1990.

122. Perucho, M., Cancer of the microsatellite mutator phenotype, Biol. Chem., 377, 675, 1996.

123. Streisinger, G., Okada, Y., Emrich, J., Newton, J., Tsugita, A., Terzaghi, E., Inouye, M., Frameshift mutations and the genetic code. This paper is dedicated to Professor Theodosius Dobzhansky on the occasion of his 66th birthday, Cold Spring Harb. Symp. Quant. Biol., 31, 77, 1966.

124. Sia, E. A., Jinks-Robertson, S., Petes, T. D., Genetic control of microsatellite stability, Mutat. Res., 383, 61, 1997.

125. Marcotte, E. M., Pellegrini, M., Yeates, T. O., Eisenberg, D., A census of protein repeats, J. Mol. Biol., 293, 151, 1999.

126. Toth, G., Gaspari, Z., Jurka, J., Microsatellites in different eukaryotic genomes: survey and analysis, Genome Res., 10, 967, 2000.

127. Alazzouzi, H., Davalos, V., Kokko, A., Domingo, E., Woerner, S. M., Wilson, A. J., Konrad, L., Laiho, P., Espin, E., Armengol, M., Imai, K., Yamamoto, H., Mariadason, J. M., Gebert, J. F., Aaltonen, L. A., Schwartz, S., Jr., Arango, D., Mechanisms of inactivation of the receptor tyrosine kinase EPHB2 in colorectal tumors, Cancer Res., 65, 10170, 2005.

128. Lind, G. E., Thorstensen, L., Lovig, T., Meling, G. I., Hamelin, R., Rognum, T. O., Esteller, M., Lothe, R. A., A CpG island hypermethylation profile of primary colorectal carcinomas and colon cancer cell lines, Mol. Cancer., 3, 28, 2004.

129. Oliveira, C., Velho, S., Domingo, E., Preto, A., Hofstra, R. M., Hamelin, R., Yamamoto, H., Seruca, R., Schwartz, S., Jr., Concomitant RASSF1A hypermethylation and KRAS/BRAF mutations occur preferentially in MSI sporadic colorectal cancer, Oncogene., 24, 7630, 2005.

130. Shen, L., Kondo, Y., Hamilton, S. R., Rashid, A., Issa, J. P., P14 methylation in human colon cancer is associated with microsatellite instability and wild-type p53, Gastroenterology., 124, 626, 2003.

131. Esteller, M., Aberrant DNA methylation as a cancer-inducing mechanism, Annu. Rev. Pharmacol. Toxicol., 45, 629, 2005.

132. Duval, A., Rolland, S., Compoint, A., Tubacher, E., Iacopetta, B., Thomas, G., Hamelin, R., Evolution of instability at coding and non-coding repeat sequences in human MSI-H colorectal cancers, Hum. Mol. Genet., 10, 513, 2001.

133. Duval, A., Reperant, M., Hamelin, R., Comparative analysis of mutation frequency of coding and non coding short mononucleotide repeats in mismatch repair deficient colorectal cancers, Oncogene., 21, 8062, 2002.

134. Suraweera, N., Iacopetta, B., Duval, A., Compoint, A., Tubacher, E., Hamelin, R., Conservation of mononucleotide repeats within 3' and 5' untranslated regions and their instability in MSI-H colorectal cancer, Oncogene., 20, 7472, 2001.

135. Duval, A., Hamelin, R., Mutations at coding repeat sequences in mismatch repair-deficient human cancers: toward a new concept of target genes for instability, Cancer Res., 62, 2447, 2002.

136. Woerner, S. M., Benner, A., Sutter, C., Schiller, M., Yuan, Y. P., Keller, G., Bork, P., von Knebel Doeberitz, M., Gebert, J. F., Pathogenesis of DNA repair-deficient cancers: a statistical meta-analysis of putative Real Common Target genes, Oncogene., 22, 2226, 2003.

137. Perucho, M., Correspondence re: C.R. Boland et al., A National Cancer Institute workshop on microsatellite instability for cancer detection and familial predisposition: development of international criteria for the determination of microsatellite instability in colorectal cancer. Cancer Res., 58: 5248-5257, 1998, Cancer Res., 59, 249, 1999.

138. Kim, N. G., Rhee, H., Li, L. S., Kim, H., Lee, J. S., Kim, J. H., Kim, N. K., Kim, H., Identification of MARCKS, FLJ11383 and TAF1B as putative novel target genes in colorectal carcinomas with microsatellite instability, Oncogene., 21, 5081, 2002.

139. Chadwick, R. B., Jiang, G. L., Bennington, G. A., Yuan, B., Johnson, C. K., Stevens, M. W., Niemann, T. H., Peltomaki, P., Huang, S., de la Chapelle, A., Candidate tumor suppressor RIZ is frequently involved in colorectal carcinogenesis, Proc. Natl. Acad. Sci. U. S. A., 97, 2662, 2000.

140. Liu, W., Dong, X., Mai, M., Seelan, R. S., Taniguchi, K., Krishnadath, K. K., Halling, K. C., Cunningham, J. M., Boardman, L. A., Qian, C., Christensen, E., Schmidt, S. S., Roche, P. C., Smith, D. I., Thibodeau, S. N., Mutations in AXIN2 cause colorectal cancer with defective mismatch repair by activating beta-catenin/TCF signalling, Nat. Genet., 26, 146, 2000.

141. Markowitz, S., Wang, J., Myeroff, L., Parsons, R., Sun, L., Lutterbaugh, J., Fan, R. S., Zborowska, E., Kinzler, K. W., Vogelstein, B., et al., Inactivation of the type II TGF-beta receptor in colon cancer cells with microsatellite instability, Science., 268, 1336, 1995.

142. Takashima, H., Matsumoto, Y., Matsubara, N., Shirakawa, Y., Kawashima, R., Tanino, M., Ito, S., Isozaki, H., Ouchida, M., Meltzer, S. J., Shimizu, K., Tanaka, N., Effect of naturally occurring E2F-4 alterations on transcriptional activation and proliferation in transfected cells, Lab Invest., 81, 1565, 2001.

143. Deacu, E., Mori, Y., Sato, F., Yin, J., Olaru, A., Sterian, A., Xu, Y., Wang, S., Schulmann, K., Berki, A., Kan, T., Abraham, J. M., Meltzer, S. J., Activin type II receptor restoration in ACVR2-deficient colon cancer cells induces transforming growth factor-beta response pathway genes, Cancer Res., 64, 7690, 2004.

144. Grady, W. M., Rajput, A., Myeroff, L., Liu, D. F., Kwon, K., Willis, J., Markowitz, S., Mutation of the type II transforming growth factor-beta receptor is coincident with the transformation of human colon adenomas to malignant carcinomas, Cancer Res., 58, 3101, 1998.

145. Duval, A., Rolland, S., Tubacher, E., Bui, H., Thomas, G., Hamelin, R., The human T-cell transcription factor-4 gene: structure, extensive characterization of alternative splicings, and mutational analysis in colorectal cancer cell lines, Cancer Res., 60, 3872, 2000.

146. Mori, Y., Yin, J., Rashid, A., Leggett, B. A., Young, J., Simms, L., Kuehl, P. M., Langenberg, P., Meltzer, S. J., Stine, O. C., Instabilotyping: comprehensive identification of frameshift mutations caused by coding region microsatellite instability, Cancer Res., 61, 6046, 2001.

147. Forgacs, E., Wren, J. D., Kamibayashi, C., Kondo, M., Xu, X. L., Markowitz, S., Tomlinson, G. E., Muller, C. Y., Gazdar, A. F., Garner, H. R., Minna, J. D., Searching for microsatellite mutations in coding regions in lung, breast, ovarian and colorectal cancers, Oncogene., 20, 1005, 2001.

148. Potocnik, U., Glavac, D., Ravnik-Glavac, M., Identification of novel genes with somatic frameshift mutations within coding mononucleotide repeats in colorectal tumors with high microsatellite instability, Genes Chromosomes Cancer., 36, 48, 2003.

149. Woerner, S. M., Gebert, J., Yuan, Y. P., Sutter, C., Ridder, R., Bork, P., von Knebel Doeberitz, M., Systematic identification of genes with coding microsatellites mutated in DNA mismatch repair-deficient cancer cells, Int. J. Cancer., 93, 12, 2001.

150. Kasprzyk, A., Keefe, D., Smedley, D., London, D., Spooner, W., Melsopp, C., Hammond, M., Rocca-Serra, P., Cox, T., Birney, E., EnsMart: a generic system for fast and flexible access to biological data, Genome Res., 14, 160, 2004.

151. Giannini, G., Ristori, E., Cerignoli, F., Rinaldi, C., Zani, M., Viel, A., Ottini, L., Crescenzi, M., Martinotti, S., Bignami, M., Frati, L., Screpanti, I., Gulino, A., Human MRE11 is inactivated in mismatch repair-deficient cancers, EMBO Rep., 3, 248, 2002.

152. Khajavi, M., Inoue, K., Lupski, J. R., Nonsense-mediated mRNA decay modulates clinical outcome of genetic disease, Eur. J. Hum. Genet., 14, 1074, 2006.

153. Ionov, Y., Nowak, N., Perucho, M., Markowitz, S., Cowell, J. K., Manipulation of nonsense mediated decay identifies gene mutations in colon cancer cells with microsatellite instability, Oncogene., 23, 639, 2004.

154. El-Bchiri, J., Buhard, O., Penard-Lacronique, V., Thomas, G., Hamelin, R., Duval, A., Differential nonsense mediated decay of mutated mRNAs in mismatch repair deficient colorectal cancers, Hum. Mol. Genet., 14, 2435, 2005.

155. Schwitalle, Y., Linnebacher, M., Ripberger, E., Gebert, J., von Knebel Doeberitz, M., Immunogenic peptides generated by frameshift mutations in DNA mismatch repair-deficient cancer cells, Cancer Immun., 4, 14, 2004.

156. Linnebacher, M., Gebert, J., Rudy, W., Woerner, S., Yuan, Y. P., Bork, P., von Knebel Doeberitz, M., Frameshift peptide-derived T-cell epitopes: a source of novel tumor-specific antigens, Int. J. Cancer., 93, 6, 2001.

157. Ripberger, E., Linnebacher, M., Schwitalle, Y., Gebert, J., von Knebel Doeberitz, M., Identification of an HLA-A0201-restricted CTL epitope generated by a tumor-specific frameshift mutation in a coding microsatellite of the OGT gene, J. Clin. Immunol., 23, 415, 2003.

158. Saeterdal, I., Gjertsen, M. K., Straten, P., Eriksen, J. A., Gaudernack, G., A TGF betaRII frameshift-mutation-derived CTL epitope recognised by HLA-A2-restricted CD8+ T cells, Cancer Immunol. Immunother., 50, 469, 2001.

159. Bicknell, D. C., Kaklamanis, L., Hampson, R., Bodmer, W. F., Karran, P., Selection for beta 2-microglobulin mutation in mismatch repair-defective colorectal carcinomas, Curr. Biol., 6, 1695, 1996.

160. Thorstensen, L., Holm, R., Lothe, R. A., Trope, C., Carvalho, B., Sobrinho-Simoes, M., Seruca, R., WNT-inducible signaling pathway protein 3, WISP-3, is mutated in microsatellite unstable gastrointestinal carcinomas but not in endometrial carcinomas, Gastroenterology., 124, 270, 2003.

161. Umar, A., Boland, C. R., Terdiman, J. P., Syngal, S., de la Chapelle, A., Ruschoff, J., Fishel, R., Lindor, N. M., Burgart, L. J., Hamelin, R., Hamilton, S. R., Hiatt, R. A., Jass, J., Lindblom, A., Lynch, H. T., Peltomaki, P., Ramsey, S. D., Rodriguez-Bigas, M. A., Vasen, H. F., Hawk, E. T., Barrett, J. C., Freedman, A. N., Srivastava, S., Revised Bethesda Guidelines for hereditary nonpolyposis colorectal cancer (Lynch syndrome) and microsatellite instability, J. Natl. Cancer Inst., 96, 261, 2004.

162. Semba, S., Ouyang, H., Han, S. Y., Kato, Y., Horii, A., Analysis of the candidate target genes for mutation in microsatellite instability-positive cancers of the colorectum, stomach, and endometrium, Int. J. Oncol., 16, 731, 2000.

163. Ikeda, M., Orimo, H., Moriyama, H., Nakajima, E., Matsubara, N., Mibu, R., Tanaka, N., Shimada, T., Kimura, A., Shimizu, K., Close correlation between mutations of E2F4 and hMSH3 genes in colorectal cancers with microsatellite instability, Cancer Res., 58, 594, 1998.

164. Iglesias, D., Fernandez-Peralta, A. M., Nejda, N., Daimiel, L., Azcoita, M. M., Oliart, S., Gonzalez-Aguilera, J. J., RIS1, a gene with trinucleotide repeats, is a target in the mutator pathway of colorectal carcinogenesis, Cancer Genet. Cytogenet., 167, 138, 2006.

165. Miquel, C., Jacob, S., Grandjouan, S., Aime, A., Viguier, J., Sabourin, J. C., Sarasin, A., Duval, A., Praz, F., Frequent alteration of DNA damage signalling and repair pathways in human colorectal cancers with microsatellite instability, Oncogene., 2007.

166. Chen, Y. G., Wang, Q., Lin, S. L., Chang, C. D., Chuang, J., Ying, S. Y., Activin signaling and its role in regulation of cell proliferation, apoptosis, and carcinogenesis, Exp. Biol. Med. (Maywood)., 231, 534, 2006.

167. Ionov, Y., Matsui, S., Cowell, J. K., A role for p300/CREB binding protein genes in promoting cancer progression in colon cancer cell lines with microsatellite instability, Proc. Natl. Acad. Sci. U. S. A., 101, 1273, 2004.

168. Zeng, L., Zhou, M. M., Bromodomain: an acetyl-lysine binding domain, FEBS Lett., 513, 124, 2002.

169. Appella, E., Anderson, C. W., Post-translational modifications and activation of p53 by genotoxic stresses, Eur. J. Biochem., 268, 2764, 2001.

170. Grossman, S. R., p300/CBP/p53 interaction and regulation of the p53 response, Eur. J. Biochem., 268, 2773, 2001.

171. Liang, C., Feng, P., Ku, B., Dotan, I., Canaani, D., Oh, B. H., Jung, J. U., Autophagic and tumour suppressor activity of a novel Beclin1-binding protein UVRAG, Nat. Cell Biol., 8, 688, 2006.

172. Jin, S., White, E., Role of autophagy in cancer: management of metabolic stress, Autophagy., 3, 28, 2007.

173. Kamemura, K., Hart, G. W., Dynamic interplay between O-glycosylation and O-phosphorylation of nucleocytoplasmic proteins: a new paradigm for metabolic control of signal transduction and transcription, Prog. Nucleic Acid Res. Mol. Biol., 73, 107, 2003.

174. Safe, S., Abdelrahim, M., Sp transcription factor family and its role in cancer, Eur. J. Cancer., 41, 2438, 2005.

175. Lazarus, B. D., Love, D. C., Hanover, J. A., Recombinant O-GlcNAc transferase isoforms: identification of O-GlcNAcase, yes tyrosine kinase, and tau as isoform-specific substrates, Glycobiology., 16, 415, 2006.

176. Wilson, J. X., Regulation of vitamin C transport, Annu. Rev. Nutr., 25, 105, 2005.

177. Liang, W. J., Johnson, D., Jarvis, S. M., Vitamin C transport systems of mammalian cells, Mol. Membr. Biol., 18, 87, 2001.

178. Sies, H., Stahl, W., Vitamins E and C, beta-carotene, and other carotenoids as antioxidants, Am. J. Clin. Nutr., 62, 1315S, 1995.

179. Kune, G., Watson, L., Colorectal cancer protective effects and the dietary micronutrients folate, methionine, vitamins B6, B12, C, E, selenium, and lycopene, Nutr. Cancer., 56, 11, 2006.

180. Padayatty, S. J., Katz, A., Wang, Y., Eck, P., Kwon, O., Lee, J. H., Chen, S., Corpe, C., Dutta, A., Dutta, S. K., Levine, M., Vitamin C as an antioxidant: evaluation of its role in disease prevention, J. Am. Coll. Nutr., 22, 18, 2003.

181. Gonzalez, M. J., Miranda-Massari, J. R., Mora, E. M., Guzman, A., Riordan, N. H., Riordan, H. D., Casciari, J. J., Jackson, J. A., Roman-Franco, A., Orthomolecular oncology review: ascorbic acid and cancer 25 years later, Integr. Cancer Ther., 4, 32, 2005.

182. Harakeh, S., ab-Assaf, M., Khalife, J. C., bu-el-Ardat, K. A., Baydoun, E., Niedzwiecki, A., El-Sabban, M. E., Rath, M., Ascorbic acid induces apoptosis in adult T-cell leukemia, Anticancer Res., 27, 289, 2007.

183. Reddy, V. G., Khanna, N., Singh, N., Vitamin C augments chemotherapeutic response of cervical carcinoma HeLa cells by stabilizing P53, Biochem. Biophys. Res. Commun., 282, 409, 2001.

184. Fruehauf, J. P., Meyskens, F. L., Jr., Reactive oxygen species: a breath of life or death?, Clin. Cancer Res., 13, 789, 2007.

185. Lee, K. W., Lee, H. J., Kang, K., Lee, C. Y., Preventive effects of vitamin C on carcinogenesis, Lancet., 359, 172, 2002.

186. Mesnil, M., Connexins and cancer, Biol. Cell., 94, 493, 2002.
