Genomic and Transcriptome Profiling of Serous Epithelial Ovarian Cancer

By

Rebecca Joanne Zoe Menzies B.Sc.

A thesis submitted in conformity with the requirements

for the degree of Master of Science

Graduate Department of Medical Biophysics

University of Toronto

© Copyright Rebecca Joanne Zoe Menzies (2009)

Genomic and Transcriptome Profiling of Serous Epithelial Ovarian Cancer

Master of Science

2009

Rebecca Joanne Zoe Menzies

Department of Medical Biophysics

University of Toronto

Abstract

Epithelial ovarian cancer is the leading cause of death by gynaecological malignancy. Elucidation of the drivers of ovarian cancer will lead to treatment targets and tailored therapy for this disease. The Affymetrix Genome-Wide SNP Array 6.0 was used to study 100 serous ovarian samples and 10 normal ovarian samples to identify loci and driver genes. The ovarian cancer genome was found to have high overall genomic instability across all chromosomes, and key known genes in this disease were identified in the dataset. Aberrant regions of copy number gain were located in “blocks” of constant copy number at 1p, 1q, 8q, 12p, 19q and 20q. The range in copy number for gains was 4.2 to 5.1. For copy number losses, the “blocks” of genes were located at 8p and 5p. The range for copy number loss was 0.6 to 0.9.


Acknowledgements

I would like to acknowledge the support of my supervisor, Dr. Tak W. Mak. His mentorship and guidance have been truly inspirational. I would also like to thank the members of my supervisory committee, Dr. Igor Jurisica and Dr. Ben Neel, for their helpful comments, feedback and insights. Thanks to Nancy Ng and Yury Bukhman for their technical support with running the arrays. I would like to thank Dr. Elisabeth Tillier for her guidance, discussions and help with this project. I would also like to acknowledge our collaborators for the ovarian cancer project, Dr. Denis Slamon and Dr. Patricia Shaw. Thank you to all the members of the Mak laboratory for their helpful discussions, input and collegiality. During the course of my graduate studies I received funding from the Canadian Institutes of Health Research (CIHR). Finally, I would like to thank my parents, Dr. Teresa Menzies and Dr. John Menzies, my two sisters, Erica Menzies and Dr. Fiona Menzies, and my boyfriend, Jonathan Covato. They, as always, provide the support and love I need to fulfill my goals.


TABLE OF CONTENTS

Abstract ...... ii

Acknowledgements ...... iii

Table of Contents ...... iv

List of Figures and Tables ...... viii

List of Abbreviations...... xi

Chapter 1: Introduction ...... 1

1.1 Epidemiology of Ovarian Cancer ...... 2

1.2 Tumors of the Ovary ...... 2

1.3 Clinical Presentation of Ovarian Cancer ...... 3

1.4 Ovarian Cancer Screening ...... 4

1.5 Ovarian Cancer Risk Factors ...... 5

1.6 Histological Subtypes of Ovarian Cancer ...... 5

1.7 Current Treatment Modalities ...... 6

1.8 Prognosis ...... 7

1.9 Theories on the Etiology of Ovarian Cancer ...... 9

1.10 Cancer “Drivers” ...... 10

1.11 Pathways in Cancer – Complexity Increases ...... 11

1.12 Molecular Genetics of Ovarian Cancer ...... 12

1.13 Ovarian Cancer Expression ...... 15

1.14 Structural Variation in Ovarian Cancer ...... 16

1.15 Parallels with Breast Cancer ...... 17

1.16 Targeted Molecular Therapies ...... 18

1.17 Ovarian Cancer Genome-Wide Study ...... 18

1.18 Study Aims and Hypotheses ...... 18

Chapter 2: Experimental Design and Methods ...... 20

2.1 Introduction ...... 21

2.2 Collaboration and Sample Selection ...... 21

2.3 Platform Selection – Copy Number Array Technology ...... 22

2.4 Affymetrix Genome-Wide SNP Array 6.0 ...... 24

2.5 DNA Requirements for the Array ...... 25

2.6 Sample and Array Batches ...... 26

2.7 Array Data Analysis ...... 27

2.8 Accessing Partek Genomics Suite 6.4 ...... 27

2.9 Allele Intensity Import and Fragment Restriction ...... 28

2.10 Creating Copy Number from Intensities ...... 28

2.11 Principal Components Analysis ...... 30

2.12 Copy Number Visualization ...... 30

2.13 Segmentation Algorithms ...... 31

2.14 Assessment of Overall Trends in Data ...... 34

2.15 Segments Found in Multiple Samples ...... 35

2.16 Gene Annotation ...... 36

2.17 Gene List ...... 36

2.18 Removal of Redundant Genes in “Normal” Samples ...... 37

2.19 Gene List Parameters ...... 37

2.20 Selection of Top Hits ...... 38


2.21 Functional Inquiry into Genes of Interest ...... 39

2.22 Pathway Analysis of Genes of Interest ...... 39

Chapter 3: Results ...... 40

3.1 Introduction ...... 41

3.2 Quality Control – Principal Components Analysis ...... 41

3.3 Quality Control – Copy Number Histogram ...... 46

3.4 Copy Number Heat Maps ...... 49

3.5 Serous Ovarian Cancer Heat Maps ...... 49

3.6 Normal Sample Heat Maps ...... 65

3.7 Normal and Serous Heat Map Comparison ...... 80

3.8 Copy Number Variation Karyoview ...... 81

3.9 Serous Gene List – Copy Number Gains ...... 85

3.10 Serous Gene List – Copy Number Losses ...... 96

3.11 Normal Gene List – Copy Number Gains ...... 101

3.12 Normal Gene List – Copy Number Losses ...... 101

3.13 Functional Inquiry for Top Genes of Interest ...... 102

3.14 Pathway Analysis ...... 106

Chapter 4: Discussion ...... 108

4.1 Genomic Instability Identified in Serous Ovarian Cancer ...... 109

4.2 Cell Polarity Genes and Ovarian Cancer...... 110

4.3 Copy Number Alteration in Non-ovarian Cancer ...... 111

4.4 Future Directions ...... 113


4.5 Clinical Implications for Future Research...... 115

4.6 Conclusion ...... 115

References ...... 117

Appendix ...... 137


List of Figures and Tables

Figure 1 Age-adjusted cancer death rates in women from 1930-2003 (USA) ...... 2
Figure 2 Kaplan-Meier disease specific survival by grade ...... 8
Figure 3 Histogram of sample number distribution ...... 139
Figure 4 Flow chart displaying application of gene list parameters ...... 140
Figure 5 PCA plot of 100 serous samples that were run on seven dates in 2008 ...... 42
Figure 6 PCA plot of 94 serous samples ...... 43
Figure 7 PCA plot of 10 normal samples ...... 44
Figure 8 PCA plot of 9 normal samples after removal of the single outlier ...... 45
Figure 9 Histogram of 100 serous samples showing log2 ratio of copy number ...... 47
Figure 10 Histogram of 94 serous samples showing log2 ratio for copy number ...... 47
Figure 11 Histogram of all 10 normal samples ...... 48
Figure 12 Log2 ratio distribution for nine normal samples ...... 48
Figure 13 Heat map displaying copy number for chromosome 1 for 96 serous ovarian cancer samples ...... 54
Figure 14 Heat map for chromosome 2 of 96 serous samples ...... 54
Figure 15 Chromosome 3 heat map for 96 serous ovarian cancer samples ...... 55
Figure 16 PCA plot of 9 normal samples after removal of the single outlier ...... 55
Figure 17 Heat map of chromosome 5 for 96 serous samples ...... 56
Figure 18 Heat map of 96 serous samples for chromosome 6 ...... 56
Figure 19 Heat map of chromosome 7 for 96 serous ovarian cancer samples ...... 57
Figure 20 Heat map of chromosome 8 for 96 serous ovarian cancer samples ...... 57
Figure 21 Heat map of chromosome 9 for 96 serous ovarian cancer samples ...... 58
Figure 22 Heat map of chromosome 10 for 96 serous ovarian cancer samples ...... 58
Figure 23 Heat map of chromosome 11 for 96 serous ovarian cancer samples ...... 59
Figure 24 Heat map of chromosome 12 for 96 serous ovarian cancer samples ...... 59
Figure 25 Heat map of chromosome 13 for 96 serous ovarian cancer samples ...... 60
Figure 26 Heat map of chromosome 14 for 96 serous ovarian cancer samples ...... 60
Figure 27 Heat map of chromosome 15 for 96 serous ovarian cancer samples ...... 61
Figure 28 Heat map of chromosome 16 for 96 serous ovarian cancer samples ...... 61
Figure 29 Heat map of chromosome 17 for 96 serous ovarian cancer samples ...... 62
Figure 30 Heat map of chromosome 18 for 96 serous ovarian cancer samples ...... 62
Figure 31 Heat map of chromosome 19 for 96 serous ovarian cancer samples ...... 63
Figure 32 Heat map of chromosome 20 for 96 ovarian cancer samples ...... 63
Figure 33 Heat map of chromosome 21 for 96 serous ovarian cancer samples ...... 64
Figure 34 Heat map of chromosome 22 for 96 serous ovarian cancer samples ...... 64
Figure 35 Heat map of the X chromosome for 96 serous ovarian cancer samples ...... 65
Figure 36 Heat map of chromosome 1 for 9 normal samples ...... 69
Figure 37 Heat map of chromosome 2 for 9 normal samples ...... 69
Figure 38 Heat map of chromosome 3 for 9 normal samples ...... 70
Figure 39 Heat map of chromosome 4 for 9 normal samples ...... 70
Figure 40 Heat map of chromosome 5 for 9 normal samples ...... 71
Figure 41 Heat map of chromosome 6 for 9 normal samples ...... 71
Figure 42 Heat map of chromosome 7 for 9 normal samples ...... 72
Figure 43 Heat map of chromosome 8 for 9 normal samples ...... 72
Figure 44 Heat map of chromosome 9 for 9 normal samples ...... 73
Figure 45 Heat map of chromosome 10 for 9 normal samples ...... 73
Figure 46 Heat map of chromosome 11 for 9 normal samples ...... 74
Figure 47 Heat map of chromosome 12 for 9 normal samples ...... 74
Figure 48 Heat map of chromosome 13 for 9 normal samples ...... 75
Figure 49 Heat map of chromosome 14 for 9 normal samples ...... 75
Figure 50 Heat map of chromosome 15 for 9 normal samples ...... 76
Figure 51 Heat map of chromosome 16 for 9 normal samples ...... 76
Figure 52 Heat map of chromosome 17 for 9 normal samples ...... 77
Figure 53 Heat map of chromosome 18 for 9 normal samples ...... 77
Figure 54 Heat map of chromosome 19 for 9 normal samples ...... 78
Figure 55 Heat map of chromosome 20 for 9 normal samples ...... 78
Figure 56 Heat map of chromosome 21 for 9 normal samples ...... 79
Figure 57 Heat map of chromosome 22 for 9 normal samples ...... 79
Figure 58 Heat map of X chromosome for 9 normal samples ...... 80
Figure 59 Threshold values for copy number gain ...... 83
Figure 60 Threshold values for copy number loss ...... 83
Figure 61 Serous ovarian cancer karyoview ...... 84
Figure 62 Normal samples karyoview ...... 84
Figure 63 Graph of copy number gains over 4 copies for each chromosome after the genomic segmentation algorithm was applied to 96 serous ovarian cancer samples ...... 86
Figure 64 Graph of the number of samples for the top copy number gains found on each chromosome ...... 86
Figure 65 Scatterplot of “blocks” of copy number gain in serous ovarian cancer samples ...... 88
Figure 66 Scatterplot of the 8q “block” in 13 serous ovarian samples ...... 90
Figure 67 Scatterplot of the 19q “block” for 10 serous ovarian cancer samples ...... 90
Figure 68 Scatterplot of the 12p “block” of 8 samples ...... 91
Figure 69 Scatterplot of the 20q “block” of 7 serous ovarian samples ...... 91
Figure 70 Scatterplot of the 19q “block” in 6 serous samples ...... 93
Figure 71 Scatterplot of the 1p “block” in 6 serous samples ...... 93
Figure 72 Scatterplot of the 1q “block” in 5 serous samples ...... 94
Figure 73 Scatterplot of the 1q “block” in 6 serous samples ...... 94
Figure 74 Scatterplot of the 8q “block” in 5 serous samples ...... 95
Figure 75 Graph of copy number losses below 1 copy for each chromosome after the genomic segmentation algorithm was applied to 96 serous ovarian cancer samples ...... 97
Figure 76 Graph of the number of samples for the top copy number losses found on each chromosome ...... 98
Figure 77 Scatterplot of “blocks” of copy number loss in the serous ovarian cancer samples ...... 98
Figure 78 Scatterplot of the 8p “block” in 3 serous samples ...... 99
Figure 79 Scatterplot of the 8p “block” in 6 serous samples ...... 99
Figure 80 Scatterplot of the 5p “block” in 5 serous samples ...... 100

Table 1 Initial break-down of ovarian samples received from Dr. Denis Slamon ...... 138
Table 2 Gene list for copy number gains and losses ...... 141

List of Abbreviations

aCGH: array comparative genomic hybridization
CA-125: cancer antigen-125
CBS: Circular Binary Segmentation
CGH: comparative genomic hybridization
FF: fresh frozen
FFPE: formalin-fixed paraffin-embedded
GS: Genomic Segmentation
HapMap: Haplotype Mapping project
HMM: Hidden Markov Model
LOH: loss of heterozygosity
Partek GS: Partek Genomics Suite
PCA: principal components analysis
PCR: polymerase chain reaction
NCBI: National Center for Biotechnology Information
SNP: single nucleotide polymorphism
TCAG: The Centre for Applied Genomics
TVS: trans-vaginal ultrasonography
UCLA: University of California Los Angeles
UHN RIS: University Health Network Research Information Services


Chapter 1. Introduction


1.1 Epidemiology of Ovarian Cancer

Epithelial ovarian cancer (hereafter referred to as ovarian cancer) is the leading cause of death by gynaecological malignancy and the fifth most common cause of cancer death among women (Landen et al., 2008). In the United States in 2006, there were over 22,000 newly diagnosed ovarian cancers and over 15,000 deaths attributed to the disease (Jemal et al., 2007). Only 5-10% of these cancers are of a recognized familial genotype (Holschneider and Berek, 2000). Typically, ovarian cancers are diagnosed at an advanced stage (Stage III or IV), and this contributes to the high mortality rate (Bast et al., 2005). The late diagnosis can be attributed to a variety of factors, including non-specific symptoms, poor screening, a lack of established risk factors and the relative rarity of the disease.

Figure 1. Age-adjusted cancer death rates in women from 1930-2003 in the USA (Jemal et al., 2007). Ovarian cancer is the fifth most common cause of cancer death in women.

1.2 Tumors of the Ovary

The differential diagnosis of tumors in the ovary includes such varied causes as functional ovarian cyst, theca lutein cyst, endometrioma, malignancy and metastatic carcinoma (Killackey and Neuwirth, 1988). Malignant tumors in the ovary, however, fall into three categories: epithelial, germ cell and sex cord-stromal. The molecular genetics and underlying causes of each of these three are very different. Germ cell tumors are derived from the primordial germ cells of the ovary and are analogous to the types of testicular tumors found in men (Abu-Rustum and Aghajanian, 1998). Although germ cell tumors account for between 20 and 25% of all ovarian tumors, they only represent 5% of the malignant tumors of the ovary (Tewari et al., 2000). Sex cord-stromal tumors are a heterogeneous group of tumors arising from the non-germ cell and non-epithelial cells of the ovary (Chen et al., 2003). Sex cord-stromal tumors usually account for between 5 and 8% of ovarian neoplasms (Cronje et al., 1999). The focus of this study is on epithelial ovarian tumors, which account for over 90% of all malignant ovarian masses (Cannistra, 2004).

1.3 Clinical Presentation of Ovarian Cancer

Typical symptoms of ovarian cancer include lower abdominal pain, abdominal enlargement, and gastrointestinal and urinary complaints (Mitchell et al., 2006). In fact, these symptoms are so non-specific that their differential diagnoses would include urinary tract infections, pregnancy, colon cancer or even irritable bowel syndrome (Cannistra, 2004). Thus, patients may not present to their physician until these symptoms have persisted for a lengthy duration. Moreover, because the ovaries are located deep in the pelvis and are typically difficult to assess on physical examination, a minimally enlarged ovary may go unnoticed for some time. One would therefore expect adequate screening to catch potential ovarian cancers early in their course. However, this is not the case, since current screening practices lack the necessary sensitivity and specificity.

1.4 Ovarian Cancer Screening

Current screening for ovarian cancer varies between health centres. Screening primarily involves measuring serum cancer antigen-125 (CA-125) levels and trans-vaginal ultrasonography (TVS) (Cannistra, 2004). However, a high level of CA-125 (over 35 U/mL) is not a reliable means of detecting early stage ovarian cancer, since it is also elevated in a variety of other gynaecological conditions such as endometriosis, pregnancy, adenomyosis and polycystic ovarian syndrome (Cannistra, 2004). In addition, levels of CA-125 are typically low in early stage ovarian cancer (Cannistra, 2004). Thus, CA-125 is typically used to track response to chemotherapy and detect ovarian cancer recurrence (Bast et al., 2005). Since CA-125 is elevated in late stage ovarian cancer, it may detect women with advanced disease. However, this is unacceptable as a screening outcome, since late stage cases have a very poor prognosis. Current screening for ovarian cancer is inadequate, and the search for more effective tumor markers is ongoing. A good screening marker would differ in level between women with ovarian cancer and women without disease, be measurable in the clinic and ideally detect early stage disease. Visintin et al. (2008) studied a six-biomarker (leptin, prolactin, osteopontin, insulin-like growth factor II, macrophage inhibitory factor and CA-125) screening tool for ovarian cancer and reported 95.3% sensitivity and 99.4% specificity. However, Coates et al. (2008) warn against using this six-biomarker panel in clinical practice as yet. They note that the reported sensitivity and specificity are particular to the study and may have been confounded by an inordinately high incidence of advanced stage ovarian cancer in the test group. Typically, the annual incidence of ovarian cancer in women over 50 is about 0.036% (Coates et al., 2008). This highlights the problem in establishing ovarian cancer risk groups. Although several environmental risk factors have been identified as predisposing women to developing the disease, an overall understanding of how these risk factors contribute to the disease is still lacking.
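The concern about applying a highly sensitive and specific test to a low-incidence population can be illustrated with the positive predictive value (PPV), which follows from Bayes' theorem. The minimal sketch below uses the sensitivity and specificity reported by Visintin et al. (2008). As a simplifying assumption that is not made in the sources cited above, it treats the 0.036% annual incidence as the prevalence of disease in the screened population; the function name is illustrative only.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Return P(disease | positive test) using Bayes' theorem."""
    # Fraction of the screened population that is diseased AND tests positive
    true_positives = sensitivity * prevalence
    # Fraction of the screened population that is healthy BUT tests positive
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Sensitivity and specificity as quoted in the text.
# ASSUMPTION: prevalence is approximated by the 0.036% annual incidence.
ppv = positive_predictive_value(
    sensitivity=0.953, specificity=0.994, prevalence=0.00036
)
print(f"PPV = {ppv:.1%}")  # prints "PPV = 5.4%"
```

Under these assumptions, only about 5% of positive results would correspond to true ovarian cancers, which is consistent with the caution expressed by Coates et al. (2008) about using the panel for population screening.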

1.5 Ovarian Cancer Risk Factors

Known factors that decrease the risk of developing ovarian cancer include oral contraceptive use (Huber et al., 2008), pregnancy (Moorman et al., 2008), tubal ligation, lactation and hysterectomy (Nagle et al., 2008). Ovarian cancer risk in BRCA mutation carriers has been studied for many years at the University of Toronto by the group led by Dr. Steven Narod. This group has found that a family history of ovarian cancer and BRCA1/BRCA2 mutation status are the most significant factors that increase a woman’s risk of developing ovarian cancer (Modan et al., 2001). Multiple factors that may increase the risk of developing ovarian cancer have been identified, including endometriosis (Stern et al., 2001; Olivier et al., 2006; Kobayashi et al., 2008; Rossing et al., 2008), polycystic ovarian syndrome (Colombo et al., 2006), pelvic inflammatory disease (Ness and Cottreau, 1999; Ness et al., 2000), asbestos and talc exposure (Merritt et al., 2008) and BRCA1/BRCA2 mutation status (Kurian et al., 2005; Colombo et al., 2006; Allain et al., 2007). These predictors of risk may help the clinician to identify subgroups of women who should be screened more frequently or closely. However, the molecular underpinnings of these risk factors and their interplay with ovarian cancer are still to be elucidated.

1.6 Histological Subtypes of Ovarian Cancer

There are several histological subtypes of ovarian cancer, which in turn contribute to its heterogeneity as a disease. The four main subtypes are serous, endometrioid, mucinous and clear cell. Although the histological subtypes are all treated clinically in the same manner, with patients receiving the same course of chemotherapy, radiation or surgery, the actual clinical course differs greatly. Serous ovarian cancer is the most common subtype, accounting for approximately 70% of all malignant cases (Skubitz et al., 2006). In addition to being the most common, the serous subtype is also the most lethal, accounting for the majority of deaths attributable to ovarian cancer (Tan and Kaye, 2007). Endometrioid ovarian cancer accounts for about 10% of cases and is associated with endometriosis in the same individual. In fact, endometriosis, which is itself a monoclonal disease, has been seen to increase the risk of a woman developing ovarian cancer (Olivier et al., 2006). However, the molecular underpinnings of this interaction have not been identified. Mucinous ovarian cancer typically presents grossly with tumors greater than 20 cm in diameter (Cannistra, 2004). Fortunately, these tumors are usually confined to the ovary at the time of presentation (Cannistra, 2004). Clear cell tumors are the rarest, accounting for only about 6% of all cases (Kurian et al., 2005). These tumors share features with renal clear cell carcinomas, and studies have been done to find overlapping genes of interest between the two (Dent et al., 2003). Considering the heterogeneity of the subtypes, it is therefore likely that the molecular underpinnings of each subtype are different.

1.7 Current Treatment Modalities

Ovarian cancer treatment is “one-size-fits-all”, since it remains unchanged regardless of the histological subtype (serous, endometrioid, mucinous, clear cell) of epithelial ovarian cancer. Treatment for early ovarian cancer (Stage I and II) includes a total hysterectomy, bilateral salpingo-oophorectomy and peritoneal cytology (Cannistra, 2004). Upstaging with a vertical incision to explore the entire upper abdomen and the pelvic and para-aortic lymph nodes is also performed to assess stage (Colombo et al., 2006). The majority of patients with epithelial ovarian cancer will require chemotherapy following cytoreductive surgery. However, a subgroup of Stage I patients will have a 90-95% 5-year survival rate with surgery alone (Cannistra, 2004). Standard chemotherapy, regardless of histological subtype, is paclitaxel and a platinum-based drug (Olivier et al., 2006). Typically, chemotherapy is delivered intravenously in ovarian cancer patients. However, in a subset of women with optimally debulked stage III ovarian cancer, intraperitoneal chemotherapy is more efficacious (Armstrong et al., 2006). Intraperitoneal chemotherapy delivers a higher and more targeted dose to the tumor, but there is higher toxicity associated with this treatment regime. Its use therefore remains controversial in the field (Tummala et al., 2008). Cytoreductive surgery is also performed in the majority of advanced stage (Stage III and IV) cases. Radiation therapy is typically not administered in the setting of ovarian cancer. Studies on the efficacy of giving abdominopelvic radiation to optimally debulked stage I, II and III patients showed no difference in survival or tumor control (Fyles et al., 1998).

1.8 Prognosis

Independent prognostic factors for improved survival in ovarian cancer include younger age, non-clear cell histology, earlier stage at diagnosis, lower grade at diagnosis, surgery and lymphadenectomy (Chan et al., 2008). Serous ovarian cancer has the poorest prognosis among all ovarian cancer histological subtypes (Chan et al., 2008). Approximately 75% of patients with serous ovarian cancer present at stage III or IV (Cannistra, 2004). Stage IIIc serous ovarian cancer patients have a five-year survival of 28.9%. Stage IV patients have an even lower overall survival: only 13.4% of women are alive after 5 years (Heintz et al., 2003). This poor prognosis is evident in the survival curve from the University Health Network Ovarian Tissue Bank (personal communication, Dr. Patricia Shaw).

Figure 2. Kaplan-Meier disease specific survival by grade. Four hundred serous ovarian tumors from the UHN ovarian tissue bank are represented on this curve (personal communication, Dr. Patricia Shaw).

Over 70% of patients will initially respond to first-line chemotherapy; however, only 50% of responders will be alive after 5 years (Olivier et al., 2006). This is primarily due to acquired chemoresistance in ovarian cancer patients. The likelihood that a patient will acquire chemoresistance increases with higher stage at diagnosis, age and high grade at diagnosis (Thigpen et al., 1993). Despite the variety of first-line and second-line chemotherapy regimes administered, almost all women with advanced stage ovarian cancer will eventually die of their disease (Cannistra, 2004). In fact, in a recent review, Osterberg et al. (2005) postulated that the majority of deaths due to metastatic ovarian cancer were due to chemotherapy resistance. Thus, the development of molecular targeted drugs may vastly improve the overall survival of women with serous ovarian cancer.

1.9 Theories on the Etiology of Ovarian Cancer

Several theories on the etiology of ovarian cancer have been put forth. Ness and Cottreau (1999) proposed three major theories on the pathogenesis of epithelial ovarian cancer: the incessant ovulation hypothesis, the pituitary gonadotropin hypothesis and the inflammatory hypothesis. Evidence in support of incessant ovulation includes the observation that women who have interrupted ovulation, due to oral contraceptive use or pregnancy, have a decreased risk of developing ovarian cancer. This theory also makes intuitive sense, since epithelial cells that undergo repetitive and incessant division and replication will have a higher chance of acquiring genomic aberrations. This damage to the ovarian surface epithelium may be the precursor lesion for the development of cancer (Fathalla, 1971). However, women who are infertile and do not ovulate are at higher risk of developing ovarian cancer (Glud et al., 1998; Zreik et al., 2008), an observation that the incessant ovulation theory cannot readily reconcile. The pituitary gonadotropin hypothesis is also supported by convincing evidence. The critical event behind this hypothesis is the entrapment of ovarian surface epithelium in inclusion cysts, followed by persistent stimulation of the entrapped epithelium by gonadotropins (Ness and Cottreau, 1999). High levels of estrogen hormones have long been associated with increased breast cancer risk (Pike et al., 1993). However, this causal link has been challenged lately (Wiseman, 2004). In the setting of ovarian cancer, this theory cannot explain why women with very high estrogen levels (during pregnancy) have a lower risk of developing ovarian cancer. Similarly, post-menopausal women who receive hormone-replacement therapy have only a modest increase in risk of developing ovarian cancer (Neves-E-Castro, 2008). Therefore, a third and more comprehensive theory involving inflammation has been considered (Ness and Cottreau, 1999). The idea that inflammation contributes to the pathogenesis of cancer is widely accepted in other cancer types (Fantini and Pallone, 2008; Soria and Ben-Baruch, 2008; Vasto et al., 2008). Most of the observations in ovarian cancer can be explained using this theory. For instance, talc and asbestos exposure increases the risk of ovarian cancer and also causes release of inflammatory cytokines in the adnexal milieu (Merritt et al., 2007). Persistent gonadotropin hormone release also leads to an inflammatory process. Thus, the gonadotropin theory leads directly into the inflammation theory. Ness and Cottreau (1999) favour the last theory, since inflammation causes DNA damage, oxidative stress and elevated cytokines and prostaglandins. However, the overall missing link in these three theories has been the lack of an identifiable precursor lesion in ovarian cancer (Piek et al., 2007). Several authors have therefore suggested that the precursor lesion is actually of fallopian tube origin (Tone et al., 2007; Levanon et al., 2008). This would negate the aforementioned incessant ovulation theory. A clear controversy therefore exists in the field around the etiology of ovarian cancer. A better understanding of the molecular underpinnings and driver genes of the disease may help to resolve it.

1.10 Cancer “Drivers”

Cancer “drivers” and “passengers”, as defined by Greenman et al. (2007), are at the core of the treatment target discovery for this project. Greenman et al. (2007) define cancer “drivers” as those genes that confer a growth advantage on the cells in which they are mutated or amplified and are thus positively selected. These drivers are directly related to cancer development and are likely an upstream event. Conversely, cancer “passengers” are those genes altered in a cancer cell that already carries a growth advantage due to the driver gene. Thus, passenger genes are not positively selected themselves. Typically, passenger genes are found in mature cancer cells, unlike the driver genes, which should be present at all stages of the tumor (Torkamani and Schork, 2008). Driver gene growth advantages may include evasion of apoptosis, sustained angiogenesis, limitless replicative potential, tissue invasion and metastasis, insensitivity to anti-growth signals, self-sufficiency in growth signals and a metabolic advantage (Hanahan and Weinberg, 2000; Pan and Mak, 2007). Kinases are the most commonly identified cancer driver genes to date (Futreal et al., 2004; Torkamani and Schork, 2008). This link between cancer and kinases was first identified by Varmus and Bishop in 1975 with the first cellular oncogene, Src kinase (Garber, 2006). Subsequent cancer driver genes that are also kinases include EGFR and Her2/neu. Therefore, it is anticipated that a focus on cancer drivers will lead to the future development of molecular targeted therapies in ovarian cancer.

1.11 Pathways in Cancer – Complexity Increases

Canonical pathways in cancer include the SOS-Ras-Raf-MAP kinase pathway (Giancotti and Ruoslahti, 1999), the pRB pathway (Hanahan and Weinberg, 2000), the APC/β-catenin pathway (Kinzler and Vogelstein, 1996), the PI3 Kinase – AKT/PKB pathway (Yuan and Cantley, 2008) and the p53 pathway (Vogelstein et al., 2000). Research into the biochemistry of these pathways has shown that many of the important pathways in cancer interact through cross-talk, common receptors or common ligands, or even feed directly into each other (Weinberg, 2007). For example, the PI3 Kinase pathway can be activated by downstream Ras pathway interactors or downregulated by downstream p53 pathway interactors. In turn, the PI3 Kinase pathway activates other pathways involved in apoptosis and cell cycle arrest (Cully et al., 2006). This extensive interaction between pathways is not surprising, given that cancer is such a complex disease. Ultimately, these pathways all lead to tumorigenesis and may be the drivers behind the malignant process.

1.12 Molecular Genetics of Ovarian Cancer

What is known about the molecular genetics of ovarian cancer can be divided into two categories: BRCA1 and BRCA2 genes and non-BRCA genes. There has been extensive research on the breast-ovarian syndrome in women with BRCA1 or BRCA2 mutations (Miki et al., 1994; Struewing et al., 1997; Kennedy et al., 2002; Tan et al., 2008). A BRCA1 mutation increases the lifetime risk of developing ovarian cancer by 35 to 46% (Chen et al., 2007). Women with BRCA2 mutations carry a 13 to 23% increased risk of developing ovarian cancer during their lifetime (Chen et al., 2007). BRCA-mutated ovarian cancer patients usually have a more favourable prognosis. For instance, BRCA-mutated ovarian cancer patients typically respond well to first-line and subsequent platinum-based chemotherapy in instances when non-BRCA-mutated patients would not respond (Tan et al., 2008). This response correlates with improved overall survival (Tan et al., 2008). However, BRCA-mutated ovarian cancer patients only account for about 5-10% of all cases (Holschneider and Berek, 2000). Therefore, knowledge of the cancer driver genes and molecular genetics of non-BRCA-mutated ovarian cancers is still needed.

The heterogeneity between the subtypes of ovarian cancer is also reflected in the molecular genetics known so far. For instance, aberrant genes in each subtype have been independently researched and are highly heterogeneous.

Ovarian clear cell carcinomas commonly have an upregulation of Hepatocyte nuclear factor-1 β (HNF-1β) but other subtypes do not (Tsuchiya et al., 2003; Tan and Kaye, 2007). Interestingly, HNF-1β is also highly expressed in endometriosis linking these two diseases together (Tan and Kaye, 2007). PTEN inactivation is seen in 40% of Stage I and II ovarian clear cell carcinomas (Hashiguchi et al., 2006). This important tumor suppressor gene may be involved in the focal adhesion, spreading and migration of early clear cell cancers. CD44, a membrane glycoprotein and cell-surface receptor, has aberrant splice control in clear cell cancers (Rodriguez-Rodriguez et al., 2003; Zorn et al., 2005). Two important tumor suppressor genes, p53 and WT1, are mutated at a low frequency in clear cell cancers (Skirnisdottir et al., 2001; Okuda et al., 2003; Skirnisdottir et al., 2005).

Mucinous cancers have been observed to have point mutations, typically at codons 12 and 13, in the KRAS gene (Caduff et al., 1999; Gemignani et al., 2003; Cho and Shih, 2008). Mutations in KRAS occur in about 75% of mucinous tumors. Mucinous ovarian tumors also express several mucin genes, MUC2, MUC3 and MUC17, that are found in mucinous tumors of many organ types (Cho and Shih, 2008). Overexpression of the LGALS4 gene has also been detected in early stage ovarian mucinous tumors (Cho and Shih, 2008). Overall, few molecular aberrations have been identified in mucinous tumors due to the small number of studies published in this area of research (Cho and Shih, 2008).

Serous ovarian cancers are pathologically divided into low grade and high grade categories. The advanced stage high grade serous ovarian cancers have mutations of the p53 gene in over 80% of cases (Cho and Shih, 2008). In fact, early stage high grade serous ovarian cancers have p53 mutations 50% of the time (Chan et al., 2000). This high frequency at an early stage suggests that p53 mutation is an early event in this cancer. Conversely, p53 is rarely mutated in low grade serous ovarian cancers. The majority of low grade serous ovarian cancers have mutations in the BRAF and KRAS genes (Cho and Shih, 2008). In fact, the differential mutation of these three genes has led researchers to categorize serous tumors into Type I and Type II tumors. Type II tumors demonstrate more genetic instability and are typically higher grade and stage at diagnosis (Singer et al., 2005; McCluggage, 2008). High grade serous ovarian cancers also have a higher frequency of overexpression of MIB1, bcl-2, c-kit, Her2/neu and p16 compared to the low grade tumors (O’Neill et al., 2005; O’Neill et al., 2007). Mutations in the tumor suppressor genes and oncogenes PTEN, BRCA1/2 and PI3K are present in 10% or less of high grade serous ovarian tumors (Cho and Shih, 2008). It is still unclear whether the additional genes mutated in high grade serous cancers are due to a progression in malignancy from the low grade cases or are from a separate precursor altogether. The overexpression of Her2/neu in serous ovarian carcinoma is typically observed in 5-7% of cases (Mano et al., 2004; Lassus et al., 2006). However, the reported frequency of Her2/neu overexpression varies greatly between studies; in some studies no overexpression is detected (Cho and Shih, 2008). When present, Her2/neu overexpression has been linked to poorer prognosis and overall decreased survival in multiple studies (Erkinheimo et al., 2004; Lassus et al., 2004; De Graeff et al., 2008). Intratumoral T cells have been shown to improve overall disease-free survival in serous ovarian cancer patients (Zhang et al., 2003). This indicates a possibly important role for molecular antitumor mechanisms in ovarian cancer. Therefore, the gene expression patterns of even serous ovarian cancer appear to be quite heterogeneous. Thus, knowledge of a central driver gene or pathway of interest would be invaluable.

Endometrioid ovarian cancers have similar genetic aberrations to those found in serous ovarian cancer. P53 is also mutated in endometrioid ovarian cancers in 60% of cases (Cho and Shih, 2008). Mutations in KRAS and BRAF are seen in about 10% of endometrioid tumors (Kurman and Shih, 2008). PTEN is also mutated in between 14 and 20% of endometrioid tumors (Okuda et al., 2003). In a recent review of endometrioid ovarian cancer, Di Cristofano and Ellenson (2007) noted mutations of the canonical Wnt signalling pathway in 16-38% of endometrioid ovarian cancers. In over 60% of endometrioid tumors with a Wnt signalling pathway defect, the key effector β-catenin is mutated (Oliva et al., 2006).

1.13 Ovarian Cancer Gene Expression

The ability to survey the genome and transcriptome has been made possible by the explosion in bioinformatics and computer technology over the past two decades. Many of the earlier array studies were microarray experiments examining gene expression in RNA derived from tissue samples (Fehrmann et al., 2007). Microarray methodology has been used to answer research questions involving carcinogenesis, gene profiles, response to chemotherapy, acquired resistance to chemotherapy, metastasis, precursor lesions, prognostic biomarkers and others. Both tissue samples and cell lines have been examined in these microarray studies. For instance, in a study of 34 ovarian tumor samples, a 115-gene profile was identified that could be used to stratify the samples by prognosis (Spentzos et al., 2004). Of the 115 genes, 70 genes were overexpressed in the unfavourable outcome group and 45 were underexpressed in the favourable group. Hartmann et al. (2005) set out to develop a gene profile to distinguish women who were at risk for developing ovarian cancer early in their lives compared to a later onset. Another study found a high degree of similarity between luteal phase fallopian tube epithelium and serous ovarian carcinoma (Tone et al., 2008). These microarray studies have highlighted the heterogeneity found in ovarian cancer and generated many hypotheses now being tested in research laboratories around the world.

Gene expression studies have also focused on elucidating the differences between high-grade and low-grade serous ovarian cancers. Many studies, including one by Meinhold-Heerlein et al. (2005), have found a pattern of gene expression that can distinguish high-grade from low-grade tumors. In the Meinhold-Heerlein et al. (2005) study the serous borderline tumors clustered with the low grade serous tumors rather than the high grade tumors. This lends further support to the theory that high grade serous and low grade serous tumors arise from two separate pathways.

1.14 Structural Variation in Ovarian Cancer

Analysis of the genome by comparative genomic hybridization (CGH) and single nucleotide polymorphism (SNP) arrays is complementary to studying gene expression using microarrays. These technologies will be discussed further in the methodology section of this thesis. Structural variation in the genome is defined as cytogenetic and submicroscopic variation (Scherer et al., 2007). This encompasses deletions, duplications, inversions, insertions and large-scale copy number variants. Both CGH and SNP arrays have the ability to assess structural variation. SNP arrays have a higher resolution than CGH due to the sheer number of SNPs located across the genome (Syvanen, 2005). Many studies on ovarian cancer have used CGH (Mayr et al., 2006; Li et al., 2007; Nowee et al., 2007). A high degree of chromosomal instability has been observed in ovarian cancer (Shridhar et al., 2001; Cho and Shih, 2008). CGH studies have identified 1q, 3q, 8q and 12p as regions commonly showing gains in copy number (Sonoda et al., 1997; Hauptmann et al., 2002). Losses on chromosome 20 are commonly observed in ovarian cancer (Kiechle et al., 2001). In a study using CGH and SKY originating from the Dr. Jeremy Squire group at the University of Toronto, chromosomes 3, 8, 11, 17 and 21 were identified to have the most frequent genetic aberrations in epithelial ovarian cancer (Bayani et al., 2002). However, the long or short arm of a chromosome may contain several thousand genes, so higher resolution must be achieved to identify key driver genes.

In 2007, Gorringe et al. published a dataset of 31 ovarian cancer tumors analysed by the Affymetrix Genome-Wide SNP array 5.0. They argue that this SNP array technology has the necessary resolution to actually identify notable driver genes. Gorringe et al. (2007) found over 380 areas of copy number aberration that were less than 500kb in size. There were 33 sites of high level gain, defined as more than 5 copies. In another 2007 paper concerning DNA copy number in serous ovarian cancer, Nakayama et al. found several regions of gain. The most frequently amplified subchromosomal regions, found by FISH, were the CCNE1, AKT2, NOTCH3, RSF1 and PI3K loci (Nakayama et al., 2007).

1.15 Parallels with Breast Cancer

The rationale for a molecular targeted drug in ovarian cancer is evident, although success in this area for this type of cancer has not yet been achieved. However, parallels can be drawn with the field of breast cancer and insight can be gained from the Herceptin story (Slamon et al., 1987; Slamon et al., 2001). The five subtypes of breast cancer established by Perou et al. (2000) may have parallels in ovarian cancer. The basal subtype of breast cancer, which has aberrations of p53, PTEN, BRCA1/2 and CCNE1, is similar at the molecular level to the serous subtype of ovarian cancer (Kurman and Shih, 2008). However, there are many obstacles in ovarian cancer compared to breast cancer, including relative rarity of disease, multiple histological subtypes, lack of animal models for testing and lack of known risk factors.

1.16 Targeted Molecular Therapies

Targeted molecular therapies have been successfully implemented in the treatment of a variety of cancers. Notably, Imatinib (Gleevec), used in the treatment of Chronic Myelogenous Leukemia, targets the bcr-abl fusion protein. This tyrosine kinase inhibitor was found by Lydon and colleagues in the late 1990s (Drucker and Lydon, 2000). Another commercially available targeted drug is Cetuximab, a monoclonal antibody against the extracellular domain of Epidermal Growth Factor Receptor (EGFR) (Mahtani and Macdonald, 2008). This drug is used for the treatment of colorectal cancer. Relatively few targeted molecular drugs are available, but they include Rituximab (binds to CD20), Bevacizumab (inhibits VEGF) and, most importantly for this proposed study, Herceptin (inhibits Her2) (Widakowich et al., 2007).

1.17 Ovarian Cancer Genome-Wide Study

The rationale for studying ovarian cancer tumors with a high resolution technology is clear. The ability to survey hundreds of tumor samples at one time makes genome-wide study a powerful technique. There is a dearth of established molecular aberrations, pathways, appropriate screening tools and known risk factors for ovarian cancer. By examining copy number data from the ovarian cancer genome, it is hoped that key driver genes will be elucidated. This in turn may lead to the development of molecular targeted drugs for treatment of women with this disease.

1.18 Study Aims and Hypotheses

This study has multiple aims and hypotheses that build upon the central question of: What are the driver genes in epithelial ovarian cancer? The aim of this project is to answer this broad question by examining copy number variations and SNPs in the epithelial ovarian cancer samples using a genome wide approach with a focus on novel gene discovery. By elucidating novel driver genes in ovarian cancer, future research may develop monoclonal antibody or small molecule targeted drugs, analogous to Herceptin in breast cancer.

It is hypothesized here that novel driver genes in ovarian cancer will be discovered by selecting genes that are present at high (gains) or low (losses) copy number in the majority of the samples tested. That is, the driver genes will be found in more samples than not. It is important to note that genes with particularly high or low copy number may be due to normal polymorphisms in the population in general. Therefore, these driver genes may not have the very highest copy number but will be prevalent throughout the dataset. Several methods are discussed for removal of these normal variations. I also hypothesize that the driver genes will not be present in all of the samples. This may be for a variety of factors including: heterogeneity due to stage, grade, clinical characteristics or actual histology.
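The selection rule in this hypothesis can be illustrated with a minimal sketch. It is not the procedure carried out in Partek GS; the gain and loss cut-offs (2.5 and 1.5 copies), the 50% prevalence threshold and the gene names are assumptions chosen purely for illustration.

```python
import numpy as np

def recurrent_genes(copy_number, gene_names, gain_cutoff=2.5,
                    loss_cutoff=1.5, min_fraction=0.5):
    """Return genes gained or lost in more than min_fraction of samples.

    copy_number: genes x samples matrix of estimated copy number.
    The cut-offs are illustrative assumptions, not values from this study.
    """
    cn = np.asarray(copy_number, dtype=float)
    gain_freq = (cn > gain_cutoff).mean(axis=1)  # fraction of samples with a gain
    loss_freq = (cn < loss_cutoff).mean(axis=1)  # fraction of samples with a loss
    gains = [g for g, f in zip(gene_names, gain_freq) if f > min_fraction]
    losses = [g for g, f in zip(gene_names, loss_freq) if f > min_fraction]
    return gains, losses

# Hypothetical genes in four hypothetical samples
example_cn = [
    [4.5, 4.2, 2.0, 5.0],  # GENE_A: gained in 3 of 4 samples
    [2.0, 2.1, 1.9, 6.0],  # GENE_B: very high copy number in one sample only
    [0.8, 0.7, 0.9, 2.0],  # GENE_C: lost in 3 of 4 samples
]
example_genes = ["GENE_A", "GENE_B", "GENE_C"]
gains, losses = recurrent_genes(example_cn, example_genes)
```

In this toy example GENE_B has the highest single copy number but is not selected, since it is aberrant in only one sample, whereas GENE_A and GENE_C are prevalent across the samples.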

Chapter 2: Experimental Design and Methods

2.1 Introduction

This research project is centred on the use of the Affymetrix Genome-Wide SNP Array 6.0. This technology has enabled discovery of potential driver genes and pathways of significant interest in serous epithelial ovarian cancer. The methodological approach undertaken in this study has several advantages over previous studies on the genetics of epithelial ovarian cancer. First, there are currently no publications using the Affymetrix Genome-Wide SNP arrays on epithelial ovarian cancer tissues of any type. Second, the number of samples involved in this study is larger and thus has greater power than previously reported array studies.

2.2 Collaboration and Sample Selection

A major collaboration was set up with Dr. Denis Slamon, University of California Los Angeles (UCLA), to secure the transfer of DNA extracted from epithelial ovarian cancer tumors. A total of 155 samples consisting of a variety of epithelial ovarian cancer histological subtypes were run on the array (see Table 1 in Appendix). Since there was a large disparity in sample number between the groups and some groups had very few samples, this project focuses on the serous subtype. In fact, clinically, serous epithelial ovarian cancer poses the greatest disease burden since it is the most frequently encountered ovarian cancer, accounting for 80% of total malignant cases (Cannistra, 2004).

The tumors selected for this study were reviewed by pathology at UCLA prior to sending them to Toronto. It was noted that they all had at least 80% tumor content, which is essential to avoid a high load of contaminating non-malignant cells that could negatively impact the outcome of the array. The DNA was extracted from the tumors in bulk using the QIAamp DNA Micro Kit (Qiagen, Valencia, CA, USA). Tumor DNA was stored at -20°C until used in the array.

Samples were identified by a four digit code that was assigned at UCLA. This four digit code was kept throughout the preparation for the array and subsequent analysis for consistency.

2.3 Platform Selection – Copy Number Array Technology

Current commercially available technologies are able to study the genome at a variety of levels. The study of cytogenetics has evolved over the past fifty years with technologies allowing for higher resolution. This revolution in cytogenetic technology started in the 1960s with the discovery of prebanding techniques with a resolution of 10-20Mb. This led to the discovery of Down’s, Turner’s and Klinefelter’s syndromes (Ledbetter, 2008). This cytogenetic revolution continued with the discovery of banding technology in the 1970s, then fluorescent in situ hybridization in the 1990s. With each technological advance the ability to detect aberrations with higher resolution increased. Most recently, in the past decade, cytogenetic array technology has allowed for a resolution between 50 and 500kb. These array technologies include array-based comparative genomic hybridization, oligonucleotide arrays and SNP arrays (Ylstra et al., 2006; Baldwin et al., 2008; Beaudet and Belmont, 2008; Ledbetter, 2008). The resolution of these technologies is so great that there has been a paradigm shift. In the past, large chromosomal rearrangements or imbalances could be attributed to the disease phenotype observed. However, now that small microdeletions and amplifications are detectable with the current technologies, it is important to separate these from normal genetic polymorphisms in the population (Lee et al., 2007; Ledbetter, 2008). The removal of normal variations in copy number from the results of the copy number analysis will thus be an important part of the methodology discussed in this study.

Historically, comparative genomic hybridization (CGH) was an extension of traditional karyotyping without the added complication of culturing cells (Ylstra et al., 2006). Array comparative genomic hybridization (aCGH) is an even more advanced technology in that it has two major advantages over traditional CGH. These advantages are much higher resolution and more user-friendly data that do not need specific cytogenetic expertise for interpretation (Pinkel and Albertson, 2005; Ylstra et al., 2006). Oligonucleotide arrays with single stranded oligos of 25 to 85mer size have comparable resolution and use to aCGH (Pinkel and Albertson, 2005). The advantage of oligo arrays is that they are widely commercially available through Affymetrix, Agilent, NimbleGen and others (Segal et al., 2005; Ylstra et al., 2006). However, genome-wide SNP arrays have several advantages over aCGH and oligonucleotide arrays.

Several authors have suggested that SNP arrays have the potential to uncover the genetic aberrations in complex genetic disorders (Hirschhorn and Daly, 2005; Syvanen, 2005). There are over 10 million SNPs known in the human genome (Kruglyak and Nickerson, 2001). Further study and verification of over 2 million SNPs was done in the Haplotype Mapping project and Perlegen Sciences project (The International HapMap project, 2003; Hinds et al., 2005). Since there are so many SNPs spread across the whole human genome there is the potential for higher data quality, more accurate genotyping and a higher degree of information content for SNP array studies compared to previous array technologies (Murray et al., 2004; Evans et al., 2004; Syvanen, 2005). One of the key obstacles in the generation of SNP arrays was the polymerase chain reaction (PCR) step. Due to the complexity and number of SNPs in the genome it was essential for a PCR complexity reduction step to be incorporated into the commercially available SNP arrays (Syvanen, 2005; Affymetrix, 2008). Several practical concerns regarding genome-wide association studies were outlined in a review by Wang et al. (2005). The authors note the importance of large sample size for a valid genome-wide association study. This idea was further extended by Hehir-Kwa et al. (2007) in their comparison of genome-wide array technologies by using a statistical power analysis. They concluded that SNP and oligonucleotide arrays are superior to BAC CGH arrays for detecting aberrations smaller than 1Mb. Thus, it is highly advantageous to study SNPs due to their high abundance in the human genome, stability and relative ease of scoring in array experiments (Wang et al., 1998).

SNP arrays, in general, have several advantages over other DNA arrays such as BAC arrays or previous versions of the Affymetrix array. Although SNP arrays were originally designed for genetic association studies they are now widely applied to the study of human malignancy. In fact, the examination of SNPs allows the researcher to assess copy number and genotype in a single assay (Monzon et al., 2008). Additionally, SNP arrays can be used to detect loss of heterozygosity (LOH) in tumors (Monzon et al., 2008). SNPs are also the most abundant DNA variation in the human genome; over 2 million SNPs are known (Zhao et al., 2004). Thus, the degree of diagnostic potential for human disease is vast.

2.4 Affymetrix Genome-Wide SNP Array 6.0

The Affymetrix Genome-Wide SNP Array 6.0 has over 906,600 SNP probes and more than 946,000 copy number probes. The SNPs on the array are located on 200 to 1,100bp Nsp1 or Sty1 digested fragments. These are amplified using the Genome-Wide Human SNP Nsp/Sty assay kit 6.0 (Affymetrix, 2008). This array has SNPs for chromosomes 1 to 22, X, Y and the mitochondrial chromosome. There are also a total of 202,000 probes from known areas of copy number variability annotated in the Toronto Database of Genomic Variants. An additional 744,000 copy number probes were evenly spaced across the array to enable novel CNV discovery. Taken together, the median spacing of the immobilized probes on the array is approximately 700 base pairs (Affymetrix, 2008).

2.5 DNA Requirements for the Array

A total of 500ng to 1ug of undigested genomic DNA is the starting material for the array assay. This DNA can be extracted from either formalin-fixed paraffin embedded (FFPE) or fresh frozen (FF) tissue. DNA is a relatively stable molecule so both these starting materials have been found to yield good results. Tuefferd et al. (2008) compared CNV results from both FFPE and FF tissue using the Affymetrix Genome-Wide SNP Array 6.0. They found it necessary to correct the measured intensities for G-C content and fragment length when using FFPE samples. Overall, in the Tuefferd et al. (2008) study, 83% of the high level amplifications found in the FF samples were also found in the matching FFPE samples. Thus, the array can handle both starting materials but FF samples are the ideal. In my study FF samples were used for all serous cases.

To assess DNA quality, Reference Genomic DNA 103 supplied by Affymetrix can be used as a positive control. The template genomic DNA for the array must be double stranded and free of PCR inhibitors such as heme or EDTA. DNA also cannot be highly degraded as the Nsp1 and Sty1 restriction sites must be intact in order for the steps below to occur. In order to assess the level of potential degradation, 25ng of the genomic DNA sample was run on a 1% agarose gel. High quality DNA is expected to have a band at 10-20kb on the gel. Additionally, the optimal DNA concentration is 50-100ng/uL.

Undigested genomic DNA (500ng-1ug) is first digested by the Nsp1 and Sty1 restriction enzymes. The fragments are then ligated to a group of adaptor molecules that recognize 4bp overhangs. All the fragments are candidates for ligation since it is the overhangs that are recognized by the adaptors, not the fragment DNA itself. A generic primer then binds to the adaptor sequence and the fragment-adaptor complexes are amplified by PCR. Affymetrix has optimized the PCR conditions to amplify fragments between 200 and 1,100bp in size. The resulting PCR reaction products are pooled and purified using polystyrene beads. The DNA is then fragmented again with another restriction enzyme and the fragments are labelled with the DNA labelling reagent. As a final step the labelled fragments are hybridized to the array using either the GeneAmp PCR System 9700 or the Applied Biosystems 2720 Thermal Cycler (Affymetrix, 2008).

Once the DNA was diluted to the appropriate concentration of 50-100ng/uL and checked for degradation on a 1% agarose gel, it was submitted to The Centre for Applied Genomics (TCAG) at The Hospital for Sick Children, located at the MaRS facility (Toronto, ON), for the array. Each of the restriction enzyme, PCR, labelling and hybridization steps was carried out by the same operator to ensure consistency across batches of samples.

2.6 Sample and Array Batches

The 100 serous samples were run in two batches along with the other histological subtypes. A total of 29 serous samples were run in four batches in March 2008. Several serous samples were included in each batch but no duplicates of any samples were run. The remaining 79 serous samples were run over five days in July 2008. The samples were run in two separate months since the full set of sample DNA was not available in March 2008 for all 100 serous samples. The ten normal samples were run in four batches in July 2008.

2.7 Array Data Analysis

Array data was stored in the form of .cel files, which can be loaded into several commercially available data analysis software packages including DNA-Chip Analyzer (dChip) (Li et al., 2001; Xu et al., 2008), Genotyping Console (Affymetrix, 2008) and Partek Genomics Suite (Downey, 2006). Notable non-commercial software tools developed in university research labs include PennCNV (Wang et al., 2008) and SNPExpress (Sanders et al., 2008). Two advantages of the commercially available software packages are their stable accessibility over time and their range of algorithms for copy number detection. For instance, PennCNV and SNPExpress both use the Hidden Markov Model to detect amplifications and deletions in the dataset. Partek Genomics Suite (GS) 6.4 was used in this study because it optimally fulfilled several key requirements outlined below.

2.8 Accessing Partek Genomics Suite 6.4

Partek GS is currently available to members of the Mak lab. In order to connect to Partek GS, the freely available X server for Windows XP called Xming (Xming, 2008) was downloaded onto a computer terminal in the Mak lab. This X server enabled Windows to be used as a platform for a Linux environment. This was necessary since Partek runs on Linux from either the gold.uhnres.utoronto.ca or ybsvr.uhnres.utoronto.ca servers. An alternate choice of X server may have been Cygwin (Cygwin, 2008). However, Xming was chosen in this case due to its easier install process. In consultation with the University Health Network Research Information Services (UHN RIS) it was determined that running Partek GS from the ybsvr server was the most efficient option. Thus, a scratch temp file was created for all saved files to be stored on the server. Environmental settings for the software were saved and ybsvr was accessed by using the “ssh” command from the gold.uhnres.utoronto.ca server.

2.9 Allele Intensity Import and Fragment Restriction

A total of 200 serous sample .cel files were imported into Partek GS using the Copy number workflow and Load allele intensities option. This file type contains the hybridization intensities from the almost 2 million probes on the array. The Copy number workflow is a toolbar in Partek GS that helps the user explore important aspects of copy number analysis available in the software package. After allele intensities were loaded into the software, fragments between 200 and 1000bp were selected. This filter is applied to remove any fragments that were not properly digested in the initial steps of the array assay.
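The fragment restriction step amounts to a simple range filter. The sketch below is only an illustration of that idea, not Partek GS code; the record layout, the field name `fragment_bp` and the inclusive bounds are assumptions.

```python
def filter_by_fragment_length(probes, min_bp=200, max_bp=1000):
    """Keep probes whose restriction fragment length lies within the range.

    probes: list of dicts with a hypothetical 'fragment_bp' field.
    Treating both bounds as inclusive is an assumption.
    """
    return [p for p in probes if min_bp <= p["fragment_bp"] <= max_bp]

# Hypothetical probe annotations
example_probes = [
    {"probe_id": "P1", "fragment_bp": 150},   # too short, removed
    {"probe_id": "P2", "fragment_bp": 200},
    {"probe_id": "P3", "fragment_bp": 640},
    {"probe_id": "P4", "fragment_bp": 1000},
    {"probe_id": "P5", "fragment_bp": 1450},  # too long, removed
]
kept = filter_by_fragment_length(example_probes)
kept_ids = [p["probe_id"] for p in kept]
```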

2.10 Creating Copy Number from Intensities

An unpaired method to estimate copy number from the intensities of the probes was used since the samples were independent. The choice of baseline for these samples was a point of much exploration. There were 10 normal ovarian samples run on the Affymetrix Genome-Wide 6.0 array in this study. These tissues were harvested from the normal ovaries of ovarian cancer patients who had a unilateral ovarian cancer case. However, we were blinded to the information about whether any of the normals and serous samples were from the same individual. Even if this were the case, only 10 normals were available, so at least 90% of the serous samples would have unmatched normals in any case. Initially it was considered that these 10 normal samples could be used as a baseline for the study of the 100 serous samples. However, what if the normal samples had abnormal copy number themselves? In fact, approximately 5.3% of the human genome is duplicated (She et al., 2004). Many authors have reported copy number variations in disease-free “normal” tissue samples (Carter, 2007; Gorringe and Campbell, 2008; Piotrowski et al., 2008). Normal genomic copy number variations include segmental duplications (SD) and copy number variations in the disease free population (Gorringe and Campbell, 2008). Segmental duplications are defined as tracts of DNA that are at least 1kb in length and duplicated with high sequence identity (>90%) elsewhere in the genome (Bailey et al., 2001). A discussion about the method for optimal normalization is presented in the Gorringe and Campbell (2008) paper about their previous study on 31 epithelial ovarian cancer tissues (Gorringe et al., 2007). In the study they normalized the tumor samples against matched lymphoblast DNA from the same individual.

Using this method multiple microdeletions and amplifications were found, one in particular being an apparent homozygous deletion on chromosome 17 near the gene CCL4. Gorringe et al. (2007) then attempted to validate this finding using quantitative RT-PCR but did not detect a homozygous deletion at this region on chromosome 17. Then they normalized the lymphoblast DNA to normal DNA from unrelated individuals and detected a germline copy number variation in the lymphoblast DNA. Therefore, normalizing malignant samples against matched germline DNA may lead to spurious copy number variation findings. Gorringe and Campbell (2008) suggest several strategies to approach normalization in the analysis of high-resolution copy number array data. When no matched normal DNA is available, Gorringe and Campbell (2008) suggest two approaches. The first option consists of using DNA from samples that are from a similar population to the experimental malignant samples as a baseline. The second option involves using online databases of copy number variants such as the UCSC Genome Browser (UCSC Genome Browser, 2008). In this study a combination of the two approaches suggested above was used. The 100 serous samples were normalized against a baseline of 270 samples from the HapMap project. The 10 normal samples were also normalized against the baseline of 270 HapMap project samples. However, later in the data analysis a comparison between the 100 serous and 10 normal samples was completed to remove redundant and normal polymorphisms. This comparison will be discussed later in the methodology section.
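The idea of an unpaired baseline can be sketched as follows. The exact calculation performed by Partek GS is not described here, so the formula below (twice the sample intensity divided by the mean baseline intensity at each probe) is an assumption used only to illustrate the principle, and the intensity values are hypothetical.

```python
import numpy as np

def unpaired_copy_number(sample_intensity, baseline_intensities):
    """Estimate copy number per probe against a pooled, unmatched baseline.

    sample_intensity: probe intensities for one sample.
    baseline_intensities: references x probes matrix (e.g. HapMap samples).
    Assumes the baseline mean corresponds to two copies at every probe.
    """
    baseline_mean = np.mean(np.asarray(baseline_intensities, dtype=float), axis=0)
    return 2.0 * np.asarray(sample_intensity, dtype=float) / baseline_mean

# Two hypothetical reference samples and one hypothetical tumor, three probes
baseline = [[100.0, 200.0, 400.0],
            [100.0, 200.0, 400.0]]
tumor = [100.0, 500.0, 100.0]
estimated_cn = unpaired_copy_number(tumor, baseline)
```

Under these assumptions the three probes would be estimated at 2, 5 and 0.5 copies respectively. A germline variant present in many baseline samples would shift the baseline mean at that probe, which is one reason a later comparison against the normal ovarian samples remains useful.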

2.11 Principal Components Analysis

Principal components analysis (PCA) was carried out on all samples to determine if there were any batch effects or outliers in the data. This is a relatively common statistical method to assess quality and batch effects in copy number variation studies (Rennstram et al., 2003; Somiari et al., 2004; Lo et al., 2008; Reiner et al., 2008). In PCA numerous complex variables are reduced to a smaller and simpler number of variables (Dunteman, 1989). This reduction in complexity corresponds to a reduction in the number of dimensions in the data. The principal components are in fact linear combinations of the original variables. In Partek GS, PCA plots are displayed in three dimensional space with the ability to rotate between each component using the computer mouse. Each sample is represented by a sphere in the PCA plot, coloured according to the date it was run on the array. PCA in Partek was also used to identify outliers in the data set at this step. These outliers were excluded from subsequent analysis.
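A minimal sketch of PCA, computed here with a singular value decomposition in NumPy rather than in Partek GS, shows how a batch effect would appear along the first component. The data matrix is hypothetical: six samples by four probes, with the last three samples shifted to mimic a second batch.

```python
import numpy as np

def pca_scores(data, n_components=3):
    """Project samples onto the leading principal components.

    data: samples x variables matrix. Returns the sample scores, the
    component loadings and the fraction of variance each component explains.
    """
    x = np.asarray(data, dtype=float)
    centered = x - x.mean(axis=0)               # centre each variable
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T     # linear combinations of the variables
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, vt[:n_components], explained[:n_components]

# Hypothetical copy number values: rows 0-2 are "batch 1", rows 3-5 "batch 2"
example = np.array([
    [2.0, 2.1, 1.9, 2.0],
    [2.1, 2.0, 2.0, 1.9],
    [1.9, 2.0, 2.1, 2.0],
    [2.6, 2.5, 2.4, 2.6],
    [2.5, 2.6, 2.6, 2.4],
    [2.4, 2.5, 2.5, 2.6],
])
scores, components, explained = pca_scores(example)
```

In this toy data the two batches fall on opposite sides of zero along the first component, which is the kind of separation a batch effect would produce in the three dimensional plot.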

2.12 Copy Number Visualization

To facilitate ease of interpretation, copy number values are converted to log2 ratio. The logarithm with base of two is used since normally there are two copies of each chromosome in a normal individual. The copy number log2 values are visualized in two tracks in Partek GS. The upper track shows raw intensities in grey and summarized copy numbers in different colours for each sample across a specific chromosome. The lower track displays a heat map with each sample represented for the specific chromosome. Each of the 23 chromosomes can be visualized independently or as a group in Partek GS. Multiple samples can be visualized simultaneously in the top track. However, since there are 100 serous samples in this study visualizing them all at once posed a challenge. Thus, only a few samples were visualized at a time so each colour could be discerned.
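The log2 ratio conversion can be written out explicitly. The sketch below assumes the ratio is taken against the normal diploid state of two copies, so that an unchanged region maps to zero; the function name is illustrative only.

```python
import math

def log2_ratio(copy_number, normal_copies=2.0):
    """Convert an estimated copy number to a log2 ratio against the diploid state."""
    return math.log2(copy_number / normal_copies)

no_change = log2_ratio(2.0)    # two copies, no change
single_gain = log2_ratio(4.0)  # doubling of copy number
single_loss = log2_ratio(1.0)  # loss of one copy
```

With this convention gains are positive and losses are negative, and a doubling and a halving of copy number lie at the same distance from zero.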

2.13 Segmentation Algorithms

The Hidden Markov Model (HMM), Circular Binary Segmentation (CBS) and Genomic Segmentation (GS) are three commonly used methods to analyse copy number data from array experiments. Other notable analysis tools include: threshold analysis (Weiss et al., 2003), smoothing and averaging compared to normal-normal experiments (Pollack et al., 2002), a three component normal mixture model with mouse islet DNA (Hodgson et al., 2001) and a change point model based on RNA expression (Linn et al., 2003). However, most publications in the copy number variation field use one of these three algorithms: HMM, CBS or GS.

Hidden Markov Models (HMM) have been used in a variety of biological applications including profile searches, multiple sequence alignment and regulatory site identification (Eddy, 2004). HMM is a statistical technique for modelling signals and intensities as a stochastic process (Mukherjee and Mitra, 2004). The hidden states make up a Markov chain and the resulting probabilities are interpreted using Bayesian probability theory (Eddy, 2004). This algorithm has been applied to the study of copy number variation in genome-wide studies since the different intensities from the array can be assigned to discrete states. This enables the researcher to determine areas of the genome that have copy number variation of significant interest. HMM has been used in several genome-wide copy number variation studies (Yu et al., 2007; Korn et al., 2008) including the ovarian cancer study by Gorringe et al. (2007). The disadvantage of using HMM is that the states must be specified in advance. Therefore, very high copy number states may be missed as a result.

Circular binary segmentation (CBS) is based on the fact that gains or losses in copy number occur in discrete segments. This algorithm, developed by Olshen et al. (2004), breaks up the chromosomes into areas of equal copy number in order to account for noise in the data. The mathematical details are given in Olshen et al. (2004). This algorithm is similar to genomic segmentation in that it breaks the genome into segments based upon copy number. However, CBS was not chosen in this study since it is not available in Partek GS.

The genomic segmentation (GS) algorithm was used to analyse DNA copy number data in this study. GS is a two-step process in which the first step segments the genome based on three criteria and the second step determines amplifications and deletions. The two steps can be thought of as segmentation and report, after the parameters that each requires. GS enables the discovery of significant areas of copy number based upon three basic criteria. The first states that neighbouring regions must have statistically significantly different average intensities. This is defined by a p-value set by the researcher, in this case a p-value of 0.001. The second criterion states that breakpoints (between regions) are selected for optimal statistical significance. That is, the region boundaries are the places where there is the smallest p-value. In this study, the region boundary p-value was set at 0.01. The third criterion of GS states that the regions must have a minimum number of data points pre-determined by the researcher. In this study, there was a minimum of 10 probe sets for each reported region of significance. The signal to noise ratio set for GS was 0.3. The signal to noise ratio is determined by the change in intensity between adjacent probe sets from the array.

Once the segments are determined using the aforementioned p-values, two one-sided t-tests are performed for the probes in each region. One t-test is for intensities above a given threshold and the other is for intensities below a given threshold. These threshold values are set by the researcher and are related to the copy number. The minimum p-value of these two t-tests is used to determine whether the region deviates significantly from normal copy number. The aim of this study was to find regions of particularly high copy number or substantial loss. Thus, the threshold for gains was set at anything over 4 copies. For losses the threshold was set at less than 1 copy. A search of the literature for commonly used thresholds provided no consensus. In fact, it is commonplace to use several thresholds to find areas of low, medium and high copy number (Nakao et al., 2004; Gorringe et al., 2007; Weir et al., 2007). In order to determine the appropriate thresholds for this study, an analysis of the thresholds of 1, 1.5, 2.5, 3, 3.5 and 4 was completed. These thresholds were then graphed versus the number of gene blocks that resulted from the segmentation algorithm. It was determined from this analysis that the thresholds for this study would be above 4 for gains and less than 1 for losses.
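The report step can be illustrated with a short sketch. It is not the Partek GS code: the one-sided p-values below use a normal approximation to the t distribution so that only the Python standard library is needed, and the significance level `alpha` and the function names are hypothetical. Only the thresholds of 4 copies for gains and 1 copy for losses come from the text.

```python
import math
from statistics import NormalDist, mean, stdev

GAIN_THRESHOLD = 4.0  # copies; gains are called above this value
LOSS_THRESHOLD = 1.0  # copies; losses are called below this value


def one_sided_p_values(copy_numbers, gain=GAIN_THRESHOLD, loss=LOSS_THRESHOLD):
    """Return (p_gain, p_loss) for the probe sets of one segment.

    p_gain tests whether the mean exceeds the gain threshold and
    p_loss tests whether the mean falls below the loss threshold.
    Normal approximation; the probe values must not all be identical.
    """
    n = len(copy_numbers)
    standard_error = stdev(copy_numbers) / math.sqrt(n)
    centre = mean(copy_numbers)
    t_gain = (centre - gain) / standard_error
    t_loss = (centre - loss) / standard_error
    p_gain = 1.0 - NormalDist().cdf(t_gain)
    p_loss = NormalDist().cdf(t_loss)
    return p_gain, p_loss


def call_segment(copy_numbers, alpha=0.01):
    """Label a segment using the smaller of the two one-sided p-values."""
    p_gain, p_loss = one_sided_p_values(copy_numbers)
    if min(p_gain, p_loss) >= alpha:
        return "unchanged"
    return "gain" if p_gain < p_loss else "loss"


# Toy segments of 10 probe sets each (the minimum region size used here).
amplified = [4.6, 4.8, 4.5, 4.9, 4.7, 4.6, 4.8, 4.7, 4.5, 4.9]
deleted = [0.7, 0.6, 0.8, 0.7, 0.6, 0.8, 0.7, 0.7, 0.6, 0.8]
diploid = [2.0, 2.1, 1.9, 2.0, 2.2, 1.8, 2.0, 2.1, 1.9, 2.0]
```

With these thresholds a segment averaging about 2 copies is reported as unchanged, which mirrors the stated aim of keeping only high-level gains and substantial losses.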

Since this is a study of 100 serous carcinomas from women, all intensities from the Y chromosome were filtered out. In addition, intensities from the mitochondrial chromosome were removed since this study focuses on the nuclear chromosomes shown in the heat maps (1-22 and X). However, the mitochondrial genome has been thought to contribute to carcinogenesis in a variety of ways (Hoberman, 1975; Morais et al., 1994; Zhang et al., 2008).


2.14 Assessment of Overall Trends in Data

Assessment of the overall trends in copy number data across all of the 100 serous cancer samples could be performed by averaging the copy number for each gene (personal communication, Dr. Elisabeth Tillier). This would yield an average copy number for a gene across all the samples. This standard statistical technique would be particularly useful in directing the researcher to a specific chromosome of interest that may be aberrant in all of the samples. Cancer driver genes are thought to be mutated or amplified at an early stage in tumorigenesis. Thus, an overall assessment of trends could be highly informative for driver gene discovery. Another advantage of this method is that no limitations are put on the values of the copy number gains or losses. Therefore, an average copy number of 2.2 would be picked up using this method. However, this additional analysis was not performed in this study due to time constraints and the lack of verification of the true homogeneity or heterogeneity of the samples. Future analyses will likely involve using this assessment of overall trends in the data.
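The averaging approach described above, which was not performed in this study, could be sketched as follows. The data layout (one dictionary of gene copy numbers per sample) and the gene and sample names are hypothetical.

```python
from statistics import mean


def average_copy_number(per_sample):
    """Average the copy number of each gene across all samples.

    per_sample maps a sample identifier to a dict of gene -> copy number.
    Genes missing from a sample are skipped for that sample.
    """
    genes = sorted({gene for calls in per_sample.values() for gene in calls})
    return {
        gene: mean(calls[gene] for calls in per_sample.values() if gene in calls)
        for gene in genes
    }


# Toy input: two samples and two hypothetical genes.
toy = {
    "S1": {"GENE_A": 2.0, "GENE_B": 5.0},
    "S2": {"GENE_A": 2.4, "GENE_B": 3.0},
}
averages = average_copy_number(toy)
```

In this toy input GENE_A averages about 2.2 copies, a modest gain that the thresholds of 4 and 1 would not report.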

Clinical follow-up data was not available for the samples used in this study. However, most serous cancers present at late stage, so the source of heterogeneity among these samples is primarily due to the nature of ovarian cancer cells. Serous ovarian cancer cells are highly heterogeneous at the molecular level (Tan and Kaye, 2007; Cho and Shih, 2008). This heterogeneity could also be attributed to variation in the pathological typing of the tumors (McCluggage, 2008). However, the tumors in this study were all assessed by the same pathologist. Intratumoral heterogeneity is also a cause of molecular heterogeneity in serous ovarian tumors (Cho and Shih, 2008). The degree of stroma in the tumor may influence the genes that are identified in subsequent analysis. However, the tumors were deemed to have 80% tumor content or more. Therefore, the effect due to stromal genes is likely to be small but is still unknown.

Heterogeneity of the samples was also assessed by Pearson's dissimilarity clustering across each chromosome. The use of thresholds to analyse the copy number data is motivated by the overall goal of identifying areas of the genome that are aberrant to a high degree. Therefore, aberrations with moderate levels of change are filtered out in order to focus on the large copy number gains or losses. Previous copy number studies in serous ovarian cancer (Gorringe et al., 2007) have also used threshold analysis. The use of thresholds to analyse these data does not invalidate the approach of using averaging to assess overall trends. Ideally, these two methods could be used in concert with gene expression data to give a robust view of the serous ovarian cancer genome.

2.15 Segments Found in Multiple Samples

The next step in analysing the copy number data was to find segments of interest in multiple samples. Since there was a total of 100 serous samples, it was important to establish a minimum number of samples in which to look for recurrent segments. This number was considered carefully: if a high minimum were selected, for instance 50% or 75% of the samples, particularly high copy number in a small percentage of samples might be missed. The samples are all serous, but this is by no means a homogeneous group. For instance, some samples may be from late stage ovarian cancers while others may be from early stage cases. Since we were blinded to the clinical follow-up information for the samples used in this study, this information could not be included in the analysis. Therefore, since this may not be a completely homogeneous group of samples, choosing a high minimum number of samples for recurrent segments could miss important copy number changes in subgroups. By analogy, the frequency of the different subtypes of breast cancer varies by ethnic background and age at diagnosis, among other factors (Carey et al., 2006). For instance, basal breast cancer was seen in 39% of premenopausal women of African-American descent but only 16% of non-African-American women with breast cancer (Peppercorn, 2008). Several different minimum values were tested and are listed in the results section; a minimum of 2 samples was ultimately selected as the threshold for finding recurrent segments in the serous ovarian samples. Although this number accounts for only 2% of the total number of samples, it guards against missing important areas in the ovarian cancer genome. The detected regions in multiple samples were then plotted using Partek GS in a karyoview. This showed amplifications in red and deletions in blue. The relative degree of amplification or deletion was shown by the distance of the colour bar from the chromosome.
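The recurrence filter can be sketched as a simple count of how many samples carry each aberrant region. This is an illustration only: it assumes that regions have already been reduced to comparable labels across samples, which is a simplification of what Partek GS does, and the region labels and sample names are hypothetical.

```python
from collections import Counter


def recurrent_regions(calls_by_sample, min_samples=2):
    """Keep regions reported in at least min_samples samples.

    calls_by_sample maps a sample identifier to a list of
    (region, "gain" or "loss") calls for that sample.
    """
    counts = Counter()
    for calls in calls_by_sample.values():
        counts.update(set(calls))  # count each region once per sample
    return {region: n for region, n in counts.items() if n >= min_samples}


# Toy calls for three samples.
toy_calls = {
    "S1": [("8q24", "gain"), ("8p", "loss")],
    "S2": [("8q24", "gain")],
    "S3": [("8q24", "gain"), ("5p", "loss"), ("8p", "loss")],
}
recurrent = recurrent_regions(toy_calls, min_samples=2)
```

Raising `min_samples` shortens the list, which is the trade-off discussed above between stringency and the risk of missing changes confined to a subgroup.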

2.16 Gene Annotation

The RefFlat file contains human gene annotation information from the UCSC database that links into the regional genomic information in Partek GS. Thus, the genes located in the segments of the genome identified in multiple samples were found by linking to the RefFlat file. This file was downloaded from the UCSC Genome Browser website (UCSC Genome Browser, 2008). Therefore, the RefSeq ID and gene symbol were displayed alongside the start and end point for each segment identified in the genome.
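Linking segments to genes amounts to an interval overlap test. The sketch below assumes that each annotation has already been reduced to a gene symbol, chromosome, start and end; the tuple layout, gene symbols and coordinates are hypothetical and do not reproduce the RefFlat file format.

```python
def genes_in_segment(segment, annotations):
    """Return symbols of genes that overlap a (chrom, start, end) segment.

    annotations is a list of (symbol, chrom, start, end) tuples.
    """
    chrom, start, end = segment
    return [
        symbol
        for symbol, gene_chrom, gene_start, gene_end in annotations
        if gene_chrom == chrom and gene_start < end and gene_end > start
    ]


# Toy annotations: two genes on chr8 and one on chr5.
toy_annotations = [
    ("GENE_A", "chr8", 1_000, 5_000),
    ("GENE_B", "chr8", 9_000, 12_000),
    ("GENE_C", "chr5", 2_000, 4_000),
]
hits = genes_in_segment(("chr8", 4_000, 10_000), toy_annotations)
```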

2.17 Gene List

A list of genes ranked by copy number was created using Partek GS. The number of genes on the list varied depending on the threshold values selected for the aforementioned criteria. The samples in which the aberrations were found are also listed alongside the genes. Thus, a list of aberrant genes in each sample was also created. The gene lists were sorted using Microsoft Excel, first by number of samples, then by copy number and then by chromosome.

2.18 Removal of Redundant Genes in “Normal” Samples

The next step was to compare the sorted gene lists developed for the 100 serous samples and 10 normal samples. Genes with aberrant copy number on both lists were presumed to reflect copy number variation in the germline. Thus, as discussed before, these genes were eliminated from the list of genes contributing to carcinogenesis from the 100 serous samples.
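This comparison is a set difference between the two gene lists, as the following sketch shows. The function name and gene symbols are hypothetical.

```python
def remove_germline(tumour_genes, normal_genes):
    """Drop genes that are also aberrant in the normal samples,
    keeping the original order of the tumour gene list."""
    normal = set(normal_genes)
    return [gene for gene in tumour_genes if gene not in normal]


tumour_list = ["GENE_A", "GENE_B", "GENE_C"]
normal_list = ["GENE_B", "GENE_D"]
candidates = remove_germline(tumour_list, normal_list)
```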

A second method to remove confounding common copy number polymorphisms from the gene lists was to search the Database of Genomic Variants (DGV) (Iafrate et al., 2004; Database of Genomic Variants, 2008; Scherer et al., 2008), UCSC Genome Browser (UCSC Genome Browser, 2008), Human Structural Variation Database (Sharp et al., 2005; Scherer et al., 2007; Human Structural Variation Database, 2008), Human Segmental Duplication Database (Cheung et al., 2003; Scherer et al., 2007; Human Segmental Duplication Database, 2008) and the Gene Expression Omnibus (GEO) (Scherer et al., 2007; Barrett et al., 2008; GEO, 2008).

2.19 Gene List Parameters

In the genomic segmentation algorithm run to detect copy number aberrations in the samples, a minimum of 2 samples was selected. However, as stated previously, this only accounts for 2% of the sample size. Therefore, a histogram (see Figure 3 in Appendix) was created to determine an appropriate cut-off for the number of samples. After review of the distribution of sample number, it was decided to analyze aberrations that were in at least 5 samples for copy number gain and at least 4 samples for copy number loss. Single genes on the list that occupy a small region of the genome are probably due to a normal polymorphism. Therefore, two parameters were developed to filter out normal polymorphisms of this kind from the gene list. First, a minimum of 10 adjacent genes with the same copy number on a chromosome was required. Secondly, genes from regions in the genome less than 150kb with copy number aberration were removed from the gene list. Since genes have variable size in the genome, these parameters were combined with the Boolean “OR” operator. By applying these parameters to the gene list data, “blocks” of genes across the genome will be observed. These “blocks” may be chromosomal rearrangements, which are commonly observed in cytogenetic analysis in ovarian cancer (Bello and Rey, 1990). Scatterplots showing the copy number for each sample in the “blocks” described were made. A final gene list after application of these parameters was created and ranked by number of samples. A flow chart (Figure 4) showing the process of applying the parameters is in the Appendix.
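One possible reading of these two parameters is sketched below: runs of adjacent genes with the same copy number on a chromosome are grouped into blocks, and a block is kept if it contains at least 10 genes or spans at least 150 kb. The grouping rule, the tuple layout and all gene names and coordinates are assumptions made for illustration.

```python
from itertools import groupby


def gene_blocks(genes, min_genes=10, min_span_bp=150_000):
    """Group adjacent genes with equal copy number into blocks and keep
    blocks with at least min_genes genes OR a span of at least min_span_bp.

    genes is a list of (symbol, chrom, start, end, copy_number) tuples.
    """
    ordered = sorted(genes, key=lambda g: (g[1], g[2]))
    kept = []
    for (chrom, copy_number), run in groupby(ordered, key=lambda g: (g[1], g[4])):
        block = list(run)
        span = max(g[3] for g in block) - block[0][2]
        if len(block) >= min_genes or span >= min_span_bp:
            kept.append({
                "chrom": chrom,
                "copy_number": copy_number,
                "genes": [g[0] for g in block],
                "span_bp": span,
            })
    return kept


# Toy gene list: ten small adjacent genes on chr8, one isolated small gene
# on chr1 and one large gene on chr2.
toy_genes = [
    (f"G{i}", "chr8", 100_000 + i * 5_000, 104_000 + i * 5_000, 4.5)
    for i in range(10)
]
toy_genes.append(("LONE", "chr1", 500_000, 520_000, 4.2))
toy_genes.append(("BIG", "chr2", 1_000_000, 1_200_000, 0.7))
blocks = gene_blocks(toy_genes)
```

The isolated 20 kb gene is dropped, while the single 200 kb gene survives through the span criterion, which shows why the two parameters are combined with OR rather than AND.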

2.20 Selection of Top Hits

Once normal copy number polymorphisms were removed from the gene list and analysis of gene “blocks” was completed, a selection of the top genes was made for further investigation on a case-by-case basis. Since there were over 500 genes identified in the gene list, this study focused on the discovery of novel, protein-coding genes. These genes were further investigated using the following online databases: National Center for Biotechnology Information (NCBI) (NCBI, 2008); Oncomine (Rhodes et al., 2007; Oncomine, 2008) and GEO (GEO, 2008). These genes are highlighted in bold text in the gene list (Table 2 in Appendix).


2.21 Functional Inquiry into Genes of Interest

Not all copy number aberrations correlate with functional gene expression (Conde et al., 2007). Conde et al. (2007) argue that the validity of copy number experiments is limited, since many of the identified regions of copy number variation do not have functional gene expression. Therefore, a key step in the analysis of the gene list was to link it to functional gene expression lists and databases.

2.22 Pathway Analysis of Genes of Interest

Canonical pathways of interest in cancer biology include the SOS-Ras-Raf-MAP kinase pathway (Giancotti and Ruoslahti, 1999), pRB pathway (Hanahan and Weinberg, 2000), APC/β-catenin pathway (Kinzler and Vogelstein, 1996), PI3 Kinase – AKT/PKB pathway (Yuan and Cantley, 2008) and the p53 pathways (Vogelstein et al., 2000). Therefore, identifying the key pathways into which the genes of interest in this study fit is invaluable for further study. Several online pathway resources are available for this purpose. These include Pathway Searcher (Cancer Genome Anatomy Project, 2008) and Wikipathways (Pico et al., 2008; Wikipathways, 2008). These two online databases are linked to the Kyoto Encyclopedia of Genes and Genomes (KEGG) (Aoki-Kinoshita and Kanehisa, 2007) and BioCarta databases (Yi et al., 2006).


Chapter 3: Results


3.1 Introduction

The results of this study are from 100 serous ovarian cancer samples and 10 normal samples run on the Affymetrix Genome-Wide SNP array 6.0 and subsequent statistical and in silico analysis.

3.2 Quality Control – Principal Components Analysis

The quality of the samples loaded into Partek GS was assessed by Principal Components Analysis (PCA). All 100 serous ovarian cancer samples from all run dates were loaded into Partek GS and the PCA plot is shown in Figure 5. Six outliers were identified (943; 949; 964; 1016; 1040; 1386). The outliers were readily identified once an ellipsoid was added to the plot. These six samples were not all run on the same date, which indicated that the outliers were not caused by a batch effect. The six outliers were therefore filtered out of the spreadsheet and excluded from all further analysis discussed in this study. To reassess the quality and overall similarity of the samples, another PCA plot was created (Figure 6). The samples showed a high degree of similarity on the PCA plot and were centred around the first two principal components. Several samples were observed to lie outside the centred ellipsoid, but these were not removed since the distance was not as great as with the initial 6 outliers. At this point, an effort was made to preserve the large sample size to keep the power of the study high.

Normal samples were also assessed for quality and similarity using a PCA plot. A plot of all ten samples is shown in Figure 7. One sample was identified as an outlier (1652). This sample was removed from all future analysis. A PCA plot was created after the outlier was removed (Figure 8).

Figure 5. PCA plot of 100 serous samples that were run on seven dates in 2008. Each sphere represents a different sample. Sphere colours correspond to the date run on the array. The ellipsoid is centred around where the majority of samples have clustered.

Figure 6. PCA plot of 94 serous samples after 6 outliers (943; 949; 964; 1016; 1040; 1386) were filtered out from the analysis. The ellipsoid is centred around the majority of the samples, showing a relatively high degree of similarity.

Figure 7. PCA plot of 10 normal samples. Each sphere represents a single sample. The colour of the sphere relates to the date run on the array. A single outlier (1652) is observed.

Figure 8. PCA plot of 9 normal samples after removal of the single outlier (1652), which was excluded from all further analysis. The ellipsoid encompasses all the samples and a relatively high degree of similarity is observed.

3.3 Quality Control – Copy Number Histogram

Quality was also assessed by creating a histogram of the samples displaying the log2 ratio for copy number and frequency. This shows the distribution of overall copy number in each sample. Ideally, copy number in a disease-free individual is two in a non-mitotic cell. Thus, good quality samples should have a log2 ratio centered roughly around 2 with a peak in frequency at this point. Even in cancer samples the log2 ratio should be centered at 2 since the whole genome does not have aberrant copy number. A histogram for all 100 serous samples was indeed centered close to a log2 of 2 (Figure 9). However, abnormal frequencies were observed for 6 samples: 943; 949; 964; 1016; 1040; 1386. Sample 949 (run July 18, 2008) had a bimodal distribution of copy number, with peaks at approximately a log2 of 2 and 5. This was not seen in any of the other samples. These are the same samples that were identified as outliers on the PCA plot (Figure 5). Therefore, these samples were removed from all subsequent analysis of copy number. A second histogram was created to show the distribution of copy number and frequency with the 6 outliers removed (Figure 10).

Two histograms were created for the normal samples, one with all ten samples (Figure 11) and the other with the outlier (1652) removed (Figure 12). The profiles of the histograms for the 10 normal samples were similar in overall appearance, with curves centered on a log2 ratio of 2. However, the outlier (1652) was filtered out of all future analysis since the PCA plot, the best measure of overall similarity, identified it as an outlier.

Figure 9. Histogram of 100 serous samples showing frequency versus log2 ratio of copy number. Samples 943; 949; 964; 1016; 1040; 1386 have abnormal frequencies relative to the other 94 samples.

Figure 10. Histogram of 94 serous samples showing frequency versus log2 ratio for copy number. Outliers were removed and each frequency distribution is roughly centered on a log2 ratio of 2.

Figure 11. Histogram of all 10 normal samples. Curves are centered on a log2 ratio of 2. Each curve has a unimodal distribution.

Figure 12. Log2 ratio distribution for nine normal samples. The outlier (1652) identified in the PCA plot was removed in this histogram.

3.4 Copy Number Heat Maps

The next step after assessing quality of the samples using the PCA and histogram plots was to create copy number heat maps for each chromosome (1-22 and X). The heat maps display the copy number by colour. That is, high copy number (gain) is shown in red and low copy number (loss) is shown in blue. The heat maps were created using log2 transformed copy number. The intensity of the colour corresponds to the degree of gain or loss. Therefore, the areas of most intense red colour have the highest copy number. The opposite is true for losses, with the most intense blue colour showing areas of the greatest loss. Heat maps are informative for overall trends in gains and losses. Areas of grey colour have normal copy number. However, the colour intensity is qualitative in nature and the actual number of copies is better determined from examining the actual intensities in the spreadsheet after the genomic segmentation algorithm is applied. Therefore, areas of interest were identified on the heat maps for further investigation once the gene list was developed.

Each sample was displayed on the heat map and Pearson’s dissimilarity clustering was used to cluster the samples across each chromosome. This was one of several possible clustering methods that are discussed in the methodology section of this thesis. The heat maps are accompanied by a chromosome map so that the copy number colour intensity corresponds to the approximate location on the chromosome.
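A Pearson dissimilarity between two samples is commonly defined as one minus the Pearson correlation coefficient of their copy number profiles. The sketch below uses that definition; the exact scaling used by Partek GS is not stated in this text, so this should be read as an assumption, and the profiles are toy values.

```python
import math


def pearson_dissimilarity(x, y):
    """Return 1 - r, where r is the Pearson correlation of two profiles.

    0 means perfectly correlated profiles and 2 means perfectly
    anti-correlated profiles. Neither profile may be constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - covariance / (spread_x * spread_y)


# Toy profiles along one chromosome.
profile_a = [2.0, 2.0, 4.5, 4.5, 1.0]
profile_b = [2.1, 1.9, 4.8, 4.4, 0.9]   # similar pattern to profile_a
profile_c = [2.0, 2.0, 0.5, 0.5, 3.0]   # gains where profile_a has losses
```

Samples with similar patterns of gain and loss receive a dissimilarity near 0 and are placed close together in the clustered heat map.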

3.5 Serous Ovarian Cancer Heat Maps

On the short arm of chromosome 1, there was an area of copy number loss in a cluster of samples (Figure 13). There was a cluster of copy number gain on the long arm of chromosome 1. The samples exhibiting losses in the cluster on the short arm were not the same samples that had gains on the long arm.

The heat map of chromosome 2 (Figure 14) showed a consistent high copy number gain across all 96 serous ovarian cancer samples. This was in the region near the centromere on the short arm of the chromosome. This consistent high level gain is unlikely to be an actual variant due to a driver gene. The most likely cause of this consistent high level change is a normal polymorphism in the disease-free population. This issue was resolved when normal variants were removed from the gene list developed after the genomic segmentation algorithm was applied.

Chromosome 3 (Figure 15) had three distinct clusters of copy number gain covering the majority of the long arm of the chromosome. These clusters were of varying red colour intensity indicating different levels of copy number gain. There was also a region of copy number loss on the short arm of the chromosome. The samples with the highest copy number gains on the long arm of the chromosome were the same ones with the cluster of copy number loss on the short arm. There were also two distinct high copy number regions on the long arm of the chromosome that showed up as red lines on the figure. This may have been due to an actual driver gene copy number gain or a normal polymorphism. This difference was determined later in the analysis of the results.

The heat map of chromosome 4 (Figure 16) displayed widespread copy number loss with the exception of two areas of copy number gain and a line of red colour intensity at 4q18. This line corresponded to a consistent area of copy number gain across all the samples, similar to the lines encountered on chromosomes 2 and 3. The overall profile of chromosome 5 (Figure 17) displayed two distinct clusters of copy number variation. One cluster of copy number loss was located on the long arm of the chromosome and the other was one of gain on the short arm. The remainder of the chromosome had various copy number gains and losses but no consistent pattern.

Chromosome 6 (Figure 18) had two clusters of copy number variation similar to that on chromosome 5. That is, there was a cluster of copy number gain on the short arm of chromosome 6 and a cluster of copy number loss on the long arm. The opposite of this pattern was seen on chromosome 7 (Figure 19) with copy number losses predominantly on the short arm and gains on the long arm.

The heat map for chromosome 8 had a striking appearance of intense red colour (Figure 20). This was indicative of high copy number gain. The most intense red colour was at the terminal part of the long arm of the chromosome. A key oncogene located in the 8q24 area is c-myc (Kobel et al., 2008; Dang et al., 2006). There was also a region of blue colour tone on the short arm of chromosome 8, corresponding to copy number loss in more than half of the ovarian cancer samples. Interestingly, the FGFR1 gene is found in the region at 8p11 (Stachowiak et al., 2007).

Chromosome 9 (Figure 21) was characterized by copy number losses on the long arm, shown in a light blue hue, and a small region of copy number gain on the short arm. This pattern was seen again on chromosome 10 (Figure 22) with gains on the short arm and losses on the long arm. However, the region of copy number gain on chromosome 10 extended onto the long arm of the chromosome near the centromere, unlike the pattern seen on chromosome 9. Unlike both chromosomes 9 and 10, chromosome 11 (Figure 23) did not have any readily observable clusters of copy number variation. Instead there were areas of increased red and blue tone, indicating gains and losses respectively, throughout the heat map.

An area of high redness was visible on the short arm of chromosome 12 (Figure 24) indicating high copy number gain. This area, although encompassing only a small fraction of samples had a high intensity of colour, similar to that seen on 8q24. There was also an area of blueness noticeable on the long arm of chromosome 12 indicating copy number losses.

The heat map for chromosome 13 did not have any information for the short arm since there were no probes in this area of the array due to the highly repetitive nature of DNA in this area of the genome. Therefore, only the long arm of chromosome 13 is featured in Figure 25. The overall colour tone was blue in this figure, indicating copy number losses throughout the long arm of chromosome 13. Chromosome 14, also missing short arm information due to lack of probes in this area, had a red line of intensity on the heat map (Figure 26). This feature, also seen on chromosomes 2, 3 and 4, may be due to a recurring copy number gain due to carcinogenesis or a normal polymorphism in the population. This will be distinguished later in the chapter.

Chromosome 15 (Figure 27) had a similar overall pattern of copy number variation compared to that of chromosome 13 with losses apparent on the long arm. However, chromosome 15 had a small area of increased red tone on the telomeric end of the long arm.

Chromosomes 16 (Figure 28) and 17 (Figure 29) were characterized by an overall blue colour tone representative of copy number loss. The tone was not very intense, so losses are presumably low level for these two chromosomes. There was also a red line indicative of copy number gain on chromosome 17. The line, also seen on chromosomes 2, 3, 4, and 14, was investigated further in the analysis of results.

An area of high red colour intensity was visible on the heat map for chromosome 18 (Figure 30). This area of copy number gain was isolated to a few samples; however, the colour had high intensity, so the gains are presumably of high copy number. The remainder of the heat map for chromosome 18 was an even blue tone indicating copy number losses.

The pattern of copy number variation for chromosome 19 (Figure 31) was different from that of the chromosomes previously discussed. This chromosome had a small area of copy number loss in the majority of samples close to the telomere on the short arm and small clusters of high copy number gain over the remaining parts of the chromosome. An area of particularly high copy number gain was seen in the 19q12 to 19q13.2 region.

Chromosome 20 (Figure 32) had an overall red saturation in the heat map. The saturation was particularly intense close to the telomere on the long arm of the chromosome for several samples. Therefore, chromosome 20 had low to high level copy number gains over the whole chromosome.

The heat map for chromosome 21 (Figure 33) did not have particularly high saturation of either blue or red colour. Instead, the overall colour in this heat map was blue with small areas of low level copy number gain indicated by red. Similarly, chromosome 22 (Figure 34) had low level losses over the long arm of the chromosome. However, there was an area of high copy number gain close to the centromere of chromosome 22, indicated by high red colour saturation. There was also a red line of colour (also seen on chromosomes 2, 3, 4, 14, and 17). Finally, the X chromosome (Figure 35) had mostly areas of high copy number for the majority of the samples on both the short and long arms of the chromosome. There was a small area of copy number gain in a few samples near the telomere of the short arm.

Figure 13. Heat map displaying copy number for chromosome 1 for 96 serous ovarian cancer samples. High copy number gain was seen on the long arm of the chromosome. A region of copy loss could be seen on the short arm.

Figure 14. Heat map for chromosome 2 of 96 serous samples. A consistent copy number gain of intense red colour was seen in the region of the centromere on the short arm of the chromosome. This may be a normal polymorphism in the population and was verified by consulting online databases such as the Database of Genomic Variants (DGV, 2008).

Figure 15. Chromosome 3 heat map for 96 serous ovarian cancer samples. A large region of copy number gain was seen on the long arm and an area of loss was seen on the short arm.

Figure 16. Heat map of chromosome 4 for 96 serous ovarian samples. Most of this chromosome showed areas of copy number loss except for a consistent polymorphism at 4q18.


Figure 17. Heat map of chromosome 5 for 96 serous samples. An area of copy number loss was seen on the long arm. An area of gain, in some of the same samples, was seen with high intensity red colour on the short arm.

Figure 18. Heat map for 96 serous samples of chromosome 6. The heat map displayed areas of copy number loss and gain on the long and short arms, respectively.

Figure 19. Heat map for chromosome 7 of 96 serous ovarian cancer samples. Distinct areas of copy number gain and loss could be seen on the long and short arms respectively.

Figure 20. Heat map of chromosome 8 for 96 serous ovarian cancer samples. An area of high copy number gain was seen in this heat map on the long arm of chromosome 8. There was also an area of copy number loss for the majority of samples on the short arm.

Figure 21. Heat map of chromosome 9 for 96 serous ovarian cancer samples. Copy number losses were detected for the majority of chromosome 9, except in a small region of the short arm for several samples.

Figure 22. Heat map of chromosome 10 for 96 serous ovarian cancer samples. The heat map for chromosome 10 showed copy number gains on the short arm and a few areas of loss on the long arm.

Figure 23. Heat map of chromosome 11 for 96 serous ovarian cancer samples. The heat map did not show any consistent clusters of copy number gain or loss for the 96 serous ovarian cancer samples.

Figure 24. Heat map of chromosome 12 for 96 serous ovarian cancer samples. There was an area of high intensity red colour showing high copy gain on the short arm of chromosome 12. Several other areas of copy number loss were evident on the long arm.


Figure 25. Heat map of chromosome 13 for 96 serous ovarian cancer samples. The heat map did not have information for the short arm since there were no probes in that region of the array. Widespread copy number loss was notable on this chromosome’s long arm.

Figure 26. Heat map of chromosome 14 for 96 serous ovarian cancer samples. The heat map did not seem to have any overall patterns of copy number gain or loss apart from two bands of gain and loss in a few isolated samples.


Figure 27. Heat map of chromosome 15 for 96 serous ovarian cancer samples. Chromosome 15 had widespread copy number loss apart from an area of copy number gain near the telomere of the long arm.

Figure 28. Heat map of chromosome 16 for 96 serous ovarian cancer samples. The heat map showed mostly copy number loss, predominantly on the long arm of the chromosome.

Figure 29. Heat map of chromosome 17 for 96 serous ovarian cancer samples. Chromosome 17 was characterized by areas of copy number loss on the short arm for the majority of samples and an area of copy number gain for a select group of samples on the long arm.

Figure 30. Heat map for chromosome 18 of 96 serous ovarian cancer samples. The heat map showed widespread copy number loss in the 96 serous ovarian cancer samples except for a small number of samples with gains on the short arm.

Figure 31. Heat map of chromosome 19 for 96 serous ovarian cancer samples. The heat map showed an area of intense red colour, representative of high copy number gain on the long arm and several areas of copy loss on the short and long arms.

Figure 32. Heat map of chromosome 20 for 96 ovarian cancer samples. There was widespread copy number gain on this chromosome.

Figure 33. Heat map of chromosome 21 for 96 serous ovarian cancer samples. This showed several areas of copy number loss on the long arm of the chromosome. Isolated areas of copy number gain were also notable.

Figure 34. Heat map of chromosome 22 for 96 serous ovarian cancer samples. Widespread copy number loss was displayed on the heat map. There was also a focal copy number gain in most of the 96 serous ovarian cancer samples, possibly a normal polymorphism.

Figure 35. Heat map of the X chromosome for 96 serous ovarian cancer samples. There was widespread copy number loss for the 96 serous ovarian cancer samples shown in this figure.

3.6 Normal Sample Heat Maps

The heat maps for the normal samples had much less colour saturation and intensity compared to those for the serous ovarian cancer samples. This was probably due to fewer copy number gains and losses in the normal samples. Therefore, there was more grey colour on the heat maps for the normal samples, indicative of copy number neutrality. All of the heat maps showed clustering of the 9 normal samples into two groups using the Pearson’s clustering method. However, it could be seen from the overall subtle colour tone of these heat maps that the copy number gains and losses were quite low in number.
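As an illustration of how samples can be grouped by the similarity of their copy number profiles, a minimal Python sketch follows. It assumes that the Pearson’s clustering method refers to hierarchical clustering with a dissimilarity of 1 minus the Pearson correlation, and it uses average linkage. The linkage rule, the sample names N1 to N4 and the probe values are hypothetical and are not taken from the dataset or from the software used in this study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length copy number profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)  # assumes neither profile is constant


def average_distance(profiles, group_a, group_b):
    """Mean of (1 - r) over all sample pairs drawn from the two groups."""
    total = sum(1.0 - pearson(profiles[a], profiles[b])
                for a in group_a for b in group_b)
    return total / (len(group_a) * len(group_b))


def cluster_into_two(profiles):
    """Merge the closest pair of groups repeatedly until two groups remain."""
    groups = [[name] for name in profiles]
    while len(groups) > 2:
        pairs = [(i, j) for i in range(len(groups))
                 for j in range(i + 1, len(groups))]
        i, j = min(pairs, key=lambda p: average_distance(
            profiles, groups[p[0]], groups[p[1]]))
        groups[i] = groups[i] + groups[j]
        del groups[j]
    return [sorted(g) for g in groups]


# Hypothetical copy number values at five probes for four samples.
profiles = {
    "N1": [2.0, 2.1, 2.4, 2.0, 1.9],
    "N2": [2.0, 2.2, 2.5, 2.1, 1.9],
    "N3": [2.0, 1.9, 1.6, 2.0, 2.1],
    "N4": [2.1, 1.9, 1.5, 2.0, 2.2],
}
groups = cluster_into_two(profiles)
print(groups)  # [['N1', 'N2'], ['N3', 'N4']]
```

In this toy example the two samples with a small gain at the third probe group together, as do the two with a small loss, even though every value lies close to 2 copies. This mirrors the observation that the normal samples clustered into two groups despite the low magnitude of their copy number changes.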

The heat map for chromosome 1 (Figure 36) had an overall grey appearance with small areas of intense red colour, indicative of copy number gains on the long arm. Both chromosomes 2 (Figure 37) and 3 (Figure 38) had lines of red colour intensity on their heat maps. This showed an area of copy number gain on both of the chromosomes. On chromosome 2, the red line was located very close to the centromere on the short arm. On chromosome 3, the red line was much less intense and was situated near the telomere on the long arm. The possibility that these red lines of intense colour were due to a normal polymorphism in the population was strengthened since these were themselves supposedly normal samples.

Chromosome 4 (Figure 39) had an overall grey and low red and blue colour intensity throughout the heat map except for one region near the centromere on the long arm. This focused area had high copy number in the majority of the samples but had low copy number in one sample.

Chromosomes 5 (Figure 40) and 6 (Figure 41) had subtle copy number gains and losses throughout the heat map. The overall colour of these two heat maps was grey, indicating no high level copy number gains or losses. The heat map for chromosome 7 (Figure 42) also had an overall look of copy number neutrality except in the centromeric region, where there was high red colour intensity. However, this was not due to actual copy number gains but was an artefact of the probes in this region of the array.

A high copy number gain near the telomere on the short arm of chromosome 8 (Figure 43) could be seen from the heat map. Chromosome 8 had particularly high copy number, shown with very high red colour intensity in the heat map for the serous samples (Figure 20). However, as expected, this degree of copy number gain was not present in the normal samples.


Chromosome 9 (Figure 44) had a similar overall pattern to that of chromosome 7, with the artefact in the centromeric region showing high red colour intensity. There was a focused area of copy number gain on the long arm of chromosome 10 (Figure 45) near the centromere. This gain was seen in 4 of the 9 samples. However, the intensity of the colour was not as great as that seen in previous heat maps, for instance that of 8q24 for the serous samples (Figure 20). Therefore, this gain on chromosome 10 in the normals may have been relatively low level. The heat maps for chromosomes 11 (Figure 46), 12 (Figure 47) and 13 (Figure 48) were relatively unremarkable. The heat maps had an overall grey colour tonality and it is likely that there were no important copy number gains or losses on these chromosomes in the normal samples.

There was an area of copy number gain on chromosome 14 (Figure 49) on the long arm right beside the centromere. This gain was seen in the majority of the normal samples except for one, which had a distinct copy number loss. There was also a line of intense red colour at the telomeric end of the long arm of chromosome 14. This was likely due to a normal polymorphism in the population, as explained in previous cases. Chromosome 15 (Figure 50) had a similar pattern of copy number variation to that of chromosome 14. There was a virtually identical region of high copy number gain or loss right beside the centromere on the long arm of chromosome 15. However, on chromosome 15 there was a more balanced mix of gains and losses in this region in the samples compared to that of chromosome 14, which was mostly gains.

Chromosome 16 (Figure 51) had visually the opposite pattern of copy number on the heat map compared to chromosomes 14 and 15. This was due to a focal area of either copy number loss or gain in the samples right beside the centromere on the short arm side of the chromosome. However, the colour intensity was much less compared to that observed on chromosomes 14 and 15. Therefore, this was likely a region of low copy number gain and loss in the normal samples.

An area of copy number gain or loss was observed on the heat map of chromosome 17 (Figure 52). The majority of the normal samples displayed copy number gains in this region at the midpoint of the long arm of the chromosome. However, one normal sample displayed a copy number loss in this region. Therefore, this region may have been one of the red line polymorphic regions seen in other heat maps.

Chromosomes 18 (Figure 53), 20 (Figure 55) and 21 (Figure 56) had an overall grey colour. Therefore, there were no high level copy number gains or losses on these chromosomes. However, on chromosome 19 (Figure 54) there were a few interesting regions of copy number change. On the short arm of the chromosome there was a very narrow region of copy number loss in three samples. Similarly, on the long arm of the chromosome there was a region of copy number gain in one sample and loss in another. However, as with other gains and losses observed on the heat maps for the normal samples, the colour was not particularly intense. Therefore, the actual copy number gains or losses in these areas were potentially not significant.

The heat map for chromosome 22 (Figure 57) was perhaps the most interesting of all the normals. This was because of intense red colouration of several regions on the long arm of the chromosome on the heat map. There were roughly 6 bands of red colouration on this map (including one line of red intensity likely due to a normal polymorphism). Therefore, chromosome 22 likely had widespread copy number gain in the normal samples in these regions of the long arm of the chromosome.

Finally, the heat map for the X chromosome (Figure 58) had a very grey appearance except for one sample that showed a high level copy number loss on the long arm of the chromosome.

Figure 36. Heat map of chromosome 1 for 9 normal samples. The heat map displayed subtle red and blue colouration but no overall clusters of copy number loss or gain.

Figure 37. Heat map of chromosome 2 for 9 normal samples. The heat map displayed a common high intensity red area across all samples.

Figure 38. Heat map of chromosome 3 for 9 normal samples. A faint area of red on the long arm was visible and may have indicated copy number gain.

Figure 39. Heat map of chromosome 4 for 9 normal samples. The heat map showed areas of copy number gain and loss in the para-centromeric region.


Figure 40. Heat map of chromosome 5 for 9 normal samples. There was only subtle red and blue colour in this figure.

Figure 41. Heat map of chromosome 6 for 9 normal samples. No distinct areas of copy number gain or loss were visible in the heat map.


Figure 42. Heat map of chromosome 7 for 9 normal samples. There was an area of copy number gain in the area of the centromere of the chromosome.

Figure 43. Heat map of chromosome 8 for 9 normal samples. There were isolated areas of copy number gain on the short arm of chromosome 8 in 5 of the 9 normal samples displayed in this figure.

Figure 44. Heat map of chromosome 9 for 9 normal samples. There was subtle red and blue colouration throughout the heat map. The intense colouration in the region of the centromere was due to lack of probe signal in this area from the array.

Figure 45. Heat map of chromosome 10 for 9 normal samples. Several of the 9 normal samples showed copy number gain on the long arm of chromosome 10.

Figure 46. Heat map of chromosome 11 for 9 normal samples. The heat map showed widespread low level copy number gain and loss.

Figure 47. Heat map of chromosome 12 for 9 normal samples. Low level copy number loss on the long arm of chromosome 12 was displayed for 9 normal samples in this heat map.


Figure 48. Heat map of chromosome 13 for 9 normal samples. The heat map displayed widespread low level copy number gain on the long arm of the chromosome for all 9 normal samples.

Figure 49. Heat map of chromosome 14 for 9 normal samples. There were two regions of high copy number gain on the long arm of chromosome 14 in this heat map.


Figure 50. Heat map of chromosome 15 for 9 normal samples. There were no observed high copy number gains or losses.

Figure 51. Heat map of chromosome 16 for 9 normal samples. The heat map showed a few isolated areas of high copy number gain on the short arm of the chromosome.

Figure 52. Heat map of chromosome 17 for 9 normal samples. The heat map displayed areas of high copy number gain on the long arm of the chromosome.

Figure 53. Heat map of chromosome 18 for 9 normal samples. The heat map displayed low level copy number gain across both the long and short arms of the chromosome.

Figure 54. Heat map of chromosome 19 for 9 normal samples. There was an area of high copy number gain and loss on the long arm. There were also widespread low level copy number changes.

Figure 55. Heat map of chromosome 20 for 9 normal samples. The heat map showed low level copy number variation.


Figure 56. Heat map of chromosome 21 for 9 normal samples. The heat map showed low level copy number variation across the long arm.

Figure 57. Heat map of chromosome 22 for 9 normal samples. There were high copy number gains visible on the heat map of chromosome 22.

Figure 58. Heat map of X chromosome for 9 normal samples. The heat map showed widespread low level copy number variations.

3.7 Normal and Serous Heat Map Comparison

The overall impression of the heat maps observed for the normal and serous samples was highly informative for trends in copy number variation. All the heat maps (Figures 13-35) for the serous ovarian cancer samples had regions of high blue or red colour intensity. This indicated that the cancer samples had areas of high copy number or low copy number, which was the expected trend. Conversely, the heat maps for the normal samples were mostly grey in colour tone, indicative of copy number neutrality. These findings were consistent with what was expected in the normal and serous ovarian cancer samples.


The only regions in the normal samples that seemed to have high copy number were areas of gain, indicated by red lines throughout all the samples. These red lines occurred on chromosomes 2, 3, 4, 14, 17 and 22 in the serous ovarian cancer samples. In the normal samples these red lines appeared on the heat maps for chromosomes 2, 3, 14, 17 and 22. Therefore, it is likely that these red lines were due to normal polymorphisms found in all samples, whether diseased or disease-free, and these regions could be discounted from the search for ovarian cancer driver genes. Another possible explanation for these red lines seen in both normal and ovarian cancer samples could be an artefact of the experimental technology; perhaps the spacing of the copy number or SNP probes at these particular regions was aberrant. However, in both cases, whether due to an actual normal polymorphism or a technological problem, the regions could be removed from any further in-depth analysis for elucidating driver genes.

The only chromosome that displayed a red line in the serous samples but not in the normals was chromosome 4. Therefore, it was important to analyse this further after the genomic segmentation algorithm was applied.
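The comparison of red line chromosomes between the two sample groups amounts to a set intersection and a set difference. A short Python sketch using the chromosome lists given in this section is shown here; the variable names are chosen only for illustration.

```python
# Chromosomes on which a red line was seen in the heat maps (from the text).
serous_red_lines = {2, 3, 4, 14, 17, 22}
normal_red_lines = {2, 3, 14, 17, 22}

# Red lines seen in both groups: candidate polymorphisms or array artefacts.
shared = serous_red_lines & normal_red_lines

# Red lines seen only in the serous samples: kept for further analysis.
serous_only = serous_red_lines - normal_red_lines

print(sorted(shared))       # [2, 3, 14, 17, 22]
print(sorted(serous_only))  # [4]
```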

3.8 Copy Number Variation Karyoview

As outlined in the methodology section of this thesis, the genomic segmentation algorithm was applied to the copy number intensities for all samples. It was important to select only aberrant copy number gains and losses, so the threshold was set above 4 copies for gains and below 1 copy for losses. The scope for detecting gains is greater, since there is no limit on the number of copies that can be gained in a cell, whereas at most 2 copies can be lost. In order to maximize the ability to detect changes, a threshold of only 2 samples was set for recurrent copy number changes. The number of genes for thresholds of above 2.5, 3, 3.5 and 4 was used to determine the appropriate level for gains (Figure 59). Similarly, thresholds of 1.5 and 1 were used to develop a rationale for determining how many copies to include for losses (Figure 60). A balance between a high number of gene “blocks” and the likelihood of uncovering a driver gene in the block was used to determine the thresholds. Therefore, if a copy number change of above 4 or less than 1 was detected in 2 or more samples, it was reported by the genomic segmentation algorithm. The complete set of chromosomes with these changes shown in blue (for losses) and red (for gains) is shown in Figure 61 for the 96 serous ovarian samples and Figure 62 for the 9 normal samples. The distance of the coloured band from the chromosome showed the relative degree of copy number gain or loss. Therefore, the highest gains and losses were furthest from the chromosome graphic in the figure. The pattern of gains and losses in the serous ovarian cancer samples was similar to the trends seen in the heat maps for these samples (Figures 13-35). There were large areas of copy number gain seen on chromosomes 1, 3, 6, 8, 12, 19 and 20. This was consistent with the heat maps. In fact, whole chromosome arms appeared to be gained on some chromosomes. The short arms of chromosomes 6 and 12 were almost completely amplified. Similarly, the long arms of chromosomes 3, 8 and 20 were almost completely amplified. Chromosome 8 was the chromosome that showed the most gain, and this was consistent with the heat map and also with the cancer research literature (Cooke et al., 2008; Salinas et al., 2008; Yeager et al., 2008). Large areas of copy number loss were seen on chromosomes 4, 5, 6, 8, 9, 13, 16, 18, 21, 22 and X. Almost the complete short arm of chromosome 8 showed copy number loss in this karyoview. Also, there was copy number loss across the whole X chromosome in the serous ovarian cancer samples. These findings were consistent with the trends observed on the heat maps. The karyoview for the normal samples (Figure 62) showed far fewer gains and losses than that for the 96 serous ovarian cancer samples. There were no instances of partial or complete chromosomal copy number gain or loss. The gains and losses seen in the normal samples were narrow and few in number compared to the serous samples.
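The thresholding and recurrence rule described in this section can be sketched in Python as follows. This is an illustrative sketch only, not the genomic segmentation algorithm itself: it assumes that segmentation has already produced one copy number estimate per sample and region, and the sample names, region labels and copy number values are hypothetical.

```python
from collections import defaultdict

GAIN_THRESHOLD = 4.0  # gains: copy number above 4
LOSS_THRESHOLD = 1.0  # losses: copy number below 1
MIN_SAMPLES = 2       # a change is recurrent if seen in at least 2 samples


def classify(copy_number):
    """Return 'gain', 'loss' or None for a segmented copy number value."""
    if copy_number > GAIN_THRESHOLD:
        return "gain"
    if copy_number < LOSS_THRESHOLD:
        return "loss"
    return None


def recurrent_aberrations(segments):
    """Keep (region, call) pairs supported by at least MIN_SAMPLES samples."""
    samples_by_call = defaultdict(set)
    for sample, region, copy_number in segments:
        call = classify(copy_number)
        if call is not None:
            samples_by_call[(region, call)].add(sample)
    return {key: sorted(samples)
            for key, samples in samples_by_call.items()
            if len(samples) >= MIN_SAMPLES}


# Hypothetical segmented values: (sample, region, copy number).
segments = [
    ("S1", "8q24", 4.7), ("S2", "8q24", 4.9), ("S3", "8q24", 3.1),
    ("S1", "8p23", 0.8), ("S2", "8p23", 0.9),
    ("S3", "17q", 3.5), ("S4", "17q", 2.8),   # between 2 and 4: not reported
    ("S4", "20q13", 4.2),                     # only one sample: not recurrent
]
result = recurrent_aberrations(segments)
print(result)
```

In this toy input only the 8q24 gain and the 8p23 loss are retained. The 17q values lie between 2 and 4 copies and are therefore ignored, and the 20q13 gain is dropped because it occurs in a single sample.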

Figure 59. Threshold values for copy number gain (y-axis: number of blocks; x-axis: threshold, at 2.5, 3, 3.5 and 4). Copy number above 4 was chosen for the upper threshold in this study for gains.

Figure 60. Threshold values for copy number loss (y-axis: number of blocks; x-axis: threshold, at 1 and 1.5). Copy number below 1 was chosen for the lower threshold in this study for losses.

Figure 61. Serous ovarian cancer karyoview. This showed copy number gains (red) and losses (blue) for each chromosome. Gains of more than 4 copies and losses below 1 copy in at least 2 samples were shown.

Figure 62. Normal samples karyoview. This showed copy number gains (red) and losses (blue) for each chromosome. Gains of more than 4 copies and losses below 1 copy in at least 2 samples were displayed.


The recurring polymorphisms seen as red lines in the heat maps on chromosomes 2, 3 and 14 were also observed in both karyoviews. However, the red line copy number gain polymorphisms on chromosomes 17 and 22, which were common to the normal and serous heat maps, did not show up in the normal karyoview. This was probably because the copy number gains on chromosomes 17 and 22 were between 2 and 4 copies. This reinforced that the overall trends from the heat maps were qualitative in nature and that actual copy number information is unlikely to be garnered from measuring colour intensity and saturation. However, the maps and karyoviews were highly informative for displaying overall trends and areas of interest in the genome in a single graphic.

3.9 Serous Gene List Properties – Copy Number Gains

The gene list developed for the 96 serous samples with copy number gains, using the threshold of more than 4 copies, had a total of 1525 genes identified on 18 chromosomes. Chromosomes 2, 16, 18, 21 and X (referred to as 23 for graphing) did not have copy number gains of more than 4 copies. In the original list, chromosome 17 harboured the region with the greatest number of copies, 12.2 copies (Figure 63). This high level gain was found in two samples, 2372 and 2850. The mean number of copies identified in the copy number gains over 4 across all the chromosomes was 5.0.

Each of the 18 chromosomes with gains of more than 4 copies had several regions of amplification with several possible genes involved in each region. However, it was important to establish which chromosomes had the areas of copy number gain that were found in the greatest number of samples. This condition was to fulfill the ultimate goal of supporting the original hypothesis that the ovarian cancer driver gene will have high copy number in a large number of samples. Therefore, very high copy number in two or three samples would be less likely to be due to a driver gene gain than a high copy number gain in a large number of samples. A histogram was developed to gauge the distribution of copy number aberrations relative to the number of samples (Figure 3, see Appendix). The majority of the copy number gains occurred in only 2 samples. It was decided to analyse copy number gains that occurred in a minimum of 5 samples. This threshold greatly reduced the number of genes on the list of copy number gains.

Figure 63. Graph of copy number gains over 4 copies for each chromosome after the genomic segmentation algorithm was applied to 96 serous ovarian cancer samples (y-axis: copy number; x-axis: chromosome).

Figure 64. Graph of the number of samples for the top copy number gains found on each chromosome (y-axis: number of samples; x-axis: chromosome). Chromosomes with more than one area of copy number gain were only represented with the top number of samples shown.

Of a total of 96 serous ovarian cancer samples, 21 had gains of 7.2 copies at a specific region on the long arm of chromosome 1 at 1q21.3. The gene annotated at this locus is LCE3C, late cornified envelope 3C (NCBI, 2008). A second group of 16 samples had a gain of 5.5 copies on the long arm of chromosome 7 at 7q34. The gene at this locus is PRSS2, protease, serine, 2 (trypsin 2) (NCBI, 2008). Another group of 13 serous samples had a gain of 6.4 copies at 10p12.2-p12.1 on the short arm of chromosome 10. The gene annotated for this locus is KIAA1217. The copy number gains in these genes occurred in the most samples; however, it was very likely that they were normal polymorphisms. Therefore, additional parameters were applied to the gene list. These parameters included: removal of single gene entries, removal of aberrant regions that were less than 150kb or equivalent to 10 genes in length, removal of normal genes by inspection and removal of aberrations that occurred in fewer than 5 samples. Once these parameters were applied to the gene list, “blocks” of genes across the genome were observed.

The final gene list for copy number gains is Table 2 in the Appendix. The overall distribution of these “blocks” can be seen in the karyoview (Figure 61). A total of 8 “blocks” were found ranging in copy number gain from 4.2 to 5.1 (Figure 65).
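The automatic part of the filtering parameters listed above can be sketched in Python as follows. This is a hedged illustration rather than the procedure actually used: it reads the size criterion as keeping regions that span at least 10 genes or more than 150kb, it omits the removal of normal genes by inspection (a manual step), and the `Region` records and their values are hypothetical.

```python
from dataclasses import dataclass

MIN_GENES = 10       # a "block" spans at least 10 genes ...
MIN_LENGTH_KB = 150  # ... or more than 150 kb
MIN_SAMPLES = 5      # and recurs in at least 5 samples


@dataclass
class Region:
    label: str
    n_genes: int
    length_kb: float
    n_samples: int


def is_block(region):
    """Apply the single-gene, recurrence and size filters in turn."""
    if region.n_genes <= 1:
        return False  # single gene entries are removed
    if region.n_samples < MIN_SAMPLES:
        return False  # too few samples share the aberration
    return region.n_genes >= MIN_GENES or region.length_kb > MIN_LENGTH_KB


# Hypothetical regions, one failing each filter and one passing all of them.
regions = [
    Region("single_gene_peak", n_genes=1, length_kb=2.0, n_samples=21),
    Region("small_region", n_genes=4, length_kb=90.0, n_samples=8),
    Region("rare_block", n_genes=40, length_kb=900.0, n_samples=2),
    Region("recurrent_block", n_genes=67, length_kb=1200.0, n_samples=13),
]
blocks = [r.label for r in regions if is_block(r)]
print(blocks)  # ['recurrent_block']
```

A single gene gained in many samples, such as the likely polymorphisms described above, is excluded by the first filter regardless of how many samples carry it.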

Figure 65. Scatterplot of “blocks” of copy number gain in serous ovarian cancer samples (y-axis: copy number; x-axis: number of samples; points labelled 19q, 8q, 1p, 19q, 8q, 1q, 1q, 12p and 20q). Each point represents a region on a chromosome where there are areas of copy number gain in at least 10 genes or across more than 150kb.

The first “block” of genes was observed on the long arm of chromosome 8 in 13 of the samples. The copy number for this area of gain was 4.7. The region contained 67 genes in the 8q24 region. A scatterplot for this region is shown in Figure 66. Ten zinc finger protein genes were identified with copy number gain in this region. Additionally, several intermediate filament binding type genes, such as EPPK1 and PLEC1, were identified in this region. PLEC1 has been associated with colon cancer (Lee et al., 2004). In turn, filament binding proteins are involved in cell polarization, and polarization is known to have implications in cancer (Iden and Collard, 2008). Another gene of interest in the 8q24 region was GPT, a gene encoding the enzyme that catalyzes a reaction which produces pyruvate. This transaminase is upregulated in the setting of hepatocellular carcinoma and alcoholic liver disease (Muryama et al., 2007). Interestingly, MYC was not among the genes with copy number gain in this block, despite its location at 8q24. There are several possible reasons for its absence. For instance, this gene probably did not have copy number gain over the cut-off of 4.

The second “block” of copy number gain, spanning 12 genes, was located on the long arm of chromosome 19. This area, with a gain of 4.7, was found in 10 samples. A scatterplot showing the copy number gain in this region is shown in Figure 67. Although not a high level gain, the known ovarian cancer gene, Cyclin E1 (CCNE1), was detected in the region (Tan and Kaye, 2007). Several of the other genes with copy number gain at the 19q12 locus, such as C19orf12, C19orf2 and DPY19K3, are uncharacterized.

A region containing 13 genes on the short arm of chromosome 12 was found to have copy number gain of 4.3. This region was gained in 8 samples. A scatterplot showing the copy number gain in these 8 samples is shown in Figure 68. A key gene relevant to cancer in this region was the NANOG gene. This gene, which encodes a transcription factor, has been implicated in the pathogenesis of germ cell and testicular carcinomas (Hoei-Hansen et al., 2005; Piestun et al., 2006). Two genes involved in inflammation signalling, CLEC4A and CLEC4C, also had copy number gain in this region.

A copy number gain of 4.2 in 7 samples was detected on the long arm of chromosome 20 (Figure 69). This “block” of copy number gain spanned 93 genes. Several key cancer genes were found among this list. For instance, RAB22A, a member of the RAS oncogene family, had increased copy number here. The TH1L gene, known to interact with and activate A-Raf, was also found in this gene “block” at 20q13 (Yin et al., 2002). Another notable gene with copy number gain in this region was BMP7. This gene has been shown to be expressed in both fetal and adult ovary (Abir et al., 2008). Additionally, overexpression of BMP7 has been found in colorectal carcinoma (Motoyama et al., 2008), prostate carcinoma (Doak et al., 2007) and breast cancer (Yan and Chen, 2007).

Figure 66. Scatterplot of the 8q “block” in 13 serous ovarian samples (y-axis: copy number, 0 to 4). The heat map shows copy number gain in the samples at the end of the long arm of chromosome 8. The scatterplot shows high copy number in this region. Sample IDs are listed to the left of the heat map: 3509, 3502, 4184, 2368, 2367, 1970, 1083, 1589, 1909, 1986, 1367, 982 and 1624.

Figure 67. Scatterplot of the 19q “block” for 10 serous ovarian cancer samples (y-axis: copy number, 0 to 4). The heat map shows copy number gain in the samples at the end of the long arm of chromosome 19. The scatterplot shows high copy number in this region. Sample IDs are listed to the left of the heat map: 3509, 3507, 2846, 996, 1006, 927, 1641, 2004, 1683 and 1393.

Figure 68. Scatterplot of the 12p “block” of 8 samples (y-axis: copy number, 0 to 4). The heat map shows copy number gain in the samples at the end of the short arm of chromosome 12. The scatterplot shows high copy number in this region. Sample IDs are listed to the left of the heat map: 2373, 3507, 4184, 2846, 3506, 890, 1532 and 1082.

Figure 69. Scatterplot of the 20q “block” of 7 serous ovarian samples (y-axis: copy number, 0 to 4). The heat map shows copy number gain in the samples at the end of the long arm of chromosome 20. The scatterplot shows high copy number in this region. Sample IDs are listed to the left of the heat map: 3509, 3510, 2841, 972, 1006, 1641 and 986.

The “block” of genes with the highest copy number gain (5.1) was found on the long arm of chromosome 19. However, as seen in Figure 70, this aberration was found in only 6 samples. The region contained a total of 79 genes, many of which were zinc finger encoding. The PAK4 gene, at the 19q13 locus, had increased copy number. The gene activates the JNK family of MAP kinases and is involved in cytoskeleton reorganization and polarity (Kho et al., 2008). A particularly interesting finding was the copy number gain in the ACTN4 gene. This gene, involved in multiple cytoskeletal and cytoplasmic processes, was shown to be overexpressed in advanced and metastatic ovarian carcinoma (Barbolina et al., 2008).

An area encompassing 61 genes on the short arm of chromosome 1 had an increased copy number (4.8) in 6 samples (Figure 71). This was another relatively large “block” of genes. However, oncogenes such as L-myc and the c-myc binding protein gene were found in this region. SNIP1, another gene that encodes a protein that interacts with and enhances the activity of c-myc, was found to have copy number gain in this region.

Figure 70. Scatterplot of the 19q “block” in 6 serous samples (y-axis: copy number, 0 to 4). The heat map shows copy number gain in the samples at the end of the long arm of chromosome 19. The scatterplot shows high copy number in this region. Sample IDs are listed to the left of the heat map: 3507, 927, 1641, 1719, 2004 and 1393.

Figure 71. Scatterplot of the 1p “block” in 6 serous samples (y-axis: copy number, 0 to 4). The heat map shows copy number gain in the samples at the end of the short arm of chromosome 1. The scatterplot shows high copy number in this region. Sample IDs are listed to the left of the heat map: 3509, 2107, 1297, 1074, 1504 and 2004.

Two “blocks” of genes located on the long arm of chromosome 1 were found in 5 (Figure 72) and 6 samples (Figure 73). The “block” found in 6 samples had a copy number of 4.6, whereas the one found in 5 samples had a copy number of only 4.4. There were 35 and 29 genes in each region respectively. As suspected, these regions actually overlapped at the 1q43 locus. Interestingly, these two “blocks” had four samples in common (3502, 3510, 1070 and 1683). There were no classical oncogenes found in this area. However, the gene FMN2, involved in cell polarity and cytoskeletal reorganization, was found in this region. A link between FMN2 and problems with fertility was found in a study by Ryley et al. (2005). This is interesting since epidemiological studies have shown that women with a history of infertility and endometriosis are predisposed to developing ovarian cancer (Cannistra, 2004).

Figure 72. Scatterplot of the 1q “block” in 5 serous samples (y-axis: copy number, 0 to 4). The heat map shows copy number gain in the samples at the end of the long arm of chromosome 1. The scatterplot shows high copy number in this region. Sample IDs are listed to the left of the heat map: 3502, 3510, 972, 1070 and 1683.

Figure 73. Scatterplot of the 1q “block” in 6 serous samples (y-axis: copy number, 0 to 4). The heat map shows copy number gain in the samples at the end of the long arm of chromosome 1. The scatterplot shows high copy number in this region. Sample IDs are listed to the left of the heat map: 3503, 3502, 3510, 3506, 1070 and 1683.
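The samples shared by the two 1q “blocks” can be checked with a set intersection over the sample IDs listed alongside the scatterplots for the 5-sample and 6-sample 1q “blocks” (Figures 72 and 73). The short Python sketch below uses those IDs; the variable names are chosen only for illustration.

```python
# Sample IDs listed with the two 1q scatterplots (Figures 72 and 73).
block_1q_five_samples = {3502, 3510, 972, 1070, 1683}
block_1q_six_samples = {3503, 3502, 3510, 3506, 1070, 1683}

# Samples carrying both overlapping 1q gains.
common = block_1q_five_samples & block_1q_six_samples

# Samples carrying only one of the two gains.
only_in_five = block_1q_five_samples - block_1q_six_samples
only_in_six = block_1q_six_samples - block_1q_five_samples

print(sorted(common))        # [1070, 1683, 3502, 3510]
print(sorted(only_in_five))  # [972]
print(sorted(only_in_six))   # [3503, 3506]
```

The intersection reproduces the four samples named in the text, which supports treating the two regions as overlapping calls in largely the same tumours.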

The final area of copy number gain was found in a “block” on the long arm of chromosome 8. This copy number gain of 4.8 was found in 5 samples and spanned 58 genes. A scatterplot showing this copy number aberration is shown in Figure 74. Many interesting genes were found at this 8q21 locus, including MMP16, a matrix metalloproteinase gene that is overexpressed in hepatocellular carcinoma (Arai et al., 2007).

Figure 74. Scatterplot of the 8q “block” in 5 serous samples (y-axis: copy number, 0 to 4). The heat map shows copy number gain in the samples at the end of the long arm of chromosome 8. The scatterplot shows high copy number in this region. Sample IDs are listed to the left of the heat map: 4184, 2368, 2356, 1909 and 1367.

3.10 Serous Gene List – Copy Number Losses

The copy number loss gene list for the serous ovarian cancer samples consisted of fewer genes overall compared to the gains list. This is because there can be only a maximum loss of 2 copies of a gene. A total of 397 genes were identified with copy number less than one. The mean number of copies for genes identified on the gene list was 0.8. A total of 21 chromosomes had copy losses, with copy number less than 1 (Figure 73). Chromosome 17 had the copy number loss that was found in the greatest number of samples, a total of 35 (Figure 74). The second most common area of copy number loss, found in 34 samples, was on chromosome 1. After applying the aforementioned parameters to the gene list, several “blocks” of genes became apparent. The final gene list for copy number losses is Table 2 in the Appendix. The overall distribution of these “blocks” can be seen in the karyoview (Figure 60). A total of 3 “blocks” were found ranging in copy number from 0.6 to 0.9 (Figure 75).

Initially, before removal of the single gene peaks, several genes were noted. The LOC654346 gene, located at 17p11.2, had only 0.8 copies in a total of 35 samples. Thirty-four samples had a copy number loss, with 0.5 copies, at the LCE3B locus at 1q21.3. The gene with the highest degree of copy number gain in the largest number of samples (21 in total) was LCE3C. This gene is located right beside LCE3B. Therefore, this may be a particularly unstable region of the genome in the serous ovarian cancer samples. In 24 samples, a keratin associated protein gene (KRTAP9-9) on the long arm of chromosome 17 had only 0.9 copies.

These losses most likely reflect normal polymorphisms in the samples; therefore, the genes located in the “block” regions after applying the parameters will now be discussed.
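The selection step described above can be sketched in a few lines of code. This is a minimal illustration and not the pipeline used in this study. It assumes the thresholds stated in this chapter: a loss is a copy number below 1, and a “block” is a region covering at least 10 genes or more than 150 kb (the definition given in the caption of the “blocks” scatterplot). The Segment record, the function names, the block labels and every span length are hypothetical placeholders; only the gene counts, copy numbers and sample counts are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One region of altered copy number (hypothetical record layout)."""
    label: str
    n_genes: int
    span_bp: int          # placeholder lengths, not measured values
    copy_number: float
    n_samples: int


def is_loss_block(seg, max_copy=1.0, min_genes=10, min_span_bp=150_000):
    """Return True if the segment is a loss and large enough to be a block."""
    if seg.copy_number >= max_copy:
        return False      # not a copy number loss
    return seg.n_genes >= min_genes or seg.span_bp > min_span_bp


def select_loss_blocks(segments):
    """Keep only the segments that satisfy the block criteria."""
    return [seg for seg in segments if is_loss_block(seg)]


# Gene counts, copy numbers and sample counts follow the text;
# span_bp values are made up so that the example runs.
example_segments = [
    Segment("LOC654346 (17p11.2)", 1, 5_000, 0.8, 35),
    Segment("LCE3B (1q21.3)", 1, 5_000, 0.5, 34),
    Segment("8p block A", 14, 100_000, 0.9, 3),
    Segment("8p block B", 13, 100_000, 0.8, 6),
    Segment("5p block", 11, 100_000, 0.6, 5),
]

blocks = select_loss_blocks(example_segments)
for seg in blocks:
    print(seg.label, seg.copy_number, seg.n_samples)
```

In this sketch the two single gene peaks are dropped even though they occur in the largest numbers of samples, which mirrors the reasoning above that such peaks are more likely to reflect normal polymorphisms than driver events.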


The first two “blocks” noted in the copy number loss gene list are both located on the short arm of chromosome 8. In fact, the genes in this region are all located at the 8p23 and 8p24 loci. These regions are counted as separate “blocks” because the area with 3 samples has a copy number of 0.9 (Figure 76) and the area with 6 samples has a copy number of 0.8 (Figure 77). Of the three samples in the region with copy number 0.9, only sample 1624 is also found in the “block” with copy number 0.8. The region with low copy number in 3 samples spanned 14 genes. Defensin beta genes accounted for 4 of the 14 genes in that location. Similarly, 10 of the 13 genes located in the “block” with copy number of 0.8 were defensin beta genes. The defensin beta gene family encodes proteins that have cytotoxic and microbicidal functions. In fact, copy number for DEFB4, DEFB103 and DEFB104 has been suggested as a tumor marker in prostate cancer (Huse et al., 2008). Thus, a reduced copy number of these genes is not surprising in the setting of ovarian cancer.

Figure 75. Graph of copy number losses below 1 copy for each chromosome after the genomic segmentation algorithm was applied to 96 serous ovarian cancer samples. Axes: copy number and chromosome.

Figure 76. Graph of the number of samples for the top copy number losses found on each chromosome. Chromosomes with more than one area of copy number loss are only represented with the top number of samples shown. Axes: number of samples and chromosome.

Figure 77. Scatterplot of “blocks” of copy number loss in the serous ovarian cancer samples. Each point represents a region on a chromosome where there are areas of copy number loss in at least 10 genes or across more than 150 kb. Axes: copy number and number of samples; the labelled points are the two 8p “blocks” and the 5p “block”.

Figure 78. Scatterplot of the 8p “block” in 3 serous samples. The heat map shows copy number loss in the samples at the end of the short arm of chromosome 8. The scatterplot shows low copy number in this region. Sample IDs are listed to the left of the heat map (1909, 1624 and 1070); the vertical axis shows copy number from 0 to 4.

Figure 79. Scatterplot of the 8p “block” in 6 serous samples. The heat map shows copy number loss in the samples at the end of the short arm of chromosome 8. The scatterplot shows low copy number in this region. Sample IDs are listed to the left of the heat map (2850, 1074, 1088, 952, 1624 and 1683); the vertical axis shows copy number from 0 to 4.

Figure 80. Scatterplot of the 5p “block” in 5 serous samples. The heat map shows copy number loss in the samples at the end of the short arm of chromosome 5. The scatterplot shows low copy number in this region. Sample IDs are listed to the left of the heat map (1074, 1367, 1430, 927 and 1082); the vertical axis shows copy number from 0 to 4.

The third “block” of genes identified with copy number of 0.6 was located on the short arm of chromosome 5. A scatterplot of this region of the genome showing copy number loss is shown in Figure 78. This area encompassed 11 genes, all of which were members of the protocadherin alpha gene cluster. This cluster of genes at 5p31 is linked in tandem to two other related clusters (NCBI, 2008). This chromosomal arrangement is similar to that of the B-cell and T-cell receptor gene clusters. A putative role in cell-cell adhesion, specifically in the brain, has been suggested for this gene cluster (Lachman et al., 2008). However, very few publications exist for this gene family.


3.11 Normal Gene List – Copy Number Gains

The normal gene list had only 2 amplified regions over 4 copies. The first region was located at the SIRPB1 locus at 20p13 in two normal samples (1395 and 1863). The gene had 7.3 copies. This was considered a high copy number gain. SIRPB1 is the signal regulatory protein beta 1 gene and is a member of the immunoglobulin family of genes (NCBI, 2008). It is known to be involved in neutrophil migration (Liu et al., 2005). The second gene found to be amplified in the normal samples was LCE3C, with 5.9 copies in two samples (1605 and 1863). This gene, located on chromosome 1, was also found to be amplified in the serous ovarian cancer samples, with 7.2 copies. Thus, although the gene was found to have copy number gain in the normal samples, there seemed to be a greater degree of copy number gain in the malignant samples. This gene has no published papers regarding its function in the NCBI publicly available on-line literature database (NCBI, 2008).

3.12 Normal Gene List – Copy Number Losses

A total of 7 genes were identified with copy number loss in the normal samples analyzed. The RHD gene, located on chromosome 1, had only 0.7 copies in 3 samples (1395, 1777 and 1863). This gene was also found on the list of copy number losses for the serous ovarian cancer samples. The RHD gene had 0.7 copies in 17 of the serous samples. Interestingly, the degree of copy loss is greater in the normal samples compared to the malignant cases. LCE3B also showed copy number loss in the normal samples. This gene had 0.5 copies in 2 samples (1777 and 1874). The LCE3B gene also had copy number loss in the serous ovarian cancer samples, with only 0.5 copies in 34 malignant samples. Therefore, it appeared that this gene was comparably lost in both the normal and serous ovarian cancer samples. The CFHR3 gene had only 0.5 copies in the normal samples. This gene, located on chromosome 1, was partially lost in two normal samples (1605 and 1777). The GALNT17 gene, found on chromosome 4, had 0.5 copies in three normal samples (1316, 1605 and 1874). Although this gene exhibited a high degree of copy number loss in the normal samples, it was not found to be lost in the serous ovarian cancer samples. The OR51A4 gene had 0.4 copies in two normal samples (1395 and 1867). This gene, located on chromosome 11, also showed copy number loss in 11 serous ovarian cancer samples, with a copy number of only 0.7. Therefore, this gene is deleted to a greater extent in the normal samples compared to the malignant cases. The gene displaying the greatest degree of copy number loss in the normal samples was WWOX, located on chromosome 16. This oxidoreductase gene had only 0.3 copies in three normal samples (1301, 1863 and 1874).

3.13 Functional Inquiry for Top Genes of Interest

The genes that make up the “blocks” of copy number aberration observed in the previous section may or may not be expressed. Therefore, a search of the GEO and Oncomine databases was performed for key genes of interest in each of the 1p, 1q, 8q, 12p, 19q and 20q chromosomal regions for copy number gain and 8p and 5p regions for copy number loss. The results of these searches are discussed below.

A search of the GEO database for datasets relevant to ovarian cancer resulted in a total of 58 hits, many of which were gene expression studies on chemotherapy response. However, several datasets showed overexpression concordant with the copy number gain observed in this study. The EPPK1 gene, noted before to have a copy number gain of 4.7 on the long arm of chromosome 8, was significantly overexpressed (P = 3.4 × 10⁻⁹) in serous ovarian cancer compared to normal ovarian tissue (Hendrix et al., 2006). The EPPK1 gene ranked second on the Hendrix et al. (2006) gene list. The PLEC1 gene was significantly more highly overexpressed (P = 6.3 × 10⁻⁵) in endometrioid ovarian cancer compared to the other three main subtypes (Hendrix et al., 2006). Interestingly, the GPT gene is more overexpressed in metastatic liver and prostate cancers compared to primary tumors at those sites (Chen et al., 2004; Oncomine, 2008).

CCNE1, a well-known ovarian cancer oncogene, was found to have high copy number in this study. A search of gene expression studies on Oncomine shows that CCNE1 is significantly overexpressed in serous ovarian cancer tissue (P = 2.3 × 10⁻⁵) (Hendrix et al., 2006). The Komatsu et al. (2006) study listed on the GEO database found that CCNE1 was overexpressed in both clear cell and serous ovarian cancers. However, they reported a higher level of overexpression of CCNE1 in the clear cell samples compared to the serous samples.

The Komatsu et al. (2006) study also found NANOG, a transcription factor located on the short arm of chromosome 12, to be overexpressed in serous ovarian cancer cell lines. A search of Oncomine for this gene did not result in any relevant studies with serous ovarian cancer results. However, in an unpublished super series by Dr. David Bowtell (Oncomine, 2008), NANOG was significantly overexpressed in serous borderline cystadenomas compared to ovarian adenocarcinoma. Another gene located on 12p found to have high copy number in this study was CLEC4A. The Hendrix et al. (2006) dataset also showed significant overexpression (P = 2.7 × 10⁻⁵) of CLEC4A in serous ovarian cancer compared to normal tissue. CLEC4A was overexpressed to a higher degree in serous ovarian cancer compared to clear cell ovarian cancer (Komatsu et al., 2006).

RAB22A, TH1L and BMP7, from the “block” of genes located on the long arm of chromosome 20, were highlighted for further inquiry using GEO and Oncomine. In the unpublished super series of ovarian cancer samples by Dr. David Bowtell (Oncomine, 2008), RAB22A was found to be significantly overexpressed in serous samples compared to borderline serous cystadenomas. Interestingly, RAB22A is more overexpressed in carboplatin resistant serous ovarian cancer tumors (Peters et al., 2005). The TH1L gene was also significantly overexpressed in ovarian serous tissue compared to borderline serous cystadenomas (Oncomine, 2008). TH1L was also more highly overexpressed in serous ovarian cancer compared to clear cell (Komatsu et al., 2006). The BMP7 gene was highly overexpressed (P = 3 × 10⁻⁷) in serous ovarian cancer in the Hendrix et al. (2006) study. BMP7 was also more overexpressed in serous ovarian cancer compared to clear cell ovarian cancer (Komatsu et al., 2006). Interestingly, BMP7 was expressed at a normal level in carboplatin sensitive ovarian cancer cell lines but was highly overexpressed in carboplatin resistant ovarian cancer cell lines (Peters et al., 2005).

On chromosome 19, two genes of interest were prioritized for further investigation with Oncomine and GEO. These genes, PAK4 and ACTN4, are located on the long arm of the chromosome. In the Hendrix et al. (2006) study PAK4 was significantly overexpressed in mucinous ovarian cancer. However, the PAK4 gene in the Hendrix et al. (2006) dataset was not overexpressed in serous ovarian cancer. Despite this finding, PAK4 was shown to be overexpressed in both clear cell and serous ovarian cancers in the Komatsu et al. (2006) study. A search of Oncomine showed that ACTN4 was overexpressed in all four subtypes of ovarian cancer (Hendrix et al., 2006). Furthermore, ACTN4 was more highly overexpressed in carboplatin sensitive serous ovarian tumors compared to carboplatin resistant tumors (Peters et al., 2005).


Two genes, SNIP1 and FMN2, were prioritized for further investigation from the two “blocks” of genes on chromosome 1. SNIP1 was actually found to be underexpressed in serous, mucinous and endometrioid ovarian cancer tumors (Hendrix et al., 2006). The copy number for SNIP1 was found to be 4.8 in the current study. In other types of cancer, such as cervical cancer, SNIP1 is significantly overexpressed (P = 9.6 × 10⁻⁵) (Pyeon et al., 2007). The FMN2 gene did not have any ovarian cancer specific studies listed in the Oncomine database (2008). However, FMN2 was shown to be overexpressed in other types of cancer, such as pharyngeal cancer (Oncomine, 2008). The FMN2 gene was also overexpressed in clear cell and serous ovarian cancer cell lines from the Komatsu et al. (2006) study.

Finally, MMP16, from the “block” of genes located on the long arm of chromosome 8 with copy number of 4.8, was prioritized for further investigation. MMP16 did not have any ovarian cancer gene expression entries in Oncomine. However, in the Pyeon et al. (2007) study, MMP16 was found to be significantly overexpressed in head and neck cancers. MMP16 was normally expressed in carboplatin resistant cell lines but overexpressed in the sensitive ones (Komatsu et al., 2006).

Two key areas of interest were identified in the copy number loss data in this study: the short arm of chromosome 8 and the short arm of chromosome 5. The 8p “block” of genes consisted mainly of defensin beta genes. These genes, such as DEFB4, were expressed at low levels in serous ovarian cancer (Komatsu et al., 2006). Very little information was found on the defensin beta genes in the Oncomine database. However, it seems that expression of these genes varies across tissue types. The protocadherin alpha genes found in tandem with low copy number in 5 samples in this study were found to be overexpressed at a low level (P = 1.9 × 10⁻⁶) in serous ovarian cancer (Oncomine, 2008). In the same study, however, significant underexpression of the same protocadherin gene was found in the borderline serous ovarian cancer samples (P = 1.3 × 10⁻⁹). However, these protocadherin alpha genes have significantly low expression in pancreatic cancer (Harada et al., 2008) and bladder cancer (Dyrskjot et al., 2004).

3.14 Pathway Analysis

Genes prioritized for further investigation into gene expression using GEO and Oncomine were also investigated for pathways using Pathway Searcher, BioCarta and KEGG. The GPT gene, located on the long arm of chromosome 8, is involved in the glutamate, aspartate and arginine metabolism pathways. GPT catalyzes the reaction that converts L-alanine into pyruvate, which then feeds into the Krebs cycle. CCNE1 is involved in the cell cycle pathways. At the G1/S checkpoint, CCNE1 binds to RB, which in turn binds to E2F1 and allows the cell to enter into the S phase. Aberrant overexpression of CCNE1 will then lead to increased proliferation of cells. CCNE1 is degraded in the CCNE1 degradation pathway after the cell passes into S phase. CCNE1 is also involved in the p53 pathway since MDM2 negatively regulates RB. The BMP7 gene, found on chromosome 20, is involved in the TGFβ and Hedgehog signalling pathways. These are key pathways in development and cancer. The PAK4 gene, identified with high copy number in this study, is involved in pathways regulating the actin cytoskeleton. PAK4 is activated by Rac and Cdc42 and in turn activates the MAPK signalling pathway. The ACTN4 gene is also involved in pathways regulating the actin cytoskeleton. This means it has a role in the tight junction and adherens junction pathways of cell to cell contact and polarity. The genes found to have low copy number in this study, the defensin beta genes and protocadherin alpha genes, were not listed in any pathways in the Pathway Searcher, KEGG and BioCarta databases.


Chapter 4. Discussion


4.1 Genomic Instability Identified in Serous Ovarian Cancer

It is evident from this study that serous ovarian cancer is a disease marked by high genomic instability. This in itself is not a novel finding in the field of ovarian cancer research (Singer et al., 2005; McCluggage, 2008; Salvador et al., 2008). However, this study has identified novel regions and genes in the ovarian cancer genome that have not been described before.

Chromosomes identified with copy number gains, in order of rank by number of aberrations, were: 1, 7, 10, 8, 19, 12, 6, and 4. Chromosomes with copy number losses were: 17, 1, 11, 15, 5, 7 and 10. Parameters to remove normal variants from the gene list resulted in a series of regions that resembled “blocks” of genes. These regions were located on 1p, 1q, 8q, 12p, 19q and 20q for gains. The range in copy number gains in these regions was 4.2 to 5.1. The “blocks” of genes were located at 8p and 5p for copy number losses. The range for copy number loss was 0.6 to 0.9. One region expected to have high chromosomal instability and potentially high copy number gain was the long arm of chromosome 8. This area of the genome has well documented structural variation, both in the ovarian cancer genome and in other cancer genomes (Cooke et al., 2008; Salinas et al., 2008; Yeager et al., 2008).

Genome-wide approaches for discovering novel cancer driver genes are a new advance in the field of ovarian cancer. Very few studies have examined the ovarian cancer genome at a genome-wide level; a notable exception is the Gorringe et al. (2007) study. Gorringe et al. (2007) found high level gains (greater than 5 copies) on the short arm of chromosome 8 and low level gains (between 2 and 5 copies) on the short arms of chromosomes 1, 12, 17 and 19 and the long arms of chromosomes 2, 14 and 19. The highest level gain found in the Gorringe et al. (2007) study was at 8p12-p11.3, a locus where the FGFR1 receptor is located. Interestingly, in our dataset there was copy number loss at the 8p23 locus but no gains at the 8p12-p11.3 locus. However, there was extensive copy number gain on the long arm of chromosome 8 in this study. There were two regions of copy number gain at 8q21 and 8q24 in 5 and 15 samples respectively. Interestingly, Gorringe et al. (2007) did not report areas of copy number gain on the long arm of chromosome 8. Gorringe et al. (2007) found copy number gains in 35% of their samples at the 12p12.1 locus, suggestive of an aberration in the KRAS gene. In our study 8 samples had a copy number of 4.3 at the 12p13.3 locus. This proximity to the known cancer gene KRAS suggests that there may be some overlap with chromosomal instability at this region of the genome. Finally, the Gorringe et al. (2007) study reported copy number gains at the 2q13, 8p12, 9q34.13, 14q23.2, 17q13.1 and 19p13.12 loci that were not found in our dataset. However, the strict parameters set for driver gene discovery may have eliminated these regions from any subsequent analysis of the original data.

A recent study on the gene expression of serous ovarian and fallopian tube cancers showed amplification at the 1q, 8q, 12p, 20q and 19q loci (Nowee et al., 2007). The amplification at the 19q locus, also found in our study, is likely driven by the classical ovarian cancer gene CCNE1. Interestingly, neither the Nowee et al. (2007) study nor the Gorringe et al. (2007) study reported copy number loss or decreased gene expression on the short arm of chromosome 5. However, in 5 samples in our study, there was a copy number of only 0.6 at the 5q31 locus. The protocadherin alpha genes located in this region are implicated in cytoskeletal stability and cell polarity (Lachmann et al., 2008).

4.2 Cell Polarity Genes and Ovarian Cancer

Several of the genes identified in this study with copy number aberrations are involved in cell polarity and the cytoskeleton. The polarity of a cell is crucial to numerous cellular processes, including cell division, cell death, shape changes, cell migration and differentiation (Bryant and Mostov, 2008). In fact, the three highly conserved cell polarity complexes, PAR, Crumbs and Scribble, are involved in cell proliferation (Goldstein and Macara, 2007). Therefore, it is likely that these cell polarity genes have a role in the pathogenesis of cancer. The atypical protein kinase C gene (aPKC) regulates apical-basal cell polarity in ovarian epithelial cells. The zeta isoform of aPKC, aPKCzeta, has been found to have an apical to cortical redistribution in serous and mucinous ovarian cancers (Grifoni et al., 2007). The aPKCzeta gene is also overexpressed in serous and mucinous ovarian cancer (Grifoni et al., 2007). Cadherin genes, involved in cell polarity, have been associated with ovarian cancer metastasis (Kuwabara et al., 2008). Several genes identified in our study, including PAK4 and ACTN4, are involved in cytoskeletal stability and cell polarity. Therefore, future analysis of cell polarity mechanisms in ovarian cancer cells may be useful.

4.3 Copy Number Alteration in Non-ovarian Cancer

Genome-wide studies of copy number in other cancer types have been conducted in parallel with those in ovarian cancer. Many studies have shown that copy number alterations in cancer are highly variable between tumor types and between individuals with the same tumor type (Smith et al., 1999; Lassus et al., 2001; Albertson et al., 2003). Recurrent copy number aberrations are commonly observed in sarcomas, lymphomas and leukemias (Rowley, 1998; Albertson et al., 2003). In solid tumors, numerous copy number alterations are found, but less consistently across all tumors of the same type (Albertson et al., 2003). Often these areas of copy number change contain hundreds of genes that may be linked to increased cancer risk. The histological and structural complexity of solid tumors also contributes to the high number of copy number alterations detected. Tumors accumulate chromosomal aberrations at different rates and in different patterns across tissue types. For instance, in breast cancer fewer copy number changes are seen in ductal hyperplasia compared to carcinoma in situ (Gong et al., 2001). In contrast, high-grade dysplasias and adenomas have roughly the same frequency of copy number aberrations as carcinoma of the colon (Albertson et al., 2003). In lung cancer, areas of known copy number aberration, such as the MYC, PTEN and CDKN2A loci, have been identified using SNP array technology (Thomas et al., 2006). Novel areas of copy number gain at the 12p11 and 22q11 loci were identified in non-small cell lung cancer using 100,000 probe SNP arrays (Zhao et al., 2005). The regions typically gained or lost in the genomes of lung cancer patients are usually large in size. Thus, Thomas et al. (2006) suggested that higher resolution arrays, such as the Affymetrix Genome-Wide SNP 6.0 array, might not uncover any additional copy number aberrations in lung cancer. However, the higher resolution arrays could be used to define the regions of copy number gain or loss more clearly. Copy number alterations are also prevalent in prostate cancer (Dong, 2006). Copy number loss at the NKX3-1 and CDKN1B loci in prostate tumors was identified using SNP array technology (Dong et al., 2002; Gary et al., 2004). Copy number gain at the MYC locus is often detected in copy number experiments in prostate tumors (Dong, 2006). Overall, copy number alterations are highly prevalent in cancer genomes. Future cancer studies will likely integrate copy number information with gene expression and other newly available genomic data in order to uncover areas of significant interest for driver gene discovery.


4.4 Future Directions

Protein interactions in the prioritized genes of interest from the finalized gene list could be identified using the online I2D database (Brown and Jurisica, 2007) in the future. These protein-protein interactions would be visualized using the NAViGaTOR software (Network Analysis, Visualization & Graphing TORonto, 2008). Multiple studies have used the NAViGaTOR software to visualize protein-protein interaction networks. One particular study of note, authored by Motamed-Khorasani et al. (2007), used the online OPHID database (Online Predicted Human Protein Interaction Database) (Brown and Jurisica, 2005) and NAViGaTOR software to prioritize ovarian cancer genes of interest from gene expression profiling experiments. Knowledge of protein-protein interactions could be used to corroborate other evidence for putative driver genes. These interactions could also be used to direct the researcher into examining specific key pathways of interest.

Gene expression profiling using the Agilent array technology was completed on some of the samples run in this study by Dr. Denis Slamon’s laboratory. In fact, Fejzo et al. (2008) showed that the ADRM1 gene at 20q13 was the overexpressed gene most highly linked to better survival. In this study copy number gain was also found on the long arm of chromosome 20 in 7 samples. Therefore, a future direction would be to obtain and compare these two datasets to further prioritize genes of interest.

The “blocks” of genes seen across the genome were found in different samples. It may be interesting to determine whether these “blocks” recur across the same sets of samples. This information would be useful in two ways. First, recurring patterns in the samples would enable the researcher to cluster samples and corresponding copy number aberrations. If the stage, grade and other clinical data were known for the samples, recurring copy number aberrations could be linked to the sample attributes. For instance, the copy number aberrations observed on chromosomes 1, 8 and 5 could be associated with high grade serous cancers, whereas aberrations on only 1 and 8 could be associated with low grade cases. Second, this sort of information about patterns in samples could be developed into a molecular signature or used to further stratify the subtypes of ovarian cancer. Perou et al. (2000) identified five subtypes of breast cancer, and specific subtypes of ovarian cancer may also be present. The poor prognosis seen in serous cancer may be due to the molecular heterogeneity within the subtype. Serous cancers are already divided into high-grade and low-grade subtypes (McCluggage, 2008). However, further stratification by molecular profile may improve the ability to develop targeted or appropriate drugs for these subtypes. For instance, if a specific signature was associated with a poorer response to carboplatin chemotherapy, the physician might be able to modify the treatment regime to better suit the patient. A far-reaching goal of developing a molecular signature would be to develop a test that could identify women at greater risk for developing ovarian cancer based upon their pattern of gene copy number. This would require many steps before it could actually be put into clinical practice. However, profiling germline DNA from women with ovarian cancer may be useful in determining ovarian cancer pre-disposing haplotypes. A correlation analysis would be required to analyse the similarities between the “blocks” in the genome.
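As a starting point for the proposed analysis of similarities between “blocks”, the overlap in samples between each pair of “blocks” can be scored. The sketch below is one possible approach and not an analysis performed in this study. It uses the Jaccard index (shared samples divided by all samples in either “block”), which is an assumption chosen for illustration; a formal correlation analysis could replace it. The sample IDs are those listed beside the heat maps of the 8q, 8p and 5p “blocks” in Chapter 3, and the dictionary keys and function names are hypothetical.

```python
from itertools import combinations

# Sample IDs as listed beside the heat maps for each "block".
block_samples = {
    "8q": {4184, 2368, 2356, 1909, 1367},
    "8p (0.9)": {1909, 1624, 1070},
    "8p (0.8)": {2850, 1074, 1088, 952, 1624, 1683},
    "5p": {1074, 1367, 1430, 927, 1082},
}


def jaccard(a, b):
    """Fraction of samples shared by two blocks (0 = none, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(blocks):
    """Jaccard index for every unordered pair of blocks."""
    return {
        (x, y): jaccard(blocks[x], blocks[y])
        for x, y in combinations(blocks, 2)
    }


overlap = pairwise_overlap(block_samples)
for (x, y), score in overlap.items():
    shared = sorted(block_samples[x] & block_samples[y])
    print(f"{x} vs {y}: {score:.3f} shared={shared}")
```

With these sample lists each pair of “blocks” shares at most one sample, so a larger cohort with clinical annotation would be needed before any clustering of samples by “block” pattern could be attempted.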

Genes that have been prioritized from the gene list developed in this study may be artificially significant; in vitro validation will show whether they are actually operating in a biological model of ovarian cancer. In vitro validation will include quantitative real time polymerase chain reaction, immunohistochemistry, fluorescence in situ hybridization, gene overexpression and siRNA knockdown of the gene in a human ovarian cancer cell line. The next step after in vitro validation is modelling of these genes in vivo (mouse model). This will establish the actual biological relevance to ovarian cancer of the prioritized and validated gene. This step brings the research closer to the actual aim of developing a molecular target for ovarian cancer therapy in human patients.

4.5 Clinical Implications for Future Research

Ovarian cancer is currently treated in a “one-size-fits-all” manner. This method is clearly not optimal due to the large degree of heterogeneity between ovarian cancer subtypes and the variable prognosis for each subtype. Examination of early stage ovarian cancer tissue samples may be useful in determining key driver genes that are aberrant early in the pathogenesis of the disease. These early driver genes could in turn be targeted for treatment of the disease. In addition to treatment problems, the lack of sensitive screening tools to detect disease at an early stage is problematic in ovarian cancer. As with most cancers, the earlier the stage at diagnosis, the more favourable the patient’s prognosis. Therefore, the need to elucidate better markers of early disease is vast.

4.6 Conclusion

Epithelial ovarian cancer remains a highly lethal type of cancer, mainly because of a lack of understanding of its etiology, lack of known risk factors, relative rarity of disease, lack of good screening tools, and late stage at diagnosis. If patients can be identified at an earlier stage by using a molecular signature this would be of great benefit. Additionally, if a better understanding of the molecular aberrations of the different subtypes of ovarian cancer is elucidated, targeted therapies may be developed. Areas of genomic instability, including 1p, 1q, 5p, 8p, 8q, 12p, 19q and 20q, were identified after applying a rigorous genome-wide approach to a group of 96 serous ovarian cancer samples. These areas contain novel genes with possible important roles in the development of ovarian carcinoma. Based on the history of Her2 and Herceptin outlined in this proposal, it is clear that targeting cancer driver genes will provide new opportunities for the treatment of ovarian cancer. Thus, molecular therapy for ovarian cancer has the potential to vastly improve patient prognosis and overall quality of life.


References


Abir, R., Ben-Haroush, A., Melamed, N., Felz, C., Krissi, H. and Fisch, B. (2008). Expression of bone morphogenetic proteins 4 and 7 and their receptors IA, IB, and II in human ovaries fromfetuses and adults. Fertility and sterility. 89, 1430-1440. Abu-Rustum, N.R. and Aghajanian, C. (1998). Management of malignant germ cell tumors of the ovary. Seminars in oncology. 25, 235-242. Affymetrix (2008). http://www.affymetrix.com Agarwal, R. and Kaye, S.B. (2003). Ovarian Cancer: Strategies for overcoming resistance to chemotherapy. Nature Rev. 3, 502-516.

Albertson, D.G., Collins, C., McCormick, F. and Gray, J.W. (2003). Chromosome aberrations in solid tumors. Nature Genetics. 34, 369-376. Allain, D.C., Sweet, K. and Agnese, D.M. (2007). Management options after prophylactic surgeries in women with BRCA mutations: a review. Cancer control. 14, 330-337.

Aoki-Kinoshita, K.F. and Kanehisa, M. (2007). Gene annotation and pathway mapping in KEGG. Methods in molecular biology. 396, 71-91.

Arai, I., Nagano, H., Kondo, M., Yamamoto, H., Hiraoka, N., Sugita, Y., Ota, H., Yoshioka, S., Nakamura, M., Wada, H., Damdinsuren, B., Kato, H., Marubashi, S., Miyamoto, A., Takeda, Y., Dono, K., Umeshita, K., Nakamori, S., Wakasa, K., Sakon, M. and Monden, M. (2007). Overexpression of MT3-MMP in hepatocellular carcinoma correlates with capsular invasion. Hepatogastroenterology. 54, 167-171.

Armstrong, D.K., Bundy, B., Wenzel, L., Huang, H.Q., Baergen, R., Lele, S., Copeland, L.J., Walker, J.L., Burger, R.A. and Gynecologic Oncology Group. (2006). Intraperitoneal cisplatin and paclitaxel in ovarian cancer. NEJM. 354, 34-43.

Bailey, J.A., Yavor, A.M., Massa, H.F., Trask, B.J., Eichler, E.E. (2001). Segmental duplications: Organization and impact within the current human genome project assembly. Genome Res. 11, 1005–1017.

Baldwin E.L., Lee, J., Blake,D.M., Brian, B.S., Bunke P., Alexander, C.R., Kogan, A.L., Ledbetter, D.H., Martin, C.L. (2008). Enhanced detection of clinically relevant genomic imbalances using a targeted plus whole genome oligonucleotide microarray. Genetics in Medicine. 10: 415- 429. Barbolina, M.V., Adley, B.P., Kelly, D.L., Fought, A.J., Scholtens, D.M., Shea, L.D. and Stack, M.S. (2008). Motility-related actinin alpha-4 is associated with advanced and metastatic ovarian carcinoma. Lab investigation. 88, 602-614. Barrett, T., Troup, D.B., Wilhite, S.E., Ledoux, P., Rudnev, D., Evangelista, C., Kim, I.F., Soboleva, A., Tomashevsky, M., Marshall, K.A., Phillippy, K.H., Sherman, P.M., Muertter, R.N., and Edgar, R. (2008). NCBI GEO: archive for high-throughput functional genomic data. Nucleic acids res. Oct 21 (Epub ahead of print). 118

Bast, R.C., Badgwell, D., Lu, Z., Marquez, R., Rosen, D., Liu, J., Baggerly, K.A., Atkinson, E.N., Skates, S., Zhang, Z., Lokshin, A., Menon, U., Jacobs, I., Lu, K. (2005). New tumor markers: CA125 and beyond. Int. J. Gynecol Cancer. 15, 274-281.

Bayani, J., Brenton, J.D., Macgregor, P.F., Beheshti, B., Albert, M., Nallainathan, D., Karaskova, J., Rosen, B., Murphy, J., Laframboise, S., Zanke, B. and Squire, J.A. (2002). Parallel analysis of sporadic primary ovarian carcinomas by spectral karyotyping, comparative genomic hybridization, and expression microarrays. Cancer Research. 62, 3466-3476.

Beaudet, A.L. and Belmont, J.W. (2008). Array-based DNA diagnostics: Let the revolution begin. Annu. Rev. Med. 59, 113-129.

Bello, M.J. and Rey, J. (1990). Chromosome aberrations in metastatic ovarian cancer: relationships with abnormalities in primary tumors. International Journal of Cancer. 45, 50-54.

Brown, K.R. and Jurisica, I. (2005). Online predicted human interaction database. Bioinformatics. 21, 2076-2082.

Brown, K.R. and Jurisica, I. (2007). http://ophid.utoronto.ca/i2d.

Bryant, D.M. and Mostov, K.E. (2008). From cells to organs: building polarized tissue. Nature Reviews Molecular Cell Biology. 8, 887-901.

Caduff, R.F., Svoboda-Newman, S.M., Ferguson, A.W., Johnston, C.M. and Frank, T.S. (1999). Comparison of mutations of Ki-RAS and p53 immunoreactivity in borderline and malignant epithelial ovarian tumors. American journal of surgical pathology. 23, 323-328.

Cancer Genome Anatomy Project. (2008). http://cgap.nci.nih.gov/Pathways/Pathway_Searcher.

Cannistra, S.A. (2004). Cancer of the ovary. N. Engl. J. Med. 351, 2519-2529.

Carey, L.A., Perou, C.M., Livasy, C.A., et al. (2006). Race, breast cancer subtypes, and survival in the Carolina Breast Cancer Study. JAMA. 295, 2492–2502.

Carter, N.P. (2007). Methods and strategies for analysing copy number variation using DNA microarrays. Nature genetics. 39, S16-S21.

Chan, W.Y., Cheung, K.K., Schorge, J.O., Huang, L.W. et al. (2000). Bcl-2 and p53 protein expression, apoptosis and p53 mutation in human epithelial ovarian cancers. American Journal of Pathology. 156, 409-417.

Chan, J.K., Fuh, K., Shin, J.Y., Cheung, M.K., Powell, C.B., Chen, L.M., Kapp, D.S. and Osann, K. (2008). The treatment and outcomes of early-stage epithelial ovarian cancer: have we made any progress? Br. J. Cancer. 98, 1191-1196.

Chen, V.W., Ruiz, B., Killeen, J.L., Cote, T.R., Wu, X.C. and Correa, C.N. (2003). Pathology and classification of ovarian tumors. Cancer. 15, 2631-2642.


Chen, X., Higgins, J., Cheung, S.T., Li, R., Mason, V., Montgomery, K., Fan, S.T., van de Rijn, M., So, S. (2004). Novel endothelial cell markers in hepatocellular carcinoma. Modern Pathology. 17, 1198-1220.

Chen, S. and Parmigiani, G. (2007). Meta-analysis of BRCA1 and BRCA2 penetrance. Journal of Clinical Oncology. 25, 1329-1333.

Cheung, J., Estivill, X., Khaja, R., MacDonald, J.R., Lau, K., Tsui, L.C., and Scherer S.W. (2003). Genome-wide detection of segmental duplications and potential assembly errors in the human genome sequence. Genome Biology. 4, 25-35.

Cho, K.R. and Shih, I.M. (2008). Ovarian Cancer. Annual review of pathology: mechanisms of disease. 4, 287-313.

Coates, R.L., Kolor, K., Stewart, S.L. and Richardson, L.C. (2008). Diagnostic markers for ovarian cancer screening: not ready for clinical use. 14, OF1-OF2.

Colombo, N., Van gorp, R., Parma, G., Amant, F., Gatta, G., Sessa, C. and Vergote, I. (2006). Ovarian Cancer. Onc. Hematology. 60, 159-179.

Conde, L., Montaner, D., Burguet-Castell, J., Tárraga, J., Al-Shahrour, F., and Dopazo, J. (2007). Functional profiling and gene expression analysis of chromosomal copy number alterations. Bioinformation. 1, 432-435.

Cooke, S.L., Pole, J.C., Chin, S.F., Ellis, I.O., Caldas, C. and Edwards, P.A. (2008). High-resolution array CGH clarifies events occurring on 8p in carcinogenesis. BMC Cancer. 8, 288-303.

Cronje, H.S., Niemand, I., Bam, R.H. and Woodruff, J.D. (1999). Review of the granulosa-theca cell tumors from the Emil Novak ovarian tumour registry. American Journal of obstetrics and gynecology. 180, 323-327.

Cully, M., You, H., Levine, A.J. and Mak, T.W. (2006). Beyond PTEN mutations: the PI3K pathway as an integrator of multiple inputs during tumorigenesis. Nature Rev. Cancer. 6, 184-192.

Cygwin. (2008). http://www.cygwin.com/.

Database of genomic variants. (2008). http://projects.tcag.ca/variation/.

Dang, C.V., O'Donnell, K.A., Zeller, K.I., Nguyen, T., Osthus, R.C. and Li, F. (2006). The c-Myc target gene network. Seminars in cancer biology. 16, 253-264.

De Graeff, P., Crijns, A.P., Ten Hoor, K.A., Klip, H.G., Hollema, H., Oien, K., Bartlett, J.M., Wisman, G.B., de Bock, G.H., de Vries, E.G., de Jong, S. and van der Zee, A.G. (2008). The ErbB signalling pathway: protein expression and prognostic value in epithelial ovarian cancer. British journal of cancer. 99, 341-349.


Dent, J., Hall, G.D., Wilkinson, N., Perren, T.J., Richmond, I., Markham, A.F., Murphy, H. and Bell, S.M. (2003). Cytogenetic alterations in ovarian clear cell carcinoma detected by comparative genomic hybridization. Br. J. Cancer. 88, 1578-1583.

Doak, S.H., Jenkins, S.A., Hurle, R.A., Varma, M., Hawizy, A., Kynaston, H.G. and Parry, J.M. (2007). Bone morphogenic factor gene dosage abnormalities in prostatic intraepithelial neoplasia and prostate cancer. Cancer genetics and cytogenetics. 176, 161-165.

Dong, J.T. (2002). Chromosomal deletions and tumor suppressor genes in prostate cancer. Cancer Metastasis Review. 20, 173–193.

Dong, J.T. (2006). Prevalent mutations in prostate cancer. Journal of cellular biochemistry. 97, 433-447.

Downey, G. (2006). Analysis of a multifactor microarray study using Partek genomics solution. Methods in Enzymology. 411, 256-270.

Drucker, B.J. and Lydon, N.B. (2000). Lessons learned from the development of an Abl tyrosine kinase inhibitor for chronic myelogenous leukemia. J. Clin. Invest. 105, 3-7.

Dunteman, G.H. (1989). Principal Components Analysis. SAGE Publications: Newbury Park, California.

Dyrskjot, L., Kruhoffer, M., Thykjaer, T., Marcussen, N. et al. (2004). Gene expression in the urinary bladder: a common carcinoma in situ gene expression signature exists disregarding histopathological classification. Cancer Research. 64, 4040-8.

Eddy, S.R. (2004). What is a hidden Markov model? Nature biotechnology. 22, 1315-1316.

Erkinheimo, T.L., Lassus, H., Finne, P., van Rees, B.P., Leminen, A., Ylikorkala, O., Haglund, C., Butzow, R. and Ristimäki, A. (2004). Elevated cyclooxygenase-2 expression is associated with altered expression of p53 and SMAD4, amplification of HER-2/neu, and poor outcome in serous ovarian carcinoma. Clinical cancer research. 10, 538-545.

Evans, D.M. and Cardon, L.R. (2004). Guidelines for Genotyping in Genomewide Linkage Studies: Single Nucleotide Polymorphism Maps versus Microsatellite Maps. Am. J. of human genetics. 75, 687-692.

Fantini, M.C. and Pallone, F. (2008). Cytokines: from gut inflammation to colorectal cancer. Current drug targets. 9, 375-380.

Fathalla, M.F. (1971). Incessant ovulation – a factor in ovarian neoplasia? Lancet. 2, 163-172.

Fehrmann, R.S., Li, X.Y., van der Zee, A.G., de Jong, S., Te Meerman, G.J., de Vries, E.G. and Crijns, A.P. (2007). Profiling studies in ovarian cancer: a review. Oncologist. 12, 960-966.


Fejzo, M.S., Dering, J., Ginther, C., Anderson, L., Ramos, L., Walsh, C., Karlan, B., and Slamon, D.J. (2008). Comprehensive analysis of 20q13 genes in ovarian cancer identifies ADRM1 as amplification target. Genes, chromosomes & cancer. 47, 873-883.

Futreal, P.A., Coin, L., Marshall, M., Down, T., Hubbard, T., Wooster, R., Rahman, N., Stratton, M.R. (2004). A census of human cancer genes. Nature Reviews Cancer. 4, 177-183.

Fyles, A.W., Thomas, G.M., Pintilie, M., Ackerman, I. and Levin, W. (1998). A randomized study of two doses of abdominopelvic radiation therapy for patients with optimally debulked Stage I, II, and III ovarian cancer. International journal of radiation oncology, biology and physics. 41, 543-549.

Garber, K. (2006). The second wave in kinase cancer drugs. Nature Biotechnology. 24, 127-130.

Gary, B., Azuero, R., Mohanty, G.S., Bell, W.C., Eltoum, I.E., Abdulkadir, S.A. (2004). Interaction of Nkx3-1 and p27kip1 in prostate tumor initiation. American Journal of Pathology 164, 1607-1614.

GEO: Gene Expression Omnibus. (2008). http://www.ncbi.nlm.nih.gov/geo/.

Gemignani, M.L., Schlaerth, A.C., Bogomolniy, F., Barakat, R.R., Lin, O., Soslow, R., Venkatraman, E. and Boyd, J. (2003). Role of KRAS and BRAF gene mutations in mucinous ovarian carcinoma. Gynecologic oncology. 90, 378-381.

Giancotti, F.G. and Ruoslahti, E. (1999). Integrin signalling. Science. 285, 1028-1032.

Goldstein, B. and Macara, I.G. (2007). The PAR proteins: fundamental players in animal cell polarization. Developmental Cell. 13, 609-622.

Glud, E., Kjaer, S.K., Troisi, R. and Brinton, L.A. (1998). Fertility drugs and ovarian cancer. Epidemiology Rev. 20, 237-257.

Gong, G., DeVries, S., Chew, K.L., Cha, I., Ljung, B.M. and Waldman, F.M. (2001). Genetic changes in paired atypical and usual ductal hyperplasia of the breast by comparative genomic hybridization. Clinical Cancer Research. 7, 2410-2414.

Gorringe, K.L., Jacobs, S., Thompson, E.R., Sridhar, A., Qiu, W., Choong, D.Y.H. and Campbell, I.G. (2007). High-resolution single nucleotide polymorphism array analysis of epithelial ovarian cancer reveals numerous microdeletions and amplifications. Clinical cancer res. 13, 4731-4739.

Gorringe, K.L. and Campbell, I.G. (2008). High-resolution copy number arrays in cancer and the problem of normal genome copy number variation. Genes, Chromosomes and Cancer. 47, 933-938.


Greenman et al. (2007). Patterns of somatic mutation in human cancer genomes. Nature. 446, 153-158.

Grifoni, D., Garoia, F., Bellosta, P., Parisi, F., De Biase, D., Collina, G., Strand, D., Cavicchi, S. and Pession, A. (2007). aPKCzeta cortical loading is associated with Lgl cytoplasmic release and tumor growth in Drosophila and human epithelia. Oncogene. 40, 5960-5965.

Hanahan, D. and Weinberg, R.A. (2000). The Hallmarks of Cancer. Cell. 100, 57-70.

Harada, T., Chelala, C., Bhakta, V., Chaplin, T. et al. (2008). Genome-wide DNA copy number analysis in pancreatic cancer using high-density single nucleotide polymorphism arrays. Oncogene. 27, 1951-60.

Hartmann, L.C., Lu, K.H., Linette, G.P., Cliby, W.A., Kalli, K.R., Gershenson, D., Bast, R.C., Stec, J., Iartchouk, N., Smith, D.I., Ross, J.S., Hoersch, S., Shridhar, V., Lillie, J., Kaufmann, S.H., Clark, E.A. and Damokosh, A.I. (2005). Gene expression profiles predict early relapse in ovarian cancer after platinum-paclitaxel chemotherapy. Clinical cancer research. 11, 2149-2155.

Hashiguchi, Y., Tsuda, H., Inoue, T., et al. (2006). PTEN expression in clear cell adenocarcinoma of the ovary. Gynecologic oncology. 101, 71–5.

Hauptmann, S., Denkert, C., Koch, I., Petersen, S., Schlüns, K., Reles, A., Dietel, M. and Petersen, I. (2002). Genetic alterations in epithelial ovarian tumors analyzed by comparative genomic hybridization. 33, 632-641.

Hehir-Kwa, J.Y., Egmont-Petersen, M., Janssen, I.M., Smeets, D., van Kessel, A.G. and Veltman, J.A. (2007). Genome-wide Copy Number Profiling on High-density Bacterial Artificial Chromosomes, Single-nucleotide Polymorphisms, and Oligonucleotide Microarrays: A Platform Comparison based on Statistical Power Analysis. DNA Res. 14, 1-11.

Heintz, A.P., Odicino, F., Maisonneuve, P., Beller, U., et al. (2003). Carcinoma of the ovary: FIGO annual report. International Journal of Obstetrics. 83, 135-144.

Hendrix, N.D., Wu, R., Kuick, R., Schwartz, D.R., Fearon, E.R. and Cho, K.R. (2006). Fibroblast growth factor 9 has oncogenic activity and is a downstream target of Wnt signaling in ovarian endometrioid adenocarcinomas. Cancer Research. 66, 1354-1362.

Hinds, D.A., Stuve, L.L., Nilsen, G.B., Halperin, E., Eskin, E., Ballinger, D.G., Frazer, K.A. and Cox, D.R. (2005). Whole-genome patterns of common DNA variation in three human populations. Science. 307, 1072-1079.

Hirschhorn, J.N. and Daly, M. (2005). Genome-wide association studies for common diseases and complex traits. Nature rev. Genetics. 6, 95-108.

Hoberman, H.D. (1975). Is there a role for mitochondrial genes in carcinogenesis? Cancer Res. 35, 3332-3335.


Hodgson, G., Hager, J.H., Volik, S., Hariono, S., Wernick, M., Moore, D., Nowak, N., Albertson, D.G., Pinkel, D., Collins, C. et al., (2001). Genome scanning with array CGH delineates regional alterations in mouse islet carcinomas. Nature Genetics. 29, 459–464.

Hoei-Hansen, C.E., Almstrup, K., Nielsen, J.E., Brask-Sonne, S., Graem, N., Skakkebaek, N.E., Leffers, H. and Rajpert-De Meyts, E. (2005). Stem cell pluripotency factor NANOG is expressed in human fetal gonocytes, testicular carcinoma in situ and germ cell tumours. Histopathology. 47, 48-56.

Holschneider, C.H. and Berek, J.S. (2000). Ovarian cancer: epidemiology, biology and prognostic factors. Seminars in surgical oncology. 19, 3-10.

Huber, J.C., Bentz, E.K., Ott, J. and Tempfer, C.B. (2008). Non-contraceptive benefits of oral contraceptives. Expert opinion on pharmacotherapy. 9, 2317-2325.

Human Structural Variation Database. (2008). http://humanparalogy.gs.washington.edu/structuralvariation/.

Human Segmental Duplication Database. (2008). http://projects.tcag.ca/humandup/.

Huse, K., Taudien, S., Groth, M., Rosenstiel, P., Szafranski, K., Hiller, M., Hampe, J., Junker, K., Schubert, J., Schreiber, S., Birkenmeier, G., Krawczak, M. and Platzer, M. (2008). Genetic variants of the copy number polymorphic beta-defensin locus are associated with sporadic prostate cancer. Tumor biology. 29, 83-92.

Iafrate, A.J., Feuk, L., Rivera, M.N., Listewnik, M.L., Donahoe, P.K., Qi, Y., Scherer, S.W., and Lee, C. (2004). Detection of large-scale variation in the human genome. Nature genetics. 36, 949-951.

Iden, S. and Collard, J.G. (2008). Crosstalk between small GTPases and polarity proteins in cell polarization. Nature Reviews Molecular cell biology. 9, 846-859.

Jemal, A., Siegel, R., Ward, E., Murray, T., Xu, J. and Thun, M.J. (2007). Cancer Statistics, 2007. CA Cancer J. Clin. 57, 43-66.

Kennedy, R.D., Quinn, J.E., Johnston, P.G. and Harkin, D.P. (2002). BRCA1: mechanisms of inactivation and implications for management of patients. Lancet. 360, 1007-1014.

Kiechle, M., Jacobsen, A., Schwarz-Boeger, U., Hedderich, J., Pfisterer, J. and Arnold, N. (2001). Comparative genomic hybridization detects genetic imbalances in primary ovarian carcinomas as correlated with grade of differentiation. Cancer. 91, 534-540.

Killackey, M.A. and Neuwirth, R.S. (1988). Evaluation and management of a pelvic mass: a review of 540 cases. Obstetrics and gynecology. 71, 319-322.

Kinzler, K.W. and Vogelstein, B. (1996). Lessons from hereditary colorectal cancer. Cell. 87, 159-170.

Kobel, M., Huntsman, D. and Gilks, C.B. (2008). Critical molecular abnormalities in high-grade serous carcinoma of the ovary. Expert reviews in molecular medicine. 19, 10-14.

Kobayashi, H., Sumimoto, K., Kitanaka, T., Yamada, Y., Sado, T., Sakata, M., Yoshida, S., Kawaguchi, R., Kanayama, S., Shigetomi, H., Haruta, S., Tsuji, Y., Ueda, S. and Terao, T. (2008). Ovarian endometrioma--risks factors of ovarian cancer development. European journal of obstetrics, gynecology and reproductive biology. 138, 187-193.

Komatsu, M., Hiyama, K., Tanimoto, K., Yunokawa, M. et al. (2006). Prediction of individual response to platinum/paclitaxel combination using novel marker genes in ovarian cancers. Molecular cancer therapy. 5, 767-75.

Korn, J.M., Kuruvilla, F.G., McCarroll, S.A., Wysoker, A., Nemesh, J., Cawley, S., Hubbell, E., Veitch, J., Collins, P.J., Darvishi, K., Lee, C., Nizzari, M.M., Gabriel, S.B., Purcell, S., Daly, M.J. and Altshuler, D. (2008). Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs. Nature genetics. 40, 1253-1260.

Kruglyak, L. and Nickerson, D.A. (2001). Variation is the spice of life. Nature genetics. 27, 234-236.

Kurian, A.W., Balise, R.R., McGuire, V. and Whittemore, A.S. (2005). Histologic types of epithelial ovarian cancer: have they different risk factors? Gynecol Oncol. 96, 520-530.

Kurman, R.J. and Shih, I. (2008). Pathogenesis of Ovarian Cancer: Lessons From Morphology and Molecular Biology and Their Clinical Implications. International journal of gynaecological pathology. 27, 151-160.

Kuwabara, Y., Yamada, T., Yamazaki, K., Du, W.L., Banno, K., Aoki, D., Sakamoto, M. (2008). Establishment of an ovarian metastasis model and possible involvement of E-cadherin down-regulation in the metastasis. Cancer Science. 99, 1933-1939.

Lachman, H.M., Petruolo, O.A., Pedrosa, E., Novak, T., Nolan, K., Stopkova, P. (2008). Analysis of protocadherin alpha gene deletion variant in bipolar disorder and schizophrenia. Psychiatric genetics. 18, 110-115.

Landen, C.N., Birrer, M.J. and Sood, A.K. (2008). Early Events in the Pathogenesis of Epithelial Ovarian Cancer. J. Clinical Oncology. 26, 995-1005.

Lassus, H., Laitinen, M.P., Anttonen, M., Heikinheimo, M., Aaltonen, L.A., Ritvos, O. and Butzow, R. (2001). Comparison of serous and mucinous ovarian carcinomas: distinct pattern of allelic loss at distal 8p and expression of transcription factor GATA-4. Lab investigation. 81, 517-526.

Lassus, H., Leminen, A., Vayrynen, A., Cheng, G., Gustafsson, J.A., Isola, J. and Butzow, R. (2004). ERBB2 amplification is superior to protein expression status in predicting patient outcome in serous ovarian carcinoma. Gynecologic oncology. 92, 31-39.


Lassus, H., Sihto, H., Leminen, A., Joensuu, H., Isola, J., Nupponen, N.N. and Butzow, R. (2006). Gene amplification, mutation and protein expression of EGFR and mutations in ERBB2 in serous ovarian carcinoma. Journal of molecular medicine. 84, 671-681.

Ledbetter, D.H. (2008). Cytogenetic Technology – Genotype and Phenotype. N. Engl. J. Med. 359, 1728-1730.

Lee, K.Y., Liu, Y.H., Ho, C.C., Pei, R.J., Yeh, K.T., Cheng, C.C. and Lai, Y.S. (2004). An early evaluation of malignant tendency with plectin expression in human colorectal adenoma and adenocarcinoma. Journal of medicine. 35, 141-149.

Lee, C., Iafrate, J., Brothman, A.R. (2007). Copy number variations and clinical cytogenetic diagnosis of constitutional disorders. Nature gen. 39, S48-S54.

Lee, K.M., et al. (2008). CYP1A1, GSTM1, and GSTT1 polymorphisms, smoking, and lung cancer risk in a pooled analysis among Asian populations. Cancer epidemiology, biomarkers and prevention. 17, 1120-1126.

Levanon, K., Crum, C. and Drapkin, R. (2008). New insights into the pathogenesis of serous ovarian cancer and its clinical impact. Journal of clinical oncology. 26, 5284-5293.

Li, C. and Wong, W.H. (2001). Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection. PNAS. 98, 31–36.

Li, P., Maines-Bandiera, S., Kuo, W.L., Guan, Y., Sun, Y., Hills, M., Huang, G., Collins, C.C., Leung, P.C., Gray, J.W. and Auersperg, N. (2007). Multiple roles of the candidate oncogene ZNF217 in ovarian epithelial neoplastic progression. International journal of cancer. 120, 1863-1873.

Linn, S.C., West, R.B., Pollack, J.R., Zhu, S., Hernandez-Boussard, T., Nielsen, T.O., Rubin, B.P., Patel, R., Goldblum, J.R., Siegmund, D. et al., (2003). Gene expression patterns and gene copy number changes in dermatofibrosarcoma protuberans. American Journal of Pathology 163, 2383–2395.

Liu, Y., Soto, I., Tong, Q., Chin, A., Buhring H.J., Wu, T., Zen, K. and Parkos, C.A. (2005). SIRPbeta1 is expressed as a disulfide-linked homodimer in leukocytes and positively regulates neutrophil transepithelial migration. Journal of biological chemistry. 280, 36132-36140.

Lo, K.C., Stein, L.C., Panzarella, J.A., Cowell, J.K., Hawthorn, L. (2008). Identification of genes involved in squamous cell carcinoma of the lung using synchronized data from DNA copy number and transcript expression profiling analysis. Lung Cancer. 59, 315-331.

Mahtani, R.L. and Macdonald, J.S. (2008). Synergy between cetuximab and chemotherapy in tumors of the gastrointestinal tract. Oncologist. 13, 39-50.

Mano, M.S., Awada, A., Di Leo, A., Durbecq, V., Paesmans, M., Cardoso, F., Larsimont, D. and Piccart, M. (2004). Rates of topoisomerase II-alpha and HER-2 gene amplification and expression in epithelial ovarian carcinoma. Gynecological Oncology. 92, 887–895.

Marasca, R., Maffei, R., Zucchini, P., Castelli, I., Saviola, A., Martinelli, S., Ferrari, A., Fontana, M., Ravanetti, S. and Torelli, G. (2006). Gene expression profiling of acute promyelocytic leukaemia identifies two subtypes mainly associated with flt3 mutational status. Leukemia. 20, 103-114.

Marshall CR, Noor A, Vincent JB, Lionel AC, Feuk L, Skaug J, Shago M, Moessner R, Pinto D, Ren Y, Thiruvahindrapduram B, Fiebig A, Schreiber S, Friedman J, Ketelaars CE, Vos YJ, Ficicioglu C, Kirkpatrick S, Nicolson R, Sloman L, Summers A, Gibbons CA, Teebi A, Chitayat D, Weksberg R, Thompson A, Vardy C, Crosbie V, Luscombe S, Baatjes R, Zwaigenbaum L, Roberts W, Fernandez B, Szatmari P, Scherer SW. (2008). Structural variation of chromosomes in autism spectrum disorder. Am. J. Human Genetics. 82, 477-488.

Massazza, G., Tomasoni, A., Lucchini, V., Allavena, P., Erba, E., Colombo, N., Mantovani, A., D’Incalci, M., Mangioni, C. and Giavazzi, R. (1989). Intraperitoneal and subcutaneous xenografts of human ovarian carcinoma in nude mice and their potential in experimental therapy. Int. J. Cancer. 44, 494-500.

Mayr, D., Kanitz, V., Anderegg, B., Luthardt, B., Engel, J., Lohrs, U., Amann, G. and Diebold, J. (2006). Analysis of gene amplification and prognostic markers in ovarian cancer using comparative genomic hybridization for microarrays and immunohistochemical analysis for tissue microarrays. American journal of clinical pathology. 126, 101-109.

McCluggage, W.G. (2008). My approach to and thoughts on the typing of ovarian carcinomas. Journal of clinical pathology. 61, 152-163.

McCarroll, S.A. and Altshuler, D.M. (2007). Copy-number variation and association studies of human disease. Nature Genetics. 39, S37-S42.

Meinhold-Heerlein, I., Bauerschlag, D., Hilpert, F., Dimitrov, P., Sapinso, L.M. et al. (2005). Molecular and prognostic distinction between serous ovarian carcinomas of varying grade and malignant potential. Oncogene. 24, 1053-1065.

Merritt, M.A., Green, A.C., Nagle, C.M., Webb, P.M., Australian Cancer Study (Ovarian Cancer) and Australian Ovarian Cancer Study Group. (2008). Talcum powder, chronic pelvic inflammation and NSAIDs in relation to risk of epithelial ovarian cancer. International journal of cancer. 122, 170-176.

Miki, Y., Swensen, J., Shattuck-Eidens, D., Futreal, P.A., Harshman, K., Tavtigian, S., Liu, Q., Cochran, C., Bennett, L.M., Ding, W., et al. (1994). A strong candidate for the breast and ovarian cancer susceptibility gene BRCA1. Science. 266, 66-71.

Mitchell, R.N., Kumar, V., Abbas, A.K. and Fausto, N. (2006). Pathologic Basis of Disease, 7th Ed. Saunders Elsevier: Toronto.

Modan, B., Hartge, P., Hirsh-Yechezkel, G., Chetrit, A., Lubin, F., Beller, U., Ben-Baruch, G., Fishman, A., Menczer, J., Ebbers, S.M., Tucker, M.A., Wacholder, S., Struewing, J.P., Friedman, E., Piura, B. and National Israel Ovarian Cancer Study Group. (2001). Parity, oral contraceptives, and the risk of ovarian cancer among carriers and noncarriers of a BRCA1 or BRCA2 mutation. NEJM. 345, 235-240.

Monzon, F.A., Hagenkord, J.M., Lyons-Wieler, M.A., Balani, J.P., Parwani, A.V., Sciulli, C.M., Li, J., Chandran, U.R., Bastacky, S.I., Dhir, R. (2008). Whole genome SNP arrays as a potential diagnostic tool for the detection of characteristic chromosomal aberrations in renal epithelial tumors. Modern Path. 21, 599-608.

Moorman, P.G., Calingaert, B., Palmieri, R.T., Iversen, E.S., Bentley, R.C., Halabi, S., Berchuck, A. and Schildkraut, J.M. (2008). Hormonal risk factors for ovarian cancer in premenopausal and postmenopausal women. American journal of epidemiology. 167, 1059-1069.

Morais, R., Zinkewich-Péotti, K., Parent, M., Wang, H., Babai, F., Zollinger, M. (1994). Tumor-forming ability in athymic nude mice of human cell lines devoid of mitochondrial DNA. Cancer Res. 54, 3889-3896.

Motamed-Khorasani, A., Jurisica, I., Letarte, M., Shaw, P.A., Parkes, R.K., Zhang, X., Evangelou, A., Rosen, B., Murphy, K.J. and Brown, T.J. (2007). Differentially androgen-modulated genes in ovarian epithelial cells from BRCA mutation carriers and control patients predict ovarian cancer survival and disease progression. Oncogene. 26, 198-214.

Motoyama, K., Tanaka, F., Kosaka, Y., Mimori, K., Uetake, H., Inoue, H., Sugihara, K. and Mori, M. (2008). Clinical significance of BMP7 in human colorectal cancer. Annals of surgical oncology. 15, 1530-1537.

Mukherjee, S. and Mitra, S. (2004). Hidden Markov Models, Grammars, and Biology: A tutorial. Journal of Bioinformatics and Computational Biology. 3, 491-526.

Murayama, H., Fukuda, Y., Tsunekawa, S., Ikemoto, M. and Nagata, A. (2007). Ratio of serum ornithine carbamoyltransferase to alanine aminotransferase as a potent indicator for hepatocellular carcinoma. Clinical biochemistry. 40, 1077-1080.

Murray, S.S., Oliphant, A., Sten, R., McBride, C., Steeke, R.J., Shannon, S.G., Rubano, T., Kermani, B.G., Fan, J., Chee, M.S. and Hansen, M.S. (2004). A highly informative SNP linkage panel for human genetic studies. Nature methods. 1, 1-5.

Nakao, K., Mehta, K.R., Fridlyand, J., Moore, D.H., Jain, A.N., Lafuente, A., Wiencke, J.W., Terdiman, J.P. and Waldman, F.M. (2004). High-resolution analysis of DNA copy number alterations in colorectal cancer by array-based comparative genomic hybridization. Carcinogenesis. 25, 1345-1357.

Nakayama, K., Nakayama, N., Jinawath, N., Salani, R., Kurman, R.J. et al. (2007). Amplicon profiles in ovarian serous carcinomas. International Journal of cancer. 120, 2613-2617.

Nagle, C.M., Bain, C.J., Green, A.C. and Webb, P.M. (2008). The influence of reproductive and hormonal factors on ovarian cancer survival. International journal of gynaecological cancer. 18, 407-413.


Network Analysis, Visualization & Graphing TORonto. (2008). http://ophid.utoronto.ca/navigator/

NCBI: National Center for Biotechnology Information. (2008). http://www.ncbi.nlm.nih.gov/.

Ness, R.B. and Cottreau, C. (1999). Possible role of ovarian epithelial inflammation in ovarian cancer. Journal of the national cancer institute. 91, 1459-1467.

Ness, R.B., Grisso, J.A., Cottreau, C., Klapper, J., Vergona, R., Wheeler, J.E., Morgan, M. and Schlesselman, J.J. (2000). Factors related to inflammation of the ovarian epithelium and risk of ovarian cancer. Epidemiology. 11, 111-117.

Neves-E-Castro, M. (2008). Association of ovarian and uterine cancers with postmenopausal hormonal treatments. Clinical obstetrics and gynecology. 51, 607-617.

Nowee, M.E., Snijders, A.M., Rockx, D.A., de Wit, R.M., Kosma, V.M., Hämäläinen, K., Schouten, J.P., Verheijen, R.H., van Diest, P.J., Albertson, D.G. and Dorsman, J.C. (2007). DNA profiling of primary serous ovarian and fallopian tube carcinomas with array comparative genomic hybridization and multiplex ligation-dependent probe amplification. Journal of pathology. 213, 46-55.

Okuda, T., Otsuka, J., Sekizawa, A., Saito, H., Makino, R., Kushima, M., Farina, A., Kuwano, Y. and Okai, T. (2003). p53 mutations and overexpression affect prognosis of ovarian endometrioid cancer but not clear cell cancer. Gynecologic oncology. 88, 318-325.

Oliva, E., Sarrio, D., Brachtel, E.F., Sanchez-Estevez, C., Soslow, R.A., Moreno-Bueno, G. and Palacios, J. (2006). High frequency of beta-catenin mutations in borderline endometrioid tumours of the ovary. Journal of pathology. 208, 708-713.

Olivier, R.I., van Beurden, M., van’t Veer, L.J. (2006). The role of gene expression in the clinical management of ovarian cancer. Eur. J. Cancer. 42, 2930-2938.

Olshen, A.B., Venkatraman, E.S., Lucito, R. and Wigler, M. (2004). Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics. 5, 557-572.

Oncomine (Research) – Cancer Profiling Database (2008). http://www.oncomine.org/

O'Neill, C.J., Deavers, M.T., Malpica, A., Foster, H. and McCluggage, W.G. (2005). An immunohistochemical comparison between low-grade and high-grade ovarian serous carcinomas: significantly higher expression of p53, MIB1, BCL2, HER-2/neu, and C-KIT in high-grade neoplasms. American journal of surgical pathology. 29, 1034-1041.

O'Neill, C.J., McBride, H.A., Connolly, L.E., Deavers, M.T., Malpica, A. and McCluggage, W.G. (2007). High-grade ovarian serous carcinoma exhibits significantly higher p16 expression than low-grade serous carcinoma and serous borderline tumour. Histopathology. 50, 773-779.

Osterberg, L., Levan, K., Partheen, K., Helou, K. and Horvath, G. (2005). Cytogenetic analysis of carboplatin resistance in early-stage epithelial ovarian carcinoma. Cancer genetics and cytogenetics. 163, 144-150.

Pan, J.G. and Mak, T.W. (2007). Metabolic targeting as an anticancer strategy: Dawn of a new era? Science Signalling. 381, 14-20.

Peppercorn, J., Perou, C.M. and Carey, L. (2008). Molecular subtypes in breast cancer: divide and conquer. Cancer investigation. 26, 1-10.

Perou, C.M., Sorlie, T., Eisen, M.B., van de Rijn, M., Jeffrey, S.S., Rees, C.A., Pollack, J.A., Ross, D.T., Johnsen, H., Akslen, L.A., Fluge, O., Pergamenshikov, A., Williams, C., Zhu, S.X., Lonning, P.E., Borressen-Dale, A., Brown, P.O. and Botstein, D. (2000). Molecular portraits of human breast tumors. Nature. 406, 747-752.

Peters, D., Freund, J. and Ochs, R.L. Genome-wide transcriptional analysis of carboplatin response in chemosensitive and chemoresistant ovarian cancer cells. Molecular cancer therapy. 4, 1605-16.

Pico, A.R., Kelder, T., van Iersel, M.P., Hanspers, K., Conklin, B.R., and Evelo, C. (2008). WikiPathways: pathway editing for the people. PLoS Biology. 6, 1403-1407.

Piek, J.M.J., Kenemens, P., Zweemer, R.P., van Diest, P.J. and Verheigen, R.H. (2007). Ovarian carcinogenesis, an alternative theory. Gynecologic oncology. 107, 355-358.

Piestun, D., Kochupurakkal, B.S., Jacob-Hirsch, J., Zeligson, S., Koudritsky, M., Domany, E., Amariglio, N., Rechavi, G. and Givol, D. (2006). Nanog transforms NIH3T3 cells and targets cell-type restricted genes. Biophysical and biochemical research communications. 343, 279-285.

Pike, M.C., Spicer, D.V., Dahmoush, L. and Press, M.F. (1993). Estrogens, progestogens, normal cell proliferation, and breast cancer risk. Epidemiology Reviews. 15, 17–35.

Pinkel, D. and Albertson, D.G. (2005). Array comparative genomic hybridization and its applications in cancer. Nature genetics. 37, S11-S17.

Piotrowski, A., Bruder, C.E.G., Andersson, R., de Stahl, T.D., Menzel, U., Sangren, J., Poplawski, A., van Tell, D., Crasto, C., Bogdan, A., Bartoszewski, R., Bebok, Z., Kryzanowski, M., Janowski, Z., Partridge, E.C., Komoroski, J. and Dumanski, J.P. (2008). Somatic mosaicism for copy number variation in differentiated human tissue. Human mutation. 29, 1118-1124.

Pollack, J.R., Sorlie, T., Perou, C.M., Rees, C.A., Jeffrey, S.S., Lonning, P.E., Tibshirani, R., Botstein, D., Borresen-Dale, A.L. and Brown, P.O. (2002). Microarray analysis reveals a major direct role of DNA copy number alteration in the transcriptional program of human breast tumors. PNAS. 99, 12963–12968.

Pyeon, D., Newton, M.A., Lambert, P.F., den Boon, J.A. et al. (2007). Fundamental differences in cell cycle deregulation in human papillomavirus-positive and human papillomavirus-negative head/neck and cervical cancers. Cancer Research. 67, 4605-19.

Riener, M.O., Nikolopoulos, E., Herr, A., Wild, P.J., Hausmann, M., Wiech, T., Orlowska-Volk, M., Lassmann, S., Walch, A., Werner, M. (2008). Microarray comparative genomic hybridization analysis of tubular breast carcinoma shows recurrent loss of the CDH13 locus on 16q. Human Pathology. 39, 1621-1629.

Rennstam, K., Ahlstedt-Soini, M., Baldetorp, B., Bendahl, P.O., Borg, A., Karhu, R., Tanner, M., Tirkkonen, M. and Isola, J. (2003). Patterns of chromosomal imbalances defines subgroups of breast cancer with distinct clinical features and prognosis. A study of 305 tumors by comparative genomic hybridization. Cancer Res. 63, 8861-8868.

Rhodes, D.R., Kalyana-Sundaram, S., Mahavisno, V., Varambally, R., Yu, J., Briggs, B.B., Barrette, T.R., Anstet, M.J., Kincead-Beal, C., Kulkarni, P., Varambally, S., Ghosh, D., and Chinnaiyan, A.M. (2007). Oncomine 3.0: genes, pathways, and networks in a collection of 18,000 cancer gene expression profiles. Neoplasia. 9, 166-180.

Rodriguez-Rodriguez, L., Sancho-Torres, I., Mesonero, C., Gibbon, D.G., Shih, W.J. and Zotalis, G. (2003). The CD44 receptor is a molecular predictor of survival in ovarian cancer. Medical oncology. 20, 255-263.

Rossing, M.A., Cushing-Haugen, K.L., Wicklund, K.G., Doherty, J.A. and Weiss, N.S. (2008). Risk of epithelial ovarian cancer in relation to benign ovarian conditions and ovarian surgery. Cancer causes & control. Aug 14 (Epub ahead of print).

Rowley, J.D. (1998). The critical role of chromosome translocations in human leukemias. Annual review of genetics. 32, 495-519.

Ryley, D.A., Wu, H.H., Leader, B., Zimon, A., Reindollar, R.H. and Gray, M.R. (2005). Characterization and mutation analysis of the human formin-2 (FMN2) gene in women with unexplained infertility. Fertility and Sterility. 83, 1363-1371.

Salinas, C.A., Kwon, E., Carlson, C.S., Koopmeiners, J.S., Feng, Z., Karyadi, D.M., Ostrander, E.A. and Stanford, J.L. (2008). Multiple independent genetic variants in the 8q24 region are associated with prostate cancer risk. Cancer epidemiology, biomarkers and prevention. 17, 1203-1213.

Salvador, S., Rempel, A., Soslow, R.A., Gilks, B., Huntsman, D. and Miller, D. (2008). Chromosomal instability in fallopian tube precursor lesions of serous carcinoma and frequent monoclonality of synchronous ovarian and fallopian tube mucosal serous carcinoma. Gynecologic Oncology. 110, 408-417.

Sanders, M.A., Verhaak, R.G.W., Geertsma-Kleinekoort, W.M.C., Abbas, S., Horsman, S., van der Spek, P., Lowenburg, B. and Valk, P.J.M. (2008). SNPExpress: integrated visualization of genome-wide genotypes, copy numbers and gene expression levels. BMC Genomics. 9, 1-7.

Scherer, S.W., Lee, C., Birney, E., Altshuler, D.M., Eichler, E.E., Carter, N.P., Hurles, M.E., and Feuk, L. (2007). Challenges and standards in integrating surveys of structural variation. Nature genetics. 39, S7-S15.

Segal, E., Friedman, N., Kaminski, N., Regev, A., Koller, D. (2005). From signatures to models: understanding cancer using microarrays. Nature genetics. 37, S38-S45.

Sharp, A.J., Locke, D.P., McGrath, S.D., Cheng, Z., Bailey, J.A., Vallente, R.U., Pertz, L.M., Clark, R.A., Schwartz, S., Segraves, R., Oseroff, V.V., Albertson, D.G., Pinkel, D., and Eichler, E.E. (2005). Segmental duplications and copy-number variation in the human genome. American journal of human genetics. 77, 78-88.

She, X., Jiang, Z., Clark, R.A., Liu, G., Cheng, Z., Tuzun, E., Church, D.M., Sutton, G., Halpern, A.L., Eichler, E.E. (2004). Shotgun sequence assembly and recent segmental duplications within the human genome. Nature. 431, 927-930.

Shridhar, V., Lee, J., Pandita, A., Iturria, S., Avula, R., Staub, J., Morrissey, M., Calhoun, E., Sen, A., Kalli, K., Keeney, G., Roche, P., Cliby, W., Lu, K., Schmandt, R., Mills, G.B., James, C.D., Couch, F.J., Hartmann, L.C., Lillie, J. and Smith, D.I. (2001). Genetic analysis of early- versus late-stage ovarian tumors. Cancer research. 61, 5895-5904.

Singer, G., Stöhr, R., Cope, L., Dehari, R., Hartmann, A., Cao, D.F., Wang, T.L., Kurman, R.J. and Shih, I. (2005). Patterns of p53 mutations separate ovarian serous borderline tumors and low- and high-grade carcinomas and provide support for a new model of ovarian carcinogenesis: a mutational analysis with immunohistochemical correlation. American journal of surgical pathology. 29, 218-224.

Skirnisdottir, I., Sorbe, B., Karlsson, M. and Seidal, T. (2001). Prognostic importance of DNA ploidy and p53 in early stages of epithelial ovarian carcinoma. International journal of oncology. 19, 1295-1302.

Skirnisdottir, I., Seidal, T., Karlsson, M.G. and Sorbe, B. (2005). Clinical and biological characteristics of clear cell carcinomas of the ovary in FIGO stages I-II. International journal of oncology. 26, 177-183.

Skubitz, A., Pambuccian, S., Argenta, P. and Skubitz, K. (2006). Differential gene expression identifies subgroups of ovarian cancer. Translational Res. 148, 223-248.

Slamon, D.J., Clark, G.M., Wong, S.G., Levin, W.J., Ullrich, A., McGuire, W.L. (1987). Human Breast Cancer: Correlation of Relapse and Survival with Amplification of the Her2/neu oncogene. Science. 235, 177-182.

Slamon, D.J., Leyland-Jones, B., Shak, S., Fuchs, H., Paton, V., Bajamonde, A., Fleming, T., Eiermann, W., Wolter, J., Pegram, M., Baselga, J., Norton, L. (2001). Use of chemotherapy plus a monoclonal antibody against Her2 for metastatic breast cancer that overexpresses Her2. N Engl J Med. 344, 783-792.

Smith, J.S., Alderete, B., Minn, Y., Borell, T.J., Perry, A., Mohapatra, G., Hosek, S.M., Kimmel, D., O'Fallon, J., Yates, A., Feuerstein, B.G., Burger, P.C., Scheithauer, B.W. and Jenkins, R.B. (1999). Localization of common deletion regions on 1p and 19q in human gliomas and their association with histological subtype. Oncogene. 18, 4144-4152.

Somiari, S.B., Shriver, C.D., He, J., Parikh, K., Jordan, R., Hooke, J., Hu, H., Deyarmin, B., Lubert, S., Malicki, L., Heckman, C. and Somiari, R.I. (2004). Global search for chromosomal abnormalities in infiltrating ductal carcinoma of the breast using array-comparative genomic hybridization. Cancer genetics and cytogenetics. 155, 108-118.

Sonoda, G., Palazzo, J., du Manoir, S., Godwin, A.K., Feder, M., Yakushiji, M. and Testa, J.R. (1997). Comparative genomic hybridization detects frequent overrepresentation of chromosomal material from 3q26, 8q24, and 20q13 in human ovarian carcinomas. Genes, chromosomes and cancer. 20, 320-328.

Soria, G. and Ben-Baruch, A. (2008). The inflammatory chemokines CCL2 and CCL5 in breast cancer. Cancer letters. 267, 271-285.

Spentzos, D., Levine, D.A., Ramoni, M.F., Joseph, M., Gu, X., Boyd, J., Libermann, T.A. and Cannistra, S.A. (2004). Gene expression signature with independent prognostic significance in epithelial ovarian cancer. Journal of clinical oncology. 22, 4700-4710.

Stachowiak, M.K., Maher, P.A. and Stachowiak, E.K. (2007). Integrative nuclear signaling in cell development--a role for FGF receptor-1. DNA and cell biology. 26, 811-826.

Stern, R.C., Dash, R., Bentley, R.C., Snyder, M.J., Haney, A.F. and Robboy, S.J. (2001). Malignancy in endometriosis: frequency and comparison of ovarian and extraovarian types. International journal of gynaecological pathology. 20, 133-139.

Struewing, J.P., Hartge, P., Wacholder, S., Baker, S.M., Berlin, M., McAdams, M., Timmerman, M.M., Brody, L.C. and Tucker, M.A. (1997). The risk of cancer associated with specific mutations of BRCA1 and BRCA2 among Ashkenazi Jews. New England Journal of Medicine. 336, 1401-1408.

Syvanen, A. (2005). Toward genome-wide SNP genotyping. Nature genetics. 37, S5-S10.

Tan, D.S.P. and Kaye, S. (2007). Ovarian clear cell adenocarcinoma: a continuing enigma. J. Clin. Pathol. 60, 355-360.

Tan, D.S., Rothermundt, C., Thomas, K., Bancroft, E., Eeles, R., Shanley, S., Ardern-Jones, A., Norman, A., Kaye, S.B. and Gore, M.E. (2008). "BRCAness" Syndrome in Ovarian Cancer: A Case-Control Study Describing the Clinical Features and Outcome of Patients With Epithelial Ovarian Cancer Associated With BRCA1 and BRCA2 Mutations. Journal of clinical oncology. Epub ahead of print.

Tewari, K., Cappuccini, F., Disaia, P.J., Berman, M.L., Manetta, A. and Kohler, M.F. (2000). Malignant germ cell tumors of the ovary. Obstetrics and gynecology. 95, 128-133.

The International HapMap consortium. (2003). The International HapMap project. Nature. 426, 789-796.

Thigpen, J.T., Vance, R.B. and Khansur, T. (1993). Second-line chemotherapy for recurrent carcinoma of the ovary. Cancer. 71, 1559-1564.

Thomas, R.K., Weir, B. and Meyerson, M. (2006). Genomic approaches to lung cancer. Clinical Cancer Research. 12, S4384-S4391.

Tone, A.A., Begley, H., Sharma, M., Murphy, J., Rosen, B., Brown, T.J. and Shaw, P.A. (2008). Gene expression profiles of luteal phase fallopian tube epithelium from BRCA mutation carriers resemble high-grade serous carcinoma. Clinical cancer research. 14, 4067-4078.

Torkamani, A. and Schork, N.J. (2008). Prediction of cancer driver mutations in protein kinases. Cancer Research. 68, 1675-1682.

Tsuchiya, A., Sakamoto, M., Yasuda, J., Chuma, M., Ohta, T., Ohki, M., Yasugi, T., Taketani, Y., Hirohashi, S. (2003). Expression Profiling in Ovarian Clear Cell Carcinoma. Am. J. Path. 163, 2503-2512.

Tummala, M.K., Alagarsamy, S. and McGuire, W.P. (2008). Intraperitoneal chemotherapy: standard of care for patients with minimal residual stage III ovarian cancer? Expert review of anticancer therapy. 8, 1135-1147.

UCSC Genome Browser (2008). http://genome.ucsc.edu.

Vasto, S., Carruba, G., Candore, G., Italiano, E., Di Bona, D. and Caruso, C. (2008). Inflammation and prostate cancer. Future oncology. 4, 637-645.

Visintin, I., Feng, Z., Longton, G., Ward, D.C., Alvero, A.B., Lai, Y., Tenthorey, J., Leiser, A., Flores-Saaib, R., Yu, H., Azori, M., Rutherford, T., Schwartz, P.E. and Mor, G. (2008). Diagnostic markers for early detection of ovarian cancer. Clinical Cancer Res. 14, 1065-1072.

Vogelstein, B., Lane, D., Levine, A.J. (2000). Surfing the p53 network. Nature. 408, 307-310.

Wang, D.G., Fan, J.B., Siao, C.J., Berno, A., Young, P., Sapolsky, R., Ghandour, G., Perkins, N., Winchester, E., Spencer, J., et al. (1998). Large-scale identification, mapping, and genotyping of single nucleotide polymorphisms in the human genome. Science. 280, 1077-82.

Wang, W.Y.S., Barratt, B.J., Clayton, D.G. and Todd, J.A. (2005). Genome-wide association studies: theoretical and practical concerns. Nature reviews genetics. 6, 109-118.

Wang, K., Li, M., Hadley, D., Liu, R., Glessner, J., Grant, S.F.A., Hakonarson, H., Bucan, M. (2008). PennCNV: An integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Res. 17, 1665-1674.

Weinberg, R.A. (2007). The biology of cancer. Garland: New York, NY.

Weiss, M.M., Snijders, A.M., Kuipers, E.J., Ylstra, B., Pinkel, D., Meuwissen, S.G.M., Van Diest, P.J., Albertson, D.G. and Meijer, G.A. (2003). Determination of amplicon boundaries at 20q13.2 in tissue samples of human gastric adenocarcinomas by high-resolution microarray comparative genomic hybridization. The Journal of Pathology. 200, 320-326.

Weir, B.A. et al. (2007). Characterizing the cancer genome in lung adenocarcinoma. Nature. 450, 893-901.

Widakowich, C., de Castro, G., de Azambuja, E., Dinh, P. and Awada, A. (2007). Review: side effects of approved molecular targeted therapies in solid cancers. Oncologist. 12, 1443-1455.

Wiseman, R. (2004). Breast cancer: critical data analysis concludes that estrogens are not the cause, however lifestyle changes can alter risk rapidly. Journal of Clinical Oncology. 57, 766-772.

Xming. (2008). http://www.straightrunning.com/XmingNotes/.

Xu, B., Roos, J.L., Levy, S., van Rensburg, E.J., Gogos, J.A. and Karayiorgou, M. (2008). Strong association of de novo copy number mutations with sporadic schizophrenia. Nature genetics. 40, 880-885.

Yan, W. and Chen, X. (2007). Targeted repression of bone morphogenetic protein 7, a novel target of the p53 family, triggers proliferative defect in p53-deficient breast cancer cells. Cancer Research. 67, 9117-9124.

Yeager, M., Xiao, N., Hayes, R.B., Bouffard, P., Desany, B., Burdett, L., Orr, N., Matthews, C., Qi, L., Crenshaw, A., Markovic, Z., Fredrikson, K.M., Jacobs, K.B., Amundadottir, L., Jarvie, T.P., Hunter, D.J., Hoover, R., Thomas, G., Harkins, T.T. and Chanock, S.J. (2008). Comprehensive resequence analysis of a 136 kb region of human chromosome 8q24 associated with prostate and colon cancers. Human genetics. 124, 161-170.

Yi, M., Horton, J.D., Cohen, J.C., Hobbs, H.H. and Stephens, R.M. (2006). WholePathwayScope: a comprehensive pathway-based analysis tool for high-throughput data. BMC Bioinformatics. 7, 30-54.

Yin, X.L., Chen, S. and Gu, J.X. (2002). Identification of TH1 as an interaction partner of A-Raf kinase. Molecular and cellular biochemistry. 231, 69-74.

Ylstra, B., van den IJssel, P., Carvalho, B., Brakenhoff, R.H., Meijer, G.A. (2006). BAC to the future! or oligonucleotides: a perspective for micro array comparative genomic hybridization (array CGH). Nucl. Acids Res. 34, 445-450.

Yu, Y., Baras, A.S., Shirasuna, K., Moskaluk, C.A. (2007). Concurrent loss of heterozygosity and copy number analysis in adenoid cystic carcinoma by SNP genotyping arrays. Lab investigation. 87, 430-439.

Yuan, T.L. and Cantley, L.C. (2008). PI3K pathway alterations in cancer: variations on a theme. Oncogene. 27, 5497-5510.

Zhang, L., Conejo-Garcia, J.R., Katsaros, D., Gimotty, P.A., Massobrio, M., Regnani, G., Makrigiannakis, A., Gray, H., Schlienger, K., Liebman, M.N., Rubin, S.C. and Coukos, G. (2003). Intratumoral T cells, recurrence, and survival in epithelial ovarian cancer. NEJM. 348, 203-213.

Zhang, Q., Wu, J., Nguyen, A., Wang, B.D., He, P., Laurent, G.S., Rennert, O.M. and Su, Y.A. (2008). Molecular mechanism underlying differential apoptosis between human melanoma cell lines UACC903 and UACC903(+6) revealed by mitochondria-focused cDNA microarrays. Apoptosis. 13, 993-1004.

Zhao, X., Li, C., Paez, G., Chin, K., Janne, P.A., Chen, T., Girard, L., Minna, J., Christiani, D., Leo, C., Gray, J.W., Sellers, W.R. and Meyerson, M. (2004). An integrated view of copy number and allelic alterations in the cancer genome using single nucleotide polymorphism arrays. Cancer Res. 64, 3060-3071.

Zhao, X., Weir, B.A., La Framboise, T., et al. (2005). Homozygous deletions and chromosome amplification in human lung carcinomas revealed by single nucleotide polymorphism array analysis. Cancer Research. 65, 5561-5570.

Zorn, K.K., Bonome, T., Gangi, L., Chandramouli, G.V., Awtrey, C.S., Gardner, G.J., Barrett, J.C., Boyd, J. and Birrer, M.J. (2005). Gene expression profiles of serous, endometrioid, and clear cell subtypes of ovarian and endometrial cancer. Clinical cancer research. 11, 6422-6430.

Zreik, T.G., Ayoub, C.M., Hannoun, A., Karam, C.J. and Munkarah, A.R. (2008). Fertility drugs and risk of ovarian cancer: dispelling the myth. Current opinion in obstetrics and gynecology. 20, 313-319.

Appendix

Epithelial ovarian cancer histological subtype Number of samples

Serous 100

Normal 10

Endometrioid 10

Borderline serous 7

Benign 5

Clear cell 5

Metastasis 4

Malignant mixed mullerian tumor (MMMT) 4

Mucinous 3

Squamous 3

Borderline mucinous 2

Transitional cell 2

Table 1. Initial breakdown of the ovarian samples received from Dr. Denis Slamon. DNA from 155 samples was received from Dr. Denis Slamon's laboratory and run on the Affymetrix Genome-Wide SNP Array 6.0.

[Histogram; y-axis: frequency of aberration; x-axis: sample number]

Figure 3. Histogram of sample number distribution. Most genomic aberrations are found in only 2 samples. Copy number gain is shown in black, copy number loss in grey.

Initial gene list: 1525 copy number gains, 397 copy number losses

Remove aberrations found in fewer than 5 samples for gains, fewer than 3 samples for losses
Result: 513 copy number gains, 117 copy number losses

Remove single gene copy number entries
Result: 508 copy number gains, 86 copy number losses

Remove genes that were in the normal sample gene list
Result: 508 copy number gains, 85 copy number losses

Remove obvious normal polymorphic genes, e.g. olfactory receptor genes
Result: 469 copy number gains, 78 copy number losses

Only keep regions of copy number aberration spanning at least 10 genes or at least 150 kb

Final gene list: 447 copy number gains, 38 copy number losses

Figure 4. Flow chart displaying the application of gene list parameters. As more parameters were applied, the gene lists for both copy number gains and losses became shorter.
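The filtering steps summarized in Figure 4 can be illustrated with a minimal Python sketch. This is not the code used in the thesis. The record layout (`GeneCall`), the function name `filter_calls`, the olfactory receptor symbol pattern, the reading of "single gene entries" as regions containing one gene, and all example values are assumptions made for illustration only.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneCall:
    """One gene-level copy number call (hypothetical record layout)."""
    symbol: str          # gene symbol
    kind: str            # "gain" or "loss"
    n_samples: int       # number of tumour samples showing the aberration
    region_genes: int    # number of genes in the aberrant region
    region_kb: float     # length of the aberrant region in kb


# Thresholds taken from Figure 4.
MIN_SAMPLES = {"gain": 5, "loss": 3}
MIN_REGION_GENES = 10
MIN_REGION_KB = 150.0

# Assumed pattern for olfactory receptor symbols such as OR4F5 or OR51E2.
OLFACTORY = re.compile(r"^OR\d+[A-Z]+\d+")


def filter_calls(calls, normal_symbols):
    """Apply the Figure 4 filters in order and return the surviving calls."""
    kept = []
    for call in calls:
        if call.n_samples < MIN_SAMPLES[call.kind]:
            continue  # aberration found in too few samples
        if call.region_genes <= 1:
            continue  # single gene copy number entry
        if call.symbol in normal_symbols:
            continue  # also present in the normal sample gene list
        if OLFACTORY.match(call.symbol):
            continue  # obvious normal polymorphic gene
        if call.region_genes < MIN_REGION_GENES and call.region_kb < MIN_REGION_KB:
            continue  # region spans neither 10 genes nor 150 kb
        kept.append(call)
    return kept


# Illustrative input only; these are not values from the dataset.
example_calls = [
    GeneCall("GENE_A", "gain", 10, 12, 900.0),  # passes every filter
    GeneCall("GENE_B", "gain", 4, 15, 500.0),   # too few samples for a gain
    GeneCall("OR4F5", "gain", 8, 12, 300.0),    # olfactory receptor gene
    GeneCall("GENE_D", "loss", 3, 1, 200.0),    # single gene entry
    GeneCall("GENE_E", "loss", 5, 11, 400.0),   # present in normal samples
    GeneCall("GENE_F", "gain", 6, 3, 80.0),     # region too small
    GeneCall("GENE_G", "loss", 3, 2, 150.0),    # kept: region is 150 kb
]
kept = filter_calls(example_calls, normal_symbols={"GENE_E"})
print([call.symbol for call in kept])
```

Gains and losses use different sample thresholds, so the threshold is looked up by aberration type rather than hard-coded in the loop.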

Copy number | Chromosomal region | Number of samples | Gene found in region | Gene annotation | Function
5.1 | 19q | 6 | SYCN | syncollin | Trafficking
5.1 | 19q | 6 | SIRT2 | sirtuin (silent mating type information regulation 2 homolog) 2 | Cytoskeleton
5.1 | 19q | 6 | FAM98C | family with sequence similarity 98, member C | Unknown
5.1 | 19q | 6 | PLEKHG2 | pleckstrin homology domain containing, family G (with RhoGef domain) member 2 | G protein
5.1 | 19q | 6 | SPRED3 | sprouty-related, EVH1 domain containing 3 | Cell signalling
5.1 | 19q | 6 | EID2 | EP300 interacting inhibitor of differentiation 2 | Cell proliferation
5.1 | 19q | 6 | YIF1B | Yip1 interacting factor homolog B (S. cerevisiae) | Unknown
5.1 | 19q | 6 | KCNK6 | potassium channel, subfamily K, member 6 | Ion channel
5.1 | 19q | 6 | PSMD8 | proteasome (prosome, macropain) 26S subunit, non-ATPase, 8 | Cell signalling
5.1 | 19q | 6 | GGN | gametogenetin | Cell proliferation
5.1 | 19q | 6 | RASGRP4 | RAS guanyl releasing protein 4 | GTP binding
5.1 | 19q | 6 | IL28A | interleukin 28A (interferon, lambda 2) | Immune
5.1 | 19q | 6 | MAP4K1 | mitogen-activated protein kinase kinase kinase kinase 1 | Kinase
5.1 | 19q | 6 | EIF3K | eukaryotic translation initiation factor 3, subunit K | Cell proliferation
5.1 | 19q | 6 | ACTN4 | actinin, alpha 4 | Cytoskeleton
5.1 | 19q | 6 | CAPN12 | calpain 12 | Cell proliferation
5.1 | 19q | 6 | LGALS7 | lectin, galactoside-binding, soluble, 7 | Metabolism
5.1 | 19q | 6 | LGALS4 | lectin, galactoside-binding, soluble, 4 | Metabolism
5.1 | 19q | 6 | ECH1 | enoyl Coenzyme A hydratase 1, peroxisomal | Immune

FLJ45909 Ras and Rab interactor-like GTP binding SARS2 seryl-tRNA synthetase 2, mitochondrial MRPS12 mitochondrial ribosomal Respiration protein S12 FBXO17 F-box protein 17 Ubiquitinatio FBXO27 F-box protein 27 n FLJ16165 purple acid phosphatase Cell long form signalling PAK4 p21 protein (Cdc42/Rac)- Cell cycle activated kinase 4 LOC342897 non-specific cytotoxic cell Unknown receptor protein 1 homolog IL28B interleukin 28B (interferon, Immune lambda 3) IL29 interleukin 29 (interferon, lambda 1) LRFN1 leucine rich repeat and Cytoskeleton fibronectin type III domain containing 1 GMFG glia maturation factor, gamma SAMD4B sterile alpha motif domain Cell containing 4B signalling PAF1 peroxisomal membrane protein 3, 35kDa MED29 mediator complex subunit 29 SUPT5H suppressor of Ty 5 Transcription homolog factor TIMM50 ranslocase of inner Respiration mitochondrial membrane 50 homolog (S. cerevisiae) SELV selenoprotein V EID2B EP300 interacting inhibitor Cell of differentiation 2B proliferation LGALS13 lectin, galactoside-binding, Metabolism soluble, 13 NFKBIB nuclear factor of kappa Apoptosis light polypeptide gene enhancer in B-cells inhibitor, beta C19orf15 chromosome 19 open Unknown reading frame 15

RYR1 Cell (skeletal) signalling ZNF568 zinc finger protein 568 Transcription factor GAL7 lectin, galactoside-binding, Metabolism soluble, 7 DLL3 delta-like 3 (Drosophila) Development HNRNPL heterogeneous nuclear Cell ribonucleoprotein L proliferation RPS16 ribosomal protein S16 Cell signalling ZNF527 zinc finger protein 527 Transcription factor RPS16 ribosomal protein S16 Cell signalling ZNF527 zinc finger protein 527 Transcription ZNF829 zinc finger protein 829 factor ZNF529 zinc finger protein 529 ZFP36 zinc finger protein 36, C3H ZFP14 zinc finger protein 14 homolog ZNF545 zinc finger protein 82 homolog ZNF566 zinc finger protein 566 ZNF260 zinc finger protein 260 ZNF382 zinc finger protein 382 ZNF461 zinc finger protein 382 ZNF567 zinc finger protein 567 ZNF790 zinc finger protein 790 ZNF345 zinc finger protein 345 ZNF420 zinc finger protein 420 ZNF585A zinc finger protein 585A ZNF585B zinc finger protein 585B ZNF383 zinc finger protein 383 HKR1 GLI-Kruppel family Development member HKR1 ZNF569 zinc finger protein 569 Transcription ZNF570 zinc finger protein 570 factor ZNF793 zinc finger protein 793 ZNF540 zinc finger protein 540 ZNF571 zinc finger protein 571 ZFP30 zinc finger protein 30 homolog ZNF781 zinc finger protein 781 143

ZNF607 zinc finger protein 607 ZNF573 zinc finger protein 573 SIPA1L3 signal-induced Cell proliferation-associated 1 proliferation like 3 DPF1 D4, zinc and double PHD Transcription fingers family 1 factor PPP1R14A protein phosphatase 1, Cell regulatory (inhibitor) signalling subunit 14A SPINT2 serine peptidase inhibitor, Kunitz type, 2 C19orf33 chromosome 19 open Unknown reading frame 33 COL9A2 collagen, type IX, alpha 2 Cytoskeletal MYCL1 v-myc myelocytomatosis Tumor viral oncogene homolog suppressor 1, lung carcinoma derived (avian) EDN2 endothelin 2 Cell signalling PPIE peptidylprolyl isomerase E Metabolism (cyclophilin E) MFSD2 major facilitator Trafficking superfamily domain containing 2 HIVEP3 human immunodeficiency Immune virus type I enhancer binding protein 3 4.8 1p 6 MTF1 metal-regulatory Transcription transcription factor 1 factor ZC3H12A zinc finger CCCH-type containing 12A KCNQ4 potassium voltage-gated Ion channel channel, KQT-like subfamily, member 4 TRIT1 tRNA Cell isopentenyltransferase 1 proliferation C1orf102 chromosome 1 open Unknown reading frame 102 CSF3R colony stimulating factor 3 Cell receptor (granulocyte) proliferation C1orf149 chromosome 1 open Unknown reading frame 149 SNIP1 Smad nuclear interacting Cell 144

protein 1 signalling DNALI1 dynein, axonemal, light Cytoskeletal intermediate chain 1 GNL2 guanine nucleotide binding GTP binding protein-like 2 (nucleolar) C1orf109 chromosome 1 open Unknown reading frame 109 CDCA8 cell division cycle Cell cycle associated 8 EPHA10 EPH receptor A10 Receptor tyrosine kinase MANEAL mannosidase, endo-alpha- Metabolism like YRDC yrdC domain containing Unknown (E. coli) C1orf122 chromosome 1 open reading frame 122 INPP5B inositol polyphosphate-5- Cell phosphatase, 75kDa signalling SF3A3 splicing factor 3a, subunit Cell 3, 60kDa proliferation FHL3 four and a half LIM Unknown domains 3 UTP11L UTP11-like, U3 small Ubiquinitaio nucleolar n ribonucleoprotein, (yeast) POU3F1 POU class 3 homeobox 1 Transcription factor RRAGC Ras-related GTP binding C GTP binding MYCBP c-myc binding protein Tumor suppressor RHBDL2 rhomboid, veinlet-like 2 Development (Drosophila) C1orf108 akirin 1 Unknown NDUFS5 NADH dehydrogenase Metabolism (ubiquinone) Fe-S protein 5, 15kDa (NADH- coenzyme Q reductase) MACF1 microtubule-actin Cytoskeletal crosslinking factor 1 KIAA0754 hypothetical LOC643314 Unknown BMP8A bone morphogenetic Cell protein 8a proliferation

PABPC4 poly(A) binding protein, Cell cytoplasmic 4 (inducible signalling form) HEYL hairy/enhancer-of-split Unknown related with YRPW motif- like NT5C1A 5'-nucleotidase, cytosolic Cell IA proliferation HPCAL4 hippocalcin like 4 Development OXCT2 3-oxoacid CoA transferase Metabolism 2 PPT1 palmitoyl-protein thioesterase 1 RLF rearranged L-myc fusion Cell signalling TMCO2 transmembrane and coiled- Cell coil domains 2 proliferation ZMPSTE24 zinc metallopeptidase Transcription (STE24 homolog, S. factor cerevisiae) SMAP2 small ArfGAP2 G protein ZNF643 zinc finger protein 643 Transcription ZNF642 zinc finger protein 642 factor C1orf176 defects in morphology 1 Unknown homolog (S. cerevisiae) ZNF684 zinc finger protein 684 Transcription factor RIMS3 regulating synaptic Trafficking membrane exocytosis 3 NFYC nuclear transcription factor Transcription Y, gamma factor CITED4 Cbp/p300-interacting Cell transactivator, with signalling Glu/Asp-rich carboxy- terminal domain, 4 CTPS CTP synthase Oncogene SLFNL1 schlafen-like 1 Cell signalling SCMH1 sex comb on midleg Cell homolog 1 (Drosophila) proliferation RSPO1 R-spondin homolog Unknown GJA9 gap junction protein, alpha Cell adhesion 9, 59kDa BMP8B bone morphogenetic Cell

protein 8b proliferation MRPS15 mitochondrial ribosomal Respiration protein S15 CAP1 CAP, adenylate cyclase- Cytoskeletal associated protein 1 (yeast GRIK3 glutamate receptor, Metabolism ionotropic, kainate 3 RUNX1T1 runt-related transcription Cell cycle factor 1; translocated to, 1 (cyclin D-related) SLC26A7 solute carrier family 26, Cell member 7 signalling CNGB3 cyclic nucleotide gated Cell cycle channel beta 3 CNBD1 cyclic nucleotide binding domain containing 1 WDR21C WD repeat domain 21C Unknown MMP16 matrix metallopeptidase 16 Extracellular (membrane-inserted) matrix FAM82B family with sequence Unknown similarity 82, member B SLC7A13 solute carrier family 7, Ion channel (cationic amino acid transporter, y+ system) member 13 WWP1 WW domain containing E3 Ubiquintatio 4.8 8q 5 ubiquitin protein ligase 1 n REXO1L1 REX1, RNA exonuclease 1 Cell homolog (S. cerevisiae)- proliferation like 1 PSKH2 protein serine kinase H2 Kinase ATP6V0D2 ATPase, H+ transporting, Trafficking lysosomal 38kDa, V0 subunit d2 CA1 carbonic anhydrase I Metabolism CA3 carbonic anhydrase III, muscle specific C8orf59 chromosome 8 open Unknown reading frame 59 E2F5 E2F transcription factor 5, Transcription p130-binding factor RALYL RALY RNA binding Cell protein-like signalling LOC646486 fatty acid binding protein Unknown 12 147

IMPA1 inositol(myo)-1(or 4)- Cell monophosphatase 1 signalling SLC10A5 solute carrier family 10 Trafficking (sodium/bile acid cotransporter family), member 5 ZFAND1 zinc finger, AN1-type Transcription domain 1 factor CHMP4C chromatin modifying Cell protein 4C proliferation SNX16 sorting nexin 16 Trafficking FABP9 fatty acid binding protein Metabolism 9, testis PMP2 peripheral myelin protein 2 Cell signalling FABP5 fatty acid binding protein 5 Metabolism (psoriasis-associated) ZBTB10 zinc finger and BTB Transcription domain containing 10 factor LOC389672 No name Unknown ZNF704 zinc finger protein 704 Transcription factor TPD52 tumor protein D52 Cell proliferation FAM164A family with sequence Unknown similarity 164, member A DECR1 2,4-dienoyl CoA reductase Respiration 1, mitochondrial MRPS28 mitochondrial ribosomal protein S28 FABP4 fatty acid binding protein Metabolism 4, adipocyte KIAA1429 No name Unknown FAM92A1 family with sequence similarity 92, member A1 PPM2C protein phosphatase 2C, Cell magnesium-dependent, signalling catalytic subunit CDH17 cadherin 17, LI cadherin Cytoskeletal (liver-intestine) GEM GTP binding protein GTP binding overexpressed in skeletal muscle NB non-metastatic cells 1, Cell protein (NM23A) 148

expressed in signalling HEY1 hairy/enhancer-of-split related with YRPW motif 1 CA2 carbonic anhydrase II Metabolism STMN2 stathmin-like 2 RAD54B RAD54 homolog B (S. Cell cerevisiae) signalling RIPK2 receptor-interacting serine- Kinase threonine kinase 2 NECAB1 N-terminal EF-hand Cell calcium binding protein 1 signalling TMEM55A transmembrane protein Cell adhesion 55A TMEM64 transmembrane protein 64 OTUD6B OTU domain containing Unknown 6B RBM12B RNA binding motif protein Cell 12B signalling IL7 interleukin 7 Immune CPNE3 copine III Trafficking PAG1 phosphoprotein associated with glycosphingolipid microdomains 1 OSGIN2 oxidative stress induced Cell growth inhibitor family proliferation member 2 TMEM67 transmembrane protein 67 Cell adhesion CA13 carbonic anhydrase XIII Metabolism PDCD5 programmed cell death 5 Apoptosis CCNE1 cyclin E1 Cell cycle C19orf12 chromosome 19 open Unknown reading frame 12 TSHZ3 teashirt zinc finger Transcription homeobox 3 factor UQCRFS1 ubiquinol-cytochrome c Ubiqintation reductase, Rieske iron- 4.7 19q 10 sulfur polypeptide 1 POP4 processing of precursor 4, Cell ribonuclease P/MRP proliferation subunit (S. cerevisiae) PLEKHF1 pleckstrin homology Cytoskeletal domain containing, family F (with FYVE domain) member 1

C19orf2 chromosome 19 open Unknown reading frame 2 ZNF536 zinc finger protein 536 Transcription ZNF507 zinc finger protein 507 factor DPY19L3 dpy-19-like 3 (C. elegans) Unknown ANKRD27 ankyrin repeat domain 27 Cytoskeletal (VPS9 domain) SLC39A4 solute carrier family 39 Transcription (zinc transporter), member factor 4 SCRIB scribbled homolog Cell (Drosophila) signalling PLEC1 plectin 1, intermediate Cytoskeletal filament binding protein 500kDa ZNF250 zinc finger protein 250 Transcription factor KIAA1688 No name Unknown LRRC14 leucine rich repeat containing 14 LRRC24 leucine rich repeat containing 24 MGC70857 chromosome 8 open reading frame 82 MFSD3 major facilitator Trafficking 4.7 8q 13 superfamily domain containing 3 PPP1R16A protein phosphatase 1, Cell regulatory (inhibitor) signalling subunit 16A FOXH1 forkhead box H1 Transcription factor CYHR1 cysteine/histidine-rich 1 Unknown KIFC2 kinesin family member C2 Cytoskeletal CPSF1 cleavage and Cell polyadenylation specific proliferation factor 1, 160kDa GPR172A G protein-coupled receptor G-protein 172A coupled receptor ADCK5 aarF domain containing Kinase kinase 5 SCRT1 scratch homolog 1, zinc Transcription finger protein (Drosophila factor

FBXL6 F-box and leucine-rich Unknown repeat protein 6 DGAT1 diacylglycerol O- Metabolism acyltransferase homolog 1 HSF1 heat shock transcription Homeostasis factor 1 SHARPIN SHANK-associated RH Unknown domain interactor BOP1 block of proliferation 1 Cell proliferation C8orf30A chromosome 8 open Unknown reading frame 30A KIAA1833 No name MAF1 MAF1 homolog (S. cerevisiae) CYC1 cytochrome c-1 Metabolism OPLAH 5-oxoprolinase (ATP- Cell hydrolysing) signalling EXOSC4 exosome component 4 Cell proliferation SPATC1 cell division cycle 20 Cell cycle homolog (S. cerevisiae) GRINA glutamate receptor, Metabolism ionotropic, N-methyl D- aspartate-associated protein 1 (glutamate binding) PARP10 poly (ADP-ribose) Transcription polymerase family, factor member 10 EPPK1 epiplakin 1 Cytoskeletal NRBP2 nuclear receptor binding Cell protein 2 signalling PUF60 poly-U binding splicing factor 60KDa ZNF707 zinc finger protein 707 Transcription factor TSTA3 tissue specific Cell transplantation antigen signalling P35B ZNF623 zinc finger protein 623 Transcription factor TIGD5 tigger transposable element Unknown derived 5 PYCRL pyrroline-5-carboxylate Cell 151

reductase-like proliferation EEF1D eukaryotic translation Cell elongation factor 1 delta signalling (guanine nucleotide exchange protein) NAPRT1 nicotinate phosphoribosyltransferase domain containing 1 C8orf73 chromosome 8 open Unknown reading frame 73 GSDMDC1 gasdermin D Cytoskeletal MAFA v-maf musculoaponeurotic Oncogene fibrosarcoma oncogene homolog A ZC3H3 zinc finger CCCH-type Transcription containing 3 factor TOP1MT topoisomerase (DNA) I, Respiration mitochondrial RHPN1 rhophilin, Rho GTPase GTP binding binding protein 1 ZFP41 zinc finger protein 41 Transcription homolog factor LI4 No name Unknown ZNF696 zinc finger protein 696 Transcription factor VPS28 vacuolar protein sorting 28 Trafficking homolog (S. cerevisiae) ZNF251 zinc finger protein 251 Transcription ZNF34 zinc finger protein 34 factor RPL8 ribosomal protein L8 Respiration ZNF517 zinc finger protein 517 Transcription ZNF7 zinc finger protein 7 factor COMMD5 COMM domain containing Unknown 5 ZNF16 zinc finger protein 16 Transcription factor C8orf33 chromosome 8 open Unknown reading frame 33 SCXB scleraxis homolog B RECQL4 RecQ protein-like 4 MAPK15 mitogen-activated protein Kinase kinase 15 GPAA1 glycosylphosphatidylinosit Metabolism ol anchor attachment

protein 1 homolog FAM83H family with sequence Unknown similarity 83, member H SCXA scleraxis homolog A NFKBIL2 nuclear factor of kappa Cell light polypeptide gene proliferation enhancer in B-cells inhibitor-like 2 GPT glutamic-pyruvate Metabolism transaminase (alanine aminotransferase) GNG4 guanine nucleotide binding Cell protein (G protein), gamma signalling 4 CHML choroideremia-like (Rab Trafficking escort protein 2) FH fumarate hydratase Metabolism EXO1 exonuclease 1 Cell KMO kynurenine 3- signalling monooxygenase (kynurenine 3- hydroxylase) RBM34 RNA binding motif protein Unknown 34 LYST lysosomal trafficking Trafficking regulator MTR 5-methyltetrahydrofolate- Cell homocysteine proliferation 4.6 1q 5 methyltransferase ERO1LB ERO1-like beta (S. Cell cerevisiae) signalling IRF2BP2 interferon regulatory factor Immune 2 binding protein 2 KCNK1 potassium channel, Ion channel subfamily K, member 1 SLC35F3 solute carrier family 35, Unknown member F3 C1orf31 chromosome 1 open reading frame 31 TARBP1 TAR (HIV-1) RNA Cell binding protein 1 signalling TOMM20 translocase of outer Unknown mitochondrial membrane 20 homolog (yeast) ARID4B AT rich interactive domain 153

4B (RBP1-like) GGPS1 geranylgeranyl Cell diphosphate synthase 1 signalling TBCE tubulin folding cofactor E Cytoskeletal B3GALNT2 beta-1,3-N- Metabolism acetylgalactosaminyltransf erase 2 NID1 nidogen 1 Cell adhesion GPR137B G protein-coupled receptor G-protein 137B coupled receptor EDARADD EDAR-associated death Apoptosis domain LGALS8 lectin, galactoside-binding, Metabolism soluble, 8 HEATR1 HEAT repeat containing 1 Homeostasis ACTN2 actinin, alpha 2 Cytoskeletal ZP4 zona pellucida Development glycoprotein 4 CHRM3 cholinergic receptor, Cell muscarinic 3 signalling FMN2 formin 2 Cytoskeletal GREM2 gremlin 2, cysteine knot Cell superfamily, homolog proliferation RGS7 regulator of G-protein Cell signaling 7 signalling WDR64 WD repeat domain 64 Unknown MAP1LC3C microtubule-associated Cytoskeletal protein 1 light chain 3 gamma PLD5 phospholipase D family, Cell member 5 signalling RYR2 ryanodine receptor 2 (cardiac) OPN3 opsin 3 G-protein coupled receptor NLRP3 NLR family, pyrin domain Unknown containing 3 SMYD3 SET and MYND domain Cytoskeletal containing 3 4.4 1q 5 ZNF238 zinc finger protein 238 Transcription factor C1orf100 chromosome 1 open frame Unknown

154

100 ADSS adenylosuccinate synthase Metabolism C1orf101 chromosome 1 open Unknown reading frame 101 FAM152A family with sequence similarity 152, member A FAM36A family with sequence similarity 36, member A HNRNPU heterogeneous nuclear Cell ribonucleoprotein U signalling (scaffold attachment factor A) EFCAB2 EF-hand calcium binding domain 2 KIF26B kinesin family member Cytoskeletal 26B TFB2M transcription factor B2, Transcription mitochondrial factor C1orf71 chromosome 1 open Unknown reading frame 71 SCCPDH saccharopine Metabolism dehydrogenase (putative) AHCTF1 AT hook containing Transcription transcription factor 1 factor ZNF670 zinc finger protein 670 ZNF669 zinc finger protein 669 FLJ45717 No name Unknown ZNF124 zinc finger protein 124 Transcription factor VN1R5 vomeronasal 1 receptor 5 G-protein coupled receptor ZNF496 zinc finger protein 496 Transcription factor C1orf150 chromosome 1 open Unknown reading frame 150 TRIM58 tripartite motif-containing Cell 58 signalling LOC646627 No name Unknown SH3BP5L SH3-binding domain Cell protein 5-like signalling ZNF672 zinc finger protein 672 Transcription ZNF692 zinc finger protein 692 factor PGBD2 piggyBac transposable Unknown

155

element derived 2 SLC2A3 solute carrier family 2 Trafficking (facilitated glucose transporter), member 3 C3AR1 complement component 3a Immune receptor 1 APOBEC1 apolipoprotein B mRNA Metabolism editing enzyme, catalytic polypeptide 1 NANOG Nanog homeobox Transcription factor CLEC4C C-type lectin domain Cell adhesion family 4, member C CLEC4A C-type lectin domain family 4, member A 4.3 12p 8 GDF3 growth differentiation Cell factor 3 proliferation DPPA3 developmental pluripotency associated 3 SLC2A14 solute carrier family 2 Trafficking (facilitated glucose transporter), member 14 FOXJ2 forkhead box J2 Transcription factor NECAP1 NECAP endocytosis Trafficking associated 1 ZNF705A zinc finger protein 705A Transcription factor FAM90A1 family with sequence Unknown similarity 90, member A1 RAE1 RAE1 RNA export 1 homolog (S. pombe) RBM38 RNA binding motif protein 38 HMG1L1 high-mobility group Cell (nonhistone chromosomal) signalling protein 1-like 1 4.2 20q 7 ZBP1 Z-DNA binding protein 1 C20orf85 chromosome 20 open Unknown reading frame 85 RAB22A RAB22A, member RAS Cell oncogene family proliferation APCDD1L adenomatosis polyposis coli down-regulated 1-like NPEPL1 aminopeptidase-like 1 Cell 156

signalling TH1L TH1-like (Drosophila) Cell proliferation CTSZ cathepsin Z Oncogene ATP5E ATP synthase, H+ Ion channel transporting, mitochondrial F1 complex, epsilon subunit PHACTR3 phosphatase and actin Cytoskeletal regulator 3 SYCP2 synaptonemal complex Cell protein 2 signalling PPP1R3D protein phosphatase 1, Unknown regulatory (inhibitor) subunit 3D C20orf177 chromosome 20 open reading frame 177 CDH26 cadherin-like 26 Cytoskeletal C20orf197 chromosome 20 open Unknown reading frame 197 CDH4 cadherin 4, type 1, R- Cytoskeletal cadherin (retinal) LSM14B LSM14B, SCD6 homolog Unknown B (S. cerevisiae) SS18L1 synovial sarcoma Cell translocation gene on signalling chromosome 18-like 1 GTPBP5 GTP binding protein 5 G-protein (putative) coupled receptor HRH3 histamine receptor H3 Cell signalling OSBPL2 oxysterol binding protein- G-protein like 2 coupled receptor ADRM1 adhesion regulating Cell molecule 1 Adhesion LAMA5 laminin, alpha 5 Cytoskeletal RPS21 ribosomal protein S21 Cell proliferation CABLES2 Cdk5 and Abl enzyme Cell cycle substrate 2 C20orf151 chromosome 20 open Unknown reading frame 151 C20orf200 chromosome 20 open 157

reading frame 200 C20orf166 chromosome 20 open reading frame 166 SLCO4A1 solute carrier organic anion Ion channel transporter family, member 4A1 NTSR1 neurotensin receptor 1 G-protein (high affinity) coupled receptor C20orf20 chromosome 20 open Unknown reading frame 20 OGFR opioid growth factor Cell receptor proliferation TCFL5 transcription factor-like 5 Transcription (basic helix-loop-helix) factor C20orf11 chromosome 20 open Unknown reading frame 11 BHLHB4 basic helix-loop-helix Transcription domain containing, class factor B, 4 YTHDF1 YTH domain family, Unknown member 1 NKAIN4 Na+/K+ transporting Ion channel ATPase interacting 4 ARFGAP1 ADP-ribosylation factor G-protein GTPase activating protein coupled 1 receptor COL20A1 collagen, type XX, alpha 1 Cytoskeletal C20orf149 chromosome 20 open Unknown reading frame 149 SRMS src-related kinase lacking Kinase C-terminal regulatory tyrosine and N-terminal myristylation sites C20orf195 chromosome 20 open Unknown reading frame 195 PRIC285 peroxisomal proliferator- Transcription activated receptor A factor interacting complex 285 GMEB2 glucocorticoid modulatory DNA element binding protein 2 replication STMN3 stathmin-like 3 Cytoskeletal RTEL1 regulator of telomere DNA elongation helicase 1 damage ARFRP1 ADP-ribosylation factor Cell 158

related protein 1 signalling ZGPAT zinc finger, CCCH-type Transcription with G patch domain factor LIME1 Lck interacting Cell transmembrane adaptor 1 signalling ZBTB46 zinc finger and BTB Transcription domain containing 46 factor C20orf135 chromosome 20 open Unknown reading frame 135 TPD52L2 tumor protein D52-like 2 Trafficking DNAJC5 DnaJ (Hsp40) homolog, Unknown subfamily C, member 5 UCKL1 uridine-cytidine kinase 1- Kinase like 1 ZNF512B zinc finger protein 512B Transcription factor SAMD10 sterile alpha motif domain Unknown containing 10 PRPF6 PRP6 pre-mRNA mRNA processing factor 6 splicing homolog (S. cerevisiae) PRR17 proline rich 17 Unknown TCEA2 transcription elongation Transcription factor A (SII), 2 factor LOC198437 chromosome 20 open Unknown reading frame 201 NPBWR2 neuropeptides B/W G-protein receptor 2 coupled receptor MYT1 myelin transcription factor Transcription 1 factor PCMTD2 protein-L-isoaspartate (D- Metabolism aspartate) O- methyltransferase domain containing 2 VAPB VAMP (vesicle-associated Trafficking membrane protein)- associated protein B and C GNAS GNAS complex locus G protein SPO11 SPO11 meiotic protein Cell division covalently bound to DSB homolog (S. cerevisiae) OPRL1 opiate receptor-like 1 G-protein coupled receptor 159

EDN3 endothelin 3 Development TNFRSF6B tumor necrosis factor Anti- receptor superfamily, apoptosis member 6b, decoy DIDO1 death inducer-obliterator 1 Anti- apoptosis RGS19 regulator of G-protein Transcription signaling 19 factor KCNQ2 potassium voltage-gated Ion channel channel, KQT-like subfamily, member 2 COL9A3 collagen, type IX, alpha 3 Cytoskeletal PMEPA1 prostate transmembrane Membrane protein, androgen induced 1 STX16 syntaxin 16 Trafficking TAF4 TAF4 RNA polymerase II, Transcription TATA box binding protein factor (TBP)-associated factor, 135kDa BIRC7 baculoviral IAP repeat- Anti- containing 7 apoptosis SLC2A4RG SLC2A4 regulator Unknown ZNF831 zinc finger protein 831 Transcription factor C20orf59 chromosome 20 open Unknown reading frame 59 CTCFL CCCTC-binding factor Transcription (zinc finger protein)-like factor CHRNA4 cholinergic receptor, Ion channel nicotinic, alpha 4 GATA5 GATA binding protein 5 Transcription factor PSMA7 proteasome (prosome, Ubiquitinatio macropain) subunit, alpha n type, 7 TUBB1 tubulin, beta 1 Cytoskeletal PTK6 PTK6 protein tyrosine Kinase kinase 6 SLMO2 slowmo homolog 2 Development (Drosophila) BMP7 bone morphogenetic Multiple protein 7 EEF1A2 eukaryotic translation Development elongation factor 1 alpha 2 160

SOX18 SRY (sex determining Development region Y)-box 18 PCK1 phosphoenolpyruvate Metabolism carboxykinase 1 (soluble) DEFB136 Defensin beta 136 Cytotoxic DEFB137 Defensin beta 137 DEFB130 Defensin beta 130 FDFT1 farnesyl-diphosphate Metabolism farnesyltransferase 1 DEFB134 defensin, beta 134 Cytotoxic AMAC1L2 acyl-malonyl condensing Unknown enzyme 1-like 2 NEIL2 nei like 2 (E. coli) C8orf13 family with sequence Unknown 0.9 8p 3 similarity 167, member A DUB3 deubiquitinating enzyme 3 Ubiquitinatio n CTSB cathepsin B Lysosomal MTMR9 myotubularin related Cytoskeletal protein 9 BLK B lymphoid tyrosine kinase Immune LOC728957 zinc finger protein 705D Unknown GATA4 GATA binding protein 4 Transcription factor DEFB103A defensin, beta 103A Cytotoxic DEFB106A defensin, beta 106A DEFB104B defensin, beta 104B SPAG11B sperm associated antigen Unknown 11B DEFB106B defensin, beta 106B Cytotoxic DEFB105B defensin, beta 105B 0.8 8p 6 DEFB105A defensin, beta 105A DEFB103B defensin, beta 103B DEFB107B defensin, beta 107B SPAG11A sperm associated antigen Unknown 11A DEFB104A defensin, beta 104A Cytotoxic DEFB107A defensin, beta 107A DEFB4 defensin, beta 4 PCDHA1 protocadherin alpha 1 Cell adhesion PCDHA2 protocadherin alpha 2 0.6 5p 5 PCDHA3 protocadherin alpha 3 PCDHA4 protocadherin alpha 4 PCDHA5 protocadherin alpha 5 161

PCDHA6 protocadherin alpha 6 PCDHA7 protocadherin alpha 7 PCDHA8 protocadherin alpha 8 PCDHA9 protocadherin alpha 9 PCDHA10 protocadherin alpha 10 PCDHA11 protocadherin alpha 11 Table 2. Gene list for copy number gains and losses. The genes listed are found in the “block” area of the chromosome. Each “block” has the same copy number for each gene listed.
