Supplementary Methods
Patient Cohorts and Clinicopathological Review

AOCS and Clinical Cohort

The Australian Ovarian Cancer Study (AOCS) is a population-based case-control study of ovarian cancer that recruited eligible women aged 18-79 years with newly diagnosed epithelial ovarian cancer (including primary peritoneal and fallopian tube cancer) through 20 gynaecologic oncology units across all Australian states between January 2002 and June 2006. Women who were unable to provide informed consent due to illness, language difficulties or mental incapacity were excluded, as were those whose diagnosis was not histopathologically confirmed. Detailed clinical and follow-up data were obtained from medical records at predefined intervals: post surgery, post primary chemotherapy, at 6-monthly intervals to 5 years and annually thereafter. All treatment details and clinical assessments were recorded on case report forms in accordance with Good Clinical Practice (GCP) guidelines (ICH E6: Good Clinical Practice: Consolidated Guidance. 1996; http://www.fda.gov/oc/gcp/guidance.html), and included chemotherapy regimen, imaging details and pre- and post-treatment serum CA125 levels. The CA125 assay type and upper limit of normal for each measurement were recorded, and the relapse date was assigned based on Gynecologic Cancer Intergroup (GCIG) criteria [1]. Clinical details for the cases from The Royal Brisbane Hospital and Westmead Hospital were obtained retrospectively from medical records.

The majority of patients with invasive ovarian cancer (n = 267) underwent laparotomy for diagnosis, staging and debulking and subsequently received first-line platinum/taxane-based chemotherapy. In most cases, the tumor examined was excised at the time of primary surgery, prior to the administration of chemotherapy. Eighteen patients who had neoadjuvant platinum-based chemotherapy were also included in the study, as were 18 patients with low malignant potential disease.

Clinical Definitions

Surgical staging was assessed in accordance with the FIGO (Fédération Internationale de Gynécologie et d'Obstétrique) classification. Optimal debulking was defined as less than 1 cm (diameter) of residual disease, and sub-optimal debulking as more than 1 cm (diameter) of residual disease. Progression-free survival (PFS) was defined as the time interval between the date of diagnosis and the first confirmed sign of disease recurrence based on GCIG definitions [1]. Overall survival (OS) was defined as the time interval between the date of histological diagnosis and the date of death from any cause. Patients who died from causes deemed unrelated to their malignancy were censored for survival analysis.

Pathology Review

Individual cases were reviewed by light microscopy assessment of either representative formalin-fixed tissue taken adjacent to arrayed tissue material or diagnostic slides (n = 202), or by extraction of information from the original pathology reports when slides were not available (n = 74). Malignant cases subject to pathology review were re-graded according to the Silverberg classification [2], whilst low malignant potential (LMP) tumors were graded as either low or high grade.

Sample Processing and Microarray

Frozen tissue specimens for microarray analysis were collected at the time of primary debulking surgery and snap frozen in liquid nitrogen. In most cases (241/285), serial tissue sections (12 × 100 µm) were cut by cryotome for RNA extraction, with a 5 µm section taken for hematoxylin and eosin (H&E) staining before and after serial sectioning to assess tissue content. In the remaining cases (44/285), RNA was extracted from frozen tumor pieces. For cases where tumor percentage was known, 92.5% (223/241) contained ≥50% tumor. In the remaining cases (7.5%, 18/241), tumor content was between 30 and 49%. RNA was extracted using TRIZOL reagent (Invitrogen) and then further purified by column chromatography using a Qiagen RNeasy spin column (Qiagen, Valencia, CA).
Total RNA quality was assessed using an Agilent Bioanalyzer 2100 Pico assay (Agilent, Palo Alto, CA) and a NanoDrop spectrophotometer (NanoDrop Technologies, Delaware). Only samples with a Bioanalyzer degradation factor (DF) less than 8 and a NanoDrop A260/A280 ratio between 1.8 and 2.1 were used for further analysis. A single round of amplification was employed to generate cRNA from total RNA extracts; the cRNA was hybridized to Affymetrix U133 Plus 2 arrays, which were scanned in accordance with standard Affymetrix protocols (Affymetrix, Santa Clara, CA).

Microarray Data Processing and Analysis

Data Preprocessing and Quality Control

Image analysis and probe quantitation were performed with Affymetrix GeneChip Operating Software (GCOS) using a scaling factor of 150. The R packages simpleaffy and affy, available from the Bioconductor project (www.bioconductor.org), were used for quality control and normalization, respectively. CEL files were subject to quality control before batch normalization using the robust multiarray average (RMA) method [3]. Arrays passed QC if the 3' to 5' ratios for GAPDH and β-actin did not exceed 1 and 3, respectively, the scaling factor did not deviate from the median across all arrays by more than 3, and there was no large deviation in present-absent calls. Batch RMA normalization corrected for systematic biases introduced during array data generation. The default options of RMA include background correction, quantile normalization and log transformation.

Probe Set Filtering

Before clustering, we removed non-informative and noisy probe sets using the following criteria: probe sets whose expression level did not exceed 7.0 on the log2 scale (128 on the linear scale) in any sample, and probe sets with variance less than 0.5, were removed. This filtering step left 8732 probe sets.

Consensus k-Means Clustering

A consensus k-means clustering method was used to find the optimal clustering configuration.
The consensus k-means method comprised three parts:
1) k-means clustering with Euclidean distance as the distance metric was used to partition samples into k groups using the standard algorithm [4]. The clustering was repeated 1000 times using different starting positions.
2) Samples that co-clustered within the same group in more than 800 of the 1000 repetitions were selected as the robust sample set.
3) The same procedure was repeated for k ranging from 2 to 10.

The optimal value of k was chosen based on the GAP statistic proposed by Tibshirani et al. [5]. The GAP statistic suggests 6 as the ideal number of clusters in our dataset (Figure SM1). Additionally, we investigated other indexes proposed for estimating the number of clusters. Milligan and Cooper [6] carried out a comprehensive simulation comparison of 30 different procedures. We applied the best-performing procedures described by Milligan and Cooper to the ovarian array data, with results shown in Table SM1. DB and SSI are similar indexes. CAL and DB suggest k = 6 as the number of clusters in the data, while SSI suggests k = 5.

Figure SM1. A function of the GAP statistic is plotted along the y axis for different values of cluster number k. The optimal cluster number is the smallest k for which Gap(k) - Gap(k+1) + s(k+1) is non-negative, that is, Gap(k) ≥ Gap(k+1) - s(k+1), suggesting that we have 6 clusters in the data.

Table SM1. Traditional cluster indexes to decide the number of clusters in our dataset. CAL is the index proposed by Calinski and Harabasz [7].
Index   k = 2    k = 3    k = 4    k = 5    k = 6    k = 7    k = 8    k = 9    k = 10
CAL     25.19    23.28    20.24    28.77    29.45    23.71    21.10    18.99    12.11
DB      284.06   255.44   254.79   241.78   282.73   249.04   290.58   255.05   271.19
SSI     0.15     0.19     0.22     0.24     0.23     0.31     0.43     0.31     0.44

Extending the Consensus Cluster by Class Prediction

To extend the clusters we used a class prediction approach, whereby the robust consensus set was used to build a multivariate function capable of predicting the class of those samples that could not be consensus clustered. Diagonal linear discriminant analysis (DLDA) and k-nearest neighbors (k-NN) are methods that have been found to perform robustly in comparison to other class prediction methods [8]. An optimized classifier was developed using leave-one-out cross-validation (LOOCV) as follows:
1) Remove one sample and order the genes by the ratio of between-group to within-group sum of squares computed on the remaining samples.
2) Take the top n genes as features of the classifier.
3) Test the classifier performance on the left-out sample.
4) Repeat steps 1 to 3 for every sample; the total number of errors made is an estimate of the classifier accuracy.
5) Repeat steps 1 to 4 for different values of n to decide the number of features to use.

The accuracy of both DLDA and k-NN as a function of the number of genes used is given in Figures SM2 and SM3. Both DLDA and k-NN made 3 errors out of 171 (98.2% accuracy) when the top ~750 genes were used as features for the classifier. The samples left out of the original consensus k-means clustering were added to the clusters if both classifiers agreed on the class membership of the sample.

Figure SM2. Number of errors made in LOOCV as a function of the number of probes used as features of the classifier.

Figure SM3. Number of errors made in LOOCV as a function of the number of features used.
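
The LOOCV procedure described above can be illustrated with a minimal pure-Python sketch. It is not the authors' code: the function names (group_rows, bss_wss_ratios, dlda_fit, dlda_predict, loocv_errors) and the toy data are hypothetical, DLDA is assumed to use equal class priors with a pooled per-gene variance, and genes with zero within-group variation are simply ranked last. The key point it shows is that gene ranking is redone inside every fold, so the left-out sample never influences feature selection.

```python
from collections import defaultdict


def group_rows(X, y):
    """Group sample rows by class label and compute per-class gene means."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    n_genes = len(X[0])
    means = {
        label: [sum(r[j] for r in rows) / len(rows) for j in range(n_genes)]
        for label, rows in groups.items()
    }
    return groups, means


def within_ss(groups, means, j):
    """Within-group sum of squares for gene j."""
    return sum((r[j] - means[c][j]) ** 2 for c, rows in groups.items() for r in rows)


def bss_wss_ratios(X, y):
    """Between-group / within-group sum-of-squares ratio for every gene."""
    groups, means = group_rows(X, y)
    ratios = []
    for j in range(len(X[0])):
        overall = sum(r[j] for r in X) / len(X)
        bss = sum(len(rows) * (means[c][j] - overall) ** 2 for c, rows in groups.items())
        wss = within_ss(groups, means, j)
        # Assumption: degenerate genes (wss == 0) are ranked last in this sketch.
        ratios.append(bss / wss if wss > 0 else 0.0)
    return ratios


def dlda_fit(X, y, features):
    """Fit DLDA: class means plus one pooled variance per selected gene."""
    groups, means = group_rows(X, y)
    dof = len(X) - len(groups)
    variances = {j: max(within_ss(groups, means, j) / dof, 1e-12) for j in features}
    return means, variances


def dlda_predict(model, x, features):
    """Assign x to the class with the smallest variance-scaled squared distance."""
    means, variances = model

    def score(label):
        return sum((x[j] - means[label][j]) ** 2 / variances[j] for j in features)

    return min(means, key=score)


def loocv_errors(X, y, n_top):
    """Count LOOCV errors, re-ranking genes with each sample left out."""
    errors = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        ratios = bss_wss_ratios(X_train, y_train)
        top = sorted(range(len(ratios)), key=lambda j: ratios[j], reverse=True)[:n_top]
        model = dlda_fit(X_train, y_train, top)
        if dlda_predict(model, X[i], top) != y[i]:
            errors += 1
    return errors


# Toy data (invented): gene 0 separates the two classes, genes 1 and 2 are noise.
X = [
    [1.0, 5.0, 3.0], [1.2, 4.0, 3.5], [0.9, 6.0, 2.5], [1.1, 5.5, 3.2],
    [3.0, 5.2, 3.1], [3.2, 4.1, 2.9], [2.9, 5.9, 3.4], [3.1, 4.8, 2.6],
]
y = ["A"] * 4 + ["B"] * 4
for n_top in (1, 2, 3):
    print(n_top, loocv_errors(X, y, n_top))
```

Calling loocv_errors for a range of n_top values gives the kind of error-versus-feature-count curve summarised in Figures SM2 and SM3; a k-NN classifier could be substituted for dlda_fit and dlda_predict without changing the loop.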

Validation of molecular subtypes in an independent published dataset

A class prediction algorithm (DLDA), trained on the 251-sample consensus set with its six molecular subtype annotations, was used to predict the subtype identity of samples from an independent microarray dataset described by Dressman and colleagues [9]. The classifier used only those genes found to be significantly differentially expressed between subtypes (Supplementary Table 2) and for which probe set identifiers matched between the two Affymetrix gene chips (U133A and U133 Plus 2). The classifier was tested by leave-one-out cross-validation of the training dataset, giving a CV accuracy of 98.2%. Samples from the independent dataset were then classified, providing subtype annotation for data visualisation and survival analysis (Figure 5C, D and E of the main manuscript).
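
The restriction to probe sets shared between the two array platforms can be sketched as below. This is an illustration under stated assumptions rather than the authors' implementation: the helper names (shared_probes, subset_columns), the probe identifiers and the expression values are all hypothetical placeholders, and expression matrices are assumed to be lists of sample rows whose columns follow the order of the accompanying identifier list.

```python
def shared_probes(train_ids, test_ids, signature_ids):
    """Probe sets in the subtype signature that exist on both platforms,
    returned in the column order of the training platform."""
    test_set, signature = set(test_ids), set(signature_ids)
    return [p for p in train_ids if p in signature and p in test_set]


def subset_columns(matrix, ids, keep):
    """Reorder and subset the columns of `matrix` so they follow `keep`."""
    column_of = {p: j for j, p in enumerate(ids)}
    return [[row[column_of[p]] for p in keep] for row in matrix]


# Hypothetical placeholder identifiers, not real Affymetrix probe set IDs.
plus2_ids = ["p1", "p2", "p3", "p4"]   # training platform (U133 Plus 2)
u133a_ids = ["p3", "p1", "p5"]         # independent dataset (U133A)
signature = ["p1", "p2", "p3"]         # differentially expressed probe sets

keep = shared_probes(plus2_ids, u133a_ids, signature)

train = [[7.1, 8.0, 6.5, 9.2]]         # one invented training sample
test = [[6.4, 7.3, 5.0]]               # one invented independent sample
train_aligned = subset_columns(train, plus2_ids, keep)
test_aligned = subset_columns(test, u133a_ids, keep)
print(keep, train_aligned, test_aligned)
```

After this alignment both matrices have identical columns, so a classifier such as DLDA fitted on train_aligned can be applied directly to the rows of test_aligned.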