S1: Patient samples and associated genomic information

Tumor samples were ascertained from three independent cohorts.

1) The TCGA cohort of HGSC was collated by investigators from international hospitals and universities, mostly in North America (1). Germline mutation, somatic mutation and methylation status of HRR pathway members was available to us for 316 tumors and patients. From these, we selected cases for expression and copy number analysis that had either a somatic or germline pathogenic BRCA1/2 mutation or no apparent disruption of the BRCA pathway (wild type), including:

• 280 cases with gene expression profiles, including 210 cases whose molecular subtypes were identified in a previous publication (2) (27 BRCA1 mutated, 28 BRCA2 mutated, 10 BRCA1 methylated and 145 wild type).

• 204 cases with copy number data (34 BRCA1 mutated, 30 BRCA2 mutated, 140 wild type).

• Gene expression profiles of an additional 196 samples for clinical validation of the BRCA1/2 classifier. The mutation status of these samples was not available at the time of this study.

2) The Australian Ovarian Cancer Study (AOCS) is a population-based case-control cohort of ovarian cancer patients ascertained at diagnosis between 2002 and 2006 (3). Research associated with the use of AOCS samples and clinical data was approved by the Human Research Ethics Committees at the Peter MacCallum Cancer Centre. A cohort of high-grade serous samples was selected from tumors for which germline BRCA1 and BRCA2 mutation status (4) and Affymetrix U133 2.0 data (3) were available. As described below, all 132 tumors were screened for somatic BRCA1/2 mutations using high-resolution melting analysis (5), and for methylation of the BRCA1, PALB2 and FANCF gene promoters using methylation-sensitive high-resolution melting (6, 7) after bisulfite conversion (EpiTect Bisulfite Kit, Qiagen).
One hundred and eleven of the 132 tumors profiled could be classified into one of the four molecular subtypes of HGSC previously described (3) and were used in the subsequent gene expression analysis.

3) Gene expression data for a cohort of 61 ovarian tumors of mixed histologies, ascertained from the Memorial Sloan-Kettering Cancer Center and generated using in-house cDNA microarrays manufactured at the National Cancer Institute Microarray Facility, were publicly available as supplementary data provided by the journal (8; footnote 1).

1 http://jnci.oxfordjournals.org/content/94/13/990/suppl/DC1

| Study ID | Gene | Exon | Nomenclature | Location |
|---|---|---|---|---|
| 2476 | BRCA1 | 11 | c.3302G>A | Germline |
| 3958 | BRCA1 | 2 | c.80+3A>C | Germline |
| 2410 | BRCA2 | 11 | c.[4094G>A] | Germline |
| 4256 | BRCA2 | 3 | c.[68-7T>A] | Germline |
| 1096 | BRCA2 | 22 | c.[8830A>T] | Germline |
| 1033 | BRCA2 | 15 | c.7504C>T | Germline |
| 6496 | BRCA2 | 18 | c.[7985C>A] | Germline |
| 6621¹ | BRCA1 | 15 | c.4669G>C | Germline |
| 2039² | BRCA1 | 7 | c.[305C>G] | Germline |
| 3202³ | BRCA1/BRCA2 | 17/11 | c.[4987-20A>G] / c.[5446A>C] | Germline |

Supplementary Table 1: Germline unclassified sequence variants identified in 132 AOCS tumors. Pathogenicity of sequence variants was assessed by reference to the Breast Cancer Information Core database (BIC), function prediction algorithms (SIFT, PolyPhen 2.0, AGVGD), splice site prediction algorithms (NNSPLICE, MaxEntScan, Splice Site Finder-Like, Human Splicing Finder) and literature searches as necessary. This process was standardised using the Alamut program (Interactive Biosoftware) as described in (4). ¹ Also found to be methylated at the BRCA1 gene promoter. ² Co-existing BRCA1 pathogenic mutation. ³ Co-existing BRCA2 pathogenic mutation and BRCA1 gene promoter methylation.

S2: Somatic BRCA1/2 mutation detection using High-Resolution Melting (HRM) Analysis

DNA extracted from fresh-frozen primary tumor tissue was screened for mutations in all coding exons and intron-exon boundaries of BRCA1 and BRCA2 using high-resolution melting (HRM) analysis, as previously described (5), on the LightCycler® 480 thermocycler (Roche). Melting curve analysis relies on heteroduplexes forming between wild-type and mutant DNA strands during the PCR step. Loss of heterozygosity at the BRCA1 or BRCA2 locus would leave no wild-type strand, so only homoduplexes (two mutant strands) would form and the variant could be missed. To account for this, needle-macrodissected or high tumor content samples (≥80% tumor) were mixed 1:1 with 4 ng/µL wild-type control DNA before being added to the assay. Data were analysed with the LightCycler® 480 Software (Version 1.5.0.39; Roche) using the Gene Scanning workflow. The melting curves for each amplicon were normalised and temperature-shifted, and the differences from the wild-type control were plotted. Samples whose melt profile differed from that of the wild-type control were selected for Sanger sequencing.
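The normalisation, temperature-shift and difference steps are carried out by the LightCycler® 480 Gene Scanning software; the Python sketch below only illustrates the idea on synthetic curves. The normalisation windows, the 5% shift level and the 5% threshold used by the hypothetical `needs_sequencing` rule are assumptions, not the software's settings, and the logistic melt curves stand in for real data.

```python
import math

def window_mean(temps, values, lo, hi):
    """Mean signal inside the temperature window [lo, hi]."""
    picked = [v for t, v in zip(temps, values) if lo <= t <= hi]
    return sum(picked) / len(picked)

def normalise(temps, fluor, pre, post):
    """Scale a melt curve to 100% (pre-melt window) .. 0% (post-melt window)."""
    top = window_mean(temps, fluor, *pre)
    bottom = window_mean(temps, fluor, *post)
    return [100.0 * (f - bottom) / (top - bottom) for f in fluor]

def interp(x, xs, ys):
    """Linear interpolation of ys at x (xs increasing); clamps outside the range."""
    if x <= xs[0]:
        return ys[0]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            frac = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + frac * (ys[i] - ys[i - 1])
    return ys[-1]

def crossing_temperature(temps, norm, level):
    """Temperature at which a falling normalised curve first drops below `level`."""
    for i in range(1, len(temps)):
        if norm[i - 1] >= level > norm[i]:
            frac = (norm[i - 1] - level) / (norm[i - 1] - norm[i])
            return temps[i - 1] + frac * (temps[i] - temps[i - 1])
    raise ValueError("curve never crosses the shift level")

def difference_curve(temps, ref_norm, sample_norm, shift_level=5.0):
    """Shift the sample so both curves cross `shift_level` together, then subtract."""
    offset = (crossing_temperature(temps, ref_norm, shift_level)
              - crossing_temperature(temps, sample_norm, shift_level))
    shifted = [t + offset for t in temps]
    return [interp(t, shifted, sample_norm) - r for t, r in zip(temps, ref_norm)]

def needs_sequencing(diff, threshold=5.0):
    """Hypothetical rule: flag the amplicon if any difference exceeds `threshold` %."""
    return max(abs(d) for d in diff) > threshold

def logistic_melt(temps, tm, width=0.5):
    """Synthetic fraction of double-stranded DNA remaining (illustration only)."""
    return [1.0 / (1.0 + math.exp((t - tm) / width)) for t in temps]

temps = [70.0 + 0.1 * i for i in range(201)]          # 70-90 °C grid
wild_type = [100 * f for f in logistic_melt(temps, 80.0)]
# Heteroduplexes melt earlier, adding a low-temperature shoulder to the curve.
hetero = [50 * a + 50 * b for a, b in zip(logistic_melt(temps, 78.5), logistic_melt(temps, 80.0))]
pre, post = (72.0, 74.0), (86.0, 88.0)
wt_norm = normalise(temps, wild_type, pre, post)
het_norm = normalise(temps, hetero, pre, post)
print(needs_sequencing(difference_curve(temps, wt_norm, wt_norm)))   # False
print(needs_sequencing(difference_curve(temps, wt_norm, het_norm)))  # True
```

Shifting at a low fluorescence level aligns the end of the melt, so a difference that remains afterwards reflects a change in curve shape, such as the heteroduplex shoulder.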

The HRM product of selected amplicons was PCR amplified and sequenced using the BigDye Terminator v3.1 assay (Applied Biosystems). After purification and re-suspension in Hi-Di Formamide (Applied Biosystems), the samples underwent capillary electrophoresis on an ABI 3730 sequencer (Applied Biosystems). Mutant sequences were imported into Sequencher™ 4.10.1 (Gene Codes) for comparison with a wild-type sequence. Mutations were confirmed in a second, independent analysis by sequencing both the forward and reverse strands.

S3: Methylation-sensitive HRM analysis

Tumor DNA used for HRM analysis also underwent methylation-sensitive high-resolution melting analysis as described previously (6, 7). Briefly, 200 ng of DNA was bisulfite converted using the EpiTect Bisulfite Kit (Qiagen) according to the manufacturer's instructions. Bisulfite-converted DNA was added to a HotStarTaq DNA polymerase master mix containing the forward and reverse primers for each assay (Supplementary Table 2) and 5 μmol/L SYTO 9 (Life Technologies, Carlsbad, CA). Assays were performed on the Rotor-Gene 6000 (Corbett, Sydney, Australia), with slight variations in cycling conditions depending on the target (Supplementary Table 3). Temperature was increased by 0.2°C per second during the HRM step. Samples were assayed in duplicate. DNA methylation standards (100%, 50%, 25%, 10%, 5%, 1% and 0% fully methylated DNA in unmethylated DNA) and a no-template control were included in duplicate in each assay, so that the extent of methylation could be estimated by comparison.

HRM analysis was performed with the software provided for the Rotor-Gene 6000 (Corbett). For each sample, the negative first derivative of fluorescence with respect to temperature (-dF/dT) was plotted; the resulting peaks reflect the relative amounts of methylated and unmethylated DNA in the amplified product. Melting curves were then compared with those of the DNA methylation standards, and the extent of methylation was estimated from the features shared between the samples and the standards (6, 7).
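The comparison with the methylation standards was done by inspecting the melt profiles (6, 7). As a hypothetical automated analogue, the sketch below computes -dF/dT by central differences and reports the standard whose derivative curve is closest in least squares. The synthetic curves, melting temperatures and nearest-standard rule are assumptions for illustration; the only biology it relies on is that methylated CpGs are retained as cytosine after bisulfite conversion, so the methylated template melts at a higher temperature.

```python
import math

def neg_derivative(temps, fluor):
    """-dF/dT by central differences (one-sided at the two ends)."""
    n = len(temps)
    out = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        out.append(-(fluor[hi] - fluor[lo]) / (temps[hi] - temps[lo]))
    return out

def closest_standard(temps, sample, standards):
    """Return the % label of the standard whose -dF/dT curve is nearest (least squares)."""
    d_sample = neg_derivative(temps, sample)

    def distance(pct):
        d_std = neg_derivative(temps, standards[pct])
        return sum((a - b) ** 2 for a, b in zip(d_sample, d_std))

    return min(standards, key=distance)

def melt(temps, tm, width=0.6):
    """Synthetic melt curve (illustration only)."""
    return [1.0 / (1.0 + math.exp((t - tm) / width)) for t in temps]

def mixture(temps, pct_methylated):
    """Mix of unmethylated (lower Tm) and methylated (higher Tm) templates."""
    unmeth, meth = melt(temps, 76.0), melt(temps, 81.0)
    frac = pct_methylated / 100.0
    return [frac * m + (1 - frac) * u for m, u in zip(meth, unmeth)]

temps = [70.0 + 0.2 * i for i in range(76)]   # 70-85 °C
standards = {pct: mixture(temps, pct) for pct in (100, 50, 25, 10, 5, 1, 0)}
print(closest_standard(temps, mixture(temps, 45), standards))  # 50
print(closest_standard(temps, mixture(temps, 12), standards))  # 10
```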

| Gene | Forward primer | Reverse primer | Product size |
|---|---|---|---|
| BRCA1 | TTGTTGTTTAGCGGTAGTTTTTTGGTT* | CAATCGCAATTTTAATTTATCTATAATTCCC* | 81 bp |
| PALB2 | TTTTCGGTTTAGGGTTAATTGGGTT | CACCTTTTCCTTCTCCTCACAACTAAA | 135 bp |
| FANCF | ATTGATATGTATTTCGATTAATAGTATTGT | ATCCAAATACTACAAAAAAAATTCCATAAA# | 149 bp |

Supplementary Table 2: Primers for the methylation-sensitive HRM analysis. * These primers were published in Wong et al. (2011) (9). # This primer was published in Taniguchi et al. (2003) (10).

| Step | BRCA1 assay | PALB2 assay | FANCF assay | Cycles |
|---|---|---|---|---|
| Initial denaturation | 95°C, 15 minutes | 95°C, 15 minutes | 95°C, 15 minutes | 1 |
| Denaturation | 95°C, 10 seconds | 95°C, 20 seconds | 95°C, 20 seconds | 50 |
| Annealing | 61°C, 10 seconds | 62°C, 20 seconds | 56°C, 25 seconds | (within the 50 cycles) |
| Extension | 72°C, 20 seconds | 72°C, 30 seconds | 72°C, 20 seconds | (within the 50 cycles) |
| Denaturation | 95°C, 1 minute | 95°C, 1 minute | 97°C, 1 minute | 1 |
| HRM | 65-95°C | 70-95°C | 65-95°C | 1 |

Supplementary Table 3: Cycling conditions for the methylation-sensitive HRM analysis

Twenty-three cases showed evidence of methylation in this analysis, although in two cases it was deemed not biologically significant because both cases carried a germline BRCA2 pathogenic mutation (Supplementary Table 4).

| Study ID | Estimated % of methylated alleles | Estimated % of tumor material in DNA | Other BRCA DNA event |
|---|---|---|---|
| 958 | <5% | MD | BRCA2 GL PATH |
| 3202 | ~50% | 50% | BRCA2 GL PATH |
| 7492 | ~50% | 60% | - |
| 3102 | ~15-20% | 60% | - |
| 3133 | ~10% | 50% | - |
| 5154 | ~20% | 80% | - |
| 1024 | ~50% | 90% | - |
| 7329 | ~50% | 40% | - |
| 451 | ~90% | 80% | - |
| 1806 | ~50% | 60-70% | - |
| 1978 | ~60% | MD | - |
| 3612 | ~100% | 90% | - |
| 3793 | ~50% | 80% | - |
| 4465 | ~50% | 70% | - |
| 9152 | ~50% | 85-90% | - |
| 6483 | ~80% | 50-60% | - |
| 5349 | ~25-30% | 90-100% | - |
| 4051 | ~30% | 70% | - |
| 2738 | ~30% | MD | - |
| 6621 | ~30% | 50% | BRCA1 UV |
| 3961 | ~20% | MD | - |
| 3843 | ~40% | 60-80% | - |
| 6977 | ~30% | 70% | - |

Supplementary Table 4: BRCA1 gene promoter methylation detected in the 132 AOCS cases by methylation-sensitive HRM analysis. GL = germline; PATH = pathogenic mutation; UV = unclassified variant; MD = macrodissected.

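| Characteristic | GL BRCA1/2 positive | Somatic BRCA1/2 positive | BRCA1 methylation positive | Wild type |
|---|---|---|---|---|
| Total cases | 21 | 8 | 21 | 82 |
| Age at diagnosis (years), mean | 55.01 | 62.48 | 55.57 | 62.79 |
| Age at diagnosis (years), standard deviation | 8.71 | 7.17 | 9.31 | 9.76 |
| Primary site: ovary | 81.0% | 62.5% | 85.7% | 74.4% |
| Primary site: fallopian tube | 4.8% | - | 4.8% | 4.9% |
| Primary site: peritoneum | 14.3% | 37.5% | 9.5% | 20.7% |
| Histology: serous | 100% | 100% | 100% | 100% |
| FIGO stage I | 4.8% | - | 14.3% | 2.4% |
| FIGO stage II | - | - | 14.3% | 2.4% |
| FIGO stage III | 81.0% | 100% | 57.1% | 82.9% |
| FIGO stage IV | 9.5% | - | 14.3% | 6.1% |
| FIGO stage not known | 4.8% | - | - | 6.1% |
| Tumour grade 1 | - | - | - | 3.7% |
| Tumour grade 2 | 4.8% | - | 23.8% | 19.5% |
| Tumour grade 3 | 81.0% | 100% | 76.2% | 69.5% |
| Tumour grade not known | 14.3% | - | - | 7.3% |
| Residual disease: nil macroscopic | 28.6% | - | 33.3% | 22.0% |
| Residual disease: ≤1 cm | 52.4% | 62.5% | 38.1% | 32.9% |
| Residual disease: >1 cm | 19.0% | 37.5% | 28.6% | 39.0% |
| Residual disease: not known/size not known | - | - | - | 6.1% |
| Time from diagnosis to disease progression (months), median | 18.10 | 14.60 | 23.40 | 12.30 |
| Time from diagnosis to disease progression (months), range | 8.51–41.98 | 10.85–20.71 | 5.00–37.31 | 0.76–55.59 |
| Time from diagnosis to death (months), median | 47.1 | 59.60 | 62.10 | 37.10 |
| Time from diagnosis to death (months), range | 19.20–84.00 | 20.25–71.34 | 14.56–66.94 | 0.76–83.47 |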

Supplementary Table 5: Clinico-pathological features of the AOCS cohort used in this analysis

S4: Bioinformatic analyses

S4.1 BRCA mutation status and molecular subtypes

A sample was considered to have abnormal BRCA status if it had a pathogenic germline or somatic mutation in either BRCA gene, or if the BRCA1 promoter was methylated. The molecular subtypes of samples in the TCGA cohort were previously identified using a classification procedure (2). To test the association between molecular subtype and BRCA status within each cohort, a contingency table of counts for each combination of the two factors was constructed (Supplementary Tables 6 and 7). The expected count in each cell was calculated under the null hypothesis that the two variables are independent, that is, as (row total × column total) / grand total. Association plots (11) showing the deviations from the independence model are shown in Figure 1B and C. In an association plot, each cell of the contingency table is drawn as a rectangle whose height is the standardized residual, (observed − expected)/√expected, and whose width is the standard deviation of the expected count, √expected. The area of each rectangle is therefore equal to the difference between the observed and expected counts. For each cohort, a focused test of association between BRCA1 disruption and the C2 subtype was undertaken by fitting a Poisson log-linear generalized linear model, giving a two-sided likelihood ratio test on 1 degree of freedom. BRCA1 disruption was significantly associated with the C2 molecular subtype in both the TCGA (p = 0.0002) and AOCS (p = 0.017) cohorts (Figure 1B, 1C).
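The expected counts, association-plot rectangles and focused test described above can be sketched in plain Python. This is an illustrative re-implementation, not the authors' code (which is not given): the Poisson log-linear model with one extra parameter for the BRCA1 × C2 cell is fitted here by iterative proportional fitting rather than by a GLM routine. The extra parameter fits that cell exactly, leaving a quasi-independence model for the other cells, and IPF reaches the same maximum-likelihood fit. The counts are the observed TCGA values from Supplementary Table 6; on these, the sketch gives a likelihood-ratio statistic of roughly 14 on 1 degree of freedom (p ≈ 0.0002).

```python
import math

def expected_counts(table):
    """Expected counts under independence: row total * column total / grand total."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    total = sum(rows)
    return [[r * c / total for c in cols] for r in rows]

def association_rectangles(table):
    """(height, width) per cell: Pearson residual and sqrt(expected); area = O - E."""
    exp = expected_counts(table)
    return [[((o - e) / math.sqrt(e), math.sqrt(e)) for o, e in zip(orow, erow)]
            for orow, erow in zip(table, exp)]

def deviance(table, fitted):
    """G^2 = 2 * sum(O * log(O / E)); empty cells contribute zero."""
    return 2.0 * sum(o * math.log(o / e)
                     for orow, frow in zip(table, fitted)
                     for o, e in zip(orow, frow) if o > 0)

def fit_with_cell_parameter(table, cell, iters=10000, tol=1e-10):
    """Row + column effects plus one parameter that fits `cell` exactly (via IPF)."""
    nr, nc = len(table), len(table[0])
    keep = [[(i, j) != cell for j in range(nc)] for i in range(nr)]
    obs = [[table[i][j] if keep[i][j] else 0 for j in range(nc)] for i in range(nr)]
    fit = [[1.0 if keep[i][j] else 0.0 for j in range(nc)] for i in range(nr)]
    row_t = [sum(r) for r in obs]
    col_t = [sum(c) for c in zip(*obs)]
    for _ in range(iters):
        for i in range(nr):                      # match row totals
            s = sum(fit[i])
            fit[i] = [v * row_t[i] / s for v in fit[i]]
        col_s = [sum(c) for c in zip(*fit)]
        change = 0.0
        for i in range(nr):                      # match column totals
            for j in range(nc):
                new = fit[i][j] * col_t[j] / col_s[j]
                change = max(change, abs(new - fit[i][j]))
                fit[i][j] = new
        if change < tol:
            break
    i0, j0 = cell
    fit[i0][j0] = float(table[i0][j0])
    return fit

def focused_lrt(table, cell):
    """Likelihood-ratio test (1 df) of the extra cell parameter; returns (G2, p)."""
    stat = (deviance(table, expected_counts(table))
            - deviance(table, fit_with_cell_parameter(table, cell)))
    stat = max(stat, 0.0)
    return stat, math.erfc(math.sqrt(stat / 2.0))   # chi-square(1) upper tail

# Observed TCGA counts: rows BRCA1, BRCA2, wild type; columns C1, C2, C4, C5.
tcga = [[8, 14, 9, 6], [12, 5, 7, 4], [48, 14, 44, 39]]
stat, p = focused_lrt(tcga, (0, 1))   # the BRCA1 x C2 cell
print(round(stat, 1), p)
```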

Molecular subtype

| BRCA status | C1 | C2 | C4 | C5 |
|---|---|---|---|---|
| BRCA1 abnormality | 8 (12) | 14 (6) | 9 (11) | 6 (9) |
| BRCA2 abnormality | 12 (9) | 5 (4) | 7 (8) | 4 (7) |
| Wild type | 48 (47) | 14 (23) | 44 (41) | 39 (34) |

Supplementary Table 6: Contingency table showing the association between molecular subtype and abnormal BRCA status in the TCGA cohort; observed (expected) counts. A sample was considered to have abnormal BRCA status if it had a pathogenic germline or somatic mutation in either BRCA gene, or if the BRCA1 promoter was methylated. BRCA status is significantly associated with molecular subtype (P = 0.0059, Fisher's exact test).

Molecular subtype

| BRCA status | C1 | C2 | C4 | C5 |
|---|---|---|---|---|
| BRCA1 abnormality | 11 (12) | 12 (7) | 8 (7) | 2 (6) |
| BRCA2 abnormality | 6 (4) | 0 (2) | 2 (2) | 2 (2) |
| Wild type | 25 (26) | 12 (15) | 14 (15) | 17 (13) |

Supplementary Table 7: Contingency table showing the association between molecular subtype and abnormal BRCA status in the AOCS cohort; observed (expected) counts. A sample was considered to have abnormal BRCA status if it had a pathogenic germline or somatic mutation in either BRCA gene, or if the BRCA1 promoter was methylated. The association between BRCA status and molecular subtype is borderline (P = 0.051, Fisher's exact test). Only the 111 tumors that could be classified into one of the four previously described molecular subtypes of HGSC (3) are included; these were used in the subsequent gene expression analysis.

S4.2 Gene expression data analysis

S4.2.1 Unsupervised analysis and comparison with the Jazaeri analysis

Multidimensional scaling (MDS) plots (12, 13) were used to visualize the similarity of expression profiles among BRCA1-mutant, BRCA2-mutant and sporadic samples in the Jazaeri dataset. When the expression data from all genes on the array were used, tumors did not segregate by genotype (Supplementary Figure 1). This differs from the previous report (8), in which the authors showed clear segregation of the mutation-associated tumors. Segregation resembling the previous results could be obtained when the MDS plots were based only on genes identified as differentially expressed between the BRCA1-associated and BRCA2-associated samples (Supplementary Figure 2). However, because this supervised approach selects genes using the same class labels that are then visualized, the apparent separation is likely to be a result of over-fitting.
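The over-fitting concern can be illustrated with pure noise: if genes are chosen because they differ between two labelled groups, those same genes will separate the groups even when no real difference exists. The sketch below uses synthetic data; the group sizes, the number of genes and the Welch t-statistic used for selection are arbitrary choices, not taken from the analysis. It compares the ratio of mean between-group to mean within-group Euclidean distance for all genes and for the 20 most "differential" genes.

```python
import math
import random
import statistics

def welch_t(a, b):
    """Two-sample t-statistic with unequal variances."""
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b))

def separation_ratio(data, groups, genes):
    """Mean between-group / mean within-group Euclidean distance over `genes`."""
    between, within = [], []
    n = len(groups)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.sqrt(sum((data[g][i] - data[g][j]) ** 2 for g in genes))
            (between if groups[i] != groups[j] else within).append(d)
    return statistics.mean(between) / statistics.mean(within)

rng = random.Random(1)
groups = [0] * 10 + [1] * 10            # e.g. "BRCA1" vs "BRCA2" labels
# Pure noise: no gene truly differs between the two groups.
data = [[rng.gauss(0.0, 1.0) for _ in groups] for _ in range(1000)]

def abs_t(g):
    a = [x for x, k in zip(data[g], groups) if k == 0]
    b = [x for x, k in zip(data[g], groups) if k == 1]
    return abs(welch_t(a, b))

selected = sorted(range(len(data)), key=abs_t, reverse=True)[:20]   # supervised selection
ratio_all = separation_ratio(data, groups, range(len(data)))
ratio_selected = separation_ratio(data, groups, selected)
print(round(ratio_all, 2), round(ratio_selected, 2))
```

With all genes the ratio stays close to 1, whereas the label-selected genes give a clearly larger ratio even though the labels carry no information.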

We extended the MDS analysis to The Cancer Genome Atlas (TCGA) dataset, using 200 expression-profiled samples with known BRCA mutation status (27 BRCA1, 28 BRCA2 and 145 wild type). Tumors with BRCA1 promoter methylation or other mechanisms of pathway disruption were excluded. The distance between any two samples was the Euclidean distance computed over all genes with mean expression above 7 (1978 genes in the filtered set), and the resulting distances were depicted as an MDS plot (Supplementary Figure 3). The findings suggest that the unselected gene expression profiles of BRCA1- and BRCA2-mutated samples are not sufficiently different to be separated by MDS analysis.
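A minimal sketch of the filtering and distance steps, followed by classical (Torgerson) MDS computed with power iteration, is shown below. It is not necessarily the MDS routine used for Supplementary Figure 3. The mean-expression threshold of 7 comes from the text; the toy data are hypothetical and chosen so that the two-dimensional MDS configuration reproduces the original sample distances exactly.

```python
import math
import random

def filter_genes(expr, min_mean=7.0):
    """Keep genes (rows) whose mean expression across samples exceeds min_mean."""
    return [row for row in expr if sum(row) / len(row) > min_mean]

def squared_distances(expr):
    """Squared Euclidean distance between every pair of samples (columns)."""
    n = len(expr[0])
    return [[sum((row[i] - row[j]) ** 2 for row in expr) for j in range(n)] for i in range(n)]

def double_centre(d2):
    """B = -1/2 J D^2 J, the Gram matrix used by classical MDS."""
    n = len(d2)
    rmean = [sum(r) / n for r in d2]
    gmean = sum(rmean) / n
    return [[-0.5 * (d2[i][j] - rmean[i] - rmean[j] + gmean) for j in range(n)] for i in range(n)]

def leading_eigen(mat, iters=1000, seed=0):
    """Largest eigenvalue/eigenvector of a symmetric PSD matrix by power iteration."""
    rng = random.Random(seed)
    n = len(mat)
    v = [rng.random() + 0.1 for _ in range(n)]
    for _ in range(iters):
        w = [sum(mat[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(mat[i][k] * v[k] for k in range(n)) for i in range(n))
    return lam, v

def classical_mds(expr, k=2):
    """k-dimensional sample coordinates reproducing the Euclidean distances."""
    b = double_centre(squared_distances(expr))
    coords = [[] for _ in b]
    for _ in range(k):
        lam, v = leading_eigen(b)
        for i, x in enumerate(v):
            coords[i].append(math.sqrt(max(lam, 0.0)) * x)
        # Deflate so the next power iteration finds the next eigenvector.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(len(b))] for i in range(len(b))]
    return coords

points = [(0, 0), (4, 1), (8, -1), (1, 3), (5, 2), (9, 0)]   # hypothetical 2-D structure
expr = [
    [10.0 + x for x, _ in points],      # well-expressed gene A
    [10.0 + y for _, y in points],      # well-expressed gene B
    [2.0, 3.0, 1.0, 2.5, 1.5, 2.0],     # low-expressed gene, removed by the filter
    [9.0] * 6,                          # kept, but constant, so it adds no distance
]
kept = filter_genes(expr)
coords = classical_mds(kept)
```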

We considered whether the threshold used to filter genes affected the structure of the plot. A two-dimensional MDS plot based on the 500 most highly expressed genes is shown in Supplementary Figure 4. Here only the BRCA1/2 samples are shown, and the samples clearly do not cluster by mutation status. We also used a supervised approach, the gene set test ROAST (14), to validate the BRCAness signature in the AOCS and TCGA cohorts. The genes differentially expressed between BRCA1- and BRCA2-mutated samples in the Jazaeri et al. cohort were tested as a unit for evidence of differential expression between BRCA1 and BRCA2 samples. There was no evidence of differential expression of this gene set between BRCA1- and BRCA2-mutant samples in either the TCGA (p = 0.124) or the AOCS (p = 0.154) cohort.

S4.2.2 Supervised analysis: differentially expressed genes

Genes differentially expressed between BRCA1, BRCA2 and wild type tumors were identified using the empirical Bayes method in the R package limma (12, 13, 15). We used a subset of the publicly available TCGA data (55 BRCA1/2 mutated and 145 wild type tumors). Tumors with BRCA1 promoter methylation or other mechanisms of pathway disruption were excluded from the discovery analysis (see text). A gene was considered differentially expressed if its false discovery rate (FDR) adjusted p-value was less than 0.05. There were 65 genes differentially expressed between BRCA mutated samples (BRCA1 and BRCA2 combined) and wild type samples (Supplementary Table 8), and 34 genes differentially expressed between BRCA1 (n = 27) and wild type tumors (n = 145) (Supplementary Table 9); 24 genes were common to the two lists. Using the same cut-off, no gene was significantly differentially expressed between BRCA2 and wild type, between BRCA1 and BRCA2, or between BRCA1 mutated/methylated and wild type tumors.
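The FDR step can be sketched with the Benjamini-Hochberg procedure, the default adjustment in limma's topTable; the text does not name the method, so this is an assumption. The empirical Bayes moderated t-statistics are not re-implemented here, and the gene names and p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg FDR-adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):                 # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def significant(genes, pvalues, alpha=0.05):
    """Genes whose adjusted p-value is below alpha."""
    return {g for g, q in zip(genes, bh_adjust(pvalues)) if q < alpha}

# Hypothetical gene-wise p-values for two comparisons.
genes = ["GENE1", "GENE2", "GENE3", "GENE4"]
brca_vs_wt = significant(genes, [0.01, 0.04, 0.03, 0.005])
brca1_vs_wt = significant(genes, [0.002, 0.5, 0.01, 0.2])
print(sorted(brca_vs_wt & brca1_vs_wt))   # ['GENE1', 'GENE3']
```

The final line mirrors how the overlap between the two gene lists is obtained: a set intersection of the genes passing the cut-off in each comparison.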

| # | SYMBOL | logFC | AveExpr | P.Value | FDR | MAP |
|---|---|---|---|---|---|---|
| 1 | SLC19A3 | 0.29 | 3.37 | 4.09E-07 | 4.93E-03 | 2q36.3 |
| 2 | ASB7 | 0.21 | 3.86 | 4.68E-06 | 2.19E-02 | 15q26.3 |
| 3 | BMI1 | -0.45 | 7.62 | 8.62E-06 | 2.19E-02 | 10p11.23 |
| 4 | MRPS27 | -0.52 | 7.03 | 9.94E-06 | 2.19E-02 | 5q13.2 |
| 5 | FZD4 | 0.59 | 6.04 | 1.17E-05 | 2.19E-02 | 11q14.2 |
| 6 | PKIG | -0.54 | 7.37 | 1.57E-05 | 2.19E-02 | 20q12-q13.1 |
| 7 | SNRPA1 | 0.43 | 6.73 | 1.78E-05 | 2.19E-02 | 15q26.3 |
| 8 | CDKN1C | -0.40 | 4.88 | 1.82E-05 | 2.19E-02 | 11p15.5 |
| 9 | HSF1 | 0.23 | 4.96 | 2.15E-05 | 2.19E-02 | 8q24.3 |
| 10 | C12orf29 | -0.42 | 6.53 | 2.17E-05 | 2.19E-02 | 12q21.32 |
| 11 | AOF2 | -0.41 | 7.08 | 2.30E-05 | 2.19E-02 | 1p36.12 |
| 12 | LTA4H | -0.45 | 9.16 | 2.39E-05 | 2.19E-02 | 12q22 |
| 13 | ALDH3A2 | -0.41 | 6.55 | 2.63E-05 | 2.19E-02 | 17p11.2 |
| 14 | UNC119 | -0.29 | 5.00 | 2.66E-05 | 2.19E-02 | 17q11.2 |
| 15 | COX10 | -0.17 | 4.57 | 2.73E-05 | 2.19E-02 | 17p12 |
| 16 | SIL1 | 0.39 | 5.54 | 3.13E-05 | 2.34E-02 | 5q31 |
| 17 | DCTN2 | -0.26 | 7.69 | 3.31E-05 | 2.34E-02 | 12q13.3 |
| 18 | GPATCH3 | -0.20 | 4.18 | 4.34E-05 | 2.62E-02 | 1p35.3-p35.1 |
| 19 | RAD17 | -0.36 | 5.57 | 4.35E-05 | 2.62E-02 | 5q13 |
| 20 | CBFA2T2 | -0.18 | 4.44 | 4.81E-05 | 2.62E-02 | 20q11 |
| 21 | RASA1 | -0.45 | 6.68 | 5.05E-05 | 2.62E-02 | 5q13.3 |
| 22 | NSDHL | 0.44 | 5.61 | 5.35E-05 | 2.62E-02 | Xq28 |
| 23 | ERGIC3 | -0.38 | 9.34 | 5.36E-05 | 2.62E-02 | 20pter-q12 |
| 24 | P2RY6 | 0.38 | 4.30 | 5.59E-05 | 2.62E-02 | 11q13.5 |
| 25 | GPAA1 | 0.42 | 7.17 | 5.76E-05 | 2.62E-02 | 8q24.3 |
| 26 | CDK7 | -0.45 | 6.83 | 5.77E-05 | 2.62E-02 | 5q12.1 |
| 27 | SCAMP1 | -0.30 | 5.52 | 5.88E-05 | 2.62E-02 | 5q14.1 |
| 28 | EIF4G3 | -0.41 | 6.01 | 6.60E-05 | 2.84E-02 | 1p36.12 |
| 29 | IQGAP1 | 0.45 | 8.16 | 6.95E-05 | 2.86E-02 | 15q26.1 |
| 30 | NAP1L1 | -0.40 | 8.80 | 7.13E-05 | 2.86E-02 | 12q21.2 |
| 31 | BACE1 | 0.45 | 6.27 | 7.81E-05 | 3.04E-02 | 11q23.2-q23.3 |
| 32 | PML | 0.19 | 4.25 | 8.47E-05 | 3.19E-02 | 15q22 |
| 33 | IFT52 | -0.45 | 7.19 | 8.73E-05 | 3.19E-02 | 20q13.12 |
| 34 | TJP2 | 0.48 | 7.80 | 9.36E-05 | 3.31E-02 | 9q13-q21 |
| 35 | C5orf21 | -0.47 | 6.39 | 9.69E-05 | 3.33E-02 | 5q15 |
| 36 | TEAD1 | -0.35 | 5.70 | 1.02E-04 | 3.40E-02 | 11p15.2 |
| 37 | KIF1A | -0.89 | 5.68 | 1.20E-04 | 3.76E-02 | 2q37.3 |
| 38 | TSTA3 | 0.42 | 7.15 | 1.20E-04 | 3.76E-02 | 8q24.3 |
| 39 | ASXL1 | -0.28 | 6.06 | 1.23E-04 | 3.76E-02 | 20q11 |
| 40 | RNF41 | -0.29 | 5.27 | 1.25E-04 | 3.76E-02 | 12q13.13 |
| 41 | SLC10A3 | 0.39 | 5.76 | 1.28E-04 | 3.76E-02 | Xq28 |
| 42 | ZDHHC24 | 0.29 | 4.58 | 1.32E-04 | 3.79E-02 | 11q13.2 |
| 43 | EXOSC4 | 0.44 | 6.46 | 1.41E-04 | 3.79E-02 | 8q24.3 |
| 44 | C20orf3 | -0.35 | 8.22 | 1.41E-04 | 3.79E-02 | 20p11.21 |
| 45 | BLMH | -0.41 | 5.08 | 1.42E-04 | 3.79E-02 | 17q11.2 |
| 46 | IDH2 | 0.48 | 7.88 | 1.46E-04 | 3.80E-02 | 15q26.1 |
| 47 | PSMD13 | -0.33 | 6.18 | 1.48E-04 | 3.80E-02 | 11p15.5 |
| 48 | PLSCR3 | -0.36 | 7.09 | 1.56E-04 | 3.92E-02 | 17p13.1 |
| 49 | IL6R | 0.38 | 4.36 | 1.62E-04 | 3.97E-02 | 1q21 |
| 50 | RP11-298P3.3 | -0.39 | 6.70 | 1.65E-04 | 3.97E-02 | 13q13.1 |
| 51 | SARS2 | -0.44 | 4.82 | 1.72E-04 | 4.06E-02 | 19q13.2 |
| 52 | PYCRL | 0.38 | 4.59 | 1.89E-04 | 4.22E-02 | 8q24.3 |
| 53 | XRCC4 | -0.19 | 3.98 | 1.91E-04 | 4.22E-02 | 5q14.2 |
| 54 | CRNKL1 | -0.47 | 6.44 | 1.92E-04 | 4.22E-02 | 20p11.2 |
| 55 | POLR3B | -0.40 | 6.32 | 1.93E-04 | 4.22E-02 | 12q23.3 |
| 56 | IL24 | 0.10 | 3.32 | 1.98E-04 | 4.22E-02 | 1q32 |
| 57 | C20orf19 | -0.48 | 5.74 | 2.02E-04 | 4.22E-02 | 20p11.23 |
| 58 | ALDH4A1 | -0.21 | 4.04 | 2.03E-04 | 4.22E-02 | 1p36 |
| 59 | DFFA | -0.36 | 6.70 | 2.15E-04 | 4.38E-02 | 1p36.3-p36.2 |
| 60 | C8orf51 | 0.17 | 3.95 | 2.18E-04 | 4.38E-02 | 8q24.3 |
| 61 | DHX35 | -0.26 | 4.76 | 2.30E-04 | 4.54E-02 | 20q11.22-q12 |
| 62 | LZTFL1 | -0.50 | 6.11 | 2.46E-04 | 4.73E-02 | 3p21.3 |
| 63 | ENAH | -0.42 | 7.66 | 2.48E-04 | 4.73E-02 | 1q42.12 |
| 64 | DMN | 0.20 | 3.46 | 2.65E-04 | 4.95E-02 | 15q26.3 |
| 65 | RAB40B | 0.55 | 6.42 | 2.67E-04 | 4.95E-02 | 17q25.3 |

Supplementary Table 8: Differentially expressed genes (n = 65) between BRCA mutated (BRCA1 and BRCA2 combined) and wild type samples. logFC = log10 expression fold-change; AveExpr = average gene expression; P.Value = unadjusted p-value based on the moderated t-statistic; FDR = false discovery rate adjusted p-value; MAP = chromosomal location.

| # | SYMBOL | logFC | AveExpr | P.Value | FDR | MAP |
|---|---|---|---|---|---|---|
| 1 | SNRPA1 | 0.64 | 6.71 | 9.44E-07 | 1.00E-02 | 15q26.3 |
| 2 | SLC19A3 | 0.33 | 3.35 | 2.46E-06 | 1.00E-02 | 2q37 |
| 3 | TM2D3 | 0.57 | 8.01 | 2.69E-06 | 1.00E-02 | 15q26.3 |
| 4 | DCTN2 | -0.38 | 7.70 | 3.33E-06 | 1.00E-02 | 12q13.3 |
| 5 | PKIG | -0.71 | 7.40 | 5.93E-06 | 1.38E-02 | 20q12-q13.1 |
| 6 | C12orf29 | -0.56 | 6.55 | 6.89E-06 | 1.38E-02 | 12q21.32 |
| 7 | ASB7 | 0.26 | 3.84 | 1.57E-05 | 2.42E-02 | 15q26.3 |
| 8 | KIF1A | -1.28 | 5.72 | 1.61E-05 | 2.42E-02 | 2q37.3 |
| 9 | ZDHHC24 | 0.41 | 4.56 | 2.24E-05 | 2.63E-02 | 11q13.2 |
| 10 | NSDHL | 0.58 | 5.58 | 2.56E-05 | 2.63E-02 | Xq28 |
| 11 | HSF1 | 0.28 | 4.94 | 2.73E-05 | 2.63E-02 | 8q24.3 |
| 12 | IQGAP1 | 0.61 | 8.13 | 2.81E-05 | 2.63E-02 | 15q26.1 |
| 13 | MRPS27 | -0.62 | 7.08 | 2.84E-05 | 2.63E-02 | 5q13.2 |
| 14 | BMI1 | -0.54 | 7.66 | 3.30E-05 | 2.82E-02 | 10p11.23 |
| 15 | COL4A3BP | -0.59 | 6.32 | 3.55E-05 | 2.82E-02 | 5q13.3 |
| 16 | PCTK2 | -0.34 | 4.87 | 3.75E-05 | 2.82E-02 | 12q23.1 |
| 17 | FZD4 | 0.66 | 5.98 | 3.99E-05 | 2.83E-02 | 11q14.2 |
| 18 | SLC10A3 | 0.54 | 5.74 | 5.29E-05 | 3.54E-02 | Xq28 |
| 19 | BCAP31 | 0.53 | 8.53 | 6.93E-05 | 4.19E-02 | Xq28 |
| 20 | GPAA1 | 0.54 | 7.14 | 7.05E-05 | 4.19E-02 | 8q24.3 |
| 21 | PML | 0.23 | 4.24 | 7.31E-05 | 4.19E-02 | 15q22 |
| 22 | SCAMP1 | -0.40 | 5.54 | 8.35E-05 | 4.57E-02 | 5q14.1 |
| 23 | CDKN1C | -0.47 | 4.91 | 8.85E-05 | 4.63E-02 | 11p15.5 |
| 24 | HMGCR | -0.59 | 5.94 | 9.36E-05 | 4.70E-02 | 5q13.3-q14 |
| 25 | PNPLA4 | 0.60 | 5.30 | 1.00E-04 | 4.81E-02 | Xp22.3 |
| 26 | TBL1X | 0.55 | 4.99 | 1.11E-04 | 4.81E-02 | Xp22.3 |
| 27 | SLC25A1 | 0.57 | 5.66 | 1.18E-04 | 4.81E-02 | 22q11.21 |
| 28 | TSTA3 | 0.54 | 7.13 | 1.26E-04 | 4.81E-02 | 8q24.3 |
| 29 | POLR3B | -0.52 | 6.34 | 1.28E-04 | 4.81E-02 | 12q23.3 |
| 30 | EXOSC4 | 0.56 | 6.43 | 1.29E-04 | 4.81E-02 | 8q24.3 |
| 31 | FAM49B | 0.47 | 7.16 | 1.31E-04 | 4.81E-02 | 8q24.21 |
| 32 | TRIB1 | 0.71 | 7.44 | 1.32E-04 | 4.81E-02 | 8q24.13 |
| 33 | LTA4H | -0.52 | 9.20 | 1.35E-04 | 4.81E-02 | 12q22 |
| 34 | PYCRL | 0.49 | 4.56 | 1.36E-04 | 4.81E-02 | 8q24.3 |

Supplementary Table 9: Differentially expressed genes (n=34) between BRCA1 mutated and wild type samples.

S4.3 Gene expression based BRCA mutation classifier

S4.3.1 BRCAness profile classifier

A 60-gene expression profile has been described for predicting the "BRCAness" of a tumor sample (16). This signature was derived using the Jazaeri dataset. We investigated how accurately the BRCAness signature predicts the mutation status of tumors in the AOCS and TCGA datasets. All 60 genes were available on the platform used to profile the AOCS samples, and all but one (SKP1) were present on the platform used to profile the TCGA samples. The BRCAness score was computed as described in Konstantinopoulos et al., and a score >18.973 was considered to predict a BRCA mutation (16). The relationship between BRCAness score and BRCA mutation status is shown in Supplementary Figure 5. There was no association between the BRCAness score and mutation status in either cohort. Because the cut-off proposed by Konstantinopoulos et al. may not translate across platforms, we used ROC curves to identify an optimal threshold. However, the area under the curve (AUC) obtained using the BRCAness score was not significantly higher than 0.5, suggesting that the signature lacked discriminatory power (Supplementary Table 10).
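The ROC analysis can be sketched as follows: the AUC is computed as the Mann-Whitney probability that a mutated sample scores higher than a wild-type sample, and a data-driven cut-off is chosen by maximising Youden's J (sensitivity + specificity - 1). The text does not say which optimality criterion was used, so Youden's J is an assumption, and the scores below are hypothetical rather than actual BRCAness scores.

```python
def auc(scores, labels):
    """Area under the ROC curve = P(mutated score > wild-type score), ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def youden_threshold(scores, labels):
    """Cut-off maximising sensitivity + specificity - 1; score >= cut predicts mutation."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cut)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cut)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j

# Hypothetical scores; 1 = BRCA mutated, 0 = wild type.
scores = [0.9, 0.8, 0.7, 0.65, 0.75, 0.3, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0]
print(auc(scores, labels))               # 10 of 12 mutated/wild-type pairs ordered correctly
print(youden_threshold(scores, labels))
```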

| Cohort | PMCI BRCA1 signature | PMCI BRCA1/2 signature | Konstantinopoulos signature |
|---|---|---|---|
| AOCS | 0.87 | 0.82 | 0.55 |
| TCGA | 0.89 | 0.88 | 0.61 |
| Jazaeri | 0.86 | 0.77 | 0.63 |

Supplementary Table 10: Area under the ROC curve computed using the different signatures

S4.3.2 BRCA classifier based on TCGA and AOCS data

The BRCAness score represents the average log expression of differentially expressed genes, weighted by the direction and magnitude of the change of those genes in the training set. We had previously shown that molecular subtype scores could be used to classify samples based on their gene expression profiles (2). We therefore defined our signature as the genes differentially expressed between BRCA mutant and wild type samples within the TCGA cohort. We visualized and validated the relationship between BRCA mutation status and the novel BRCA score in the completely independent AOCS cohort (Figure 2). The discriminating power of the score based on the new signature was assessed using ROC plots. BRCA scores were computed for BRCA1 and BRCA1/2 classification (referred to as the PMCI BRCA1 and PMCI BRCA1/2 signatures, respectively), and the areas under the curves were compared with those generated by the classifier proposed by Konstantinopoulos et al. (Supplementary Table 10). ROC curves are shown in Supplementary Figure 6. We also assessed the statistical significance of each classifier, that is, the probability of obtaining a classification performance as good as the one achieved if there were no relationship between the expression data and the class labels. Establishing this probability is important because the high-dimensional nature of gene expression data makes it possible to obtain an apparently accurate classifier by chance (17). Significance was assessed by Monte Carlo simulation: random subsets of genes were used as the signature and the area under the curve was estimated. This was repeated a large number of times (n), and the number of times (m) an area under the curve equal to or greater than that observed for the actual signature was recorded. The p-value was then computed as (m + 1)/(n + 1) (18). The estimated statistical significance of each classifier is shown in Supplementary Table 11.
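The score and the Monte Carlo significance procedure can be sketched as below. The score is one reading of "average log expression weighted by the direction and magnitude of the change": a weighted mean with training log fold-changes as weights, normalised by the sum of absolute weights. That normalisation, the synthetic data and the gene names are assumptions. `monte_carlo_p` implements the (m + 1)/(n + 1) estimate from the text. With n = 1999 random signatures its smallest possible value is 1/2000 = 5.00E-04, the floor value seen in Supplementary Table 11, although the number of simulations actually used is not stated.

```python
import random

def signature_score(expr, sample, weights):
    """Weighted mean expression of signature genes (weights: training log fold-changes)."""
    total = sum(abs(w) for w in weights.values())
    return sum(w * expr[g][sample] for g, w in weights.items()) / total

def auc(scores, labels):
    """P(score of a mutated sample > score of a wild-type sample), ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def signature_auc(expr, labels, weights):
    return auc([signature_score(expr, s, weights) for s in range(len(labels))], labels)

def random_signature_aucs(expr, labels, all_weights, size, n, seed=0):
    """AUCs of n random gene subsets of the same size as the real signature."""
    rng = random.Random(seed)
    genes = sorted(all_weights)
    return [signature_auc(expr, labels, {g: all_weights[g] for g in rng.sample(genes, size)})
            for _ in range(n)]

def monte_carlo_p(observed, null_aucs):
    """(m + 1) / (n + 1), with m = number of random AUCs >= the observed AUC."""
    m = sum(1 for a in null_aucs if a >= observed)
    return (m + 1) / (len(null_aucs) + 1)

rng = random.Random(42)
labels = [1] * 10 + [0] * 10
expr = {f"g{i}": [rng.gauss(0.0, 1.0) for _ in labels] for i in range(200)}
signature = [f"g{i}" for i in range(5)]
for g in signature:                    # hypothetical signal in the signature genes
    expr[g] = [x + 5.0 * y for x, y in zip(expr[g], labels)]
all_weights = {g: (1.0 if g in signature else rng.gauss(0.0, 0.1)) for g in expr}

observed = signature_auc(expr, labels, {g: all_weights[g] for g in signature})
null_aucs = random_signature_aucs(expr, labels, all_weights, size=5, n=199)
print(observed, monte_carlo_p(observed, null_aucs))
```

Adding 1 to both m and n keeps the estimate away from zero, which would otherwise overstate significance when no random signature beats the real one.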

| Cohort | PMCI BRCA1 signature | PMCI BRCA1/2 signature | Konstantinopoulos signature |
|---|---|---|---|
| AOCS | 5.00E-04 | 5.00E-04 | 1.701E-01 |
| TCGA | 5.00E-04 | 5.00E-04 | 5.5E-03 |
| Jazaeri | 5.00E-04 | 5.00E-04 | 4.85E-02 |

Supplementary Table 11: Statistical significance of the signatures, computed using a permutation procedure. The newly derived signatures are significant in all cohorts.

We also classified samples using the k-nearest-neighbor (kNN) classification algorithm and estimated its performance. Because the TCGA cohort was used to design the classifier, its error was estimated by leave-one-out cross-validation, whereas the AOCS cohort served as an independent validation dataset. Several measures can be used to quantify the performance of a binary classifier from the actual and predicted labels of the samples in the test set (19):

• Sensitivity: Probability of classifying a BRCA mutated sample correctly.

• Specificity: Probability of classifying a wild-type sample correctly.

• Negative predictive value: Proportion of actual wild-type samples within the set of samples classified as wild-type samples.

• Positive predictive value: Proportion of mutated samples within the set of samples classified as mutated.

• Accuracy: Proportion of samples correctly classified.
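The leave-one-out estimate and the five performance measures defined above can be sketched in a few lines of Python. The value k = 3, the Euclidean metric and the toy two-gene profiles are assumptions; the text does not state which k or distance was used. Labels are coded 1 for BRCA mutated and 0 for wild type.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, x, k=3):
    """Majority label among the k training samples nearest to x (Euclidean)."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def loocv_predictions(xs, ys, k=3):
    """Predict each sample from a classifier trained on all the other samples."""
    return [knn_predict(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], xs[i], k)
            for i in range(len(xs))]

def performance(actual, predicted):
    """Measures defined above, with 1 = BRCA mutated and 0 = wild type."""
    pairs = list(zip(actual, predicted))
    tp = pairs.count((1, 1))
    tn = pairs.count((0, 0))
    fp = pairs.count((0, 1))
    fn = pairs.count((1, 0))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "negative predictive value": tn / (tn + fn),
        "positive predictive value": tp / (tp + fp),
        "accuracy": (tp + tn) / len(pairs),
    }

# Hypothetical two-gene profiles forming two well-separated groups.
xs = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
ys = [1, 1, 1, 1, 0, 0, 0, 0]
print(loocv_predictions(xs, ys))
print(performance([1, 1, 1, 1, 0, 0, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]))
```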

The performance metrics of the BRCA1 and BRCA1/2 classifiers are given in Supplementary Tables 12 and 13 respectively.

| Performance measure | TCGA | AOCS |
|---|---|---|
| Sensitivity | 0.56 | 0.62 |
| Specificity | 0.95 | 0.96 |
| Negative predictive value | 0.91 | 0.92 |
| Positive predictive value | 0.71 | 0.77 |
| Accuracy | 0.88 | 0.89 |

Supplementary Table 12: Performance measures of the BRCA1 k-nearest-neighbor (kNN) classifier, estimated from TCGA (leave-one-out cross-validation) and AOCS (independent validation) data.

| Performance measure | TCGA | AOCS |
|---|---|---|
| Sensitivity | 0.62 | 0.54 |
| Specificity | 0.90 | 0.86 |
| Negative predictive value | 0.85 | 0.83 |
| Positive predictive value | 0.72 | 0.58 |
| Accuracy | 0.82 | 0.77 |

Supplementary Table 13: Performance measures of the BRCA1/2 k-nearest-neighbor (kNN) classifier, estimated from TCGA (leave-one-out cross-validation) and AOCS (independent validation) data.

Univariate survival analysis using the log-rank test showed that, in both the TCGA and AOCS cohorts, tumors classified as BRCA1-like by the BRCA1 classifier had better survival outcomes (Supplementary Figure 7).
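For reference, the two-group log-rank (Mantel-Cox) statistic compares observed and expected event counts at each distinct event time. The sketch below assumes right-censored data with an event indicator (1 = progression or death observed, 0 = censored); in practice a tested implementation, such as survdiff in the R survival package, would normally be used. The four survival times are a toy example.

```python
import math

def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value on 1 df)."""
    data = ([(t, e, 0) for t, e in zip(times_a, events_a)]
            + [(t, e, 1) for t, e in zip(times_b, events_b)])
    observed_a = expected_a = variance = 0.0
    for t in sorted({t for t, e, _ in data if e}):
        at_risk = [g for tt, _, g in data if tt >= t]
        n, n_a = len(at_risk), at_risk.count(0)
        d = sum(1 for tt, e, _ in data if tt == t and e)
        d_a = sum(1 for tt, e, g in data if tt == t and e and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n                 # expected events in group A
        if n > 1:                                 # hypergeometric variance
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    stat = (observed_a - expected_a) ** 2 / variance
    return stat, math.erfc(math.sqrt(stat / 2.0))   # chi-square(1) upper tail

stat, p = logrank([1, 2], [1, 1], [3, 4], [1, 1])
print(round(stat, 4), round(p, 3))
```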

S4.4 Differentially expressed gene sets

Gene set testing can offer greater statistical power than testing genes individually. We used CAMERA, a competitive gene set test that avoids inflated type I error rates by accounting for inter-gene correlation within each gene set (18). The collection of annotated gene sets in the Molecular Signatures Database (MSigDB v3.1) (20), publicly available from the Broad Institute (footnote 2), was used for this analysis. CAMERA identified 11 gene sets as differentially expressed between BRCA1 and wild type samples (Supplementary Table 14) and 9 gene sets as differentially expressed between BRCA2 and wild type samples (Supplementary Table 15).

2 http://www.broadinstitute.org/gsea/msigdb
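CAMERA (18) inflates the variance of a gene set's mean statistic by a variance inflation factor, VIF = 1 + (m - 1)ρ̄, where m is the set size and ρ̄ the average inter-gene correlation. For the 73-gene chr8q24 set in Supplementary Table 14 (correlation 0.23) this gives VIF = 1 + 72 × 0.23 ≈ 17.6. The sketch below is a simplified, normal-approximation version of that idea for illustration only; it is not limma's camera function, and the gene-wise statistics are synthetic.

```python
import math
import statistics

def camera_like(stats, in_set, inter_gene_cor):
    """Competitive set-versus-rest test with a variance inflation factor (simplified).

    stats: one statistic per gene (e.g. a moderated t); in_set: matching booleans.
    """
    inside = [s for s, k in zip(stats, in_set) if k]
    outside = [s for s, k in zip(stats, in_set) if not k]
    m, m2 = len(inside), len(outside)
    vif = 1.0 + (m - 1) * inter_gene_cor          # correlated genes carry less information
    pooled = ((m - 1) * statistics.variance(inside)
              + (m2 - 1) * statistics.variance(outside)) / (m + m2 - 2)
    diff = statistics.mean(inside) - statistics.mean(outside)
    z = diff / math.sqrt(pooled * (vif / m + 1.0 / m2))
    p = math.erfc(abs(z) / math.sqrt(2.0))        # two-sided, normal approximation
    return z, p

# Synthetic statistics: a 10-gene set shifted upwards among 200 background genes.
background = [((-1) ** i) * (i % 7) / 3.0 for i in range(200)]
gene_set = [1.5 + 0.1 * i for i in range(10)]
stats = gene_set + background
in_set = [True] * 10 + [False] * 200
z_indep, p_indep = camera_like(stats, in_set, 0.0)
z_corr, p_corr = camera_like(stats, in_set, 0.2)
print(round(z_indep, 2), round(z_corr, 2))
```

The same shift in the set mean gives a smaller statistic once positive inter-gene correlation is allowed for, which is how this kind of test avoids the inflated type I error of treating genes as independent.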

| NGenes | Correlation | Direction | P-value | Gene set |
|---|---|---|---|---|
| 73 | 0.23 | Up | 8.1E-04 | chr8q24 |
| 33 | 0.11 | Up | 1.5E-05 | chr15q26 |
| 48 | 0.12 | Up | 4.5E-03 | chr15q24 |
| 66 | 0.13 | Up | 7.3E-03 | chrxq28 |
| 30 | 0.14 | Up | 2.2E-03 | chr15q25 |
| 73 | 0.12 | Up | 3.2E-03 | chrxp22 |
| 23 | 0.06 | Down | 1.4E-03 | chr12q21 |
| 31 | 0.14 | Down | 2.2E-03 | chr5q13 |
| 27 | 0.09 | Up | 8.7E-03 | chrxq26 |
| 83 | 0.21 | Up | 8.2E-04 | NIKOLSKY_BREAST_CANCER_8Q23_Q24_AMPLICON |
| 79 | 0.02 | Up | 7.0E-03 | LABBE_WNT3A_TARGETS_DN |

Supplementary Table 14: Differentially expressed gene sets between BRCA1 and wild type tumors (TCGA). The columns give the number of genes (NGenes), the inter-gene correlation (Correlation), the direction of change (Direction), the p-value and the gene set name. "Up" refers to sets that are up-regulated in BRCA1 tumors. Comparison of the copy number profiles of BRCA1 mutated and wild type samples identified the 8q24, Xq26 and Xq28 loci as amplified in a higher proportion of BRCA1 mutated samples.

| NGenes | Correlation | Direction | P-value | Gene set |
|---|---|---|---|---|
| 97 | 0.08 | Up | 2.40E-03 | KEGG_SYSTEMIC_LUPUS_ERYTHEMATOSUS |
| 27 | 0.01 | Up | 7.01E-03 | MODULE_385 |
| 47 | 0.01 | Up | 4.60E-03 | MODULE_418 |
| 24 | 0.07 | Up | 3.35E-03 | REACTOME_DEGRADATION_OF_THE_EXTRACELLULAR_MATRIX |
| 48 | 0.04 | Up | 4.47E-03 | SU_PANCREAS |
| 32 | 0.01 | Up | 8.53E-03 | CAMPS_COLON_CANCER_COPY_NUMBER_DN |
| 64 | 0.02 | Up | 7.42E-03 | CADWELL_ATG16L1_TARGETS_UP |
| 124 | 0.04 | Up | 6.82E-03 | ALK_DN.V1_UP |
| 93 | 0.01 | Up | 9.71E-03 | CRX_DN.V1_UP |

Supplementary Table 15: Differentially expressed gene sets between BRCA2 and wild type tumors (TCGA). Unlike the comparison between BRCA1 mutated and wild type tumors, no gene sets mapping to chromosomal regions were associated with BRCA2 mutation status.

S4.5 Copy number profile analysis

S4.5.1 Genomic alteration summary statistics

Copy number profiles of 204 samples with known BRCA mutation status (34 BRCA1, 30 BRCA2 and 140 wild type) were available from the TCGA data portal. In addition to the raw data, the TCGA data portal provides normalized and segmented copy number profiles (level 3 data) for all samples. The segmented profile can be used to determine the copy number level of any locus in the genome. The total length of genome gained, lost or amplified in a sample was computed by summing the lengths of all segments with mean values greater than 0.3, less than -0.3 or greater than 0.8, respectively. The association between gain, loss or amplification events and BRCA mutation status is visualized using box plots (Supplementary Figure 8).
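The genome-wide totals can be computed directly from a segmented profile. In this sketch each segment is a hypothetical (chromosome, start, end, segment mean) tuple with inclusive coordinates; the 0.3, -0.3 and 0.8 thresholds are those stated in the text, and, reading the text literally, an amplified segment also counts towards the gained total.

```python
def altered_length(segments, gain=0.3, loss=-0.3, amp=0.8):
    """Total bases gained, lost and amplified in one sample.

    segments: (chromosome, start, end, segment_mean) tuples, inclusive coordinates.
    An amplified segment (mean > amp) also exceeds `gain`, so it is counted in both.
    """
    totals = {"gained": 0, "lost": 0, "amplified": 0}
    for _chrom, start, end, seg_mean in segments:
        length = end - start + 1
        if seg_mean > gain:
            totals["gained"] += length
        if seg_mean < loss:
            totals["lost"] += length
        if seg_mean > amp:
            totals["amplified"] += length
    return totals

sample = [
    ("8", 1, 1000, 0.5),       # gain
    ("8", 1001, 3000, -0.6),   # loss
    ("8", 3001, 3500, 1.2),    # amplification (also a gain)
    ("X", 1, 1500, 0.1),       # neutral
]
print(altered_length(sample))  # {'gained': 1500, 'lost': 2000, 'amplified': 500}
```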

S4.5.2 Differentially altered regions

CAMERA gene set testing identified gene sets associated with several distinct chromosomal regions, including 8q24, 15q26 and Xq28, that were up-regulated in BRCA1 tumors compared with wild type tumors. We therefore sought genomic regions differentially altered in samples with distinct BRCA mutations. For every gene in the genome, the proportions of samples with gain, loss and amplification were computed in each of the three groups: BRCA1 mutated, BRCA2 mutated and wild type. The statistical significance of differences in proportions was tested using Fisher's exact test, and the p-values were corrected for multiple testing. The genes differentially gained, lost and amplified at a false discovery rate below 0.05 are shown in Supplementary Figures 9, 10 and 11, respectively. The structure of the amplified regions at Chr8q24, Chr19q12 and ChrX is depicted in Figure 3B, Supplementary Figure 12 and Supplementary Figure 13, respectively. The complete lists of genes differentially amplified at Chr8q24, Chr19 and ChrX are given in Supplementary Tables 16, 17 and 18, respectively.

Supplementary Table 16, Supplementary Table 17 and Supplementary Table 18 are provided as separate documents.
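The per-gene testing can be sketched as a two-sided Fisher's exact test on a 2 × 2 table for each gene, followed by Benjamini-Hochberg adjustment. The text compares three groups and does not name the multiple-testing correction, so the pairwise BRCA1-versus-wild-type layout and the BH method are assumptions. Group sizes of 34 and 140 follow the TCGA copy number cohort, while the per-gene counts are hypothetical.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return min(1.0, sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7)))

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical counts: (amplified in BRCA1, BRCA1 total, amplified in wild type, wild type total).
counts = {"GENE_A": (15, 34, 14, 140), "GENE_B": (4, 34, 15, 140), "GENE_C": (9, 34, 12, 140)}
pvals = [fisher_exact_2x2(x, n1 - x, y, n2 - y) for x, n1, y, n2 in counts.values()]
for gene, q in zip(counts, bh_adjust(pvals)):
    print(gene, round(q, 4))
```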

S4.5.3 X abnormalities

Abnormalities in the expression of genes, including those at Xq28, and uniparental X chromosome isodisomy have been shown to be associated with basal-like breast cancer (21). Although X chromosome uniparental isodisomy could potentially cause a global change in X chromosome gene expression, this was not observed; rather, a subset of X chromosome genes was differentially expressed in basal-like breast cancer (21).

Given the apparent similarity between basal-like breast cancer and BRCA1 mutant ovarian cancer, including specifically altered X chromosome gene copy number and expression, we investigated patterns of gene expression on the X chromosome in BRCA mutant and wild type HGSC samples. Box plots were used to visualize the mRNA expression levels of genes on each chromosome across BRCA mutated and wild type tumors (Supplementary Figure 15). The expression levels of X chromosome genes in BRCA1 and BRCA2 mutated samples were compared to those of wild type tumors (Supplementary Table 19). Although specific X chromosome loci are amplified and overexpressed in BRCA1 mutated HGSC samples, we did not observe a global increase in X chromosome transcription.

Richardson et al. (21) described a subset of genes that are differentially expressed between basal-like and non-basal-like breast cancer samples. We used ROAST (14) to test this gene set and found it to be significantly upregulated in both BRCA1 mutated and BRCA2 mutated samples compared to wild type samples (p = 0.0002). Differential gene expression is most pronounced in BRCA1 mutated samples (Supplementary Table 19).
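ROAST assesses whether a gene set as a whole is differentially expressed by comparing a set statistic with its distribution under random rotations of the residual space of the linear model. The sketch below illustrates that idea for a simple two-group comparison; it is not limma's `roast()`, which uses empirical Bayes moderated t-statistics and supports general designs. Here ordinary t-statistics, the mean t-statistic as the set statistic and a one-sided "up" p-value are assumed, and the function name and arguments are hypothetical.

```python
import numpy as np


def rotation_set_test(expr, group, n_rot=999, seed=0):
    """Simplified ROAST-style rotation test for one gene set ("up" direction).

    expr  : genes x samples array of log-expression values for the gene set
    group : 0/1 label per sample (e.g. 1 = BRCA1 mutated, 0 = wild type)
    Returns (observed mean t-statistic, rotation p-value).
    """
    expr = np.asarray(expr, dtype=float)
    group = np.asarray(group, dtype=float)
    n = expr.shape[1]
    # Design: intercept + group indicator; the group coefficient is the contrast
    X = np.column_stack([np.ones(n), group])
    Q, R = np.linalg.qr(X, mode="complete")  # Q is an n x n orthogonal matrix
    effects = expr @ Q                       # effects[g, j] = Q[:, j] . y_g
    sign = np.sign(R[1, 1])                  # orients the contrast effect
    d = n - 2                                # residual degrees of freedom
    z = effects[:, 1:]                       # contrast effect + d residual effects
    total_ss = (z ** 2).sum(axis=1)          # invariant under rotation

    def mean_t(u, ss):
        # t = sign * u / s, where s^2 = (ss - u^2) / d is the residual variance
        s2 = (ss - u ** 2) / d
        return (sign * u / np.sqrt(s2)).mean(axis=0)

    observed = mean_t(z[:, 0], total_ss)

    # Each rotation is a random unit vector in R^(d+1), shared by all genes,
    # which preserves the inter-gene correlation structure of the set
    rng = np.random.default_rng(seed)
    r = rng.standard_normal((n_rot, d + 1))
    r /= np.linalg.norm(r, axis=1, keepdims=True)
    rotated = mean_t(z @ r.T, total_ss[:, None])
    p_value = (np.sum(rotated >= observed) + 1) / (n_rot + 1)
    return float(observed), float(p_value)


if __name__ == "__main__":
    # Synthetic example: 20 genes, 10 vs 10 samples, set shifted up in group 1
    rng = np.random.default_rng(1)
    group = np.array([0] * 10 + [1] * 10)
    expr = rng.standard_normal((20, 20))
    expr[:, group == 1] += 2.0
    print(rotation_set_test(expr, group))
```

Because the same rotation is applied to every gene in the set, correlation between genes is preserved, which is the key difference from permuting genes.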

No.  Symbol   logFC BRCA1 vs WT  logFC BRCA2 vs WT  Cytoband
1    CASK     -0.08              -0.04              Xp11.4
2    CYBB      0.24               0.05              Xp21.1
3    XK       -0.11               0.24              Xp21.1
4    PDHA1     0.20               0.16              Xp22.1
5    PDK3      0.00               0.03              Xp22.11
6    PRDX4     0.14               0.04              Xp22.11
7    AP1S2     0.17               0.15              Xp22.2
8    PIR       0.66               0.36              Xp22.2
9    CLCN4    -0.06               0.02              Xp22.3
10   MSL3L1    0.16               0.06              Xp22.3
11   PRKX      0.05               0.11              Xp22.3
12   PRPS2     0.23               0.21              Xp22.3-p22.2
13   ARHGEF9   0.01               0.09              Xq11.1
14   KIF4A     0.21               0.00              Xq13.1
15   TMEM28   -0.01               0.21              Xq13.1
16   CUL4B     0.23               0.09              Xq23
17   NXT2      0.21               0.17              Xq23
18   HPRT1     0.43               0.03              Xq26.1
19   HTATSF1   0.22               0.07              Xq26.3
20   MOSPD1    0.25              -0.05              Xq26.3
21   SLC9A6    0.38               0.23              Xq26.3
22   VGLL1     0.36               0.23              Xq26.3
23   DKC1      0.44               0.01              Xq28
24   IRAK1     0.46               0.11              Xq28
25   MTCP1     0.40               0.11              Xq28
26   UBL4A     0.18               0.03              Xq28
27   VBP1      0.40               0.04              Xq28

Supplementary Table 19: Log fold change (logFC) of chromosome X gene expression in BRCA1 and BRCA2 mutated TCGA ovarian tumors relative to wild type (WT) tumors.
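As a purely descriptive complement to the ROAST result (not a statistical test), the fold changes in Supplementary Table 19 can be summarized per mutation group, optionally restricted to a single cytoband such as Xq28. The sketch below transcribes the table into a Python list; the names `TABLE_19` and `summarize` are hypothetical.

```python
from statistics import mean, median

# (symbol, logFC BRCA1 vs WT, logFC BRCA2 vs WT, cytoband), from Supplementary Table 19
TABLE_19 = [
    ("CASK", -0.08, -0.04, "Xp11.4"), ("CYBB", 0.24, 0.05, "Xp21.1"),
    ("XK", -0.11, 0.24, "Xp21.1"), ("PDHA1", 0.20, 0.16, "Xp22.1"),
    ("PDK3", 0.00, 0.03, "Xp22.11"), ("PRDX4", 0.14, 0.04, "Xp22.11"),
    ("AP1S2", 0.17, 0.15, "Xp22.2"), ("PIR", 0.66, 0.36, "Xp22.2"),
    ("CLCN4", -0.06, 0.02, "Xp22.3"), ("MSL3L1", 0.16, 0.06, "Xp22.3"),
    ("PRKX", 0.05, 0.11, "Xp22.3"), ("PRPS2", 0.23, 0.21, "Xp22.3-p22.2"),
    ("ARHGEF9", 0.01, 0.09, "Xq11.1"), ("KIF4A", 0.21, 0.00, "Xq13.1"),
    ("TMEM28", -0.01, 0.21, "Xq13.1"), ("CUL4B", 0.23, 0.09, "Xq23"),
    ("NXT2", 0.21, 0.17, "Xq23"), ("HPRT1", 0.43, 0.03, "Xq26.1"),
    ("HTATSF1", 0.22, 0.07, "Xq26.3"), ("MOSPD1", 0.25, -0.05, "Xq26.3"),
    ("SLC9A6", 0.38, 0.23, "Xq26.3"), ("VGLL1", 0.36, 0.23, "Xq26.3"),
    ("DKC1", 0.44, 0.01, "Xq28"), ("IRAK1", 0.46, 0.11, "Xq28"),
    ("MTCP1", 0.40, 0.11, "Xq28"), ("UBL4A", 0.18, 0.03, "Xq28"),
    ("VBP1", 0.40, 0.04, "Xq28"),
]


def summarize(rows, band_prefix=""):
    """Mean and median logFC per group for genes whose cytoband starts with band_prefix."""
    selected = [r for r in rows if r[3].startswith(band_prefix)]
    brca1 = [r[1] for r in selected]
    brca2 = [r[2] for r in selected]
    return {
        "n_genes": len(selected),
        "BRCA1": (mean(brca1), median(brca1)),
        "BRCA2": (mean(brca2), median(brca2)),
    }


print(summarize(TABLE_19))          # all 27 genes
print(summarize(TABLE_19, "Xq28"))  # the five Xq28 genes only
```

Across all 27 genes this gives a mean logFC of about 0.21 for BRCA1 versus about 0.10 for BRCA2; restricted to Xq28 the contrast is larger (about 0.38 versus 0.06), consistent with the stronger effect in BRCA1 mutated samples noted above.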

S6: qPCR Validation

Gene copy number for MYC, PTK2 and PYCRL was assessed by real-time quantitative PCR (qPCR) using the 7900HT Fast Real-Time PCR System (Applied Biosystems) as described previously (22). Primers were designed with Primer3 (23) to amplify exonic regions, avoiding known SNPs and amplification of homologous sequences (Supplementary Table 20). Triplicate 10 µL reactions containing 2 ng genomic DNA, 1 µM of each primer and SYBR Green master mix (Applied Biosystems) were performed. Amplification conditions were 50°C for 2 min and 95°C for 10 min, followed by 40 cycles of 95°C for 15 s and 60°C for 1 min. Dissociation analysis was used to ensure amplification product specificity, and threshold cycle (Ct) numbers were obtained using Sequence Detection Software (SDS) 2.3 with default settings. Target gene log2 copy number ratios in tumor samples were estimated by normalizing to the LINE-1 repetitive element and a normal female reference DNA (Novagen) (24). Samples were classified as having copy number gain if the log2 copy number ratio was greater than 0.5 (~3 copies).
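The normalization above is not given as a formula in the text; a common way to compute it is the comparative Ct (ΔΔCt) method, sketched below under the assumption of roughly 100% amplification efficiency for both the target and the LINE-1 assays (one cycle corresponds to a two-fold difference). The function names are hypothetical, and the Ct values in the usage lines are illustrative, not measured data.

```python
from statistics import mean

GAIN_THRESHOLD = 0.5  # log2 ratio above which a sample is called a gain (from the text)


def log2_copy_ratio(ct_target_tumor, ct_line1_tumor, ct_target_ref, ct_line1_ref):
    """Comparative Ct (delta-delta-Ct) estimate of the tumor/reference log2 copy ratio.

    Each argument is a sequence of replicate Ct values (e.g. triplicates).
    Assumes ~100% amplification efficiency for the target and LINE-1 assays.
    """
    delta_tumor = mean(ct_target_tumor) - mean(ct_line1_tumor)  # normalize to LINE-1
    delta_ref = mean(ct_target_ref) - mean(ct_line1_ref)        # same for reference DNA
    return -(delta_tumor - delta_ref)  # log2 ratio = -(delta-delta-Ct)


def approx_copies(log2_ratio, baseline_copies=2):
    """Approximate absolute copy number, assuming a diploid reference."""
    return baseline_copies * 2 ** log2_ratio


def is_gain(log2_ratio):
    return log2_ratio > GAIN_THRESHOLD


# Illustrative triplicate Ct values: target amplifies one cycle earlier in tumor
ratio = log2_copy_ratio([24.0, 24.1, 23.9], [15.0, 15.1, 14.9],
                        [25.0, 25.1, 24.9], [15.0, 15.1, 14.9])
print(ratio, is_gain(ratio), approx_copies(ratio))
print(approx_copies(GAIN_THRESHOLD))  # the 0.5 threshold is ~2.83 copies
```

The last line shows why a log2 ratio of 0.5 corresponds to roughly three copies: 2 × 2^0.5 ≈ 2.83 for a diploid baseline.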

Gene    Left (forward) primer    Tm (°C)  Right (reverse) primer   Tm (°C)  Product size (bp)
MYC     TCAAGAGGCGAACACACAAC     59.88    GGCCTTTTCATTGTTTTCCA     59.92    110
PTK2    CTCCTACTGCCAACCTGGAC     59.72    GCTGGCTGGATTTTACTGGA     60.21    100
PYCRL   CGTCATCTTTGCCACCAAG      60.25    CACGGACACCAAGATGTGTT     59.44    91

Supplementary Table 20: Primers for qPCR validation.

References

1. The Cancer Genome Atlas Research Network: Integrated genomic analyses of ovarian carcinoma. Nature 2011; 474:609-15.

2. Helland A, Anglesio MS, George J, Cowin PA, Johnstone CN, House CM et al: Deregulation of MYCN, LIN28B and LET7 in a molecular subtype of aggressive high-grade serous ovarian cancers. PLoS One 2011; 6:e18064.

3. Tothill RW, Tinker AV, George J, Brown R, Fox SB, Lade S et al: Novel molecular subtypes of serous and endometrioid ovarian cancer linked to clinical outcome. Clin Cancer Res 2008; 14:5198-208.

4. Alsop K, Fereday S, Meldrum C, Defazio A, Emmanuel C, George J et al: BRCA mutation frequency and patterns of treatment response in BRCA mutation-positive women with ovarian cancer: a report from the Australian Ovarian Cancer Study Group. J Clin Oncol 2012; 30:2654-63.

5. Hondow HL, Fox SB, Mitchell G, Scott RJ, Beshay V, Wong SQ et al: A high-throughput protocol for mutation scanning of the BRCA1 and BRCA2 genes. BMC Cancer 2011; 11:265.

6. Mikeska T, Dobrovic A: Methylation-sensitive high resolution melting for the rapid analysis of DNA methylation. In: Epigenetics: A Reference Manual. Edited by Craig JM, Wong NC. Caister Academic Press; 2011: 325-35.

7. Wojdacz TK, Dobrovic A, Hansen LL: Methylation-sensitive high-resolution melting. Nat Protoc 2008; 3:1903-8.

8. Jazaeri A, Yee C, Sotiriou C, Brantley K, Boyd J, Liu E: Gene expression profiles of BRCA1-linked, BRCA2-linked, and sporadic ovarian cancers. J Natl Cancer Inst 2002; 94:990-1000.

9. Wong EM, Southey MC, Fox SB, Brown MA, Dowty JG, Jenkins MA et al: Constitutional methylation of the BRCA1 promoter is specifically associated with BRCA1 mutation-associated pathology in early-onset breast cancer. Cancer Prev Res (Phila) 2011; 4:23-33.

10. Taniguchi T, Tischkowitz M, Ameziane N, Hodgson SV, Mathew CG, Joenje H et al: Disruption of the Fanconi anemia-BRCA pathway in cisplatin-sensitive ovarian tumours. Nat Med 2003; 9:568-74.

11. Meyer D, Zeileis A, Hornik K: The strucplot framework: visualizing multi-way contingency tables with vcd. J Stat Softw 2006; 17:1-48.

12. Borg I, Groenen PJF: Modern Multidimensional Scaling: Theory and Applications; 2005.

13. Smyth GK: Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol 2004; 3:Article 3.

14. Wu D, Lim E, Vaillant F, Asselin-Labat ML, Visvader JE, Smyth GK: ROAST: rotation gene set tests for complex microarray experiments. Bioinformatics 2010; 26:2176-82.

15. Smyth GK: Limma: linear models for microarray data. In: Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Edited by Gentleman R, Carey V, Dudoit S, Irizarry R, Huber W. New York: Springer; 2005: 397-420.

16. Konstantinopoulos PA, Spentzos D, Karlan BY, Taniguchi T, Fountzilas E, Francoeur N et al: Gene expression profile of BRCAness that correlates with responsiveness to chemotherapy and with outcome in patients with epithelial ovarian cancer. J Clin Oncol 2010; 28:3555-61.

17. Ahmadiyeh N, Pomerantz MM, Grisanzio C, Herman P, Jia L, Almendro V et al: 8q24 prostate, breast, and colon cancer risk loci show tissue-specific long-range interaction with MYC. Proc Natl Acad Sci U S A 2010; 107:9742-6.

18. Wu D, Smyth GK: Camera: a competitive gene set test accounting for inter-gene correlation. Nucleic Acids Res 2012; 40:e133.

19. Parikh R, Mathai A, Parikh S, Chandra Sekhar G, Thomas R: Understanding and using sensitivity, specificity and predictive values. Indian J Ophthalmol 2008; 56:45-50.

20. Liberzon A, Subramanian A, Pinchback R, Thorvaldsdottir H, Tamayo P, Mesirov JP: Molecular signatures database (MSigDB) 3.0. Bioinformatics 2011; 27:1739-40.

21. Richardson AL, Wang ZC, De Nicolo A, Lu X, Brown M, Miron A et al: X chromosomal abnormalities in basal-like human breast cancer. Cancer Cell 2006; 9:121-32.

22. Etemadmoghadam D, deFazio A, Beroukhim R, Mermel C, George J, Getz G et al: Integrated genome-wide DNA copy number and expression analysis identifies distinct mechanisms of primary chemoresistance in ovarian carcinomas. Clin Cancer Res 2009; 15:1417-27.

23. Rozen S, Skaletsky H: Primer3 on the WWW for general users and for biologist programmers. In: Bioinformatics Methods and Protocols. Volume 132. Edited by Misener S, Krawetz SA. Totowa, NJ: Humana Press; 2000: 365-86.

24. Wang TL, Maierhofer C, Speicher MR, Lengauer C, Vogelstein B, Kinzler KW et al: Digital karyotyping. Proc Natl Acad Sci U S A 2002; 99:16156-61.