Twin Research and Human Genetics Volume 15 Number 5 pp. 615–623 © The Authors 2012 doi:10.1017/thg.2012.38

Genome-Wide Association Study for Ovarian Cancer Susceptibility Using Pooled DNA

Yi Lu,1 Xiaoqing Chen,1 Jonathan Beesley,1 Sharon E. Johnatty,1 Anna deFazio,2,3 Australian Ovarian Cancer Study (AOCS) Study Group, Sandrina Lambrechts,4 Diether Lambrechts,5,6 Evelyn Despierre,4 Ignace Vergotes,4 Jenny Chang-Claude,7 Rebecca Hein,7 Stefan Nickels,7 Shan Wang-Gohrke,8 Thilo Dörk,9 Matthias Dürst,10 Natalia Antonenkova,11 Natalia Bogdanova,11,12 Marc T. Goodman,13 Galina Lurie,13 Lynne R. Wilkens,13 Michael E. Carney,14 Ralf Butzow,15 Heli Nevanlinna,15 Tuomas Heikkinen,15 Arto Leminen,15 Lambertus A. Kiemeney,16,17,18 Leon F.A.G. Massuger,19 Anne M. van Altena,19 Katja K. Aben,17,18 Susanne Krüger Kjaer,20 Estrid Høgdall,20 Allan Jensen,21 Angela Brooks-Wilson,22,23 Nhu Le,24 Linda Cook,25 Madalene Earp,23 Linda Kelemen,26 Douglas Easton,27 Paul Pharoah,27 Honglin Song,27 Jonathan Tyrer,27 Susan Ramus,28 Usha Menon,29 Alexandra Gentry-Maharaj,29 Simon A. Gayther,28 Elisa V. Bandera,30,31 Sara H. Olson,31 Irene Orlow,31 Lorna Rodriguez-Rodriguez,30 Stuart Macgregor,1∗ and Georgia Chenevix-Trench1∗

1 Queensland Institute of Medical Research, Brisbane, Australia
2 Department of Obstetrics and Gynaecology, University of Sydney, Sydney, Australia
3 Westmead Institute for Cancer Research, Westmead Hospital, Sydney, Australia
4 Division of Gynaecological Oncology, Department of Obstetrics and Gynaecology, University Hospital Leuven, University of Leuven, Leuven, Belgium
5 Vesalius Research Center, VIB, Leuven, Belgium
6 Vesalius Research Center, University of Leuven, Belgium
7 Division of Cancer Epidemiology, German Cancer Research Center (DKFZ), Heidelberg, Germany
8 Department of Obstetrics and Gynecology, University of Ulm, Ulm, Germany
9 Gynaecology Research Unit, Hannover Medical School, Hannover, Germany
10 Department of Gynaecology, Friedrich Schiller University, Jena, Germany
11 Byelorussian Institute for Oncology and Medical Radiology, Aleksandrov NN, Minsk, Belarus
12 Clinics of Radiation Oncology, Hannover Medical School, Hannover, Germany
13 Cancer Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI, USA
14 Department of Obstetrics and Gynecology, John A Burns School of Medicine, University of Hawaii, Honolulu, HI, USA
15 Department of Obstetrics and Gynecology, Helsinki University Central Hospital, Helsinki, Finland
16 Department of Urology, Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands
17 Department of Epidemiology, Biostatistics and HTA, Nijmegen Medical Centre, Radboud University, Nijmegen, The Netherlands
18 Comprehensive Cancer Center, Nijmegen, The Netherlands
19 Department of Gynaecology, Nijmegen Medical Centre, Radboud University, Nijmegen, The Netherlands
20 Department of Viruses, Hormones and Cancer, Institute of Cancer Epidemiology, Danish Cancer Society, Copenhagen, Denmark
21 Department of Gynecology, Juliane Marie Centre, Rigshospitalet, University of Copenhagen, Copenhagen, Denmark
22 Department of Biomedical Physiology and Kinesiology, Simon Fraser University, Burnaby, British Columbia, Canada
23 Genome Sciences Centre, BC Cancer Agency, Vancouver, British Columbia, Canada
24 Cancer Control Research, BC Cancer Agency, Vancouver, British Columbia, Canada
25 Division of Epidemiology and Biostatistics, University of New Mexico, Albuquerque, NM, USA
26 Alberta Health Services - Cancer Care, Calgary, Alberta, Canada
27 Departments of Oncology and Public Health and Primary Care, University of Cambridge, Cambridge, UK
28 Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
29 Gynaecological Oncology Unit, UCL EGA Institute for Women’s Health, University College London, London, UK
30 The Cancer Institute of New Jersey, New Brunswick, NJ, USA
31 Memorial Sloan-Kettering Cancer Center, New York, NY, USA
∗ These two authors contributed equally.

RECEIVED 6 March 2012; ACCEPTED 19 April 2012.

ADDRESS FOR CORRESPONDENCE: Stuart Macgregor, Queensland Institute of Medical Research, Locked Bag 2000, Herston, Queensland 4029, Australia. E-mail: [email protected]

615 Downloaded from https://www.cambridge.org/core. IP address: 170.106.34.90, on 29 Sep 2021 at 01:23:22, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/thg.2012.38 Yi Lu et al.

Recent Genome-Wide Association Studies (GWAS) have identified four low-penetrance ovarian cancer susceptibility loci. We hypothesized that further moderate- or low-penetrance variants exist among the subset of single-nucleotide polymorphisms (SNPs) not well tagged by the genotyping arrays used in the previous studies, which would account for some of the remaining risk. We therefore conducted a time- and cost-effective stage 1 GWAS on 342 invasive serous cases and 643 controls genotyped on pooled DNA using the high-density Illumina 1M-Duo array. We followed up 20 of the most significantly associated SNPs that are not well tagged by the lower density arrays used by the published GWAS, and genotyped them on individual DNA. Most of the top 20 SNPs were clearly validated by individually genotyping the samples used in the pools. However, none of the 20 SNPs replicated when tested for association in a much larger stage 2 set of 4,651 cases and 6,966 controls from the Ovarian Cancer Association Consortium. Given that most of the top 20 SNPs from pooling were validated in the same samples by individual genotyping, the lack of replication is likely to be due to the relatively small sample size in our stage 1 GWAS rather than to problems with the pooling approach. We conclude that there are unlikely to be any moderate or large effects on ovarian cancer risk untagged by less dense arrays. However, our study lacked power to make clear statements on the existence of hitherto untagged small-effect variants.

 Keywords: GWAS, DNA Pooling, Ovarian cancer risk, Nanodrop spectroscopy.

Genome-Wide Association Studies (GWAS) have been an unprecedented success in identifying common alleles with moderate to small effects associated with different diseases and phenotypes. In particular, more than 100 common, low-penetrance loci for different cancers have been uncovered by GWAS (Varghese & Easton, 2010). The discovery of susceptibility loci will provide significant insights into cancer etiology and an improved understanding of the mechanisms of tumor biology. In addition, loci associated with tumor progression after treatment will offer targets for therapeutic intervention, and the risk predictions based on accumulated knowledge of cancer genetics, together with environmental risk factors, will help to identify individuals with an elevated risk of cancer (Fletcher & Houlston, 2010). Although each of the common loci identified through GWAS accounts for only a small proportion of risk, collectively more than 20% of familial risk of prostate cancer has been explained, and ∼7%, ∼6%, and ∼5% of familial risk of lung, colorectal, and breast cancers, respectively, can now be explained by GWAS results (Varghese & Easton, 2010). These estimates are likely to be conservative, as the effects of causal variants are typically larger than the associations detected through tag single-nucleotide polymorphisms (SNPs; Fletcher & Houlston, 2010).

Globally, ovarian cancer is the seventh leading cause of cancer mortality among women. Despite its relatively rare incidence, it has the same pattern of familial aggregation as other major cancers. Early twin studies have shown that most of the excess familial risk of ovarian cancer is due to genetic factors rather than shared environmental factors (Lichtenstein et al., 2000). It is well established that although rare mutations in BRCA1 and BRCA2, identified originally by linkage studies, are the most important genetic risk factors in terms of their high penetrance, they do not fully account for the excess ovarian cancer risk seen in families. To date, one large GWAS has been conducted on ovarian cancer susceptibility, aiming to identify some of the remaining unexplained familial risk. This GWAS used relatively low-density Illumina 610K and 550K arrays for cases and controls, respectively (Song et al., 2009; Bolton et al., 2010; Goode et al., 2010). The confirmed susceptibility loci reaching genome-wide significance level (p < 5 × 10⁻⁸) uncovered by this GWAS are at 9p22.2 near BNC2 (Song et al., 2009), 19p13.11 near C19orf62 (also known as MERIT40) (Bolton et al., 2010), 8q24, and 2q31; two other borderline significant loci at 3q25 and 17q21 were also identified (Goode et al., 2010). In addition, a candidate gene study has implicated the TERT locus, which has been found to contain susceptibility SNPs by many other cancer GWAS (Johnatty et al., 2010). All these loci confer small risks (per-allele relative risk less than 1.3), supporting the concept of polygenic architecture underlying ovarian cancer susceptibility.

As found in other cancer types, ovarian cancer also shows histological subtype variation. The associations identified so far are stronger for serous tumors than for other histological subtypes. To identify histology-specific risk loci, separate GWAS on different subtypes will be more powerful than a single GWAS including all subtypes. However, the high cost of GWAS limits the desirability of carrying out studies with individual genotyping for the less common subtypes, such as endometrioid, mucinous, and clear cell ovarian cancers, which may not be well powered.

GWAS genotyping on pooled DNA has proved to be a time- and cost-effective alternative to conventional GWAS, which individually genotype all the study subjects (Craig et al., 2009; Macgregor et al., 2006, 2008; Norton et al., 2004; Sham et al., 2002; Visscher & Le Hellard, 2003). In this study we conducted a pooled GWAS on serous ovarian cancer risk, followed by validation of the pooled results and genotyping of SNPs of interest in an independent large dataset from the Ovarian Cancer Association Consortium.


We used the high-density Illumina 1M-Duo array containing 1.2 million SNPs for our pooled GWAS, because it has superior coverage, with 93% of common SNPs in the CEU population tagged at r² ≥ 0.8. The aim of this study was two-fold: to test the hypothesis that common SNPs with moderate or low risks, which are not well tagged by the lower density arrays (Illumina 550K and 610K arrays used in the previous GWAS), also account for some of the residual ovarian cancer risk; and to determine whether the pooled GWAS can be effectively carried out on DNA quantified by spectrophotometry, as opposed to Picogreen absorption, which we have used previously.

Materials and Methods

Ethics Statement

This study was conducted according to the principles expressed in the Declaration of Helsinki. The study was approved by the human ethics committee of Queensland Institute of Medical Research. All participants provided written informed consent.

Samples

We used samples from the Australian Ovarian Cancer Study (AOCS) for the pooled GWAS. AOCS ascertained ovarian cancer cases through the surgical treatment centers in Australia, and from the Cancer Registries of Queensland, South and Western Australia, New South Wales, and Victoria, while controls were population-based and drawn from the Commonwealth electoral roll (Burkey & Kanetsky, 2009). We selected 342 invasive serous cases and 643 controls for the pooled GWAS. All the study subjects were self-reported White with non-Hispanic origin. Age at diagnosis and age at interview were recorded for cases and controls, respectively. Detailed clinical information was also available for ovarian cancer patients, including primary site of tumor, stage and grade, and overall survival time. Most of the DNAs had been isolated using salt extraction (Chang et al., 2009), but a subset had been isolated with Qiagen columns, and so these DNAs were kept separate.

Before the pools were made, DNA concentration was measured by spectrophotometry using a Nanodrop, and the samples were adjusted through serial dilution to 48–52 ng/μL. For each pool, 2 μL of each DNA sample was combined, and the final concentration was verified by Nanodrop. The salt-extracted DNAs from 303 invasive serous cases were further divided into tertiles according to their overall survival time. We made a separate pool of 39 DNAs from unselected cases that were isolated with Qiagen columns. The controls were randomly placed in seven pools, each with a size of 90–91 samples. We then matched each of the three salt-extracted case sets with two of the control sets. The smaller Qiagen-extracted DNA pool was matched with the remaining control pool. Thus, we had four comparisons of case versus control pools, where each individual case was matched with approximately two individual controls (Table 1).

We used samples from 12 sites in the Ovarian Cancer Association Consortium (OCAC) in the replication stage (Table 2). In all, we genotyped 13,779 samples, including 6,966 non-Hispanic White controls and 4,651 non-Hispanic White invasive cases, among which 2,245 were of serous histology.

TABLE 1
Design of Case-Control Pool Comparisons

Pool comparison | Case pool | Case N | Case mean age (±SD) | Control N | Control mean age (±SD) | p-value of mean age difference between cases and controls
1 | Good survival | 101 | 58.9 (8.6) | 182 | 57.7 (10.4) | 0.3038
2 | Medium survival | 101 | 59.4 (10.5) | 182 | 57.3 (11.2) | 0.1223
3 | Poor survival | 101 | 62.2 (9.5) | 181 | 57.3 (12.5) | 2.5e-4
4 | Extracted by Qiagen columns | 39 | 60.5 (9.9) | 90 | 56.2 (10.8) | 0.0288

Note: We stratified the cases in three large pools by overall survival time. A small subset of 39 case DNAs isolated by Qiagen columns was kept together in one pool.

Genotyping and Quality Control

All the DNA pools were genotyped on Illumina Human 1M-Duo arrays using standard protocols. All pools were genotyped in triplicate, with the exception of one control pool, which was genotyped in quadruplicate. A number of quality control (QC) steps described elsewhere (Lu et al., 2010) were also applied here: (1) SNPs must have less than 10% negative intensity values on each pool; (2) the number of working probes for a SNP on each pool must be larger than 20; (3) the sum of raw red and green intensity values must be more than 1,200; (4) minor allele frequency (MAF) in the HapMap CEU samples must be over 5%; (5) the SNP must not present a significant variance difference between case and control pools. A number of additional checks were also applied. (6) The differential amplification parameter of a SNP must be between 1/3 and 3. ‘Differential amplification’ refers to the phenomenon that the alleles at a locus are unequally amplified; in these cases the allele frequency estimates are biased because of the imbalanced raw intensity values. However, the differential amplification cancels out to a good approximation when we assess the allele frequency difference between case and control pools. We discarded SNPs with very extreme differential amplification (<1/3 or >3). This additional check is equivalent to discarding SNPs with estimated allele frequencies that are very different from the reference samples, for example, the HapMap CEU samples used here. (7) SNPs that passed quality control for more than two pool pairs out of four were kept, because in


TABLE 2
Summary of OCAC Samples Used for the Replication Study

OCAC site | Study name | Controls (non-Hispanic White) | Cases (non-Hispanic White) | Invasive cases, non-Hispanic Whiteᵃ | Invasive serous cases, non-Hispanic Whiteᵇ
AUSᶜ | Australia Ovarian Cancer Study/Australian Cancer Study | 576 (524) | 1,276 (1,207) | 921 | 502
BEL | Belgium Ovarian Cancer Study | 428 (428) | 257 (253) | 197 | 128
GER | German Ovarian Cancer Study | 420 (420) | 252 (251) | 223 | 105
HJO/HMO | Hannover–Jena and Hannover–Minsk Ovarian Cancer Study | 913 (903) | 467 (463) | 246 | 70
HAW | Hawaii Ovarian Cancer Study | 625 (166) | 417 (102) | 81 | 43
HOCS | Helsinki Ovarian Cancer Study | 456 (456) | 262 (262) | 261 | 130
NTH | Nijmegen Polygene Study & Nijmegen Biomedical Study | 599 (598) | 305 (300) | 297 | 101
MAL | Danish Malignant Ovarian Tumour Study | 1,075 (1,075) | 263 (263) | 263 | 162
NJO | New Jersey Ovarian Cancer Study | 189 (173) | 200 (177) | 176 | 105
OVA | Ovarian Cancer in Alberta and British Columbia Study | 530 (460) | 834 (706) | 538 | 291
SEA | UK SEARCH Ovarian Cancer Study | 1,231 (1,227) | 1,172 (1,160) | 972 | 377
UKO | UK Ovarian Cancer Population Study | 542 (536) | 490 (476) | 476 | 231
Total | | 7,584 (6,966) | 6,195 (5,620) | 4,651 | 2,245

Note: ᵃCases eligible for secondary analysis in the replication. ᵇCases eligible for primary analysis in the replication. ᶜAOCS cases and controls included in the pools used in stage 1 were excluded from analysis in the replication study.

general, the more working pool pairs, the more reliable the results. (8) For the SNPs of interest, the proxies (linkage disequilibrium (LD) r² > 0.7) must have similar association results as the underlying SNP. We applied stringent quality controls to limit false positive results arising from the pooling design. After the whole series of QC steps, 914,948 SNPs were retained.

Individual genotyping for 20 SNPs selected from the pooled GWAS was performed using MALDI-TOF spectrophotometric mass determination of allele-specific primer extension products using Sequenom’s MassARRAY system and iPLEX technology (Sequenom Inc.). The design of oligonucleotides was carried out according to the guidelines of Sequenom and performed using MassARRAY Assay Design software (version 4.0). Multiplex PCR amplification of amplicons containing SNPs of interest was performed using the Qiagen HotStart Taq Polymerase on a Perkin Elmer GeneAmp 2400 thermal cycler with 5 ng of genomic DNA. Primer extension reactions were carried out according to the manufacturer’s instructions for iPLEX chemistry. Assay data were analyzed using Sequenom TYPER software (Version 3.4). These SNPs passed the following standard QC checks: (1) p-value for the Hardy–Weinberg equilibrium (HWE) test ≥0.05 in both cases and controls; (2) call rate >95%; (3) concordance >98% between duplicate pairs (at least 5% per study site). One SNP (rs12078260) failed the HWE test in controls.

Statistical Methods and Analytic Tools

In the pooled GWAS, the allele frequencies at each locus were estimated from each pool, and then the differences in allele frequencies between each pair of case/control pools were assessed in the association test. Details of the pooling data analysis were described elsewhere (Lu et al., 2010). The four sets of association results from each pool pair were then meta-analyzed, where the allele frequency difference between each set of case and control pools was weighted by its inverse variance (binomial variances in case and control pools plus pooling error variances; Macgregor et al., 2006, 2008). A pooling program that incorporates the steps of estimating pooled allele frequency, mean normalization, quality controls, and finally an association test taking into account pool-specific errors, has been developed for the pooled GWAS. This program is available on request.

For individually genotyped data, the SNP association was assessed in a logistic regression model implemented in PLINK (Purcell et al., 2007). Assuming a log additive model of inheritance, the per-allele risk was estimated by fitting the number of rare alleles as a continuous variable. We did not adjust for age in the individual genotyping (IG) validation, to allow for a direct comparison of pooled and individually genotyped results on the same AOCS samples. However, the age-adjusted results were similar (results not presented). In the replication stage, both age and study site were adjusted for in the logistic regression model.

Results

In order to reduce heterogeneity, the majority of invasive serous cases from the AOCS included in the pooled GWAS had tumors that originated in the ovary (except for one case, whose tumor appeared to arise in the fallopian tubes), and were of high stage (>92% of cases with FIGO (International Federation of Gynaecology and Obstetrics) stage III or IV) and grade (>99% of cases with grade 2 or 3). Since age at diagnosis for cases is a predictor of overall survival time, age differences were observed in the comparison of cases with poor survival and controls, which were younger than these cases. A nominally significant difference in mean age was also found in the comparison of cases extracted using Qiagen columns and controls (Table 1). After carrying out


TABLE 3
Comparison of Pooled GWAS and Individual Genotyping (IG) Validation Results for 19 SNPs in AOCS Samples

SNP ID | Chr | Coordinate | Allelesᵃ | Pooled GWAS OR | Pooled GWAS SE | P_poolᵇ | IG validation OR | IG validation SE | P_IG
rs7562599 | 2 | 153711404 | [T/C] | 0.595 | 0.122 | 2.14E-05 | 0.637 | 0.153 | 3.01E-03
rs10792844 | 11 | 85658476 | [A/C] | 0.463 | 0.190 | 4.94E-05 | 0.886 | 0.176 | 4.89E-01
rs1573110 | 10 | 9135501 | [A/G] | 1.744 | 0.141 | 8.46E-05 | 2.003 | 0.162 | 1.41E-05
rs17759746 | 2 | 28247797 | [T/C] | 0.596 | 0.135 | 1.23E-04 | 0.649 | 0.158 | 5.73E-03
rs8043748 | 16 | 11753732 | [A/G] | 0.686 | 0.098 | 1.28E-04 | 0.679 | 0.097 | 6.84E-05
rs17353424 | 8 | 107892457 | [T/C] | 0.638 | 0.117 | 1.28E-04 | 0.565 | 0.267 | 3.02E-02
rs7974375 | 12 | 117070081 | [A/C] | 1.608 | 0.125 | 1.43E-04 | 1.680 | 0.128 | 4.39E-05
rs10818911 | 9 | 125854025 | [T/G] | 0.627 | 0.125 | 1.91E-04 | 0.656 | 0.150 | 4.65E-03
rs4887515 | 15 | 85233018 | [T/C] | 0.610 | 0.133 | 1.95E-04 | 0.573 | 0.165 | 6.48E-04
rs1903532 | 4 | 179644249 | [T/G] | 0.648 | 0.116 | 1.98E-04 | 0.751 | 0.136 | 3.46E-02
rs11592097 | 10 | 2166806 | [A/C] | 0.645 | 0.121 | 2.85E-04 | 0.609 | 0.194 | 9.90E-03
rs2798823 | 14 | 94490125 | [A/G] | 0.687 | 0.108 | 4.76E-04 | 0.701 | 0.130 | 6.17E-03
rs2086545 | 11 | 13164955 | [C/G] | 1.450 | 0.109 | 6.55E-04 | 1.154 | 0.130 | 2.72E-01
rs2499834 | 1 | 160236260 | [A/C] | 0.627 | 0.139 | 7.79E-04 | 0.623 | 0.175 | 6.28E-03
rs1053495 | 10 | 71544054 | [T/C] | 0.611 | 0.148 | 8.66E-04 | 0.779 | 0.188 | 1.84E-01
rs1566198 | 18 | 11249831 | [T/G] | 1.510 | 0.126 | 1.05E-03 | 1.373 | 0.128 | 1.27E-02
rs16135 | 7 | 24294445 | [A/G] | 0.576 | 0.170 | 1.16E-03 | 0.769 | 0.197 | 1.82E-01
rs16899823 | 5 | 81992978 | [T/C] | 0.665 | 0.133 | 2.20E-03 | 0.638 | 0.197 | 2.16E-02
rs12027970 | 1 | 37537850 | [A/C] | 0.685 | 0.124 | 2.31E-03 | 0.609 | 0.213 | 1.89E-02

Note: ᵃThe first allele was the risk allele of the SNP. ᵇThe table was sorted by the strength of association found in the pooled GWAS (P_pool).
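The pooled GWAS columns of Table 3 come from combining the four pool comparisons by inverse-variance weighting, as outlined under Statistical Methods and Analytic Tools. The Python sketch below illustrates that weighting only; it is not the authors' pooling program. The function names, the allele frequencies, and the single pooling-error variance value are assumptions made for illustration, and only the pool sizes are taken from Table 1.

```python
from math import sqrt
from statistics import NormalDist


def pair_variance(p_case, p_ctrl, n_case, n_ctrl, pool_err_var):
    """Variance of the case-control allele frequency difference for one pool pair.

    The binomial sampling variance uses 2N chromosomes for a pool of N people.
    pool_err_var is a hypothetical per-pool pooling-error variance, added once
    for the case pool and once for the control pool; how it would be estimated
    from replicate arrays is not shown here.
    """
    binom_case = p_case * (1 - p_case) / (2 * n_case)
    binom_ctrl = p_ctrl * (1 - p_ctrl) / (2 * n_ctrl)
    return binom_case + binom_ctrl + 2 * pool_err_var


def meta_analyze(pairs, pool_err_var):
    """Inverse-variance weighted mean of allele frequency differences.

    pairs holds (p_case, p_ctrl, n_case, n_ctrl) tuples, one per pool pair.
    Returns the combined difference, its standard error, z, and a two-sided p.
    """
    weights = []
    diffs = []
    for p_case, p_ctrl, n_case, n_ctrl in pairs:
        var = pair_variance(p_case, p_ctrl, n_case, n_ctrl, pool_err_var)
        weights.append(1.0 / var)
        diffs.append(p_case - p_ctrl)
    total_weight = sum(weights)
    diff = sum(w * d for w, d in zip(weights, diffs)) / total_weight
    se = sqrt(1.0 / total_weight)
    z = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, se, z, p_value


# Pool sizes follow Table 1; the allele frequencies are invented for illustration.
example_pairs = [
    (0.25, 0.20, 101, 182),
    (0.24, 0.21, 101, 182),
    (0.26, 0.20, 101, 181),
    (0.23, 0.21, 39, 90),
]
diff, se, z, p_value = meta_analyze(example_pairs, pool_err_var=1e-4)
print(f"diff={diff:.4f} se={se:.4f} z={z:.2f} p={p_value:.3g}")
```

Because each weight is the reciprocal of the pair's variance, the small Qiagen pool pair contributes less to the combined estimate than the three larger pairs.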

extensive quality control, we tested association for 914,948 SNPs on each comparison of case-control pools (for details of pooling designs, see Samples in the Materials and Methods section), and meta-analyzed the four sets of genome-wide association results using a standard weighting method in order to maximize statistical power.

We chose 20 SNPs from the pooled GWAS for individual genotyping in the same AOCS samples as a validation of the pooled results. These SNPs were among the top-ranked SNPs from the pooled GWAS that had evidence of association with ovarian cancer susceptibility, but none of these reached genome-wide significance. Moreover, these were selected for being in the subset of SNPs not well tagged by the Illumina 610K array, as one of our aims was to test the hypothesis that this pooled GWAS using denser SNP arrays could uncover additional risk SNPs not identified by the previous GWAS. These 20 SNPs were successfully genotyped for nearly all the AOCS samples included in the pooled GWAS (971 out of 985 pooled samples), and only one SNP failed quality control. Table 3 compares the odds ratios (OR) and p-values from the pooled GWAS and IG validation results. Despite the slight difference in samples, good concordance was observed in OR estimates, with all risk directions in agreement in both sets of results. For 15/19 SNPs, the putative associations found in the pooled GWAS were clearly validated in IG results. Therefore, by comparing the results from pooled genotyping and individual genotyping on the same set of samples, we showed that GWAS using pooled DNA, quantified by spectrophotometry, has the potential to estimate allele frequencies accurately and provide an efficient test of association.

In addition, we sought independent replication for these 19 SNPs by individually genotyping a total of 13,779 samples collected from 12 study sites in OCAC (Table 2). Among 4,651 eligible White invasive cases of non-Hispanic origin, the majority (>95%) were classified as having a primary tumor in the ovary, as opposed to the fallopian tube or peritoneum. Unlike the AOCS cases included in the pooled GWAS, the OCAC cases on the whole were evenly distributed over all tumor stages and grades (∼54% in high stage and ∼58% in high grade). Two sets of analyses were performed according to histology: in the primary analysis we restricted to White non-Hispanic cases with the serous subtype, which allowed for direct replication of SNPs found in the stage 1 GWAS on serous ovarian cancer cases; whereas in the secondary analysis we included cases with all histological subtypes to determine whether these SNPs show association with ovarian cancer regardless of histological type. The association results adjusted for age and study site are presented in Table 4. The results showed no replication for any of the 19 SNPs in the analyses restricted to serous cases (primary analysis), or in the analyses combining all histological subtypes (secondary analysis).

Discussion

To date, one ovarian cancer GWAS has revealed several SNPs associated with susceptibility. None of the identified loci showed large effects (OR: 0.76–1.30 depending on the histological subtype), but the study was well powered to find common alleles with moderate effects (Song et al., 2009). In contrast, our study of the pooled GWAS on serous ovarian cancer susceptibility was under-powered to detect the alleles with moderate effects because of small sample size. In our pooled GWAS, the published risk SNPs, rs3814113 at 9p22.2, rs2072590 at 2q31, and rs2665390 at


TABLE 4
Replication Results of 19 SNPs from Pooled GWAS by Individual Genotyping of 13,779 OCAC Samples

SNP | Chr | Coordinate | Allelesᵃ | Primary analysisᶜ Nᵇ | Primary OR | Primary SE | Primary p | Secondary analysisᶜ Nᵇ | Secondary OR | Secondary SE | Secondary p
rs7562599 | 2 | 153711404 | [T/C] | 9,185 | 1.06 | 0.05 | 0.26 | 11,580 | 1.03 | 0.04 | 0.48
rs10792844 | 11 | 85658476 | [A/C] | 9,188 | 0.98 | 0.06 | 0.76 | 11,588 | 1.06 | 0.05 | 0.27
rs1573110 | 10 | 9135501 | [A/G] | 9,184 | 1.00 | 0.06 | 0.96 | 11,581 | 0.99 | 0.04 | 0.88
rs17759746 | 2 | 28247797 | [T/C] | 9,175 | 1.03 | 0.06 | 0.62 | 11,572 | 0.99 | 0.05 | 0.80
rs8043748 | 16 | 11753732 | [A/G] | 9,178 | 1.03 | 0.04 | 0.40 | 11,574 | 1.03 | 0.03 | 0.37
rs17353424 | 8 | 107892457 | [T/C] | 9,198 | 0.89 | 0.09 | 0.24 | 11,599 | 0.89 | 0.07 | 0.11
rs7974375 | 12 | 117070081 | [A/C] | 9,147 | 0.98 | 0.05 | 0.61 | 11,547 | 1.00 | 0.04 | 0.97
rs10818911 | 9 | 125854025 | [T/G] | 9,194 | 1.00 | 0.05 | 0.96 | 11,597 | 0.98 | 0.04 | 0.61
rs4887515 | 15 | 85233018 | [T/C] | 9,175 | 0.98 | 0.06 | 0.77 | 11,566 | 1.03 | 0.05 | 0.50
rs1903532 | 4 | 179644249 | [T/G] | 9,167 | 1.08 | 0.05 | 0.17 | 11,551 | 1.01 | 0.04 | 0.84
rs11592097 | 10 | 2166806 | [A/C] | 9,152 | 0.97 | 0.06 | 0.60 | 11,535 | 0.95 | 0.05 | 0.27
rs2798823 | 14 | 94490125 | [A/G] | 9,184 | 0.95 | 0.05 | 0.34 | 11,583 | 0.94 | 0.04 | 0.08
rs2086545 | 11 | 13164955 | [C/G] | 9,184 | 1.03 | 0.05 | 0.57 | 11,581 | 1.06 | 0.04 | 0.19
rs2499834 | 1 | 160236260 | [A/C] | 9,152 | 0.93 | 0.06 | 0.20 | 11,547 | 0.96 | 0.04 | 0.33
rs1053495 | 10 | 71544054 | [T/C] | 9,200 | 1.01 | 0.06 | 0.93 | 11,600 | 1.02 | 0.05 | 0.72
rs1566198 | 18 | 11249831 | [T/G] | 9,193 | 1.03 | 0.05 | 0.44 | 11,593 | 1.04 | 0.04 | 0.30
rs16135 | 7 | 24294445 | [A/G] | 9,187 | 1.04 | 0.07 | 0.55 | 11,589 | 1.02 | 0.05 | 0.76
rs16899823 | 5 | 81992978 | [T/C] | 9,187 | 1.04 | 0.06 | 0.51 | 11,586 | 1.07 | 0.05 | 0.14
rs12027970 | 1 | 37537850 | [A/C] | 9,184 | 0.95 | 0.07 | 0.44 | 11,583 | 0.97 | 0.05 | 0.59

Note: ᵃSame risk alleles as listed in Table 3. ᵇNumber of samples with non-missing genotypes for each SNP. ᶜPrimary analysis was restricted to serous cases; secondary analysis included all histological subtypes.
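The contrast between stage 1 validation (Table 3) and stage 2 replication (Table 4) can be tallied in a few lines. The Python sketch below counts how many of the 19 SNPs fall below a significance threshold in the IG validation (P_IG) and in the primary and secondary replication analyses. The 0.05 cut-off and the Bonferroni correction for 19 tests are assumptions made here for illustration; the text does not state the criterion behind its validation count.

```python
# (P_IG from Table 3, primary p from Table 4, secondary p from Table 4)
P_VALUES = {
    "rs7562599": (3.01e-03, 0.26, 0.48),
    "rs10792844": (4.89e-01, 0.76, 0.27),
    "rs1573110": (1.41e-05, 0.96, 0.88),
    "rs17759746": (5.73e-03, 0.62, 0.80),
    "rs8043748": (6.84e-05, 0.40, 0.37),
    "rs17353424": (3.02e-02, 0.24, 0.11),
    "rs7974375": (4.39e-05, 0.61, 0.97),
    "rs10818911": (4.65e-03, 0.96, 0.61),
    "rs4887515": (6.48e-04, 0.77, 0.50),
    "rs1903532": (3.46e-02, 0.17, 0.84),
    "rs11592097": (9.90e-03, 0.60, 0.27),
    "rs2798823": (6.17e-03, 0.34, 0.08),
    "rs2086545": (2.72e-01, 0.57, 0.19),
    "rs2499834": (6.28e-03, 0.20, 0.33),
    "rs1053495": (1.84e-01, 0.93, 0.72),
    "rs1566198": (1.27e-02, 0.44, 0.30),
    "rs16135": (1.82e-01, 0.55, 0.76),
    "rs16899823": (2.16e-02, 0.51, 0.14),
    "rs12027970": (1.89e-02, 0.44, 0.59),
}

NOMINAL_ALPHA = 0.05  # assumed threshold, not stated in the text
BONFERRONI_ALPHA = NOMINAL_ALPHA / len(P_VALUES)  # correction for 19 tests


def count_below(column, threshold):
    """Count SNPs whose p-value in the given column is below the threshold."""
    return sum(1 for p_values in P_VALUES.values() if p_values[column] < threshold)


validated_ig = count_below(0, NOMINAL_ALPHA)
replicated_primary = count_below(1, NOMINAL_ALPHA)
replicated_secondary = count_below(2, NOMINAL_ALPHA)
validated_ig_bonferroni = count_below(0, BONFERRONI_ALPHA)

print(validated_ig, replicated_primary, replicated_secondary, validated_ig_bonferroni)
```

Under the assumed nominal threshold, the tally matches the 15/19 validated SNPs reported for the AOCS samples, while no SNP reaches even nominal significance in either stage 2 analysis.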

TABLE 5
Pooled GWAS Results on the Published Loci Known to be Associated with Serous Ovarian Cancer Susceptibility

Position | SNP ID | Publication | A1 | Reported per-allele ORᵃ | Reported p-value | Pooled per-allele OR | Pooled p-value
9p22.2 | rs3814113 | Song et al., 2009 | C | 0.77 (0.73–0.81) | 4.1e-21 | 0.86 (0.72–1.05) | 0.082
19p13 | rs8170 | Bolton et al., 2010 | C | 1.18 (1.12–1.25) | 2.7e-09 | 0.97 (0.77–1.31) | 0.175
19p13 | rs2363956 | Bolton et al., 2010 | G | 1.16 (1.11–1.21) | 3.8e-11 | 0.92 (0.78–1.12) | 0.60
2q31 | rs2072590 | Goode et al., 2010 | T | 1.20 (1.14–1.25) | 3.8e-14 | 1.21 (1.01–1.41) | 0.03
3q25 | rs2665390 | Goode et al., 2010 | C | 1.24 (1.15–1.34) | 7.1e-08 | 1.22 (0.93–1.79) | 0.060
8q24 | rs10088218 | Goode et al., 2010 | A | 0.76 (0.70–0.81) | 8.0e-15 | 1.28 (0.89–1.66) | 0.882
17q21 | rs9303542 | Goode et al., 2010 | G | 1.14 (1.09–1.20) | 1.4e-07 | NAᵇ | NAᵇ

Note: ᵃFor a direct comparison of results, reported per-allele ORs are the results from the published GWAS restricted to serous cases. ᵇrs9303542 was not on the Illumina Human 1M-Duo array, but rs6504172 is in high linkage disequilibrium with rs9303542 (r² = 0.841). It had a per-allele OR in the pooled GWAS of 1.10 (0.84–1.56) (p = 0.49).

3q25, showed similar ORs and in the same direction as re- with survival time by comparing good, medium, and poor ported previously (Goode et al., 2010; Song et al., 2009), survivalpools,butwewouldhaveevenlesspowertodetect but these reached nominal or borderline significance only reliable association with survival, so these results are not (Table 5). The other three SNPs (rs8170 and rs2363956 at presented. 19p13, and rs10088218 at 8q24) identified by the published Although we were under-powered to locate any common GWAS (Bolton et al., 2010; Goode et al., 2010) were not SNPs with weak effects, our study had the potential to iden- significantly associated with risk in our results, and SNP tifycommonSNPswithmoderatetolargeeffectsonovarian rs9303542 at the 17q21 (Goode et al., 2010) was not on Il- cancer risk if any. A notable example in cancer genetics is the lumina Human 1M-Duo array (Table5). Wefound a similar common variant in KITLG with a per allele risk of 2.5 for OR for rs6504172, which is in high linkage disequilibrium testicular cancer, which was identified from an initial GWAS with rs9303542 (r2 = 0.841), but this SNP was not signifi- in ∼300 cases and ∼900 controls (Kanetsky et al., 2009). cantly associated with risk (p = 0.49). Wetherefore found no This empirical example suggests that although most loci ex- support for our hypothesis that additional common SNPs hibit smaller effect sizes, common SNPs with moderate to represented on the 1M-Duo arrays contribute to ovarian large effect do exist in cancer genetics, and therefore it is of cancer risk, probably because of insufficient power in stage interest to test similar hypothesis in different cancer types. 1 of this pooled GWAS. GWAS genotyping on pooled DNA does not suffer from In the pooling design, we divided serous ovarian cancer substantial power loss compared to a conventional study cases into four case pools according to the overall survival using individual genotyping. 
For example, in our pooled timeand/orthemethodinwhichtheDNAswereisolated. GWAS, assuming an additive effect risk allele with 20% In theory it would be possible to test for association of SNPs frequency that confers a relative risk of 2, power was 80%

even after scaling the original sample size by 10% to account for additional variance because of pooling errors (Macgregor et al., 2008), in comparison with 88% power using individual genotyping. An empirical study that successfully identified known variants, including the eye color locus at OCA2/HERC2 (15q11.2-q12), the age-related macular degeneration locus at CFH (1q32), and the pseudoexfoliation syndrome locus at LOXL1 (15q22), clearly showed that common alleles with large effects are not likely to be missed in a pooled GWAS (Craig et al., 2009). Therefore, our results suggest that there are probably no hitherto poorly tagged common SNPs with moderate to large effects still to be identified.

This study also demonstrates that it is not always necessary to measure DNA concentration by Picogreen absorption prior to making DNA pools. At least for the set of DNAs we used, which were largely isolated by salt extraction, we found highly consistent results between pooled genotyping and individual genotyping. However, it is worth noting that we have previously found that the correlation between the concentrations measured by Nanodrop spectroscopy and Picogreen absorption is high (r² = 0.5107) for a related set of 200 DNAs.

It should be noted that the lack of replication in this study was not due to problems in the DNA pooling method. Since additional errors, such as pool construction errors and pool measurement errors, could be involved (Sham et al., 2002), we implemented careful experiments and rigorous analysis to address this concern. Firstly, we took care to ensure that individual samples contributed equal quantities of DNA during the formation of the pools; secondly, all the pools were genotyped at least three times to yield better allele frequency estimates, and we applied stringent quality controls to limit the number of possible false positives; lastly, we accounted for additional variance because of pooling errors in the association tests. We also validated pooling results using individual genotyping. Given that most of the top 20 SNPs from the pooled GWAS were validated in the same samples by individual genotyping, the lack of replication is most likely due to the relatively small sample size in our stage 1 GWAS rather than to problems with the pooling approach.

In order to improve power for GWAS using pooled DNA, larger sample sizes and higher density microarrays are required. However, to properly accommodate a large sample size, a balance between the statistical power and the accuracy of the allele frequency estimates is needed. A number of empirical studies have investigated the impact of pool size (up to 1,000 samples in the pool) on the accuracy of allele frequency estimates, and usually found no obvious relationship between the two (Jawaid & Sham, 2009; Le Hellard et al., 2002; Macgregor, 2007). As indicated in Macgregor (2007), most variation from pooled DNA genotyping is attributable to array error rather than pool construction error. Therefore, constructing large pools is not likely to yield a great loss of power. An optimal pool design for a limited research budget will be a few large pools, each genotyped multiple times. One major criticism of the pooled GWAS is that it provides no information on individual genotypes or linkage disequilibrium, so it is generally impossible to impute missing genotypes, evaluate haplotypes, or fine-map the regions of interest. However, given the cost advantage, more expensive SNP arrays with better coverage can be used to partially compensate for the power loss because of imperfect linkage disequilibrium between genetic markers and causal variants. Furthermore, fine mapping of loci identified by GWAS is usually performed in stage 2 or 3 of genotyping, once the loci have been confirmed in additional samples. Here we have investigated the use of dense Illumina 1M-Duo arrays in locating variants that were poorly tagged by previous arrays. We found that moderate to large effects on ovarian cancer risk are unlikely to exist among the SNPs on this array, but we are not able to make a clear statement about the possible existence of additional SNPs with small effects because of limited study power.

In summary, we have carried out a pooled GWAS on 342 invasive serous cases and 643 controls. The accuracy of the estimated odds ratios was then validated by individually genotyping the same subjects that were included in the pools. We showed that pooled genotyping using DNAs quantified by Nanodrop spectroscopy, together with analytical tools for the pooled data, works well in terms of achieving accurate OR estimates and providing reasonable association signals. We therefore propose the use of pooled GWAS for less common subtypes of cancer or orphan diseases where research funds are limited. In addition, we have developed an analytical tool for analyzing pooled GWAS data, which will be available on request.

Acknowledgments

The Australian Ovarian Cancer Study (AOCS) Management Group (D. Bowtell, G. Chenevix-Trench, A. deFazio, D. Gertig, A. Green, and P. M. Webb) gratefully acknowledges the contribution of all clinical and scientific collaborators (see http://www.aocstudy.org/). The Australian Cancer Study Management Group (A. Green, P. Parsons, N. Hayward, P. M. Webb, and D. Whiteman) thanks all of the project staff, collaborating institutions, and study participants. NJ Ovarian Cancer Study (NJO) investigators (EV Bandera, SH Olson, I Orlow, L Rodriguez-Rodriguez) would like to thank Melony Williams-King and the staff at the New Jersey State Cancer Registry, in particular, Lisa Paddock. BELOCS investigators would like to thank Gilian Peuteman, Thomas Van Brussel, and Dominiek Smeets for technical assistance. SEARCH investigators thank study participants, the Eastern Cancer Registration and Information Centre, the collaborating general practitioners, the SEARCH team

for patient recruitment, and Caroline Baynes, Don Conroy, and Craig Luccarini for sample preparation. The German Ovarian Cancer Study (GER) thanks Ursula Eilber and Tanja Koehler for competent technical assistance. The Hannover–Jena Ovarian Cancer Study (HJO) gratefully acknowledges the contribution of our clinical collaborators Frauke Kramer, Wen Zheng, Peter Hillemanns, and Ingo Runnebaum. HAWAII investigators thank all study participants and members of research teams of all participating studies, including research nurses, research scientists, data entry personnel, and consultant gynecological oncologists.

Grant Support: Y. Lu is partly supported by Australian National Health and Medical Research Council (NHMRC) grant 496675. S. Macgregor is supported by a Career Development Award from the NHMRC. NJO was funded by grants from the US National Cancer Institute (K07CA095666, R01CA83918, and K22CA138563) and the Cancer Institute of New Jersey. BELOCS was funded by the Nationaal Kankerplan initiatief 29. SEARCH is funded by programme grants from Cancer Research UK (A490/10119 and A490/10124). The German Ovarian Cancer Study (GER) was supported by the German Federal Ministry of Education and Research of Germany, Programme of Clinical Biomedical Research (01 GB 9401); genotyping in part by the state of Baden-Württemberg through the Medical Faculty, University of Ulm (P.685); and data management by the German Cancer Research Center. HAWAII is funded by the US National Institutes of Health (R01 CA58598, N01-CN-55424, N01-PC-67001, and N01-PC-95001–20). UKO is funded by Cancer Research UK, the Eve Appeal, the OAK Foundation, and the UK Department of Health’s NIHR UCL/UCLH Biomedical Research Centre funding scheme.

References

Bolton, K. L., Tyrer, J., Song, H., Ramus, S. J., Notaridou, M., Jones, C., & Gayther, S. A. (2010). Common variants at 19p13 are associated with susceptibility to ovarian cancer. Nature Genetics, 42, 880–884.

Burkey, A. R., & Kanetsky, P. A. (2009). Development of a novel location-based assessment of sensory symptoms in cancer patients: Preliminary reliability and validity assessment. Journal of Pain and Symptom Management, 37, 848–862.

Chang, Y. M., Newton-Bishop, J. A., Bishop, D. T., Armstrong, B. K., Bataille, V., Bergman, W., Berwick, M., Bracci, P. M., Elwood, J. M., Ernstoff, M. S., Green, A. C., Gruis, N. A., Holly, E. A., Ingvar, C., Kanetsky, P. A., Karagas, M. R., Le Marchand, L., Mackie, R. M., Olsson, H., Østerlind, A., Rebbeck, T. R., Reich, K., Sasieni, P., Siskind, V., Swerdlow, A. J., Titus-Ernstoff, L., Zens, M. S., Ziegler, A., & Barrett, J. H. (2009). A pooled analysis of melanocytic nevus phenotype and the risk of cutaneous melanoma at different latitudes. International Journal of Cancer, 124, 420–428.

Craig, J. E., Hewitt, A. W., McMellon, A. E., Henders, A. K., Ma, L. J., Wallace, L., Sharma, S., Burdon, K. P., Visscher, P. M., Montgomery, G. W., & Macgregor, S. (2009). Rapid inexpensive genome-wide association using pooled whole blood. Genome Research, 19, 2075–2080.

Fletcher, O., & Houlston, R. S. (2010). Architecture of inherited susceptibility to common cancer. Nature Reviews Cancer, 10, 353–361.

Goode, E. L., Chenevix-Trench, G., Song, H., Ramus, S. J., Notaridou, M., Lawrenson, K., & Pharoah, P. D. (2010). A genome-wide association study identifies susceptibility loci for ovarian cancer at 2q31 and 8q24. Nature Genetics, 42, 874–879.

Jawaid, A., & Sham, P. (2009). Impact and quantification of the sources of error in DNA pooling designs. Annals of Human Genetics, 73, 118–124.

Johnatty, S. E., Beesley, J., Chen, X., Macgregor, S., Duffy, D. L., Spurdle, A. B., & Chenevix-Trench, G. (2010). Evaluation of candidate stromal epithelial cross-talk genes identifies association between risk of serous ovarian cancer and TERT, a cancer susceptibility “hot-spot.” PLoS Genetics, 6, e1001016.

Kanetsky, P. A., Mitra, N., Vardhanabhuti, S., Li, M., Vaughn, D. J., Letrero, R., Ciosek, S. L., Doody, D. R., Smith, L. M., Weaver, J., Albano, A., Chen, C., Starr, J. R., Rader, D. J., Godwin, A. K., Reilly, M. P., Hakonarson, H., Schwartz, S. M., & Nathanson, K. L. (2009). Common variation in KITLG and at 5q31.3 predisposes to testicular germ cell cancer. Nature Genetics, 41, 811–815.

Le Hellard, S., Ballereau, S. J., Visscher, P. M., Torrance, H. S., Pinson, J., Morris, S. W., Thomson, M. L., Semple, C. A., Muir, W. J., Blackwood, D. H., Porteous, D. J., & Evans, K. L. (2002). SNP genotyping on pooled DNAs: Comparison of genotyping technologies and a semi-automated method for data storage and analysis. Nucleic Acids Research, 30, e74.

Lichtenstein, P., Holm, N. V., Verkasalo, P. K., Iliadou, A., Kaprio, J., Koskenvuo, M., Pukkala, E., Skytthe, A., & Hemminki, K. (2000). Environmental and heritable factors in the causation of cancer: Analyses of cohorts of twins from Sweden, Denmark, and Finland. New England Journal of Medicine, 343, 78–85.

Lu, Y., Dimasi, D. P., Hysi, P. G., Hewitt, A. W., Burdon, K. P., Toh, T., Ruddle, J. B., Li, Y. J., Mitchell, P., Healey, P. R., Montgomery, G. W., Hansell, N., Spector, T. D., Martin, N. G., Young, T. L., Hammond, C. J., Macgregor, S., Craig, J. E., & Mackey, D. A. (2010). Common genetic variants near the Brittle Cornea Syndrome locus ZNF469 influence the blinding disease risk factor central corneal thickness. PLoS Genetics, 6, e1000947.

Macgregor, S. (2007). Most pooling variation in array-based DNA pooling is attributable to array error rather than pool construction error. European Journal of Human Genetics, 15, 501–504.

Macgregor, S., Visscher, P. M., & Montgomery, G. (2006). Analysis of pooled DNA samples on high density arrays without prior knowledge of differential hybridization rates. Nucleic Acids Research, 34, e55.

Macgregor, S., Zhao, Z. Z., Henders, A., Martin, N. G., Montgomery, G. W., & Visscher, P. M. (2008). Highly cost-efficient genome-wide association studies using DNA pools and dense SNP arrays. Nucleic Acids Research, 36, e35.

Norton, N., Williams, N. M., O’Donovan, M. C., & Owen, M. J. (2004). DNA pooling as a tool for large-scale association studies in complex traits. Annals of Medicine, 36, 146–152.

Purcell, S., Neale, B., Todd-Brown, K., Thomas, L., Ferreira, M. A., Bender, D., Maller, J., Sklar, P., de Bakker, P. I., Daly, M. J., & Sham, P. C. (2007). PLINK: A tool set for whole-genome association and population-based linkage analyses. American Journal of Human Genetics, 81, 559–575.

Sham, P., Bader, J. S., Craig, I., O’Donovan, M., & Owen, M. (2002). DNA pooling: A tool for large-scale association studies. Nature Reviews Genetics, 3, 862–871.

Song, H., Ramus, S. J., Tyrer, J., Bolton, K. L., Gentry-Maharaj, A., Wozniak, E., & Gayther, S. A. (2009). A genome-wide association study identifies a new ovarian cancer susceptibility locus on 9p22.2. Nature Genetics, 41, 996–1000.

Varghese, J. S., & Easton, D. F. (2010). Genome-wide association studies in common cancers – what have we learnt? Current Opinion in Genetics and Development, 20, 201–209.

Visscher, P. M., & Le Hellard, S. (2003). Simple method to analyze SNP-based association studies using DNA pools. Genetic Epidemiology, 24, 291–296.
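
The Discussion makes two points about pooled data: replicate arrays improve the allele frequency estimate for a pool, and pooling error adds variance to the association test. The minimal Python sketch below may help make these concrete. It is not the analytical tool offered on request, and it is a simplified allelic test, not the exact statistic used in this study. The function names, the example allele frequencies, and the pooling-error variance `var_pool` are all hypothetical; only the sample sizes (342 cases, 643 controls) are taken from the text.

```python
import math
from statistics import mean


def pooled_allele_freq(replicates):
    """Average the allele frequency estimates from replicate arrays of one pool."""
    return mean(replicates)


def allelic_odds_ratio(p_case, p_ctrl):
    """Allelic odds ratio computed from case and control allele frequencies."""
    return (p_case / (1 - p_case)) / (p_ctrl / (1 - p_ctrl))


def pooled_association_test(p_case, p_ctrl, n_case, n_ctrl, var_pool=0.0):
    """1-df test of an allele frequency difference between two pools.

    var_pool is an ASSUMED extra variance term standing in for pooling
    error; with var_pool=0 this reduces to an ordinary allelic test.
    """
    # Each individual contributes two chromosomes to the pool.
    chrom_case, chrom_ctrl = 2 * n_case, 2 * n_ctrl
    p_bar = (chrom_case * p_case + chrom_ctrl * p_ctrl) / (chrom_case + chrom_ctrl)
    # Binomial sampling variance of the frequency difference under the null.
    var_sampling = p_bar * (1 - p_bar) * (1 / chrom_case + 1 / chrom_ctrl)
    chi2 = (p_case - p_ctrl) ** 2 / (var_sampling + var_pool)
    # Upper tail of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical replicate estimates for one case pool and one control pool.
case_freq = pooled_allele_freq([0.26, 0.24, 0.25])
ctrl_freq = pooled_allele_freq([0.21, 0.19, 0.20])

print("OR:", allelic_odds_ratio(case_freq, ctrl_freq))
print("no pooling error:", pooled_association_test(case_freq, ctrl_freq, 342, 643))
print("with pooling error:",
      pooled_association_test(case_freq, ctrl_freq, 342, 643, var_pool=1e-4))
```

Under these assumptions, adding `var_pool` to the denominator can only shrink the test statistic, which is the sense in which allowing for pooling error makes the test more conservative than one that treats pooled frequencies as if they were exact.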
