Increased Genomic Burden of Germline Copy Number Variants Is Associated with Early Onset Breast Cancer: Australian Breast Cancer Family Registry Logan C
Total Page:16
File Type:pdf, Size:1020Kb
Walker et al. Breast Cancer Research (2017) 19:30 DOI 10.1186/s13058-017-0825-6 RESEARCH ARTICLE Open Access Increased genomic burden of germline copy number variants is associated with early onset breast cancer: Australian breast cancer family registry Logan C. Walker1, John F. Pearson2, George A. R. Wiggins1, Graham G. Giles3, John L. Hopper4* and Melissa C. Southey5 Abstract Background: Women with breast cancer who have multiple affected relatives are more likely to have inherited genetic risk factors for the disease. All the currently known genetic risk factors for breast cancer account for less than half of the average familial risk. Furthermore, the genetic factor(s) underlying an increased cancer risk for many women from multiple-case families remain unknown. Rare genomic duplications and deletions, known as copy number variants (CNVs), cover more than 10% of a human genome, are often not assessed in studies of genetic predisposition, and could account for some of the so-called “missing heritability”. Methods: We carried out a hypothesis-generating case-control study of breast cancer diagnosed before age 40 years (200 cases, 293 controls) using population-based cases from the Australian Breast Cancer Family Study. Genome-wide scanning for CNVs was performed using the Human610-Quad BeadChip and fine-mapping was conducted using PennCNV. Results: We identified deletions overlapping two known cancer susceptibility genes, (BRCA1 and BLM), and a duplication overlapping SMARCB1, associated with risk. The number of deletions across the genome was 1.5-fold higher for cases than controls (P =10-16), and 2-fold higher when only rare deletions overlapping genes (frequency <1%) were assessed (P =5×10-4). Association tests of CNVs, followed by experimental validation of CNV calls, found deletions overlapping the OR4C11 and OR4P4 genes were associated with breast cancer (P =0.02andP = 0.03, respectively). Conclusion: These results suggest rare CNVs might have a role in breast cancer susceptibility, at least for disease at a young age. Keywords: Breast cancer, Early onset, Copy number variants, Inherited susceptibility, Genome-wide association analysis Background currently only account for around 48% of average famil- Breast cancer is the most common malignancy among ial risk, and account for a lower proportion of the famil- women in the developed world, and is increasing rapidly ial risk of disease at a young age, and the vast majority in the developing world. A proportion of women with of women from multiple-case families do not have a breast cancer have multiple affected relatives and are known genetic explanation for their increased cancer therefore more likely to have inherited genetic factors risk [1, 2]. Thus, for a substantial fraction of women, in- that increase their risk of developing the disease. All the cluding young affected women and those with a family currently known genetic risk factors for breast cancer history, the cause of their disease remains unexplained. Copy number variants (CNVs) are estimated to cover – * Correspondence: [email protected] 5 10% of the human genome [3] and, based on nucleotide 4Centre for Epidemiology and Biostatistics, Melbourne School of Population coverage, are responsible for the majority of genetic vari- and Global Health, University of Melbourne, Melbourne, Victoria, Australia ability in human populations. CNVs have been reported to Full list of author information is available at the end of the article © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Walker et al. Breast Cancer Research (2017) 19:30 Page 2 of 8 disrupt genes known to be involved in breast cancer sus- Genotyping and identification of CNVs ceptibility, including BRCA1, BRCA2, TP53 and CHEK2 All DNA samples were genotyped with the Human610- [4], and could similarly alter other genes involved in path- Quad BeadChip (Illumina, Inc, San Diego, CA, USA) ways related to breast cancer susceptibility. Furthermore, with approximately 610,000 markers (including approxi- several array-based studies have reported candidate rare mately 20,000 non-polymorphic markers) for single nu- CNVs that overlap genes; variants in these might contrib- cleotide polymorphism (SNP) and CNV analysis. Samples ute to breast cancer susceptibility [5–7]. However, there were processed using Illumina’s recommended protocol has been a notable lack of consistency across these studies, for Infinium HD assays. Data for each array were normal- probably because many women carry rare CNVs or be- ised using GenomeStudio 2011.1 software (Illumina). cause of false CNV calls caused by technical issues [8]. Probe information, including genomic location, signal in- Evidence from some studies has suggested that the fre- tensity (Norm R), allele frequency (Norm theta), log R ra- quency and size of germline CNVs are increased in tios (LRRs), and B allele frequencies (BAF), for each women with breast cancer [5, 7, 9], and that this sample was calculated and exported from GenomeStudio. might be strongest for CNVs that overlap gene re- All samples had a call rate >95%. The CNV calls were gen- gions [5, 7]. Thus, functional disruption of genes by erated using the PennCNV program (version 27 Aug. CNVs across the genome might contribute to the 2009), using the default program parameters, library files, genetic basis of breast cancer risk. and genomic wave adjustment. Only a few studies have examined the relationship be- Quality control procedures were performed to remove tween common germ-line CNVs and breast cancer risk. poor-quality array data (Additional file 1: Figure S1). A large genome-wide association study (2000 breast can- Samples were excluded if they met the following criteria: cer cases and 3000 controls) from the Wellcome Trust log R ratio standard deviation >0.28; B allele frequency Case Control Consortium suggested that such CNVs drift >0.01; waviness factor deviating from 0 by >0.04, or were unlikely to have a major role in the genetic basis of with the number of CNV calls exceeding 70. To reduce breast cancer [10]. However, more recent genome-wide false positives, CNV calls were excluded if they con- association studies of common CNVs (mean allele fre- tained <5 probes, and/or were ≥1000 kb in size. A total quency (MAF) ≥5%) in Chinese and European women of 200 cases and 293 controls passed quality control identified a deletion in the APOBEC3 gene cluster asso- steps and were assessed in the study. CNV data used in ciated with up to 1.3-fold and up to 2.3-fold increase in the study are shown in Additional file 2: Table S1. risk of breast cancer associated with hemizygous and homozygous deletions, respectively [11, 12]. Identification of genes overlapping CNVs and defining To better understand the role of CNVs in breast can- rare CNVs cer risk we have conducted a hypothesis-generating To avoid examining multiple isoforms from genes, we study of breast cancer at a young age (diagnosed before annotated 39,544 UCSC RefSeq (NCBI36/Hg18) tran- the age of 40 years). Our study aimed to assess whether scripts using the SOURCE database [27] and defined the CNVs across the genome are more frequent in such genomic intervals for a total of 18,791 unique genes. breast cancer cases when compared with unaffected Thus, each gene interval encompassed the start and end controls, and whether cancer susceptibility genes are of all associated RefSeq transcripts (Additional file 1: disrupted by rare CNVs. Figure S2). CNVs and gene regions that were estimated to overlap by at least one base pair were identified in a genome-wide scan using Intersect and Join tools from Methods the Galaxy web server [28–30]. Subjects Because putative CNV calls do not typically conform to We conducted a hypothesis-generating study of 258 discrete genomic regions in different women, we used the women who were diagnosed with breast cancer before genome coordinates of 18,791 RefSeq gene (NCBI36/Hg18) the age of 40 years from the population-based Australian boundaries to define a CNV region (Additional file 1: Breast Cancer Family Study [13–15]. These cases had Figure S1). Each of these regions therefore represented a clus- been previously screened for germline mutations in ter of one or more CNVs overlapping a well-characterised BRCA1, ATM, CHEK2, PALB2, TP53, BRCA2, CDH1, gene in the human genome and was used to measure the and FANCM [13, 14, 16–25]. A total of 348 women un- frequency of CNVs in our study. Rare CNVs were defined as affected with breast cancer (controls) were selected from thosewithafrequency<1%inthetotalsample. participants in the Australian Mammographic Density Twins and Sisters Study, a cross-sectional study of twins CNV validation by quantitative PCR and their sisters [15, 26]. All study participants provided DNA samples were used to experimentally validate written informed consent. putative CNVs at 12 genomic regions using Human Walker et al. Breast Cancer Research (2017) 19:30 Page 3 of 8 TaqMan® Copy Number Reference Assays (Thermo- compared with controls and that this feature was due fisher Scientific Inc). Primer and probe sequences are to the inheritance or de novo formation of genomic presented in Additional file 3: Table S2. RNaseP was deletions. used as an endogenous reference gene. All assays To assess the potential disruption of CNVs to func- were carried out in triplicate.