Allele-Specific Disparity in Breast Cancer Fatemeh Kaveh1, Hege Edvardsen1, Anne-Lise Børresen-Dale1,2, Vessela N Kristensen1,2,3* and Hiroko K Solvang1,4
Total Page:16
File Type:pdf, Size:1020Kb
Kaveh et al. BMC Medical Genomics 2011, 4:85 http://www.biomedcentral.com/1755-8794/4/85 RESEARCHARTICLE Open Access Allele-specific disparity in breast cancer Fatemeh Kaveh1, Hege Edvardsen1, Anne-Lise Børresen-Dale1,2, Vessela N Kristensen1,2,3* and Hiroko K Solvang1,4 Abstract Background: In a cancer cell the number of copies of a locus may vary due to amplification and deletion and these variations are denoted as copy number alterations (CNAs). We focus on the disparity of CNAs in tumour samples, which were compared to those in blood in order to identify the directional loss of heterozygosity. Methods: We propose a numerical algorithm and apply it to data from the Illumina 109K-SNP array on 112 samples from breast cancer patients. B-allele frequency (BAF) and log R ratio (LRR) of Illumina were used to estimate Euclidian distances. For each locus, we compared genotypes in blood and tumour for subset of samples being heterozygous in blood. We identified loci showing preferential disparity from heterozygous toward either the A/B-allele homozygous (allelic disparity). The chi-squared and Cochran-Armitage trend tests were used to examine whether there is an association between high levels of disparity in single nucleotide polymorphisms (SNPs) and molecular, clinical and tumour-related parameters. To identify pathways and network functions over-represented within the resulting gene sets, we used Ingenuity Pathway Analysis (IPA). Results: To identify loci with a high level of disparity, we selected SNPs 1) with a substantial degree of disparity and 2) with substantial frequency (at least 50% of the samples heterozygous for the respective locus). We report the overall difference in disparity in high-grade tumours compared to low-grade tumours (p-value < 0.001) and significant associations between disparity in multiple single loci and clinical parameters. The most significantly associated network functions within the genes represented in the loci of disparity were identified, including lipid metabolism, small-molecule biochemistry, and nervous system development and function. No evidence for over- representation of directional disparity in a list of stem cell genes was obtained, however genes appeared to be more often altered by deletion than by amplification. Conclusions: Our data suggest that directional loss and amplification exist in breast cancer. These are highly associated with grade, which may indicate that they are enforced with increasing number of cell divisions. Whether there is selective pressure for some loci to be preferentially amplified or deleted remains to be confirmed. Background alternations (CNAs) [1]. Such chromosomal variations Higher organisms such as humans are diploid, which have been associated with genetic disorders (for CNVs) means that all chromosomes except the sex chromo- and the progression of cancer (for CNAs) [2]. The size somes are present in two copies (2 n). During DNA of CNVs in normal tissues is often shorter than CNAs replication, unique genomic segments of original DNA in tumour tissues, which can cover a large region of the are normally copied just one time, but in specific situa- human genome (Mb) [3]. Array-based comparative tions, these fragments are either not replicated entirely genomic hybridization (CGH) is used for detection of ortheyarecopiedmorethanonce.Thevariation CNVs and CNAs [4,5]. between normal and actual copy number of a DNA seg- Recently, new techniques have been developed to ment is referred to as copy number differences, which studyCNVsandCNAsatahigherresolutionusing in normal cells are referred to as copy number varia- high-density arrays of single nucleotide polymorphisms tions (CNVs) and in tumour cells as copy number (SNPs) [6]. SNPs are variations in the single-base locus of the genome within members of a species or paired * Correspondence: [email protected] chromosomes of an individual. In general, SNPs have 1Department of Genetics, Institute for Cancer Research, Norwegian Radium precise positions on the chromosome and two alleles Hospital, Oslo University Hospital, Oslo, Norway (maternal and paternal), and SNP-based arrays are Full list of author information is available at the end of the article © 2011 2011 Kaveh et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Kaveh et al. BMC Medical Genomics 2011, 4:85 Page 2 of 13 http://www.biomedcentral.com/1755-8794/4/85 powerful tools for detection of allele-specific chromoso- Human-1 109K BeadChip array (Illumina, San Diego, mal aberrations and loss of heterozygocity (LOH) status CA, USA). For each sample, the corresponding log R [7]. Recent advances in the systematic study of genomic ratio (LRR) and B allele frequency (BAF) were extracted alterations in human cancer samples have made it possi- for both blood and tumour using the Illumina Beadstu- ble to understand this disease better and identify key dio Genotyping software, which is based on the transfor- genes with underlying roles in oncogenesis [8]. In the mation of two channels’ allelic intensity values [7]. present study, we describe the development of a numer- Briefly Illumina’s Beadstudio software performs normali- ical algorithm based on Euclidian distances to extract zation on the two raw allelic intensity values (X, Y)for the maximum level of information from SNP array data each locus (SNP). X and Y values are respectively in blood-tumour pairs, with regard to both chromoso- denoted as allele A and allele B. The normalized allele mal aberrations and loss of heterozygosity. We apply the intensities are transformed to total signal intensity, R (R algorithmtoIllumina109KSNParraydataandshow = X + Y), and relative allelic signal intensity ratio, theta how it can be used to mine information on a single- (θ =arctan(Y/X)(2/π)) [12].Illumina provides two sup- SNP, cytogenic region as well as in gene lists and at the plementary parameters, LRR and BAF,whichreferto pathway level. We focus on the disparity of CNAs in the normalized and calibrated form of R and θ.BAF tumour samples, which were compared to those in represents the proportion of one SNP allele (B)tothe blood in order to identify the directional loss of total number of A and B alleles (NB/(NA + NB)) for each heterozygosity. given loci. The Illumina Infinium platform detects chro- mosomal aberrations by comparing the normalized Methods intensity of a subject sample and a reference sample Materials using two modes of analysis: 1) a single-sample mode, The 112 samples used here are a subset of a larger which derives reference values from canonical genotyp- patient series consisting of 920 samples, which were col- ing clusters (0, 0.5, 1) created from clustering on normal lected from breast cancer patients to study the effect of reference samples; 2) Paired-sample mode, which makes disseminating tumour cells to the blood and bone mar- intensity (R) comparisons between a subject sample and row [9]. The samples were collected at five different hos- its paired reference sample. Genomic plots of LRR (LRR pitals in the Oslo region and have so far been extensively =log2(Rsubject/Rreference)) and BAF of the three canonical studied on both the clinical and biological levels [9,10]. clusters are used to detect chromosomal aberrations [1,13,14]. Comparing custom-clustered data to HapMap- DNA isolation clustered data, we observe improved scoring using cus- Tumour tissues were snap-frozen at -80°C. Samples of tom clustering. The genotypes for both blood and mononucleus peripheral blood cells were drawn in tubes tumour were therefore extracted using custom cluster- and centrifuged. Before freezing, the cell pellet was ing [15]. To improve the results of genotyping for those resuspended to a concentration of 30 × 106 cells/ml in a samples that have not performed well, we used the cus- freezing solution containing RPMI added with 10% tom-clustered data. DMSO and 20% FCS as well as Dnase. The samples Missing values are denoted as NaN for both BAF and were then frozen at -70°C before being stored in liquid LRR. Updated SNP information (position and gene asso- nitrogen. Before DNA isolation, 240 μlProt.K(AB- ciation) was derived when available using the UCSC Applied Biosystems) and 400 μl 1 × PBS without CaMg Genome Browser (Human Mar. 2006 Assembly were added to the frozen samples on ice. The samples (NCBI36/hg18)) [16], dbSNP (build 130), Ensembl gen- were incubated at 55°C until thawed and then trans- ome browser (Ensembl genes 57 database on Homo ferred after mixing to the vessels in the Applied Biosys- sapiens genes (GRCh37) dataset) [17], and CHIP Bioin- tems 340A Nucleic Acid Extractor. The remaining formatics Tools, Goldenpath: hg18, dbSNP: build 130 process of DNA isolation followed the standard proce- [18]. dures for the extractor using a phenol-chloroform based method. DNA from frozen tumour tissue was extracted Description of computational algorithm according to standard procedures of the Applied Biosys- Pre-processing of data tems 340A Nucleic Acid Extractor. Traditional Array CGH provides information on CNAs All patients provided informed consent, and the study based on the Log R intensity ratios. The benefit of the was approved by the local regional ethical committees. Illumina SNP array analysis is that it provides additional information related to B allele frequency. The Beadstu- Genotyping dio software presents information, based on two allelic The genotyping used has been previously described [11], intensity values, related to both Log R ratio and B allele and blood-tumour pairs were genotyped using the frequency. To analyze the Illumina output, we propose a Kaveh et al. BMC Medical Genomics 2011, 4:85 Page 3 of 13 http://www.biomedcentral.com/1755-8794/4/85 computational algorithm consisting of the following 1 n steps for each chromosome (illustrated by the flowchart deviation(S = (b − μ )2).