Bhuvaneshwar et al. BMC Cancer (2019) 19:357 https://doi.org/10.1186/s12885-019-5474-y

RESEARCH ARTICLE Open Access
Genome sequencing analysis of blood cells identifies germline haplotypes strongly associated with drug resistance in osteosarcoma patients
Krithika Bhuvaneshwar1*, Michael Harris1, Yuriy Gusev1, Subha Madhavan1, Ramaswamy Iyer2, Thierry Vilboux2, John Deeken2, Elizabeth Yang3,4,5,6 and Sadhna Shankar3,4

Abstract
Background: Osteosarcoma is the most common malignant bone tumor in children. Survival remains poor among histologically poor responders, and there is a need to identify them at diagnosis to avoid delivering ineffective therapy. Genetic variation contributes to a wide range of response and toxicity related to chemotherapy. The aim of this study is to use sequencing of blood cells to identify germline haplotypes strongly associated with drug resistance in osteosarcoma patients.
Methods: We used sequencing data from two patient datasets, from Inova Hospital and the NCI TARGET. We explored the effect of mutation hotspots, in the form of haplotypes, associated with relapse outcome. We then mapped the single nucleotide polymorphisms (SNPs) in these haplotypes to genes and pathways. We also performed a targeted analysis of mutations in Drug Metabolizing Enzymes and Transporter (DMET) genes associated with tumor necrosis and survival.
Results: We found intronic and intergenic hotspot regions from 26 genes common to both the TARGET and INOVA datasets significantly associated with relapse outcome. Among the significant results were mutations in genes belonging to the AKR enzyme family, the cell-cell adhesion biological process and the PI3K pathways, as well as variants in the SLC22 family associated with both tumor necrosis and overall survival. The SNPs from our results were confirmed using Sanger sequencing. Our results included known as well as novel SNPs and haplotypes in genes associated with drug resistance.
Conclusion: We show that combining next generation sequencing data from multiple datasets and defined clinical data can better identify relevant pathway associations and clinically actionable variants, as well as provide insights into drug response mechanisms.
Keywords: Whole genome sequencing, Childhood cancers, Drug resistance, Osteosarcoma, Pharmacogenomics, Genetics

* Correspondence: [email protected] 1Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington DC, USA Full list of author information is available at the end of the article

© The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Background
Osteosarcoma (OS) is the most common malignant bone tumor in children. It accounts for about 2% of childhood cancers in the US. Approximately 800 new cases are diagnosed in the US every year, about 400 of which are in children and teens [1]. OS occurs mostly between the ages of 10 and 30. Approximately 60% of all malignant bone tumors diagnosed in the first two decades of life are osteosarcomas.

The role of adjuvant chemotherapy is well established in the treatment of osteosarcoma [2, 3]. Prior to the use of systemic chemotherapy, two-year survival was less than 20% even in patients with clinically localized disease. Most recent studies report a 3-year survival of 60–70% among patients with non-metastatic disease treated with combination chemotherapy. Standard treatment for non-metastatic osteosarcoma includes neo-adjuvant chemotherapy followed by surgical resection and post-operative chemotherapy. The extent of necrosis of the primary tumor at the time of definitive surgical resection is the only significant prognostic factor in patients with non-metastatic osteosarcoma. Patients with less than 10% viable tumor at the time of definitive surgery have a significantly lower risk of relapse than those with more than 10% viable tumor [4]. Five-year survival is 75–80% among patients with good histological response and only 45–55% among poor responders, even after complete surgical resection [5, 6].

The most active chemotherapeutic agents against osteosarcoma include cisplatin, doxorubicin and high dose methotrexate. The Children’s Oncology Group (COG) considers combination cisplatin, doxorubicin and high dose methotrexate (MAP) as standard therapy for osteosarcoma [7]. Postoperative therapy is often modified in poor responders to improve outcome, for example by adding ifosfamide and etoposide [8–10]. However, no such attempts have been successful to date. It is likely that the initial 8–12 weeks of ineffective therapy select for resistant clones and allow the cancer cells to metastasize. Changing therapy at a later time point fails to change the outcome. There is a need to identify the poor responders at the time of initial diagnosis to avoid delivering ineffective pre-operative therapy. Alternative chemotherapeutic agents at initial diagnosis could potentially alter outcomes in patients expected to have a poor response to standard cisplatin based chemotherapy. Thus, the key challenge is to determine the basis for response and non-response in patients at the outset and to identify patients who are eligible for intensified or alternative therapy given their personal profile.

The completion of the HapMap project is a historic achievement [11]. The project identified over one million single nucleotide polymorphisms (SNPs) across the human genome, which may be at the root of the great variation seen in human health and disease. Germline genetic variation between individuals may lie at the heart of two critical questions: who is at risk of developing cancer, and how best to treat individuals once they are diagnosed. This genetic variation may account for the wide variation seen in the response and toxicity related to chemotherapeutic agents.

The pharmacogenetic differences between patients are multi-factorial [12]. One factor is polymorphism in drug targets, including cell surface receptors and target proteins. Another is polymorphism in cellular recovery mechanisms that repair cytotoxic agent-induced damage. Finally, there are polymorphisms in genes encoding proteins involved in drug pharmacokinetics, including proteins that impact drug absorption, metabolism, distribution, and elimination (ADME). Germline pharmacogenetic biomarkers have been found for a number of anticancer agents, including irinotecan, mercaptopurine, 5-fluorouracil, and tamoxifen [13–16]. These studies often used a candidate approach, and attempted to explain a drug’s efficacy or toxicity by identifying one gene and even one variant within that gene [17].

While candidate gene/single variant analysis provides important insights, there are several limitations in most published studies. The implicated alleles were often of low frequency and the absolute numbers of patients with these alleles were low. Only one or a few single nucleotide polymorphisms (SNPs) were examined for each gene of interest, and potentially significant polymorphisms could have been missed. In this study, we applied both single SNP and multiple SNP analysis to get an enhanced understanding of genetic polymorphism in the disease.

A genome-wide approach enables examination of polymorphisms of a large number of implicated genes in multiple pathways that may affect response to chemotherapy. With advances in technology and reductions in cost, whole-genome sequencing is now feasible with next-generation sequencing. Complete sequences of implicated genes can be analyzed to identify polymorphisms in the context of pathway datasets, rather than as individual data points. An integrative approach combining whole genome sequencing (WGS) with multiple datasets and defined clinical data can 1) better capture pathway associations, and 2) provide the opportunity for discovery of clinically actionable variants [18, 19].

We have previously used this integrative approach to define the pharmacogenetic profile of gemcitabine, and defined a ‘sensitive’ and ‘resistant’ genotype using the combination of pre-clinical data from the NCI60 cell lines and a genome wide association studies (GWAS) clinical dataset from a large clinical trial [20]. We now wish to expand this approach to pediatric osteosarcoma patients treated with multi-agent chemotherapy, including cisplatin, doxorubicin, and methotrexate.

The aim of this study is to identify, test, and validate a genotype resistant to cisplatin, doxorubicin, and methotrexate in children with osteosarcoma, using two datasets derived from clinical samples. A genetic signature that is strongly associated with drug response could be translated into clinical oncology [21] and used in the future to personalize therapy.

Methods
Whole genome sequencing data were obtained from a cohort of 15 osteosarcoma patients from the Inova Fairfax Hospital for Children. Additional sequencing data and clinical outcomes of 85 patients with osteosarcoma were obtained from the NCI TARGET dataset. The following sections describe the datasets, sample collection procedures and data analyses.

Datasets
• Inova Pediatric Group Osteosarcoma Patients (labeled ‘INOVA’): Whole genome sequencing (WGS) was performed for 15 children, up to 21 years of age, with osteosarcoma, including newly diagnosed and off-therapy patients. Patients were recruited over a 2-year period from 2014 to 2016. Demographic, clinical outcome, chemotherapeutic exposure and pathological response data were collected for all subjects. The study was IRB approved; informed consent and assent were obtained from each child and parent as appropriate. DNA was extracted from whole blood samples, and whole genomic DNA was sequenced on an Illumina HiSeq 2500 whole genome sequencer at 30-50X coverage. Raw sequencing data were obtained in the form of FASTQ [22] files.
• TARGET Osteosarcoma Dataset (labeled ‘TARGET’): This is a cohort of 85 fully characterized patient cases from NCI’s TARGET database for osteosarcoma (released in Feb 2015) [23, 24]. The majority of these patients were teenagers. The samples were collected at the time of diagnostic surgery. Aligned genomic data from whole blood samples were downloaded from NCI’s dbGaP database [25]. Out of this 85-patient cohort, whole exome sequencing (WXS) data were available on 52 patients and WGS data were available on 33 patients. For each patient, aligned genomic data in the form of BAM files (short for Binary Alignment Map) [26] were downloaded, decrypted and processed.

All patients were treated with standard MAP therapy. The standard MAP neo-adjuvant chemotherapy regimen is as follows: Doxorubicin (Adriamycin) 37.5 mg/m²/day and Cisplatin (Platinum) 60 mg/m²/day on days 1 and 2 of week 1; Methotrexate 12 g/m² on day 1 of week 4 and week 5. This entire cycle is repeated in weeks 6–10. Surgical resection, with tumor necrosis data, takes place in week 12.

Processing of genomic data
We used well-known open source tools that follow best practices to process the sequencing data. The tools included Sickle [27], Bowtie2 [28], Samtools [29], Picard [30], and GATK’s [31] HaplotypeCaller. After quality control, the raw sequencing data in the form of FASTQ files were aligned to the human reference genome (version hg19). Post-alignment processing was done on the aligned reads so that they were in the right format for the subsequent step. A variant calling algorithm was then applied, which compared the patient genome against the reference genome to identify variants in the form of single nucleotide polymorphisms (SNPs) or indels. The variants identified from each patient were merged into one file. Only those variants that passed the quality check were chosen for further analysis. Additional file 1 shows the steps, file formats and tools used in this genomic data processing, which was performed on an Amazon cloud r3.4xlarge instance.

The TARGET variants were a combination of whole-exome and whole-genome data. We applied a filtering criterion on the variants such that if all patients, or 84 of the 85 patients, had the reference allele, the variant was rejected. The same processing steps were applied to both datasets in an effort to reduce batch effects. At the end of this filtering, the TARGET dataset had about 900,000 SNPs and the INOVA dataset had about 8 million SNPs.

Outcomes of interest
The outcomes of interest chosen were: (1) relapse, (2) percent tumor necrosis and (3) overall survival.

Tumor necrosis following preoperative chemotherapy is the strongest prognostic factor for osteosarcoma [32]. Tumor necrosis as an outcome provides a window into the early part of drug response, while ‘relapse’ as an outcome provides an extended view of drug response.

Relapse information was available for all 15 INOVA patients and all 85 TARGET patients. These patients also had overall survival information (Table 1). Tumor necrosis data were available for the 15 patients from the INOVA cohort, but only for 44 out of 85 patients from the TARGET cohort. Patients with tumor necrosis greater than 90% at the time of resection were ‘Good responders’; patients with tumor necrosis <= 90% were ‘Poor responders’ (Table 2).

Table 1 Summary of patients showing the number of patients who relapsed and were relapse-free
Dataset | # Relapse | # Relapse-free | Total # of patients
TARGET | 39 | 46 | 85
INOVA | 3 | 12 | 15

Table 2 Summary of patients with tumor necrosis information. Good Responders: > 90% tumor necrosis; Poor Responders: ≤ 90% tumor necrosis
Dataset | # Poor Responders | # Good Responders | Total # of patients
TARGET | 26 | 18 | 44
INOVA | 9 | 6 | 15

Methods for SNP analysis
Two analyses were performed, shown in Fig. 1.

Analysis 1: hotspots associated with relapse
We analyzed the INOVA and TARGET datasets separately to look for hotspots associated with relapse outcome. Hotspots are regions in the genome that have multiple co-occurring mutations in the same or close-by regions [33]. In our analysis, we looked for hotspots in terms of haplotypes, which are groups of markers (SNPs) that are inherited together [34]. Once the significant haplotypes associated with outcome were identified, we then looked for haplotypes overlapping between the two datasets.

There are several advantages to grouping SNPs: it reduces the number of tests, making it easier to reject the null hypothesis and offering more power to the analysis; SNPs in a haplotype block (haploblock) are inherited together; the markers (SNPs) are often closely located and will be in linkage disequilibrium (LD); statistical testing is more robust; and groups of SNPs affecting outcome are more biologically relevant than a single SNP affecting outcome [35].

For this study, we performed haplotype based association test analysis using the PLINK tool [36, 37]. For each dataset, the tool identified haplotypes significantly associated with relapse outcome (p value <= 0.05). Using chromosome location as the criterion, we looked for overlapping haplotypes between the two datasets. We looked for partial or complete overlap in the chromosome location; hence the SNPs in these common haplotypes may or may not be the same. Looking for overlap in two independent datasets reduces the chance of picking up randomly occurring haplotypes and helps eliminate false positives. The SNPs in these common haplotypes were then mapped to genes and pathways to enable downstream systems biology analysis and give insights into drug response mechanisms.

Analysis 2: targeted analysis of variants in DMET genes associated with tumor necrosis and survival
We performed a targeted analysis of mutations in the genes involved in drug absorption, metabolism, distribution, and elimination (ADME), also known as Drug Metabolizing Enzymes and Transporters (DMET). The words DMET and ADME are used interchangeably in this manuscript. We explored their association with tumor necrosis using machine learning methods.

Among the 85 TARGET patients, only 44 had clinical data available on tumor necrosis (Table 2). All 15 INOVA patients had data available on tumor necrosis. Because each dataset alone would be too small to give enough power for statistically significant results, we merged the two datasets to get a total of 59 patients (referred to as the ‘TARGET+INOVA’ cohort).

Before starting the analysis, we tested the data for batch effects. We performed a principal component analysis (PCA) on variants in DMET genes (Additional file 2) using the R statistical platform (https://cran.r-project.org/). We found a clear batch effect due to the merging of the two datasets. We have accounted for this batch effect in our analysis.

From the TARGET+INOVA cohort, we extracted 36,504 variants in DMET genes. We binned the data into 0 (indicating no mutation) and 1 (indicating presence of mutation). 85% of these data were randomly assigned to a training set and the remaining 15% were set aside as an independent validation set using the caret package [38] (seed of 7) in R. Several filters were applied to the training set so that only SNPs with the most variability were chosen for the analysis. At the end of all the filtering, we were left with 4543 SNPs for analysis.

For each of the 4543 variants, we created two generalized linear models (GLMs) with tumor necrosis as the outcome variable: one that included the variant and adjusted for the dataset to control for batch effect, and a second one specified without the variant. This allowed us to perform logistic regression analysis to find those variants most associated with tumor necrosis outcome. We then used analysis of variance (ANOVA) to compare the two models and obtain a p-value based on the chi-square distribution (described in Additional file 2). The p-values from this test were adjusted for multiple testing using the Benjamini Hochberg approach to control the false discovery rate (FDR) [39]. The significant variants (p value < 0.05) from this analysis were annotated using the SnpEff [40] variant annotation tool. The annotations were used to sub-divide these SNPs based on their impact into high, moderate, low impact, and modifiers (non-coding regions).

We built a Random Forest predictive model with these significant variants with 20 fold cross validation (seed 260), and performed a prediction on the independent validation set (block diagram shown in Additional file 2).

We also performed survival analysis on the significant variants to assess their impact on overall survival. The association with survival was tested using the log rank test statistic [41]. The analysis was performed in the R programming language. Kaplan Meier survival curves [42] were generated. Results were compared to published literature.

Validation of SNPs detected using Sanger sequencing
The significant SNPs and indels obtained from our analysis of WGS data were confirmed in the lab using the Sanger sequencing technique. DNA extracted from the INOVA cohort was used for PCR amplification. Using Primer3 (http://bioinfo.ut.ee/primer3-0.4.0/), primers were designed to cover and validate the subset of variants identified by WGS. The PCR was performed using AmpliTaq Gold 360 master mix (Applied Biosystems). After Exo/SAP purification (ThermoFisher Scientific), the amplicons were sequenced using the BigDye V.3.1 Terminator chemistry (Applied Biosystems) and separated on an ABI 3730xl genetic analyzer (Applied Biosystems). Data were evaluated using Sequencher V.5.0 software (Gene Codes).

This validation was performed on three of our analysis results: (a) variants belonging to the most frequent haplotypes in the INOVA dataset, (b) variants in the form of dbSNP ids amongst the overlapping haplotypes associated with relapse and common to the two datasets, and (c) variants among the DMET genes significantly associated with tumor necrosis and overall survival.

Results
Mutation hotspots associated with relapse
We found a total of 2178 haplotypes significantly associated (p value < 0.05) with relapse outcome in the TARGET dataset and 110,000 significant haplotypes in the INOVA dataset (more haplotypes were identified in the INOVA data because it was WGS data). Using chromosome location as the criterion for finding overlapping haplotypes between the two datasets, we found a total of 231 overlapping haplotypes (Additional file 3). We mapped the SNPs in these haplotypes to genes, and found 26 genes common to the TARGET and INOVA datasets associated with relapse, including AKR1D1, SLC13A2, MKI67, PIK3R1 and others (Table 3).

Table 3 List of 26 common genes obtained from haplotypes overlapping between INOVA and TARGET and associated with relapse outcome
List of 26 common genes
7SK
AKR1D1
C10orf112 (also known as MALRD1)*
CACNA2D4
CDH13$
CDH9$
CDRT15
CSMD1
DGCR6*
DQ576041
DQ600701 (also known as PIR61811)*
DQ786190
GABRG3$
HBE1$
LOC643401
MKI67
OCA2
OR51B5
PCGF2
PDZD4*
PIK3R1*$
PKHD1
PPP1R12C*
SLC13A2*$
ZNF321P
ZNF816
Genes marked with * are part of the most frequent haplotypes associated with relapse
Genes marked with $ were significantly enriched in the pathway enrichment analysis (details in Table 4)

We also performed enrichment analysis using the Reactome database (https://reactome.org/) [43] to map these 26 genes to pathways (Table 4).

Most frequent haplotypes amongst the overlapping hotspots
Among the 231 overlapping haplotypes between the INOVA and TARGET datasets, we looked in detail at those haplotypes that have the highest sample frequency in each dataset (Table 5).

Common SNPs amongst the overlapping hotspots
Using dbSNP ids as the criterion, we looked for common SNPs amongst the 231 overlapping hotspots (haplotypes) between the TARGET and INOVA datasets. We found 10 dbSNP ids common to the two datasets. These include four SNPs in MKI67 (rs7071768, rs11016073, rs61738284, rs11591817), one SNP in CACNA2D4 (rs10735005), three SNPs in SLC13A2 (rs3217046, rs11568466, rs9890678), and two SNPs in PPP1R12C (rs10573756, rs34521018). These variants are important because they are part of hotspots that are significantly associated with relapse outcome. The fact that these same SNPs were found as part of hotspots in two independent datasets, and validated in the lab, indicates their potential as biomarkers for diagnostics and drug discovery.

Targeted analysis of variants in DMET genes associated with tumor necrosis and survival
We explored the association of variants in DMET genes (referred to as ‘DMET variants’) with tumor necrosis as outcome, and obtained a total of 281 variants with p value < 0.05 that can separate good responders and poor responders (Additional file 4). We built a predictive model using these variants. This model gave us a prediction accuracy of 87.5% when applied to the independent validation set. The confusion matrix and the summary of the model are shown in Additional file 5. We annotated these variants with the SnpEff variant annotation tool [40] and grouped them into high impact, moderate impact, low impact, and modifier (non-coding regions).

A “high impact” variant is one that is predicted to have a disruptive impact on the protein, such as protein truncation, loss of function or triggering nonsense mediated decay. A “moderate impact” variant is a non-disruptive variant that might change protein effectiveness. A “low impact” variant is assumed to be harmless or unlikely to change protein behavior. “Modifier variants” are non-coding variants or variants affecting non-coding genes [40].

Out of the 281 variants significantly associated with tumor necrosis, there was one high impact variant, rs17143187, which is a splice donor variant in the ABCB5 gene. This variant is located on the boundary of an exon and an intron and could cause aberrant splicing that would result in a disrupted protein [44]. Sixteen moderate impact variants, 15 low impact variants, and 249 modifier variants were identified. 210 of the 281 variants were located in intronic regions.

We performed survival analysis on these 281 variants, and obtained five variants as significant results (p value < 0.05). Thus, these 5 variants are significantly associated with both tumor necrosis and overall survival (Table 6). The Kaplan Meier survival curves for these five variants are shown in Fig. 2.

Validation of SNPs detected using Sanger sequencing
The lab validation experiment was performed on three groups of analysis results: (a) 20 variants belonging to the most frequent haplotypes in the INOVA dataset (labeled as ‘Group A’), (b) 10 variants in the form of dbSNP ids common to the two datasets amongst the overlapping haplotypes significantly associated with relapse (labeled as ‘Group B’), and (c) 5 variants among the DMET genes significantly associated with tumor necrosis and overall survival (labeled as ‘Group C’).

Among the variants in Group A, at least one SNP in each haplotype was successfully validated (Additional file 6).

Table 4 Pathways enriched from the 26 genes common to TARGET and INOVA haplotypes
Pathway name | #Entities found | #Entities total | Entities P value | Entities FDR | Submitted entities found
Adherens junctions interactions | 2 | 35 | 0.003 | 0.228 | CDH13; CDH9
Cell-Cell communication | 3 | 133 | 0.003 | 0.228 | CDH13; PIK3R1; CDH9
Factors involved in megakaryocyte development and platelet production | 3 | 179 | 0.007 | 0.228 | HBE1
Cell-cell junction organization | 2 | 67 | 0.009 | 0.228 | CDH13; CDH9
Cell junction organization | 2 | 94 | 0.017 | 0.228 | CDH13; CDH9
Sodium-coupled sulphate, di- and tri-carboxylate transporters | 1 | 9 | 0.019 | 0.228 | SLC13A2
MET activates PI3K/AKT signaling | 1 | 10 | 0.021 | 0.228 | PIK3R1
GP1b-IX-V activation signaling | 1 | 12 | 0.025 | 0.228 | PIK3R1
PI3K events in ERBB4 signaling | 1 | 15 | 0.032 | 0.228 | PIK3R1
GABA A receptor activation | 1 | 15 | 0.032 | 0.228 | GABRG3
Signaling by FGFR3 fusions in cancer | 1 | 16 | 0.034 | 0.228 | PIK3R1
Erythrocytes take up oxygen and release carbon dioxide | 1 | 16 | 0.034 | 0.228 | HBE1
Signaling by FGFR4 in disease | 1 | 18 | 0.038 | 0.228 | PIK3R1
PI3K events in ERBB2 signaling | 1 | 22 | 0.046 | 0.228 | PIK3R1
Tie2 Signaling | 1 | 22 | 0.046 | 0.228 | PIK3R1
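The FDR column in Table 4 is a p-value adjusted for multiple testing, and the Methods name the Benjamini Hochberg approach for the DMET analysis. The sketch below illustrates that adjustment in Python on hypothetical p-values. The function name `benjamini_hochberg` and the input values are assumptions made for illustration; the sketch does not reproduce the 0.228 shown in the table, which depends on every pathway tested and not only on the rows listed.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values, smallest first.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so that each adjusted value is the
    # minimum of p * m / rank over its own rank and all higher ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four tests (not taken from the study).
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
print([round(q, 4) for q in adjusted])
```

Because the adjusted value is a running minimum, several tests can share one adjusted value, which is consistent with a whole column of identical FDR entries.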

Validation of seven variants (located in intronic regions of two genes and one intergenic region) was not confirmed. Several of those that were not confirmed were located in insertion/deletion regions. All the variants in Groups B and C were successfully validated (Additional file 6).

Discussion
Summary of results
The main results are summarized in Table 7, showing (a) the 26 genes from common haplotypes, found in both the TARGET and INOVA datasets, that are associated with relapse; (b) the 10 dbSNP ids among the overlapping hotspot regions common to the two datasets; and (c) the genes from targeted DMET analysis associated with both tumor necrosis and overall survival. We explored these genes to see which of them are known in relation to drug response from published literature.

Hotspots associated with relapse
We performed haplotype based association analysis in the TARGET and INOVA datasets to find haplotypes associated with relapse outcome, and then looked for overlap. We found 231 haplotypes overlapping (based on chromosome location) amongst the TARGET and INOVA datasets. The SNPs in these haplotypes were mapped to genes, and 26 genes were found common between the two datasets. These 26 common genes were explored for known relationships with drug response.

Among them is AKR1D1, which has SNPs rs1872929 and rs1872930 among the hotspots in the TARGET dataset; both are in the three prime untranslated region (3′ UTR) of AKR1D1. SNPs rs1872929 and rs1872930 were found in LD with intronic SNP rs2306847 (found among the hotspots in the INOVA dataset) as part of a haplotype [45]. These SNPs have been significantly associated with higher AKR1D1 mRNA expression [45]. AKR1D1 is a key genetic regulator of the P450 network, which affects drug metabolism, efficacy and adverse events in patients [45].

Genes CDH13 and CDH9 are part of the cadherin family of genes and, along with PKHD1, are part of the cell-cell adhesion biological process. This biological process is associated with a multidrug resistant phenotype, “cell adhesion-mediated drug resistance,” or CAM-DR [46]. Osteoblasts express multiple cadherins [47], and cadherin mediated cell-to-cell adhesion is critical for normal human osteoblast differentiation [47]. The cadherin family of genes is associated with CCN3, which has been found to have prognostic value in osteosarcoma [48]. CDH13 and CDH9 are also part of the adherens junction biological process, and adherens-dependent PI3K/AKT activation is known to induce resistance to genotoxin-induced cell death in intestinal epithelial cells [49].

Variants in gene ZNF321P (among the TARGET haplotypes) and in gene PCGF2 (among the INOVA haplotypes) are intronic and also located in active promoter regions. Similarly, intronic variants in gene PPP1R12C (among the INOVA haplotypes) are located in strong enhancer regions containing transcription factor binding sites. Mutations in such gene regulatory regions could inhibit transcription factor binding, leading to aberrant cell proliferation or drug response [50].

We hence see that some of the genes identified from our analysis are linked with drug resistance or drug

r- e- Inova_F 0.167 s- TARGET_F 0.506 0.699 0.487 0.478 0.353 0.341 0.0625 0.125 0.125 PDZD4 intronic, exonic IDH3G, PDZD4 intronic, exonic SLC13A2 intronic, exonic SLC13A2 intronic, exonic PPP1R12C intronic, exonic PPP1R12C intronic 153,056,311 153,084,802 222 18,878,593 18,879,911 22,222,122 intergenic DQ786190, DGCR6 18,878,593 18,879,898 2,222,212 intergenic DQ786190, DGCR6 18,878,349 18,878,632 11,222,221 intergenic DQ786190, DGCR6 26,816,365 26,817,537 11 26,816,365 26,818,676 112 19,620,439 19,641,239 1,221,212 intergenic DQ600701, C10orf112 (MALRD1) 0.0409 55,614,923 55,624,113 111 67,513,481 67,554,172 11,121 intronic PIK3R1 55,614,923 55,624,113 111 153,073,319 153,075,655 22,222 23 22 22 22 17 17 19 19 23 2 5 6 4 3 4 6 12 10 6 16 5 TARGET _NSNP TARGENHAP TARGET_CHR TARGET_BP1 TARGET_BP2 TARGET_Haplo TARGET_Region TARGET_Genes 5 INOVA_NSNP INOVA_NHAP INOVA_CHR INOVA_BP1 INOVA_BP2 INOVA_Haplo Inova_Region Inova_Genes Most frequent haplotypes in the TARGET and INOVA datasets Haplotype#1 Most frequent in Target 3 Table 5 Haplotype#4 Most frequent in Target 8 Haplotype#12 Most frequent in Target 7 Haplotype#20 Most frequent in Target 8 Haplotype#31 Most frequent in Target 2 Haplotype#32 Most frequent in Target 3 Haplotype#61 Most frequent in Inova 3 Haplotype#160 Most frequent in Inova 7 Haplotype#1 5 Haplotype#74 Most frequent in Inova 3 Haplotype#126 Most frequent in Inova 5 NSNP: Number of SNPsnumbers in the 1 and haplotype, 2 NHAP: represent Number the of genotypes. common F: haplotypes, Sample CHR: frequency Chromosome, BP1: Position of left most SNP, BP2: Position of right most SNP, Haplo: Indicates the haplotype that was formed. 
Table 5 Most frequent haplotypes in the TARGET and INOVA datasets (Continued)
Haplotype | INOVA_NSNP | INOVA_NHAP | INOVA_CHR | INOVA_BP1 | INOVA_BP2 | INOVA_Haplo | Inova_Region | Inova_Genes | Inova_F
Haplotype#4 | 8 | 8 | 22 | 18,877,869 | 18,878,859 | 22,122,121 | intergenic | DQ786190, DGCR6 | 0.106
Haplotype#12 | 8 | 8 | 22 | 18,877,869 | 18,878,859 | 22,122,121 | intergenic | DQ786190, DGCR6 | 0.106
Haplotype#20 | 8 | 8 | 22 | 18,877,869 | 18,878,859 | 22,122,121 | intergenic | DQ786190, DGCR6 | 0.106
Haplotype#31 | 8 | 9 | 17 | 26,816,365 | 26,822,518 | 11,121,221 | intronic, exonic | SLC13A2 | 0.0839
Haplotype#32 | 8 | 9 | 17 | 26,816,365 | 26,822,518 | 11,121,221 | intronic, exonic | SLC13A2 | 0.0839
Haplotype#61 | 7 | 7 | 19 | 55,619,303 | 55,622,518 | 2,222,222 | intronic | PPP1R12C | 0.71
Haplotype#74 | 8 | 8 | 19 | 55,618,992 | 55,622,518 | 22,222,222 | intronic | PPP1R12C | 0.675
Haplotype#126 | 5 | 9 | 5 | 67,548,442 | 67,549,836 | 22,222 | intronic | PIK3R1 | 0.448
Haplotype#160 | 7 | 9 | 10 | 19,621,103 | 19,624,121 | 2,222,221 | intergenic | DQ600701, C10orf112 (MALRD1) | 0.267
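The Methods describe matching haplotypes between the two datasets by partial or complete overlap in chromosome location. The sketch below shows one way such a check could be written in Python. The `Haplotype` tuple and the `overlap_kind` function are hypothetical names introduced here, not the authors' code, and the coordinates are copied from the chromosome 22 (DQ786190, DGCR6) and chromosome 5 (PIK3R1) entries of Table 5.

```python
from typing import NamedTuple


class Haplotype(NamedTuple):
    chrom: str   # chromosome label
    start: int   # position of the left-most SNP (BP1)
    end: int     # position of the right-most SNP (BP2)


def overlap_kind(a: Haplotype, b: Haplotype) -> str:
    """Classify the overlap of two haplotype intervals as none, partial or complete."""
    # Intervals on different chromosomes, or disjoint on the same one, do not overlap.
    if a.chrom != b.chrom or a.start > b.end or b.start > a.end:
        return "none"
    # Complete overlap: one interval lies entirely inside the other.
    a_in_b = b.start <= a.start and a.end <= b.end
    b_in_a = a.start <= b.start and b.end <= a.end
    return "complete" if a_in_b or b_in_a else "partial"


# Intergenic region between DQ786190 and DGCR6 (chromosome 22).
target_chr22 = Haplotype("22", 18_878_593, 18_879_911)
inova_chr22 = Haplotype("22", 18_877_869, 18_878_859)

# Intronic region of PIK3R1 (chromosome 5).
target_chr5 = Haplotype("5", 67_513_481, 67_554_172)
inova_chr5 = Haplotype("5", 67_548_442, 67_549_836)

print(overlap_kind(target_chr22, inova_chr22))
print(overlap_kind(target_chr5, inova_chr5))
print(overlap_kind(target_chr22, inova_chr5))
```

The check uses only interval endpoints, which matches the statement that the SNPs inside two overlapping haplotypes may or may not be the same.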

Table 6 Results of survival analysis showing association of variations with overall survival

| Name of Variant | # of samples without mutation | # of samples with mutation | # of events in samples without mutation | # of events in samples with mutation | P value from Log Rank Test | Adjusted p-value | Hazard Ratio* | Gene Name | ENCODE annotation (SC) | dbSNP id | Impact | Location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| chr6_160551093_T_G | 44 | 15 | 8 | 7 | 0.007 | 0.59 | 3.723 | SLC22A1 | Heterochromatin; low signal | rs4646272 | Modifier | Intron variant |
| chr11_62761161_C_T | 42 | 17 | 7 | 8 | 0.019 | 0.59 | 3.161 | SLC22A8 | Repressed | rs2187384 | Modifier | Intron variant, UTR3 |
| chr4_69528597_A_G | 51 | 8 | 11 | 4 | 0.024 | 0.59 | 3.462 | UGT2B15 | . | rs34073924 | Modifier | Intron variant |
| chr7_2472429_C_A | 53 | 6 | 11 | 4 | 0.040 | 0.59 | 3.124 | CHST12 | Transcription Elongation | rs3735099 | Moderate, modifier | Missense variant; upstream gene variant |
| chr7_2472455_A_T | 53 | 6 | 11 | 4 | 0.040 | 0.59 | 3.124 | CHST12 | Transcription Elongation | rs3735100 | Moderate, modifier | Missense variant; upstream gene variant |

Variant name is defined as "ChromosomeNumber_Position_ReferenceAllele_AlternateAllele".
*The hazard ratio was obtained from the Cox Proportional Hazards Test and indicates the hazard of having an event (death) in the mutation group compared to the non-mutation group.
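Table 6 reports log-rank p-values comparing survival between patients with and without each variant. To make the test concrete, here is a minimal pure-Python sketch of a two-group log-rank test. The survival times are invented toy values, not patient data, and `logrank_test` is a hypothetical helper. The reference list cites the R survival package [41], which presumably underlies the reported values, so this illustrates the statistic and is not the authors' code.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test (illustrative sketch).

    times_*  : follow-up time for each patient
    events_* : 1 if the event (death) was observed, 0 if censored
    Returns (chi_square, p_value) with 1 degree of freedom.
    """
    # Pool both groups, tagging each record with its group index.
    records = [(t, e, 0) for t, e in zip(times_a, events_a)]
    records += [(t, e, 1) for t, e in zip(times_b, events_b)]

    observed_b = 0.0  # observed events in group b
    expected_b = 0.0  # events expected in group b if both groups share one hazard
    variance = 0.0    # hypergeometric variance summed over event times

    for t in sorted({t for t, e, _ in records if e == 1}):
        at_risk = [r for r in records if r[0] >= t]
        n = len(at_risk)
        n_b = sum(1 for r in at_risk if r[2] == 1)
        deaths = [r for r in at_risk if r[0] == t and r[1] == 1]
        d = len(deaths)
        d_b = sum(1 for r in deaths if r[2] == 1)

        observed_b += d_b
        expected_b += d * n_b / n
        if n > 1:
            variance += d * (n_b / n) * (1 - n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi_square = (observed_b - expected_b) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Toy data (hypothetical): first group = no mutation, second group = mutation.
no_mutation_times = [10, 11, 12, 13, 14, 15]
no_mutation_events = [0, 0, 0, 0, 0, 0]
mutation_times = [1, 2, 3, 4, 5, 6]
mutation_events = [1, 1, 1, 1, 1, 1]

chi_square, p_value = logrank_test(
    no_mutation_times, no_mutation_events, mutation_times, mutation_events
)
print(f"chi-square = {chi_square:.2f}, p = {p_value:.4f}")
```

The test compares observed with expected event counts at each event time. It does not estimate the size of the effect, which is why Table 6 reports a Cox hazard ratio alongside the p-value.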

Fig. 2 Kaplan-Meier survival curves for the significant mutations in DMET genes. 0 (red): no mutation; 1 (blue): presence of mutation

ponse; others have not been previously linked with drug response. In the genes previously linked with drug resistance or response (AKR1D1, CDH13, and CDH9), the SNPs and haplotypes found in our analyses are novel.

Most frequent haplotypes
Among the 231 haplotypes common to the INOVA and TARGET datasets, we examined those with the highest sample frequency (Table 5). The most frequent haplotypes in the TARGET dataset span intronic and intergenic regions in or near the following genes: IDH3G, PDZD4, DQ786190, DGCR6, and SLC13A2. The most frequent haplotypes in the INOVA dataset span regions in or near the following genes: PPP1R12C, PIK3R1, DQ600701 and MALRD1.

DQ600701 (also known as PIR61811) is a Piwi-interacting RNA (piRNA), a small non-coding RNA found in clusters that acts as a regulatory element and controls gene expression in germ cells [51–53]. Somatic cells express similar small non-coding RNAs called piRNA-like RNAs (piR-Ls or pilRNAs), which have functions similar to those of piRNAs. piRNAs/pilRNAs appear to target the 3′ UTR of mRNAs, potentially regulate mRNA translation [53–55], and possibly affect drug response. For example, pilRNAs were found to play key roles in chemoresistance to cisplatin-based chemotherapy in lung squamous cell carcinoma (LSCC) [56].

Another frequent haplotype is located in the intergenic region between DQ786190 and DGCR6 [57, 58], in the chr 22 q11.21 region. According to GenBank, the mRNA sequence DQ786190 is involved in lineage-specific gene duplication and loss in humans [59]. According to ENCODE annotation, this intergenic region, at chromosome position 18,878,593–18,878,632, contains repetitive/copy number mutations. The INOVA dataset also contains nearby haplotypes (position 18,877,874–18,878,489), which span the intergenic region between DQ786190 and DGCR6. This region contains TATA boxes; therefore, haplotypes containing multiple SNPs in this intergenic region can potentially affect transcription or gene copy number, and possibly drug response [60]. miR-145 is predicted to target this DQ786190 mRNA sequence [40]. miR-145 was found to be 5 times under-expressed in the miRNA expression signature associated with canine osteosarcoma [61]. Thus, the variant

haplotype could affect drug response through decreased miRNA binding [62].

SLC13A2 is the only Drug Metabolizing Enzyme and Transporter (DMET) gene that has significant haplotypes present in both the TARGET and the INOVA datasets. This gene is known to play an important role in transporter activity [63]. SLC13A2 was downregulated along with miR-9 overexpression in malignant murine mastocytoma cell lines, and in primary canine osteosarcoma (OSA) tumors and cell lines [64]. Another gene in the same solute carrier family is SLC19A1, which is a folate carrier. Reduced folate carrier function has been associated with impaired methotrexate transport in osteosarcoma tumors [65, 66]. Polymorphisms in SLC19A1 have been associated with response to methotrexate treatment in pediatric osteosarcoma [67, 68]. In recent years, these SLC transporters have been recognized as having the potential to transport and deliver anticancer chemotherapeutic agents, and are being studied as drug targets in cancer [69, 70].

Another gene in the list is PIK3R1, which encodes regulatory subunits of PI3-kinase [71]. The PI3K pathway is frequently activated in cancer due to genetic (e.g., amplifications, mutations, deletions) and epigenetic (e.g., methylation, regulation by non-coding RNAs) aberrations targeting its key components, and may affect response to specific therapeutic agents [72]. Hotspots in exonic regions of PIK3R1 (residues M582_splice, N564, G376, R348, K567) have been found in tumor samples of various cancers (http://cancerhotspots.org/) [73]. Zhao et al. found that up-regulation of a long non-coding RNA promoted osteosarcoma proliferation and migration through the regulation of PIK3IP1, another protein in the PI3K pathway [74].

Table 7 Summary of results obtained at the gene level

| (a) List of 26 common genes from overlapping haplotypes associated with relapse | (b) List of 10 dbSNP ids common to the two datasets amongst the overlapping haplotypes associated with relapse | (c) Genes from targeted DMET analysis associated with both tumor necrosis and overall survival |
|---|---|---|
| 7SK | rs7071768 (MKI67) | SLC22A8 |
| AKR1D1 | rs11016073 (MKI67) | SLC22A1 |
| CACNA2D4 | rs61738284 (MKI67) | UGT2B15 |
| CDH13 | rs11591817 (MKI67) | CHST12 |
| CDH9 | rs10735005 (CACNA2D4) | |
| CDRT15 | rs3217046 (SLC13A2) | |
| CSMD1 | rs11568466 (SLC13A2) | |
| DGCR6* | rs9890678 (SLC13A2) | |
| DQ576041 | rs10573756 (PPP1R12C) | |
| DQ600701 (also known as PIR61811)* | rs34521018 (PPP1R12C) | |
| DQ786190 | | |
| GABRG3 | | |
| HBE1 | | |
| LOC643401 | | |
| MALRD1 (also known as C10orf112)* | | |
| MKI67 | | |
| OCA2 | | |
| OR51B5 | | |
| PCGF2 | | |
| PDZD4* | | |
| PIK3R1* | | |
| PKHD1 | | |
| PPP1R12C* | | |
| SLC13A2* | | |
| ZNF321P | | |
| ZNF816 | | |

Genes marked with * indicate the most frequent haplotypes associated with relapse.

Common SNPs amongst the overlapping hotspots
The 4 SNPs in the MKI67 gene (rs7071768, rs11016073, rs61738284 and rs11591817) are present in the hotspots in both the TARGET and INOVA datasets. All these SNPs are non-synonymous and are expected to affect protein function. According to ENCODE annotation, these SNPs are located in regions of transcription elongation, which can have broad effects on gene expression. Mutations in these regulatory regions have been linked with disease mechanisms [75] and possibly modified drug response. The gene MKI67 is often used as a surrogate biomarker to score the aggressiveness of the tumor, and expression of this gene has been used as a predictor of response to chemotherapy in breast cancer patients [76, 77].

Targeted analysis of variants in DMET genes associated with tumor necrosis and overall survival
The genes harboring the 281 DMET variants significantly associated with tumor necrosis included ABCG2. This gene is part of the Methotrexate metabolic pathway (MTX pathway), and mutations in this gene have been implicated in MTX efficacy and toxicity. This gene is also linked with breast cancer treatment resistance [78].

The only high impact variant was rs17143187, which is a splicing and intron variant in the ABCB5 gene. Alternative splicing and ABCB5 SNPs are known to affect drug response [79–81]. This mutation is predicted to be deleterious by the FATHMM prediction algorithm [82].

Among the 281 variants that were significantly associated with tumor necrosis, 5 variants were significantly associated with overall survival as well. All 5 variants have a hazard ratio of approximately 3 (3.1 to 3.7; Table 6), meaning that, at any particular time, a patient in the mutation group is roughly three times as likely to experience an event (death) as a patient in the non-mutation group.

A total of 210 of the 281 significant variants were located in intronic regions, which is consistent with known literature. Luizon and Ahituv reviewed all published pharmacogenomics genome wide association studies (GWAS) and found that 96.4% of the SNPs reside in noncoding regions [50]. Intronic regions typically harbor microRNAs and long non-coding RNAs, other regulatory elements, epigenetic elements and structural variants [83, 84]. These intronic regions are sites of intron retention, in which the introns are not spliced out but are retained. Recent studies have shown that intron retention affects regulation of gene expression and RNA translation [85, 86].

Variants in introns can affect drug response by altering gene expression [83, 87–91]. In recent years, non-coding RNAs have been researched as potential drug targets since they affect gene expression and disease progression [92]. Enhancers have been identified as potential biomarkers for early cancer detection, and targets for cancer therapy [93]. Hotspot regions are being researched for their potential as targets for diagnosis and drug development [73, 94].

Hence, some of the significant variants obtained from our analyses are supported by reports in the literature and serve as in-silico validation of our results, while other variants are novel and offer additional value for the exploration of new drug therapies.

Limitations
Although the major findings were statistically significant and confirmed by Sanger sequencing, the sample size of the INOVA cohort was relatively low. The relevance of these findings with respect to drug response needs to be validated in a larger independent cohort of patients before being applied in clinics. As next steps, we plan to explore the application of the significant haplotypes and single SNPs discovered in a larger scale study, to build a predictive signature of genomic variants associated with treatment outcome.

A recent publication on pan-cancer analyses of pediatric cancers by the TARGET group [24], published in March 2018, showed that the mutation rate in osteosarcomas is much higher in the non-coding regions (0.79 per Mb) than in coding regions (0.53 per Mb). Being a whole-genome sequencing dataset, the INOVA cohort lends itself as an independent dataset to be studied in depth in future research projects. Hence there is much promise in the analysis and exploration of the whole genome for future research work in OS.

Conclusion
Most publications that study drug response in osteosarcoma focus on exonic regions; few studies of OS have examined the whole genome with regard to treatment response. We hypothesized that genetic variation of the host may account for the wide variation seen in the response and toxicity related to chemotherapeutic agents. Our analysis approach differed from studies that test individual SNPs for significance: analyzing groups of SNPs offers increased power to find associations as well as increased robustness in statistical testing.

From our analyses, we found a list of intronic and intergenic hotspot regions common to both the TARGET and INOVA datasets that are significantly associated with outcome, providing insights into drug response mechanisms. Some of the genes with variants found in this study are linked with drug response at the pathway level (gene/pathway/biological processes level). Our results include variants in genes not previously linked with drug response, as well as novel SNPs and

haplotypes in genes known to be linked with drug resistance. The targeted single SNP analysis of the DMET genes found variants significantly associated with both tumor necrosis outcome and survival.

We were able to validate the majority of the variants in our results using Sanger sequencing at the individual patient level. Identification and validation of such genetic markers that predict drug treatment response provide the basis for prospective evaluation of these candidate markers, and for future upfront treatment design based on individual genomic profiles.

Additional files
Additional file 1: The steps, file formats and tools in the genomic data processing. (DOCX 148 kb)
Additional file 2: Detailed processing steps of Analysis 2 including the block diagram, filtering steps, PCA plot, equations of the generalized linear models. (DOCX 1860 kb)
Additional file 3: Details of the 231 haplotypes overlapping between the two datasets. (XLSX 83 kb)
Additional file 4: Details of the 281 variants significantly associated with tumor necrosis. (XLSX 114 kb)
Additional file 5: The confusion matrix and the summary of the 5 generalized linear models. (DOCX 71 kb)
Additional file 6: Details of the Sanger Sequencing validation. (XLSX 88 kb)

Abbreviations
3′ UTR: Three prime untranslated region; ADME: Absorption, metabolism, distribution, and elimination; ANOVA: Analysis of variance; CAM-DR: Cell adhesion-mediated drug resistance; COG: Children’s Oncology Group; DMET: Drug Metabolizing Enzymes and Transporter; FDR: False discovery rate; GLMs: Generalized linear models; GWAS: Genome wide association studies; haploblock: Haplotype block; LD: Linkage disequilibrium; MTX pathway: Methotrexate metabolic pathway; OS: Osteosarcoma; PCA: Principal component analysis; piR-Ls or pilRNA: piRNA-like; piRNA: Piwi-interacting RNA; SNPs: Single nucleotide polymorphisms; WGS: Whole genome sequencing

Acknowledgements
The results published here are in part based upon data generated by the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) initiative, phs000218, managed by the NCI. The data used for this analysis are available via dbGAP at phs000468.v14.p6 [25]. Information about TARGET can be found at http://ocg.cancer.gov/programs/target.
The authors would also like to thank Keary Jane’t for her compassion and commitment to this project; Sakthi Madhappan and Miguel Cruz for setting up and managing the Amazon cloud instance.

Funding
The principal investigators were the recipients of the 2014 Hyundai Scholar’s Hope Award, which has funded this work. This work was also funded by the P30 CA51008 Georgetown Lombardi Cancer Center support grant. The funding agencies reviewed the original research proposal but were not involved in designing the study, collection, analysis or interpretation of data or writing of the manuscript.

Availability of data and materials
The genome sequencing data for the Osteosarcoma TARGET cohort was obtained and available to the public via the dbGAP repository at phs000468.v14.p6 [25]. The whole genome sequencing data from the INOVA cohort could be made available through reasonable request and with permission of Inova PIs after the manuscript has been published.

Authors’ contributions
SS, JD and MH designed the study. KB, MH and YG created the analysis plan and performed the analysis. KB and YG wrote the paper. EY, SS, MH and SM reviewed and edited the paper. RI performed the whole genome sequencing. RI and TV performed the lab validation. All other authors provided editorial comments. All authors have read and approved the manuscript.

Ethics approval and consent to participate
The study was approved by the Institutional Review Board of the Inova Fairfax Hospital. Written informed consent was obtained from participants who were greater than 18 years of age. For participants under 18 years of age, written informed consent was obtained from the parents.

Consent for publication
N/A

Competing interests
None of the authors have any competing interests.

Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author details
1Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington DC, USA. 2Inova Translational Medicine Institute, Fairfax, VA, USA. 3Inova Children’s Hospital, Falls Church, VA, USA. 4Center for Cancer and Blood Disorders of Northern Virginia, Pediatric Specialists of Virginia, Falls Church, VA, USA. 5George Washington University School of Medicine, Washington DC, USA. 6Virginia Commonwealth University School of Medicine, Inova Campus, Falls Church, VA, USA.

Received: 10 November 2017 Accepted: 14 March 2019

References
1. Meyers PA, Gorlick R. Osteosarcoma. Pediatr Clin N Am. 1997;44(4):973–89.
2. Link MP, Goorin AM, Miser AW, Green AA, Pratt CB, Belasco JB, Pritchard J, Malpas JS, Baker AR, Kirkpatrick JA, et al. The effect of adjuvant chemotherapy on relapse-free survival in patients with osteosarcoma of the extremity. N Engl J Med. 1986;314(25):1600–6.
3. Meyers PA, Heller G, Healey J, Huvos A, Lane J, Marcove R, Applewhite A, Vlamis V, Rosen G. Chemotherapy for nonmetastatic osteogenic sarcoma: the memorial Sloan-Kettering experience. J Clin Oncol. 1992;10(1):5–15.
4. Provisor AJ, Ettinger LJ, Nachman JB, Krailo MD, Makley JT, Yunis EJ, Huvos AG, Betcher DL, Baum ES, Kisker CT, et al. Treatment of nonmetastatic osteosarcoma of the extremity with preoperative and postoperative chemotherapy: a report from the Children's Cancer group. J Clin Oncol. 1997;15(1):76–84.
5. Bielack SS, Kempf-Bielack B, Delling G, Exner GU, Flege S, Helmke K, Kotz R, Salzer-Kuntschik M, Werner M, Winkelmann W, et al. Prognostic factors in high-grade osteosarcoma of the extremities or trunk: an analysis of 1,702 patients treated on neoadjuvant cooperative osteosarcoma study group protocols. J Clin Oncol. 2002;20(3):776–90.
6. Whelan JS, Jinks RC, McTiernan A, Sydes MR, Hook JM, Trani L, Uscinska B, Bramwell V, Lewis IJ, Nooij MA, et al. Survival from high-grade localised extremity osteosarcoma: combined results and prognostic factors from three European osteosarcoma intergroup randomised controlled trials. Ann Oncol. 2012;23(6):1607–16.
7. Meyers PA, Schwartz CL, Krailo MD, Healey JH, Bernstein ML, Betcher D, Ferguson WS, Gebhardt MC, Goorin AM, Harris M, et al. Osteosarcoma: the addition of muramyl tripeptide to chemotherapy improves overall survival--a report from the Children's oncology group. J Clin Oncol. 2008;26(4):633–8.
8. Kung FH, Pratt CB, Vega RA, Jaffe N, Strother D, Schwenn M, Nitschke R, Homans AC, Holbrook CT, Golembe B, et al. Ifosfamide/etoposide combination in the treatment of recurrent malignant solid tumors of childhood. A pediatric oncology group phase II study. Cancer. 1993;71(5):1898–903.
9. Miser JS, Kinsella TJ, Triche TJ, Tsokos M, Jarosinski P, Forquer R, Wesley R, Magrath I. Ifosfamide with mesna uroprotection and etoposide: an effective

regimen in the treatment of recurrent sarcomas and other tumors of children and young adults. J Clin Oncol. 1987;5(8):1191–8.
10. Fuchs N, Bielack SS, Epler D, Bieling P, Delling G, Korholz D, Graf N, Heise U, Jurgens H, Kotz R, et al. Long-term results of the co-operative German-Austrian-Swiss osteosarcoma study group's protocol COSS-86 of intensive multidrug chemotherapy and surgery for osteosarcoma of the limbs. Ann Oncol. 1998;9(8):893–9.
11. International HapMap C. A haplotype map of the human genome. Nature. 2005;437(7063):1299–320.
12. Deeken J. The Affymetrix DMET platform and pharmacogenetics in drug development. Curr Opin Mol Ther. 2009;11(3):260–8.
13. Iyer L, Das S, Janisch L, Wen M, Ramirez J, Karrison T, Fleming GF, Vokes EE, Schilsky RL, Ratain MJ. UGT1A1*28 polymorphism as a determinant of irinotecan disposition and toxicity. Pharmacogenomics J. 2002;2(1):43–7.
14. Evans WE, Hon YY, Bomgaars L, Coutre S, Holdsworth M, Janco R, Kalwinsky D, Keller F, Khatib Z, Margolin J, et al. Preponderance of thiopurine S-methyltransferase deficiency and heterozygosity among patients intolerant to mercaptopurine or azathioprine. J Clin Oncol. 2001;19(8):2293–301.
15. Pullarkat ST, Stoehlmacher J, Ghaderi V, Xiong YP, Ingles SA, Sherrod A, Warren R, Tsao-Wei D, Groshen S, Lenz HJ. Thymidylate synthase gene polymorphism determines response and toxicity of 5-FU chemotherapy. Pharmacogenomics J. 2001;1(1):65–70.
16. Dehal SS, Kupfer D. CYP2D6 catalyzes tamoxifen 4-hydroxylation in human liver. Cancer Res. 1997;57(16):3402–6.
17. Deeken JF, Figg WD, Bates SE, Sparreboom A. Toward individualized treatment: prediction of anticancer drug disposition and toxicity with pharmacogenetics. Anti-Cancer Drugs. 2007;18(2):111–26.
18. Hattinger CM, Tavanti E, Fanelli M, Vella S, Picci P, Serra M. Pharmacogenomics of genes involved in antifolate drug response and toxicity in osteosarcoma. Expert Opin Drug Metab Toxicol. 2017;13(3):245–57.
19. Horton I, Lin Y, Reed G, Wiepert M, Hart S. Empowering Mayo Clinic Individualized Medicine with Genomic Data Warehousing. J Pers Med. 2017;7(3):E7. https://doi.org/10.3390/jpm7030007.
20. Harris M, Bhuvaneshwar K, Natarajan T, Sheahan L, Wang D, Tadesse MG, Shoulson I, Filice R, Steadman K, Pishvaian MJ, et al. Pharmacogenomic characterization of gemcitabine response--a framework for data integration to enable personalized medicine. Pharmacogenet Genomics. 2014;24(2):81–93.
21. Horak P, Frohling S, Glimm H. Integrating next-generation sequencing into clinical oncology: strategies, promises and pitfalls. ESMO Open. 2016;1(5):e000094.
22. FASTQ format. https://en.wikipedia.org/wiki/FASTQ_format. Accessed 11 Aug 2017.
23. Osteosarcoma. https://ocg.cancer.gov/programs/target/projects/osteosarcoma. Accessed 11 Aug 2017.
24. Ma X, Liu Y, Liu Y, Alexandrov LB, Edmonson MN, Gawad C, Zhou X, Li Y, Rusch MC, Easton J, et al. Pan-cancer genome and transcriptome analyses of 1,699 paediatric leukaemias and solid tumours. Nature. 2018;555(7696):371–6.
25. TARGET: Osteosarcoma (OS). https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000468.v14.p6. Accessed 11 Aug 2017.
26. Sequence Alignment Map. https://samtools.github.io/hts-specs/SAMv1.pdf. Accessed 11 Aug 2017.
27. Sickle: A sliding-window, adaptive, quality-based trimming tool for FastQ files. https://github.com/najoshi/sickle. Accessed 26 June 2017.
28. Langmead B, Salzberg SL. Fast gapped-read alignment with bowtie 2. Nat Methods. 2012;9(4):357–9.
29. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. Genome project data processing S: the sequence alignment/map format and SAMtools. Bioinformatics. 2009;25(16):2078–9.
30. Picard. http://broadinstitute.github.io/picard. Accessed 26 June 2017.
31. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, et al. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20(9):1297–303.
32. Davis AM, Bell RS, Goodwin PJ. Prognostic factors in osteosarcoma: a critical review. J Clin Oncol. 1994;12(2):423–31.
33. Hey J. What's so hot about recombination hotspots? PLoS Biol. 2004;2(6):e190.
34. Haplotype. https://en.wikipedia.org/wiki/Haplotype. Accessed 26 June 2017.
35. Tan Q, Christiansen L, Bathum L, Zhao JH, Yashin AI, Vaupel JW, Christensen K, Kruse TA. Estimating haplotype relative risks on human survival in population-based association studies. Hum Hered. 2005;59(2):88–97.
36. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559–75.
37. Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4:7.
38. Caret. https://cran.r-project.org/web/packages/caret/caret.pdf. Accessed 11 Aug 2017.
39. Benjamini Y, Hochberg Y. Controlling the false discovery rate - a practical and powerful approach to multiple testing. J Roy Stat Soc B Met. 1995;57(1):289–300.
40. Cingolani P, Platts A, Wang le L, Coon M, Nguyen T, Wang L, Land SJ, Lu X, Ruden DM. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin). 2012;6(2):80–92.
41. Survival: Survival Analysis. https://cran.r-project.org/package=survival. Accessed 23 June 2017.
42. Goel MK, Khanna P, Kishore J. Understanding survival analysis: Kaplan-Meier estimate. Int J Ayurveda Res. 2010;1(4):274–8.
43. Fabregat A, Sidiropoulos K, Garapati P, Gillespie M, Hausmann K, Haw R, Jassal B, Jupe S, Korninger F, McKay S, et al. The Reactome pathway knowledgebase. Nucleic Acids Res. 2016;44(D1):D481–7.
44. Understanding Cancer Genomics. http://www.ubooks.pub/Books/ON/B0/E10R1010/TOC.html. Accessed 29 Nov 2018.
45. Chaudhry AS, Thirumaran RK, Yasuda K, Yang X, Fan Y, Strom SC, Schuetz EG. Genetic variation in aldo-keto reductase 1D1 (AKR1D1) affects the expression and activity of multiple cytochrome P450s. Drug Metab Dispos. 2013;41(8):1538–47.
46. Cell adhesion. https://en.wikipedia.org/wiki/Cell_adhesion. Accessed 26 June 2017.
47. Cheng SL, Lecanda F, Davidson MK, Warlow PM, Zhang SF, Zhang L, Suzuki S, St John T, Civitelli R. Human osteoblasts express a repertoire of cadherins, which are critical for BMP-2-induced osteogenic differentiation. J Bone Miner Res. 1998;13(4):633–44.
48. Perbal B, Zuntini M, Zambelli D, Serra M, Sciandra M, Cantiani L, Lucarelli E, Picci P, Scotlandi K. Prognostic value of CCN3 in osteosarcoma. Clin Cancer Res. 2008;14(3):701–9.
49. Chae B, Yang KM, Kim TI, Kim WH. Adherens junction-dependent PI3K/Akt activation induces resistance to genotoxin-induced cell death in differentiated intestinal epithelial cells. Biochem Biophys Res Commun. 2009;378(4):738–43.
50. Luizon MR, Ahituv N. Uncovering drug-responsive regulatory elements. Pharmacogenomics. 2015;16(16):1829–41.
51. Piwi-interacting RNA (piRNA). https://en.wikipedia.org/wiki/Piwi-interacting_RNA. Accessed 12 June 2017.
52. PIR61811. http://www.genecards.org/cgi-bin/carddisp.pl?gene=PIR61811. Accessed 12 June 2017.
53. Ng KW, Anderson C, Marshall EA, Minatel BC, Enfield KS, Saprunoff HL, Lam WL, Martinez VD. Piwi-interacting RNAs in cancer: emerging functions and clinical utility. Mol Cancer. 2016;15:5.
54. Ortogero N, Schuster AS, Oliver DK, Riordan CR, Hong AS, Hennig GW, Luong D, Bao J, Bhetwal BP, Ro S, et al. A novel class of somatic small RNAs similar to germ cell pachytene PIWI-interacting small RNAs. J Biol Chem. 2014;289(47):32824–34.
55. Mei Y, Wang Y, Kumari P, Shetty AC, Clark D, Gable T, MacKerell AD, Ma MZ, Weber DJ, Yang AJ, et al. A piRNA-like small RNA interacts with and modulates p-ERM proteins in human somatic cells. Nat Commun. 2015;6:7316.
56. Wang Y, Gable T, Ma MZ, Clark D, Zhao J, Zhang Y, Liu W, Mao L, Mei Y. A piRNA-like small RNA induces Chemoresistance to cisplatin-based therapy by inhibiting apoptosis in lung squamous cell carcinoma. Mol Ther Nucleic Acids. 2017;6:269–78.
57. DGCR6. http://www.genecards.org/cgi-bin/carddisp.pl?gene=DGCR6. Accessed 30 June 2017.
58. DIGEORGE SYNDROME CRITICAL REGION GENE 6; DGCR6. https://www.omim.org/entry/601279. Accessed 30 June 2017.
59. GenBank: DQ786190.1. https://www.ncbi.nlm.nih.gov/nuccore/DQ786190. Accessed 30 June 2017.
60. Savinkova LK, Ponomarenko MP, Ponomarenko PM, Drachkova IA, Lysova MV, Arshinova TV, Kolchanov NA. TATA box polymorphisms in human gene

promoters and associated hereditary pathologies. Biochemistry (Mosc). 2009;74(2):117–29.
61. Fenger JM, Roberts RD, Iwenofu OH, Bear MD, Zhang X, Couto JI, Modiano JF, Kisseberth WC, London CA. MiR-9 is overexpressed in spontaneous canine osteosarcoma and promotes a metastatic phenotype including invasion and migration in osteoblasts and osteosarcoma cell lines. BMC Cancer. 2016;16(1):784.
62. Sarkar FH, Li Y, Wang Z, Kong D, Ali S. Implication of microRNAs in drug resistance for designing novel cancer therapy. Drug Resist Updat. 2010;13(3):57–66.
63. SLC13A2 Gene. http://www.genecards.org/cgi-bin/carddisp.pl?gene=SLC13A2. Accessed 9 June 2017.
64. Fenger JM. Investigating the biological and molecular consequences of MiR-9 dysregulation in canine mast cell tumors and osteosarcoma. The Ohio State University; 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1429761923.
65. Sowers R, Wenzel BD, Richardson C, Meyers PA, Healey JH, Levy AS, Gorlick R. Impairment of methotrexate transport is common in osteosarcoma tumor samples. Sarcoma. 2011;2011:834170.
66. Pletscher-Frankild S, Palleja A, Tsafou K, Binder JX, Jensen LJ. DISEASES: text mining and data integration of disease-gene associations. Methods. 2015;74:83–9.
67. Park JA, Shin HY. Influence of genetic polymorphisms in the folate pathway on toxicity after high-dose methotrexate treatment in pediatric osteosarcoma. Blood Res. 2016;51(1):50–7.
68. Park JA, Shin HY. ATIC gene polymorphism and histologic response to chemotherapy in pediatric osteosarcoma. J Pediatr Hematol Oncol. 2017;39(5):e270–4.
69. Lin L, Yee SW, Kim RB, Giacomini KM. SLC transporters as therapeutic targets: emerging opportunities. Nat Rev Drug Discov. 2015;14(8):543–60.
70. Li Q, Shu Y. Role of solute carriers in response to anticancer drugs. Mol Cell Ther. 2014;2:15.
71. PIK3R1. http://www.genecards.org/cgi-bin/carddisp.pl?gene=PIK3R1. Accessed 9 June 2017.
72. Weigelt B, Downward J. Genomic determinants of PI3K pathway inhibitor response in Cancer. Front Oncol. 2012;2:109.
73. Chang MT, Asthana S, Gao SP, Lee BH, Chapman JS, Kandoth C, Gao J, Socci ND, Solit DB, Olshen AB, et al. Identifying recurrent mutations in cancer reveals widespread lineage diversity and mutational specificity. Nat Biotechnol. 2016;34(2):155–63.
74. Zhao J, Cheng L. Long non-coding RNA CCAT1/miR-148a axis promotes osteosarcoma proliferation and migration through regulating PIK3IP1. Acta Biochim Biophys Sin Shanghai. 2017;49(6):503–12.
75. Lee TI, Young RA. Transcriptional regulation and its misregulation in disease. Cell. 2013;152(6):1237–51.
76. Fasching PA, Heusinger K, Haeberle L, Niklos M, Hein A, Bayer CM, Rauh C, Schulz-Wendtland R, Bani MR, Schrauder M, et al. Ki67, chemotherapy response, and prognosis in breast cancer patients receiving neoadjuvant treatment. BMC Cancer. 2011;11:486.
77. Kim KI, Lee KH, Kim TR, Chun YS, Lee TH, Park HK. Ki-67 as a predictor of response to neoadjuvant chemotherapy in breast cancer patients. J Breast Cancer. 2014;17(1):40–6.
78. Jabeen S, Holmboe L, Alnaes GI, Andersen AM, Hall KS, Kristensen VN. Impact of genetic variants of RFC1, DHFR and MTHFR in osteosarcoma patients treated with high-dose methotrexate. Pharmacogenomics J. 2015;15(5):385–90.
79. Passetti F, Ferreira CG, Costa FF. The impact of microRNAs and alternative splicing in pharmacogenomics. Pharmacogenomics J. 2009;9(1):1–13.
80. Chhibber A, French CE, Yee SW, Gamazon ER, Theusch E, Qin X, Webb A, Papp AC, Wang A, Simmons CQ, et al. Transcriptomic variation of pharmacogenes in multiple human tissues and lymphoblastoid cell lines. Pharmacogenomics J. 2017;17(2):137–45.
81. Moitra K, Scally M, McGee K, Lancaster G, Gold B, Dean M. Molecular evolutionary analysis of ABCB5: the ancestral gene is a full transporter with potentially deleterious single nucleotide polymorphisms. PLoS One. 2011;6(1):e16318.
82. Shihab HA, Gough J, Mort M, Cooper DN, Day IN, Gaunt TR. Ranking non-synonymous single nucleotide polymorphisms based on disease concepts. Hum Genomics. 2014;8:11.
83. Pinto N, Dolan ME. Clinically relevant genetic variations in drug metabolizing enzymes. Curr Drug Metab. 2011;12(5):487–97.
84. Sim SC, Kacevska M, Ingelman-Sundberg M. Pharmacogenomics of drug-metabolizing enzymes: a recent update on clinical implications and endogenous effects. Pharmacogenomics J. 2013;13(1):1–11.
85. Jacob AG, Smith CWJ. Intron retention as a component of regulated gene expression programs. Hum Genet. 2017;136(9):1043–57.
86. Dvinge H, Bradley RK. Widespread intron retention diversifies most cancer transcriptomes. Genome Med. 2015;7(1):45.
87. Sailaja K, Rao VR, Yadav S, Reddy RR, Surekha D, Rao DN, Raghunadharao D, Vishnupriya S. Intronic SNPs of TP53 gene in chronic myeloid leukemia: impact on drug response. J Nat Sci Biol Med. 2012;3(2):182–5.
88. Wang D, Guo Y, Wrighton SA, Cooke GE, Sadee W. Intronic polymorphism in CYP3A4 affects hepatic expression and response to statin drugs. Pharmacogenomics J. 2011;11(4):274–86.
89. Elens L, Becker ML, Haufroid V, Hofman A, Visser LE, Uitterlinden AG, Stricker B, van Schaik RH. Novel CYP3A4 intron 6 single nucleotide polymorphism is associated with simvastatin-mediated cholesterol reduction in the Rotterdam study. Pharmacogenet Genomics. 2011;21(12):861–6.
90. Elens L, Bouamar R, Hesselink DA, Haufroid V, van der Heiden IP, van Gelder T, van Schaik RH. A new functional CYP3A4 intron 6 polymorphism significantly affects tacrolimus pharmacokinetics in kidney transplant recipients. Clin Chem. 2011;57(11):1574–83.
91. Elens L, Bouamar R, Hesselink DA, Haufroid V, van Gelder T, van Schaik RH. The new CYP3A4 intron 6 C>T polymorphism (CYP3A4*22) is associated with an increased risk of delayed graft function and worse renal function in cyclosporine-treated kidney transplant patients. Pharmacogenet Genomics. 2012;22(5):373–80.
92. Matsui M, Corey DR. Non-coding RNAs as drug targets. Nat Rev Drug Discov. 2017;16(3):167–79.
93. Sur I, Taipale J. The role of enhancers in cancer. Nat Rev Cancer. 2016;16(8):483–93.
94. Chen T, Wang Z, Zhou W, Chong Z, Meric-Bernstam F, Mills GB, Chen K. Hotspot mutations delineating diverse mutational signatures and biological utilities across cancer types. BMC Genomics. 2016;17 Suppl 2:394.