Robin Paul, Jeffrey P. Solzak, Milan Radovich
Indiana University School of Medicine, Indianapolis, Indiana
Whole genome sequencing analysis of 77 breast cancer patients reveal increased germline transposable element insertions responsible for early onset of cancer

Abstract

More than 50% of the human genome consists of transposable elements (TEs), which mostly remain inactive and relatively stable in healthy individuals. However, some retrotransposons are believed to become active during cancer due to loss of epigenetic silencing and DNA damage repair mechanisms. In this study, we analyzed somatic and germline TE insertions in 77 TCGA breast cancer (BC) cases and performed association analysis with the corresponding somatic and germline variants. We identified a subset of patients who harbored a high number of germline TE insertions. Statistical burden testing of germline TE insertions against germline genetic variants (SNPs and indels) revealed an enrichment of deleterious variants in the FANCF, MLH1, LGR6 and RING1 genes. These genes are known to regulate TEs using RNA silencing and other DNA damage repair mechanisms. Further, patients with high germline TE insertions had a lower median age of diagnosis of breast cancer (Welch's two-sample t-test p-value = 0.019; median age of high TE patients = 47.15 yrs, median age of low TE patients = 59.77 yrs), suggesting a potential novel role of TEs in cancer predisposition. Lastly, comparison of the somatic TE insertion pattern against somatic tumor mutation burden revealed a mutually exclusive relationship, indicating that analysis of TE insertions is critical to breast cancer diagnosis in the clinic.

Bioinformatics Methodology

Fig. 1 - Bioinformatics methodology for determination of TE inserts from whole genome sequencing (WGS) data.

Wet-lab validation in HCC1143 cell line

Fig. 2 - (a) View of PBRM1 gene locus harboring TE insertion. (b) Gel electrophoresis of PCR amplified PBRM1 gene locus. (c) Gel electrophoresis of PCR amplified TMEM181 gene locus. (d) Sanger sequencing of PCR amplified TMEM181 gene locus. Gel labels: Tumor line, Blood line; size labels at 200bp, 300bp, 400bp and 500bp.

TE insertion analysis on 77 TCGA breast cancer patients

Moderate germline and somatic TE correlation.
L1 is the most active type of TE among somatic TE insertions.
No enrichment of germline TEs between various hormonal status.
No enrichment of somatic TEs between various hormonal status.

Fig. 3 - (a) Scatter plot of whole genome somatic TEs vs whole genome germline TEs (Pearson correlation coefficient, R2 = 0.61, p-value = 4.84 × 10^-9). (b) Scatter plot of ratio of (somatic L1)/(total somatic TE) vs total somatic TE. (c) Somatic and (d) germline TE distribution in whole genomes of 77 TCGA breast cancer patients subdivided by histological type (TNBC, HER2+ and ER/PR+).

Comparison of somatic TE insertions trend against tumor mutation burden and microsatellite instability

Somatic TEs and TMB are mutually exclusive.
Somatic TEs and MSI are mutually exclusive.

Fig. 4 - (a) Scatterplot of number of whole genome somatic TE insertions vs total mutation burden (from COSMIC database). (b) Scatterplot of number of whole genome somatic TE insertions vs total microsatellite instability.

Comparison of germline TE insertions against age of diagnosis

High germline TE patients tend to have a lower age of diagnosis (high TE: 47 yrs, low TE: 59.77 yrs; P = 0.019).

Fig. 5 - Box plot of age of diagnosis for the top 25% and bottom 75% of patients by germline TE count (top quartile vs bottom 75%, p-value = 0.019).

Germline variant analysis

Fig. 6 - Bioinformatics pipeline for variant analysis of TCGA breast cancer patients.

Results of gene burden analysis

Table 1 - List of (a) germline and (b) somatic mutations in genes (and their corresponding biological pathway) responsible for silencing TEs.

(a) Germline variant analysis vs germline TEs gene-burden analysis
    LGR6, FANCF: known TE repressor
    MLH1, MSH6: DNA repair
    TNRC6B: epigenetic silencing

(b) Germline + somatic variant analysis vs germline + somatic TEs gene-burden analysis
    MLH1, MSH6: DNA repair
    TNRC6B, RING1: epigenetic regulation
    CCNK: transcriptional regulation
    LGR6, FANCF, ZNF10: known TE repressor

Discussion

We explore TE insertion as another type of mutation which may be responsible for cancer.
We designed a bioinformatics pipeline (Fig. 1) for determination of TE insertions from WGS data and validated it on the HCC1143 cell line (Fig. 2).
Patients with high germline TE insertions also tend to have high somatic TE insertions (Fig. 3a).
L1 and ALU are the most active types of germline TE, whereas among somatic TEs L1 seems to be the only active type (Fig. 3b).
TE insertions are not specific to any hormonal status (Fig. 3c and 3d).
Comparison of somatic TEs against tumor mutation burden (TMB) (Fig. 4a) and microsatellite instability (MSI) (Fig. 4b) reveals a mutually exclusive relationship, suggesting that analysis of TE insertions is necessary to get the complete mutation profile of a patient.
Comparing germline TEs against age of diagnosis shows that the top quartile of patients by germline TE count has a lower age of diagnosis than the other patients, suggesting TEs could be responsible for cancer predisposition (Fig. 5).
Variant analysis (Fig. 6) revealed several deleterious germline and somatic mutations in genes known to be responsible for silencing TEs (Table 1).
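
The two statistics named on the poster, Welch's two-sample t-test on age of diagnosis (top quartile of germline TE count vs the rest) and the Pearson correlation between germline and somatic TE counts, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the poster does not say which software was used, the function names are hypothetical, the quartile cutoff method is a guess, and the patient values below are invented toy numbers, not TCGA data. The sketch stops at the t statistic and the Welch-Satterthwaite degrees of freedom; turning those into a p-value needs a t-distribution CDF, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
import statistics


def split_by_top_quartile(te_counts, ages):
    """Split ages into (high, low) groups at the 75th percentile of TE counts.

    Assumption: "top quartile" means TE count strictly above the third
    quartile cut point; the poster does not specify the exact rule.
    """
    cutoff = statistics.quantiles(te_counts, n=4)[2]  # third quartile cut point
    high = [age for te, age in zip(te_counts, ages) if te > cutoff]
    low = [age for te, age in zip(te_counts, ages) if te <= cutoff]
    return high, low


def welch_t(a, b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error, group a
    vb = statistics.variance(b) / len(b)  # squared standard error, group b
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def pearson_r(x, y):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Invented toy data for eight hypothetical patients (NOT from the study).
germline_te = [10, 12, 15, 18, 20, 25, 40, 45]
somatic_te = [3, 2, 5, 4, 6, 7, 11, 12]
age_at_diagnosis = [62, 58, 61, 55, 60, 57, 45, 48]

high_ages, low_ages = split_by_top_quartile(germline_te, age_at_diagnosis)
t_stat, dof = welch_t(high_ages, low_ages)
r = pearson_r(germline_te, somatic_te)

print(f"Welch t = {t_stat:.3f}, df = {dof:.2f}")
print(f"Pearson r = {r:.3f}")
```

Welch's variant is used rather than Student's t-test because the two groups differ in size (roughly 25% vs 75% of patients) and need not share a variance. A negative t statistic here means the high-TE group has the lower mean age.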