Combining Whole-Exome and RNA-Seq Data Improves the Quality of PDX Mutation Profiles

Manuel Landesfeind 1, Bruno Zeitouni 1, Anne-Lise Peille 1, Vincent Vuaroqueaux 1
1 Oncotest - Charles River, Am Flughafen 12-14, 79108 Freiburg, Germany

1 Introduction

Patient-derived xenograft tumor models (PDX) are an important tool for anti-cancer agent testing. Oncotest has developed a unique collection of models covering all major lineages (Fig. 1). In the context of targeted therapy development, an accurate and comprehensive molecular characterization is crucial for model selection and for biomarker identification. We recently analyzed our PDX collection by whole-exome sequencing (WES) and RNA-Seq for the identification of mutations at both the genomic and transcriptomic levels.

Here we investigated whether our WES analysis correctly detected mutations previously identified by Sanger sequencing. Afterwards, we compared mutation analyses carried out at the genomic and transcriptomic levels and investigated the observed similarities and discrepancies.

Figure 1: Distribution of cancer types in the available PDX collection by lineage (n=339). Data from Oncotest Compendium v1.6. Lineages: Lung (74), Others (64), Colon (59), Pancreas (45), Stomach (33), Kidney (25), Melanoma (22), Breast (17).

Figure 2: Proportion of PDX models characterized by different methods. Data from Oncotest Compendium v1.6. Y-axis: profiled models (%). Lineages: Colon, Lung, Breast, Kidney, Pancreas, Melanoma, Stomach, Others, Total. Methods: Sequenom / Sanger, WES, RNA-Seq.

2 Mutation profiling technologies and data analysis

Materials:
• Mutation screening using Sequenom Oncocarta panels 1-3 targeting mutations in 39 genes; validation of mutations by Sanger sequencing in 25 genes in 284 models
• WES data for 300 models: exome enrichment using Agilent SureSelect All Human Exon kits (v1, v4, or v5) and sequencing with an expected average coverage of 100X
• 92 models analyzed with RNA-Seq at a minimum of 50M paired-end reads

Methods:
The PDX-specific bioinformatics pipelines (Fig. 3) comprise mapping of paired-end reads to the human and mouse reference genomes using BWA for WES data and TopHat for RNA-Seq data. Reads that map better to mouse are removed. Human-specific duplicate reads are discarded. After base quality score recalibration, WES variant detection is performed with GATK-Lite, Samtools, and Freebayes; for RNA-Seq data, only Samtools is used. All variants are annotated by SnpEff to predict their impact at the protein level. Variants are filtered based on the variant coverage statistics and the predicted effect on the protein, and by discarding known polymorphisms from population-centric annotation databases such as dbSNP, HapMap, or 1000 Genomes (Table 1).

Figure 3: PDX-specific mutation analysis pipeline for WES and RNA-Seq data. Steps: sequencing data (FASTQ), QC, read mapping on human hg19 genome and on mouse mm10 genome, removal of mouse reads, human-specific mapping data (BAM), QC, variant calling, variant annotation, variant filtering, mutations: SNPs and INDELs (VCF).

Table 1: Filtering thresholds for NGS mutation analysis.

Filtering parameter                                                   Threshold value
Alternative allele coverage                                           min. 3 reads
Alternative allele frequency                                          min. 5%
Functional impact                                                     “moderate” or “high”
Found in resequencing project                                         max. 2 times
Number of tools reporting variant (WES only)                          all 3
Homopolymer run length (RNA-Seq only)                                 max. 5 nucleotides
Median length of mapped segments showing alternative (RNA-Seq only)   min. 10 bases

3 Evaluation of WES mutation detection pipeline

• As reference data, we investigated 25 genes in 284 models containing 500 mutations assessed by Sanger sequencing
• Our WES analysis retrieved > 95% of these mutations (Fig. 4)
• The 4.4% of mutations missed by WES are mainly due to filtering or to not being targeted by the enrichment kit (Fig. 5)
• WES analysis reported one additional mutation that was not confirmed by Sanger sequencing
• 509 additional mutations were reported by WES at positions not analyzed by Sanger sequencing
• Regarding the genomic regions analyzed by both WES and Sanger, a high overall detection quality is achieved by our pipeline (Tab. 2)

Figure 4: Venn diagram of 500 mutations from Sanger sequencing and 988 mutations from the WES pipeline. Sanger only: 22; shared: 478; WES only: 510.

Figure 5: Pie chart showing the distribution of reasons for not retrieving mutations in WES. Filtered (8); STK11 not targeted by Agilent Kit v1 38mb (6); KRAS mutations not found in pancreatic models with high mouse stroma content (5); no coverage at exon (3).

Table 2: Quality metrics for WES mutation detection.

Sensitivity   Precision
95.6%         99.8%

4 Comparison of mutation profiles between WES and RNA-Seq data

• Good correlation between mutation rates (number of mutations per Mb) detected by WES and RNA-Seq analysis (Spearman correlation coefficient = 0.79).
• The mutation rate detected in WES is usually higher than that detected by RNA-Seq (Fig. 6).
• The proportion of mutation types does not differ between WES and RNA-Seq (Fig. 7).
• On average, less than 14% of all mutations are detected by both WES and RNA-Seq analyses (Fig. 8).
• Main reasons for not retrieving a mutation are:
    • Non-expression of gene in RNA-Seq
    • Mono-allelic expression of wild-type in RNA-Seq
    • Low coverage in WES
• The proportion of genes with a mutation detected at both levels reached 54.6% for cancer-related genes (Fig. 9). Discrepancies between WES and RNA-Seq for these cancer-related genes are due to similar causes as for the overall observation (Fig. 10).

Figure 6: Mutation rate detected in WES and RNA-Seq data across analyzed cancer types. The mutation rate is higher in WES than in RNA-Seq for 71 models. The mutation rate is calculated from mutations found in target regions of WES and in expressed transcripts of RNA-Seq respectively. Top-right figure displays linear regression analysis. Y-axis: mutation rate (per Mb). Cancer types: Breast (10), Colon (47), SCLC (5), NSCLC (40).

Figure 7: Boxplot for proportions of mutation types reported by WES and RNA-Seq analyses (n=92 per mutation type and data type). Y-axis limits differ between plots due to the high proportion of missense mutations. Y-axis: proportion of mutations (%). Mutation types: missense, codon deletion, codon insertion, frame shift, splice site, nonsense, start lost, stop lost.

Figure 8: Comparison of overlap of mutations detected in WES and RNA-Seq analyses. Percentages are averaged for 92 single model analyses. For each category, the first percentage (bold font in the figure) specifies the fraction of all exonic mutations while the second percentage specifies the proportion with respect to the category above.

• Common (13.8%)
• WES specific (58.9%)
    • Covered in RNA-Seq (17.6% / 29.9%)
        • Mono-allelic expression of wild-type (15.8% / 89.8%)
        • Filtered in RNA-Seq (1.3% / 7.4%)
        • Low coverage but gene expressed (0.5% / 2.8%)
    • Low coverage in RNA-Seq (41.3% / 70.1%)
        • Gene not expressed (33.7% / 81.5%)
        • Gene expressed (7.6% / 18.5%)
• RNA-Seq specific (27.2%)
    • Covered in WES (3.3% / 12.2%)
        • Filtered in WES (2.8% / 84.6%)
        • Wild-type allele only (0.5% / 15.4%)
    • Low coverage in WES (23.9% / 87.7%)

Figure 9: Mutation landscape of lineage-specific cancer-related genes for all analyzed models. Color indicates the analysis method by which the mutated gene was reported (WES specific, RNA-Seq specific, common). For 201 genes, mutations are found on both transcriptome and genome. 141 genes are mutated on genome level only and 27 genes on transcriptome level only. Non-expressed genes are marked by “x”. Panels: Colon, Breast, NSCLC, SCLC.

Figure 10: Pie chart showing the distribution of reasons for not retrieving mutations in cancer-related genes using RNA-Seq data. Data from the WES-specific mutations shown in Fig. 9. Mono-allelic expression of wild-type (58); gene not expressed (41); low coverage but gene expressed (33); filtered (9).

Conclusions

• Analysis of cancer-related genes using our WES pipeline accurately identified mutations that were previously detected by Sanger sequencing.
• WES analysis investigated larger genomic ranges and thereby generated more comprehensive mutation profiles.
• Only a small proportion of genomic mutations are retrieved at the transcriptomic level, mainly due to non-expression of the gene or expression of the wild-type allele.
• RNA-Seq analysis detected mutations in genomic regions that are poorly enriched in WES.
• Our results demonstrate the importance of transcript mutation analysis, which is closer to the protein level and enhances understanding of cancer biology.
• We believe that the combined molecular characterization profiles from WES and RNA-Seq data increase the quality of our PDX collection and aid the selection of models for cancer therapy studies as well as for biomarker investigations.
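
The sensitivity and precision reported in Table 2 can be checked against the counts given for the Sanger comparison (500 reference mutations, 478 retrieved by WES, one WES call not confirmed by Sanger). The poster does not state the formulas it used, so the sketch below assumes the standard definitions and assumes that the 509 WES calls at positions not analyzed by Sanger are left out because they cannot be judged either way. The function and variable names are illustrative only.

```python
# Worked check of Table 2, assuming standard definitions:
#   sensitivity = TP / (TP + FN),  precision = TP / (TP + FP)
# Counts are taken from the poster text; names are hypothetical.

SANGER_TOTAL = 500          # reference mutations assessed by Sanger sequencing
RETRIEVED_BY_WES = 478      # Sanger mutations also reported by WES
UNCONFIRMED_WES_CALLS = 1   # WES call at a Sanger-analyzed position, not confirmed


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of reference mutations that the pipeline retrieved."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Fraction of judged pipeline calls that the reference confirmed."""
    return tp / (tp + fp)


tp = RETRIEVED_BY_WES
fn = SANGER_TOTAL - RETRIEVED_BY_WES   # mutations missed by WES
fp = UNCONFIRMED_WES_CALLS             # assumption: only Sanger-analyzed positions count

print(f"missed:      {fn} ({fn / SANGER_TOTAL:.1%})")
print(f"sensitivity: {sensitivity(tp, fn):.1%}")
print(f"precision:   {precision(tp, fp):.1%}")
```

Under these assumptions the 22 missed mutations correspond to the 4.4% quoted in the text, and the two ratios round to the 95.6% and 99.8% shown in Table 2.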
