Published OnlineFirst May 26, 2016; DOI: 10.1158/0008-5472.CAN-16-0058
Cancer Therapeutics, Targets, and Chemical Biology Research
Diverse, Biologically Relevant, and Targetable Gene Rearrangements in Triple-Negative Breast Cancer and Other Malignancies Timothy M. Shaver1,2, Brian D. Lehmann1,2, J. Scott Beeler1,2, Chung-I Li3, Zhu Li1,2, Hailing Jin1,2, Thomas P. Stricker4, Yu Shyr5,6, and Jennifer A. Pietenpol1,2
Abstract
Triple-negative breast cancer (TNBC) and other molecularly discovered a clinical occurrence and cell line model of the target- heterogeneous malignancies present a significant clinical chal- able FGFR3–TACC3 fusion in TNBC. Expanding our analysis to lenge due to a lack of high-frequency "driver" alterations amena- other malignancies, we identified a diverse array of novel and ble to therapeutic intervention. These cancers often exhibit geno- known hybrid transcripts, including rearrangements between mic instability, resulting in chromosomal rearrangements that noncoding regions and clinically relevant genes such as ALK, affect the structure and expression of protein-coding genes. How- CSF1R, and CD274/PD-L1. The over 1,000 genetic alterations ever, identification of these rearrangements remains technically we identified highlight the importance of considering noncod- challenging. Using a newly developed approach that quantita- ing gene rearrangement partners, and the targetable gene tively predicts gene rearrangements in tumor-derived genetic fusions identified in TNBC demonstrate the need to advance material, we identified and characterized a novel oncogenic fusion gene fusion detection for molecularly heterogeneous cancers. involving the MER proto-oncogene tyrosine kinase (MERTK) and Cancer Res; 76(16); 4850–60. 2016 AACR.
Introduction quency and widely varying clonality of therapeutically action- able alterations across subtypes, including mutations in Despite advances in precision medicine, the treatment of PIK3CA and BRAF,amplification of EGFR,lossofPTEN,and many molecularly heterogeneous cancers remains challenging expression of androgen receptor (AR;refs.3–5). However, a due to a lack of recurrent alterations amenable to therapeutic significant proportion of TNBC cases lack somatic alterations of intervention. To make significant clinical advances against any established "driver" gene, highlighting the importance of these recalcitrant cancers, integrative genomic and molecular continued genomic analysis and discovery integrated with analyses are required to understand their complexity and to molecular profiling of the tumor microenvironment (3). identify targetable features arising from lower-frequency genet- Adefining feature of TNBC and other clinically challenging, ic events. Our laboratory has identified six molecular subtypes molecularly heterogeneous cancers such as ovarian carcinoma of one such heterogeneous disease, triple-negative breast cancer and lung squamous cell carcinoma is copy number alteration (TNBC), with each subtype displaying unique ontologies and (CNA), which is frequently accompanied by mutation or inac- differential response to standard-of-care chemotherapy (1, 2). tivation of the p53 tumor suppressor (4, 6–8). The genomic Ongoing genomic analysis of TNBC has identified a low fre- instability that gives rise to CNA can also result in chromo- somal rearrangements and gene fusions, which have long been 1Department of Biochemistry,Vanderbilt University, Nashville,Tennes- recognized as oncogenic drivers and effective drug targets in a see. 2Vanderbilt-Ingram Cancer Center,Vanderbilt University Medical variety of hematological and solid malignancies (9). In recent Center, Nashville, Tennessee. 3Department of Statistics, National 4 years, the advent of next-generation sequencing (NGS) has Cheng Kung University, Tainan, Taiwan. Department of Pathology, fi Microbiology, and Immunology,Vanderbilt University Medical Center, enabled the identi cation of a number of gene fusion events Nashville, Tennessee. 5Center for Quantitative Sciences, Vanderbilt in epithelial cancers, with varying frequency and therapeutic 6 University Medical Center, Nashville, Tennessee. Department of Bio- relevance (10–14). statistics, Vanderbilt University Medical Center, Nashville, Tennessee. In order to broaden our understanding of the somatic altera- Note: Supplementary data for this article are available at Cancer Research tions underlying TNBC, we sought to identify known and novel Online (http://cancerres.aacrjournals.org/). gene rearrangements affecting the structure and/or expression T.M. Shaver and B.D. Lehmann contributed equally to this article. level of protein-coding transcripts. While a number of compu- Corresponding Authors: Jennifer A. Pietenpol, Vanderbilt-Ingram Cancer tational approaches have been developed to identify hybrid Center, 652 Preston Research Building, 2200 Pierce Avenue, Nashville, RNA and DNA sequences using NGS data, numerous technical TN 37232-6838. Phone: 615-936-1512; Fax: 615-936-1790; E-mail: hurdles complicate accurate and efficient gene rearrangement [email protected]; and Brian D. Lehmann, detection (15). For instance, widespread regions of sequence [email protected] homology and current limitations in read length result in doi: 10.1158/0008-5472.CAN-16-0058 sometimes-ambiguous alignment and a large number of false 2016 American Association for Cancer Research. positives. To combat this, many detection methodologies use
4850 Cancer Res; 76(16) August 15, 2016
Downloaded from cancerres.aacrjournals.org on September 29, 2021. © 2016 American Association for Cancer Research. Published OnlineFirst May 26, 2016; DOI: 10.1158/0008-5472.CAN-16-0058
Diverse and Targetable Gene Rearrangements in Cancer
filters designed to enrich for biologically relevant gene rearran- Ba/F3 IL3 withdrawal gements, such as restricting both potential fusion partners to Stably transfected Ba/F3 cells were seeded in 6-well plates protein-coding loci (16). While these filters enrich for currently (40,000 cells/well) in growth medium 1 ng/mL IL3. Live-cell known gene fusions, they are not effective for the discovery of counts were manually determined at three sequential 24-hour less canonical events, such as rearrangements between coding timepoints by visual inspection using a hemocytometer and and noncoding regions of the genome. We developed a new Trypan blue (Bio-Rad). algorithm, Segmental Transcript Analysis (STA), which uses exon-level expression estimates to rank a population of sam- Cloning and generation of stable cell lines ples based on their likelihood of harboring a rearrangement in The TMEM87B-MERTK expression construct, corresponding a given gene. We identified a number of known and novel to amino acids 1-55 of TMEM87B (NM_032824) and amino rearrangements involving functionally diverse gene partners in acids 433-1000 of MERTK (NM_006343), was synthesized in a TNBC, and expanded our analysis and discovery to a wider, pMK-RQ-Bb vector using GeneOptimizer and GeneArt (Life multicancer cohort from The Cancer Genome Atlas (TCGA). Technologies) and cloned into pBABE-puro (19) (Addgene) The DNA-validated rearrangements that we identified include by EcoRI/SalI restriction digest and ligation (New England noncoding portions of the genome acting as both 50 and 30 BioLabs). The complete pBABE-puro-TMEM87B-MERTK ex- partners in clinically relevant hybrid transcripts. pression construct sequence is available upon request. To generate stably transfected cell lines, pBABE-puro-TMEM87B- Materials and Methods MERTK or pBABE-puro empty vector retroviruses were pack- aged in Phoenix cells (Orbigen) and Ba/F3 and MCF10A Cell culture cells were transduced with virus for two 24-hour intervals with SUM185PE (Asterand, March 2010) and MCF10A cells (ATCC, 8 mg/mL polybrene (Sigma), then selected and maintained June 2012) were cultured as previously described (1). Ba/F3 cells with 1.5 mg/mL (Ba/F3) or 0.5 mg/mL (MCF10A) puromycin (provided by Dr. Christine Lovly, Vanderbilt University, Novem- (Sigma). ber 2014) were cultured in RPMI þ GlutaMAX (Gibco 61870) with 1 ng/mL IL3 (Life Technologies) and 5% (v/v) FBS (Gemini). All cell lines were maintained in 100 U/mL peni- Immunoblotting cillin and 100 mg/mL streptomycin (Gemini) and tested neg- SUM185PE cells were lysed 72 hours after siRNA transfection. ative for mycoplasma (Lonza). Cell Line Genetics performed Ba/F3 cells were grown in suspension and incubated for 90 positive short tandem repeat DNA fingerprinting analysis on minutes in the indicated media before lysis. MCF10A cells seeded SUM185PE (March 2011); cells were cultured for fewer than in 10-cm plates were incubated for 180 minutes in the indicated 2 months after identification. We also verified SUM185PE by growth media before in-well lysis. All cells were lysed in RIPA manual identification of unique variants in NGS data. DNA buffer supplemented with protease and phosphatase inhibitors. fingerprinting analysis was not performed on MCF10A or Ba/ Cell lysates [40 mg (SUM185PE) or 30 mg (Ba/F3 and MCF10A)] F3 cells, but the Ba/F3 cell line displayed the previously were separated on polyacrylamide gels and transferred to poly- published IL3 dependence phenotype (17). vinyl difluoride membranes (Millipore). Immunoblotting was performed using FGFR3 B-9 (1:200, Santa Cruz), GAPDH FGFR3/TACC3 siRNA transfection MAB374 (1:1000, Millipore), MERTK D21F11 (1:1000, Cell SUM185PE cells (8,000 cells/well) were reverse transfected Signaling Technology), phospho-Akt Ser473 D9E (1:2000, Cell in a 96-well format with RNAiMax (0.1 mL, Life Technologies) Signaling Technology), total Akt (1:1000, Cell Signaling 9272), and siRNAs (1.25 pmole) targeting the FGFR3 exon 4/5 junc- phospho-Erk1/2 Thr202/Tyr204 D.13.14.4E (1:2000, Cell Signal- tion (Dharmacon D-003133-06), FGFR3 exon 11 (D-003133- ing Technology), and total Erk1/2 3A7 (1:2000, Cell Signaling 05), TACC3 exon 13 (D-004155-03), TACC3 30 UTR (D- Technology). 004155- 04), TACC3 exon 4 (D-004155-01), or TACC3 exon 5 (D-004155-02; siRNA #1-6, respectively). AllStars Negative Segmental Transcript Analysis algorithm Control siRNA and AllStars Hs Cell Death Control siRNA The median-normalized, exon-level RPKM vector for sample (Qiagen) were included as experimental controls. Viability was i k RPKMg i k assessed at 72 hours by incubating cells with alamarBlue in gene is denoted as ik. The Fscore for sample in gene is fi (Invitrogen). For each experiment, the mean fluorescence value de ned as dGD (Ex/Em: 560/590 nm) of four wells per condition was normal- Fscore ¼ ik i ik G ized to the negative control. stdev d Di 8i ik
IC50 determination G g g where d is the geometric mean of dðRPKM ik; RPKM i kÞ, 8i „ i' SUM185PE cells were seeded in quadruplicate (8,000 cells/ ik ' and d(x, y) is a distance measurement (i.e., Euclidean distance) well) in 96-well plates. After overnight attachment, growth medi- between x and y. Di is the biggest difference of RPKM with lag 1 in um was replaced with medium (control) or medium containing G G sample i and stdev8id Di is the standard deviation of d Di for gene half-log serial dilutions of the FGFR inhibitor PD173074 (Sell- ik ik k eckchem). Viability was assessed at 72 hours by incubating cells . For ranking Fscore across genes, the normalized Fscore is fi with alamarBlue (Invitrogen). Half-maximal inhibitory con- de ned as Fscoreik min Fscoreik centration (IC50) values were determined after normalization to 8i;8k untreated wells and double-log transformation of dose response Gscoreik ¼ : max Fscoreik min Fscoreik curves as previously described (18). 8i;8k 8i;8k
www.aacrjournals.org Cancer Res; 76(16) August 15, 2016 4851
Downloaded from cancerres.aacrjournals.org on September 29, 2021. © 2016 American Association for Cancer Research. Published OnlineFirst May 26, 2016; DOI: 10.1158/0008-5472.CAN-16-0058
Shaver et al.
For ranking Gscore between samples, the Segmental Transcript Shah and colleagues data acquisition Analysis score is defined as RNA-seq data for 80 TNBC clinical specimens (3) were down- loaded from the European Genome-phenome Archive as 0 1 fi Gscoreik mean Gscoreik EGAS00001000132 on May 18, 2014. Aligned BAM les were STAscore ¼ @ 8k þ A; converted to fastq using bedtools v2.17.0 and were aligned and ik log2 2 stdev Gscoreik 8k processed as described in the Supplemental Methods.
where mean8kGscoreik is the mean of Gscore for all genes in sample Statistical analysis i and stdev8kGscoreik is the standard deviation of Gscore for all All analyses and graphical representations were performed genes in sample i. using R v3.2.0. Post-siRNA viability ratios were plotted as the P Corresponding R code is included as a supplementary file. mean SEM of four independent experiments, and values were Detailed descriptions of file sources, processing, and filtering are calculated using the Wilcoxon rank-sum test with Bonferroni available in the Supplementary Methods. multiple comparison adjustment. PD173074 IC50 values were plotted as the mean SEM of three independent experiments. Ba/F3 IL3 withdrawal data were plotted as the mean SD of Transcript and protein annotation two conditions with three independent experiments across three Gene locus and exon data were determined by GENCODE and days, and P values were calculated using the restricted maximum RefGene annotations as described in the Supplementary Methods. likelihood (REML)-based mixed effects model. For exon-level expression plots, the exon order is based on sequential order of exons in the RefGene annotation. The presence Supplementary information of isoforms may cause deviation from the normal exon number- Additional data (Supplementary Figs. S1–S9), their corre- ing scheme. For hybrid transcript frame calls, the RefGene anno- sponding legends, and expanded details of sequence file proces- tation was processed to generate starting and ending frame values sing are found as supplementary files. of 0, 1, 2 (CDS) or 1 (UTR) for each exon. Due to the potential for multiple reading frames at a given coordinate, a hybrid Results transcript between two exon boundaries featuring any concor- dance of starting and ending CDS frame was annotated as in- Gene rearrangement prediction by STA frame. A hybrid transcript between two exon boundaries with In order to prioritize downstream computation and minimize non-overlapping CDS frames was annotated as out-of-frame. An filters restricting the genomic location of putative rearrangement exon boundary with exclusively UTR status was annotated partners, we developed an algorithm known as Segmental Tran- accordingly. script Analysis (STA) that uses a distance matrix approach to The domains and protein features depicted in schematics were generate aberrant transcript scores for a population of samples obtained from UniProtKB (20). Features were exclusively selected (Materials and Methods). Based on exon-level expression values from annotations with "Reviewed" status, and all coding features across a gene, STA is used to assign a normalized score for each are to scale. sample by quantifying deviation from the population in both magnitude and directionality of expression. This approach allows the detection of different structural classes of rearrangements— SUM185PE RNA sequencing general up- or downregulation of a transcript might accompany Total RNA was isolated from SUM185PE cells using RNeasy promoter or untranslated region (UTR) swapping, for instance, (Qiagen). RNA quality was assessed by NanoDrop (Thermo) while abrupt gain or loss of expression from one exon to the next and Bioanalyzer (Agilent). Two micrograms of total RNA was could result from a breakpoint within the intervening DNA. While used for the TruSeq Stranded Total RNA Library Prep Kit past discovery approaches using microarray datasets have lever- (Illumina). Libraries were quantified by Qubit (Life Techno- aged exon-level expression comparison in individual transcripts logies) and qPCR, and library size and quality were assessed (21), the novel population-based comparison method used in by Bioanalyzer (Agilent). The constructed RNA-seq library STA effectively controls for confounding issues such as alternative was sequenced at the Vanderbilt Technologies for Advanced splicing and normalization artifacts that cause uneven exon-level Genomics (VANTAGE) core on an Illumina HiSeq 2500 using expression values across genes and facilitates detection of modest a paired-end 100-bp protocol. Reads were de-multiplexed but biologically relevant changes in transcript levels. and trimmed using SeqPrep. FASTQ files are deposited in To evaluate the utility of STA as a prediction tool for both the NCBI Read Sequence Archive under accession number known and novel gene rearrangements, we developed a vertically SRP077076. integrated NGS analysis pipeline (Fig. 1). Briefly, exon-level SUM185PE RNA-seq reads were aligned using the TCGA RNA- expression data were collated for a given tumor type and STA seq v2 pipeline (https://cghub.ucsc.edu/, July 31, 2013 revision) scores were calculated for each sample on a per-gene basis (Fig. 1A to ensure compatibility with TCGA data. Reference data and and B). RNA-seq data for each aberrant transcript passing a custom scripts for exon-level expression quantification were defined STA score threshold were assessed for evidence of rear- downloaded from the UNC database as referenced in the pro- rangement between the candidate gene and another genomic tocol. Alignment was conducted using the default TCGA work- region (Fig. 1C), followed by analysis and realignment of nearby flow and MapSplice v12_07, RSEM v1.1.13, and UBU v1.2. Cell whole-genome sequencing reads (WGS) to identify a DNA break- line RNA-seq data were grouped with the TCGA BRCA dataset for point with unique sequence spanning both rearrangement part- STA input and processed using the STA discovery pipeline ners (Fig. 1D and E; ref. 22). Samples displaying a discrete described in the Supplementary Methods. breakpoint upon realignment were classified as DNA-validated
4852 Cancer Res; 76(16) August 15, 2016 Cancer Research
Downloaded from cancerres.aacrjournals.org on September 29, 2021. © 2016 American Association for Cancer Research. Published OnlineFirst May 26, 2016; DOI: 10.1158/0008-5472.CAN-16-0058
Diverse and Targetable Gene Rearrangements in Cancer
A 500 • 400 300 Collate exon-level RNA-seq expression data 200 • for a population of samples • (RPKM) 100 • 0 Aberrant Transcript: • Expression ROS1 1 5 10 15 20 25 30 35 40 44 Exon B 5 4 Apply Segmental Transcript Analysis (STA) 3 algorithm and determine samples passing 2 STA score threshold for each gene 1 0
ROS1 STA Score STA ROS1 0 100 200 300 400 500 Aberrant Transcript: Sample order C
Identify breakpoint-spanning RNA-seq reads in STA-flagged gene (region A) and determine putative rearrangement partner (region B) ROS1 Exon 33 ROS1 Exon 34 Region A RNA: Region B RNA: D
Identify discordant DNA-seq read pairs mapping to regions A and B
Region A DNA: Region B DNA: ROS1 DNA E
Conduct high-accuracy realignment of nearby DNA-seq reads to sequences of regions A and B CD74 DNA ROS1 DNA Region A DNA: Region B DNA:
Figure 1. Quantitative prediction by STA facilitates an integrated fusion detection pipeline. A–E, stepwise description of STA discovery pipeline with accompanying schematics and example data. A, exon-level expression values for a population of samples plotted as continuous lines. Samples passing STA score threshold (B) are plotted in red and denoted by asterisks; samples below the threshold are plotted in black. B, ROS1 STA scores for each sample plotted in descending order. Samples with an STA score of 2 or above are plotted in red; samples with an STA score below 2 are plotted in black. C, schematic of RNA-seq reads. Dotted lines, continuous read segments. D, schematic of discordant WGS read pairs. Thin lines, read pairs. E, schematic of breakpoint-spanning WGS reads identified after realignment. In C–E, colors indicate alignment location of individual read segments, as depicted in the description at left. Schematics are not to scale.
rearrangements. While reliant upon the availability and adequate adenocarcinoma, although DNA validation was not available in sequencing depth of WGS data, this approach provides indepen- all cases due to incomplete availability of WGS data (Supplemen- dent structural evidence for the validity of any detected rearrange- tary Fig. S1C; ref. 11). Numerous aberrant transcripts were detected ments. As a consequence, rearrangements that might be omitted for the RET receptor tyrosine kinase, which undergoes rearrange- in an RNA sequencing-only approach due to concerns about ment in 10% to 20% of sporadic papillary thyroid cancers (Sup- homology and false positivity, such as hybrid transcripts between plementary Fig. S1D; ref. 26). When possible, we obtained DNA coding and noncoding regions of the genome, can be identified validation for RET fusions in these samples, including known with greater confidence. rearrangements with CCDC6, ERC1,andNCOA4 (Supplementary Using test RNA-seq and WGS data from TCGA, we confirmed Table S1). To evaluate the ability of STA to predict fusions previ- the ability of STA to identify known gene fusions across multiple ously identified in TCGA RNA-seq, we compared the STA score cancer types, including CD74–ROS1 in lung adenocarcinoma, distribution of fusion transcripts across all genes (Supplementary NFASC–NTRK1 in glioblastoma, and TMPRSS2–ETV4 in prostate Fig. S2A; ref. 27) and recurrently fused kinases (Supplementary adenocarcinoma (Fig. 1 and Supplementary Fig. S1A and S1B; Fig. S2B; ref. 28) to the distribution for all genes and samples refs. 23–25). The algorithm also reliably identified therapeutically evaluated. In both cases, STA scores were significantly higher in actionable anaplastic lymphoma kinase (ALK)fusionsinlung the rearrangement-harboring transcripts (P < 2.2 10 16).
www.aacrjournals.org Cancer Res; 76(16) August 15, 2016 4853
Downloaded from cancerres.aacrjournals.org on September 29, 2021. © 2016 American Association for Cancer Research. Published OnlineFirst May 26, 2016; DOI: 10.1158/0008-5472.CAN-16-0058
Shaver et al.
Due to the ability of STA to detect aberrant loss of expression in A novel oncogenic kinase fusion in TNBC addition to gain, we hypothesized that inactivation of tumor To discover known and novel gene rearrangements in clinical suppressors would constitute a substantial portion of STA-pre- TNBC cases, we used NGS data from TCGA and performed STA dicted rearrangements. Indeed, the algorithm displayed a robust prediction on 173 TNBC tumors. We identified and validated at ability to identify intragenic loss of expression of known tumor the DNA level a previously uncharacterized fusion involving the suppressors. In a lung squamous cell carcinoma sample, we MER proto-oncogene tyrosine kinase (MERTK; ref. 34). In this identified a rearrangement resulting in early truncation of the rearrangement, a nearby gene encoding the transmembrane pro- SWI/SNF subunit ARID1A (Supplementary Fig. S3A; ref. 29). We tein TMEM87B acts as 50 partner, breaking shortly after its signal additionally validated rearrangements with noncoding DNA in peptide and fusing with the late extracellular domain-coding bladder and endometrial carcinoma resulting in the loss of portion of MERTK (Fig. 2A). The resulting fusion transcript dis- functional domains of the PI3K-negative regulator PTEN and the plays increased expression and retains the full transmembrane p300 histone acetyltransferase, respectively (Supplementary Fig. and intracellular kinase domains of MERTK, which is overex- S3B and S3C; refs. 30, 31). In a kidney chromophobe tumor, we pressed or ectopically expressed in numerous cancers (Fig. 2B; identified a rearrangement between a noncoding portion of ref. 35). In order to assess if this truncated form of MERTK retains chromosome 5 and the gene encoding the miRNA-processing its ability to activate the oncogenic MAPK/Erk and Akt signaling enzyme DROSHA that results in its early truncation (Supplemen- pathways (36), we engineered a retroviral expression construct tary Fig. S3D; ref. 32). Interestingly, inactivating DROSHA muta- encoding the tumor-derived TMEM87B–MERTK fusion. In the tions have recently been reported in over 10% of cases of Wilms IL3-dependent Ba/F3 mouse lymphocyte cell line, stable expres- tumor, a pediatric kidney cancer (33). sion of TMEM87B–MERTK led to constitutively elevated levels of
A 55 aa B MERTK TMEM87B TMEM87B-MERTK SP SP TM TM TM TMEM87B
432 aa SP SP
MERTK TM TM SP Extracellular TM Kinase
TMEM87B-MERTK (BRCA)
SP TM Kinase Kinase Kinase
TMEM87B- TMEM87B- C Vector D F Vector pBabe MERTK TMEM87B-MERTK pBabe MERTK
pBabe ) Serum/growth Serum/IL3 + – + – 5 30 + – + – P = 0.9681 factors
MERTK 20 MERTK + IL3
10