ARTICLE https://doi.org/10.1038/s41467-019-13035-2 OPEN Transposable element expression in tumors is associated with immune infiltration and increased antigenicity Yu Kong1, Christopher M. Rose 2,5, Ashley A. Cass3,5, Alexander G. Williams 2, Martine Darwish2, Steve Lianoglou2, Peter M. Haverty2, Ann-Jay Tong2, Craig Blanchette2, Matthew L. Albert 2, Ira Mellman2, Richard Bourgon 2, John Greally 1, Suchit Jhunjhunwala2 & Haiyin Chen-Harris 2,4* 1234567890():,; Profound global loss of DNA methylation is a hallmark of many cancers. One potential consequence of this is the reactivation of transposable elements (TEs) which could stimulate the immune system via cell-intrinsic antiviral responses. Here, we develop REdiscoverTE,a computational method for quantifying genome-wide TE expression in RNA sequencing data. Using The Cancer Genome Atlas database, we observe increased expression of over 400 TE subfamilies, of which 262 appear to result from a proximal loss of DNA methylation. The most recurrent TEs are among the evolutionarily youngest in the genome, predominantly expressed from intergenic loci, and associated with antiviral or DNA damage responses. Treatment of glioblastoma cells with a demethylation agent results in both increased TE expression and de novo presentation of TE-derived peptides on MHC class I molecules. Therapeutic reactivation of tumor-specific TEs may synergize with immunotherapy by inducing inflammation and the display of potentially immunogenic neoantigens. 1 Department of Genetics and Center for Epigenomics, Albert Einstein College of Medicine, New York 10461, USA. 2 Genentech, Inc., 1 DNA Way, South San Francisco, CA 94080, USA. 3 Department of Bioinformatics Interdepartmental Program, University of California at Los Angeles, Los Angeles, CA, USA. 4Present address: Argonaut Genomics, Inc., 1025 Alameda De Las Pulgas #806, Belmont, CA 94002, USA. 5These authors contributed equally: Christopher M. Rose, Ashley A. Cass. *email: [email protected] NATURE COMMUNICATIONS | (2019) 10:5228 | https://doi.org/10.1038/s41467-019-13035-2 | www.nature.com/naturecommunications 1 ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-019-13035-2 ver the past few years, we have come to understand that expression with host genes or intron retention, we next divided Otumor-associated neoantigens provide important targets the aggregated expression for each TE subfamily into exonic, for anticancer T-cell responses. This conceptual break- intronic, and intergenic expression by stratifying all elements through has led to efforts to therapeutically target patient-specific under a given TE subfamily by their genomic locations with mutations using personalized vaccines1. Neoantigens stemming respect to host genes. For example, of the 1610 annotated from point mutations in the coding exons alone, however, likely instances of L1HS, 951, 654, and 5 are located in the intergenic underestimate the true mutational burden in the tumor2,3; other regions, gene intronic regions and gene exons, respectively. Here, cancer-specific antigens might also exist and contribute to L1HS intergenic expression was defined as the aggregate expres- immune response against the tumor. Indeed, there is growing sion from the 951 elements within the intergenic regions. Other evidence from in vitro and preclinical studies that DNA deme- subfamilies were treated similarly. thylation inhibitors can trigger innate antiviral response to We benchmarked the performance of REdiscoverTE with human endogenous retroviral (HERVs) expression in the extensive simulations using RSEM16 (Fig. 1b, Supplementary tumor4–7. Such retroviral transcripts also have the potential to Fig. 1). REdiscoverTE’s approach of expression aggregation to the produce antigens that activate adaptive immunity8. HERVs, subfamily level demonstrated high accuracy of TE quantification together with other classes of transposable elements (TEs) com- (Spearman correlation r ≥ 0.99, mean absolute relative difference, prise about 45% of the human genome, a sequence space that mean absolute relative difference (MARD) ≤ 0.05), likely by vastly eclipses that of the coding genome9. Although in normal reducing mapping noise observed at the individual element level tissue, much of TE activity is under tight epigenetic control, in (Supplementary Fig. 1e–f). REdiscoverTE performed best on cancer, we hypothesized that wide-spread TE expression occurs, intergenic TEs, followed by exonic, then intronic TEs (Fig. 1b, particularly in those with extensive epigenetic dysregulation10. Supplementary Fig. 1f). Exonic TE expression, which comprised a Investigation of the extent of expression by these evolutionarily minority of the total TE read fraction, was excluded from further sequestered sequences in cancer, and their antigenic potential, has analysis to rule out the potential confounding expression from thus far been very limited. overlapping host genes and ease downstream interpretation. A major impediment to understanding TE expression and its We compared REdiscoverTE to three previously published TE potential relevance to tumor immunity is the analytic challenge of expression quantification methods: the method used in6, RepEn- accurate quantification of short-read sequences from repetitive rich17, and SalmonTE18 (Supplementary Fig. 1g–k, Supplementary regions in the transcriptome. Standard pipelines typically discard Note 1) and have found REdiscoverTE to be significantly more repetitive reads11. To shed light on these hidden parts of the comprehensive and accurate as a robust, whole-transcriptome transcriptome, we have developed and benchmarked a new quantification method. method, REdiscoverTE, to comprehensively quantify expression by all repetitive elements including TEs in RNA-seq data, then applied it to 7750 cancer and normal tissue samples from The TE expression is dysregulated in cancer. To characterize the Cancer Genome Atlas (TCGA) and a replication study12. Here, landscape of TE expression in cancer, we applied REdiscoverTE to we provide the first landscape analysis of TE expression in cancer, 7345 TCGA RNA-seq samples (containing 1232 tumor and reveal that the rate of tumor DNA demethylation proximal to TEs matched normal samples, the rest are tumor samples without is far greater than global demethylation, and identify multiple normals) across 25 cancer types. For validation of our findings in ways in which TE expression may impact cellular and immune select cancer types, we also analyzed an additional 405 tumor and responses to the tumor. We show that tumor cells present matched normal RNA-seq samples across five cancer types potentially immunogenic peptides derived not only from HERVs, (Supplementary Table 2) from the Cancer Genome Project but also other classes of TE: long interspersed nuclear elements (CGP)12. Both data sets were generated from poly-dT RNA-seq (LINE), short interspersed nuclear elements (SINE), and SINE- library preparations, which can capture poly-adenylated TEs VNTR-Alu (SVA). TE-derived tumor-specific antigens that are transcripts. On average, 1% of RNA-seq output mapped to TEs conserved and not patient-specific may be evaluated as “off-the- (Supplementary Fig. 2a). Notably, TE expression was observed shelf” vaccine targets both for therapeutic intervention and, even from all TE classes (N = 5) and most families (N = 43), including more provocatively, for cancer prophylaxis. both retrotransposons and DNA transposons (Supplementary Fig. 2b). For each TE class, the bulk of expression stems from intergenic regions (Supplementary Fig. 2c), suggesting autono- Results mous TE expression from intergenic loci compared with read- Genome-wide TE expression quantification approach. REdis- through transcription in host, protein-coding genes. Human coverTE was devised to simultaneously quantify expression by all DNA transposons had been thought to be completely inactive, annotated genes defined in Gencode13 and all RepeatMasker based on the lack of evidence for transposition in the human sequences (over five million) in the human genome14 (Fig. 1a, genome19. Although these data cannot address transposition, our Supplementary Table 1, detailed in the Methods). To mitigate the results suggest active gene expression by DNA transposons in uncertainty associated with mapping reads to repetitive features, many cancers. we leveraged a recently developed light-weight mapping approach In all cancer types, TE expression was detected in both tumor for isoform quantification, Salmon, which uses an expectation- and matched normal tissues, suggesting basal levels of TE maximization (EM) algorithm to assign multi-mapping reads transcriptional activities in normal tissues. Across the two data probabilistically to transcripts, based on evidence from uniquely sets, 10 cancer types showed a significantly higher proportion of mapped reads15. Post-quantification, we restricted our down- reads mapping to TEs in tumor compared with matched normal stream analysis to only TEs sequences (over four million), which tissues, suggesting particularly active TE expression in these are classified into 1052 distinct TE subfamilies in five classes: cancers; the reverse was observed in four cancer types LINE, SINE, long terminal repeats (LTR), SVA, and DNA (Supplementary Fig. 2a, d, e). Differential expression analysis of transposons (Supplementary Fig. 1a). To measure the total tumor samples with respect to matched normals20–22 confirmed transcriptional output from a group of related TEs, we aggregated that stomach, bladder (Supplementary
Details
-
File Typepdf
-
Upload Time-
-
Content LanguagesEnglish
-
Upload UserAnonymous/Not logged-in
-
File Pages14 Page
-
File Size-