Functional Annotation of Exon Skipping Event in Human Pora Kim1,*,†, Mengyuan Yang1,†,Keyiya2, Weiling Zhao1 and Xiaobo Zhou1,3,4,*
Total Page:16
File Type:pdf, Size:1020Kb
D896–D907 Nucleic Acids Research, 2020, Vol. 48, Database issue Published online 23 October 2019 doi: 10.1093/nar/gkz917 ExonSkipDB: functional annotation of exon skipping event in human Pora Kim1,*,†, Mengyuan Yang1,†,KeYiya2, Weiling Zhao1 and Xiaobo Zhou1,3,4,* 1School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA, 2College of Electronics and Information Engineering, Tongji University, Shanghai, China, 3McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA and 4School of Dentistry, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA Received August 13, 2019; Revised September 21, 2019; Editorial Decision October 03, 2019; Accepted October 03, 2019 ABSTRACT been used as therapeutic targets (3–8). For example, MET has lost the binding site of E3 ubiquitin ligase CBL through Exon skipping (ES) is reported to be the most com- exon 14 skipping event (9), resulting in an enhanced expres- mon alternative splicing event due to loss of func- sion level of MET. MET amplification drives the prolifera- tional domains/sites or shifting of the open read- tion of tumor cells. Multiple tyrosine kinase inhibitors, such ing frame (ORF), leading to a variety of human dis- as crizotinib, cabozantinib and capmatinib, have been used eases and considered therapeutic targets. To date, to treat patients with MET exon 14 skipping (10). Another systematic and intensive annotations of ES events example is the dystrophin gene (DMD) in Duchenne mus- based on the skipped exon units in cancer and cular dystrophy (DMD), a progressive neuromuscular dis- normal tissues are not available. Here, we built order. To restore dystrophin protein functions that were lost ExonSkipDB, the ES annotation database available due to frame shifting through ES, multi-exon skipping an- at https://ccsm.uth.edu/ExonSkipDB/, aiming to pro- tisense oligonucleotides (ASOs) have been used for DMD treatment (11). Therefore, systematic identification and in- vide a resource and reference for functional annota- tensive analyses of ES events in pan-cancer and healthy tion of ES events in multiple cancer and tissues to tissues will provide important insights into disease mech- identify therapeutically targetable genes in individ- anisms and identify novel cancer type-specific therapeutic ual exon units. We collected 14 272 genes that have targets. 90 616 and 89 845 ES events across 33 cancer types With the recent exponential growth of cancer genomic and 31 normal tissues from The Cancer Genome At- and other biomedical data, several studies have analyzed al- las (TCGA) and Genotype-Tissue Expression (GTEx). ternative splicing events in multiple cancer or tissue types For the ES events, we performed multiple functional (e.g. pan-cancer studies) (12,13). There exist several web annotations. These include ORF assignment of exon tools providing pan-cancer AS annotations such as TCGA skipped transcript, studies of lost protein functional SpliceSeq (14), TSVdb (15) and CAS-viewer (16). However, features due to ES events, and studies of exon skip- these studies focused on the identification of alternative splicing events and visualization of isoform structures based ping events associated with mutations and methy- on known gene structures. Furthermore, these studies did lations based on multi-omics evidence. ExonSkipDB not present detailed functional impacts of AS events. So far, will be a unique resource for cancer and drug re- a systematic and intensive annotation of ES events based on search communities to identify therapeutically tar- the skipped exon units in cancer and normal tissues regard- getable exon skipping events. ing their functional impacts has not been available. Here, we built ExonSkipDB, the ES annotation database, aiming to INTRODUCTION provide resources and references for functional annotation Accumulated evidence has shown that the disruption of al- of ES events in cancer to identify therapeutically targetable ternative splicing (AS) contributes to human diseases (1). exons. Among the typical patterns of alternative splicing, exon To achieve this goal, we first collected ES event infor- skipping (ES) is the most common event (2). Since ES re- mation of total 14 272 genes from 33 cancer types and 31 sults in the loss of functional domains/sites or frame shift- normal tissues of the Cancer Genome Atlas (TCGA) and ing of the open reading frame (ORF), skipped exons have Genotype-Tissue Expression (GTEx) from Kahles et al.’s *To whom correspondence should be addressed. Tel: +1 713 500 3923; Email: [email protected] Correspondence may also be addressed to Pora Kim. Tel: +1 713 500 3636; Email: [email protected] †These authors contributed equally to this work. C The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected] Nucleic Acids Research, 2020, Vol. 48, Database issue D897 study (12). These genes contain 90 616 and 89 845 ES sQTM). Through systematic survival analysis of sQTM events of TCGA and GTEx, respectively. For these ∼14K pairs, we identified 50 prognostic cases. In addition, we genes, we performed functional annotations including in- identified 1534 approved drugs for targeting 1715 ES genes, vestigation of the open reading frame (ORF) for individual and 7324 genes reported to be associated with 5039 different ES events and lost protein functional features of in-frame types of diseases in DrugBank (18) and DisGeNet (19), re- ES events based-on the canonical transcript sequences and spectively. Figure 1 is an overview of ES annotations using amino acid sequences, respectively. We also analyzed muta- MET as a representative example. All entries and annota- tions and methylations associated with ES events. Users can tion data are available for browsing and downloading on the access a variety of annotations, including gene summaries, ExonSkipDB web site (https://ccsm.uth.edu/ExonSkipDB). gene structures such as gene isoform and ES events from TCGA and GTEx, landscape of percent spliced in (PSI) and isoform abundances across multiple human tissues and DATA INTEGRATION AND ANNOTATIONS cancers, ORF annotation, analysis of lost protein func- Exon skipping information tional features, skipped exon region mutations with cover- age depth and differential PSIs, sQTL and sQTM analy- We downloaded the exon skipping event information of ses, and ES gene-related drugs and diseases. This paper in- 8,705 patients of TCGA’s 33 cancer types and over 3,000 troduces ExonSkipDB, the web interface, and its applica- normal samples from GTEx’s 31 different tissues (Supple- tions, aiming to provide resources or references for func- mentary Tables S1 and S2) from Kahles et al.’s study (12). tional annotation of exon skipping events in cancer. Ex- Only the exon skipping events with conserved loci among onSkipDB will be a unique resource for cancer and drug the six splice-sites of three exons were used in this study. research communities to identify disease-associated exon These exons are involved in exon skipping such as ‘upstream skipping events. exon’, ‘skipped exon’ and ‘downstream exon’. Of these six sites, four sites are the acceptor site of the upstream exon, the acceptor and donor sites of the skipped exon, and the DATABASE OVERVIEW donor site of the downstream exon. Based on the GEN- Through multiple annotations of the ExonSkipDB, users CODE v19 annotation reference genome (20), we obtained can get help from the following aspects. (i) Comparing 90 616 and 89 845 annotated gene structure-based exon ES events with PSIs and isoform abundance between pan- skipping events from TCGA and GTEx, respectively. cancer and healthy human tissues can help detect poten- tial cancer or cancer type-specific ES events. Through com- Open reading frame (ORF) annotation paring 90 616 and 89 845 ES events from TCGA and GTEx, we identified 121 cancer-specific in-frame ES events- For specific exon skipping events based on the GENCODE related genes. (ii) Analysis of the lost protein features of v19 (20) (Supplementary Tables S3 and S4), we examined exon skipping events will help to deepen understanding the open reading frames of major isoform transcript se- the detailed loss-of-functional effects of tumorigenic ES quences. When both of the nucleotide start and end posi- events. We have identified 2427 and 249 in-frame genes that tions of exon skipping were located inside of the coding have lost protein functional domains and DNA-binding do- region (CDS) and the number of transcript sequences af- main, respectively. One hundred forty-two genes lost spe- ter exclusion of the skipped exon sequence was a multiple cific amino acid sites such as the site required for ligand- of three, then we reported such exon skipped gene isoform induced CBL-mediated ubiquitination in MET exon 14. (iii) as ‘in-frame’. When one or two nucleotide insertions are ORF annotation of skipped exons classifies translational present, we reported such transcripts as ‘frame-shift’. and non-translational ES events and provides potential can- didates to be targeted by antisense oligonucleotides to re- Retention analysis of 39 protein features from UniProt store lost protein functions. Among ∼14 000 genes with ES events in cancer, there were 8667 and 9623 ES genes of Firstly, we created the exon skipped transcript sequence in-frame and frame-shift ORFs, respectively. Among 9623 based on the canonical transcript sequence. Then this se- genes with the frame-shifted ES events, 453 genes can be tar- quence was aligned with non-redundant protein sequence geted by the drugs from IUPHAR database.