RNA processing genes characterize RNA splicing and further stratify lower-grade glioma
Rui-Chao Chai, … , Fan Wu, Yong-Zhi Wang
JCI Insight. 2019. https://doi.org/10.1172/jci.insight.130591.
Clinical Medicine In-Press Preview Genetics Oncology
Graphical abstract
Find the latest version: https://jci.me/130591/pdf 1 RNA processing genes characterize RNA splicing and
2 further stratify lower-grade glioma 3 Authors: Rui-Chao Chai1,2,3*, Yi-Ming Li1,2,4*, Ke-Nan Zhang1,2, Yu-Zhou Chang4, 4 Yu-Qing Liu1,2, Zheng Zhao1,2, Zhi-Liang Wang 1,2, Yuan-Hao Chang1,2, Guan-Zhang 5 Li1,2, Kuan-Yu Wang1,2, Fan Wu1,2#, Yong-Zhi Wang 1,2,3,4# 6 7 1Department of Molecular Neuropathology, Beijing Neurosurgical Institute, Capital 8 Medical University, Beijing, China; 9 2Chinese Glioma Cooperative Group (CGCG), Beijing, China; 10 3China National Clinical Research Center for Neurological Diseases; 11 4Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, 12 Beijing, China; 13 14 * Rui-Chao Chai and Yi-Ming Li contributed equally to this work. 15 16 # Corresponding authors: 17 1. Yong-Zhi Wang 18 Beijing Neurosurgical Institute, Beijing Tiantan Hospital, Capital Medical University, 19 No. 119 South Fourth Ring West Road, 20 Beijing 100070, China 21 Tel/Fax: +861067096794 22 E-mail: [email protected] 23 2. Fan Wu 24 Beijing Neurosurgical Institute, Beijing Tiantan Hospital, Capital Medical University, 25 No. 119 South Fourth Ring West Road, 26 Beijing 100070, China 27 Tel/Fax: +861067096794 28 E-mail: [email protected] 29 30 Keywords: RNA processing, lower-grade gliomas, clinical outcomes, alternative 31 splicing, TTF2 32 33 Conflict of Interest: The authors declare no conflict of interest. 34 35
1
1 Abstract
2 BACKGROUND. Aberrant expression of RNA processing genes may drive the
3 alterative RNA profile in lower-grade gliomas (LGGs). Thus, we aimed to further
4 stratify LGGs based on the expression of RNA processing genes.
5 METHODS. This study included 446 LGGs from The Cancer Genome Atlas (TCGA,
6 training set) and 171 LGGs from the Chinese Glioma Genome Atlas (CGGA, validation
7 set). The least absolute shrinkage and selection operator (LASSO) Cox regression
8 algorithm was conducted to develop a risk-signature. The receiver operating
9 characteristic (ROC) curves and Kaplan–Meier curves were used to study the prognosis
10 value of the risk-signature.
11 RESULTS. Among the tested 784 RNA processing genes, 276 were significantly
12 correlated with the OS of LGGs. Further LASSO Cox regression identified a 19-gene
13 risk-signature, whose risk score was also an independent prognosis factor (P<0.0001,
14 multiplex Cox regression) in the validation dataset. The signature had better prognostic
15 value than the traditional factors “age”, “grade” and “WHO 2016 classification” for 3‐
16 and 5‐year survival both two datasets (AUCs > 85%). Importantly, the risk-signature
17 could further stratify the survival of LGGs in specific subgroups of WHO 2016
18 classification. Furthermore, alternative splicing events for genes such as EGFR and
19 FGFR were found to be associated with the risk score. mRNA expression levels for
20 genes, which participated in cell proliferation and other processes, were significantly
21 correlated to the risk score.
22 CONCLUSIONS. Our results highlight the role of RNA processing genes for further
2
1 stratifying the survival of patients with LGGs and provide insight into the alternative
2 splicing events underlying this role.
3 FOUNDING. Supported by the National Natural Science Foundation of China
4 (81773208, 81402052), the Beijing Nova Program (Z16110004916082), and the
5 National Key Research and Development Plan (2016YFC0902500).
6
3
1 Introduction
2 Diffuse glioma is the most frequent primary malignant brain tumors in adults (1,
3 2). Although intensive treatment with surgery, radiotherapy, and chemotherapy are
4 conducted, nearly all gliomas relapse. The median overall survival (OS) of patients with
5 glioblastoma (GBM, WHO grade IV) is only approximately 15 months (1, 3, 4).
6 Patients with lower-grade diffuse glioma (LGG, WHO grade II/III) have a relatively
7 favorable prognosis, but these tumors evolve and most relapse as therapy-resistant
8 higher grade gliomas over time (5-7). LGGs can be classified as WHO grade II or III
9 based on histological features or as oligodendroglioma with isocitrate dehydrogenase
10 (IDH)-mutant and 1p/19q codeletion, astrocytoma with IDH-mutant, and astrocytoma
11 with IDH-wildtype according to the WHO 2016 integrated analysis (1, 3, 8). However,
12 this classification still does not fully reflect the rate of LGG evolution, and for the OS
13 of patients with LGGs in a stratified subgroup is rather heterogeneous (7, 8). Thus,
14 further stratification of LGGs using various methods (9-15), including RNA expression
15 profiles (11, 14, 16-18), has attracted attention.
16 A dysregulated RNA expression profile is an important hallmark of tumors (19).
17 RNA processing factors, such as RNA-binding proteins and RNA methylation
18 regulators, are the main factors controlling the life-cycle of RNA in cells and thus play
19 vital roles in numerous diseases and cancers by determining the final content and
20 isoforms of mature RNA transcripts (6, 20-24). However, studies on RNA processing
21 factors in gliomas have mainly focused on GBM, and the roles of RNA processing
22 factors in stem cell maintenance, cell adhesion, cell migration, cell growth, colony
23 formation, and invasiveness have been demonstrated in GBM (6, 20, 25-29). Recently,
24 some RNA processing factors, such as HnRNPH1, PTB, SNRP1, and ELAVL, which
25 are evaluated in the progression and prognosis of GBM, also showed prognostic value
26 for all grades of diffuse gliomas (6), indicating their potential role in LGGs.
27 Considering that abnormal expression of RNA processing genes may drive changes in
28 the dysregulated RNA expression profile in glioma, it is important to systematically
29 examine the roles of RNA processing factors in LGGs.
4
1 RNA processing factors also govern the alternative splicing events (ASEs) of
2 individual genes, which can affect several important factors in tumor initiation and
3 progression (30). For instance, our recent study demonstrated the vital role of 14 exon
4 skip of MET in the malignant progression and targeted therapy of secondary GBM (31).
5 The Cancer Genome Atlas (TCGA) SpliceSeq dataset lists seven ASE types (32). This
6 makes it possible to link the dysregulation of RNA processing factors with aberrant
7 ASEs in LGGs.
8 In this study, we systemically analyzed the expression profile of RNA processing
9 genes and their prognostic values in 617 LGGs from TCGA (n = 446) and the Chinese
10 Glioma Genome Atlas (CGGA; n = 171). We also profiled the ASEs underlying LGGs
11 stratified by the risk signature built with RNA processing genes and identified the
12 corresponding functions.
5
1 Results
2 Prognostic value of RNA processing genes and their biological function in LGGs
3 Our approach and workflow for the selection of RNA processing genes, development
4 and validation of a prognostic risk signature, and the analysis of the ASEs and altered
5 RNA expression profile that correlated to the signature are summarized in Figure 1.
6 Of the 784 RNA processing genes tested, 276 were significantly correlated with
7 the OS of LGGs (Supplementary Table 1). A total of 142 of the 276 genes had an HR >1
8 and were considered risk-associated, while the remaining 134 genes had HR <1 and
9 were considered protection-associated. Risk-associated genes showing increasing
10 expression (upper panel) and protection-associated genes showing decreasing
11 expression (lower panel) were associated with increased malignancy in three subgroups
12 according to the WHO 2016 classification (Figure 2A). Some of these genes also
13 showed inconsistent expression levels within the same subgroups, suggesting the
14 potential prognostic value of these genes within specific subgroups.
15 Since these 276 genes are a collection of genes that participated in any process
16 involved in the conversion of one or more primary RNA transcripts into one or more
17 mature RNA molecules. We used Gene Ontology (GO) analysis to study the more
18 specific biological processes that these genes are enriched in, and the results indicated
19 that these survival-associated genes were correlated with the terms such as rRNA
20 processing, nuclear-transcribed mRNA catabolic process, nonsense-mediate decay, and
21 mRNA splicing, via spliceosome (Figure 2B). We also further annotated the 142 risk-
22 associated and 134 protection-associated genes through GO and KEGG analysis,
6
1 respectively (Supplementary Table 2 and 3). The results indicated that GO_BP term of
2 mRNA splicing, via spliceosome and KEGG pathway of spliceosome were the most
3 associated terms for the 142 risk-associated genes, and the GO_BP term of rRNA
4 processing and KEGG pathway of ribosome were the most associated terms for the 134
5 protection-associated genes.
6 Identification of a panel of 19 RNA processing genes as a risk signature in LGGs
7 To easily and reliably stratify the outcomes of LGGs with RNA processing genes, we
8 applied the least absolute shrinkage and selection operator (LASSO) Cox regression
9 algorithm to the 276 genes in TCGA dataset (Supplementary Figure 1). A total of 19
10 genes were selected to build the risk signature, and the coefficients and normalized
11 expression levels of these genes were used together to calculate risk scores for both the
12 training dataset (TCGA dataset) and validation dataset (CGGA dataset) (Figure 2C and
13 Supplementary Table 4).
14 We divided patients into high-risk and low-risk groups in various stratified glioma
15 subtypes using their respective median risk score as a cutoff. We found that patients
16 with low-risk scores had significantly longer OS than patients with high-risk scores in
17 all subtypes evaluated in TGCA datasets: all LGGs (Figure 2D), oligodendroglioma
18 with IDH-mutant and 1p/19q codeletion (Figure 2E), astrocytoma with IDH-mutant and
19 IDH-wild-type (Figure 2F, Supplementary Figure 2A–B). Similar results were observed
20 in the CGGA datasets (Figure 2G–I, Supplementary Figure 2C–D).
21 To investigate the prognostic value of the risk signature and other
22 clinicopathological features, univariate and multivariate Cox regression analysis was
7
1 performed both in TCGA (training) and CGGA (validation) data set. The results
2 revealed that high risk score was a prognostic factor in both datasets (P<0.0001),
3 independent of grade, age, gender, WHO 2016 subgroups, IDH mutation, 1p19q
4 codeletion, and MGMT methylation status (Table 1).
5 We also summarized the race composition in datasets used in this study, as a
6 specific race may represent population-specific exposure and risk factor. In the TCGA
7 dataset, 92.00% of cases are White, 3.36% of cases are Black and African American,
8 1.79% of cases are Asian, 0.22% of cases are American Indian or Alaska Native, and
9 the rest 2.02% of cases unknown (Supplementary Figure 3A). All of the cases in the
10 CGGA dataset are Asian. We found that patients with low risk scores had significantly
11 longer OS than patients with high risk scores in White (Supplementary Figure 3B) and
12 Black or African American (Supplementary Figure 3C) population of TGCA dataset.
13 We cannot compare the OS of the 8 cases of Asian in the TCGA dataset, as all of them
14 survived at the end of follow-up (Supplementary Figure 3D). However, the signature
15 could stratify the survival of Asian population, since all the cases in the CGGA dataset
16 are Asian.
17 Collectively, these results strongly support the ability of our risk signature to
18 accurately stratify the prognosis of patients with definite subgroups based on WHO
19 2016 integrated diagnosis.
20
21 The signature risk score is correlated with malignant clinicopathological features
22 of LGGs
8
1 To determine whether the risk signature also reflects the malignant clinicopathological
2 features of LGGs, we determined risk signature expression of 19 genes and
3 clinicopathological features in LGGs with low and high risk in the form of a heatmap
4 (Supplementary Figure 4A). Significant differences were observed between the high-
5 and low-risk groups with respect to the WHO grade (P < 0.001), IDH status (P < 0.001),
6 1p/19q codeletion status (P < 0.001), WHO 2016 subgroups (P < 0.001), MGMT
7 promoter methylation status (P < 0.001), age (P < 0.001), and CIC (P < 0.001)
8 (Supplementary Figure 4A and Supplementary Table 5). We also found that the risk
9 score was significantly different in patients with LGGs stratified by WHO grade,
10 subgroups according to WHO 2016 classification, IDH mutation status, 1p/19q
11 codeletion status, MGMT promoter methylation status, and age (Supplementary Figure
12 4B–G). Moreover, the distributions of risk score between LGGs with and without TERT
13 promoter mutation were highly dependent on the IDH mutant status (Supplementary
14 Figure 4H). The risk score significantly increased in TERT-mutant LGGs with wild-
15 type-IDH (P < 0.01), while the risk score deceased in TERT-mutant LGGs with mutant-
16 IDH (P < 0.01). A similar distribution of risk score was observed in the CGGA dataset
17 (all P values < 0.05, except for TERT status, and P=0.07 for TERT status,
18 Supplementary Figure 5). These results indicated that the signature was associated with
19 the malignant clinicopathological features of gliomas.
20 The receiver operating characteristic (ROC) curve analysis showed that the
21 signature risk score had the best efficiency (compared with age, WHO 2016 subgroups
22 and WHO grade) for predicting the 3-year and 5-year survival of patients with LGGs
9
1 both in TCGA and CGGA datasets (Figure 3). Areas under the curve of risk score, age
2 WHO 2016 subgroups and WHO grade were 93.7%, 81.8%,77.6%,70.2% respectively
3 for 3 year’s survival in TCGA dataset; 88.1%,78.8%, 74.8%, 68.4% respectively for 5
4 year’s survival in TCGA dataset; 85.9%, 64.4%, 76.1%, 78.6% respectively for 3 year’s
5 survival in CGGA dataset; 87.3%, 63.7%, 76.4%, 75.3% respectively for 5 year’s
6 survival in CGGA dataset.
7
8 The signature risk score is closely correlated with the alternative splicing events
9 RNA splicing activities are governed by RNA processing genes, and our findings
10 showed that worse survival-associated RNA processing genes were most enriched in
11 RNA splicing related activities. Thus, we also systematically characterized ASEs in
12 LGGs with different risk scores. In total, tens of thousands of seven ASE types,
13 including alternate acceptor site (AA), alternate donor site (AD), alternate promoter
14 (AP), alternate terminator (AT), exon skip (ES), mutually exclusive exons (ME), and
15 retained intron (RI), were detected in each LGG (Supplementary Figure 6A).
16 Furthermore, the proportion of differential ASE types in LGGs varied widely (from 0.5%
17 to approximately 35%). Although all LGGs shared similar patterns of ASE types (ES
18 and AP were the most frequently observed ASE types, while ME and RI were the least
19 frequently observed ASE types), the total number of detected differential ASEs
20 gradually increased along with the increasing risk score (Figure 4A), and the total
21 number of ASEs was significantly smaller in LGGs with a lower (1st quarter n=111) risk
22 score than with a higher (4th quarter n=111) risk score (Figure 4B).
10
1 Percent Spliced In (PSI) is the ratio of reads indicating the presence of a transcript
2 element versus the total reads covering the event (32). We also analyzed the PSI levels
3 for ASEs in all LGGs and LGGs with lower and higher risk scores. The results revealed
4 that events with lower PSI levels (PSI ≤ 0.2) and higher PSI levels (PSI > 0.8)
5 constituted most types of ASEs in all LGGs and LGGs with lower and higher risk scores
6 (Supplementary Figure 6B–H). These observations indicated that transcripts with high
7 splice-out alternative exons or high splice-in alternative exons were the predominantly
8 transcribed form of most genes.
9 We further identified differentially expressed RNA splicing genes (P < 0.05 and
10 fold-change ≥2 or ≤0.5) and ASEs with significantly different PSI (P < 0.05 and fold-
11 change ≥2 or ≤0.5) in LGGs with lower (1st quarter n=111) and higher (4th quarter n=111)
12 risk scores (Figure 5A, Supplementary Table 6 and Supplementary Table 7). There were
13 257 ASEs for 247 genes with decreased PSI levels in LGGs with higher risk score and
14 604 ASEs for 527 genes with increased PSI levels in LGGs with higher risk scores
15 (Supplementary Figure 7A). The proportion of AP type of ASEs dramatically increased
16 (from 23.14% to 47.7%) in these ASEs with significantly different PSI. In LGGs with
17 higher risk score, AP (45.1%) and ES (26.5%) showed the largest proportion in ASEs
18 with decreased PSI, while AP (49.3%) and AT (25.0%) constituted a larger proportion
19 of ASEs with increased PSI (Supplementary Figure 7B). This suggests that ES
20 transcripts with high splice-out exons are more likely to be present in LGGs with higher
21 risk scores and the AP type of ASEs appeared to be more relevant to the prognosis of
22 LGGs.
11
1 We found that genes involved in the receptor tyrosine kinase signaling pathway
2 (EGFR, FGFR1, and FGFR2), DNA damage response (C5orf45), RNA binding
3 (CIRBP), transcription regulation (CREM), and brain disease development (MAPT)
4 were differentially spliced between LGGs with lower and higher risk scores (Figure
5 5B). To further explore the function of alternative splicing in the malignancy of LGGs,
6 GO analysis was performed for all genes that were differentially spliced in LGGs with
7 lower and higher risk scores. In general, genes with differential PSI were mainly
8 enriched in biological processes, such as “positive regulation of GTPase activity”,
9 “SRP-dependent co-translational protein targeting to membrane”, “cytoskeleton”,
10 “cytokinesis”, “Transmembrane receptor protein tyrosine kinase signaling pathway”,
11 “cell adhesion”, and others (Figure 5C, Supplementary Figure 8). Our analysis
12 indicated that differential ASEs participate in many cancer-related biological processes,
13 particularly signal transduction pathways, suggesting that ASEs are a critical
14 mechanism underlying the prognostic value of RNA processing genes in LGGs.
15 We further investigated the potential regulatory networks between the significantly
16 changed 28 RNA splicing genes and 862 ASEs (see Methods). Finally, a network of the
17 28 differential RNA splicing genes and 221 of their significantly correlated differential
18 ASEs was constructed (Figure 6A and Supplementary Table 8). The 28 RNA splicing
19 genes regulated distinct amounts of ASEs ranging from 4 to 104 (Figure 6B). For RNA
20 splicing genes whose expression was increased in LGGs with higher risk scores, TTF2
21 and MBNL3 were found to regulate a larger number of ASEs, while GPATCH1 and
22 DHX15 regulated fewer ASEs (Figure 6B upper panel). In RNA splicing genes whose
12
1 expression was decreased in LGGs with higher risk scores, POLR2F and PQBP1
2 regulated a larger number of ASEs, while GPKOW and SAP18 regulated fewer ASEs
3 (Figure 6B lower panel). The proportion of AP type of ASE was still the largest one in
4 these 221 ASEs.
5
6 The function analysis of genes whose mRNA expression are correlated with the
7 signature risk score in LGGs
8 Since RNA processing genes are the main factors controlling the life-cycle of RNA in
9 cells, we also evaluate the RNA expression profile influenced by the differentially
10 expressed RNA processing genes. We identified genes positively (Pearson
11 coefficient >0.5 and Bonferroni corrected P < 0.01) or negatively (Pearson coefficient
12 < -0.5, and Bonferroni corrected P < 0.01) correlated with the risk score and then
13 annotated their functions using GO analysis for biological processes and KEGG
14 pathways analysis (Figure 7A and B). Overall, the results indicated that the risk scores
15 positively correlated genes were mainly enriched in regulation of chromosome division
16 and DNA stability, cell division, DNA replication, cell cycle, DNA repair, angiogenesis
17 and other malignancy-related biological processes (Figure 7A). The corresponding
18 pathways could also be observed in the Kyoto Encyclopedia of Genes and Genomes
19 (KEGG) pathway analysis (Figure 7B). Suggesting that the altered expression of RNA
20 processing genes may be associated with the increased expression of genes enriched in
21 these processes. Interesting, we observed that the signature risk scores negatively
22 correlated genes were mainly involved in the GO processes of translation and rRNA
13
1 processing, and the KEGG pathway of ribosome (Figure 7A and B). This finding is
2 consistent with the finding that the GO_BP term of rRNA processing and KEGG
3 pathway of ribosome are the most associated terms for the 134 protection-associated
4 RNA processing genes (Supplementary Table 2).
5 Furthermore, Gene Set Enrichment Analysis (GSEA) revealed that the hallmarks
6 of malignant tumors, including epithelial-mesenchymal transition, angiogenesis, G2M
7 checkpoint, mitotic spindle, inflammatory response, complement, PI3K-AKT-MTOR
8 signaling, and KRAS signaling (Figure 8A–H), as significantly enriched in LGGs with
9 risk scores higher than median risk score. These findings indicate that the risk score of
10 the RNA processing signature reflects the expression alterations of genes involved in
11 malignant biological processes, signaling pathways, and malignant hallmarks in LGGs.
12 RNA expression profile influenced by knock-down of mRNA expression of TTF2
13 To study whether the change of RNA processing gene is a driver or merely correlative
14 to the dysregulated RNA profile. We selected three specific siRNA to knock-down the
15 mRNA expression of TTF2 genes, this gene is risk-associated in the signature (with
16 highest HR), and it also included in the GO term of RNA splicing. The results indicated
17 that all three siRNA could significantly downregulate the mRNA expression of TTF2
18 in glioma cell line LN229 cells, and TTF2 siRNA2 appeared to be the most effective
19 siRNA (Figure 9A).
20 We collected whole transcriptome data of LN229 cells at 48 hr after transfection
21 of TTF2 siRNA2 (n=3) and scramble siRNA (n=3), respectively. Compared with
22 scrambled siRNA group, there were 2676 genes with increased expression and 2710
14
1 genes with decreased expression in the group of TTF2 siRNA, and the mRNA
2 expression of TTF2 gene was also significantly decreased (Figure 9B, Supplementary
3 Table 9). We found that the 2710 genes with decreased expression after knock-down
4 mRNA of TTF2 gene were significantly enriched in the biological processes of
5 chromosome segregation, cell division, cell cycle, mRNA metabolic, and RNA splicing
6 (Figure 9C, Supplementary Table 10), and the KEGG pathway analysis showed these
7 genes also were enriched in the pathways of “cell cycle”, “spliceosome”, “DNA
8 replication”, “RNA transport”, “proteasome”, “mismatch repair”, and others (Figure 9D,
9 Supplementary Table 11). This finding was largely consistent with the result of function
10 analysis of genes whose mRNA expressions were positively correlated to the risk
11 signature, indicating that changes of RNA processing gene TTF2 maybe a driver of
12 these dysregulated RNA profiles in glioma.
13 As mentioned above, functions of genes whose mRNA expressions were
14 negatively correlated to the risk signature mainly enrich in the biological processes of
15 “translation” and “SRP-dependent co-translational protein targeting to membrane”, the
16 KEGG pathways of “Ribosome”. We also noticed that functions of the 2676 up-
17 regulated genes were mainly involved in the protein processing activities, such as the
18 biological processes of “extracellular matrix organization”, “response to endoplasmic
19 reticulum stress” and others, the KEGG pathways of “lysosome”, “protein processing
20 in endoplasmic reticulum” and others (Figure 9C and D, Supplementary Table 12 and
21 13).”
15
1 Discussion
2 In this study, we found that the general expression pattern of RNA processing genes is
3 correlated with the malignancy features of LGGs and identified RNA processing genes
4 significantly associated with the prognosis of LGGs. We further built a risk signature
5 that not only further predicted the prognosis of stratified LGGs but also could perfectly
6 reflect the malignancy-correlated clinicopathological features, biological processes,
7 key signaling pathways, and hallmarks. Moreover, we systemically analyzed ASEs
8 underlying LGGs with lower and higher risk scores and identified the corresponding
9 functions. Our results highlight the role of RNA processing genes in further stratifying
10 the survival of patients with LGGs, and the developed signature shows a potential to
11 become an effective supplement for the integrated diagnosis criteria of WHO 2016 in
12 further stratifying the prognosis of LGGs. Moreover, we also reveal the associated
13 ASEs and biological processing that were associated with the risk scores of the
14 developed signature.
15 Genetic alterations, such as IDH mutation and 1p/19q codeletion, have been
16 included in the integrated diagnosis criteria of WHO 2016 (3). Very large differences
17 in prognosis exist in LGGs within the definite WHO 2016 subgroup (16). Several
18 important biomarkers have been identified to further stratify LGGs, including genetic
19 alterations in CDKN2A/B in astrocytoma with mutant-IDH (11, 33); EGFR
20 amplification, chromosome 7 gain and chromosome 10 loss, and TERT promoter
21 mutation in astrocytoma with wild-type-IDH (11). Importantly, “the EGFR
22 amplification, or chromosome 7 gain and chromosome 10 loss, or TERT promoter
16
1 mutation” had been recommended as the diagnostic criteria for “Diffuse astrocytic
2 glioma, IDH‐wild-type, with molecular features of glioblastoma, WHO grade IV” by
3 the Consortium to Inform Molecular and Practical Approaches to Central Nervous
4 System Tumor Taxonomy (34). Thus, further stratification of LGGs with definite IDH
5 and 1p/19q status is urgently needed.
6 Apart from these genetic alteration markers, RNA expression is also valuable for
7 predicting the outcomes of tumors with low rates of mutation, and the RNA expression
8 profile had become an indispensable element for classifying medulloblastoma (35).
9 Compared to GBM, LGGs have relative lower mutation rates and the RNA expression
10 of a set of genes was suggested to be useful for predicting the prognosis of patients with
11 1p19q co-deletion diffuse glioma (16). In this study, we confirmed the prognostic value
12 of a signature built with 19 RNA processing genes in each stratified subgroup of LGGs
13 (Figure 2D–I). Compared with traditional stratifying factors (age, WHO grade,
14 subgroups of WHO 2016 classification), The risk score of this signature also had the
15 best predictive efficiency on both the 3-year and 5-year survival (Figure 3). Though the
16 risk score of the signature is highly associated with traditional clinicopathological
17 features, we confirmed that the risk score is an independent prognosis factor in both the
18 training and validation dataset (Table 1). A previous study identified 104 key genes that
19 were prognostic for LGGs. However, the average area under the ROC curve (AUC)
20 values ranged from 0.7 to 0.8 (16), which was lower than the performance of our
21 signature (AUC > 0.85). These data indicate that the RNA processing gene signature is
22 a powerful tool for further predicting prognosis of LGGs, which were stratified by
17
1 isocitrate dehydrogenase and 1p/19q status based on the 2016 World Health
2 Organization classification guidelines
3 The ASEs of individual genes which can affect several important players in tumor
4 initiation and progression (6, 30, 36-38). For instance, delta isoform of Max could
5 promote glioma cell proliferation by enhancing functions of MYC in GBM harboring
6 EGFRvIII mutation (39); two of CD97 isoforms, EGF (1,2,5) and EGF (1,2,3,5) could
7 promote growth, migration, metastasis, and angiogenesis in GBM (40); and an aberrant
8 splicing of cyclin-dependent kinase-associated protein phosphatase KAP increases
9 proliferation and migration in glioblastoma (30). Currently, genome-wide analyses of
10 exon expression arrays have begun to reveal the roles of ASEs correlated with the
11 progression and prognosis of GBM (41). However, the role of alterations of ASEs in
12 LGGs is not well-understood. Our study not only reveals the landscape of ASEs in
13 LGGs, but also identified 861 ASEs correlated with the prognosis of LGGs
14 (Supplementary Table 5), although more specific studies are needed to thoroughly
15 determine the functions of these identified ASEs. Moreover, we identified RNA splicing
16 genes correlated with differential ASEs in LGGs with distinct prognosis.
17 RNA plays crucial roles in biological functions in cells not only by passing
18 genetic information from DNA to protein but also by regulating various biological
19 processes (42). Dysregulation of RNA profiles, including hypoxia-associated gene sets
20 (43), vascular gene sets (44), WNT/beta-catenin pathway-related genes (45), genes in
21 the NF-κB signaling pathway (46), micro-RNA (18), and long non-coding RNA (47)
22 have been shown to play crucial roles in the malignant progression and prognosis of
18
1 LGGs. Here, we highlight the stratification ability of RNA processing genes in LGGs.
2 Additionally, we found that genes whose expression was significantly correlated with
3 the risk score of the signature built with 19 RNA processing genes were enriched in the
4 biological processes/pathways of cell cycle, cell division, DNA replication and repair,
5 angiogenesis, cell proliferation, and pathways in cancer (Figure 6A). Among the 19
6 genes included in the signature, RPL3 and BICD1 have been suggested to be associated
7 with temozolomide resistant in glioblastoma (48, 49); and ELAVL1 was reported
8 participated in the cell proliferation, migration and glioma stem-like cell maintenance
9 (50, 51). RNA expression profile and RNA fate are highly dependent on RNA
10 processing factors responsible for precise temporal and spatial coordinating gene
11 expression (6, 20). In addition, our finding also confirmed that knock-down the
12 expression of TTF2 genes could alter the expression of genes involved in the activities
13 of the cell cycle, RNA splicing, DNA replication, mismatch repair, extracellular matrix
14 organization, and others. All of these indicate that abnormal expression of RNA
15 processing genes is potential drivers of the previously reported dysregulated RNA
16 profiles in LGGs.
17 In summary, our study highlights the prognostic value of RNA processing genes
18 in LGGs and revealed a signature with 19 RNA processing genes for further stratifying
19 the outcomes of LGGs with definite WHO subgroups. Clinicopathological features,
20 ASEs, biological processes, signaling pathways, and hallmarks of tumors correlated
21 with the risk signature were also identified. These results provide fundamental
22 information for understanding the roles of RNA processing and indicate the potential
19
1 clinical implications of RNA processing genes in LGGs.
2
3
20
1 Methods
2 Patients
3 A total of 457 LGG samples were collected from the TCGA publication (52). Of these
4 TCGA samples, we excluded 11 samples that do not contain RNA sequencing data or
5 molecular pathological information or useful OS information, and the other 446 LGG
6 samples with RNA-seq transcriptome data and corresponding clinical and molecular
7 pathological information were obtained from TCGA (http://cancergenome.nih.gov/)
8 and used for systematic analysis. Another 171 LGG samples with RNA-seq
9 transcriptome and corresponding clinicopathological information form CGGA
10 (www.cgga.org.cn) were used to validate the performance of the risk signature.
11 Clinicopathological information for the TCGA and CGGA datasets was summarized in
12 Supplementary Table 14.
13 Selection of RNA processing genes
14 We first collated a list of 835 genes that participated in any process involved in the
15 conversion of one or more primary RNA transcripts into one or more mature RNA
16 molecules (http://amigo.geneontology.org/amigo/term/GO:0006396), and then we used
17 the 784 genes with available RNA expression data in TCGA datasets for further analysis.
18 Identification of the risk signature
19 We performed univariate Cox regression analyses of the expression of RNA pressing
20 genes to identify genes significantly correlated with the prognosis of patients with
21 LGGs in TCGA dataset. Next, we employed the LASSO Cox regression algorithm to
22 build an optimal risk signature with the minimum number of genes (53). Finally, a set
21
1 of genes and their coefficients were determined by minimum criteria, which involves
2 selecting the best penalty parameter λ associated with the smallest 10-fold cross
3 validation within the training set. The risk score for the gene signature was calculated
4 using the formula:
푛 5 Risk score = ∑푖=1 퐶표푒푓푖 ∗ 푥푖
6 where Coef is the coefficient and 푥푖 is the z-score-transformed relative expression
7 value of each selected gene.
8 Patients were divided into “high-risk” and “low-risk” groups using the median risk
9 score as the cutoff value. Patients could also be compared between groups with lower
10 risk score (1st quarter) and higher risk score (4th quarter).
11 Bioinformatic analysis
12 GO and KEGG pathway enrichment analyses with the Database for Annotation,
13 Visualization, and Integrated Discovery (http://david.abcc.ncifcrf.gov/home.jsp) to
14 functionally annotate genes with prognosis value in LGGs, significantly correlated to
15 the risk score, and differentially spliced between patients with lower (1st quarter) and
16 higher (4th quarter) risk score. GSEA was performed to investigate the functions of
17 genes that significantly correlated to the risk score.
18 Construction of regulation networks between RNA splicing genes and ASEs in
19 LGGs
20 The RNA splicing data were collected from the online database
21 (http://bioinformatics.mdanderson.org/TCGASpliceSeq). The ASEs could be
22 quantified by the PSI value, which represents the ratio of included transcript reads in
22
1 total transcript reads forms, and the detailed information on the PSI calculation as
2 reported study (32).
3 RNA splicing genes (GO: 0008380) showing significant changes in expression
4 levels were predicted to be correlated with the differential PSI level of ASEs between
5 LGGs with lower (1st quarter) and higher (4th quarter) risk scores. We calculated the
6 Pearson correlation for each RNA splicing gene-ASE pair, RNA splicing gene-ASE pair
7 with correlation coefficients greater than 0.4 or less than -0.4, and the corresponding P
8 value less than 0.05 were considered significantly correlated (Bonferroni correction
9 was performed to adjust the P value for multiple comparisons). The predicted
10 regulatory network was visualized with Cytoscape (http://www.cytoscape.org).
11 Cell lines
12 Human glioma cell line LN229 cell was purchased from the ATCC (Manassas, VA). We
13 have performed the reauthentication by short tandem repeat analysis in July 2017 in our
14 laboratory. Cells were routinely cultured in Dulbecco’s modified Eagle’s medium
15 supplemented with 10% fetal bovine serum (HyClone, Logan, Utah), 100 units/ml
16 penicillin and 100 mg/ml streptomycin (Invitrogen, Carlsbad, CA) at 37 ℃ in a
17 humidified atmosphere of 5% CO2 as our previously reported study (7).
18 Small interference RNA (siRNA), quantitative PCR
19 The scrambled small interfering RNA (siRNA) and KIF2C specific siRNA were
20 synthesized from the Genepharma Corporation (Suzhou, Jiangsu, China). The
21 sequences for TTF2 siRNA as follow: TTF2 siRNA1 5′-GGC CCU CAG GUA AUG
22 CUA ATT-3′; TTF2 siRNA2 5′- CCA AGA UCA CGU UCA UGC ATT-3′; TTF2
23
1 siRNA3 5′- GGA AAG AGC UUC UAC GUG UTT-3′.
2 The quantitative PCR was performed by the SYBR® Select Master Mix (Thermo
3 Fisher, Waltham, MA) in ABI 7500 (Thermo Fisher). The sequences for the primers of
4 TTF2 were 5′-ATT TAC CGA GTA GGG CAG CA-3′ (forward primer) and 5′- GGA
5 CTC TGA GGT CAG CCA AG-3′ (reverse primer). The sequences for the primers of
6 glyceraldehyde 3-phosphate dehydrogenase were 5′-GGT GGT CTC CTC TGA CTT
7 CAA CA-3′ (forward primer) and 5′-GTT GCT GTA GCC AAA TTC GTT GT-3′
8 (reverse primer).
9 Transcriptome sequencing
10 The total RNA of LN229 cells were extracted by the Trizol reagent (Thermo Fisher,
11 Waltham, MA). Then the RNA sequencing library and sequencing were finished by the
12 Novogene (Beijing, China) using Illumina Novaseq 6000 System (Illumina, San Diego,
13 CA, USA). After quality control and data mapping, quantitative analysis of gene
14 expression was finished by the package of subread. Differential analysis of gene
15 expression was performed by the DESeq2 (54). The quantitative RNA expression data
16 of genes in LN229 with or without TTF2 knock-down had been uploaded to the figshare
17 website (https://figshare.com/), and the DOI is 10.6084/m9.figshare.9164399.
18 Statistics
19 Patients were divided into “high-risk” and “low-risk” groups using the median risk
20 score as the cutoff value in both the training and validation datasets. A nonparametric
21 test was used to compare the distribution of age between the two risk groups, and Chi-
22 square tests were used to compare the distribution of other clinicopathological features.
24
1 One-way ANOVA test was performed to compare the risk scores in patients grouped
2 by WHO 2016 subgroup or combination of TERT promoter and IDH status. Two tailed
3 Student’s t test was performed to compare the risk scores in patients grouped by other
4 clinical or molecular-pathological characteristics, and a P value less than 0.05 was
5 considered significant.
6 The prediction efficiency of the signature risk scores, age, WHO grade, and
7 subgroups according to WHO 2016 classification for 3-year survival and 5-year
8 survival were examined using ROC curves. Univariate and multivariate Cox regression
9 analysis was performed to determine the prognostic value of the risk score and various
10 clinical and molecular-pathological characteristics.
11 The Kaplan–Meier method with a two-sided log-rank test was used to compare the
12 OS of patients in the different RNA processing clusters or in the high- and low-risk
13 groups. All statistical analyses were conducted using R v3.4.1 (https://www.r-
14 project.org/), SPSS 16.0 (SPSS, Inc., Chicago, IL, USA) and Prism 7 (GraphPad
15 Software, Inc., La Jolla, CA, USA).
16 Study approval
17 This retrospective study was approved and reviewed by the institutional review board
18 of Beijing Tiantan Hospital, and the informed consent was waived.
19
20 Author Contributions
21 R.C.C. and Y.M.L.: study design, data analysis and manuscript drafting; R.C.C and
22 Y.Z.C performed the cell biological experiments and analyzed transcriptional data.
25
1 K.N.Z., Y.Z.C., Y.Q.L., Z.Z.: data illustration; Z.L.W, Y.H.C., G.Z.L., K.Y.W.: data
2 acquisition; F.W. and Y.Z.W.: study design and supervision.
3
4 Acknowledgments
5 This study was supported by the National Natural Science Foundation of China
6 (81773208, 81402052), the Beijing Nova Program (Z16110004916082), and the
7 National Key Research and Development Plan (2016YFC0902500).
8
9 Abbreviations:
10 The abbreviations used in the manuscript are listed in the Supplementary Table 15.
26
1 References:
2 1. Jiang T, Mao Y, Ma W, Mao Q, You Y, Yang X, Jiang C, Kang C, Li X, Chen L, et al. CGCG 3 clinical practice guidelines for the management of adult diffuse gliomas. Cancer Lett. 4 2016;375(2):263-73. 5 2. Ostrom QT, Gittleman H, Xu J, Kromer C, Wolinsky Y, Kruchko C, and Barnholtz-Sloan JS. 6 CBTRUS Statistical Report: Primary Brain and Other Central Nervous System Tumors 7 Diagnosed in the United States in 2009-2013. Neuro-oncology. 2016;18(suppl_5):v1-v75. 8 3. Louis DN, Perry A, Reifenberger G, von Deimling A, Figarella-Branger D, Cavenee WK, 9 Ohgaki H, Wiestler OD, Kleihues P, and Ellison DW. The 2016 World Health Organization 10 Classification of Tumors of the Central Nervous System: a summary. Acta 11 neuropathologica. 2016;131(6):803-20. 12 4. Chai RC, Liu YQ, Zhang KN, Wu F, Zhao Z, Wang KY, Jiang T, and Wang YZ. A novel 13 analytical model of MGMT methylation pyrosequencing offers improved predictive 14 performance in patients with gliomas. Modern pathology : an official journal of the United 15 States and Canadian Academy of Pathology, Inc. 2019;32(1):4-15. 16 5. Baysan M, Woolard K, Cam MC, Zhang W, Song H, Kotliarova S, Balamatsias D, Linkous A, 17 Ahn S, Walling J, et al. Detailed longitudinal sampling of glioma stem cells in situ reveals 18 Chr7 gain and Chr10 loss as repeated events in primary tumor formation and recurrence. 19 International journal of cancer. 2017;141(10):2002-13. 20 6. Marcelino Meliso F, Hubert CG, Favoretto Galante PA, and Penalva LO. RNA processing 21 as an alternative route to attack glioblastoma. Human genetics. 2017;136(9):1129-41. 22 7. Chai RC, Zhang KN, Chang YZ, Wu F, Liu YQ, Zhao Z, Wang KY, Chang YH, Jiang T, and 23 Wang YZ. Systematically characterize the clinical and biological significances of 1p19q 24 genes in 1p/19q non-codeletion glioma. Carcinogenesis. 2019. 25 8. Consortium G. Glioma through the looking GLASS: molecular evolution of diffuse gliomas 26 and the Glioma Longitudinal Analysis Consortium. Neuro-oncology. 2018;20(7):873-84. 27 9. Pekmezci M, Rice T, Molinaro AM, Walsh KM, Decker PA, Hansen H, Sicotte H, Kollmeyer 28 TM, McCoy LS, Sarkar G, et al. Adult infiltrating gliomas with WHO 2016 integrated 29 diagnosis: additional prognostic roles of ATRX and TERT. Acta neuropathologica. 30 2017;133(6):1001-16. 31 10. Aibaidula A, Chan AK, Shi Z, Li Y, Zhang R, Yang R, Li KK, Chung NY, Yao Y, Zhou L, et al. 32 Adult IDH wild-type lower-grade gliomas should be further stratified. Neuro-oncology. 33 2017;19(10):1327-37. 34 11. Aoki K, Nakamura H, Suzuki H, Matsuo K, Kataoka K, Shimamura T, Motomura K, Ohka F, 35 Shiina S, Yamamoto T, et al. Prognostic relevance of genetic alterations in diffuse lower- 36 grade gliomas. Neuro-oncology. 2018;20(1):66-77. 37 12. Eckel-Passow JE, Lachance DH, Molinaro AM, Walsh KM, Decker PA, Sicotte H, Pekmezci 38 M, Rice T, Kosel ML, Smirnov IV, et al. Glioma Groups Based on 1p/19q, IDH, and TERT 39 Promoter Mutations in Tumors. The New England journal of medicine. 40 2015;372(26):2499-508. 41 13. Gleize V, Alentorn A, Connen de Kerillis L, Labussiere M, Nadaradjane AA, Mundwiller E, 42 Ottolenghi C, Mangesius S, Rahimian A, Ducray F, et al. CIC inactivating mutations identify 43 aggressive subset of 1p19q codeleted gliomas. Annals of neurology. 2015;78(3):355-74.
27
1 14. Kamoun A, Idbaih A, Dehais C, Elarouci N, Carpentier C, Letouze E, Colin C, Mokhtari K, 2 Jouvet A, Uro-Coste E, et al. Integrated multi-omics analysis of oligodendroglial tumours 3 identifies three subgroups of 1p/19q co-deleted gliomas. Nature communications. 4 2016;7(11263. 5 15. Thon N, Kunz M, Lemke L, Jansen NL, Eigenbrod S, Kreth S, Lutz J, Egensperger R, Giese 6 A, Herms J, et al. Dynamic 18F-FET PET in suspected WHO grade II gliomas defines distinct 7 biological subgroups with different clinical courses. International journal of cancer. 8 2015;136(9):2132-45. 9 16. Hu X, Martinez-Ledesma E, Zheng S, Kim H, Barthel F, Jiang T, Hess KR, and Verhaak RGW. 10 Multigene signature for predicting prognosis of patients with 1p19q co-deletion diffuse 11 glioma. Neuro-oncology. 2017;19(6):786-95. 12 17. Ames HM, Yuan M, Vizcaino MA, Yu W, and Rodriguez FJ. MicroRNA profiling of low- 13 grade glial and glioneuronal tumors shows an independent role for cluster 14q32.31 14 member miR-487b. Modern pathology : an official journal of the United States and 15 Canadian Academy of Pathology, Inc. 2017;30(2):204-16. 16 18. Rao SA, Santosh V, and Somasundaram K. Genome-wide expression profiling identifies 17 deregulated miRNAs in malignant astrocytoma. Modern pathology : an official journal of 18 the United States and Canadian Academy of Pathology, Inc. 2010;23(10):1404-17. 19 19. Jaffrey SR, and Kharas MG. Emerging links between m(6)A and misregulated mRNA 20 methylation in cancer. Genome medicine. 2017;9(1):2. 21 20. Coppin L, Leclerc J, Vincent A, Porchet N, and Pigny P. Messenger RNA Life-Cycle in 22 Cancer Cells: Emerging Role of Conventional and Non-Conventional RNA-Binding 23 Proteins? International journal of molecular sciences. 2018;19(3). 24 21. Geles KG, Zhong W, O'Brien SK, Baxter M, Loreth C, Pallares D, and Damelin M. 25 Upregulation of RNA Processing Factors in Poorly Differentiated Lung Cancer Cells. 26 Translational oncology. 2016;9(2):89-98. 27 22. Correa BR, de Araujo PR, Qiao M, Burns SC, Chen C, Schlegel R, Agarwal S, Galante PA, 28 and Penalva LO. Functional genomics analyses of RNA-binding proteins reveal the 29 splicing regulator SNRPB as an oncogenic candidate in glioblastoma. Genome biology. 30 2016;17(1):125. 31 23. Kechavarzi B, and Janga SC. Dissecting the expression landscape of RNA-binding proteins 32 in human cancers. Genome biology. 2014;15(1):R14. 33 24. Chai RC, Wu F, Wang QX, Zhang S, Zhang KN, Liu YQ, Zhao Z, Jiang T, Wang YZ, and 34 Kang CS. m(6)A RNA methylation regulators contribute to malignant progression and 35 have clinical prognostic impact in gliomas. Aging. 2019;11(4):1204-25. 36 25. Yuan M, Eberhart CG, and Kai M. RNA binding protein RBM14 promotes radio-resistance 37 in glioblastoma by regulating DNA repair and cell differentiation. Oncotarget. 38 2014;5(9):2820-6. 39 26. Lefave CV, Squatrito M, Vorlova S, Rocco GL, Brennan CW, Holland EC, Pan YX, and 40 Cartegni L. Splicing factor hnRNPH drives an oncogenic splicing switch in gliomas. The 41 EMBO journal. 2011;30(19):4084-97. 42 27. Fontana L, Rovina D, Novielli C, Maffioli E, Tedeschi G, Magnani I, and Larizza L. Suggestive 43 evidence on the involvement of polypyrimidine-tract binding protein in regulating 44 alternative splicing of MAP/microtubule affinity-regulating kinase 4 in glioma. Cancer Lett.
28
1 2015;359(1):87-96. 2 28. Su R, Dong L, Li C, Nachtergaele S, Wunderlich M, Qing Y, Deng X, Wang Y, Weng X, Hu 3 C, et al. R-2HG Exhibits Anti-tumor Activity by Targeting FTO/m(6)A/MYC/CEBPA 4 Signaling. Cell. 2018;172(1-2):90-105 e23. 5 29. Cui Q, Shi H, Ye P, Li L, Qu Q, Sun G, Sun G, Lu Z, Huang Y, Yang CG, et al. m(6)A RNA 6 Methylation Regulates the Self-Renewal and Tumorigenesis of Glioblastoma Stem Cells. 7 Cell reports. 2017;18(11):2622-34. 8 30. Yu Y, Jiang X, Schoch BS, Carroll RS, Black PM, and Johnson MD. Aberrant splicing of 9 cyclin-dependent kinase-associated protein phosphatase KAP increases proliferation and 10 migration in glioblastoma. Cancer research. 2007;67(1):130-8. 11 31. Hu H, Mu Q, Bao Z, Chen Y, Liu Y, Chen J, Wang K, Wang Z, Nam Y, Jiang B, et al. 12 Mutational Landscape of Secondary Glioblastoma Guides MET-Targeted Trial in Brain 13 Tumor. Cell. 2018. 14 32. Ryan M, Wong WC, Brown R, Akbani R, Su X, Broom B, Melott J, and Weinstein J. 15 TCGASpliceSeq a compendium of alternative mRNA splicing in cancer. Nucleic acids 16 research. 2016;44(D1):D1018-22. 17 33. Shirahata M, Ono T, Stichel D, Schrimpf D, Reuss DE, Sahm F, Koelsche C, Wefers A, 18 Reinhardt A, Huang K, et al. Novel, improved grading system(s) for IDH-mutant astrocytic 19 gliomas. Acta neuropathologica. 2018;136(1):153-66. 20 34. Brat DJ, Aldape K, Colman H, Holland EC, Louis DN, Jenkins RB, Kleinschmidt-DeMasters 21 BK, Perry A, Reifenberger G, Stupp R, et al. cIMPACT-NOW update 3: recommended 22 diagnostic criteria for "Diffuse astrocytic glioma, IDH-wildtype, with molecular features of 23 glioblastoma, WHO grade IV". Acta neuropathologica. 2018. 24 35. Pomeroy SL, Tamayo P, Gaasenbeek M, Sturla LM, Angelo M, McLaughlin ME, Kim JY, 25 Goumnerova LC, Black PM, Lau C, et al. Prediction of central nervous system embryonal 26 tumour outcome based on gene expression. Nature. 2002;415(6870):436-42. 27 36. Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, Kingsmore SF, Schroth GP, 28 and Burge CB. Alternative isoform regulation in human tissue transcriptomes. Nature. 29 2008;456(7221):470-6. 30 37. Wan L, Yu W, Shen E, Sun W, Liu Y, Kong J, Wu Y, Han F, Zhang L, Yu T, et al. SRSF6- 31 regulated alternative splicing that promotes tumour progression offers a therapy target 32 for colorectal cancer. Gut. 2017. 33 38. Li S, Hu Z, Zhao Y, Huang S, and He X. Transcriptome-Wide Analysis Reveals the 34 Landscape of Aberrant Alternative Splicing Events in Liver Cancer. Hepatology. 2018. 35 39. Babic I, Anderson ES, Tanaka K, Guo D, Masui K, Li B, Zhu S, Gu Y, Villa GR, Akhavan D, et 36 al. EGFR mutation-induced alternative splicing of Max contributes to growth of glycolytic 37 tumors in brain cancer. Cell metabolism. 2013;17(6):1000-8. 38 40. Safaee M, Fakurnejad S, Bloch O, Clark AJ, Ivan ME, Sun MZ, Oh T, Phillips JJ, and Parsa 39 AT. Proportional upregulation of CD97 isoforms in glioblastoma and glioblastoma- 40 derived brain tumor initiating cells. PloS one. 2015;10(2):e0111532. 41 41. Yu F, and Fu WM. Identification of differential splicing genes in gliomas using exon 42 expression profiling. Molecular medicine reports. 2015;11(2):843-50. 43 42. Fu Y, Dominissini D, Rechavi G, and He C. Gene expression regulation mediated through 44 reversible m(6)A RNA methylation. Nature reviews Genetics. 2014;15(5):293-306.
29
1 43. Dao Trong P, Rosch S, Mairbaurl H, Pusch S, Unterberg A, Herold-Mende C, and Warta R. 2 Identification of a Prognostic Hypoxia-Associated Gene Set in IDH-Mutant Glioma. 3 International journal of molecular sciences. 2018;19(10). 4 44. Zhang L, He L, Lugano R, Roodakker K, Bergqvist M, Smits A, and Dimberg A. IDH 5 mutation status is associated with distinct vascular gene expression signatures in lower- 6 grade gliomas. Neuro-oncology. 2018;20(11):1505-16. 7 45. Kim SA, Kwak J, Nam HY, Chun SM, Lee BW, Lee HJ, Khang SK, and Kim SW. Promoter 8 methylation of WNT inhibitory factor-1 and expression pattern of WNT/beta-catenin 9 pathway in human astrocytoma: pathologic and prognostic correlations. Modern 10 pathology : an official journal of the United States and Canadian Academy of Pathology, 11 Inc. 2013;26(5):626-39. 12 46. Ius T, Ciani Y, Ruaro ME, Isola M, Sorrentino M, Bulfoni M, Candotti V, Correcig C, 13 Bourkoula E, Manini I, et al. An NF-kappaB signature predicts low-grade glioma prognosis: 14 a precision medicine approach based on patient-derived stem cells. Neuro-oncology. 15 2018;20(6):776-87. 16 47. Li J, Liang R, Song C, Xiang Y, and Liu Y. Prognostic and clinicopathological significance 17 of long non-coding RNA in glioma. Neurosurgical review. 2018. 18 48. Huang SP, Chang YC, Low QH, Wu ATH, Chen CL, Lin YF, and Hsiao M. BICD1 expression, 19 as a potential biomarker for prognosis and predicting response to therapy in patients with 20 glioblastomas. Oncotarget. 2017;8(69):113766-91. 21 49. Yi GZ, Xiang W, Feng WY, Chen ZY, Li YM, Deng SZ, Guo ML, Zhao L, Sun XG, He MY, et 22 al. Identification of Key Candidate Proteins and Pathways Associated with Temozolomide 23 Resistance in Glioblastoma Based on Subcellular Proteomics and Bioinformatical Analysis. 24 BioMed research international. 2018;2018(5238760. 25 50. Visvanathan A, Patil V, Arora A, Hegde AS, Arivazhagan A, Santosh V, and Somasundaram 26 K. Essential role of METTL3-mediated m(6)A modification in glioma stem-like cells 27 maintenance and radioresistance. Oncogene. 2018;37(4):522-33. 28 51. Lei W, Wang ZL, Feng HJ, Lin XD, Li CZ, and Fan D. Long non-coding RNA 29 SNHG12promotes the proliferation and migration of glioma cells by binding to HuR. 30 International journal of oncology. 2018;53(3):1374-84. 31 52. Ceccarelli M, Barthel FP, Malta TM, Sabedot TS, Salama SR, Murray BA, Morozova O, 32 Newton Y, Radenbaugh A, Pagnotta SM, et al. Molecular Profiling Reveals Biologically 33 Discrete Subsets and Pathways of Progression in Diffuse Glioma. Cell. 2016;164(3):550-63. 34 53. Bovelstad HM, Nygard S, Storvold HL, Aldrin M, Borgan O, Frigessi A, and Lingjaerde OC. 35 Predicting survival from microarray data--a comparative study. Bioinformatics. 36 2007;23(16):2080-7. 37 54. Love MI, Huber W, and Anders S. Moderated estimation of fold change and dispersion 38 for RNA-seq data with DESeq2. Genome biology. 2014;15(12):550. 39
30
1 Figure legends
2 Figure 1. The workflow of the current study.
3 4
31
1 Figure 2. Prognosis-associated RNA processing genes’ expression profile in LGGs.
2
3 (A) Heatmap showing the expression pattern of the 276 RNA processing genes
4 associated with patient survival. (B) GO biological process terms enriched among the
5 276 survival-associated RNA processing genes. (C) The 19 genes included in the
6 signature, their hazard ratios (HR), and 95% confidence intervals (CI) by univariate
7 Cox regression analysis, and the coefficients by multivariate Cox regression analysis
8 using LASSO. (D–I) Kaplan–Meier overall survival (OS) curves for patients from
9 TCGA (D–F) and CGGA (G–I) datasets with total lower grade glioma (D and G),
10 oligodendroglioma with IDH-mutant and 1p/19q codeletion (E and H), Astrocytoma
32
1 with IDH-mutant or IDH-wild-type (F and I). All of the sample numbers (n) and P
2 values are labeled in the figure.
3
33
1 Figure 3. Comparisons of survival predictive efficiencies between the risk score and
2 clinicopathological characteristics.
3
4 ROC curves showed the predictive efficiencies of risk scores, age, subgroups according
5 to WHO 2016 classification, and WHO grade on 3-year and 5-year survival in TCGA
6 (A, n=143 and B, n=111) and CGGA (C, n=121 and D, n=84) datasets.
7
34
1 Figure 4. Alternative splicing profile analysis in LGGs with lower or higher risk scores.
2
3 (A) Differentially spliced events in 446 LGGs patients with increased risk score. Bars
4 indicate the proportion of each ASE type. Points indicate the number of differentially
5 spliced events in each patient. The IDH status and 1p/19q codeletion status of these
6 LGGs are also presented. (B) The absolute numbers of all alternative spliced events
7 were compared in LGGs with lower (n=111) or higher risk (n=111) score.
8 ****P < 0.0001.
9
35
1 Figure 5. Differential ASE in LGGs with lower or higher risk scores.
2
3 (A) heatmaps showing the expression level of RNA splicing genes (upper panel) and
4 PSI of ASEs with significant differences between LGG groups with lower and higher
5 risk score. (B) representative ASEs with differential PSI between LGG groups with
6 lower and higher risk score. (C) GO_BP terms of spliced genes with differential PSI
7 between LGGs with lower and higher risk scores.
8
36
1 Figure 6. The ASE networks and RNA splicing genes.
2 3 (A) Labeled circles in the center represent RNA splicing genes. Red circles indicate up-
4 regulated RNA splicing genes in LGGs with higher risk score, while green circles
5 indicate down-regulated splicing genes. Colored circles connected to splicing genes by
6 red or blue lines are distinct types of differential ASEs. The red connecting lines
7 represent positive correlation, while blue connecting lines represent negative
8 correlation. (B) The numbers of ASE significantly correlated to up-regulated (upper
9 panel) or down-regulated (lower panel) RNA splicing genes are summarized.
10
37
1 Figure 7. Functions of genes correlated with risk scores.
2 3 (A–B) Functional analysis of genes positively (red bar chart) or negatively (green bar
4 chart) correlated with the risk score using GO terms of biological processes (A) and
5 KEGG pathway (B).
6
38
1 Figure 8. GSEA analysis of genes correlated with the risk scores.
2
3 (A–H) GSEA revealed the hallmarks of malignant tumors positively correlated with
4 LGGs with high risk scores.
5
39
1 Figure 9. Transcriptome changes of LN229 cells after knock-down mRNA expression
2 of TTF2.
3
4 (A) the mRNA expression changes of TTF2 at 48 hr after transfection of TTF2 specific
5 siRNA. ****P<0.0001, n=3. (B) The volcano figure of deferentially expression genes
6 between TTF2 siRNA (n=3) and Scramble siRNA (n=3). (C–D) Functional analysis of
7 genes down-regulated (red bar chart) or up-regulated (green bar chart) after knock-
8 down mRNA expression of TTF2 using GO terms of biological processes (A) and
9 KEGG pathway (B).
40
1 Table1. Univariate and multivariate Cox regression in TCGA (training) and CGGA (validation) data set Univariate cox analysis Multivariate cox analysis HR (95% CI) HR (95% CI) P-value HR Lower Higher P-value HR Lower Higher Grade (II vs III) 3.86E-06 0.306 0.185 0.505 2.19E-01 0.698 0.394 1.239 Age (Age<40 vs Age≥40) 2.31E-12 1.068 1.049 1.088 9.53E-06 1.048 1.026 1.07 Gender (female vs male) 8.78E-01 1.036 0.662 1.621 IDH (mutant vs wildtype) 4.49E-14 0.161 0.1 0.258 8.15E-01 0.902 0.382 2.129 1p/19q (codel vs non codel) 2.95E-03 0.425 0.241 0.747 2.42E-01 0.68 0.356 1.297 The TCGA dataset WHO 2016 subgroups Oligo, 1p/19q codel, IDH-mutant Reference Astro, IDH-mutant 2.71E-01 1.412 0.764 2.61 Astro, IDH-wildtype 2.04E-10 7.754 4.124 14.579 MGMT (methylation vs unmethylation) 1.52E-03 0.444 0.269 0.733 3.38E-01 1.401 0.703 2.789 Risk score (high vs low) 4.85E-26 4.7 3.526 6.265 6.68E-10 3.589 2.392 5.384
Grade (II vs III) 9.97E-09 0.166 0.09 0.307 2.40E-03 0.578 0.406 0.823 Increasing age 9.21E-04 1.048 1.019 1.078 1.19E-01 1.024 0.994 1.055 Gender (female vs male) 7.83E-01 0.923 0.521 1.634 IDH (mutant vs wildtype) 2.36E-06 0.253 0.143 0.447 9.05E-01 25.541 2.04E-22 3.19E+24 1p/19q (codel vs non codel) 8.08E-03 0.147 0.036 0.607 9.01E-01 0.001 7.55E-50 1.82E+43 WHO 2016 subgroups The CGGA dataset Oligo, 1p/19q codel, IDH-mutant Reference Astro, IDH-mutant 4.41E-02 4.426 1.04 18.833 Astro, IDH-wildtype 2.19E-04 15.542 3.629 66.568
MGMT (methylatation vs unmethylation) 5.30E-03 0.249 0.094 0.662 4.64E-01 0.787 0.416 1.492
Risk score (high vs low) 1.13E-13 11.578 6.065 22.1 2.23E-05 7.554 2.967 19.233 2 IDH: isocitrate dehydrogenase; codel: codeletion; Oligo: oligodendrocytoma; Astro: astrocytoma. 41
1 Supplementary Figures:
2
3 Supplementary Figure 1. Ten-fold cross validation for tuning parameter selection in the LASSO
4 model. LASSO coefficient profiles of the 1203 1p19q genes. The minimum criteria are indicated
5 by the dashed vertical line (left).
6
42
1 2 Supplementary Figure 2. Prognostic value of the risk signature in astrocytoma. Kaplan–Meier
3 overall survival (OS) curves for patients from TCGA (A-B) and CGGA (C-D) datasets with
4 astrocytoma with IDH-mutant or IDH-wildtype. The P value and sample numbers (n) are labeled
5 on the figure.
6
43
1
2 Supplementary Figure 3. Prognostic value of the risk signature in different races of TCGA dataset.
3 (A) The composition of races in TCGA dataset. (B-D) Kaplan–Meier overall survival (OS) curves
4 for White (B), Black or African American (C), and Asian population in TCGA (D), respectively.
5 The P value and sample numbers (n) are labeled on the figure.
6
44
1 2
3 Supplementary Figure 4. Distribution of risk scores in patients stratified by clinicopathological
4 and gene expression characteristics. (A) Heatmap showing the expression level of the 19 signature
5 genes in low-risk and high-risk LGGs. Expression levels are indicated by the color bar to the right
6 ranging from blue (no/low) to red (high) expression. The distribution of clinicopathological
7 features was also compared between the low- and high-risk groups. (B–H) Distribution of risk
8 scores for patients in TCGA dataset stratified by WHO grade (B) WHO 2016 subgroup (C), IDH
9 status (D), 1p/19q codeletion status (E), MGMT promoter methylation status (F), age (G), and
10 TERT promoter (H). **P < 0.01, ***P < 0.001, and ****P < 0.0001.
11
45
1
2 Supplementary Figure 5. Distribution of risk score in LGGs stratified by different pathological
3 features in CGGA dataset. (A–G) Distribution of risk scores for patients in the CGGA dataset
4 stratified by WHO grade (A), IDH status (B), 1p/19q codeletion status (C), MGMT promoter
5 methylation status (D), age (E), WHO 2016 subgroup (F), and TERT promoter (G). *P < 0.05,
6 **P < 0.01, ***P < 0.001, and ****P < 0.0001.
7
46
1 2 Supplementary Figure 6. Distribution of percent spliced in (PSI) levels for ASEs in LGGs. (A)
3 Fraction of events of distinct PSI levels in total LGGs, LGGs with lower and higher risk score.
4 (B–H) Numbers of events of distinct PSI in total LGGs (n = 446), LGGs with lower (n = 111), and
5 higher (n = 111) risk score. AA: alternate acceptor site; AD: alternate donor site; AP: alternate
6 promoter; AT: alternate terminator; ES: exon skip; ME: mutually exclusive exons; RI: retained
7 intron.
8
47
1
2 Supplementary Figure 7. General information for ASE with differential PSI between LGG with
3 lower and higher risk scores. (A–B) Total number of changed spliced genes and ASEs (A) and the
4 fraction of changed ASE types (B) between LGG groups with lower and higher risk scores are
5 summarized.
6
48
1
2 Supplementary Figure 8. Functions of spliced genes with differentially PSI between gliomas
3 with lower (n=111) and higher (n=111) risk scores. (A–B) functional analysis of spliced genes with
4 decreased PSI (A) or increased PSI (B) in LGGs with higher risk score.
49