RNA processing characterize RNA splicing and further stratify lower-grade glioma

Rui-Chao Chai, … , Fan Wu, Yong-Zhi Wang

JCI Insight. 2019. https://doi.org/10.1172/jci.insight.130591.

Clinical Medicine In-Press Preview Genetics Oncology

Graphical abstract

Find the latest version: https://jci.me/130591/pdf 1 RNA processing genes characterize RNA splicing and

2 further stratify lower-grade glioma 3 Authors: Rui-Chao Chai1,2,3*, Yi-Ming Li1,2,4*, Ke-Nan Zhang1,2, Yu-Zhou Chang4, 4 Yu-Qing Liu1,2, Zheng Zhao1,2, Zhi-Liang Wang 1,2, Yuan-Hao Chang1,2, Guan-Zhang 5 Li1,2, Kuan-Yu Wang1,2, Fan Wu1,2#, Yong-Zhi Wang 1,2,3,4# 6 7 1Department of Molecular Neuropathology, Beijing Neurosurgical Institute, Capital 8 Medical University, Beijing, China; 9 2Chinese Glioma Cooperative Group (CGCG), Beijing, China; 10 3China National Clinical Research Center for Neurological Diseases; 11 4Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, 12 Beijing, China; 13 14 * Rui-Chao Chai and Yi-Ming Li contributed equally to this work. 15 16 # Corresponding authors: 17 1. Yong-Zhi Wang 18 Beijing Neurosurgical Institute, Beijing Tiantan Hospital, Capital Medical University, 19 No. 119 South Fourth Ring West Road, 20 Beijing 100070, China 21 Tel/Fax: +861067096794 22 E-mail: [email protected] 23 2. Fan Wu 24 Beijing Neurosurgical Institute, Beijing Tiantan Hospital, Capital Medical University, 25 No. 119 South Fourth Ring West Road, 26 Beijing 100070, China 27 Tel/Fax: +861067096794 28 E-mail: [email protected] 29 30 Keywords: RNA processing, lower-grade gliomas, clinical outcomes, alternative 31 splicing, TTF2 32 33 Conflict of Interest: The authors declare no conflict of interest. 34 35

1

1 Abstract

2 BACKGROUND. Aberrant expression of RNA processing genes may drive the

3 alterative RNA profile in lower-grade gliomas (LGGs). Thus, we aimed to further

4 stratify LGGs based on the expression of RNA processing genes.

5 METHODS. This study included 446 LGGs from The Cancer Genome Atlas (TCGA,

6 training set) and 171 LGGs from the Chinese Glioma Genome Atlas (CGGA, validation

7 set). The least absolute shrinkage and selection operator (LASSO) Cox regression

8 algorithm was conducted to develop a risk-signature. The receiver operating

9 characteristic (ROC) curves and Kaplan–Meier curves were used to study the prognosis

10 value of the risk-signature.

11 RESULTS. Among the tested 784 RNA processing genes, 276 were significantly

12 correlated with the OS of LGGs. Further LASSO Cox regression identified a 19-

13 risk-signature, whose risk score was also an independent prognosis factor (P<0.0001,

14 multiplex Cox regression) in the validation dataset. The signature had better prognostic

15 value than the traditional factors “age”, “grade” and “WHO 2016 classification” for 3‐

16 and 5‐year survival both two datasets (AUCs > 85%). Importantly, the risk-signature

17 could further stratify the survival of LGGs in specific subgroups of WHO 2016

18 classification. Furthermore, alternative splicing events for genes such as EGFR and

19 FGFR were found to be associated with the risk score. mRNA expression levels for

20 genes, which participated in cell proliferation and other processes, were significantly

21 correlated to the risk score.

22 CONCLUSIONS. Our results highlight the role of RNA processing genes for further

2

1 stratifying the survival of patients with LGGs and provide insight into the alternative

2 splicing events underlying this role.

3 FOUNDING. Supported by the National Natural Science Foundation of China

4 (81773208, 81402052), the Beijing Nova Program (Z16110004916082), and the

5 National Key Research and Development Plan (2016YFC0902500).

6

3

1 Introduction

2 Diffuse glioma is the most frequent primary malignant brain tumors in adults (1,

3 2). Although intensive treatment with surgery, radiotherapy, and chemotherapy are

4 conducted, nearly all gliomas relapse. The median overall survival (OS) of patients with

5 glioblastoma (GBM, WHO grade IV) is only approximately 15 months (1, 3, 4).

6 Patients with lower-grade diffuse glioma (LGG, WHO grade II/III) have a relatively

7 favorable prognosis, but these tumors evolve and most relapse as therapy-resistant

8 higher grade gliomas over time (5-7). LGGs can be classified as WHO grade II or III

9 based on histological features or as oligodendroglioma with isocitrate dehydrogenase

10 (IDH)-mutant and 1p/19q codeletion, astrocytoma with IDH-mutant, and astrocytoma

11 with IDH-wildtype according to the WHO 2016 integrated analysis (1, 3, 8). However,

12 this classification still does not fully reflect the rate of LGG evolution, and for the OS

13 of patients with LGGs in a stratified subgroup is rather heterogeneous (7, 8). Thus,

14 further stratification of LGGs using various methods (9-15), including RNA expression

15 profiles (11, 14, 16-18), has attracted attention.

16 A dysregulated RNA expression profile is an important hallmark of tumors (19).

17 RNA processing factors, such as RNA-binding and RNA methylation

18 regulators, are the main factors controlling the life-cycle of RNA in cells and thus play

19 vital roles in numerous diseases and cancers by determining the final content and

20 isoforms of mature RNA transcripts (6, 20-24). However, studies on RNA processing

21 factors in gliomas have mainly focused on GBM, and the roles of RNA processing

22 factors in stem cell maintenance, cell adhesion, cell migration, cell growth, colony

23 formation, and invasiveness have been demonstrated in GBM (6, 20, 25-29). Recently,

24 some RNA processing factors, such as HnRNPH1, PTB, SNRP1, and ELAVL, which

25 are evaluated in the progression and prognosis of GBM, also showed prognostic value

26 for all grades of diffuse gliomas (6), indicating their potential role in LGGs.

27 Considering that abnormal expression of RNA processing genes may drive changes in

28 the dysregulated RNA expression profile in glioma, it is important to systematically

29 examine the roles of RNA processing factors in LGGs.

4

1 RNA processing factors also govern the alternative splicing events (ASEs) of

2 individual genes, which can affect several important factors in tumor initiation and

3 progression (30). For instance, our recent study demonstrated the vital role of 14 exon

4 skip of MET in the malignant progression and targeted therapy of secondary GBM (31).

5 The Cancer Genome Atlas (TCGA) SpliceSeq dataset lists seven ASE types (32). This

6 makes it possible to link the dysregulation of RNA processing factors with aberrant

7 ASEs in LGGs.

8 In this study, we systemically analyzed the expression profile of RNA processing

9 genes and their prognostic values in 617 LGGs from TCGA (n = 446) and the Chinese

10 Glioma Genome Atlas (CGGA; n = 171). We also profiled the ASEs underlying LGGs

11 stratified by the risk signature built with RNA processing genes and identified the

12 corresponding functions.

5

1 Results

2 Prognostic value of RNA processing genes and their biological function in LGGs

3 Our approach and workflow for the selection of RNA processing genes, development

4 and validation of a prognostic risk signature, and the analysis of the ASEs and altered

5 RNA expression profile that correlated to the signature are summarized in Figure 1.

6 Of the 784 RNA processing genes tested, 276 were significantly correlated with

7 the OS of LGGs (Supplementary Table 1). A total of 142 of the 276 genes had an HR >1

8 and were considered risk-associated, while the remaining 134 genes had HR <1 and

9 were considered protection-associated. Risk-associated genes showing increasing

10 expression (upper panel) and protection-associated genes showing decreasing

11 expression (lower panel) were associated with increased malignancy in three subgroups

12 according to the WHO 2016 classification (Figure 2A). Some of these genes also

13 showed inconsistent expression levels within the same subgroups, suggesting the

14 potential prognostic value of these genes within specific subgroups.

15 Since these 276 genes are a collection of genes that participated in any process

16 involved in the conversion of one or more primary RNA transcripts into one or more

17 mature RNA molecules. We used (GO) analysis to study the more

18 specific biological processes that these genes are enriched in, and the results indicated

19 that these survival-associated genes were correlated with the terms such as rRNA

20 processing, nuclear-transcribed mRNA catabolic process, nonsense-mediate decay, and

21 mRNA splicing, via spliceosome (Figure 2B). We also further annotated the 142 risk-

22 associated and 134 protection-associated genes through GO and KEGG analysis,

6

1 respectively (Supplementary Table 2 and 3). The results indicated that GO_BP term of

2 mRNA splicing, via spliceosome and KEGG pathway of spliceosome were the most

3 associated terms for the 142 risk-associated genes, and the GO_BP term of rRNA

4 processing and KEGG pathway of ribosome were the most associated terms for the 134

5 protection-associated genes.

6 Identification of a panel of 19 RNA processing genes as a risk signature in LGGs

7 To easily and reliably stratify the outcomes of LGGs with RNA processing genes, we

8 applied the least absolute shrinkage and selection operator (LASSO) Cox regression

9 algorithm to the 276 genes in TCGA dataset (Supplementary Figure 1). A total of 19

10 genes were selected to build the risk signature, and the coefficients and normalized

11 expression levels of these genes were used together to calculate risk scores for both the

12 training dataset (TCGA dataset) and validation dataset (CGGA dataset) (Figure 2C and

13 Supplementary Table 4).

14 We divided patients into high-risk and low-risk groups in various stratified glioma

15 subtypes using their respective median risk score as a cutoff. We found that patients

16 with low-risk scores had significantly longer OS than patients with high-risk scores in

17 all subtypes evaluated in TGCA datasets: all LGGs (Figure 2D), oligodendroglioma

18 with IDH-mutant and 1p/19q codeletion (Figure 2E), astrocytoma with IDH-mutant and

19 IDH-wild-type (Figure 2F, Supplementary Figure 2A–B). Similar results were observed

20 in the CGGA datasets (Figure 2G–I, Supplementary Figure 2C–D).

21 To investigate the prognostic value of the risk signature and other

22 clinicopathological features, univariate and multivariate Cox regression analysis was

7

1 performed both in TCGA (training) and CGGA (validation) data set. The results

2 revealed that high risk score was a prognostic factor in both datasets (P<0.0001),

3 independent of grade, age, gender, WHO 2016 subgroups, IDH mutation, 1p19q

4 codeletion, and MGMT methylation status (Table 1).

5 We also summarized the race composition in datasets used in this study, as a

6 specific race may represent population-specific exposure and risk factor. In the TCGA

7 dataset, 92.00% of cases are White, 3.36% of cases are Black and African American,

8 1.79% of cases are Asian, 0.22% of cases are American Indian or Alaska Native, and

9 the rest 2.02% of cases unknown (Supplementary Figure 3A). All of the cases in the

10 CGGA dataset are Asian. We found that patients with low risk scores had significantly

11 longer OS than patients with high risk scores in White (Supplementary Figure 3B) and

12 Black or African American (Supplementary Figure 3C) population of TGCA dataset.

13 We cannot compare the OS of the 8 cases of Asian in the TCGA dataset, as all of them

14 survived at the end of follow-up (Supplementary Figure 3D). However, the signature

15 could stratify the survival of Asian population, since all the cases in the CGGA dataset

16 are Asian.

17 Collectively, these results strongly support the ability of our risk signature to

18 accurately stratify the prognosis of patients with definite subgroups based on WHO

19 2016 integrated diagnosis.

20

21 The signature risk score is correlated with malignant clinicopathological features

22 of LGGs

8

1 To determine whether the risk signature also reflects the malignant clinicopathological

2 features of LGGs, we determined risk signature expression of 19 genes and

3 clinicopathological features in LGGs with low and high risk in the form of a heatmap

4 (Supplementary Figure 4A). Significant differences were observed between the high-

5 and low-risk groups with respect to the WHO grade (P < 0.001), IDH status (P < 0.001),

6 1p/19q codeletion status (P < 0.001), WHO 2016 subgroups (P < 0.001), MGMT

7 promoter methylation status (P < 0.001), age (P < 0.001), and CIC (P < 0.001)

8 (Supplementary Figure 4A and Supplementary Table 5). We also found that the risk

9 score was significantly different in patients with LGGs stratified by WHO grade,

10 subgroups according to WHO 2016 classification, IDH mutation status, 1p/19q

11 codeletion status, MGMT promoter methylation status, and age (Supplementary Figure

12 4B–G). Moreover, the distributions of risk score between LGGs with and without TERT

13 promoter mutation were highly dependent on the IDH mutant status (Supplementary

14 Figure 4H). The risk score significantly increased in TERT-mutant LGGs with wild-

15 type-IDH (P < 0.01), while the risk score deceased in TERT-mutant LGGs with mutant-

16 IDH (P < 0.01). A similar distribution of risk score was observed in the CGGA dataset

17 (all P values < 0.05, except for TERT status, and P=0.07 for TERT status,

18 Supplementary Figure 5). These results indicated that the signature was associated with

19 the malignant clinicopathological features of gliomas.

20 The receiver operating characteristic (ROC) curve analysis showed that the

21 signature risk score had the best efficiency (compared with age, WHO 2016 subgroups

22 and WHO grade) for predicting the 3-year and 5-year survival of patients with LGGs

9

1 both in TCGA and CGGA datasets (Figure 3). Areas under the curve of risk score, age

2 WHO 2016 subgroups and WHO grade were 93.7%, 81.8%,77.6%,70.2% respectively

3 for 3 year’s survival in TCGA dataset; 88.1%,78.8%, 74.8%, 68.4% respectively for 5

4 year’s survival in TCGA dataset; 85.9%, 64.4%, 76.1%, 78.6% respectively for 3 year’s

5 survival in CGGA dataset; 87.3%, 63.7%, 76.4%, 75.3% respectively for 5 year’s

6 survival in CGGA dataset.

7

8 The signature risk score is closely correlated with the alternative splicing events

9 RNA splicing activities are governed by RNA processing genes, and our findings

10 showed that worse survival-associated RNA processing genes were most enriched in

11 RNA splicing related activities. Thus, we also systematically characterized ASEs in

12 LGGs with different risk scores. In total, tens of thousands of seven ASE types,

13 including alternate acceptor site (AA), alternate donor site (AD), alternate promoter

14 (AP), alternate terminator (AT), exon skip (ES), mutually exclusive exons (ME), and

15 retained intron (RI), were detected in each LGG (Supplementary Figure 6A).

16 Furthermore, the proportion of differential ASE types in LGGs varied widely (from 0.5%

17 to approximately 35%). Although all LGGs shared similar patterns of ASE types (ES

18 and AP were the most frequently observed ASE types, while ME and RI were the least

19 frequently observed ASE types), the total number of detected differential ASEs

20 gradually increased along with the increasing risk score (Figure 4A), and the total

21 number of ASEs was significantly smaller in LGGs with a lower (1st quarter n=111) risk

22 score than with a higher (4th quarter n=111) risk score (Figure 4B).

10

1 Percent Spliced In (PSI) is the ratio of reads indicating the presence of a transcript

2 element versus the total reads covering the event (32). We also analyzed the PSI levels

3 for ASEs in all LGGs and LGGs with lower and higher risk scores. The results revealed

4 that events with lower PSI levels (PSI ≤ 0.2) and higher PSI levels (PSI > 0.8)

5 constituted most types of ASEs in all LGGs and LGGs with lower and higher risk scores

6 (Supplementary Figure 6B–H). These observations indicated that transcripts with high

7 splice-out alternative exons or high splice-in alternative exons were the predominantly

8 transcribed form of most genes.

9 We further identified differentially expressed RNA splicing genes (P < 0.05 and

10 fold-change ≥2 or ≤0.5) and ASEs with significantly different PSI (P < 0.05 and fold-

11 change ≥2 or ≤0.5) in LGGs with lower (1st quarter n=111) and higher (4th quarter n=111)

12 risk scores (Figure 5A, Supplementary Table 6 and Supplementary Table 7). There were

13 257 ASEs for 247 genes with decreased PSI levels in LGGs with higher risk score and

14 604 ASEs for 527 genes with increased PSI levels in LGGs with higher risk scores

15 (Supplementary Figure 7A). The proportion of AP type of ASEs dramatically increased

16 (from 23.14% to 47.7%) in these ASEs with significantly different PSI. In LGGs with

17 higher risk score, AP (45.1%) and ES (26.5%) showed the largest proportion in ASEs

18 with decreased PSI, while AP (49.3%) and AT (25.0%) constituted a larger proportion

19 of ASEs with increased PSI (Supplementary Figure 7B). This suggests that ES

20 transcripts with high splice-out exons are more likely to be present in LGGs with higher

21 risk scores and the AP type of ASEs appeared to be more relevant to the prognosis of

22 LGGs.

11

1 We found that genes involved in the receptor tyrosine kinase signaling pathway

2 (EGFR, FGFR1, and FGFR2), DNA damage response (C5orf45), RNA binding

3 (CIRBP), transcription regulation (CREM), and brain disease development (MAPT)

4 were differentially spliced between LGGs with lower and higher risk scores (Figure

5 5B). To further explore the function of alternative splicing in the malignancy of LGGs,

6 GO analysis was performed for all genes that were differentially spliced in LGGs with

7 lower and higher risk scores. In general, genes with differential PSI were mainly

8 enriched in biological processes, such as “positive regulation of GTPase activity”,

9 “SRP-dependent co-translational targeting to membrane”, “cytoskeleton”,

10 “cytokinesis”, “Transmembrane receptor protein tyrosine kinase signaling pathway”,

11 “cell adhesion”, and others (Figure 5C, Supplementary Figure 8). Our analysis

12 indicated that differential ASEs participate in many cancer-related biological processes,

13 particularly signal transduction pathways, suggesting that ASEs are a critical

14 mechanism underlying the prognostic value of RNA processing genes in LGGs.

15 We further investigated the potential regulatory networks between the significantly

16 changed 28 RNA splicing genes and 862 ASEs (see Methods). Finally, a network of the

17 28 differential RNA splicing genes and 221 of their significantly correlated differential

18 ASEs was constructed (Figure 6A and Supplementary Table 8). The 28 RNA splicing

19 genes regulated distinct amounts of ASEs ranging from 4 to 104 (Figure 6B). For RNA

20 splicing genes whose expression was increased in LGGs with higher risk scores, TTF2

21 and MBNL3 were found to regulate a larger number of ASEs, while GPATCH1 and

22 DHX15 regulated fewer ASEs (Figure 6B upper panel). In RNA splicing genes whose

12

1 expression was decreased in LGGs with higher risk scores, POLR2F and PQBP1

2 regulated a larger number of ASEs, while GPKOW and SAP18 regulated fewer ASEs

3 (Figure 6B lower panel). The proportion of AP type of ASE was still the largest one in

4 these 221 ASEs.

5

6 The function analysis of genes whose mRNA expression are correlated with the

7 signature risk score in LGGs

8 Since RNA processing genes are the main factors controlling the life-cycle of RNA in

9 cells, we also evaluate the RNA expression profile influenced by the differentially

10 expressed RNA processing genes. We identified genes positively (Pearson

11 coefficient >0.5 and Bonferroni corrected P < 0.01) or negatively (Pearson coefficient

12 < -0.5, and Bonferroni corrected P < 0.01) correlated with the risk score and then

13 annotated their functions using GO analysis for biological processes and KEGG

14 pathways analysis (Figure 7A and B). Overall, the results indicated that the risk scores

15 positively correlated genes were mainly enriched in regulation of division

16 and DNA stability, cell division, DNA replication, cell cycle, DNA repair, angiogenesis

17 and other malignancy-related biological processes (Figure 7A). The corresponding

18 pathways could also be observed in the Kyoto Encyclopedia of Genes and Genomes

19 (KEGG) pathway analysis (Figure 7B). Suggesting that the altered expression of RNA

20 processing genes may be associated with the increased expression of genes enriched in

21 these processes. Interesting, we observed that the signature risk scores negatively

22 correlated genes were mainly involved in the GO processes of translation and rRNA

13

1 processing, and the KEGG pathway of ribosome (Figure 7A and B). This finding is

2 consistent with the finding that the GO_BP term of rRNA processing and KEGG

3 pathway of ribosome are the most associated terms for the 134 protection-associated

4 RNA processing genes (Supplementary Table 2).

5 Furthermore, Gene Set Enrichment Analysis (GSEA) revealed that the hallmarks

6 of malignant tumors, including epithelial-mesenchymal transition, angiogenesis, G2M

7 checkpoint, mitotic spindle, inflammatory response, complement, PI3K-AKT-MTOR

8 signaling, and KRAS signaling (Figure 8A–H), as significantly enriched in LGGs with

9 risk scores higher than median risk score. These findings indicate that the risk score of

10 the RNA processing signature reflects the expression alterations of genes involved in

11 malignant biological processes, signaling pathways, and malignant hallmarks in LGGs.

12 RNA expression profile influenced by knock-down of mRNA expression of TTF2

13 To study whether the change of RNA processing gene is a driver or merely correlative

14 to the dysregulated RNA profile. We selected three specific siRNA to knock-down the

15 mRNA expression of TTF2 genes, this gene is risk-associated in the signature (with

16 highest HR), and it also included in the GO term of RNA splicing. The results indicated

17 that all three siRNA could significantly downregulate the mRNA expression of TTF2

18 in glioma cell line LN229 cells, and TTF2 siRNA2 appeared to be the most effective

19 siRNA (Figure 9A).

20 We collected whole transcriptome data of LN229 cells at 48 hr after transfection

21 of TTF2 siRNA2 (n=3) and scramble siRNA (n=3), respectively. Compared with

22 scrambled siRNA group, there were 2676 genes with increased expression and 2710

14

1 genes with decreased expression in the group of TTF2 siRNA, and the mRNA

2 expression of TTF2 gene was also significantly decreased (Figure 9B, Supplementary

3 Table 9). We found that the 2710 genes with decreased expression after knock-down

4 mRNA of TTF2 gene were significantly enriched in the biological processes of

5 chromosome segregation, cell division, cell cycle, mRNA metabolic, and RNA splicing

6 (Figure 9C, Supplementary Table 10), and the KEGG pathway analysis showed these

7 genes also were enriched in the pathways of “cell cycle”, “spliceosome”, “DNA

8 replication”, “RNA transport”, “proteasome”, “mismatch repair”, and others (Figure 9D,

9 Supplementary Table 11). This finding was largely consistent with the result of function

10 analysis of genes whose mRNA expressions were positively correlated to the risk

11 signature, indicating that changes of RNA processing gene TTF2 maybe a driver of

12 these dysregulated RNA profiles in glioma.

13 As mentioned above, functions of genes whose mRNA expressions were

14 negatively correlated to the risk signature mainly enrich in the biological processes of

15 “translation” and “SRP-dependent co-translational protein targeting to membrane”, the

16 KEGG pathways of “Ribosome”. We also noticed that functions of the 2676 up-

17 regulated genes were mainly involved in the protein processing activities, such as the

18 biological processes of “extracellular matrix organization”, “response to endoplasmic

19 reticulum stress” and others, the KEGG pathways of “lysosome”, “protein processing

20 in endoplasmic reticulum” and others (Figure 9C and D, Supplementary Table 12 and

21 13).”

15

1 Discussion

2 In this study, we found that the general expression pattern of RNA processing genes is

3 correlated with the malignancy features of LGGs and identified RNA processing genes

4 significantly associated with the prognosis of LGGs. We further built a risk signature

5 that not only further predicted the prognosis of stratified LGGs but also could perfectly

6 reflect the malignancy-correlated clinicopathological features, biological processes,

7 key signaling pathways, and hallmarks. Moreover, we systemically analyzed ASEs

8 underlying LGGs with lower and higher risk scores and identified the corresponding

9 functions. Our results highlight the role of RNA processing genes in further stratifying

10 the survival of patients with LGGs, and the developed signature shows a potential to

11 become an effective supplement for the integrated diagnosis criteria of WHO 2016 in

12 further stratifying the prognosis of LGGs. Moreover, we also reveal the associated

13 ASEs and biological processing that were associated with the risk scores of the

14 developed signature.

15 Genetic alterations, such as IDH mutation and 1p/19q codeletion, have been

16 included in the integrated diagnosis criteria of WHO 2016 (3). Very large differences

17 in prognosis exist in LGGs within the definite WHO 2016 subgroup (16). Several

18 important biomarkers have been identified to further stratify LGGs, including genetic

19 alterations in CDKN2A/B in astrocytoma with mutant-IDH (11, 33); EGFR

20 amplification, chromosome 7 gain and chromosome 10 loss, and TERT promoter

21 mutation in astrocytoma with wild-type-IDH (11). Importantly, “the EGFR

22 amplification, or chromosome 7 gain and chromosome 10 loss, or TERT promoter

16

1 mutation” had been recommended as the diagnostic criteria for “Diffuse astrocytic

2 glioma, IDH‐wild-type, with molecular features of glioblastoma, WHO grade IV” by

3 the Consortium to Inform Molecular and Practical Approaches to Central Nervous

4 System Tumor Taxonomy (34). Thus, further stratification of LGGs with definite IDH

5 and 1p/19q status is urgently needed.

6 Apart from these genetic alteration markers, RNA expression is also valuable for

7 predicting the outcomes of tumors with low rates of mutation, and the RNA expression

8 profile had become an indispensable element for classifying medulloblastoma (35).

9 Compared to GBM, LGGs have relative lower mutation rates and the RNA expression

10 of a set of genes was suggested to be useful for predicting the prognosis of patients with

11 1p19q co-deletion diffuse glioma (16). In this study, we confirmed the prognostic value

12 of a signature built with 19 RNA processing genes in each stratified subgroup of LGGs

13 (Figure 2D–I). Compared with traditional stratifying factors (age, WHO grade,

14 subgroups of WHO 2016 classification), The risk score of this signature also had the

15 best predictive efficiency on both the 3-year and 5-year survival (Figure 3). Though the

16 risk score of the signature is highly associated with traditional clinicopathological

17 features, we confirmed that the risk score is an independent prognosis factor in both the

18 training and validation dataset (Table 1). A previous study identified 104 key genes that

19 were prognostic for LGGs. However, the average area under the ROC curve (AUC)

20 values ranged from 0.7 to 0.8 (16), which was lower than the performance of our

21 signature (AUC > 0.85). These data indicate that the RNA processing gene signature is

22 a powerful tool for further predicting prognosis of LGGs, which were stratified by

17

1 isocitrate dehydrogenase and 1p/19q status based on the 2016 World Health

2 Organization classification guidelines

3 The ASEs of individual genes which can affect several important players in tumor

4 initiation and progression (6, 30, 36-38). For instance, delta isoform of Max could

5 promote glioma cell proliferation by enhancing functions of MYC in GBM harboring

6 EGFRvIII mutation (39); two of CD97 isoforms, EGF (1,2,5) and EGF (1,2,3,5) could

7 promote growth, migration, metastasis, and angiogenesis in GBM (40); and an aberrant

8 splicing of cyclin-dependent kinase-associated protein phosphatase KAP increases

9 proliferation and migration in glioblastoma (30). Currently, genome-wide analyses of

10 exon expression arrays have begun to reveal the roles of ASEs correlated with the

11 progression and prognosis of GBM (41). However, the role of alterations of ASEs in

12 LGGs is not well-understood. Our study not only reveals the landscape of ASEs in

13 LGGs, but also identified 861 ASEs correlated with the prognosis of LGGs

14 (Supplementary Table 5), although more specific studies are needed to thoroughly

15 determine the functions of these identified ASEs. Moreover, we identified RNA splicing

16 genes correlated with differential ASEs in LGGs with distinct prognosis.

17 RNA plays crucial roles in biological functions in cells not only by passing

18 genetic information from DNA to protein but also by regulating various biological

19 processes (42). Dysregulation of RNA profiles, including hypoxia-associated gene sets

20 (43), vascular gene sets (44), WNT/beta-catenin pathway-related genes (45), genes in

21 the NF-κB signaling pathway (46), micro-RNA (18), and long non-coding RNA (47)

22 have been shown to play crucial roles in the malignant progression and prognosis of

18

1 LGGs. Here, we highlight the stratification ability of RNA processing genes in LGGs.

2 Additionally, we found that genes whose expression was significantly correlated with

3 the risk score of the signature built with 19 RNA processing genes were enriched in the

4 biological processes/pathways of cell cycle, cell division, DNA replication and repair,

5 angiogenesis, cell proliferation, and pathways in cancer (Figure 6A). Among the 19

6 genes included in the signature, RPL3 and BICD1 have been suggested to be associated

7 with temozolomide resistant in glioblastoma (48, 49); and ELAVL1 was reported

8 participated in the cell proliferation, migration and glioma stem-like cell maintenance

9 (50, 51). RNA expression profile and RNA fate are highly dependent on RNA

10 processing factors responsible for precise temporal and spatial coordinating gene

11 expression (6, 20). In addition, our finding also confirmed that knock-down the

12 expression of TTF2 genes could alter the expression of genes involved in the activities

13 of the cell cycle, RNA splicing, DNA replication, mismatch repair, extracellular matrix

14 organization, and others. All of these indicate that abnormal expression of RNA

15 processing genes is potential drivers of the previously reported dysregulated RNA

16 profiles in LGGs.

17 In summary, our study highlights the prognostic value of RNA processing genes

18 in LGGs and revealed a signature with 19 RNA processing genes for further stratifying

19 the outcomes of LGGs with definite WHO subgroups. Clinicopathological features,

20 ASEs, biological processes, signaling pathways, and hallmarks of tumors correlated

21 with the risk signature were also identified. These results provide fundamental

22 information for understanding the roles of RNA processing and indicate the potential

19

1 clinical implications of RNA processing genes in LGGs.

2

3

20

1 Methods

2 Patients

3 A total of 457 LGG samples were collected from the TCGA publication (52). Of these

4 TCGA samples, we excluded 11 samples that do not contain RNA sequencing data or

5 molecular pathological information or useful OS information, and the other 446 LGG

6 samples with RNA-seq transcriptome data and corresponding clinical and molecular

7 pathological information were obtained from TCGA (http://cancergenome.nih.gov/)

8 and used for systematic analysis. Another 171 LGG samples with RNA-seq

9 transcriptome and corresponding clinicopathological information form CGGA

10 (www.cgga.org.cn) were used to validate the performance of the risk signature.

11 Clinicopathological information for the TCGA and CGGA datasets was summarized in

12 Supplementary Table 14.

13 Selection of RNA processing genes

14 We first collated a list of 835 genes that participated in any process involved in the

15 conversion of one or more primary RNA transcripts into one or more mature RNA

16 molecules (http://amigo.geneontology.org/amigo/term/GO:0006396), and then we used

17 the 784 genes with available RNA expression data in TCGA datasets for further analysis.

18 Identification of the risk signature

19 We performed univariate Cox regression analyses of the expression of RNA pressing

20 genes to identify genes significantly correlated with the prognosis of patients with

21 LGGs in TCGA dataset. Next, we employed the LASSO Cox regression algorithm to

22 build an optimal risk signature with the minimum number of genes (53). Finally, a set

21

1 of genes and their coefficients were determined by minimum criteria, which involves

2 selecting the best penalty parameter λ associated with the smallest 10-fold cross

3 validation within the training set. The risk score for the gene signature was calculated

4 using the formula:

푛 5 Risk score = ∑푖=1 퐶표푒푓푖 ∗ 푥푖

6 where Coef is the coefficient and 푥푖 is the z-score-transformed relative expression

7 value of each selected gene.

8 Patients were divided into “high-risk” and “low-risk” groups using the median risk

9 score as the cutoff value. Patients could also be compared between groups with lower

10 risk score (1st quarter) and higher risk score (4th quarter).

11 Bioinformatic analysis

12 GO and KEGG pathway enrichment analyses with the Database for Annotation,

13 Visualization, and Integrated Discovery (http://david.abcc.ncifcrf.gov/home.jsp) to

14 functionally annotate genes with prognosis value in LGGs, significantly correlated to

15 the risk score, and differentially spliced between patients with lower (1st quarter) and

16 higher (4th quarter) risk score. GSEA was performed to investigate the functions of

17 genes that significantly correlated to the risk score.

18 Construction of regulation networks between RNA splicing genes and ASEs in

19 LGGs

20 The RNA splicing data were collected from the online database

21 (http://bioinformatics.mdanderson.org/TCGASpliceSeq). The ASEs could be

22 quantified by the PSI value, which represents the ratio of included transcript reads in

22

1 total transcript reads forms, and the detailed information on the PSI calculation as

2 reported study (32).

3 RNA splicing genes (GO: 0008380) showing significant changes in expression

4 levels were predicted to be correlated with the differential PSI level of ASEs between

5 LGGs with lower (1st quarter) and higher (4th quarter) risk scores. We calculated the

6 Pearson correlation for each RNA splicing gene-ASE pair, RNA splicing gene-ASE pair

7 with correlation coefficients greater than 0.4 or less than -0.4, and the corresponding P

8 value less than 0.05 were considered significantly correlated (Bonferroni correction

9 was performed to adjust the P value for multiple comparisons). The predicted

10 regulatory network was visualized with Cytoscape (http://www.cytoscape.org).

11 Cell lines

12 Human glioma cell line LN229 cell was purchased from the ATCC (Manassas, VA). We

13 have performed the reauthentication by short tandem repeat analysis in July 2017 in our

14 laboratory. Cells were routinely cultured in Dulbecco’s modified Eagle’s medium

15 supplemented with 10% fetal bovine serum (HyClone, Logan, Utah), 100 units/ml

16 penicillin and 100 mg/ml streptomycin (Invitrogen, Carlsbad, CA) at 37 ℃ in a

17 humidified atmosphere of 5% CO2 as our previously reported study (7).

18 Small interference RNA (siRNA), quantitative PCR

19 The scrambled small interfering RNA (siRNA) and KIF2C specific siRNA were

20 synthesized from the Genepharma Corporation (Suzhou, Jiangsu, China). The

21 sequences for TTF2 siRNA as follow: TTF2 siRNA1 5′-GGC CCU CAG GUA AUG

22 CUA ATT-3′; TTF2 siRNA2 5′- CCA AGA UCA CGU UCA UGC ATT-3′; TTF2

23

1 siRNA3 5′- GGA AAG AGC UUC UAC GUG UTT-3′.

2 The quantitative PCR was performed by the SYBR® Select Master Mix (Thermo

3 Fisher, Waltham, MA) in ABI 7500 (Thermo Fisher). The sequences for the primers of

4 TTF2 were 5′-ATT TAC CGA GTA GGG CAG CA-3′ (forward primer) and 5′- GGA

5 CTC TGA GGT CAG CCA AG-3′ (reverse primer). The sequences for the primers of

6 glyceraldehyde 3-phosphate dehydrogenase were 5′-GGT GGT CTC CTC TGA CTT

7 CAA CA-3′ (forward primer) and 5′-GTT GCT GTA GCC AAA TTC GTT GT-3′

8 (reverse primer).

9 Transcriptome sequencing

10 The total RNA of LN229 cells were extracted by the Trizol reagent (Thermo Fisher,

11 Waltham, MA). Then the RNA sequencing library and sequencing were finished by the

12 Novogene (Beijing, China) using Illumina Novaseq 6000 System (Illumina, San Diego,

13 CA, USA). After quality control and data mapping, quantitative analysis of gene

14 expression was finished by the package of subread. Differential analysis of gene

15 expression was performed by the DESeq2 (54). The quantitative RNA expression data

16 of genes in LN229 with or without TTF2 knock-down had been uploaded to the figshare

17 website (https://figshare.com/), and the DOI is 10.6084/m9.figshare.9164399.

18 Statistics

19 Patients were divided into “high-risk” and “low-risk” groups using the median risk

20 score as the cutoff value in both the training and validation datasets. A nonparametric

21 test was used to compare the distribution of age between the two risk groups, and Chi-

22 square tests were used to compare the distribution of other clinicopathological features.

24

1 One-way ANOVA test was performed to compare the risk scores in patients grouped

2 by WHO 2016 subgroup or combination of TERT promoter and IDH status. Two tailed

3 Student’s t test was performed to compare the risk scores in patients grouped by other

4 clinical or molecular-pathological characteristics, and a P value less than 0.05 was

5 considered significant.

6 The prediction efficiency of the signature risk scores, age, WHO grade, and

7 subgroups according to WHO 2016 classification for 3-year survival and 5-year

8 survival were examined using ROC curves. Univariate and multivariate Cox regression

9 analysis was performed to determine the prognostic value of the risk score and various

10 clinical and molecular-pathological characteristics.

11 The Kaplan–Meier method with a two-sided log-rank test was used to compare the

12 OS of patients in the different RNA processing clusters or in the high- and low-risk

13 groups. All statistical analyses were conducted using R v3.4.1 (https://www.r-

14 project.org/), SPSS 16.0 (SPSS, Inc., Chicago, IL, USA) and Prism 7 (GraphPad

15 Software, Inc., La Jolla, CA, USA).

16 Study approval

17 This retrospective study was approved and reviewed by the institutional review board

18 of Beijing Tiantan Hospital, and the informed consent was waived.

19

20 Author Contributions

21 R.C.C. and Y.M.L.: study design, data analysis and manuscript drafting; R.C.C and

22 Y.Z.C performed the cell biological experiments and analyzed transcriptional data.

25

1 K.N.Z., Y.Z.C., Y.Q.L., Z.Z.: data illustration; Z.L.W, Y.H.C., G.Z.L., K.Y.W.: data

2 acquisition; F.W. and Y.Z.W.: study design and supervision.

3

4 Acknowledgments

5 This study was supported by the National Natural Science Foundation of China

6 (81773208, 81402052), the Beijing Nova Program (Z16110004916082), and the

7 National Key Research and Development Plan (2016YFC0902500).

8

9 Abbreviations:

10 The abbreviations used in the manuscript are listed in the Supplementary Table 15.

26

1 References:

2 1. Jiang T, Mao Y, Ma W, Mao Q, You Y, Yang X, Jiang C, Kang C, Li X, Chen L, et al. CGCG 3 clinical practice guidelines for the management of adult diffuse gliomas. Cancer Lett. 4 2016;375(2):263-73. 5 2. Ostrom QT, Gittleman H, Xu J, Kromer C, Wolinsky Y, Kruchko C, and Barnholtz-Sloan JS. 6 CBTRUS Statistical Report: Primary Brain and Other Central Nervous System Tumors 7 Diagnosed in the United States in 2009-2013. Neuro-oncology. 2016;18(suppl_5):v1-v75. 8 3. Louis DN, Perry A, Reifenberger G, von Deimling A, Figarella-Branger D, Cavenee WK, 9 Ohgaki H, Wiestler OD, Kleihues P, and Ellison DW. The 2016 World Health Organization 10 Classification of Tumors of the Central Nervous System: a summary. Acta 11 neuropathologica. 2016;131(6):803-20. 12 4. Chai RC, Liu YQ, Zhang KN, Wu F, Zhao Z, Wang KY, Jiang T, and Wang YZ. A novel 13 analytical model of MGMT methylation pyrosequencing offers improved predictive 14 performance in patients with gliomas. Modern pathology : an official journal of the United 15 States and Canadian Academy of Pathology, Inc. 2019;32(1):4-15. 16 5. Baysan M, Woolard K, Cam MC, Zhang W, Song H, Kotliarova S, Balamatsias D, Linkous A, 17 Ahn S, Walling J, et al. Detailed longitudinal sampling of glioma stem cells in situ reveals 18 Chr7 gain and Chr10 loss as repeated events in primary tumor formation and recurrence. 19 International journal of cancer. 2017;141(10):2002-13. 20 6. Marcelino Meliso F, Hubert CG, Favoretto Galante PA, and Penalva LO. RNA processing 21 as an alternative route to attack glioblastoma. Human genetics. 2017;136(9):1129-41. 22 7. Chai RC, Zhang KN, Chang YZ, Wu F, Liu YQ, Zhao Z, Wang KY, Chang YH, Jiang T, and 23 Wang YZ. Systematically characterize the clinical and biological significances of 1p19q 24 genes in 1p/19q non-codeletion glioma. Carcinogenesis. 2019. 25 8. Consortium G. Glioma through the looking GLASS: molecular evolution of diffuse gliomas 26 and the Glioma Longitudinal Analysis Consortium. Neuro-oncology. 2018;20(7):873-84. 27 9. Pekmezci M, Rice T, Molinaro AM, Walsh KM, Decker PA, Hansen H, Sicotte H, Kollmeyer 28 TM, McCoy LS, Sarkar G, et al. Adult infiltrating gliomas with WHO 2016 integrated 29 diagnosis: additional prognostic roles of ATRX and TERT. Acta neuropathologica. 30 2017;133(6):1001-16. 31 10. Aibaidula A, Chan AK, Shi Z, Li Y, Zhang R, Yang R, Li KK, Chung NY, Yao Y, Zhou L, et al. 32 Adult IDH wild-type lower-grade gliomas should be further stratified. Neuro-oncology. 33 2017;19(10):1327-37. 34 11. Aoki K, Nakamura H, Suzuki H, Matsuo K, Kataoka K, Shimamura T, Motomura K, Ohka F, 35 Shiina S, Yamamoto T, et al. Prognostic relevance of genetic alterations in diffuse lower- 36 grade gliomas. Neuro-oncology. 2018;20(1):66-77. 37 12. Eckel-Passow JE, Lachance DH, Molinaro AM, Walsh KM, Decker PA, Sicotte H, Pekmezci 38 M, Rice T, Kosel ML, Smirnov IV, et al. Glioma Groups Based on 1p/19q, IDH, and TERT 39 Promoter Mutations in Tumors. The New England journal of medicine. 40 2015;372(26):2499-508. 41 13. Gleize V, Alentorn A, Connen de Kerillis L, Labussiere M, Nadaradjane AA, Mundwiller E, 42 Ottolenghi C, Mangesius S, Rahimian A, Ducray F, et al. CIC inactivating mutations identify 43 aggressive subset of 1p19q codeleted gliomas. Annals of neurology. 2015;78(3):355-74.

27

1 14. Kamoun A, Idbaih A, Dehais C, Elarouci N, Carpentier C, Letouze E, Colin C, Mokhtari K, 2 Jouvet A, Uro-Coste E, et al. Integrated multi-omics analysis of oligodendroglial tumours 3 identifies three subgroups of 1p/19q co-deleted gliomas. Nature communications. 4 2016;7(11263. 5 15. Thon N, Kunz M, Lemke L, Jansen NL, Eigenbrod S, Kreth S, Lutz J, Egensperger R, Giese 6 A, Herms J, et al. Dynamic 18F-FET PET in suspected WHO grade II gliomas defines distinct 7 biological subgroups with different clinical courses. International journal of cancer. 8 2015;136(9):2132-45. 9 16. Hu X, Martinez-Ledesma E, Zheng S, Kim H, Barthel F, Jiang T, Hess KR, and Verhaak RGW. 10 Multigene signature for predicting prognosis of patients with 1p19q co-deletion diffuse 11 glioma. Neuro-oncology. 2017;19(6):786-95. 12 17. Ames HM, Yuan M, Vizcaino MA, Yu W, and Rodriguez FJ. MicroRNA profiling of low- 13 grade glial and glioneuronal tumors shows an independent role for cluster 14q32.31 14 member miR-487b. Modern pathology : an official journal of the United States and 15 Canadian Academy of Pathology, Inc. 2017;30(2):204-16. 16 18. Rao SA, Santosh V, and Somasundaram K. Genome-wide expression profiling identifies 17 deregulated miRNAs in malignant astrocytoma. Modern pathology : an official journal of 18 the United States and Canadian Academy of Pathology, Inc. 2010;23(10):1404-17. 19 19. Jaffrey SR, and Kharas MG. Emerging links between m(6)A and misregulated mRNA 20 methylation in cancer. Genome medicine. 2017;9(1):2. 21 20. Coppin L, Leclerc J, Vincent A, Porchet N, and Pigny P. Messenger RNA Life-Cycle in 22 Cancer Cells: Emerging Role of Conventional and Non-Conventional RNA-Binding 23 Proteins? International journal of molecular sciences. 2018;19(3). 24 21. Geles KG, Zhong W, O'Brien SK, Baxter M, Loreth C, Pallares D, and Damelin M. 25 Upregulation of RNA Processing Factors in Poorly Differentiated Lung Cancer Cells. 26 Translational oncology. 2016;9(2):89-98. 27 22. Correa BR, de Araujo PR, Qiao M, Burns SC, Chen C, Schlegel R, Agarwal S, Galante PA, 28 and Penalva LO. Functional genomics analyses of RNA-binding proteins reveal the 29 splicing regulator SNRPB as an oncogenic candidate in glioblastoma. Genome biology. 30 2016;17(1):125. 31 23. Kechavarzi B, and Janga SC. Dissecting the expression landscape of RNA-binding proteins 32 in human cancers. Genome biology. 2014;15(1):R14. 33 24. Chai RC, Wu F, Wang QX, Zhang S, Zhang KN, Liu YQ, Zhao Z, Jiang T, Wang YZ, and 34 Kang CS. m(6)A RNA methylation regulators contribute to malignant progression and 35 have clinical prognostic impact in gliomas. Aging. 2019;11(4):1204-25. 36 25. Yuan M, Eberhart CG, and Kai M. RNA binding protein RBM14 promotes radio-resistance 37 in glioblastoma by regulating DNA repair and cell differentiation. Oncotarget. 38 2014;5(9):2820-6. 39 26. Lefave CV, Squatrito M, Vorlova S, Rocco GL, Brennan CW, Holland EC, Pan YX, and 40 Cartegni L. Splicing factor hnRNPH drives an oncogenic splicing switch in gliomas. The 41 EMBO journal. 2011;30(19):4084-97. 42 27. Fontana L, Rovina D, Novielli C, Maffioli E, Tedeschi G, Magnani I, and Larizza L. Suggestive 43 evidence on the involvement of polypyrimidine-tract binding protein in regulating 44 alternative splicing of MAP/microtubule affinity-regulating kinase 4 in glioma. Cancer Lett.

28

1 2015;359(1):87-96. 2 28. Su R, Dong L, Li C, Nachtergaele S, Wunderlich M, Qing Y, Deng X, Wang Y, Weng X, Hu 3 C, et al. R-2HG Exhibits Anti-tumor Activity by Targeting FTO/m(6)A/MYC/CEBPA 4 Signaling. Cell. 2018;172(1-2):90-105 e23. 5 29. Cui Q, Shi H, Ye P, Li L, Qu Q, Sun G, Sun G, Lu Z, Huang Y, Yang CG, et al. m(6)A RNA 6 Methylation Regulates the Self-Renewal and Tumorigenesis of Glioblastoma Stem Cells. 7 Cell reports. 2017;18(11):2622-34. 8 30. Yu Y, Jiang X, Schoch BS, Carroll RS, Black PM, and Johnson MD. Aberrant splicing of 9 cyclin-dependent kinase-associated protein phosphatase KAP increases proliferation and 10 migration in glioblastoma. Cancer research. 2007;67(1):130-8. 11 31. Hu H, Mu Q, Bao Z, Chen Y, Liu Y, Chen J, Wang K, Wang Z, Nam Y, Jiang B, et al. 12 Mutational Landscape of Secondary Glioblastoma Guides MET-Targeted Trial in Brain 13 Tumor. Cell. 2018. 14 32. Ryan M, Wong WC, Brown R, Akbani R, Su X, Broom B, Melott J, and Weinstein J. 15 TCGASpliceSeq a compendium of alternative mRNA splicing in cancer. Nucleic acids 16 research. 2016;44(D1):D1018-22. 17 33. Shirahata M, Ono T, Stichel D, Schrimpf D, Reuss DE, Sahm F, Koelsche C, Wefers A, 18 Reinhardt A, Huang K, et al. Novel, improved grading system(s) for IDH-mutant astrocytic 19 gliomas. Acta neuropathologica. 2018;136(1):153-66. 20 34. Brat DJ, Aldape K, Colman H, Holland EC, Louis DN, Jenkins RB, Kleinschmidt-DeMasters 21 BK, Perry A, Reifenberger G, Stupp R, et al. cIMPACT-NOW update 3: recommended 22 diagnostic criteria for "Diffuse astrocytic glioma, IDH-wildtype, with molecular features of 23 glioblastoma, WHO grade IV". Acta neuropathologica. 2018. 24 35. Pomeroy SL, Tamayo P, Gaasenbeek M, Sturla LM, Angelo M, McLaughlin ME, Kim JY, 25 Goumnerova LC, Black PM, Lau C, et al. Prediction of central nervous system embryonal 26 tumour outcome based on gene expression. Nature. 2002;415(6870):436-42. 27 36. Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, Kingsmore SF, Schroth GP, 28 and Burge CB. Alternative isoform regulation in human tissue transcriptomes. Nature. 29 2008;456(7221):470-6. 30 37. Wan L, Yu W, Shen E, Sun W, Liu Y, Kong J, Wu Y, Han F, Zhang L, Yu T, et al. SRSF6- 31 regulated alternative splicing that promotes tumour progression offers a therapy target 32 for colorectal cancer. Gut. 2017. 33 38. Li S, Hu Z, Zhao Y, Huang S, and He X. Transcriptome-Wide Analysis Reveals the 34 Landscape of Aberrant Alternative Splicing Events in Liver Cancer. Hepatology. 2018. 35 39. Babic I, Anderson ES, Tanaka K, Guo D, Masui K, Li B, Zhu S, Gu Y, Villa GR, Akhavan D, et 36 al. EGFR mutation-induced alternative splicing of Max contributes to growth of glycolytic 37 tumors in brain cancer. Cell metabolism. 2013;17(6):1000-8. 38 40. Safaee M, Fakurnejad S, Bloch O, Clark AJ, Ivan ME, Sun MZ, Oh T, Phillips JJ, and Parsa 39 AT. Proportional upregulation of CD97 isoforms in glioblastoma and glioblastoma- 40 derived brain tumor initiating cells. PloS one. 2015;10(2):e0111532. 41 41. Yu F, and Fu WM. Identification of differential splicing genes in gliomas using exon 42 expression profiling. Molecular medicine reports. 2015;11(2):843-50. 43 42. Fu Y, Dominissini D, Rechavi G, and He C. Gene expression regulation mediated through 44 reversible m(6)A RNA methylation. Nature reviews Genetics. 2014;15(5):293-306.

29

1 43. Dao Trong P, Rosch S, Mairbaurl H, Pusch S, Unterberg A, Herold-Mende C, and Warta R. 2 Identification of a Prognostic Hypoxia-Associated Gene Set in IDH-Mutant Glioma. 3 International journal of molecular sciences. 2018;19(10). 4 44. Zhang L, He L, Lugano R, Roodakker K, Bergqvist M, Smits A, and Dimberg A. IDH 5 mutation status is associated with distinct vascular gene expression signatures in lower- 6 grade gliomas. Neuro-oncology. 2018;20(11):1505-16. 7 45. Kim SA, Kwak J, Nam HY, Chun SM, Lee BW, Lee HJ, Khang SK, and Kim SW. Promoter 8 methylation of WNT inhibitory factor-1 and expression pattern of WNT/beta-catenin 9 pathway in human astrocytoma: pathologic and prognostic correlations. Modern 10 pathology : an official journal of the United States and Canadian Academy of Pathology, 11 Inc. 2013;26(5):626-39. 12 46. Ius T, Ciani Y, Ruaro ME, Isola M, Sorrentino M, Bulfoni M, Candotti V, Correcig C, 13 Bourkoula E, Manini I, et al. An NF-kappaB signature predicts low-grade glioma prognosis: 14 a precision medicine approach based on patient-derived stem cells. Neuro-oncology. 15 2018;20(6):776-87. 16 47. Li J, Liang R, Song C, Xiang Y, and Liu Y. Prognostic and clinicopathological significance 17 of long non-coding RNA in glioma. Neurosurgical review. 2018. 18 48. Huang SP, Chang YC, Low QH, Wu ATH, Chen CL, Lin YF, and Hsiao M. BICD1 expression, 19 as a potential biomarker for prognosis and predicting response to therapy in patients with 20 glioblastomas. Oncotarget. 2017;8(69):113766-91. 21 49. Yi GZ, Xiang W, Feng WY, Chen ZY, Li YM, Deng SZ, Guo ML, Zhao L, Sun XG, He MY, et 22 al. Identification of Key Candidate Proteins and Pathways Associated with Temozolomide 23 Resistance in Glioblastoma Based on Subcellular Proteomics and Bioinformatical Analysis. 24 BioMed research international. 2018;2018(5238760. 25 50. Visvanathan A, Patil V, Arora A, Hegde AS, Arivazhagan A, Santosh V, and Somasundaram 26 K. Essential role of METTL3-mediated m(6)A modification in glioma stem-like cells 27 maintenance and radioresistance. Oncogene. 2018;37(4):522-33. 28 51. Lei W, Wang ZL, Feng HJ, Lin XD, Li CZ, and Fan D. Long non-coding RNA 29 SNHG12promotes the proliferation and migration of glioma cells by binding to HuR. 30 International journal of oncology. 2018;53(3):1374-84. 31 52. Ceccarelli M, Barthel FP, Malta TM, Sabedot TS, Salama SR, Murray BA, Morozova O, 32 Newton Y, Radenbaugh A, Pagnotta SM, et al. Molecular Profiling Reveals Biologically 33 Discrete Subsets and Pathways of Progression in Diffuse Glioma. Cell. 2016;164(3):550-63. 34 53. Bovelstad HM, Nygard S, Storvold HL, Aldrin M, Borgan O, Frigessi A, and Lingjaerde OC. 35 Predicting survival from microarray data--a comparative study. Bioinformatics. 36 2007;23(16):2080-7. 37 54. Love MI, Huber W, and Anders S. Moderated estimation of fold change and dispersion 38 for RNA-seq data with DESeq2. Genome biology. 2014;15(12):550. 39

30

1 Figure legends

2 Figure 1. The workflow of the current study.

3 4

31

1 Figure 2. Prognosis-associated RNA processing genes’ expression profile in LGGs.

2

3 (A) Heatmap showing the expression pattern of the 276 RNA processing genes

4 associated with patient survival. (B) GO biological process terms enriched among the

5 276 survival-associated RNA processing genes. (C) The 19 genes included in the

6 signature, their hazard ratios (HR), and 95% confidence intervals (CI) by univariate

7 Cox regression analysis, and the coefficients by multivariate Cox regression analysis

8 using LASSO. (D–I) Kaplan–Meier overall survival (OS) curves for patients from

9 TCGA (D–F) and CGGA (G–I) datasets with total lower grade glioma (D and G),

10 oligodendroglioma with IDH-mutant and 1p/19q codeletion (E and H), Astrocytoma

32

1 with IDH-mutant or IDH-wild-type (F and I). All of the sample numbers (n) and P

2 values are labeled in the figure.

3

33

1 Figure 3. Comparisons of survival predictive efficiencies between the risk score and

2 clinicopathological characteristics.

3

4 ROC curves showed the predictive efficiencies of risk scores, age, subgroups according

5 to WHO 2016 classification, and WHO grade on 3-year and 5-year survival in TCGA

6 (A, n=143 and B, n=111) and CGGA (C, n=121 and D, n=84) datasets.

7

34

1 Figure 4. Alternative splicing profile analysis in LGGs with lower or higher risk scores.

2

3 (A) Differentially spliced events in 446 LGGs patients with increased risk score. Bars

4 indicate the proportion of each ASE type. Points indicate the number of differentially

5 spliced events in each patient. The IDH status and 1p/19q codeletion status of these

6 LGGs are also presented. (B) The absolute numbers of all alternative spliced events

7 were compared in LGGs with lower (n=111) or higher risk (n=111) score.

8 ****P < 0.0001.

9

35

1 Figure 5. Differential ASE in LGGs with lower or higher risk scores.

2

3 (A) heatmaps showing the expression level of RNA splicing genes (upper panel) and

4 PSI of ASEs with significant differences between LGG groups with lower and higher

5 risk score. (B) representative ASEs with differential PSI between LGG groups with

6 lower and higher risk score. (C) GO_BP terms of spliced genes with differential PSI

7 between LGGs with lower and higher risk scores.

8

36

1 Figure 6. The ASE networks and RNA splicing genes.

2 3 (A) Labeled circles in the center represent RNA splicing genes. Red circles indicate up-

4 regulated RNA splicing genes in LGGs with higher risk score, while green circles

5 indicate down-regulated splicing genes. Colored circles connected to splicing genes by

6 red or blue lines are distinct types of differential ASEs. The red connecting lines

7 represent positive correlation, while blue connecting lines represent negative

8 correlation. (B) The numbers of ASE significantly correlated to up-regulated (upper

9 panel) or down-regulated (lower panel) RNA splicing genes are summarized.

10

37

1 Figure 7. Functions of genes correlated with risk scores.

2 3 (A–B) Functional analysis of genes positively (red bar chart) or negatively (green bar

4 chart) correlated with the risk score using GO terms of biological processes (A) and

5 KEGG pathway (B).

6

38

1 Figure 8. GSEA analysis of genes correlated with the risk scores.

2

3 (A–H) GSEA revealed the hallmarks of malignant tumors positively correlated with

4 LGGs with high risk scores.

5

39

1 Figure 9. Transcriptome changes of LN229 cells after knock-down mRNA expression

2 of TTF2.

3

4 (A) the mRNA expression changes of TTF2 at 48 hr after transfection of TTF2 specific

5 siRNA. ****P<0.0001, n=3. (B) The volcano figure of deferentially expression genes

6 between TTF2 siRNA (n=3) and Scramble siRNA (n=3). (C–D) Functional analysis of

7 genes down-regulated (red bar chart) or up-regulated (green bar chart) after knock-

8 down mRNA expression of TTF2 using GO terms of biological processes (A) and

9 KEGG pathway (B).

40

1 Table1. Univariate and multivariate Cox regression in TCGA (training) and CGGA (validation) data set Univariate cox analysis Multivariate cox analysis HR (95% CI) HR (95% CI) P-value HR Lower Higher P-value HR Lower Higher Grade (II vs III) 3.86E-06 0.306 0.185 0.505 2.19E-01 0.698 0.394 1.239 Age (Age<40 vs Age≥40) 2.31E-12 1.068 1.049 1.088 9.53E-06 1.048 1.026 1.07 Gender (female vs male) 8.78E-01 1.036 0.662 1.621 IDH (mutant vs wildtype) 4.49E-14 0.161 0.1 0.258 8.15E-01 0.902 0.382 2.129 1p/19q (codel vs non codel) 2.95E-03 0.425 0.241 0.747 2.42E-01 0.68 0.356 1.297 The TCGA dataset WHO 2016 subgroups Oligo, 1p/19q codel, IDH-mutant Reference Astro, IDH-mutant 2.71E-01 1.412 0.764 2.61 Astro, IDH-wildtype 2.04E-10 7.754 4.124 14.579 MGMT (methylation vs unmethylation) 1.52E-03 0.444 0.269 0.733 3.38E-01 1.401 0.703 2.789 Risk score (high vs low) 4.85E-26 4.7 3.526 6.265 6.68E-10 3.589 2.392 5.384

Grade (II vs III) 9.97E-09 0.166 0.09 0.307 2.40E-03 0.578 0.406 0.823 Increasing age 9.21E-04 1.048 1.019 1.078 1.19E-01 1.024 0.994 1.055 Gender (female vs male) 7.83E-01 0.923 0.521 1.634 IDH (mutant vs wildtype) 2.36E-06 0.253 0.143 0.447 9.05E-01 25.541 2.04E-22 3.19E+24 1p/19q (codel vs non codel) 8.08E-03 0.147 0.036 0.607 9.01E-01 0.001 7.55E-50 1.82E+43 WHO 2016 subgroups The CGGA dataset Oligo, 1p/19q codel, IDH-mutant Reference Astro, IDH-mutant 4.41E-02 4.426 1.04 18.833 Astro, IDH-wildtype 2.19E-04 15.542 3.629 66.568

MGMT (methylatation vs unmethylation) 5.30E-03 0.249 0.094 0.662 4.64E-01 0.787 0.416 1.492

Risk score (high vs low) 1.13E-13 11.578 6.065 22.1 2.23E-05 7.554 2.967 19.233 2 IDH: isocitrate dehydrogenase; codel: codeletion; Oligo: oligodendrocytoma; Astro: astrocytoma. 41

1 Supplementary Figures:

2

3 Supplementary Figure 1. Ten-fold cross validation for tuning parameter selection in the LASSO

4 model. LASSO coefficient profiles of the 1203 1p19q genes. The minimum criteria are indicated

5 by the dashed vertical line (left).

6

42

1 2 Supplementary Figure 2. Prognostic value of the risk signature in astrocytoma. Kaplan–Meier

3 overall survival (OS) curves for patients from TCGA (A-B) and CGGA (C-D) datasets with

4 astrocytoma with IDH-mutant or IDH-wildtype. The P value and sample numbers (n) are labeled

5 on the figure.

6

43

1

2 Supplementary Figure 3. Prognostic value of the risk signature in different races of TCGA dataset.

3 (A) The composition of races in TCGA dataset. (B-D) Kaplan–Meier overall survival (OS) curves

4 for White (B), Black or African American (C), and Asian population in TCGA (D), respectively.

5 The P value and sample numbers (n) are labeled on the figure.

6

44

1 2

3 Supplementary Figure 4. Distribution of risk scores in patients stratified by clinicopathological

4 and gene expression characteristics. (A) Heatmap showing the expression level of the 19 signature

5 genes in low-risk and high-risk LGGs. Expression levels are indicated by the color bar to the right

6 ranging from blue (no/low) to red (high) expression. The distribution of clinicopathological

7 features was also compared between the low- and high-risk groups. (B–H) Distribution of risk

8 scores for patients in TCGA dataset stratified by WHO grade (B) WHO 2016 subgroup (C), IDH

9 status (D), 1p/19q codeletion status (E), MGMT promoter methylation status (F), age (G), and

10 TERT promoter (H). **P < 0.01, ***P < 0.001, and ****P < 0.0001.

11

45

1

2 Supplementary Figure 5. Distribution of risk score in LGGs stratified by different pathological

3 features in CGGA dataset. (A–G) Distribution of risk scores for patients in the CGGA dataset

4 stratified by WHO grade (A), IDH status (B), 1p/19q codeletion status (C), MGMT promoter

5 methylation status (D), age (E), WHO 2016 subgroup (F), and TERT promoter (G). *P < 0.05,

6 **P < 0.01, ***P < 0.001, and ****P < 0.0001.

7

46

1 2 Supplementary Figure 6. Distribution of percent spliced in (PSI) levels for ASEs in LGGs. (A)

3 Fraction of events of distinct PSI levels in total LGGs, LGGs with lower and higher risk score.

4 (B–H) Numbers of events of distinct PSI in total LGGs (n = 446), LGGs with lower (n = 111), and

5 higher (n = 111) risk score. AA: alternate acceptor site; AD: alternate donor site; AP: alternate

6 promoter; AT: alternate terminator; ES: exon skip; ME: mutually exclusive exons; RI: retained

7 intron.

8

47

1

2 Supplementary Figure 7. General information for ASE with differential PSI between LGG with

3 lower and higher risk scores. (A–B) Total number of changed spliced genes and ASEs (A) and the

4 fraction of changed ASE types (B) between LGG groups with lower and higher risk scores are

5 summarized.

6

48

1

2 Supplementary Figure 8. Functions of spliced genes with differentially PSI between gliomas

3 with lower (n=111) and higher (n=111) risk scores. (A–B) functional analysis of spliced genes with

4 decreased PSI (A) or increased PSI (B) in LGGs with higher risk score.

49