Identifcation of LncRNAs as Therapeutic Targets in Chronic Lymphocytic Leukemia Simay Dolaner1,2, Harpreet Kaur2, Mohit Mazumder2, Julia Panov2,3, Elia Brodsky2 1Department of Molecular Biology and Genetics, Bahcesehir University, Istanbul, Turkey 2Pine Biotech, New Orleans, USA 3Tauber Bioinformatics Research Center, Haifa, Israel KEYWORDS: Chronic Lymphocytic Leukemia, spontaneous regression, therapeutic targets, LncRNAs, transcriptomics ABSTRACT: Chronic Lymphocytic Leukemia (CLL) is a type of blood cancer that has a very heterogeneous biological background and diverse treatment strategies. However, a small part of this malignancy may disappear without receiving any treatment, known as “spontaneous re- gression”, which occurs as a result of a poorly investigated mechanism. Exposing the underlying causes of this condition can lead to a novel treatment approach for CLL. In this article, we applied in-silico analysis on total RNA expression data from 24 CLL samples to determine possible reg- ulatory mechanisms of spontaneous regression in CLL. These were frst selected by comparing spontaneous regression with progressive samples of CLL at the transcriptional level using two unsupervised machine learning algorithms, i.e., Principal Component Analysis (PCA) and Hier- archical Clustering. Subsequently, the DESeq2 algorithm was used to scrutinize only statisti- cally signifcant (adjusted p-value < 0.01) RNA transcripts that can diferentiate both conditions. Here, at frst, we have elucidated 870 signifcantly diferentially expressed -coding that were involved in the biogenesis and processing of RNA. Consequently, these fndings led our study to investigate non-coding RNA, and 33 long non-coding RNAs (lncRNAs) were found to be signifcantly diferentially expressed among two conditions based on diferential ex- pression analysis. Further, our analysis in the current study suggested lncRNAs, PTPN22-AS1, PCF11-AS1, SYNGAP1-AS1, PRRT3-AS1, and H1FX-AS1 as potential therapeutic targets to trigger spontaneous regression. Ultimately, the results presented here reveal new insights into spontaneous regression and its relationship with non-coding RNAs, particularly lncRNAs.

INTRODUCTION is chronic lymphocytic leukemia (CLL), which is According to the American Cancer Society sta- a lymphoid malignancy due to failed apoptosis tistics, leukemia is the second leading blood and aggressive proliferation of mature B cells cancer, with roughly 60,000 new cases identifed [3]. These cells circulate through the blood as in 2020 [1]. Leukemia has diferent subtypes, non-proliferating cells or arrested cells in the which are classifed in the context of the tumor G0/G1 phase of the cell cycle and may afect origin [2]. The most common variety in adults the function of normal cells in other organs [4].

16

© 2021 Dolaner, Kaur, Mazumder, Panov, Brodsky. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits the user to copy, distribute, and transmit the work provided that the original authors and source are credited. Columbia Undergraduate Science Journal Vol. 15, 2021 Dolaner et. al.

CLL has a highly diverse biological and clinical pathways are identifed using next-generation background for each patient that determines the sequencing of mRNA expression [9]. In spite stage of the disease [5]. Although there are sev- of this progress, nearly 20% of CLL cells do eral stages of CLL classifed according to their not show chromosomal abnormalities or ge- genetic background and B cell number, one of nomic variation. Therefore, researchers re- the most intriguing concepts is known as spon- cently shifted their focus to the non-coding taneous regression, which is the disappearance region of the genome and regulatory RNA of the tumor over time either without any treat- molecules, especially long non-coding RNAs, ment or with treatment that is categorized as which are deregulated in many cancers [10]. insufcient to have an impact on the tumor [6]. Long non-coding RNAs (lncRNAs) Spontaneous regression can be seen are a subgroup of non-coding RNAs which in 1-2% of all CLL patients, and it is a phe- are longer than 200 nucleotides and encom- nomenon that is poorly understood [7]. In this pass thousands of diverse transcripts in hu- process, cells that proliferate uncontrolla- mans [11]. There are approximately 100,000 bly are transmitted to the quiescent state so known lncRNAs, and this quantity is ex- that the tumor disappears partially or com- panding each year with the new studies [12]. pletely with time [6]. Spontaneous regres- LncRNAs play a signifcant role in gene sion is not a common feature of cancer cells regulation, controlling multiple cellular mecha- and is regulated by mechanisms that are not nisms involved in tumor progression. They are well-understood. If such mechanisms can be involved in epigenetic regulation of gene ex- determined, target biological molecules that pression via modifcation, DNA meth- have a specifc role in disease progression ylation, or acetylation. Specifcally, these epi- can be identifed and manipulated in vitro. genetic regulations may include recruitment Current strategies aim to inhibit BCR of histone remodeling complexes, interaction signaling, which is crucial for the survival of the with histone methyltransferases and demeth- B-cells, and chemokine signaling that creates ylases to regulate DNA methylation, or his- survival signals and attracts leukemic cells to tone acetyltransferases and deacetylases to communicate with its microenvironment [5]. modulate the acetylation [13]. Furthermore, Moreover, activation of apoptosis pathways lncRNAs may regulate gene expression at the via blocking BCL2 activity, an anti-apoptotic transcriptional level by recruiting transcription protein, which is highly expressed in leukemic factors [14]. Moreover, lncRNAs can produce cells, is included in the aforementioned strate- hybrids or act as scafolds through interac- gies [8]. Although these types of targeted ther- tion with to regulate expression at apies improve the outcome of the patients, the the post-translational level, including regula- heterogeneity of the leukemia microenviron- tion of phosphorylation and ubiquitination [15]. ment reinforces the necessity of new targets. In tumor development, lncRNAs can As the targeted therapies may have an impact serve as either tumor suppressors, oncogenes, on tumor surroundings and afect the other cells or even both at the same time for some cancer found in the tumor microenvironment, triggering types [16]. Analysis of lncRNA expression pat- spontaneous regression mechanisms may im- terns had led to the identifcation of putative bio- prove the strategies as well as patient outcome. markers such as HOTAIR, H19, and DLEU1/2 In CLL, the most frequent chromo- [17]. Specifcally, HOTAIR serves as an onco- somal abnormalities and somatic mutations gene by inducing invasiveness and metastasis on the protein-coding region of the genome in several cancers via recruiting a demethylase have been distinguished as a result of ge- [13]. Since this lncRNA is highly expressed in nomic sequencing, and disrupted cellular aggressive tumors, it is considered as a bio-

17 Columbia Undergraduate Science Journal Vol. 15, 2021 Dolaner et. al. marker and a possible therapeutic target for from multiple subtypes of chronic lymphocytic many cancers [18]. DLEU1/2 epigenetically leukemia that can be seen in Table I. In our proj- regulate the tumor suppressors and are delet- ect, spontaneous regression and progressive ed in CLL cells, which results in the progression states were chosen for further analysis to inves- of the CLL. This allows it to serve as a biomark- tigate expression variation specifcally involved er in the diagnosis [19]. On the other hand, H19 in the regression mechanism in CLL cells. has a dual role in diferent cancers by stimu- lating distinct mechanisms through transcrip- Table I. Dataset of the BioProject tion factors, which makes it a target that needs to be studied separately for each cancer [13]. Furthermore, lncRNAs are known to play an important role in cell diferentiation and tissue specifcity [20]. However, characteriza- tion may be compelling because lncRNAs are transcribed in diferent loci and localized dis- tinctly. More importantly, lncRNA expression is RNA-seq Raw Data Processing generally tissue-specifc and can be detected Raw reads were used to construct an RNA se- under certain conditions [21]. As the lncRNAs quencing pipeline that contains pre-processing are diferentially expressed, their roles and ac- using Trimmomatic, mapping of reads on refer- tivities can be identifed in disease conditions. ence transcriptome with Bowtie2-t, and quan- Due to the diverse functions of ln- tifcation using RNA-Seq by Expectation-Max- cRNAs, novel studies are concentrated on imization (RSEM) algorithms with T-Bioinfo the identifcation of lncRNAs as therapeu- server (Supplementary Notes 1) as represent- tic targets. As the expression of these mol- ed in Figure 1. Briefy, in order to get gene ex- ecules is tissue and disease-specifc, this pression levels, duplicated sequences which specifcity makes them excellent targets resulted in PCR amplifcation were removed by compared to protein-coding genes [22]. PCR clean considering best coverage. Then, In the scope of this project, the trigger Trimmomatic was used to remove adaptor se- mechanism behind the spontaneous regres- quences and poor-quality data at the end of the sion process is investigated at the transcrip- sequence. Mapping of these clean reads on tomic level to identify a pattern of lncRNA transcriptome was performed by using the Bow- expression that would explain cell “deci- tie2-t algorithm based on the reference genome sion” by comparing CLL tissue at the sponta- (GRCh38). Bowtie2-t is an option of the Bowtie, neous regression and the progressive states. which is a mapping algorithm and aligns short reads according to the seed approach [23]. Fi- METHODS nally, for the quantifcation of gene expression, Dataset the RSEM algorithm was used with FPKM nor- The dataset of this project was generated by malization to obtain the gene expression table. Kwok et al. (2020) and published as a BioProj- ect on the NCBI with the accession number PRJNA535508 (Supplementary Notes 1) [6]. Transcriptome data contains raw reads of RNA sequencing from the Illumina Nextseq 550 plat- form by using paired-end sequencing. In this study, the authors compared multiple samples

18 Columbia Undergraduate Science Journal Vol. 15, 2021 Dolaner et. al.

The diferential gene expression (DGE) pipe- line, which includes pre-processing, mapping with HiSat2, RNA expression quantifcation using HTseq, and diferential gene expression analysis with DESeq2 algorithm was construct- ed by using the T-Bioinfo server (Figure 2). Here, initially, PCR cleaning was performed by considering coverage to get rid of the duplicat- ed sequences generated via PCR amplifca- tion. To eliminate the adaptors and bad quality sequences from the data, the Trimmomatic al- Figure 1: RNA-seq Data Processing Pipeline to gorithm was used. Next, the HiSat2 was utilized generate an RNA expression table on T-Bioinfo for the mapping of sequence reads to the refer- Server ence genome (GRCh38) by taking into account the splice junctions. Then, the HTseq algorithm Exploratory Analysis quantifed the gene expression through over- Exploratory analysis facilitated the examina- lapping reads and generated a gene expression tion of variation between all samples, includ- table in the form of count values. Finally, difer- ing healthy and diseased patients, in order ential gene expression analysis was performed to select a comparison parameter for further by using the DESeq2 algorithm which gives the analysis. Thus, the visual outputs were used expression diferences between two groups us- to determine the patterns. Data was explored ing the shrinkage estimators [26]. DESeq2 pro- by using principal component analysis (PCA), vides results, which include p-value, log 2-fold which is a dimensionality reduction technique change, and adjusted p-value. Then, an ad- that discerns the variability between the sam- justed p-value or False Discovery Rate (FDR) ples [24]. PCA was performed twice, both for that is a standard statistical value utilized for all the samples and for the spontaneous re- multiple testing correction, is calculated to gression and the progressive samples to be eliminate the false-positive results [26, 27]. able to observe the improvement on principle The T-Bioinfo server uses a one- components. Hierarchical Clustering, which step approach for DGE analysis and com- fnds patterns among the samples using sim- bines several methods. Although DESeq2 ilarity measures [25], made it possible to un- can be used for both normalization and sta- derstand the clustering aspects of spontaneous tistical analysis, the T-Bioinfo server pro- regression and progressive samples based on vides an additional approach, which includes their gene expression, especially after the se- gene set enrichment analysis (GSEA) to dis- lection of statistically signifcant genes, which cover related pathways and processes [28]. will be explained in the following section. Selection of Signifcant Genes Diferential Gene Expression Analysis The signifcant genes were identifed by their The Diferential Gene Expression analy- p-adjusted values and log2 fold change. The de- sis was conducted by contrasting spon- termined threshold for the adjusted p-value was taneous regression with progressive CLL 0.01 and the log2 fold change was ±1 in non-cod- samples. Separate studies were performed ing RNAs and ±1.5 in protein-coding genes. for protein-coding and non-protein cod- ing transcripts to understand the mecha- nisms involved in spontaneous regression.

19 Columbia Undergraduate Science Journal Vol. 15, 2021 Dolaner et. al.

ic scatter plot, it was obvious that there was an improvement in principal components, and the two groups were clearly separated from each other (Figure 3B). After the observed variation between the two groups, the reason for this diference and its impacts could be investigated at the level of gene regulation.

Figure 2: Diferential Gene Expression Analysis Pipeline Using the T-Bioinfo Server

Gene Ontology Analysis The enrichment analysis was done via us- ing the Enrichr platform for the protein-cod- ing genes [29]. Respectively, the GO and KEGG pathways were considered for the observed upregulated and downregulated Figure 3. Principal Component Analysis, (A) Healthy samples versus Diseased samples genes in spontaneous regression samples. (B) Spontaneous Regression (Red) versus Pro- gressive Samples (Purple) Data Visualization The gene expression patterns were observed Revealed Pathways Involved in Biogenesis by heatmaps. Furthermore, PCA and H-Clus- and Processing of RNA tering were repeated with the selected signif- Based on DESeq2 analysis, expression levels icant RNA transcripts to make a comparison of 870 protein-coding genes were identifed as as before and after, and see an improvement signifcantly diferent between the two groups on the basis of variance. Visualization was with p-adjusted values (<0.01) and log2 fold performed independently for protein-cod- change (+/- 1.5) (Supplementary Table 1). The ing genes and non-protein coding genes. heatmap of the protein-coding genes shows the gene expression patterns between spon- RESULTS taneous regression and progressive samples Variation is Detected Between Sponta- (Figure 4A). It is obvious that most of the genes neous Regression and the Progressive were upregulated in spontaneous regression Samples while they downregulated in progressive ones. The exploratory analysis using the PCA revealed To understand the biological importance of path- that there was a variation between the spon- ways, analysis was performed. taneous regression and the progressive state. Interestingly, although there were diferent path- Based on Figure 3A, it can be seen ways in upregulated genes, many represented that healthy samples and the diseased sam- biological pathways involved in biogenesis and ples are well separated, which means there processing of RNA, and mRNA translation (Fig- is signifcant variation between these groups. ure 4B). In addition, identifed pathways also When the spontaneous regression and pro- included ribosome-related pathways, which im- gressive samples were examined in a specif- ply translation of protein-coding RNAs is afect- ed. The same concept could be detected with

20 Columbia Undergraduate Science Journal Vol. 15, 2021 Dolaner et. al.

A B

C

D

Figure 4. Data visualization of protein-coding genes (A) Gene expression patterns of protein-coding genes. Heatmap exposes the downregulated and upreg- ulated genes among the samples (B) Gene ontology analysis of protein-coding genes. Pathway analysis shows the downregulated and upregulated pathways in the spontaneous regression samples, respectively. Arrows indicate the RNA related pathways (C) Representation of principal component analysis with 3D scatter plots. First PCA shows the separation of spontaneous regression (SR) and progressive (P) samples before the selection of the protein-coding genes, second PCA represents the separation of SR and P samples after the selection of the protein-cod- ing genes, and last PCA indicates the true variability source considering four outliers (D) Hierarchical Clustering results as dendrograms. Each dendrogram point out the related scatter plots that are placed above and reveal the clustering perspective. Red boxes indicate the SR samples, and the other ones are the P samples

21 Columbia Undergraduate Science Journal Vol. 15, 2021 Dolaner et. al. the downregulated genes as presented in Fig- Figure 5B that the two groups cluster separately ure 4B. Consequently, these fndings led this in the dendrogram, which implies these signif- research to investigate the non-coding RNAs, cant genes might predict tumor characteristics. which was the second branch of this study. To be able to observe the gene ex- pression pattern of the lncRNAs, a heatmap There was a Diversity Among the Sponta- was generated. As shown in Figure 5C, most neous Regression Samples of the genes were upregulated in progressive According to the frst principal component, spon- samples compared to spontaneous regression. taneous regression had a lot of variability within So far, pathway analysis of diferential- its own group (Figure 4C). This variation could ly expressed protein-coding genes in sponta- be observed from the dendrograms in Figure neous regression samples has demonstrated 4D. Correspondingly, spontaneous regression that global gene expression in these samples samples had two diverse groups. The clinical might be modulated at the transcriptional data of these samples, therefore, needed to be and post-transcriptional level, and non-pro- examined to interpret them as outliers or true tein coding genes have shown that lncRNAs variability sources. However, no diference was are diferentially expressed in these sam- detected between the four outliers and the oth- ples. To make a biologically relevant inter- er samples. This meant the diference had to pretation, the analysis will regard lncRNAs be in their biological background. That is why identifed as statistically most signifcant. these four diverse samples were selected in comparison with progressive samples to see DISCUSSION the most signifcant diference between the two groups in the context of expression patterns. Although spontaneous tumor regression is As a result, it can be seen in Figure 4C rare, it is a phenomenon that can be observed that there was a signifcant improvement with in some types of cancer, such as neuroblas- the well-separated two groups after the selec- toma, renal cell carcinoma, lung cancer, lym- tion of true variability source. Additionally, bet- phoma, and leukemia [30]. To be able to in- ter separation could also be observed from the crease the occurrence rate of this mechanism, dendrogram in Figure 4D that four samples and detailed studies should be conducted, and the progressive samples have distinct branch- the related pathways should be examined. es compared to the remaining twelve samples. In summary, we used RNA-seq datasets from CLL patients containing a total of 16 spon- Identifed Non-Coding RNAs were Novel taneous regression and 8 progressive state Transcripts samples. First of all, we looked for variation Based on diferential gene expression analysis, and pattern among these two groups by using 33 lncRNAs, which are signifcantly expressed, PCA and H-Clustering based on gene expres- were identifed with the specifc parameters sion. After the detection of variation, we hypoth- (Supplementary Table 2). Each lncRNA was in- esized that the reason for this diference should vestigated by their Ensembl ID and identifed as be at the gene expression level. Therefore, we novel transcripts. When the PCA was repeated applied diferential gene expression analysis by with selected lncRNAs, clear separation could comparing spontaneous regression and pro- be seen among the spontaneous regression gressive tumor states. We identifed 870 pro- and the progressive samples (Figure 5A). Even tein-coding genes which were signifcantly dif- though there was no dramatic improvement in ferentially expressed and also were associated principal components, hierarchical clustering with RNA-related pathways based on gene on- results were highly distinctive. It can be seen in tology analysis. Depending on these fndings, we also investigated non-coding RNAs and

22 Columbia Undergraduate Science Journal Vol. 15, 2021 Dolaner et. al.

A

B

C

Figure 5. Data visualization of non-coding RNAs (A) Representation of principal component analysis with 3D scatter plots. Initial PCA indicates the separa- tion of spontaneous regression (SR) and progressive (P) samples before the selection of the non-coding RNAs and second PCA shows the separation of SR and P samples after the selection of the non-coding RNAs (B) Hierarchical Clustering results as dendrograms. Each dendrogram specify the related scatter plots that are located above and expose the clustering perspective. Red boxes demonstrate the SR samples, and the other ones are the P samples (C) Gene expression patterns of non-coding RNAs. Heatmap represents the downregulated and upregu- lated genes between the samples 23 Columbia Undergraduate Science Journal Vol. 15, 2021 Dolaner et. al. identifed signifcantly expressed 33 lncRNAs. According to Shi et al. (2020), lncRNA This paper has highlighted that new therapeutic H1FX-AS1 is downregulated in cervical can- strategies may involve lncRNAs to trigger the cer, and low expression is correlated with poor spontaneous regression phenomena in can- prognosis and linked to tumor size as well as cer cells. Since the lncRNAs can regulate the metastasis. In silico analysis predicted that the important cellular mechanisms during cancer possible target of the H1FX-AS1 was the miR- progression, identifcation of disease-specifc 324-3p, and it was binding and regulating the lncRNAs may enlighten the way of new treat- DACT1 [33]. Furthermore, the same research ment strategies. Therefore, a detailed literature included overexpression studies, which re- review for each lncRNA and their sense mR- vealed that the high expression levels of this NAs had been done. Even though the detailed lncRNA signifcantly reduced the proliferation explanation can be found in Supplementary and invasiveness, and activated the apoptosis Table 2, identifed lncRNAs are involved in sev- pathways. H1FX-AS1, therefore, is identifed eral pathways including tumor growth, metas- as a tumor suppressor gene in cervical cancer tasis, cell survival, and regulation of the tumor [33]. On the contrary, high expression levels of microenvironment such as the immune system. lncRNA H1FX-AS1 are associated with a poor Previous studies showed that identifed prognosis in gastric cancer. It is predicted that two lncRNAs, the PRRT3-AS1 and H1FX-AS1, this lncRNA was related to epithelial to mesen- were focused on cancer progression [31, 32, chymal transition (EMT) and metastasis path- 33, 34]. Both lncRNAs were upregulated in the ways [34]. Additionally, in-silico prediction anal- progressive state compared to spontaneous re- ysis indicated several targets of the H1FX-AS1 gression, which indicates that these lncRNAs lncRNA, such as H1FX, COPG1, and MIR6826 were functioning as tumor enhancers in CLL that can be used as potential therapeutic tar- cells. Li et al. (2020) showed that the lncRNA gets in gastric cancer [34]. Since H1FX-AS1 PRRT3-AS1 is upregulated in prostate cancer is upregulated in the progressive state of CLL, and targets the PPARγ gene by binding its 3’ the precise roles of this lncRNA should be ex- end, which leads to regulation of the Akt/mTOR amined for CLL cells. Besides, considering the signaling pathway [31]. The same study also re- studies on gastric cancer, downregulation of vealed that the down-regulation of PRRT3-AS1 H1FX-AS1 may trigger the spontaneous re- inhibits prostate cancer progression by regulat- gression through inactivation of metastasis. ing cell proliferation, migration, and apoptosis. As the other 31 lncRNAs were novel Moreover, knocking down this lncRNA triggers transcripts, several examples could be consid- autophagy via the mTOR pathway [31]. Anoth- ered by examining the sense protein-coding er study suggested that PRRT3-AS1 is esti- genes of the lncRNAs to reveal the signifcance mated as an immune-related lncRNA involved and possible functions of these lncRNAs. in PPAR signaling and can be used as a poten- Thereby, the specifed lncRNAs and their tial target in glioblastoma [32]. In light of these sense protein-coding genes can be targeted fndings, it can be said that the PRRT3-AS1 for further analysis and treatment strategies. functions as a tumor promoter gene in prostate The frst one is the lncRNA PT- cancer and glioblastoma. Since it is revealed PN22-AS1 that was upregulated in the sponta- that this lncRNA is highly expressed in the neous regression samples. It is known that the progressive state of CLL, further studies may B cell receptor signaling is vital for the growth clarify the related pathways and novel targets. and survival of the leukemic cells [35]. These Moreover, downregulation of this lncRNA might signaling pathways can be stimulated with di- promote spontaneous regression by repressing verse tyrosine kinases as well as phospha- cell proliferation and inducing apoptosis. tases. PTPN22 is a protein tyrosine phospha-

24 Columbia Undergraduate Science Journal Vol. 15, 2021 Dolaner et. al. tase specifcally expressed in immune cells. It Another lncRNA is the SYNGAP1-AS1 can function as an enhancer or suppressor of which is upregulated in progressive samples. BCR and TCR signaling by regulating phos- Mutant RAS genes stimulate the GTP-bound phorylation status [36]. According to studies, state which constitutively activates RAS signal- the PTPN22 gene is upregulated in CLL cells, ing in metastatic cells [40]. RasGAPs inactivate and this upregulation results in attenuation of RAS signaling by converting active GTP-bound the apoptosis signals produced by the BCR and into its inactive state [41]. SYNGAP1, also stimulation of the AKT activity that generates a known as RASA5, is a member of the RasGAP survival signal [35]. This regulation reveals that family and functions as a tumor suppressor the cancer cells which express autoreactive gene [42, 43]. Studies proved that the RASA5 BCRs can escape from apoptosis by upregu- gene is epigenetically disrupted in multiple can- lating the PTPN22 gene. Therefore, downreg- cer types by promoter methylation, and gain of ulation of this gene through the lncRNA PT- function assays exposed that RASA5 expres- PN22-AS1 at multiple levels may trigger the sion leads to a reduction in metastasis by regu- spontaneous regression in CLL by eliminating lating EMT and cell stemness [43]. As it is known autoreactive B cells in the context of immune that the lncRNAs can regulate gene expression tolerance. So, if the upregulation of this lncRNA at the epigenetic level through methylation, this can be induced by an external factor, it could epigenetic silencing may be due to the lncRNA be used as a therapeutic target for CLL cells. SYNGAP1-AS1. So, the upregulation of the The next lncRNA would be PCF11-AS1, SYNGAP1-AS1 can knock down this gene at and it is also upregulated in spontaneous regres- the epigenetic level by methylating the promot- sion. Alternative polyadenylation (APA) leads to er region leading to aggressive tumor progres- transcription of diverse isoforms at the RNA 3’ sion in CLL patients. Even though further stud- end that afects the functioning of encoded pro- ies are essential, targeting this lncRNA may teins. This mechanism needs multicomponent also trigger spontaneous regression through protein complexes, and one of the protein com- metastasis and cell stemness pathways, and plexes is called CFII [37]. This complex com- enhance our understanding of this mechanism. prises the PCF11 gene that is a cleavage and As a result, the deduction that can be made polyadenylation factor subunit, and regulates from these exemplary lncRNAs is that the the transcription termination and RNA 3’ end detailed examination of the identifed novel maturation. Studies show that this gene partic- transcripts can be used to reveal the mecha- ularly regulates the alternative polyadenylation nisms that lead to spontaneous regression, of the genes associated with the WNT pathway as well as to identify these lncRNAs as tar- as an oncogene in neuroblastoma cells [38]. gets for small molecule inhibitors or activators Accordingly, the downregulation of this gene so that the spontaneous regression mecha- results in diminished cell growth as well as in- nism can be triggered in other cancer types. vasiveness. Interestingly, one of the studies indicates that spontaneously regressed neuro- CONCLUSION blastomas express the PCF11 gene at low lev- els compared to highly progressive neuroblasto- These fndings suggest that regulation through mas [39]. Basically, downregulation of this gene the lncRNAs might have a major role in cells’ at the epigenetic or post-transcriptional level fate, and their detailed examination can enlight- through the lncRNA PCF11-AS1 may induce en the way of discovery of the possible ther- the spontaneous regression in CLL via ceasing apeutic targets. In prospective studies, each the cell growth. Thus, the upregulation of this lncRNA should be investigated to perceive their lncRNA can be used as a therapeutic strategy. functional pathways by over-expression and suppression studies in various cancer types 25 Columbia Undergraduate Science Journal Vol. 15, 2021 Dolaner et. al. in order to understand their specifc roles. Ad- ods section, can be found in the following link. ditionally, the following study which considers https://www.dropbox.com/s/mlo17y1yjdxfekm these fndings should combine the protein-cod- ing genes and non-protein coding genes. The list of the signifcantly diferentially ex- Since these two groups revealed signifcant pressed protein-coding genes can be found in results independently, their combination can Supplementary Table 1 through the following link. lead to future selection and determination of https://www.dropbox.com/s/zc1oaryr3zyhxlf particular pathways which afected each other. This study had certain limitations that Supplementary Table 2, which contains need to be overcome to conduct more efcient the name, Ensembl ID, and the pos- studies. Spontaneous regression samples sible functions of the 33 signifcant ln- had diverse biological backgrounds, which in- cRNAs, can be found in the following link. creased the demand for the detailed investi- https://www.dropbox.com/s/4p67jf4d54if2b8 gation of these samples. Furthermore, it was problematic to identify the signifcant path- ACKNOWLEDGMENTS ways through diferential gene expression al- This research was performed within the frame- gorithms due to this variability. Finally, as the work of a research fellowship program and amount of the samples was limited, more com- was supported by Pine Biotech. Data pro- prehensive research is required to verify these cessing and visualization pipelines were run fndings and make a distinctive interpretation. with the T-Bioinfo Server. I would like to thank Elia Brodsky, Dr. Harpreet Kaur, Dr. Mohit AUTHOR INFORMATION Mazumder, and Dr. Julia Panov for their ex- pertise and guidance throughout the project. Corresponding Author *Simay Dolaner (College of Engineering and Natural Sciences, Department of Molecular ABBREVIATIONS Biology and Genetics, Bahcesehir University, CLL: Chronic Lymphocytic Leukemia Istanbul, Turkey) NCBI: National Center for Biotechnology Infor- [email protected] mation BCR: B Cell Receptor Author Contributions TCR: T Cell Receptor Simay Dolaner performed the data analysis PCA: Principal Component Analysis and produced the fgures and manuscript. Dr. H-Cluster: Hierarchical Clustering Harpreet Kaur provided technical assistance GO: Gene Ontology with analysis and pipelines. Elia Brodsky and KEGG: Kyoto Encyclopedia of Genes and Dr. Mohit Mazumder added project direc- Genomes tion and provided guidance. Dr. Julia Pan- lncRNA: Long Non-Coding RNA ov provided expert feedback for the project. PRRT3-AS1: Antisense to PRRT3 H1FX-AS1: Antisense to H1FX Competing Interests PTPN22-AS1: Antisense to PTPN22 The authors declare no competing fnancial PCF11-AS1: Antisense to PCF11 and non-fnancial interests. SYNGAP1-AS1: Antisense to SYNGAP1 EMT: Epithelial to Mesenchymal Transition

SUPPLEMENTARY DOCUMENTS REFERENCES Supplementary Notes 1, which includes external [1] American Cancer Society. Cancer Statistics sources and related links mentioned in the Meth- Center. https://cancerstatisticscenter.cancer.

26 Columbia Undergraduate Science Journal Vol. 15, 2021 Dolaner et. al. org. 2020 ing RNAs as Cancer Hallmarks in Chronic Lymphocytic Leukemia. Interna-tional Jour- [2] A. A. Ferrando, C. López-Otín, Clonal evo- nal of Molecular Sciences, 21(18), 6720–. lution in leukemia. Nature Medicine, 23(10), doi:10.3390/ijms21186720 (2020). 1135–1145. doi:10.1038/nm.4410 (2017). [11] J. Fernandes, et al., Long Non-Coding [3] M. V. Haselager, A. P. Kater, E. Eldering, RNAs in the Regulation of Gene Expres-sion: Proliferative Signals in Chronic Lymphocytic Physiology and Disease. Non-coding RNA, Leukemia; What Are We Missing? Frontiers 5(1), 17. https://doi.org/10.3390/ncrna5010017 in Oncology, 10(), 592205–. doi:10.3389/ (2019). fonc.2020.592205 (2020). [12] L. Nobili, D. Ronchetti, E. Taiana, A. Neri, [4] G. Fabbri, R. Dalla-Favera, The molecular Long non-coding RNAs in B-cell malig-nan- pathogenesis of chronic lymphocytic leukae- cies: a comprehensive overview. On-cotar- mia. Nature Reviews Cancer, 16(3), 145–162. get, 8(36), –. doi:10.18632/oncotarget.17303 doi:10.1038/nrc.2016.8 (2016). (2017).

[5] D. L. Longo, J. A. Burger, Treatment of [13] M. Aprile, V. Katopodi, E. Leucci, V. Chronic Lymphocytic Leukemia. New En- Cos-ta, LncRNAs in Cancer: From garbage gland Journal of Medicine, 383(5), 460–473. to Junk. Cancers, 12(11), 3220. https://doi. doi:10.1056/NEJMra1908213 (2020). org/10.3390/cancers12113220 (2020).

[6] M. Kwok et al., Integrative analysis of [14] M. Gourvest, P. Brousset, M. Bousquet, spontaneous CLL regression highlights genet- Long Noncoding RNAs in Acute Myeloid Leu- ic and microenvironmental interdependency in kemia: Functional Characterization and Clini- CLL. Blood, 135 (6): 411–428. doi: https://doi. cal Relevance. Cancers, 11(11), 1638. https:// org/10.1182/blood.2019001262 (2020). doi.org/10.3390/cancers11111638 (2019).

[7] J. Delgado, F. Nadeu, D. Colomer, E. Cam- [15] X. Zhang, et al., Mechanisms and Func- po, Chronic lymphocytic leukemia: from molec- tions of Long Non-Coding RNAs at Multi-ple ular pathogenesis to novel therapeutic strat- Regulatory Levels. International jour-nal of egies. Haematologica, 105:xxx doi:10.3324/ molecular sciences, 20(22), 5573. https://doi. haematol.2019.236000 (2020). org/10.3390/ijms20225573 (2019).

[8] T. J. Kipps, M. Y. Choi, Targeted Therapy in [16] M. A. Diamantopoulos, P. Tsiakanikas, A. Chronic Lymphocytic Leukemia. Cancer jour- Scorilas, Non-coding RNAs: the riddle of the nal (Sudbury, Mass.), 25(6), 378–385. https:// transcriptome and their perspectives in can- doi.org/10.1097/PPO.0000000000000416 cer. Annals of translational medi-cine, 6(12), (2019). 241. https://doi.org/10.21037/atm.2018.06.10 (2018). [9] F. Bosch, R. Dalla-Favera, Chronic lym-phocytic leukaemia: from genetics to treat- [17] J. Gao, F. Wang, P. Wu, Y. Chen, Y. Jia, ment. Nat Rev Clin Oncol, 16, 684–701 https:// Aberrant LncRNA Expression in Leuke-mia. doi.org/10.1038/s41571-019-0239-8 (2019). Journal of Cancer, 11(14), 4284–4296. https:// doi.org/10.7150/jca.42093 (2020). [10] L. Fabris, J. Juracek, G. Calin, Non-Cod-

27 Columbia Undergraduate Science Journal Vol. 15, 2021 Dolaner et. al.

[18] Q. Tang, S. S. Hann, HOTAIR: An On- Ge-nome Biol 15, 550 (2014). https://doi. co-genic Long Non-Coding RNA in Human org/10.1186/s13059-014-0550-8 Cancer. Cellular Physiology and Biochem-istry, (), 893–913. doi:10.1159/000490131 (2018). [27] Y. Benjamini, Y. Hochberg, Controlling the False Discovery Rate: A Practical and Pow- [19] Y. Chi, D. Wang, J. Wang, W. Yu, J. Yang, erful Approach to Multiple Testing. Journal of Long Non-Coding RNA in the Pathogene-sis the Royal Statistical Society: Series B (Meth- of Cancers. Cells, 8(9), 1015. https://doi. odological), 57: 289-300 (1995). https://doi. org/10.3390/cells8091015 (2019). org/10.1111/j.2517-6161.1995.tb02031.x

[20] J. Iwakiri, G. Terai, M. Hamada, Compu- [28] A. Subramanian, et al., Gene set en- ta-tional prediction of lncRNA-mRNA inter-ac- rich-ment analysis: A knowledge-based ap- tions by integrating tissue specifcity in human proach for interpreting genome-wide ex-pres- transcriptome. Biology Direct 12, 15 https://doi. sion profles. Proceedings of the Na-tional org/10.1186/s13062-017-0183-4 (2017). Academy of Sciences, 102(43), 15545–15550. doi:10.1073/pnas.0506580102 (2005). [21] G. M. Cruz-Miranda, et al., Long Non-Coding RNA and Acute Leuke-mia. Inter- [29] M. V. Kuleshov, et al., Enrichr: a com- national journal of molecular sciences, 20(3), pre-hensive gene set enrichment analysis web 735. https://doi.org/10.3390/ijms20030735 server 2016 update. Nucleic acids re-search, (2019). 44(W1), W90–W97. https://doi.org/10.1093/ nar/gkw377 (2016). [22] G. Arun, S. D. Diermeier, D. L. Spector, Therapeutic Targeting of Long Non-Coding [30] P. Ghatalia, C. J. Morgan, G. Sonpavde, RNAs in Cancer. Trends in molecu-lar medi- Meta-analysis of regression of advanced solid cine, 24(3), 257–277. https://doi.org/10.1016/j. tumors in patients receiving placebo or no molmed.2018.01.001 (2018). anti-cancer therapy in prospective trials. Crit- ical Reviews in Oncolo-gy/Hematology, 98(), [23] B. Langmead, S. Salzberg, Fast gapped- 122–136. doi:10.1016/j.critrevonc.2015.10.018 read alignment with Bowtie 2. Nat Meth-ods (2016). 9, 357–359 (2012). https://doi.org/10.1038/ nmeth.1923 [31] L. Fan, H. Li, W. Wang, Long noncoding RNA PRRT3-AS1 silencing inhibits pros-tate [24] J. Lever, M. Krzywinski, N. Altman, Points cancer cell proliferation and promotes apopto- of Signifcance: Principal component analy- sis and autophagy. Experimental Physiology, sis. Nature Methods, 14(7), 641–642 (2017). (), EP088011–. doi:10.1113/ep088011 (2020). doi:10.1038/nmeth.4346 [32] X. Li, M. Yutong, Immune-Related lncRNA [25] N. Altman, M. Krzywinski, Points of Risk Signatures Predict Survival of IDH Wild- Sig-nifcance: Clustering. Nature Meth- Type and MGMT Promoter Unmethylated ods, 14(6), 545–546 (2017). doi:10.1038/ Glioblastoma, BioMed Re-search Internation- nmeth.4299 al, vol. 2020, Article ID 1971284, 11 pages, https://doi.org/10.1155/2020/1971284 (2020). [26] M. I. Love, W. Huber, S. Anders, Mod- erat-ed estimation of fold change and dis- [33] X. Shi, et al. A newly identifed lncRNA per-sion for RNA-seq data with DESeq2. H1FX-AS1 targets DACT1 to inhibit cervi-cal

28 Columbia Undergraduate Science Journal Vol. 15, 2021 Dolaner et. al. cancer via sponging miR-324-3p. Can-cer Cell synapse connectivity, synaptic in-hibition and Int 20, 358 https://doi.org/10.1186/s12935- cognitive function. Nature communications, 7, 020-01385-7 (2020). 13340. https://doi.org/10.1038/ncomms13340 (2016). [34] E. Delshad, et al. In-silico identifcation of novel lncRNAs with a potential role in di-ag- [41] O. Maertens, K. Cichowski, An expand- nosis of gastric cancer, Journal of Bio-molec- ing role for RAS GTPase activating proteins ular Structure and Dynamics, (), 1–9. DOI: (RAS GAPs) in cancer. Advances in Bio-log- 10.1080/07391102.2019.1624615 (2019). ical Regulation, 55(), 1–14. doi:10.1016/j. jbior.2014.04.002 (2014). [35] R. Negro, et al., Overexpression of the au-toimmunity-associated phosphatase [42] M. Berryer, et al. Decrease of SYNGAP1 PTPN22 promotes survival of antigen-stim- in GABAergic cells impairs inhibitory syn-apse ulated CLL cells by selectively activat-ing connectivity, synaptic inhibition and cognitive AKT. Blood, 119(26), 6278–6287. https://doi. function. Nature Communica-tions 7, 13340 org/10.1182/blood-2012-01-403162 (2012). https://doi.org/10.1038/ncomms13340 (2016).

[36] R. Cubas, et al., Autoimmunity linked pro- [43] L. Li, et al., Tumor Suppression of Ras tein phosphatase PTPN22 as a target for can- GTPase-Activating Protein RASA5 through cer immunotherapy. Journal for Im-munoTher- Antagonizing Ras Signaling Perturbation apy of Cancer, 8(2), e001439–. doi:10.1136/ in Carcinomas. iScience, 21, 1–18. doi: jitc-2020-001439 (2020). 10.1016/j.isci.2019.10.007 (2019).

[37] S. W. Yang, et al., A Cancer-Specifc Ubiq-uitin Ligase Drives mRNA Alternative Pol-yadenylation by Ubiquitinating the mRNA 30 End Processing Complex, Molecular Cell, 77(6), 1206–1221.e7. https://doi.org/10.1016/j. molcel.2019.12.022 (2020).

[38] J. Nourse, S. Spada, S. Danckwardt, Emerging Roles of RNA 3′-end Cleavage and Polyadenylation in Pathogenesis, Di-agnosis and Therapy of Human Disorders. Biomole- cules, 10(6), 915. doi:10.3390/biom10060915 (2020).

[39] A. Ogorodnikov, et al., Transcriptome 3′end organization by PCF11 links alterna-tive polyadenylation to formation and neu-ronal dif- ferentiation of neuroblasto-ma. Nature commu- nications, 9(1), 5331. https://doi.org/10.1038/ s41467-018-07580-5 (2018).

[40] M. H. Berryer, et al., Decrease of SYN- GAP1 in GABAergic cells impairs in-hibitory

29