Table of Contents

Abstract
Acknowledgments

Chapter 1: Introduction

Long non-coding RNAs
Functions of long non-coding RNAs
Nuclear lncRNAs
Cytoplasmic lncRNAs
Long non-coding RNAs in cancer
Long non-coding RNAs in gliomas
Oncogenic long non-coding RNAs in gliomas
Tumor suppressor long non-coding RNAs in gliomas
References
Figures

Chapter 2: The functional role of APTR long non-coding RNA

Abstract
Introduction
Results
Discussion
Materials and Methods
References
Figures

Chapter 3: Adapted from “LINC00152 promotes invasion in glioblastomas through a hairpin structure located near its 3’ end”

My contribution to the paper
Abstract
Introduction
Results
Discussion
Materials and Methods
References
Figures

Chapter 4: Closing Remarks and Future Directions

Alternative methods for discovering APTR interacting partners
APTR processing into different fragments
APTR and LINC00152 secondary structure
Identifying LINC00152 interactor
APTR as a regulator of expression
Molecular functions of LINC00152
Clinical potentials of APTR and LINC00152
References
Figures

Abstract

Since the advent of high-throughput sequencing methods, long non-coding RNAs (lncRNAs) have been emerging as important players in physiological and pathological conditions. Among them, APTR (Alu-mediated p21 transcriptional regulator) is a lncRNA upregulated in tumors. This RNA regulates the cell cycle by recruiting the PRC2 protein complex (Polycomb Repressive Complex 2) to the promoter region of the p21 gene, inducing its suppression. In addition, APTR and p21 expression are anti-correlated in glioblastoma patient samples. Exploring the possibility that APTR interacts with other genomic sites and recruits other binding partners, the present work shows that APTR interacts with more than 2,000 genomic loci. By intersecting these loci with a publicly available EZH2 dataset, I found that 115 APTR-bound DNA sites are marked with EZH2. Moreover, a microarray study revealed that APTR regulates the expression of 274 genes, of which 43 are near an APTR-associated DNA site. Another pull-down assay uncovered 70 potential binding partners for APTR. RNA immunoprecipitation of BRD7 (Bromodomain-containing protein 7) and NFIC (Nuclear Factor I C) confirmed the interaction of APTR with these two transcription factors.

This study also identified another lncRNA, LINC00152, as upregulated in 12 types of tumor, including glioblastomas. In addition, its upregulation is indicative of worse prognosis in 9 cancer types. LINC00152 knockdown decreases anchorage-independent proliferation and invasion of glioblastoma (GBM) cells, while overexpression induces these phenotypes. An RNA-seq experiment revealed that LINC00152 controls the expression of genes involved in epithelial-to-mesenchymal transition. A protein-bound stem-loop at the 3’ end of LINC00152, identified by PARIS, Ribo-seq and secondary structure prediction data, is sufficient to increase invasion of glioblastoma cell lines. Overexpression of LINC00152 stem-loop mutants suggests that stem formation in the hairpin is essential for LINC00152 function.

Finally, because of their high expression in glioblastomas, APTR and LINC00152 may serve as biomarkers for the development and progression of this cancer.

In addition, since it is upregulated in 12 different tumors and its upregulation indicates worse outcomes in 9 of them, LINC00152 may serve as a biomarker for patient prognosis.


Acknowledgments

I would like to start by thanking my parents, Soraia and Claudio, for encouraging me to follow my dreams and ambitions. You are my role models.

I also want to thank my sister, Juliana, and my girlfriend, Maraysa, for all the love and encouragement during all these years.

I thank Dr. Dutta for mentoring me during this process. Your passion for science is inspiring.

I also thank my fellow lab mates for the insights, advice and stimulating discussions.

I express my sincere gratitude to the Brazilian government, particularly to the Coordination for the Improvement of Higher Level Personnel (CAPES BEX 0320-13-7) for the financial support.


Chapter 1: Introduction

Long non-coding RNAs

For a long time, it was accepted that the RNA molecule was solely an intermediate in the flow of genetic information. This central dogma of molecular biology stated that proteins were the main effectors of cellular molecular processes [1]. Less than 5% of the genome was believed to be transcribed, with the vast majority of the RNA arising from protein-coding genes. Recently, however, there was a paradigm shift thanks to the advent of whole genome and transcriptome sequencing techniques. High-throughput sequencing methods showed that more than 70% of the genome is transcribed. Furthermore, the ENCODE (Encyclopedia of DNA Elements) project showed that the majority of the human genome is expressed in the form of non-coding RNAs (ncRNAs) [2-4].

The ncRNAs are often categorized according to the size of the transcript. Thus, ncRNAs shorter than 200 nucleotides, such as microRNAs, are considered small ncRNAs. Accordingly, long non-coding RNAs (lncRNAs) are those whose transcripts are longer than 200 nucleotides [5], although many of them are larger than 2 kb and their average length is 500 nucleotides. Many of these molecules are transcribed by RNA pol II (RNA polymerase II), are polyadenylated and may be found in either nuclear or cytosolic fractions [6,7]. From an evolutionary point of view, although still conserved, most reported lncRNAs are less conserved than protein-coding genes [7,8], suggesting that conservation of structure, or simply the act of expression itself, is sufficient for the biological functions of lncRNAs.


In addition, genes for lncRNAs have the same active chromatin marks as protein-coding genes [9]. Moreover, according to the NONCODE database, lncRNAs outnumber protein-coding RNAs: there are around 82,300 protein-coding transcripts, whereas more than 172,000 lncRNA transcripts are clustered into 96,308 gene loci in the human genome [10,11].

Even though generally expressed at lower levels than protein-coding mRNAs [12], lncRNAs, once thought to be transcriptional noise in the genome [13], have been described as important regulators of multiple biological processes, such as organ and tissue development, cellular differentiation, reprogramming of stem cells and metabolic processes [14-17].

In this context, the first two lncRNAs to be discovered were H19 and XIST. H19 was discovered in the early 1980s by Pachnis et al. [18]. In 1990, Brannan and collaborators [19] revealed that the lncRNA H19 is 2.3kb in length, spliced, polyadenylated and highly expressed in embryonic tissues. They also showed that, during embryonic development, H19 is paternally imprinted and maternally expressed.

Over time, other groups have reported that until 8 weeks of gestation both alleles are expressed, but by the time the embryo reaches 10 weeks, there is only maternal H19 expression [20,21]. Postnatally, H19 is silenced in almost all tissues except for skeletal muscle [22], where it appears to have an important role as the primary microRNA transcript for the myogenic miRNAs miR-675-5p and miR-675-3p [23]. In addition, H19 is evolutionarily conserved in humans and mice [24] and overexpression of H19 is lethal in embryonic mice [22].


Subsequently, Brown et al. [25] reported XIST (X Inactive Specific Transcript), a ~19kb lncRNA, which plays an important role in X inactivation. During female development, XIST entirely covers one of the X chromosomes and was initially thought to promote the silencing of gene expression on the inactive X chromosome by bringing PRC2 (Polycomb Repressive Complex 2), which promotes H3K27me3 (histone H3 lysine 27 trimethylation), to the chromosomal DNA [26]. More recently, X inactivation by XIST was reported to be dependent on SHARP (SMRT and HDAC Associated Repressor Protein) and HDAC3 (Histone Deacetylase 3) [27]. Meanwhile, on the future active X chromosome, TSIX, the XIST antisense lncRNA, prevents XIST from coating the chromosome [28].

Other lncRNAs have been described as parts of protein-containing machines. The rRNAs, structural and catalytic components of ribosomes, have been known for quite some time [29]. Similarly, the telomerase RNA component (TERC) was found as part of the telomerase complex and reported to provide the crucial template on which the telomeric repeats are extended by the telomerase catalytic protein [30].

Since the discovery of rRNAs, H19, XIST and TERC, the advent of high-throughput sequencing has led to the discovery of thousands of other lncRNAs. Moreover, a PubMed search shows that the number of published articles on lncRNAs has increased exponentially over the last decade, from 148 publications in 2008 to 3,068 publications in 2017 (Figure 1). Elucidating the role of a given lncRNA in a specific cellular process and the mechanism of its action is of great interest. To this end, as an initial framework to understand their functions, it is convenient to separate lncRNAs into different categories.


Functions of long non-coding RNAs

LncRNAs can be subclassified into four classes according to their genomic intersection with protein-coding genes: (1) intergenic, which do not intersect with any protein-coding loci; (2) overlapping, if a protein-coding gene is transcribed from an intron of a lncRNA; (3) exonic, if at least one of its exons intersects a protein-coding exon; and (4) intronic, in which a lncRNA is completely contained within an intron of a protein-coding gene [31].
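The four classes above can be sketched as a small interval check. This is a minimal illustration rather than the procedure used in [31]: it assumes a single chromosome, ignores strand, represents each gene as a list of half-open exon intervals, and resolves ambiguous cases with an arbitrary priority (exonic, then intronic, then overlapping). All function names and coordinates are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome
Gene = List[Interval]       # a gene is represented by its exons


def span(exons: Gene) -> Interval:
    """Genomic span from the first exon start to the last exon end."""
    return (min(s for s, _ in exons), max(e for _, e in exons))


def introns(exons: Gene) -> List[Interval]:
    """Gaps between consecutive exons, after sorting by start."""
    ex = sorted(exons)
    return [(a_end, b_start)
            for (_, a_end), (b_start, _) in zip(ex, ex[1:])
            if b_start > a_end]


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def contains(outer: Interval, inner: Interval) -> bool:
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def classify_lncrna(lnc_exons: Gene, coding_genes: List[Gene]) -> str:
    """Assign one of the four classes (priority order is an assumption)."""
    lnc_span = span(lnc_exons)
    # (3) exonic: a lncRNA exon intersects a protein-coding exon
    if any(overlaps(le, ce)
           for gene in coding_genes for le in lnc_exons for ce in gene):
        return "exonic"
    # (4) intronic: the whole lncRNA lies inside a protein-coding intron
    if any(contains(i, lnc_span)
           for gene in coding_genes for i in introns(gene)):
        return "intronic"
    # (2) overlapping: a whole protein-coding gene lies inside a lncRNA intron
    if any(contains(i, span(gene))
           for gene in coding_genes for i in introns(lnc_exons)):
        return "overlapping"
    # (1) intergenic: no intersection with any protein-coding locus
    if not any(overlaps(lnc_span, span(gene)) for gene in coding_genes):
        return "intergenic"
    return "unclassified"  # partial overlaps not covered by the four classes


# Hypothetical protein-coding gene with exons at 100-200 and 400-500
coding = [[(100, 200), (400, 500)]]
print(classify_lncrna([(250, 300), (320, 380)], coding))  # intronic
```

With the same hypothetical gene, a lncRNA with exons at 50-80 and 600-700 is classified as overlapping, because the protein-coding gene falls entirely within its single intron.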

However, this classification does not help discern function, and thus, a more attractive way to categorize lncRNAs is to divide them according to their subcellular localization. In these terms, nuclear lncRNAs can work as regulators of gene expression, as molecular decoys and structural components of nuclear compartments. In contrast, in the cytoplasm lncRNAs mainly function as regulators of mRNA and protein stability, as competing endogenous RNAs (ceRNAs), and as molecular decoys.

Nuclear lncRNAs

Most lncRNAs located in the nucleus are known to regulate gene expression by interacting with and recruiting transcription activators or repressors to gene promoters. These transcripts can be broadly divided into cis acting, when they influence the expression of nearby genes, or trans acting, if they leave the site of transcription and regulate the expression of a distant gene. The Hox transcript antisense intergenic RNA (HOTAIR) is an example of a trans acting lncRNA. This molecule is upregulated in various types of cancer, including breast cancer, where it promotes metastasis [32-34]. HOTAIR is transcribed from chromosome 12q13.13 and functions by recruiting the Polycomb Repressive Complex 2 (PRC2) [32,35] to repress the expression of JAM2 (Junctional Adhesion Molecule 2), PCDH10 (Protocadherin 10) and PCDHB5 (Protocadherin Beta 5), genes located at chromosomes 21q21.3, 4q28.3 and 5q31.3, respectively [32]. On the other hand, the long non-coding RNA lincRNA-p21 [36] acts in cis to regulate the expression of its neighboring gene, p21 [37]. At this genomic locus, lincRNA-p21 recruits HNRNPK (Heterogeneous nuclear ribonucleoprotein K), which acts together with p53 to induce p21 expression. Moreover, lincRNA-p21−/− cells replicate faster than lincRNA-p21+/+ cells. Consistent with this, when lincRNA-p21 is depleted there is an increase of the S-phase cell population after DNA damage [37].

Nuclear lncRNAs can be further categorized as eRNAs (enhancer RNAs) when the lncRNA is transcribed from an enhancer region, a DNA sequence where transcription activators associate to increase transcription of a gene [38]. As the name suggests, eRNAs act in cis to regulate the expression of an adjacent gene. A recent example of this kind is MUNC (MyoD Upstream Noncoding RNA) or DRR-eRNA [39]. This lncRNA is transcribed 5 kb upstream of MyoD1 (Myoblast determination protein 1), and in the murine myoblast cell line C2C12, MUNC knockdown (siMUNC) decreases MyoD1 expression and myotube formation. Conversely, MUNC overexpression increases MyoD1 mRNA levels. It should be noted that, even though the mechanism of action of MUNC is not completely elucidated, knockdown and overexpression experiments show that it positively regulates the expression of other genes far from MyoD1, such as Myh3 (Myosin-3) and Myog (Myogenin). Therefore, MUNC should also be classified as a trans acting lncRNA [39].

PANDA (p21 Associated ncRNA DNA Damage Activated) promotes cell survival under DNA damage conditions. This lncRNA is a molecular decoy, since it binds and sequesters NF-YA (Nuclear transcription factor Y subunit alpha), preventing this transcription factor from activating CCNB1 (Cyclin B1) or pro-apoptotic genes like FAS (Fas Cell Surface Death Receptor), BBC3 (BCL2 Binding Component 3) and PMAIP1 (Phorbol-12-Myristate-13-Acetate-Induced Protein 1) [40].

Some lncRNAs are part of nuclear structural components, like NEAT1 (Nuclear Enriched Abundant Transcript 1). This transcript was reported to play a crucial role in the structural integrity of subnuclear bodies called paraspeckles. In 2009, Clemson et al. revealed that NEAT1 is enriched in paraspeckles in the nucleus. Moreover, NEAT1 depletion decreased paraspeckle formation, while overexpression increased the number of paraspeckles. In agreement with this concept, NEAT1 was found to be associated with the PSP1 (Paraspeckle Component 1) protein in vivo and in vitro. In addition, considering that NEAT1 is suppressed in human embryonic stem cells, where paraspeckles are absent, and is induced upon differentiation to trophoblasts, when paraspeckles are present, NEAT1 might have a role in developmental regulation [41]. Other lncRNAs also localize to different subnuclear regions. For instance, MALAT1 (Metastasis-Associated Lung Adenocarcinoma Transcript 1, or NEAT2) is specifically localized to splicing speckles [42].

Cytoplasmic lncRNAs

Most cytoplasmic lncRNAs function by base-pairing with mRNAs or by binding proteins to regulate mRNA and protein stability, as is the case for TINCR (Terminal Differentiation-Induced ncRNA). This lncRNA controls the differentiation of human epidermal cells, as it is required for upregulation of several important differentiation genes. TINCR interaction with its mRNA targets occurs through a 25-nucleotide sequence (TINCR box) present in the mRNA partners. Moreover, TINCR binds to STAU1 (Staufen1) [15], an RNA binding protein (RBP) responsible for mRNA decay [43]. The TINCR-STAU1 interaction mediates the stabilization of the mRNA of genes involved in differentiation, such as KRT80 (Keratin 80) [15]. Another example of a lncRNA that regulates mRNA and protein stability is BACE1-AS (BACE1 antisense). BACE1-AS forms an RNA duplex with BACE1 (Beta-secretase 1) mRNA through complementarity and increases its stability [44]. Furthermore, BACE1-AS and the RBP HuD (Hu Antigen D) form a cytoplasmic complex that stabilizes BACE1 mRNA [45].

The Noncoding RNA Activated by DNA Damage, or NORAD, is a transcript induced after DNA damage. Lee and collaborators revealed that NORAD depletion triggers aneuploidy in HCT116 colorectal carcinoma cells. Mechanistically, NORAD maintains genomic stability by sequestering PUMILIO proteins, evolutionarily conserved RBPs that negatively regulate gene expression. In the absence of NORAD, PUMILIO induces chromosomal instability by repressing genes related to DNA repair and DNA replication, such as PARP2 [Poly(ADP-Ribose) Polymerase 2], EXO1 (Exonuclease 1), MCM4 (Minichromosome Maintenance Complex Component 4), and MCM8 (Minichromosome Maintenance Complex Component 8) [46]. Therefore, these data indicate that this lncRNA is a molecular decoy. Recent reports show that NORAD is upregulated in different types of tumors, for instance esophageal squamous cell carcinoma [47] and bladder cancer [48]. It is interesting to note that a single lncRNA may have different functions depending on the cell context or environment. For example, Zhang and collaborators demonstrated that NORAD positively contributes to colorectal cancer progression by serving as a ceRNA for miR-202-5p [49].


Like NORAD, many lncRNAs are being described as ceRNAs, where the lncRNA associates with a microRNA by sequence complementarity and prevents it from degrading its mRNA targets. For instance, in addition to its importance during embryonic development, H19 is described as a molecular sponge for microRNA let-7 [50]. SNHG5 (Small nucleolar RNA host gene 5), also known as Long Intergenic Non-Protein Coding RNA 44, is downregulated in gastric cancer, where miR-32 is induced. Overexpression of wild type and mutant constructs reveals that SNHG5 sponges miR-32 to regulate KLF4 (Kruppel-like factor 4) expression, promoting cell proliferation, migration and invasion [51]. Lastly, linc-MD1 (Long Intergenic Non-Protein Coding RNA, Muscle Differentiation 1) is a muscle-specific long non-coding RNA that also works as a microRNA sponge. Downregulation of linc-MD1 delays muscle differentiation of human and mouse myoblasts, while linc-MD1 overexpression reverses this phenotype. Moreover, linc-MD1 sponges miR-133 and miR-135 to regulate the expression of MAML1 (Mastermind-like-1) and MEF2C (Myocyte-specific Enhancer Factor 2C), transcription factors that activate muscle-specific gene expression. Therefore, linc-MD1 controls the timing of muscle differentiation by acting as a ceRNA in mouse and human myoblasts [52].

One caveat about lncRNAs that have been postulated to act as sponges of microRNAs will become relevant in Chapter 3. Because lncRNAs are usually expressed at a low level, the copy number of a lncRNA is often insufficient to effectively titrate a microRNA away from its cellular targets, which are expressed at a much higher copy number per cell. Thus, the only true test of whether a physiological level of a lncRNA acts as a sponge or ceRNA is to determine whether knockdown of the lncRNA releases enough microRNA molecules in a cell to cause the repression of the endogenous targets of the microRNA. This test is often lacking in reports that claim that a lncRNA acts by sequestering a microRNA.
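The stoichiometric argument above can be made concrete with a back-of-the-envelope calculation. The sketch below is a deliberate simplification: it assumes that all binding sites have equal affinity and that the microRNA distributes in proportion to site abundance. The function name and the copy numbers are hypothetical, chosen only for illustration, and are not measured values.

```python
def sponge_site_fraction(lnc_copies: float,
                         sites_per_lnc: float,
                         target_sites: float) -> float:
    """Fraction of all microRNA-binding sites in a cell that sit on the lncRNA.

    Assumes equal affinity for every site (a simplification).
    """
    lnc_sites = lnc_copies * sites_per_lnc
    total_sites = lnc_sites + target_sites
    if total_sites <= 0:
        raise ValueError("at least one binding site is required")
    return lnc_sites / total_sites


# Hypothetical numbers: 20 lncRNA copies per cell with 2 sites each,
# competing with 5,000 binding sites on mRNA targets.
fraction = sponge_site_fraction(20, 2, 5000)
print(f"lncRNA share of binding sites: {fraction:.2%}")
```

Under these assumed numbers the lncRNA contributes under 1% of the available binding sites, so removing it would free very little microRNA. This is why a knockdown experiment that measures repression of the endogenous targets is the informative test.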

Long non-coding RNAs in cancer

Cancer is one of the main causes of death in the world. It is estimated that in the US there will be around 1,700,000 new cases of cancer in 2018 alone. Despite the advances of cancer therapies, more than 600,000 people will die this year in the USA as a result of tumors [53].

Although cancer is a heterogeneous and complex disease, in 2000 Hanahan and Weinberg proposed a set of common tumor characteristics, the hallmarks of cancer: sustaining proliferative signaling; evading growth suppressors; enabling replicative immortality; activating invasion and metastasis; inducing angiogenesis; and resisting cell death [54]. Recently, with the increasing number of sequenced cancer transcriptomes, lncRNAs have been emerging as important regulators of tumorigenesis. Thus, it follows that lncRNAs have functions related to the cancer hallmarks [54], including uncontrolled growth, replicative immortality, metastasis, and angiogenesis.

Perhaps the most prominent hallmark among tumors is their ability to replicate independently of external stimuli, often as a result of deregulated signaling pathways. Not surprisingly, there are several lncRNAs that play a role in cell proliferation. For instance, in colon cancer, Wnt/β-catenin activation leads to upregulation of the transcription factor c-Myc, causing an increase in cell proliferation. Additionally, Kawasaki et al. identified a direct target of c-Myc, MYU, or c-Myc-upregulated lncRNA. This lncRNA is upregulated in colon cancer samples, and MYU depletion reduces proliferation of colon cancer cells in vitro and in vivo. Mechanistically, MYU binds to hnRNP-K and stabilizes CDK6 expression, promoting G1/S transition [55].

Another hallmark of tumor cells is the ability to evade growth suppression. During tumorigenesis, many cancer cells find a way to repress the expression of tumor suppressors, like p53. In 2017, Li et al. discovered that PURPL (p53 upregulated regulator of p53 levels) associates with the p53 activator MYBBP1A and suppresses p53, stimulating the proliferation of colorectal cancer cells [56]. PURPL is not the only lncRNA involved in escape from growth suppression. ANRIL (antisense non-coding RNA in the INK4 locus) promotes cell proliferation by blocking the activity of the tumor suppressor gene p15 (INK4B). This lncRNA acts in cis and recruits SUZ12, one of the PRC2 subunits, to the p15 promoter region, causing its epigenetic silencing. Further, ANRIL knockdown decreases SUZ12 occupancy at the p15 locus and induces its expression [57]. In addition, APTR, which will be described in Chapter 2, is also an example of a lncRNA that represses the expression of a tumor suppressor to promote cell proliferation [58].

The third hallmark of cancer is enabling replicative immortality. Unlike normal cells, tumor cells possess an ability to replicate almost without limit. In physiological conditions, telomeres exert an important function by protecting the ends of the chromosomes. However, telomere length decreases after each cell division, and the progressive loss of the telomeres leads to cell death [59]. TERRA (Telomeric Repeat-Containing RNA) is described as a group of molecules transcribed from several telomeric loci and important for maintaining telomere integrity [60]. Functionally, TERRA is involved in telomeric heterochromatin formation [61] and in negatively regulating telomerase activity [62,63]. Furthermore, TERRA associates with HNRNPA1 (Heterogeneous nuclear ribonucleoprotein A1) and POT1 (Protection of Telomeres 1) to promote telomere capping and preserve genomic integrity [64].

The ability to invade and form metastases is another hallmark of tumor cells. One good example of a lncRNA that plays a role in invasion and metastasis is Downregulated RNA In Cancer (DRAIC). Discovered in Dr. Dutta’s laboratory, DRAIC was identified in an RNA-seq screening strategy to find lncRNAs involved in prostate cancer progression. Gain and loss of function experiments revealed that DRAIC acts as a tumor suppressor by repressing cellular migration and invasion. Moreover, high expression of DRAIC is a good prognostic indicator for patients with different types of cancer, including lung, liver, gliomas and others [65]. Another example of a lncRNA regulator of invasion is NKILA (NF-kB interacting long noncoding RNA). In invasive breast cancer, where it is downregulated and associated with poor patient outcome, NKILA forms a complex with NF-kB and IkB, inhibiting IkB phosphorylation and thereby inhibiting NF-kB signaling [66].

Angiogenesis, the fifth cancer hallmark, is a common phenomenon in most tumors. One trigger of angiogenesis is hypoxia, a non-physiologically low oxygen concentration. Here again, many reports confirm that lncRNAs are important modulators of a cell’s response to hypoxia. In an RNA-seq screen to find lncRNAs differentially expressed during hypoxia, Neumann et al. demonstrated that the long non-coding antisense transcript of GATA6 (GATA6-AS) is upregulated in human umbilical vein endothelial cells after 24h of hypoxia. Moreover, GATA6-AS knockdown reduces angiogenic sprouting in vitro in spheroid assays. In addition, it was also determined that GATA6-AS regulates the expression of the hypoxia- and angiogenesis-related genes PTGS2 (Prostaglandin-Endoperoxide Synthase 2) and POSTN (Periostin). For this purpose, GATA6-AS recruits LOXL2 (Lysyl Oxidase Like 2) to the PTGS2 and POSTN promoter regions, where it removes H3K4me3 (a mark of actively transcribed genes). LncHIFCAR (long noncoding HIF-1α co-activating RNA) is another hypoxia-induced lncRNA. In oral squamous cell carcinoma, LncHIFCAR is upregulated and is associated with poor patient outcomes. LncHIFCAR regulates the HIF-1 transcriptional network by binding and recruiting HIF-1α and p300 to the target promoters of GLUT1 (Glucose transporter 1), LDHA (Lactate Dehydrogenase A), VEGF (Vascular endothelial growth factor) and PDK1 (Pyruvate Dehydrogenase Kinase 1) [67].

Finally, the last hallmark of cancer is resisting cell death. Originally reported in prostate cancer, PCGEM1 [Prostate-Specific Transcript (Non-Protein Coding)] [68] upregulation results in the inhibition of apoptosis induced by doxorubicin. In LNCaP cells overexpressing PCGEM1 and treated with doxorubicin, Fu X and collaborators found that p53 and p21 stimulation is delayed and that cleaved caspase 7 and cleaved PARP are reduced when compared to control cells [69].

In summary, the studies highlighted here underline the importance of lncRNAs in all tumorigenesis steps in many different types of cancer. As this thesis is focused on gliomas, the next section highlights the roles of a few lncRNAs in brain tumors.

Long non-coding RNAs in gliomas

Gliomas are a group of tumors derived from glial cells, the supporting cells in the nervous system. In the US, there are around 20,000 newly diagnosed cases of gliomas every year and the incidence rate is approximately 6 cases per 100,000 people [70]. These tumors used to be classified according to the cell phenotype, in the belief that it indicated the cell of origin. Therefore, the most common types of gliomas were ependymomas, oligodendrogliomas and astrocytomas. Ependymomas are tumors that histologically resemble ependymal cells, which line the ventricular cavities in the brain and the central canal of the spinal cord. These cells are responsible for regulating the diffusion of cerebrospinal fluid into the periventricular brain regions. In pediatric patients, ependymomas account for around 5% of all CNS (Central Nervous System) tumors, while in adults they represent around 2%. This type of tumor rarely spreads outside the brain and has a high risk of recurrence [71-73]. Oligodendrogliomas account for less than 4% of CNS tumors and are relatively slow growing but can spread into nearby tissue. The cells that comprise these tumors phenotypically resemble oligodendrocytes. Oligodendrocytes are glial cells that surround the axon of each neuron and form the myelin sheaths [70,73,74]. Finally, astrocytes are star-shaped glial cells important to maintain the blood–brain barrier and to modulate synaptic transmission, among other physiological functions [75]. Accordingly, astrocytomas are tumors whose cells share histological features with astrocytes.

The most aggressive glial tumor is the glioblastoma (GBM). This type of tumor, originating from oligodendrocytes and astrocytes, is highly invasive and has a very poor prognosis, with a median survival time of 1 year [76,77]. The standard treatment for this type of tumor consists of maximal surgical resection, radiotherapy, and temozolomide (TMZ) chemotherapy. However, in spite of all these efforts, the tumors frequently recur [78].

In 2016, a new classification system was introduced by the WHO [79] that includes genetic lesions in the tumors for classification. The former gliomas and glioblastomas are now distributed into the following main subgroups: 1) Diffuse astrocytic and oligodendroglial tumors, 2) Other astrocytic tumors, 3) Ependymal tumors, and 4) Other gliomas. The first category is further subdivided into (a) Diffuse astrocytomas (IDH mutant, IDH WT and NOS), (b) Anaplastic astrocytomas (IDH mutant, IDH WT and NOS), (c) Glioblastoma (IDH mutant, IDH WT and NOS), (d) Diffuse midline glioma (H3K27M mutant), (e) Oligodendroglioma (IDH mutant and 1p19q deleted, NOS), (f) Anaplastic oligodendroglioma (IDH mutant and 1p19q deleted, NOS), (g) Oligoastrocytoma (NOS) and (h) Anaplastic oligoastrocytoma (NOS). NOS stands for “Not Otherwise Specified” and is used when the genetic test was not done on the tumor.

Recently, many groups have shown that several lncRNAs play important roles in glioma tumorigenesis. For example, Zhang et al. [80] identified more than 100 differentially expressed lncRNAs in gliomas when compared to normal brain, using publicly available gene expression profiles from the Gene Expression Omnibus (GEO). More recently, a study from the Dutta laboratory [81] used publicly available RNA-seq datasets to discover more than 1,200 differentially expressed lncRNAs in gliomas compared to normal brain. Moreover, 584 of them were associated with a poor prognosis, while 282 were associated with a good prognosis in GBMs.

Oncogenic long non-coding RNAs in gliomas

Interestingly, in both of these analyses, CRNDE (Colorectal Neoplasia Differentially Expressed) was found to be induced in gliomas [80,81], suggesting its importance for brain tumor progression. In fact, Wang Y and collaborators [82] showed that, in GBM cells, CRNDE promotes migration, invasion and cell growth by activating the mTOR signaling pathway. Meanwhile, other groups reported CRNDE as a ceRNA. Zheng J and collaborators [83] showed that CRNDE promotes tumor progression by sponging the tumor suppressor miR-384. In normal cells, miR-384 decreases PIWIL4 (Piwi Like RNA-Mediated Gene Silencing 4) expression and STAT3 (Signal Transducer and Activator of Transcription 3) activity. However, when CRNDE is upregulated, miR-384 can no longer inhibit PIWIL4, resulting in an increase of phosphorylated STAT3, which, in turn, promotes cell proliferation, migration, and invasion and prevents apoptosis of GBM cells. Zheng et al. [84] reported that CRNDE promotes cell proliferation, migration and invasion and inhibits apoptosis by acting as a ceRNA for miR-186, causing upregulation of XIAP (X-linked Inhibitor of Apoptosis) and PAK7 [P21 (RAC1) Activated Kinase 7] in GBM cell lines. Furthermore, another group showed that the same phenotypes, migration, invasion and proliferation, were associated with CRNDE upregulation in glioma cells, but through a different mechanism: Li et al. proposed that CRNDE competitively binds to miR-136-5p, increasing BCL2 (B-Cell CLL/Lymphoma 2) and WNT2 (Wnt Family Member 2) protein levels [85].

Another oncogenic lncRNA upregulated in brain tumors is ECONEXIN (Evolutionary Conserved and Expressed in Neural Tissues). Using two different glioblastoma cell lines, U87 and U251, Deguchi and collaborators revealed that ECONEXIN is primarily located in the cytoplasm and that its suppression decreases cell proliferation. Mechanistically, it was demonstrated that ECONEXIN interacts with miR-411-5p to regulate TOP2A (Topoisomerase 2 Alpha) expression [86].

HULC (Highly Upregulated in Liver Cancer) was originally described as highly expressed in hepatocellular carcinomas by Panzitt et al. [87]. This lncRNA is also upregulated in gliomas. Zhu Y and collaborators silenced HULC in GBM cell lines and observed that angiogenesis, cell proliferation, invasion and migration were impaired. Moreover, HULC knockdown arrested cells at G1/S. Curiously, these phenotypes were reversed when ESM-1 (endothelial cell specific molecule 1) was overexpressed. In the same work, it was discovered that HULC activates the PI3K/Akt/mTOR signaling pathway [88]. In agreement with Zhu Y, Yan et al. found that HULC is upregulated in glioma tissues and that high levels of HULC were indicative of poor patient survival. Moreover, HULC overexpression increased cell proliferation and colony formation, whereas knockdown of HULC expression reduced these phenotypes in glioma cell lines [89].

Taurine Upregulated Gene 1 (TUG1) was originally discovered by Young et al. as a lncRNA necessary for accurate development of photoreceptors in the mouse retina [90]. In gliomas, reports on TUG1 have been controversial. The first study in this type of cancer suggested that TUG1 behaves as a tumor suppressive lncRNA. In brain tumor patient samples, TUG1 was found to be downregulated compared to adjacent normal tissues. It was also shown that TUG1 overexpression increased apoptosis by activating caspase-3 and -9 and inhibiting the expression of BCL2. Conversely, TUG1 knockdown increased cell viability [91]. However, another study demonstrated that TUG1 works both in the nucleus and in the cytoplasm to maintain the stemness of glioma stem cells, thus behaving as an oncogenic lncRNA. In the nucleus, TUG1 physically interacted with the PRC2 components EZH2 and SUZ12 and epigenetically suppressed multiple neuronal differentiation-associated genes. In the cytoplasm, TUG1 counteracted miR-145 and maintained the expression of SOX2 and MYC [92]. More recently, a third group [93] also suggested that TUG1 is an onco-lncRNA, since TUG1 knockdown in the GBM cell line U251 inhibited cell proliferation and invasion while promoting apoptosis.

Tumor suppressor long non-coding RNAs in gliomas

Initially identified in pituitary adenomas by Zhang et al. in 2010 [94], MEG3 (Maternally Expressed 3) is a tumor suppressor lncRNA. Wang et al. showed that MEG3 is downregulated in gliomas when compared with the adjacent normal tissue [95], and Gong et al. showed that high MEG3 expression is associated with good patient outcomes [96]. In addition, MEG3 overexpression suppressed proliferation, causing accumulation of glioma cells in G0/G1 phase, and promoted apoptosis. Moreover, MEG3 upregulated caspase 8/3 and TP53 mRNA levels and increased p53 activity. Furthermore, MEG3 and p53 were found to be physically associated by both RIP (RNA Immunoprecipitation) and RNA pull-down, suggesting that MEG3 binds p53 to regulate its activation [95]. In 2016, Li J and collaborators demonstrated that MEG3 downregulation was caused by hypermethylation of its promoter region in a DNMT1 (DNA Methyltransferase 1)-mediated manner [97]. MEG3 was also reported as a tumor suppressor lncRNA that inhibits cell proliferation by regulating Wnt/β-catenin signaling activity [96]. Other reports also suggest that MEG3 is capable of sponging miR-19a and miR-93 to induce PTEN and repress the PI3K/AKT pathway, respectively [98,99].

CASC2 (Cancer Susceptibility Candidate 2) is another example of a lncRNA downregulated in gliomas [100-103]. Not surprisingly, CASC2 upregulation is associated with good patient outcome in gliomas [100-102]. In addition, CASC2 overexpression suppresses glioma cell tumorigenicity by inhibiting proliferation, migration, and invasion [100,101]. Wang et al. proposed that CASC2 and miR-21 physically interact to act as tumor suppressors in glioma cells [100], while another group reported that CASC2 works by suppressing the Wnt/β-catenin signaling pathway [101]. Interestingly, it was discovered that CASC2 sensitizes glioma cells to TMZ cytotoxicity by interacting with miR-181a [102] and miR-193a-5p [103] and regulating the PTEN and mTOR signaling pathways, respectively.

Growth arrest-specific 5 (GAS5) has been reported to have an important role in different types of tumors, such as colorectal, gastric, cervical, and bladder cancers, among others [104-107]. In gliomas, GAS5 overexpression increased expression of the tumor suppressors BMF (BCL2 Modifying Factor) and PLXNC1 (Plexin C1) by binding to miR-222 [108]. Overexpression of GAS5 was also associated with the cellular response to erlotinib, an EGFR inhibitor used for glioma treatment [109].

TUNAR (TCL1 Upstream Neural Differentiation-Associated RNA) is another example of a tumor suppressor lncRNA. When overexpressed, TUNAR inhibited cell proliferation, migration, and invasion and induced apoptosis. Dai J and collaborators revealed that TUNAR exerts its anti-cancer role in glioma cells by upregulating miR-200a and inhibiting RAC1 (Ras-related C3 Botulinum Toxin Substrate 1) [110].

In addition to the aforementioned, other extensively studied lncRNAs were also reported as important players in brain tumors, such as HOTAIR, NEAT1, MALAT1, XIST and H19. HOTAIR promotes cell proliferation, migration and invasion while inhibiting apoptosis [111]. In addition, HOTAIR interacts with the PRC2 complex to regulate the cell cycle [32]. NEAT1 promotes cell growth and invasion in gliomas. It physically associates with EZH2 to promote H3K27 trimethylation on the ICAT, GSK3B, and Axin2 promoter regions, consequently downregulating the expression of these genes [112]. MALAT1 is downregulated in glioma patient samples compared to normal brain tissues. Moreover, MALAT1 suppresses invasion and proliferation by decreasing miR-155 expression and increasing FBXW7 expression [113]. Upregulated in gliomas, XIST promotes angiogenesis by increasing the expression of CXCR7 (C-X-C Chemokine Receptor Type 7) [114]. Lastly, H19 promotes cell invasion and angiogenesis by inducing vasohibin 2 (VASH2) expression [115].

As evidenced here, many scientists have been concentrating their efforts on understanding the role of lncRNAs in diverse biological functions. We hope to expand on their work to further elucidate the mechanisms by which lncRNAs promote or suppress tumorigenesis. Specifically, the work described in this study aims to characterize the role of two lncRNAs in tumor progression: APTR and LINC00152.

References

1 - Wapinski O, Chang HY. Long noncoding RNAs and human disease. Trends Cell Biol. 2011 Jun;21(6):354-61. Review. Erratum in: Trends Cell Biol. 2011 Oct;21(10):561.

2 - Sánchez Y, Huarte M. Long non-coding RNAs: challenges for diagnosis and therapies. Nucleic Acid Ther. 2013 Feb;23(1):15-20. Review.

3 - Ma L, Bajic VB, Zhang Z. On the classification of long non-coding RNAs. RNA Biol. 2013 Jun;10(6):925-33. Review.

4 - ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012 Sep 6;489(7414):57-74.

5 - Rinn JL, Chang HY. Genome regulation by long noncoding RNAs. Annu Rev Biochem. 2012;81:145-66. Review.

6 - Ponting CP, Oliver PL, Reik W. Evolution and functions of long noncoding RNAs. Cell. 2009;136:629-641.

7 - Ponting CP, Oliver PL, Reik W. Evolution and functions of long noncoding RNAs. Cell. 2009 Feb 20;136(4):629-41. Review.

8 - Pauli A, Valen E, Lin MF, Garber M, Vastenhouw NL, Levin JZ, Fan L, Sandelin A, Rinn JL, Regev A, Schier AF. Systematic identification of long noncoding RNAs expressed during zebrafish embryogenesis. Genome Res. 2012 Mar;22(3):577-91.

9 - Guttman M, Amit I, Garber M, French C, Lin MF, Feldser D, Huarte M, Zuk O, Carey BW, Cassady JP, Cabili MN, Jaenisch R, Mikkelsen TS, Jacks T, Hacohen N, Bernstein BE, Kellis M, Regev A, Rinn JL, Lander ES. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature. 2009;458:223-227.

10 - Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, Aken BL, Barrell D, Zadissa A, Searle S, Barnes I, Bignell A, Boychenko V, Hunt T, Kay M, Mukherjee G, Rajan J, Despacio-Reyes G, Saunders G, Steward C, Harte R, Lin M, Howald C, Tanzer A, Derrien T, Chrast J, Walters N, Balasubramanian S, Pei B, Tress M, Rodriguez JM, Ezkurdia I, van Baren J, Brent M, Haussler D, Kellis M, Valencia A, Reymond A, Gerstein M, Guigó R, Hubbard TJ. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 2012 Sep;22(9):1760-74.

11 - Fang S, Zhang L, Guo J, Niu Y, Wu Y, Li H, Zhao L, Li X, Teng X, Sun X, Sun L, Zhang MQ, Chen R, Zhao Y. NONCODEV5: a comprehensive annotation database for long non-coding RNAs. Nucleic Acids Res. 2018 Jan 4;46(D1):D308-D314.

12 - Derrien T, Johnson R, Bussotti G, Tanzer A, Djebali S, Tilgner H, Guernec G, Martin D, Merkel A, Knowles DG, Lagarde J, Veeravalli L, Ruan X, Ruan Y, Lassmann T, Carninci P, Brown JB, Lipovich L, Gonzalez JM, Thomas M, Davis CA, Shiekhattar R, Gingeras TR, Hubbard TJ, Notredame C, Harrow J, Guigó R. The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res. 2012 Sep;22(9):1775-89.

13 - Cheng W, Zhang Z, Wang J. Long noncoding RNAs: new players in prostate cancer. Cancer Lett. 2013 Oct 1;339(1):8-14. Review.

14 - Mercer TR, Dinger ME, Sunkin SM, Mehler MF, Mattick JS. Specific expression of long noncoding RNAs in the mouse brain. Proc Natl Acad Sci U S A. 2008 Jan 15;105(2):716-21.

15 - Kretz M, Siprashvili Z, Chu C, Webster DE, Zehnder A, Qu K, Lee CS, Flockhart RJ, Groff AF, Chow J, Johnston D, Kim GE, Spitale RC, Flynn RA, Zheng GX, Aiyer S, Raj A, Rinn JL, Chang HY, Khavari PA. Control of somatic tissue differentiation by the long non-coding RNA TINCR. Nature. 2013 Jan 10;493(7431):231-5.

16 - Loewer S, Cabili MN, Guttman M, Loh YH, Thomas K, Park IH, Garber M, Curran M, Onder T, Agarwal S, Manos PD, Datta S, Lander ES, Schlaeger TM, Daley GQ, Rinn JL. Large intergenic non-coding RNA-RoR modulates reprogramming of human induced pluripotent stem cells. Nat Genet. 2010 Dec;42(12):1113-7. Erratum in: Nat Genet. 2010 Dec;42(12): 3 p following 1117.

17 - Zhao XY, Lin JD. Long noncoding RNAs: a new regulatory code in metabolic control. Trends in Biochemical Sciences. 2015; 40: 586-96.

18 - Pachnis V, Belayew A, Tilghman SM. Locus unlinked to alpha-fetoprotein under the control of the murine raf and Rif genes. Proc Natl Acad Sci U S A. 1984 Sep;81(17):5523-5527.

19 - Brannan CI, Dees EC, Ingram RS, Tilghman SM. The product of the H19 gene may function as an RNA. Mol Cell Biol. 1990 Jan;10(1):28-36.

20 - Arima T, Matsuda T, Takagi N, Wake N. Association of IGF2 and H19 imprinting with choriocarcinoma development. Cancer Genet Cytogenet. 1997 Jan;93(1):39-47.

21 - Banet G, Bibi O, Matouk I, et al. Characterization of human and mouse H19 regulatory sequences. Mol Biol Rep. 2000 Sep;27(3):157-65.

22 - Brunkow ME, Tilghman SM. Ectopic expression of the H19 gene in mice causes prenatal lethality. Genes Dev. 1991 Jun;5(6):1092-101.

23 - Dey BK, Pfeifer K, Dutta A. The H19 long noncoding RNA gives rise to microRNAs miR-675-3p and miR-675-5p to promote skeletal muscle differentiation and regeneration. Genes Dev. 2014 Mar 1;28(5):491-501.

24 - Bergström R, Whitehead J, Kurukuti S, Ohlsson R. CTCF regulates asynchronous replication of the imprinted H19/Igf2 domain. Cell Cycle. 2007 Feb;6(4):450-4.

25 - Brown CJ, Ballabio A, Rupert JL, Lafreniere RG, Grompe M, Tonlorenzi R, Willard HF. A gene from the region of the human X inactivation centre is expressed exclusively from the inactive X chromosome. Nature. 1991 Jan 3;349(6304):38-44.

26 - Zhao J, Sun BK, Erwin JA, Song JJ, Lee JT. Polycomb proteins targeted by a short repeat RNA to the mouse X chromosome. Science. 2008 Oct 31;322(5902):750-6.

27 - McHugh CA, Chen CK, Chow A, Surka CF, Tran C, McDonel P, Pandya-Jones A, Blanco M, Burghard C, Moradian A, Sweredoski MJ, Shishkin AA, Su J, Lander ES, Hess S, Plath K, Guttman M. The Xist lncRNA interacts directly with SHARP to silence transcription through HDAC3. Nature. 2015 May 14;521(7551):232-6.

28 - Lee JT, Davidow LS, Warshawsky D. Tsix, a gene antisense to Xist at the X-inactivation centre. Nat Genet. 1999;21(4):400-4.

29 - Alberts B, Johnson A, Lewis J, et al. Molecular Biology of the Cell. 4th edition. New York: Garland Science; 2002.

30 - Feng J, Funk WD, Wang SS, Weinrich SL, Avilion AA, Chiu CP, Adams RR, Chang E, Allsopp RC, Yu J, et al. The RNA component of human telomerase. Science. 1995 Sep 1;269(5228):1236-41.

31 - Shi X, Sun M, Liu H, Yao Y, Song Y. Long non-coding RNAs: a new frontier in the study of human diseases. Cancer Lett. 2013 Oct 10;339(2):159-66. Review.

32 - Gupta RA, Shah N, Wang KC, Kim J, Horlings HM, Wong DJ, Tsai MC, Hung T, Argani P, Rinn JL, Wang Y, Brzoska P, Kong B, Li R, West RB, van de Vijver MJ, Sukumar S, Chang HY. Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature. 2010;464:1071-1076.

33 - Kogo R, Shimamura T, Mimori K, Kawahara K, Imoto S, Sudo T, Tanaka F, Shibata K, Suzuki A, Komune S, Miyano S, Mori M. Long noncoding RNA HOTAIR regulates polycomb-dependent chromatin modification and is associated with poor prognosis in colorectal cancers. Cancer Res. 2011 Oct 15;71(20):6320-6. Erratum in: Cancer Res. 2012 Feb 15;72(4):1039.

34 - Li D, Feng J, Wu T, Wang Y, Sun Y, Ren J, Liu M. Long intergenic noncoding RNA HOTAIR is overexpressed and regulates PTEN methylation in laryngeal squamous cell carcinoma. Am J Pathol. 2013 Jan;182(1):64-70.

35 - Zhang K, Sun X, Zhou X, Han L, Chen L, Shi Z, Zhang A, Ye M, Wang Q, Liu C, Wei J, Ren Y, Yang J, Zhang J, Pu P, Li M, Kang C. Long non-coding RNA HOTAIR promotes glioblastoma cell cycle progression in an EZH2 dependent manner. Oncotarget. 2015 Jan 1;6(1):537-46.

36 - Huarte M, Guttman M, Feldser D, Garber M, Koziol MJ, Kenzelmann-Broz D, Khalil AM, Zuk O, Amit I, Rabani M, Attardi LD, Regev A, Lander ES, Jacks T, Rinn JL. A large intergenic noncoding RNA induced by p53 mediates global gene repression in the p53 response. Cell. 2010 Aug 6;142(3):409-19.

37 - Dimitrova N, Zamudio JR, Jong RM, Soukup D, Resnick R, Sarma K, Ward AJ, Raj A, Lee JT, Sharp PA, Jacks T. LincRNA-p21 activates p21 in cis to promote Polycomb target gene expression and to enforce the G1/S checkpoint. Mol Cell. 2014 Jun 5;54(5):777-90.

38 - Blackwood EM, Kadonaga JT. Going the distance: a current view of enhancer action. Science. 1998 Jul 3;281(5373):60-3. Review.

39 - Mueller AC, Cichewicz MA, Dey BK, Layer R, Reon BJ, Gagan JR, Dutta A. MUNC, a long noncoding RNA that facilitates the function of MyoD in skeletal myogenesis. Mol Cell Biol. 2015 Feb;35(3):498-513.

40 - Hung T, Wang Y, Lin MF, Koegel AK, Kotake Y, Grant GD, Horlings HM, Shah N, Umbricht C, Wang P, Wang Y, Kong B, Langerød A, Børresen-Dale AL, Kim SK, van de Vijver M, Sukumar S, Whitfield ML, Kellis M, Xiong Y, Wong DJ, Chang HY. Extensive and coordinated transcription of noncoding RNAs within cell-cycle promoters. Nat Genet. 2011 Jun 5;43(7):621-9.

41 - Clemson CM, Hutchinson JN, Sara SA, Ensminger AW, Fox AH, Chess A, Lawrence JB. An architectural role for a nuclear noncoding RNA: NEAT1 RNA is essential for the structure of paraspeckles. Mol Cell. 2009 Mar 27;33(6):717-26.

42 - Hutchinson JN, Ensminger AW, Clemson CM, Lynch CR, Lawrence JB, Chess A. A screen for nuclear transcripts identifies two linked noncoding RNAs associated with SC35 splicing domains. BMC Genomics. 2007 Feb 1; 8:39.

43 - Park E, Maquat LE. Staufen-mediated mRNA decay. Wiley Interdiscip Rev RNA. 2013 Jul-Aug;4(4):423-35. Review.

44 - Faghihi MA, Modarresi F, Khalil AM, Wood DE, Sahagan BG, Morgan TE, Finch CE, St Laurent G 3rd, Kenny PJ, Wahlestedt C. Expression of a noncoding RNA is elevated in Alzheimer's disease and drives rapid feed-forward regulation of beta-secretase. Nat Med. 2008 Jul;14(7):723-30.

45 - Kang MJ, Abdelmohsen K, Hutchison ER, Mitchell SJ, Grammatikakis I, Guo R, Noh JH, Martindale JL, Yang X, Lee EK, Faghihi MA, Wahlestedt C, Troncoso JC, Pletnikova O, Perrone-Bizzozero N, Resnick SM, de Cabo R, Mattson MP, Gorospe M. HuD regulates coding and noncoding RNA to induce APP→Aβ processing. Cell Rep. 2014 Jun 12;7(5):1401-1409.

46 - Lee S, Kopp F, Chang TC, Sataluri A, Chen B, Sivakumar S, Yu H, Xie Y, Mendell JT. Noncoding RNA NORAD Regulates Genomic Stability by Sequestering PUMILIO Proteins. Cell. 2016 Jan 14;164(1-2):69-80.

47 - Wu X, Lim ZF, Li Z, Gu L, Ma W, Zhou Q, Su H, Wang X, Yang X, Zhang Z. NORAD Expression Is Associated with Adverse Prognosis in Esophageal Squamous Cell Carcinoma. Oncol Res Treat. 2017;40(6):370-374.

48 - Li Q, Li C, Chen J, Liu P, Cui Y, Zhou X, Li H, Zu X. High expression of long noncoding RNA NORAD indicates a poor prognosis and promotes clinical progression and metastasis in bladder cancer. Urol Oncol. 2018 Mar 28. pii: S1078-1439(18)30076-0.

49 - Zhang J, Li XY, Hu P, Ding YS. LncRNA NORAD contributes to colorectal cancer progression by inhibition of miR-202-5p. Oncol Res. 2018 Feb 22.

50 - Kallen AN, Zhou XB, Xu J, Qiao C, Ma J, Yan L, Lu L, Liu C, Yi JS, Zhang H, Min W, Bennett AM, Gregory RI, Ding Y, Huang Y. The imprinted H19 lncRNA antagonizes let-7 microRNAs. Mol Cell. 2013 Oct 10;52(1):101-12.

51 - Zhao L, Han T, Li Y, Sun J, Zhang S, Liu Y, Shan B, Zheng D, Shi J. The lncRNA SNHG5/miR-32 axis regulates gastric cancer cell proliferation and migration by targeting KLF4. FASEB J. 2017 Mar;31(3):893-903.

52 - Cesana M, Cacchiarelli D, Legnini I, Santini T, Sthandier O, Chinappi M, Tramontano A, Bozzoni I. A long noncoding RNA controls muscle differentiation by functioning as a competing endogenous RNA. Cell. 2011 Oct 14;147(2):358-69. Erratum in: Cell. 2011 Nov 11;147(4):947.

53 - Siegel RL, Miller KD, Jemal A. Cancer statistics, 2018. CA Cancer J Clin. 2018 Jan;68(1):7-30.

54 - Hanahan D, Weinberg RA. The hallmarks of cancer. Cell. 2000 Jan 7;100(1):57-70. Review.

55 - Kawasaki Y, Komiya M, Matsumura K, Negishi L, Suda S, Okuno M, Yokota N, Osada T, Nagashima T, Hiyoshi M, Okada-Hatakeyama M, Kitayama J, Shirahige K, Akiyama T. MYU, a Target lncRNA for Wnt/c-Myc Signaling, Mediates Induction of CDK6 to Promote Cell Cycle Progression. Cell Rep. 2016 Sep 6;16(10):2554-2564.

56 - Li XL, Subramanian M, Jones MF, Chaudhary R, Singh DK, Zong X, Gryder B, Sindri S, Mo M, Schetter A, Wen X, Parvathaneni S, Kazandjian D, Jenkins LM, Tang W, Elloumi F, Martindale JL, Huarte M, Zhu Y, Robles AI, Frier SM, Rigo F, Cam M, Ambs S, Sharma S, Harris CC, Dasso M, Prasanth KV, Lal A. Long Noncoding RNA PURPL Suppresses Basal p53 Levels and Promotes Tumorigenicity in Colorectal Cancer. Cell Rep. 2017 Sep 5;20(10):2408-2423.

57 - Kotake Y, Nakagawa T, Kitagawa K, Suzuki S, Liu N, Kitagawa M, Xiong Y. Long non-coding RNA ANRIL is required for the PRC2 recruitment to and silencing of p15(INK4B) tumor suppressor gene. Oncogene. 2011 Apr 21;30(16):1956-62.

58 - Negishi M, Wongpalee SP, Sarkar S, Park J, Lee KY, Shibata Y, Reon BJ, Abounader R, Suzuki Y, Sugano S, Dutta A. A new lncRNA, APTR, associates with and represses the CDKN1A/p21 promoter by recruiting polycomb proteins. PLoS One. 2014 Apr 18;9(4):e95216.

59 - Shay JW, Wright WE. Hayflick, his limit, and cellular ageing. Nat Rev Mol Cell Biol. 2000 Oct;1(1):72-6.

60 - Azzalin CM, Reichenbach P, Khoriauli L, Giulotto E, Lingner J. Telomeric repeat containing RNA and RNA surveillance factors at mammalian chromosome ends. Science. 2007 Nov 2;318(5851):798-801.

61 - Deng Z, Norseen J, Wiedmer A, Riethman H, Lieberman PM. TERRA RNA binding to TRF2 facilitates heterochromatin formation and ORC recruitment at telomeres. Mol Cell. 2009 Aug 28;35(4):403-13.

62 - Redon S, Reichenbach P, Lingner J. The non-coding RNA TERRA is a natural ligand and direct inhibitor of human telomerase. Nucleic Acids Res. 2010 Sep;38(17):5797-806.

63 - Ng LJ, Cropley JE, Pickett HA, Reddel RR, Suter CM. Telomerase activity is associated with an increase in DNA methylation at the proximal subtelomere and a reduction in telomeric transcription. Nucleic Acids Res. 2009 Mar;37(4):1152-9.

64 - Flynn RL, Centore RC, O'Sullivan RJ, Rai R, Tse A, Songyang Z, Chang S, Karlseder J, Zou L. TERRA and hnRNPA1 orchestrate an RPA-to-POT1 switch on telomeric single-stranded DNA. Nature. 2011 Mar 24;471(7339):532-6.

65 - Sakurai K, Reon BJ, Anaya J, Dutta A. The lncRNA DRAIC/PCAT29 Locus Constitutes a Tumor-Suppressive Nexus. Mol Cancer Res. 2015 May;13(5):828-38.

66 - Liu B, Sun L, Liu Q, Gong C, Yao Y, Lv X, Lin L, Yao H, Su F, Li D, Zeng M, Song E. A cytoplasmic NF-κB interacting long noncoding RNA blocks IκB phosphorylation and suppresses breast cancer metastasis. Cancer Cell. 2015 Mar 9;27(3):370-81.

67 - Shih JW, Chiang WF, Wu ATH, Wu MH, Wang LY, Yu YL, Hung YW, Wang WC, Chu CY, Hung CL, Changou CA, Yen Y, Kung HJ. Long noncoding RNA LncHIFCAR/MIR31HG is a HIF-1α co-activator driving oral cancer progression. Nat Commun. 2017 Jun 22; 8:15874.

68 - Srikantan V, Zou Z, Petrovics G, Xu L, Augustus M, Davis L, Livezey JR, Connell T, Sesterhenn IA, Yoshino K, Buzard GS, Mostofi FK, McLeod DG, Moul JW, Srivastava S. PCGEM1, a prostate-specific gene, is overexpressed in prostate cancer. Proc Natl Acad Sci U S A. 2000 Oct 24;97(22):12216-21.

69 - Fu X, Ravindranath L, Tran N, Petrovics G, Srivastava S. Regulation of apoptosis by a prostate-specific and prostate cancer-associated noncoding gene, PCGEM1. DNA Cell Biol. 2006 Mar;25(3):135-41.

70 - Mesfin FB, Al-Dhahir MA. Cancer, Brain, Gliomas. 2017 Oct 6. StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing; 2018 Jan.

71 - Rao JS. Molecular mechanisms of glioma invasiveness: the role of proteases. Nat Rev Cancer. 2003 Jul;3(7):489-501. Review.

72 - Wu J, Armstrong TS, Gilbert MR. Biology and management of ependymomas. Neuro Oncol. 2016 Jul;18(7):902-13. Review.

73 - Ostrom QT, Gittleman H, Liao P, Rouse C, Chen Y, Dowling J, Wolinsky Y, Kruchko C, Barnholtz-Sloan J. CBTRUS statistical report: primary brain and central nervous system tumors diagnosed in the United States in 2007-2011. Neuro Oncol. 2014 Oct;16 Suppl 4: iv1-63.

74 - Wesseling P, van den Bent M, Perry A. Oligodendroglioma: pathology, molecular mechanisms and markers. Acta Neuropathol. 2015 Jun;129(6):809-27.

75 - Kolb B, and Whishaw, I. Fundamentals of Human Neuropsychology. Worth Publishers. 6th ed. 2008 ISBN 0716795868

76 - Bleeker FE, Molenaar RJ, Leenstra S. Recent advances in the molecular understanding of glioblastoma. J Neurooncol. 2012 May;108(1):11-27.

77 - Ohgaki H, Kleihues P. Epidemiology and etiology of gliomas. Acta Neuropathol. 2005 Jan;109(1):93-108. Epub 2005 Feb 1. Review.

78 - Sathornsumetee S, Reardon DA, Desjardins A, Quinn JA, Vredenburgh JJ, Rich JN. Molecularly targeted therapy for malignant glioma. Cancer. 2007 Jul 1;110(1):13-24. Review.

79 - Louis DN, Perry A, Reifenberger G, von Deimling A, Figarella-Branger D, Cavenee WK, Ohgaki H, Wiestler OD, Kleihues P, Ellison DW. The 2016 World Health Organization Classification of Tumors of the Central Nervous System: a summary. Acta Neuropathol. 2016 Jun;131(6):803-20. Review.

80 - Zhang XQ, Sun S, Pu JKS, Tsang ACO, Lee D, Man VOY, et al. Long non-coding RNA expression profiles predict clinical phenotypes in glioma. Neurobiol Dis. 2012;48(1):1–8.

81 - Reon BJ, Anaya J, Zhang Y, Mandell J, Purow B, Abounader R, et al. Expression of lncRNAs in low-grade gliomas and glioblastoma multiforme: An in silico analysis. PLoS Med. 2016;13(12):e1002192.

82 - Wang Y, Wang Y, Li J, Zhang Y, Yin H, Han B. CRNDE, a long-noncoding RNA, promotes glioma cell growth and invasion through mTOR signaling. Cancer Lett. 2015;367(2):122–8.

83 - Zheng J, Liu XB, Wang P, Xue YX, Ma J, Qu CB, et al. axis. Mol Ther. 2016;24(7):1199–215.

84 - Zheng J, Li XD, Wang P, Liu XB, Xue YX, Hu Y, et al. CRNDE affects the malignant biological characteristics of human glioma stem cells by negatively regulating miR-186. Oncotarget. 2015;6(28):25339–55.

85 - Li DX, Fei XR, Dong YF, Cheng CD, Yang Y, Deng XF, Huang HL, Niu WX, Zhou CX, Xia CY, Niu CS. The long non-coding RNA CRNDE acts as a ceRNA and promotes glioma malignancy by preventing miR-136-5p-mediated downregulation of Bcl-2 and Wnt2. Oncotarget. 2017 Oct 4;8(50):88163-88178.

86 - Deguchi S, Katsushima K, Hatanaka A, Shinjo K, Ohka F, Wakabayashi T, Zong H, Natsume A, Kondo Y. Oncogenic effects of evolutionarily conserved noncoding RNA ECONEXIN on gliomagenesis. Oncogene. 2017 Aug 10;36(32):4629-4640.

87 - Panzitt K, Tschernatsch MM, Guelly C, Moustafa T, Stradner M, Strohmaier HM, Buck CR, Denk H, Schroeder R, Trauner M, Zatloukal K. Characterization of HULC, a novel gene with striking up-regulation in hepatocellular carcinoma, as noncoding RNA. Gastroenterology. 2007 Jan;132(1):330-42.

88 - Zhu Y, Zhang X, Qi L, Cai Y, Yang P, Xuan G, Jiang Y. HULC long noncoding RNA silencing suppresses angiogenesis by regulating ESM-1 via the PI3K/Akt/mTOR signaling pathway in human gliomas. Oncotarget. 2016 Mar 22;7(12):14429-40.

89 - Yan H, Tian R, Zhang M, Wu J, Ding M, He J. High expression of long noncoding RNA HULC is a poor predictor of prognosis and regulates cell proliferation in glioma. Onco Targets Ther. 2016 Dec 21; 10:113-120.

90 - Young TL, Matsuda T, Cepko CL. The noncoding RNA taurine upregulated gene 1 is required for differentiation of the murine retina. Curr Biol. 2005 Mar 29;15(6):501-12.

91 - Li J, Zhang M, An G, Ma Q. LncRNA TUG1 acts as a tumor suppressor in human glioma by promoting cell apoptosis. Exp Biol Med (Maywood). 2016 Mar;241(6):644-9.

92 - Katsushima K, Natsume A, Ohka F, Shinjo K, Hatanaka A, Ichimura N, Sato S, Takahashi S, Kimura H, Totoki Y, Shibata T, Naito M, Kim HJ, Miyata K, Kataoka K, Kondo Y. Targeting the Notch-regulated non-coding RNA TUG1 for glioma treatment. Nat Commun. 2016 Dec 6; 7:13616.

93 - Zhao Z, Wang B, Hao J, Man W, Chang Y, Ma S, Hu Y, Liu F, Yang J. Downregulation of the long non-coding RNA taurine-upregulated gene 1 inhibits glioma cell proliferation and invasion and promotes apoptosis. Oncol Lett. 2018 Mar;15(3):4026-4032.

94 - Zhang X, Rice K, Wang Y, Chen W, Zhong Y, Nakayama Y, Zhou Y, Klibanski A. Maternally expressed gene 3 (MEG3) noncoding ribonucleic acid: isoform structure, expression, and functions. Endocrinology. 2010 Mar;151(3):939-47.

95 - Wang P, Ren Z, Sun P. Overexpression of the long non-coding RNA MEG3 impairs in vitro glioma cell proliferation. J Cell Biochem. 2012 Jun;113(6):1868-74.

96 - Gong X, Huang M. Long non-coding RNA MEG3 promotes the proliferation of glioma cells through targeting Wnt/β-catenin signal pathway. Cancer Gene Ther. 2017 Sep;24(9):381-385.

97 - Li J, Bian EB, He XJ, Ma CC, Zong G, Wang HL, Zhao B. Epigenetic repression of long non-coding RNA MEG3 mediated by DNMT1 represses the p53 pathway in gliomas. Int J Oncol. 2016 Feb;48(2):723-33.

98 - Qin N, Tong GF, Sun LW, Xu XL. Long Noncoding RNA MEG3 Suppresses Glioma Cell Proliferation, Migration, and Invasion by Acting as a Competing Endogenous RNA of miR-19a. Oncol Res. 2017 Nov 2;25(9):1471-1478.

99 - Zhang L, Liang X, Li Y. Long non-coding RNA MEG3 inhibits cell growth of gliomas by targeting miR-93 and inactivating PI3K/AKT pathway. Oncol Rep. 2017 Oct;38(4):2408-2416.

100 - Wang P, Liu YH, Yao YL, Li Z, Li ZQ, Ma J, Xue YX. Long non-coding RNA CASC2 suppresses malignancy in human gliomas by miR-21. Cell Signal. 2015 Feb;27(2):275-82.

101 - Wang R, Li Y, Zhu G, Tian B, Zeng W, Yang Y, Li Z. Long noncoding RNA CASC2 predicts the prognosis of glioma patients and functions as a suppressor for gliomas by suppressing Wnt/β-catenin signaling pathway. Neuropsychiatr Dis Treat. 2017 Jul 11; 13:1805-1813.

102 - Liao Y, Shen L, Zhao H, Liu Q, Fu J, Guo Y, Peng R, Cheng L. LncRNA CASC2 Interacts With miR-181a to Modulate Glioma Growth and Resistance to TMZ Through PTEN Pathway. J Cell Biochem. 2017 Jul;118(7):1889-1899.

103 - Jiang C, Shen F, Du J, Fang X, Li X, Su J, Wang X, Huang X, Liu Z. Upregulation of CASC2 sensitized glioma to temozolomide cytotoxicity through autophagy inhibition by sponging miR-193a-5p and regulating mTOR expression. Biomed Pharmacother. 2018 Jan; 97:844-850.

104 - Li Q, Ma G, Sun S, Xu Y, Wang B. Polymorphism in the promoter region of lncRNA GAS5 is functionally associated with the risk of gastric cancer. Clin Res Hepatol Gastroenterol. 2018 Mar 27. pii: S2210-7401(18)30035-4.

105 - Liu L, Meng T, Yang XH, Sayim P, Lei C, Jin B, Ge L, Wang HJ. Prognostic and predictive value of long non-coding RNA GAS5 and mircoRNA-221 in colorectal cancer and their effects on colorectal cancer cell proliferation, migration and invasion. Cancer Biomark. 2018 Mar 30.

106 - Li Y, Wan YP, Bai Y. Correlation between long strand non-coding RNA GAS5 expression and prognosis of cervical cancer patients. Eur Rev Med Pharmacol Sci. 2018 Feb;22(4):943-949.

107 - Wang M, Guo C, Wang L, Luo G, Huang C, Li Y, Liu D, Zeng F, Jiang G, Xiao X. Long noncoding RNA GAS5 promotes bladder cancer cells apoptosis through inhibiting EZH2 transcription. Cell Death Dis. 2018 Feb 14;9(2):238.

108 - Zhao X, Wang P, Liu J, Zheng J, Liu Y, Chen J, Xue Y. Gas5 Exerts Tumor-suppressive Functions in Human Glioma Cells by Targeting miR-222. Mol Ther. 2015 Dec;23(12):1899-911.

109 - García-Claver A, Lorente M, Mur P, Campos-Martín Y, Mollejo M, Velasco G, Meléndez B. Gene expression changes associated with erlotinib response in glioma cell lines. Eur J Cancer. 2013 May;49(7):1641-53.

110 - Dai J, Ma J, Yu B, Zhu Z, Hu Y. Long non-coding RNA TUNAR represses growth, migration and invasion of human glioma cells through regulating miR-200a and Rac1. Oncol Res. 2018 Mar 14.

111 - Ke J, Yao YL, Zheng J, Wang P, Liu YH, Ma J, Li Z, Liu XB, Li ZQ, Wang ZH, Xue YX. Knockdown of long non-coding RNA HOTAIR inhibits malignant biological behaviors of human glioma cells via modulation of miR-326. Oncotarget. 2015 Sep 8;6(26):21934-49.

112 - Chen Q, Cai J, Wang Q, Wang Y, Liu M, Yang J, Zhou J, Kang C, Li M, Jiang C. Long Noncoding RNA NEAT1, Regulated by the EGFR Pathway, Contributes to Glioblastoma Progression Through the WNT/β-Catenin Pathway by Scaffolding EZH2. Clin Cancer Res. 2018 Feb 1;24(3):684-695.

113 - Cao S, Wang Y, Li J, Lv M, Niu H, Tian Y. Tumor-suppressive function of long noncoding RNA MALAT1 in glioma cells by suppressing miR-155 expression and activating FBXW7 function. Am J Cancer Res. 2016 Nov 1;6(11):2561-2574. eCollection 2016.

114 - Yu H, Xue Y, Wang P, Liu X, Ma J, Zheng J, Li Z, Li Z, Cai H, Liu Y. Knockdown of long non-coding RNA XIST increases blood-tumor barrier permeability and inhibits glioma angiogenesis by targeting miR-137. Oncogenesis. 2017 Mar 13;6(3):e303.

115 - Jia P, Cai H, Liu X, Chen J, Ma J, Wang P, Liu Y, Zheng J, Xue Y. Long non-coding RNA H19 regulates glioma angiogenesis and the biological behavior of glioma-associated endothelial cells by inhibiting microRNA-29a. Cancer Lett. 2016 Oct 28;381(2):359-69.

[Figure 1 plot. Y-axis: number of publications per year (0 to 3,500).]

Figure 1. Number of lncRNA related papers published per year. Search based on the National Library of Medicine Pubmed using the term "lncRNA" and annual date limitations (Jan 1- Dec 31).
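The per-year search described in the caption could be scripted roughly as follows. This is a minimal sketch, not the procedure actually used for Figure 1: it assumes the counts are requested through the public NCBI E-utilities ESearch endpoint, and the helper name `yearly_count_url` and the year range in the loop are hypothetical choices made only for illustration. The sketch builds the query URLs and does not contact the server.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def yearly_count_url(term: str, year: int) -> str:
    """Build an ESearch URL that requests only the number of PubMed records
    matching `term` with a publication date from Jan 1 to Dec 31 of `year`."""
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",           # restrict by publication date
        "mindate": f"{year}/01/01",
        "maxdate": f"{year}/12/31",
        "rettype": "count",           # return the hit count, not the record IDs
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Assumed year range, chosen for illustration only.
    for year in range(2008, 2018):
        print(year, yearly_count_url("lncRNA", year))
```

Each URL returns a small XML document whose count field gives the number of matching records for that year; fetching and parsing it is left out so the sketch stays free of network access.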

lncRNA | Function | Proposed mechanisms | Reference
--- | --- | --- | ---
CRNDE | Promotes migration, invasion and cell growth | Activates mTOR signaling pathway | Wang Y et al., 2015 [78]
CRNDE | Promotes cell proliferation, migration and invasion and inhibits apoptosis | Sponges miR-384, causing PIWIL4 upregulation and increasing STAT3 phosphorylation | Zheng J et al., 2016 [79]
CRNDE | Promotes cell proliferation, migration and invasion and inhibits apoptosis | Sponges miR-186, causing XIAP and PAK7 upregulation | Zheng J et al., 2015 [80]
CRNDE | Promotes cell proliferation, migration and invasion | Sponges miR-136-5p, inducing BCL2 and WNT2 protein levels | Li DX et al., 2017 [81]
ECONEXIN | Promotes cell proliferation | Sponges miR-411-5p, causing TOP2A upregulation | Deguchi S et al., 2017 [82]
HULC | Promotes angiogenesis, cell proliferation, invasion and migration | Activates the PI3K/Akt/mTOR signaling pathway | Zhu Y et al., 2016 [84]
HULC | Promotes cell proliferation and colony formation | Not described | Yan H et al., 2016 [85]
TUG1 | Promotes apoptosis and decreases cell viability | Activates caspase-3 and -9 and decreases BCL2 expression | Li J et al., 2016 [87]
TUG1 | Maintains stem cell stemness | Interacts with EZH2 and SUZ12 to suppress neuronal differentiation-associated genes; sponges miR-145 and maintains SOX2 and MYC expression | Katsushima K et al., 2016 [88]
TUG1 | Promotes cell proliferation and invasion, and inhibits apoptosis | Not described | Zhao Z et al., 2018 [89]
MEG3 | Inhibits cell proliferation and promotes apoptosis | Interacts with p53, inducing its activity; increases p53, caspase 3 and caspase 8 expression | Wang P et al., 2012 [91]
MEG3 | Inhibits cell proliferation | Decreases Wnt/β-catenin signaling | Gong X et al., 2017 [92]
MEG3 | Inhibits cell proliferation and promotes apoptosis | DNMT1 hypermethylates MEG3 promoter, repressing MEG3 and inhibiting p53 pathway | Li J et al., 2016 [93]
MEG3 | Inhibits cell proliferation, migration and invasion | Sponges miR-19a, inducing PTEN protein levels | Qin N et al., 2017 [94]
MEG3 | Inhibits cell proliferation and promotes apoptosis | Sponges miR-93, induces caspase-3 and caspase-9 and represses PI3K/AKT pathway | Zhang L et al., 2017 [95]
CASC2 | Inhibits cell proliferation, migration, and invasion and promotes apoptosis | Sponges miR-21 | Wang et al., [96]
CASC2 | Inhibits cell proliferation, migration and invasion | Suppresses Wnt/β-catenin signaling pathway | Wang R et al., 2017 [97]
CASC2 | Sensitizes cells to temozolomide | Sponges miR-181a, inducing PTEN protein | Liao Y et al., 2017 [98]
CASC2 | Sensitizes cells to temozolomide | Sponges miR-193a-5p and regulates mTOR signaling pathway | Jiang C et al., 2018 [99]
GAS5 | Inhibits cell proliferation, migration and invasion and promotes apoptosis | Sponges miR-222, increasing BMF and PLXNC1 expression | Zhao X et al., 2015 [104]
GAS5 | Sensitizes cells to erlotinib | Not described | García-Claver A et al., 2013 [105]
TUNAR | Inhibits cell proliferation, migration and invasion and promotes apoptosis | Represses RAC1 and induces miR-200a | Dai J et al., 2018 [106]
HOTAIR | Represses cell growth | Interacts with the PRC2 complex to regulate genes involved in cell cycle | Zhang K et al., 2015 [32]
HOTAIR | Promotes cell proliferation, migration and invasion and inhibits apoptosis | Sponges miR-326, repressing FGF1 expression | Ke J et al., 2015 [107]
NEAT1 | Inhibits cell growth and invasion | Interacts with PRC2 and inhibits ICAT, GSK3B, and Axin2 | Chen Q et al., 2018 [108]
MALAT1 | Inhibits cell proliferation and invasion | Sponges miR-155, inducing FBXW7 expression | Cao S et al., 2016 [109]
XIST | Promotes angiogenesis | Increases CXCR7 expression | Yu H et al., 2017 [110]
H19 | Promotes cell invasion and angiogenesis | Induces VASH2 expression | Jia P et al., 2016 [111]

Table 1. Summary of the lncRNAs and their role in gliomas.

Chapter 2: The functional role of APTR long non-coding RNA

Abstract

Previous studies carried out in the laboratory of Dr. Dutta identified a new lncRNA (long non-coding RNA), called APTR (Alu-mediated p21 transcriptional regulator), which is upregulated in cancer and involved in cell cycle regulation. APTR expression represses the CDKN1A/p21 gene, which encodes a cell cycle inhibitor that blocks progression through G1 and S phase. Correspondingly, APTR and p21 transcript levels are anti-correlated in glioma patient samples. Additionally, APTR recruits the PRC2 protein complex (Polycomb Repressive Complex 2) to the promoter region of the p21 gene, which induces its suppression. Considering the possibility that APTR may interact with other genomic sites and recruit other binding proteins, this work set out to unravel the mechanisms that APTR uses to regulate gene expression.

In this study, I have found that APTR interacts with multiple genomic sites and binds to several proteins. Chromatin isolation by RNA purification followed by DNA sequencing (CHIRP-seq) showed that APTR is associated with more than 5,000 genomic loci. However, CHIRP-qPCR revealed that about 50% of the tested sites were false positive peaks. According to microarray data, APTR regulates the expression of 274 genes, 86 of which have an adjoining APTR CHIRP-seq peak. Again, CHIRP-qPCR validated only around 50% of the tested sites. Moreover, 115 APTR-bound DNA loci are marked with EZH2. Using a pull-down system in cells overexpressing APTR, I discovered around 70 potential binding partners for APTR, including the transcription factors BRD7 (Bromodomain-containing protein 7) and NFIC (Nuclear Factor I C). RNA immunoprecipitation of these two proteins suggests that APTR may be fragmented into three RNAs, where the 5’ RNA fragment binds to NFIC and the 3’ RNA fragment associates with BRD7.

Introduction

The cell cycle is a precisely controlled process composed of four phases: G1, S, G2 and M [1]. The cyclin-dependent kinase inhibitor p21 is an important negative regulator of the cell cycle, inhibiting cell proliferation [2]. By binding to and inhibiting cyclin D/CDK4, cyclin D/CDK6 and cyclin E/CDK2, the p21 protein promotes the accumulation of the hypophosphorylated form of the Rb protein, which represses the E2F transcription factor and leads to a G1 phase cell cycle arrest [3,4]. Mechanistically, p21 uses two motifs, Cy1 and Cy2, to bind to the cyclin, and another motif, the K motif, to bind directly to the catalytic domain of the CDK [5]. p21 also inhibits cell proliferation by interacting with PCNA (proliferating cell nuclear antigen), causing inhibition of DNA replication [6]. In response to DNA damage, p21 expression is upregulated at the transcriptional level by the tumor suppressor p53 [7,8]. Additionally, p21 is controlled post-translationally by E3 ubiquitin ligases. In response to UV irradiation, the CRL4-CDT2 complex promotes the ubiquitination of p21 in S phase when it is bound to PCNA, targeting it for proteasomal degradation [9]. Other E3 ubiquitin ligases, such as SCF-SKP2 [10] and APC/C-CDC20 [11], also target p21 to the proteasome for degradation.

In 2013 Dr. Negishi and his collaborators in Dr. Dutta’s laboratory published a study [12] showing that a new long non-coding RNA (lncRNA), APTR (Alu-mediated p21 transcriptional regulator), was involved in regulating p21 mRNA. An siRNA screen of the Full-length Long Japan (FLJ) library of sequenced human cDNAs followed by BrdU incorporation was used to find lncRNAs required for cell proliferation and viability (Figure 1A). siAPTR and siORC2 (a DNA replication factor used as positive control) inhibited BrdU incorporation to comparable levels in both MCF10A and PC3 cell lines, suggesting that APTR is essential for cell proliferation. Moreover, APTR was shown to be a 2.3kb, capped and poly(A)-tailed lncRNA expressed mainly in the nucleus. It is an intergenic lncRNA (transcribed from chromosome 7q21, between PTPN12 and RSBN1L, on the opposite strand). APTR also contains two sequences complementary to SINE/Alu elements (c-Alu) and one sequence complementary to a LINE/L2 element (Figure 1B and C). Interestingly, the c-Alu sequence was shown to be required for p21 suppression. The cell cycle profile following APTR knockdown and release into nocodazole (to block cells in mitosis) showed that cells treated with siAPTR were unable to progress efficiently through G1 phase and early S phase. Also, induction of both the p21 mRNA and protein was observed after siAPTR treatment. Using the MS2-crosslinking and immunoprecipitation (MS2-CLIP) technique, APTR was shown to associate with the p21 promoter. Chromatin immunoprecipitation (ChIP) using antibodies against one of the PRC2 components revealed that Suz12 associates with the p21 promoter in an APTR-dependent manner (Figure 1D and E). In addition, this study reported that APTR and p21 transcript levels were anti-correlated in glioblastoma (GBM) patient samples (Figure 1F), suggesting that the elevated expression of APTR is important for decreasing the expression of the cell-cycle inhibitor p21 in GBM tumors and thus promoting cell proliferation. In summary, Dr. Negishi’s work showed that APTR is an oncogenic nuclear lncRNA that is critical for tumor progression because it recruits the PRC2 (Polycomb Repressive Complex 2, a repressive chromatin modulator) proteins to the p21 promoter locus and represses p21 expression.

Other groups have reported that APTR is also an important lncRNA in promoting liver pathologies. In 2015, Funjun Yu and collaborators [13] demonstrated that APTR is upregulated in fibrotic liver patient samples. Additionally, APTR levels were increased in serum from patients with liver cirrhosis. Another study, from Shanshan Yu [14], reported that APTR can be used as a biomarker for cirrhotic patients with portal hypertension.

Although Dr. Negishi’s work revealed that APTR recruits PRC2 to the promoter region of the p21 gene, suppressing its expression, it is still unknown whether APTR interacts with other RNA binding proteins and regulates the expression of genes other than p21. Therefore, this project aims to unravel the mechanisms that APTR uses to regulate gene expression by (1) mapping the genomic binding sites of APTR and (2) identifying APTR binding proteins other than PRC2.

Results

APTR is highly upregulated in 293T cells

With a few exceptions, such as MALAT1, NEAT1 and HOTAIR [15], lncRNAs are generally expressed at lower levels than protein-coding genes [16], and some of them are present at less than one copy per cell. Using recent RNA-seq data generated in the laboratory from U87 cells, I assessed the FPKM (Fragments Per Kilobase of transcript per Million mapped fragments) for APTR. In two of the three libraries APTR was not detected, and in the third library the FPKM for APTR was 10.5. As a comparison, the FPKM for NEAT1 was 4021.3, 5755.3 and 5615.6 in the three libraries. Working with a molecule that is as lowly expressed as APTR can be extremely difficult, so I assessed the expression of APTR in 26 different cell lines by qRT-PCR (Figure 2A and B). APTR was most highly expressed in 293T cells; for instance, its level in 293T cells is around 17 times higher than in the prostate cancer cell line PC3 and 3.6 times higher than in the U87 GBM cell line. Considering these differences in expression among the cell lines, the 293T cell line was used for further experiments, except for overexpression studies, where LNCaP cells were used.
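To make the FPKM comparison above concrete, the following minimal Python sketch shows how an FPKM value is derived from a fragment count and how far apart the reported APTR and NEAT1 values are. The function name and the example count are hypothetical; only the FPKM values (10.5 for APTR; 4021.3, 5755.3 and 5615.6 for NEAT1) come from the text.

```python
def fpkm(fragments: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments Per Kilobase of transcript per Million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # Dividing by (length / 1e3) and (library size / 1e6) gives the 1e9 factor.
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical example: 23 fragments on a 2.3 kb transcript in a
# library of one million mapped fragments.
example_fpkm = fpkm(23, 2300, 1_000_000)

# FPKM values reported in the text.
APTR_FPKM = 10.5
NEAT1_FPKM = [4021.3, 5755.3, 5615.6]

# How many times higher NEAT1 is than the single detectable APTR value.
neat1_over_aptr = [round(value / APTR_FPKM) for value in NEAT1_FPKM]
```

Under these numbers NEAT1 is several hundred times more abundant than APTR in U87 cells, which is the practical reason for screening other cell lines for higher APTR expression.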

APTR suppresses p21 transcription by recruiting PRC2

To confirm the previously published finding that APTR represses the p21 mRNA by recruiting PRC2 to its promoter, I performed H3K27me3 and Suz12 ChIP and measured the recovery of the p21 promoter region in the precipitate by qPCR (Figure 3A). This experiment confirmed that both H3K27me3 and Suz12 are enriched at the p21 promoter locus. Moreover, by repeating the ChIP-qPCR after APTR knockdown (Figure 3C), I observed an increase in p21 mRNA expression (Figure 3D) and a decrease in the H3K27me3 and Suz12 signals at the p21 promoter region (Figure 3E and F), reinforcing the concept that APTR regulates p21 transcript levels by recruiting PRC2 to its promoter locus.

APTR is associated to more than 2,000 genomic loci

To investigate whether APTR is able to regulate gene expression by interacting with other genomic loci, I used a technique called CHIRP (Chromatin Isolation by RNA Purification) [17]. Briefly, cells are lysed and their chromatin is sonicated. Next, the cell lysate is hybridized with 3’-biotinylated DNA oligonucleotide probes designed to be complementary to APTR or to LacZ, which is used as a negative control. Then the lncRNA, together with its associated cellular proteins and DNA, is isolated using magnetic beads conjugated to streptavidin. The co-purified chromatin is then eluted and stored for subsequent analysis.

After confirming that endogenous APTR was being recovered by CHIRP-qRT-PCR (Figure 4A) and that the p21 promoter was associated with APTR (Figure 4B), high-throughput sequencing libraries were prepared using APTR-bound DNA. LacZ was used as a negative control for the sequencing experiment, and 10% of the DNA input was used for background correction. Analysis of the CHIRP-seq peaks revealed that APTR was enriched at 5,419 loci in the genome after applying a 4-fold cut-off comparing APTR-bound DNA versus the negative control.

In order to confirm APTR interaction with these genomic sites, I performed APTR CHIRP-qPCR using primers flanking the top 20 most enriched CHIRP-seq peaks. This validation, however, revealed a high false positive rate, since only 10 out of 20 peaks were confirmed with at least a 2-fold enrichment between APTR CHIRP and the negative control (Table 1). In addition, for all sites the fold-enrichment measured by qPCR of CHIRP precipitates was much lower than the fold-enrichment determined from CHIRP-seq.
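The 2-fold validation criterion can be illustrated with a short Python sketch. The text does not state how enrichment was computed from the qPCR data, so this sketch assumes the commonly used percent-of-input calculation with 100% PCR efficiency; the function names and the Ct values in the usage lines are hypothetical.

```python
import math


def percent_input(ct_pulldown: float, ct_input: float, input_fraction: float = 0.10) -> float:
    """Percent of input recovered in a pull-down, assuming perfect doubling per cycle."""
    # Shift the input Ct to what 100% of the input would have given.
    ct_input_adjusted = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (ct_input_adjusted - ct_pulldown)


def fold_enrichment(ct_aptr: float, ct_lacz: float, ct_input: float,
                    input_fraction: float = 0.10) -> float:
    """Ratio of percent input in the APTR CHIRP over the LacZ negative control."""
    return (percent_input(ct_aptr, ct_input, input_fraction)
            / percent_input(ct_lacz, ct_input, input_fraction))


def is_validated(ct_aptr: float, ct_lacz: float, ct_input: float,
                 threshold: float = 2.0) -> bool:
    """Apply the 2-fold cut-off described in the text."""
    return fold_enrichment(ct_aptr, ct_lacz, ct_input) >= threshold


# Hypothetical Ct values for two peaks.
peak_passes = is_validated(ct_aptr=24.0, ct_lacz=26.0, ct_input=22.0)
peak_fails = is_validated(ct_aptr=25.0, ct_lacz=25.5, ct_input=22.0)
```

Because the same input Ct appears in both numerator and denominator, the fold enrichment reduces to 2 raised to the Ct difference between the LacZ and APTR pull-downs; the percent-of-input step is kept so that each pull-down can also be reported on its own.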

Nevertheless, a 50% confirmation rate suggests that around 2,709 genomic sites interact with APTR. A table containing all 5,419 sites is stored on a computer in Dr. Dutta’s laboratory and is available for further study.
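The extrapolation above can be reproduced, together with a rough measure of its uncertainty, in a few lines of Python. The Wilson score interval is an addition made here for illustration and is not part of the original analysis; it also assumes the 20 tested peaks behave like a random sample, whereas they were the 20 most enriched peaks, so the true rate across all peaks could be lower.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple:
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / trials
    denom = 1.0 + z ** 2 / trials
    center = (p + z ** 2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)) / denom
    return center - half_width, center + half_width


TOTAL_PEAKS = 5419          # CHIRP-seq peaks at the 4-fold cut-off
VALIDATED, TESTED = 10, 20  # CHIRP-qPCR validation of the top 20 peaks

# Point estimate used in the text (5419 * 0.5).
estimated_sites = TOTAL_PEAKS * VALIDATED / TESTED

# Range of plausible confirmation rates given only 20 tested peaks.
rate_low, rate_high = wilson_interval(VALIDATED, TESTED)
sites_low, sites_high = TOTAL_PEAKS * rate_low, TOTAL_PEAKS * rate_high
```

With only 20 peaks tested, the interval on the confirmation rate is wide, so the figure of roughly 2,700 sites is best read as an order-of-magnitude estimate.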

APTR regulates the expression of 43 genes by binding to adjoining genomic loci

There are various examples in the literature where nuclear lncRNAs, like APTR, regulate gene expression by binding to regulatory regions on the DNA [18-20], and even though the CHIRP-seq experiment showed a high false positive rate, many genomic loci were confirmed to be associated with APTR. However, showing that APTR is bound to a given DNA locus does not imply that an adjoining gene is regulated by the lncRNA. Thus, to test the hypothesis that APTR regulates the expression of genes adjoining the CHIRP-seq sites, I took advantage of existing RNA microarray data obtained in the laboratory after knockdown of APTR. I intersected a list of 274 genes that showed at least a 4-fold change in expression upon APTR knockdown with the APTR CHIRP peak sites. This analysis showed that 86 differentially expressed genes are near sites identified by CHIRP-seq, 39 of which are up-regulated and 47 down-regulated after APTR knockdown. However, here again, the CHIRP-qPCR validation step unveiled a high number of false positive peaks in our CHIRP-seq, corroborating the idea that the CHIRP pull-down assay was not stringent enough. Out of the 22 tested peaks, only 11 were validated (Table 2), i.e., approximately 50% of the CHIRP-seq peaks were confirmed by CHIRP-qPCR. Consequently, around 43 genes regulated by APTR are estimated to have a binding site nearby.
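The intersection described above can be sketched as follows. This is a minimal illustration, not the pipeline actually used: the text does not state the distance that defines a "nearby" peak, so `window_bp` is a hypothetical parameter, and the gene names and coordinates are invented toy data.

```python
from typing import Dict, List, Tuple

Peak = Tuple[str, int, int]  # (chromosome, start, end), hypothetical layout


def genes_near_peaks(
    tss_by_gene: Dict[str, Tuple[str, int]],
    peaks: List[Peak],
    window_bp: int,
) -> List[str]:
    """Return genes whose TSS lies within window_bp of at least one peak."""
    hits = []
    for gene, (chrom, tss) in tss_by_gene.items():
        for peak_chrom, start, end in peaks:
            if peak_chrom == chrom and start - window_bp <= tss <= end + window_bp:
                hits.append(gene)
                break  # one nearby peak is enough
    return sorted(hits)


# Toy inputs (invented for illustration, not data from this study).
toy_tss = {
    "GENE_A": ("chr1", 10_000),
    "GENE_B": ("chr1", 900_000),
    "GENE_C": ("chr2", 5_000),
}
toy_peaks = [("chr1", 12_000, 12_400), ("chr2", 700_000, 700_300)]
toy_hits = genes_near_peaks(toy_tss, toy_peaks, window_bp=5_000)

# Extrapolation used in the text: 86 genes with a nearby peak,
# 11 of 22 tested peaks validated by CHIRP-qPCR.
estimated_genes = 86 * 11 / 22
```

For genome-scale peak lists a sorted or interval-tree lookup would be faster than this nested loop, but the logic of the overlap test is the same.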

To gain knowledge of potential pathways that APTR may regulate, I performed GSEA (gene set enrichment analysis), a method that can identify biological pathways enriched among genes regulated by APTR from gene sets in databases like GO and KEGG. Using the GeneTrail2 tool [21], I separately input the lists of 39 up-regulated and 47 down-regulated genes that contained adjoining CHIRP-seq sites and found that no biological pathway was significantly enriched. This is intriguing, since APTR is essential for cell proliferation, and therefore an enrichment for cell viability and/or cell proliferation pathways was expected. Not surprisingly, since it only contained 11 genes, the list of transcripts whose transcription start site was near sites that associate with APTR in CHIRP-qPCR also did not show an enrichment for a biological pathway.

115 APTR bound DNA loci are marked with EZH2

Since APTR has been shown to recruit PRC2 to the p21 promoter region and repress p21 mRNA levels, I hypothesized that a significant fraction of the APTR CHIRP peaks would overlap with PRC2-enriched DNA. Thus, another student in the laboratory, Brian Reon, analyzed a publicly available EZH2 ChIP-seq dataset (GSM1215133) and found that 463 reads overlap with APTR CHIRP peak sites. Once again, the validation step (CHIRP-qPCR) indicated a high false positive rate. In fact, in this case, the percentage of false positive APTR-bound sites was even higher than before, as only 3 sites out of 12 tested were positive for APTR association by CHIRP-qPCR (25% confirmation) (Table 3). Applying this 25% confirmation rate to the 463 overlaps suggests that around 115 APTR-bound loci are marked with EZH2.

APTR is not a cis-acting lncRNA

LncRNAs may be classified as cis-acting or trans-acting. LncRNAs that act in trans are capable of regulating the expression of a distant gene. In contrast, lncRNAs that act in cis regulate the expression of a neighboring gene, for example, by recruiting regulatory proteins to the DNA locus. Since APTR is transcribed from chromosome 7q21, between the two genes PTPN12 (Protein Tyrosine Phosphatase, Non-Receptor Type 12) and RSBN1L (Round Spermatid Basic Protein 1 Like), I considered the possibility that APTR may act in cis to regulate the expression of these genes. To this end, I knocked down APTR in 293T cells and verified that neither PTPN12 nor RSBN1L was differentially expressed when compared to siGL2 control samples (Figure 5). Therefore, APTR does not regulate the expression of these two neighboring genes; ergo, it is not a cis-acting lncRNA like an enhancer RNA (eRNA).

S1-APTR overexpression pull-down reveals more than 70 potential protein binding partners

In order to find protein partners of APTR, I overexpressed APTR fused to the S1 aptamer, a streptavidin-binding aptamer [22], in LNCaP cells and used streptavidin beads to pull down APTR-S1. The eluate, containing APTR-associated proteins, was subjected to mass spectrometry. This experiment revealed 72 potential proteins associated with APTR compared to pull-downs using the overexpressed S1 tag alone and two other negative controls: S1-DRAIC and S1-DRAIC-S1 (Table 4).

To gain knowledge about potential biological pathways enriched in APTR-bound proteins, I again used the GeneTrail2 tool [21]. In contrast to the CHIRP results, this analysis revealed that two biological processes related to cell proliferation were enriched (Table 5). The genes from these pathways that were associated with APTR are shown in Table 6. The first pathway was “regulation of cell cycle G1 S phase transition”, in which six proteins were represented: ATP2B4 (ATPase Plasma Membrane Ca2+ Transporting 4), BRD7 (Bromodomain-containing protein 7), DDX3X (DEAD-Box Helicase 3, X-Linked), PLRG1 (Pleiotropic Regulator 1), PSMC4 (Proteasome 26S Subunit, ATPase 4) and PSMD2 (Proteasome 26S Subunit, Non-ATPase 2). In the second pathway, “regulation of G1 S transition of mitotic cell cycle”, the enriched proteins were five of the same proteins involved in the previous pathway: BRD7, DDX3X, PLRG1, PSMC4 and PSMD2. These data support the idea that APTR is involved in positively regulating the cell cycle.

In addition, GeneTrail2 revealed 8 cellular components enriched in the APTR-bound protein list (Tables 5 and 7): “mitochondrial protein complex”, “mitochondrial membrane part”, “inner mitochondrial membrane protein complex”, “mitochondrial outer membrane”, “organelle outer membrane”, “outer membrane”, “NADH dehydrogenase complex, respiratory chain complex I, mitochondrial respiratory chain complex I” and “respiratory chain complex”. This analysis suggests that APTR may interact with mitochondrial proteins to regulate cell cycle progression. However, confirmatory experiments, such as RIP of the proteins present in these categories, are necessary before pursuing this line of investigation.

Since APTR is primarily localized to the nuclei of cells (70% in the nucleus, 30% in the cytoplasmic fraction [12]) and since it regulates the p21 RNA level, I selected from the list of 72 possible APTR binding partners two transcription factors that may contribute to the ability of APTR to regulate gene expression: BRD7 (Bromodomain-containing protein 7) and NFIC (Nuclear Factor I C). To validate the mass spectrometry results, I performed RNA immunoprecipitation (RIP) using BRD7 or NFIC antibodies, and IgG as a negative control. Both NFIC and BRD7 RIP followed by qPCR confirmed APTR association (Figure 6A and B). Interestingly, however, APTR was only detected in the NFIC pull-down when using primers that detect its 5’ region, while primers amplifying the 3’ region or a central region did not show any interaction with NFIC (Figure 6A). A similar phenomenon was observed when BRD7 was pulled down: APTR was amplified when primers that detect the 3’ region were used, but when primers that detect the 5’ region or the middle region were used, there was no signal from the BRD7 precipitate (Figure 6B). These results suggest that APTR could be processed into at least three fragments: a 5’ fragment, which interacts with NFIC; a 3’ fragment, which binds to BRD7; and a central fragment that does not interact with NFIC or BRD7.

APTR secondary structure

There are, in the literature, examples where the secondary structure of a lncRNA is important for its function [23,24], because it often dictates which regions of the lncRNA interact with a protein partner. Simon MD and collaborators developed an RNase H mapping approach to identify single-stranded, accessible regions in a lncRNA molecule [25]. Accessible regions anneal to a single-stranded antisense DNA probe and become susceptible to RNase H digestion. qPCR using primers that flank the probed region then determines whether RNase H digested the probed region and so diminished the yield of the PCR product. Some sections of a lncRNA are more accessible for oligonucleotide hybridization than others, depending on whether they are single-stranded and whether they are obscured by interacting cellular proteins. The accessible regions of the lncRNA are usually located at single-stranded loops in the RNA structure.

Using 39 antisense single-stranded DNA probes complementary to APTR, I was able to find four accessible regions: (1) from nucleotides 812 to 831 [probe #17]; (2) from nucleotides 1458 to 1477 [probe #25]; (3) from nucleotides 1738 to 1757 [probe #32] and (4) from nucleotides 1898 to 1917 [probe #36] (Figure 7A and B). As expected, the majority of the RNA is either protected by a protein or folded into a stem structure, since the other 35 probes did not promote RNase H digestion.

To address whether these accessible regions were indeed in loop regions, I used a publicly available RNA secondary structure prediction tool, mfold [26], to predict the folding structure of APTR. Because mfold limits the number of nucleotides used as input, I used sections of APTR of 200 nucleotides. The predicted folding is consistent with the idea that the regions covered by probes #17 (nucleotides 812 to 831) and #32 (nucleotides 1738 to 1757) are located in loops (Figures 7B, 8A and 8C). The accessibility seen with probe #25 (nucleotides 1458 to 1477) can be partially explained: although a significant portion is predicted to fold into a stem structure, there is a loop that spans nucleotides 1463 to 1469 (Figure 8B). However, the secondary structure predicted by mfold could not explain why nucleotides 1898 to 1917 (probe #36) showed high accessibility levels (Figures 7B and 8D).
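The windowed folding described above raises the practical question of which 200-nucleotide section contains each accessible region. The sketch below maps the four probe intervals from the text onto windows, assuming consecutive, non-overlapping windows that start at nucleotide 1; the text does not specify how the sections were chosen, so this layout and the function name are assumptions.

```python
from typing import List, Tuple


def windows_for_region(start: int, end: int, window: int = 200) -> List[Tuple[int, int]]:
    """Return the 1-based (first, last) windows that overlap the region start..end."""
    first_index = (start - 1) // window
    last_index = (end - 1) // window
    # The final window may run past the transcript end; it is not clipped here.
    return [(i * window + 1, (i + 1) * window) for i in range(first_index, last_index + 1)]


# Accessible regions reported in the text, keyed by probe number.
ACCESSIBLE_REGIONS = {
    17: (812, 831),
    25: (1458, 1477),
    32: (1738, 1757),
    36: (1898, 1917),
}

probe_windows = {
    probe: windows_for_region(start, end)
    for probe, (start, end) in ACCESSIBLE_REGIONS.items()
}
```

Under this layout each of the four regions falls inside a single window, so none is split across a fold boundary. A region that did straddle a boundary would return two windows, and folding a window centered on the probe would then be the safer choice, since folding short sections in isolation ignores any long-range base pairing.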

Discussion

For a long time, it was believed that most RNA molecules were solely an intermediate in the flow of genetic information. The central dogma of molecular biology stated that proteins were the main effectors of cellular molecular processes [27]. Recently, however, there was a paradigm shift thanks to the advent of whole-genome and transcriptome sequencing techniques. High-throughput sequencing methods revealed that more than 70% of the human genome is transcribed, whereas the ENCODE (Encyclopedia of DNA Elements) project showed that only 2% of the human transcriptome is translated into proteins, and the rest is expressed in the form of non-coding RNAs (ncRNAs) [28-30].

Among these RNAs, transcripts longer than 200 nucleotides that encode no open reading frame of more than 50 amino acids are considered to be lncRNAs. These molecules are generally expressed at lower levels than protein-coding genes [16], but have been proposed to play a role in diverse biological processes [31], such as development, cellular differentiation, reprogramming of stem cells and diseases, including cancer [32-35]. In this context, APTR was discovered in Dr. Dutta’s laboratory. This lncRNA was found to negatively regulate p21 expression by recruiting the PRC2 components to its promoter region.

Here, I tested whether APTR interacts with and regulates other genes by associating with different cellular proteins, such as transcription factors. For this purpose, a CHIRP-seq screen followed by CHIRP-qPCR validation indicated that APTR was enriched at more than 2,000 sites in the genome. Even though many genomic sites were found to be associated with APTR, to my surprise, the p21 promoter region was not one of them, although the APTR CHIRP reactions used for the sequencing library were pre-screened by PCR for enrichment of the p21 promoter. This result suggests that the high-throughput sequencing did not have enough depth to detect all potential APTR-bound DNA, including the p21 promoter. Additionally, based on results from previous CHIRP-seq experiments performed in the laboratory and reports from other groups in the literature, we concluded that, although the negative control (LacZ probe pull-down) did not recover APTR RNA (Figure 4A), the pull-down stringency was not high enough, since 50% of the tested CHIRP-seq peaks were not validated by CHIRP-qPCR. Based on this, the CHIRP-seq experiment should be further optimized by increasing the stringency of the pull-down and consequently decreasing the number of false positive peaks. To this end, a different set of nonspecific oligos could be used as a negative control, such as probes that hybridize to an unrelated lncRNA. Another negative control would be to perform the experiment using a cell line in which APTR is absent. Moreover, even though 293T cells expressed the highest levels of APTR among the cell lines tested, its expression is still low, so an alternative approach would be to increase the starting number of cells or to use cells that stably overexpress APTR. Finally, the sequencing depth has to be increased by pooling several CHIRP experiments so that our positive control site, the p21 promoter, is reliably detected in the CHIRP-seq data.

In an attempt to increase the specificity of the peaks identified by CHIRP-seq, I intersected the results from this experiment with two other data sets. First, I used existing siAPTR microarray data available in the laboratory to identify APTR-bound sites that are near APTR-regulated transcription start sites. Unfortunately, this strategy did not improve the CHIRP-qPCR validation, as the confirmation rate was still around 50%. Although the microarray data from APTR-depleted cells identified 274 differentially expressed genes, only a small fraction of these, an estimated 43 genes, contained a validated APTR binding site nearby. Surprisingly, GSEA did not reveal any biological pathway enriched in this gene list, most likely because the gene list was too short. The second attempt to improve the CHIRP-qPCR validation rate relied on the expectation that APTR binds to the PRC2 complex: we intersected a publicly available EZH2 ChIP-seq dataset with the APTR CHIRP-seq data to identify sites that were bound by both EZH2 and APTR. Surprisingly, the APTR CHIRP validation rate by qPCR on this list of sites decreased from 50% to 25%. This is very intriguing, as APTR was already shown to interact with PRC2 components, including EZH2 [12]. As mentioned previously, these results reinforce the idea that the pull-down specificity was not high enough and that there are many false positive peaks in the APTR CHIRP-seq experiment.

As an intergenic lncRNA, APTR could be acting in cis to regulate the expression of its neighboring genes, PTPN12 and RSBN1L. Even though these transcripts were not in the pool of differentially regulated genes upon APTR knockdown by microarray, PTPN12 and RSBN1L RT-qPCR was performed in siAPTR-treated cells, considering that qPCR is a more sensitive method than the microarray. However, this experiment showed that APTR is not a cis-acting lncRNA (like an eRNA), since it does not regulate the expression of its neighboring genes.

Instead of continuing the isolation of endogenous APTR, I used an overexpression system to find APTR protein partners, in which APTR was fused to a streptavidin-binding aptamer named S1. Streptavidin beads were used to pull down the S1 aptamer fused to APTR together with associated cellular proteins. Next, the APTR-enriched material was analyzed by mass spectrometry. This method discovered around 70 potential APTR binding proteins. As a confirmation strategy, RIP was performed for two of the transcription factors present in the list, NFIC and BRD7. Each of these proteins was shown to be associated with a different region of APTR. This is an unexpected result, since there is no immediate reason for only one part of APTR, but not the others, to be present in the NFIC and BRD7 pull-downs. One explanation is that although APTR is expressed as a full-length 2.3kb transcript, it may be processed into smaller fragments that interact with different cellular proteins. For example, APTR could be processed into at least three fragments: a 5’ fragment that interacts with NFIC; a 3’ fragment that binds to BRD7; and a central fragment that does not interact with NFIC or BRD7. One way to investigate this hypothesis further is to perform northern blotting of cellular RNA (and of RNA associated with NFIC and BRD7) using probes that detect different parts of APTR, to test whether multiple bands appear when probing the total cellular RNA but only selected bands appear in the NFIC or BRD7 immunoprecipitates. Moreover, even though only two proteins were chosen for validation, both of them were confirmed to be associated with APTR. Thus, this overexpression system could also be more appropriate for detecting the sites in the genome where APTR interacts.

Through an RNase H mapping strategy, I was able to identify four accessible regions in the APTR lncRNA molecule. These sites are thought to represent loops in the APTR secondary structure, and two of these regions, nucleotides 812 to 831 and 1738 to 1757, were supported by an RNA structure prediction tool, while a third, nucleotides 1458 to 1477, was partially supported. To improve these data, the RNase H mapping experiment could be repeated with a proteinase K digestion step before the probe hybridization, so that the areas protected by a protein become accessible. By intersecting the data from this experiment with the conventional RNase H mapping already performed, it will be possible to map the protein binding sites on the APTR molecule. This information could reveal potential binding motifs on APTR. Considering that many RBPs (RNA binding proteins) have known binding motifs, these data could be used to narrow down the search for APTR protein interactors.

Since this work began, other groups have proposed that APTR is implicated in liver pathology. Funjun Yu et al. suggested that APTR is a potential biomarker for liver cirrhosis, since its expression was increased in serum of patients with liver cirrhosis [13]. Moreover, the same group reported that APTR is upregulated in fibrotic liver tissues when compared to normal liver [13]. APTR is also upregulated during activation of hepatic stellate cells (HSCs) [13], an important step during initiation and progression of liver fibrosis [36]. In addition, siAPTR treatment diminishes the activation of HSCs in vitro and inhibits the progression of mouse liver fibrosis in vivo. More recently, another study analyzed APTR expression during the transjugular intrahepatic portosystemic shunt (TIPS) procedure [14]. In the TIPS procedure, a tube is placed between the portal vein and the hepatic vein to decrease portal hypertension, which is an important cause of mortality in cirrhotic patients [37]. Analyzing blood drawn from patients with liver cirrhosis and portal hypertension, Shanshan Yu et al. showed that APTR upregulation is indicative of poor patient outcome. In addition, APTR expression was reduced in the portal vein after TIPS, indicating that APTR can be used as a biomarker for cirrhotic patients with portal hypertension. Thus, a fruitful line of inquiry would be to study APTR in a hepatoma cell line like HepG2 or Huh7.

Materials and Methods

Cell culture, knockdown and overexpression of APTR

293T cells were maintained in DMEM supplemented with 10% FBS and 1% P/S. LNCaP cells were maintained in RPMI-1640 medium supplemented with 10% FBS and 1% P/S. For knockdown, 293T cells were transfected for 8h using 40 nM of siAPTR_1 (5’-CCAGGUACUGCCUUCUAAC-3’), siAPTR_2 (5’-CCAUGAUCCGGUAUCACCA-3’) or a nonspecific siGL2 control siRNA (5’-CGUACGCGGAAUACUUCGA-3’) and 5 µL of Lipofectamine 2000 transfection reagent (Thermo Fisher). 48h later, the cells were harvested and used for subsequent analysis.

For overexpression, 20µg of S1-APTR, S1-DRAIC, DRAIC-S1 or pCDNA3-flag vectors were transfected into 20 × 10^6 LNCaP cells using 20µL of Lipofectamine 2000 (Thermo Fisher). 48 hours later, cells were harvested and used for downstream analysis.

RNA isolation, cDNA synthesis and qPCR

Total RNA was extracted using TRIzol total RNA isolation reagent (Thermo Fisher), and nuclear/cytoplasmic RNAs were extracted using the Protein and RNA Isolation System (Thermo Fisher). cDNA was produced from 1µg RNA using the Superscript III kit (Thermo Fisher) according to the manufacturer’s instructions.

Chromatin Immunoprecipitation (ChIP)

10x10^6 293T cells were collected in 7mL of culture medium. Fixation was performed by adding 700µL of 11x fixation solution [11.1% formaldehyde, 50mM HEPES (pH=8.0), 100mM NaCl, 1mM EDTA (pH=8.0)] and the cells were incubated for 15min on a shaking platform. To quench crosslinking, 700µL of 1.5M glycine was added. The cells were centrifuged at 1200rpm for 3min at 4ºC and the pellet was washed with PBS. Cells were again centrifuged at 1200rpm for 3min at 4ºC. Next, the cell pellet was resuspended in 1400µL of SDS lysis buffer [10mM EDTA (pH=8.0), 500mM Tris-HCl (pH=7.0), 1% SDS] supplemented with 14µL of protease inhibitor and incubated on ice for 20min. Cells were sonicated in a 4ºC water bath at 10% power with 30 seconds ON, 30 seconds OFF pulse intervals for 8min. The cell lysate was then transferred to a new 1.5mL tube and spun at maximum speed for 2min at 4ºC. 200µL of the sonicated sample was transferred to a new 2mL tube and 1800µL of ChIP dilution buffer [50mM Tris-HCl (pH=8.0), 150mM NaCl, 1% Triton X-100 and 0.1% sodium deoxycholate, supplemented with 20µL of protease inhibitor] was added to each sample. 200µL of the sample was stored to be used as 10% input. The mixture was centrifuged at 10,000rpm for 10s at 4ºC. The supernatant was collected, 580µL was transferred to a new tube and 25µL of C1 Dynabeads slurry was added to each tube. 2µg of the antibody of interest was added and the lysate was incubated at 4ºC on a rotating platform for 16h. The lysate and beads mixture was washed 4 times with RIPA buffer. After the last wash, samples were eluted using 200µL of elution buffer [300mM NaCl, 5mM EDTA (pH=8.0), 0.5% SDS]. Next, samples were de-crosslinked at 65ºC in a dry bath for 8h. After RNase and Proteinase K treatment, DNA was isolated using phenol-chloroform extraction and ethanol precipitation.
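The fixation step adds 700µL of 11x solution to 7mL of medium, which works out to a 1x final concentration. A small sketch of that arithmetic follows; the helper name is hypothetical, and the glycine figure assumes it is added to the full 7.7mL fixed suspension, which the text implies but does not state.

```python
def final_concentration(stock_conc, stock_volume_ul, sample_volume_ul):
    """Concentration after adding stock_volume_ul of stock to sample_volume_ul."""
    return stock_conc * stock_volume_ul / (stock_volume_ul + sample_volume_ul)


# 700 uL of 11x fixation solution (11.1% formaldehyde) into 7000 uL of medium.
formaldehyde_pct = final_concentration(11.1, 700, 7000)

# 700 uL of 1.5 M glycine into the 7700 uL fixed suspension (assumed volume).
glycine_molar = final_concentration(1.5, 700, 7700)

print(round(formaldehyde_pct, 2), round(glycine_molar, 3))
```

Under these assumptions the cells are fixed in about 1% formaldehyde and quenched with 0.125M glycine.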

Chromatin Isolation by RNA Purification (CHIRP)

CHIRP was performed as previously described [17] with a few changes. Briefly, 20x10^6 293T cells were collected in culture medium and washed 2 times with 10mL of PBS. Cells were resuspended in 1200µL of SDS lysis buffer [10mM EDTA (pH=8.0), 500mM Tris-HCl (pH=7.0), 1% SDS, 12µL of protease inhibitor and 6µL of RNase inhibitor] and incubated on ice for 20min. Cells were sonicated in a 4ºC water bath at 10% power with 30 seconds ON, 30 seconds OFF pulse intervals for 8min. The cell lysate was then transferred to a new 1.5mL tube and spun at maximum speed for 2min at 4ºC. 400µL of the sonicated sample was transferred to a new 1.5mL tube and 800µL of hybridization buffer [750mM NaCl, 1% SDS, 50mM Tris-Cl (pH=7.0), 1mM EDTA, 15% formamide, supplemented with protease and RNase inhibitors] was added to each sample. 40µL of the sample was stored to be used as 10% input. To each CHIRP sample, 25µL of C1 Dynabeads slurry and 700 pmol of each probe set (APTR or lacZ) were added. The mixture was incubated overnight with rotation at 37ºC. Samples were washed 4 times with RIPA buffer. After the last wash, samples were eluted using 200µL of elution buffer [300mM NaCl, 5mM EDTA (pH=8.0), 0.5% SDS]. After RNase and Proteinase K treatment, DNA was isolated using phenol-chloroform extraction and ethanol precipitation. To assess RNA recovery, RNA was isolated using TRIzol total RNA isolation reagent (Thermo Fisher).
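The ChIP, CHIRP and RIP protocols each set aside 10% of the sample as input, and the corresponding figures report results as % input. The source does not give the calculation; a minimal sketch of the standard percent-input formula, assuming about 100% qPCR efficiency and that equal fractions of the pull-down and input eluates are amplified, is:

```python
import math


def percent_input(ct_pulldown, ct_input, input_fraction=0.10):
    """Percent of input recovered in a pull-down.

    The input CT is first adjusted to represent 100% of the starting
    material (subtracting log2 of the dilution factor), then compared
    with the pull-down CT. Assumes ~100% amplification efficiency.
    """
    ct_input_adjusted = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (ct_input_adjusted - ct_pulldown)


# Hypothetical CT values: a pull-down CT equal to the 10% input CT
# corresponds to 10% recovery.
print(percent_input(ct_pulldown=25.0, ct_input=25.0))
```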

S1 aptamer pull-down

20x10^6 LNCaP cells overexpressing S1-APTR, or the negative controls, were lysed using 5mL of lysis buffer [200mM Tris-HCl (pH=7.5), 500mM NaCl, 20mM MgCl2, 1% NP-40, 1mM DTT, protease and RNase inhibitors]. Cells were passed through a 1mL syringe 5 times and centrifuged at maximum speed for 30min. The supernatant was collected and the lysate was pre-cleared using 100µL of avidin agarose beads at 4ºC for 1h. Samples were centrifuged at 5000xg for 5min. The supernatant was collected, 25µL of C1 Dynabeads slurry was added and the samples were incubated overnight at 4ºC with shaking. Samples were then washed 5 times using 1mL of lysis buffer [50mM HEPES, 500mM NaCl, 1mM MgCl2, 0.1% NP-40, protease and RNase inhibitors]. The dried beads were then sent for mass spectrometry to Dr. James Wohlschlegel’s laboratory at the University of California, Los Angeles.

RNA Immunoprecipitation (RIP)

20x10^6 293T cells were collected in culture medium and washed with PBS 2 times. The cell pellet was resuspended in 1000µL of SDS lysis buffer [10mM EDTA (pH=8.0), 500mM Tris-HCl (pH=7.0), 1% SDS, protease and RNase inhibitors] and incubated on ice for 20min. The cell lysate was centrifuged at 10,000rpm for 3min at 4ºC. 1000µL of RIPA buffer, supplemented with protease and RNase inhibitors, was added to each sample. 200µL of the sample was stored to be used as 10% input. Next, the mixture was pre-cleared using 20µL of C1 Dynabeads slurry for 1h at 4ºC. Beads were separated from the solution, and the supernatant was mixed with 30µL of C1 Dynabeads slurry and 5µg of the antibody of interest. The lysate was incubated at 4ºC on a rotating platform for 16h. The lysate and beads mixture was washed 4 times with RIPA buffer. After the last wash, samples were eluted using 200µL of proteinase K buffer [100mM NaCl, 10mM Tris-Cl (pH=7.0), 1mM EDTA, 0.5% SDS]. After Proteinase K treatment, RNA was isolated using TRIzol total RNA isolation reagent (Thermo Fisher).

RNase H mapping

RNase H mapping was performed as previously described [25] with a few changes. Briefly, 20x10^6 293T cells were collected in 7mL of culture medium. Fixation was performed by adding 700µL of 11x fixation solution [11.1% formaldehyde, 50mM HEPES (pH=8.0), 100mM NaCl, 1mM EDTA (pH=8.0)] and the cells were incubated for 15min on a shaking platform. Cells were centrifuged at 1200rpm for 3min at 4ºC and the pellet was washed twice with PBS. Then, the cell pellet was resuspended in 600µL of SDS lysis buffer [10mM EDTA (pH=8.0), 500mM Tris-HCl (pH=7.0), 1% SDS, protease and RNase inhibitors]. Cells were sonicated in a 4ºC water bath at 10% power with 30 seconds ON, 30 seconds OFF pulse intervals for 8min. The cell lysate was then transferred to a new 1.5mL tube and spun at maximum speed for 2min at 4ºC. Next, 15µL of buffer A [10µL of cell lysate, 0.03µL of 1M MgCl2, 1µL of 0.1M DTT, 2.5µL of 5U/µL RNase H and 2µL of 5U/µL RNase inhibitor] was added to 1µL of each RNase H probe (100 pmol/µL stock), except for the negative controls, where water was used instead of a DNA oligonucleotide. The mixture was incubated at 30ºC for 30min. Then, 1µL of buffer B [1µL of RQ1 DNase and 0.1µL of 60mM CaCl2 per reaction] was added to each tube and the sample was incubated at 30ºC for 10min. To quench crosslinking, 2µL of quenching buffer [20µL of 0.5M EDTA; 20µL of 1M Tris-HCl (pH=7.0); 20µL of 10% SDS; 20µL of 20mg/mL Proteinase K] was added to each tube and samples were incubated for 60min at 55ºC followed by 2h at 65ºC. RNA was isolated using TRIzol total RNA isolation reagent (Thermo Fisher). DNA was degraded using RQ1 RNase-Free DNase (Promega). cDNA was produced using the Superscript III kit (Thermo Fisher) according to the manufacturer’s instructions. Next, qPCR was performed and the results were analyzed using the following formula:

$$\text{RNase H Sensitivity} = \frac{2^{\,[C_T(\text{APTR, probe}) - C_T(\text{APTR, no probe})]}}{2^{\,[C_T(\text{GAPDH, probe}) - C_T(\text{GAPDH, no probe})]}}$$
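As a worked illustration of this analysis, the sketch below computes the RNase H sensitivity ratio shown with Figure 7: the APTR CT shift between probe and no-probe reactions, as a power of 2, divided by the same quantity for GAPDH. The function name and the CT values are hypothetical.

```python
def rnase_h_sensitivity(ct_aptr_probe, ct_aptr_no_probe,
                        ct_gapdh_probe, ct_gapdh_no_probe):
    """RNase H sensitivity for one probe.

    Numerator: loss of APTR signal caused by probe-directed cleavage.
    Denominator: the same shift for GAPDH, which the probe should not target.
    """
    aptr_shift = 2.0 ** (ct_aptr_probe - ct_aptr_no_probe)
    gapdh_shift = 2.0 ** (ct_gapdh_probe - ct_gapdh_no_probe)
    return aptr_shift / gapdh_shift


# Hypothetical CT values: APTR is detected 3 cycles later with the probe,
# while GAPDH is unchanged.
print(rnase_h_sensitivity(28.0, 25.0, 20.0, 20.0))
```

A value near 1 indicates that the probe did not direct cleavage of APTR; larger values indicate a more accessible site.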

APTR structure predictions

Secondary structure predictions of APTR were generated using mfold [26]. The structure with the lowest predicted free energy was selected for analysis.

References

1 - Cooper GM. The Cell: A Molecular Approach. 2nd edition. Sunderland (MA): Sinauer Associates; 2000. The Eukaryotic Cell Cycle. Available from: https://www.ncbi.nlm.nih.gov/books/NBK9876/

2 - Abbas T, Dutta A. p21 in cancer: intricate networks and multiple activities. Nat Rev Cancer. 2009 Jun;9(6):400-14. doi: 10.1038/nrc2657. Review.

3 - Sherr CJ, Roberts JM. CDK inhibitors: positive and negative regulators of G1-phase progression. Genes Dev. 1999 Jun 15;13(12):1501-12. Review.

4 - Bartek J, Lukas J. Mammalian G1- and S-phase checkpoints in response to DNA damage. Curr Opin Cell Biol. 2001 Dec;13(6):738-47. Review.

5 - Chen J, Saha P, Kornbluth S, Dynlacht BD, Dutta A. Cyclin-binding motifs are essential for the function of p21CIP1. Mol Cell Biol. 1996 Sep;16(9):4673-82.

6 - Waga S, Hannon GJ, Beach D, Stillman B. The p21 inhibitor of cyclin-dependent kinases controls DNA replication by interaction with PCNA. Nature. 1994 Jun 16;369(6481):574-8.

7 - el-Deiry WS, Tokino T, Velculescu VE, Levy DB, Parsons R, Trent JM, Lin D, Mercer WE, Kinzler KW, Vogelstein B. WAF1, a potential mediator of p53 tumor suppression. Cell. 1993 Nov 19;75(4):817-25.

8 - Dulić V, Kaufmann WK, Wilson SJ, Tlsty TD, Lees E, Harper JW, Elledge SJ, Reed SI. p53-dependent inhibition of cyclin-dependent kinase activities in human fibroblasts during radiation-induced G1 arrest. Cell. 1994 Mar 25;76(6):1013-23.

9 - Abbas T, Sivaprasad U, Terai K, Amador V, Pagano M, Dutta A. PCNA-dependent regulation of p21 ubiquitylation and degradation via the CRL4Cdt2 ubiquitin ligase complex. Genes Dev. 2008 Sep 15;22(18):2496-506.

10 - Bornstein G, Bloom J, Sitry-Shevah D, Nakayama K, Pagano M, Hershko A. Role of the SCFSkp2 ubiquitin ligase in the degradation of p21Cip1 in S phase. J Biol Chem. 2003 Jul 11;278(28):25752-7.

11 - Amador V, Ge S, Santamaría PG, Guardavaccaro D, Pagano M. APC/C(Cdc20) controls the ubiquitin-mediated degradation of p21 in prometaphase. Mol Cell. 2007 Aug 3;27(3):462-73.

12 - Negishi M, Wongpalee SP, Sarkar S, Park J, Lee KY, Shibata Y, Reon BJ, Abounader R, Suzuki Y, Sugano S, Dutta A. A new lncRNA, APTR, associates with and represses the CDKN1A/p21 promoter by recruiting polycomb proteins. PLoS One. 2014 Apr 18;9(4):e95216.

13 - Yu F, Zheng J, Mao Y, Dong P, Li G, Lu Z, Guo C, Liu Z, Fan X. Long non-coding RNA APTR promotes the activation of hepatic stellate cells and the progression of liver fibrosis. Biochem Biophys Res Commun. 2015 Aug 7;463(4):679-85.

14 - Yu S, Qi Y, Jiang J, Wang H, Zhou Q. APTR is a prognostic marker in cirrhotic patients with portal hypertension during TIPS procedure. Gene. 2018 Mar 1;645:30-33.

15 - Dodd DW, Gagnon KT, Corey DR. Digital quantitation of potential therapeutic target RNAs. Nucleic Acid Ther. 2013 Jun;23(3):188-94.

16 - Derrien T, Johnson R, Bussotti G, Tanzer A, Djebali S, Tilgner H, Guernec G, Martin D, Merkel A, Knowles DG, Lagarde J, Veeravalli L, Ruan X, Ruan Y, Lassmann T, Carninci P, Brown JB, Lipovich L, Gonzalez JM, Thomas M, Davis CA, Shiekhattar R, Gingeras TR, Hubbard TJ, Notredame C, Harrow J, Guigó R. The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res. 2012 Sep;22(9):1775-89.

17 - Chu C, Qu K, Zhong FL, Artandi SE, Chang HY. Genomic maps of long noncoding RNA occupancy reveal principles of RNA-chromatin interactions. Mol Cell. 2011 Nov 18;44(4):667-78.

18 - Vance K.W. The long non-coding RNA Paupar regulates the expression of both local and distal genes. EMBO J. 2014;33:296–311.

19 - Hacisuleyman E. Topological organization of multichromosomal regions by the long intergenic noncoding RNA Firre. Nat. Struct. Mol. Biol. 2014;21:198–206.

20 - Takayama K.I. Androgen-responsive long noncoding RNA CTBP1-AS promotes prostate cancer. EMBO J. 2013;32:1665–1680.

21 - Stöckel D, Kehl T, Trampert P, Schneider L, Backes C, Ludwig N, Gerasch A, Kaufmann M, Gessler M, Graf N, Meese E, Keller A, Lenhof HP. Multi-omics enrichment analysis using the GeneTrail2 web service. Bioinformatics. 2016 May 15;32(10):1502-8.

22 - Srisawat C, Engelke DR. Streptavidin aptamers: affinity tags for the study of RNAs and ribonucleoproteins. RNA. 2001;7:632–641.

23 - Neumann P, Jaé N, Knau A, Glaser SF, Fouani Y, Rossbach O, Krüger M, John D, Bindereif A, Grote P, Boon RA, Dimmeler S. The lncRNA GATA6-AS epigenetically regulates endothelial gene expression via interaction with LOXL2. Nat Commun. 2018 Jan 16;9(1):237.

24 - Liu B, Sun L, Liu Q, Gong C, Yao Y, Lv X, Lin L, Yao H, Su F, Li D, Zeng M, Song E. A cytoplasmic NF-κB interacting long noncoding RNA blocks IκB phosphorylation and suppresses breast cancer metastasis. Cancer Cell. 2015 Mar 9;27(3):370-81.

25 - Simon MD, Wang CI, Kharchenko PV, West JA, Chapman BA, Alekseyenko AA, Borowsky ML, Kuroda MI, Kingston RE. The genomic binding sites of a noncoding RNA. Proc Natl Acad Sci U S A. 2011 Dec 20;108(51):20497-502.

26 - Zuker M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 2003 Jul 1;31(13):3406-15.

27 - Wapinski O, Chang HY. Long noncoding RNAs and human disease. Trends Cell Biol. 2011 Jun;21(6):354-61. Review. Erratum in: Trends Cell Biol. 2011 Oct;21(10):561.

28 - Sánchez Y, Huarte M. Long non-coding RNAs: challenges for diagnosis and therapies. Nucleic Acid Ther. 2013 Feb;23(1):15-20. Review.

29 - Ma L, Bajic VB, Zhang Z. On the classification of long non-coding RNAs. RNA Biol. 2013 Jun;10(6):925-33. Review.

30 - ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012 Sep 6;489(7414):57-74.

31 - Cheng W, Zhang Z, Wang J. Long noncoding RNAs: new players in prostate cancer. Cancer Lett. 2013 Oct 1;339(1):8-14. Review.

32 - Mercer TR, Qureshi IA, Gokhan S, Dinger ME, Li G, Mattick JS, Mehler MF. Long noncoding RNAs in neuronal-glial fate specification and oligodendrocyte lineage maturation. BMC Neurosci. 2010 Feb 5;11:14.

33 - Kretz M, Siprashvili Z, Chu C, Webster DE, Zehnder A, Qu K, Lee CS, Flockhart RJ, Groff AF, Chow J, Johnston D, Kim GE, Spitale RC, Flynn RA, Zheng GX, Aiyer S, Raj A, Rinn JL, Chang HY, Khavari PA. Control of somatic tissue differentiation by the long non-coding RNA TINCR. Nature. 2013 Jan 10;493(7431):231-5.

34 - Loewer S, Cabili MN, Guttman M, Loh YH, Thomas K, Park IH, Garber M, Curran M, Onder T, Agarwal S, Manos PD, Datta S, Lander ES, Schlaeger TM, Daley GQ, Rinn JL. Large intergenic non-coding RNA-RoR modulates reprogramming of human induced pluripotent stem cells. Nat Genet. 2010 Dec;42(12):1113-7. Erratum in: Nat Genet. 2010 Dec;42(12): 3 p following 1117.

35 - Wapinski O, Chang HY. Long noncoding RNAs and human disease. Trends Cell Biol. 2011 Jun;21(6):354-61. Review. Erratum in: Trends Cell Biol. 2011 Oct;21(10):561.

36 - Mederacke I, Hsu CC, Troeger JS, Huebener P, Mu X, Dapito DH, Pradere JP, Schwabe RF. Fate tracing reveals hepatic stellate cells as dominant contributors to liver fibrosis independent of its aetiology. Nat Commun. 2013;4:2823.

37 - Patidar KR, Sydnor M, Sanyal AJ. Transjugular intrahepatic portosystemic shunt. Clin Liver Dis. 2014 Nov;18(4):853-76. doi: 10.1016/j.cld.2014.07.006. Review.

[Figure 1 panels. A) Screening flowchart: full-length human cDNA library, 21,243 cDNAs, Full-length Long Japan (FLJ); reads that did not contain an ORF; 286 lncRNAs; siRNA screening (cell proliferation), BrdU incorporation analysis normalized to MTT assay; 5 lncRNAs; APTR (Negishi et al., 2014). B to F) See legend.]

Figure 1. APTR regulates p21 epigenetically via the PRC2 complex. A) Schematic of the screen for lncRNAs essential for cell proliferation. The 21,243 full-length cDNAs present in the Full-length Long Japan (FLJ) database were screened for transcripts that did not contain an open reading frame, yielding 286 lncRNAs. Using MCF10A cells, BrdU incorporation was measured after siRNA transfection against each lncRNA and compared with cells transfected with siGL2 (negative control) and siORC2 (positive control). APTR was selected among the lncRNAs whose knockdown inhibited BrdU incorporation to at least 90% of the level of inhibition seen with siORC2. B) Schematic of the APTR locus. In red, APTR lncRNA; in blue, flanking coding genes; in black, APTR exons. C) Schematic of lncRNA APTR. “c-”: complementary to Alu or LINE elements. D) SUZ12 ChIP followed by qPCR of the p21 promoter in 293T cells transfected with either siGL2 or siAPTR. Numbers 1 through 7 on the x-axis represent 7 different pairs of primers for the p21 promoter region. E) H3K27me3 ChIP followed by qPCR of the p21 promoter in 293T cells transfected with either siGL2 or siAPTR. Numbers 1 through 7 on the x-axis represent 7 different pairs of primers for the p21 promoter region. F) APTR and p21 RNA expression in ten GBM patient samples measured by qPCR relative to normal brain tissue. Figure reproduced from Negishi et al. [12].

[Figure 2 panels. A) APTR qRT-PCR; y-axis: APTR relative expression; cell lines on the x-axis: PC3, C4-2, 293T, HeLa, VCaP, A549, MCF7, C4-2B, PC3M, T-47D, WI-38, LNCaP, H1299, C-33 A, U-2 OS, DU-145, SK-BR-3, Hs 578T, SK-OV-3, OVCAR-3, MDA-MB-231, MDA-MB-468. B) APTR qRT-PCR; y-axis: APTR relative expression; cell lines on the x-axis: 293T, A172, U87, U251, U373.]

Figure 2. APTR is highly upregulated in 293T cells. A) RT-qPCR showing APTR expression in 22 different cell lines. APTR expression is shown relative to the PC3 cell line and normalized to GAPDH. B) RT-qPCR showing APTR expression in the GBM cell lines A172, U87, U251 and U373 relative to the 293T cell line and normalized to Actin.

[Figure 3 panels A to F. Panel B schematic: p21 promoter, scale -5Kb to +1, with primer pair R2 marked.]

Figure 3. APTR represses p21 expression by recruiting PRC2 to its promoter. A) ChIP of H3K27me3 and SUZ12 on the p21 promoter in 293T cells. Negative: primer pair for RRP (used as negative control). B) Schematic of the p21 promoter region. R2 indicates the primer pair for the p21 promoter located at -4kb from the TSS. C) qRT-PCR showing APTR knockdown efficiency by siAPTR using APTR Md primers. D) qRT-PCR showing APTR knockdown efficiency by siAPTR using APTR 3’ primers. E) ChIP of H3K27me3 and SUZ12 on the p21 promoter in 293T cells after siGL2 treatment. Negative: primer pair for RRP (used as negative control). F) ChIP of H3K27me3 and SUZ12 on the p21 promoter in 293T cells after siAPTR treatment. Negative: primer pair for RRP (used as negative control).

[Figure 4 panels. A) y-axis: % of APTR RNA recovery; x-axis: APTR Md, APTR 3’; bars: LacZ and APTR probes. Lower panel: APTR Md amplicon and APTR 3’ amplicon. B) y-axis: % input; x-axis: R2, R3, R6, ORC3; bars: LacZ and APTR probes. C) p21 promoter schematic, scale -5Kb to +1, with R2, R3 and R6 marked.]

Figure 4. APTR CHIRP-qPCR enriches for the p21 promoter region. A) CHIRP enriches for APTR RNA in 293T cells. Black (APTR probes) and grey (LacZ probes) bars represent the percentage of APTR RNA recovered, measured by qRT-PCR with 2 different sets of primers for APTR (APTR Md and APTR 3’). The lower panel is a representation of the APTR molecule and the qPCR amplicons obtained with APTR Md (in red) or APTR 3’ (in blue) primers. B) APTR interacts with the p21 promoter region by CHIRP-qPCR in 293T cells. Black (APTR probes) and grey (LacZ probes) bars represent the percentage of DNA purified and amplified by qPCR. R2, R3 and R6 are the different primer pairs for the p21 promoter, and ORC3 serves as negative control. C) Schematic of the p21 promoter region. R2, R3 and R6 indicate the primer pairs for the p21 promoter located at -4kb, -3kb and -1.5kb from the TSS, respectively.

[Figure 5 panels. A) APTR Md; y-axis: APTR relative expression. B) RSBN1L; y-axis: RSBN1L mRNA relative expression. C) PTPN12; y-axis: PTPN12 mRNA relative expression.]

Figure 5. APTR knockdown does not alter the expression of its neighboring genes. A) qRT-PCR showing knockdown of APTR after treatment with siRNA. B) qRT-PCR of RSBN1L after APTR knockdown. C) qRT-PCR of PTPN12 after APTR knockdown.

[Figure 6 panels. A) NFIC RIP; y-axis: % input; x-axis: APTR 5’, APTR Md, APTR 3’, Actin; bars: IgG and NFIC. B) BRD7 RIP; y-axis: % input; x-axis: APTR 5’, APTR Md, APTR 3’, Actin; bars: IgG and BRD7.]

Figure 6. NFIC and BRD7 RNA Immunoprecipitation suggests APTR is fragmented into at least 3 portions. A) qRT-PCR after NFIC IP showing APTR 5’ end enrichment relative to IgG control. B) qRT-PCR after BRD7 IP showing APTR 3’ end enrichment relative to IgG control. The APTR Md portion was not detected in either of the two IPs.

[Figure 7 panels. A) Schematic of APTR with the RNase H probes and qPCR amplicons. B) Bar graph; y-axis: RNase H sensitivity; x-axis: probes 1 to 39. Inset formula:]

$$\text{RNase H Sensitivity} = \frac{2^{\,[C_T(\text{APTR, probe}) - C_T(\text{APTR, no probe})]}}{2^{\,[C_T(\text{GAPDH, probe}) - C_T(\text{GAPDH, no probe})]}}$$

Figure 7. RNase H mapping reveals accessible regions in the APTR molecule. A) Schematic of the APTR lncRNA and the 39 different RNase H mapping probes, designed to assess the regions available for probe hybridization on APTR. It also shows the sizes of the qPCR primer amplicons and their positions relative to APTR. B) RNase H sensitivity graph. The bars represent the RNase H activity using each of the 39 different probes. Highlighted in purple are the 4 most RNase H sensitive regions on APTR.

[Figure 8 panels A to D: predicted secondary structure images.]

Figure 8. APTR secondary structure prediction. A) Predicted secondary structure of APTR; nucleotides 801 to 1000. Blue highlighted area represents RNase H probe #17 (nucleotides 812 to 831). B) Predicted secondary structure of APTR; nucleotides 1401 to 1600. Red highlighted area represents RNase H probe #25 (nucleotides 1468 to 1477). C) Predicted secondary structure of APTR; nucleotides 1601 to 1800. Green highlighted area represents RNase H probe #32 (nucleotides 1738 to 1757). D) Predicted secondary structure of APTR; nucleotides 1801 to 2000. Yellow highlighted area represents RNase H probe #36 (nucleotides 1898 to 1919).

CHIRP-seq APTR-bound sites with Highest Fold Enrichment

Chromosome Position | Nearest TSS | CHIRP-seq Fold Change | qPCR ratio R (APTR/lacZ)
chr1:31263237-31264833 | LAPTM5 | 79.1 | 1.0
chr12:122618961-122620557 | LRRC43 | 74.1 | 1.3
chr22:23990993-23992589 | DRICH1 | 61.5 | 1.9
chr3:33476072-33477668 | UBP1 | 60.3 | 1.2
chr1:206873143-206874739 | MAPKAPK2 | 59.0 | 1.2
chr16:89872816-89874412 | FANCA | 59.0 | 1.4
chr18:43788763-43790359 | C18orf25 | 56.5 | 1.1
chr7:95769844-95771440 | MIR591 | 55.2 | 0.8
chr8:125418076-125419672 | TMEM65 | 55.2 | 0.6
chr6:37810777-37812373 | ZFAND3 | 54.0 | 1.1
chr17:56726230-56727826 | TEX14 | 51.5 | 2.6
chr4:39724910-39726506 | UBE2K | 51.5 | 2.5
chr19:1780199-1781795 | ONECUT3 | 50.2 | 4.3
chr13:92152757-92154353 | GPC5 | 49.0 | 3.4
chr12:83257869-83259465 | TMTC2 | 47.7 | 9.5
chr15:65781878-65783474 | DPP8 | 47.7 | 18.2
chr2:233678530-233680126 | KCNJ13 | 47.7 | 6.4
chr8:128925385-128926981 | PVT1 | 47.7 | 26.3
chr1:186434651-186436247 | PDC | 46.5 | 4.9
chr12:51463064-51464660 | CSRNP2 | 46.5 | 26.3

Table 1. CHIRP-seq reveals APTR-bound DNA sites. Genomic loci bound by APTR with the respective chromosome position, the nearest Transcription Start Site (TSS), the CHIRP-seq fold enrichment (APTR/LacZ) and the qPCR ratio R (APTR/lacZ). Highest Fold Enrichment denotes the loci where the CHIRP-seq fold enrichment of the APTR pull-down over the LacZ pull-down is the highest.

A) Genes Up Regulated upon siAPTR near CHIRP-seq APTR-bound sites

Chromosome Position | Nearest TSS | CHIRP-seq | qPCR ratio R (APTR/lacZ)
chr1:28578746-28580342 | SESN2 | 4.3 | 0.6
chr11:62667599-62669195 | SLC3A2 | 8.2 | 5.4
chr12:52371559-52373155 | ACVR1B | 8.5 | 0.9
chr12:52333509-52335105 | ACVR1B | 6.3 | 0.6
chr2:205339984-205341580 | PARD3B | 30.8 | 0.9
chr19:18492711-18494307 | GDF15 | 7.4 | 2.8
chr19:18487075-18488671 | GDF15 | 4.0 | 2.3
chr2:128713100-128714696 | AMMECR1L | 5.4 | 5.2
chr2:222305123-222306719 | EPHA4 | 9.0 | 4.6
chr1:236298949-236300545 | GPR137B | 6.6 | 2.0
chr2:149918649-149920245 | LYPD6B | 6.3 | 1.4
chr4:12239225-12240821 | HS3ST1 | 4.6 | 1.3

B) Genes Down Regulated upon siAPTR near CHIRP-seq APTR-bound sites

Chromosome Position | Nearest TSS | CHIRP-seq | qPCR ratio R (APTR/lacZ)
chr20:34676660-34678256 | EPB41L1 | 8.5 | 0.7
chr21:37511542-37513138 | CBR3 | 17.6 | 1.7
chr16:69069041-69070637 | HAS3 | 10.7 | 4.0
chr22:45557148-45558744 | NUP50-AS1 | 10.5 | 6.7
chr6:34544293-34545889 | SPDEF | 10.3 | 0.9
chr12:56528458-56530054 | ESYT1 | 10.3 | 1.7
chr2:10242007-10243603 | RRM2 | 4.3 | 7.1
chr14:22017635-22019231 | SALL2 | 4.9 | 0.9
chr1:99983235-99984831 | PALMD | 6.4 | 5.0
chr2:173224686-173226282 | ITGA6 | 6.3 | 4.8

Table 2. APTR CHIRP-seq reveals peaks adjoining genes differentially expressed upon APTR knockdown. APTR CHIRP-seq peaks that lie near genes differentially regulated in the siAPTR microarray data. The columns give the chromosome position, the nearest Transcription Start Site (TSS), the CHIRP-seq fold enrichment (APTR/LacZ) and the qPCR ratio R (APTR/lacZ) for each CHIRP-seq peak. A) Overlap between APTR CHIRP-seq peaks and genes up-regulated upon siAPTR in the microarray. B) Overlap between APTR CHIRP-seq peaks and genes down-regulated upon siAPTR in the microarray.

CHIRP-seq APTR-bound sites that overlap with EZH2 ChIP-seq sites

Chromosome Position | Nearest TSS | CHIRP-seq | qPCR ratio R (APTR/lacZ)
chr1:31263237-31264833 | LAPTM5 | 79.09 | 1.0
chr12:56579744-56581340 | SMARCC2 | 45.19 | 1.8
chr11:1926738-1928334 | LSP1 | 35.15 | 2.1
chrX:117112645-117114241 | KLHL13 | 30.13 | 1.1
chr4:88016828-88018424 | AFF1 | 30.13 | 1.0
chr9:99177839-99179435 | ZNF367 | 28.87 | 1.1
chr7:107786075-107787671 | NRCAM | 28.87 | 1.0
chr15:72502915-72504511 | PKM | 26.36 | 1.4
chr5:60460847-60462443 | SMIM15 | 25.74 | 1.1
chr12:57172828-57174424 | HSD17B6 | 25.74 | 0.1
chr22:39545957-39547553 | CBX7 | 25.11 | 2.2
chr14:75484129-75485725 | MLH3 | 23.22 | 2.1

Table 3. APTR bound DNA loci are marked with EZH2. APTR CHIRP-seq peaks that overlap with EZH2 ChIP-seq peaks, with the respective chromosome position, the nearest Transcription Start Site (TSS), the CHIRP-seq fold enrichment (APTR/LacZ) and the qPCR ratio R (APTR/lacZ).
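Table 3 rests on intersecting APTR CHIRP-seq peaks with EZH2 ChIP-seq peaks. The source does not say which tool performed the intersection, so the following is only a minimal sketch of the idea, assuming closed genomic intervals and using hypothetical EZH2 coordinates; the two CHIRP-seq peaks are taken from Table 3.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def intersect_peaks(chirp_peaks, chip_peaks):
    """Return the CHIRP-seq peaks that overlap any ChIP-seq peak.

    Simple pairwise scan; adequate for a few thousand peaks.
    """
    return [p for p in chirp_peaks if any(overlaps(p, q) for q in chip_peaks)]


chirp = [("chr1", 31263237, 31264833), ("chr12", 56579744, 56581340)]
# Hypothetical EZH2 peaks, for illustration only.
ezh2 = [("chr1", 31264000, 31264500), ("chr2", 100, 200)]

print(intersect_peaks(chirp, ezh2))
```

For genome-scale data a sorted sweep or an interval tree, as used by dedicated interval tools, would scale better than the pairwise scan.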

Number of peptides detected by MS

S1-DRAIC-S1 | S1-DRAIC | S1-APTR | S1 alone | Description
0 | 0 | 122.211 | 0 | 28S ribosomal protein S16, mitochondrial
0 | 0 | 99.267 | 0 | Probable U3 small nucleolar RNA-associated protein 11
0 | 0 | 82.073 | 0 | Parathymosin
0 | 0 | 75.76 | 0 | Stromal cell-derived factor 2-like protein 1
0 | 0 | 73.007 | 0 | Mitochondrial import inner membrane translocase subunit Tim17-B
0 | 0 | 73.007 | 0 | NADH dehydrogenase [ubiquinone] 1 beta subcomplex subunit 10
0 | 0 | 72.168 | 0 | NADH dehydrogenase [ubiquinone] 1 alpha subcomplex subunit 5
0 | 0 | 66.794 | 0 | Transmembrane and coiled-coil domain-containing protein 1
0 | 0 | 66.091 | 0 | Nucleoside-triphosphatase C1orf57
0 | 0 | 61.555 | 0 | Galectin-7
0 | 0 | 57.678 | 0 | Nuclear factor 1 C-type
0 | 0 | 54.716 | 0 | H/ACA ribonucleoprotein complex subunit 2
0 | 0 | 51.997 | 0 | 39S ribosomal protein L30, mitochondrial
0 | 0 | 51.359 | 0 | S-phase kinase-associated protein 1
0 | 0 | 50.634 | 0 | TIM21-like protein, mitochondrial
0 | 0 | 50.129 | 0 | Centrin-3
0 | 0 | 48.785 | 0 | Actin-like protein 6A
0 | 0 | 45.008 | 0 | Pleckstrin homology-like domain family B member 3
0 | 0 | 44.372 | 0 | Putative cytochrome b-c1 complex subunit Rieske-like protein 1
0 | 0 | 38.052 | 0 | UPF0711 protein C18orf21
0 | 0 | 37.207 | 0 | Glutathione S-transferase Mu 3
0 | 0 | 37.152 | 0 | DNA-directed RNA polymerase I subunit RPA43
0 | 0 | 35.372 | 0 | NADH dehydrogenase [ubiquinone] 1 alpha subcomplex subunit 10, mitochondrial
0 | 0 | 34.688 | 0 | 45 kDa calcium-binding protein
0 | 0 | 33.063 | 0 | ATP-dependent Clp protease ATP-binding subunit clpX-like, mitochondrial
0 | 0 | 32.297 | 0 | ATPase family AAA domain-containing protein 3B
0 | 0 | 29.581 | 0 | Voltage-dependent anion-selective channel protein 3
0 | 0 | 29.068 | 0 | Short/branched chain specific acyl-CoA dehydrogenase, mitochondrial
0 | 0 | 24.195 | 0 | Melanoma-associated antigen B3
0 | 0 | 24.125 | 0 | Secretory carrier-associated membrane protein 3
0 | 0 | 22.565 | 0 | Transmembrane protein C2orf18
0 | 0 | 20.569 | 0 | Zinc finger protein 552
0 | 0 | 20.319 | 0 | Argininosuccinate synthase
0 | 0 | 20.319 | 0 | Chromobox protein homolog 6
0 | 0 | 20.027 | 0 | 26S protease regulatory subunit 6B
0 | 0 | 19.069 | 0 | Multiple C2 and transmembrane domain-containing protein 2
0 | 0 | 18.521 | 0 | Translation initiation factor eIF-2B subunit gamma
0 | 0 | 18.399 | 0 | Acid sphingomyelinase-like phosphodiesterase 3b
0 | 0 | 18.225 | 0 | Centrosomal protein of 78 kDa
0 | 0 | 18.159 | 0 | Ribosomal RNA processing protein 1 homolog A
0 | 0 | 17.85 | 0 | Calcium-binding mitochondrial carrier protein SCaMC-2
0 | 0 | 17.296 | 0 | Splicing factor, arginine/serine-rich 11
0 | 0 | 16.287 | 0 | Pleiotropic regulator 1
0 | 0 | 16.13 | 0 | Cytosol aminopeptidase
0 | 0 | 15.474 | 0 | Atlastin-3
0 | 0 | 13.83 | 0 | 26S proteasome non-ATPase regulatory subunit 2
0 | 0 | 13.694 | 0 | Hexokinase-2
0 | 0 | 13.546 | 0 | Mitochondrial Rho GTPase 2
0 | 0 | 13.437 | 0 | Heterogeneous nuclear ribonucleoprotein Q
0 | 0 | 12.859 | 0 | Bromodomain-containing protein 7
0 | 0 | 12.859 | 0 | RNA polymerase I-specific transcription initiation factor RRN3
0 | 0 | 12.761 | 0 | Ankyrin repeat and SAM domain-containing protein 3
0 | 0 | 12.665 | 0 | ATP-dependent RNA helicase DDX3X
0 | 0 | 12.545 | 0 | Serine/threonine-protein kinase TAO1
0 | 0 | 12.513 | 0 | RNA-binding protein 14
0 | 0 | 12.168 | 0 | Nucleolar protein 10
0 | 0 | 12.15 | 0 | Sp110 nuclear body protein
0 | 0 | 11.073 | 0 | Deoxynucleotidyltransferase terminal-interacting protein 2
0 | 0 | 10.915 | 0 | tRNA (cytosine-5-)-methyltransferase NSUN2
0 | 0 | 9.438 | 0 | Lateral signaling target protein 2 homolog
0 | 0 | 9.302 | 0 | Methionyl-tRNA synthetase, cytoplasmic
0 | 0 | 7.943 | 0 | CCAAT/enhancer-binding protein zeta
0 | 0 | 7.382 | 0 | Ubinuclein-1
0 | 0 | 7.267 | 0 | Microtubule-associated protein 4
0 | 0 | 6.982 | 0 | Transcription initiation factor TFIID subunit 2
0 | 0 | 6.746 | 0 | Plasma membrane calcium-transporting ATPase 4
0 | 0 | 4.754 | 0 | Laminin subunit beta-4
0 | 0 | 4.656 | 0 | Laminin subunit beta-2
0 | 0 | 3.885 | 0 | Alpha-tectorin
0 | 0 | 3.392 | 0 | Microtubule-associated protein 1B
0 | 0 | 1.615 | 0 | Protein piccolo

Table 4. S1-APTR overexpression pull-down shows more than 70 protein binding partners. Streptavidin pull-down followed by mass spectrometry of LNCaP cells overexpressing APTR fused to the S1 aptamer. The first four columns represent the number of peptides detected by MS for each construct: S1-APTR, the lncRNA of interest, and three negative controls: S1 tag alone, S1-DRAIC and S1-DRAIC-S1.

S1-APTR associated proteins enriched in Gene Ontology Categories

Gene Ontology: Biological Process

Biological Process | Number of Proteins | Expected Score | p-Value
regulation of cell cycle G1 S phase transition | 6 | 0.170257 | 0.001557
regulation of G1 S transition of mitotic cell cycle | 5 | 0.159690 | 0.020985

Gene Ontology: Cellular Component

Cellular Component | Number of Proteins | Expected Score | p-Value
mitochondrial protein complex | 6 | 0.160864 | 0.000107
mitochondrial membrane part | 6 | 0.203135 | 0.000204
inner mitochondrial membrane protein complex | 5 | 0.126812 | 0.000444
mitochondrial outer membrane | 5 | 0.170257 | 0.001356
organelle outer membrane | 5 | 0.198438 | 0.002214
outer membrane | 5 | 0.205483 | 0.002214
NADH dehydrogenase complex & respiratory chain complex I & mitochondrial respiratory chain complex I | 3 | 0.048142 | 0.015419
respiratory chain complex | 3 | 0.073974 | 0.045762

Table 5. Gene ontology categories enriched in APTR associated proteins. List of APTR-bound proteins enriched in the gene ontology categories biological processes and cellular components, according to the GeneTrail2 tool.

S1-APTR associated proteins enriched in Gene Ontology Biological Processes

regulation of cell cycle G1 S phase transition
Symbol | Description
ATP2B4 | ATPase Plasma Membrane Ca2+ Transporting 4
BRD7 | Bromodomain-containing protein 7
DDX3X | DEAD-Box Helicase 3, X-Linked
PLRG1 | Pleiotropic Regulator 1
PSMC4 | Proteasome 26S Subunit, ATPase 4
PSMD2 | Proteasome 26S Subunit, Non-ATPase 2

regulation of G1 S transition of mitotic cell cycle
Symbol | Description
BRD7 | Bromodomain-containing protein 7
DDX3X | DEAD-Box Helicase 3, X-Linked
PLRG1 | Pleiotropic Regulator 1
PSMC4 | Proteasome 26S Subunit, ATPase 4
PSMD2 | Proteasome 26S Subunit, Non-ATPase 2

Table 6. Gene ontology biological processes enriched in APTR associated proteins. List of APTR-bound proteins enriched in the gene ontology biological processes category, according to the GeneTrail2 tool.

75 S1-APTR associated proteins enriched in Gene Ontogoly Cellular Component mitochondrial protein complex Symbol Description CLPX Caseinolytic Mitochondrial Matrix Peptidase Chaperone Subunit NDUFA10 NADH:Ubiquinone Oxidoreductase Subunit A10 NDUFA5 NADH:Ubiquinone Oxidoreductase Subunit A5 NDUFB10 NADH:Ubiquinone Oxidoreductase Subunit B10 TIMM17B Translocase Of Inner Mitochondrial Membrane 17B TIMM21 Translocase Of Inner Mitochondrial Membrane 21 mitochondrial membrane part Symbol Description NDUFA10 NADH:Ubiquinone Oxidoreductase Subunit A10 NDUFA5 NADH:Ubiquinone Oxidoreductase Subunit A5 NDUFB10 NADH:Ubiquinone Oxidoreductase Subunit B10 RHOT2 Ras Homolog Family Member T2 TIMM17B Translocase Of Inner Mitochondrial Membrane 17B TIMM21 Translocase Of Inner Mitochondrial Membrane 21 inner mitochondrial membrane protein complex Symbol Description NDUFA10 NADH:Ubiquinone Oxidoreductase Subunit A10 NDUFA5 NADH:Ubiquinone Oxidoreductase Subunit A5 NDUFB10 NADH:Ubiquinone Oxidoreductase Subunit B10 TIMM17B Translocase Of Inner Mitochondrial Membrane 17B TIMM21 Translocase Of Inner Mitochondrial Membrane 21 mitochondrial outer membrane Symbol Description ASS1 Argininosuccinate Synthase 1 DDX3X DEAD-Box Helicase 3, X-Linked HK2 Hexokinase 2 RHOT2 Ras Homolog Family Member T2 VDAC3 Voltage Dependent Anion Channel 3

Table 7. Gene ontology cellular component enriched in APTR associated proteins. List of APTR-bound proteins enriched in the gene ontology cellular component category, according to GeneTrail2 tool.

S1-APTR associated proteins enriched in Gene Ontology Cellular Component

organelle outer membrane
Symbol    Description
ASS1      Argininosuccinate Synthase 1
DDX3X     DEAD-Box Helicase 3, X-Linked
HK2       Hexokinase 2
RHOT2     Ras Homolog Family Member T2
VDAC3     Voltage Dependent Anion Channel 3

outer membrane
Symbol    Description
ASS1      Argininosuccinate Synthase 1
DDX3X     DEAD-Box Helicase 3, X-Linked
HK2       Hexokinase 2
RHOT2     Ras Homolog Family Member T2
VDAC3     Voltage Dependent Anion Channel 3

NADH dehydrogenase complex, respiratory chain complex I, mitochondrial respiratory chain complex I
Symbol    Description
NDUFA10   NADH:Ubiquinone Oxidoreductase Subunit A10
NDUFA5    NADH:Ubiquinone Oxidoreductase Subunit A5
NDUFB10   NADH:Ubiquinone Oxidoreductase Subunit B10

respiratory chain complex
Symbol    Description
NDUFA10   NADH:Ubiquinone Oxidoreductase Subunit A10
NDUFA5    NADH:Ubiquinone Oxidoreductase Subunit A5
NDUFB10   NADH:Ubiquinone Oxidoreductase Subunit B10

Table 7 continued. Gene ontology cellular component enriched in APTR associated proteins. List of APTR-bound proteins enriched in the gene ontology cellular component category, according to GeneTrail2 tool.

Oligo                       Sequence (5’ to 3’)
siGL2                       CGUACGCGGAAUACUUCGA
siAPTR_1                    CCAGGUACUGCCUUCUAAC
siAPTR_2                    CCAUGAUCCGGUAUCACCA
APTR 5’ primer F            CGATTGATGGGAAGTGTTCA
APTR 5’ primer R            ACTGTTGCCGGTATCACAGC
APTR Md primer F            TGTGGGTACAAAAGGAGAGTAACAT
APTR Md primer R            CTGTAGTTGCAGCTCCAGATCTAC
APTR 3’ primer F            GAGGAAGAGAAATTCTGAGGGTAAAGATA
APTR 3’ primer R            AATGGATGAATAGAGTGATGCACAGATC
Actin primer F              TGAAGGCTTTTGGTCTCCCTG
Actin primer R              TCAACTGGTCTCAAGTCAGTGT
GAPDH primer F              ACTTTGTCAAGCTCATTTCC
GAPDH primer R              TGCAGCGAACTTTATTGATG
p21 mRNA primer F           GATGGAACTTCGACTTTGTCACCGA
p21 mRNA primer R           TGGTAGAAATCTGTCATGCTGGTCTG
p21 promoter R2 primer F    GTCTGCTGCAAATCTCAGTTTGCCC
p21 promoter R2 primer R    GTGTGCACGTAACAGAGCGCATCA
p21 promoter R3 primer F    CAGATTTGTGATGCTAGGAACATGA
p21 promoter R3 primer R    AGAGGCGGAACAAAGATAGAACATT
p21 promoter R6 primer F    GGCCTTTCTGGGGTTTAGCCACAA
p21 promoter R6 primer R    CTTCTTCCTCTAACGCAGCTGACCTC

Table 8. List of oligonucleotides used in this study.

Probe                     Sequence (5’ to 3’)
APTR RNase H probe #1     CTCTTCGGAGGTCAGAGCTC
APTR RNase H probe #2     GTGAACTCGCGTGCCCGGTG
APTR RNase H probe #3     GTGAACTCGCGTGCCCGGTG
APTR RNase H probe #4     GCCCCGGACCCCCAGATCTG
APTR RNase H probe #5     GTCTCTGCGGCCACGAAGAC
APTR RNase H probe #6     TCCCTTCCATTTTTCCTGGG
APTR RNase H probe #7     GCGCGTTCAGCAAGCTGGAT
APTR RNase H probe #8     AGTCGGTAGTCGATTGATGG
APTR RNase H probe #9     AAGAGAAAAAATTGCCGGGA
APTR RNase H probe #10    ATGTGGTGACAAGGTTTCAC
APTR RNase H probe #11    GAGCTGGCCATTTCCACATG
APTR RNase H probe #12    TTTTAACTACCCGTGTGATA
APTR RNase H probe #13    GTGGGTACAAAAGGAGAGTA
APTR RNase H probe #14    ACTGGCTCTTGTTCCCATGG
APTR RNase H probe #15    ATTTCAAAACTTCACATTTG
APTR RNase H probe #16    GAAAGTTTCTCTGTGGCCCT
APTR RNase H probe #17    TCCATCCAGCAACACAAGTC
APTR RNase H probe #18    AGCGCCAGCTGAGCCCCGGA
APTR RNase H probe #19    ACTCTTCTGCTGCACATAGC
APTR RNase H probe #20    GTTGCAGCTCCAGATCTACA
APTR RNase H probe #21    TTTTGTTTTTTTTTAAAGGT
APTR RNase H probe #22    GAGATTTCTTCATGTTTTTG
APTR RNase H probe #23    GATTGTTTTCCTTTTTGGAA
APTR RNase H probe #24    TACATAGCTATGTTGAGCTC
APTR RNase H probe #25    GTAGGCTAATTGTCTTAACT
APTR RNase H probe #26    ATGTGTGAAGATTATACTTG
APTR RNase H probe #27    TTCATTTCTTCAGTAAATAT
APTR RNase H probe #28    CCAACGAGTGTCTGTCTACA
APTR RNase H probe #29    TAAATCCCTGTCCTCGTGGA
APTR RNase H probe #30    AACAAACATTTAATTTATGT
APTR RNase H probe #31    TATAACAACATCTTGACTTC
APTR RNase H probe #32    CTAAAGAATTGCTACCTTCA
APTR RNase H probe #33    GGGAGAGAGGCCTGTGATAT
APTR RNase H probe #34    AGTAGCTGTAAAAAGAGGAA
APTR RNase H probe #35    ATAAAAAATGTTGAAATCTT
APTR RNase H probe #36    GTCTCTTGCCCTAATTGATG
APTR RNase H probe #37    TCCTTCCCTGGACCTTCTCC
APTR RNase H probe #38    GAACTTACTTTTAATGCATT
APTR RNase H probe #39    CCAGTGGACAAATACTATAC

Table 9. List of RNase H probes used in this study.

Chapter 3: Adapted from “LINC00152 Promotes Invasion Through a 3'-hairpin Structure and Associates with Prognosis in Glioblastoma”

Brian J Reon *, Bruno Takao Real Karia *, Manjari Kiran and Anindya Dutta 1

Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA, USA.

*These authors contributed equally

1: correspondence to [email protected]

Molecular Cancer Research. Article in press.

I am co-first-author on this paper and my contributions to the paper are:

- I performed the subcellular fractionation of U87 cells, the western blot of Lamin A/C and Actin, qRT-PCR of LINC00152, Actin, and MALAT1 – Fig. 3A and B.

- I performed the subcellular fractionation of A172 cells, the western blot of Lamin A/C and Actin, qRT-PCR of LINC00152, Actin, and MALAT1 – Sup. Fig. 3A and B.

- I performed LINC00152 in-situ hybridization on U87 cells – Fig. 3C.

- I performed LINC00152 knockdown and overexpression, the qRT-PCRs and the respective invasion assays on U87 cells – Fig. 3D-G.

- I performed LINC00152 knockdown and overexpression, the qRT-PCRs and the respective invasion assays on A172 cells – Sup Fig 3.

- I performed LINC00152 knockdown and overexpression, the qRT-PCRs and the respective proliferation assays on U87 cells attached to plastic – Fig. 4A-C.

- I performed LINC00152 knockdown and overexpression, the qRT-PCRs and the respective proliferation assays on U87 cells in suspension – Fig. 5 and Fig 6.

- I performed the LINC00152 knockdown RNA-seq libraries.


- I constructed LINC00152 M8 vector – Fig 8B.

- I performed LINC00152 deletion mutants overexpression, the qRT-PCRs and the respective invasion assays – Fig 8C and D.

- I performed LINC00152 rescue experiments; including siRNA and plasmid transfections, the qRT-PCRs and the respective invasion assays – Fig 8E.

- I constructed LINC00152 M8, mut A and mut B and mut AB vectors, performed the overexpression, the qRT-PCRs and the respective invasion assays – Fig 9.

- I performed IGFBP4, LUM and TGM2 knockdowns, the qRT-PCRs and the respective invasion assays – Sup Fig 4.

- I performed IGFBP4, LUM and TGM2 knockdowns in the LINC00152 overexpressing cells, the qRT-PCRs and the respective invasion assays – Sup Fig 5.

- I performed BCL2L11 qRT-PCRs under LINC00152 knockdown and overexpression conditions, including the siRNA and plasmid transfections – Sup Fig 7.

- I performed TIMP3, CADM1, SLIT2, PTHLH, NID2, MATN3, TPM2, PTX3, IGFBP4, NNMT, TGM2, SPP1 and LUM qRT-PCRs after LINC00152 knockdown conditions – Sup. Table 1.

- I performed TPM2, PTX3, IGFBP4, TGM2, SPP1 and LUM qRT-PCRs after LINC00152 knockdown and overexpression conditions – Sup. Table 2.

- I also contributed to the manuscript writing.


Abstract

Long non-coding RNAs (lncRNAs) are increasingly implicated in oncogenesis. Here, it is determined that LINC00152/CYTOR is upregulated in glioblastoma multiforme (GBM) and aggressive wild-type IDH1/2 grade II/III gliomas, and that this upregulation associates with poor patient outcomes. LINC00152 is similarly upregulated in over 10 other cancer types and associates with a poor prognosis in 7 other cancer types. Inhibition of the mostly cytoplasmic LINC00152 decreases, and overexpression increases, cellular invasion. LINC00152 knockdown alters the transcription of genes important to epithelial-to-mesenchymal transition (EMT). PARIS and Ribo-seq data, together with secondary structure prediction, identified a protein-bound 121-bp stem-loop structure at the 3' end of LINC00152 whose overexpression is sufficient to increase invasion of GBM cells. Point mutations in the stem-loop suggest that stem formation in the hairpin is essential for LINC00152 function. LINC00152 has a nearly identical homolog, MIR4435-2HG, which encodes a nearly identical hairpin, is equally expressed in low-grade glioma (LGG) and GBM, predicts poor patient survival in these tumors and is also reduced by LINC00152 knockdown. Together, these data reveal that LINC00152 and its homolog MIR4435-2HG associate with aggressive tumors and promote cellular invasion through a mechanism that requires the structural integrity of a hairpin structure.

Implications: Frequent upregulation of the lncRNA, LINC00152, in glioblastoma and other tumor types combined with its prognostic potential and ability to promote invasion suggests LINC00152 as a potential biomarker and therapeutic target.


Introduction

Glioblastomas (GBMs) are highly aggressive grade IV gliomas and are the most common type of malignant glioma, with 10,000 new diagnoses each year [1]. GBMs are a heterogeneous group of tumors that can be separated into four subtypes (mesenchymal, classical, proneural and neural) based on their transcriptional profile. Most efforts to understand glioma tumor biology have focused on protein-coding genes and microRNAs [2]. These efforts have identified commonly altered signaling pathways in GBMs, including mutations in EGFR, p53 and mTOR signaling [3,4]. Furthermore, microRNAs have been shown to play a role in many of the oncogenic phenotypes of GBMs, such as invasiveness and stemness of GBM stem cells [5,6]. Although there has been much effort to create new targeted therapies for GBMs focusing on some of the aforementioned pathways, most have not been effective, and the standard-of-care therapy, a combination of surgical resection, radiotherapy and Temozolomide, still leaves patients with a 5-year survival rate of roughly 10% [7].

High-throughput sequencing revealed that a majority of the human genome, long thought to be transcriptionally silent, is actually expressed. Indeed, when surveyed across many different cell types, nearly 80% of the human genome was found to be transcribed [8]. Many of these newly discovered transcripts are long noncoding RNAs (lncRNAs). LncRNAs are a class of ncRNAs that are longer than 200 bases and can be further subdivided into subclasses based on chromosomal position relative to other genes, enhancers or other genomic regulatory elements. LncRNAs have been shown to play many different functional roles in the cell, in part through regulation of transcription, mRNA stability and mRNA translational efficiency [9,10]. Most of the research into the role of ncRNAs in GBMs has been on microRNAs, with relatively few studies on lncRNAs. This leaves a crucial gap in our understanding of glioma pathogenesis. Indeed, lncRNAs have been shown to function in critical roles in a variety of tumor types, e.g. HOTAIR in breast cancer, SChLAP1 in aggressive prostate cancer, MALAT1 in lung cancer and DRAIC in prostate cancer [11-13].

LINC00152 is a lncRNA that was first identified as being hypomethylated during hepatocellular carcinoma tumorigenesis [14]. It is also dysregulated in gastric cancer and esophageal squamous cell carcinoma [15,16]. However, there are conflicting reports on exactly how LINC00152 functions to promote the invasive phenotype. One study has argued that LINC00152 directly interacts with EGFR and affects AKT signaling, while others have suggested that LINC00152 acts as a competing endogenous RNA (ceRNA) by titrating microRNAs [5,6,17-20]. Recently, we identified LINC00152 through an in-depth genomic analysis of gliomas as being highly expressed in GBMs [21]. In this study we characterize the association of LINC00152 with GBM clinical features and with tumor cell invasion, and begin to relate its function to its structure. Furthermore, we find that LINC00152 is overexpressed in 10 other tumor types compared to matched normal tissue and that high LINC00152 expression is associated with a poor prognosis in 7 of these tumors.

Results

LINC00152 is a lncRNA overexpressed in aggressive gliomas

We first identified LINC00152 from a comprehensive analysis of lncRNAs in gliomas [21]. LINC00152 was one of the most differentially expressed lncRNAs in GBMs compared to normal brain tissue; however, it is not upregulated in grade II and III gliomas (Fig 1A and Sup Fig 1A). We have validated the upregulation of LINC00152 in an independent set of GBM patients compared to normal FFPE brain tissue [21].

We tested whether LINC00152 is preferentially expressed in a particular GBM subtype, but that did not appear to be the case. The differences in LINC00152 expression between the subtypes were not statistically significant, although the median expression of LINC00152 is lowest in the proneural GBM subtype (p<0.1) (Sup Fig 1B). Even though LINC00152 is not upregulated in LGGs as a whole, the IDHwt LGG subtype expresses 4 times as much LINC00152 as normal brains (p < 0.00001) (Fig 1B). This is interesting, because IDHwt LGGs are far more aggressive than the other LGG subtypes and display clinical properties similar to GBMs [22].

LINC00152 expression predicts survival in GBMs and LGGs

Since LINC00152 is upregulated in brain tumors compared to normal brain tissue, we next examined the association of LINC00152 expression with survival of GBM and LGG patients. To do this, we assessed the survival difference between patients with high (above median) and low (below median) LINC00152 expression in the TCGA GBM and LGG cohorts. In GBMs, patients with high expression of LINC00152 had a poor prognosis (p = 0.0039) compared to patients with low expression of LINC00152, with a median survival of 11.9 and 15.4 months, respectively (Fig 2A). Furthermore, LINC00152 expression was also able to separate patients into two distinct prognostic groups in LGGs. LGG patients with high expression of LINC00152 had a median survival of 62.1 months, while the low-expressing group had a median survival of 98.2 months (p = 1.4e-5) (Fig 2B). These results demonstrate that not only is LINC00152 overexpressed in gliomas, but that this overexpression is associated with poor patient outcome.
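
The median-split survival comparison described above can be illustrated with a minimal Kaplan-Meier sketch. This is not the analysis code used in the study: the function names (`split_by_median`, `kaplan_meier`, `median_survival`) and the toy inputs are hypothetical, and a real TCGA analysis would normally rely on a dedicated survival package.

```python
from statistics import median


def split_by_median(expression):
    """Split patient ids into (high, low) groups around the median expression.

    expression: dict mapping patient id -> expression value.
    Patients exactly at the median are left out of both groups.
    """
    cutoff = median(expression.values())
    high = [p for p, v in expression.items() if v > cutoff]
    low = [p for p, v in expression.items() if v < cutoff]
    return high, low


def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function.

    times: follow-up time per patient; events: 1 = death observed, 0 = censored.
    Returns a list of (time, survival probability) at each time with a death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        same_time = [e for tt, e in data if tt == t]
        deaths = sum(same_time)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= len(same_time)  # deaths and censored patients leave the risk set
        i += len(same_time)
    return curve


def median_survival(curve):
    """First time at which the estimated survival is 0.5 or lower (None if never)."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None
```

Applying `kaplan_meier` separately to the high and low groups and comparing the two `median_survival` values mirrors the kind of comparison reported in Fig 2A and B.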

LINC00152 in other cancers

We next examined LINC00152 expression in other cancers relative to their respective normal tissues. We compared the expression of LINC00152 in all TCGA tumor types with paired normal and tumor RNA-seq data. Surprisingly, LINC00152 is upregulated in nearly every tumor type we analyzed, including head and neck squamous carcinoma, renal papillary tumor, hepatocellular carcinoma, colorectal carcinoma, renal clear cell carcinoma, breast invasive carcinoma, stomach adenocarcinoma, uterine carcinoma, thyroid carcinoma and lung adenocarcinoma (Fig 1C-L). Prostate adenocarcinoma was the only tumor type that had a statistically significant decrease in the levels of LINC00152 in the tumor samples compared to normal (data not shown).

Since LINC00152 is overexpressed in the majority of tumors that we have analyzed, we next wanted to determine whether LINC00152 expression is associated with patient survival in the TCGA tumors that had higher levels of LINC00152 compared to the paired normal samples. To do this, we performed Kaplan-Meier analysis for each tumor type by separating patients into two groups: the top quartile and the bottom quartile of LINC00152-expressing tumors. From the original list of tumors, LINC00152 expression was associated with poor patient outcome in head and neck squamous cell carcinoma, lung adenocarcinoma, renal clear cell carcinoma and hepatocellular carcinoma (Fig 2C-F). In renal papillary carcinoma, the difference in outcome was not statistically significant when comparing the top and bottom quartiles of LINC00152 expression (p = 0.1), but it was statistically significant (p = 0.014) when we compared patients in the top third and bottom third based on LINC00152 expression (Sup Fig 2A).
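
The quartile and tertile comparisons above can be sketched as follows. The text does not name the statistical test behind the reported p-values; a two-group log-rank test is assumed here because it is the usual companion to Kaplan-Meier curves. The function names and the `fraction` parameter are hypothetical.

```python
import math


def extreme_groups(expression, fraction=0.25):
    """Return (top, bottom) patient ids: the highest and lowest `fraction` by expression.

    fraction=0.25 gives quartiles; fraction=1/3 gives the tertile comparison.
    """
    ranked = sorted(expression, key=expression.get)
    k = int(len(ranked) * fraction)
    if k == 0:
        raise ValueError("too few patients for the requested fraction")
    return ranked[-k:], ranked[:k]


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value), 1 d.f.

    times_*: lists of follow-up times; events_*: 1 = death observed, 0 = censored.
    """
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A just before t
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

Widening the groups from quartiles to tertiles increases the number of patients in each arm, which is one plausible reason a trend at p = 0.1 can reach significance.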

Although LINC00152 was not overexpressed in LGGs relative to normal brain, it was upregulated in an aggressive subpopulation of LGGs (those with wild-type IDH) and was associated with poor patient outcome. This made us realize that even if a tumor type does not overexpress LINC00152 globally relative to normal tissue, overexpression of the lncRNA in specific tumors may still be associated with poor outcome. We therefore examined the predictive value of LINC00152 expression in other TCGA tumor types that did not show a global increase of LINC00152 expression relative to normal tissue. Interestingly, even among these tumors, LINC00152 expression was associated with poor patient outcome in pancreatic adenocarcinoma and acute myeloid leukemia when comparing the tumors in the top quartile and bottom quartile for LINC00152 expression (Sup Fig 1B and C). These results highlight the fact that in nine tumor types (GBMs, LGGs, head and neck squamous cell carcinoma, renal clear cell carcinoma, hepatocellular carcinoma, lung adenocarcinoma, renal papillary carcinoma, pancreatic adenocarcinoma and acute myeloid leukemia) LINC00152 appears to function as an unfavorable gene whose expression is associated with a poor patient outcome.

LINC00152 expression controls proliferation rate of GBM cells in suspension

Subcellular fractionation (Fig 3A and B) and in-situ hybridization (Fig 3C) revealed that LINC00152 is primarily localized in the cytoplasm of U87 and A172 cells (Sup Fig 3A and B). We next sought to determine whether the upregulation of LINC00152 seen in GBMs is associated with any cancer phenotypes in GBM cell lines. LINC00152 has previously been shown to affect multiple cellular phenotypes, including cell growth, migration, invasion and epithelial-to-mesenchymal transition (EMT) [23,24]. We knocked down LINC00152 expression using two separate siRNAs or overexpressed the lncRNA and found that neither knockdown nor overexpression of LINC00152 affected the growth of cells attached to plastic (Fig 4B and D). However, when cells were seeded in Poly(HEMA)-coated plates, which prevent cell anchorage to the plate, knockdown of LINC00152 significantly decreased cell proliferation (Fig 5B-F). Consistent with this result, when LINC00152 was overexpressed in U87 cells, the proliferation rate was increased by 57% after 6 days (Fig 6B-F), suggesting that LINC00152 has an anchorage-independent pro-proliferative function.

LINC00152 expression controls GBM cell invasion

We next assayed whether LINC00152 expression was associated with tumor cell invasion using a transwell migration assay. Knockdown of LINC00152 in U87 and A172 cell lines led to a statistically significant reduction in cell invasion with both siRNAs targeting LINC00152 (Fig 3D and E, Sup Fig 3C and D). Conversely, overexpression of LINC00152 led to an increase of over 2-fold in the number of invaded cells (Fig 3F and G, Sup Fig 3E and F). These findings suggest that LINC00152 knockdown decreases invasion of GBM cells, while upregulation in GBMs promotes the invasive phenotype that is commonly seen in patient tumors.

LINC00152 knockdown results in decreased expression of pro-invasive genes

In order to better understand how LINC00152 affects cellular invasion, we performed RNA-seq on U87 cells following knockdown of LINC00152 using a combination of two different siRNAs. Knockdown of LINC00152 leads to large changes in gene expression, with 259 genes significantly up-regulated and 295 down-regulated at least 2-fold (Fig 7A). To determine the most significant molecular pathways regulated by LINC00152, we performed GSEA (gene set enrichment analysis), a method that can identify pathway enrichment from a pre-ranked gene list based on RNA-seq fold changes [26]. This analysis showed a significant enrichment of up-regulated genes upon siLINC00152 involved in epithelial-to-mesenchymal transition (EMT) (Fig 7B). Among the differentially expressed genes involved in EMT, the changes were validated by qPCR for 12 out of 13 genes after si00152 treatment (Sup Table 1). More interestingly, six of the genes that were downregulated by LINC00152 knockdown were conversely upregulated by overexpression of the lncRNA: TPM2 (Tropomyosin 2), PTX3 (Pentraxin 3), IGFBP4 (Insulin growth factor binding protein 4), TGM2 (Transglutaminase 2), SPP1 (Secreted phosphoprotein 1) and LUM (Lumican) (Sup Table 1). In addition, knockdown of IGFBP4, TGM2 and LUM repressed invasion of U87 cells (Sup Fig 4B-D), confirming that these genes play a role in promoting invasion of GBM cells. Moreover, concomitant knockdown of IGFBP4, LUM and TGM2 prevents the invasive phenotype produced by LINC00152 overexpression (Sup Fig 5B-E). These results indicate that LINC00152 may induce U87 cell invasion by regulating the expression of at least these three genes.
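
To make the pre-ranked GSEA step more concrete, the following is a minimal sketch of the weighted running-sum enrichment score computed over a gene list ranked by fold change. It is an illustration only: the function name and inputs are hypothetical, and the published GSEA software additionally estimates significance by permutation and normalizes the score, which is omitted here.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Running-sum enrichment score for a pre-ranked gene list.

    ranked_genes: genes ordered from most up-regulated to most down-regulated.
    scores: dict gene -> ranking metric (for example log2 fold change).
    gene_set: set of genes in the pathway of interest (for example an EMT set).
    Returns the running-sum value with the largest absolute deviation from zero.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    hit_norm = sum(abs(scores[g]) ** weight for g, h in zip(ranked_genes, hits) if h)
    if hit_norm == 0:
        raise ValueError("all genes in the set have a zero ranking metric")
    running = 0.0
    best = 0.0
    for gene, is_hit in zip(ranked_genes, hits):
        if is_hit:
            running += abs(scores[gene]) ** weight / hit_norm  # step up at a set member
        else:
            running -= 1.0 / n_miss  # step down at a non-member
        if abs(running) > abs(best):
            best = running
    return best
```

A positive score indicates that the gene set is concentrated at the top of the ranked list, and a negative score indicates concentration at the bottom.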

LINC00152 is not involved in sponging of miRNAs

Several previous studies have suggested that LINC00152 acts as a microRNA sponge by titrating different microRNAs (miR-376c-3p, miR-4775, miR-4767, miR-138-5p, miR-103 and miR-205) in different types of tumors, including GBMs [5,6,17-20]. However, suggestions that a lncRNA acts as a microRNA sponge are sometimes questioned because the abundance of the lncRNA is often far lower than that of the targets of the microRNAs and of the microRNAs themselves. If LINC00152 acts as a miRNA sponge in U87 cells, we would expect the targets of these microRNAs to be repressed upon knockdown of the lncRNA and the subsequent release of the microRNAs from interaction with the lncRNA. However, we find a statistically significant up-regulation of the targets of these six microRNAs compared with non-targets when LINC00152 is knocked down, ruling out the possibility that LINC00152 acts as a ceRNA for these miRNAs (Fig 7C).
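
The target versus non-target comparison above can be sketched as a rank-based test on the fold changes of the two gene groups. The text does not state which test was used for Fig 7C; a one-sided Mann-Whitney U test with a normal approximation (and no tie correction) is assumed here purely for illustration, and the function name is hypothetical.

```python
import math


def rank_sum_test(targets, non_targets):
    """One-sided Mann-Whitney U test for targets > non_targets.

    targets, non_targets: lists of log2 fold changes after knockdown.
    Returns (U statistic for targets, approximate one-sided p-value).
    Uses midranks for ties and a normal approximation without tie correction.
    """
    pooled = sorted([(v, 0) for v in targets] + [(v, 1) for v in non_targets])
    rank_sum_targets = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        n_targets_here = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_targets += midrank * n_targets_here
        i = j
    n1, n2 = len(targets), len(non_targets)
    u = rank_sum_targets - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return u, p_value
```

Under the sponge hypothesis, miRNA targets should shift downward after knockdown; a small p-value for the opposite direction, as tested here, argues against that model.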

Secondary structure components of LINC00152

Over the past decade several new technologies have been developed to examine the secondary structures of lncRNAs on a global basis. One such technique is PARIS (psoralen analysis of RNA interactions and structures) [27]. PARIS is based on reversibly crosslinking RNA duplexes (stems of stem-loops) followed by gentle digestion with a single-strand RNase, S1 nuclease, to cut the looped single-stranded portions of an RNA’s secondary structure. The surviving RNA duplexes from the stems are then ligated to each other and subjected to high-throughput sequencing. RNAs containing stem-loops will yield sequencing reads that correspond to the stems, with gaps (corresponding to the loops) that do not overlap with a splice site. We analyzed publicly available PARIS data from HeLa cells to determine whether LINC00152 contains any secondary structure elements that could be detected by PARIS. Following alignment, we identified reads with a 2-nt gap in the PARIS libraries (Sup Fig 6C). These reads span positions 285 to 373 of the 496-nt LINC00152, with a small 2-base gap starting at position 342 (Fig. 8A). Sequence analysis of this region revealed some complementarity, suggesting that this region might in fact form a stem-loop structure (Fig 8A).
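
As a hedged illustration of how gapped PARIS reads might be located after alignment, the sketch below parses a SAM-style CIGAR string and reports skipped reference regions. It assumes that the aligner encodes the gap as an `N` (reference skip) operation; the example read `57M2N30M` starting at position 285 is hypothetical and was chosen only to reproduce the coordinates quoted in the text.

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM specification.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_gapped_alignment(start, cigar):
    """Locate skipped regions in one alignment.

    start: 1-based leftmost reference position of the alignment.
    cigar: CIGAR string, for example "57M2N30M".
    Returns (gaps, end) where gaps is a list of (gap_start, gap_length)
    and end is the last reference position covered by the alignment.
    """
    position = start
    gaps = []
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            gaps.append((position, length))
        if op in "MDN=X":  # only these operations consume reference bases
            position += length
    return gaps, position - 1


def is_structure_gap(gap, splice_sites):
    """True if the gap does not contain any annotated splice site position."""
    gap_start, gap_length = gap
    return not any(gap_start <= s < gap_start + gap_length for s in splice_sites)
```

For the hypothetical read above, `parse_gapped_alignment(285, "57M2N30M")` reports a 2-nt gap starting at position 342 and an alignment ending at position 373.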

To get a better understanding of the overall LINC00152 secondary structure, we used the publicly available RNA secondary structure prediction tool mfold to identify structure predictions for LINC00152 that are consistent with a stem-loop being present from position 285 to 373 [25]. The two secondary structures with the lowest free energy differed in their exact base-pairing, but the overall stem-loop structure was largely the same. Importantly, both structures were consistent with a stem-loop being present from position 285 to 373 (Fig 8A and Sup Fig 6A and B). Furthermore, the loop resulting from stem formation is rather small (4 nt), which is consistent with the small 2-nt gap seen by PARIS.
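
For readers unfamiliar with structure prediction, the sketch below shows the general dynamic-programming idea using the classic Nussinov base-pair maximization. This is a deliberately simplified stand-in and not the algorithm used by mfold, which minimizes free energy with nearest-neighbor thermodynamic parameters; the function name and the `min_loop` default are assumptions made for the example.

```python
def nussinov_max_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in an RNA sequence (Nussinov recursion).

    seq: RNA sequence over A, C, G, U.
    min_loop: minimum number of unpaired bases enclosed by any pair.
    Allows Watson-Crick and G-U wobble pairs.
    """
    allowed = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: base j is unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with base k
                if (seq[k], seq[j]) in allowed:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]
```

On a toy hairpin such as `GGGAAAACCC` the recursion recovers the three G-C pairs of the stem around a 4-nt loop.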

We next asked if we could use a separate method to independently validate the hairpin formation in LINC00152. Ribo-seq (ribosome profiling) is a technique that has been used to identify RNAs that interact with the ribosome and to determine how the ribosome is distributed across those RNAs [28]. This information has also been used to ascertain that some lncRNAs are associated with ribosomes, but not with translating ribosomes [29]. Recently it was determined that the polysomes isolated for Ribo-seq are contaminated with other ribonucleoprotein (RBP) complexes. As a result, RNA footprints from RBPs that are not ribosomal proteins can be detected in Ribo-seq data [30]. To determine if we could identify RBP-RNA footprints from LINC00152, we analyzed publicly available Ribo-seq data from normal brain samples [31]. In two out of the three normal brain Ribo-seq samples we detected an RBP footprint at positions 303-330 of LINC00152. In addition, in one of the samples there was an RBP footprint at positions 354-382 (Fig 8A and Sup Fig 6D). These two footprinted areas are located on opposite strands of the same stem-loop that was detected by PARIS, providing additional evidence of the existence of this stem-loop and suggesting that this stem is bound by a protein in an RBP (Fig 8A).

LINC00152 stem-loop, M8, is sufficient to promote cell invasion and proliferation of cells in suspension.

In order to determine whether this newly identified, potentially protein-bound stem-loop plays a role in LINC00152 function, we created a series of LINC00152 deletion mutants (Fig 8B). The sites of the deletions were chosen based on PARIS and Ribo-seq analysis as well as two in silico predicted structures of LINC00152 (Fig 8A and Sup Fig 6C and D). We assessed whether independent overexpression of the mutants was able to stimulate U87 cell invasion. Overexpression of M2 (which removed the minimal protein-bound stem-loop, nucleotides 280-401) or M3 (which removed the stem-loop and the remaining 3’ end) significantly decreased cell invasion compared to full-length LINC00152 (p < 0.05) (Fig 8D). On the other hand, the mutant M4 (which removed the 3’ end but preserved the stem-loop) or M7 (which removed the extreme 3’ end and also preserved the stem-loop) increased U87 cell invasion. Other deletion mutants that removed regions of LINC00152 5’ to the stem-loop (M5 or M6) stimulated cellular invasion to a similar extent as full-length LINC00152. Finally, overexpression of M8, containing only the protein-bound stem-loop (nucleotides 280-401), was sufficient to stimulate invasion of U87 cells (Fig 8D). These results suggest that the M8 stem-loop is necessary and sufficient for stimulation of cell invasion.

Consistent with this conclusion, overexpression of the stem-loop also induced the six genes involved in EMT to the same extent as the full-length LINC00152 (Sup Table 2). In addition, knockdown of LINC00152 by si00152_II (an siRNA that targets a region of LINC00152 outside of M8) decreased cell invasion, while the siRNA-resistant M8 was sufficient to rescue cell invasion (Fig 8E).

In addition to inducing invasion, LINC00152 is capable of inducing proliferation of cells seeded in suspension (Fig 5 and Fig 6). Thus, we hypothesized that the newly identified M8 stem-loop is also able to recapitulate this phenotype. We found that overexpression of M8 induced proliferation of U87 cells in suspension to the same extent as full-length LINC00152 (Fig 6B-F). This result indicates that the M8 stem-loop is sufficient to promote anchorage-independent proliferation.

The M8 stem-loop structure is important for stimulating invasion.

We next used point mutations to test whether the ability to stimulate invasion of U87 cells depends on the LINC00152 stem-loop structure. Two mutants on opposite sides of the stem disrupt the stem-loop (mutA changes bases 333-336 and mutB changes bases 349-352) (Fig 9D). Neither mutA nor mutB stimulated the invasion of U87 cells as well as full-length LINC00152 or M8 (Fig 9A). In contrast, when the two mutations were combined in mutAB, the stem-loop structure was reconstituted and this promoted invasion to the same extent as full-length LINC00152 or M8 (Fig 9A). Therefore, we can conclude that the stem-loop structure itself is essential for LINC00152 to stimulate cellular invasion.
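
The logic of the compensatory-mutation experiment can be illustrated with a toy stem. The sequences below are invented for the example and are not the LINC00152 sequence; the sketch also assumes that each mutant swaps the four bases for their Watson-Crick complements, which the text does not specify.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def complement_bases(seq, start, end):
    """Replace seq[start:end] (0-based, end-exclusive) with complementary bases."""
    mutated = "".join(COMPLEMENT[b] for b in seq[start:end])
    return seq[:start] + mutated + seq[end:]


def paired_positions(arm5, arm3):
    """Count Watson-Crick pairs between two stem arms of equal length.

    Both arms are written 5' to 3'; base i of arm5 faces base len-1-i of arm3.
    """
    return sum(COMPLEMENT[a] == b for a, b in zip(arm5, reversed(arm3)))


# Hypothetical 8-bp stem: arm3 is the reverse complement of arm5.
arm5_wt = "GGACUCGA"
arm3_wt = "UCGAGUCC"

arm5_mutA = complement_bases(arm5_wt, 2, 6)  # disrupts four pairs from the 5' arm
arm3_mutB = complement_bases(arm3_wt, 2, 6)  # disrupts the same four pairs from the 3' arm

wild_type = paired_positions(arm5_wt, arm3_wt)
mut_a = paired_positions(arm5_mutA, arm3_wt)
mut_b = paired_positions(arm5_wt, arm3_mutB)
mut_ab = paired_positions(arm5_mutA, arm3_mutB)  # both changes together restore pairing
```

In this toy case each single mutant loses four of the eight base pairs, while the double mutant regains all eight with a different sequence, which is the pattern that separates a structural requirement from a sequence requirement.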

MIR4435-2HG, a homolog of LINC00152

As previously reported [32], LINC00152 is a close homolog of another lncRNA, MIR4435-2HG. The two have nearly identical sequences (with only 6 base mismatches) and both contain the M8 sequence (Fig 10A). LINC00152 and MIR4435-2HG are both transcribed from chromosome 2: LINC00152 is located at chr2:87455476-87606739 and MIR4435-2HG is transcribed from chr2:111196350-111495115. In order to estimate the expression level of these two RNAs, we considered RNA-seq reads that mapped uniquely and without any mismatch to either LINC00152 or MIR4435-2HG. This analysis showed that LINC00152 and MIR4435-2HG are expressed at the same level in U87 cells, and that both RNAs are knocked down to a similar extent upon treatment with si00152 (Fig 10B and C). Thus, the phenotype that we observe with siRNA directed towards LINC00152 is also likely due in part to knockdown of the highly similar MIR4435-2HG. However, we see an increase in cell invasion when we exogenously express LINC00152, indicating that LINC00152 by itself can promote cell invasion. Again, upon considering uniquely mapped reads in TCGA RNA-seq data, we found that LINC00152 and MIR4435-2HG are equally expressed in LGG and GBM (Fig 10D). Analysis of TCGA RNA-seq data also revealed a positive correlation between the expression of the two RNAs in GBM and LGG (Fig 10E and G), suggesting that these two RNAs may be co-regulated. Moreover, Kaplan-Meier survival analysis showed that expression of either RNA is associated with poor patient survival (Fig 10G and H).
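
The idea of counting only reads that map uniquely and without mismatch to one of two near-identical transcripts can be sketched as below. This is a simplified illustration rather than the pipeline used in the study: it uses exact substring matching on a single strand instead of a real aligner, and the function name and toy sequences are hypothetical.

```python
def count_unique_exact(reads, transcripts):
    """Count reads that match exactly one transcript with zero mismatches.

    reads: iterable of read sequences.
    transcripts: dict mapping transcript name -> transcript sequence.
    Reads that match several transcripts, or none, are ignored.
    """
    counts = {name: 0 for name in transcripts}
    for read in reads:
        matches = [name for name, seq in transcripts.items() if read in seq]
        if len(matches) == 1:
            counts[matches[0]] += 1
    return counts
```

Only reads that overlap one of the few positions where the homologs differ are informative, so the usable read count is much smaller than the total coverage of either transcript.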

Discussion

The human genome was once thought to be mainly dormant, with most transcription devoted to producing protein-coding genes. We now know that the genome is transcriptionally vibrant and that only a small fraction of the expressed genome, roughly 2%, encodes proteins. GWAS and high-throughput sequencing studies have found that many of the genomic lesions and expression alterations seen in cancer and other pathologies fall within non-protein-coding regions of the genome and may lead to dysregulation of ncRNAs [33-35]. Furthermore, there is a growing body of evidence implicating lncRNAs in playing a direct role in normal cellular physiology, as well as in driving pathogenesis in a variety of disorders, including cancer [35–38]. Indeed, recent work has illustrated the critical role that lncRNAs play in cancer, including iconic examples such as HOTAIR in breast cancer, HULC in hepatocellular carcinoma and DRAIC in prostate cancer [11,14,39].

In this study we have shown that LINC00152 is a lncRNA that is upregulated in many different cancer types and is highly upregulated in GBMs. Although LINC00152 is not upregulated in all LGGs relative to normal brain tissue, it is upregulated in the highly malignant IDHwt LGG subtype, further supporting the association of LINC00152 with aggressive tumors. This raised the interesting possibility that in tumors where LINC00152 is not differentially over-expressed or is moderately upregulated in the tumor population relative to normal tissue, LINC00152 could still be highly upregulated in a more aggressive subgroup of the tumors. This was indeed found to be true in pancreatic adenocarcinomas and acute myeloid leukemias. LINC00152 expression is associated with patient survival in nine different cancer types, including GBMs and LGGs. LINC00152 expression promotes cell invasion, which is consistent with its association with poor patient outcomes.

Previous studies have shown that LINC00152 is an oncogenic lncRNA involved in regulating invasion in different types of tumors [5,6,18-20], including gliomas [5,6]. In an in vivo tumor xenograft study, Yu and collaborators [5] reported that downregulation of LINC00152 produced smaller tumors and increased survival rates compared to control. Consistent with these findings, our results demonstrated that LINC00152 knockdown impaired the ability of U87 cells to proliferate under anchorage-independent conditions. Reciprocally, overexpression of LINC00152 induced proliferation of U87 cells in suspension. Anchorage-independent growth is an important feature of tumor cells that has been linked to metastatic potential [40], leading to cell cycle defects and anoikis [41]. Moreover, knockdown and overexpression experiments showed that LINC00152 controls invasion of GBM cells. Thus, our findings reinforce the idea that LINC00152 is an oncogenic lncRNA that is associated with aggressive tumors by promoting cell invasion.

Moreover, through analysis of global RNA structure mapping and RNA-protein interaction data, we identified a protein-bound stem-loop in the 3’ region of LINC00152. The structure-function analysis demonstrated that this stem-loop is necessary and sufficient for stimulating invasion of U87 cells, that it can rescue the loss of invasion seen after knockdown of LINC00152, and that base-pairing between the opposite strands of the stem-loop, rather than the sequence at the mutated sites, is what matters most for stimulating invasion of U87 cells.
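The logic of the compensatory-mutation argument can be illustrated with a short Python sketch. It is a toy model only: the arm sequences are hypothetical and are not the LINC00152 M8 hairpin, the function name is invented for the example, and pairing is scored simply as Watson-Crick or G-U wobble matches between two equal-length arms. It shows how a mutation in one arm disrupts the stem and how a second mutation in the opposite arm restores pairing while leaving the sequence altered.

```python
# Toy illustration only: the arm sequences below are hypothetical and are NOT
# the LINC00152 M8 hairpin sequence.

# Watson-Crick pairs plus the G-U wobble pair commonly allowed in RNA stems.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def paired_positions(five_prime_arm: str, three_prime_arm: str) -> int:
    """Count base pairs when the 3' arm is folded back onto the 5' arm.

    Both arms are written 5'->3', so the first base of the 5' arm faces
    the last base of the 3' arm, and so on.
    """
    if len(five_prime_arm) != len(three_prime_arm):
        raise ValueError("arms must have equal length in this toy model")
    facing = reversed(three_prime_arm)
    return sum((a, b) in PAIRS for a, b in zip(five_prime_arm, facing))


wild_type = ("GGCAUC", "GAUGCC")    # fully paired stem (hypothetical)
disrupted = ("GGGUUC", "GAUGCC")    # 5' arm mutated at two sites, pairing lost
compensated = ("GGGUUC", "GAACCC")  # 3' arm mutated to restore pairing

for name, arms in [("wild type", wild_type),
                   ("disrupted", disrupted),
                   ("compensated", compensated)]:
    print(name, paired_positions(*arms), "of", len(arms[0]), "positions paired")
```

In this toy case the compensated stem is fully paired again even though neither arm has the wild-type sequence, which is the pattern expected if structure rather than sequence drives the phenotype.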

GSEA of RNA-seq data from LINC00152 knockdown cells also supports the idea that LINC00152 is involved in promoting invasion. However, despite previous suggestions, analysis of the RNA-seq data showed that LINC00152 is not acting as a ceRNA that sponges miRNAs. More specifically, our results suggest that LINC00152 promotes invasion of GBM cells by regulating the expression of IGFBP4, LUM and TGM2, since knockdown of these genes prevents the invasive phenotype produced by LINC00152 overexpression.

IGFBP4 was previously reported to be a tumor suppressor in many types of cancer, including lung [42,43], breast [44,45], thyroid [46] and prostate [47]. However, in kidney [48] and brain [49] tumors it has been reported as oncogenic. In glioma patient samples, IGFBP4 was upregulated compared to normal brain tissue [50]. Moreover, IGFBP4 overexpression induced cell proliferation, migration and invasion, while knockdown reversed these phenotypes [50], consistent with our idea of IGFBP4 controlling invasion in GBM cells. The same report showed that IGFBP4 induced expression of VIM (Vimentin), CDH2 (N-cadherin), SNAI1 (Snail), SNAI2 (Slug) and TWIST1 (Twist), and decreased CDH1 (E-cadherin) expression, by more than 2-fold compared to vector transfection. However, IGFBP4 knockdown did not reverse the expression of these genes at the same cut-off [50]. Consistently, analysis of the siLINC00152 RNA-seq data revealed that none of these mRNAs were differentially expressed more than 2-fold compared to control.

LUM is another gene whose role in tumors is controversial. It is downregulated in breast [51] and prostate cancer [52,53]. Additionally, Troup et al. and Panis et al. reinforced the idea that LUM is a tumor suppressor gene, as it can predict good prognosis in breast and prostate tumors, respectively [54,55]. However, in accordance with our results, most reports propose that lumican is oncogenic in different tumors, since it is upregulated in breast cancer [56-58], colorectal cancer [59-61], pancreatic cancer [62-66], cervical cancer [67], lung cancer [68,69] and prostate cancer [70]. Moreover, LUM has been reported as a poor prognostic marker in breast [56-58] and colorectal tumors [60,61]. Interestingly, LUM overexpression induced cell migration and invasion in colon and prostate cancer [70,71]. In gliomas, LUM has been identified as upregulated in cancer stem cells (CSCs), a subpopulation often reported as resistant to conventional therapies and as a cause of de novo tumor formation [72].


TGM2 has been reported as induced in tumors [73-77]. TGM2 upregulation is associated with invasion in non-small-cell lung cancer cells through upregulation of MMP-9 expression [78]. In breast cancer, TGM2 overexpression induces EMT by upregulation of mesenchymal mRNAs, such as CDH2 (N-cadherin), FN1 (Fibronectin) and VIM (Vimentin), and by downregulation of the epithelial markers CDH1 (E-cadherin), KRT19 (Keratin 19), KRT14 (Keratin 14), KRT17 (Keratin 17) and OCLN (Occludin) [79]. However, none of these EMT genes were differentially expressed more than 2-fold in our RNA-seq data upon LINC00152 knockdown. TGM2 has also been reported as upregulated in cervical cancer, and its high expression is correlated with poor patient outcomes in oral and cervical carcinomas [80]. Moreover, in cervical tumors, TGM2 knockdown suppresses migration and invasion, while TGM2 overexpression increases cell motility and invasion. In addition, TGM2 was shown to interact with integrin-α5β1 in co-IP and microscopy experiments [80]. In meningiomas, TGM2 works as an oncogene. Huang et al. observed that TGM2 is highly expressed in clinically aggressive meningiomas and that this upregulation was associated with higher tumor grade and recurrence. Further, knockdown of TGM2 induced meningioma cell death [81]. These studies corroborate our findings that, after LINC00152 overexpression, TGM2 is upregulated and works as an oncogene.

LINC00152 is unusual among lncRNAs in being conserved between mice and humans. Mouse has a single gene, MIR4435-2HG (Gm14005 or MORRBID), whereas humans have two closely related genes, LINC00152 and MIR4435-2HG. Mouse MIR4435-2HG has been proposed to be a pro-survival lncRNA that represses a gene in cis, the proapoptotic gene BCL2L11 (BIM), by recruiting the polycomb repressive complex PRC2 to the BCL2L11 promoter [82]. We considered the possibility that LINC00152, although cytoplasmic, acts as a pro-oncogenic RNA by similarly suppressing BCL2L11. First, siLINC00152 increases BCL2L11 RNA (Sup Fig 7A-C), but this is not expected to decrease cell invasion. Second, overexpression of LINC00152 from a heterologous site, or of the M8 hairpin of the lncRNA, did not decrease BCL2L11 (Sup Fig 7D-F) and yet increased cell invasion. Third, analyzing previously published mouse PAR-CLIP data, we determined that EZH2 from the polycomb complex associates with an intronic region of mouse MIR4435-2HG/MORRBID that is not near the M8 region. Fourth, mouse MIR4435-2HG encodes only the first third of the M8 hairpin that we have found to be functional in human LINC00152 or human MIR4435-2HG. Finally, LINC00152 is predominantly cytoplasmic, arguing against a role in recruiting factors to the genome. Collectively, these results suggest that interaction with PRC2 is not necessary for the stimulation of invasion seen upon overexpression of LINC00152 or the M8 hairpin RNA.

In conclusion, LINC00152/CYTOR and its homolog MIR4435-2HG function as oncogenic lncRNAs in GBMs through the action of a protein-bound stem-loop and potentially play a critical oncogenic role in a wide variety of cancer types. The results rule out a mechanism of action involving the sponging of miRNAs, as proposed in the literature, or interaction with the Polycomb complex, as proposed for the mouse MIR4435-2HG/MORRBID RNA. LINC00152 could also serve as a tumor biomarker or a target for future cancer therapeutics.

Acknowledgments

We thank Dutta lab members for advice and helpful discussions. This work was supported by NIH grants R01 CA166054 and AR067712 and by V Foundation award D2018-002 to AD. BTRK was partly supported by CAPES BEX 0320-13-7. MK was partly supported by DOD award PC151085.

Materials and Methods

Cell culture, knockdown and overexpression of LINC00152

U87 cells were maintained in MEM supplemented with 1% non-essential amino acids solution (cat # 11140-050, Gibco), 1 mM sodium pyruvate (cat # 11360070, Gibco), 0.15% sodium bicarbonate (cat # 25080094, Gibco), 10% FBS and 1% P/S.

For knockdown, U87 cells were subjected to two rounds of transfection. First, cells were reverse transfected with 40 nM of si00152_II (5’-UGACACACUUGAUCGAAUA-3’), si00152_III (5’-CCGGAAUGCAGCUGAAAGA-3’) or a nonspecific siGL2 control siRNA (5’-CGUACGCGGAAUACUUCGA-3’) and 9 µL of Lipofectamine RNAiMAX transfection reagent (Thermo Fisher). A second round of transfection was performed 24 hours later using the same quantities of reagents. Cells were harvested 24 hours after the final transfection and used for subsequent analysis.

For overexpression, 500 ng of pCDNA3-flag vector encoding LINC00152 or a LINC00152 mutant was transfected into U87 cells using 2 µL of Lipofectamine 2000 (Thermo Fisher). Cells were harvested after 48 hours for downstream analysis.

RNA isolation, cDNA synthesis and qPCR

Total RNA and nuclear/cytoplasmic RNAs were extracted using TRIzol total RNA isolation reagent (Thermo Fisher) and the Protein and RNA Isolation System (Thermo Fisher), respectively. RNA samples were treated with RQ1 RNase-Free DNase (Promega) according to the manufacturer’s instructions. cDNA was produced from 1 µg of RNA using the Superscript III kit (Thermo Fisher) according to the manufacturer’s instructions.

Poly(HEMA)-coated plate preparation

Poly(HEMA) experiments were conducted as previously described [83]. Briefly, Poly(HEMA) reagent [Poly(2-hydroxyethyl methacrylate)] (Sigma-Aldrich, cat # P3932-10G) was dissolved at 5 mg/mL in 95% ethanol at 37 ºC with rotation overnight. 200 µL of the solution was pipetted into each well of 24-well plates and the plates were dried for 2 h at room temperature inside a tissue culture hood.

MTT and matrigel invasion assays

To measure cell growth, 1,000 cells were plated in quadruplicate in 96-well plates and cell growth was measured using standard MTT reagent (Promega). To measure invasion, 2 × 10⁵ U87 cells in serum-free media were seeded into 24-well Matrigel Invasion Chambers (BD Biosciences) and the bottom chamber was filled with media containing 10% FBS as the chemoattractant. Cells were allowed to invade for 8 hours and were then fixed and stained with crystal violet/methanol, and invaded cells were counted. For Poly(HEMA) experiments, 24-well plates were used, adjusting the number of cells and MTT volumes accordingly.

Expression of LINC00152 in TCGA datasets and survival analysis

The expression of LINC00152 in GBMs and LGGs, compared to normal brain and across tumor subtypes, was analyzed as previously described [20]. Expression of LINC00152 in all other TCGA tumors was determined by comparing expression data of only those tumors that had a matched normal tissue sample. Statistical significance was determined using a paired t-test. TCGA patient survival data for GBMs and LGGs were retrieved from cBioPortal (www.cbioportal.org) and survival data for the remaining tumor types were retrieved from OncoLnc (www.oncolnc.org) in December 2016 [84-86]. The expression thresholds used to separate patients are outlined in the main text. Kaplan-Meier plots and p-values were generated using the ‘survminer’ package for R.
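The survival comparison described above can be sketched as follows. This is a minimal, standard-library Python illustration of the Kaplan-Meier product-limit estimator, not the ‘survminer’ R code used to generate the figures. The function names, the toy follow-up times and the median split are assumptions made for the example; the thresholds actually used are those given in the main text.

```python
from statistics import median


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time with >= 1 death.

    times  : follow-up time per patient (e.g. days)
    events : 1 if death was observed at that time, 0 if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def split_by_median(expression):
    """Label each patient 'High' or 'Low' relative to the median expression.

    The median cut-off is an assumption made for this sketch.
    """
    cut = median(expression)
    return ["High" if x > cut else "Low" for x in expression]


# Toy data (hypothetical): five patients, two of them censored.
toy_times = [5, 8, 8, 12, 15]
toy_events = [1, 1, 0, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
toy_groups = split_by_median([1, 5, 3, 9])
```

In practice one curve would be computed per expression group and the groups compared with a log-rank test, which is not implemented in this sketch.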

RNA-seq analysis

U87 cells were treated with a combination of the two siRNAs described above, and total cell RNA was isolated using TRIzol and subsequently purified using the RNeasy Isolation kit (Qiagen). Sequencing libraries were generated using the NEBNext Ultra Directional RNA Library Prep Kit and samples were barcoded with NEBNext Multiplexing oligos per standard manufacturer protocols. Libraries were sequenced with 75 bp paired-end reads on a NextSeq 500 instrument in the Biomolecular Analysis Facility, University of Virginia School of Medicine. Sequencing reads were aligned to the hg38 reference genome using HISAT [87]. Gene abundances were estimated and differentially expressed genes were identified using HTSeq and DESeq2 [88,89]. An adjusted P-value (obtained by DESeq2) cut-off of 0.05 and a log2 fold change of 2 were used to define differentially expressed genes. GSEA was performed on a preranked gene list based on fold change (siLINC00152/siGL2) against 50 hallmark gene sets [26]. The raw and processed data were deposited in Gene Expression Omnibus (GEO) under accession number GSE111652.
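The filtering and ranking steps described above can be illustrated with a small Python sketch. It is not the analysis code used in this study: the gene names and values are made up, the function names are hypothetical, and the fold-change cut-off is assumed here to apply to the absolute log2 fold change.

```python
import math

# Hypothetical DESeq2-style rows: (gene, log2 fold change, adjusted P-value).
# Gene names and numbers are invented for illustration.
results = [
    ("GENE_A", -2.6, 0.001),
    ("GENE_B", 0.4, 0.300),
    ("GENE_C", 2.1, 0.020),
    ("GENE_D", -3.0, 0.200),
    ("GENE_E", 1.2, float("nan")),  # DESeq2 can report NA for padj
]

PADJ_CUTOFF = 0.05
LOG2FC_CUTOFF = 2.0  # assumed to apply to the absolute log2 fold change


def differentially_expressed(rows, padj_cutoff=PADJ_CUTOFF, lfc_cutoff=LOG2FC_CUTOFF):
    """Keep genes passing both the adjusted P-value and fold-change cut-offs."""
    return [
        gene
        for gene, lfc, padj in rows
        if not math.isnan(padj) and padj < padj_cutoff and abs(lfc) >= lfc_cutoff
    ]


def preranked(rows):
    """Order genes from most up- to most down-regulated for a preranked GSEA."""
    return [gene for gene, lfc, _ in sorted(rows, key=lambda r: r[1], reverse=True)]


de_genes = differentially_expressed(results)
ranked_genes = preranked(results)
```

Note that the preranked list keeps every gene regardless of significance, since GSEA uses the whole ranking, whereas the differential expression call applies both cut-offs.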

LINC00152 structure predictions

Secondary structure predictions of LINC00152 were generated using mfold [25]. The two structures with the lowest predicted free energies were selected for comparison with PARIS and Ribo-seq data. For PARIS data analysis of LINC00152, raw sequencing data from Lu et al. were aligned to the hg19 genome using STAR (Spliced Transcripts Alignment to a Reference) with the alignment parameters outlined in Lu et al. [27,90]. Aligned reads were then processed to identify gapped reads mapping to LINC00152 and visualized with IGV [91]. We used ribosome profiling data from Gonzalez et al. and aligned reads to the hg19 genome using HISAT2 [30]. We then used IGV [91] to examine the distribution along the message of reads that mapped to LINC00152, to ensure that they were not legitimate ribosome footprints. The predicted secondary structure elements and protein-bound region were then compared to the in silico secondary structure predictions.
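As an illustration of what identifying gapped reads involves, the sketch below parses the CIGAR string of an aligned read and returns the two reference intervals on either side of a skipped region (the N operation), which in PARIS data correspond to the two arms of an RNA duplex. This is a simplified standard-library Python example, not the pipeline of Lu et al., which includes further steps such as assembling reads into duplex groups. The function names and the coordinates are hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def gapped_arms(start, cigar, min_gap=1):
    """Return the two aligned arms [(start, end), (start, end)] of a gapped read.

    start : 1-based leftmost reference position of the alignment
    cigar : CIGAR string, e.g. "20M150N22M"
    Returns None unless the read has exactly one N (skipped-region) gap of
    at least min_gap bases.
    """
    blocks = []
    pos = start
    block_start = start
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "M=XD":      # consumes reference, stays in the same block
            pos += length
        elif op == "N":       # skipped region on the reference
            if length >= min_gap:
                blocks.append((block_start, pos - 1))
                block_start = pos + length
            pos += length
        # I, S, H and P do not consume reference positions
    blocks.append((block_start, pos - 1))
    return blocks if len(blocks) == 2 else None


def within(arms, locus_start, locus_end):
    """True if both arms lie entirely inside the locus of interest."""
    return all(locus_start <= s and e <= locus_end for s, e in arms)


# Hypothetical read and locus coordinates, for illustration only.
arms = gapped_arms(100, "5S20M150N22M")
ungapped = gapped_arms(100, "42M")
```

Reads for which `gapped_arms` returns two intervals inside the transcript would then be the candidates for base-paired regions to compare against the predicted structures.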

References

1 - Ostrom QT, Gittleman H, Fulop J, Liu M, Blanda R, Kromer C, et al. CBTRUS Statistical Report: Primary Brain and Central Nervous System Tumors Diagnosed in the United States in 2008-2012. Neuro-Oncology. 2015;17(suppl 4):iv1-iv62.

2 - Huse JT, Holland EC. Targeting brain cancer: advances in the molecular pathology of malignant glioma and medulloblastoma. Nat Rev Cancer. 2010;10(5):319–31.

3 - Network CGAR. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature. 2008;455:1061–8.

4 - Brennan CW, Verhaak RGW, McKenna A, Campos B, Noushmehr H, Salama SR, et al. The Somatic Genomic Landscape of Glioblastoma. Cell. 2013;155(2):462–77.

5 - Yu M, Xue Y, Zheng J, Liu X, Yu H, Liu L, et al. Linc00152 promotes malignant progression of glioma stem cells by regulating miR-103a-3p/FEZF1/CDC25A pathway. Mol Cancer. 2017;16(1):110.


6 - Zhu Z, Dai J, Liao Y, Ma J, Zhou W. Knockdown of Long Noncoding RNA LINC0000125 Suppresses Cellular Proliferation and Invasion in Glioma Cells by Regulating MiR-4775. Oncol Res. 2017. doi: 10.3727/096504017X15016337254597.

7 - Stupp R, Hegi ME, Mason WP, van den Bent MJ, Taphoorn MJB, Janzer RC, et al. Effects of radiotherapy with concomitant and adjuvant temozolomide versus radiotherapy alone on survival in glioblastoma in a randomised phase III study: 5-year analysis of the EORTC-NCIC trial. Lancet Oncol. 2009;10(5):459–66.

8 - Consortium TEP. An Integrated Encyclopedia of DNA Elements in the Human Genome. Nature. 2012;489(7414):57–74.

9 - Kretz M, Siprashvili Z, Chu C, Webster DE, Zehnder A, Qu K, et al. Control of somatic tissue differentiation by the long non-coding RNA TINCR. Nature. 2013;493(7431):231–5.

10 - Gupta RA, Shah N, Wang KC, Kim J, Horlings HM, Wong DJ, et al. Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature. 2010;464(7291):1071–6.

11 - Prensner JR, Iyer MK, Sahu A, Asangani IA, Cao Q, Patel L, et al. The long noncoding RNA SChLAP1 promotes aggressive prostate cancer and antagonizes the SWI/SNF complex. Nat Genet. 2013;45(11):1392–8.

12 - Gutschner T, Hämmerle M, Eißmann M, Hsu J, Kim Y, Hung G, et al. The non-coding RNA MALAT1 is a critical regulator of the metastasis phenotype of lung cancer cells. Cancer Res. 2013 Feb 1;73(3):1180–9.

13 - Sakurai K, Reon BJ, Anaya J, Dutta A. The lncRNA DRAIC/PCAT29 Locus Constitutes a Tumor-Suppressive Nexus. Mol Cancer Res. 2015;13(5):828–38.

14 - Neumann O, Kesselmeier M, Geffers R, Pellegrino R, Radlwimmer B, Hoffmann K, et al. Methylome analysis and integrative profiling of human HCCs identify novel protumorigenic factors. Hepatology. 2012;56(5):1817–27.

15 - Cao WJ, Wu HL, He BS, Zhang YS, Zhang ZY. Analysis of long non-coding RNA expression profiles in gastric cancer. World J Gastroenterol. 2013;19(23):3658–64.

16 - Yang S, Ning Q, Zhang G, Sun H, Wang Z, Li Y. Construction of differential mRNA-lncRNA crosstalk networks based on ceRNA hypothesis uncover key roles of lncRNAs implicated in esophageal squamous cell carcinoma. Oncotarget. 2016;7(52):85728-85740.

17 - Zhang YH, Fu J, Zhang ZJ, Ge CC, Yi Y. LncRNA-LINC00152 down-regulated by miR-376c-3p restricts viability and promotes apoptosis of colorectal cancer cells. Am J Transl Res. 2016;8(12):5286-5297.

18 - Teng W, Qiu C, He Z, Wang G, Xue Y, Hui X. Linc00152 suppresses apoptosis and promotes migration by sponging miR-4767 in vascular endothelial cells. Oncotarget. 2017;8(49):85014-85023.

19 - Cai Q, Wang Z, Wang S, Weng M, Zhou D, Li C, et al. Long non-coding RNA LINC00152 promotes gallbladder cancer metastasis and epithelial-mesenchymal transition by regulating HIF-1α via miR-138. Open Biol. 2017;7(1). pii: 160247.

20 - Wang Y, Liu J, Bai H, Dang Y, Lv P, Wu S. Long intergenic non-coding RNA 00152 promotes renal cell carcinoma progression by epigenetically suppressing P16 and negatively regulates miR-205. Am J Cancer Res. 2017;7(2):312-322.

21 - Reon BJ, Anaya J, Zhang Y, Mandell J, Purow B, Abounader R, et al. Expression of lncRNAs in Low-Grade Gliomas and Glioblastoma Multiforme: An In Silico Analysis. PLOS Med. 2016;13(12):e1002192.

22 - Network CGAR. Comprehensive, Integrative Genomic Analysis of Diffuse Lower-Grade Gliomas. N Engl J Med. 2015;372(26):2481–98.

23 - Ji J, Tang J, Deng L, Xie Y, Jiang R, Li G, et al. LINC00152 promotes proliferation in hepatocellular carcinoma by targeting EpCAM via the mTOR signaling pathway. Oncotarget. 2015;6(40):42813–24.

24 - Zhao J, Liu Y, Zhang W, Zhou Z, Wu J, Cui P, et al. Long non-coding RNA Linc00152 is involved in cell cycle arrest, apoptosis, epithelial to mesenchymal transition, cell migration and invasion in gastric cancer. Cell Cycle. 2015;14(19):3112–23.


25 - Zuker M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 2003;31(13):3406-15.

26 - Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci. 2005;102(43):15545–50.

27 - Lu Z, Zhang QC, Lee B, Flynn RA, Smith MA, Robinson JT, et al. RNA Duplex Map in Living Cells Reveals Higher-Order Transcriptome Structure. Cell. 2016;165(5):1267–79.

28 - Ingolia NT, Ghaemmaghami S, Newman JRS, Weissman JS. Genome-Wide Analysis in Vivo of Translation with Nucleotide Resolution Using Ribosome Profiling. Science. 2009;324(5924):218–23.

29 - Guttman M, Russell P, Ingolia NT, Weissman JS, Lander ES. Ribosome Profiling Provides Evidence that Large Noncoding RNAs Do Not Encode Proteins. Cell. 2013;154(1):240–51.

30 - Ji Z, Song R, Huang H, Regev A, Struhl K. Transcriptome-scale RNase-footprinting of RNA-protein complexes. Nat Biotech. 2016;34(4):410–3.

31 - Gonzalez C, Sims JS, Hornstein N, Mela A, Garcia F, Lei L, et al. Ribosome Profiling Reveals a Cell-Type-Specific Translational Landscape in Brain Tumors. J Neurosci. 2014;34(33):10924–36.

32 - Nötzold L, Frank L, Gandhi M, Polycarpou-Schwarz M, Groß M, Gunkel M, Beil N, Erfle H, Harder N, Rohr K, Trendel J, Krijgsveld J, Longerich T, Schirmacher P, Boutros M, Erhardt S, Diederichs S. The long non-coding RNA LINC00152 is essential for cell cycle progression through mitosis in HeLa cells. Sci Rep. 2017;7(1):2265.

33 - Mirza AH, Kaur S, Brorsson CA, Pociot F. Effects of GWAS-Associated Genetic Variants on lncRNAs within IBD and T1D Candidate Loci. PLoS One. 2014;9(8):e105723.

34 - Huarte M. The emerging role of lncRNAs in cancer. Nat Med. 2015;21(11):1253–61.


35 - Iyer MK, Niknafs YS, Malik R, Singhal U, Sahu A, Hosono Y, et al. The landscape of long noncoding RNAs in the human transcriptome. Nat Genet. 2015;47(3):199–208.

36 - Grote P, Wittler L, Währisch S, Hendrix D, Beisaw A, Macura K, et al. The tissue-specific lncRNA Fendrr is an essential regulator of heart and body wall development in the mouse. Dev Cell. 2013;24(2):206–14.

37 - Flockhart RJ, Webster DE, Qu K, Mascarenhas N, Kovalski J, Kretz M, et al. BRAFV600E remodels the melanocyte transcriptome and induces BANCR to regulate melanoma cell migration. Genome Res. 2012;22(6):1006–14.

38 - Van Grembergen O, Bizet M, de Bony EJ, Calonne E, Putmans P, Brohée S, et al. Portraying breast cancers with long noncoding RNAs. Sci Adv. 2016;2(9):e1600220.

39 - Panzitt K, Tschernatsch MMO, Guelly C, Moustafa T, Stradner M, Strohmaier HM, et al. Characterization of HULC, a Novel Gene With Striking Up-Regulation in Hepatocellular Carcinoma, as Noncoding RNA. Gastroenterology. 2007;132(1):330–42.

40 - Cifone MA, Fidler IJ. Correlation of patterns of anchorage-independent growth with in vivo behavior of cells from a murine fibrosarcoma. Proc Natl Acad Sci USA. 1980 Feb;77(2):1039-43.

41 - Mori S, Chang JT, Andrechek ER, Matsumura N, Baba T, Yao G, Kim JW, Gatza M, Murphy S, Nevins JR. Anchorage-independent cell growth signature identifies tumors with metastatic potential. Oncogene. 2009 Aug 6;28(31):2796-805.

42 - Wegmann BR, Schöneberger HJ, Kiefer PE, Jaques G, Brandscheid D, Havemann K. Molecular cloning of IGFBP-5 from SCLC cell lines and expression of IGFBP-4, IGFBP-5 and IGFBP-6 in lung cancer cell lines and primary tumours. Eur J Cancer. 1993;29A(11):1578-84.

43 - Pavelić J, Pavelić L, Karadza J, Krizanac S, Unesić J, Spaventi S, Pavelić K. Insulin-like growth factor family and combined antisense approach in therapy of lung carcinoma. Mol Med. 2002 Mar;8(3):149-57.

44 - Pekonen F, Nyman T, Ilvesmäki V, Partanen S. Insulin-like growth factor binding proteins in human breast cancer tissue. Cancer Res. 1992 Oct 1;52(19):5204-7.


45 - Mita K, Zhang Z, Ando Y, Toyama T, Hamaguchi M, Kobayashi S, Hayashi S, Fujii Y, Iwase H, Yamashita H. Prognostic significance of insulin-like growth factor binding protein (IGFBP)-4 and IGFBP-5 expression in breast cancer. Jpn J Clin Oncol. 2007 Aug;37(8):575-82.

46 - Bachrach LK, Nänto-Salonen K, Tapanainen P, Rosenfeld RG, Gargosky SE. Insulin-like growth factor binding protein production in human follicular thyroid carcinoma cells. Growth Regul. 1995 Jun;5(2):109-18.

47 - Damon SE, Maddison L, Ware JL, Plymate SR. Overexpression of an inhibitory insulin-like growth factor binding protein (IGFBP), IGFBP-4, delays onset of prostate tumor formation. Endocrinology. 1998 Aug;139(8):3456-64.

48 - Ueno K, Hirata H, Majid S, Tabatabai ZL, Hinoda Y, Dahiya R. IGFBP-4 activates the Wnt/beta-catenin signaling pathway and induces M-CAM expression in human renal cell carcinoma. Int J Cancer. 2011 Nov 15;129(10):2360-9. doi: 10.1002/ijc.25899.

49 - van den Boom J, Wolter M, Kuick R, Misek DE, Youkilis AS, Wechsler DS, Sommer C, Reifenberger G, Hanash SM. Characterization of gene expression profiles associated with glioma progression using oligonucleotide-based microarray analysis and real-time reverse transcription-polymerase chain reaction. Am J Pathol. 2003 Sep;163(3):1033-43.

50 - Praveen Kumar VR, Sehgal P, Thota B, Patil S, Santosh V, Kondaiah P. Insulin like growth factor binding protein 4 promotes GBM progression and regulates key factors involved in EMT and invasion. J Neurooncol. 2014 Feb;116(3):455-64. doi: 10.1007/s11060-013-1324-y. Epub 2014 Jan 7.

51 - Eshchenko TY, Rykova VI, Chernakov AE, Sidorov SV, Grigorieva EV. Expression of different proteoglycans in human breast tumors. Biochemistry (Mosc). 2007 Sep;72(9):1016-20.

52 - Holland JW, Meehan KL, Redmond SL, Dawkins HJ. Purification of the keratan sulfate proteoglycan expressed in prostatic secretory cells and its identification as lumican. Prostate. 2004 May 15;59(3):252-9.


53 - Suhovskih AV, Mostovich LA, Kunin IS, Boboev MM, Nepomnyashchikh GI, Aidagulova SV, Grigorieva EV. Proteoglycan expression in normal human prostate tissue and prostate cancer. ISRN Oncol. 2013 Apr 18; 2013:680136.

54 - Troup S, Njue C, Kliewer EV, Parisien M, Roskelley C, Chakravarti S, Roughley PJ, Murphy LC, Watson PH. Reduced expression of the small leucine-rich proteoglycans, lumican, and decorin is associated with poor outcome in node-negative invasive breast cancer. Clin Cancer Res. 2003 Jan;9(1):207-14.

55 - Panis C, Pizzatti L, Herrera AC, Cecchini R, Abdelhay E. Putative circulating markers of the early and advanced stages of breast cancer identified by high-resolution label-free proteomics. Cancer Lett. 2013 Mar 1;330(1):57-66.

56 - Leygue E, Snell L, Dotzlaw H, Hole K, Hiller-Hitchcock T, Roughley PJ, Watson PH, Murphy LC. Expression of lumican in human breast carcinoma. Cancer Res. 1998 Apr 1;58(7):1348-52. PubMed PMID: 9537227.

57 - Leygue E, Snell L, Dotzlaw H, Troup S, Hiller-Hitchcock T, Murphy LC, Roughley PJ, Watson PH. Lumican and decorin are differentially expressed in human breast carcinoma. J Pathol. 2000 Nov;192(3):313-20. PubMed PMID: 11054714.

58 - Somiari RI, Sullivan A, Russell S, Somiari S, Hu H, Jordan R, George A, Katenhusen R, Buchowiecka A, Arciero C, Brzeski H, Hooke J, Shriver C. High-throughput proteomic analysis of human infiltrating ductal carcinoma of the breast. Proteomics. 2003 Oct;3(10):1863-73.

59 - Lu YP, Ishiwata T, Kawahara K, Watanabe M, Naito Z, Moriyama Y, Sugisaki Y, Asano G. Expression of lumican in human colorectal cancer cells. Pathol Int. 2002 Aug;52(8):519-26.

60 - Seya T, Tanaka N, Shinji S, Yokoi K, Koizumi M, Teranishi N, Yamashita K, Tajiri T, Ishiwata T, Naito Z. Lumican expression in advanced colorectal cancer with nodal metastasis correlates with poor prognosis. Oncol Rep. 2006 Dec;16(6):1225-30.

61 - de Wit M, Belt EJ, Delis-van Diemen PM, Carvalho B, Coupé VM, Stockmann HB, Bril H, Beliën JA, Fijneman RJ, Meijer GA. Lumican and versican are associated with good outcome in stage II and III colon cancer. Ann Surg Oncol. 2013 Dec;20 Suppl 3:S348-59.

62 - Ping Lu Y, Ishiwata T, Asano G. Lumican expression in alpha cells of islets in pancreas and pancreatic cancer cells. J Pathol. 2002 Mar;196(3):324-30.

63 - Ishiwata T, Cho K, Kawahara K, Yamamoto T, Fujiwara Y, Uchida E, Tajiri T, Naito Z. Role of lumican in cancer cells and adjacent stromal tissues in human pancreatic cancer. Oncol Rep. 2007 Sep;18(3):537-43.

64 - Köninger J, Giese T, di Mola FF, Wente MN, Esposito I, Bachem MG, Giese NA, Büchler MW, Friess H. Pancreatic tumor cells influence the composition of the extracellular matrix. Biochem Biophys Res Commun. 2004 Sep 24;322(3):943-9.

65 - Yang ZX, Lu CY, Yang YL, Dou KF, Tao KS. Lumican expression in pancreatic ductal adenocarcinoma. Hepatogastroenterology. 2013 Mar-Apr;60(122):349-53.

66 - Pan S, Chen R, Stevens T, Bronner MP, May D, Tamura Y, McIntosh MW, Brentnall TA. Proteomics portrait of archival lesions of chronic pancreatitis. PLoS One. 2011;6(11):e27574.

67 - Naito Z, Ishiwata T, Kurban G, Teduka K, Kawamoto Y, Kawahara K, Sugisaki Y. Expression and accumulation of lumican protein in uterine cervical cancer cells at the periphery of cancer nests. Int J Oncol. 2002 May;20(5):943-8.

68 - Okano T, Kondo T, Kakisaka T, Fujii K, Yamada M, Kato H, Nishimura T, Gemma A, Kudoh S, Hirohashi S. Plasma proteomics of lung cancer by a linkage of multi-dimensional liquid chromatography and two-dimensional difference gel electrophoresis. Proteomics. 2006 Jul;6(13):3938-48.

69 - Matsuda Y, Yamamoto T, Kudo M, Kawahara K, Kawamoto M, Nakajima Y, Koizumi K, Nakazawa N, Ishiwata T, Naito Z. Expression and roles of lumican in lung adenocarcinoma and squamous cell carcinoma. Int J Oncol. 2008 Dec;33(6):1177-85.

70 - Coulson-Thomas VJ, Coulson-Thomas YM, Gesteira TF, Andrade de Paula CA, Carneiro CR, Ortiz V, Toma L, Kao WW, Nader HB. Lumican expression, localization and antitumor activity in prostate cancer. Exp Cell Res. 2013 Apr 15;319(7):967-81.

71 - Radwanska A, Litwin M, Nowak D, Baczynska D, Wegrowski Y, Maquart FX, Malicka-Blaszkiewicz M. Overexpression of lumican affects the migration of human colon cancer cells through up-regulation of gelsolin and filamentous actin reorganization. Exp Cell Res. 2012 Nov 1;318(18):2312-23.

72 - Farace C, Oliver JA, Melguizo C, Alvarez P, Bandiera P, Rama AR, Malaguarnera G, Ortiz R, Madeddu R, Prados J. Microenvironmental Modulation of Decorin and Lumican in Temozolomide-Resistant Glioblastoma and Neuroblastoma Cancer Stem-Like Cells. PLoS One. 2015 Jul 31;10(7):e0134111.

73 - Mehta K, Fok J, Miller FR, Koul D, Sahin AA. Prognostic significance of tissue transglutaminase in drug resistant and metastatic breast cancer. Clin Cancer Res. 2004 Dec 1;10(23):8068-76.

74 - Mangala LS, Fok JY, Zorrilla-Calancha IR, Verma A, Mehta K. Tissue transglutaminase expression promotes cell attachment, invasion and survival in breast cancer cells. Oncogene. 2007 Apr 12;26(17):2459-70.

75 - Satpathy M, Shao M, Emerson R, Donner DB, Matei D. Tissue transglutaminase regulates matrix metalloproteinase-2 in ovarian cancer by modulating cAMP-response element-binding protein activity. J Biol Chem. 2009 Jun 5;284(23):15390-9.

76 - Ai L, Kim WJ, Demircan B, Dyer LM, Bray KJ, Skehan RR, Massoll NA, Brown KD. The transglutaminase 2 gene (TGM2), a potential molecular marker for chemotherapeutic drug sensitivity, is epigenetically silenced in breast cancer. Carcinogenesis. 2008 Mar;29(3):510-8.

77 - Verma A, Mehta K. Tissue transglutaminase-mediated chemoresistance in cancer cells. Drug Resist Updat. 2007 Aug-Oct;10(4-5):144-51. Review.

78 - Kim HJ, Roh MS, Son CH, Kim AJ, Jee HJ, Song N, Kim M, Seo SY, Yoo YH, Yun J. Loss of Med1/TRAP220 promotes the invasion and metastasis of human non-small-cell lung cancer cells by modulating the expression of metastasis-related genes. Cancer Lett. 2012 Aug 28;321(2):195-202.

79 - Kumar A, Xu J, Brady S, Gao H, Yu D, Reuben J, Mehta K. Tissue transglutaminase promotes drug resistance and invasion by inducing mesenchymal transition in mammary epithelial cells. PLoS One. 2010 Oct 12;5(10):e13390.

80 - Caffarel MM, Chattopadhyay A, Araujo AM, Bauer J, Scarpini CG, Coleman N. Tissue transglutaminase mediates the pro-malignant effects of oncostatin M receptor over-expression in cervical squamous cell carcinoma. J Pathol. 2013 Oct;231(2):168-79.

81 - Huang YC, Wei KC, Chang CN, Chen PY, Hsu PW, Chen CP, Lu CS, Wang HL, Gutmann DH, Yeh TH. Transglutaminase 2 expression is increased as a function of malignancy grade and negatively regulates cell growth in meningioma. PLoS One. 2014 Sep 23;9(9):e108228.

82 - Kotzin JJ, Spencer SP, McCright SJ, Kumar DBU, Collet MA, Mowel WK, Elliott EN, Uyar A, Makiya MA, Dunagin MC, Harman CCD, Virtue AT, Zhu S, Bailis W, Stein J, Hughes C, Raj A, Wherry EJ, Goff LA, Klion AD, Rinn JL, Williams A, Flavell RA, Henao-Mejia J. The long non-coding RNA Morrbid regulates Bim and short-lived myeloid cell lifespan. Nature. 2016;537(7619):239-243.

83 - Fukazawa H, Mizuno S, Uehara Y. A microplate assay for quantitation of anchorage-independent growth of transformed cells. Anal Biochem. 1995 Jun 10;228(1):83-90.

84 - Cerami E, Gao J, Dogrusoz U, Gross BE, Sumer SO, Aksoy BA, et al. The cBio Cancer Genomics Portal: An Open Platform for Exploring Multidimensional Cancer Genomics Data. Cancer Discov. 2012;2(5):401–4.

85 - Gao J, Aksoy BA, Dogrusoz U, Dresdner G, Gross B, Sumer SO, et al. Integrative Analysis of Complex Cancer Genomics and Clinical Profiles Using the cBioPortal. Sci Signal. 2013;6(269):pl1.

86 - Anaya J. OncoLnc: linking TCGA survival data to mRNAs, miRNAs, and lncRNAs. PeerJ Comput Sci. 2016;2(e67).

87 - Kim D, Langmead B, Salzberg SL. HISAT: a fast spliced aligner with low memory requirements. Nat Meth. 2015;12(4):357–60.

88 - Anders S, Pyl PT, Huber W. HTSeq—a Python framework to work with high- throughput sequencing data. Bioinformatics. 2014;31(2):166–9.

89 - Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550.

90 - Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29(1):15–21.

91 - Robinson JT, Thorvaldsdottir H, Winckler W, Guttman M, Lander ES, Getz G, et al. Integrative genomics viewer. Nat Biotech. 2011;29(1):24–6.

Figure 1. LINC00152 is upregulated in aggressive gliomas and in many cancer types. A) Boxplot of LINC00152 expression (FPKM) in normal brain tissue, G2 (grade 2 glioma), G3 (grade 3 glioma) and GBM. B) Boxplot of LINC00152 expression (FPKM) in LGG subtypes and normal brain tissue. C-L) Expression (RSEM) of LINC00152 in tumors and matched normal tissue from the TCGA in head and neck squamous carcinoma (HNSCC), renal papillary tumor (KIRP), hepatocellular carcinoma (LIHC), colorectal carcinoma (COADREAD), renal clear cell carcinoma (KIRC), breast invasive carcinoma (BRCA), stomach adenocarcinoma (STAD), uterine carcinoma (UCEC), thyroid carcinoma (THCA) and lung adenocarcinoma (LUAD), respectively.

Figure 2. High level of LINC00152 expression is associated with poor patient prognosis in GBMs, LGGs and many other tumors. A) Kaplan-Meier plot of GBM patients separated into the 50% highest expressing LINC00152 cohort and the 50% lowest expressing cohort. B) Kaplan-Meier plot of LGG patients separated into the 50% highest expressing LINC00152 cohort and the 50% lowest expressing cohort. C-F) Kaplan-Meier plots of the highest and lowest LINC00152 expressing quartiles for head and neck squamous carcinoma, lung adenocarcinoma, renal clear cell carcinoma, and hepatocellular carcinoma, respectively.

Figure 3. LINC00152 is a cytoplasmic lncRNA that promotes cell invasion in U87 cells. A) Western blot of Lamin A/C and Actin, markers of the nucleus and cytoplasm, respectively. B) qRT-PCR of LINC00152 and a cytoplasmic RNA marker, Actin, and a nuclear RNA marker, MALAT1. C) In-situ hybridization of LINC00152 in U87 and U251 cell lines; DRAIC lncRNA probes were used as negative control (“- control”). Purple color: positive signal. D) qRT-PCR showing knockdown of LINC00152 after treatment with two different siRNAs. E) Invasion assay with U87 cells after treatment with two different siRNAs against LINC00152; * p-value < 0.05. F) qRT-PCR showing overexpression of LINC00152 after transient overexpression. G) Invasion assay with U87 cells overexpressing LINC00152; * p-value < 0.05.

Figure 4. LINC00152 does not affect U87 cell proliferation rate. A) qRT-PCR showing knockdown of LINC00152 after treatment with two different siRNAs. B) Cell proliferation rate measured by MTT assay of U87 cells treated with two different siRNAs against LINC00152. C) qRT-PCR showing overexpression of LINC00152 after LINC00152 full length overexpression. D) Cell proliferation rates of U87 cells with and without LINC00152 overexpression.

Figure 5. LINC00152 knockdown reduces proliferation rate of U87 cells in suspension. A) qRT-PCR showing knockdown of LINC00152 after treatment with two different siRNAs. B) Representative pictures of U87 cells seeded in Poly(HEMA)-coated plates after treatment with two different siRNAs against LINC00152 (days 2, 4 and 6). C) Cell proliferation rate measured by MTT assay (absorbance at 590 nm) of U87 cells seeded in Poly(HEMA)-coated plates treated with two different siRNAs against LINC00152. D-F) Histograms showing the size (in pixels) of U87 cell clusters seeded in Poly(HEMA)-coated plates treated with two different siRNAs against LINC00152 after 2 (D), 4 (E) and 6 (F) days.

Figure 6. LINC00152 full length or M8 overexpression increases the proliferation rate of U87 cells in suspension. A) qRT-PCR showing U87 cells overexpressing LINC00152. B) Representative pictures of U87 cells overexpressing LINC00152 seeded in Poly(HEMA)-coated plates (days 2, 4 and 6). C) Cell proliferation rate measured by MTT assay (absorbance at 590 nm) of U87 cells overexpressing LINC00152 seeded in Poly(HEMA)-coated plates. D-F) Histograms showing the size (in pixels) of clusters of U87 cells overexpressing LINC00152 seeded in Poly(HEMA)-coated plates after 2 (D), 4 (E) and 6 (F) days.

Figure 7. LINC00152 regulates genes involved in invasion in U87 cells. A) Volcano plot of statistical significance against fold-change highlighting differentially regulated genes in red color upon siLINC00152 in U87 cells. B) Plot from gene set enrichment analysis (GSEA) showing the gene set involved in epithelial-to-mesenchymal transition (EMT) enriched among upregulated genes (red end of spectrum) after LINC00152 knockdown in U87 cells. C) Cumulative distribution frequency plots of miRNA target mRNAs (as predicted by TargetScan) or non-targets showing fraction of genes with fold change less than that indicated on the X-axis after LINC00152 knockdown. None of the miRNAs previously proposed to be sponged by LINC00152 are released, as evident from the fact that their targets are not repressed upon LINC00152 knockdown.

Figure 8. A 120 nucleotide hairpin at the 3’ end of LINC00152 (M8) is sufficient for promoting cell invasion in U87 cells. A) Predicted secondary structure of LINC00152 and the stem loop and protein bound regions identified by PARIS (RNA Duplex) and Ribo-seq (Supp. Fig. 6). B) Schematics of LINC00152 deletion mutants. C) LINC00152 qRT-PCR confirming overexpression levels of the different constructs. D) Invasion of U87 cells after overexpressing the different LINC00152 deletion mutants; * p-value < 0.05. E) Invasion of U87 cells decreases after treatment with si00152_II but is rescued by LINC00152 m8 overexpression; * p-value < 0.05.

Mutations shown: mut A, gggc to cccg; mut B, gccc to cggg; mut AB, gggc to cccg and gccc to cggg.

Figure 9. Point-mutation of nucleotides 333-336 or 349-352 of LINC00152 shows the importance of M8 hairpin for stimulating invasion. A) Invasion of U87 cells after the different LINC00152 deletion mutants are overexpressed. Mut A or mut B are incapable of inducing invasion in U87 cells. Combining the two mutants (mut AB) restores the hairpin (Fig. 9D) and induces invasion to the same level as full length LINC00152; * p-value < 0.05. B) qRT-PCR confirming overexpression of the different LINC00152 constructs. C) Predicted secondary structure of full length LINC00152 with the black line marking the sequence at the tip of the hairpin and the ochre and green lines marking the residues that are mutated in Mut A or B, respectively. D) Predicted secondary structures of LINC00152 mutants A, B and AB. The black, ochre and green lines mark the corresponding residues as in Fig. 9C.

Figure 10. LINC00152 is highly similar to the lncRNA MIR4435-2HG. A) Sequence alignment of LINC00152 and MIR4435-2HG. M8 is highlighted in blue. B) LINC00152 specific RNA-seq reads in cells treated with siGL2 or siLINC00152. C) MIR4435-2HG specific RNA-seq reads in cells treated with siGL2 or siLINC00152. D) LINC00152 or MIR4435-2HG specific reads (unique reads mapped per million mapped reads) in TCGA RNA-seq data for LGG and GBM. E) Correlation of expression of LINC00152 and MIR4435-2HG in LGGs (Spearman correlation: 0.68, p-value < 2.2e-16). F) Correlation of expression of LINC00152 and MIR4435-2HG in GBMs (Spearman correlation: 0.86, p-value < 2.2e-16). G) Kaplan-Meier plot of LGG patients separated into the 50% highest expressing LINC00152 cohort and the lowest 50% expressing cohort. H) Kaplan-Meier plot of LGG patients separated into the 50% highest expressing MIR4435-2HG cohort and the lowest 50% expressing cohort.

Supplementary Figure 1. LINC00152 is not differentially expressed in LGGs globally or between GBM subtypes. A) Boxplot of LINC00152 expression (FPKM) in normal brain tissue, G2 (grade 2 glioma) and G3 (grade 3 glioma). B) Boxplot of LINC00152 expression (FPKM) in GBM subtypes.

Supplementary Figure 2. High levels of LINC00152 expression are associated with negative patient outcomes in three more cancers. A-C) Kaplan-Meier plots of the top third of LINC00152 expressing patients and the bottom third of LINC00152 expressing patients in renal papillary tumor (KIRP), pancreatic adenocarcinoma (PAAD), and acute myeloid leukemia (LAML), respectively.

Supplementary Figure 3. LINC00152 is a cytoplasmic lncRNA that promotes cell invasion in A172 cells. A) Western blot of Lamin A/C and Actin, markers of the nucleus and cytoplasm, respectively. B) qRT-PCR of LINC00152 and a cytoplasmic RNA marker, Actin, and a nuclear RNA marker, MALAT1. C) qRT-PCR showing knockdown of LINC00152 after treatment with two different siRNAs. D) Invasion assay with A172 cells after treatment with two different siRNAs against LINC00152; * p-value < 0.05. E) qRT-PCR showing overexpression of LINC00152 after transient overexpression. F) Invasion assay with U87 cells overexpressing LINC00152; * p-value < 0.05.

Supplementary Figure 4. IGFBP4, LUM and TGM2 knockdowns decrease invasion of U87 cells. A-C) qRT-PCR showing knockdown of IGFBP4, LUM and TGM2 in U87 cells, respectively (RNA expression relative to siGL2). D) Invasion assay with U87 cells after treatments with IGFBP4, LUM and TGM2 siRNAs.

Supplementary Figure 5. Knockdown of IGFBP4, LUM and TGM2 prevents the invasive phenotype produced by LINC00152 overexpression in U87 cells. A) qRT-PCR showing LINC00152 overexpression in U87 cells. B-D) qRT-PCR showing knockdown of IGFBP4, LUM and TGM2 after LINC00152 overexpression in U87 cells, respectively (RNA expression normalized to pcDNA3 + siGL2). E) Invasion assay with U87 overexpressing cells after concomitant IGFBP4, LUM and TGM2 siRNAs treatment.

Supplementary Figure 6. A protein-bound stem loop in LINC00152. A and B) The two lowest free-energy structures of LINC00152, predicted by mfold. Beige shaded areas show overlapping stem-loop regions between the structures, which also overlap with the PARIS reads and Ribo-seq reads. C) IGV view of PARIS reads (two separate experiments) generated from HeLa cells. The beige areas overlap with the beige area in A. The darker shaded areas show the reads obtained from PARIS with 2-bp gap where indicated. D) Top three tracks: IGV view of Ribo-seq reads from the brain showing sites bound by RNPs as darker shaded areas. Beige area is the same as the beige area in A. Bottom three tracks: RNA-seq reads from the brain that map to exon 2 of LINC00152. The different tracks represent three biological samples. If >20% of the reads at a position differ from the reference genome (due to mutation, polymorphism, cross-link site), the nucleotide is colored as follows: A: green; T: red; C: blue; G: orange.

Supplementary Figure 7. LINC00152 knockdown decreases BCL2L11 expression but overexpression of LINC00152 or the M8 hairpin does not upregulate BCL2L11 expression. A) qRT-PCR showing knockdown of LINC00152 after treatment with two different siRNAs. B) and C) qRT-PCR of BCL2L11 using two different primer pairs after LINC00152 knockdown. D) qRT-PCR shows overexpression of full length LINC00152 or the M8 hairpin alone. E) and F) Same as B) and C) except after overexpression of LINC00152 or M8.

Supplementary Figure 8. LINC00152 knockdown leads to changes in gene expression. A) Heatmap showing clustering of replicates from the same condition (with or without LINC00152 knockdown) based on differentially regulated genes in U87 cells. Three biological replicates for each condition are shown to cluster together. Bootstrap values based on 1000 repetitions are shown near the corresponding branches.

Constructs shown: LINC00152 full length; M2 (Δ280-401); M3 (1-280); M4 (1-400); M5 (220-491); M6 (130-491); M7 (1-430); M8 (280-401).

Supplementary Figure 9. LINC00152 full length and deletion mutants. A) Predicted secondary structures of LINC00152 deletion mutants. M2 and M3 lack the protein bound stem-loop. M8 is highlighted in blue (nucleotides 280-401). On the other hand, M4 through M8 contain the protein bound stem-loop (highlighted in blue).

Oligo                     Sequence (5’ to 3’)
siGL2                     CGUACGCGGAAUACUUCGA
si00152_II                UGACACACUUGAUCGAAUA
si00152_III               CCGGAAUGCAGCUGAAAGA
LINC00152 primer 1 F      ATGCAGCTGAAAGATTCCCT
LINC00152 primer 1 R      AGACTGGCCAGACAAATGG
LINC00152 primer 2 F      CTCTACCTGTTGCCCGCC
LINC00152 primer 2 R      GCAAATGCAGAGGCCTCAGA
LINC00152 primer 3 F      TCCTTCTTAGTCGTGTGTACATCA
LINC00152 primer 3 R      AGAGCTTCCTGTTTCATCTCCC
Actin primer F            TGAAGGCTTTTGGTCTCCCTG
Actin primer R            TCAACTGGTCTCAAGTCAGTGT
MALAT1 primer F           AAGCAAGCAGTATTGTATCG
MALAT1 primer R           AGATGTTAAAACAAGCCCAG
TPM2 primer F             AGAGGTCTGTGGCAAAGTTG
TPM2 primer R             AGGTTGTTGAGTTCCAGCAG
PTX3 primer F             TTTTGGAAGCGTGCATCCAG
PTX3 primer R             TGTGGCTTTGACCCAAATGC
IGFBP4 primer F           ATCGAGGCCATCCAGGAAAG
IGFBP4 primer R           TTTTGGCGAAGTGCTTCTGC
TGM2 primer F             TGCCCTTTGGAAAGCCATTG
TGM2 primer R             TTTTTGCCTGCTCCAAGGAG
SPP1 primer F             TGCCAGCAACCGAAGTTTTC
SPP1 primer R             TGTCAGGTCTGCGAAACTTC
LUM primer F              TTGCGTTTGGATGGCAATCG
LUM primer R              GAGTGACTTCGTTAGCAACACG
BCL2L11 primer #1 F       TGGGAAGCATTTGGTGTTGG
BCL2L11 primer #1 R       AAGCCTGCAACCAGAAAAGC
BCL2L11 primer #2 F       TTGTGCTGCTGGCTTTTCTG
BCL2L11 primer #2 R       TGGACTCTGCTGTAATGGACAC

Supplementary Figure 10. A) List of oligos used in this study. B) Schematics of LINC00152 siRNA and primers relative to LINC00152 sequence.

Gene     Description                                    si00152 RNA-seq (log2)   si00152 qPCR validation (log2)
TIMP3    metalloproteinase inhibitor 3                   2.9                      3.7
CADM1    cell adhesion molecule 1                        1.9                      3.0
SLIT2    slit guidance ligand 2                          1.5                      0.3
PTHLH    parathyroid hormone-like hormone                1.3                      1.8
NID2     nidogen 2                                       1.2                      1.9
MATN3    matrilin 3                                      1.0                      1.9
TPM2     tropomyosin 2                                  -1.0                     -1.0
PTX3     pentraxin 3                                    -1.3                     -0.7
IGFBP4   insulin like growth factor binding protein 4   -1.3                     -1.1
NNMT     nicotinamide N-methyltransferase               -1.5                      0.3
TGM2     transglutaminase 2                             -1.8                     -1.0
SPP1     secreted phosphoprotein 1                      -1.8                     -1.1
LUM      lumican                                        -1.8                     -1.3

Supplementary Table 1. Changes in expression of genes involved in EMT independently confirmed by qRT-PCR after knockdown of LINC00152.

Columns: (1) si00152 RNA-seq (log2); (2) si00152 qPCR validation (log2); (3) 00152 overexpression qPCR validation (log2); (4) M8 overexpression qPCR validation (log2).

Gene     Description                                    (1)     (2)     (3)     (4)
TPM2     tropomyosin 2                                  -1.0    -1.0    0.3     0.3
PTX3     pentraxin 3                                    -1.3    -0.7    1.9     1.7
IGFBP4   insulin like growth factor binding protein 4   -1.3    -1.1    0.7     0.5
TGM2     transglutaminase 2                             -1.8    -1.0    1.7     1.7
SPP1     secreted phosphoprotein 1                      -1.8    -1.1    1.9     1.7
LUM      lumican                                        -1.8    -1.3    3.2     3.1

Supplementary Table 2. Changes in expression of genes involved in EMT independently confirmed by qRT-PCR after knockdown of LINC00152, LINC00152 full length overexpression and LINC00152 M8 stem-loop overexpression.

Chapter 4: Closing Remarks and Future Directions

Alternative methods for discovering APTR interacting partners

As discussed in Chapter 2, thousands of genomic sites were found to be associated with APTR. However, the pull-down was not stringent enough, and the sequencing of the pulled-down genomic sites was not deep enough, to detect all potential APTR-bound DNA. A number of solutions were proposed in that chapter: (1) including a different set of negative control oligos (against an unrelated lncRNA); (2) using a cell line in which APTR is absent as a negative control; (3) increasing the starting number of cells; and (4) using cells that stably overexpress APTR. There are also alternative approaches that can be used to map APTR genomic binding sites with higher sensitivity and specificity.

One of these methods is RAP (RNA Antisense Purification). Like CHIRP, RAP uses biotinylated antisense oligos to enrich the protein, DNA and RNA associated with a lncRNA of interest, but it differs from CHIRP in a few steps. The main difference is the use of longer, overlapping oligonucleotides (90 to 120 nucleotides) that tile across the entire lncRNA of interest. With this strategy, the oligonucleotides form a much stronger complex with the lncRNA and its associated protein, DNA and RNA partners, which leads to the second advantage of RAP over CHIRP: because it relies on longer overlapping tiling oligos, RAP allows the use of higher salt and detergent concentrations in the hybridization and wash buffers, increasing the specificity of the pull-down [1].
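The tiling design described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the probe length (120 nt) and step (60 nt, so that neighboring probes overlap by half) are chosen for the example and are not taken from the RAP protocol, and the function names `reverse_complement` and `tile_antisense_probes` are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the DNA reverse complement of an RNA or DNA sequence."""
    pairs = {"A": "T", "U": "A", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def tile_antisense_probes(rna: str, probe_len: int = 120, step: int = 60):
    """Return (start, end, probe) tuples for overlapping antisense probes.

    probe_len and step are illustrative assumptions, not protocol values.
    Coordinates are 0-based, end-exclusive, on the target RNA.
    """
    if len(rna) <= probe_len:
        return [(0, len(rna), reverse_complement(rna))]
    probes = []
    start = 0
    while start + probe_len < len(rna):
        window = rna[start:start + probe_len]
        probes.append((start, start + probe_len, reverse_complement(window)))
        start += step
    # Anchor the last probe at the 3' end so the whole transcript is covered.
    probes.append((len(rna) - probe_len, len(rna), reverse_complement(rna[-probe_len:])))
    return probes


# Toy 400-nt sequence, used only to exercise the function.
toy_rna = "AUGC" * 100
toy_probes = tile_antisense_probes(toy_rna)
```

For a real design, each probe would still need to be checked for off-target complementarity and repeats, which this sketch does not attempt.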

Another alternative to CHIRP is CHART (Capture Hybridization Analysis of RNA Targets). This technique was originally developed to discover the genomic targets of roX2, a lncRNA required for dosage compensation, in Drosophila S2 cells. Like the previous two methods, CHART uses biotinylated oligos complementary to the lncRNA of interest. Its main distinguishing feature is the use of an RNase H mapping assay, previously described in Chapter 2, to ensure that the oligonucleotides target a region of the lncRNA that is not blocked by a protein binding partner [2].

Both of these methods have been employed in studies of other lncRNAs in different cell contexts [3-7]. Since they are more robust methods, using CHART or RAP followed by DNA sequencing should increase the sensitivity of the APTR pull-down. Moreover, performing one or both of these methods and overlapping the results with the existing CHIRP-seq data should significantly improve the specificity with which APTR-interacting genomic loci are identified.
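As a minimal sketch of what overlapping two pull-down datasets could look like, the snippet below keeps only those peaks from one dataset that intersect a peak from the other. The coordinates are invented for illustration, the function name `overlapping_loci` is hypothetical, and a real analysis would more likely rely on a dedicated interval tool; the one-base minimum overlap is also an assumption.

```python
from collections import defaultdict


def overlapping_loci(peaks_a, peaks_b, min_overlap=1):
    """Return peaks from peaks_a that overlap any peak in peaks_b.

    Peaks are (chrom, start, end) tuples with 0-based, end-exclusive
    coordinates (BED style). min_overlap is the minimum shared length in bp.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks_b:
        by_chrom[chrom].append((start, end))

    shared = []
    for chrom, start, end in peaks_a:
        for b_start, b_end in by_chrom.get(chrom, ()):
            if min(end, b_end) - max(start, b_start) >= min_overlap:
                shared.append((chrom, start, end))
                break  # one overlapping partner is enough
    return shared


# Invented coordinates, for illustration only.
chirp_peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
rap_peaks = [("chr1", 150, 250), ("chr2", 80, 120), ("chr3", 1, 10)]
confirmed = overlapping_loci(chirp_peaks, rap_peaks)
```

Peaks that merely touch (such as the chr2 pair above) are not counted, because end-exclusive intervals that share only a boundary have zero bases in common.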

Even though RAP and CHART are more robust methods for pulling down endogenous APTR, a more promising approach is to use the overexpression system described in Chapter 2. In that strategy, APTR was fused to the S1 aptamer and more than 70 proteins were identified as APTR partners by mass spectrometry. This approach could be repeated but, instead of sending the APTR-enriched material for mass spectrometry, the DNA bound to APTR would be sent for sequencing. We expect this to considerably increase the sensitivity of the APTR pull-down.

It should also be noted that SILAC (Stable Isotope Labeling by Amino acids in Cell culture) can be coupled to one of the aforementioned methods to identify protein interactors of lncRNAs [8]. SILAC consists of growing cells in medium containing either heavy or light lysine and arginine, and maintaining them in these conditions until nearly all the lysines and arginines in the cultured cells are labeled with heavy or light isotopes. At this point, one of the pull-down methods is performed and the proteins that co-purify with the RNA are sent for mass spectrometry analysis. The pool of cells labeled with heavy isotopes is used for the APTR pull-down and, as a negative control, the pool of cells labeled with light isotopes is used for the nonspecific pull-down. Thus, the proteins associated with APTR contain heavy lysines and arginines, while the nonspecific pull-down contains proteins with light lysines and arginines. The ratio of the heavy to light forms of each peptide (containing lysine or arginine) is calculated [8]. Peptides with an increased heavy to light ratio identify proteins that specifically associate with APTR over the negative control. The procedure is then repeated with the labels reversed: the light-labeled lysate is pulled down with APTR and the heavy-labeled lysate with the nonspecific control. Furthermore, these newly identified APTR-interacting proteins can be compared with the list of 72 proteins previously identified by the S1-APTR overexpression pull-down, to narrow down the protein interactor candidates.
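The heavy to light comparison and the label swap described above can be illustrated with a short sketch. Several choices here are assumptions made for the example and are not part of the cited protocol: intensities are summarized per protein instead of per peptide, a pseudocount of 1 avoids division by zero, and a twofold cutoff (log2 ratio of 1) defines enrichment. The protein names and intensities are invented, and the function names are hypothetical.

```python
import math


def log2_heavy_to_light(heavy: float, light: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of heavy to light intensity; the pseudocount is an assumption."""
    return math.log2((heavy + pseudocount) / (light + pseudocount))


def specific_interactors(forward, swapped, min_log2=1.0):
    """Return proteins enriched in the RNA pull-down in both label orientations.

    forward: protein -> (heavy, light), where heavy = RNA pull-down.
    swapped: protein -> (heavy, light), where heavy = nonspecific control.
    """
    hits = []
    for protein, (heavy, light) in forward.items():
        if protein not in swapped:
            continue
        fwd_ratio = log2_heavy_to_light(heavy, light)
        swap_ratio = log2_heavy_to_light(*swapped[protein])
        # In the swapped experiment a specific binder is light-enriched,
        # so its heavy/light log ratio should be negative.
        if fwd_ratio >= min_log2 and -swap_ratio >= min_log2:
            hits.append(protein)
    return sorted(hits)


# Invented intensities, for illustration only.
forward_run = {"protein_A": (7999.0, 999.0), "protein_B": (1000.0, 1000.0), "protein_C": (7999.0, 999.0)}
swapped_run = {"protein_A": (999.0, 7999.0), "protein_B": (1000.0, 1000.0), "protein_C": (3999.0, 999.0)}
silac_hits = specific_interactors(forward_run, swapped_run)
```

In this toy data, protein_C is heavy-enriched in both orientations, the pattern expected of a labeling artifact and not of a specific interactor, so it is filtered out by the swap.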

To confirm whether APTR specifically associates with the proteins identified by this screening strategy, immunoprecipitation assays followed by qRT-PCR need to be performed using the top candidate proteins. As negative controls, other mRNAs and lncRNAs, whose expression levels are similar to APTR, should be used. Negative controls for the immunoprecipitation assays include the proteins that do not co-precipitate with APTR in the mass spectrometry screen. Finally, once a real APTR interactor is identified, ChIP-seq of that protein will answer whether the protein binds to the genome at sites that overlap with APTR-bound sites.

APTR processing into different fragments

The BRD7 and NFIC RIP (RNA immunoprecipitation) experiments raised the unexpected possibility that APTR is processed into three fragments, since NFIC was shown to bind the 5’ region of APTR and BRD7 the 3’ region, while a central region did not interact with either of these proteins. Thus, in addition to performing northern blotting with probes that detect different parts of APTR, another way to test this hypothesis is to perform in vitro pull-down experiments using APTR deletion mutants.

Also, overexpression of APTR mutants followed by in vitro pull-down can help map the nucleotides necessary for NFIC, BRD7 and other proteins to bind to APTR.

APTR and LINC00152 secondary structure

In order to gain insight into the molecular mechanism of action of APTR, we used an RNase H mapping strategy to identify four accessible regions on this lncRNA molecule. We believe that these could represent loops and potential binding motifs in the APTR secondary structure. For LINC00152, we used publicly available PARIS and Ribo-seq data [9,10] to identify a functional protein-bound stem-loop near the 3’ end. Unfortunately, the same analysis did not reveal such a protein-bound stem-loop on APTR, since neither of these datasets contained reads that mapped to APTR.

There is growing evidence in the literature highlighting the functional importance of lncRNA secondary structure [11-13]. In this context, a method called SHAPE-MaP (Selective 2’-Hydroxyl Acylation analyzed by Primer Extension and Mutational Profiling) was reported to determine RNA secondary structure. SHAPE-MaP relies on the treatment of cells with 2’-hydroxyl-selective reagents, which react with single-stranded RNA, followed by high-throughput sequencing. During library preparation, when the RNA is converted into cDNA, the reverse transcriptase incorporates noncomplementary nucleotides at sites of single-stranded RNA that have been covalently modified by the 2’-hydroxyl-selective reagent. Alignment of the sequenced cDNA libraries to the reference transcriptome reveals these sites as mismatched nucleotides. The secondary structure of the lncRNA is then deduced by separating heavily mutated regions (areas of single-stranded RNA) from nonmutated regions (areas of double-stranded RNA) and comparing them with secondary structure predictions from RNAfold [14,15]. The experiment is done both on naked RNA and on RNA in cells (where it interacts with cellular proteins). Areas that are protected in the in-cell preparations but not in the naked RNA preparations also indicate possible sites bound by cellular proteins. Therefore, this is an additional approach to unravel the structure of APTR and LINC00152. In fact, another student in Dr. Dutta’s laboratory, Roza Przanowska, has been applying this method to the lncRNA MUNC, and the preliminary data from this experiment are promising. She has therefore also started to optimize this method to determine the secondary structure of LINC00152.
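To make the logic of the mutation-based readout concrete, the sketch below turns per-nucleotide mutation rates into a background-subtracted reactivity and then flags positions that are reactive in naked RNA but protected in cells. It is a deliberately simplified illustration: published SHAPE-MaP pipelines also normalize the reactivities (for example against a denatured control), which is omitted here, and the threshold, the mutation rates and the function names are all assumptions made for the example.

```python
def shape_reactivity(modified_rates, untreated_rates):
    """Background-subtracted mutation rate per nucleotide, floored at zero."""
    if len(modified_rates) != len(untreated_rates):
        raise ValueError("rate profiles must have the same length")
    return [max(m - u, 0.0) for m, u in zip(modified_rates, untreated_rates)]


def protected_positions(naked, in_cell, min_drop=0.01):
    """1-based positions whose reactivity drops by at least min_drop in cells.

    min_drop is an illustrative threshold, not a published value.
    """
    return [i + 1 for i, (n, c) in enumerate(zip(naked, in_cell)) if n - c >= min_drop]


# Invented mutation rates for a 4-nt stretch, for illustration only.
untreated = [0.001, 0.001, 0.001, 0.001]
naked_modified = [0.030, 0.002, 0.040, 0.001]
in_cell_modified = [0.031, 0.002, 0.003, 0.001]

naked_profile = shape_reactivity(naked_modified, untreated)
in_cell_profile = shape_reactivity(in_cell_modified, untreated)
candidate_protein_sites = protected_positions(naked_profile, in_cell_profile)
```

In this toy profile, position 1 is reactive in both conditions (single-stranded and unbound), whereas position 3 loses reactivity in cells and would be a candidate protein-bound site.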

Identifying LINC00152 protein interactor

As described in Chapter 1, many lncRNAs act by interacting with proteins. However, identifying these proteins is perhaps the most challenging step in unveiling the mechanism of action of a lncRNA. Since we know that LINC00152 has a functional stem-loop structure, we used publicly available eCLIP (enhanced Cross-Linking and Immunoprecipitation) data from ENCODE (Encyclopedia of DNA Elements) in K562 cells to find candidate protein partners [16]. This analysis suggested a direct interaction between LINC00152 and the spliceosome proteins U2AF1 (U2 small nuclear RNA auxiliary factor 1) and SRSF1 (Serine and Arginine Rich Splicing Factor 1), even though the enrichment peak was modest (Figure 1A). U2AF1 was reported to be frequently mutated in lung adenocarcinoma [17] and in hematological malignancies [18,19,20]. In addition, when U2AF1 is mutated in leukemias, patients have a worse prognosis [18]. SRSF1 is a splicing factor essential for pre-mRNA splicing [21] and is reported to be upregulated in different tumors [22]. Moreover, SRSF1 has been implicated in inducing proliferation and reducing apoptosis in breast cancer cells [23]. Since both proteins are components of the spliceosome machinery, it is not surprising that SRSF1 has been reported to interact with U2AF1 [24,25]. Thus, to test experimentally whether LINC00152 and spliceosome proteins are associated, I performed RIP using an SRSF1 antibody, with IgG as a negative control. The SRSF1 RIP was followed by qRT-PCR for LINC00152. HIST3H3, a non-spliced mRNA, was used as a negative control since it should not associate with spliceosome proteins such as SRSF1. This experiment showed that the association of LINC00152 with SRSF1 was not significantly higher than the association of SRSF1 with the HIST3H3 mRNA, suggesting that LINC00152 does not associate specifically with the spliceosome machinery to promote invasion (Figure 1B).
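One common way to quantify a RIP followed by qRT-PCR is the percent-input method, sketched below. This is an assumption about how such data can be analyzed, not a description of the calculation used for Figure 1B; the Ct values, the 10% input fraction and the function names are invented for the example, and perfect amplification efficiency (a doubling per cycle) is assumed.

```python
def percent_input(ct_ip: float, ct_input: float, input_fraction: float = 0.1) -> float:
    """Percent of input RNA recovered in the IP, assuming doubling per cycle.

    input_fraction is the share of lysate saved as input (0.1 = 10%).
    """
    return 100.0 * input_fraction * 2.0 ** (ct_input - ct_ip)


def fold_over_igg(ct_antibody: float, ct_igg: float, ct_input: float,
                  input_fraction: float = 0.1) -> float:
    """Enrichment of an RNA in the specific IP relative to the IgG control."""
    specific = percent_input(ct_antibody, ct_input, input_fraction)
    background = percent_input(ct_igg, ct_input, input_fraction)
    return specific / background


# Invented Ct values, for illustration only.
target_enrichment = fold_over_igg(ct_antibody=24.0, ct_igg=27.0, ct_input=22.0)
control_enrichment = fold_over_igg(ct_antibody=26.5, ct_igg=27.0, ct_input=23.0)
```

A specific interaction would be supported only if the enrichment of the lncRNA clearly exceeded that of the negative control RNA across replicates; comparing the two values is the step that corresponds to the LINC00152 versus HIST3H3 comparison described above.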

Even though capturing the protein partners of lncRNAs is a major struggle because the interactions may be transient and weak, finding the protein interactor is of great interest. Therefore, in addition to searching for potential LINC00152 protein binding partners using publicly available eCLIP data, I performed a pull-down of endogenous LINC00152 using the CHIRP method. We compared LINC00152 with the negative controls (beads only, and two other lncRNAs, GS1 and DRAIC) in three different GBM cell lines (Figure 2A-C). The LINC00152-enriched material was then divided into two parts. The first half was subjected to SDS-PAGE followed by silver staining (Figure 2D-F). The second half was submitted for mass spectrometry to Dr. Ashish Lal’s laboratory at the National Cancer Institute (NIH). Although qPCR showed that LINC00152 was successfully isolated from GBM cells (Figure 2A-C), the mass spectrometry results showed that the number of peptides associated with LINC00152 was extremely low (Table 1). This reveals that, as for APTR, further optimization is required to achieve high sensitivity in the LINC00152 pull-down. However, a 24 kD band that appears specifically in the LINC00152 pull-downs in the silver-stained gels is very promising. This band is seen in all three cell lines and is not visible in the GS1 or DRAIC pull-downs.

To explore alternative methods for pulling down lncRNAs, another scientist in Dr. Dutta’s laboratory, Dr. Shekhar Saha, has been optimizing an in vitro system. His idea to circumvent this specificity problem is to transcribe LINC00152 in vitro using Br-UTP and to incubate cell lysates with anti-BrU antibodies. His preliminary data from another lncRNA are promising, and he is optimizing this method to discover the protein interactors of LINC00152.

APTR as a regulator of gene expression

As discussed in Chapter 1, the major advantage of knowing the subcellular localization of a lncRNA is the insight it gives into its molecular mechanism. As a nuclear lncRNA, APTR has been shown to interact with PRC2 and recruit it to the p21 promoter region, causing p21 downregulation [26]. Thus, APTR is a trans-acting lncRNA because it is capable of leaving its site of transcription (on chromosome 7) and regulating the expression of p21, a distant gene (transcribed from chromosome 6).

In this study, we observed that APTR does not act as an enhancer RNA (eRNA), since its downregulation did not significantly alter the expression of its two neighboring genes, PTPN12 and RSBN1L. Furthermore, although the CHIRP-seq experiment had low sensitivity and specificity, we showed that around 2,000 genomic sites interact with APTR. Of these, 115 sites are marked with EZH2, suggesting that APTR may recruit PRC2 to these 115 genomic loci, but not to the remaining ~1,900 sites. In addition, 43 CHIRP-seq peaks are near genes that are differentially expressed upon APTR knockdown, indicating that APTR may regulate the expression of only a few genes by direct interaction with genomic loci. These results suggest that while APTR could work as a regulator of gene expression by recruiting PRC2 to some genomic loci, this is not the primary mode of action of the lncRNA. Additionally, the S1-APTR overexpression pull-down experiment showed that there are more than 70 potential protein binding partners for APTR. Interestingly, two transcription factors, BRD7 and NFIC, were present in this pool, and their association with APTR was validated by RIP experiments. Analysis of a publicly available BRD7 ChIP-seq dataset from 293 cells (GSE65974) revealed that BRD7 associates with only 157 sites in the genome. Because of this limited number of BRD7-bound sites, only one APTR CHIRP-seq peak overlaps with a BRD7 ChIP-seq peak when a distance of up to 1 kb between peaks is allowed. In contrast, ChIP-seq data from HepG2 cells (GSM2902642) show that NFIC binds to thousands of sites in the genome. However, only 164 of these sites overlap with APTR CHIRP-seq peaks when a distance of up to 1 kb between peaks is allowed. Thus, in addition to recruiting PRC2 to genomic sites, APTR may also regulate gene expression by recruiting these transcription factors to gene promoters. However, since the three ChIP-seq datasets overlapped only modestly with the APTR CHIRP-seq peaks, APTR association with PRC2, BRD7 or NFIC may not represent its primary mode of action.
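The peak comparison described above, in which two peaks are counted as overlapping when they lie within 1 kb of each other, can be sketched as follows. This is a minimal illustration and not the pipeline used in this work: the function name, the tuple representation of peaks, and the coordinates are all hypothetical, and the sketch assumes the 1 kb window is measured as the gap between peak edges.

```python
from collections import defaultdict


def peaks_within_distance(query, reference, max_gap=1000):
    """Return the query peaks lying within max_gap bp of any reference peak.

    Peaks are (chromosome, start, end) tuples with start < end.
    Overlapping peaks have a gap of zero and are always reported.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in reference:
        by_chrom[chrom].append((start, end))

    hits = []
    for chrom, start, end in query:
        for r_start, r_end in by_chrom.get(chrom, []):
            # The gap between two intervals is at most max_gap exactly when
            # each interval starts no later than the other's end + max_gap.
            if start <= r_end + max_gap and r_start <= end + max_gap:
                hits.append((chrom, start, end))
                break  # count each query peak once
    return hits


# Hypothetical coordinates, for illustration only.
chirp_peaks = [("chr1", 100, 200), ("chr1", 5000, 5100), ("chr2", 100, 200)]
chip_peaks = [("chr1", 1200, 1300), ("chr2", 5000, 5100)]
print(peaks_within_distance(chirp_peaks, chip_peaks))
```

The nested loop is quadratic in the number of peaks per chromosome, which is acceptable for a few thousand peaks; for larger datasets a sorted sweep or a dedicated interval tool would be preferable.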


Another possibility is that APTR serves as a molecular decoy for these two proteins, preventing them from regulating gene expression. However, proving any of these hypotheses will require further experiments similar to those previously done for the APTR-PRC2 interaction.

Molecular functions of LINC00152

Recent reports have proposed that LINC00152 functions as a competing endogenous RNA (ceRNA) by titrating different microRNAs. Although the expression level of a lncRNA is relatively low compared with that of microRNA targets, and indeed of many microRNAs, we tested this hypothesis by measuring the expression of the mRNA targets of these microRNAs upon LINC00152 knockdown. The results revealed that none of the miRNA targets were repressed upon LINC00152 knockdown, suggesting that LINC00152 does not act as a ceRNA for these miRNAs. We also showed, by gain- and loss-of-function experiments, that LINC00152 contributes to GBM progression by regulating the invasion and the proliferation of GBM cells under anchorage-independent conditions. In addition, we identified a protein-bound stem-loop in the 3’ region of LINC00152 (M8) that is sufficient for inducing both of these phenotypes. Thus, as a cytoplasmic lncRNA, LINC00152 could be working as a molecular decoy, sequestering proteins important for gene regulation and preventing them from entering the nucleus. One example is the mechanism of action of NORAD, discussed in Chapter 1. This lncRNA sequesters PUMILIO proteins, which prevents repression of genes related to DNA repair and DNA replication. Another interesting possibility is that LINC00152 works as a regulator of protein activity by binding to proteins involved in signal transduction and either enhancing or preventing their activity. The example cited in Chapter 1 is NKILA, a lncRNA that inhibits NF-κB signaling by preventing IκB phosphorylation. As with APTR, finding the protein interactor of LINC00152 using the methods previously mentioned will narrow down these possibilities.

Clinical potential of APTR and LINC00152

High expression of APTR or LINC00152 promotes cancer phenotypes and therefore they can be considered oncogenic lncRNAs [26-29]. APTR was shown to be upregulated in GBMs and liver pathologies [26-28], and so may serve as a biomarker for the development or progression of these diseases. LINC00152 is upregulated in 12 different cancer types, and more interestingly its high expression predicts poor outcomes of patients in 9 of them. Therefore, LINC00152 may serve as a biomarker for patient prognosis.

In addition to serving as cancer biomarkers, APTR and LINC00152 may be used as targets for cancer treatment. One strategy to this end is siRNA-based therapy. In fact, there are examples in the literature supporting the use of this method to treat diseases, including cancer [30-32]. Another attractive strategy is to transcriptionally silence lncRNAs using CRISPRi (CRISPR interference), in which a catalytically dead Cas9 fused to transcriptional repressors is targeted to the promoter region of a lncRNA, causing its silencing [33,34]. However, to reach a clinically feasible stage, targeting lncRNAs with siRNA or CRISPRi has to overcome the difficulty of finding efficient delivery methods [35]. A few strategies have been employed, such as viral vectors [36], lipid vectors [37], and inorganic nanoparticles [38], but none has been widely adopted. A third approach is to solve the structure of the ribonucleoprotein containing the oncogenic lncRNA and then to use small molecules that interfere either with the correct folding of the lncRNA or with the enzymatic activities of the proteins in the ribonucleoprotein. In fact, small startups have been created (for instance, Ribometrix, created by Dr. Kevin Weeks and Dr. Katie Warner) to discover small molecules that interact directly with a given RNA, preventing its proper folding and thus impairing its activity.

For tumor-suppressive lncRNAs, the therapeutic strategy is more challenging. If the lncRNA is known to be induced by a specific signal transduction pathway, it would be possible to stimulate this pathway to induce expression of the lncRNA. More complicated approaches would involve carrying out structure-function studies on the lncRNA to determine the smallest part of the RNA that can still suppress tumors, and then devising gene-therapy approaches to deliver this small part of the lncRNA to cancer cells in vivo. However, as with all gene therapy strategies, this approach needs considerable optimization before it can be used in practice.

References

1 - Engreitz JM, Pandya-Jones A, McDonel P, Shishkin A, Sirokman K, Surka C, Kadri S, Xing J, Goren A, Lander ES, Plath K, Guttman M. The Xist lncRNA exploits three-dimensional genome architecture to spread across the X chromosome. Science. 2013 Aug 16;341(6147):1237973. doi: 10.1126/science.1237973.

2 - Simon MD, Wang CI, Kharchenko PV, West JA, Chapman BA, Alekseyenko AA, Borowsky ML, Kuroda MI, Kingston RE. The genomic binding sites of a noncoding RNA. Proc Natl Acad Sci U S A. 2011 Dec 20;108(51):20497-502.

3 - Geng H, Bu HF, Liu F, Wu L, Pfeifer K, Chou PM, Wang X, Sun J, Lu L, Pandey A, Bartolomei MS, De Plaen IG, Wang P, Yu J, Qian J, Tan XD. In Inflamed Intestinal Tissues and Epithelial Cells, Interleukin 22 Signaling Increases Expression of H19 Long Noncoding RNA, Which Promotes Mucosal Regeneration. Gastroenterology. 2018 Apr 2. pii: S0016-5085(18)30367-6.

4 - Luo Y, Liang M, Yao W, Liu J, Niu Q, Chen J, Liu Z, Li M, Shi B, Pan J, Zhou L, Zhou X. Functional role of lncRNA LOC101927497 in N-methyl-N'-nitro-N-nitrosoguanidine-induced malignantly transformed human gastric epithelial cells. Life Sci. 2018 Jan 15;193:93-103.

5 - Vance KW, Sansom SN, Lee S, Chalei V, Kong L, Cooper SE, Oliver PL, Ponting CP. The long non-coding RNA Paupar regulates the expression of both local and distal genes. EMBO J. 2014 Feb 18;33(4):296-311.

6 - Chalei V, Sansom SN, Kong L, Lee S, Montiel JF, Vance KW, Ponting CP. The long non-coding RNA Dali is an epigenetic regulator of neural differentiation. Elife. 2014 Nov 21;3:e04530.

7 - West JA, Davis CP, Sunwoo H, Simon MD, Sadreyev RI, Wang PI, Tolstorukov MY, Kingston RE. The long noncoding RNAs NEAT1 and MALAT1 bind active chromatin sites. Mol Cell. 2014 Sep 4;55(5):791-802.

8 - Ong SE, Blagoev B, Kratchmarova I, Kristensen DB, Steen H, Pandey A, Mann M. Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. Mol Cell Proteomics. 2002 May;1(5):376-86.

9 - Lu Z, Zhang QC, Lee B, Flynn RA, Smith MA, Robinson JT, et al. RNA Duplex Map in Living Cells Reveals Higher-Order Transcriptome Structure. Cell. 2016;165(5):1267–79.

10 - Gonzalez C, Sims JS, Hornstein N, Mela A, Garcia F, Lei L, et al. Ribosome Profiling Reveals a Cell-Type-Specific Translational Landscape in Brain Tumors. J Neurosci. 2014;34(33):10924–36.

11 - Neumann P, Jaé N, Knau A, Glaser SF, Fouani Y, Rossbach O, Krüger M, John D, Bindereif A, Grote P, Boon RA, Dimmeler S. The lncRNA GATA6-AS epigenetically regulates endothelial gene expression via interaction with LOXL2. Nat Commun. 2018 Jan 16;9(1):237.


12 - Liu B, Sun L, Liu Q, Gong C, Yao Y, Lv X, Lin L, Yao H, Su F, Li D, Zeng M, Song E. A cytoplasmic NF-κB interacting long noncoding RNA blocks IκB phosphorylation and suppresses breast cancer metastasis. Cancer Cell. 2015 Mar 9;27(3):370-81.

13 - Maenner S, Blaud M, Fouillen L, Savoye A, Marchand V, Dubois A, Sanglier-Cianférani S, Van Dorsselaer A, Clerc P, Avner P, Visvikis A, Branlant C. 2-D structure of the A region of Xist RNA and its implication for PRC2 association. PLoS Biol. 2010 Jan;8(1):e1000276.

14 - Siegfried NA, Busan S, Rice GM, Nelson JA, Weeks KM. RNA motif discovery by SHAPE and mutational profiling (SHAPE-MaP). Nat Methods. 2014 Sep;11(9):959-65.

15 - Smola MJ, Christy TW, Inoue K, Nicholson CO, Friedersdorf M, Keene JD, Lee DM, Calabrese JM, Weeks KM. SHAPE reveals transcript-wide interactions, complex structural domains, and protein interactions across the Xist lncRNA in living cells. Proc Natl Acad Sci U S A. 2016 Sep 13;113(37):10322-7.

16 - ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012 Sep 6;489(7414):57-74. doi: 10.1038/nature11247.

17 - Imielinski M, Berger AH, Hammerman PS, Hernandez B, Pugh TJ, Hodis E, Cho J, Suh J, Capelletti M, Sivachenko A, Sougnez C, Auclair D, Lawrence MS, Stojanov P, Cibulskis K, Choi K, de Waal L, Sharifnia T, Brooks A, Greulich H, Banerji S, Zander T, Seidel D, Leenders F, Ansén S, Ludwig C, Engel-Riedel W, Stoelben E, Wolf J, Goparju C, Thompson K, Winckler W, Kwiatkowski D, Johnson BE, Jänne PA, Miller VA, Pao W, Travis WD, Pass HI, Gabriel SB, Lander ES, Thomas RK, Garraway LA, Getz G, Meyerson M. Mapping the hallmarks of lung adenocarcinoma with massively parallel sequencing. Cell. 2012 Sep 14;150(6):1107-20.

18 - Makishima H, Visconte V, Sakaguchi H, Jankowska AM, Abu Kar S, Jerez A, Przychodzen B, Bupathi M, Guinta K, Afable MG, Sekeres MA, Padgett RA, Tiu RV, Maciejewski JP. Mutations in the spliceosome machinery, a novel and ubiquitous pathway in leukemogenesis. Blood. 2012 Apr 5;119(14):3203-10.


19 - Waterfall JJ, Arons E, Walker RL, Pineda M, Roth L, Killian JK, Abaan OD, Davis SR, Kreitman RJ, Meltzer PS. High prevalence of MAP2K1 mutations in variant and IGHV4-34-expressing hairy-cell leukemias. Nat Genet. 2014 Jan;46(1):8-10.

20 - Yoshida K, Ogawa S. Splicing factor mutations and cancer. Wiley Interdiscip Rev RNA. 2014 Jul-Aug;5(4):445-59. doi: 10.1002/wrna.1222. Epub 2014 Feb 12. Review.

21 - Kim DJ, Oh B, Kim YY. Splicing factor ASF/SF2 and transcription factor PPAR-gamma cooperate to directly regulate transcription of uncoupling protein-3. Biochem Biophys Res Commun. 2009 Jan 23;378(4):877-82. Erratum in: Biochem Biophys Res Commun. 2009 Jun 12;383(4):521.

22 - Karni R, de Stanchina E, Lowe SW, Sinha R, Mu D, Krainer AR. The gene encoding the splicing factor SF2/ASF is a proto-oncogene. Nat Struct Mol Biol. 2007 Mar;14(3):185-93.

23 - Anczuków O, Rosenberg AZ, Akerman M, Das S, Zhan L, Karni R, Muthuswamy SK, Krainer AR. The splicing factor SRSF1 regulates apoptosis and proliferation to promote mammary epithelial cell transformation. Nat Struct Mol Biol. 2012 Jan 15;19(2):220-8.

24 - Xiao SH, Manley JL. Phosphorylation-dephosphorylation differentially affects activities of splicing factor ASF/SF2. EMBO J. 1998 Nov 2;17(21):6359-67.

25 - Zhang WJ, Wu JY. Functional properties of p54, a novel SR protein active in constitutive and alternative splicing. Mol Cell Biol. 1996 Oct;16(10):5400-8.

26 - Negishi M, Wongpalee SP, Sarkar S, Park J, Lee KY, Shibata Y, Reon BJ, Abounader R, Suzuki Y, Sugano S, Dutta A. A new lncRNA, APTR, associates with and represses the CDKN1A/p21 promoter by recruiting polycomb proteins. PLoS One. 2014 Apr 18;9(4):e95216.

27 - Yu F, Zheng J, Mao Y, Dong P, Li G, Lu Z, Guo C, Liu Z, Fan X. Long non-coding RNA APTR promotes the activation of hepatic stellate cells and the progression of liver fibrosis. Biochem Biophys Res Commun. 2015 Aug 7;463(4):679-85. doi: 10.1016/j.bbrc.2015.05.124. Epub 2015 Jun 1. PubMed PMID: 26043697.


28 - Yu S, Qi Y, Jiang J, Wang H, Zhou Q. APTR is a prognostic marker in cirrhotic patients with portal hypertension during TIPS procedure. Gene. 2018 Mar 1;645:30-33. doi: 10.1016/j.gene.2017.12.040. Epub 2017 Dec 21. PubMed PMID: 29274906.

29 - Reon BJ, Anaya J, Zhang Y, Mandell J, Purow B, Abounader R, et al. Expression of lncRNAs in Low-Grade Gliomas and Glioblastoma Multiforme: An In Silico Analysis. PLOS Med. 2016;13(12):e1002192.

30 - Song E, Lee SK, Wang J, Ince N, Ouyang N, Min J, Chen J, Shankar P, Lieberman J. RNA interference targeting Fas protects mice from fulminant hepatitis. Nat Med. 2003 Mar;9(3):347-51.

31 - Petrocca F, Altschuler G, Tan SM, Mendillo ML, Yan H, Jerry DJ, Kung AL, Hide W, Ince TA, Lieberman J. A genome-wide siRNA screen identifies proteasome addiction as a vulnerability of basal-like triple-negative breast cancer cells. Cancer Cell. 2013 Aug 12;24(2):182-96.

32 - Wittrup A, Lieberman J. Knocking down disease: a progress report on siRNA therapeutics. Nat Rev Genet. 2015 Sep;16(9):543-52. doi: 10.1038/nrg3978. Review.

33 - Thakore PI, D'Ippolito AM, Song L, Safi A, Shivakumar NK, Kabadi AM, Reddy TE, Crawford GE, Gersbach CA. Highly specific epigenome editing by CRISPR-Cas9 repressors for silencing of distal regulatory elements. Nat Methods. 2015 Dec;12(12):1143-9.

34 - Liu SJ, Horlbeck MA, Cho SW, Birk HS, Malatesta M, He D, Attenello FJ, Villalta JE, Cho MY, Chen Y, Mandegar MA, Olvera MP, Gilbert LA, Conklin BR, Chang HY, Weissman JS, Lim DA. CRISPRi-based genome-scale identification of functional long noncoding RNA loci in human cells. Science. 2017 Jan 6;355(6320). pii: aah7111.

35 - Dowdy SF. Overcoming cellular barriers for RNA therapeutics. Nat Biotechnol. 2017 Mar;35(3):222-229. doi: 10.1038/nbt.3802.

36 - Sicard F, Gayral M, Lulka H, Buscail L, Cordelier P. Targeting miR-21 for the therapy of pancreatic cancer. Mol Ther. 2013 May;21(5):986-94.


37 - Cardoso AL, Simões S, de Almeida LP, Plesnila N, Pedroso de Lima MC, Wagner E, Culmsee C. Tf-lipoplexes for neuronal siRNA delivery: a promising system to mediate gene silencing in the CNS. J Control Release. 2008 Dec 8;132(2):113-23.

38 - Liu Z, Winters M, Holodniy M, Dai H. siRNA delivery into human T cells and primary cells with carbon-nanotube transporters. Angew Chem Int Ed Engl. 2007;46(12):2023-7.


[Figure 1 panels: A) genome browser snapshot; B) bar plot of % input (y-axis) for LINC00152 and HIST3H3 after SRSF1 IP and IgG control.]

Figure 1. eCLIP data suggest that LINC00152 interacts with spliceosome proteins. A) Snapshot of the LINC00152 M8 stem-loop structure from the UCSC genome browser with U2AF1 and SRSF1 eCLIP data tracks. Positive and negative strands are shown separately. Two replicates of each pull-down and a mock pull-down are also shown. Dark lines indicate reads present in the precipitate. B) LINC00152 and HIST3H3 (negative control) qRT-PCR after SRSF1 IP and IgG control.

[Figure 2 panels: A-C) bar plots of % input (y-axis) for LINC00152, GS1, DRAIC and Actin in LINC00152, GS1, DRAIC and beads-only pull-downs from U87 (A), A172 (B) and U251 (C) cells; D-F) silver-stained gels with lanes Marker, LINC00152, GS1, DRAIC and Beads for U87 (D), A172 (E) and U251 (F) cells, with size markers at 250, 150, 100, 75, 50, 37, 25 and 20 kD and asterisks between the 25 kD and 20 kD markers.]

Figure 2. LINC00152 endogenous pull-down in GBM cells. A-C) qPCR of LINC00152, GS1, DRAIC and Actin after LINC00152, GS1 and DRAIC endogenous pull-downs in GBM cell lines U87 (A), A172 (B) and U251 (C). D-F) Eluates and beads fractions of endogenous LINC00152, GS1 and DRAIC lncRNAs isolated from GBM cell lines U87 (D), A172 (E), and U251 (F) analysed by SDS-PAGE and silver staining. Asterisk indicates a specific band present in the LINC00152 lane, and not in the negative pull-downs.

Gene Symbol   Description                                  Molecular Weight (kD)   # Unique Peptides
UBQLN1        Ubiquilin-1                                  66                      2
SUMO3         Small ubiquitin-related modifier 3           12                      1
SFN           14-3-3 protein sigma                         30                      1
CRYBA1        Beta-crystallin A3                           25                      1
GAPDH         Glyceraldehyde-3-phosphate dehydrogenase     36                      1
REEP1         Receptor expression-enhancing protein 1      22                      1
GAA           Lysosomal alpha-glucosidase                  105                     1
TTN           Titin                                        3816                    1

Table 1. List of proteins identified by mass spectrometry in LINC00152 pull-down. Endogenous LINC00152 and associated proteins were subjected to SDS-PAGE and sections of the acrylamide gel were submitted to mass spectrometry. As negative controls, GS1 and DRAIC oligos were used. The number of unique peptides is shown in the last column. Because the numbers of unique peptides were very low, the associated proteins were not considered worth pursuing.
