Gene Ontology – Biological Process
Table of Contents

Abstract
Acknowledgments
Chapter 1: Introduction
  Long non-coding RNAs
  Functions of long non-coding RNAs
  Nuclear lncRNAs
  Cytoplasmic lncRNAs
  Long non-coding RNAs in cancer
  Long non-coding RNAs in gliomas
  Oncogenic long non-coding RNAs in gliomas
  Tumor suppressor long non-coding RNAs in gliomas
  References
  Figures
Chapter 2: The functional role of APTR long non-coding RNA
  Abstract
  Introduction
  Results
  Discussion
  Materials and Methods
  References
  Figures
Chapter 3: Adapted from “LINC00152 promotes invasion in glioblastomas through a hairpin structure located near its 3’ end”
  My contribution to the paper
  Abstract
  Introduction
  Results
  Discussion
  Materials and Methods
  References
  Figures
Chapter 4: Closing Remarks and Future Directions
  Alternative methods for discovering APTR interacting partners
  APTR processing into different fragments
  APTR and LINC00152 secondary structure
  Identifying LINC00152 protein interactor
  APTR as a regulator of gene expression
  Molecular functions of LINC00152
  Clinical potentials of APTR and LINC00152
  References
  Figures

Abstract

Since the advent of high-throughput sequencing methods, long non-coding RNAs (lncRNAs) have emerged as important players in physiological and pathological conditions. Among them, APTR (Alu-mediated p21 transcriptional regulator) is a lncRNA that is upregulated in tumors. This RNA regulates the cell cycle by recruiting the PRC2 protein complex (Polycomb Repressive Complex 2) to the promoter region of the p21 gene, thereby repressing it. In addition, APTR and p21 expression levels are anti-correlated in glioblastoma patient samples. Exploring the possibility that APTR interacts with other genomic sites and recruits other binding proteins, the present work shows that APTR interacts with more than 2,000 genomic loci.
By intersecting these sites with a publicly available EZH2 dataset, I found that 115 APTR-bound DNA sites are marked with EZH2. Moreover, a microarray study revealed that APTR regulates the expression of 274 genes, 43 of which are located near an APTR-associated DNA site. A pull-down assay uncovered 70 potential binding partners for APTR, and RNA immunoprecipitation of BRD7 (Bromodomain-containing protein 7) and NFIC (Nuclear Factor I C) confirmed that APTR interacts with these two transcription factors. This study also identified another lncRNA, LINC00152, as upregulated in 12 tumor types, including glioblastomas (GBM); moreover, its upregulation is indicative of a worse prognosis in 9 cancer types. LINC00152 knockdown decreases the anchorage-independent proliferation and invasion of GBM cells, whereas its overexpression induces these phenotypes. An RNA-seq experiment revealed that LINC00152 controls the expression of genes involved in the epithelial-to-mesenchymal transition. A protein-bound stem-loop at the 3’ end of LINC00152, identified from PARIS, Ribo-seq and secondary structure prediction data, is sufficient to increase the invasion of glioblastoma cell lines. Overexpression of LINC00152 stem-loop mutants suggests that stem formation in the hairpin is essential for LINC00152 function. Finally, because of their high expression in glioblastomas, APTR and LINC00152 may serve as biomarkers of the development and progression of this cancer. In addition, since LINC00152 is upregulated in 12 different tumors and its upregulation indicates worse outcomes in 9 of them, it may serve as a biomarker for patient prognosis.

Acknowledgments

I would like to start by thanking my parents, Soraia and Claudio, for encouraging me to follow my dreams and ambitions. You are my role models. I also want to thank my sister, Juliana, and my girlfriend, Maraysa, for all the love and encouragement during all these years. I thank Dr. Dutta for mentoring me during this process. Your passion for science is inspiring.
I also thank my fellow lab mates for their insights, advice and stimulating discussions. I express my sincere gratitude to the Brazilian government, particularly to the Coordination for the Improvement of Higher Level Personnel (CAPES BEX 0320-13-7), for the financial support.

Chapter 1: Introduction

Long non-coding RNAs

For a long period of time, it was accepted that RNA was solely an intermediate in the flow of genetic information. This central dogma of molecular biology held that proteins were the main effectors of cellular molecular processes [1]. Less than 5% of the genome was believed to be transcribed, with the vast majority of RNA arising from protein-coding genes. More recently, however, a paradigm shift occurred thanks to the advent of whole-genome and transcriptome sequencing techniques. High-throughput sequencing methods showed that more than 70% of the human genome is transcribed. Furthermore, the ENCODE (Encyclopedia of DNA Elements) project showed that the majority of the human genome is expressed in the form of non-coding RNAs (ncRNAs) [2-4]. ncRNAs are often categorized according to transcript length. Thus, ncRNAs shorter than 200 nucleotides, such as microRNAs, are considered small ncRNAs. Accordingly, long non-coding RNAs (lncRNAs) are those whose transcripts exceed 200 nucleotides [5]. Many of them are longer than 2 kb, although their average length is 500 nt. Many of these molecules are transcribed by RNA pol II (RNA polymerase II), are polyadenylated and may be found in either nuclear or cytosolic fractions [6,7]. From an evolutionary point of view, most reported lncRNAs show some conservation but are less conserved than protein-coding genes [7,8]. This suggests that conservation of structure, or simply the act of expression itself, is sufficient for the biological functions of lncRNAs. In addition, lncRNA genes carry the same active chromatin marks as protein-coding genes [9].
Moreover, according to the NONCODE database, lncRNAs outnumber protein-coding RNAs in the human genome. There are around 82,300 protein-coding transcripts, compared with more than 172,000 lncRNA transcripts clustered into 96,308 gene loci [10,11]. LncRNAs are generally expressed at lower levels than protein-coding mRNAs [12] and were once thought to be transcriptional noise in the genome [13]. Even so, they have been described as important regulators of multiple biological processes, such as organ and tissue development, cellular differentiation, stem cell reprogramming and metabolic processes [14-17]. In this context, the first two lncRNAs to be discovered were H19 and XIST. H19 was discovered in the early 1980s by Pachnis et al. [18]. In 1990, Brannan and collaborators [19] revealed that H19 is 2.3 kb in length, spliced, polyadenylated and highly expressed in embryonic tissues. They also showed that during embryonic development H19 is paternally imprinted and maternally expressed. Other groups later reported that both alleles are expressed until 8 weeks of gestation, but by the time the embryo reaches 10 weeks, only the maternal H19 allele is expressed [20,21]. Postnatally, H19 is silenced in almost all tissues except skeletal muscle [22]. There it appears to play an important role as the primary microRNA precursor of the myogenic miRNAs miR-675-5p and miR-675-3p [23]. In addition, H19 is evolutionarily conserved between humans and mice [24], and overexpression of H19 is lethal in embryonic mice [22].

Subsequently, Brown et al. [25] reported XIST (X Inactive Specific Transcript), a ~19 kb lncRNA that plays an important role in X chromosome inactivation.
During female development, XIST coats one of the X chromosomes entirely. It was initially thought to silence gene expression on the inactive X chromosome by recruiting PRC2 (Polycomb Repressive Complex 2), which catalyzes H3K27me3 (histone H3 lysine 27 trimethylation), to the chromosome [26]. More recently, X inactivation by XIST was reported to depend on SHARP (SMRT and HDAC Associated Repressor Protein) and HDAC3 (Histone Deacetylase 3) [27]. Meanwhile, on the future active X chromosome, TSIX, the antisense lncRNA of XIST, prevents XIST from coating the chromosome [28]. Other lncRNAs have been described as parts of protein-containing machines. rRNAs, the structural and catalytic components of ribosomes, have been known for quite some time [29]. Similarly, the telomerase RNA component (TERC) was found to be part of the telomerase complex. It was reported to provide the template on which the telomerase catalytic protein extends the telomeric repeats [30]. Since the discovery of rRNAs, H19, XIST and TERC, high-throughput sequencing has led to the discovery of thousands of other lncRNAs. Moreover, a PubMed search shows that the number of published articles on lncRNAs increased exponentially over the last decade, from 148 publications in 2008 to 3,068 publications in 2017 (Figure 1). Elucidating the role of a given lncRNA in a specific cellular process, and its mechanism of action, is of great interest. To this end, as an initial framework for understanding their functions, it is convenient to separate lncRNAs into different categories.
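As a rough worked example, the two PubMed counts quoted above can be converted into an implied average annual growth rate and doubling time. This calculation rests on two assumptions. It uses only the two endpoint counts, $N_{2008}$ and $N_{2017}$, over the nine yearly intervals between them. It also models growth as a constant exponential rate $r$, with $T_2$ denoting the doubling time. The actual year-by-year counts in Figure 1 need not follow this smooth curve.

```latex
\frac{N_{2017}}{N_{2008}} = \frac{3068}{148} \approx 20.7,
\qquad
r = \left(\frac{N_{2017}}{N_{2008}}\right)^{1/9} - 1 \approx 0.40,
\qquad
T_2 = \frac{\ln 2}{\ln(1 + r)} \approx 2.1\ \text{years}
```

Under these assumptions, the yearly number of lncRNA publications grew by roughly 40% per year. That is equivalent to doubling about every two years across the decade.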
Functions of long non-coding RNAs

LncRNAs can be subclassified into four classes according to their genomic intersection with protein-coding genes:
(1) intergenic, which do not intersect any protein-coding locus;
(2) overlapping, in which a protein-coding gene is transcribed from an intron of the lncRNA;
(3) exonic, in which at least one lncRNA exon intersects a protein-coding exon; and
(4) intronic, in which the lncRNA is completely contained within an intron of a protein-coding gene [31].

However, this classification does not help discern function. A more informative way to categorize lncRNAs is by their subcellular localization. In these terms, nuclear lncRNAs can work as regulators of gene expression, as molecular decoys and as structural components of nuclear compartments. In contrast, cytoplasmic lncRNAs mainly function as regulators of mRNA and protein stability, as competing endogenous RNAs (ceRNAs) and as molecular decoys.
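The four intersection-based classes above can be made concrete with a small interval-comparison sketch. It is only an illustration under stated assumptions, not the procedure used in [31].

- The function names (`classify_lncrna` and its helpers) are hypothetical.
- Coordinates are half-open intervals on a single chromosome, strand is ignored, and exons within a transcript do not overlap.
- The order in which the rules are tested is an assumption (exonic first, then intronic, then overlapping).
- The "unclassified" fallback is an assumption for loci that overlap but fit none of the four definitions.

```python
from typing import List, Tuple

# Hypothetical representation: half-open genomic interval [start, end) on one
# chromosome; strand is ignored in this sketch.
Interval = Tuple[int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def contains(outer: Interval, inner: Interval) -> bool:
    """True if `inner` lies completely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def span(exons: List[Interval]) -> Interval:
    """Genomic extent of a transcript, from its first to its last exon."""
    return (min(s for s, _ in exons), max(e for _, e in exons))


def introns(exons: List[Interval]) -> List[Interval]:
    """Gaps between consecutive exons (assumes exons do not overlap)."""
    ordered = sorted(exons)
    return [(ordered[i][1], ordered[i + 1][0]) for i in range(len(ordered) - 1)]


def classify_lncrna(lnc: List[Interval], coding_genes: List[List[Interval]]) -> str:
    """Assign a lncRNA to one of the four intersection classes (hypothetical helper).

    `lnc` and each entry of `coding_genes` are lists of exon intervals.
    """
    hits = [g for g in coding_genes if overlaps(span(lnc), span(g))]
    if not hits:
        return "intergenic"  # no protein-coding locus is touched
    for gene in hits:
        if any(overlaps(a, b) for a in lnc for b in gene):
            return "exonic"  # a lncRNA exon intersects a coding exon
    for gene in hits:
        if any(contains(i, span(lnc)) for i in introns(gene)):
            return "intronic"  # the whole lncRNA sits inside one coding intron
        if any(contains(i, span(gene)) for i in introns(lnc)):
            return "overlapping"  # the coding gene sits inside one lncRNA intron
    return "unclassified"  # overlapping loci that fit none of the four rules
```

For example, take a coding gene with exons (100, 200) and (500, 600).

- A single-exon lncRNA at (250, 300) lies within the intron (200, 500) and is reported as intronic.
- A lncRNA with an exon at (150, 250) is reported as exonic.
- A lncRNA with exons (0, 50) and (700, 800) contains the whole coding gene in its intron and is reported as overlapping.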