Functional Exploration of Antisense Long Non-Coding Rnas Containing Transposable Elements: a Bioinformatics Approach
Total Page:16
File Type:pdf, Size:1020Kb
Open Research Online The Open University’s repository of research publications and other research outputs Functional Exploration of Antisense Long Non-Coding RNAs Containing Transposable Elements: A Bioinformatics Approach Thesis How to cite: Gadekar, Veerendra Parsappa (2016). Functional Exploration of Antisense Long Non-Coding RNAs Containing Transposable Elements: A Bioinformatics Approach. PhD thesis The Open University. For guidance on citations see FAQs. c 2016 The Author https://creativecommons.org/licenses/by-nc-nd/4.0/ Version: Version of Record Link(s) to article on publisher’s website: http://dx.doi.org/doi:10.21954/ou.ro.0000ef4d Copyright and Moral Rights for the articles on this site are retained by the individual authors and/or other copyright owners. For more information on Open Research Online’s data policy on reuse of materials please consult the policies page. oro.open.ac.uk Functional exploration of antisense long non-coding RNAs containing transposable elements: a bioinformatics approach A thesis submitted to the Open University of London for the degree of Doctor of Philosophy by Veerendra P Gadekar Master of Science in Bioinformatics, Manipal University, Manipal- Karnataka, India Stazione Zoologica Anton Dhorn, Naples, Italy The Open University, London, United Kingdom Director of studies: Dr. Remo Sanges, Ph.D. External Supervisor: Dr. Stefano Gustincich, Ph.D. Co-supervisor: Dr. Paolo Sordino, Ph.D. January, 2016 soSrms&ion : X ° \ owme of ; 30 a-°1^ ProQuest Number: 13834602 All rights reserved INFORMATION TO ALL USERS The quality of this reproduction is dependent upon the quality of the copy submitted. In the unlikely event that the author did not send a com plete manuscript and there are missing pages, these will be noted. Also, if material had to be removed, a note will indicate the deletion. uest ProQuest 13834602 Published by ProQuest LLC(2019). Copyright of the Dissertation is held by the Author. All rights reserved. This work is protected against unauthorized copying under Title 17, United States C ode Microform Edition © ProQuest LLC. ProQuest LLC. 789 East Eisenhower Parkway P.O. Box 1346 Ann Arbor, Ml 48106- 1346 Abstract Long non-coding RNA (IncRNAs) show a wide range of regulatory functions at the transcriptional and post-transcriptional levels both in the nucleus and cytoplasm. Recently, antisense IncRNAs (ASlncRNAs) were reported to up-regulate protein synthesis post- transcriptionally through a mechanism depending on an embedded inverted SINE B2 and 5' overlap to the target mRNAs. Such ASlncRNAs are also referred as SINEUPs. Synthetic SINEUPs with identical modular organization were also demonstrated to exert the same activity suggesting a functional relationship between SINE repetitive elements and ASlncRNAs. In order to gain a broader insight on the contribution of transposable elements (TEs) in the sequence composition of ASlncRNAs, I have developed a bioinformatic pipeline that can identify and characterize transcripts containing TEs and analyze TEs coverage for different classes of coding/non-coding sense/antisense (S/AS) pairs. I aimed at identifying if the functional activity of SINEUPs could be a widespread phenomenon across multiple similar natural ASlnRNAs in the transcriptomes of the extensively studied model organisms that have a well annotated catalog of IncNRAs. From my initial analysis I identified human and mouse are the two species that showed a significant coverage enrichment of SINE repeats among ASlncRNAs. I further performed several functional enrichment analysis for the sense coding genes overlapping to ASlncRNAs taking into consideration of different characteristics of the 5' binding domain and the 3' embedded SINE repetitive elements. This permitted me to identify the effect of these modular features over the functional associations of sense coding genes. The results of the analysis showed that the products of coding genes associated to ASlncRNAs containing SINEs are significantly enriched for mitochondrial localization. Further, to determine if these ASlncRNAs could exert SINEUP-like activity during stress, I analyzed the data from a published custom microarray experiment study, that were associated to the polysome fractions of MRC5 cell lysates in control and oxidative stress condition. The results revealed that the ASlncRNA carrying inverted or direct SINE repeats and their corresponding sense coding genes do not show any significant differential polysome loading in stress with respect to normal conditions, which is not a desired characteristic of a potential SINEUR However, ASlncRNAs with inverted and direct SINE repeats corresponding to high translating polysome fractions showed a significantly higher ratio of means for RNA levels in stress over control, in contrast to noASlncRNA. This suggests that the ASlncRNA containing SINE elements are the key RNA molecules that are active during stress, although to determine if they are also involved in the increased polysome loading of their respective sense coding mRNAs, there is a need of further experimentation and exploration. Altogether, the work presented in this thesis provides a novel bioinformatics approach to study transcriptome-wide ASlncRNAs containing TEs and their functional association over the sense coding genes, and discover new significant functional features of ASlncRNA to be biologically validated. Acknowledgments Firstly, I would like to thank the Open University (OU) and Stazione Zoologica Anton Dohm (SZN), Naples, for giving me the opportunity and funding to pursue my Ph.D. I would like to express my sincere gratitude to my director of studies Dr. Remo Sanges for the immeasurable amount of support and guidance he has provided throughout my Ph.D. His motivation and supervisory role has helped me to grow as a research scientist. Thank you Remo! for showing enormous patience towards me during this whole time of research and writing of this thesis. Your words of encouragement and advice on my research and career have been priceless. I would also like to take this opportunity to thank my viva examiners - Dr. Paul Kersey (EBI, UK), Dr. Brunella Franco (TIGEM, Italy) and Dr. Marizio Ribera D'alcala (SZN, Italy) for their very helpful comments and suggestions. My sincere thanks also goes to Dr. Raffaella Casotti, the chair of the board of examiners, for organizing my Ph.D. viva. I also thank her for being very supportive and helpful throughout my Ph.D., as the coordinator of the Ph.D. programs at SZN. I would also like to acknowledge the support and invaluable assistance of Dr. Gabriella Grossi, the secretary of OU Ph.D. programs at SZN, in resolving any kind of issues and documentations related to my Ph.D. I am very thankful to my external supervisor Dr. Stefano Gustincich (SISSA, Italy), for his timely visits and valuable advice throughout my research. I am extremely grateful to receive his comments and questions that have helped me to improve my presentation skills. I also thank my co-supervisor Dr. Paolo Sordino (SZN) for being kind and supportive during all the time of research and sharing his insightful suggestions. My heartfelt thanks also goes to Mr. Prashantha Hebbar (DDI, Kuwait), he was my tutor during my master’s degree. His motivations, advice and support on many occasion were invaluable and inspired me to pursue a career in bioinformatics and research. A good support system is important to survive the ups and downs of three years of Ph.D. I was lucky to have my good friends and lab-mates for this. I thank Francesco, Giuseppe, Swaraj and Massimiliano for being there and cheering me up whenever I needed. I will never forget the innumerable coffee breaks we have taken together when we discussed science and philosophy. I would also like to thank all my batch-mates at SZN, with whom I share a lot of common experience of growing up as a researcher. I can’t thank enough to my dear friend Harpreet, who has been my weekend gaming partner! I just cannot forget hours of playing PES and COD in his Xbox. Being in such a special city like Naples became even more memorable to me because of all my cheerful friends. I would like to thank Pacos, Veer, Christopher, Mauro, Gianni for accompanying me to the sightseeing, in and around Napoli and for BBQ Hangouts! My special thanks goes to a veiy sweet Italian family with whom I stayed during this time. I am thankful to Maria, Paolo and Roberto for treating me as a family member and teaching me to cook Italian cuisines! I would always cherish the time we spent together. Finally, I would like to thank my family. Words cannot express how grateful I am to my parents for their love, support and all of the sacrifices that you’ve made for me. I am very grateful to my elder brother who is also my role model and my two lovely elder sisters for being so supportive during my away time from home. The last word of acknowledgment I have saved for my dear wife Sweta. Thank you for having unwavering love and support towards me. I really appreciate the extra-special care you have shown during this whole time of my Ph.D., it helped me to stay motivated until the writing of this thesis. Table of Contents Abstract ii Acknowledgments iv Table of contents vii List of Figures xv List of Tables xvii List of Abbreviations xviii Chapter 1 Introduction 1.1. Long non-coding RNAs in the post-genomic era 1 1.1.1. Brief history of the genomic era ....................................................................... 1 1.1.2. Paradoxes associated with organismal complexity ......................................... 4 1.1.2.1. C-Value paradox ................................................................................... 4 1.1.2.2. N or G-value paradox ........................................................................... 5 1.1.2.3. Role of non-coding DNA in organismal complexity ........................ 7 1.1.3. Major projects venturing transcriptomes ......................................................... 10 1.1.3.1. FANTOM (Functional annotation of mouse) ..................................... 10 1.1.3.2. ENCODE (ENCyclopedia Of DNA Elements) .................................