Genomic Data Mining for Functional Annotation of Human Long Noncoding RNAs, by Brian L. Gudenas

Clemson University
TigerPrints
All Dissertations
Dissertations
5-2018

Genomic Data Mining for Functional Annotation of Human Long Noncoding RNAs
Brian L. Gudenas
Clemson University, [email protected]

Follow this and additional works at: https://tigerprints.clemson.edu/all_dissertations

Recommended Citation
Gudenas, Brian L., "Genomic Data Mining for Functional Annotation of Human Long Noncoding RNAs" (2018). All Dissertations. 2146.
https://tigerprints.clemson.edu/all_dissertations/2146

This Dissertation is brought to you for free and open access by the Dissertations at TigerPrints. It has been accepted for inclusion in All Dissertations by an authorized administrator of TigerPrints. For more information, please contact [email protected].

GENOMIC DATA MINING FOR FUNCTIONAL ANNOTATION OF HUMAN LONG NONCODING RNAs

A Dissertation
Presented to the Graduate School of Clemson University

In Partial Fulfillment of the Requirements for the Degree
Doctor of Philosophy
Genetics

by
Brian L. Gudenas
May 2018

Accepted by:
Dr. Liangjiang Wang, Committee Chair
Dr. Hong Luo
Dr. Anand Srivastava
Dr. Rajandeep Sekhon

ABSTRACT

Life may have begun in an RNA world, a hypothesis supported by the increasingly vital roles that RNA has been shown to play in biological systems. To understand how the genome encodes life, one must look to the transcriptome, the set of all RNA molecules in a cell. The transcriptome shows which RNA transcripts are expressed and when, and this orchestrated network of gene expression is responsible for multicellular development. In humans, most genes are noncoding RNAs, meaning that they do not encode proteins. The largest class of noncoding genes is long noncoding RNAs (lncRNAs), RNA transcripts longer than 200 nucleotides that lack protein-coding capacity. Some lncRNAs have been shown to be key regulators; however, most lncRNAs are uncharacterized. Therefore, we developed genomic data mining methodologies for lncRNA functional annotation.
Many lncRNAs are brain-specific and their dysregulation is suspected to be involved in neurodevelopmental disorders. Two prevalent brain disorders are intellectual disability (ID) and autism spectrum disorder (ASD), which are genetically heterogeneous with unidentified genetic risk factors. In this study, we created brain developmental gene coexpression networks for ID and ASD to identify lncRNAs associated with known disease genes. We found lncRNAs highly coexpressed with ID genes which harbored ID-associated copy number variants (CNVs). To find ASD-associated lncRNAs, we identified lncRNAs differentially expressed in the ASD brain and then refined these candidates by filtering for associations with ASD risk genes in a human brain developmental coexpression network. These candidate ASD-associated lncRNAs were associated with the synaptic transmission and immune response pathways, in addition to residing within ASD-associated CNVs at a high frequency.

The mechanism by which lncRNAs function is partly determined by functional motifs in the RNA transcript sequence. To identify lncRNA motifs, we developed a genetic algorithm capable of finding long motifs and found a motif associated with lncRNA nuclear localization. LncRNA functions are compartmentalized within the cell; therefore, knowledge of lncRNA subcellular localization provides insight into their biological function. We developed a deep learning model that predicts lncRNA subcellular localization from lncRNA transcript sequences. This model obtained high prediction accuracy on lncRNAs with known localizations, suggesting that sequence motifs are involved in subcellular localization. In summary, we developed genomic data mining methods for the functional characterization of lncRNAs based on their expression patterns and transcript sequences.

DEDICATION

For my mom and dad: without their lifelong love and support, I would never have made it this far.
ACKNOWLEDGMENTS

I want to sincerely thank my research advisor, Dr. Wang, for his excellent mentorship on bioinformatics, genetics and science. I would not be the scientist I am today without his guidance. I also want to extend my gratitude to my graduate committee members, Dr. Luo, Dr. Srivastava and Dr. Sekhon. They have been incredible resources of scientific wisdom and have greatly helped my academic development.

TABLE OF CONTENTS

TITLE PAGE
ABSTRACT
DEDICATION
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES

CHAPTER

I. LITERATURE REVIEW OF THE FUNCTIONAL ANNOTATION OF LONG NONCODING RNAS THROUGH GENOMIC DATA MINING
    Introduction
    Biological function of lncRNAs
    Subcellular localization of lncRNAs
    lncRNAs in human disease
    Genomic data mining
    Expression-based lncRNA functional annotation
    Sequence-based lncRNA functional annotation
    Concluding remarks

II. GENE COEXPRESSION NETWORKS IN HUMAN BRAIN DEVELOPMENTAL TRANSCRIPTOMES IMPLICATE THE ASSOCIATION OF LONG NONCODING RNAS WITH INTELLECTUAL DISABILITY
    Introduction
    Methods
    Results and Discussion
    Conclusion

III. INTEGRATIVE GENOMIC ANALYSES FOR IDENTIFICATION AND PRIORITIZATION OF LONG NONCODING RNAS ASSOCIATED WITH AUTISM
    Introduction
    Results and Discussion
    Conclusions
    Methods

IV. A GENETIC ALGORITHM FOR FINDING DISCRIMINATIVE FUNCTIONAL MOTIFS IN LONG NONCODING RNA
    Introduction
    Methods
    Results
    Conclusion

V. PREDICTION OF LNCRNA SUBCELLULAR LOCALIZATION WITH DEEP LEARNING FROM SEQUENCE FEATURES
    Introduction
    Methods
    Results
    Conclusion

VI. CONCLUSION

APPENDICES
    A: Additional files
    B: Supplementary figures

REFERENCES

LIST OF TABLES

Table
2.1 Prioritized list of ID-associated lncRNA candidates
5.1 Performance metrics on the test set

LIST OF FIGURES

Figure
1.1 Functional themes of lncRNAs
1.2 LncRNA cellular functions
1.3 Overview of coexpression network analysis
