Leveraging Omics Data to Expand the Value and Understanding of Alternative Splicing

By Nathan T. Johnson

A Dissertation Submitted to the Faculty of WORCESTER POLYTECHNIC INSTITUTE In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy In Bioinformatics & Computational Biology

APPROVED:
Dmitry Korkin, Ph.D., Advisor, Program Director
Amity L. Manning, Ph.D., Committee Member
Zheyang Wu, Ph.D., Committee Member
Scarlet Shell, Ph.D., Committee Member
Ben Raphael, Ph.D., External Committee Member

“Science, my boy, is made up of mistakes, but they are mistakes which it is useful to make, because they lead little by little to the truth.” Jules Verne, Journey to the Center of the Earth

ABSTRACT

‘Omics’ data of diverse types, such as genomics, proteomics, transcriptomics, and epigenomics, are widely regarded as holding great promise for addressing complex health and ecological problems, such as complex genetic diseases and the destruction of farm crops by parasites. Bioinformatics makes it possible to use ‘omics’ data to gain a systems-level molecular perspective and, with it, insight into possible solutions. One possible solution is understanding and expanding the use of alternative splicing (AS) of mRNA precursors. Genes are typically treated as the main players in the molecular world. However, ‘omics’ analyses over the past decade have demonstrated that AS is the main driver of protein diversity. This is possible because AS rearranges the key components of a gene (exons, introns, and untranslated regions) to generate diverse, functionally unique proteins and regulatory RNAs. AS is highly prevalent: in humans, each gene has on average 10 AS transcripts.
Furthermore, multiple transcripts can be expressed at the same time, leading to different protein products that interact with their molecular environment in distinct ways. Which transcripts are alternatively spliced, and how often, has been shown to depend on age, tissue, cell type, and disease state. This work brings together different ‘omics’ data to expand our understanding of AS and promote its value. Specifically, six projects are described here that make use of transcriptomics, proteomics, genomics, and epigenomics, often in combination, with a focus on a couple of complex genetic diseases as well as a parasite that infects soybeans. The projects are: systematically profiling machine learning methods that use RNA-Seq-based alternative splicing expression data, to promote its use; developing a method to predict whether alternative splicing of a protein affects its interactions; a transcriptome-wide analysis comparing binding sites and domains with alternative splicing and expression patterns; assessing the effect of single nucleotide variation on protein binding sites in cancer; assessing epigenomics together with transcriptomics in the context of acute lymphoblastic leukemia; and looking for patterns of alternative splicing in parasites that infect soybeans.

ACKNOWLEDGEMENTS

First, I would like to thank my Ph.D. mentor, Dr. Dmitry Korkin, who has provided valuable guidance that shaped how I approach science. His insight and ideas constantly refined my work and challenged me to be a better researcher. I will be forever indebted to him for his mentorship. Additionally, I would like to thank my committee members, Drs. Amity Manning, Ben Raphael, Scarlet Shell, and Zheyang Wu, for their valuable feedback and suggestions, which improved the quality of my oral and written communication.
Second, I would like to thank my fellow lab members and collaborators, without whom this work could not have been completed: Andi Dhroso, Hongzhu Cui, Suhas Srinivasan, Oleksandr (Alex) Narykov, Xingyan Kuang, Katelyn (Katie) Hughes, Anastasia (Ana) Leshchyk, Nan Hu, and Nan (Alan) Zhao. Specifically, I would like to give a double thanks to Andi, Suhas, Hongzhu, and Alex, whose lengthy discussions, debates, and collaborative work gave the various works within this dissertation the quality they have.

Third, I am grateful to the University of Missouri, Worcester Polytechnic Institute, the National Science Foundation, and the National Institutes of Health for the environment and resources that allow research to exist. Explicitly, I would like to thank my external and internal collaborators, Drs. Kristen Taylor, Melissa Mitchum, Gerald Arthur, Amity Manning, Rodrigo Aldecoa, Dmitri Krioukov, Suzanne Scarlata, Michael Green, and Haiyuan Yu, as well as the Nematode Consortium. Without their problems, insight, and data, the work described here and elsewhere would not be possible. Lastly, I would like to thank Takeda Pharmaceuticals for their internship, which opened my eyes to the possibilities within science in industry.

Fourth, I would like to acknowledge the many people who shaped the scientist and person I had become before entering this Ph.D. program: Chestnut Laboratories and Drs. Glenner Richards, Stephen Badger, Elizabeth Bryda, and Robert White. Without their impact, I would never have considered a scientific career path or been equipped with the tools to excel.

Finally, I would like to thank my family for their constant support and understanding during this process. Their admiration and support helped give me the persistence to finish. Exceptional thanks go to my wife, Amanda Johnson, without whom I would never have finished.
Her constant love, support, and support (on purpose twice) gave me the motivation and perseverance. My honors and achievements are dedicated to all these people, as without them they would have been impossible.

TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF ILLUSTRATIONS
LIST OF TABLES
CHAPTER 1: Introduction and Literature Review
  1.1 21ST CENTURY PROBLEMS
  1.2 VALUE OF OMICS DATA (14)
  1.3 ALTERNATIVE SPLICING AND POSTTRANSCRIPTIONAL VARIATION IN COMPLEX GENETIC DISORDERS
  1.4 STUDYING GENETIC VARIATION IN COMPLEX GENETIC DISORDERS
  1.5 STUDYING EPIGENOMIC VARIATION IN COMPLEX GENETIC DISORDERS
  1.6 SUMMARY
  1.7 SUMMARY OF DISSERTATION
CHAPTER 2: TRANSCRIPTOMICS
  2.1 EPIGENETIC AND RNA EFFECTS IN ACUTE LYMPHOBLASTIC LEUKEMIA (ALL)
    2.1.1 Abstract
    2.1.2 Introduction
    2.1.3 Results
    2.1.3 Discussion
    2.1.3 Materials and Methods
  2.2 NOVEL GLOBAL EFFECTOR MINING FROM THE TRANSCRIPTOME OF EARLY LIFE STAGES OF THE SOYBEAN CYST NEMATODE HETERODERA GLYCINES (158)
    2.2.1 Abstract
    2.2.2 Introduction
    2.2.3 Results
    2.2.3 Discussion
    2.2.4 Materials & Methods
  2.3 BIOLOGICAL CLASSIFICATION WITH RNA-SEQ DATA: CAN ALTERNATIVE SPLICED TRANSCRIPT EXPRESSION ENHANCE MACHINE LEARNING CLASSIFIER? (239)
    2.3.1 Abstract
    2.3.2 Introduction
    2.3.3 Results
    2.3.4 Discussion
