Patterns and Signals of Biology: an Emphasis on the Role of Post
Total Page:16
File Type:pdf, Size:1020Kb
University of Nebraska at Omaha DigitalCommons@UNO Student Work 5-2016 Patterns and Signals of Biology: An Emphasis On The Role of Post Translational Modifications in Proteomes for Function and Evolutionary Progression Oliver Bonham-Carter Follow this and additional works at: https://digitalcommons.unomaha.edu/studentwork Part of the Computer Sciences Commons Patterns and Signals of Biology: An Emphasis On The Role of Post Translational Modifications in Proteomes for Function and Evolutionary Progression by Oliver Bonham-Carter A DISSERTATION Presented to the Faculty of The Graduate College at the University of Nebraska In Partial Fulfillment of Requirements For the Degree of Doctor of Philosophy Major: Information Technology Under the Supervision of Dhundy Bastola, PhD and Hesham Ali, PhD Omaha, Nebraska May, 2016 SUPERVISORY COMMITTEE Chair: Dhundy (Kiran) Bastola, PhD Co-Chair: Hesham Ali, PhD Lotfollah Najjar, PhD Steven From, PhD Pro Que st Num b e r: 10124836 A ll rig hts re se rve d INFO RMA TION TO A LL USERS The quality of this reproduction is dependent upon the quality of the copy submitted. In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will b e no te d . Also , if m a te ria l ha d to b e re m o ve d , a no te will ind ic a te the d e le tio n. Pro Que st 10124836 Pub lishe d b y Pro Que st LLC (2016). Co p yrig ht o f the Disserta tio n is he ld b y the Autho r. A ll rig hts re se rve d . This work is protected against unauthorized copying under Title 17, United States Code Microform Edition © ProQuest LLC. Pro Q u e st LLC . 789 East Eisenho we r Pa rkwa y P.O. Box 1346 Ann Arb o r, MI 48106 - 1346 Abstract Patterns and Signals of Biology: An Emphasis On The Role of Post Translational Modifications in Proteomes for Function and Evolutionary Progression Oliver Bonham-Carter, MS, PhD University of Nebraska, 2016 Advisors: Dhundy (Kiran) Bastola, PhD and Hesham Ali, PhD After synthesis, a protein is still immature until it has been customized for a specific task. Post-translational modifications (PTMs) are steps in biosynthesis to perform this customization of protein for unique functionalities. PTMs are also important to protein survival because they rapidly enable protein adaptation to environmental stress factors by conformation change. The overarching contribution of this thesis is the construction of a computational profiling framework for the study of biological signals stemming from PTMs associated with stressed proteins. In particular, this work has been developed to predict and detect the biological mechanisms involved in types of stress response with PTMs in mitochondrial (Mt) and non-Mt protein. Before any mechanism can be studied, there must first be some evidence of its existence. This evidence takes the form of signals such as biases of biological actors and types of protein interaction. Our framework has been developed to locate these signals, distilled from “Big Data” resources such as public databases and the the entire PubMed literature corpus. We apply this framework to study the signals to learn about protein stress responses involving PTMs, modification sites (MSs). We developed of this framework, and its approach to analysis, according to three main facets: (1) by statistical evaluation to determine patterns of signal dominance throughout large volumes of data, (2) by signal location to track down the regions where the mechanisms must be found according to the types and numbers of associated actors at relevant regions in protein, and (3) by text mining to determine how these signals have been previously investigated by researchers. The results gained from our framework enable us to uncover the PTM actors, MSs and protein domains which are the major components of particular stress response mechanisms and may play roles in protein malfunction and disease. iii Copyright Copyright 2016, Oliver Bonham-Carter. iv We never work alone in this business. Oliver Bonham-Carter Dedication I dedicate this work to my wife; Janyl and son; Vincent, my sisters; Kate and Daisy, my brother; Alexander, and my mother and father; Jane and Nicholas. These are the people who I, likely, subjected to many long-winded and unwarranted updates about the progression of this document and that of my degree, in general. This work is also dedicated to my advisors Dr. Dhundy (Kiran) Bastola, and Dr. Hesham Ali who served on my doctorate committee with Dr. Lotfollah Najjar and Dr. Steven From. By your constant and tireless efforts, mentoring, and coaching, I have been inspired to create an exciting work of scientific investigation. I would also like to thank the following wonderful people for their constant warmth, support and friendship: Ishwor Thapa, Dr. Jasjit Banwait-Kaur, Sean West and Scott McGrath. It is very likely that I also subjected these good people to far too many general updates on this work and I thank them earnestly for their counsel. This work is also dedicated to my friends from the bioinformatics lab at the University of Nebraska at Omaha: Dr. Sanjukta Bhowmick, Dr. v Kate Cooper, Julia Warnke, Vladimir Ufimtsev, Jay Pedersen, Kritika Karri, Kaitlin Goettsch, Asuda Sharma, Sunandini Sharma, Suyeon Kim, Sahil Sethi and many others too! Without the encouragement from my family and friends, none of this work would have been possible to return to our scientific community, from whom Ihavetakensomuch. vi Contents 1 Introduction 1 1.1SignalsAfterMechanisms........................ 1 1.2 A Focus On PTMs, Proteins And Their Typical Stress Responses . 2 1.2.1 Protein Regulation And Modification .............. 4 1.2.2 Sudden Stresses .......................... 7 1.2.3 PTMLocalizationAndStudy.................. 8 1.2.4 ProteinDomains......................... 9 1.3MitochondrialAndNon-MitochondrialProtein............. 10 1.3.1 Mitochondria: Energy Production And Stress Response . 12 1.3.1.1 AnEnergyCrisis.................... 13 1.3.2 MechanismsFromSignalDetection............... 16 2 The Ubiquitous Nature Of Signals In Biology And Living Systems 18 2.1TrafficSignals............................... 18 2.2SignalsOfTheHeart........................... 21 2.3NaturalSignalsOfOrganismalBiology................. 23 2.3.1 Mobile Organisms ......................... 23 2.3.2 Immobile Organisms ....................... 25 2.4Signals,SemanticsAndComprehension................. 27 vii 3 Objectives 29 3.1Motivation................................. 29 3.2GeneralOrganization........................... 30 3.3 Thesis Contributions ........................... 31 3.3.1 Contribution 1 .......................... 31 3.3.2 Contribution 2 .......................... 31 3.3.3 Contribution 3 .......................... 32 3.3.4 Contribution 4 .......................... 32 3.3.5 Contribution 5 .......................... 33 3.3.6 Contribution 6 .......................... 33 3.3.7 Contribution 7 .......................... 34 3.3.8 Contribution 8 .......................... 34 3.3.9 Contribution 9 .......................... 35 3.3.10Contribution10.......................... 35 4 Alignment-Free Genetic Sequence Comparisons: A Review of Recent Approaches By Word Analysis 37 4.1Abstract.................................. 37 4.2Introduction................................ 38 4.3Background................................ 41 4.4FactorFrequencies............................ 41 4.4.1 BBCByAnalysisOfMutualInformation............ 42 4.4.2 FeatureFrequencyProfiles(FFPs)............... 44 4.4.2.1 A Comparison By Frequencies And The Jensen-ShannonDivergenceTest........... 46 4.4.3 Suffix Trees By k-merFrequencies................ 49 4.4.4 Composition Vectors Based On k-merFrequencies....... 50 4.4.4.1 Creation Of Composition Vectors, (CVs, CCVs) . 51 viii 4.4.4.2 CreationOfImprovedCCV’s............. 52 4.4.4.3 DistanceMeasurement................. 53 4.4.5 ARevisedStringCompositionMethod............. 54 4.4.6 D2 Statistic............................ 56 4.5 Data Compression And Dictionaries ................... 59 4.5.1 Text Compression Algorithms .................. 61 4.5.2 Average Common Substring (ACS) ............... 62 4.6 Applications Of Alignment-Free Methods ................ 64 4.6.1 Biological Data And Sequence Assembly ............ 64 4.6.2 ChromosomalDataAndPhylogeny............... 65 4.6.3 Horizontal Gene Transfer ..................... 66 4.7AdvantagesAndDisadvantagesOfMethods.............. 68 4.8Conclusion................................. 70 4.9ArticleDetails............................... 71 5 An Analysis Of Palindromic Content In The Coding And Non-coding DNA Regions Of Bacteria 73 5.1Abstract.................................. 73 5.2Introduction................................ 74 5.3Methods.................................. 74 5.4ResultsAndDiscussion.......................... 77 5.4.1 Lengths-{4,6,8,10} Palindromes ................. 78 5.5Conclusions................................ 79 5.6ArticleDetails............................... 80 6 A Base Composition Analysis Of Natural Patterns For The Pre- Processing Of Metagenome Sequences 81 6.1Abstract.................................. 81 ix 6.2IntroductionAndRelatedWork..................... 82 6.3Methods.................................. 85 6.3.1 GenomeSequences........................ 85 6.3.2 ReadAndContigSequences................... 86 6.3.3 Motifs............................... 87 6.3.3.1 BaseCompositionsAndSpectrumSets........ 87 6.3.4 Proportions............................ 90 6.4ResultsAndDiscussion.......................... 91 6.4.1 SequenceData.........................