UNIVERSITY OF CALIFORNIA
Los Angeles

Cloud-based Analysis and Integration of Proteomics and Metabolomics Datasets

A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Bioinformatics

by

Jeong Ho (Howard) Choi

2019

© Copyright by Jeong Ho (Howard) Choi 2019

ABSTRACT OF THE DISSERTATION

Cloud-based Analysis and Integration of Proteomics and Metabolomics Datasets

by

Jeong Ho (Howard) Choi
Doctor of Philosophy in Bioinformatics
University of California, Los Angeles, 2019
Professor Peipei Ping, Chair

Our capacity to define cardiovascular health and disease using highly multivariate “omics” datasets has increased substantially in recent years. Advances in acquisition technologies as well as bioinformatics methods have paved the way for ultimately resolving every biomolecule comprising the various human “omes”. Understanding how different “omes” change and interact with one another over time will ultimately unveil multi-omic molecular signatures that inform pathologic mechanisms, indicate disease phenotypes, and identify new therapeutic targets. Herein we describe a thesis project that creates novel, contemporary data science methods and workflows to extract temporal molecular signatures of disease from multi-omics analyses, and develops integrated omics knowledgebases for the cardiovascular community at large.

Chapter 1 provides an overview of the cardiac physiology and pathophysiology involved in cardiac remodeling and heart failure (HF). It also describes the systematic characterization of cardiac proteomes and metabolomes, including methodologies for multi-omics phenotyping. Finally, it gives an overview of bioinformatics methods for driver molecule discovery, discussing strategies for characterizing temporal patterns and conducting functional enrichment.

Chapter 2 describes computational approaches to discern the oxidative posttranslational modification (O-PTM) proteome, an important factor in cardiac remodeling. We developed a novel platform involving a customized, quantitative biotin switch pipeline and an advanced analytic workflow to profile O-PTMs in an isoproterenol (ISO)-induced cardiac remodeling mouse model. We identified 1,655 proteins containing 3,324 oxidized sites, and unveiled the temporal progression of O-PTM in disease.

Chapter 3 discusses computational approaches for identifying temporal metabolomics fingerprints in HF treatment. Pathologic remodeling from a healthy to a diseased heart involves a series of alterations over time. Mechanical circulatory support devices (MCSD) are a promising strategy for unloading the heart and reversing this process. We sought to identify molecular drivers of pathologic remodeling and reverse remodeling in the plasma metabolome; however, the machine learning (ML)-empowered technological platforms required for these analyses are lacking. Thus, we established a Multiple Reaction Monitoring (MRM)-based mass spectrometry (MS) quantitative platform and an ML-based computational workflow to discern metabolomics fingerprints. We quantified 610 plasma metabolites and identified those exhibiting high correlation to cardiac phenotype, demonstrating a novel platform for biomarker discovery.

Finally, Chapter 4 integrates all aforementioned innovations into one unified, cloud-based computational knowledgebase, MetProt, equipped to analyze, annotate, and integrate metabolomics and proteomics information. This pipeline fully characterizes the plasma metabolome in HF, unveils the interplay of proteomes and metabolomes, and derives new knowledge in cardiovascular medicine. Innovations include engineered features for addressing large-scale clinical datasets as well as algorithms to connect various types of molecules (e.g., proteins and metabolites).
Chapter 4 is subdivided into three projects: Project 1 describes a computational pipeline to characterize the plasma metabolome using datasets from the ISO mouse model of HF and human HF; Project 2 develops bioinformatics strategies to integrate proteome and metabolome datasets from six genetically distinct mouse strains; and Project 3 establishes the cloud-based MetProt to disseminate the above computational pipelines for the cardiovascular community at large. Taken together, these innovations offer new approaches and workflows for integrated omics investigations that enable novel discovery and ultimately advance precision cardiovascular science and medicine.

The dissertation of Jeong Ho (Howard) Choi is approved.

Alex Bui
Mario C. Deng
Wei Wang
Peipei Ping, Committee Chair

University of California, Los Angeles
2019

To my family.

TABLE OF CONTENTS

Abstract of the Dissertation
Table of Contents
List of Figures
List of Tables
Recurring Notations and Abbreviations
Acknowledgements
Biographical Sketch

CHAPTER 1: INTRODUCTION AND OVERVIEW
1.I. The Landscape of the Human Heart and Disease
1.II. Systematic Characterization of the Cardiac Proteome and Metabolome
1.III. Overview of Bioinformatic Methods for Driver Molecule Identification and Integration
1.IV. Overview and Rationale

CHAPTER 2: COMPUTATIONAL APPROACHES TO DISSECT THE CYSTEINE O-PTM PROTEOME DURING CARDIAC HYPERTROPHY
2.I. Abstract
2.II. Introduction
2.III. Methods and Materials
2.IV. Results
2.V. Discussion
2.VI. Conclusions

CHAPTER 3: COMPUTATIONAL APPROACHES TO IDENTIFY METABOLOME FINGERPRINTS OF PATHOLOGICAL STAGES FOLLOWING HEART FAILURE TREATMENT
3.I. Abstract
3.II. Introduction
3.III. Methods and Materials
3.IV. Results
3.V. Discussion

CHAPTER 4: CLOUD-BASED COMPUTATIONAL KNOWLEDGEBASE TO ANALYZE, ANNOTATE, AND INTEGRATE METABOLOMICS AND PROTEOMICS DATASETS
4.I. Abstract
4.II. Introduction
4.III. Results
4.IV. Discussion
4.V. Conclusions

REFERENCES

LIST OF FIGURES

FIGURE 1.1 Hierarchical Structure of Hypothesis Tests
FIGURE 2.1 Quantitative cysteine O-PTM proteomics workflow
FIGURE 2.2 Impact of ISO-induced cardiac hypertrophy on the reversible cysteine O-PTM profile
FIGURE 2.3 Temporal profiling of reversible cysteine O-PTM proteomes using cubic spline-based clustering
FIGURE 2.4 Temporal profiling combining reversible cysteine O-PTM and total cysteine proteomes using cubic spline-based co-clustering
FIGURE 2.5 Phenotypic alteration and corresponding cysteine site fingerprints during ISO-induced cardiac hypertrophy
FIGURE 2.6 Signature pathways and proteins that contribute to cardiac hypertrophy
FIGURE 3.1 Schematic Overview of Experimental Protocol for Global Plasma Metabolomics Profiling in ISO-stimulated Mice, Healthy Humans, and HF Patients
FIGURE 3.2 Metabolomics Dynamics in Mouse Plasma During ISO-induced Cardiac Remodeling
FIGURE 3.3 Phenotypic Alteration and Corresponding Metabolic Fingerprints During ISO-induced Cardiac Remodeling in Mice
FIGURE 3.4 Healthy Human Subjects Plasma Metabolomics Stability and Their Distinction from HF Patients
FIGURE 3.5 Metabolomics Dynamics in Plasma from HF Patients Before and After MCSD Implantation
FIGURE 3.6 Temporal Changes of Clinical Manifestation and Corresponding Metabolomic Dynamics in HF Patients Following MCSD Implantation
FIGURE 4.1 MetProt Integrated Workflows
FIGURE 4.2 Intersections where metabolome meets proteome
FIGURE 4.3 MetProt Distributed System Architecture
FIGURE 4.4 MetProt User Interface

LIST OF TABLES

TABLE 2.1 Summary information of raw file contents and statistics for all experimental groups
TABLE 2.2 List of cysteine sites with alterations in modification abundance during cardiac hypertrophy
TABLE 2.3 Enriched biological pathways of proteins bearing cysteine sites with significantly increased/decreased modification abundance
TABLE 2.4 List of cysteine sites significantly correlated with the hypertrophy phenotype
TABLE 2.5 Enriched biological pathways of proteins associated with cysteine sites significantly correlated with the hypertrophy phenotype
TABLE 3.1 Biological Classes of 610 Targeted Metabolites in Mouse and Human Plasma
TABLE 3.2 Demographics of the Human Cohorts

RECURRING NOTATIONS AND ABBREVIATIONS