UCLA Electronic Theses and Dissertations
Total Page:16
File Type:pdf, Size:1020Kb
UCLA UCLA Electronic Theses and Dissertations Title An Informatics Roadmap Toward a FAIR Understanding of Mitochondrial Biology and Rare Mitochondrial Disease Permalink https://escholarship.org/uc/item/5h21d18r Author Garlid, Anders Olav Publication Date 2019 Peer reviewed|Thesis/dissertation eScholarship.org Powered by the California Digital Library University of California UNIVERSITY OF CALIFORNIA Los Angeles An Informatics Roadmap Toward a FAIR Understanding of Mitochondrial Biology and Rare Mitochondrial Disease A thesis submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Molecular, Cellular, & Integrative Physiology (MCIP) by Anders Olav Garlid 2019 © Copyright by Anders Olav Garlid 2019 ABSTRACT OF THE DISSERTATION An Informatics Roadmap Toward a FAIR Understanding of Mitochondrial Biology and Rare Mitochondrial Disease by Anders Olav Garlid Doctor of Philosophy in Molecular, Cellular, & Integrative Physiology (MCIP) University of California, Los Angeles, 2019 Professor Peipei Ping, Chair Mitochondrial biology is integral to our fundamental understanding of human health and many diseases. They exist in every human cell type except for red blood cells and have critical functions in metabolism, oxidative phosphorylation, oxidation-reduction, and as signaling hubs responsible for mediating protective mechanisms. Rare mitochondrial diseases (RMDs) are devastating and complex, affect multiple organ systems, and disproportionately impact young children. Despite copious existing knowledge and increased public interest, the knowledge is fragmented and difficult to access. Clinical case reports (CCRs) on RMDs contain valuable clinical insights, but they are scarce and lack the metadata necessary to facilitate their discovery among the two million CCRs on PubMed. The unstructured text data of CCRs is also ill-suited to computational approaches, limiting our ability to derive the knowledge contained within. To address these issues, I assembled all available informatics tools and resources with mitochondrial components and used them to contribute to Gene Wiki pages that enable easy access to mitochondrial knowledge for researchers, students, clinicians, and patients. Through these efforts, I made mitochondrial gene, protein, and disease knowledge widely accessible with contributions of over 4MB of content across 541 Gene Wiki pages. Concurrently, I used Gene Wiki as an educational platform to train over 50 students in the biosciences and pre-medical ii studies in mitochondrial biology and disease, as well as instilling effective research and writing methods in biomedicine. To impose structure on CCRs and render them FAIR (Findable, Accessible, Interoperable, Reusable), I developed and applied a standardized metadata template to RMD CCRs and codified patient symptomology with the International Statistical Classification of Disease and Related Health Problems (ICD) system. I created the open-source, cloud-based MitoCases RMD Knowledge Platform (http://mitocases.org/) to house data on 384 RMD CCRs, including 4,561 instances of 952 unique ICD codes. Supplementing CCRs with structured metadata amplifies machine-readable information content and provides a distinct improvement in searching for CCRs as compared to indexing by title and abstract. Finally, I employed these resources to conduct a thorough review of Barth syndrome and characterized the diversity of presentations, range of genetic etiologies, and treatment paradigms. iii The dissertation of Anders Olav Garlid is approved. Alex Ahn-Tuan Bui Mario C. Deng Thomas M. Vondriska Xia Yang Peipei Ping, Committee Chair University of California, Los Angeles 2019 iv DEDICATION PAGE This thesis is dedicated to my Mom, Randi Brannan, my Dad, Keith Garlid, my brother, Torleif Garlid, Fiona Palmer, Alec Emmons, and in loving memory of Nicole Deland, all of whom have supported me, pushed me, and inspired me to keep striving. v TABLE OF CONTENTS ABSTRACT ...................................................................................................................................ii COMMITTEE PAGE .....................................................................................................................iv DEDICATION ................................................................................................................................v TABLE OF CONTENTS ...............................................................................................................vi LIST OF FIGURES ......................................................................................................................vii LIST OF TABLES .......................................................................................................................viii ACKNOWLEDGEMENTS ............................................................................................................ix VITA/BIOGRAPHICAL SKETCH .................................................................................................xi CHAPTER 1. INTRODUCTION .....................................................................................................1 CHAPTER 2. EQUIPPING PHYSIOLOGISTS WITH AN INFORMATICS TOOL CHEST: TOWARD AN INTEGRATED MITOCHONDRIAL PHENOME .................................................... 13 CHAPTER 3. A METADATA EXTRACTION APPROACH FOR CLINICAL CASE REPORTS TO ENABLE ADVANCED UNDERSTANDING OF BIOMEDICAL CONCEPTS ............................... 39 CHAPTER 4. A REFERENCE SET OF CURATED BIOMEDICAL DATA AND METADATA FROM CLINICAL CASE REPORTS ...................................................................................................... 50 CHAPTER 5. A FAIR REPRESENTATION OF MITOCHONDRIAL BIOLOGY AND RARE MITOCHONDRIAL DISEASE VIA GENEWIKI AND MITOCASES ............................................. 68 CHAPTER 6. TAZ ENCODES TAFAZZIN, A TRANSACYLASE ESSENTIAL FOR CARDIOLIPIN FORMATION AND CENTRAL TO THE ETIOLOGY OF BARTH SYNDROME ........................ 109 CHAPTER 7. CONCLUSION ................................................................................................... 143 vi LIST OF FIGURES CHAPTER 1. INTRODUCTION Figure 1-1 ..................................................................................................................................3 CHAPTER 3. A METADATA EXTRACTION APPROACH FOR CLINICAL CASE REPORTS TO ENABLE ADVANCED UNDERSTANDING OF BIOMEDICAL CONCEPTS Figure 1. Workflow for Case Report Annotation. .................................................................... 44 Figure 2. Identification of Concept-Specific Text in a Clinical Case Report. ........................... 44 CHAPTER 4. A REFERENCE SET OF CURATED BIOMEDICAL DATA AND METADATA FROM CLINICAL CASE REPORTS Figure 1. Data creation workflow. ........................................................................................... 53 Figure 2. Contents of the MACCR dataset. ............................................................................ 54 Figure 3. Geographic regions with difference in publication frequency. ................................. 60 CHAPTER 5. A FAIR REPRESENTATION OF MITOCHONDRIAL BIOLOGY AND RARE MITOCHONDRIAL DISEASE VIA GENEWIKI AND MITOCASES Figure 5-1. Targeting cardiac mitochondrial proteins for article improvement on Gene Wiki.. 71 Figure 5-2. Mitochondrial Gene Wiki article improvement workflow. ...................................... 73 Figure 5-3. Mitochondrial Gene Wiki article completion progression. .................................... 78 Figure 5-4. Cloud-based MitoCases platform architecture. .................................................... 87 Figure 5-5. MitoCases MySQL database schema. ................................................................. 89 Figure 5-6. Data API endpoints. ............................................................................................. 90 Figure 5-7. Case Report Search. ............................................................................................ 93 Figure 5-8. Metadata upload functionality for user contributions. ........................................... 94 Figure 5-9. Comparing MeSH terms and ICD-11 codes for Barth syndrome CCRs. .............. 97 Figure 5-10. Genetic etiologies in selected mitochondrial diseases. ...................................... 98 Figure 5-11. PubMed case report search results. ................................................................. 100 CHAPTER 6. TAZ ENCODES TAFAZZIN, A TRANSACYLASE ESSENTIAL FOR CARDIOLIPIN FORMATION AND CENTRAL TO THE ETIOLOGY OF BARTH SYNDROME Figure 6-1. Tafazzin domains and mutation frequency, type, and pathogenicity. ................. 128 Figure 6-2. Complex symptomology of Barth syndromse codified by ICD-10. ..................... 129 vii LIST OF TABLES CHAPTER 2. EQUIPPING PHYSIOLOGISTS WITH AN INFORMATICS TOOL CHEST: TOWARD AN INTEGRATED MITOCHONDRIAL PHENOME Table 1. Mitochondrial resources and websites. ..................................................................... 19 Table 2. Mitochondrial entries in existing big resources. ........................................................ 22 Table 3. Publicly accessible mitochondrial datasets. .............................................................. 25 Table 4. Analysis pipelines and platforms ............................................................................... 28 CHAPTER 3. A METADATA EXTRACTION APPROACH FOR CLINICAL CASE REPORTS TO ENABLE ADVANCED UNDERSTANDING OF BIOMEDICAL CONCEPTS Table 1. Disease Categories for