Discovering Pharmacogenomic Variants and Pathways with the Epigenome and Spatial Genome
Total Page:16
File Type:pdf, Size:1020Kb
The Pharmacoepigenomics Informatics Pipeline and H-GREEN Hi-C Compiler: Discovering Pharmacogenomic Variants and Pathways with the Epigenome and Spatial Genome by Ari Lawrence Allyn-Feuer A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Bioinformatics) in the University of Michigan 2018 Doctoral Committee: Professor Brian D. Athey, Chair Assistant Professor Alan P. Boyle Professor Ivo D. Dinov Research Professor Gerald A. Higgins Professor Wolfgang Sadee, The Ohio State University Ari Lawrence Allyn-Feuer [email protected] ORCID iD: 0000-0002-8379-2765 © Ari Allyn-Feuer 2018 Dedication In the epilogue of Altneuland, Theodor Herzl famously wrote: “Dreams are not so different from deeds as some may think. All the deeds of men are dreams at first, and become dreams in the end.” Medical advances undergo a similar progression, from invisible to visible and back. Before they are accomplished, advances in the physician’s art are illegible: no one differentiates from the rest the suffering and death which could be alleviated with methods which do not yet exist. Unavoidable ills have none of the moral force of avoidable ones. Then, for a brief period, beginning shortly before it is deployed in mainstream practice, and slowly concluding over the generation after it becomes widespread, an advance is visible. People see the improvements and celebrate them. Then, subsequently, for the rest of history, if we are lucky, such an advance is more invisible than it was before it was invented. No one tallies the children who do not get polio, the firm ground that used to be a malarial swamp, or the quiet fact of sanitation. Few remark on the novelty of the un-novel. ii This thesis contemplates medical advances which do not yet exist, which are invisible. It contemplates the means by which medical science may make the genetic component of variation in disease risk, drug response, adverse events, and other medical phenotypes visible to the clinician so that informed decisions on the basis of this information may improve patient outcomes. It is offered in the fervent hope that it can contribute, in a small way, to improving and hastening the subsequent work of making these informed decisions possible. Accordingly, it is dedicated to the patients, currently largely anonymous, who suffer invisibly in ways which can be ameliorated by genetic testing and by actions undertaken on the basis of knowledge gleaned from genetic testing. May their alleviated suffering soon be equally invisible. iii Acknowledgements This work is the result of seven years of exertion, during which time I have been aided, in a variety of ways ranging from the direct to the oblique, by a very large number of people. It is cliché for an author to announce in the acknowledgements of a work the impossibility of acknowledging everyone who deserves it. This truism has never felt more true, for me, than it does now. I therefore first acknowledge those whom I have inadvertently omitted. I acknowledge all the scientists in this field and outside it who shared their work with me, by publication, presentation, or conversation. I acknowledge the patients, literally millions of them, who volunteered for the clinical research studies, including GWAS, on which this work and all the cited works are based. I acknowledge all of my teachers, professors, editors, mentors, and other instructors over the last twenty five years. I acknowledge my colleagues and coauthors. iv I acknowledge all the members of my thesis committee, particularly my advisor, Brian D. Athey, and Gerry Higgins. I acknowledge everyone whom I consulted for advice and who reviewed drafts of any of my publications and other documents, including this thesis. I acknowledge everyone on whom I practiced any of my various talks. I acknowledge everyone in the support staff of the department and university who accommodated all my various administrative and technical needs. I acknowledge everyone who endured a social slight or any form of inattention from me because I was working. I acknowledge all of my friends and lovers. I acknowledge most of all the support, encouragement, and assistance of my family. v Preface This work describes my modest contribution to a scientific project which has been underway, in one form or another, for thousands of years: the analysis and parsing of genetic inheritance and the use of genetic inferences to aid humanity’s unceasing labors. Specifically, it describes a suite of bioinformatics tools I have constructed, in an attempt to translate recent advances in the use of the epigenome and spatial genome to identify and parse important phenotypic variants, to an area of medical science, pharmacogenomics, which has largely failed to benefit from them. The bulk of this work is based on published and soon-to-be-published papers which were collaborative in nature and have other authors. I have endeavored in this narrative to tell the story of my work, but that narrative involves collaborative work which is not fully separable. Where there is any doubt, I urge you to maximize your appraisal of my colleagues. I begin by describing, in the first chapter, the preeminent role these advances have taken in biomedical science and in genetics in other contexts, and the disjunction between this picture and the picture in pharmacogenomics. This chapter is partially based on the 2015 review [Higgins et al 2015]. vi I then describe, in the second chapter, three variant analyses I and others undertook to explore pharmacogenomic phenotypes with this powerful new explanatory framework, in neuropsychiatric small molecule medications, lithium, and valproic acid, and the success of these analyses in discovering mechanistic and predictive variants. This chapter is largely based on the primary papers reporting these experiments [Higgins et al 2015, Higgins et al 2015, Higgins et al 2017]. I then describe, in the third chapter, the Pharmacoepigenomics Informatics Pipeline, a tool I built with the object of making such analyses easier to perform, more reproducible, and amenable to advanced methods which were not tractable in the semi-manual analytic mode in which such analyses had previously been run. I further describe the success of the PIP in reproducing the glutamatergic lithium pathway previously discovered, and in uncovering a previously unknown genetic basis for warfarin response. This chapter is largely based on the PIP paper [Allyn-Feuer et al 2018]. I then describe, in the fourth chapter, the H-GREEN Hi-C compiler, a tool I built to function as one portion of a target gene module for a future version of the PIP, to address the finding of distal target genes, an important gap which existing methods could not fill. I further describe its success at detecting distal chromatin contacts existing methods could not find in a head-to-head comparison, and in building spatial contact networks of phenotypically important chromatin domains. This chapter is currently being adapted for publication. vii I conclude by describing, in the fifth and final chapter, my vision for a future evolved PIP featureset, for the use of such pipelines to develop genetic tests, and for the extension of such methods in parallel across thousands of phenotypes to develop a pharmacophenomic atlas. This chapter is currently being adapted for publication, and a small part was based on the 2018 machine learning review [Kalinin et al 2018]. This work is not complete; no scientific work is ever truly complete. There are logical developments to be undertaken, separate elements to combine, more experiments to perform, and ultimately, widespread clinical deployments to be undertaken. It is my fond wish that this work take place, under whatever aegis, so that future students may have more to learn, and so that some of those who are sick may be well. Ari Lawrence Allyn-Feuer Ann Arbor, Michigan, United States of America July 2018 viii Table of Contents Dedication ...................................................................................................................................... ii Acknowledgements ...................................................................................................................... iv Preface ........................................................................................................................................... vi List of Figures .............................................................................................................................. xii Abstract ....................................................................................................................................... xiv Chapter 1: Pharmacoepigenomics .............................................................................................. 1 Pharmacogenomics Lags Other Genetics Disciplines in Benefiting from the Epigenome . 1 The GWAS Interpretation Challenge ..................................................................................... 4 The Epigenome ........................................................................................................................ 10 The Spatial Genome ................................................................................................................ 17 Pharmacoepigenomics ............................................................................................................ 26 The Paucity of Pharmacoepigenomics in Clinical Practice ................................................