Development and Evaluation of Software for Applied Clinical Genomics

by

Casper Shyr

BSc (Computer Science and Biology), The University of British Columbia, 2010

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Doctor of Philosophy

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (Bioinformatics)

The University of British Columbia (Vancouver)

April 2016

© Casper Shyr, 2016

Abstract

High-throughput next-generation DNA sequencing has evolved rapidly over the past 20 years. The Human Genome Project published its first draft of the human genome in 2000 at an enormous cost of 3 billion dollars, and was an international collaborative effort that spanned more than a decade. Subsequent technological innovations have decreased that cost by six orders of magnitude down to a thousand dollars, while throughput has increased by over 100 times to a current delivery of gigabases of data per run. In bioinformatics, significant efforts to capitalize on the new capacities have produced software for the identification of deviations from the reference sequence, including single nucleotide variants, short insertions/deletions, and more complex chromosomal characteristics such as copy number variations and translocations. Clinically, hospitals are starting to incorporate sequencing technology as part of exploratory projects to discover underlying causes of diseases with suspected genetic etiology, and to provide personalized clinical decision support based on patients’ genetic predispositions. As with any new large-scale data, a need has emerged for mechanisms to translate knowledge from computationally oriented informatics specialists to the clinically oriented users who interact with it.
In the genomics field, the complexity of the data, combined with the gap in perspectives and skills between computational biologists and clinicians, presents an unsolved grand challenge for bioinformaticians to translate patient genomic information to facilitate clinical decision-making. This doctoral thesis focuses on a comparative design analysis of clinical decision support systems and prototypes interacting with patient genomes under various sectors of healthcare to ultimately improve the treatment and well-being of patients. Through a combination of usability methodologies across multiple distinct clinical user groups, the thesis highlights recurring domain-specific challenges and introduces ways to overcome the roadblocks for translation of next-generation sequencing from the research laboratory to a multidisciplinary hospital environment. To improve the interpretation efficiency of patient genomes and informed by the design analysis findings, a novel computational approach to prioritize exome variants based on automated appraisal of patient phenotypes is introduced. Finally, the thesis research incorporates applied genome analysis via clinical collaborations to inform interface design and enable mastery of genome analysis.

Preface

The work described in this thesis is based upon research done by Casper Shyr in Dr. Wyeth W. Wasserman’s group at the Centre for Molecular Medicine and Therapeutics (CMMT), Child and Family Research Institute (CFRI) at the BC Children’s Hospital. Part of the research was done in collaboration with the OMICS2TREATID team, led by Dr. Clara van Karnebeek, for which Casper Shyr was granted co-authorship on the resulting publications. Works that have been published in peer-reviewed scientific journals are listed below. Contributions from Casper Shyr and acknowledgements to other members are discussed below.

The work in chapters 2 and 3 was done by Casper Shyr, with support and guidance from Dr. Andre Kushniruk, Dr.
Jehannine Austin, Dr. Sohrab Shah, and Dr. Clara van Karnebeek. Dr. Kushniruk and Dr. Wasserman were particularly involved in designing the study. Dr. Kushniruk further assisted with the data analysis. Dr. van Karnebeek facilitated the recruitment process. Cynthia Ye, Alice Chou, and Dr. Ekaterina Nosova helped with the design of tutorial videos. Patrick Tan, Calvin Lefebvre, and David Arenillas supported the usability evaluations by being initial participants. Jonathan Chang and Michael Hockertz provided the equipment necessary to conduct the usability analysis. This work was supported by the Canadian Institutes of Health Research (CIHR) grant number MOP-82875, Natural Sciences and Engineering Research Council of Canada (NSERC) grant number RGPIN355532-10, Omics2TreatID, and Genome Canada/Genome BC 174DE (ABC4DE project). Icons and graphic arts incorporated into the figures and tables are modified from open repositories freely available for academic use. These two chapters have received ethical approval from the UBC Children’s and Women’s Research Ethics Board (H12-02738, H13-02034).

The work for chapter 2 was in part published as a first-author research article in C. Shyr, A. Kushniruk, W.W. Wasserman, 2014, “Usability study of clinical exome analysis software: Top lessons learned and recommendations”, Journal of Biomedical Informatics, 51, 129-136.

The work for chapter 3 was in part published as a first-author research article in C. Shyr, C.D.M. van Karnebeek, A. Kushniruk, W.W. Wasserman, 2015, “Dynamic software design for clinical exome and genome analyses: insights from bioinformaticians, clinical geneticists and genetic counselors”, JAMIA, Jun 27. pii: ocv053. doi: 10.1093/jamia/ocv053. [Epub ahead of print]

The work in chapter 4 was done by Casper Shyr, with data generation and analysis contributions from Jessica Lee and Mike Gottlieb.
Jessica was specifically involved with extracting publication records from NCBI, and Mike was involved with the calculations of dN/dS and related gene-level measurements. Dr. Maja Tarailo-Graovac formulated the experiment and oversaw the project collectively with Dr. Wyeth Wasserman. Dr. Clara van Karnebeek facilitated the clinical coordination to include a relevant patient case in the manuscript as proof-of-concept. Written informed consent was obtained from the patient’s guardian/parent/next of kin for the publication of this report. Additional acknowledgements go to Drs. J. Wu, J. Rozmus, S. Vercauteren, K. Hildebrand, T. Dewan and A. Garcera for clinical evaluation and management of the patient; Mrs. X. Han for Sanger sequencing; Mr. B. Sayson for data management; Mrs. M. Higginson for DNA extraction, sample handling and technical data; Dr. C. Vilarino-Guell for timely whole exome sequencing; Dr. W. Cheung for MeSHOP support; Mr. D. Arenillas and Mr. M. Hatas for systems support; and Dora Pak for research management support. The work was published as joint-first-authorship with Dr. Tarailo-Graovac in: C. Shyr, M. Tarailo-Graovac, M. Gottlieb, J.J.Y. Lee, C. van Karnebeek and W.W. Wasserman, 2014, “FLAGS, Frequently Mutated Genes in Public Exomes”, BMC Medical Genomics, 7, 64.

The work in chapter 5 was done by Casper Shyr, with guidance from Dr. Maja Tarailo-Graovac, Dr. Anthony Mathelier and Dr. Sohrab Shah. Jessica Lee, Wenqiang Shi, Yifeng Li, and David Arenillas provided bioinformatics advice. Mike Gottlieb supplied code that he wrote for the GeneYenta project, which was adapted for this project. I am also grateful to the clinicians involved, including Dr. S. Stockler, Dr. G. Horvath, Dr. S. Sirrs, Dr. K. Boycott, Dr. A. Lehman, Dr. G. Sinclair, Dr. H. Vallance, Dr. R. Salvarinova, and Dr. B. Gibson. Additional contributors include Ms. C. Ye for data analysis; Dr. A. Bhavsar and Dr. Linhua for wet lab experimentation; Ms. X. Han and Dr. B.
Drogemoller for Sanger sequencing; Mr. B. Sayson and Ms. K. Townsend for data management; Ms. M. Higginson for DNA extraction, sample handling and technical data; Dr. C. Vilarino-Guell for providing data on Ion Torrent-based whole exome sequencing; Dr. W. Cheung for MeSHOP support; Mr. M. Hatas for systems support; and Ms. Dora Pak for research management support. The work has not been published at the time of writing.

Chapter 6 and Chapter 7 highlight the collaborative work done with OMICS2TREATID. Casper Shyr provided bioinformatics processing and interpretive analysis on exomes and whole genomes. Casper also contributed to the write-up of the methodologies and results in the manuscripts. The work was supported by David Arenillas for software development, Miroslav Hatas for server management, and Dr. Virginie Bernard, Dr. Maja Tarailo-Graovac and Jessica Lee for bioinformatics analysis. Wet lab validations were carried out by Dr. Linhua Zhang and Xiaohua Han from Dr. Colin Ross’s laboratory.

For chapter 6, Xiaohua Han performed CA5A molecular analyses, M. Higginson and M. Zhou were responsible for DNA extraction, sample handling, and technical data, and M. Thomas and M. Lafek oversaw the consent procedure and data management. K. MacKenzie conducted psychological testing of the siblings, and J. Rilstone helped with text editing. Additional acknowledgements extend to colleagues in the Departments of Pediatrics and Medical Genetics, University of British Columbia (Canada) for clinical management of patients; Dr. D. Wishart and Dr. Rupa Mandal for NMR analyses (University of Alberta, Canada); Dr. M. Baumgärtner and Dr. P. Burda (Zürich University, Switzerland), Dr. J. Cameron (University of Toronto, Canada), Dr. M. Evans (University of Edinburgh, United Kingdom), Dr. M. Kaczocha and Dr. R. Haltiwänger (Stony Brook University, USA), Dr. G. Lavery (University of Birmingham, United Kingdom), Dr. A. Waheed and Dr. W. Sly (St. Louis University, USA), Dr. S.
Corvera (University of Massachusetts Medical School, USA) for in vitro studies