Protein Structure-Guided Approaches to Identify Functional Mutations in Cancer Sohini Sengupta Washington University in St

Washington University in St. Louis Washington University Open Scholarship Arts & Sciences Electronic Theses and Dissertations Arts & Sciences Winter 12-15-2018 Protein Structure-Guided Approaches to Identify Functional Mutations in Cancer Sohini Sengupta Washington University in St. Louis Follow this and additional works at: https://openscholarship.wustl.edu/art_sci_etds Part of the Bioinformatics Commons Recommended Citation Sengupta, Sohini, "Protein Structure-Guided Approaches to Identify Functional Mutations in Cancer" (2018). Arts & Sciences Electronic Theses and Dissertations. 1688. https://openscholarship.wustl.edu/art_sci_etds/1688 This Dissertation is brought to you for free and open access by the Arts & Sciences at Washington University Open Scholarship. It has been accepted for inclusion in Arts & Sciences Electronic Theses and Dissertations by an authorized administrator of Washington University Open Scholarship. For more information, please contact [email protected]. WASHINGTON UNIVERSITY IN ST. LOUIS Division of Biology and Biomedical Sciences Computational and Systems Biology Dissertation Examination Committee: Li Ding, Chair Greg Bowman Barak Cohen Cynthia Ma Chris Maher Protein Structure-Guided Approaches to Identify Functional Mutations in Cancer by Sohini Sengupta A dissertation presented to the Graduate School of Washington University in partial fulfillment of the requirements for the degree of Doctor of Philosophy December 2018 St. Louis, Missouri © 2018, Sohini Sengupta Table of Contents Acknowledgments ................................................................................................................... v Abstract of the Dissertation .................................................................................................... ix Chapter 1: Introduction .......................................................................................................... 1 1.1 Existing computational methods to identify driver mutations ..................................................... 3 1.1.1 Shortcomings of Current Computational Methods .................................................................... 7 Chapter 2: HotSpot3D: A computational algorithm to identify intra- and inter-molecular mutation clusters in protein structure ................................................................................... 11 Preface ........................................................................................................................................... 11 2.1 Abstract .................................................................................................................................... 13 2.2 Introduction ............................................................................................................................. 14 2.3 Results ...................................................................................................................................... 15 2.3.1 Intra- and inter-mutation clusters across 19 cancer types ....................................................... 15 2.3.2 Significant mutation clusters with cancer type specificity ........................................................ 17 2.3.3 Rare and medium recurrence functional mutation discovery .................................................. 18 2.3.4 Validation by protein array and functional experiment ........................................................... 20 2.3.5 Mutation-drug networks and clinical implications ................................................................... 23 2.4 Discussion ................................................................................................................................ 26 2.5 Methods ................................................................................................................................... 27 2.5.1 HotSpot3D and code comparison ............................................................................................. 28 2.5.2 Data preprocessing ................................................................................................................... 28 2.5.3 3D proximal pairs analysis ........................................................................................................ 29 2.5.4 Drug interaction module .......................................................................................................... 31 2.5.5 Cancer mutation data set and cancer types ............................................................................. 31 2.5.6 Identifying mutation and drug-mutation clusters .................................................................... 32 2.5.7 Prioritizing clusters with high cluster closeness ....................................................................... 33 2.5.8 Cluster conservation score ....................................................................................................... 34 2.5.9 Cluster validation ...................................................................................................................... 35 2.5.10 Mutation and drug annotations ............................................................................................. 36 2.5.11 Prioritized variant list for functional validation ...................................................................... 37 2.5.12 Software engineering aspects ................................................................................................ 37 2.6 Supplementary Note ................................................................................................................ 39 2.6.1 Performance assessment and comparison to existing tools .................................................... 39 2.6.2 Intra- and inter-mutation clusters across 19 cancer types ....................................................... 40 2.6.3 Significant mutation clusters with cancer type specificity ........................................................ 40 ii 2.6.4 Mutation-drug networks and clinical implications ................................................................... 41 2.6.5 SUPPLEMENTARY REFERENCES ................................................................................................ 42 2.7 Figures ...................................................................................................................................... 43 Table 1. Top (cluster closeness > 2.5) drug-mutation clusters with HGNC gene families and drug classifications from NIH and DrugBank. .......................................................................................... 56 References ..................................................................................................................................... 58 Chapter 3: Integrative Omics Analyses Broadens Treatment Targets in Human Cancer .......... 61 Preface ........................................................................................................................................... 61 3.2 Abstract .................................................................................................................................... 63 3.3 Background .............................................................................................................................. 64 3.4 Methods ................................................................................................................................... 66 3.4.1 Construction of Database of Evidence for Precision Oncology (DEPO) .................................... 66 3.4.2 Pan-Cancer Cohort and Cancer Types ...................................................................................... 67 3.4.3 Collection of Mutations in Pan-Cancer Cohort ......................................................................... 68 3.4.4 Drug-associated Mutations in Pan-Cancer Cohort ................................................................... 68 3.4.5 Proximity-Based Clustering of Drug-associated Mutations with Pan-Cancer Cohort ............... 70 3.4.6 Druggable Expression Outliers in Pan-Cancer Cohort ............................................................... 71 3.4.7 Fusion Analysis .......................................................................................................................... 72 3.4.8 Proteomic Analysis with CPTAC Mass-Spectrometry Data ....................................................... 72 3.4.9 Cell Line Based Validation ......................................................................................................... 73 3.4.10 Experimental Validation ......................................................................................................... 75 3.4.11 Integrative Omics Analysis of Druggability ............................................................................. 76 3.4.12 Druggability and Demographics .............................................................................................. 76 3.5 Results ...................................................................................................................................... 77 3.5.1 Database of Evidence for Precision Oncology .......................................................................... 77 3.5.2 Drug-associated Mutations in Pan-Cancer Cohort ................................................................... 78 3.5.3 Protein Structure-Based Clustering of Drug-Associated Mutations ......................................... 81 3.5.4 Druggable Gene and Protein

Load more