Computational Approaches to Somatic Exome Sequencing Data, Runjun Kumar, Washington University in St. Louis

Washington University in St. Louis
Washington University Open Scholarship
Arts & Sciences Electronic Theses and Dissertations
Arts & Sciences
Spring 5-15-2018

Discerning Drivers of Cancer: Computational Approaches to Somatic Exome Sequencing Data
Runjun Kumar
Washington University in St. Louis

Follow this and additional works at: https://openscholarship.wustl.edu/art_sci_etds
Part of the Bioinformatics Commons, and the Oncology Commons

Recommended Citation
Kumar, Runjun, "Discerning Drivers of Cancer: Computational Approaches to Somatic Exome Sequencing Data" (2018). Arts & Sciences Electronic Theses and Dissertations. 1552.
https://openscholarship.wustl.edu/art_sci_etds/1552

This Dissertation is brought to you for free and open access by the Arts & Sciences at Washington University Open Scholarship. It has been accepted for inclusion in Arts & Sciences Electronic Theses and Dissertations by an authorized administrator of Washington University Open Scholarship. For more information, please contact [email protected].

WASHINGTON UNIVERSITY IN ST. LOUIS
Division of Biology and Biomedical Sciences
Computational and Systems Biology

Dissertation Examination Committee:
Ron Bose, Chair
Donald F. Conrad
Li Ding
Obi L. Griffith
Daniel C. Link
S. Joshua Swamidass

Discerning Drivers of Cancer: Computational Approaches to Somatic Exome Sequencing Data
by
Runjun D. Kumar

A dissertation presented to The Graduate School of Washington University in partial fulfillment of the requirements for the degree of Doctor of Philosophy

May 2018
St. Louis, Missouri

© 2018, Runjun D. Kumar

Table of Contents

List of Figures
List of Tables
Acknowledgments
Abstract

1. Introduction
  1.1 Cancer Genome Sequencing
  1.2 Converting Mutations to Treatments
  1.3 Specific Approaches
    1.3.1 Identifying Cancer Genes
    1.3.2 Predicting Mutation Functional Impact
    1.3.3 Identifying Tumor Drivers in the Kinome
  1.4 Connectedness of Approaches
  1.5 Candidate Contributions

2. Identifying Cancer Genes
  2.1 Introduction
  2.2 Materials and Methods
    2.2.1 Data Gathering and Quality Control
    2.2.2 HiConf Cancer Gene Panel Construction
    2.2.3 Comparison Tools
    2.2.4 Calculation of Individual Tests
    2.2.5 Imputation of Missing Data
    2.2.6 Generation of Ensemble Model
    2.2.7 Assembly of Validation Gene Panels
    2.2.8 Cancer Subset Analysis
    2.2.9 Statistics and Software
    2.2.10 Data Availability
  2.3 Results
    2.3.1 Description of Data
    2.3.2 Developing a Panel of Known Cancer Genes
    2.3.3 Assessing Individual Tests
    2.3.4 Integration into a Single Model
    2.3.5 Detection of Validation Gene Panels
    2.3.6 Predicted Cancer Genes
    2.3.7 Application to Specific Cancer Types
  2.4 Discussion

3. Identifying Drivers with Parsimony
  3.1 Introduction
  3.2 Materials and Methods
    3.2.1 Data Gathering and Quality Control
    3.2.2 Mutation Level Descriptors
    3.2.3 Gene Level Descriptors
    3.2.4 Imputation and Data Scaling
    3.2.5 Adapting the Expectation-Maximization Algorithm
    3.2.6 Learning Initialization
    3.2.7 The E-step
    3.2.8 The M-step
    3.2.9 Algorithm Stop and Model Training
    3.2.10 Methodological Controls
    3.2.11 AUROCs for Measuring Performance
    3.2.12 Statistics and Software
    3.2.13 Code Availability & URLs
  3.3 Results
    3.3.1 ParsSNP overview
    3.3.2 Datasets & Analysis Design
    3.3.3 ParsSNP Training, Robustness and Performance
    3.3.4 Testing ParsSNP with Pan-Cancer Data
    3.3.5 Testing ParsSNP with Experimental Data
    3.3.6 Summary of ParsSNP Performance
    3.3.7 Application of ParsSNP to an Independent Dataset
    3.3.8 ParsSNP and Novel Driver Identification
