Analytics for Improved Cancer Screening and Treatment John
Total Page:16
File Type:pdf, Size:1020Kb
Analytics for Improved Cancer Screening and Treatment by John Silberholz B.S. Mathematics and B.S. Computer Science, University of Maryland (2010) Submitted to the Sloan School of Management in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Operations Research at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY September 2015 ○c Massachusetts Institute of Technology 2015. All rights reserved. Author................................................................ Sloan School of Management August 10, 2015 Certified by. Dimitris Bertsimas Boeing Leaders for Global Operations Professor Co-Director, Operations Research Center Thesis Supervisor Accepted by . Patrick Jaillet Dugald C. Jackson Professor Department of Electrical Engineering and Computer Science Co-Director, Operations Research Center 2 Analytics for Improved Cancer Screening and Treatment by John Silberholz Submitted to the Sloan School of Management on August 10, 2015, in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Operations Research Abstract Cancer is a leading cause of death both in the United States and worldwide. In this thesis we use machine learning and optimization to identify effective treatments for advanced cancers and to identify effective screening strategies for detecting early-stage disease. In Part I, we propose a methodology for designing combination drug therapies for advanced cancer, evaluating our approach using advanced gastric cancer. First, we build a database of 414 clinical trials testing chemotherapy regimens for this cancer, extracting information about patient demographics, study characteristics, chemother- apy regimens tested, and outcomes. We use this database to build statistical models to predict trial efficacy and toxicity outcomes. We propose models that use machine learning and optimization to suggest regimens to be tested in Phase II and III clinical trials, evaluating our suggestions with both simulated outcomes and the outcomes of clinical trials testing similar regimens. In Part II, we evaluate how well the methodology from Part I generalizes to advanced breast cancer. We build a database of 1,490 clinical trials testing drug therapies for breast cancer, train statistical models to predict trial efficacy and toxicity outcomes, and suggest combination drug therapies to be tested in Phase II and III studies. In this work we model differences in drug effects based on the receptor status of patients in a clinical trial, and we evaluate whether combining clinical trial databases of different cancers can improve clinical trial toxicity predictions. In Part III, we propose a methodology for decision making when multiple mathe- matical models have been proposed for a phenomenon of interest, using our approach to identify effective population screening strategies for prostate cancer. We implement three published mathematical models of prostate cancer screening strategy outcomes, using optimization to identify strategies that all models find to be effective. Thesis Supervisor: Dimitris Bertsimas Title: Boeing Leaders for Global Operations Professor Co-Director, Operations Research Center 3 4 Acknowledgments I would first like to gratefully acknowledge my advisor Dimitris Bertsimas forhis role in the work presented in this dissertation and also for helping me mature as a researcher during my time at MIT. By adding me to his team studying the synthesis of clinical trial outcomes during the first month of my PhD, Dimitris introduced me to the research questions addressed in the first two chapters of my thesis and more broadly to the area of healthcare analytics, the focus of my entire dissertation. Dimitris’s unrelenting focus on research questions with real-world impact has helped cement in my mind the sorts of problems I should pursue as a researcher, something that will benefit me throughout my research career. I would also like to acknowledge the guidance of Vivek Farias and Retsef Levi. By serving on both my general exam committee and my thesis committee, Vivek and Retsef provided thoughtful feedback about my research that had a significant impact on the material present in this dissertation. I would like to acknowledge my collaborators on the work described in this thesis. It was a pleasure working with Stephen Relyea and Allison O’Hair on the study of gastric cancer clinical trials, which was awarded the 2013 Pierskalla Award from the INFORMS Health Applications Society. The study of breast cancer clinical trials would not have been possible without assistance in the literature review and data ex- traction from Allison O’Hair and from a number of undergraduate research assistants from both MIT and Wellesley — Emily Chen, Michael Chen, Shahrin Islam, Siva Nagarajan, David Sukhin, Pei Tao, Roza Trilesskaya, Victoria Wang, Mimi Williams, and Joanna Yeh. Finally, it has been a pleasure to collaborate with Brown Professor Tom Trikalinos on the prostate cancer screening work, and I would also like to thank MIT librarian Courtney Crummett for her assistance in the literature review for that thesis chapter. Throughout my PhD I have enjoyed spending time with and learning from fellow ORC students. In particular I wanted to thank Nataly Youssef and Swati Gupta for the great times as my friends and study buddies; Iain Dunning and Swati Gupta 5 for an enjoyable and productive research collaboration on the MQLib project; and Allison O’Hair, Iain Dunning, Angie King, Velibor Mišić, and Nataly Youssef for a successful collaboration in developing the Analytics Edge MOOC. Lastly, I wanted to thank my family for their support. My parents have always encouraged my interest in math and programming and my father introduced me to operations research in high school by connecting me with Bruce Golden, his advisor for his master’s degree. Thanks also to my in-laws for their kind words and support while I worked through my PhD. Finally, many thanks to my wife Sarah for all she has done to help me through the PhD, from serving as a sounding board for research ideas to providing feedback on manuscripts to celebrating successes to providing support through difficulties. 6 Contents 1 Introduction 19 1.1 Models to Design Drug Regimens for Advanced Cancer . 20 1.2 Optimizing Screening Strategies Using Multiple Mathematical Models 22 2 Designing Chemotherapy Regimens for Advanced Gastric Cancer 25 2.1 Introduction . 25 2.2 Clinical Trial Database . 31 2.2.1 Manual Data Collection . 33 2.2.2 An Overall Toxicity Score . 34 2.3 Statistical Models Predicting Clinical Trial Outcomes . 37 2.3.1 Data and Variables . 38 2.3.2 Statistical Models . 39 2.3.3 Statistical Model Results . 41 2.3.4 Identifying Unpromising Clinical Trials Before They Are Run 45 2.4 Design of Chemotherapy Regimens . 48 2.4.1 Phase II Regimen Optimization Model . 48 2.4.2 Phase III Regimen Selection Model . 52 2.4.3 Evaluation Techniques . 54 2.4.4 Optimization Results . 58 2.5 Discussion and Future Work . 62 3 Designing Drug Therapies for Advanced Breast Cancer 65 3.1 Introduction . 65 7 3.2 Breast Cancer Clinical Trial Database . 68 3.2.1 Manual Data Collection . 69 3.3 Statistical Models for Breast Cancer Outcomes . 73 3.3.1 Results . 75 3.3.2 Identifying Unpromising Trials . 77 3.4 Combined Databases for Toxicity Prediction . 79 3.4.1 Statistical Model Results . 81 3.5 Design of Chemotherapy Regimens . 82 3.6 Discussion . 88 4 Identifying Effective Screening Strategies for Prostate Cancer 91 4.1 Introduction . 91 4.2 Methods . 94 4.2.1 Model Parameters . 95 4.2.2 Quality of Life . 95 4.2.3 Screening Strategies . 96 4.3 Results . 97 4.3.1 Sensitivity Analysis . 100 4.4 Discussion . 102 A Data Preprocessing for Gastric Cancer Database 105 A.1 Performance Status . 105 A.2 Grade 4 Blood Toxicities . 106 A.3 Proportion of Patients with a DLT . 107 B Coefficients of Models Predicting Gastric Cancer Outcomes 111 C Pseudocode of Evaluation Techniques for Proposed Regimens 115 D Data Preprocessing for Breast Cancer Database 119 D.1 Imputation of Demographic Data . 119 D.2 Imputation of Toxicity Outcomes . 122 8 E Drugs Appearing in Breast Cancer Clinical Trial Database 125 F Literature Review Details 129 G Ranges Used in Sensitivity Analysis 131 H Building an Efficient Frontier of Screening Strategies 135 9 10 List of Figures 2-1 The results of the 372 clinical trial arms in our database with non- missing median OS and DLT proportion. The size of a point is pro- portional to the number of patients in that clinical trial arm. 37 2-2 Sequential out-of-sample prediction accuracy of survival models calcu- lated over 4-year sliding windows ending in the date shown, reported as the coefficient of determination (R2). 42 2-3 Four-year sliding-window sequential out-of-sample classification accu- racy of toxicity models, reported as the area under the curve (AUC) for predicting whether a trial will have high toxicity (DLT proportion > 0.5). 44 2-4 Performance of the ridge regression model for toxicity at flagging high- toxicity trials among the 397 trials arms in the testing set. 46 2-5 Performance of the ridge regression model for median OS at flagging trials that do not achieve top-quartile efficacy among the 397 trials arms in the testing set. 47 2-6 Median OS and DLT proportion for chemotherapy regimens that could potentially be selected for a Phase III trial experimental arm by our models. The left figure plots outcomes of the Phase II study that tested each regimen, and the right figure plots predicted outcomes for the regimen in a typical Phase III clinical trial patient population. The red points show the top five trials according to the data reported inthe Phase II trials, and the green points show the top five trials according to the prediction models. 54 11 2-7 Comparison between Phase III experimental arm suggestions (n=27) from our models and from current practice according to the simulation metric.