Integrative Analysis of Multi-Modality Data in Cancer
INTEGRATIVE ANALYSIS OF MULTI-MODALITY DATA IN CANCER

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By Chao Wang
Graduate Program in Electrical and Computer Engineering
The Ohio State University
2015

Dissertation Committee:
Dr. Kun Huang, Advisor
Dr. Raghu Machiraju, Co-advisor
Dr. Umit V. Catalyurek
Dr. Charles L. Shapiro
Dr. Lori A. Dalton

Copyright by Chao Wang 2015

Abstract

Gleaning insights into highly complex, heterogeneous cancer biology requires data collected at different levels: genetic, genomic and phenotypic. Individuals are highly diverse, with a wide spectrum of clinical, pathologic, and molecular features. Traditionally in clinical settings, phenotypic data such as histopathological images are used for diagnosis, subtyping, staging, prognosis and treatment. With the advent of new high-throughput biotechnologies, multiple modalities of genomic and genetic data provide extremely valuable information for cancer research and clinical biomarker discovery. However, determining causal relationships in these multi-modality data, and integrating the data effectively to gain a better understanding of cancer biology, remain challenging. In particular, the molecular basis of the cellular phenotypes manifest in histopathological images remains unknown and unexplained. In this dissertation, I present a new analytic framework and accompanying computational methods to facilitate integrative analyses of multi-modality biomedical data. The first part of this volume describes the extraction of image features, which enables quantitative analysis of cellular structures. Our feature collections include texture features, previously discovered salient features and features designed to mimic the observations of a trained pathologist.
In the next part, studies that establish genotype–phenotype links using morphological features from histopathology are presented. Molecules and molecular events associated with breast cancer morphology are discovered. In the third part, beyond pairwise correlations, I explore the multivariate molecular basis of lung adenocarcinoma morphology. This study suggests that a cellular structure can be a potential target in the treatment of lung adenocarcinoma. Finally, the last part aims to develop computational methods that can jointly cluster cancer patient samples based on multi-modality data. These integrative clustering methods allow patient stratification based on both essential categorical attributes and multi-dimensional data from different sources. I demonstrate the application of these methods using datasets pertaining to breast cancer. The proposed image processing workflows, the collection of morphological features, the analytical framework that links molecular expression to morphological measurements, and the integrative clustering methods show potential in revealing the biological basis and new therapeutic targets of various types of cancer. The results from the studies indicate biologically interesting subtypes with potential biomarkers. The frameworks and methodologies presented in this dissertation can mine large and complex collections of data to identify new comprehensive biomarkers and generate new hypotheses.

Dedication

This document is dedicated to my family.

Acknowledgments

First and foremost, I would like to thank my research advisor Dr. Kun Huang for the support and guidance he has given me throughout the entire period of my work. He spent days revising my badly written papers and provided me with the scientific thinking that I lack as an engineering student. It has been a great pleasure to work with him. Without his help, it would have been impossible for me to finish this research. I would also like to thank my co-advisor Dr.
Raghu Machiraju for his unconditional help and suggestions. His intellectual and philosophical guidance always reminds me to keep thinking deeply. He provided me with many opportunities, such as fellowships and an internship, during my study. He also spent many nights reviewing and revising my papers. I am grateful for his help. I thank Dr. Shapiro for his collaboration and for providing wonderful resources for my study, including deep clinical knowledge. I'm grateful to Dr. Catalyurek and Dr. Dalton for being part of my dissertation committee, and for providing wonderful ideas and constructive comments. I thank Dr. Parag Mallick from Stanford for his helpful discussions and support for my trip to Stanford. I would like to thank Dr. Debra Zynger for her help with pathological knowledge. I would like to thank Dr. Lin Yang's lab for collaboration on the lung adenocarcinoma project. I'm very thankful to Dr. Yang Xiang for his willingness to help. I thank lab members Cenny Taslim, Hao Ding, Nan Meng, Qihang Li, Michael Sharpnack, Brian Arand and all the others. The Ph.D. journey would have been harder without their support. The chalkboard discussions and exciting ideas are the most unforgettable things from those days. I want to thank the friends I know at OSU, Cong Wang, Dong Zhang and Xinyu Li, and the friends I know in the Bay Area. I would like to thank visiting scholars Drs. Zhi Han, Hongxing Yuan and Shengjun Xu for their great advice. I'd like to acknowledge financial support from The Ohio State University and The Howard Hughes Medical Institute Grad-to-Med fellowship and Pelotonia fellowship. Most importantly, I'm very grateful to my parents for their kindness and endless support. I feel guilty for not accompanying them during my 8 years of graduate school and 5 years of undergraduate study. Without their support, I could not have completed my 13-year journey of university study. I would like to thank my fiancée Yue for standing by me through good times and bad times.
I owe my family everything and I will spend the rest of my life loving them and appreciating them.

Vita

2007: B.S. Electrical and Computer Engineering, Dalian University of Technology, China
2009: M.S. Electrical and Computer Engineering, Dalian University of Technology, China
2009-10: Software Engineer, Ericsson, Beijing, China
2013: M.S. Electrical and Computer Engineering, The Ohio State University
2010-12: Graduate Research Assistant, Electrical and Computer Engineering, The Ohio State University
2010-12: OSU-HHMI Med-Into-Grad Fellow, The Ohio State University
2014 to present: Pelotonia Graduate Fellow, The Ohio State University

Publications

Wang C, Pécot T, Zynger DL, Machiraju R, Shapiro CL, Huang K. Identifying survival associated morphological features of triple negative breast cancer using multiple datasets. J Am Med Inform Assoc. 2013;20(4):680–7.

Wang C, Machiraju R, Huang K. Cancer Patient Integrative Stratification via a Two-step Consensus Clustering of Molecular Expression and Clinical Attributes. AMIA Summit on Translational Bioinformatics. 2014.

Wang C, Machiraju R, Huang K. Breast cancer patient stratification using a molecular regularized consensus clustering method. Methods. 2014 Jun 1;67(3):304-12.

Ding H, Wang C, Huang K, Machiraju R. iGPSe: A visual analytic system for integrative genomic based cancer patient stratification. BMC Bioinformatics. 2014;15(1):203.

Hu Y, Wang C, Huang K, Xia F, Parvin JD, Mondal N. Regulation of 53BP1 Protein Stability by RNF8 and RNF168 Is Important for Efficient DNA Double-Strand Break Repair. PLoS One.
2014;9(10):e110522.

Wu Y, Kwak KJ, Agarwal K, Marras A, Wang C, Mao Y, … Lee LJ. Detection of extracellular RNAs in cancer and viral infection via tethered cationic lipoplex nanoparticles containing molecular beacons. Anal Chem. 2013;85(23):11265–11274.

Gao P, Postiglione MP, Krieger TG, Hernandez L, Wang C, Han Z, … Shi S-H. Deterministic Progenitor Behavior and Unitary Production of Neurons in the Neocortex. Cell. 2014;159(4):775–788.

Fields of Study

Major Field: Electrical and Computer Engineering

Table of Contents

Abstract
Dedication
Acknowledgments
Vita
Table of Contents
List of Tables
List of Figures
Chapter 1 : Introduction
1.1 Motivation
1.2 Background