Doctoral Dissertation by Yongsheng Huang
Integrative Statistical Learning with Applications to Predicting Features of Diseases and Health

by

Yongsheng Huang

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Bioinformatics) in The University of Michigan 2011

Doctoral Committee:
Professor Alfred O. Hero III, Co-Chair
Professor Jay L. Hess, Co-Chair
Professor Daniel Burns Jr
Professor Gilbert S. Omenn
Associate Professor Kerby Shedden

© Yongsheng Huang 2011
All Rights Reserved

I dedicate this dissertation to my parents and my sisters. It is their unconditional love that gave me the courage and perseverance to continue on this long and winding road towards personal and professional improvement. For so many years, they have quietly and patiently waited for me to grow up.

I dedicate this work to my true friend Jiehua Guo, who is like a brother to me and helped me tremendously at many critical moments along this journey.

I also dedicate my dissertation to the University of Michigan for granting me the privilege of access to its invaluable educational resources. The time I spent studying here will always be one of the most significant parts of my life.

ACKNOWLEDGEMENTS

This dissertation would not have been even remotely possible without the guidance of Professor Alfred Hero, period. Professor Hero brought me into the world of mathematical statistics and taught me the true meaning of statistical thinking. He often worked with me late into the night and early in the morning, going through each analysis that I had performed and every sentence of the manuscripts that I had written. He demonstrated dedication and a rigorous attitude towards science. Most importantly, he always challenged my intellectual capacity by showing me multiple elegant statistical approaches to a problem. That was the most powerful formula for motivating me to study more and think harder, and that is when I overcame the laziness inside of me and abandoned the temptation to settle for an easy solution.

Dr. Jay Hess kindly welcomed me into his laboratory to work with him and his team on the important problem of the Hoxa9 protein. His scientific vision provided the biggest support to my research at the most challenging times, when things did not piece together or make sense. He encouraged me to take responsibility for and ownership of my research, and he accepted my mistakes with forgiveness.

As mentors, both Prof. Hero and Dr. Hess genuinely cared about my career development. They gave me complete academic freedom to pursue my research interests and to develop my professional skills. They showed, by example, passion, scholarship, and mastery in scientific research. They emphasized and fostered independent thinking and problem-solving ability. There has never been a doubt in my mind that I was granted a once-in-a-lifetime privilege to work with these two great mentors.

During my study in Bioinformatics, Dr. Omenn was always on my side whenever I made important decisions. From finding academic mentors to choosing a post-graduation career path, I have always been prepared and blessed with his wisdom, encouragement, and positive energy.

Professor Daniel Burns went out of his way to help me even before I arrived at Michigan. Over the years, he helped me so many times that I lost count. But I do know that my experience would have been much, much harder without his help.

Professor Kerby Shedden was the first committee member I met. He interviewed me on the recruitment day, and we have known each other ever since. He has been kind and supportive to me and my research. But what would I say if I were asked what I will always remember from all my interactions with him? Without a second of doubt, I would say it has to be the four words he gave me on the study of statistics: “know the stuff cold”. Plain and simple! I have never stopped working on it.
Over the years, I have also been very fortunate to have had the opportunity to collaborate with four groups of exceptional researchers from all around the world. I thank Dr. Aimee Zaas, Dr. Geoffrey Ginsberg, Dr. Christopher Woods, Dr. Timothy Veldman, and Ms. Christine Øien at Duke University for the exceptional challenge study they managed and the invaluable discussions they contributed to my research. I also thank the members of the Hess laboratory, especially Dr. Kajal Sitwala, Joel Bronstein, Daniel Sanders, and Monisha Dandekar. They provided me with superb data and biological insights. I thank Dr. Gordon Robertson and Mr. Timothee Cezard at the Genome Science Center in Vancouver, Canada. They generously shared with me their valuable knowledge of ChIP-sequencing. In particular, I thank Dr. Robertson, who literally taught me everything that I needed to know about next-generation sequencing analysis and showed me how quality research is done at high speed.

I am also thankful to the members of the Hero group, particularly Dr. Mark Kliger, Dr. Arvind Rao, Dr. Patrick Harrington, Mr. Yilun Chen, Mr. Kevin Xu, and Mr. Arnau Tibau Puig, who helped my research in many ways. They always reminded me how similar statistics problems are approached from engineering perspectives: different disciplines, distinct applications, but the same magic.

I am grateful to Ms. Julia Eussen, Ms. Denise Taylor-Moon, and Ms. Michelle Curry in the Bioinformatics Program; Ms. Lynn McCain in the Department of Pathology; and Ms. Michele Feldkamp in the Department of Electrical Engineering and Computer Science. Their wonderful assistance has protected me from things that would have taken a lot of time away from my research. I especially want to thank Ms. Linda Kents in the UM International Center for her extraordinary support and advice on critical international student issues.
TABLE OF CONTENTS

DEDICATION
ACKNOWLEDGEMENTS
LIST OF FIGURES
LIST OF TABLES
ABSTRACT

CHAPTER

I. Introduction
    1.1 Research Overview
    1.2 Outline of Dissertation
    1.3 Contributions of Dissertation
    1.4 List of Relevant Publications and Software

II. Temporal Dynamics of Host Molecular Responses Differentiate Symptomatic and Asymptomatic Influenza A Infection
    2.1 Introduction
    2.2 Results
        2.2.1 Temporal gene expression profiling of host transcriptional response
        2.2.2 Screening for genes with different temporal profiles between asymptomatic and symptomatic hosts
        2.2.3 Co-clustering differentially expressed genes based on temporal expression dynamics
        2.2.4 Intense activation of TLR and non-TLR mediated signaling in symptomatic subjects
        2.2.5 A non-passive asymptomatic state is characterized by down-regulated expression of the NLRP3 inflammasome, CASP5 and the IL1B pathway
        2.2.6 Distinct temporal kinetics of JAK-STAT pathway and SOCS family genes reveal a potential method of viral control in asymptomatic hosts
        2.2.7 Ribosomal protein synthesis is enhanced in asymptomatic subjects as compared to symptomatic subjects
        2.2.8 Unsupervised detection of disease signature with Bayesian Linear Unmixing (BLU)
        2.2.9 Early and late phase disease stratification using a logistic boosting model
    2.3 Discussion
    2.4 Materials and Methods
    2.5 Acknowledgements
    2.6 Supplementary Materials
    2.7 Supplementary Discussion

III. Towards Early Detection: Temporal Spectrum of Host Response in Symptomatic Respiratory Viral Infection
    3.1 Introduction
    3.2 Results
        3.2.1 Similarity clustering: analysis of differential expression for temporal profiling
        3.2.2 Discriminatory clustering: Analysis of differential expression for disease state prediction
    3.3 Discussion
    3.4 Materials and Methods

IV. Identification of Hoxa9 and Meis1 Regulatory Functions
    4.1 Introduction
    4.2 Results
        4.2.1 High confidence Hoxa9/Meis1 (H/M) binding sites were determined with ChIP-seq analysis
        4.2.2 Genome-wide analysis showed dominant distal binding of Hoxa9 and Meis1
        4.2.3 Hoxa9 and Meis1 selectively bind to DNA sequences that are highly evolutionarily conserved
        4.2.4 H/M peaks show high potential of regulatory functions
        4.2.5 H/M peaks show epigenetic signatures that are characteristic of enhancers
        4.2.6 Temporal gene expression reveals Hoxa9 regulation on genes mediating proliferation, inflammation and differentiation
        4.2.7 De novo motif discovery suggests binding of H/M collaborators
        4.2.8 Motif enrichment analysis (MEA) revealed tiered organization of transcriptional control
        4.2.9 Epigenetic state at H/M peaks is correlated with specific motif configuration
    4.3 Discussion
    4.4 Materials and Methods
        4.4.1 Statistical Analysis