PARSANA-DISSERTATION-2020.Pdf
Total Page:16
File Type:pdf, Size:1020Kb
DECIPHERING TRANSCRIPTIONAL PATTERNS OF GENE REGULATION: A COMPUTATIONAL APPROACH by Princy Parsana A dissertation submitted to The Johns Hopkins University in conformity with the requirements for the degree of Doctor of Philosophy Baltimore, Maryland July, 2020 © 2020 Princy Parsana All rights reserved Abstract With rapid advancements in sequencing technology, we now have the ability to sequence the entire human genome, and to quantify expression of tens of thousands of genes from hundreds of individuals. This provides an extraordinary opportunity to learn phenotype relevant genomic patterns that can improve our understanding of molecular and cellular processes underlying a trait. The high dimensional nature of genomic data presents a range of computational and statistical challenges. This dissertation presents a compilation of projects that were driven by the motivation to efficiently capture gene regulatory patterns in the human transcriptome, while addressing statistical and computational challenges that accompany this data. We attempt to address two major difficulties in this domain: a) artifacts and noise in transcriptomic data, andb) limited statistical power. First, we present our work on investigating the effect of artifactual variation in gene expression data and its impact on trans-eQTL discovery. Here we performed an in-depth analysis of diverse pre-recorded covariates and latent confounders to understand their contribution to heterogeneity in gene expression measurements. Next, we discovered 673 trans-eQTLs across 16 human tissues using v6 data from the Genotype Tissue Expression (GTEx) project. Finally, we characterized two trait-associated trans-eQTLs; one in Skeletal Muscle and another in Thyroid. Second, we present a principal component based residualization method to correct gene expression measurements prior to reconstruction of co-expression networks. In this work, we demonstrated theoretically, in simulation, and empirically, that principal ii component correction of gene expression measurements prior to network inference can reduce false positive edges. Using data from the GTEx project in multiple tissues, we showed that this approach reduced false discoveries beyond correcting only for known confounders. Third, we present a multi-study integration approach to identify universal tran- scriptional patterns underlying epithelial to mesenchymal transition (EMT) across different cancer types. With informed statistical analysis and functional validation, we identified consensus ranked universal EMT genes. This gene list consisted ofa) known EMT genes, b) genes studied in a subset of carcinomas, unknown in prostate cancer, and c) novel unknown EMT and cancer genes such as C1orf116. Finally we present methods to integrate co-expression signals across multiple human RNA-seq data to reconstruct networks with increased power. First, we considered multiple aggregation strategies to build context-agnostic networks using data from recount2. These networks captured ubiquitous patterns of gene co-expression shared across tissues and cell types. Next, we briefly describe a hierarchical mixture model groupNet that leverages signal from multiple datasets to learn the structure of a Gaussian Markov random field (GRMF) to build context-specific co-expression networks. iii Thesis Readers Dr. Alexis Battle (Primary Advisor) Associate Professor Department of Biomedical Engineering Department of Computer Science Johns Hopkins University Dr. Kenneth J. Pienta (Co-advisor) Professor Department of Urology Johns Hopkins University Dr. Michael C. Schatz Associate Professor Department of Computer Science Department of Biology Johns Hopkins University iv For my mom Bharti, and my husband Anand v Acknowledgements I’m deeply grateful to all those who have supported me through my Ph.D - in terms of science and in terms of life. You all have helped me be one whole person through this journey. I owe it to so many people. I would like to express my deepest gratitude to my advisor Alexis Battle for her constant support, guidance, and mentorship throughout my Ph.D. I am grateful to her for taking a gamble on me as her first Ph.D. student, and bringing me to this day. Alexis welcomed me as the first member of Battle lab when I was relatively new to quantitative concepts. She provided me with a rich intellectual environment and showed tremendous patience while I learned statistical concepts and machine learning. Alexis has been a role model right from the beginning, and over the years I have only come to realize how hard it is to be as good as her. I continue to be amazed by her creativity, intelligence, and scientific rigor. In addition to science, I have valued the kindness and understanding that Alexis has shown in all these years. I could not have asked for a better Ph.D. advisor. Thank you for making this ride worthwhile; thank you for everything, Alexis. You will be always be my advisor. I am thankful to my co-adviser Kenneth Pienta, for his patience and support through these years. Ken has continuously emphasized the importance of communi- cating scientific research in a way that is accessible to a wide audience. In additionto science, he has invested time into helping me develop my presentation skills. I have appreciated his understanding when projects did not work. I am extremely thankful to Ken for his selfless support in all endeavours that I took up during my Ph.D. Thank vi you for Ken, for always believing in me. I am deeply grateful to both Alexis and Ken for moulding me into the scientist that I am today. Michael Schatz, has been a source of encouragement. His scientific and career advice during different stages of graduate school have been invaluable. Thank you Mike, for always believing in me, and keeping my confidence high. I would like to thank Jeffrey Leek for the collaboration due to which I had the opportunity towork on networks correction project. I am thankful for his scientific advice on the project, and his time and mentorship when I was trying to make career decisions. I would like to thank Michael Schatz, Jeffrey Leek, and Benjamin Langmead also for agreeing to be on my GBO committee. In addition, I would also like to thank Scott Smith, the Graduate Program Director of Computer Science for his support for the founding of CS Graduate Student Council. Scott has always been a student supporter, and I am thankful to him and Zacchary Burwell for many instances when they helped and advised us (the council members) to navigate and address student concerns. I would like to thank staff members across Hopkins who helped me so much during these years - Deborah DeFord, Zachhary Burwell, Tracy Marshall, Cecelia Nina Jackson-Goode, Laura Graham, Shani McPherson, Cathy Thornton, Beverly Comegys, Diane Labuda, and Sarah Andersson. I am also thankful to Gregory Hager; it was through him that I found Alexis was going to join Johns Hopkins. It has been a pleasure to work with many talented collaborators including Brian Jo, Barbara Englehardt, Meritxell Olivia, Barbara Stranger Tuuli Lappalainen, Stephen Montgomery, Christopher Brown, Andrew Jaffe, Kasper Hansen, Claire Ruberman, James Hernandez, Sarah Amend, Gonzalo Torga, Dorhyun Johng, and William Isaacs. I am also grateful to the members of the GTEx consortium; along with great scientific vii collaborations, I’ve also found friends by the virtue of being on the consortium. Working on GTEx papers has emphasized the importance and value of team work like nothing else. I am so grateful to the tutors and organizers of the Leena Peltonen School of Human Genomics that I attended in August 2019 - Manolis Dermitzakis, Nancy Cox, Ceciliar Lindgren, Tuuli Lappalainen, Sara Pulit, Alexandre Reymond, Jacques Fellay, Ruth Loos, Len Pennacchio, Samuli Ripatti, Didier Trono, Gosia Trynka, and Cisca Wijmenga. It was the best meeting of my graduate school. I am also thankful to the organizers of the Rising Stars Biomedical workshop that I attended in 2019; particularly Archana Venkatraman, Nicholas Durr, and Sri Sarma for their valuable inputs during the workshop. There have been many people in and outside of Hopkins who have helped me in my journey as a growing scientist. I would first like to thank Giovanni Parmigianni for responding to a cold email,and giving me the opportunity to intern with his group at Dana Farber Cancer Institute in 2012. Thank you to Curtis Huttenhower for co-advising me during the internship. This internship marked the beginning of my research career. In particular, I am grateful to Levi Waldron for all his patience during the internship and teaching me everything from how to organize my code, how to interpret results, to research writing. I am amazed by his kindness and patience, and his dedication towards science. My positive experience working with Giovanni, Levi, and Curtis was the reason I decided to go to graduate school. Not having studied at one of the top schools in India, this internship was very critical for my career. Thank you Giovanni, that email indeed opened doors for nearly all the opportunities that have followed. I owe it to you. I have been fortunate to be surrounded by individuals who have given their time to share thoughts about life and career - particularly thankful to Hariharan Easwaran, Sara Pulit, Cecilia Lindgren, Leonardo Collado-Torres, Abhinav Nellore, Stephanie viii Hicks, Margaret Taub, and Shannon Ellis. They have been generous with sharing their experiences, while encouraging me to choose what worked best for me. Waking up with the excitement to go to work is the sign of a good workplace and I have found that in my labmates - a group of smart, intelligent, and yet humble folks, who all strive to maintain a positive and friendly work environment. I have enjoyed long technical discussions that helped me grow as a researcher, and the tea/coffee breaks that helped me unwind and relax. Thank you - Benjamin Strober, Yuan He, Ashis Saha, Marios Arvanitis, Prashanthi Ravichandran, Amy He, Karl Tayeb, Diptavo Dutta, Benjamin Shapiro, Yungil Kim, Taibo Li, April Kim, Rebecca Keener, Jessica Bonnie, Surya Chhetri, Ashton Omdahl, Josh Popp, Victor Wang, and Matt Figdore.