Download.Html)[80]
Total Page:16
File Type:pdf, Size:1020Kb
2 Epigenetic determinants of cellular differentiation, transcriptional reprogramming, and human disease by Khoi Thien Nguyen Submitted to the Department of Biological Engineering on April 3, 2020, in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Biological Engineering Abstract Much of the diversity we observe in cellular and organismal phenotypes can be at- tributed to epigenetic and genetic variation. DNA provides the instructions for life, while epigenetic modifications regulate which parts of the genetic information con- tained in DNA can be read out in a given cell and how this information is interpreted. In recent years, epigenetic and genetic variation has been profiled on a large scale with sequencing-based assays, generating many datasets to be explored. In this thesis, I present three projects which apply computational techniques to identify and char- acterize epigenetic mechanisms that may contribute to the regulation of phenotypic variance. First, we mine a dataset charactering the epigenomes of diverse cell types in order to discover signatures of adult stem cell differentiation. We identify a novel marker of the multipotent state, a chromatin state characterized by the histone marks H3K36me3 and H3K9me3, and describe biological processes that may be linked to the loss of this chromatin state in fully differentiated cell types. Next, I present what we learned from profiling the epigenetic state of cells before and after transplantation into Xenopus oocytes, a process that transcriptionally reprograms the cells. This analysis elucidates how the initial epigenetic state of a cell influences the success of cellular reprogramming and identifies transcription factors that help regulate this pro- cess. Finally, we integrate studies measuring the effects of genetic variants on disease with studies measuring the effects of genetic variants on transcriptional and epigenetic activity. This identifies specific mechanisms underlying disease processes, and demon- strates that transcriptional and epigenetic mechanisms may independently contribute to disease pathogenesis. Together, these projects demonstrate the biological insights that can be gained from epigenetic profiling, and expand our understanding of the potential effects of epigenetic modifications. Thesis Supervisor: Manolis Kellis Title: Professor 3 4 Acknowledgments I am grateful to many people for their contributions to my thesis work and my expe- rience as a PhD student. First, I would like to thank my research advisor, Manolis Kellis, for his support and patience, and for giving me the opportunity to be a member of the MIT Com- putational Biology group. The breadth of people and ideas I’ve encountered here has helped me grow immensely as a scientist. I would also like to thank Doug Lauf- fenburger, Ernest Fraenkel, and Mary Gehring, for serving on my thesis committee. Your support, mentorship and advice over the years is appreciated. Next, I would like to thank my collaborators, who have been integral to my achievements. Even if projects didn’t always go the way we envisioned, I always learned a lot. I thank Pouya Kheradpour, Andreas Pfenning, Melina Claussnitzer, Je-Yoel Cho, Kei Miyamoto, Dinesh Kumar, Yongjin Park, Hailiang Huang, Lei Hou, and Carles Boix for working with me over the years and teaching me so much. By extension, I would also like to thank everyone else who has been a member of the Kellis lab, including but not definitely not limited to Alvin, Bob, Kunal, Angela, Xinchen, Abhishek, Max, Yaping, Laura, Xushen, Patty, Jackie, Na, Suvi, Leandro, Remy, Eloi, Maria, Irwin, Peter, Nate, and Nicola. You have all helped me along in my scientific journey in one way or another and have spent many hours laughing and commiserating with me. Many other people have also been a source of knowledge and comraderie, and have helped me get me through the inevitable rough patches in my PhD. Thank you to the students, faculty, and staff of the Biological Engineering department, the Muddy Charles, and all of my friends. Special thanks go to the BE class of 2013, the humans (and animals!) I’ve had the pleasure to live with, the inspirational women in my life, and those I can count on when I need to get my mind off of science. Adrienne, Deena, Natasha, Lynna, Claire, Andee, Kelly, Ashvin, Ben, Shawn, Santi, Griffin, Isaak, Naveen: thank you. Finally, I would like to thank my parents, Thoa and Dung, and my siblings, Danh 5 and Qui. They have been beside me my entire life and I am forever indebted to them for the sacrifices they’ve made to give me the opportunity to be whereIam today. Thank you for teaching me that material (and even academic) success is not the measure of a life well lived. 6 Contents 1 Introduction 17 1.1 Motivation . 17 2 Chromatin state signatures of cellular differentiation 21 2.1 Introduction . 21 2.2 Materials and methods . 23 2.2.1 Epigenome and gene expression data . 23 2.2.2 Model of differentiation effect . 23 2.2.3 Transitions . 24 2.2.4 Gene state classification . 24 2.2.5 Biological pathways . 24 2.3 Results and discussion . 25 2.3.1 The loss of the "ZNF/Rpts" chromatin state is a signature of late-stage cellular differentiation . 25 2.3.2 Chromatin state dynamics of "ZNF/Rpts"-associated regions and genes . 29 2.3.3 Changes in gene expression linked to loss of "ZNF/Rpts" . 35 2.3.4 Biological processes linked to genes that lose "ZNF/Rpts" . 37 2.4 Conclusion . 39 3 Chromatin accessibility impacts transcriptional reprogramming in oocytes 41 3.1 Introduction . 42 7 3.2 Methods . 44 3.2.1 Experimental methods . 44 3.2.2 Statistical analysis . 44 3.2.3 ATAC-seq data processing and peak calling . 44 3.2.4 Genomic distribution of peaks . 45 3.2.5 RNA-seq data . 45 3.2.6 ATAC-seq read coverage . 45 3.2.7 Correlation between samples . 46 3.2.8 Pathway analysis . 46 3.2.9 Motif analysis . 46 3.3 Results . 47 3.3.1 Optimization and Evaluation of ATAC-seq for Analyzing Cells Transplanted into Oocytes . 47 3.3.2 ATAC-seq reveals changes in chromatin accessibility after NT of somatic cells to Xenopus oocytes . 50 3.3.3 Open promoters of silent genes are permissive for transcrip- tional activation in oocytes . 52 3.3.4 Open chromatin is often maintained near genes down-regulated after NT . 53 3.3.5 Oocyte-Mediated Reprogramming Involves Opening of Closed Chromatin . 58 3.3.6 Transcription Factors in Oocytes Are Involved in Transcrip- tional Reprogramming . 58 3.3.7 Closed Chromatin Is Associated with Reprogramming-Resistant Genes . 61 3.3.8 Chromatin Opening Facilitates Transcriptional Reprogramming 61 3.4 Conclusions . 63 4 Multi-omic mediation analysis across multiple disease traits and cell types 67 8 4.1 Introduction . 67 4.2 Materials and Methods . 69 4.2.1 Datasets and processing . 69 4.2.2 Mediation analysis . 70 4.2.3 Functional analysis . 73 4.3 Results and Discussion . 73 4.3.1 Discovering molecular mediators of disease . 73 4.3.2 Every QTL type contributes to heritability explained . 80 4.3.3 Genes are linked to disease through varied mechanisms . 86 4.3.4 Functional dissection of GWAS loci . 91 4.4 Conclusion . 95 5 Conclusion 97 5.1 Summary of contributions . 97 5.2 Limitations . 98 5.2.1 Chromatin state signatures of differentiation . 99 5.2.2 Impact of chromatin accessibility on transcriptional reprogram- ming . 100 5.2.3 Multi-omic mediation analysis . 100 5.3 Looking to the future . 101 A Supplementary information related to Chapter 2 103 B Supplementary information related to Chapter 4 105 9 10 List of Figures 2-1 Strategy to identify epigenomic signatures of cellular differentiation . 25 2-2 Effect of differentiation stage on chromatin state prevalence. .26 2-3 Abundance of the "ZNF/Rpts" in cell types, grouped by differentiation stage and cell lineage . 26 2-4 Effect of differentiation stage on other chromatin features. .27 2-5 Annotation enrichment of chromatin states from the 15-state ChromHMM model . 28 2-6 Differentiation-associated change in chromatin state proportion at "ZNF/Rpts" regions . 30 2-7 Genes clustered by chromatin state composition . 31 2-8 Relative frequency distribution of genes across chromatin state clusters 32 2-9 Chromatin state composition of repressed and active genes with "ZNF/Rpts" 33 2-10 Cluster membership of genes that lose "ZNF/Rpts" during differentiation 33 2-11 Track image of a region that switches from the "ZNF/Rpts" to the "EnhG" chromatin state . 34 2-12 Gene expression changes associated with "ZNF/Rpts" loss . 36 2-13 Expression of genes in active or repressed epigenetic configuration, with or without "ZNF/Rpts" . 37 2-14 Gene sets enriched in genes that lose "ZNF/Rpts" during differentiation 38 2-15 Overlap between genes that lose "ZNF/Rpts" during differentiation in each cell lineage . 39 3-1 ATAC-seq enables the identification of open chromatin regions . 48 11 3-2 Correlation between mapped ATAC-seq reads for samples with varying numbers of cells . 49 3-3 Chromatin accessibility changes in oocyte reprogramming . 51 3-4 Track image of MEF ATAC-seq compared to reference datasets . 52 3-5 ATAC-seq reads in MEFs before and after NT were compared around TSSs in groups of genes . 54 3-6 ATAC-seq, DNase-seq, H3K4me3 ChIP-seq, and Pol II ChIP-seq at TSSs of reprogramming genes . 55 3-7 ATAC-seq reads for MEFs before and after NT around Eif4e2 and Ift46 56 3-8 Transcription factor analysis of genes downregulated after NT . 57 3-9 Track image of newly produced open chromatin during reprogramming 59 3-10 Gene categories associated with newly opened chromatin peaks after NT 59 3-11 Sequence motifs enriched in open chromatin regions after NT .