Systems Biology Approaches to The Study of Neurological Disorders and Somatic Cell Reprogramming

William Shin

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate School of Arts and Sciences

COLUMBIA UNIVERSITY
2017

© 2016 William Shin
All Rights Reserved

ABSTRACT

Systems Biology Approaches to The Study of Neurological Disorders and Somatic Cell Reprogramming

William Shin

This thesis describes the development of a systems biology method to study the transcriptional programs that are activated during the early and late phases of cell-fusion mediated reprogramming. It also describes an implementation of systems-level analysis that uses reverse-engineered regulatory networks to study CNS disorders such as alcohol addiction and neurodegenerative disorders such as Alzheimer’s Disease (AD) and Parkinson’s Disease (PD). The results offer an unprecedented view into the mechanisms underlying these complex processes and diseases, and demonstrate that the predictive power of these methodologies extends far beyond their original contexts.

Table of Contents

List of Figures
List of Tables
1 Introduction
1.1 Introduction
1.1.1 ARACNe and Mutual Information-based Networks
1.1.2 Interrogation of ARACNe networks: MARINa and VIPER
1.1.3 What Lies Ahead
2 Systems Biology of Heterokaryons
2.1 Introduction
2.2 Results
2.2.1 Generation of Heterokaryon Samples
2.2.2 Sequencing and Correction
2.2.3 Data Quality Control
2.2.4 Regulatory Network Analysis of Heterokaryon Data using Virtual Inference of Protein-activity by Enriched Regulon (VIPER)
2.2.5 Singular Value Decomposition (SVD) Results: Heterokaryons
2.2.6 Differentiation and Reprogramming Markers
2.2.7 HSC Reference Dataset
2.2.8 Hierarchical Clustering and VIPER on HSC dataset
2.2.9 Comparison to Heterokaryon Results (Fisher’s Exact Test)
2.2.10 Experimental Validation of the MRs controlling reprogramming
2.2.11 Validation of genes that may control the reprogramming of B cells to lineage-committed progenitors
2.2.12 Validation of genes that may control the reprogramming of lineage-committed progenitors to HSC/MPP/MLP like state
2.3 Discussion
2.4 Methods
2.4.1 ARACNe Networks
2.4.2 VIPER
2.4.3 Transcription Factors Classification for Network
2.4.4 HSC Dataset
2.4.5 Transformation for HSC and Heterokaryon dataset for VIPER analysis
2.4.6 Mapping to Human and Mouse Genome and Multi-mapping reads
2.5 Supplementary Information
2.5.1 Performance analysis of B-cell regulatory network, Murine ES and EpiSC regulatory networks
2.5.2 Analysis of B-cell regulatory network, Murine ES and EpiSC regulatory networks on iPS dataset
3 Systems Biology of Alcohol Addiction
3.1 Introduction
3.2 Results
3.2.1 Regulatory network assembly
3.2.2 Regulatory network interrogation and validation
3.3 Discussion
3.4 Materials and Methods
3.4.1 Statistical analysis
3.4.2 Gene expression profiles
3.4.3 Cleaner
3.4.4 Transcription Factor classification
3.4.5 Signaling Molecule classification
3.4.6 ARACNe
3.4.7 MARINa and Candidate Selection
3.4.8 Single-Sample MARINa and Activity Correlation Analysis
4 Systems Biology of Alzheimer’s Disease
4.1 Introduction
4.2 Results
4.2.1 Selection and assessment of the gene expression profiles dataset
4.2.2 Construction of the human neuronal transcriptional interactome
4.2.3 Identification of candidate Master Regulators using MARINa
4.2.4 Selection of candidate MRs for biochemical validation and results of experimental validation
4.3 Discussion
4.4 Conclusions
4.5 Materials and Methods
4.5.1 Dataset Processing and Normalization
4.5.2 Hierarchical Clustering
4.5.3 ARACNe
4.5.4 MARINa and Candidates Selection
4.5.5 Transcription Factors classification
4.5.6 Specificity-weighted GSEA and Bootstrapping
5 Systems Biology of Parkinson’s Disease
5.1 Introduction
5.2 Results
5.2.1 Generation of TRAP transgenic mice for the cell type-specific profiling of DA neurons and Characterization of Dat bacTRAP mice
5.2.2 Translational profiling of midbrain DA neurons in a model of PD
5.2.3 Identification of upstream transcriptional regulators of neurodegeneration
5.2.4 Subtype-specific profiling of SNpc and VTA DA neurons
5.2.5 Validation of novel determinants of SNpc DA neuron degeneration
5.3 Discussion
5.3.1 Brain regulatory network analysis and identification of MRs
5.3.2 Validation of novel drivers of DA degeneration
5.4 Methods
5.4.1 TRAPseq and gene expression analysis
5.4.2 Regulatory networks assembly
5.4.3 MR analyses on mouse translatomes
5.4.4 MR analysis on PD expression signatures
5.4.5 GSEA analysis
6 Discussion
6.1 Discussion
Bibliography

List of Figures

2.1 Flow chart showing the generation of heterokaryon samples through cell-cell fusion and subsequent FACS sorting of fused cells and paired-end sequencing.

2.2 Comparison of replicates for human gene expression during cell-fusion mediated reprogramming over five days. Spearman correlation is shown. Replicate 1 is shown on the x-axis and Replicate 2 is shown on the y-axis.

2.3 Expression of a representative set of human pluripotency markers in the heterokaryon samples during reprogramming. The results show the raw count of each gene normalized to GAPDH.

2.4 Hierarchical clustering of heterokaryon samples shows strong separation according to time point. Plotting was performed using the complete agglomeration method using the expression values of the top 2000 genes with the highest variance across all the samples.
2.5 MA plot showing differential expression of human genes after cell-cell fusion with a murine ES cell. Log2 fold-change of normalized counts is plotted on the y-axis and average log expression values are shown on the x-axis. The blue dashed line indicates a fold-change of 1. Genes displaying a significant (p-value < 0.01) change in expression are shown in red.

2.6 MA plot showing differential expression of mouse genes in an ES cell after cell-cell fusion with a human B-cell. Log2 fold-change of normalized counts is plotted on the y-axis and average normalized counts are shown on the x-axis. The blue dashed line indicates a fold-change of 1. Genes displaying a significant (p-value < 0.01) change in expression are shown in red.

2.7 Heatmap of VIPER activity for significant heterokaryon MR candidates. Heatmap shows NES values of 633 TFs that were significant (FDR < 0.01) in the heterokaryon samples. NES values were calculated using the VIPER algorithm, comparing each heterokaryon sample against the unfused B-cell. Positive NES values are shown in red, and negative NES values are shown in blue.

2.8 Plot showing total variance of VIPER activity in the heterokaryon dataset explained by each eigengene after SVD analysis. Eigengenes are plotted on the y-axis, and the proportion of variance is plotted on the x-axis.

2.9 Plot showing the eigengene levels across the sample time points for the top two eigengenes of the heterokaryon dataset. The feature levels show a bimodal pattern across time points after cell-cell fusion.

2.10 VIPER-predicted activity and differential expression of a representative set of human B-cell, pluripotency, and HSC markers during reprogramming. VIPER-predicted activities are represented by Normalized Enrichment Scores (NES), and were calculated using the VIPER algorithm, comparing the heterokaryon samples of each time point (4h, 12h, 48h, or 120h) against the unfused B-cell. Positive NES values are shown in red, and negative NES values are shown in blue. Differential expression was calculated using edgeR, which uses a negative binomial distribution to model the variance of read counts. Log2FC values are shown, with positive values shown in orange, and negative values shown in purple. A and B show the results for B-cell markers, C and D show the results for pluripotency markers, and E and F show the results for HSC markers.

2.11 Hierarchical model of hematopoietic development, adapted from Laurenti et al. and Doulatov et al. [Laurenti et al., 2013; Doulatov et al., 2010]

2.12 Hierarchical clustering of HSC samples shows significant variations between cells of the same type. Plotting was performed using the complete agglomeration method, with the expression values of the top 2000 genes with the highest variance across all the samples.

2.13 Heatmap of predicted activity for significant HSC MR candidates. Heatmap shows NES values of 445 TFs that were significant (FDR < 0.01) in each sample compared to unfused B-cell samples. NES values were calculated using the VIPER algorithm, comparing the samples of each cell type against the unfused B-cell. Positive NES values are shown in red, and negative NES values are shown in blue.

2.14 Results of transcriptional similarity between heterokaryons and cell types in the HSC dataset. Fisher’s Exact Test (FET) was used to calculate the significance of overlap between candidate TFs whose activities were both significant (FDR < 0.05) and positive in each cell type. -log10 of the p-values of the overlap are shown.

2.15 Heatmap showing the clustering of the TFs that were part of the early and late transcriptional programs in the heterokaryons.