Analysis of trans eSNPs infers regulatory network architecture
Analysis of trans eSNPs infers regulatory network architecture

Anat Kreimer

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate School of Arts and Sciences

COLUMBIA UNIVERSITY
2014

© 2014 Anat Kreimer
All rights reserved

ABSTRACT

Analysis of trans eSNPs infers regulatory network architecture
Anat Kreimer

eSNPs are genetic variants associated with transcript expression levels. The characteristics of such variants highlight their importance and present a unique opportunity for studying gene regulation. eSNPs affect most genes, and their cell-type specificity can shed light on the different processes that are activated in each cell. They can identify functional variants by connecting SNPs that are implicated in disease to a molecular mechanism. Examining eSNPs that are associated with distal genes can provide insights regarding the inference of regulatory networks, but it also presents challenges due to the high statistical burden of multiple testing. Such association studies allow simultaneous investigation of many gene expression phenotypes without assuming any prior knowledge, as well as identification of unknown regulators of gene expression while uncovering directionality.

This thesis focuses on such distal eSNPs to map regulatory interactions between different loci and expose the architecture of the regulatory network defined by such interactions. We develop novel computational approaches and apply them to genetics-genomics data in humans. We go beyond pairwise interactions to define network motifs, including regulatory modules and bi-fan structures, showing them to be prevalent in real data and exposing distinct attributes of such arrangements. We project eSNP associations onto a protein-protein interaction network to expose topological properties of eSNPs and their targets and highlight different modes of distal regulation.
Overall, our work offers insights concerning the topological structure of human regulatory networks and the role genetics plays in shaping them.

Table of Contents

List of Figures
List of Tables
Acknowledgements
Chapter 1: Introduction
Chapter 2: Inference of modules associated to eQTLs
  2.1 Introduction
  2.2 Results
    2.2.1 Computational framework for detecting transcriptional modules
    2.2.2 Modules’ topology
    2.2.3 Module’s score and filtering
    2.2.4 Cis/trans-effects
    2.2.5 Independent cross validation by similar annotations from two sources of information and phenotypic analysis
    2.2.6 Comparison with standard approach to module construction
    2.2.7 Analysis of specific modules
  2.3 Discussion
  2.4 Materials and Methods
    2.4.1 Data details and processing
    2.4.2 Step 1: nominal association testing
    2.4.3 Step 2: module construction, scoring and filtering
    2.4.4 Step 3: finding secondary SNPs
    2.4.5 Analysis of dependencies within modules
    2.4.6 Module annotation
    2.4.7 Filtering modules using different criteria
    2.4.8 Enrichment of cis-effects for main SNPs
Chapter 3: Co-regulated transcripts associated to cooperating eSNPs define bi-fan motifs in human gene networks
  3.1 Introduction
  3.2 Results
    3.2.1 Computational framework for associating pairs of SNPs with pairs of genes
    3.2.2 Distribution of genomic properties of eSNP sources and their gene targets
    3.2.3 Characterizing dependencies within cooperating quartets
    3.2.4 Identifying direction of effect between eSNP sources and gene targets
    3.2.4 HLA quartet
    3.2.5 Functional enrichment of quartets
    3.2.6 Replication of quartet properties in a larger dataset
  3.3 Discussion
  3.4 Materials and Methods
    3.4.1 Data details and processing
    3.4.2 Association testing
    3.4.3 Obtaining a random distribution of association test-statistics
    3.4.4 Creating and filtering quartets
    3.4.5 Statistical challenges in comparing real vs. permuted quartets
Chapter 4: Variants in exons and in transcription factors affect gene expression in trans
  4.1 Introduction
  4.2 Results
    4.2.1 Computational framework for mapping trans associations onto the PPI network
    4.2.2 Identifying topological properties of exonic eSNP interactions
    4.2.3 Characterization of exon and transcription factor sources and targets
    4.2.4 Co-expression of targets and cis-effects on the source gene
    4.2.5 Modular organization of eSNPs in TFs
    4.2.6 Support for eSNPs in TFs from different data sources
    4.2.7 Distribution of TF sources and targets in PPI functional clusters
    4.2.8 Specific example of TF eSNP
    4.2.9 Distribution of exonic sources and targets in PPI functional clusters
    4.2.10 Specific example of exonic eSNP
    4.2.11 Mechanistic interpretation of exonic eSNPs
  4.3 Discussion
  4.4 Materials and Methods
    4.3.1 Data details and processing
    4.3.2 Association testing
    4.3.3 Obtaining a random