bioRxiv preprint doi: https://doi.org/10.1101/311167; this version posted May 3, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
i i “main” — 2018/5/2 — 9:50 — page 1 — #1 i i
Bioinformatics doi.10.1093/bioinformatics/xxxxxx Advance Access Publication Date: Day Month Year Manuscript Category
Research article Causal gene regulatory network inference using enhancer activity as a causal anchor Deepti Vipin 1, Lingfei Wang 2, Guillaume Devailly 1, Tom Michoel 2,∗ and Anagha Joshi 1,∗
1Division of Developmental Biology and 2Division of Genetics and Genomics, The Roslin Institute, The University of Edinburgh, Easter Bush, Midlothian, Eh25 9RG, UK.
∗To whom correspondence should be addressed. Associate Editor: XXXXXXX
Received on XXXXX; revised on XXXXX; accepted on XXXXX
Abstract Motivation: Transcription control plays a crucial role in establishing a unique gene expression signature for each of the hundreds of mammalian cell types. Though gene expression data has been widely used to infer the cellular regulatory networks, the methods mainly infer correlations rather than causality. We propose that a causal inference framework successfully used for eQTL data can be extended to infer causal regulatory networks using enhancers as causal anchors and enhancer RNA expression as a readout of enhancer activity. Results: We developed statistical models and likelihood-ratio tests to infer causal gene regulatory networks using enhancer RNA (eRNA) expression information as a causal anchor and applied the framework to eRNA and transcript expression data from the FANTOM consortium. Predicted causal targets of transcription factors (TFs) in mouse embryonic stem cells, macrophages and erythroblastic leukemia overlapped significantly with experimentally validated targets from ChIP-seq and perturbation data. We further improved the model by taking into account that some TFs might act in a quantitative, dosage-dependent manner, whereas others might act predominantly in a binary on/off fashion. We predicted TF targets from concerted variation of eRNA and TF and target promoter expression levels within a single cell type as well as across multiple cell types. Importantly, TFs with high-confidence predictions were largely different between these two analyses, demonstrating that variability within a cell type is highly relevant for target prediction of cell type specific factors. Finally, we generated a compendium of high-confidence TF targets across diverse human cell and tissue types. Availability: Methods have been implemented in the Findr software, available at https://github.com/lingfeiwang/findr Contact: [email protected], [email protected]
1 Introduction hundreds of human cell types, with an estimated 1.4% of the human genome associated to putative promoters and about 13% to putative Despite having the same DNA, gene expression is unique to each cell type enhancers. Enhancers physically interact with promoters to activate gene in the human body. Cell type specific gene expression is controlled by expression. Although the general rules governing these interactions (if any) short DNA sequences called enhancers, located distal to the transcription remain poorly understood, experimental techniques such as chromosome start site of a gene. Collaborative efforts such as the FANTOM (Andersson conformation capture (3C, 4C) combined with next generation sequencing et al., 2014) and Roadmap Epigenomics (Kundaje et al., 2015) projects (Hi-C) (Mifsud et al., 2015) as well as computational methods based on have now successfully built enhancer and promoter repertoires across correlations between histone modifications or DNase I hypersensitivity at enhancers with the expression of nearby promoters (Thurman et al.,
© The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: [email protected] 1
i i
i i bioRxiv preprint doi: https://doi.org/10.1101/311167; this version posted May 3, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
i i “main” — 2018/5/2 — 9:50 — page 2 — #2 i i
2 Vipin et al.
2012; He et al., 2014) are continually improving at predicting enhancer- an alternative (Halt) hypothesis, to support or reject the causal model promoter interactions. In contrast, understanding how the activation of one E → A → B, where E is an eQTL/enhancer in the regulatory region gene leads to the activation or repression of other genes, i.e. uncovering of gene A, and B is a putative target gene: primary linkage (E → A), the structure of cell-type specific transcriptional regulatory networks, secondary linkage (E → B), conditional independence (E → B only remains a major challenge. It is known that promoter expression levels of through A), B’s relevance (E → B or correlation between A and transcription factors (TFs) co-express and cluster together with promoters B), and excluding pleiotropy (partial correlation between A and B after of functionally related genes (Forrest et al., 2014), but without any conditioning on E). The log-likelihood ratios (LLRs) are computed for additional information such associations are merely correlative and do all possible targets of each gene, and then converted into p-values and not indicate a causal regulation by the TF. posterior probabilities (Storey and Tibshirani, 2003) of the alternative Statistical causal inference aims to predict causal models where the hypothesis being true; see Wang and Michoel (2017) for details. manipulation of one variable (e.g. expression of gene A) alters the We applied three treatments to enhancer expression data. First, we distribution of the other (e.g. expression of gene B), but not necessarily regarded enhancers as binary (on/off) variables, and after binarizing vice versa (Pearl, 2009). A key role in causal inference is played by causal the data (see Methods), used the existing Findr to predict TF targets anchors, variables that are known a priori to be causally upstream of others, directly. This approach will be referred to as Findr-B (“binary”). Second, and can be used to orient the direction of causality between other, relevant we adapted all five tests in Findr to use continuous instead of discrete variables. A major application of this principle has been found in genetical causal anchor data, and used this method on (untransformed) eRNA data. genomics or systems genetics: genetic variations between individuals This approach will be referred to as Findr-C (“continuous”). Third, to alter molecular and organismal phenotypes, but not vice versa, so these accommodate the co-existence of binary and continuous enhancers for quantitative trait loci (QTL) can be used as causal anchors to determine different TFs within the same dataset, we developed an automatic adaptive the direction of causality between correlated traits from population-based method to independently treat each enhancer as binary or continuous, data (Schadt et al., 2005; Chen et al., 2007; Rockman, 2008; Li et al., depending on the relative strength of the primary enhancer-TF linkage 2010). Such pairwise causal associations can then be assembled into causal with either method. We call this approach Findr-A (“adaptive”). gene networks to model how genetic variation at multiple loci collectively affect the status of molecular networks of genes, proteins, metabolites and downstream phenotypes (Schadt, 2009). Interestingly, several experiments have recently shown that enhancer 3 Methods regions can be transcribed to form short, often bi-directional transcripts, 3.1 Datasets called enhancers RNAs or eRNAs (Natoli and Andrau, 2012). eRNA expression is correlated with and crucially, precedes the expression of • We used Cap Analysis of Gene Expression (CAGE) data (TPM target genes (Arner et al., 2015). Though the functional role of eRNAs expression values) from the FANTOM5 Consortium for enhancer and remains to be understood, the presence of eRNA from a regulatory transcription start sites (TSS) in mouse embryonic stem (ES) cells region is an indicator of enhancer activity (Danko et al., 2015), and (36 experiments), macrophages (224 experiments) and erythroblastic eRNA expression has been successfully used to predict transcription factor leukemia (52 experiments) (Forrest et al., 2014; Arner et al., 2015). activity (Azofeifa et al., 2018). We therefore hypothesized that eRNA We also selected 1036 samples from all cell types and tissues in mouse. expression as a readout of enhancer activity could act as a causal anchor, • We use Chip-seq data from the Codex consortium (Sánchez-Castillo opening new avenues to reconstruct causal gene regulatory networks. et al., 2015). Data available for 78 TFs in mouse ES cells, 12 TFs in To test this hypothesis, we developed novel statistical models and macrophages and 17 TFs in erythroleukemia cells were used. likelihood-ratio tests for using (continuous) eRNA expression data in • For validation using Knock-out data, we have collected differentially causal inference, based on existing methods for discrete eQTL data, and expressed gene lists after perturbation of factors from published studies implemented these in the Findr software (Wang and Michoel, 2017). We in mouse and gene lists after over-expression of factors in mouse ES then applied this new method to CAGE data generated by the FANTOM5 cells from Xu et al. (2013). project, a unique resource of enhancer and promoter expression across • We obtained CAGE data (TPM expression values) from the hundreds of human and mouse cell types (Forrest et al., 2014; Arner FANTOM5 Consortium for enhancer and TSSs for human cell types et al., 2015), and validated predictions using ChIP-seq and perturbation and tissues (Forrest et al., 2014; Arner et al., 2015). We selected 360 data. We noted that continuous eRNA expression values increased target samples from all cell types and tissues (1826 in total), by removing prediction performance for some factors, while for other factors, a technical and biological replicates. binarized presence or absence of the enhancer signal performed better. Leveraging this observation, we found that a data-driven approach to classify enhancer expression as either binary or continuous was sufficient to 3.2 Data processing automatically select the best target prediction method, allowing parameter- • CAGE data was processed to clear for unannotated and non-expressed free application of the method to organisms and cell types where validation genes, and expression levels were log-transformed. Only enhancers data is not currently available. expressed in more than one third of experiments were retained. • For each TF, we selected the promoter with the highest median expression level as the promoter for that TF. • For each TF, all enhancers within 50 kb of the TF promoter region 2 Approach were detected using the GenomicRanges package in Bioconductor We used enhancer expression as a causal anchor to infer causal gene Lawrence et al. (2013), and considered as candidate causal anchor interactions, within the Findr framework (Wang and Michoel, 2017). Findr enhancers for that TF. provides accurate and efficient inference of gene regulations using eQTLs • Enhancer data was binarized by setting all experiments with zero read as causal anchors by accounting for hidden confounding factors and weak count to 0 and all others to 1. regulations. This is achieved by performing and combining five likelihood • For the ChIP-seq data, genes with a TF binding site within 1kb of their ratio tests (Figure 1B), each of which consists of a null (Hnull) and TSS were defined as targets for that TF.
i i
i i bioRxiv preprint doi: https://doi.org/10.1101/311167; this version posted May 3, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
i i “main” — 2018/5/2 — 9:50 — page 3 — #3 i i
Causal gene network 3
Fig. 1. Overview of Findr framework. A. The schematic representation of causal gene regulatory network inference using enhancer activity as a causal anchor B. Five statistical tests used by Findr for causal inference C. work flow of the Findr-A framework. i i
i i bioRxiv preprint doi: https://doi.org/10.1101/311167; this version posted May 3, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
i i “main” — 2018/5/2 — 9:50 — page 4 — #4 i i
4 Vipin et al.
• For the knockout and over-expression data, genes with differential 4. Relevance test: The relevance test verifies that B is regulated by expression q-value< 0.05 were defined as targets for the TF. either E or A. Its hypotheses are
(4) 3.3 Likelihood ratio tests with continuous causal anchor Hnull ≡ E → AB, data (4) Halt ≡ E → A ∧ E → B ← A. Given a causal relation E → A → B to test, where E is a (continuous) enhancer for TF A and B is a putative target gene, with their expression Similarly, data samples 1,...,n annotated in subscripts, we first convert each
continuous variable into a standard normal distribution by rank. Each (4) n 2 2 2 LLR = − ln (1 − ρˆ )(1 − ρˆ ) − (ˆρAB − ρˆEAρˆEB ) variable is modeled as a normal distribution with the mean linearly and 2 EA EB additively dependent on its regulators in the five tests below (illustrated in n + ln(1 − ρˆ2 ). Figure 1B). 2 EA
Primary linkage test 1. : The primary linkage test verifies that the (4) enhancer E regulates the regulator gene A. Its null and alternative LLRnull/n ∼ D(2, n − 3). hypotheses are 5. Controlled test: The controlled test verifies that E regulates B
(1) (1) through A, partially or fully, with the hypotheses as Hnull ≡ E A, Halt ≡ E → A. H(5) ≡ B ← E → A, The log likelihood ratio (LLR) and its null distribution is identical null with the correlation test in Wang and Michoel (2017). Therefore the (5) Halt ≡ B ← E → A ∧ A → B. LLR is n LLR(1) = − ln(1 − ρˆ2 ), 2 EA Its LLR is where (5) n 2 2 2 n LLR = − ln (1 − ρˆ )(1 − ρˆ ) − (ˆρAB − ρˆEAρˆEB ) 1 X 2 EA EB ρˆXY ≡ XiYi . n n n i=1 + ln(1 − ρˆ2 ) + ln(1 − ρˆ2 ), 2 EA 2 EB And its null distribution is with the null distribution (1) LLRnull/n ∼ D(1, n − 2). (5) LLRnull/n ∼ D(1, n − 3). The probability density function (PDF) for z ∼ D(k1, k2) is defined as: for z > 0, The LLR and its null distribution then allow to compute the p-values and the posterior probabilities of the null and alternative hypotheses 2 −2z(k1/2−1) −k z p(z | k1, k2) = 1 − e e 2 , separately for each subtest, as detailed in Wang and Michoel (2017). B(k1/2,k2/2)
and for z ≤ 0, p(z | k1, k2) = 0, where B(a, b) is the Beta function. 3.4 Findr-B and Findr-C 2. Secondary linkage test: The secondary linkage test verifies that In Wang and Michoel (2017), it was shown that a combined causal the enhancer E regulates the target gene B. The LLR and its null inference test performs best in terms of sensitivity and specificity for distribution are identical with those of the primary linkage test, except recovering true regulatory interactions, using both real and simulated test by replacing A with B. data. The combined test score is: 3. Conditional independence test: The conditional independence test verifies that E and B become independent after conditioning on A, 1 P = (P2P5 + P4) with its null and alternative hypotheses as: 2
(3) where Pi is the posterior probability for subtest i. Hnull ≡ E → A → B, The Findr-B method returns this combined P -value using the original (3) Halt ≡ B ← E → A ∧ (A correlates with B). Findr on binarized enhancer data. Findr-C does the same using the new tests on continuous enhancer data. Correlated genes are modelled as having a multi-variant normal distribution whose mean linearly depends on their regulator gene. 3.5 Adaptive method Findr-A Therefore, Given a set of TFs and for every TF, a set of candidate causal anchor (3) n 2 2 2 enhancers, the adaptive Findr-A method performs the following, for each LLR = − ln (1 − ρˆ )(1 − ρˆ ) − (ˆρAB − ρˆEAρˆEB ) 2 EA EB TF A (Figure 1C): n n + ln(1 − ρˆ2 ) + ln(1 − ρˆ2 ). 2 EA 2 AB 1. Compute the primary linkage test p-value for all candidate enhancers of A, both continuous and binarized. Following the same definition of the null data, its null distribution is 2. Find the enhancer E with the lowest p-value overall. 3. If the lowest p-value occurred for a binarized enhancer, use Findr-B (3) LLRnull/n ∼ D(1, n − 3). for TF A with E as its causal anchor, else use Findr-C.
i i
i i bioRxiv preprint doi: https://doi.org/10.1101/311167; this version posted May 3, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
i i “main” — 2018/5/2 — 9:50 — page 5 — #5 i i
Causal gene network 5
Fig. 2. Recall-precision curves for target predictions by Findr-B and Findr-C using ChIP- seq and perturbation data
3.6 Validation methods For the purpose of evaluation, we calculated the Findr-B and Findr- C scores for all TF-gene combinations. Genes with scores exceeding a certain threshold were considered as predicted targets for each TF. Precision-Recall curves, calculated using the the “PRROC” package, and Fig. 3. Robustness of Findr performance demonstrated by using different score thresholds hypergeometric overlap p-values with respect to known targets from ChIP- seq and knockout data in the same cell type were used to compare the performance of the two methods. the suitability of Findr-B or Findr-C for causal inference was dependent 4 Results on the factor, i.e. using continuous enhancer data performed better for some factors (Figure 2: Gata1, Fli1), while on/off data performed better 4.1 Causal inference from enhancer and transcript/gene for others (Figure 2: Myc,Klf2). Because the number of putative ChIP- expression CAGE data seq targets for each factor varied widely across factors and cell types, To test our hypothesis that causal inference using enhancer expression as a from only 420 gene targets for JunD in Erythroblasts to over 12,000 gene causal anchor predicts true TF targets, we used CAGE data generated by the targets for ESRRB in ES cells, the background precision levels differed FANTOM consortium across hundreds of cell types and tissues in human highly between factors (Figure 2). and mouse (Forrest et al., 2014). We used bi-directional expression in non- Transcription factor binding inferred using ChIP-seq data is thought promoter regions as an indicator of likely enhancer activity in each CAGE to be mostly opportunistic and therefore might not to provide direct clues sample (Andersson et al., 2014). The presence of enhancer expression about the functional targets of the factor (Cusanovich et al., 2014). We (as a proxy for enhancer activity) in each sample is crucial for the ability therefore collected available perturbation data, specifically expression data to apply causal inference techniques. We first selected three mouse cell after knock-out or knock-down (KO) of a factor, for 85 factors in ES types (embryonic stem cells, macrophages and erythroblastic leukemia) cells and 11 factors in macrophages (see Methods). We also collected for systematic characterization, as these had more than 20 samples per gene expression data after over-expression for 55 factors in mouse ES cell type, with diverse treatments or time series. Furthermore, ChIP-seq cells (Xu et al., 2013). Using differentially expressed gene lists as known and TF knockout validation datasets were available for each of these cell targets, we evaluated the predictions of both methods. This confirmed the types. We selected predicted enhancers (Andersson et al., 2014) within factor-specific suitability of either the Findr-B or Findr-C method (Figure 50kb of each transcription factor in a cell type, resulting in 109 enhancers 4). for 48 transcription factors in ES cells, 55 enhancers for 8 transcription We further tested whether these results were sensitive to the target factors in macrophages and 5 enhancers for 4 transcription factors in probability prediction threshold. The enrichments of true positives were erythroleukemia, with an average of 3.8 enhancers per transcription factor stable over a wide range around this threshold for both methods (Figure 3). across cell types. We inferred causal transcription factor-target interactions for each 4.2 Development and validation of an adaptive transcription factor using the enhancer element most strongly linked to model-selection approach for causal inference using each TF in each cell type. We predicted targets for 48 transcription factors with two methods, one using continuous enhancer data (“Findr-C”), and discretized or continuous data one using discretized, binary (on/off) enhancer data (“Findr-B”) (see As the optimal prediction performance depended on a factor-specific Methods). The Findr software outputs a score representing the putative choice between discretizing enhancer expression data or not, we probability of a causal interaction for each transcription factor-target pair investigated if this decision could be made in a data-driven, adaptive (see Methods). For both methods, the targets with predicted probability approach (henceforth called “Findr-adaptive” or “Findr-A”,see Figure 1C), of a causal interaction greater than 0.8 (see Methods) were validated in the absence of validation data. In short, Findr-A selects for each TF using a compendium of ChIP-seq data (Sánchez-Castillo et al., 2015), among all its candidate enhancers, both continuous and binary, the one containing 78 factors in ES cells, 12 factors in macrophages and 17 factors with the strongest primary linkage to the TF’s expression, and then uses in Erythroblasts. Of these factors, 18 in ES cells, 7 in macrophages and that enhancer and its corresponding method (Findr-B or Findr-C) to predict 4 in Erythroblasts had enhancer expression in CAGE data. We noted that downstream targets for that TF (see Methods for details). This adapative
i i
i i bioRxiv preprint doi: https://doi.org/10.1101/311167; this version posted May 3, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
i i “main” — 2018/5/2 — 9:50 — page 6 — #6 i i
6 Vipin et al.
Fig. 4. Comparison of Findr-A predictions using mouse ES cells and All cell types samples. A. bar plots representing enrichments for Findr-B, -C and -A predictions using ChIP-seq data as a validation dataset for ES cells (left) and all cell types (right). B. bar plots representing enrichments for Findr-B, -C and -A predictions using Knock-out data as a validation dataset for ES cells (left) and All cell types (right). C. overlap of factors between ES and All cell types using ChIP-seq (left) and knock-out (right) as validation datasets.
i i
i i bioRxiv preprint doi: https://doi.org/10.1101/311167; this version posted May 3, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
i i “main” — 2018/5/2 — 9:50 — page 7 — #7 i i
Causal gene network 7
approach was indeed able to select the best performing method for most information about the activity of the associated transcription factor of the factors (Figure 4A,B, Findr-A selection marked by a green star). (Figure 5). Reassuringly, factors known to form regulatory complexes, We further performed functional enrichment analysis of gene target including Max and Mxi1 or Runx1 and Smad1, also clustered together, sets predicted by Findr-A. 1119 targets of JunB in macrophages were i.e. shared predicted causal targets (Figure 5). enriched for ‘LPS signalling pathway’ (p < 10−8) and ‘RNA binding’ To investigate whether combining multiple enhancers was more (p < 10−8). The macrophage CAGE samples indeed measured the informative for determining causal targets, we compared two integrative response to LPS signalling. JunB is known to be a delayed response methods against the predictions of taking individual enhancers. Firstly, gene, attenuating transcriptional activity of immediate early genes and we used the median expression level of all the putative enhancers for RNA binding proteins, specifically terminating translation of mRNAs each transcription factor as a ‘meta-enhancer’ in Findr-A. Secondly, we induced by immediate early genes (Healy et al., 2013). 131 targets for calculated the first principal component of the binary target prediction Cbx7 in ES cells (Figure 3) were enriched for ‘regulation of transcription matrix for all enhancer of a TF in order to ‘average’ predictions. However, from RNA polymerase II promoter’ (p < 10−10), and included several we did not observe any significant overall improvement in performance developmental genes such as Hox family proteins. This enrichment was using either method. much stronger than for the ChIP bound targets of Cbx7 (p < 10−5). Cbx7 is a part of the PRC complex, which binds predominantly at bivalent 4.5 Causal inference using CAGE expression data across chromatin at promoters of transcription regulators in ES cells (Mantsoki human cell types et al., 2015). Finally, we inferred causal interactions between transcription regulators Traditional causal inference using causal anchors relies on a and targets using CAGE enhancer and TSS expression data in humans. conditional independence test (Figure 1B, test 3), but the presence of Specifically, we inferred causal interactions for 20 transcription factors common upstream regulatory factors results in low sensitivity for this test (with eRNA expression) using Findr-A (see Methods). We firstly validated in applications to gene network inference (Wang and Michoel, 2017). To the predicted interactions using a database of experimentally validated address this, both Findr-B and Findr-C (and hence also Findr-A) employ regulatory interactions in human (Han et al., 2018) and noted a statistically a combination of alternative tests (Figure 1B, test 2, 4 and 5, see Methods significant overlap between the predicted and experimentally validated for details) that resulted in significantly improved prediction of TF targets gene sets (p < 10−5). Figure 6 represents the network of top 200 from eQTL data (Wang and Michoel, 2017). This remained true for the interactions for each factor. The factors involved in biologically related CAGE data in all three mouse cell types (data not shown), and hence the processes shared predicted causal targets. For example, two members of combined test remains the recommended default. the SMAD family, SMAD3 and SMAD6, as well as BCOR and SIN3A involved in histone deacytelase activity showed a high overlap of predicted 4.3 Perturbations within and across cell types provide targets. causal targets for a distinct set of transcription factors We noted that most factors performed better using binary enhancer expression values and wondered if this might be due to the limited 5 Discussion & Conclusion expression data available for each cell type. The Fantom5 CAGE data We explored the utility of eRNA expression as a causal anchor to predict contains over 1000 samples across many more mouse cell types and tissues, transcription regulatory networks, by leveraging the observation that henceforth called “all-data”. Findr-C indeed performed marginally better eRNAs mark the activity of regulatory regions. Previous studies support on all-data rather than cell type specific data. Importantly, all-data and this notion, as eRNA expression has been shown to temporally precede the cell type (ES) specific data resulted in causal targets for a distinct set of expression of its effector gene (Arner et al., 2015), and to correlate strongly transcription factors. In particular, variation within a cell type was more with active regulatory regions across cell types (Danko et al., 2015). We informative for causal target predictions of cell type specific factors. For therefore developed a novel statistical framework to infer causal gene example, the targets of key pluripotency factors Sox2 and Esrrb (Dunn networks (Findr-A), by extending the Findr software for causal inference et al., 2014) were enriched in ES-data but not all-data (Figure 4A,B). using eQTL data (Wang and Michoel, 2017). There were only five common factors with causal targets predicted We demonstrated the applicability of Findr-A by predicting causal using both all-data and ES-specific data that were validated by ChIP-seq interactions from CAGE data generated through the FANTOM consortium, data, and only six common factors validated by KO data (Figure 4C). and validating them with ChIP-seq and perturbation data for three mouse Interestingly, the target genes predicted from ES-data and all-data for the cell types as well as on the entire FANTOM5 data. Notably, different same factor overlapped significantly. For example, 72% of Eed predicted factors were enriched for within cell type analysis as compared to across targets using ES-data overlapped with Eed predicted targets using all-data. cell types. The causal regulatory network of cell type specific factors (e.g. Sox2, Esrrb in ES cells) could be inferred only using expression variation within a cell type and not across cell types. Due to the limited availability 4.4 Multiple enhancers of the same factor have a highly of validation data, a more comprehensive assessment was not possible. correlated expression The current approach can be extended in several aspects in the future. Mammalian genes are controlled by multiple enhancers. We investigated Firstly, Findr assumes equal (or no) relations between all sample pairs the stability of inference outcome under different choices of enhancers (Wang and Michoel, 2017), which hold for the majority of eQTL datasets. as causal anchors. For all factors for which Findr-A predicted targets By accounting for heterogeneous sample relationships, such as biological in ES cells that overlapped significantly with perturbation data (Figure and/or technical replicates, time series, or population structure, we may 4), we predicted additional target sets using other available enhancers, be able to reconstruct more accurate networks. Secondly, the assumption resulting in target sets for 76 enhancer-transcription factor pairs for 46 that eRNAs act as causal anchors is only approximately true, because unique transcription factors. their activity ultimately is regulated by other regulatory factors, i.e. the The hierarchical clustering of transcription factor-enhancer pairs based assumption that they are a priori causally upstream of correlated TF–gene on these target sets clustered mostly by transcription factors, indicating pairs will not hold for all genes. Because eRNAs are temporally expressed that the expression of multiple enhancers contains highly redundant before their direct target genes, we hypothesize that explicit modelling of
i i
i i bioRxiv preprint doi: https://doi.org/10.1101/311167; this version posted May 3, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
i i “main” — 2018/5/2 — 9:50 — page 8 — #8 i i
8 Vipin et al.
Fig. 5. Circular plot of hierarchical clustering of overlap of targets predicted using multiple enhancers for the same factor in mouse ES cells
gene expression dynamics in the Findr framework will allow to detect and Funding correct for such feedback loops. Thirdly, eRNAs are expressed at relatively This work was funded by grants from the Biotechnology and Biological low levels, and therefore susceptible to noise, and a reliable eRNA signal Sciences Research Council (BBSRC) [BB/P013732/1, BB/M020053/1]. was available for only a limited number of known transcription factors in mouse or human. Generating deep sequencing data for CAGE, utilizing GRO-seq or epigenetic or transcription factor ChIP-seq data to estimate References enhancer activity could be possible ways to get around this. In conclusion, we have demonstrated that enhancer activity can be used Andersson, R. et al. (2014). An atlas of active enhancers across human to infer causal gene regulatory networks. We foresee this approach to be cell types and tissues. Nature, 507(7493), 455–461. of high value in the context of human medicine, by combining genetic, Arner, E. et al. (2015). Transcribed enhancers lead waves of coordinated epigenetic and transcriptomic information across individuals to unravel transcription in transitioning mammalian cells. Science, 347(6225), causal disease networks. 1010–1014. Azofeifa, J. G. et al. (2018). Enhancer RNA profiling predicts transcription factor activity. Genome Res.
i i
i i bioRxiv preprint doi: https://doi.org/10.1101/311167; this version posted May 3, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
i i “main” — 2018/5/2 — 9:50 — page 9 — #9 i i
Causal gene network 9
ADCY1
PDE1B
NKIRAS1
CREBL2 FABP7
SRCIN1
KIF21A SPATA7 CAP2 GABARAPL1
FBXO36
BRSK1 INPP5F CDKL5 STXBP1
DNAJC28 DNAL1 IGSF21 LRRC49 SCAMP1 HCN1 Res., 46(D1), D380–D386.
PAWR
TRPC1
UBQLN1 KIAA1467
RNF219
OLFM1 PHC1
BSN
DBC1
PCDH10 ZNF629 PNMA2 MDH1 SERINC1 KBTBD6 CCDC130
MAP1B
PCDHB15 FBXW7
ANK2 THY1
DCLK2
MPP2 TRAM1L1
SNAPC3
C11orf87 ZNF415 PRRT2
SLC25A25 ZFP2
ADAM23 PPP2R2B CADM2
TMEM35
DNAJB4 ANAPC10
CPE
DENND5A MEGF8 KLHL11
TUB VSNL1 SLC30A10
ZNF365 KIAA0195 CADM4 TMEM59L
LGI1 BEND6 CTXN1 et al. PREPL CPT1C He, B. (2014). Global view of enhancer-promoter interactome in
SC5DL SLC4A8 PTPN4
PCDHAC2 CNNM1 CLUAP1 ZFP1 GPHN DCTN6
CHGB
SNAP91 MAPK8IP2 SEZ6L2
PCDP1 SLC1A2
ZC3H3 HECW1
ZBTB7A SAMD14 SRSF12 HSD11B1L SPPL3
SEC61A2
SCN1B EFR3B SACS C16orf45 GRIA3 PDE4A ASCC2 CAB39L PER3 KIAA0913
TRIP12
RAPGEF4
PITPNM1 ZNF487P
ATL1 ZNF483 SLX4 SLC25A4 GPR37L1
PIP4K2B
DTX3 CTNND2 NAP1L3 SPATA2 ZNF484
LRRC4B
NAPG
BRMS1
CTNNBIP1 NEUROD2 CDH10 SERP2
AP3M2 CBLN2
SLC45A1 ARPC1B PTPRN GNAO1 DPYSL2 FAM59B NDRG3
NEURL4
NOVA1 CRMP1 MIR4787
ARL6
ELAVL2
MAPT PTPRS ZNF532 AIMP1 DMWD POU3F2
GAP43 B3GNT1 CNIH2 PCLO DLG2 Proceedings of the National Academy of Sciences 111 LINGO1 KCNJ3 human cells. , (21), HAVCR2 RWDD2A CECR6
C1orf114 TMEM130 CELSR3 C6orf168 AKAP11 HABP4 PSD CHST10 MAP7D1
ATP6V1C1 KLHDC3
MAP6 UBQLN4
ARMCX5 CA11 FAM57B TMEM8B MAPK8IP1
BRSK2 C19orf47 STMN4 GDI1 BAI2 ST8SIA1 CACNB3
SYT1 EFNB3 EPS15L1 FAM164A KIF1A
NRSN2 PODXL2 APLP1 ZNF579 BCL9L SKP1 WBP2
NLGN4X
RAB6B AXIN2 FBXL16 MAP3K10
EEF1A2 FAM131B CDK20 CBLN4 IP6K1 DLEU7 L3MBTL1
TUBG2 MAP2
PCSK1N
CPLX1 BBS4 TMEM123 SEPT5 SLC25A22 TRIM41
STT3B SNAPC4
MLLT11 SEPT7 RTN1 ARNT2 CAMK2N2 ANGEL2 ABCA3
ACSL6 CRIPT
GBAP1
SPTBN1 ACTR3 SPTBN4
RTN3 FAM115A
ANKRD46
CCDC106
GPR137 PRRC2B BTBD7 ZNF335 IRF2BP1
SRRM4 TYK2 USP22 KIF3A CACNA1G
AMN1 PI4KA E2191–E2199. HTATSF1P2 TMEM132B
KATNB1
DCAF7 TTC9B SERPINI1
ATP5J
FKBP4 BBS7 NMI CARD16 DUS1L
TIPARP ProSAPiP1 SMPD4
PRKAR1B
RAB11FIP4 SLC7A14 AP3B2 DYRK1B
GBP2 IRGQ LGALS8 CBX2 ATP6V0A1 LDOC1L
TUBB4A CLINT1 SGTA CELF3 MAN2A1 DPYSL5 MRPS11
VRK2 SHANK3 GPI
C10orf35 ATXN7L3B TMEM145
LYST EXOC6B GPR162
PTDSS2
SCAF11 RCC2
UBL5
STK38 KIAA0368 INA
NCAN
FAM91A1 CA10
CLSTN3 FAM171A2
ISCA2 SHOC2 UBE2O
C16orf13 DGKZ NEAT1 CDC25A TXNL4A RANBP3
GPSM1
POLR2I MATR3 PIN1 ADAR SFXN5
SYCE1L SLC8A1 NUDC IPMK TRAPPC2L DIRC2 DEAF1
C14orf1 CTAGE5 FBRSL1 NDRG4 CKB DUSP26 KYNU et al. LPPR5 Healy, S. (2013). Immediate early response genes and cell CLCN4 ITCH PHF21A FAM96B
PAICSP4 ELK4 CUEDC2 PPP1R7 PPP1R26 GSK3A RHBDD2 ANKAR RTN2 G3BP2 UBE2I DHX35 TBCCD1 RNF11 GANC COPS7B PGD LRRC61
SPAG1 ITGB3BP EPS15 AKR7A2 ACBD5 DHX34 IFNGR1 TNFSF10 C1QL1 CASC4 SMARCA4 ACAP2 SNRPB
TNFAIP2 PHF8 RUNDC3A
PTOV1 RTN4R LYSMD3 MIR3128 ZNF213
MUS81 RUVBL2 PRMT10 RTKN TTC23L PDXP C9orf86
TPGS2 HMGB3
UROS DGCR5 GPX4 NDUFB2 MARCKSL1
TRIM27 DBNDD1 NT5M AB590070 TNPO2 NDUFS5 MCM7 NDUFA7 BX649020 PPIL2 ARHGDIG MOB1A NR4A1 PLEKHJ1 CRNDE KIF3C HCN2 KIAA1967 TEX264 STMN1 TMEM165 MAGT1 FLYWCH2
MAP2K4 XIAP NONO SPTBN2
FAM82A1 DNM1
MZT2B
NFYC C11orf84 NOSIP TRAK2 AB384105 SOS2 XRCC1 MGAT5B C19orf25 RBMS1 VASH2 RNASEH2B TP53BP2 SCP2 TNFRSF14 DUSP8 PPP2R4 LINC00340 FBXL18 CCT7 FAM104A YWHAG BABAM1 PDCD10 ERP44 HLA-F C7orf41 TCF7L2 ZNF749 FAM71E1
ZNF649 ANKRD54 C12orf52 CTNNB1 ZBTB1
SMARCB1 ZNF574 RPS6KL1 SNRNP25 SVOP PATZ1
AGPAT4 CCNF COG3 CREB5 C10orf32 transformation. Pharmacol. Ther., 137(1), 64–77. CDKN2AIPNL CRY1 FADS1 EPG5 BAG6
NCF2 TM9SF3 LSM4 HRASLS5 FLAD1 MRP63 CCDC82 ZNF193 KIAA1045 UBXN4 AB462940 TSC22D2 SLC25A32 IL15 PTGES2
TBCD PAFAH1B3 SPHK2 ZNF414 ARHGAP31 RAC3
DDX3X SGMS1 COQ9 EIF2AK2 ALYREF HLA-A ALPK1 UBE2D3 TMSB15B CDK5R1 LRRC24 CXCL16 C3orf75 DIAPH1 SLC43A3 C2orf77
KCNJ10
COBRA1 EF212284 CU680404 ERI3 TUBG1
EED HIAT1
RNF2 DNAJC3-AS1 CDK14 CTBP1 ATXN7 CHAF1B PIH1D1
PIGA PNMAL1 RNF40 ATP7A DCK GPCPD1 MDK ARRDC3 LRR1 ZNF562 EAF1 ZBTB9 HDAC2 PLDN AZI1 TYRO3 CLASP2 AB384754 KCNK2 BMP5 MRPL43 LRRC41 SNRPA CHRNA5 DDX5 SYN1 LSMD1 CCT6B CD46 TNFAIP3 TBK1 TOM1 GRPR FDX1L FIBP SLC35A5 PIDD
CHMP2B UBL7 PGAM1 WAPAL RNMT SNRPD2P1 MARCH8 MAP3K2 SH3GL1 B4GALNT4
C6orf115 PURB SHARPIN SLC4A11 FAM129A MTA2 SYAP1 LUC7L2 VPS33B AHCTF1 ACTB
CNOT7 CMPK1 C16orf42 RSRC1 SMAD1 HIVEP1 WHSC2 MEX3A ADAM17 ATF3 ERN1 SF3B4 NR4A2 POLA2 C10orf54 C9orf78 DNTTIP2 TARBP2 RLIM C14orf142
RBM4 FAM102B HNRNPL RPL39 MYL6B GORAB DDX60 CNST BST1
EP300
SAE1 NBEAL1 TH1L Kundaje, A. et al. (2015). Integrative analysis of 111 reference human UBAP2L
NHP2 PRDX2 GPR113 TRAF2 SV2C RPUSD1 MICB NR4A3 CSNK1G3 NCKAP1L LATS1 C10orf46
MED17 ST7L USPL1 EIF2AK3 PRKACA RCOR2 OSBPL8
TSTD2 FTH1 MAP3K1 NUDT22 TRIM38 ELF4 TMF1 PELI1
PPP2R1A TIMM17B RABEPK
MTF1 AZIN1 ZNF394 RAP1A GGNBP2 FOS ADD3
CU688193 B3GNTL1 CHN1 OCIAD1 GOLGB1 PPP1CB
LRRFIP1 GCC2 DVL2
CTNND1 SP140L ATRIP CBX8 PUS10 NFE2L2 CD58 ALDH18A1 MIER1 GPC2 REST GALC
BTN3A1 ACSL1 LSM2 ACTR1A SENP5 LITAF GTF3C5 SLC25A36 WDR54 ATXN10 RPL10 TRAF3IP3 TMEM194B
PSEN1 AKR1A1 TOMM7 GBA PRMT1
MLST8 NEDD8 NUFIP2 RABGAP1L RANBP2 BAZ2A MORF4L1 JMJD1C
MDC1 MAP3K8 TOE1 RPUSD3 PIK3CA MIR3917 FAM105B PHF11 ITGB8 ISG20 CD2BP2 NFKB1 ISCU NEK7 TERC RBM15B PARP12 STRADA
DLEU2 PTPRE C7orf10 C7orf30 PFAS SEMA4C TNFSF4 ERAL1 NLRC5
CEBPZ SOX4 TERT
ATP5I MON1A
ZHX2 CLIP1
DNAJB12 POP7 MMGT1 IRF1 ESCO1 BAG5 MAGED2 MAPKAPK2 ZC3H4 USP12 METAP2 MRPL19 TRAF1
MKRN1 NAT14 Nature 518 HLA-C epigenomes. , (7539), 317–330. SON LCE2A FBXO9 ARRDC2 ARHGEF1
SAT1 TAB3 CNIH4 ETFB CCDC51 MBNL3 GCA CLEC4E TANK RLTPR KLHL25 SPAG9
ZCCHC6 UBE2R2
ENO2 RAC2 E2F3 JAGN1 TAOK3 ANKRD31
STAM2
NGRN CORO7 PML SAMD1
B9D1 AB464084 RB1 PHPT1 PATL1
SP100 TMEM9B CNOT8 VCAN
GINS1
PEMT PLCG2 TAP2 GEN1 ZDHHC4
FOSB SMAP2 USP11 TTC32
LCP1 PRCC
BACH1 UNC13D GRIK2 SP110 USP8
ELF1
TMEM223 ZNF331 SOCS4 TEAD2 GHDC USO1 KHSRP BOLA3 SKIL TBL3 TRAF3 NDUFC1 RILPL2 ELMO1
CWC25 NINJ2
MPV17L2 PARP14
GSK3B CTNNA1 ZMYM6 PSMB10 CDC73 ATXN3 PFN2 C3orf19 SAC3D1 HIBADH
NOD2 MRPS24 BIN3 PTTG2 SYF2
NDUFB10
PFN1 SNX20 BTG2 NFKBIB
SH3BP2 DTX3L NFKBIA METTL21D
CYTIP EXOC7 NFKBIE MOB3A MXD1 ARHGDIB CCDC93 PKN1
NUDT3 C4orf52 QSER1 RBM23 et al. MAZ Lawrence, M. (2013). Software for computing and annotating PSMB8
DDHD1 COX7A2L STX11 IFIH1 NKRF CBS PPP3CC PSMB9 DMXL2 HDDC2 MKLN1 FAM113B ADRBK1 PARK7
ZNF74
NIPSNAP1 CORO1A BAZ1A MRPL20
FNDC3A NGFRAP1
NIPBL KDM5A CHN2 TCEAL4 MLKL TNFAIP8L2 CISD2 PHF20 RASGRP3 EEF1B2 STK17B FKBP3 NCOA7 MX2 ZNF295 ALOX5AP NUP62 SLC9A8 WHAMM NFKBIZ MED13 FAM49B IL2RG
NCOA3
CDC42SE1 C15orf39 BOD1 HCG25 TSPAN6 ITSN2 RARA FUNDC2 GGA3 DZIP1 POLR2F B3GNT5 DENND4A RALY LAT2 FLJ31945 MIR4687 CD55 MCTP1
WDR7 RGS14 FSTL3 MTX2 FMO5 SP3 SMEK2 TET2 ACSL5 KLF13
USP15 ZFC3H1 ANKRD34A AB007865
IRAK1BP1
STAG1 UBR4 PSMB6 PRKCD ZDHHC14
PGRMC1 ATP2C1 PLA2G12A FTSJD2 AADAT DKFZP779L1853 PPP1CA IRF9 C22orf34 FAM200B
STAG2 CHST11 SWI5 METRN ANKMY2 GATAD1
GNA13
CCDC114 IREB2 HCG27
RPS6KA3 SERPINB1 ANP32A PTPN6
BROX PLCG1 CAST STK10 IL12RB1 DOCK7 B4GALT2 TBX21
TMEM69 ASH1L RABL5 genomic ranges. PLoS Comput. Biol., 9(8), e1003118.
UVRAG CASP8 PTPN1 FAM110A CCBL1 OGT MEN1 ACN9 IRF4 GRB2
ZBTB8A ZCCHC8 LIMK1 SP1 AMPD3 WWP2 DNAJC27 PPAP2A
GPR132
REL
BIRC3 TMEFF1 CREB3L2 USP16 ARHGAP15 DCP1A FMNL1 EGFEM1P XRCC3 CHCHD1
RNF44 CST7
DAZAP2 IRF2BP2 SSU72 CRTC2 ATF2 GZF1 UNC45A ZMYM2 PLEK2 CTC1 DNAJC13 PICALM GATAD2A MEI1 PURG CHD8 PIP4K2A XRN1 MCM9 VPS11 SLC25A1 CEP63 TTC14 NSD1 FAM49A TMEM154 STK4 DEF6 ZNF354C
RBFOX2 MIR4727 EVI2B UIMC1 ANKRD49 VRK1
MTR UPP1 WBP5 GMIP OVCA2
SPATS2L CCDC111 FOXM1 UQCRQ WRB
GPSM3 RAB43 RIT1 PAM16 CCDC136 XPO4 NBEAL2 C14orf118
C5orf39 HNRNPF MRPL51 SLC39A14 MORN4 PARD6A RCN1 CHRNB2 CBLB MYADM CCDC150 MED23 CASD1
UBTD1 SPN ZBTB39 CD83 PCBP1-AS1 CTSS PARP1 SLC16A5 RLF MRE11A IQGAP1 LRMP
PERP ERBB2IP CYB5R4 SNHG10 ARAP2 NFIL3 UBXN7
CCDC102B AGPAT5 RSBN1L C14orf37 NCKAP1 RABGGTB GPR65 XPO1 SKAP1 ARHGAP4 TRIM37 SOX6 C9orf72 CD38 HIST1H1E
LRRCC1 SMARCD2 JMJD6 ZNF519 DCBLD1 PIK3CG IFT172
BBX C1orf63 et al. HPRT1 YTHDC1 CDC20P1 Li, Y. (2010). Critical reasoning on causal inference in genome-wide ITGB7 NRXN1 THOC2 MACF1
DHX38 CYTH1 KIAA1456 APP
CRIP2 BMI1 ABI2 GIMAP5
SETX SEMA6D C1orf112 ZNF791 CCNT1
AQR PRR3 SEPW1 TOX SREK1 PIK3IP1 TAX1BP1 KIAA0240 RNF168 PBX4 C5orf54 ACLY FAM135A ACAT2 DSTN CLDN12 CHD7 FSIP1 CDKN1B PPWD1
FAM3C TYW1B ODZ3
GKAP1 PTPRK PNISR SF3B1 FAM122C
FLJ22184 HLA-L WDR17 NPTN SUV420H2
KRT18 PLAGL2 EIF2B5 TAF2 GPBP1L1 SERINC2 CDC42SE2 CDC14A
FAM76B CHID1 PALLD SLC30A9 FANCL TMEM9 PLXNC1 RFX7 ARHGEF2 ANTXR1 JARID2 SETDB2 PRDM1 PRIM1 RICTOR DTYMK CHD2 GNPNAT1 SPECC1 LAMA4 MYL6 FOSL2 SNX33 MSL2 AFTPH DDX39A
NFKBID ARID3B ING3 GNS COL5A2 ARHGAP9 SSH2 FCGRT CU688443 GPATCH1 NUP205 CDCP1 RASSF5 SLC35A2 FOXN2 C19orf57 CCNL1 PRKACB RSRC2
SUGP2 CEP192 MIR17HG AHNAK2 SARM1 TM9SF1 TAGAP KCTD6 BCLAF1 SCLT1 METTL21A CCDC88A SKA2 SLC44A5 HERC4 ZNF654 LMNB2
NCKIPSD LAMB2 PRPF4B PRKCB LSM5 VAV1 TMEM151B ICAM2 EFEMP1 STAT5A ZNF131 PDZD11 RAI14 FAHD1 ATP9A AUTS2 LBR ARID3A PSMB5 KATNAL2 DLGAP4 GPM6B TMEM170B PTK2B
ZNF267 CEP85 COPS4 PIK3CD C4orf29 MRPL33 CSNK1D POSTN PDE8A CBLL1 PYGL POU2F1 WIPF1 HSPB1 POC5 MAPK6 POMGNT1 TCTN1 PPHLN1 NUDT1 BDP1 ZNFX1 PEAK1 CNOT10
MYLIP ZCCHC2 DCP2 BIRC6 AKT1S1 C20orf111 NDST1 ZNF382 linkage and association studies. DYNLL1 CTTN ATAD5 TAPT1 FIGNL1 CD53 TPD52L1 NRGN MIR4638 C16orf54 SMG6 ZNF326 KANSL2 C1S BTN2A1 CDK5RAP1 ALDH7A1 TMC8 CCBL2 FBXL5 PRR14 GBP4 PDE7A NQO1 CDKAL1 AURKAIP1
DZIP3 LUC7L PRRC2C SMARCA2 IGFLR1 DPY19L2P2 NUP214 C10orf28 FAM53B PTPRC
SYT11 TPRKB SH2D4A CHPF IFNAR2 SMARCA5 WDR1 SELPLG MIB2 TAPBP EML2 DUSP4 SUMF1 ALDOA TNRC6B BCL10 ZNF430 CCDC88B LMTK2 ZCCHC7 GPR63
POM121 MFNG
TCF4 PNP
ST7 PLOD2 PCYOX1 ALDH1L2 BLM GEMIN2 VPRBP DMTF1 C10orf118 ANKRD44 BRPF1 YEATS4 BTD LCP2 PAM SLC25A19 RIN3 SAMHD1 CXCR4
ZNF644 SFSWAP
RRAGA CTDSPL2 SYNPO PRSS23 NCF4 CTSW CENPA ZBTB11 BAG3 PRDX5 SCAMP3 ARID4B SASH3 ECHS1 CELF2 GSN RELT CDK12 MTBP CLIP4 SNX12 EPS8 DENND4B KCNN3 CDR2L RBM39 CENPM RFXAP MAN1B1 IKZF5 ZRANB3 FAM127A PDLIM2 MRPL24
TNKS1BP1
SLC7A5P1 ATP8A1 GIMAP7 AKAP1
TMED8 MTMR14 GLT8D1 ARF4 NUTF2 ITGB2 NASP RHEBL1 PWP2 DNAJC7 PPDPF NPAT MPHOSPH9 PCM1 HACE1 SNRPA1 C1orf85 PLEK ZZEF1
FCHO1 BCL3 TDP1 C9orf93 SUV39H1 NXF1 BRAF MDM1 LCORL ERCC1 TXNL1 SPI1 DCTN2 F2RL1 MCFD2 FAT1 MYB SRXN1 RASGRP4 SYNGAP1 TTC8 MYSM1 DENND3
PPFIBP1 NLGN1 SEPT6 LARP6 CCDC142 SRSF11 GMFG
DDHD2 PSMD8 SNRK RASAL3 MICA ARHGAP30 SRSF5 C17orf79 RHOD URB2 RNH1 TUBGCP3 LPPR2 RBM25 IFT122 LAMB1 CU691854 HJURP
MIS18BP1 KDM6A FGD3
ARID5B AP2S1 NEDD9
DEPDC5 KIF21B TGOLN2 PON2 IKZF1 INPP5D GVINP1 ZBTB24 et al. EPC1 Mantsoki, A. (2015). CpG island erosion, polycomb occupancy MDN1 ARID5A CCDC134 SPC24 ABCA7 P4HA2 FAM98B GALNT10 WIPI1 OPA1 ARHGAP33 RBM38 UCP2 ZNF546 HECA UBE2S CD63 MTF2 RBL2 CEP97 SDC1 RPS9 PHLPP2 LIAS PNN SYK NDUFS8 CSTB CPSF6
FKBP9 PRPF38A TACC3 MRPS34 HERPUD2 TRIM24 AMICA1 HCST APBB1IP AX721088 C12orf48 PAN3 COL12A1 MYO1F ARHGEF40 MLL5 EVL RAD54B SRSF10 SSRP1 RBBP6 KATNAL1 KLF3 LAIR1
TM9SF2 PDHX DCN TTC28-AS1 RNASE6 PHF12 INO80D BCL2 GTF2A2 CDC42EP1 MS4A6A ARRB2 GTF2H5 PARVA FAF1 SLC39A7 ADAM22 CREBZF SAP30 VMP1 DDX26B EAF2 KIF1C COPG SDF4 ARID4A C5orf56 C15orf34 CDK2AP1 COPZ2 CENPC1 ARAP1 CREB1 NKG7 LMNB1 CALU HNRNPA1 TUBGCP4 NUP35 SH3D19 PDLIM4 RNF125 FAM161A NET1 ACVR1 SFI1 RPS15 TRNP1 PIK3R5 NPAS2 PAN2 XPO7 PIAS2 IFT43 SF3B14 PHF6 ZNF529 KDM4C PLD3 RPS6KA1 ZDHHC5 SLC3A2 PHF3 SLC10A3 BRD2 GIMAP1 EXOSC2 RAD51C CEP350 PPP1R18 TMCO6 C1R SIN3A PRPF19 PLBD2 TAGLN CCDC88C CTSZ CTU1 MPEG1 TAF4B CCND3 CSF2RB ZNF266 FAM126B FCHSD2 HN1 PVR ARHGAP26 C10orf128 BRI3 CCP110 RBM15 AOAH FGF11 SHISA4 ATP6V0E1 UCHL5 FEN1 KNTC1 NCOA2 C4orf27 LTBP1 FBLIM1 CNTRL RPL6 SENP7
MYO1B OXNAD1 B3GNT9 SPARC REXO2 ARHGEF9 DYRK1A LENG8 DENND1C
EPM2AIP1 ARL2BP EID1 ACO1 GLIPR1 STAT5B MAML1 CD93 KIAA1191 DOCK11 GRN TXNDC17 SETD1A NFATC3 SMAD6 TOR1AIP1 PTPRCAP FAM98A PEA15SETDB1 MDFI
CDC42BPB RTN4 CLMP CALD1 AKAP8 BIRC5 HMGB2
FNDC3B BTAF1 SIGIRR ARHGAP25 FPGS FDFT1 C4orf46 CNOT1 NAP1L4 SEMA3C SEMA4D HELB CARD8 NBL1 PSPC1 CREB3 TAF5 DALRD3 PPIC KIAA1586 CEP152 MKL1 MANBAL CEP57L1 PLK1 CYR61 FAM168A WSB2 EPB41
TPM2 PATL2 CPEB3 NUPL2 KDELR1
ANKRD37 SELM and sequence motif enrichment at bivalent promoters in mammalian PGCP DKK3 EFCAB7 PPP1R3C NDFIP2 MLL3 PTPN14 RC3H1 PHIP ZNF92 CD52 ENAH WAS PCF11 HMHA1 DOT1LC14orf43 SLC35B2 THAP4 ITGAX LAPTM5 CECR1 PRPF38B CTSB USP48 CLK1 C6orf132 LNPEP RTTN
TCTE3 CETN2 RAB2A KIF22 PXDC1 GUSBP11 ZNF292 HERC1 ARL1 USP33 PCMTD1 NCAPD3 CD276 AGTPBP1 FN1 HIRIP3 MDM4 POLR2H GNG2 ANTXR2 ATG16L2 SNX32 KIF15 DRG1 CYFIP1 USP37 ZNF431 MRC2 SMEK1 BCAR1 PARVG SNX7 FAM114A1 BAX HAUS5 NKTR PSMD14 LAMC1 EIF6 C5orf41 GTDC1 TPP2 CLK4 SMCHD1 YIF1A CAPN1 CAV2 ATP9B IL10RA PNPLA2 USP3 LRP5L EFEMP2 POGZ NUP160 LRRC37A3 ATL3 DDIT3 ZNF136 FADS3 CEP95 C1orf162
ELK3 RNPC3 HNRNPH1 ARHGAP1 MTHFR NUP210 VPS13B LINC00461 YIPF5 COL5A1 FSTL1 ARSJ FAM13B SLC25A40 PROCR SERGEF PPP1R16B C4orf21 ZNF273 TNFAIP1 SEC61B CD37 HNRPDL ZNF107 PPP1R10 HELZ ITPKB RBM22 KIAA1522 PTRF GIGYF2 GAB3
TACC1 CD151 ITGAM
DOCK2
RPS21 PTTG1IP PAQR8 SNAI2 SLC39A13 GPC1 VKORC1 PPIB UTRN RNGTT EEF1E1 RNF138 SEC11A GNG12
INPP5B C19orf38 RAB13 GIPC1 NTN4 SEC61G FAM129B JOSD2 SYNE1 NUPR1 EID2B TBXAS1 UBXN2A NUP153 POLR2L PLOD1 TGFBR2 MAU2 LRP10 MGST3 MGAT4B RFWD3 POLR3K ARID1A LRRC8B PRPF3 CD69 FYTTD1 RNF167 KIF2A ITGA3 AY726569 HNRPLL AHNAK TPM4 S100A13
DDOST CYTH4 HMBOX1 SEPT7P2 GPATCH8 FTH1P16 FTH1P3 DFNA5 SYNC SERPINE2 HSP90B1 MON2 TIMP2 SART3 CD84 IFITM3
FRYL SEC31A EHD2 AFF1 MVP PMS2P3 ITGB5 RPS6KA5 NRF1
TPT1 OSMR LOXL2 TRIP6 CAV1 GCFC1 CNOT2 CLASRP POLG2 Sci Rep 5 NIF3L1 CYBB embryonic stem cells. , , 16791. EIF4G1 KAT6A CYB5R3 TXNDC16 SCARNA2 LAMP1 POLR3E ZNF407 TNFRSF12A COL6A3 SELL TGFB1 FERMT3 SP4 S100A16 TUBGCP6 MYCBP2 COPS7A F13A1 NR3C1 MAP3K4 RPS23 PROCA1 ANKRD12 THBS2 SMTN TERF1 RUFY2 TMCO3 SCARB2 YKT6
CRIM1 RNF19A GAS6 CCDC80 FCGR2A PDCD1LG2 ITGB1 XBP1 SGMS2 AXL CU677000 CTGF VASN NRP1 PLAT
FNBP4 KBTBD8 N4BP2L1 HDLBP LOX OSTC SERPINH1 CDC42EP3 RHOC UTP15 MYO19 EIF2C3 FTX RCSD1 ACTN1 HOMER3 RFX3 ARID2 FIP1L1 SNX6 ASAP1 SRPX2 MLL COL1A1 RRAS AKIP1 CAGE1 IL16 ELL2P1 TIMP1 MMP2 PLCB2 UBE2Z SYDE1
RIN2 COPA GPR176 SRRM1 AHSA2 TMED3 ARHGAP19 RBM26
SCYL2 C10orf137 SAFB2 UNG GSS GLT8D2 MYO1G CCNT2 KCNA3 MIA3 CD81 UGDH AKNA SERPINE1 FOXN3 LRRC59 MRPL17 ARHGEF6 MBNL1 SLAIN1 ZNF789 MYO1C CD59 CCDC66 THUMPD2 KLHL6 COPB2 NR2F2 CFL2 TRAF3IP2-AS1 POLE ORMDL2 RCHY1 ECM2 COL6A2 C12orf24 ATXN2L GNG5 KPNA4 MT1P2 MALAT1 ATAD2B ZFR AHI1
THBS1 RNF181 EZH2 C1orf132 KIAA1324 PLEKHG2 MAP4 CSF3R
PRDX4 SLA RECQL PLAUR TRIM33 ARGLU1 COL8A1 VGLL3 INHBA C12orf35
YY1 CCDC84 CAPN2 TM2D2 SHC1 VAT1 NR2C2 ZCCHC11 GOLT1B TGFBI PDLIM7 LEMD3 CCND1 RAPGEF6 APOPT1 SAFB C9orf3 ERVK13-1 RSPRY1 ELOVL1 UBR7 VEGFCSRPX ZNF254 L32785 TRIM22 DDAH1 UBR2
ANKMY1 GNL1 ECM1 ZNF195 ARF1
ADM AKAP13 JAKMIP2 SHPRH PDGFC TPBG RBM33 CHD6 MT1E SCAPER MGST1 BX649164 POLA1
SHMT2 FZD6 FLNA SDC4 CDH13 SNRNP40 S100A11 et al. BCOR ASNA1 Mifsud, B. (2015). Mapping long-range promoter contacts in human PLEKHN1 TNS3 LEPREL4 MMP14 TGIF1 MYOF PRKCDBP RND3 SFPQ NT5E MFGE8 GPX8 PIAS1 ZC3H12A DDX55 ZNF493
UBN2 TRAPPC1 TAX1BP3 PRELID1
PSIP1 KAT7 CWF19L2 ATN1 INTS6 CDK17 MBD2 SLFN12 ZYX SFXN3 LIF LTBR ANXA2 DPH3 LGALS1
CAPZA1 RBM6 QSOX1 NFATC4 ADAM9
PLSCR3 CLTA MIR222 SKIV2L2 COPZ1 COL4A1 AB463129 MIR221 FRMD6 HIF1A RAB34 STT3A
CARD6 CEP170 KIAA0528 AL358752 PLEC ATP8B1 WWTR1 BMP1 ITGA5 SLTM EU016553 BCAS2 VAMP3 CKAP4 COPE FAM117B KIRREL PHLDA3 SNRNP200 MKL2 ANKRD28 TIGD1 TAF10 AKAP9 PCNT P4HB DAD1 MKI67IP TAF9B CAP1 FHL2 NNMT NFKB2 GOLGA8B MAK16 YIPF3 IGFBP6 ACTN4 MLF2 KRIT1 PTGR1 ERGIC3 ARFGAP3 DCBLD2 KIAA0232 WBP4 KIAA0430 RIPK2 RNF8 CFL1 PTBP3 PRPF39 SRPK1 BIRC2 OIP5-AS1 KDELR3 CERCAM C11orf30 RCOR3 MYO15B PLIN3 RCAN1 PXDN DUSP28 RBM5
UNKL PABPC1 HTRA2 CBX7 PAPOLG LGALS3 ANXA5 APBB3 BPTF
SS18L1 FLJ10038 MLLT6 S100A10 RCL1
USP1 PTBP2 GADD45A SMAD3 ZNF767 LZTS2 ZNF100 SPICE1 RPL27 CDC42EP5 SHISA5 IQCB1 TRAM2 RGL1 UAP1 ADAMTS6 C12orf51 MAPK1IP1L CU689793 MYEF2 SCAF4 ARHGAP27 CXCL1 RPL35A FOSL1 SLC19A1 C8orf38 DOPEY1 EHD1 ALCAM FAM176B
GNA15 RHBDF1 GRASP PILRB C19orf10 TRIM5 TWIST2 RPS27L CHERP WNK1 AZI2 ZBTB44 ZBTB20
EXT1 AIM1
ABL2 ACIN1 ZNF638 TMTC3 BUD13
ENO1P1 TCF12 SLC4A1AP IGFBP4 TMEM237 SECISBP2 MBTD1 CTF1 STARD9 IGFBP3 Nature Genetics 47 TBC1D2 SNX13 AB384358 CLIC1 TUBA1C cells with high-resolution capture Hi-C. , (6), 598– HQ013238 NR2C1 HIST2H2BE RBM7 GRAMD1A LTBP2 ZNF212 NSL1 SIRT1 C8orf58 YPEL1 TAF1 LIMA1 RALGAPA1 LAMP2 RPS27A TGFB1I1 SSR3 SNHG1
MFSD10 KIAA1958 FAP SDF2 HDAC9 TSPAN4 KCTD10 SLC38A2 C1orf122 EBNA1BP2 B4GALT1 C11orf71 SURF4 ANXA2P2 SCARNA13 RCN3 CORO1C TIA1 CCT6P1 AMBRA1 GLRX3 COL4A2 COL6A1 DIXDC1 PSME2 KLF6 NAA16 MED15 FERMT1 HSPA5 HEG1 ANKZF1 MGEA5 RNF138P1 FAM179B CD44 MMP19 REEP5
ZNF783 FAM176A HSPA13 RAB27B TCEB2 GLT25D1 IER3
GBE1 ARHGEF17 MET MIR21 EMD
DDX47 SRCAP BOD1L MAN2A2 EVC LETM2 NCOR2 RPL32P3 N4BP2L2 DRAP1 VPS13A RBMS2P1 RRBP1 RAB32
MLL4 POLR2J4
GNG11 BZW1 GPRC5A RIC8A GNA11 RPL19 TSPAN9 ILF3 CDK6 PPP1R12B PHLDA2 TM4SF1 ESYT2 LIG1
FAF2 VEGFA DUSP14
AMOTL2 AHR CU691178 NFE2L1 NOTCH2 CD109 NUP133 PRKCH CAPNS1 GJA1 PPP1R14B PLAU BCAR3 SH3PXD2B UNK TWSG1
KPNA5 WDR70 ADRM1 FAM83G SETD5 ANKRD13D KCTD11 RBL1 DDX17 RPL7 AHCYL2 RBM10
FGFR1OP SEPT9 CHTOP EMP1
RRAS2 RERE
NBN AF130094 CFLAR UBXN2B CLIC4 AFG3L1P MARCH6 SAR1A
U00943 OLFML3 CDC42EP2 EEF1A1P6 TUBB6 GBP1 ANXA1 AB384478 ARF6 RGS20 SMAGP VPS13D 606. EFNB1
HYAL2 UACA DNAJA4 BCL2L11 FADD ZBTB38 CSNK1A1
LHFPL2
TNFRSF1A CENPP HSPA14 PDIK1L FUBP1 CANX TNFRSF10B RGS3 KIAA1328 NAPB EIF1 SHB MIR4690 MYL9 SORBS3 TRIM52 EGFR GARNL3 EXT2 C14orf159 PSMA3 N4BP2 FKBP1A MXI1 TMEM43 COMMD7 C16orf57 AFAP1 NFYA DNASE1L1 ST6GALNAC3 TCEA1 PPP1R12A ZNF668 CSGALNACT1 DAB2 MECP2
PHLPP1
ENG SNX5 MYH9
HSPG2 MAML2 AGAP11 CRTAP EIF3D LAPTM4A LMNA ZNF33A SRSF3 FTSJ1 TMEM194A SRSF7 CEP72 THOC1
CRYZ FKBP10 KRT8 PTX3 PRDM2 PXN EIF5A CDK7 TFG MTUS1 EEF1A1 GNL3L KIF5A UGCG AB384019 AF086249 ANXA2P3 RIF1 UBC
MSH3 ABCA6 ECE1 PCDHGA11 PLXNB2 MTHFD2 ALMS1 HTRA1 SYNCRIP NOSTRIN TXN ZNF48 YPEL2
SEPN1 ZFAND3 BRP44L
LAMB3 TFE3 TMEM63B SLC39A1 NCOA5
UBE2E3 DRAM1 SMPD1 GLIS2 FASTKD1 WWOX PNPLA7 RHOA ATRX SLFN5
KIAA0101 LEPREL2
SCAMP5 PCMTD2 SLC25A39 RRAGD COMT CCNA2 DDR2 S100A6 MND1 CPA4 C14orf149 ACAD8 OXSR1
NUP37 RAB11FIP5 EIF3E EPHA2
ZNF678 SNX24 PHB2
TPPP3 NUMA1
PLK2 IQSEC2 ZNF397
SERTAD1 TGM2 SMC3 Natoli, G. and Andrau, J. C. (2012). Noncoding transcription at enhancers: GAPDH RIN1 TLE2 ZNF646 SH3YL1 QTRTD1 SLC39A10
CDH11
TPRA1 RPL17
IL20RB CDK1 G6PD GART
FAM65B
PLEKHG4 TAGLN2 AJUBA PTPN23
TRAPPC10 BFSP1 ZBTB4
ACVRL1 C15orf58
COQ4 RALGDS TSHZ1 RPS8
LASP1
DPM1 C11orf48 PAICS
ASPH
BICC1 GM2A IER5L ZNF138
GRID1 LMBRD1 TGFBR3 PARP16 CDKN1A
CTR9 SERPINB6 CTIF
PPP1R15A SERTAD3
C7orf46 RPS16 SH3GL3
RFTN2 CLPP RTF1 C9orf21
SEMA4A
MRPL40
PLP2 EIF2S3 CELF5 SNRPF SQRDL
PMP22
ADAMTS10 H2AFJ PTGES CSDA THRA FXYD5
ERGIC1 QKI FBXO31 ZHX3 ZEB2
LDHA EPS8L2 DDIT4 RPRD1A CKMT2
BEX5 EIF3H SECTM1 NTRK2 TRIP10 MAPRE2 MARVELD1
GSTM5
RCOR1 RPL29 CSAD DNMT3A
CLK3 RUVBL1
DDX42 OLA1 MSLN HDHD2 ATF6 MRPL13 C1orf95 C20orf177
MGAT1 SPOCD1 PBXIP1 CPSF3
SATB1 ZNF25 THNSL2 SNX4 LRTOMT Annu. Rev. Genet. 46 AIFM2 NR2F6 ATF4 general principles and functional models. , , 1–19. LDB2 TRIM9 RAET1G
CU676956 TNS1 C1QBP MAP3K7 MIAT KPNA2
ATIC NACA2 RPS6KB1 COA5 SLC9A9
PHLDB2 NCAPG PDK4 INPP5A
KLF9 ATF7IP CENPT RSL1D1 STARD3NL EXOC6 GNB2L1
ITPRIPL2 ZNF565 C19orf48 SLC39A9
RCBTB1 APOBEC3C ACTA2 CRY2 NRBP2 RPS15A SEMA5A ZNF835 PID1 RPS4X PHTF2 NOP16 RPL18 SLC2A4RG
SUN2 ARHGEF10L SRPK2 TNIP1 MYLK SNRPN KBTBD11 NF2
RUNX1 CRYAB CEP55 DCAF16 ILF2 G0S2 RHOB RNASE1 TMOD3 PRELP EFHA1 TCF25 KLF4 BCL6 CEP128 RAI2 HERC3 GPR56 JUN EEF1D
PDLIM1 RCAN2 CLDN5
EIF3J RPLP0 PPIA TAGLN2P1 ZBTB16 ALG3
ZNF346 LMOD1 TOM1L2
FIG4 S100A4 TP53INP2 FAM125B NR3C2
ZNF708 PGRMC2 KLF10 NPM3 FAM54A
TMEM19 SORBS2
ZNF362 EIF2D OSR2 SDC3
RPL13 CKS2 TP53I3
CASKIN2 WDR75 CTDSP2
C1QTNF6 CEBPB AURKA C11orf96 ZNF510 LTBP3 HNRNPAB
ST6GALNAC6 AVPI1 TRAPPC11 NDRG2 TNPO1 STAB1
LY6K TCEAL5 LBX2 CTPS
C1orf21 SOS1 CRNKL1 NFIB SNX30 TEF Causality TSC22D1 Pearl, J. (2009). . Cambridge university press. MELK TMEM173 ACOT11
ABHD10 CYBASC3 LMBR1 LMO7 MTMR10
ZNF75A EEF2 CD34
SEC24C TRIB3 RNF103 PLEKHM3
TLK1 YPEL3 ADAMTSL4 PSMD1 LY6E SYNRG ZNF280C B3GAT1 GPATCH2 RUNX1T1 VPS45 CRTC3 VLDLR RPS7 FXYD1
KIAA1147 PLCL1
MED1 CAPRIN1
RHOT1 MIR3187 LDLR CEP290
CSRP2BP SNRPD1
ARHGEF7 AURKB S100B RGN PEX1 ZNF643 RBM3 EXOC5 ADD2 VPS4A POLR3F JAZF1 GMFB
SLIT3 CAD
IFRD2 ERO1L
SNORD1C ATP2B1 KLHL7 DUSP5 VPS41 TUBA4A
DHX36 BNIP3 GLA ZFYVE21 FAM107A TPR A2M PIK3R1 C21orf63 MDM2
TP53BP1
MRPS12
HIF3A
FAM151B MYCT1 IGFBP5 CSDAP1
RC3H2 GPRASP1 C17orf85
SRA1 AASDHPPT APCDD1 GALNTL2 TPX2
PCDHB4 CXXC1 VAMP2
FHL5 NR1D1
NOTCH3 DDX49
LAMA2 ZKSCAN1
ZMIZ1 PLXDC1 CHEK2 Rockman, M. V. (2008). Reverse engineering the genotype–phenotype SNHG8 KIFC3 S1PR1 EIF2A BCAS3
RAB23
PVRL4 SPHK1 CRLF1 MAPK4
RPL36 PHF20L1 TICAM1
ZNF358
ADAMTS9-AS2
SIRT2
DDR1 RPL34 LGI4 METTL7A
RBCK1 RAPGEFL1
ADCY6
CCNB1
MED18
MOCS1 PDE5A KLHDC1 SSPN
ZNF134 AES PPAP2B PBX1
MAPK9 MAF ADCY4 UCK2
TBC1D16 EFS
TRAF3IP2
IL15RA FZD4
RPL8
SLC50A1
VSTM4
API5 STK24
OTUD3 map with natural genetic variation. Nature, 456(7223), 738–744.
MKNK2
ORC1
LYNX1
ACOX2
CTSF
TMC4
PEX11A
MGST2 DSG3
TMPRSS13
SDCBP2
BARX2 DEFB1 GALNT3
KLK7
MUC15 TACC2
GJB3 VAMP8
CYSLTR1
C2orf55 Sánchez-Castillo, M. et al. (2015). CODEX: A next-generation OVOL1 ANO1 PRSS8
CXCL13 DLG3
PLAC2 SYT8
VTCN1 LGALSL KLK5
MAPK13
LGALS2
KRT6A
CD74
CD3G KLF5 TMCO4 FXYD2
TTC22
AIM1L
FAM160A1
MUC20
IL20RA
PRKD2 DMAP1 LAD1 KRT13
RASAL2 KIAA1217
RGL2 ANK3
SH3BP1
ALS2CL
EFCAB4A PP14571ZNF750
PRSS22 CYB561 CCDC12 STAP2
SPINT1 CYP4B1 sequencing experiment database for the haematopoietic and embryonic EPS8L1
C20orf151 CMIP SCEL
TMPRSS4
CPNE5
MUC4 ATG9B
PSD4 MIR650
CEACAM1 SPINT2 EXPH5 C10orf95
LIFR
IGLV3-19
MYO3B
CLEC10A LYZ
PI4KB KRT5
MUC1 GPR110 TJP2 LY6D GRHL3
ILDR1
RREB1
ST6GALNAC2
ERP27
DQ126686 GRHL1
CDC42BPG
SLC2A9 FUT3 MPZL2 CD164L2 DSC2 TRIM29
PHLDB3
ZNF812 SCNN1B FAM83A KLK6
OSBPL5
LYPD3 SYTL1 OVOL2 INTS12 MPZL3
TMEM30B S100A2 RBM47 stem cell communities. Nucleic Acids Research, 43(D1), D1117–D1123. S100A9 FGFBP1 TMTC2
KRT15
PROM2
GGTA1P AP1G2 LLGL2 LRG1 EPN3
CBFA2T3
SBNO2 BNIPL
RXRA SLC22A18
MMRN2
PLEKHH3 LAG3
AQP3 JUP
PTK6
PLXNB1 TNFSF13
NOTCH1 CLDN4 ST14 EHF
TJP3
KRT14 HCAR2 SLC34A2
FRMD4B
KLF8
LCN2 CTLA4 GGT6 CCR2
SOWAHB
C17orf109
SLPI
F11R
ABLIM1
STX19 TACSTD2
CD40LG Schadt, E. E. (2009). Molecular networks as sensors and drivers of DUOXA1
IGHV6-1 XDH ACPP
SVIL NAALADL2
DNASE1L3 WFDC2 SULT2B1
ARHGAP32
SCNN1A
SLC28A3
GPR172B SH3D21 KLK10
KRT23
LAMA3
FUT2
GRHL2
LYPD2
SLC44A4
KHNYN common human diseases. Nature, 461, 218. Schadt, E. E. et al. (2005). An integrative genomics approach to infer
Fig. 6. Network representation of transcription factor-targets predicted using Findr-A causal associations between gene expression and disease. Nature causal inference on FANTOM5 human dataset Genetics, 37(7), 710–717. Storey, J. D. and Tibshirani, R. (2003). Statistical significance for genomewide studies. Proceedings of the National Academy of Sciences, Chen, L. S. et al. (2007). Harnessing naturally randomized transcription to 100(16), 9440–9445. infer regulatory relationships among genes. Genome Biology, 8, R219. Thurman, R. E. et al. (2012). The accessible chromatin landscape of the Cusanovich, D. A. et al. (2014). The functional consequences of variation human genome. Nature, 489(7414), 75–82. in transcription factor binding. PLoS Genet., 10(3), e1004226. Wang, L. and Michoel, T. (2017). Efficient and accurate causal inference Danko, C. G. et al. (2015). Identification of active transcriptional with hidden confounders from genome-transcriptome variation data. regulatory elements from GRO-seq data. Nat. Methods, 12(5), 433–438. PLOS Computational Biology, 13(8), e1005703. Dunn, S. J. et al. (2014). Defining an essential transcription factor program Xu, H. et al. (2013). ESCAPE: database for integrating high-content for naïve pluripotency. Science, 344(6188), 1156–1160. published data collected from human and mouse embryonic stem cells. Forrest, A. et al. (2014). A promoter-level mammalian expression atlas. Database (Oxford), 2013, bat045. Nature, 507(7493), 462–+. Han, H. et al. (2018). TRRUST v2: an expanded reference database of human and mouse transcriptional regulatory interactions. Nucleic Acids
i i
i i