<<

Downloaded from genome.cshlp.org on September 28, 2021 - Published by Cold Spring Harbor Laboratory Press

Leveraging chromatin accessibility for transcriptional regulatory network inference in T Helper 17 Cells

Emily R. Miraldi1,2,10, Maria Pokrovskii3, Aaron Watters4, Dayanne M. Castro5, Nicholas De Veaux4, Jason A. Hall3, June-Yong Lee3, Maria Ciofani6, Aviv Madar7, Nick Carriero4, Dan R. Littman3,8,10, Richard Bonneau4,5,9,10

1Divisions of Immunobiology and Biomedical Informatics, Cincinnati Children’s Hospital, Cincinnati, OH 45229, U.S.A. 2Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45257, U.S.A. 3Molecular Pathogenesis Program, The Kimmel Center for Biology and Medicine of the Skirball Institute, New York, NY 10016, U.S.A. 4Center for Computational Biology, Flatiron Institute, New York, NY 10010, U.S.A. 5Department of Biology, New York University, New York, NY 10012, U.S.A. 6Department of Immunology, Duke University School of Medicine, Durham, NC 27710, U.S.A. 7Current Address: Novartis Institutes for Biomedical Research, Cambridge, MA 02139, U.S.A. 8The Howard Hughes Medical Institute 9Center for Data Science, New York University, New York, NY, 10010 10Corresponding Authors: [email protected], [email protected], [email protected]

Key words: regulatory network, Th17, ATAC-seq, LASSO, StARS

Running Title: Regulatory networks from RNA-seq and ATAC-seq

Downloaded from genome.cshlp.org on September 28, 2021 - Published by Cold Spring Harbor Laboratory Press

Abstract:

Transcriptional regulatory networks (TRNs) provide insight into cellular behavior by describing interactions between transcription factors (TFs) and their gene targets. The Assay for Transposase Accessible Chromatin (ATAC)-seq, coupled with transcription-factor motif analysis, provides indirect evidence of chromatin binding for hundreds of TFs genome-wide. Here, we propose methods for TRN inference in a mammalian setting, using ATAC-seq data to influence modeling. We rigorously test our methods in the context of T Helper Cell Type 17 (Th17) differentiation, generating new ATAC-seq data to complement existing Th17 genomic resources (plentiful gene expression data, TF knock-outs and ChIP-seq experiments). In this resource-rich mammalian setting, our extensive benchmarking provides quantitative, genome-scale evaluation of TRN inference combining ATAC-seq and RNA-seq data. We refine and extend our previous Th17 TRN, using our new TRN inference methods to integrate all Th17 data (gene expression, ATAC-seq, TF KO, ChIP-seq). We highlight newly discovered roles for individual TFs and groups of TFs (“TF-TF modules”) in Th17 gene regulation. Given the popularity of ATAC-seq, which provides high-resolution with low sample input requirements, we anticipate that our methods will improve TRN inference in new mammalian systems, especially in vivo, for cells directly from humans and animal models.

Downloaded from genome.cshlp.org on September 28, 2021 - Published by Cold Spring Harbor Laboratory Press

Introduction

Advances in genome-scale measurement and mathematical modeling herald opportunities for high-quality reconstruction of transcriptional regulatory networks (TRNs). TRNs describe the control of gene expression patterns by transcription factors (TFs) (Hecker et al. 2009; Chai et al. 2014), providing mechanistic (and often genome-scale) insight into the complex regulation of cellular behavior (Bonneau et al. 2007). Measurements of chromatin state represent one such advance for TRN inference. For example, chromatin-immunoprecipitation with sequencing (ChIP-seq) (Robertson et al. 2007) enables identification of an individual TF’s binding sites genome-wide. These data provide evidence for regulatory interactions based on proximity of the TF binding site to the gene and have proved valuable for TRN inference (Lee et al. 2002; Ouyang et al. 2009; Ciofani et al. 2012). However, ChIP-seq might not be feasible for cell types and physiological settings where sample material and a priori knowledge of key transcriptional regulators is scarce.

Genome-scale chromatin accessibility measurements (Xi et al. 2007; Boyle et al. 2008; Giresi et al. 2007; Buenrostro et al. 2013) and ChIP-seq for histone marks (Barski et al. 2007) correlate with promoters, enhancers, and/or other locus control regions. These data can partially overcome limitations in a priori knowledge of cell-type-specific TF regulators, if integrated with TF DNA-binding motifs (Pique-Regi et al. 2011). Large-scale efforts to characterize TF motifs are ongoing, with motifs currently available for ~1000 TFs in human (~60% coverage) (Weirauch et al. 2014; Jolma et al. 2010; Lambert et al. 2018). Thus, chromatin state experiments integrated with TF motif analysis provide indirect DNA-binding evidence for hundreds of TFs. This scale would be difficult to attain from individual TF ChIP-seq experiments. Of techniques available, the Assay for Transposase-Accessible Chromatin (ATAC)-seq (Buenrostro et al. 2013) best overcomes limitations in sample abundance, requiring two orders of magnitude fewer cells than a typical ChIP-seq, FAIRE-seq, or DHS experiment in standard, widely-adopted protocols. ATAC-seq is also possible at single-cell resolution (Buenrostro et al. 2015).

In the context of TRN inference, chromatin state measurements provide an initial set of putative TF-gene interactions, based on evidence of TF binding near a gene locus. Evidence, be it direct (TF ChIP-seq) or indirect (e.g., TF motif occurrence in accessible chromatin), can be used to refine gene expression modeling (Blatti et al. 2015; Wilkins et al. 2016; Qin et al. 2014). Integration of chromatin state data in TRN inference could mitigate false-positive and false- negative TF-gene interactions expected from chromatin-state data analyzed in isolation (Äijö et Bonneau 2016; Siahpirani et Roy 2016). Sources of false positives and negatives include: (1) non-functional binding, (2) long-range interactions between and regulatory regions (3) the limited availability of individual TF ChIP experiments and incomplete knowledge of TF motifs, and (4) non-bound accessible motifs. Thus, an initial TRN derived solely from chromatin state data can be considered a useful but noisy prior, to be integrated with other data types for TRN inference.

Genome-scale inference of TRNs in mammalian settings is an outstanding challenge, given the increased complexity of regulatory mechanisms relative to simpler eukaryotes. Thus, chromatin state data are especially important for mammalian TRN inference. Construction of a genome- scale TRN for T Helper Cell Type 17 (Th17) differentiation provided a proof-of-concept for this idea (Ciofani et al. 2012). Rich genomics datasets informed the Th17 TRN: 143 RNA-seq experiments (including knock out (KO) of 20 TFs), ChIP-seq of 9 TFs, and microarray from the Immunological Genome Project (Heng et al. 2008). We used the Inferelator algorithm (Bonneau et al. 2006; Madar et al. 2010) to infer TRNs from the RNA-seq and microarray, and Downloaded from genome.cshlp.org on September 28, 2021 - Published by Cold Spring Harbor Laboratory Press

independent methods to build networks from TF ChIP and KO data. We showed that rank combination of the networks performed best recovering known Th17 genes and GWAS disease genes associated with Th17 pathologies.

Since the original Th17 TRN publication, the Inferelator algorithm underwent developments that improve inference in unicellular organisms and are expected to improve TRN inference in a mammalian setting (Greenfield et al. 2013; Arrieta-Ortiz et al. 2015). While the Inferelator’s core model of transcriptional regulation still describes differential gene expression as a sparse multivariate linear function of TF activities, the methods to solve for the TF-gene interaction terms and estimate TF activities have advanced. For example, the current Inferelator (Arrieta- Ortiz et al. 2015) uses a Bayesian approach to incorporate prior information (Greenfield et al. 2013).

The focus of this work is development of mammalian TRN inference methods from chromatin accessibility and gene expression, data types available or likely feasible for an ever-growing number of cell types and biological conditions. In the context of mammalian TRN inference, several studies build TRNs directly from chromatin accessibility without further refinement by multivariate gene expression modeling (Neph et al. 2012; Rendeiro et al. 2016). Several other studies leverage variance in paired RNA-seq and ATAC-seq datasets; these TRN methods are exciting developments but require that ATAC-seq data for all or most RNA-seq conditions (Karwacz et al. 2017; Ramirez et al. 2017; Duren et al. 2017). In contrast, the present work is geared for TRN inference from RNA-seq and ATAC-seq, where ATAC-seq need not exist for more than one gene expression condition.

Development of any TRN inference method requires a comprehensive benchmark with a realistic experimental design, a recurrent challenge in computational biology. We previously developed substantial genomic resources in Th17 cells (Ciofani et al. 2012) and, with the addition of ATAC-seq to these resources, Th17 could be a powerful system to compare network inference from RNA-seq and ATAC-seq to a “gold standard” network constructed through the more laborious approach of TF ChIP-seq and TF KO RNA-seq. Given the central role of Th17 cells in the etiology of autoimmune and inflammatory diseases (Littman et Rudensky 2010; Stadhouders et al. 2018), an updated map of transcriptional regulation in Th17 (incorporating new experimental and computational advances) could also enhance our understanding of Th17 biology in health and disease.

Results

Construction of Th17 benchmark for TRN inference from ATAC-seq and RNA-seq To test the feasibility of TRN inference from chromatin accessibility and gene expression alone, we generated an ATAC-seq dataset in Th17 cells and other in vitro polarized T Helper (Th) cells, matching a subset of experimental conditions from the original publication (Ciofani et al. 2012) (Fig. 1A). We identified 63,049 accessible regions (peaks), and clustering revealed that most dynamically changed over the Th-polarization time courses (Supplemental Fig. S1). These patterns are also apparent from Principal Component Analysis (PCA) (Fig. 1A). Time was the most important driver of accessibility patterns. The first principal component (PC) explained 55% of the variance and captured peaks changing from two to 48hrs in Th17, Th0, Th2, Treg. The second PC captured accessibility differences between Th17 and the other Th polarizations. The ATAC-seq dataset contains additional perturbations, including TF KO of Stat3 and Maf for Th17 and Th0 conditions (48hr). STAT3 is required for Th17 differentiation and Stat3 KO dramatically altered Th17 chromatin accessibility, leading to a Th0-like profile (Fig. S1, 1A, red arrow), while Maf KO clustered with Th17 (Fig. S1, 1A, gray arrow). Downloaded from genome.cshlp.org on September 28, 2021 - Published by Cold Spring Harbor Laboratory Press

To the 143 RNA-seq experiments from the original publication, we added an additional 111 RNA-seq experiments, for a total of 254 (Fig. 1B, Methods). The majority (166 samples) were Th17, spanning 1hr to 108hrs, involving KO, siRNA knock down, and/or drug inhibitors of TFs and signaling molecules. The study design also included other Th polarizations (Th0 (53), Tr1 (9), and Th1/Th2/Treg (2 each)) as well as naïve CD4+ T cells (25). Mirroring chromatin accessibility patterns, PCA of the gene expression data revealed time and T-Cell polarization conditions to be important drivers of transcriptional variation (Fig. 1B, S2).

While gene expression data are the only required input for the Inferelator, we hypothesized that inclusion of ATAC-seq data would improve TRN inference. Integration of ATAC-seq with TF motifs provides indirect evidence for the TF-binding events driving altered chromatin state (Fig. 1A) and transcription (Fig. 1B). We generated two “prior” networks of TF-gene interactions from the ATAC-seq data: the A(Th17) prior, limited to Th17 48hr conditions, and the A(Th) prior, including all 33 Th samples (Methods). The ATAC priors contained >1 million putative TF-gene interactions for ~800 TFs (Supplemental Table S1A). To test these noisy priors in TRN inference, we developed a study design enabling quantitative performance evaluation of the resulting TRNs (Fig. 1C). Specifically, the TRNs are evaluated based on precision-recall of an independent gold standard (G.S.), composed of TF-gene interactions supported by TF KO and TF ChIP data. As precision-recall is limited to the TFs previously selected for KO (25 TFs) and/or ChIP-seq (9 TFs), we also evaluate TRN methods based on out-of-sample gene expression prediction.

Using precision-recall and out-of-sample prediction metrics, we evaluate the effects of several key modeling decisions. Fig. 1C outlines inputs to the Inferelator algorithm. From the gene expression dataset, we seek to model the expression patterns of 3578 “target” genes (Methods) as functions of TF activities (TFAs). TFAs are rarely measured and technically infeasible for most TRN experimental designs. Thus, TF activity is a hidden (or latent) variable in TRN inference (Liao et al. 2003; Fu et al. 2011). TF mRNA is the most common TFA estimate. However, many TF transcriptional activities require protein post- translational modification. Thus, TF mRNA can be a poor proxy for protein TFA. TFA estimation based on prior knowledge of TF target genes provides an alluring alternative, as it appears to be technically feasible – requiring only partial a priori knowledge of TF-gene interactions and gene expression data (Methods). “Prior-based” TFA improved TRN inference in unicellular organisms (Arrieta-Ortiz et al. 2015; Tchourine et al. 2018). Here, we evaluate this approach in a mammalian setting.

We test two methods for model building: (1) Bayesian Best Subset Regression with Bayesian Information Criteria for model selection (BBSR-BIC) (Arrieta-Ortiz et al. 2015) and (2) an alternative proposed here, modified Least Absolute Shrinkage and Selection Operator (Studham et al. 2014; Gustafsson et al. 2015) with Stability Approach to Regularization Selection (Liu et al. 2010) (mLASSO-StARS). We hypothesized that mLASSO-StARS would scale better with the increased transcriptional complexity of a mammalian setting (e.g., requiring larger models) (Methods). Thus, we compare mLASSO-StARS and state-of-the-art BBSR-BIC.

Prior information can enter the inference procedure at two steps (1) prior-based TFA estimation (described above) and (2) to reinforce prior-supported TF-gene interactions at the multivariate regression step, using BBSR-BIC or mLASSO-StARS (Fig. 1C). The strength of prior reinforcement is an important TRN inference parameter; it controls the relative contribution of the prior (e.g., TF ChIP, ATAC-seq motif analysis) to evidence from the gene expression model Downloaded from genome.cshlp.org on September 28, 2021 - Published by Cold Spring Harbor Laboratory Press

(variance explained by individual TFs). Thus, we test several levels of reinforcement in our study design and compare sources of prior information, in addition to ATAC-seq.

Modified LASSO-StARS improves inference of a mammalian TRN As illustrated in Fig. 1C, we use precision-recall to evaluate the impact of modeling decisions on TRN inference. This analysis depends on the quality of the G.S. Both TF KO and TF ChIP-seq gold standards have caveats. Differential expression analysis of TF KOs yields an imperfect G.S., as cellular TRNs adapt to the KO over time. Paralog compensation can lead to false negatives, while regulators downstream of the knocked-out TF can lead to secondary gene expression changes (false positives). The TF ChIP G.S. will also contain false positives (ChIP- seq peaks are not necessarily functional) and false negatives (peak-gene associations are based on linear proximity). Generating a G.S. from edges supported by both TF KO and TF ChIP reduces false positives, but at the expense of false negatives. Thus, precision-recall is a nuanced metric of method quality. For each G.S., Supplemental Table S1A summarizes the number of edges, TFs, and target genes as well as percent overlap with other priors. Because we have KO data for 25 TFs but KO+ChIP data for only 9 TFs, we also evaluate precision-recall of the KO G.S.

As expected, a TF ChIP-seq prior improves Th17 TRN inference (Fig. 2A, left panel). The ATAC prior boosts performance relative to the “No Prior” control TRNs (Fig. 2A central and right panels, Supplemental Fig. S3). In comparison to the ChIP-seq prior, the boost from the ATAC prior on the KO G.S. is smaller, likely reflecting increased levels of noise (e.g., from motif-based TF binding prediction). Also, in contrast to ChIP prior results, increasing the strength of prior reinforcement from moderate to high yields no advantage for the noisier ATAC prior. This suggests ATAC-seq prior reinforcement should be limited to moderate rather than high; gene expression data should be relied on to select a small subset of the regulatory hypothesis from the ATAC-seq prior network. For similar levels of prior reinforcement, prior-based TFA models outperform TF mRNA at low recall. For all ATAC TRNs and both gold standards, mLASSO- StARS outperforms BBSR-BIC.

To explore experimental designs without context-specific chromatin accessibility, we tested two contrasting, publicly available prior information sources. The first is TF motif analysis of ENCODE DHS data from 25 mouse tissues, none of which include Th17 (Stergachis et al. 2014). The second is derived from the curated TRRUST database of human TF-gene interactions (Han et al. 2015). While the ENCODE DHS prior includes ~1.5 million interactions between 546 TFs and ~17k genes (similar scale to the ATAC priors), the TRRUST prior is sparse: ~7k interactions between 582 TFs and ~2k genes (Supplemental Table S1A). The TRRUST and ENCODE priors overlap less with the gold standards than context-specific priors, and this is reflected in lower precision-recall, relative to ChIP and ATAC priors (Supplemental Fig. S3). However, use of either the ENCODE or TRRUST prior improves performance relative to the No Prior control; this improvement is substantial for prior-based TFA models. Again, across priors, TFA methods, and levels of prior reinforcement, mLASSO-StARS outperformed BBSR-BIC.

To evaluate performance on experimental designs with fewer gene expression samples, we reduced the gene expression matrix from 254 to 50 randomly selected samples (Supplemental Fig. S4). The reduced sample size had minor impact on precision-recall, especially for context- specific ChIP and ATAC priors. These results bode well for extension of our methods to contexts where gene expression data are less abundant.

Downloaded from genome.cshlp.org on September 28, 2021 - Published by Cold Spring Harbor Laboratory Press

We also evaluated how the different modeling decisions affect target prediction for each TF (Fig. 2C). There is nearly an order-of-magnitude difference in TF degree in the KO+ChIP G.S. (Fig. 2B), so this per-TF analysis additionally ensured that results were not dominated by a few high-degree TFs. Overall, mLASSO-StARS also outperformed BBSR-BIC at TF-specific AUPR resolution (Supplemental Fig. S5).

AUPRs for many TFs were dependent on TFA estimation procedure (Fig. 2C). TF mRNA should work well for TFs whose main source of regulation is transcriptional, while, for TFs regulated by posttranslational modification, prior-based TFA would be preferable. Consistent with this, prior-based TFA models have higher AUPR for STAT3, while prior-based TFA did not always improve prediction of RORC targets. With the ChIP prior (which included RORγt TF ChIP), prior-based TFA AUPR was on par with TF mRNA AUPR. However, for the noisier ATAC-seq prior, prior-based TFA performed little better than random, while TF mRNA models (including the “No Prior” control) performed well across G.S.’s. For ATAC-based TRN inference, target prediction for some TFs was better using prior-based TFA (HIF1A, STAT3, NFE2L2), while TF mRNA was better for some TFs (RORC, MAF, FOSL2) and roughly equivalent for others. Summarizing across priors and parameter sets, no TFA method dominates (Supplemental Fig. S5B). Based on these results, we later construct “final” Th17 TRNs using both TFA estimation methods.

Th17 TRN models predict out-of-sample gene expression patterns We next evaluated whether the TRN models could predict out-of-sample gene expression patterns. In contrast to precision-recall, gene expression prediction provides the opportunity to evaluate all interactions in the model. This evaluation method is especially important in poorly characterized cellular contexts, for which gold standards do not exist. We chose three out-of- sample prediction leave-out sets, each with distinct patterns of gene expression (highlighted in Fig. 3A). The three leave-out sets were: “Early Th17” (all Th17 time points between 1-16 hours, 8 samples), “All Th0” (Th0 samples for all time points and perturbations, 53 samples), and “Late Th17” (18 Th17 samples from 60-108h post-TCR stimulation). For both BBSR-BIC and mLASSO-StARS methods, we tested prediction over a range of edge confidence values and corresponding model sizes. We quantified performance using r-squared of prediction, (Fig. 3B, 3C); >0 indicates that the model has predictive benefit (Methods). Across all leave-out sets and methods, out-of-sample prediction improved most as models expanded from average size of 1 to 5 TFs/gene (Fig. 3B). Most methods performed similarly well from 0-10 TFs/gene, with the exception of BBSR-BIC models using prior-based TFA, where prediction was worse. Predictive performance plateaued at ~10-15 TFs/gene, depending on the leave-out set (Fig. 3B, S6). For model sizes 10-15 TFs/genes, the mLASSO-StARS models outperformed BBSR-BIC models (Fig. 3B, 3C). These results, together with precision-recall analyses, support mLASSO-StARS over BBSR-BIC for mammalian TRN inference.

While we recommend StARS edge stabilities to rank interactions, we used the best quality metrics at hand (precision, recall, and ) to guide selection of model-size cutoff for the “final” Th17 TRN (Methods). These quality metrics are plotted versus model size (Fig. 3D, S6, S7) for TRNs with the Th17 ATAC (“ATAC-only”) or ChIP+KO+ATAC priors. (The ChIP+KO+ATAC prior (Methods) represents our best (combined) source of prior information and is later used to derive our “final” Th17 TRN.) Once average model sizes reach ~15 TFs/gene (Supplemental Fig. S6), predictive performance plateaus, suggesting an average of 15 TFs/gene as a cutoff for edge inclusion in the network. Standardizing network sizes to 53k TF-gene interactions (~15 TFs/gene), we calculated the percent edge overlap among TRNs built from ATAC, ChIP, KO, ENCODE DHS, TRRUST and combined priors. For each prior, we considered five modeling Downloaded from genome.cshlp.org on September 28, 2021 - Published by Cold Spring Harbor Laboratory Press

modes: prior-based TFA with no, moderate or strong prior reinforcement and TF mRNA TFA with moderate or strong prior reinforcement. The percentage of shared edges between TRNs ranged from 83% to 10%. We clustered the networks to visualize how the modeling decisions affected resulting TRNs on a global scale (Supplemental Fig. S8, Supplemental Note 1),

“Core” Th17 TRNs contain literature-supported TF-gene interactions Our primary objective is to assess the feasibility of high-quality TRN inference from gene expression and ATAC-seq data. Therefore, it is important to examine the Th17 TRNs at high resolution. Here, we focus analysis on TRN predictions for 18 “core” Th17 TFs and genes, readily familiar to Th17 biologists (Fig. 4). TF-gene interactions in this “core” have been the focus of many studies (Christie et Zhu 2014; Li et al. 2014), which we leverage to evaluate the ATAC-based Th17 core TRNs.

From the literature and the KO-ChIP data (Fig. 4C), there is support for edges between RORC and several key Th17 cytokines and receptors: Il17a, Il17f, Il22, Il1r1 and Il23r. Two of these interactions (Il17a, Il23r) were present in the ATAC prior and resulting TRNs (Fig. 4A). Gene expression modeling with TF mRNA was sufficient to recover four interactions (Il17a, Il17f, Il1r1 and Il23r) in the No Prior TRN Fig. 4B; these were also recovered in the ATAC TF mRNA TRN. The prior-based TFA ATAC TRN (right panel Fig. 4A) recovers a data-driven edge between RORC and Il22. By combining predictions from both ATAC TRNs (Methods), all five RORC targets are recovered (Fig. 4A, central panel). As expected, inclusion of RORC ChIP and/or KO in the prior also leads to recovery of all five TF targets (Fig. 4D).

STAT3 is required for Th17 differentiation, playing a crucial role in driving Rorc expression. There is support for this interaction not only from context-specific ATAC-seq prior but also the ENCODE DHS prior. Consistent with the poor quality of mRNA-based STAT3 predictions (Fig. 2C), this interaction is not present in the No Prior TRN. It is, however, recovered by all ATAC and ENCODE TRNs, even those built with TF mRNA, where prior reinforcement likely overcomes weak correlation between TF mRNA and protein activity level (Fig. 4A-C, S9).

MAF is another key regulator of Th17 cytokine and expression, with KO-ChIP support for Il17a, Il17f, Il23r, and Il1rb. There is ATAC-seq support for MAF regulation of Il17a, Il17f, Il23r. The prior-supported targets are recovered by the TF-mRNA ATAC models, but only the Il23r interaction is present in prior-based TFA models (Fig. 4A). Similar to RORC, Maf mRNA might be the better proxy for TFA. Prior reinforcement also played a role, as only two of the four interactions are present in the No Prior TRN (Fig. 4B). In the absence of context-specific prior information and a strong signal from the gene expression model, only a single edge (Il23r) was recovered by one of the ENCODE models (Supplemental Fig. S9).

These results highlight the potential for TRN inference in new settings, where integration of chromatin accessibility and gene expression is more feasible than sequential TF ChIP and KO experiments. Consistent with the TF-resolved AUPR analysis (Fig. 2C), they also suggest that there is value to building models from both TFA methods. For construction of the “final” Th17 TRN, we combine models based on both TFA methods (Methods). The literature-curated core of our final Th17 TRN contains the RORC and MAF cytokine and receptor interactions highlighted from the literature, as well as the established connection between STAT3 and Rorc (Fig. 4D),

ATAC-derived Th17 TRNs contain known and novel Th17 TFs Having verified that the Th17 TRNs contain core Th17 TF-gene interactions from the literature, we develop a global, unbiased analysis of the final ChIP+ATAC+KO TRN to identify “core” Th17 Downloaded from genome.cshlp.org on September 28, 2021 - Published by Cold Spring Harbor Laboratory Press

regulators de novo. In addition, we extend our analysis to a final TRN using the ATAC-only prior, to simulate mammalian TRN inference in less well-studied systems, where KO and/or ChIP data might be unavailable. Overall, prior-supported edges make up 63% and 43% of the ~53k TF-gene interactions in ChIP+KO+ATAC and ATAC-only TRNs, respectively (Supplemental Table S5). Of the 715 potential TF regulators considered for final models, nearly all (~95%) have targets in the final TRNs, with positive interactions outnumbering negative nearly two-fold (1.8:1 (ChIP+KO+ATAC) or 1.9:1 (ATAC-only)). TF degree varies dramatically (Supplemental Fig. S10, S11). Whereas the ATAC-only network democratizes TF degree distribution (no TF has > 500 targets), the addition of ChIP and KO leads to very high degree for several TFs in the ChIP+KO+ATAC TRN (>500 targets for IRF4, BATF, MAF, SP4, FOSL2, and STAT3). While there is a bias for TFs in the prior to have higher degree, several TFs without prior support have >100 targets in the final networks (4 TFs for the ChIP+KO+ATAC TRN and 14 TFs for the ATAC-only TRN). Thus, prior information is not weighted so strongly as to preclude inclusion of TFs without known motifs. This aspect is important for discovery and holds for TFs with edges in the prior. While only 16% or 7% of input prior edges remain in the TRN, 27% or 46% of learned regulatory interactions for TFs with prior information are new (not originally in the prior) in KO+ChIP+ATAC or ATAC-only TRNs, respectively. Thus, our method can reduce both false negatives and false positives found in prior networks. For example, motif analysis of the RORC ChIP data revealed that only ~1/3 of -4 RORC peaks contained a RORC motif (at motif occurrence cutoff Praw=10 ). Although RORC can bind DNA directly, the RORC ChIP data suggests that RORC might also bind DNA indirectly (e.g., via TF complexes). Such indirect binding would be difficult to detect by ATAC- seq motif analysis alone, but their gene targets can be recovered via gene expression modeling.

Many of the TFs with highest degree are shared between the ChIP+KO+ATAC and ATAC-only TRNs (e.g., IRF4, BATF, SP4, RXRA, STAT3,... Supplemental Fig. S10, S11). We developed an unbiased approach to identify key regulators of the Th17 program. TFs were included in the set of “core” Th17 regulators if they met one of two criteria: (1) The TF promotes Th17 gene expression through activation of Th17 genes or (2) The TF promotes Th17 expression through repression of non-Th17 genes (Methods). Similar de novo Th17 core TFs were recovered from both ChIP+KO+ATAC and ATAC-only TRNs, with several recognizable Th17-specific TFs from the literature (RORA, RORC, STAT3, MAF) in both networks (Fig. 5A,B). We note that this “core” TF analysis is robust to model-size cutoffs, as analysis of TRNs with average model-size 5 or 10 TFs/gene yields similar results (Supplemental Fig. S12). Similarly, top-degree TFs per TRN are robust across model sizes (Supplemental Fig. S13).

TF-TF modules exhibit coordinated control of gene pathways in Th17 To aid in exploring the large Th17 TRNs (~53k TF-gene interactions), we identified clusters of TFs with significant overlap in target genes (see Methods, Fig. 5C, S14, S15). We then applied a comprehensive gene-set enrichment analysis to predict functional roles for the “TF-TF clusters”, looking for consensus among pathway enrichments from five databases (, Pathway Commons, KEGG, WikiPathways and signatures from MSigDB (Gene Ontology Consortium 2004; Cerami et al. 2010; Kanehisa et Goto 2000; Pico et al. 2008; Liberzon et al. 2011)) (Supplemental Fig. S16, S17). Most clusters were conserved between ChIP+KO+ATAC and ATAC-only networks, and, within clusters, TFs shared features. For example, several clusters contained TFs defined in the de novo Th17 cores. RORC was a member of a Th17-promoting TF-TF Module including RORA, NR1D1, VAX2 (light-blue square, Fig. 5C, S14-S17); functional annotations for this cluster include “IL23 Signaling” and “Rheumatoid Arthritis”, which are consistent with prior knowledge. Th17-promoting genes HIF1A, HIF3A, DPF1, SP9, and SCRT1 cluster with 5-6 other TFs (green square), and Downloaded from genome.cshlp.org on September 28, 2021 - Published by Cold Spring Harbor Laboratory Press

enrichments for this cluster include “hypoxia”, “HIF1A network”, and “glycolysis”.

Other clusters contained TFs that promote the expression of genes repressed at 48hr in Th17 cells. One such cluster contained Th1 TFs (IRF1, STAT1 and STAT2) with additional interferon response factors and STATs (hot-pink box, Fig. 5C). As expected, this “interferon cluster” has enrichments for “Response to Interferon Gamma”, “Type 1 Interferon Pathway”, “Response to Virus”. Although TF gene expression for this cluster is highest in Th1 relative to other Th populations at 48hrs, gene expression is at its highest at the 1hr Th17 time point, suggesting an interferon-like response for Th17 cells very early in the Th17 polarization time course (Supplemental Fig. S14, S15). This result is consistent with predictions from another Th17 TRN (Yosef et al. 2013), in which authors also predict roles for IRF1, IRF2, IRF9, STAT1, STAT2 within the first four hours of Th17 polarization. Both findings are consistent with potential plasticity, observed in vivo, in Th17 cell programs that are homeostatic or pathogenic, with expression of Th1-like features in the latter.

Gene-set enrichment provides functional predictions for other TF-TF Modules, including: “Amino Acid Transport”, “Integrin Signaling”, “DNA Mismatch Repair”, “ Signaling” and others (Fig. 5C, S14-S17). These predictions provide further confirmation of TRN quality, as many modules have predicted function in processes for which individual TFs are already implicated (e.g., HIF1A and HIF3A in the HIF1A/hypoxia module, TRP53 and TRP73 in the p53-signaling module). TF-TF modules and functional annotations are largely conserved between KO+ChIP+ATAC and ATAC-only TRNs, the latter prior network being much more economically feasible than the first. More fundamentally, these predictions suggest how altering sets of TFs might influence Th17 pathways and responses.

New phenotypes are associated with TFs in the Th17 TRN Th17 cells contribute to the pathogenesis of multiple autoimmune diseases (Stadhouders et al. 2017). We previously tested whether genes co-regulated by the “Th17 core” (RORC, STAT3, BATF, IRF4 and MAF) were enriched for gene sets from GWAS of nine autoimmune diseases and three “negative controls” (Alzheimer’s, schizophrenia and Type 2 Diabetes) (Ciofani et al. 2012). Consistent with the known role for Th17 in autoimmune disease, genes from the autoimmune-disease sets were enriched (Ciofani et al. 2012). Since then, Th17 cells have also been implicated in obesity-related diseases (Endo et al. 2017; Harley et al. 2014) and psychiatric disorders (Debnath et Berk 2014; Choi et al. 2016). In parallel, the number of genome-wide association studies grew exponentially (MacArthur et al. 2016), and, as demonstrated above, our network model improved in both comprehensiveness and accuracy.

We performed an extensive, unbiased GWAS analysis of our “final” updated (KO+ChIP+ATAC) Th17 TRN, including any phenotype with five or more associated genes. 991 phenotypes met this criterion. Not only did we dramatically expand the phenotypes considered, we more broadly queried the Th17 TRN. For each of the 605 TFs individually, we tested for TF-target genes enrichment in each of the GWAS gene sets (Supplemental Table S6). Despite the large number of TF-phenotype associations tested, eight reached significance (FDR = 10%, Fig. 6A). STAT3 targets were significantly enriched for genes associated with inflammatory bowel disease (IBD) as well as the two IBD-subtypes, Crohn’s disease and ulcerative colitis. Both genetic (Cho 2008) and functional studies (Xavier et Podolsky 2007) support a role for STAT3 in IBD; indeed, STAT3 is a proposed IBD therapeutic target (Lee et al. 2015; Nguyen et al. 2015). Our analysis also newly implicates FOXB1 in regulation of IBD genes. We compared the centrality of STAT3 and FOXB1 in the Th17 TRN (Supplemental Fig. S18A) to their centrality in the subnetwork limited to the 54 IBD genes in the Th17 TRN (Fig. 6B, left panel). We Downloaded from genome.cshlp.org on September 28, 2021 - Published by Cold Spring Harbor Laboratory Press

examined both degree and betweenness centrality. For each TF, betweenness is the fraction of shortest paths connecting TFs to target genes in the network that contain the TF. Whereas degree is a local measure (TF’s direct effect on gene expression), betweenness is a more global measure of TF importance, as it can also capture TFs that regulate a large number of genes through control of other TFs. Although STAT3 had the sixth-highest degree in the full Th17 TRN (Supplemental Fig. S18A), it has the highest-degree TF in the IBD subnetwork (Fig. 6B). Relative degree more than doubles for both STAT3 and FOXB1 in the IBD subnetwork, and betweenness centrality increased, too (Fig. 6B). The IBD genes regulated by STAT3 and FOXB1 include a number of Th17 genes: Rorc, Il23r, Tnfsf15 (Fig. 6B, right panel).

NFKB2 and ETS1 are also associated with immune phenotypes (Fig. 6A). NFKB2’s targets are enriched in the phenotype “Chronic inflammatory diseases (ankylosing spondylitis, Crohn's disease, psoriasis, primary sclerosing cholangitis, ulcerative colitis) (pleiotropy)” (Supplemental Fig. S18B). Mutations in NFKB2 have been previously associated with common variable immunodeficiency (CVID) (Liu et al. 2014; Chen et al. 2013; Lindsley et al. 2014), a heterogeneous disorder in which 25% of patients suffer autoimmune disorders, including thrombocytopenia purpura, autoimmune hemolytic anemia, rheumatoid arthritis, and autoimmune enteropathy (which can be classified as Crohn’s disease) (Cunningham-Rundles 2008; Lopez-Herrera et al. 2012). Thus, NFKB2 was previously genetically associated with pleiotropic autoimmune diseases, in the context of CVID. (We note that our set of GWAS phenotypes did not include CVID.) ETS1’s targets are associated with “Neutrophil percentage of granulocytes” (Supplemental Fig. S18C). ETS1 is known to repress the Th17 program (Moisan et al. 2007). Ets1 expression decreases over the course of both Th0 and Th17 polarization, and, of the 48hr Th polarization conditions, has highest expression in Treg. Mutations in ETS1 have been associated with systemic lupus erythematosus (SLE) (Leng et al. 2011), an autoimmune disease in which the role of neutrophils has become increasingly appreciated (Smith et Kaplan 2015). Thus, a predicted role in neutrophil regulation could be consistent with the known role of ETS1 in SLE.

Discussion

Th17 cells protect mucosa from bacteria and fungi but can also drive autoimmune and inflammatory disease (Khader et al. 2009; Littman et Rudensky 2010; Stadhouders et al. 2018). These diverse roles require coordination of thousands of genes. TF regulation of gene expression provides a map for immuno-engineering Th17 behavior in disease. Researchers in academia and industry have used our first genome-scale Th17 transcriptional regulatory network (Ciofani et al. 2012) to develop hypotheses in the context of autoimmunity (Patel et Kuchroo 2015; Yang et al. 2014; Isono et al. 2014). Here, we provide an important update to our knowledge of Th17 transcriptional regulation, enabled by technical advances in genomic measurement and computational advances in TRN inference. KO data for 20 TFs and TF ChIP data for 9 TFs were central to the original Th17 TRN, providing excellent coverage of TF-gene targets for TFs in that set. However, technical limitations and cost precluded application of these tools to the hundreds of TFs expressed over the course of Th17 differentiation, all of which could play important roles in Th17 gene expression regulation.

In combination with large-scale efforts to learn TF DNA-binding motifs (Weirauch et al. 2014; Jolma et al. 2013; Najafabadi et al. 2015; Badis et al. 2009), the advent of ATAC-seq represents an opportunity to overcome limitations of sequential TF ChIP experiments, expanding the number of TFs with chromatin binding profiles by over an order of magnitude. In addition, although TF KO and ChIP data were pragmatically limited to 48hr Th17 conditions, standard ATAC-seq protocols require two orders of magnitude fewer cells than TF ChIP. Here, we Downloaded from genome.cshlp.org on September 28, 2021 - Published by Cold Spring Harbor Laboratory Press

obtained (indirect) TF binding profiles from multiple differentiation time points. Yet TF-binding profiles derived from motif analysis of ATAC-seq are noisy. Here, we provide a single, integrated method to infer regulatory roles for TFs genome-wide. At its core, gene expression is modeled as a function of TF activities, where prior information (e.g., from ATAC-seq) can be used to (1) improve TF activity estimates for some TFs and (2) favor TF-gene interactions that also have prior support. We rigorously test the performance of our method in terms of precision- recall and gene-expression prediction. Our methods have two very desirable features (1) they prune initial noisy prior networks (by over an order of magnitude in this study) while (2) also learning new TF-gene interactions for TFs with and without prior information.

Our final Th17 TRN is built integrating our best knowledge (KO and ChIP-seq of key Th17 TFs with ATAC-seq and a rich gene expression dataset). Our de novo Th17 core includes the original core (RORC, STAT3, BATF, IRF4, MAF) and dozens of additional TFs. The TF-TF module analyses predict gene pathway regulation by multiple TFs. We also exhaustively test for the association of TFs with nearly 1000 GWAS phenotypes, uncovering known associations between STAT3 and IBD as well as several novel TF associations with immune phenotypes. Notably, these TFs were not themselves members of the gene sets for the phenotypes they are predicted to regulate. Thus, application of our TRN methods might provide new links between TF regulators and disease-associated genetic polymorphisms. The resulting Th17 TRN provides an important update to our knowledge of transcriptional regulation in Th17 cells and can be used to query key regulators of pathways and disease genes.

Of perhaps greater importance, the TRN experimental design and computational methods proposed are general, designed for regimes where prior knowledge of transcriptional regulators and/or sample material is scarce (e.g., cells directly from humans and animal models). Given the rigorous testing and case study presented here, we have high expectations for their successful application in other systems. Indeed, we have already applied our methods to a new physiological setting, constructing and experimentally validating TRNs for innate lymphoid cells of the intestine (Pokrovskii et al. 2018). Our methods are widely applicable. Prior information can be derived from diverse sources: chromatin state data, systems genetics, and literature- curated databases.

This work also highlights avenues for future improvement of TRN inference methods. We tested two methods for TF activity estimation: (1) TF mRNA levels and (2) based on prior knowledge of TF-gene interactions. Although prior-based TFA improved TRN inference in B. subtilis and yeast (Arrieta-Ortiz et al. 2015; Tchourine et al. 2018), neither method consistently outperformed the other in this study. As a result, final TRNs were built using both TFA methods. There are multiple dimensions along which TFA estimation could be improved. The simplicity of the linear framework proposed for prior-based estimation has limitations in the context of complex mammalian transcriptional regulation, and a more sophisticated mathematical model for TFA estimation could be of value. TFA estimation would also improve from better prediction of TF binding events. Here, we limited our approach to a simple TF motif analysis of accessible chromatin, yet several more sophisticated methods exist and merit testing (Pique-Regi et al. 2011; Sherwood et al. 2014; Chen et al. 2017; Lamparter et al. 2017). Another limitation of our method is the mapping of putative TF binding events to gene loci. In our analysis, 3D distance between potential regulatory regions and gene loci is approximated by linear distance, a short- coming that chromatin capture data (e.g., Hi-C (Lieberman-Aiden et al. 2009) and other 3D- chromatin techniques (Zhang et al. 2012; Beagrie et al. 2017) would mitigate. Thus, the Th17 genomics dataset (Ciofani et al. 2012), augmented by our new ATAC-seq and RNA-seq experiments, provides a fertile testing ground for the development of future TRN inference methods and innovation. Downloaded from genome.cshlp.org on September 28, 2021 - Published by Cold Spring Harbor Laboratory Press

Methods

ATAC-seq CD4+ T Cells were sorted and polarized according to (Ciofani et al. 2012), and ATAC-seq samples were prepared as described previously (Buenrostro et al. 2013). Paired-end 50bp sequences were generated from samples on an Illumina HiSeq 2500. Sequences were mapped to the murine genome (mm10) with Bowtie 2 (2.2.3) (Langmead et Salzberg 2012), filtered based on mapping score (MAPQ > 30, SAMtools (0.1.19) (Li et al. 2009)), and duplicates removed (Picard (http://broadinstitute.github.io/picard)). The ATACseqQC package (Ou et al. 2018) was used to evaluate ATAC-seq fragment-length distributions and signal at TSS for each sample (Supplemental Fig. S19). For each sample individually, we ran PeaKDEck (McCarthy et O’Callaghan 2014) (parameters –bin 75, -STEP 25, -back 10000, -npBack100000) and -4 filtered peaks with a Praw<10 . To enable quantitative comparison of accessibility across samples, we generated a reference set of accessible regions, taking the union (BEDTools) of peaks detected in individual samples. The reference set of ATAC-seq peaks contained 63,049 potential regulatory loci, ranging from 75 to 3725bp (median 275bp). Reads per reference peak were counted with HTSeq-count (Anders et al. 2015). ATAC-seq data was robustly normalized using DESeq2 (Love et al. 2014) for PCA and clustering (Fig. 1A, S2). The 33 ATAC-seq experiments are available from NCBI’s GEO Database (GSE113721).

RNA-seq The 18 new samples composing “late Th17” time points were generated as follows: Naïve CD4 T cells were primed on an anti-CD3 (Bio X Cell BE0001-1) and anti-CD28 (Bio X Cell BE0015-1) coated plate (without any additional cytokine) for 12-14hrs (overnight). Cells were then polarized with one of two cytokine cocktails: (1) Th17N: TGF-b (0.3ng/ml, PeproTech 100-21-10) + IL6 (20ng/ml, eBioscience 34-8061-82) or (2) Th17P: IL6 (20ng/ml) + IL1b (20ng/ml, PeproTech 211-11b) + IL23 (20ng/ml, R&D Systems 1887-ML-010). Cells were harvested for RNA-seq at 60 and 108hrs post-TCR stimulation. Cells were lysed and snap-frozen in TRIzol then thawed for chloroform extraction, using a 1:1 ratio of 70% ethanol to aqueous phase. Samples were loaded onto a Qiagen RNEasy column according to manufacturer’s instructions. rRNA was depleted with a Ribo-Zero Gold kit; libraries were then prepared using the Illumina TruSeq Stranded Total RNA library prep and sequenced on an Illumina HiSeq 2500. The remaining 81 new CD4+T cell samples were (1) naïve or polarized and (2) processed as described in (Ciofani et al. 2012). The 99 RNA-seq experiments are available from GEO (GSE113720). Publicly available RNA-seq data was downloaded from GEO: GSE40918 (156 samples), GSE70108 (4 samples), and GSE92992 (8 samples). Sequences were mapped to mm10 (STAR aligner (Dobin et al. 2013)). Reads per gene were counted (using HTSeq-count (Anders et al. 2015) with parameters: --stranded=no --mode=union) and robustly normalized (DESeq2 (Love et al. 2014)). Supplemental Note 2 details treatment of batch effects.

Transcriptional regulatory network inference. Selection of Target Genes: We built gene expression models for 3578 target genes, composed of the union of (1) genes differentially expressed between Th17 and Th0 at 48hr (FDR = 10%, log2|FC| > log2(1.5)) and (2) the 2100 genes in the original Th17 TRN (Ciofani et al. 2012) (Supplemental Table S2).

Selection of Potential Regulators: We initially generated a custom list of potential mouse protein TFs, combining (1) mouse and human TFs from TFClass (Wingender et al. 2014) and genes with the GO annotation “Transcription Factor Activity”. (Human TFs were mapped to mouse using the MGI database.) From our list of potential mouse protein TFs (2093 genes), we Downloaded from genome.cshlp.org on September 28, 2021 - Published by Cold Spring Harbor Laboratory Press

generated a list of 869 potential TF regulators, limited to TFs with differential gene expression in at least one pairwise comparison between Th17, Th0, Th1, Treg, or Th2 at 48hr (FDR = 10%, log2|FC| > log2(1.5)). This initial list was used for all analyses comparing mLASSO-StARS and BBSR-BIC (Fig. 2, 3B, S3, S6). However, given recent efforts in TF annotation (Lambert et al. 2018), we generated a new list of mouse TFs for subsequent TRN analyses. Lambert et al. manually curated lists of (1) likely TFs and (2) “likely non-TFs”. We converted both lists to mouse. To gain mouse TFs without human orthologs, we integrated with mouse TFs from AnimalTFDB (Zhang et al. 2015), but removed any mouse TFs (70) that were “likely nonTFs”. Our final mouse TF list contained 1577 TFs, 715 of which served as potential regulators (differentially expressed as described above). Both candidate TF lists are available (Supplemental Table S3).

Generation of Prior Matrices: ATAC-seq: ATAC-seq peaks were associated with putative transcription-factor (TF) binding events and target genes to generate a “prior” network, ||||, of TF-gene interactions. We used a compendium of human and mouse TF motifs. Human and/or mouse TF binding motifs (PWMs) were downloaded from the CisBP motif collection version 1.02 (http://cisbp.ccbr.utoronto.ca) (Weirauch et al. 2014) and the ENCODE motif collection (http://compbio.mit.edu/encode-motifs) (Kheradpour et Kellis 2014). Transfac version 2014.2 motifs (Wingender 2008), referenced in the human CisBP collection, were reformatted with the MEME Suite tool transfac2meme version 4.10.1. Human ENCODE motifs were added to the CisBP motif collection if the TF PWM had R2 < 0.95 with a CisBP entry for that TF. The combined human ENCODE and CisBP set were mapped to mouse orthologs. We scanned peaks for individual motif occurrences with FIMO (Grant et al. 2011) (parameters --thresh .00001, ---stored-scores 500000, and a first-order-Markov background model). We found inclusion of human TF orthologs from the ENCODE motif collection slightly increased precision- recall relative to mouse CisBP alone (Supplemental Fig. S20). TF motif occurrences with raw p-value < 10-5 were included in downstream analysis. Putative binding events were associated with a target gene, if the peak fell within +/-10kb of gene body. We tested several peak-gene association rules based on distance from gene body or TSS, and TRN inference was robust to that choice (Supplemental Fig. S21). We generated two ATAC-seq priors: (1) A(Th17) for which only peaks from Th17 48hr wildtype conditions were included and (2) A(Th) for which all Th samples were included. For the resulting prior matrix of TF-gene interactions, entries were one (if a TF motif was found proximal to the gene) and zero otherwise. Similar methods were used to derive priors from ChIP-seq, TRRUST, ENCODE DHS and combined sources (Supplemental Note 3).

Inference framework. We used the Inferelator model for TRN inference (Bonneau et al. 2006). At steady state1, gene expression is modeled as a sparse, multivariate linear combination of TF activities (TFAs): ∑ , [Equation 1] where xij corresponds to the expression level of gene i in condition j, ajk is the activity of TF k in condition j, and bik is describes the effect of TF k on gene i. A TF’s mRNA expression can serve as a proxy of protein TF activity. More recently (Arrieta-Ortiz et al. 2015), TFA has been estimated based on partial prior knowledge of a TF’s gene targets: , [Equation 2] where |||| is the expression matrix for genes in the prior, |||| is the prior matrix of known TF-gene interactions, and |||| contains the unknown

1 Consideration of time-series is discussed in Supplemental Note 4, Supplemental Fig. S23. Downloaded from genome.cshlp.org on September 28, 2021 - Published by Cold Spring Harbor Laboratory Press

TF activities. Equation 2 has no unique solution, but the least-squares solution has worked well in simpler organisms (Arrieta-Ortiz et al. 2015; Tchourine et al. 2018). Given first-time application to a mammalian setting, we tested both methods of TFA estimation: (1) TF mRNA levels and (2) prior-based (Equation 2). Note that all expressed genes (24007) with edges in the prior were used to solve Equation 2. As described in results, we solved for the interaction terms {bik} in Equation 1 using the current Inferelator (BBSR-BIC, (Arrieta-Ortiz et al. 2015)) as well as a new method (mLASSO-StARS, detailed below).

Model-building with LASSO and StARS. We constructed sparse models of gene expression using a modified LASSO framework: || |Λ| , [Equation 3] where X, and A are defined as above, B|||| is the matrix of inferred TF-gene interaction coefficients, Λ|||| is a matrix of nonnegative penalties, and represents an entry-wise matrix product (Studham et al. 2014; Gustafsson et al. 2015). Matrix representation of the LASSO penalty enables incorporation of prior information. Specifically, a smaller penalty Λ is used if there is evidence for the TF-gene interaction in the prior matrix. Similar to the G-prior in the current Inferelator BBSR and older Inferelator modified elastic net framework (Greenfield et al. 2013), this procedure encourages selection of interactions supported by the prior, if there is also support in the gene expression data. For this study, the entries of the Λ matrices were limited to two values: the nonnegative value , for TF-gene interactions without evidence in the prior, and bias*, where bias 0,1, for TF-gene interactions with support in the prior.

We hypothesized that a data-driven approach to model selection might perform better in a complex, mammalian setting than a theoretical one (e.g., BIC used in Inferelator-BBSR-BIC). Specifically, we chose to test the StARS (Liu et al. 2010), anticipating that the resulting networks would be larger than those built using our BBSR-BIC method. We hypothesized that a larger model might be needed to describe a mammalian TRN. StARS was designed to ensure that the inferred network of interactions includes the true set of network interactions with high probability. In contrast, another popular data-driven selection method, stability selection, seeks to limit false positive rate (Meinshausen et Bühlmann 2010), which, in a biological setting, might be overly conservative (Liu et al. 2010). Thus, StARS seemed ideally suited to our objective.

In brief, StARS rests on the definition of edge instabilities. For a fixed value of , instabilities are estimated via subsampling and can be interpreted as twice the Bernoulli variance of a subsampled edge or the fraction of times subsample edge predictions disagree (Liu et al. 2010). This definition is used to select the smallest value corresponding to an acceptable average edge instability; authors heuristically recommend an average instability cutoff = .05.

Importantly, given our application of StARS in the new setting of TRN inference and a modified LASSO objective function, we used out-of-sample gene expression prediction and precision- recall of G.S. interactions to guide selection of an appropriate instability cutoff, rather than relying on the recommended heuristic (Supplemental Note 5, Supplemental Fig. S24-S29). Following our previous work in ecological network inference (Kurtz et al. 2015), TF-gene interactions were ranked according to nonzero subsamples per edge. To better and more efficiently prioritize high-confidence edges for TRN inference (see Supplemental Note 5, Supplemental Fig. S26-S28), we developed the following edge confidence score: Confidence(i,k) = Nonzero Subsamples + |pcorr(i,k)|, [Equation 4] where i and k correspond to gene i and TF k, and pcorr(i,k) is the partial correlation between gene i and TF k, for a set model size. For TF-gene interactions with the same number of Downloaded from genome.cshlp.org on September 28, 2021 - Published by Cold Spring Harbor Laboratory Press

nonzero subsamples, the TF-gene interaction with higher absolute partial correlation will be higher confidence. mLASSO-StARS was implemented in MATLAB R2016b, and code relies on the Glmnet for Matlab package (Qian et al. 2013) to solve Equation 3. Computational speed-ups using bStARS (Müller et al. 2016) are discussed in Supplemental Note 6, Supplemental Fig. S30. Code is available in Supplemental Materials and from https://github.com/emiraldi/infTRN_lassoStARS.git.

Prior Reinforcement. We tested several levels of prior reinforcement (none, moderate and high) for BBSR-BIC and mLASSO-StARS. For BBSR-BIC, these corresponded to G-prior weights of 1, 1.1, and 1.5, for mLASSO-StARS, bias 1, .5, .25, respectively. These prior-reinforcement parameters resulted in commensurate levels of TRN prior edge incorporation between methods (Supplemental Fig. S22).

Gene Expression Prediction. We generated three leave-out (test) datasets (Fig. 3A, Supplemental Table S4). For each leave-out prediction challenge, the training set included all samples excluding test. For each training set, we performed model selection and parameter estimation independently of the test set. Both BBSR-BIC and the mLASSO-StARS methods provide confidence estimates for predicted TF-gene interactions, and we built TRN models of various sizes as a function of edge confidence cutoffs for each of the training sets. For parameter estimation, training TFA matrices were mean-centered and variance-normalized || || according to the training-set means and standard deviations . Target gene expression vectors were mean-centered according to the training-set mean || . Then, for each confidence-level cutoff, we regressed the vector of normalized training gene expression data onto the reduced set of normalized training TFA estimates to |||| arrive at a set of multivariate linear coefficients . Sum of Squared Error of prediction was calculated as follows: , ∑|| ∑ || , , [Equation 5a] , The “null” model was calculated relative to the mean of training data: ∑|| , [Equation 5b] We then calculated , a normalized measure of predictive performance: 1 , ∞, 1 [Equation 5c]. For gene-expression prediction with prior-based TFA, the mRNA of target genes with edges in the prior contribute to TFA estimation (Equation 2), and then their gene expression patterns are predicted as a function of TFA. This circularity did not lead to over-fitting and inflated values (Supplemental Note 7, Supplemental Fig. S31, S32).

Final TRNs. To generate “final” TRNs, we used mLASSO-StARS with the following parameters: moderate prior reinforcement (bias = .05) and corresponding to average network instability = .05 to rank edges by confidence. To ensure that our final Th17 TRN was as complete and accurate as possible, our final network edge inclusion criteria was context-specific, guided by the most rigorous tools at hand, precision-recall and out-of-sample gene expression prediction (see supporting results and discussion in Supplemental Note 5). We included the highest confidence edges until we reached a model size of average 15 TFs/gene (3578 genes x 15 TFs/gene = 53,670 TF-gene interactions). Given the complementary performance of TF-mRNA and prior-based TFA, we combined resulting TRNs by taking the maximum edge confidence, to Downloaded from genome.cshlp.org on September 28, 2021 - Published by Cold Spring Harbor Laboratory Press

preserve the individual strengths of each (Kittler et al. 1996; Castro et al. 2018). (See Supplemental Note 8, Supplemental Fig. S33, S34 for performance comparison of max- to rank-combine (Marbach et al. 2012) relative to individual TRNs as well as performance combining TRNs from different priors.)

De Novo Th17 Core. Th17 TFs were limited to TFs specifically promoting Th17 gene expression patterns. TFs were included in the core if they met one of two criteria: (1) The TF promotes Th17 gene expression through activation (the TF’s positive edges are enriched in up-regulated Th17 genes at an FDR = 1%) or (2) repression of non-Th17 genes (TF’s negative edges are enriched in down-regulated Th17 genes at an FDR = 1%).

Gold Standards. For the gold standards from our lab, we used recommended cutoffs of .75, .75, and 1.5 for KO, ChIP, and KO+ChIP networks, respectively (Ciofani et al. 2012). For the six additional TF KO experiments, we downloaded networks without filtering (Yosef et al. 2013). For both G.S.’s, gene symbols were mapped from mm9 to mm10, and only genes mapping to both genome builds were considered in precision-recall analysis. Random AUPR was calculated as the ratio of G.S. edges to target genes to the number of possible edges between target genes and TFs in the G.S.

Network Visualization and Availability. Networks were visualized using jp_gene_viz, a newly designed interactive interface, based on iPython. Software is available at https://github.com/simonsfoundation/jp_gene_viz. All 36 LASSO-StARS Th17 TRNs (from Supplemental Fig. S8), gold standards, and final, combined TRNs are available in a Jupyter-notebook binder: https://mybinder.org/v2/gh/simonsfoundation/Th17_TRN_Networks/master. Both jp_gene_viz codebase and TRN notebooks are also included in Supplemental Materials.

TF-TF Module Analysis. We calculated the number of shared target genes between each pair of TFs, analyzing positive and negative target edges separately. (Edges with |partial correlation| < .01 were excluded from analysis, as were TFs with fewer than 20 gene targets.) TFs vary greatly by number of target genes (Supplemental Fig. S10, S11), so we devised an overlap normalization scheme that controlled for the variable number of targets per TF (Supplemental Note 9, Supplemental Fig. S35).

GWAS Analysis. The NHGRI-EBI GWAS Catalog v1.0.2 (MacArthur et al. 2016) was downloaded on August 4, 2018. SNPs were mapped to the nearest gene within +/-1Mbp using the catalog’s “MAPPED GENE(S)”. Phenotype-associated gene sets were converted to mouse gene symbols. Sets containing 5 genes (991 sets) were retained. For each TF in the Th17 TRN (KO+ChIP+ATAC prior) with 5 targets (605 TFs), overlap with the GWAS gene sets was calculated and significance estimated using the hypergeometric CDF. Benjamini-Hochberg correction was applied to control for multiple hypothesis testing. Network statistics were calculated in MATLAB R2016b, normalizing degree by total target genes and fraction of shortest paths (betweenness) by total number of paths between TFs and target genes (some of which were also TFs).

Data Access

The datasets from this study have been submitted to the NCBI Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo/) under accession number GSE113723.

Downloaded from genome.cshlp.org on September 28, 2021 - Published by Cold Spring Harbor Laboratory Press

Acknowledgements

We thank the Flatiron Institute Scientific Computing Core (I. Fisk) for enabling the computational aspects of this work and the New York University Langone Medical Center Genomics Core (A. Heguy and P. Zappile) for help with sequencing. We are thankful to T. Aijo, M. Weirauch and R. Taylor for advice on the manuscript and to G. Atluri for helpful discussions of the TF-TF module analysis. This work was supported by the Cincinnati Children’s Research Foundation (Trustee Award Grant to E.R.M.), the Simons Foundation (E.R.M., A.W., N.D., N.C., R.B.), U.S. National Institute of Health (5T32AI100853 to M.P.; R01-DK103358-01 to R.B. and D.L.; and R01- GM112192-01 to R.B., T32 CA009161 (Levy) to J.A.H.), Crohn’s and Colitis Foundation of America (fellowship to M.C.), Damon Runyon Cancer Research Foundation (Dale and Betty Frey Fellowship to J.A.H.), the Laura and Isaac Perlmutter Cancer Center (P30CA016087 to A.H.).

Disclosure Declaration

The authors have no conflicts of interest. Downloaded from genome.cshlp.org on September 28, 2021 - Published by Cold Spring Harbor Laboratory Press

References

Äijö T, Bonneau R. 2016. Biophysically Motivated Regulatory Network Inference: Progress and Prospects. Hum Hered 81: 6277. Anders S, Pyl PT, Huber W. 2015. HTSeq-A Python framework to work with high-throughput sequencing data. Bioinformatics 31: 166169. Arrieta-Ortiz ML, Hafemeister C, Bate AR, Chu T, Greenfield A, Shuster B, Barry SN, Gallitto M, Liu B, Kacmarczyk T, et al. 2015. An experimentally supported model of the Bacillus subtilis global transcriptional regulatory network. Mol Syst Biol 11: 839. Badis G, Berger MF, Philippakis AA, Talukder S, Gehrke AR, Jaeger SA, Chan ET, Metzler G, Vedenko A, Chen X. 2009. Diversity and complexity in DNA recognition by transcription factors. Science 324: 17201723. Barski A, Cuddapah S, Cui K, Roh T-Y, Schones DE, Wang Z, Wei G, Chepelev I, Zhao K. 2007. High-Resolution Profiling of Histone Methylations in the . Cell 129: 823837. Beagrie RA, Scialdone A, Schueler M, Kraemer DCA, Chotalia M, Xie SQ, Barbieri M, de Santiago I, Lavitas L-M, Branco MR, et al. 2017. Complex multi-enhancer contacts captured by genome architecture mapping. Nature 543: 519524. Blatti C, Kazemian M, Wolfe S, Brodsky M, Sinha S. 2015. Integrating motif, DNA accessibility and gene expression data to build regulatory maps in an organism. Nucleic Acids Res 43: 39984012. Bonneau R, Facciotti MT, Reiss DJ, Schmid AK, Pan M, Kaur A, Thorsson V, Shannon P, Johnson MH, Bare JC, et al. 2007. A Predictive Model for Transcriptional Control of Physiology in a Free Living Cell. Cell 131: 13541365. Bonneau R, Reiss DJ, Shannon P, Facciotti M, Hood L, Baliga NS, Thorsson V. 2006. The Inferelator: an algorithm for learning parsimonious regulatory networks from systems- biology data sets de novo. Genome Biol 7: 1. Boyle AP, Davis S, Shulha HP, Meltzer P, Margulies EH, Weng Z, Furey TS, Crawford GE. 2008. High-resolution mapping and characterization of open chromatin across the genome. Cell 132: 311322. Buenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ. 2013. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding and nucleosome position. Nat Methods 10: 12138. Buenrostro JD, Wu B, Litzenburger UM, Ruff D, Gonzales ML, Snyder MP, Chang HY, Greenleaf WJ. 2015. Single-cell chromatin accessibility reveals principles of regulatory variation. Nature 523: 48690. Castro DM, Veaux N de, Miraldi ER, Bonneau R. 2018. Multi-study inference of regulatory networks for more accurate models of gene regulation. bioRxiv 279224. Cerami EG, Gross BE, Demir E, Rodchenkov I, Babur Ö, Anwar N, Schultz N, Bader GD, Sander C. 2010. Pathway Commons, a web resource for biological pathway data. Nucleic Acids Res 39: D685D690. Chai LE, Loh SK, Low ST, Mohamad MS, Deris S, Zakaria Z. 2014. A review on the computational approaches for gene regulatory network construction. Comput Biol Med 48: 5565. Chen K, Coonrod EM, Kumánovics A, Franks ZF, Durtschi JD, Margraf RL, Wu W, Heikal NM, Augustine NH, Ridge PG, et al. 2013. Germline Mutations in NFKB2 Implicate the Noncanonical NF-kB Pathway in the Pathogenesis of Common Variable Immunodeficiency. Am J Hum Genet 93: 812824. Chen X, Yu B, Carriero N, Silva C, Bonneau R. 2017. Mocap: Large-scale inference of transcription factor binding sites from chromatin accessibility. Nucleic Acids Res 45: Downloaded from genome.cshlp.org on September 28, 2021 - Published by Cold Spring Harbor Laboratory Press

43154329. Cho JH. 2008. The genetics and immunopathogenesis of inflammatory bowel disease. Nat Rev Immunol 8: 458466. Choi GB, Yim YS, Wong H, Kim S, Kim H, Kim S V, Hoeffer CA, Littman DR, Huh JR. 2016. The maternal interleukin-17a pathway in mice promotes autism-like phenotypes in offspring. Science 351: 933 LP-939. Christie D, Zhu J. 2014. Transcriptional regulatory networks for CD4 T cell differentiation. Dans Transcriptional Control of Lineage Differentiation in Immune Cells, p. 125172, Springer. Ciofani M, Madar A, Galan C, Sellars M, Mace K, Pauli F, Agarwal A, Huang W, Parkurst CN, Muratet M, et al. 2012. A validated regulatory network for Th17 cell specification. Cell 151: 289303. Cunningham-Rundles C. 2008. Autoimmune manifestations in common variable immunodeficiency. J Clin Immunol 28: 4245. Debnath M, Berk M. 2014. Th17 pathway-mediated immunopathogenesis of schizophrenia: Mechanisms and implications. Schizophr Bull 40: 14121421. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. 2013. STAR: Ultrafast universal RNA-seq aligner. Bioinformatics 29: 1521. Duren Z, Chen X, Jiang R, Wang Y, Wong WH. 2017. Modeling gene regulation from paired expression and chromatin accessibility data. Proc Natl Acad Sci 114: E4914E4923. Endo Y, Yokote K, Nakayama T. 2017. The obesity-related pathology and Th17 cells. Cell Mol Life Sci 74: 12311245. Fu Y, Jarboe LR, Dickerson JA. 2011. Reconstructing genome-wide regulatory network of E. coli using transcriptome data and predicted transcription factor activities. BMC Bioinformatics 12: 233. Gene Ontology Consortium. 2004. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res 32: D258D261. Giresi PG, Kim J, McDaniell RM, Iyer VR, Lieb JD. 2007. FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements) isolates active regulatory elements from human chromatin. Genome Res 17: 877885. Grant CE, Bailey TL, Noble WS. 2011. FIMO: scanning for occurrences of a given motif. Bioinformatics 27: 10171018. Greenfield A, Hafemeister C, Bonneau R. 2013. Robust data-driven incorporation of prior knowledge into the inference of dynamic regulatory networks. Bioinformatics 29: 10601067. Gustafsson M, Gawel DR, Alfredsson L, Baranzini S, Bjorkander J, Blomgran R, Hellberg S, Eklund D, Ernerudh J, Kockum I, et al. 2015. A validated gene regulatory network and GWAS identifies early regulators of T cell-associated diseases. Sci Transl Med 7: 110. Han H, Shim H, Shin D, Shim JE, Ko Y, Shin J, Kim H, Cho A, Kim E, Lee T, et al. 2015. TRRUST: a reference database of human transcriptional regulatory interactions. Sci Rep 5: 11432. Harley ITW, Stankiewicz TE, Giles DA, Softic S, Flick LM, Cappelletti M, Sheridan R, Xanthakos SA, Steinbrecher KA, Sartor RB, et al. 2014. IL-17 signaling accelerates the progression of nonalcoholic fatty liver disease in mice. Hepatology 59: 18301839. Hecker M, Lambeck S, Toepfer S, van Someren E, Guthke R. 2009. Gene regulatory network inference: Data integration in dynamic models-A review. BioSystems 96: 86103. Heng TSP, Painter MW, Elpek K, Lukacs-Kornek V, Mauermann N, Turley SJ, Koller D, Kim FS, Wagers AJ, Asinovski N. 2008. The Immunological Genome Project: networks of gene expression in immune cells. Nat Immunol 9: 1091. Isono F, Fujita-Sato S, Ito S. 2014. Inhibiting RORγt/Th17 axis for autoimmune disorders. Drug Discov Today 19: 12051211. Downloaded from genome.cshlp.org on September 28, 2021 - Published by Cold Spring Harbor Laboratory Press

Jolma A, Kivioja T, Toivonen J, Cheng L, Wei G, Enge M, Taipale M, Vaquerizas JM, Yan J, Sillanpa MJ, et al. 2010. Multiplexed massively parallel SELEX for characterization of human transcription factor binding specificities. 861873. Jolma A, Yan J, Whitington T, Toivonen J, Nitta KR, Rastas P, Morgunova E, Enge M, Taipale M, Wei G, et al. 2013. DNA-binding specificities of human transcription factors. Cell 152: 327339. Kanehisa M, Goto S. 2000. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28: 2730. Karwacz K, Miraldi ER, Pokrovskii M, Madi A, Yosef N, Wortman I, Chen X, Watters A, Carriero N, Awasthi A, et al. 2017. Critical role of IRF1 and BATF in forming chromatin landscape during type 1 regulatory cell differentiation. Nat Immunol 18: 412421. Khader SA, Gaffen SL, Kolls JK. 2009. Th17 cells at the crossroads of innate and adaptive immunity against infectious diseases at the mucosa. Mucosal Immunol 2: 403411. Kheradpour P, Kellis M. 2014. Systematic discovery and characterization of regulatory motifs in ENCODE TF binding experiments. Nucleic Acids Res 42: 29762987. Kittler J, Hater M, Duin RPW. 1996. Combining classifiers. Proc - Int Conf Pattern Recognit 2: 897901. Kurtz ZD, Müller CL, Miraldi ER, Littman DR, Blaser MJ, Bonneau RA. 2015. Sparse and Compositionally Robust Inference of Microbial Ecological Networks. PLoS Comput Biol 11: e1004226. Lambert SA, Jolma A, Campitelli LF, Das PK, Yin Y, Albu M, Chen X, Taipale J, Hughes TR, Weirauch MT. 2018. The Human Transcription Factors. Cell 172: 650665. Lamparter D, Marbach D, Rueedi R, Bergmann S, Kutalik Z. 2017. Genome-Wide Association between Transcription Factor Expression and Chromatin Accessibility Reveals Regulators of Chromatin Accessibility. PLoS Comput Biol 13: 119. Langmead B, Salzberg SL. 2012. Fast gapped-read alignment with Bowtie 2. Nat Methods 9: 357. Lee SY, Lee SH, Yang EJ, Kim EK, Kim JK, Shin DY, Cho M La. 2015. Metformin ameliorates inflammatory bowel disease by suppression of the signaling pathway and regulation of the between Th17/Treg Balance. PLoS One 10: 112. Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph Z, Gerber GK, Hannett NM, Harbison CT, Thompson CM, Simon I, et al. 2002. Transcriptional Regulatory Networks in Saccharomyces cerevisiae. Science 298: 799 LP-804. Leng RX, Pan HF, Chen GM, Feng CC, Fan YG, Ye DQ, Li XP. 2011. The dual nature of Ets-1: Focus to the pathogenesis of systemic lupus erythematosus. Autoimmun Rev 10: 439443. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. 2009. The sequence alignment/map format and SAMtools. Bioinformatics 25: 20782079. Li P, Spolski R, Liao W, Leonard WJ. 2014. Complex interactions of transcription factors in mediating cytokine biology in T cells. Immunol Rev 261: 141156. Liao JC, Boscolo R, Yang Y-L, Tran LM, Sabatti C, Roychowdhury VP. 2003. Network component analysis: reconstruction of regulatory signals in biological systems. Proc Natl Acad Sci U S A 100: 155227. Liberzon A, Subramanian A, Pinchback R, Thorvaldsdóttir H, Tamayo P, Mesirov JP. 2011. Molecular signatures database (MSigDB) 3.0. Bioinformatics 27: 17391740. Lieberman-Aiden E, Van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, Amit I, Lajoie BR, Sabo PJ, Dorschner MO. 2009. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326: 289293. Lindsley AW, Qian Y, Valencia CA, Shah K, Zhang K, Assa’ad A. 2014. Combined Immune Deficiency in a Patient with a Novel NFKB2 Mutation. J Clin Immunol 34: 910915. Downloaded from genome.cshlp.org on September 28, 2021 - Published by Cold Spring Harbor Laboratory Press

Littman DR, Rudensky AY. 2010. Th17 and Regulatory T Cells in Mediating and Restraining Inflammation. Cell 140: 845858. Liu H, Roeder K, Wasserman L. 2010. Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models. Dans Advances in Neural Information Processing Systems 23 (éd. J.D. Lafferty, C.K.I. Williams, J. Shawe-Taylor, R.S. Zemel, et A. Culotta), p. 14321440, Curran Associates, Inc. Liu Y, Hanson S, Gurugama P, Jones A, Clark B, Ibrahim MAA. 2014. Novel NFKB2 Mutation in Early-Onset CVID. J Clin Immunol 34: 686690. Lopez-Herrera G, Tampella G, Pan-Hammarström Q, Herholz P, Trujillo-Vargas CM, Phadwal K, Simon AK, Moutschen M, Etzioni A, Mory A, et al. 2012. Deleterious Mutations in LRBA Are Associated with a Syndrome of Immune Deficiency and Autoimmunity. Am J Hum Genet 90: 9861001. Love MI, Huber W, Anders S. 2014. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 550. MacArthur J, Bowler E, Cerezo M, Gil L, Hall P, Hastings E, Junkins H, McMahon A, Milano A, Morales J. 2016. The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog). Nucleic Acids Res 45: D896D901. Madar A, Greenfield A, Vanden-Eijnden E, Bonneau R. 2010. DREAM3: Network inference using dynamic context likelihood of relatedness and the inferelator. PLoS One 5. Marbach D, Costello JC, Küffner R, Vega NM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Bonneau R, Chen Y. 2012. Wisdom of crowds for robust gene network inference. Nat Methods 9: 796. McCarthy MT, O’Callaghan CA. 2014. PeaKDEck: a kernel density estimator-based peak calling program for DNaseI-seq data. Bioinformatics 30: 13021304. Meinshausen N, Bühlmann P. 2010. Stability selection. J R Stat Soc Ser B (Statistical Methodol 72: 417473. Moisan J, Grenningloh R, Bettelli E, Oukka M, Ho I-C. 2007. Ets-1 is a negative regulator of Th17 differentiation. J Exp Med 204: 28252835. Müller CL, Bonneau R, Kurtz Z. 2016. Generalized Stability Approach for Regularized Graphical Models. arXiv 1605.07072. Najafabadi HS, Mnaimneh S, Schmitges FW, Garton M, Lam KN, Yang A, Albu M, Weirauch MT, Radovani E, Kim PM. 2015. C2H2 proteins greatly expand the human regulatory lexicon. Nat Biotechnol 33: 555562. Neph S, Stergachis AB, Reynolds A, Sandstrom R, Borenstein E, Stamatoyannopoulos JA. 2012. Circuitry and Dynamics of Human Transcription Factor Regulatory Networks. Cell 150: 12741286. Nguyen PM, Putoczki TL, Ernst M. 2015. STAT3-Activating Cytokines: A Therapeutic Opportunity for Inflammatory Bowel Disease? J Interf Cytokine Res 35: 340350. Ou J, Liu H, Yu J, Kelliher MA, Castilla LH, Lawson ND, Zhu LJ. 2018. ATACseqQC: a Bioconductor package for post-alignment quality assessment of ATAC-seq data. BMC Genomics 19: 169. Ouyang Z, Zhou Q, Wong WH. 2009. ChIP-Seq of transcription factors predicts absolute and differential gene expression in embryonic stem cells. Proc Natl Acad Sci 106: 2152121526. Patel DD, Kuchroo VK. 2015. Th17 Cell Pathway in Human Immunity: Lessons from Genetics and Therapeutic Interventions. Immunity 43: 10401051. Pico AR, Kelder T, Van Iersel MP, Hanspers K, Conklin BR, Evelo C. 2008. WikiPathways: pathway editing for the people. PLoS Biol 6: e184. Pique-Regi R, Degner JF, Pai AA, Gaffney DJ, Gilad Y, Pritchard JK. 2011. Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Downloaded from genome.cshlp.org on September 28, 2021 - Published by Cold Spring Harbor Laboratory Press

Genome Res 21: 447455. Pokrovskii M, Hall JA, Ochayon DE, Yi R, Chaimowitz NS, Seelamneni H, Carriero N, Watters A, Waggoner S, Littman DR, et al. 2018. Transcriptional regulatory networks that promote and restrict identities and functions of intestinal innate lymphoid cells. bioRxiv 465435. Qian J, Hastie T, Friedman J, Tibshirani R, Simon N. 2013. Glmnet for Matlab. Qin J, Hu Y, Xu F, Yalamanchili HK, Wang J. 2014. Inferring gene regulatory networks by integrating ChIP-seq/chip and transcriptome data via LASSO-type regularization methods. Methods 67: 294303. Ramirez RN, El-Ali NC, Mager MA, Wyman D, Conesa A, Mortazavi A. 2017. Dynamic Gene Regulatory Networks of Human Myeloid Differentiation. Cell Syst 4: 416429.e3. Rendeiro AF, Schmidl C, Strefford JC, Walewska R, Davis Z, Farlik M, Oscier D, Bock C. 2016. Chromatin accessibility maps of chronic lymphocytic leukaemia identify subtype-specific epigenome signatures and transcription regulatory networks. Nat Commun 7. Robertson G, Hirst M, Bainbridge M, Bilenky M, Zhao Y, Zeng T, Euskirchen G, Bernier B, Varhol R, Delaney A, et al. 2007. Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat Methods 4: 651657. Sherwood RI, Hashimoto T, O’Donnell CW, Lewis S, Barkal AA, Van Hoff JP, Karun V, Jaakkola T, Gifford DK. 2014. Discovery of directional and nondirectional pioneer transcription factors by modeling DNase profile magnitude and shape. Nat Biotechnol 32: 171178. Siahpirani AF, Roy S. 2016. A prior-based integrative framework for functional transcriptional regulatory network inference. Nucleic Acids Res 45: e21. Smith CK, Kaplan MJ. 2015. The role of neutrophils in the pathogenesis of systemic lupus erythematosus. Curr Opin Rheumatol 27: 448453. Stadhouders R, Lubberts E, Hendriks RW. 2017. A cellular and molecular view of T helper 17 cell plasticity in autoimmunity. J Autoimmun. Stadhouders R, Lubberts E, Hendriks RW. 2018. A cellular and molecular view of T helper 17 cell plasticity in autoimmunity. J Autoimmun 87: 115. Stergachis AB, Neph S, Sandstrom R, Haugen E, Reynolds AP, Zhang M, Byron R, Canfield T, Stelhing-Sun S, Lee K, et al. 2014. Conservation of trans-acting circuitry during mammalian regulatory evolution. Nature 515: 365370. Studham ME, Tjärnberg A, Nordling TEM, Nelander S, Sonnhammer ELL. 2014. Functional association networks as priors for gene regulatory network inference. Bioinformatics 30: 130138. Tchourine K, Vogel C, Bonneau R. 2018. Condition-Specific Modeling of Biophysical Parameters Advances Inference of Regulatory Networks. Cell Rep 23: 376388. Weirauch MT, Yang A, Albu M, Cote AG, Montenegro-Montero A, Drewe P, Najafabadi HS, Lambert SA, Mann I, Cook K, et al. 2014. Determination and Inference of Eukaryotic Transcription Factor Sequence Specificity. Cell 158: 14311443. Wilkins O, Hafemeister C, Plessis A, Holloway-Phillips M-M, Pham GM, Nicotra AB, Gregorio GB, Jagadish SVK, Septiningsih EM, Bonneau R, et al. 2016. EGRINs (Environmental Gene Regulatory Influence Networks) in Rice That Function in the Response to Water Deficit, High Temperature, and Agricultural Environments. Plant Cell 28: 23652384. Wingender E. 2008. The TRANSFAC project as an example of framework technology that supports the analysis of genomic regulation. Brief Bioinform 9: 326332. Wingender E, Schoeps T, Haubrock M, Dönitz J. 2014. TFClass: a classification of human transcription factors and their rodent orthologs. Nucleic Acids Res 43: D97D102. Xavier RJ, Podolsky DK. 2007. Unravelling the pathogenesis of inflammatory bowel disease. Nature 448: 427434. Xi H, Shulha HP, Lin JM, Vales TR, Fu Y, Bodine DM, McKay RDG, Chenoweth JG, Tesar PJ, Downloaded from genome.cshlp.org on September 28, 2021 - Published by Cold Spring Harbor Laboratory Press

Furey TS. 2007. Identification and characterization of cell type–specific and ubiquitous chromatin regulatory structures in the human genome. PLoS Genet 3: e136. Yang J, Sundrud MS, Skepner J, Yamagata T. 2014. Targeting Th17 cells in autoimmune diseases. Trends Pharmacol Sci 35: 493500. Yosef N, Shalek AK, Gaublomme JT, Jin H, Lee Y, Awasthi A, Wu C, Karwacz K, Xiao S, Jorgolli M, et al. 2013. Dynamic regulatory network controlling TH17 cell differentiation. Nature 496: 4618. Zhang HM, Liu T, Liu CJ, Song S, Zhang X, Liu W, Jia H, Xue Y, Guo AY. 2015. AnimalTFDB 2.0: A resource for expression, prediction and functional study of animal transcription factors. Nucleic Acids Res 43: D76D81. Zhang J, Poh HM, Peh SQ, Sia YY, Li G, Mulawadi FH, Goh Y, Fullwood MJ, Sung W-K, Ruan X. 2012. ChIA-PET analysis of transcriptional chromatin interactions. Methods 58: 289299.

Figure Legends

Figure 1. (A) PCA of Chromatin Accessibility Profiles. The 33 ATAC-seq samples are plotted as a function of ATAC-seq peak intensities in PCA space, using the reference set of 63,049 ATAC-seq peaks identified. Open circles denote experimental conditions that deviate from the standard T Cell differentiation conditions (e.g., gene deletion, additional cytokines). Gray and red arrows indicate Maf and Stat3 KO Th17 conditions, respectively. (B) PCA of Gene Expression Profiles. The 254 RNA-seq samples are plotted as a function of all genes in PCA space. (C) Study Design. (see text).

Figure 2. (A) Precision-recall of Th17 TRNs. The left two panels enable comparison of TRNs built from ChIP versus Th17 ATAC priors, quantified by precision-recall of the KO G.S. (25 TFs, 8875 interactions), insets display AUPR. The performance of several TRNs are plotted for each prior, based on Inferelator method (LS = mLASSO-StARS (reds), BB = BBSR-BIC (blues)), TFA estimate (m = TF mRNA, TFA = P+X), and “+” corresponds to strength of prior reinforcement. Random and “No Prior” control TRNs serve as references. The right panel shows precision- recall of the KO-ChIP G.S. (9 TFs, 2375 edges) for TRNs built from the Th17 ATAC prior. (B) Number of targets per TF in the Gold Standards. Targets per TF are limited to the 3578 considered by the model. (C) TF-specific TRN performance. For each G.S., AUPRs were calculated for each TF individually. TF-specific performance of TRNs is quantified as the log2- foldchange between AUPR of the TRN model relative to random. “+”, “m” and “TFA” are as in (A).

2 Figure 3. (A) Leave-out sets plotted in PCA space. (B) Gene expression prediction. R pred for each leave-out set is plotted as a function of mean number of TFs per gene. In the legend, LS = mLASSO-StARS, BB = BBSR-BIC, m = TF mRNA, TFA = prior-based TFA, and “+” corresponds to strength of prior reinforcement. The gray line corresponds to a model-size cutoff 2 of mean 15 TFs per gene. (C) Distributions of R pred values. Empirical cumulative distribution 2 functions (CDFs) of per-gene R pred values for each method (model-size cutoff = mean 15 TFs per gene). (D) Model quality metrics versus model size. For two TRN models built with Th17 ATAC (left panel) or ChIP+KO+ATAC (right panel) priors (mLASSO-StARS, bias = .5, TFA = + 2 P X), the quality metrics (R pred for each leave-out set, precision and recall) are plotted as a function of model size. The model size used for subsequent analyses is highlighted.

Figure 4. (A-D) Th17 core TRN models. “Core” Th17 genes and TFs were selected from the literature for visual comparison with jp_gene_viz software. Network size was limited to an average of 15 TFs per gene for Inferelator networks using the: (A) Th17 ATAC prior, (B) no Downloaded from genome.cshlp.org on September 28, 2021 - Published by Cold Spring Harbor Laboratory Press

prior, or (D) ChIP+ATAC+KO prior. The edges in Inferelator TRNs are colored according to partial correlation (red positive, blue negative) and weighted proportionally to edge stability. Solid edges have prior support, while dotted edges were learned from gene expression modeling alone. (C) represents the full KO-ChIP G.S. from Ciofani et al., where edge sign is based on differential gene expression analysis between TF KO and control. Nodes are colored according to z-scored gene expression at 48h in Th17, relative to the other T Helper cell time points (red/blue = increased/decreased expression). The “Final” KO+ChIP+ATAC (D) and ATAC-only (B) TRNs max-combine networks built using TF mRNA and prior-based TFA.

Figure 5. De Novo Th17 Core TFs in the ChIP+KO+ATAC TRN (A) and ATAC-only TRN (B). The Top-30 most significant “core” TFs are displayed. Significance was based on enrichment of a TF’s (1) positive gene targets in up-regulated Th17 genes or (2) negative targets in down- regulated Th17 genes. The left panel displays significance and direction of regulation, while the right panel denotes number, sign and prior support of TF target edges. Superscripts c and y indicate TF Th17 association from (Ciofani et al. 2012) and (Yosef et al., 2013), respectively. (C) Top 15 TF-TF Modules for ChIP+KO+ATAC TRN. TFs were clustered into modules based on shared positive target genes between TFs (Methods). Gene-set enrichment was used to annotate clusters, and TF members are listed.

Figure 6. (A) TFs whose target genes are enriched in GWAS phenotype genes (FDR=10%). Phenotype abbreviations “IBD” = “Inflammatory Bowel Disease” and “Chronic inflammatory diseases” = “Chronic inflammatory diseases (ankylosing spondylitis, Crohn's disease, psoriasis, primary sclerosing cholangitis, ulcerative colitis) (pleiotropy)”. # GWAS, # TF, and # Overlap correspond to the number of genes associated with the phenotype, regulated by the TF in the Th17 TRN (KO+ChIP+ATAC), and the overlap between those two sets, respectively. Further details are contained in Methods. (B) STAT3 and FOXB1 are central regulators of IBD genes. In the left panel, each arrow corresponds to a single TF. Arrow source is TF’s centrality (out degree, betweenness) in the full Th17 TRN and arrowhead is TF centrality for the IBD subnetwork (where target genes are limited to the 54 shared the Th17 TRN and IBD GWAS set). STAT3 and FOXB1 (pink arrows) both show significant increase in degree centrality for IBD genes (FDR=10%). The right panel features the subnetwork connecting STAT3 and FOXB1 to their target genes in the IBD set. Node color indicates log2(fold-change) in Th17 48h condition relative to other Th timepoints (red = increased, blue = decreased), while red / blue edges indicate positive / negative regulation. Solid edges have support in the ChIP+KO+ATAC prior, while dotted edges do not.

Figure 1 Figure 1

Figure 1

Figure 1 Th1 Th2

Treg 48hr CD4T 96hr A B Th0 ivThATACTh17 Raw ATAC Scores-seq (, 63049) Th17 RNA-seq Th17 Tr1 (9%) 72hr 16hr Th1 Th2 PC 3 PC PC 3 (9.45%)

3-12hr Treg Th17 48hr CD4T 96hr 1hr Th0

6hr PCPC 1 (26.91%)(27%) 16hr 48hr PC 2 2 (14.53%) (15%) Th17 Tr1 (9%) 72hr 2hr 100 16hr PC 2 (10%) PC 3 PC PC 2 (10%) PC 3 (9.45%) CD4CD4T T Cell CD4T Th2 80 Th0CD4Tper 3-12hr Th1Th0 60 Th17Th0per Th2Th1 Tr1Th17 Th0 40 TregTh17per Treg 20 1hr

0 PCPC 1 1(55%) (55%) PC 1 (26.91%) PC 2 (14.53%) PC 1 (27%) -20 PC 2 (15%) C -40 Challenge: Model is in the underdetermined regime 100 Gene Expression Data

• 1000s of Genes to Model -60 • 100s of TFs Expressed -50 0 50 100 RNA-seq • >105 TF-gene Interaction Terms to Learn Performance CD4CD4T T Cell • Only 10s-100s of Samples Genes

~20k genes x # Measurements > # Samples 80 Th0

1000s of CD4Tper ATAC (Th17 48h) (KC1p5) # Interactions to be Estimated >> # Samples 1 “p” >> “n” 254 samples Target Genes 0.9 Th1Th0 0.8 10s-100s of Samples (3578 x 254) 0.7 Gold Standards 60 0.6 Th17Th0per 0.5 25 TF KO + 9 TF ChIP TFA = TF mRNA Precision 0.4 Th2Th1 0.3

Inferelator TRN Precision TF Activities 0.2 0.1 Tr1Th17 ∆gene ~ Σ TF activities40 0 + (~869 x 254) 0 0.1 0.2 TFA = P X • BBSR-BIC (G-Prior) RecallRecall BenchmarkTregTh17per: ATAC (Th17 48h), LO = Early Th17 ü Prior Source • mLASSO-StARS Prior Matrix Figure 1 20 1 ü BBSR-BIC vs. Prior Information 0.8 A CCR6+ ILC3 (~3578 x ~869) mLASSO-StARS • ATAC-seq 0.6 2 pred pred R 2 ü Strength of prior 0 0.4 •CCR6ENCODE- ILC3 DHS R RNA-seq 0.2 reinforcement • TRRUST + 0 ü TFA = P X vs. ILC1 0 10 20 30 Small• (ChIP-seq) Out-of-sample Prediction ModelTFs Per GeneSize Inferelator -20 TFA = TF mRNA Intestine ATAC-seq NK Initial TRN ~1E6 Edges fineILC 1607 sqrtZ Scores (, 172111) Large Intestine Refined TRN ILC2 ~1E5 Edges -40

B fineILCATAC 1607 sqrtZ Scores Scores Plot (, 172111) ATAC Loadings Plot ATAC peaks unique to: ILC1 -60 ATAC samples: ILC2 -50 0 50 100 CCR6CCR6nILC3-ILC3 SI ILC3 CCR6nILC3 SI CCR6+CCR6pILC3ILC3 SI CCR6pILC3 SI Unique peaks cis to genes ILC1 SI ILC1ILC1 SI ILC2 SI uniquely expressed in: PC 2 (27%) 2 PC PC 2 (27%) 2 PC NK SI PC 2 (27.46%) ILC2ILC2 SI • ILC3ILC3 NK SI PC 2 (27.46%) NK • ILC1ILC1 • ILC2ILC2 PC 1 (38.96%) PC 1 (39%) PC 1 (39%)

Genes ILC1 Peaks ILC2 Peaks ILC3 Peaks ILC1 Tbx21, Eomes, Tbx6, Irf8, Zbtb49, (none recovered) Rorc, Nrd1d PC 1 (38.96%) Zfp287, Zfp646, Zfp341

ILC2 Tbx21, Eomes, Nr4a2, Zbtb49, Zfp287, Mecom, Gata3, Rorc, Nrd1d1, Nkx6-2, Hoxa7, Tcf7 Zfp646, Zfp341 Pou2f2 ILC3 Tbx21, Eomes, Irf8 (none recovered) Rorc, Nrd1d, Rela, Atf3, Batf3, Irf5, Mef2c, Nkx6-2, Pou6f2, Rax, Stat6, Tcf7, Zfp462, Zfp57

Potential transcriptional activation Potential repression

C Il17re (ILC3 gene) locus (ATAC-seq, putative activation)

CCR6+ ILC3 CCR6- ILC3 ILC1 NK ILC2 Il23r (ILC3 gene) locus (ATAC-seq, putative repression)

CCR6+ ILC3 CCR6- ILC3 ILC1 RORE motif NK ILC2

IRF motif Tbx21 motif Figure 2

A ChIP Prior ATAC Prior ATAC Prior 1 G.S. = KO G.S.ChIP (KOall)= KO ATAC (Th17 48h) (KOall) ATACG.S. (Th17 = KO+ChIP 48h) (KC1p5) 0.2 1 0.2 1 .2 1 10.2 .2 .2 RandomRandom 0.9 0.8 0.9 0.9 No Prior, BB 0.15

0.15 No Prior, BB 0.15 0.8 0.8 0.8 BB,ATAC TFA (Th17 48h), BB, w=1 0.1 0.1 0.1 BB,ATAC TFA, (Th17 + 48h), BB, w=1.1

0.7 AUPR 0.7 0.7 AUPR AUPR ChIP (KOall)

AUPR ATAC (Th17 48h), BB, w=1.5

AUPR BB, TFA, ++ AUPR 0.6 0.6 0.6 0.6 0.05 0.05 0.05 ATAC (Th17 48h) (KOall) ATAC (Th17 48h) (KC1p5) BB,ATAC m, +(Th17 48h), BB, w=1.1 0.5 0.5 0.5 BB,ATAC m, (Th17++ 48h), BB, w=1.5 0 0 0 0 0 0 NoNo Prior, Prior,LS LS Precision Precision 0.4 0.4 Precision 0.4 Precision Precision 0.4 LS,ATAC TFA (Th17 48h), LS, w=1 0.3 0.3 0.3 LS,ATAC TFA, (Th17 + 48h), LS, w=0.5 0.2 0.2 0.2 LS,ATAC TFA, (Th17++ 48h), LS, w=0.25 LS,ATAC m, +(Th17 48h), LS, w=0.5 0.1 0.1 0.1 0.2 LS,ATACm, ++(Th17 48h), LS, w=0.25 0 0 0 0 0.1Recall 0.2 .25 00 0.1Recall 0.2 .25 00 0.1Recall 0.2 .25 Recall Recall Recall 0 B Atf6 C 0 0.2 0.4 0.6 0.8 1 # Targets / TF Batf Breakdownlog (AUPR of Precision/AUPR -recall) by TF Recall 2 model random ATF6 Bcl11b 2 BATFATF6ATF6 Crem NO PRIOR,LS N.P. BATF CHIP+ BCL11BBATF Ddit3 m BCL11BCREM CHIP++ BCL11B Egr2 CREMDDIT3CREM CHIP Etv6 CHIP+ ChIP DDIT3 TFA DDIT3EGR2 CHIP++ 1.5 EGR2ETV6EGR2 Fosl2 SATAC+ m FOSL2ETV6ETV6 Hif1a SATAC++ FOSL2HIF1AFOSL2 Id2 SATAC StARS

- + HIF1A SATAC ATAC HIF1AID2 Ikzf3 TFA IKZF3ID2ID2 SATAC++ IKZF3 Irf4 TRRUST+ 1 IKZF4 m IKZF3 ++ IKZF4 Lef1 TRRUST IKZF4IRF4 TRRUST IRF4 Maf No Prior,BB IRF4IRF8 TRRUST+ No Prior,LS ENCODE ENCODE ENCODE ENCODE ENCODE ENCODE ENCODE ENCODE ENCODE ENCODE ENCODE TRRUST TRRUST TRRUST TRRUST TRRUST TRRUST TRRUST TRRUST TRRUST TRRUST TRRUST mLASSO sATAC sATAC sATAC sATAC sATAC sATAC sATAC sATAC sATAC sATAC sATAC TFA

IRF8 TRRUST LEF1IRF8 Nfatc2 TRRUST++ ChIP ChIP ChIP ChIP ChIP ChIP ChIP ChIP ChIP ChIP ChIP LEF1 + LEF1MAF Nfe2l2 ENCODE Batf m Fosl2 0.5 MAF ++ Irf4 NFATC2MAF ENCODE Maf Rorc Nfatc2 ENCODE Rorc NFATC2NFE2L2NFATC2 Stat3 Satb1 Satb1 ENCODE+ Atf6 POU2AF1NFE2L2 Egr2

NFE2L2 TFA Crem ENCODE ENCODE++ Etv6 POU2AF1 Stat3 Lef1 log RORC Bcl11b POU2AF1 2 NO PRIOR,BB Ikzf3 (AUPR RORC Zfp36l2 Zfp36l2

SATB1RORC ID2 Id2 model

CHIP SP4 Ddit3 IRF4 Nfe2l2 IRF8 MAF IRF4 MAF MAF IRF4 0 LEF1 ATF6 ETV6 ETV6 ETV6 ZEB1 /AUPR BATF BATF BATF EGR2

SATB1 IKZF3 Hif1a IKZF4 DDIT3 HIF1A HIF1A HIF1A SATB1SP4 CREM RORC RORC RORC

CHIP STAT3 STAT3 STAT3

FOSL2 Batf FOSL2 FOSL2 0 500 1000 Fosl2 SATB1 NFE2L2 NFE2L2 NFE2L2 BCL11B NFATC2 Etv6 random STAT3SP4 ZFP36L2 SP4 CHIP Irf4 MafPOU2AF1 STAT3 CHIP Rorc ) TSC22D3STAT3 KO75KO Stat3 Nfe2l2 ZEB1 CHIP Hif1a TSC22D3ZEB1 KC1KO+ChIP KO Batf Fosl2 KO+ChIP ChIP SATAC Rorc ZFP36L2ZFP36L2ZEB1 Stat3 ChIP75ChIP Irf4 SATAC Maf -0.5 ZFP36L2 Nfe2l2 logEtv62 ( AUPRmodel/AUPRrandom) 0 500 1000 SATAC Hif1a 0 500 1000 SATAC ≤-2 ≥2 KOall SATAC -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 KOallKC1p5 TRRUST TRRUST KC1p5ChIP75 TRRUST -1 ChIP75 TRRUST TRRUST ENCODE ENCODE ENCODE ENCODE -1.5 ENCODE CHIP SATAC TRRUST ENCODE -2 ID2 SP4 IRF4 IRF8 IRF4 IRF4 LEF1 IKZF3 ATF6 BATF ETV6 IKZF4 HIF1A MAF ZEB1 BATF ETV6 MAF HIF1A MAF BATF ETV6 HIF1A EGR2 DDIT3 FOSL2 STAT3 FOSL2 STAT3 FOSL2 STAT3 CREM SATB1 RORC RORC RORC NFE2L2 NFE2L2 NFE2L2 BCL11B ZFP36L2 NFATC2 POU2AF1 Figure 4 -- LO Figure 3

A D Gene Expression Prediction Sets ATACATAC Th17,- only,bias =0.5, TFA, s=Network + ChIPChIP+KO+ATAC A17 KOall ATh, bias =0.5,, TFA, s=Network + 6 6

Late Th17 11 11 (18 samples)

ATAC Th17, bias =0.5, s=Network0.8 0.8 48hr 6 .8 .8 All Th0 96hr R2pred, EarlyTh17LO10 R2pred, EarlyTh17LO10 (53 samples) ATAC Th17, bias =0.5, s=Network R2pred, Th0LO53 R2pred, Th0LO53 1 0.6.6 6 0.6.6R2pred, IL2Th17LO18 R2pred, IL2Th17LO18 KOall: F1-score KOall: F1-score KOall: Precision KOall: Precision Th17ATAC Th17, bias 1=0.5, s=Network0.4.4 0.4.4KOall: Recall KOall: Recall 72hr6 KC1p5: F1-score KC1p5: F1-score KC1p5: Precision KC1p5: Precision

100 0.8 QualityMetric , F1-Scores, Precision, Recall 0.2.2 , F1-Scores, Precision, Recall 0.2.2

PC 3(9%) PC 1 PC 3 (9.45%) 2 pred 16hr CD4TCD4T 2 pred R 0.8 R 80 Early Th17 Th0CD4Tper R2pred, EarlyTh17LO10 (8 samples) 0 0 Th1Th0 00 55 R2pred,1010 Th0LO531515 2020 00 R2pred,55 EarlyTh17LO101010 1515 2020 0.6 Th17 TFs Per Gene TFs Per Gene 60 3-12hr Th0per TFsR2pred, Per Gene IL2Th17LO18 R2pred,TFs PerTh0LO53 Gene 0.8 Th2 Th1 0.6 KOall: F1-score R2pred, IL2Th17LO18 Tr1Th17 2 40 1hr KOall:R2pred,R pred, EarlyPrecision EarlyTh17LO10 Th17 (8) Recall,KOall: KOF1-score TregTh17per 2 0.4 KOall:R2pred,R pred, Th0Recall Th0LO53 (53) Precision,KOall: Precision KO 2 20 0.6 KC1p5:R2pred,R pred, Late F1-score IL2Th17LO18 Th17 (18) Recall,KOall: KO+ChIPRecall PC0.4PC 1 1(27%) (26.91%) PCPC 2 2 (14.53%) (15%) KC1p5:KOall:15 TFs F1-scorePer Precision Gene Precision,KC1p5: F1-score KO+ChIP 0 KOall: Precision KC1p5: Precision , F1-Scores, Precision, Recall 0.2 KOall: Recall 0.4 , F1-Scores, Precision, Recall 0.2 -20 KC1p5: F1-score B ATAC (Th17 48h),2 pred LO = Early Th17 ATAC (Th17All Th048h), LO = Th0 ATAC (Th17 48h), LO = Late Th17

Early Th17 2 pred Late Th17 R KC1p5: Precision 1 11 1 R 11 -40 0 , F1-Scores, Precision, Recall 0.2 0 0.8.8 0 5 0.8.8 10 15 20 0.8.8 -60 0 5 10 15 20

-50 0 2 pred RNA50-seq PCA 100 pred TFs Per Gene

2 TFs Per Gene ChIP (KO75) R 2 pred

0.6 2 pred 2 pred 0.6.6 .6 0.6.6 1 0 0.9 .40.4 0 5 0.4.4 10 15 20 0.4.4 Overall R Overall R Overall R Random TFs Per Gene 0.8 OverallR NoNo Prior, Prior,LS LS 0.2 0.2.2 0.2.2 .2 0.7 LS,ChIP, TFA LS, w=1 LS,ChIP, TFA, LS, + w=0.5 0 0 0 0.6 LS,ChIP, TFA, LS, ++ w=0.1 00 55 1010 1515 2020 00 55 1010 1515 2020 00 55 1010 1515 2020 LS,ChIP, m, LS,+ w=0.5 TFs Per Gene TFs Per Gene TFsTFs Per Per GeneGene TFs Per Gene TFs0.5 Per Gene LS,ChIP, m, LS, ++ w=0.1 ATAC (Th17 48h), LO = Early Th17 ATAC (Th17 48h), LO = Th0 ATAC (Th17 48h), LO = Late Th17 NoNo Prior, Prior,BB BB Precision 0.4 C BB,ChIP, TFA BB, w=1 11 1 11 BB,ChIP, TFA, BB, + w=1.1 0.3 BB,ChIP, TFA, BB,++ w=1.5 0.8 .8 .8 BB,ChIP, m, BB,+ w=1.1 .8 0.8 0.8 0.2

) BB,ChIP,m, BB,++ w=1.5

pred .60.6 0.6.6 .60.6 0.1 2 CDF CDF CDF 0 .40.4 0.4.4 .40.4 0 0.1 0.2 0.3

CDF(R Recall .20.2 0.2.2 .20.2

00 00 00 --11 --0.5.5 0 .50.5 1 --11 --0.5.5 0 .50.5 11 --11 -0.5-.5 00 .50.5 11 2 2 2 R2 R2 R2 R pred Rpredpred R predpred In [13]: # where script also searches the current directory # directory = "/Users/emiraldi/erm/Shared/DCproject/1702_DC_lineageNets_binder" In [13]: # where script also searches the current directory # directory = "/Users/emiraldi/erm/Shared/DCproject/1702_DC_lineageNets_binder" # # netPath = "DC_genesets_FC1_FDR10/DCs_max4_body_bp10000/DCnets_loc/DCnets_loc_pFC1_FDR10_rawp0001_hyg0001" # netPath = "DCs_ATAC_1701_cut4_2f_body_bp10000_pFDR10_FC1_gFDR10_FC1/DCs_pFDR10_FC1_gFDR10_FC1_rawp00001_hyg001" # # netPath = "DC_genesets_FC1_FDR10/DCs_max4_body_bp10000/DCnets_loc/DCnets_loc_pFC1_FDR10_rawp0001_hyg0001" # netPath = "DCs_ATAC_1701_cut4_2f_body_bp10000_pFDR10_FC1_gFDR10_FC1/DCs_pFDR10_FC1_gFDR10_FC1_rawp00001_hyg001" # # The starting conditions for each of the networks, a list of tuples. Tuple entries are: # # 0. network file name (column format) (as found in directory) # # The starting conditions for each of the networks, a list of tuples. Tuple entries are: # # 1. column of the expression matrix that you want the nodes to be colored by # # 0. network file name (column format) (as found in directory) # # 2. network title, to which we'll add the gene and peak cutoffs # # 1. column of the expression matrix that you want the nodes to be colored by # # 3. subselect the list of relevant genes -- NOT USED for this code and can be left out # # 2. network title, to which we'll add the gene and peak cutoffs # networkInits = [('CD103p_sp.tsv','GF_SI_CD11b-CD103+_1','CD103+CD11b- DCs',1), # # 3. subselect the list of relevant genes -- NOT USED for this code and can be left out # ('CD11bmCD103m_sp.tsv','GF_SI_CD11b-CD103-_1','CD11b-CD103- DCs',1), # networkInits = [('CD103p_sp.tsv','GF_SI_CD11b-CD103+_1','CD103+CD11b- DCs',1), # ('CD11bpCD103p_sp.tsv','GF_SI_CD11b+CD103+_1','CD11b+CD103+ DCs',1), # ('CD11bmCD103m_sp.tsv','GF_SI_CD11b-CD103-_1','CD11b-CD103- DCs',1), # ('CD11bp_sp.tsv','GF_SI_CD11b+CD103-_1','CD11b+CD103- DCs',1), # ('CD11bpCD103p_sp.tsv','GF_SI_CD11b+CD103+_1','CD11b+CD103+ DCs',1), # ('Macrophage_sp.tsv','GF_SI_Macrophage_1','Macrophage',1), # ('CD11bp_sp.tsv','GF_SI_CD11b+CD103-_1','CD11b+CD103- DCs',1), # ('Monocyte_sp.tsv','GF_SI_Monocyte_1','Monocyte',1)] # ('Macrophage_sp.tsv','GF_SI_Macrophage_1','Macrophage',1), # ('Monocyte_sp.tsv','GF_SI_Monocyte_1','Monocyte',1)] # expressionFile = "DC_RNAseq_2f_lengthNormCounts.txt" # expressionFile = "DC_RNAseq_2f_lengthNormCounts.txt" # tfFocus = 0 # If 1, automatically applies the "trim" function, so we can focus on TFs # # If 0, does not! # tfFocus = 0 # If 1, automatically applies the "trim" function, so we can focus on TFs # # If 0, does not! # threshhold = .2 # Set to .6 if you want to remove TFs that aren't differentially expressed # threshhold = .2 # Set to .6 if you want to remove TFs that aren't differentially expressed # clim = 1.5 # absolute value color threshhold on edge color # clim = 1.5 # absolute value color threshhold on edge color

# compare Th17 ATAC-seq networks to gold standard networks multiple network comparison tool directory = "/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks" In [13]: # where script also searches the current directory # compare Th17 ATAC-seq networks to gold standard networks multiple network comparison tool # directory = "/Users/emiraldi/erm/Documents/Manuscripts/Th17methods_manuscript/Th17_TRN_notebooks" # directory = "/Users/emiraldi/erm/Shared/DCproject/1702_DC_lineageNets_binder"directory = "/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks" netPath = 'Networks_pre180331' # directory = "/Users/emiraldi/erm/Documents/Manuscripts/Th17methods_manuscript/Th17_TRN_notebooks" # # netPath = "DC_genesets_FC1_FDR10/DCs_max4_body_bp10000/DCnets_loc/DCnets_loc_pFC1_FDR10_rawp0001_hyg0001"netPath = 'Networks_pre180331' # The starting conditions for each of the networks, a list of tuples. Tuple entries are: # netPath = "DCs_ATAC_1701_cut4_2f_body_bp10000_pFDR10_FC1_gFDR10_FC1/DCs_pFDR10_FC1_gFDR10_FC1_rawp00001_hyg001" # 0. network file name (column format) (as found in directory)In [13]: # where script also searches the current directory # The starting conditions for each of the networks, a list of tuples. Tuple entries are: # 1. column of the expression matrix that you want the nodes to be# directorycolored by = "/Users/emiraldi/erm/Shared/DCproject/1702_DC_lineageNets_binder"# # The starting conditions for each of the networks, a list of# tuples.0. network Tuple file entries name (column are: format) (as found in directory) # 2. network title, to which we'll add the gene and peak cutoffs # # 0. network file name (column format) (as found in directory)# 1. column of the expression matrix that you want the nodes to be colored by # 3. cut off for edge strength # # netPath = "DC_genesets_FC1_FDR10/DCs_max4_body_bp10000/DCnets_loc/DCnets_loc_pFC1_FDR10_rawp0001_hyg0001"# # 1. column of the expression matrix that you want the nodes #to 2. be network colored title, by to which we'll add the gene and peak cutoffs sampleConditionOfInt = 'Th17(48h)' # netPath = "DCs_ATAC_1701_cut4_2f_body_bp10000_pFDR10_FC1_gFDR10_FC1/DCs_pFDR10_FC1_gFDR10_FC1_rawp00001_hyg001"# # 2. network title, to which we'll add the gene and peak cutoffs# 3. cut off for edge strength # sampleConditionOfInt = 'Th17_wt_48h_SL1858' # # 3. subselect the list of relevant genes -- NOT USED for thissampleConditionOfInt code and can be left = 'Th17(48h)'out # Th17_KC_sigGenes_net.tsv # # The starting conditions for each of the networks, a list of tuples. # TuplenetworkInits entries =are: [('CD103p_sp.tsv','GF_SI_CD11b-CD103+_1','CD103+CD11b-# sampleConditionOfInt DCs',1), = 'Th17_wt_48h_SL1858' # -rw-r--r-- 1 emiraldi staff 10K Apr 30 12:43 sATAC_inf_net_sigGenes.tsv# # 0. network file name (column format) (as found in directory) # ('CD11bmCD103m_sp.tsv','GF_SI_CD11b-CD103-_1','CD11b-CD103-# DCs',1),Th17_KC_sigGenes_net.tsv # -rw-r--r-- 1 emiraldi staff 9.3K Apr 30 12:44 TsAqA_Inf_sigGenes.tsv# # 1. column of the expression matrix that you want the nodes to be colored# ('CD11bpCD103p_sp.tsv','GF_SI_CD11b+CD103+_1','CD11b+CD103+ by # DCs',1),-rw-r--r-- 1 emiraldi staff 10K Apr 30 12:43 sATAC_inf_net_sigGenes.tsv networkInits = [ # # 2. network title, to which we'll add the gene and peak cutoffs # ('CD11bp_sp.tsv','GF_SI_CD11b+CD103-_1','CD11b+CD103- DCs',1),# -rw-r--r-- 1 emiraldi staff 9.3K Apr 30 12:44 TsAqA_Inf_sigGenes.tsv ('ATAC_Th17_bias50_sp.tsv',sampleConditionOfInt,'sA(Th17), TFA'# #, 3.0), subselect the list of relevant genes -- NOT USED for this code and# can ('Macrophage_sp.tsv','GF_SI_Macrophage_1','Macrophage',1), be left out networkInits = [ ('ATAC_Th17_bias50_noTFA_sp.tsv',sampleConditionOfInt,'sA(Th17),# networkInits TF mRNA', 0=), [('CD103p_sp.tsv','GF_SI_CD11b-CD103+_1','CD103+CD11b- #DCs',1), ('Monocyte_sp.tsv','GF_SI_Monocyte_1','Monocyte',1)] ('ATAC_Th17_bias50_sp.tsv',sampleConditionOfInt,'sA(Th17), TFA', 0), ('ATAC_Th17_bias50_maxComb_sp.tsv',sampleConditionOfInt,'sA(Th17),# ('CD11bmCD103m_sp.tsv','GF_SI_CD11b-CD103-_1','CD11b-CD103- Max', 0), DCs',1), ('ATAC_Th17_bias50_noTFA_sp.tsv',sampleConditionOfInt,'sA(Th17), TF mRNA', 0), ('ENCODE_bias50_sp.tsv',sampleConditionOfInt,'ENCODE, TFA',0),# ('CD11bpCD103p_sp.tsv','GF_SI_CD11b+CD103+_1','CD11b+CD103+ DCs',1),# expressionFile = "DC_RNAseq_2f_lengthNormCounts.txt" ('ATAC_Th17_bias50_maxComb_sp.tsv',sampleConditionOfInt,'sA(Th17), Max', 0), ('ENCODE_bias50_noTFA_sp.tsv',sampleConditionOfInt,'ENCODE, TF# mRNA' ('CD11bp_sp.tsv','GF_SI_CD11b+CD103-_1','CD11b+CD103-,0), DCs',1), ('ENCODE_bias50_sp.tsv',sampleConditionOfInt,'ENCODE, TFA',0), ('KC1p5_sp.tsv',sampleConditionOfInt,'KO-ChIP G.S. (No Infererence)'# ('Macrophage_sp.tsv','GF_SI_Macrophage_1','Macrophage',1),,0), # tfFocus = 0 # If 1, automatically applies the "trim" function, so ( 'ENCODE_bias50_noTFA_sp.tsv'we can focus on TFs ,sampleConditionOfInt,'ENCODE, TF mRNA',0), ('ChIP_A17_KOall_ATh_bias50_maxComb_sp.tsv',sampleConditionOfInt# ,('Monocyte_sp.tsv','GF_SI_Monocyte_1','Monocyte',1)]'KO+ChIP+ATAC "Final" TRN',0), # # If 0, does not! ('KC1p5_sp.tsv',sampleConditionOfInt,'KO-ChIP G.S. (No Infererence)',0), ('ATAC_Th17_bias100_noTFA_sp.tsv',sampleConditionOfInt,'No Prior',0)] ('ChIP_A17_KOall_ATh_bias50_maxComb_sp.tsv',sampleConditionOfInt,'KO+ChIP+ATAC "Final" TRN',0), # expressionFile = "DC_RNAseq_2f_lengthNormCounts.txt" # threshhold = .2 # Set to .6 if you want to remove TFs that aren't ( 'ATAC_Th17_bias100_noTFA_sp.tsv'differentially expressed ,sampleConditionOfInt,'No Prior',0)] # ('ChIP_A17_KOall_ATh_bias10_noTFA_sp.tsv',sampleConditionOfInt,'ChIP/sA(Th17)+KO+sA(Th) (b=10), TF mRNA',.4), # ('ChIP_A17_KOall_ATh_bias10_sp.tsv',sampleConditionOfInt,'ChIP/sA(Th17)+KO+sA(Th)# tfFocus = 0 # If 1, automatically (b=10), TFA',.4), applies the "trim" function, so we can# clim focus = 1.5on TFs# absolute value color threshhold on edge color # ('ChIP_A17_KOall_ATh_bias10_noTFA_sp.tsv',sampleConditionOfInt,'ChIP/sA(Th17)+KO+sA(Th) (b=10), TF mRNA',.4), # ('Th17_48h_cut4_sA_p5_huA_bias50_sp.tsv',sampleConditionOfInt,'sA(Th17),# # If 0, doesTFA', not! .6)] # ('ChIP_A17_KOall_ATh_bias10_sp.tsv',sampleConditionOfInt,'ChIP/sA(Th17)+KO+sA(Th) (b=10), TFA',.4), # ('Networks1711/Th17_48h_cut4_sA_p5_huA_bias50_noTFA_sp.tsv','sA(Th17), TF mRNA',.75), # ('Th17_48h_cut4_sA_p5_huA_bias50_sp.tsv',sampleConditionOfInt,'sA(Th17), TFA', .6)] # ('Networks1711/Th17_48h_cut4_sA_p5_huA_bias50_sp.tsv','sA(Th17),# threshhold TFA',.75), = .2 # Set to .6 if you want to remove TFs that aren't differentially# compare Th17 expressed ATAC-seq networks to gold standard networks multiple# network('Networks1711/Th17_48h_cut4_sA_p5_huA_bias50_noTFA_sp.tsv','sA(Th17), comparison tool TF mRNA',.75), # ('Networks1711/iTh_1503_cut4_sA_p5_huA_bias50_noTFA_sp.tsv','sA(Th), TF mRNA',.75), directory = "/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks"# ('Networks1711/Th17_48h_cut4_sA_p5_huA_bias50_sp.tsv','sA(Th17), TFA',.75), # ('Networks1711/iTh_1503_cut4_sA_p5_huA_bias50_sp.tsv','sA(Th),# clim TFA',.75)] = 1.5 # absolute value color threshhold on edge color # directory = "/Users/emiraldi/erm/Documents/Manuscripts/Th17methods_manuscript/Th17_TRN_notebooks"# ('Networks1711/iTh_1503_cut4_sA_p5_huA_bias50_noTFA_sp.tsv','sA(Th), TF mRNA',.75), # goldStandards=[("Networks1711/KC1_sp.tsv",'Th17 KO-ChIP G.S. (9 TFs)',1)] netPath = 'Networks_pre180331' # ('Networks1711/iTh_1503_cut4_sA_p5_huA_bias50_sp.tsv','sA(Th), TFA',.75)] # goldStandards=[("Networks1711/KC1_sp.tsv",'Th17 KO-ChIP G.S. (9 TFs)',1)] In [13]: ## expressionFilewhere script also = 'th17_timecourse_unsorted.txt' searches the current directory # compare Th17 ATAC-seq networks to gold standard networks multiple network# The comparison starting conditionstool for each of the networks,In [13]:a list of# wheretuples. script Tuple also entries searches are: the current directory expressionFile# directory = "/Users/emiraldi/erm/Shared/DCproject/1702_DC_lineageNets_binder"= 'Th0_Th17_48hTh.txt' directory = "/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks" # 0. network file name (column format) (as found in directory)# directory# expressionFile = "/Users/emiraldi/erm/Shared/DCproject/1702_DC_lineageNets_binder" = 'th17_timecourse_unsorted.txt' # 'Th17_TARGpap17v0FC0p58FDR10_REGSexp80nPapFC0p58FDR25_ivTh_targGenes.txt' # directory = "/Users/emiraldi/erm/Documents/Manuscripts/Th17methods_manuscript/Th17_TRN_notebooks"# 1. column of the expression matrix that you want the nodes to expressionFile be colored by = 'Th0_Th17_48hTh.txt' geneListFile# # netPath = "DC_genesets_FC1_FDR10/DCs_max4_body_bp10000/DCnets_loc/DCnets_loc_pFC1_FDR10_rawp0001_hyg0001""Th17coreGenes.txt" netPath = 'Networks_pre180331' # 2. network title, to which we'll add the gene and peak cutoffs# ## netPath'Th17_TARGpap17v0FC0p58FDR10_REGSexp80nPapFC0p58FDR25_ivTh_targGenes.txt' = "DC_genesets_FC1_FDR10/DCs_max4_body_bp10000/DCnets_loc/DCnets_loc_pFC1_FDR10_rawp0001_hyg0001" ## 'Th17_unBiasedCore.txt'netPath = "DCs_ATAC_1701_cut4_2f_body_bp10000_pFDR10_FC1_gFDR10_FC1/DCs_pFDR10_FC1_gFDR10_FC1_rawp00001_hyg001" # 3. cut off for edge strength # netPathgeneListFile = "DCs_ATAC_1701_cut4_2f_body_bp10000_pFDR10_FC1_gFDR10_FC1/DCs_pFDR10_FC1_gFDR10_FC1_rawp00001_hyg001" = "Th17coreGenes.txt" # "allTfList.txt" # The starting conditions for each of the networks, a list of tuples. TuplesampleConditionOfInt entries are: = 'Th17(48h)' # 'Th17_unBiasedCore.txt' ## 'Th17_unBiasedCore.txt'# The starting conditions for each of the networks, a list of #tuples. 0. network Tuple file entries name (columnare: format) (as found in directory) # sampleConditionOfInt = 'Th17_wt_48h_SL1858' # ## The"allTfList.txt" starting conditions for each of the networks, a list of tuples. Tuple entries are: ## Th17coreGenes.txt# 0. network file name (column format) (as found in directory)# 1. column of the expression matrix that you want the nodes to be colored# Th17_KC_sigGenes_net.tsv by # ## 0.'Th17_unBiasedCore.txt' network file name (column format) (as found in directory) # # 1. column of the expression matrix that you want the nodes to# 2.be networkcolored title,by to which we'll add the gene and peak cutoffs # -rw-r--r-- 1 emiraldi staff 10K Apr 30 12:43 sATAC_inf_net_sigGenes.tsv# ## 1.Th17coreGenes.txt column of the expression matrix that you want the nodes to be colored by tfFocus# # 2. network= 0 # If title, 1, automatically to which we'll applies add the gene"trim" and function, peak cutoffs so# we3. cancut focusoff for on edgeTFs strength # -rw-r--r-- 1 emiraldi staff 9.3K Apr 30 12:44 TsAqA_Inf_sigGenes.tsv# # 2. network title, to which we'll add the gene and peak cutoffs # # #3. If subselect 0, does not!the list of relevant genes -- NOT USED for thissampleConditionOfInt code and can be left = out'Th17(48h)' networkInits = [ # #tfFocus 3. subselect = 0 # Ifthe 1, list automatically of relevant applies genes --the NOT "trim" USED function,for this codeso we and can can focus be lefton TFs out ## threshholdnetworkInits = .2= [('CD103p_sp.tsv','GF_SI_CD11b-CD103+_1','CD103+CD11b-# Set to .6 if you want to remove TFs that aren't# sampleConditionOfInt differentially DCs',1), expressed = 'Th17_wt_48h_SL1858' ('ATAC_Th17_bias50_sp.tsv',sampleConditionOfInt,'sA(Th17),# networkInits TFA' #, If0), 0, does= [('CD103p_sp.tsv','GF_SI_CD11b-CD103+_1','CD103+CD11b- not! DCs',1), clim# ('CD11bmCD103m_sp.tsv','GF_SI_CD11b-CD103-_1','CD11b-CD103-= 1.5 # absolute value color threshhold on edge color #DCs',1), Th17_KC_sigGenes_net.tsv ('ATAC_Th17_bias50_noTFA_sp.tsv',sampleConditionOfInt,'sA(Th17),# # ('CD11bmCD103m_sp.tsv','GF_SI_CD11b-CD103-_1','CD11b-CD103-threshhold TF mRNA' = ,.2 0 ),# Set to .6 if you want to remove TFs that aren'tDCs',1), differentially expressed # ('CD11bpCD103p_sp.tsv','GF_SI_CD11b+CD103+_1','CD11b+CD103+ #DCs',1), -rw-r--r-- 1 emiraldi staff 10K Apr 30 12:43 sATAC_inf_net_sigGenes.tsv ('ATAC_Th17_bias50_maxComb_sp.tsv',sampleConditionOfInt,#'sA(Th17), clim ('CD11bpCD103p_sp.tsv','GF_SI_CD11b+CD103+_1','CD11b+CD103+ = 1.5 Max' # ,absolute 0), value color threshhold on edge color DCs',1), # -rw-r--r-- 1 emiraldi staff 9.3K Apr 30 12:44 TsAqA_Inf_sigGenes.tsv # ('CD11bp_sp.tsv','GF_SI_CD11b+CD103-_1','CD11b+CD103- DCs',1), ('ENCODE_bias50_sp.tsv',sampleConditionOfInt,'ENCODE, TFA'# , 0('CD11bp_sp.tsv','GF_SI_CD11b+CD103-_1','CD11b+CD103-), DCs',1), networkInits = [ # ('Macrophage_sp.tsv','GF_SI_Macrophage_1','Macrophage',1), ('ENCODE_bias50_noTFA_sp.tsv',sampleConditionOfInt,'ENCODE,# ('Macrophage_sp.tsv','GF_SI_Macrophage_1','Macrophage',1),TF mRNA',0), In [14]: # %%html ('ATAC_Th17_bias50_sp.tsv',sampleConditionOfInt,'sA(Th17), TFA', 0), # ('Monocyte_sp.tsv','GF_SI_Monocyte_1','Monocyte',1)] ('KC1p5_sp.tsv',sampleConditionOfInt,'KO-ChIP G.S. (No Infererence)'# ('Monocyte_sp.tsv','GF_SI_Monocyte_1','Monocyte',1)],0), # ('KC1p5_sp.tsv',sampleConditionOfInt,'KO-ChIP G.S. (No Infererence)',0), # } # # If 0, does not! # ('ChIP_A17_KOall_ATh_bias10_sp.tsv',sampleConditionOfInt,'ChIP/sA(Th17)+KO+sA(Th)# # If 0, does not! (b=10), TFA',.4), ('ChIP_A17_KOall_ATh_bias50_maxComb_sp.tsv',sampleConditionOfInt,'KO+ChIP+ATAC# ('Th17_48h_cut4_sA_p5_huA_bias50_sp.tsv',sampleConditionOfInt,'sA(Th17), "Final" TRN',0), # TFA', .6)] ('ATAC_Th17_bias100_noTFA_sp.tsv',sampleConditionOfInt,'No Prior',0)]# ('Networks1711/Th17_48h_cut4_sA_p5_huA_bias50_noTFA_sp.tsv','sA(Th17),# threshhold = .2 TF# SetmRNA',.75), to .6 if you want to remove TFs that aren't differentially expressed In [15]: ## Uncommentthreshhold to = run.2 #without Set to install.6 if you (in want binder, to remove for example) TFs that aren't differentially expressed # ('Networks1711/Th17_48h_cut4_sA_p5_huA_bias50_sp.tsv','sA(Th17), TFA',.75), import sys In [15]: # Uncomment to run without install (in binder, for example) # ('ChIP_A17_KOall_ATh_bias10_noTFA_sp.tsv',sampleConditionOfInt,'ChIP/sA(Th17)+KO+sA(Th)# ('Networks1711/iTh_1503_cut4_sA_p5_huA_bias50_noTFA_sp.tsv','sA(Th), (b=10), TF mRNA',.4), # clim = 1.5 # absoluteTF mRNA',.75), value color threshhold on edge color if# clim".." =not 1.5 in # sysabsolute.path: value color threshhold on edge color import sys # ('ChIP_A17_KOall_ATh_bias10_sp.tsv',sampleConditionOfInt,'ChIP/sA(Th17)+KO+sA(Th)# ('Networks1711/iTh_1503_cut4_sA_p5_huA_bias50_sp.tsv','sA(Th), (b=10), TFA',.4), TFA',.75)] sys.path.append("..") if ".." not in sys.path: # ('Th17_48h_cut4_sA_p5_huA_bias50_sp.tsv',sampleConditionOfInt,'sA(Th17),# goldStandards=[("Networks1711/KC1_sp.tsv",'Th17 TFA', .6)] KO-ChIP G.S. (9 TFs)',1)] from jp_gene_viz import dNetwork sys.path.append("..") # ('Networks1711/Th17_48h_cut4_sA_p5_huA_bias50_noTFA_sp.tsv','sA(Th17), TF mRNA',.75), # compare Th17 ATAC-seq networks to gold standard networks multiple network comparison tool dNetwork# compare.load_javascript_support Th17 ATAC-seq networks() to gold standard networks multiple network comparison tool from jp_gene_viz import dNetwork # ('Networks1711/Th17_48h_cut4_sA_p5_huA_bias50_sp.tsv','sA(Th17), TFA',.75),# expressionFile = 'th17_timecourse_unsorted.txt' directory = "/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks" fromdirectory jp_gene_viz = "/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks" import multiple_network dNetwork.load_javascript_support() # ('Networks1711/iTh_1503_cut4_sA_p5_huA_bias50_noTFA_sp.tsv','sA(Th),expressionFile TF mRNA',.75), = 'Th0_Th17_48hTh.txt' # directory = "/Users/emiraldi/erm/Documents/Manuscripts/Th17methods_manuscript/Th17_TRN_notebooks" from# directory jp_gene_viz = "/Users/emiraldi/erm/Documents/Manuscripts/Th17methods_manuscript/Th17_TRN_notebooks" import LExpression from jp_gene_viz import multiple_network # ('Networks1711/iTh_1503_cut4_sA_p5_huA_bias50_sp.tsv','sA(Th), TFA',.75)]# 'Th17_TARGpap17v0FC0p58FDR10_REGSexp80nPapFC0p58FDR25_ivTh_targGenes.txt'netPath = 'Networks_pre180331' LExpressionnetPath = 'Networks_pre180331'.load_javascript_support() from jp_gene_viz import LExpression # goldStandards=[("Networks1711/KC1_sp.tsv",'Th17 KO-ChIP G.S. (9 TFs)',1)]geneListFile = "Th17coreGenes.txt" LExpression.load_javascript_support() # The starting conditions for each of the networks, a list of tuples. Tuple entries are: # 'Th17_unBiasedCore.txt' # The starting conditions for each of the networks, a list of tuples. Tuple entries are: # expressionFile = 'th17_timecourse_unsorted.txt' In [16]: networkList# 0. network = filelist ()name # (columnthis list format) contains (as heatmap-linkedfound in directory) network objects # "allTfList.txt" # 0. network file name (column format) (as found in directory) expressionFile = 'Th0_Th17_48hTh.txt' # 1. column of the expression matrix that you want the nodes to be colored by # 'Th17_unBiasedCore.txt' In [16]:# 1.networkList column of =the list expression() # this matrix list containsthat you heatmap-linkedwant the nodes networkto be colored objects by # 'Th17_TARGpap17v0FC0p58FDR10_REGSexp80nPapFC0p58FDR25_ivTh_targGenes.txt' for# 2. networkInit network title, in networkInits to which we'll: add the gene and peak cutoffs # Th17coreGenes.txt # 2. network title, to which we'll add the gene and peak cutoffs geneListFile = "Th17coreGenes.txt" # 3. networkFile cut off for = edgenetworkInit strength[0] # 3.for cut networkInit off for edge in networkInitsstrength : # 'Th17_unBiasedCore.txt' sampleConditionOfInt curr = LExpression =. LinkedExpressionNetwork'Th17(48h)' () tfFocus = 0 # If 1, automatically applies the "trim" function,sampleConditionOfInt so wenetworkFile can focus =on networkInit= TFs'Th17(48h)'[0] # "allTfList.txt" # sampleConditionOfInt print directory + '/' = 'Th17_wt_48h_SL1858'+ netPath + '/' + networkFile # If 0, does not! # sampleConditionOfInt curr = LExpression = .'Th17_wt_48h_SL1858'LinkedExpressionNetwork() # 'Th17_unBiasedCore.txt' ## Th17_KC_sigGenes_net.tsv try: # threshhold = .2 # Set to .6 if you want to remove TFs that# aren'tTh17_KC_sigGenes_net.tsv print differentially directory +expressed '/' + netPath + '/' + networkFile # Th17coreGenes.txt # -rw-r--r-- curr.load_network 1 emiraldi(directory staff + '/' 10K + AprnetPath 30 12:43 + '/' sATAC_inf_net_sigGenes.tsv + networkFile) clim = 1.5 # absolute value color threshhold on edge color # -rw-r--r--# try: 1 emiraldi staff 10K Apr 30 12:43 sATAC_inf_net_sigGenes.tsv ## -rw-r--r-- except AssertionError: 1 emiraldi staff 9.3K Apr 30 12:44 TsAqA_Inf_sigGenes.tsv # -rw-r--r-- curr.load_network 1 emiraldi( directory staff 9.3K+ '/' Apr + netPath30 12:44 + TsAqA_Inf_sigGenes.tsv'/' + networkFile) tfFocus = 0 # If 1, automatically applies the "trim" function, so we can focus on TFs #networkInits directory = [ = "." networkInits# except = [AssertionError: # curr.load_network(directory + '/' + netPath + '/' + networkFile) # If 0, does not! # ( 'ATAC_Th17_bias50_sp.tsv' directory = "." ,sampleConditionOfInt,'sA(Th17), TFA', 0), ('ATAC_Th17_bias50_sp.tsv',sampleConditionOfInt,'sA(Th17), TFA', 0), In [14]: # %%html networkList.append(curr) # threshhold = .2 # Set to .6 if you want to remove TFs that aren't differentially expressed # ( 'ATAC_Th17_bias50_noTFA_sp.tsv' curr.load_network(directory,sampleConditionOfInt + '/' + netPath +, 'sA(Th17),'/' + networkFile) TF mRNA' , 0 ), ('ATAC_Th17_bias50_noTFA_sp.tsv',sampleConditionOfInt,'sA(Th17), TF mRNA', 0), # ('Reading ('ChIP_A17_KOall_ATh_bias50_maxComb_sp.tsv' network', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias50_sp.tsv'),sampleConditionOfInt,'KO+ChIP+ATAC "Final" TRN',0), ('ChIP_A17_KOall_ATh_bias50_maxComb_sp.tsv',sampleConditionOfInt# (b=10), TFA',.4), if ".." not in sys.path: Omitting# ('Th17_48h_cut4_sA_p5_huA_bias50_sp.tsv',sampleConditionOfInt,'sA(Th17), edges, using canvas, and fast layout default because the network is large TFA', .6)] sys.path.append("..") # ('Loading ('Th17_48h_cut4_sA_p5_huA_bias50_sp.tsv',sampleConditionOfInt,'sA(Th17), saved layout', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias50_noTFA_sp.tsv.layout.json') TFA', .6)] /Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias50_maxComb_sp.tsv# ('Networks1711/Th17_48h_cut4_sA_p5_huA_bias50_noTFA_sp.tsv','sA(Th17), TF mRNA',.75), from jp_gene_viz import dNetwork # Omitting ('Networks1711/Th17_48h_cut4_sA_p5_huA_bias50_noTFA_sp.tsv','sA(Th17), edges, using canvas, and fast layout default because the network isTF largemRNA',.75), ('Reading# ('Networks1711/Th17_48h_cut4_sA_p5_huA_bias50_sp.tsv','sA(Th17), network', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias50_maxComb_sp.tsv')In [15]: # Uncomment TFA',.75), to run without install (in binder, for example) dNetwork.load_javascript_support() # /Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias50_maxComb_sp.tsv ('Networks1711/Th17_48h_cut4_sA_p5_huA_bias50_sp.tsv','sA(Th17), TFA',.75), ('Loading# ('Networks1711/iTh_1503_cut4_sA_p5_huA_bias50_noTFA_sp.tsv','sA(Th), saved layout', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias50_maxComb_sp.tsv.layout.json')import sys TF mRNA',.75), from jp_gene_viz import multiple_network # ('Reading ('Networks1711/iTh_1503_cut4_sA_p5_huA_bias50_noTFA_sp.tsv','sA(Th), network', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias50_maxComb_sp.tsv') TF mRNA',.75), Omitting# ('Networks1711/iTh_1503_cut4_sA_p5_huA_bias50_sp.tsv','sA(Th), edges, using canvas, and fast layout default because theif network ".." TFA',.75)] not is inlarge sys .path: from jp_gene_viz import LExpression # ('Loading ('Networks1711/iTh_1503_cut4_sA_p5_huA_bias50_sp.tsv','sA(Th), saved layout', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias50_maxComb_sp.tsv.layout.json') TFA',.75)] /Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ENCODE_bias50_sp.tsv# goldStandards=[("Networks1711/KC1_sp.tsv",'Th17 KO-ChIP G.S. (9 TFs)',1)] sys.path.append("..") LExpression.load_javascript_support() # goldStandards=[("Networks1711/KC1_sp.tsv",'Th17Omitting edges, using canvas, and fast layout default KO-ChIP because G.S. (9 the TFs)',1)] network is large ('Reading network', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ENCODE_bias50_sp.tsv')from jp_gene_viz import dNetwork /Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ENCODE_bias50_sp.tsv ('Loading# expressionFile saved layout', = 'th17_timecourse_unsorted.txt' '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ENCODE_bias50_sp.tsv.layout.json')dNetwork.load_javascript_support() # expressionFile('Reading network', = 'th17_timecourse_unsorted.txt' '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ENCODE_bias50_sp.tsv') OmittingexpressionFile edges, =using 'Th0_Th17_48hTh.txt' canvas, and fast layout default because thefrom network jp_gene_viz is large import multiple_network In [16]: networkList = list() # this list contains heatmap-linked networkexpressionFile('Loading objects saved = 'Th0_Th17_48hTh.txt' layout', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ENCODE_bias50_sp.tsv.layout.json') /Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ENCODE_bias50_noTFA_sp.tsv# 'Th17_TARGpap17v0FC0p58FDR10_REGSexp80nPapFC0p58FDR25_ivTh_targGenes.txt'from jp_gene_viz import LExpression # 'Th17_TARGpap17v0FC0p58FDR10_REGSexp80nPapFC0p58FDR25_ivTh_targGenes.txt'Omitting edges, using canvas, and fast layout default because the network is large ('ReadinggeneListFile network', = "Th17coreGenes.txt" '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ENCODE_bias50_noTFA_sp.tsv')LExpression.load_javascript_support() for networkInit in networkInits: geneListFile/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ENCODE_bias50_noTFA_sp.tsv = "Th17coreGenes.txt" ('Loading# 'Th17_unBiasedCore.txt' saved layout', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ENCODE_bias50_noTFA_sp.tsv.layout.json') networkFile = networkInit[0] # 'Th17_unBiasedCore.txt'('Reading network', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ENCODE_bias50_noTFA_sp.tsv') Omitting# "allTfList.txt" edges, using canvas, and fast layout default because the network is large curr = LExpression.LinkedExpressionNetwork() # "allTfList.txt"('Loading saved layout', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ENCODE_bias50_noTFA_sp.tsv.layout.json') In [16]: networkList = list() # this list contains heatmap-linked network objects /Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/KC1p5_sp.tsv# 'Th17_unBiasedCore.txt' print directory + '/' + netPath + '/' + networkFile # 'Th17_unBiasedCore.txt'Omitting edges, using canvas, and fast layout default because the network is large ('Reading# Th17coreGenes.txt network', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/KC1p5_sp.tsv') # try: # Th17coreGenes.txt/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/KC1p5_sp.tsv ('Loading saved layout', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/KC1p5_sp.tsv.layout.json')for networkInit in networkInits: curr.load_network(directory + '/' + netPath + '/' + networkFile('Reading) network', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/KC1p5_sp.tsv') networkFile = networkInit[0] OmittingtfFocus =edges, 0 # If using 1, automatically canvas, and fastapplies layout the default"trim" function,because the so network we can isfocus large on TFs # except AssertionError: tfFocus('Loading = 0 #saved If 1, layout', automatically '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/KC1p5_sp.tsv.layout.json') applies the "trim" function, so we can focus on TFs curr = LExpression.LinkedExpressionNetwork() /Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ChIP_A17_KOall_ATh_bias50_maxComb_sp.tsv # If 0, does not! # directory = "." Omitting # If 0, edges,does not! using canvas, and fast layout default because the network is large print directory + '/' + netPath + '/' + networkFile ('Reading# threshhold network', = .2 # '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ChIP_A17_KOall_ATh_bias50_maxComb_sp.tsv')Set to .6 if you want to remove TFs that aren't differentially expressed # curr.load_network(directory + '/' + netPath + '/' #+ threshhold/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ChIP_A17_KOall_ATh_bias50_maxComb_sp.tsvnetworkFile) = .2 # Set to .6 if you want to remove TFs that aren't differentially expressed # try: ('Loadingclim = 1.5 saved # absolute layout', value '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ChIP_A17_KOall_ATh_bias50_maxComb_sp.tsv.layout.json') color threshhold on edge color networkList.append (curr) clim('Reading = 1.5 # network',absolute value'/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ChIP_A17_KOall_ATh_bias50_maxComb_sp.tsv') color threshhold on edge color Omitting edges, using canvas, and fast layout default because the network curr. load_networkis large (directory + '/' + netPath + '/' + networkFile) ('Loading saved layout', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ChIP_A17_KOall_ATh_bias50_maxComb_sp.tsv.layout.json') /Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias100_noTFA_sp.tsv# except AssertionError: Omitting edges, using canvas, and fast layout default because the network is large ('Reading network', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias100_noTFA_sp.tsv')# directory = "." /Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias50_sp.tsv/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias100_noTFA_sp.tsv In [14]: ('Loading# %%html saved layout', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias100_noTFA_sp.tsv.layout.json')# curr.load_network(directory + '/' + netPath + '/' + networkFile)('Reading network', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias50_sp.tsv')In [14]: # %%html('Reading network', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias100_noTFA_sp.tsv') Omitting# ('Reading network', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias50_sp.tsv')('Loading saved layout', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias50_noTFA_sp.tsv.layout.json') # # [[networkList[0]], [networkList[1]], [networkList[2]]]) ('Loading saved layout', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias50_sp.tsv.layout.json')Omitting edges, using canvas, and fast layout default because the# M network= multiple_network.MultipleNetworks( is large # # [networkList[2], networkList[3]] Omitting edges, using canvas, and fast layout default because the network/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias50_maxComb_sp.tsv is large # [[networkList[0]], [networkList[1]], [networkList[2]]]) In [15]: # Uncomment to run without install (in binder, for example) In [15]: ## M.svg_widthUncomment to = run500 without install (in binder, for example) /Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias50_noTFA_sp.tsv('Reading network', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias50_maxComb_sp.tsv') # # [networkList[2], networkList[3]] import sys #import M.show() sys ('Reading network', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias50_noTFA_sp.tsv')('Loading saved layout', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias50_maxComb_sp.tsv.layout.json') # M.svg_width = 500 if ".." not in sys.path: Mif = ".."multiple_network not in sys.path.MultipleNetworks: ( ('Loading saved layout', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias50_noTFA_sp.tsv.layout.json')Omitting edges, using canvas, and fast layout default because the# M.show() network is large sys.path.append("..") [[sysnetworkList.path.append[0],(".." networkList) [1]], Omitting edges, using canvas, and fast layout default because the network/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ENCODE_bias50_sp.tsv is large M = multiple_network.MultipleNetworks ( from jp_gene_viz import dNetwork from [ networkListjp_gene_viz[ 2import], networkList dNetwork[3]], /Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias50_maxComb_sp.tsv('Reading network', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ENCODE_bias50_sp.tsv') [[networkList[0], networkList[1]], dNetwork [networkList.load_javascript_support[2], networkList() [3]], dNetwork [networkList.load_javascript_support[4], networkList[()5]], ('Reading network', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias50_maxComb_sp.tsv')('Loading saved layout', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ENCODE_bias50_sp.tsv.layout.json') from jp_gene_viz [networkList import[4], networkListmultiple_network[5]], from [ networkListjp_gene_viz[ 6import], networkList multiple_network[7]]]) ('Loading saved layout', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias50_maxComb_sp.tsv.layout.json')Omitting edges, using canvas, and fast layout default because the network is large from jp_gene_viz [networkList import[6], networkListLExpression[7]]]) #from jp_gene_viz], import LExpression Omitting edges, using canvas, and fast layout default because the network/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ENCODE_bias50_noTFA_sp.tsv is large LExpression# ], .load_javascript_support() #LExpression [networkList[8],.load_javascript_support networkList[9]]])() /Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ENCODE_bias50_sp.tsv('Reading network', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ENCODE_bias50_noTFA_sp.tsv') # [networkList[8], networkList[9]]]) # [networkList[10], networkList[11]], ('Reading network', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ENCODE_bias50_sp.tsv')('Loading saved layout', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ENCODE_bias50_noTFA_sp.tsv.layout.json') # [networkList[12], networkList[13]], # [networkList[10], networkList[11]], ('Loading saved layout', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ENCODE_bias50_sp.tsv.layout.json')Omitting edges, using canvas, and fast layout defaultIn [16]: becausenetworkList the network = listis large() # this list contains heatmap-linked network objects In [16]: networkList = list() # this list contains heatmap-linked network objects # [networkList[12], networkList[13]], # [networkList[14], networkList[15]], Omitting edges, using canvas, and fast layout default because the network/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/KC1p5_sp.tsv is large # [networkList[16], networkList[17]]]) /Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ENCODE_bias50_noTFA_sp.tsv # [networkList[14], networkList[15]], for networkInit in networkInits: ('Reading network', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/KC1p5_sp.tsv')for networkInit in networkInits: M.svg_width = 500 ('Reading network', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ENCODE_bias50_noTFA_sp.tsv') # [networkList[16], networkList[17]]]) networkFile = networkInit[0] ('Loading saved layout', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/KC1p5_sp.tsv.layout.json') networkFile = networkInit[0] M.show() ('Loading saved layout', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ENCODE_bias50_noTFA_sp.tsv.layout.json')M.svg_width = 500 curr = LExpression.LinkedExpressionNetwork() Omitting edges, using canvas, and fast layout default because the curr network = LExpression is large. LinkedExpressionNetwork() Omitting edges, using canvas, and fast layout default because the network is large M.show() print directory + '/' + netPath + '/' + networkFile /Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ChIP_A17_KOall_ATh_bias50_maxComb_sp.tsv print directory + '/' + netPath + '/' + networkFile 200 /Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/KC1p5_sp.tsv # try: ('Reading network', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ChIP_A17_KOall_ATh_bias50_maxComb_sp.tsv')# try: ('Reading network', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/KC1p5_sp.tsv') 200 curr.load_network(directory + '/' + netPath + '/' + networkFile) ('Loading saved layout', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ChIP_A17_KOall_ATh_bias50_maxComb_sp.tsv.layout.json') curr.load_network(directory + '/' + netPath + '/' + networkFile) ⊖ ∧ ⊨ ('Loading⊖ saved∧ layout',⊨ '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/KC1p5_sp.tsv.layout.json') # except AssertionError: Omitting edges, using canvas, and fast layout default because# the exceptnetwork AssertionError: is large Omitting edges, using canvas, and fast layout default because the network is large ⊖ ∧ ⊨ ⊖ ∧ ⊨ # directory = "." /Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias100_noTFA_sp.tsv# directory = "." sA(Th17), TFA /Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ChIP_A17_KOall_ATh_bias50_maxComb_sp.tsvsA(Th17), TF mRNA # curr.load_network(directory + '/' + netPath + '/' + networkFile) ('Reading network', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias100_noTFA_sp.tsv')# curr.load_network(directory + '/' + netPath + '/' + networkFile) ('Reading network', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ChIP_A17_KOall_ATh_bias50_maxComb_sp.tsv')sA(Th17 ), TFA sA(Th17), TF mRNA networkList.append(curr) ('Loading saved layout', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias100_noTFA_sp.tsv.layout.json') networkList.append(curr) ('Loading saved layout', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ChIP_A17_KOall_ATh_bias50_maxComb_sp.tsv.layout.json') Omitting edges, using canvas, and fast layout default because the network is large Omitting edges, using canvas, and fast layout default because the network is large /Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias100_noTFA_sp.tsv /Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias50_sp.tsv /Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias50_sp.tsv In [17]: # visualize the networks -- HARD CODED for 5 networks: ('Reading network', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias100_noTFA_sp.tsv') ('Reading network', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias50_sp.tsv') ('Reading network', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias50_sp.tsv') # M = multiple_network.MultipleNetworks( ('Loading saved layout', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias100_noTFA_sp.tsv.layout.json')('Loading saved layout', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias50_sp.tsv.layout.json') ('Loading saved layout', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias50_sp.tsv.layout.json')# [[networkList[0]], [networkList[1]], [networkList[2]]]) Omitting edges, using canvas, and fast layout default because the network is large Omitting edges, using canvas, and fast layout default because the network is large Omitting edges, using canvas, and fast layout default because the network is large # # [networkList[2], networkList[3]] /Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias50_noTFA_sp.tsv /Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias50_noTFA_sp.tsv # M.svg_width = 500 ('Reading network', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias50_noTFA_sp.tsv') ('Reading network', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias50_noTFA_sp.tsv')In [17]: # visualize the networks -- HARD CODED for 5 networks: # M.show() ('Loading saved layout', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias50_noTFA_sp.tsv.layout.json') ('Loading saved layout', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias50_noTFA_sp.tsv.layout.json')# M = multiple_network.MultipleNetworks( M = multiple_network .MultipleNetworks( Omitting edges, using canvas, and fast layout default because the network is large Omitting edges, using canvas, and fast layout default because the# network [[networkList[0]], is large [networkList[1]], [networkList[2]]]) [[networkList[0], networkList[1]], /Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias50_maxComb_sp.tsv /Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias50_maxComb_sp.tsv# # [networkList[2], networkList[3]] [networkList[2], networkList[3]], ('Reading network', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias50_maxComb_sp.tsv') ('Reading network', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias50_maxComb_sp.tsv')# M.svg_width = 500 [networkList[4], networkList[5]], ('Loading saved layout', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias50_maxComb_sp.tsv.layout.json') ('Loading saved layout', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias50_maxComb_sp.tsv.layout.json')# M.show() [networkList [6], networkList[7]]]) Omitting edges, using canvas, and fast layout default because the network is large Omitting edges, using canvas, and fast layout default because theM =network multiple_network is large .MultipleNetworks( # ], /Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ENCODE_bias50_sp.tsv /Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ENCODE_bias50_sp.tsv [[networkList[0], networkList [1]], # [networkList[8], networkList[9]]]) ('Reading network', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ENCODE_bias50_sp.tsv') ('Reading network', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ENCODE_bias50_sp.tsv') [networkList[2], networkList[3]], # [networkList[10], networkList[11]], ('Loading saved layout', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ENCODE_bias50_sp.tsv.layout.json') ('Loading saved layout', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ENCODE_bias50_sp.tsv.layout.json') [networkList[4], networkList[5]], # [networkList[12], networkList[13]], Omitting edges, using canvas, and fast layout default because the network is large Omitting edges, using canvas, and fast layout default because the network [networkList is large[6], networkList[7]]]) # [networkList[14], networkList[15]], /Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ENCODE_bias50_noTFA_sp.tsv /Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ENCODE_bias50_noTFA_sp.tsv# ], # [networkList[16], networkList[17]]]) ('Reading network', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ENCODE_bias50_noTFA_sp.tsv') ('Reading network', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ENCODE_bias50_noTFA_sp.tsv')# [networkList[8], networkList[9]]]) M.svg_width = 500 ('Loading saved layout', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ENCODE_bias50_noTFA_sp.tsv.layout.json') ('Loading saved layout', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ENCODE_bias50_noTFA_sp.tsv.layout.json')# [networkList[10], networkList[11]], M.show() Omitting edges, using canvas, and fast layout default because the# network [networkList[12], is large networkList[13]], Omitting edges, using canvas, and fast layout default because the network is large /Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/KC1p5_sp.tsv# [networkList[14], networkList[15]], /Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/KC1p5_sp.tsv 200 ('Reading network', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/KC1p5_sp.tsv')# [networkList[16], networkList[17]]]) ('Reading network', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/KC1p5_sp.tsv') ('Loading saved layout', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/KC1p5_sp.tsv.layout.json')M.svg_width = 500 ('Loading saved layout', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/KC1p5_sp.tsv.layout.json') ⊖ ∧ ⊨ ⊖ ∧ ⊨ Omitting edges, using canvas, and fast layout default because theM. shownetwork() is large Omitting edges, using canvas, and fast layout default because the network is large /Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ChIP_A17_KOall_ATh_bias50_maxComb_sp.tsv /Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ChIP_A17_KOall_ATh_bias50_maxComb_sp.tsv sA(Th17), TFA sA(Th17), TF mRNA ('Reading network', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ChIP_A17_KOall_ATh_bias50_maxComb_sp.tsv') ('Reading network', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ChIP_A17_KOall_ATh_bias50_maxComb_sp.tsv')200 ('Loading saved layout', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ChIP_A17_KOall_ATh_bias50_maxComb_sp.tsv.layout.json') ('Loading saved layout', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ChIP_A17_KOall_ATh_bias50_maxComb_sp.tsv.layout.json') Omitting⊖ ∧ edges,⊨ using canvas, and fast layout default because the ⊖network∧⊖ is⊨ ∧large⊨ ⊖ ∧ ⊨ Omitting edges, using canvas, and fast layout default because the network is large /Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias100_noTFA_sp.tsv /Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias100_noTFA_sp.tsv⊖ ∧ ⊨ ⊖ ∧ ⊨ sA('Reading(Th17), Max network', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias100_noTFA_sp.tsv')sA(Th17),E TNFCAODE, TFA sA(Th17), TF mRNA ('Reading network', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias100_noTFA_sp.tsv') ('Loading saved layout', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias100_noTFA_sp.tsv.layout.json') ('LoadingsA(Th17), Msavedax layout', '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias100_noTFA_sp.tsv.layout.json')ENCODE, TFA Omitting edges, using canvas, and fast layout default because the network is large Omitting edges, using canvas, and fast layout default because the network is large

In [17]: # visualize the networks -- HARD CODED for 5 networks: In [17]: # visualize the networks -- HARD CODED for 5 networks: # M = multiple_network.MultipleNetworks( # M = multiple_network.MultipleNetworks( # [[networkList[0]], [networkList[1]], [networkList[2]]]) # [[networkList[0]], [networkList[1]], [networkList[2]]]) # # [networkList[2], networkList[3]] # # [networkList[2], networkList[3]] # M.svg_width = 500 # M.svg_width = 500 # M.show() # M.show() M = multiple_network.MultipleNetworks( M = multiple_network.MultipleNetworks( [[networkList[0], networkList[1]], [[networkList[0], networkList[1]], [networkList[2], networkList[3]], [networkList[2], networkList[3]], [networkList[4], networkList[5]], [networkList[4], networkList[5]], [networkList[6], networkList[7]]]) [networkList[6], networkList[7]]]) Figure 4 # ], # ], # [networkList[8], networkList[9]]]) # [networkList[8], networkList[9]]]) # [networkList[10], networkList[11]], # [networkList[10], networkList[11]], # [networkList[12], networkList[13]], # [networkList[12], networkList[13]], # [networkList[14], networkList[15]], # [networkList[14], networkList[15]], # [networkList[16], networkList[17]]]) # [networkList[16], networkList[17]]]) M.svg_width = 500 M.svg_width = 500 M.show() A ATAC-only Th17 Core Networks M.show() 200 200 + Combined “Final” ⊖ ∧ ⊨ TFA = P X, + ⊖ ∧ ⊨ ⊖ ∧ ⊨ ⊖ ∧ ⊨ TFA = TF mRNA, + ⊖ ∧ ⊨ ⊖ ∧ ⊨ sA(Th17), TFA sA(Th17), TF mRNA sA(Th17), TFA sA(Th17), TF mRNA sA(Th17), Max ENCODE, TFA Il24 Il24 Il24 Il22 ⊖ ∧ ⊨ ⊖ ∧⊖ ⊨∧ ⊨ Il22 ⊖ ∧ ⊨ Il22 Il23r Il1r1 Il23r Il1r1 ⊖ ∧ ⊨ Il23r Il1r1 ⊖ ∧ ⊨ ENCODE, TF mRNA sA(Th17),K MOa-Cx hIP G.S. (No Infererence) ENCODE, TFA ENCODE, TF mRNA KO-ChIP G.S. (No Infererence) Batf Batf Batf Hif1a Hif1a Hif1a Maf Maf Maf Rorc Etv6 Rorc Fosl2 Etv6 Rorc Fosl2 Etv6 Fosl2 Rora Stat3 Rora Stat3 Rora Stat3 Irf4 Irf4 Irf4

Il12rb1 Il12rb1 Il12rb1 Ccl20 Il17f Ccl20 Il17f Ccl20 Il17f Il17a Il17a ⊖ ∧ ⊨ Il17a ⊖ ∧ ⊨ ⊖ ∧ ⊨ ⊖ ∧ ⊨ ⊖ ∧ ⊨ ⊖ ∧ ⊨ sA(Th17), Max ENCODE, TFA sA(Th17), Max ENCODE, TFA ENCODE, TF mRNA KO-ChIP G.S. (No Infererence) B C KO-ChIP G.S. (No Inference) D ⊖ ∧ ⊨ ⊖ ∧⊖ ⊨∧ ⊨ No Prior Th17 Core ⊖ ∧ ⊨ “Final” KO+ChIP+ATAC ⊖ ∧ ⊨ ⊖ ∧ ⊨ KO+ChIP+ATAC "Final" TRN ENCODEN, ToF P mrioRrNA KO-ChIP G.S. (No Infererence) Il24 Il24 KO+ChIP+ATAC "FinaIl24l" TRN No Prior Il22 Il23r Il22 Il22 Il1r1 Il23r Il1r1 Il23r Il1r1

Batf Batf Batf Hif1a Maf Hif1a Maf Hif1a Etv6 Rorc Rorc Maf Fosl2 Etv6 Fosl2 Etv6 Rorc Fosl2 Rora Stat3 Rora Stat3 Irf4 Irf4 Irf4Rora Stat3

⊖ ∧ ⊨ ⊖ ∧ ⊨ ⊖ ∧ ⊨ Il12rb1⊖ ∧ ⊨ ⊖ ∧ ⊨ ⊖ ∧ ⊨ Il12rb1 ENCODE, TF mRNA KO-ChIP G.S. (No Infererence) ENCODE, TF mRNA KO-ChIP G.S. (No Infererence) KO+ChIP+ATAC "Final" TRN Il12rb1No Prior Ccl20 Ccl20 Il17a Il17f Il17a Il17f Ccl20 Il17f ⊖ ∧ ⊨ ⊖ ∧ ⊨ Il17a In [21]: # Set network preferences count = 0 KO+ChIP+ATAC "Final" TRN No Prior In [21]: # Set network preferences for curr in networkList: count = 0 networkInit = networkInits[count] for curr in networkList: # get title information + curr column for shading of figures networkInit = networkInits[count] currCol = networkInit[1] # get title information + curr column for shading of figures titleBase = networkInit[2] currCol = networkInit[1] titleInf = titleBase #+ ' (gCut: ' + geneCutoff.replace('_',' ') + ', pCut: ' + peakCutoff.replace('_',' ') titleBase = networkInit[2] threshhold = networkInit[3] titleInf = titleBase #+ ' (gCut: ' + geneCutoff.replace('_',' ') + ', pCut: ' + peakCutoff.replace('_',' ') threshhold = networkInit[3] # set threshold curr.network.threshhold_slider.value = threshhold # set threshold curr.network.apply_click(None) curr.network.threshhold_slider.value = threshhold # if tfFocus: curr.network.apply_click(None) # # focus on TF core # if tfFocus: # curr.network.tf_only_click(None) # # focus on TF core # curr.network.layout_click(None) # curr.network.tf_only_click(None) # curr.network.layout_click(None) # set title curr.network.title_html.value = titleInf # set title curr.network.title_html.value = titleInf # add labels curr.network.labels_button.value=False # add labels print #you can turn labels back on curr.network.labels_button.value=False curr.network.draw_click(None) print #you can turn labels back on # Limit genes to target genes of interest and their putative regulators curr.network.draw_click(None) geneIn = open(directory + '/' + geneListFile,'r') # Limit genes to target genes of interest and their putative regulators geneList = list() geneIn = open(directory + '/' + geneListFile,'r') ⊖ ∧ ⊨ ⊖ ∧ ⊨ ⊖ for gene∧ in⊨ geneIn: ⊖ ∧ ⊨ geneList = list() geneList.append(gene.strip('\n')) In [21]: # Set network preferences for gene in geneIn: KO+ChIP+ATAC "Final" TRN No Prior KO+ChIP+ATAC "Final" TRN No Prior geneIn.close() count = 0 geneList.append(gene.strip('\n')) for curr in networkList: geneIn.close() curr.network.pattern_text.value = " ".join(geneList) networkInit = networkInits[count] curr.network.match_click(None) # get title information + curr column for shading of figures curr.network.pattern_text.value = " ".join(geneList) In [21]: # Set network preferences currCol = networkInit[1] curr.network.match_click(None) count = 0 # don't use titleBase = networkInit[2] for curr in networkList: # curr.targeted_click(None) titleInf = titleBase #+ ' (gCut: ' + geneCutoff.replace('_','# don't ') + use', pCut: ' + peakCutoff.replace('_',' ') networkInit = networkInits[count] # curr.network.gene_button = " ".join(geneList) threshhold = networkInit[3] # curr.targeted_click(None) # get title information + curr column for shading of figures # curr.network.gene_click(None) # curr.network.gene_button = " ".join(geneList) currCol = networkInit[1] # set threshold # curr.network.gene_click(None) titleBase = networkInit[2] # Load heatmap curr.network.threshhold_slider.value = threshhold titleInf = titleBase #+ ' (gCut: ' + geneCutoff.replace('_',' ') + ', pCut: ' + peakCutoff.replace('_',' ') curr.load_heatmap(directory + '/' + expressionFile) curr.network.apply_click(None) # Load heatmap threshhold = networkInit[3] # color nodes according to a sample column in the gene expression matrix # if tfFocus: curr.load_heatmap(directory + '/' + expressionFile)

curr.gene_click(None) # # focus on TF core # color nodes according to a sample column in the gene expression matrix # set threshold curr.expression.transform_dropdown.value = 'Z score' # curr.network.tf_only_click(None) curr.gene_click(None) curr.network.threshhold_slider.value = threshhold curr.expression.apply_transform() # curr.network.layout_click(None) curr.expression.transform_dropdown.value = 'Z score' curr.network.apply_click(None) curr.expression.col = currCol curr.expression.apply_transform() # if tfFocus: curr.condition_click(None) # set title curr.expression.col = currCol # # focus on TF core curr.network.title_html.value = titleInf curr.condition_click(None) # curr.network.tf_only_click(None) # # Attach the motif collection populated above: # curr.network.layout_click(None) # N.motif_collection = C # add labels # # Attach the motif collection populated above:

# net_with_motifs curr.network.labels_button.value=False # N.motif_collection = C # set title print #you can turn labels back on # net_with_motifs curr.network.title_html.value = titleInf # !!!!! for some reason this doesn't work, tried curr.display... w/o curr.network.display... curr.network.draw_click(None) # Part 2: a hacky way to set min and max on the heatmap and heatmap-colored nodes # Limit genes to target genes of interest and their putative regulators # !!!!! for some reason this doesn't work, tried curr.display... w/o curr.network.display... # add labels # curr.network.display_graph.get_edge_color_interpolator() geneIn = open(directory + '/' + geneListFile,'r') # Part 2: a hacky way to set min and max on the heatmap and heatmap-colored nodes curr.network.labels_button.value=False # curr.network.display_graph._edge_color_interpolator.minclr = minclr geneList = list() # curr.network.display_graph.get_edge_color_interpolator() print #you can turn labels back on # curr.network.display_graph._edge_color_interpolator.maxclr = maxclr for gene in geneIn: # curr.network.display_graph._edge_color_interpolator.minclr = minclr curr.network.draw_click(None) # curr.network.display_graph._edge_color_interpolator.breakpoints = \ geneList.append(gene.strip('\n')) # curr.network.display_graph._edge_color_interpolator.maxclr = maxclr # Limit genes to target genes of interest and their putative regulators # [(minvalue, minclr), geneIn.close() # curr.network.display_graph._edge_color_interpolator.breakpoints = \ geneIn = open(directory + '/' + geneListFile,'r') # (0, zeroclr), # [(minvalue, minclr), geneList = list() # (maxvalue, maxclr)] curr.network.pattern_text.value = " ".join(geneList) # (0, zeroclr), for gene in geneIn: In [21]: ## Set curr.network.draw_click(None)network preferences curr.network.match_click(None) In [21]: # Set# network (maxvalue,preferences maxclr)] geneList.append(gene.strip('\n')) count count = 0 += 1 count# = 0curr.network.draw_click(None) geneIn.close() for curr in networkList: # don't use for curr count in networkList+= 1 : networkInit = networkInits[count] # curr.targeted_click(None) networkInit = networkInits[count] curr.network.pattern_text.value = " ".join(geneList) # get title information + curr column for shading of figures # curr.network.gene_button = " ".join(geneList) # get title information + curr column for shading of figures /Users/emiraldi/anaconda/lib/python2.7/site-packages/scipy/stats/stats.py:2238: curr.network RuntimeWarning:.match_click(None invalid) value encountered in true_divide currCol = networkInit[1] # curr.network.gene_click(None) currCol = networkInit[1] np.expand_dims(sstd, axis=axis)) /Users/emiraldi/anaconda/lib/python2.7/site-packages/scipy/stats/stats.py:2238: RuntimeWarning: invalid value encountered in true_divide titleBase = networkInit[2] titleBase = networkInit[2] # don't use np.expand_dims(sstd, axis=axis)) titleInf = titleBase #+ ' (gCut: ' + geneCutoff.replace('_',' ') + ', pCut: ' + peakCutoff.replace('_',' ') # Load heatmap titleInf = titleBase #+ ' (gCut: ' + geneCutoff.replace('_',' ') + ', pCut: ' + peakCutoff.replace('_',' ') # curr.targeted_click(None) threshhold = networkInit[3] curr.load_heatmap(directory + '/' + expressionFile) threshhold = networkInit[3] # curr.network.gene_button = " ".join(geneList) # color nodes according to a sample column in the gene expression matrix # curr.network.gene_click(None) # set threshold curr.gene_click(None) # set threshold curr.network.threshhold_slider.value = threshhold curr.expression.transform_dropdown.value = 'Z score' curr.network.threshhold_slider.value = threshhold # Load heatmap curr.network.apply_click(None) curr.expression.apply_transform() curr.network.apply_click(None) curr.load_heatmap(directory + '/' + expressionFile) # if tfFocus: curr.expression.col = currCol # if tfFocus: # color nodes according to a sample column in the gene expression matrix # # focus on TF core curr.condition_click(None) # # focus on TF core curr.gene_click(None) # curr.network.tf_only_click(None) # curr.network.tf_only_click(None) In [19]: print directory + '/' + netPath + '/' + networkFile curr.expression.transform_dropdown.value = 'Z score' # curr.network.layout_click(None) # # Attach the motif collection populated above:# curr.network.layout_click(None) curr.expression.apply_transform() In [19]: print directory + '/' + netPath + '/' + networkFile # N.motif_collection = C curr.expression.col = currCol # set title # net_with_motifs # set title /Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias100_noTFA_sp.tsv curr.condition_click(None) curr.network.title_html.value = titleInf /Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias100_noTFA_sp.tsv curr.network.title_html.value = titleInf # !!!!! for some reason this doesn't work, tried curr.display... w/o curr.network.display... # # Attach the motif collection populated above: In [20]: directory # add +labels '/' + netPath + '/' + networkFile # Part 2: a hacky way to set min and max on the heatmap and #heatmap-colored add labels nodes # N.motif_collection = C curr.network.labels_button.value=False # curr.network.display_graph.get_edge_color_interpolator()In [20]: directory curr.network + '/'.labels_button + netPath + .'/'value + =networkFileFalse # net_with_motifs Out[20]: '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias100_noTFA_sp.tsv' print #you can turn labels back on # curr.network.display_graph._edge_color_interpolator.minclr print = minclr #you can turn labels back on

curr.network.draw_click(None) # curr.network.display_graph._edge_color_interpolator.maxclrOut[20]: '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias100_noTFA_sp.tsv' curr = maxclr.network.draw_click(None) # !!!!! for some reason this doesn't work, tried curr.display... w/o curr.network.display... # Limit genes to target genes of interest and their putative regulators # curr.network.display_graph._edge_color_interpolator.breakpoints # Limit = genes\ to target genes of interest and their putative regulators In [ ]: # Part 2: a hacky way to set min and max on the heatmap and heatmap-colored nodes geneIn = open(directory + '/' + geneListFile,'r') # [(minvalue, minclr), In [ ]: geneIn = open(directory + '/' + geneListFile,'r') # curr.network.display_graph.get_edge_color_interpolator() geneList = list() # (0, zeroclr), geneList = list() # curr.network.display_graph._edge_color_interpolator.minclr = minclr for gene in geneIn: # (maxvalue, maxclr)] for gene in geneIn: # curr.network.display_graph._edge_color_interpolator.maxclr = maxclr geneList.append(gene.strip('\n')) # curr.network.draw_click(None) geneList.append(gene.strip('\n')) # curr.network.display_graph._edge_color_interpolator.breakpoints = \ geneIn.close() count += 1 geneIn.close() # [(minvalue, minclr), # (0, zeroclr), curr.network.pattern_text.value = " ".join(geneList) curr.network.pattern_text.value = " ".join(geneList) # (maxvalue, maxclr)] curr.network.match_click(None) curr.network.match_click(None) # curr.network.draw_click(None) /Users/emiraldi/anaconda/lib/python2.7/site-packages/scipy/stats/stats.py:2238: RuntimeWarning: invalid value encountered in true_divide count += 1 np.expand_dims(sstd, axis=axis)) # don't use # don't use # curr.targeted_click(None) # curr.targeted_click(None) # curr.network.gene_button = " ".join(geneList) # curr.network.gene_button = " ".join(geneList) # curr.network.gene_click(None) /Users/emiraldi/anaconda/lib/python2.7/site-packages/scipy/stats/stats.py:2238: RuntimeWarning: invalid value encountered in true_divide# curr.network.gene_click(None) np.expand_dims(sstd, axis=axis)) # Load heatmap # Load heatmap curr.load_heatmap(directory + '/' + expressionFile) curr.load_heatmap(directory + '/' + expressionFile) # color nodes according to a sample column in the gene expression matrix # color nodes according to a sample column in the gene expression matrix curr.gene_click(None) curr.gene_click(None) curr.expression.transform_dropdown.value = 'Z score' curr.expression.transform_dropdown.value = 'Z score' In [19]: print directory + '/' + netPath + '/' + networkFile curr.expression.apply_transform() curr.expression.apply_transform() curr.expression.col = currCol curr.expression.col = currCol

curr.condition_click(None) /Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias100_noTFA_sp.tsv curr.condition_click(None)

# # Attach the motif collection populated Inabove: [19]: print directory + '/' + netPath + '/' + networkFile # # Attach the motif collection populated above: # N.motif_collection = C In [20]: directory + '/' + netPath + '/' + networkFile # N.motif_collection = C # net_with_motifs # net_with_motifs /Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias100_noTFA_sp.tsvOut[20]: '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias100_noTFA_sp.tsv' # !!!!! for some reason this doesn't work, tried curr.display... w/o curr.network.display... # !!!!! for some reason this doesn't work, tried curr.display... w/o curr.network.display... # Part 2: a hacky way to set min and max on the heatmap and heatmap-colored nodes # Part 2: a hacky way to set min and max on the heatmapIn [20]: and directoryheatmap-colored + '/' +nodes netPath + '/' + networkFile In [ ]: # curr.network.display_graph.get_edge_color_interpolator() # curr.network.display_graph.get_edge_color_interpolator() # curr.network.display_graph._edge_color_interpolator.minclrOut[20]: '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias100_noTFA_sp.tsv' = minclr # curr.network.display_graph._edge_color_interpolator.minclr = minclr # curr.network.display_graph._edge_color_interpolator.maxclr = maxclr # curr.network.display_graph._edge_color_interpolator.maxclr = maxclr # curr.network.display_graph._edge_color_interpolator.breakpoints = \ # curr.network.display_graph._edge_color_interpolator.breakpoints = \ # [(minvalue, minclr), In [ ]: # [(minvalue, minclr), # (0, zeroclr), # (0, zeroclr), # (maxvalue, maxclr)] # (maxvalue, maxclr)] # curr.network.draw_click(None) # curr.network.draw_click(None) count += 1 count += 1

/Users/emiraldi/anaconda/lib/python2.7/site-packages/scipy/stats/stats.py:2238: RuntimeWarning: invalid value encountered in true_divide /Users/emiraldi/anaconda/lib/python2.7/site-packages/scipy/stats/stats.py:2238: RuntimeWarning: invalid value encountered in true_divide np.expand_dims(sstd, axis=axis)) np.expand_dims(sstd, axis=axis))

In [19]: print directory + '/' + netPath + '/' + networkFile In [19]: print directory + '/' + netPath + '/' + networkFile

/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias100_noTFA_sp.tsv /Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias100_noTFA_sp.tsv

In [20]: directory + '/' + netPath + '/' + networkFile In [20]: directory + '/' + netPath + '/' + networkFile

Out[20]: '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias100_noTFA_sp.tsv' Out[20]: '/Users/emiraldi/erm/Shared/Maria-Emily/1803_Th17networks/Networks_pre180331/ATAC_Th17_bias100_noTFA_sp.tsv'

In [ ]: In [ ]: Zfp263 Zfp263 Bcl6 Bcl6 Irf4 Irf4 Irf7 Irf7 Maf Maf Egr2 Egr2 Irf1 Irf1 Spi1 Spi1 Hic1 Hic1 Gata2 Gata2 Foxj1 Foxj1 Prdm1 Prdm1 Zfp105 Zfp105 Nr3c1 Nr3c1 Egr1 Egr1 Stat1 Stat1 Pou2f2 Pou2f2 Ikzf2 Ikzf2 Trp53 Trp53 Srf Srf Elf1 Elf1 Sp3 Sp3 Vdr Vdr Mecom Mecom Rest Rest Tfcp2 Tfcp2 Rfx2 Rfx2 Pparg Pparg Max Max Tbx21 Tbx21 Irf5 Irf5 Klf4 Zfp219 Zfp219 Klf5 Klf5 Tcf7l1 Tcf7l1 Myc Mxi1 Mxi1 Plag1 Plag1 Foxa1 Foxa1 Bcl11a Bcl11a Elf3 Elf3 Egr3 Egr3 Pou2af1 Pou2af1 Klf2 Klf2 Pura Pura Nfkb1 Nfkb1 Figure 5 Bach2 Bach2 Tcf7l2 Tcf7l2 Thra Thra Jund Jund A De NovoStat3 Th17 Core (ChIP+KO+ATAC) B De NovoStat3 Th17 Core (ATAC-only) KCAA max, Th17SetsIrf8 fdr1 top30 ATAC Th17, maxTh17SetsIrf8 fdr1 top30 Zfhx3 Zfhx3 AHRR Lef1 AHRR Lef1 Sox7 AHRR Sox7 AHRR BATFc,y Elk3 BATF BCL6 Elk3 BCL6 CASZ1 Nfatc1 CENPT Nfatc1 Zfp110 CASZ1 Zfp110 CENPT DACH2 Sox8 DACH2 DACH2 Sox8 DACH2 DPF1 Nfe2l2 DPF1 DPF1 Nfe2l2 DPF1 y Foxo1 y Foxo1 ETS1 Zfp1 ETS1 ETS1 Zfp1 ETS1 FOSL1 Rora FOSL1 FOSL1 Rora FOSL1 c Foxm1 c Foxm1 FOSL2 Tef FOSL2 FOSL2 Tef FOSL2 FOXB1 Foxf1 FOXB1 GLI3 Foxf1 c,y Gata3 Gata3 GLI3 HIF1A Smad1 HIF1A GLIS3 Smad1 GLIS3 IKZF3c Foxc1 IKZF3 HIF1Ac,y Foxc1 c,y Zbtb16 Zbtb16 HIF1A IRF4 Stat6 IRF4 HIF3A Stat6 HIF3A JUN y Sox4 JUNB Sox4 Nfix JUN Nfix JUNB KLF13 Myb KLF13 KLF7 Myb KLF7 MAFc MAF MAFc E2f3 Pbx1 Pbx1 MAF NR1D1 Eomes NR1D1 NCOA1 Eomes NCOA1 NR1D2 Setdb1 NR1D2 NR1D1 Setdb1 Nr4a2 Nr4a2 NR1D1 RORA Zeb1 RORA NR1D2 Zeb1 NR1D2 RORCc,y Klf7 RORC POU6F1 Klf7 y Tcf7 Tcf7 POU6F1 RUNX1 Hey1 RUNX1 RFX1 Hey1 RFX1 SATB1c,y Elk1 SATB1 RORA Elk1 RORA Atf3 c,y Atf3 SCRT1 Sox5 SCRT1 RORC Sox5 RORC SMAD3 c Etv5 SMAD3 SATB1c,y Etv5 Arnt2 Arnt2 SATB1 SREBF1 Ar SREBF1 SCRT1 Ar SCRT1 STAT3c,y Bhlhe40 STAT3 SMAD3 c Bhlhe40 SMAD3 Ikzf3 c,y Ikzf3 THRA Nfic THRA STAT3 Nfic STAT3 VAX2 Mef2d VAX2 TBX21y Mef2d TBX21 VDR Meis3 Meis3 Klf8 VDR THRA Klf8 THRA ZBTB7A Zbtb4 ZBTB7A VDR Zbtb4 VDR ZFP9 Ybx1 ZFP9 Ybx1 Sp6 ZFP9 Sp6 ZFP9 Bach1 Bach1 ≤-10-10 --55 Nr1d10 5 ≥1010 -400-400 0 400400 ≤-10-10 -5-5 Nr1d100 55 ≥10 -100 0 200 (Edge Sign) * -log -200(P ) -100 0 100 200 (Edge Sign) * -log (P-200) -100-100 0 0 100 100200 200 (Sign of Edges) * -10logadj10(Padj) # Targets (Sign of Edges) * -10logadj(P ) # Targets # of Negative# of Negative and and Positive Positive Targets Targets per TF per TF 10# of# adjNegativeof Negative and and Positive Positive Targets Targets per TFTF +, not in Prior +, not in Prior +, w/Prior Support +, w/Prior Support -, not in Prior -, not in Prior -, w/Prior SupportFigure S11. TF-TF module analysis ChIP+KO+ATAC-, w/PriorTRN Support Th17 Th17 - - ChIP+KO+ATACTF-TF ModulesTRN Top 15 TF (-TFChIP+KO+ATAC Modules (Positive TF-GeneTRN) Interactions) % Overlap between networks, TF/gene = 15

posEdge negEdgeTh17 Non posEdgeTh17 Non negEdge 1 C P <= 0.01 sATAC (Th17 48h), b=50,P <= 0.01 noTFA adj adj 10 Nr1d2 ChIP/sA(Th17),NR1D2Nr1d2 Nr1d2Nr1d2 b=50, noTFA Nr1d2 ChIP/sA(Th17)+KO+sA(Th),Vax2 Vax2 Vax2 b=50, noTFA Vax2 % Overlap between networks, TF/gene = 15 VAX2 Vax2 1 IL23 Signaling, Rheumatoid Arthritis No Prior sATAC (Th17 48h), b=50, noTFA 10 Rorc RORCRorc RorcRorc Rorc ChIP/sA(Th17), b=50, noTFA 0.9 ChIP, b=50, noTFA ChIP/sA(Th17)+KO+sA(Th), b=50, noTFA Rora Rora Rora Rora No Prior 0.9 RORAKO,Rora b=50, noTFA ChIP, b=50, noTFA Nr1d1 Nr1d1 Nr1d1 Nr1d1 KO, b=50, noTFA qATAC (AllNR1D1 Th),Nr1d1 b=50, noTFA qATAC (All Th), b=50, noTFA KO, b=10, noTFA VAX2, RORC,Klf7 RORA, NR1D1Klf7KO,Klf7 b=10, noTFA Klf7 0.8 0.8 KLF7 Klf7 ChIP, b=100 ChIP, b=100 ChIP, b=50 Thra THRAThra Thra Thra ChIP, b=50 Thra qATAC (All Th), b=10, noTFA ChIP, b=10 9 score 0.7

Zbtb7b qATAC (AllZbtb7b Th),Zbtb7b b=10, noTFA Zbtb7b ChIP, b=10, noTFA - ZBTB7B Zbtb7b KO, b=100 Sp4 Sp4 Sp4 ChIP, b=10 Sp4 KO, b=50 score 0.7 SP4 Sp4 KO, b=10 0.6

ChIP, b=10, noTFA qATAC (All Th), b=100 - Stat3 Integrin SignalingSTAT3Stat3 Stat3 Stat3 Stat3 qATAC (All Th), b=50 KO, b=100 qATAC (All Th), b=10 Fosl2 Fosl2 Fosl2Fosl2 KO, b=50 Fosl2 sATAC (Th17 48h), b=100 0.5 FOSL2 sATAC (Th17 48h), b=50 Smad3 Smad3 Smad3 KO, b=10 Smad3 ChIP/sA(Th17), b=100 0.6 SMAD3 Smad3 ChIP/sA(Th17), b=50 qATAC (All Th), b=100 0.4 Maf Maf Maf ChIP/sA(Th17)+KO+sA(Th),Maf b=100 ZBTB7B, SP4, STAT3, FOSL2, SMAD3,qATACMAF MAF Maf(All Th), b=50 ChIP/sA(Th17)+KO+sA(Th), b=50 Normalized sATAC (Th17 48h), b=10 Klf10 qATACKLF10Klf10 Klf10 Klf10(All Th), b=10 Klf10 ChIP/sA(Th17), b=10 ChIP/sA(Th17)+KO+sA(Th), b=10 8 0.3 sATAC (Th17 48h), b=100 Overlap Z 0.5 Foxo1 FOXO1Foxo1 Foxo1Foxo1 Foxo1sATAC (Th17 48h), b=10, noTFA sATAC (Th17 48h), b=50 ChIP/sA(Th17), b=10, noTFA ChIP/sA(Th17)+KO+sA(Th), b=10, noTFA Zeb1 ChIP/sA(Th17),ZEB1Zeb1 Zeb1Zeb1 b=100 Zeb1 00.2 Cic ChIP/sA(Th17),CICCic Cic Cic b=50 Cic

No Prior 0.4 ChIP/sA(Th17)+KO+sA(Th),P = .1 P = .1 b=100 KO, b=50 KO, b=10 Runx1 Runx1 Runx1 Runx1 KO, b=100 adj RUNX1 Runx1 adj ChIP, b=50 ChIP, b=10 ChIP, b=100 ChIP/sA(Th17)+KO+sA(Th), b=50 Normalized Ahrr Ahrr Ahrr Ahrr

AHRR Ahrr KO, b=50, noTFA KO, b=10, noTFA

sATAC (Th17 48h), b=10 ChIP, b=50, noTFA ChIP, b=10, noTFA ChIP/sA(Th17), b=50 ChIP/sA(Th17), b=10 qATAC (All Th), b=50 qATAC (All Th), b=10 Zbtb32 ChIP/sA(Th17),ZBTB32Zbtb32 Zbtb32Zbtb32 b=10 Zbtb32 qATAC (All Th), b=100 ChIP/sA(Th17), b=100 Amino Acid Transport sATAC (Th17 48h), b=50 sATAC (Th17 48h), b=10 sATAC (Th17 48h), b=100 0.3 Plscr1 ChIP/sA(Th17)+KO+sA(Th),Plscr1 Plscr1 b=10 Plscr1 ChIP/sA(Th17), b=50, noTFA ChIP/sA(Th17), b=10, noTFA qATAC (All Th), b=50, noTFA qATAC (All Th), b=10, noTFA

PLSCR1 Plscr1 Overlap Z sATAC (Th17 48h), b=10, noTFA 7 sATAC (Th17 48h), b=50, noTFA sATAC (Th17 48h), b=10, noTFA Plagl1 Plagl1 Plagl1 Plagl1 ChIP/sA(Th17)+KO+sA(Th), b=50 ChIP/sA(Th17)+KO+sA(Th), b=10 ChIP/sA(Th17),PLAGL1 Plagl1 b=10, noTFA ChIP/sA(Th17)+KO+sA(Th), b=100 ChIP/sA(Th17)+KO+sA(Th),Bhlhe41 BHLHE41Bhlhe41 Bhlhe41Bhlhe41 b=10, noTFA Bhlhe41 ZBTB32, PLSCR1, PLAGL1, BHLHE41, ChIP/sA(Th17)+KO+sA(Th), b=50, noTFA ChIP/sA(Th17)+KO+sA(Th), b=10, noTFA 0.2 Jdp2 JDP2Jdp2 Jdp2Jdp2 Jdp2 0 Ddit3 DDIT3Ddit3 Ddit3Ddit3 Ddit3

Etv4 ETV4Etv4 Etv4 Etv4 Etv4 No Prior JDP2, DDIT3, ETV4, BCL11B KO, b=50 KO, b=10 KO, b=100 Bcl11b BCL11BBcl11b Bcl11bBcl11b Bcl11b ChIP, b=50 ChIP, b=10 ChIP, b=100 Stat2 STAT2Stat2 Stat2 Stat2 Stat2 6 Irf9 IRF9Irf9 Irf9 Irf9 Irf9 KO, b=50, noTFA KO, b=10, noTFA ChIP, b=50, noTFA ChIP, b=10, noTFA ChIP/sA(Th17), b=50 ChIP/sA(Th17), b=10 Gene Expression qATAC (All Th), b=50 qATAC (All Th), b=10 Irf7 Irf7 Irf7 Irf7 Interferon, Viral ResponseIRF7 Irf7 qATAC (All Th), b=100 ChIP/sA(Th17), b=100

* >2 media(1h) Stat4 ≥2 STAT4Stat4 Stat4 Stat4 Stat4 sATAC (Th17 48h), b=50 sATAC (Th17 48h), b=10 Th0(1h) sATAC (Th17 48h), b=100 Th0(3h) Stat1 Stat1 Stat1 Stat1

STAT1 ChIP/sA(Th17), b=50, noTFA ChIP/sA(Th17), b=10, noTFA

Stat1 qATAC (All Th), b=50, noTFA qATAC (All Th), b=10, noTFA

Th0(6h) ) Th0(16h) Irf2 IRF2Irf2 Irf2 Irf2 Irf2 sATAC (Th17 48h), b=50, noTFA TFs sATAC (Th17 48h), b=10, noTFA Th0(48h) ChIP/sA(Th17)+KO+sA(Th), b=50 ChIP/sA(Th17)+KO+sA(Th), b=10 ChIP/sA(Th17)+KO+sA(Th), b=100 STAT2, IRF9, IRF7, STAT4, STAT1,adj IRF2, IRF1 Th17(1h) Irf1 IRF1Irf1 Irf1 Irf1 Irf1

Th17(3h) P Trp73 Trp73 Trp73 Trp73 Th17(6h) ( 0 TRP73

edges) Trp73 P = 1 P = 1 P = 1 P = 1 Th17(9h) adj adj adj adj 5 Zfp365 Zfp365 Zfp365 Zfp365 ChIP/sA(Th17)+KO+sA(Th), b=50, noTFA ChIP/sA(Th17)+KO+sA(Th), b=10, noTFA Th17(12h) 10 ZFP365 Zfp365 Th17(16h) Zfp750 p53 SignalingZfp750 Zfp750 Zfp750 Th17(24h) ZFP750 Zfp750 Th17(48h)

log Zfp385a Zfp385a Th1(48h) Zfp385a ZFP385A Zfp385aZfp385a - Th2(48h) Trp53 TRP53Trp53 Trp53 Trp53 Treg(48h) Trp53 <-2≤-2 Zfp280b ZFP280BZfp280b Zfp280bZfp280b Zfp280b

TRP73, ZFP365, ZFP750, ZFP385A, of (Sign TRP53 Rlf Irf9 Irf2 Irf1 Irf7 Vdr Lbr Cic Id1 Id2 Erg

Sp9 Atf5 Atf3 Txk Pcx Pml Maf Tal2 Tox Utf1 Klf3 Irf4 Batf Aff3 IRF4Irf4 Irf4 Irf4 Dpf1 Ing2 Mxi1 Rest Sirt6 Etv4 Ikzf3 Rorc Rora Vax2 Edf1 Ift57 Bcl3 Mlf1 Ikzf2 Spib Rbpj Dffb Irf4 Scrt1 Tcf25 Hes6 Hif1a Arrb1 Glis3 Hif3a Foxj1 Sos2 Jdp2 Msh5 Mafg Ddit3 Stat2 Stat1 Zbp1 Eno1 Stat3 Mxd1 Ssh2 Chd7 Phtf2 Nrip1 Akna Tnp2 Hhex Sap30 Rcor2 Plscr1 Nupr1 Tbx21 Oasl1 Dhx58 Nr1d2 Nr1d1 Foxs1 Thap3 Tead1 Ssbp4 Prkdc Mbd5 Foxp2 Bcl6b Bmyc Plagl2 Nfkb2 Nfkb1 Plagl1 Gata3 Gata1 Pparg Epas1 Zbtb5 Foxo3 Trp53 Trp73 Zfp654 Zfp263 Zfp395 Ankzf1 Rbbp8 Cebpg Zfp790 Cebpb Trim21 Nfkbil1 Srebf1 Zfp608 Zfp827 Nfkbib Nfkbie Ncoa3 Zfp266 Trim24 Aebp2 Zfp750 Zfp365

Snapc1 Fbxo21 Kdm5b Setdb2 Ppfibp2 Erg Erg Dnajc21 Trim30a Erg Zfp280b Ppp1r10 Zfp385a ERG Erg Erg Ppp1r13l Tsc22d3 4 DNAZfp367 Mismatch ZFP367RepairZfp367 Zfp367Zfp367 Zfp367 E2F4E2f4 E2f4 E2f4 C030039L03Rik E2f4 E2F1E2f1 E2f1 E2f1 E2f1 Zfp960 ZFP960Zfp960 Zfp960Zfp960 Zfp960 Phf20 ZFP367, E2F4,Phf20 E2FPhf20 Phf20 PHF20 Phf20 Scrt1 SCRT1Scrt1 Scrt1Scrt1 Scrt1 Sp9 SP9Sp9 Sp9 Sp9 Sp9 HIF1A Signaling, Glycolysis 3 Dpf1 DPF1Dpf1 Dpf1Dpf1 Dpf1 Kdm5b KDM5BKdm5b Kdm5bKdm5b Kdm5b Hif3a HIF3AHif3a Hif3a Hif3a Hif3a SCRT1, SP9, DPF1, KDM5B, HIF3A, HIF1A, P = .1 Hif1a Hif1a Hif1a HIF1A Hif1a Hif1a P adj = .1 adj C030039L03Rik C030039L03RIKC030039L03RikC030039L03RikC030039L03Rik C030039L03Rik ANKZF1Ankzf1 Ankzf1 C030039L03RIKAnkzf1 (ZNF780A), ANKZF1Ankzf1Ankzf1 Plagl2 PLAGL2Plagl2 Plagl2Plagl2 Plagl2 2 Rel RELRel Rel Rel Rel Nfkb2 NFKB2Nfkb2 Nfkb2Nfkb2 Nfkb2 Nfkb1 NFKB1Nfkb1 Nfkb1Nfkb1 Nfkb1 Elf1 ELF1Elf1 Elf1 Elf1 Elf1 Dmrt1 DMRT1Dmrt1 Dmrt1Dmrt1 Dmrt1 Creb1 CREB1Creb1 Creb1Creb1 Creb1 Atf2 ATF2Atf2 Atf2 Atf2 Atf2 1 Peg3 PEG3Peg3 Peg3Peg3 Peg3 Prrx1 PRRX1Prrx1 Prrx1Prrx1 Prrx1 Hmga2 HMGA2Hmga2 Hmga2Hmga2 Hmga2 Zfp534 ZFP534Zfp534 Zfp534Zfp534 Zfp534 Hoxb1 HOXB1Hoxb1 Hoxb1Hoxb1 Hoxb1 Rxra RXRARxra Rxra Rxra Rxra Myt1l MYT1LMyt1l Myt1lMyt1l Myt1l P <= 0.01 P <= 0.01 adj adj Gene Expression 0 >2 media(1h)media (1h) Irf9 Irf7 Irf2 Irf1 Irf4 Elf1 Klf7 Cic Ahrr Erg Rel Atf2 Thra Sp4 Stat3 Maf Klf10 Etv4 Stat2 Stat4 Stat1 E2f4 E2f1 Sp9 Hif3a Hif1a Rxra

Th0 (1h) Vax2 Rorc Rora Fosl2 Zeb1 Plscr1 Plagl1 Jdp2 Ddit3 Scrt1 Dpf1 Plagl2 Prrx1 Myt1l Nr1d2 Nr1d1 Trp73 Trp53 Phf20 Nfkb2 Nfkb1 Dmrt1 Creb1 Peg3 Th17 upOnly Th17 upOnly Foxo1 Runx1 Zbtb32 Bcl11b Zfp365 Zfp750 Zfp367 Zfp960 Ankzf1 Zfp534 Hoxb1 Th17 upOnly Th0(1h)Th17 upOnly Zbtb7b Bhlhe41 Zfp385a Smad3 Zfp280b Kdm5b Hmga2 Th17 downOnly Th17 downOnly Th17 downOnly Th0Th0(3h) (3h)Th17 downOnly TFs C030039L03Rik Gene Expression Th0Th0(6h) (6h) media(1h) >2≥2 Th0 (16h) Th0(1h) Th0(16h) Th0(3h) Th0 (48h) Th0(6h) Th0(48h) Th0(16h) Th17 (1h) Th0(48h) Th17(1h) Th17(1h) Th17(3h)Th17 (3h) Th17(3h) Th17(6h) 0 Th17(6h)Th17 (6h) 0 Th17(9h) Th17(12h) Th17(9h)Th17 (9h) Th17(16h) Th17(24h) Th17(12h)Th17 (12h)

Th17(48h) Log2(FC) Th1(48h) Th17(16h)Th17 (16h) Th2(48h) Th17 (24h) Treg(48h) Th17(24h) <-2≤-2 Gene Expression Expression Gene Th17(48h)Th17 (48h) Rlf Irf9 Irf2 Irf1 Irf7 Vdr Lbr Cic Id1 Id2 Erg

Sp9 Atf5 Atf3 Txk Pcx Pml Maf Tal2 Tox Utf1 Klf3 Batf Aff3 Th1 (48h) Dpf1 Ing2 Mxi1 Rest Sirt6 Etv4 Ikzf3 Rorc Rora Vax2 Edf1 Ift57 Bcl3 Mlf1 Ikzf2 Spib Rbpj Dffb Th1(48h) Scrt1 Tcf25 Hes6 Hif1a Arrb1 Glis3 Hif3a Foxj1 Sos2 Jdp2 Msh5 Mafg Ddit3 Stat2 Stat1 Zbp1 Eno1 Stat3 Mxd1 Ssh2 Chd7 Phtf2 Nrip1 Akna Tnp2 Hhex Sap30 Rcor2 Plscr1 Nupr1 Tbx21 Oasl1 Dhx58 Nr1d2 Nr1d1 Foxs1 Thap3 Tead1 Ssbp4 Prkdc Mbd5 Foxp2 Bcl6b Bmyc Plagl2 Nfkb2 Nfkb1 Plagl1 Gata3 Gata1 Pparg Epas1 Zbtb5 Foxo3 Trp53 Trp73 Zfp654 Zfp263 Zfp395 Ankzf1 Rbbp8 Cebpg Zfp790 Cebpb Trim21 Nfkbil1 Srebf1 Zfp608 Zfp827 Nfkbib Nfkbie Ncoa3 Zfp266 Trim24 Aebp2 Zfp750 Zfp365 Snapc1 Fbxo21 Kdm5b Setdb2 Ppfibp2

Dnajc21 Trim30a Zfp280b Ppp1r10 Zfp385a Th2 (48h)

Ppp1r13l Tsc22d3 Th2(48h) Treg(48h)Treg (48h) <-2 C030039L03Rik Cic Irf9 Irf7 Irf2 Irf1 Irf4 Rel CIC Erg Elf1 SP4 SP9 Klf7 Sp4 Maf Sp9 Atf2 REL MAF IRF9 IRF7 IRF2 IRF1 IRF4 ERG Thra Ahrr Etv4 E2f4 E2f1 E2F4 E2F1 Vax2 Rorc Rora Zeb1 Jdp2 Dpf1 Rxra ELF1 KLF7 JDP2 ETV4 ATF2 ZEB1 DPF1 VAX2 Stat3 Fosl2 Klf10 Ddit3 Stat2 Stat4 Stat1 Scrt1 Hif3a Hif1a Peg3 Prrx1 Myt1l PEG3 THRA RXRA Trp73 Trp53 Phf20 AHRR DDIT3 HIF3A HIF1A RORC RORA KLF10 Nr1d2 Nr1d1 Foxo1 Plscr1 Plagl1 Plagl2 Nfkb2 Nfkb1 Dmrt1 Creb1 TRP73 TRP53 PHF20 STAT3 STAT2 STAT4 STAT1 FOSL2 NR1D2 NR1D1 SCRT1 MYT1L Runx1 Hoxb1 NFKB2 NFKB1 PRRX1 FOXO1 RUNX1 CREB1 DMRT1 HOXB1 SMAD3 Zbtb7b Smad3 Zbtb32 Bcl11b Zfp365 Zfp750 Zfp367 Zfp960 Ankzf1 Hmga2 Zfp534 ZFP365 ZFP750 ZFP367 ZFP960 KDM5B ZFP534 HMGA2 Kdm5b ZBTB32 BCL11B ZBTB7B PLSCR1 PLAGL1 ANKZF1 PLAGL2 Bhlhe41 Zfp385a Zfp280b ZFP385A ZFP280B BHLHE41 C030039L03RIK C030039L03Rik In [1]: # Visualization of the KO+ChIP Gold Standard from: # Miraldi et al. (2018) "Leveraging chromatin accessibility for Intranscriptional [1]: # Visualization regulatory of thenetwork KO+ChIP inference Gold Standardin Th17 Cells"from: # Miraldi et al. (2018) "Leveraging chromatin accessibility for transcriptional regulatory network inference in Th17 Cells" # TO START: In the menu above, choose "Cell" --> "Run All", and network + heatmap will load # Change "canvas" to "SVG" (drop-down menu in cell below) to enable drag# interactionsTO START: In withthe menunodes above, & labels choose "Cell" --> "Run All", and network + heatmap will load # More info about jp_gene_viz and user interface instructions are available# Change on Github: "canvas" to "SVG" (drop-down menu in cell below) to enable drag interactions with nodes & labels # https://github.com/simonsfoundation/jp_gene_viz/blob/master/doc/dNetwork%20widget%20overview.ipynb# More info about jp_gene_viz and user interface instructions are available on Github: # Info specific to the "Multi-network" view: # https://github.com/simonsfoundation/jp_gene_viz/blob/master/doc/dNetwork%20widget%20overview.ipynb # https://github.com/simonsfoundation/jp_gene_viz/blob/master/doc/Combined%20widgets.ipynb# Info specific to the "Multi-network" view: # https://github.com/simonsfoundation/jp_gene_viz/blob/master/doc/Combined%20widgets.ipynb # directory containing gene expression data and network folder directory = "." # directory containing gene expression data and network folder # folder containing networks directory = "." netPath = 'Networks' # folder containing networks # name of gene expression file netPath = 'Networks' expressionFile = 'Th0_Th17_48hTh.txt' # name of gene expression file # sample condition for initial gene node color expressionFile = 'Th0_Th17_48hTh.txt' sampleConditionOfInt = 'Th17(48h)' # sample condition for initial gene node color sampleConditionOfInt = 'Th17(48h)' # The starting conditions for each of the networks is a list of tuples. Tuple entries are: # 0. network file name (column format) (as found in directory) # The starting conditions for each of the networks is a list of tuples. Tuple entries are: # 1. column of the expression matrix that you want the nodes to be colored# 0. by network file name (column format) (as found in directory) # 2. network title, to which we'll add the gene and peak cutoffs # 1. column of the expression matrix that you want the nodes to be colored by # 3. cut off for edge strength, note TRN edges strengths are quantile for# 2.15 networkTFs/gene, title, to see to topwhich 10 TFs/gene,we'll add the gene and peak cutoffs # increase cutoff to .33, etc. # 3. cut off for edge strength, note TRN edges strengths are quantile for 15 TFs/gene, to see top 10 TFs/gene, networkInits = [ # increase cutoff to .33, etc. ('ChIP_A17_KOall_ATh_bias50_maxComb_sp.tsv',sampleConditionOfInt,' FinalnetworkInits ChIP/ATAC(Th17)+KO+ATAC(Th) = [ TRN',0), #.93), ('ATAC_Th17_bias50_maxComb_sp.tsv',sampleConditionOfInt,'Final ATAC-only ( 'ChIP_A17_KOall_ATh_bias50_maxComb_sp.tsv'TRN', 0), #.93), ,sampleConditionOfInt,' Final ChIP/ATAC(Th17)+KO+ATAC(Th) TRN',0), #.93), ("KO75_KOrk_1norm_sp.tsv",sampleConditionOfInt,'KO G.S. (25 TFs)',0), ('ATAC_Th17_bias50_maxComb_sp.tsv',sampleConditionOfInt,'Final ATAC-only TRN', 0), #.93), ("KC1p5_sp.tsv",sampleConditionOfInt,'KO-ChIP G.S. (9 TFs)',0)] ("KO75_KOrk_1norm_sp.tsv",sampleConditionOfInt,'KO G.S. (25 TFs)',0), ("KC1p5_sp.tsv",sampleConditionOfInt,'KO-ChIP G.S. (9 TFs)',0)] tfFocus = 1 # If 1, automatically applies the "TF only" function, so we can focus on TFs # If 0, all genes shown tfFocus = 1 # If 1, automatically applies the "TF only" function, so we can focus on TFs # If 0, all genes shown

In [2]: # Uncomment to run without install (in binder, for example) import sys In [2]: # Uncomment to run without install (in binder, for example) if ".." not in sys.path: import sys sys.path.append("..") if ".." not in sys.path: In [1]: # Visualization ofsys the. pathKO+ChIP.append Gold Standard("..") from: from jp_gene_viz import dNetwork # Miraldi et al. (2018) "Leveraging chromatin accessibility for transcriptional regulatory network inference in Th17 Cells" dNetwork.load_javascript_support() from jp_gene_viz import dNetwork from jp_gene_viz import multiple_network # TO START: dNetworkIn the menu. load_javascript_supportabove, choose "Cell" --> "Run ()All", and network + heatmap will load from jp_gene_viz import LExpression # Change "canvas"from tojp_gene_viz "SVG" (drop-down import menu multiple_networkin cell below) to enable drag interactions with nodes & labels # More info fromabout jp_gene_vizjp_gene_viz and importuser interface LExpression instructions are available on Github: LExpression.load_javascript_support() # https://github.com/simonsfoundation/jp_gene_viz/blob/master/doc/dNetwork%20widget%20overview.ipynb # Info specificLExpression to the "Multi-network".load_javascript_support view: () # https://github.com/simonsfoundation/jp_gene_viz/blob/master/doc/Combined%20widgets.ipynb In [3]: networkList = list() # this list will contain heatmap-linked network objects for networkInit in networkInits: # Indirectory [3]: networkListcontaining gene =expression list() data# this and networklist willfolder contain heatmap-linked network objects directory = for"." networkInit in networkInits: networkFile = networkInit[0] TF Phenotype # folder containingP adjnetworks Praw # Overlap # GWAS # TF networkFile = networkInit[0] curr = LExpression.LinkedExpressionNetworkIRF1 Parental() Extreme LongevitynetPath = 'Networks'.0006 9E-10 8 32 178 print directory + '/' + networkFile # name of gene expression curr = fileLExpression.LinkedExpressionNetwork() STAT3 IBD expressionFile = 'Th0_Th17_48hTh.txt'.002 3E-9 19 130 479 curr.load_network(directory + '/' + netPath + '/' + networkFile) print directory + '/' + networkFile # sample condition curr for .initialload_network gene node( directorycolor + '/' + netPath + '/' + networkFile) networkList.append(curr) FOXB1 IBD sampleConditionOfInt.04 = 'Th17(48h)'7E-8 11 130 171 STAT3 Ulcerative Colitis networkList.05 8E.append-8 (curr14) 85 479 ./ChIP_A17_KOall_ATh_bias50_maxComb_sp.tsv # The starting conditions for each of the networks is a list of tuples. Tuple entries are: # 0. network file name (column format) (as found in directory) ('Reading network', './Networks/ChIP_A17_KOall_ATh_bias50_maxComb_sp.tsv')STAT3 Crohn’s Disease ./ChIP_A17_KOall_ATh_bias50_maxComb_sp.tsv .05 9E-8 16 113 479 # 1. column ('Readingof the expression network', matrix that'./Networks/ChIP_A17_KOall_ATh_bias50_maxComb_sp.tsv') youIn [1]:want the# Visualizationnodes to be colored of the byKO+ChIP Gold Standard from: ('Loading saved layout', './Networks/ChIP_A17_KOall_ATh_bias50_maxComb_sp.tsv.layout.json')STAT1 Blood Protein Levels # 2. network title,.0 to6 which 1Ewe'll-7 add the gene30 and peak# Miraldi cutoffs620 et al. (2018) 280"Leveraging chromatin accessibility for transcriptional regulatory network inference in Th17 Cells" Omitting edges, using canvas, and fast layout default because the# 3. networkcut off('Loading for is edge large strength, saved notelayout', TRN edges './Networks/ChIP_A17_KOall_ATh_bias50_maxComb_sp.tsv.layout.json') strengths are quantile for 15 TFs/gene, to see top 10 TFs/gene, NFKB2 Chronic Inflammatory# Disease increase cutoff.0 6to .33, 1Eetc.-7 10 93 199 ./ATAC_Th17_bias50_maxComb_sp.tsv Omitting edges, using canvas, and# TO fast START: layout In the defaultmenu above, because choose "Cell" the network--> "Run All",is large and network + heatmap will load networkInits./ATAC_Th17_bias50_maxComb_sp.tsv = [ # Change "canvas" to "SVG" (drop-down menu in cell below) to enable drag interactions with nodes & labels ('Reading network', './Networks/ATAC_Th17_bias50_maxComb_sp.tsv')ETS1 Eosinophil Counts ('ChIP_A17_KOall_ATh_bias50_maxComb_sp.tsv'.07 1E-7 12, sampleConditionOfInt77 ,' Final383 ChIP/ATAC(Th17)+KO+ATAC(Th) TRN',0), #.93), ('Reading network', './Networks/ATAC_Th17_bias50_maxComb_sp.tsv')# More info about jp_gene_viz and user interface instructions are available on Github: ('Loading saved layout', './Networks/ATAC_Th17_bias50_maxComb_sp.tsv.layout.json') ('ATAC_Th17_bias50_maxComb_sp.tsv' ,sampleConditionOfInt# https://github.com/simonsfoundation/jp_gene_viz/blob/master/doc/dNetwork%20widget%20overview.ipynb,'Final ATAC-only TRN', 0), #.93), ("KO75_KOrk_1norm_sp.tsv"('Loading saved,sampleConditionOfInt layout', './Networks/ATAC_Th17_bias50_maxComb_sp.tsv.layout.json'),'KO G.S. (25 TFs)',0), Omitting edges, using canvas, and fast layout default because the network is large # Info specific to the "Multi-network" view: ("KC1p5_sp.tsv",sampleConditionOfInt,'KO-ChIP G.S. (9 TFs)',0)] ./KO75_KOrk_1norm_sp.tsv Omitting edges, using canvas, and# https://github.com/simonsfoundation/jp_gene_viz/blob/master/doc/Combined%20widgets.ipynb fast layout default because the network is large ./KO75_KOrk_1norm_sp.tsv ('Reading network', './Networks/KO75_KOrk_1norm_sp.tsv') tfFocus = 1 # If 1, automatically applies the "TF only" function, so we can focus on TFs ('Reading network', './Networks/KO75_KOrk_1norm_sp.tsv')# directory containing gene expression data and network folder ('Loading saved layout', './Networks/KO75_KOrk_1norm_sp.tsv.layout.json').04 # If 0, all genes shown directory = "." Figure 6 ('Loading saved layout', './Networks/KO75_KOrk_1norm_sp.tsv.layout.json') Omitting edges, using canvas, and fast layout default because the network is large # folder containing networks IRF4 Omitting edges, using canvas, andnetPath fast = layout'Networks' default because the network is large ./KC1p5_sp.tsv RXRA In [2]: # Uncomment ./KC1p5_sp.tsvto run without install (in binder, for example)# name of gene expression file ('Reading network', './Networks/KC1p5_sp.tsv')A import sys ('Reading network', './Networks/KC1p5_sp.tsv')expressionFile = 'Th0_Th17_48hTh.txt' ('Loading saved layout', './Networks/KC1p5_sp.tsv.layout.json')if ".." not in sys.path: # sample condition for initial gene node color TF ETS1Phenotype ('LoadingPadj savedP rawlayout', # Overlap './Networks/KC1p5_sp.tsv.layout.json') # GWAS # TF Omitting edges, using canvas, and fast layout default FOSL2because the sysnetwork.path.append is large("..") sampleConditionOfInt = 'Th17(48h)' IRF1 Parental Extreme Longevityfrom jp_gene_viz Omitting .0006import edges,dNetwork 9E -using10 canvas,8 and fast 32layout default178 because the network is large MAFdNetworkBATF.load_javascript_support() # The starting conditions for each of the networks is a list of tuples. Tuple entries are: STAT3RUNX2 IBDCEBPB from jp_gene_viz import.002 multiple_network3E-9 19 # 0. network130 file name (column479 format) (as found in directory) In [4]: # visualize the networks -- HARD CODED for 4 networks: .02 fromIn [4]: jp_gene_viz# visualize import LExpression the networks -- HARD# 1.CODED column for of the4 networks: expression matrix that you want the nodes to be colored by M = multiple_network.MultipleNetworksFOXB1 ( IBD RORC .04 7E-8 11 130 171 HIF1A LExpression.Mload_javascript_support = multiple_network() .MultipleNetworks# 2. network( title, to which we'll add the gene and peak cutoffs [[networkList[0], networkListSTAT3[1]], UlcerativeSMAD3 Colitis .05 8E-8 14 # 3. cut 85off for edge strength,479 note TRN edges strengths are quantile for 15 TFs/gene, to see top 10 TFs/gene, STAT3 [[networkList[0], networkList# [ 1 ]],increase cutoff to .33, etc.

[networkList[2], networkListBetweenness [3]]]) In [3]: networkList = list() # this list will contain heatmap-linked network objects EGR2 [networkList[2], networkListnetworkInits[3]]]) = [ STAT3 Crohn’s Disease for networkInit in .0networkInits5 9E: -8 16 113 479 M.svg_width = 500 ('ChIP_A17_KOall_ATh_bias50_maxComb_sp.tsv',sampleConditionOfInt,' Final ChIP/ATAC(Th17)+KO+ATAC(Th) TRN',0), #.93), networkFileM.svg_width = networkInit = [5000] M.show() STAT1 Blood Protein Levels .06 1E-7 30 ('ATAC_Th17_bias50_maxComb_sp.tsv'620 280 ,sampleConditionOfInt,'Final ATAC-only TRN', 0), #.93), curr = LExpression.LinkedExpressionNetwork() M.show() ("KO75_KOrk_1norm_sp.tsv",sampleConditionOfInt,'KO G.S. (25 TFs)',0), print directory + '/' + networkFile NFKB2 Chronic Inflammatory Diseases .06 1E-7 10 ("KC1p5_sp.tsv"93 ,sampleConditionOfInt199 ,'KO-ChIP G.S. (9 TFs)',0)] 200 curr.load_network(directory + '/' + netPath + '/' + networkFile) ETS1 Neutrophil % of Granulocytes networkList .append.07( curr) 1E -7 12 77 383 20tfFocus0 = 1 # If 1, automatically applies the "TF only" function, so we can focus on TFs # If 0, all genes shown ⊖ ∧ ⊨ 0 .1 .2./ChIP_A17_KOall_ATh_bias50_maxComb_sp.tsv⊖ ∧ ⊨ ⊖ ∧ ⊨ ⊖ ∧ ⊨ Out Degree ('Reading network', './Networks/ChIP_A17_KOall_ATh_bias50_maxComb_sp.tsv') Final ChIP/ATAC(Th17)+KO+ATACB(Th) TRN Inflammatory('Loading savedFi nBowelal A layout',TAC-on Diseasely T'./Networks/ChIP_A17_KOall_ATh_bias50_maxComb_sp.tsv.layout.json')RN (54 genes) Omitting edges,Fina lusing ChIBDIP /Acanvas,TAC(Th 1and7)+ KfastO+A TlayoutAC(TInh ) [2]:defaultTRN # becauseUncomment the to network run without is large install (in binder, for Fexample)inal ATAC-only TRN Gpr65 ./ATAC_Th17_bias50_maxComb_sp.tsv import sys .04 Impg2 if ".." not in sys.path: Ptpn2 ('Reading network', 54'./Networks/ATAC_Th17_bias50_maxComb_sp.tsv')GGprp6r655 Impg2 Cd28 ('Loading saved layout', './Networks/ATAC_Th17_bias50_maxComb_sp.tsv.layout.json')PPttpn2pn2 Isysmpg2.path.append("..") Gpr65 Map3k8 Cd2Cd828 Ptpn2 Impg2 Ipmk Omitting edges, usingMa p3pcanvas,3k88 and fast layout defaultfrom because jp_gene_viz the network import is dNetworklarge Cd28 Sbno2 Rorc ./KO75_KOrk_1norm_sp.tsv dNetwork.load_javascript_support() Map3k8 TF Phenotype Padj PrawSSbnob no22# OverlapRRoorcrc # GWAS # ITFpmIp k mk Il23r ('Reading network', './Networks/KO75_KOrk_1norm_sp.tsv')from jp_gene_viz import multiple_network Sbno2 Rorc Ipmk Ifroml2Il32r 3jp_gene_vizr import LExpression Fcgr3 IRF1 Parental Extreme Longevity('Loading saved layout',.0006 './Networks/KO75_KOrk_1norm_sp.tsv.layout.json')9E-10 8 32 178 Il23r Omitting edges, usingFc grcanvas,gr3 and fast layout defaultLExpression because the.load_javascript_support network is large () Tnfsf15 STAT3 IBD ./KC1p5_sp.tsv .002 3E-9 19 130 479 Fcgr3 Il10 Il2ra ('ReadingSTAT3 network',TTnfsnfsff15 './Networks/KC1p5_sp.tsv') FOXB1 IBD .04 7E-8 In11 [3]: IlnetworkList1Il010 130 = list() # 171thisIl 2 listrIal2r awill contain heatmap-linkedTnfsf15 network objects ('Loading saved layout', './Networks/KC1p5_sp.tsv.layout.json') Il10 Il2ra Stat3 for networkInit in networkInits: G.0al2c Foxb1 Omitting edges, using canvas, andSS fastttaat t3layout3 default because theFo networkxb1 is large STAT3 Ulcerative Colitis .05 8E-8 14 GGalac l c networkFile85 Fo =x networkInitb1 479 [0] Stat3 Foxb1 Cd226 CCd2d226 Galc STAT3So x5Crohn’s Disease .05 9E-8 16 curr 113= LExpression .LinkedExpressionNetwork479 () Cd226 Il18r1In [4]: # visualize the networks -- HARD CODED for 4 networks:S o S xo print5x5 directory + '/'Il1 +8 rnetworkFile1

Betweenness Il18r1 Sox5 STAT1Betweenness Blood Protein Levels M = multiple_network.0.MultipleNetworks6 1E-7 ( 30 curr.620load_network (directory280 + '/' + netPath + '/' + networkFile) Il18r1 Lpxn Smad3 Sema6d FOXB1 [[networkListL[px0],n networkList[1]], Smad3 networkListSeSmeam6ad6d.append(curr) NFKB2 Chronic Inflammatory Disease [networkLists [L2p],.0x nnetworkList6 1E[3-]]])7 S10m ad3 93 199 Lpxn Smad3 Sema6d Lpp M.svg_width = 500 ./ChIP_A17_KOall_ATh_bias50_maxComb_sp.tsv ETS1 Eosinophil Counts .07 L1ELpppp-7 12 77 383 Lpp Litaf M.show() Litaf ('Reading network', './Networks/ChIP_A17_KOall_ATh_bias50_maxComb_sp.tsv') Fosl2 Ankrd55 Litaf ('Loading savedAnk layout',rd55 './Networks/ChIP_A17_KOall_ATh_bias50_maxComb_sp.tsv.layout.json')Litaf Skap2 Skap2 FFoossl2l2 Ankrd55 Fosl2 Ankrd55 2S00kap2 Omitting edges, using canvas, and fast layout default because the networkSk aisp2 large ./ATAC_Th17_bias50_maxComb_sp.tsv Hormad2 0 .1 .2 .3 .4 HoHormrmaad2d2 Hormad2 ⊖ ∧ ⊨ ('Reading network', './Networks/ATAC_Th17_bias50_maxComb_sp.tsv')⊖ ∧ ⊨ OutOut DegreeDegree ('Loading saved layout', './Networks/ATAC_Th17_bias50_maxComb_sp.tsv.layout.json') Final ChIP/ATAC(Th17)+KO+ATAC(Th) TRN Omitting edges, usingFina l canvas,ATAC-only TandRN fast layout default because the network is large ./KO75_KOrk_1norm_sp.tsv ('Reading network', './Networks/KO75_KOrk_1norm_sp.tsv') Bank1 Bank1 Antxr2 ('Loading saved layout', './Networks/KO75_KOrk_1norm_sp.tsv.layout.json')Antxr2 Ankrd55 Omitting edges, using canvas, Aandnk fastrd5 5layout default because the network is large ./KC1p5_sp.tsv ('Reading network', './Networks/KC1p5_sp.tsv') ('Loading saved layout', './Networks/KC1p5_sp.tsv.layout.json') Omitting edges, using canvas, and fast layout default because the network is large

In [4]:Nfk# bvisualize1 the networks -- HARD CODED for 4 networks: Nfkb1 Thada M = multiple_networkT.MultipleNetworkshada ( Nfkb2 [[networkList[0], networkList[1]], Nfkb2 [networkList[2], networkList[3]]]) ⊖ ∧ ⊨ ⊖ ∧ ⊨ M.svg_width = 500 ⊖ ∧ ⊨ M.show() ⊖ ∧ ⊨ Zpbp2 Zpbp2 KO G.S. (25 TFs) KO-ChIP G.S. (9 TFs) KO G.S. (25 TFs) 200 KO-ChIP G.S. (9 TFs) Gpr65 Gpr65 Ptpn2 Ptpn2 Gpr65 ⊖ ∧ ⊨ ⊖ ∧Gpr6⊨5 Cd28 Il2ra Ets1 Ptpn2 CdC2d828Il2ra Ets1 Ptpn2 Cd28 Map3k8 Map3k8 Final ChIP/ATAC(Th17)+KO+ATAC(Th) TRN Final ATAC-only TRN Ipmk Map3k8 Rorc Ipmk MaRpo3rkc8 Sbno2 Rorc SbSnbon2o2 RRorocrc Ipmk Sbno2 Rorc Ipmk Il23r Il23r Zfpm1 Zfpm1 Prdm1 Il23r Bcl2 Prdm1 Bcl2 Il23r Fcgr3 Fcgr3 Rasgrp4 Rasgrp4 Il10 Il2ra Il1I0l10 Il2Irla2ra Il10 Il2ra Stat3 Stat3 Klf3 Klf3 Galc Stat3 GaGlcalc Stat3 Galc Cd226 CdC2d22626 Cd226 Sox5 ⊖ ∧ ⊨ Sox5 ⊖ ∧ ⊨ Zfp365 Zfp365 Il18r1 Sox5 Il1I8l1r18r1 Sox5 Il18r1 KO G.S. (25 TFs) KO-ChIP G.S. (9 TFs) Smad3 Smad3 Chd7 Chd7 Lpxn Sema6d LpLxpnxn Smad3 Sema6d Lpxn Smad3 Lpp LppBank1 Ets1 Antxr2 Ets1 Lpp Antxr2 Lpp Litaf Ankrd5L5itaf Fosl2 Ankrd55 Fosl2 Ankrd55 Exoc2 Exoc2 Skap2 SkSakpa2p2 Fosl2 Skap2 Fosl2 Bcl2a1b Bcl2a1b Zfp36l2 Zfp36l2

Zfp652 Nfkb2 Zfp652 Nfkb2 Spsb1 Larp4b Spsb1 Larp4b

⊖ ∧ ⊨ ⊖ ∧ ⊨

KO G.S. (25 TFs) KO-ChIP G.S. (9 TFs)

Il2ra Zfpm1 Ets1 Bcl2 Il2ra Ets1 Bcl2 Rorc In [5]: # Set network preferences In [5]: # Set network preferences Rorc Rasgrp4 count = 0 count = 0 for curr in networkList: for curr in networkList: Klf3 Klf3 networkInit = networkInits[count] networkInit = networkInits[count] # get title information + curr column for shading of figures # get title information + curr column for shading of figures Zfp365 currCol = networkInit[1] In [5]: # Set network preferences currCol = networkInit[1] titleInf = networkInit[2] count = 0 Chd7 titleInf = networkInit[2] Chd7 threshhold = networkInit[3] for curr in networkList: networkInit = threshholdnetworkInits [count= networkInit] [3] Ets1 # get title information + curr column for shading of figures Ets1 currCol = networkInit[1] # set threshold # set threshold Exoc2 curr.network.threshhold_slider.value = threshhold titleInf = networkInit curr.network[2] .threshhold_slider.value = threshhold Exoc2 threshhold = networkInit[3] curr.network.apply_click(None) curr.network.apply_click(None) Bcl2a1b curr.network.restore_click(None) # set threshold curr.network.restore_click(NoneBcl2)a1b Zfp36l2 Zfp36l2 if tfFocus: curr.network .threshhold_sliderif tfFocus: .value = threshhold curr.network.apply_click(None) # focus on TF core curr.network .restore_click # focus(None on) TF core curr.network.tf_only_click(None) if tfFocus : curr.network.tf_only_click(None) curr.network.layout_click(None) # focus on TF corecurr . network .layout_click(None) Z fp652 # layout network curr .network # layout.tf_only_click network(None) curr.network.layout_click(None) Spsb1 curr.network.connected_only_click() # layout network curr.network.connected_only_click() curr.network.layout_dropdown.value = 'fruchterman_reingold' curr.network .connected_only_clickcurr.network.layout_dropdown() .value = 'fruchterman_reingold' curr.network.layout_click() curr.network .layout_dropdowncurr.network..valuelayout_click = 'fruchterman_reingold'() curr.network.layout_click()

# set title # set title # set title In [5]: # Set network preferences curr.network.title_html.value = titleInf count = 0 curr.network.title_html.value = titleInf curr.network.title_html.valuefor =curr titleInf in networkList: # add labels networkInit = networkInits[count] # add labels curr.network .labels_button# add labels.value=True # get title information + curr column for shading of figures curr.network.labels_button.value=True curr.network .draw_clickcurr.network(None).labels_button.value currCol=True = networkInit[1] titleInf = networkInit[2] curr.network.draw_click(None) curr.network.draw_click(None) # Load heatmap threshhold = networkInit[3] curr.load_heatmap(directory + '/' + expressionFile) # Load heatmap # color nodes #according Load heatmap to a sample column in the gene # setexpression threshold matrix curr.network.threshhold_slider.value = threshhold curr.load_heatmap(directory + '/' + expressionFile) curr.gene_click curr(None.load_heatmap) (directory + '/' + expressionFile) curr.expression.transform_dropdown.value = 'Z score' curr.network.apply_click(None) # color nodes according to a sample column in the gene expression curr. expressionmatrix # .colorapply_transform nodes ()according to a sample curr.network column.restore_click in the gene(None )expression matrix curr.gene_click(None) curr.expression curr.col. gene_click= currCol (None) if tfFocus: curr.expression.transform_dropdown.value = 'Z score' curr.condition_click curr.expression(None) .transform_dropdown #. valuefocus on = TF'Z core score' count += 1 curr.network.tf_only_click(None) curr.expression.apply_transform() curr.expression.apply_transform () curr.network.layout_click(None) curr.expression.col = currCol /Users/emiraldi/anaconda/lib/python2.7/site-packages/scipy/stats/stats.py:2238: curr.expression.col = currCol # layout network RuntimeWarning: invalid value encountered in true_divide curr.condition_click(None) np.expand_dims(sstd, curr .axis=axis))condition_click (None) curr.network.connected_only_click() curr.network.layout_dropdown.value = 'fruchterman_reingold' count += 1 count += 1 curr.network.layout_click() In [ ]: /Users/emiraldi/anaconda/lib/python2.7/site-packages/scipy/stats/stats.py:2238:/Users/emiraldi/anaconda/lib/python2.7/site-packages/scipy/stats/stats.py:2238: RuntimeWarning: invalid value # set encountered title in true_divide RuntimeWarning: invalid value encountered in true_divide np.expand_dims(sstd, axis=axis)) np.expand_dims(sstd, axis=axis)) curr.network.title_html.value = titleInf # add labels In [ ]: In [ ]: curr.network.labels_button.value=True curr.network.draw_click(None)

# Load heatmap curr.load_heatmap(directory + '/' + expressionFile) # color nodes according to a sample column in the gene expression matrix curr.gene_click(None) curr.expression.transform_dropdown.value = 'Z score' curr.expression.apply_transform() curr.expression.col = currCol curr.condition_click(None) count += 1

/Users/emiraldi/anaconda/lib/python2.7/site-packages/scipy/stats/stats.py:2238: RuntimeWarning: invalid value encountered in true_divide np.expand_dims(sstd, axis=axis))

In [ ]: Downloaded from genome.cshlp.org on September 28, 2021 - Published by Cold Spring Harbor Laboratory Press

Leveraging chromatin accessibility for transcriptional regulatory network inference in T Helper 17 Cells

Emily R. Miraldi, Maria Pokrovskii, Aaron Waters, et al.

Genome Res. published online January 29, 2019 Access the most recent version at doi:10.1101/gr.238253.118

Supplemental http://genome.cshlp.org/content/suppl/2019/02/19/gr.238253.118.DC1 Material

P

Accepted Peer-reviewed and accepted for publication but not copyedited or typeset; accepted Manuscript manuscript is likely to differ from the final, published version.

Open Access Freely available online through the Genome Research Open Access option.

Creative This manuscript is Open Access.This article, published in Genome Research, is Commons available under a Creative Commons License (Attribution 4.0 International license), License as described at http://creativecommons.org/licenses/by/4.0/.

Email Alerting Receive free email alerts when new articles cite this article - sign up in the box at the Service top right corner of the article or click here.

Advance online articles have been peer reviewed and accepted for publication but have not yet appeared in the paper journal (edited, typeset versions may be posted when available prior to final publication). Advance online articles are citable and establish publication priority; they are indexed by PubMed from initial publication. Citations to Advance online articles must include the digital object identifier (DOIs) and date of initial publication.

To subscribe to Genome Research go to: https://genome.cshlp.org/subscriptions

Published by Cold Spring Harbor Laboratory Press