bioRxiv preprint doi: https://doi.org/10.1101/2020.07.18.210328; this version posted July 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Dynamic regulatory module networks for inference of cell type specific transcriptional networks

Alireza Fotuhi Siahpirani1,2,+, Deborah Chasman1,8,+, Morten Seirup3,4, Sara Knaack1, Rupa Sridharan1,5, Ron Stewart3, James Thomson3,5,6, and Sushmita Roy1,2,7*

1 Wisconsin Institute for Discovery, University of Wisconsin-Madison
2 Department of Computer Sciences, University of Wisconsin-Madison
3 Morgridge Institute for Research
4 Molecular and Environmental Toxicology Program, University of Wisconsin-Madison
5 Department of Cell and Regenerative Biology, University of Wisconsin-Madison
6 Department of Molecular, Cellular, & Developmental Biology, University of California Santa Barbara
7 Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison
8 Present address: Division of Reproductive Sciences, Department of Obstetrics and Gynecology, University of Wisconsin-Madison
+ These authors contributed equally.
* To whom correspondence should be addressed.

Abstract

Changes in transcriptional regulatory networks can significantly alter cell fate. To gain insight into transcriptional dynamics, several studies have profiled transcriptomes and epigenomes at different stages of a developmental process. However, integrating these data across multiple cell types to infer cell type specific regulatory networks is a major challenge because of the small sample size for each time point. We present a novel approach, Dynamic Regulatory Module Networks (DRMNs), to model regulatory network dynamics on a cell lineage. DRMNs represent a cell type specific network by a set of expression modules and associated regulatory programs, and probabilistically model the transitions between cell types. DRMNs learn a cell type’s regulatory network from input expression and epigenomic profiles using multi-task learning to exploit cell type relatedness. We applied DRMNs to study regulatory network dynamics in two different dynamic developmental processes: cellular reprogramming and liver dedifferentiation. For both systems, DRMNs predicted relevant regulators driving the major patterns of expression at each time point, as well as regulators for transitioning gene sets that change their expression over time.

Introduction

Transcriptional regulatory networks connect regulators such as transcription factors to target genes, and specify the context specific patterns of gene expression. Changes in regulatory networks can significantly alter the type or function of a cell, which can affect both normal and disease processes. The regulatory interaction between a transcription factor (TF) and a target gene’s promoter is dependent upon TF binding activity, histone modifications and open chromatin, all of which have been associated with differential expression between cell types [1–4]. To probe the dynamic and cell type-specific nature of mammalian regulatory networks, several research groups are generating matched transcriptomic and epigenomic data from short time courses or for cell types related by a branching lineage [5–7]. However, very few methods have exploited these datasets to infer cell type specific regulatory networks.

Existing computational methods to infer cell type specific networks can be grouped into three main categories: (i) skeleton network-based methods, (ii) regression-based methods, and (iii) probabilistic graphical model-based methods. Skeleton network-based methods combine available protein-protein and protein-DNA interactions to create a “skeleton” network representing the union of possible edges that can exist in a cell at any time, and overlay context-specific mRNA levels on this network to derive dynamic snapshots of the network [8, 9]. Such approaches rely on a comprehensive characterization of the regulatory network, which we currently lack for most mammalian cell types. A second group uses linear and non-linear regression-based methods to predict mRNA levels as a function of chromatin marks [10, 11] and/or transcription factor occupancies [11], and can infer a predictive model of mRNA for a single condition (time point or cell type). These regression approaches are applied to each context individually and have not been extended to model multiple related time points or cell types, which is important for studying how networks transition between different time points and cell states. Multi-task regression methods [12–17] are systematic approaches to learn multiple related networks. Thus far, these approaches have nearly all been based on mRNA levels and require a sufficiently large number of mRNA samples for each time point or cell type to reliably estimate the statistical dependency structure. The last class of methods is based on probabilistic graphical models, namely dynamic Bayesian networks (DBNs), including input-output Hidden Markov Models (DREM) [18] and time-varying DBNs [15]. Both DREM and time-varying DBNs are suited for time courses only, and do not accommodate lineage trees.

To address the limitations of existing methods and systematically integrate parallel transcriptomic and epigenomic datasets, we have developed a novel dynamic network reconstruction method, Dynamic Regulatory Module Networks (DRMNs), to predict regulatory networks in a cell type-specific manner by leveraging the relationships among cell types. DRMNs are based on a non-stationary probabilistic graphical model and can be used to model regulatory networks on a lineage. DRMNs represent the cell type regulatory network by a concise set of gene expression modules, defined by groups of genes with similar expression levels, and their associated regulatory programs. The module-based representation of regulatory networks enables us to reduce the number of parameters to be learned and increases the number of samples available for parameter estimation. To learn the regulatory programs of each module at each time point, we use multi-task learning that shares information between related time points or cell types.

We applied DRMN to three datasets measuring transcriptomic and epigenomic profiles in multiple cell types at different stages of cellular reprogramming and during dedifferentiation from liver hepatocytes. Two of these correspond to cellular reprogramming: one array experiment [19] and one sequencing experiment [7]. The third dataset (unpublished) includes RNA-seq and ATAC-seq profiling for mouse hepatocyte dedifferentiation. DRMN learned a modular regulatory program for each of the cell types by integrating chromatin marks, open chromatin, sequence specific motifs and gene expression. Compared to an approach that does not model dependencies among cell types, DRMN was able to predict more realistic regulatory networks that were more concise and reflected the shared relatedness among the cell types, while maintaining high predictive power of expression. Furthermore, integrating cell type specific chromatin data with cell type invariant sequence motif data enabled us to predict expression better than either data type alone. Comparison of the inferred regulatory networks showed that they change gradually over time, and identified key regulators that differ between the cell types. In particular, among the regulators identified by DRMN were Klf4, , and Tcf3, which are known to be associated with the ESC state. In the hepatocyte data, we found regulators associated with development, such as Gbx2, Six6, and Irx1, associated with transitioning gene sets. Taken together, our results show that DRMN is a powerful approach to infer cell type specific regulatory networks, which enables us to systematically link upstream regulatory programs to gene expression states and to identify regulator and module transitions associated with changes in cell state.

Results

Dynamic Regulatory Module Networks (DRMNs)

DRMNs are used to represent and learn cell type or time point specific regulatory networks while leveraging the relationship among the cell types and time points. DRMN’s design is motivated by the difficulty of inferring a cell type specific regulatory network from expression when only a small number of samples is available for each cell type. In a DRMN, the regulatory network for each context, such as a cell type, is represented compactly by a set of modules, each representing a discrete expression state, and the regulatory program for each module (Figure 1). The regulatory program is a predictive model that predicts the expression of genes in a module from upstream regulatory features such as sequence motifs and epigenomic signals. Like previous module-based representations of regulatory networks [14, 20, 21], DRMN organizes genes into modules to reduce the number of parameters and increase the number of samples available for statistical network inference. In addition, DRMN leverages the relationship between cell types defined by a lineage tree. The modules and regulatory programs are learned simultaneously using a multi-task learning framework to encourage similarity between the regulators chosen for a cell type and its parent in the lineage tree. DRMN allows two ways to share information across cell types: DRMN-FUSED and DRMN-ST. DRMN-FUSED uses regularized regression to share information across cell types, while DRMN-ST uses a graph structure prior (see Methods). Both approaches share information across time points effectively; however, DRMN-FUSED is computationally more efficient.
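
One way to write down the kind of regularized multi-task regression that DRMN-FUSED is described as using is a fused-lasso style objective. The formulation below is a sketch under stated assumptions, not the objective given in the Methods: the symbols ($Y_{c,k}$, $X_{c,k}$, $\theta_{c,k}$, $\rho_1$, $\rho_2$, $\mathcal{T}$), the squared-error loss, and the choice of L1 penalties are all illustrative.

```latex
% Assumed notation (not from the paper):
%   c indexes cell types, k indexes modules, \mathcal{T} is the set of
%   (child, parent) edges of the lineage tree,
%   Y_{c,k} is the expression of genes assigned to module k in cell type c,
%   X_{c,k} is the matrix of their regulatory features,
%   \theta_{c,k} holds the regression weights of the regulatory program.
\min_{\{\theta_{c,k}\}}
  \sum_{c} \sum_{k} \left\lVert Y_{c,k} - X_{c,k}\,\theta_{c,k} \right\rVert_2^2
  \;+\; \rho_1 \sum_{c} \sum_{k} \left\lVert \theta_{c,k} \right\rVert_1
  \;+\; \rho_2 \sum_{(c,\,c') \in \mathcal{T}} \sum_{k}
        \left\lVert \theta_{c,k} - \theta_{c',k} \right\rVert_1
```

Under these assumptions, the $\rho_1$ term keeps each regulatory program sparse, and the $\rho_2$ term penalizes differences between a cell type and its parent in the lineage tree, which is one way to encourage the two to select similar regulators.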

DRMNs offer a flexible framework to integrate diverse regulatory genomic features

DRMN’s predictive model can incorporate different types of regulatory features, such as sequence-based motif strength, accessibility, and histone marks. We first examined the relative importance of context-specific features (e.g., chromatin marks and accessibility) and context-independent features (e.g., sequence motifs) for building an accurate gene expression model. We compared the performance of both versions of DRMN on different feature sets on a mouse reprogramming dataset that had both array and sequencing data (Figure 2). Our metric for comparison was the Pearson correlation between true and predicted expression in each module (Figure 2A), estimated with three-fold cross-validation.
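
The per-module evaluation metric described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names `pearson` and `per_module_correlation`, the dictionary-based inputs, and the toy values are assumptions made for the example, and the cross-validation loop that would produce held-out predictions is omitted.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nan if undefined)."""
    n = len(x)
    if n != len(y):
        raise ValueError("sequences must have the same length")
    if n < 2:
        return float("nan")
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")  # constant input: correlation is undefined
    return cov / sqrt(vx * vy)


def per_module_correlation(true_expr, pred_expr, module_of):
    """Correlate true vs. predicted expression separately within each module.

    true_expr, pred_expr: {gene: expression value}
    module_of:            {gene: module id}
    """
    genes_by_module = {}
    for gene, module in module_of.items():
        genes_by_module.setdefault(module, []).append(gene)
    return {
        module: pearson([true_expr[g] for g in genes],
                        [pred_expr[g] for g in genes])
        for module, genes in genes_by_module.items()
    }


# Toy example with hypothetical genes and values.
true_expr = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 1.0, "e": 2.0, "f": 3.0}
pred_expr = {"a": 2.0, "b": 4.0, "c": 6.0, "d": 3.0, "e": 2.0, "f": 1.0}
module_of = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
scores = per_module_correlation(true_expr, pred_expr, module_of)
# scores[1] is 1.0 (perfectly correlated), scores[2] is -1.0 (anti-correlated)
```

Computing the correlation within each module, rather than over all genes at once, avoids rewarding a model only for separating high-expression modules from low-expression ones.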

The feature sets we considered were: (1) motif only, (2) histone marks, (3) histone marks and motif, (4) accessibility (from ATAC-seq) and motif, (5) motifs with quantified accessibility (Q-motif), (6) histone marks, motif features and accessibility, (7) histone marks and Q-motif, and (8) histone marks, Q-motif and accessibility. We first compared motif alone, chromatin alone, and chromatin + motif, as these features were available for both array and sequencing datasets, for the DRMN-ST (Figure 2B, C) and DRMN-FUSED (Figure 2D, E) models. In both models, motifs alone (dark blue markers) have minimal predictive power, which is consistent across different k and for both array (Figure 2B, D) and sequencing (Figure 2C, E) data. Histone marks alone (red markers) have higher predictive power; however, combining histone marks and motif features (magenta) gives the best performance for k = 3 and 5, with the improvement being more striking for DRMN-ST. For DRMN-FUSED, histone only and histone + motif performed similarly, although at higher k using histone marks alone is better. Between different cell types the performance was very consistent in array data, while for sequencing data, the MEF and MEF48 cell types were harder to predict than ESC and pre-iPSC.

We next examined the contribution of accessibility data to predicting expression. As only the sequencing dataset included ATAC-seq data for each cell type, we performed this comparison on the sequencing dataset alone (Figure 2C, E). We incorporated the ATAC-seq data in five ways: (1) as a single feature defined by the aggregated accessibility of a particular promoter (ATAC feature, orange markers, Figure 2) combined with motif sequence; (2) using ATAC-seq to quantify the strength of a motif instance (Q-motif, cyan markers, Figure 2B, D); (3) combining the ATAC feature with histone marks and motifs (dark purple markers); (4) combining Q-motif with histone marks (light green); and (5) combining the ATAC feature with histone marks and Q-motifs (dark green). We also considered ATAC-seq as a single feature on its own, but this was not very helpful (Supplementary Figure S1).
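
To make the difference between a plain motif feature and a Q-motif feature concrete, the sketch below counts ATAC-seq reads that fall inside motif instances in one promoter. It is an illustration under loud assumptions, not the authors' procedure: the function names, the half-open `(start, end)` interval convention, the use of single read positions, and summing counts over instances are all hypothetical choices.

```python
def motif_presence(motif_sites):
    """Plain motif feature: number of motif instances in the promoter."""
    return {motif: len(sites) for motif, sites in motif_sites.items()}


def q_motif_features(motif_sites, read_positions):
    """Accessibility-quantified motif feature (assumed definition).

    motif_sites:    {motif name: [(start, end), ...]} half-open intervals
    read_positions: list of read coordinates in the same promoter
    A motif with no read inside any of its instances gets the value 0.
    """
    features = {}
    for motif, sites in motif_sites.items():
        features[motif] = sum(
            1
            for start, end in sites
            for pos in read_positions
            if start <= pos < end
        )
    return features


# Toy promoter with two hypothetical motifs and five hypothetical reads.
sites = {"motifA": [(10, 20)], "motifB": [(50, 60)]}
reads = [12, 15, 19, 20, 70]
plain = motif_presence(sites)               # both motifs are present once
quantified = q_motif_features(sites, reads)  # motifB has no overlapping read
```

In this toy case the plain feature is non-zero for both motifs, while the quantified feature is zero for `motifB`, which illustrates how a read-based feature can end up sparser than the sequence-only one.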

Combining the ATAC feature with the motif feature improves performance over the sequence motif feature alone (Figure 2B, D, orange vs. dark blue markers), suggesting that both are useful features for predicting expression. The Q-motif feature (cyan markers) was better than the motif only feature (dark blue) at lower k (k=3) but, surprisingly, did not otherwise outperform the sequence-only feature. One possible explanation is that the Q-motif feature was sparser than the sequence-only feature, because a feature received a value of 0 if we could not assign a read to it. Finally, we compared the ATAC feature combined with chromatin and motif features to study the additional gain in performance (Figure 2B, D, purple markers). Interestingly, even though combining ATAC-seq features with motif features improves performance over motif alone, adding the ATAC-seq feature to histone marks + motif does not change the performance of either version of DRMN (Figure 2B, D) compared to histone + motif (magenta markers). This suggests that the combination of multiple chromatin marks captures the dynamics of expression levels better than the chromatin accessibility signal. It is possible that the overall cell type specific information captured by the accessibility profile is redundant with the large number of chromatin marks, and we might observe a greater benefit of ATAC-seq if there were fewer or no marks. We observe similar trends with the histone + Q-motif and histone + ATAC + Q-motif features (light and dark green), which perform on par with each other and close to histone + motif and histone + ATAC + motif.

DRMN-ST and DRMN-FUSED behaved in a largely consistent way for these different feature combinations, with the exception of the histone + motif feature, where DRMN-FUSED did not gain in performance at higher k. However, when we directly compared DRMN-ST to DRMN-FUSED, DRMN-FUSED outperformed DRMN-ST on most of the feature combinations (Figure 2F, G), with the exception of histone + motif (magenta), histone + motif + ATAC (dark purple), histone + Q-motif (light green), and histone + Q-motif + ATAC (dark green), where the two models were similar. It is likely that DRMN-ST learns a sparser model at the cost of predictive power (Supplementary Figure S2). For the application of DRMNs to real data, we focus on DRMN-FUSED due to its improved performance.

Multi-task learning approach is beneficial for learning cell type-specific expression patterns

We next assessed the utility of sharing information across cell types or time points while learning predictive models of expression by comparing DRMN against several baseline models: (1) those that do not incorporate sharing (RMN), and (2) those that are clustering-based (GMM-Merged and GMM-Indep). The clustering-based baselines offer simple approaches to describe the major expression patterns across time but link cis-regulatory elements to expression changes only as post-processing steps. For both DRMN and RMN, we learned predictive models of expression in each module using different regulatory feature sets (Figure 3, Supplementary Figure S3): motif only, chromatin marks only, and motif plus chromatin marks. We used overall correlation and per-module expression level comparison to assess the quality of the predictions. We performed these experiments on the array and the RNA-seq datasets for mouse reprogramming.

Using the overall correlation, both variants of DRMN, the RMN model and GMM-Indep vastly outperform GMM-Merged, which also had the least stable results across cell types and folds (Supplementary Figure S3A-F). This suggests that the gene partitions are likely different between the different cell types and that imposing a single structure for all three, as done in GMM-Merged, misses the cell type specific aspects of the data. Based on overall correlation, DRMN models performed on par with RMN and GMM-Indep in most cases (Supplementary Figure S3A-F), with the exception of histone and histone + motif for sequencing data (Supplementary Figure S3A, B), where GMM-Indep is worse at lower k. These results suggest that, based on overall correlation, clustering could offer a first approach to analyze these data; however, a predictive modeling approach has advantages over a clustering approach, as it improves prediction quality by incorporating additional cis-regulatory data.

We next compared the different models on the basis of the per-module expression levels (Supplementary Figure S3G-L). DRMN and RMN clearly outperform the GMM-based approaches, which is expected because a GMM produces only one value per module and does not capture the within-module variation. The overall high correlation for DRMN and RMN demonstrates that a model learned from the regulatory features provides a more fine-tuned model of expression variation.
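
The contrast between a clustering baseline and a regression-based regulatory program can be illustrated on a single toy module. The sketch below is an assumption-laden simplification, not the models used in the paper: it uses one hypothetical regulatory feature, ordinary least squares instead of the regularized multi-task regression, and a module mean as a stand-in for the single value a clustering model would predict for every gene in a module.

```python
def mean(values):
    return sum(values) / len(values)


def fit_line(x, y):
    """Ordinary least squares with one feature: returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        return 0.0, my  # constant feature: fall back to predicting the mean
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def squared_error(y, pred):
    return sum((a - b) ** 2 for a, b in zip(y, pred))


# Hypothetical module: one regulatory feature value and one expression
# value per gene.
feature = [0.0, 1.0, 2.0, 3.0]
expression = [1.0, 3.0, 5.0, 7.0]

# Clustering-style prediction: every gene in the module gets the same value.
module_mean_pred = [mean(expression)] * len(expression)

# Regression-style prediction: expression varies with the feature.
slope, intercept = fit_line(feature, expression)
regression_pred = [slope * f + intercept for f in feature]

error_mean = squared_error(expression, module_mean_pred)
error_regression = squared_error(expression, regression_pred)
```

Because the module-mean prediction is constant within a module, it cannot track gene-to-gene differences there, whereas the fitted line can; in this toy case `error_regression` is zero while `error_mean` is not.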

Finally, we compared DRMNs to RMNs (Figure 3). Both versions of DRMN outperform or are on par with their corresponding RMN versions on histone only and histone + motif features (Figure 3, blue for DRMN and red for RMN). On motifs, the difference between the models depended on the cell line, the number of modules k, and the specific implementation of DRMN. In particular, DRMN-FUSED was on par with or better than RMNs on sequence motifs. However, DRMN-ST had lower performance for several cell lines on the sequencing data at higher k. Interestingly, DRMN-ST did have a greater gain in performance compared to DRMN-FUSED on the histone + motif and histone feature sets from arrays. It is likely that models using motif only features overfit the data, resulting in reduced performance. Overall, our results suggest that a predictive modeling approach, as in DRMN and RMN, can explain the expression variation much better than a clustering-based approach, and that using multi-task learning with regularized regression helps to build accurate models of gene expression.

Using DRMN to gain insight into regulatory programs of cellular reprogramming

We applied DRMN to gain insights into the regulatory programs of cellular reprogramming. We focus on the results obtained on the sequencing dataset for reprogramming (Figure 4), although many of the trends are captured in the array data too (Supplementary Figure S4). DRMN modules learned on the sequencing data exhibit seven distinct patterns of expression in each of the four cell types (Figure 4A). We observe that while the expression patterns remain the same, the number of genes in each module varies across cell types (Figure 4A). Furthermore, we compared the extent of similarity of matched modules between consecutive cell types and found that the modules were on average 30-90% similar, exhibiting the lowest similarity at the MEF48 to pre-iPSC transition and the highest between MEF and MEF48 (Figure 4C). For the array data as well, we observed the greatest dissimilarity at the MEF to pre-iPSC transition (Supplementary Figure S4). This agrees with the pre-iPSC state exhibiting a major change in transcriptional status during reprogramming.
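
A comparison of matched modules between two consecutive cell types can be sketched as below. The text does not state which similarity measure was used, so the Jaccard index here is an assumption, as are the function names and the toy module assignments.

```python
def module_members(assignment):
    """Invert {gene: module id} into {module id: set of genes}."""
    members = {}
    for gene, module in assignment.items():
        members.setdefault(module, set()).add(gene)
    return members


def matched_module_similarity(assign_a, assign_b):
    """Jaccard similarity of same-numbered modules in two cell types."""
    members_a = module_members(assign_a)
    members_b = module_members(assign_b)
    similarity = {}
    for module in sorted(set(members_a) | set(members_b)):
        genes_a = members_a.get(module, set())
        genes_b = members_b.get(module, set())
        # The union is never empty: the module exists in at least one cell type.
        similarity[module] = len(genes_a & genes_b) / len(genes_a | genes_b)
    return similarity


# Hypothetical assignments for four genes in two consecutive cell types.
mef = {"g1": 1, "g2": 1, "g3": 2, "g4": 2}
mef48 = {"g1": 1, "g2": 2, "g3": 2, "g4": 2}
similarity = matched_module_similarity(mef, mef48)
# module 1 shares 1 of 2 genes (0.5); module 2 shares 2 of 3 genes
```

Averaging these per-module values for each pair of consecutive cell types would give one similarity score per transition, which is the kind of quantity that can be compared across the lineage.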

To interpret the modules, we next looked at the regulatory programs inferred for each module (Figure 4B). While some regulators are selected in all cell types, we also observe some cell type specific patterns, such as Pitx2 in module 4 (associated with muscle cell differentiation [22]), Pou5f1 in module 5 and Esrrb in module 6 (associated with the pluripotency state [3]), and Insm1 in module 7 (associated with differentiation [23]); all are highlighted in red in Figure 4B.

To gain insight into the dynamics of the process, we identified genes that change their module assignments between time points. Overall, of the 17,358 genes, 11,104 were associated with a module transition in the sequencing dataset (Figure 5A), and of the 15,982 total genes in the array dataset, 3,573 exhibited a module transition (Supplementary Figure S4A). We clustered the transitioning genes and tested each cluster for enrichment of biological processes and cis-regulatory elements. These transitioning gene sets can provide insight into the overall dynamics of the process. One of these sets is enriched for binding sites of Tcf3 (Figure 5B), which is known to repress pluripotency genes [24, 25] and is only enriched in the low expression module (module 1) in MEF and MEF48. This gene set, induced specifically in ESCs, was predicted to be regulated by the histone marks H3K79me2 and H3K27me3 and the TF Bcl6, which indicates an interplay of histone modifications and TF binding to enable cell fate specification. These genes move from the low expression module (1) in early stages (MEF and MEF48) to highly expressed modules (5 and 6) in the ESC state (cell type specific module assignments on left, cell type specific expression in the middle). We also observe that the promoter regions of the genes in this set are strongly enriched for the Tcf3 motif (dark purple heatmap on right). The genes exhibiting this trend include several ESC specific genes such as Sall4 and Dppa2. We observe a similar pattern with the array data as well (Supplementary Figure S5), identifying ESC specific genes induced in the iPSC state and enriched for Tcf3, thus corroborating the transitioning gene sets between the two expression platforms.
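
The idea of transitioning genes, that is, genes whose module assignment changes along the lineage, can be sketched as follows. This is a simplified illustration and not the authors' procedure: the paper clusters transitioning genes, whereas this sketch simply groups genes with identical module-assignment profiles, and the function name, gene names and module assignments are hypothetical.

```python
def transitioning_gene_sets(assignments, lineage_order):
    """Group genes by their module profile, keeping only profiles that change.

    assignments:   {cell type: {gene: module id}}
    lineage_order: list of cell types in the order they occur
    Returns {profile tuple: sorted list of genes with that profile}.
    """
    shared_genes = set.intersection(
        *(set(assignments[cell]) for cell in lineage_order)
    )
    gene_sets = {}
    for gene in sorted(shared_genes):
        profile = tuple(assignments[cell][gene] for cell in lineage_order)
        if len(set(profile)) > 1:  # module changes at least once
            gene_sets.setdefault(profile, []).append(gene)
    return gene_sets


# Hypothetical module assignments for four genes in three cell types.
assignments = {
    "MEF":   {"geneA": 1, "geneB": 1, "geneC": 6, "geneD": 5},
    "MEF48": {"geneA": 1, "geneB": 1, "geneC": 6, "geneD": 5},
    "ESC":   {"geneA": 5, "geneB": 5, "geneC": 6, "geneD": 2},
}
gene_sets = transitioning_gene_sets(assignments, ["MEF", "MEF48", "ESC"])
# geneA and geneB share the profile (1, 1, 5); geneD has (5, 5, 2);
# geneC stays in module 6 and is not a transitioning gene.
```

Each resulting gene set can then be tested for enrichment of motifs or biological processes, in the spirit of the analysis described above.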

Using DRMN to gain insight into regulatory programs of hepatocyte dedifferentiation

We next applied DRMNs to understand the temporal dynamics of regulatory programs during hepatocyte dedifferentiation. A major challenge in studying dynamics in primary cells such as liver hepatocytes is that they dedifferentiate from their hepatocyte state. Maintaining hepatocytes in their differentiated state is important for studying normal liver function as well as liver-related diseases [26]. Dedifferentiation could be due to changes in the regulatory program over time; however, little is known about the transcriptional and epigenetic changes during this process. To address this, we collected RNA-seq and ATAC-seq data for 16 time points from 0 hours to 36 hours (Figure 6).

Here we applied DRMNs with k=5 modules and followed a similar approach, first interpreting the modules by the regulatory programs inferred for each module. We observe pluripotency associated TFs such as Gbx2 and Klf4 [27], TFs associated with hepatocyte differentiation such as Onecut2 [28], and liver associated TFs such as Hnf4a [29]. We also tested the genes in each module for GO enrichment (Supplementary Figure S6) and found that the repressed module (module 1) was enriched for developmental processes, while the other modules were enriched for diverse metabolic processes. Of these, modules 4 and 5, which are associated with higher expression, are enriched for more liver-specific metabolic functions such as coenzyme metabolism and acetyl-CoA metabolism, and modules 2 and 3 were enriched for general housekeeping functions such as DNA and nucleic acid metabolism.

We next examined the transitioning gene sets. We identified a total of 150 transitioning gene sets spanning 5,762 genes. Many of the transitions were between modules that are adjacent to each other based on expression levels, suggesting that the majority of the dynamic transitions are subtle (e.g., modules 1 and 2, Figure 7A). We next predicted regulators for these transitioning gene sets using a regularized regression model, MTG-LASSO (Methods). Using this approach, we identified 84 gene sets for which we could predict a regulator. Figure 7B shows 5 of these gene sets in which the module assignment changes by more than one level (e.g., 1 to 2 to 3), together with the regulators associated with them. These include pluripotency or development associated TFs such as Irx1 [30] (gene sets 420 and 439), Gbx2 [27, 31] and Six6 [32] (gene set 439), Mecom [33], Esrrb [3], and Alx1 [34] (gene set 390). Together, these results suggest that analysis of regulatory interactions associated with specific modules and transitioning gene sets can help identify transcription factors that drive these processes.

Discussion

Cell type-specific gene expression patterns are the result of a complex interplay between transcription factor binding, genome accessibility and histone modifications. Accordingly, datasets that measure both transcriptomes and epigenomes in closely related cell types and processes are becoming increasingly available. A key challenge is to interrogate these datasets to identify the underlying gene regulatory networks that drive context-specific expression changes. However, doing so is difficult because of the large number of variables measured at each time point. In this work, we developed DRMNs, which simplify genome-scale regulatory networks into gene modules and infer a regulatory program for each module in all the input cell types. Using DRMNs, one can characterize the major transcriptional patterns during a dynamic process over time or over a cell lineage and identify transcription factors and epigenomic signals that are responsible for these transitions.

Central to DRMN’s modeling framework is the joint learning of the regulatory programs for each cell type or time point using multi-task learning. Using two different approaches to multi-task learning, we show that joint learning of regulatory programs is advantageous compared to a simpler approach of learning regulatory programs independently per cell type. Furthermore, predictive modeling of expression that also clusters genes into expression groups is more powerful than either learning a single predictive model or simple clustering. Such models have improved generalizability and are able to capture fine-grained expression variation as a function of the upstream regulatory state of a gene.

DRMN offers a flexible framework to integrate a variety of experimental setups. In its simplest form, DRMN can be applied to an expression time course, from arrays or sequencing, and can use sequence-specific motif instances to learn a predictive model. However, DRMN is also able to integrate other types of regulatory signals, such as chromatin accessibility measured using ATAC-seq, and histone modifications and transcription factor binding measured using ChIP-seq. The datasets to which we applied DRMN include exemplars of different designs. In particular, the reprogramming sequencing dataset was the most comprehensive, with nine histone marks, ATAC-seq and expression measured for four cell types. In contrast, the dedifferentiation dataset measured RNA-seq and ATAC-seq for 16 time points.

The Chronis et al [7] dataset enabled us to systematically study the utility of different cell-type specific

measurements such as chromatin marks and accessibility to predict expression. When combining ATAC-seq

12 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.18.210328; this version posted July 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

with chromatin marks to predict expression, we did not see a substantial improvement in predictive power.

This is likely because of the large number of chromatin marks in our dataset. However, in both array and

sequencing data we found that combine sequence features (motifs) and histone modifications had the highest

predictive power. In our application to the Chronis et al dataset, we did not observe a substantial advantage

of using accessible motif instances (Q-motif) as opposed to all motif instances. This is most likely due to the

sparsity of the feature space of Q-motifs. An important direction of future work would be to explore

additional ways to incorporate the ATAC-seq signal for each motif, which could provide a greater benefit [2].

We applied DRMNs to two distinct types of developmental processes: mouse reprogramming (3-4 cell

types) and hepatocyte dedifferentiation (16 time points). DRMN application identified the major patterns

of expression in both sequencing and array reprogramming datasets as well as the dedifferentiation dataset.

When comparing the results from both processes there were greater changes in expression in the repro-

gramming time course compared to dedifferentiation indicative of the different dynamics in the two pro-

cesses. Importantly DRMN was able to identify key regulators for each module as well as dynamic gene

sets in both cases. The regulators identified by DRMN for the induced genes in the reprogramming

dataset included known pluripotency markers. Similarly, DRMN predicted liver transcription factors such

as HNF4G::HNF4A for induced genes in the dedifferentiation dataset. In both datasets, we identified tran-

sitioning genes and predicted regulators for these genes using simple or multi-task regression. This enabled

us to identify regulators important for transitions. In particular for ESC, we found a gene set that exhibited

dynamics predictable by histone elongation marks and a transcription factor Bcl6. Similarly for hepatocyte

dedifferentiation, we found gene sets associated with changes in HNF4G motif accessibility.

Currently DRMN operates on regulatory features on gene promoter regions. An important direction

of future work would be to incorporate long-range interactions to enable distal regions to contribute to

the expression levels of a gene. Another direction would be to use more generic sequence features, such as

k-mers [35], to enable the discovery of novel regulatory elements and offer greater flexibility in capturing

sequence specificity and its role in predictive models of expression.

Taken together, DRMN offers a powerful and flexible framework to model time series and lineage

specific regulatory genomic datasets and enables inference of cell type-specific regulatory programs. As

datasets that profile epigenetic and transcriptomic dynamics of specific processes become available, methods

like DRMNs will become increasingly useful to examine cell type and context-specific regulatory networks

from these datasets.

Materials and methods

Dynamic Regulatory Module Networks (DRMNs)

DRMN model description

The DRMN is a probabilistic model of regulatory networks for multiple cell types related by a lineage tree.

DRMN is suited for datasets with multiple time points or conditions, with a small number of samples (e.g.,

one or two) per condition and several types of measurements, such as RNA-seq, ChIP-seq and ATAC-seq.

Due to the small sample size, a standard regulatory network, that connects individual TFs to target genes for

each time point, cannot be inferred. Instead in DRMN, we learn regulatory programs for groups of genes.

For C cell types, the DRMN model is defined by a set of regulatory module networks, R = {R1, ..., RC},

a lineage tree τ, and transition probability distributions Π = {Π1, ..., ΠC} (Figure 1). For each cell type c,

Rc = <Gc, Θc> defines the regulatory program as Gc, the set of edges between regulators and modules,

and Θc, the parameters of a regression function for each module that relates the selected regulatory features

to the expression of the module’s genes. Transition matrices {Π1, ..., ΠC } capture the dynamics of the

module assignments in one cell type c given its parent cell type in the lineage tree τ. Specifically, Πc(i, j) is the probability of any gene being in module j given that its parental assignment is to module i. For the

root cell type, this is simply a prior probability over modules.

The DRMN inputs are (i) cell type-specific expression for N genes, $X \in \mathbb{R}^{N \times C}$, assuming we have

a single measurement of a gene in each cell type; (ii) gene-specific regulatory features for F candidate

regulators, $Y \in \mathbb{R}^{N \times C \times F}$; and (iii) the lineage tree, τ. The regulatory signals used in the regulatory program

can be either context-independent (e.g., a sequence-based motif network) or context-specific (e.g., a motif

network informed by epigenomic measurements). The number of modules, k, is provided as input to the

method. Given the above inputs, the DRMN model aims to optimize the following score:

$$P(R \mid X, Y) \propto P(X \mid R, Y)\,P(R) \tag{1}$$

Here $P(X \mid R, Y)$ is the data likelihood given the regulatory program, and $P(R)$ is a prior probability

distribution of the regulatory program. The data likelihood can be decomposed over the individual cell types: $P(X \mid R, Y) = \prod_{c} P(X_c \mid R_c, Y_c)$. Within each cell type, this is modeled using a mixture of predictive models, one model for each module. The prior, $P(R)$, can in turn be defined in terms of the graph structure

Gc and parameters, Θc. We used two formulations for this prior to enable sharing information: DRMN-

Structure Prior (DRMN-ST) defines a structure prior over the graph structures P (G1, ..., GC ) while DRMN-

FUSED uses a regularized regression framework and implicitly defines a prior on the parameters, P (Θ1, ··· , ΘC ). In both frameworks, we share information between the cell types/time points to learn the regulatory programs of

each cell type or time point.

DRMN-ST: Structure prior approach. The structure prior, P (R), which is defined only over the graph

structures, P (G1, ..., GC ), assumes the parameters are set to their maximum likelihood setting. The prior

term P (G1, ..., GC ) determines how information is shared between different cell types at the level of the

network structure and encourages similarity of regulators between cell types. P (G1, ..., GC ) is computed

using the transition matrices {Π1, ..., ΠC } and decomposes over individual regulator-module edges within each cell type as follows:

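$$P(G_1, \cdots, G_C) = \prod_{f \to k} P(I_{f \to k})$$

where

$$P(I_{f \to k}) = P(I^{\mathrm{root}}_{f \to k}) \prod_{c' \to c \,\in\, \tau} P(I^{c}_{f \to k} \mid I^{c'}_{f \to k})$$

where $I^{c}_{f \to k}$ is an indicator function for the presence of the edge $f \to k$ for cell type $c$, between a regulator $f$ and a module $k$. To define $P(I^{c}_{f \to k} \mid I^{c'}_{f \to k})$, we use the transition probability of the modules as:

$$P(I^{c}_{f \to k} \mid I^{c'}_{f \to k}) = \begin{cases} \Pi_c(k \mid k) & \text{if } I^{c}_{f \to k} = I^{c'}_{f \to k} \\ 1 - \Pi_c(k \mid k) & \text{otherwise} \end{cases}$$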

Here the first option gives the probability of maintaining the same state from parent to child cell lines, if the

edge has the same state in both parent and child (is present in both c and c′ or is absent in both), and the

second option gives the probability of changing the edge state.

We implemented a greedy hill climbing algorithm to incorporate the structure prior. See Section DRMN

learning for more details of the learning algorithm.
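
As an illustration of the structure prior, the following minimal sketch evaluates the log prior of a single regulator-module edge over a lineage tree. It is not the DRMN implementation: the function name, the dictionary-based inputs, the root presence probability `root_prob`, and the numeric values are all assumptions made for the example.

```python
import math

def edge_log_prior(edge_state, parent_of, stay_prob, root_prob):
    """Log prior of one regulator->module edge f->k over a lineage tree.

    edge_state: cell type -> 0/1 (edge absent/present in that cell type)
    parent_of:  cell type -> parent cell type (None for the root)
    stay_prob:  cell type -> Pi_c(k|k), probability of keeping the parent's state
    root_prob:  probability that the edge is present in the root (assumed input)
    """
    logp = 0.0
    for c, state in edge_state.items():
        parent = parent_of[c]
        if parent is None:
            logp += math.log(root_prob if state == 1 else 1.0 - root_prob)
        elif state == edge_state[parent]:
            logp += math.log(stay_prob[c])        # same state in parent and child
        else:
            logp += math.log(1.0 - stay_prob[c])  # edge state changed
    return logp

# Hypothetical linear lineage and transition values, for illustration only.
parent_of = {"MEF": None, "pre-iPSC": "MEF", "iPSC": "pre-iPSC"}
stay_prob = {"pre-iPSC": 0.8, "iPSC": 0.8}
conserved = edge_log_prior({"MEF": 1, "pre-iPSC": 1, "iPSC": 1}, parent_of, stay_prob, 0.5)
switching = edge_log_prior({"MEF": 1, "pre-iPSC": 0, "iPSC": 1}, parent_of, stay_prob, 0.5)
print(conserved, switching)
```

With a high probability of staying in the same module, an edge that is conserved along the lineage receives a higher prior than one that appears and disappears, which is how the prior encourages similar regulators in related cell types.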

DRMN-FUSED: Parameter prior approach. Our second approach used a fused group LASSO formulation

to share information between the cell types/time points by defining the following objective:

$$\min_{\Theta} \sum_{c} \lVert X_{c,k} - Y_{c,k}\,\Theta_{c,k}^{T} \rVert_2^2 + \rho_1 \lVert \Theta_k \rVert_1 + \rho_2 \lVert R\,\Theta_k \rVert_1 + \rho_3 \lVert \Theta_k \rVert_{2,1}, \tag{2}$$

where Xc,k is the expression vector of genes in module k in cell line c, Yc,k is the feature matrix (size

of module k by F , the number of features) corresponding to the same genes, and Θc,k is the vector of regression coefficients for the same module and cell line (1 by F , non-zero values correspond to selected

features), and Θk is the C by F matrix resulting from concatenation of the Θc,k vectors (number of cell types/time points by number of features). R is a C − 1 by C matrix encoding the lineage tree. Each

row of R corresponds to an edge of the tree, meaning that if row i of R corresponds to the edge c1 → c2

in the lineage tree, we set R(i, c1) to 1, R(i, c2) to −1, and all other values in that row to 0. This

means that RΘk is a C − 1 by F matrix where row i corresponds to Θc1,k − Θc2,k, the difference between

the regression coefficients of cell lines c1 and c2. ||.||1 denotes the l1-norm (sum of absolute values), ||.||2 denotes

the l2-norm (square root of the sum of squared values), and ||.||2,1 denotes the l2,1-norm (sum of the l2-norms of the columns

of the given matrix). ρ1, ρ2 and ρ3 are hyperparameters, with ρ1 for the sparsity penalty, ρ2 to

enforce similarity between selected features of consecutive cell lines in the lineage tree, and ρ3 to enforce

selecting the same features for all cell lines/time points. Thus, ρ2 controls the extent to which more closely

related cell types are closer in their regulatory programs, while ρ3 controls the extent to which all the cell types share similarity in their regulatory programs. We set these hyperparameters based on cross-validation

by performing a grid search for ρ1 ∈ {0.5, 1, 2, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130}, and

ρ2, ρ3 ∈ {0, 10, 20, 30, 40, 50}. We implemented in C++ the algorithm described in the MALSAR MATLAB package [36], which uses an accelerated gradient method [37, 38] to minimize the objective function above.
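
To make the fused group LASSO objective concrete, the sketch below builds the lineage matrix R from a list of tree edges and evaluates the objective for one module. It only evaluates the objective and does not perform the accelerated gradient optimization; the function names, the list-based data layout, and the toy values are assumptions made for illustration.

```python
import math

def lineage_difference_matrix(edges, cell_types):
    """Build R (C-1 by C): row i has +1 at the parent and -1 at the child of edge i."""
    index = {c: j for j, c in enumerate(cell_types)}
    R = [[0.0] * len(cell_types) for _ in edges]
    for i, (parent, child) in enumerate(edges):
        R[i][index[parent]] = 1.0
        R[i][index[child]] = -1.0
    return R

def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def fused_objective(X, Y, Theta, R, rho1, rho2, rho3):
    """Evaluate the fused group LASSO objective for one module.

    X[c]: expression values of the module's genes in cell type c
    Y[c]: genes-by-F feature rows for the same genes in cell type c
    Theta: C-by-F regression coefficients (one row per cell type)
    """
    loss = 0.0
    for c, theta in enumerate(Theta):
        for x, y in zip(X[c], Y[c]):
            pred = sum(t * v for t, v in zip(theta, y))
            loss += (x - pred) ** 2
    l1 = sum(abs(t) for row in Theta for t in row)                           # sparsity
    fused = sum(abs(d) for row in matmul(R, Theta) for d in row)             # parent-child differences
    group = sum(math.sqrt(sum(t * t for t in col)) for col in zip(*Theta))   # l2 norm per feature column
    return loss + rho1 * l1 + rho2 * fused + rho3 * group

# Toy example with a hypothetical linear lineage and two features.
cell_types = ["MEF", "pre-iPSC", "iPSC"]
R = lineage_difference_matrix([("MEF", "pre-iPSC"), ("pre-iPSC", "iPSC")], cell_types)
Theta = [[1.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
X = [[1.0], [1.0], [2.0]]
Y = [[[1.0, 0.0]], [[1.0, 0.0]], [[1.0, 0.0]]]
objective = fused_objective(X, Y, Theta, R, rho1=1.0, rho2=1.0, rho3=1.0)
print(R, objective)
```

In this toy case the fit is exact, so the objective consists only of the three penalties; the fused term is nonzero only for the pre-iPSC to iPSC edge, where the coefficient changes.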

DRMN learning

DRMNs are learned by optimizing the DRMN score (Eqn 1), using an Expectation Maximization (EM)

style algorithm that searches over the space of possible graphs for a local optimum (Algorithm 1). The

algorithm uses a multi-task learning approach to jointly learn the regulatory programs for all cell types. In

the M step, we estimate transition parameters (M1 step) and the regulatory program structure (M2 step). In

the E step, we compute the expected probability of a gene’s expression profile to be generated by one of the

regulatory programs.

Algorithm 1: DRMN Algorithm
Input:
  - Expression data X = {X1, ..., XC}
  - Regulatory features Y = {Y1, ..., YC}
  - Initial module assignments M = {M1, ..., MC}
Output:
  - Regulatory programs R = {R1 = (G1, Θ1), ..., RC = (GC, ΘC)}
  - Transition probabilities Π = {Π1, ..., ΠC}
while not converged do
  M1: Estimate transition parameters (Π1, ..., ΠC)
  M2: Update regulatory programs (G1, ..., GC, Θ1, ..., ΘC)
  E: Update module assignment probabilities and module assignments (Γ, M1, ..., MC)
end

Estimate transition parameters (M1 step): Let $\gamma^{g,c}_{k,k'}$ be the probability of gene $g$ in cell type $c$ to belong to module $k$, and, in its parent cell type $c'$, to module $k'$. We calculate the probability of transitioning from $k'$ in $c'$ to $k$ in $c$ as $\Pi_c(k, k') = \frac{\sum_{g} \gamma^{g,c}_{k,k'}}{\sum_{g,k,k'} \gamma^{g,c}_{k,k'}}$.
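
The M1 update can be sketched as follows for a single cell type. This is a minimal illustration that follows the normalization as written above (summing over genes and over all module pairs); the function name and the nested-list layout of the joint probabilities are assumptions.

```python
def estimate_transitions(gamma):
    """Estimate Pi_c from joint assignment probabilities.

    gamma[g][k][kp]: probability that gene g is in module k in cell type c
                     and in module kp in the parent cell type.
    Returns a K-by-K table indexed as Pi[k][kp].
    """
    K = len(gamma[0])
    totals = [[sum(g[k][kp] for g in gamma) for kp in range(K)] for k in range(K)]
    norm = sum(sum(row) for row in totals)  # sum over g, k and kp
    return [[value / norm for value in row] for row in totals]

# Two hypothetical genes, two modules: each gene stays in its parent's module.
gamma = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 1.0]],
]
Pi = estimate_transitions(gamma)
print(Pi)
```

Normalizing each parent-module column by its own total instead would yield a conditional distribution over k for each k′; that variant is not shown here and would be an assumption about the implementation.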

Update regulatory programs (M2 step): Recall that the regulatory program for each cell type c is Rc =<

Gc, Θc >, where Gc is a set of regulatory interactions f → k from a regulatory feature f to a module k, and

Θc are the parameters of a regression function for each module that relates the selected regulatory features to the expression of the module’s genes. The estimation of the regulatory programs is specific to the way in

which information is shared across tasks, and differs between the DRMN-ST and DRMN-FUSED approaches.

We assume that the expression levels are generated by a mixture of experts, each expert corresponding

to a module. Each expert uses a Multivariate normal distribution, which means that the likelihood of gene

g’s expression being generated by the regulatory program for a module k in cell type c can be written as:

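$$\begin{aligned} P(x_{g,c} \mid R_{c,k}, y_{g,c}) &\sim \mathcal{N}(\mu_{c,k}, \Sigma_{c,k}) \\ &= \mathcal{N}\Big(\theta^{0}_{c,k} + \sum_{f \in G_{c,k}} \theta^{f}_{c,k}\, y^{f}_{g,c},\ \Sigma_{c,k}\Big) \end{aligned}$$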

where Gc,k is the current list of regulators selected for module k in cell type c. This is the equivalent of a linear regression to predict expression of genes in module k in cell type c, using regulatory features of genes

in cell type c. In the DRMN-ST approach, the regulatory interactions are learned by per-module per-cell type

regression in a greedy hill-climbing framework. At initialization, all regulatory network structures Gc are

empty, and the parameters Θc are computed simply as the empirical mean and variance of the genes initially assigned to each module in each cell type. In each iteration of DRMN learning, we fix the current module

assignments Mc for all cell types c and update the regulatory program of each module independently. In each iteration, we score each potential regulatory feature based on its improvement to the likelihood of the

model, and choose the regulator with maximum improvement (if over a minimum threshold). This regulator

is added to the module’s regulatory program for all cell types for which it improves the cell type-specific

likelihood. For the regularized regression version, DRMN-FUSED, the parameters are learned using an

accelerated gradient method [37, 38] to minimize the objective function in Eqn 2.
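
The per-module likelihood described above can be sketched as a Gaussian whose mean is a linear function of the selected regulatory features. The example assumes a single expression value per gene and cell type, so the covariance reduces to a scalar variance; the function name, the feature names, and the parameter values are hypothetical.

```python
import math

def expert_log_likelihood(x, y, intercept, coefs, variance):
    """Log density of expression x under one module's regression expert.

    y:     feature name -> feature value for this gene and cell type
    coefs: feature name -> coefficient, for the regulators selected for the module
    """
    mean = intercept + sum(weight * y[f] for f, weight in coefs.items())
    return -0.5 * (math.log(2.0 * math.pi * variance) + (x - mean) ** 2 / variance)

# One gene with two hypothetical features, scored against two hypothetical modules.
features = {"motif_A": 1.0, "H3K4me3": 2.0}
ll_low = expert_log_likelihood(2.0, features, intercept=0.0, coefs={}, variance=1.0)
ll_high = expert_log_likelihood(2.0, features, intercept=0.5,
                                coefs={"H3K4me3": 0.75}, variance=1.0)
print(ll_low, ll_high)
```

A module with an empty regulator list falls back to its mean and variance, which matches the initialization described above; adding a regulator is worthwhile in the greedy search only if it raises this likelihood for the module's genes.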

Update soft module assignments (E step): Let $\gamma^{g,c}_{k \mid k'}$ be the probability of gene $g$ in cell type $c$ to belong to module $k$, given that in its parent cell type $c'$, it belonged to module $k'$. We also introduce $\alpha^{g,c}$, a vector of size $K \times 1$ where each element $\alpha^{g,c}(k')$ specifies the probability of the observations given that the parent state is $k'$. We estimate the probabilities using a dynamic programming procedure, where values at internal nodes

in the lineage tree are computed using the values for all descendent nodes, down to the leaves.

If c is a leaf node, we calculate

$$\gamma^{g,c}_{k \mid k'} = P(x_{g,c} \mid R_{c,k}, y_{g,c})\,\Pi_c(k \mid k')$$

where the first term is the probability of observing expression of gene g in cell type c in module k (given its

regulatory program and regulatory features), and the second is the probability of transitioning from module

k′ in the parent cell type c′ to module k in cell type c.

For a non-leaf cell type c:

$$\gamma^{g,c}_{k \mid k'} = P(x_{g,c} \mid R_{c,k}, y_{g,c})\,\Pi_c(k \mid k') \prod_{c \to l \,\in\, \tau} \alpha^{g,l}(k)$$

For both internal and leaf cell types, we write the joint probability of the $(k', k)$ pair as $\gamma^{g,c}_{k,k'} = \frac{\gamma^{g,c}_{k \mid k'}}{\alpha^{g,c}(k')}$, where $\alpha^{g,c}(k') = \sum_{k} \gamma^{g,c}_{k \mid k'}$ is the probability of $g$'s expression in any module given parent module $k'$.
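
The dynamic programming recursion of the E step can be sketched for a single gene as a bottom-up pass over the lineage tree. This is an illustrative sketch rather than the DRMN code: the function name, the dictionary layout, the two-cell-type lineage, and all probabilities are assumptions, and the root is given a single "parent state" row holding its prior over modules.

```python
def upward_pass(cell, children, emission, trans):
    """Compute gamma_{k|k'} and alpha for `cell` and all of its descendants.

    children[cell]:     list of child cell types (missing key means leaf)
    emission[cell][k]:  P(x_{g,c} | R_{c,k}, y_{g,c}) for module k
    trans[cell][kp][k]: Pi_c(k | kp); for the root, one row with the module prior
    Returns (gamma, alpha) with gamma[cell][kp][k] and alpha[cell][kp].
    """
    gamma, alpha = {}, {}
    kids = children.get(cell, [])
    for child in kids:  # descendants first, so their alpha values are available
        g_child, a_child = upward_pass(child, children, emission, trans)
        gamma.update(g_child)
        alpha.update(a_child)
    gamma[cell] = []
    for kp in range(len(trans[cell])):
        row = []
        for k in range(len(emission[cell])):
            value = emission[cell][k] * trans[cell][kp][k]
            for child in kids:
                value *= alpha[child][k]  # evidence from the subtree below
            row.append(value)
        gamma[cell].append(row)
    alpha[cell] = [sum(row) for row in gamma[cell]]
    return gamma, alpha

# Hypothetical two-module example on a MEF -> ESC lineage.
children = {"MEF": ["ESC"]}
emission = {"MEF": [0.9, 0.1], "ESC": [0.2, 0.8]}
trans = {"MEF": [[0.5, 0.5]], "ESC": [[0.7, 0.3], [0.4, 0.6]]}
gamma, alpha = upward_pass("MEF", children, emission, trans)
joint_esc = [[g / alpha["ESC"][kp] for g in row] for kp, row in enumerate(gamma["ESC"])]
print(alpha["MEF"][0], joint_esc)
```

In this sketch the value of alpha at the root equals the gene's total likelihood summed over all module paths, which gives a simple consistency check for the recursion.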

Termination: DRMN inference runs for a set number of iterations or until convergence. Final module

assignments are computed as maximum likelihood assignments using a dynamic programming approach.

While module assignments between consecutive iterations do not change significantly, the final module as-

signments are significantly different from the initial module assignments, and predictive power of model

significantly improves as iterations progress (though improvements are small after 10 iterations, Supple-

mentary Figure S7).

In our experiments, we ran DRMN for up to ten iterations. When using the greedy hill-climbing approach,

to decrease computation time, we inferred regulatory edges in small batches per DRMN iteration. For each

full DRMN iteration, the M2 step was run until up to five regulators were added per module.

Hyperparameter tuning of DRMN-FUSED. To assess the effect of hyperparameters on the performance

of DRMN when using fused group LASSO, we performed a grid search on a wide range of parameter

values. Supplementary Figure S8 shows the average correlation (averaged over modules, and averaged

over cell lines) for different sequencing feature sets (the same feature sets described in Figure 2). We observe

that an increase in the ρ3 penalty (which enforces the selection of the same features across all cell lines) decreases the predictive power of the model in almost all cases (no penalty (magenta curve) vs. high penalty (yellow

curve)). Supplementary Figure S9 shows the effect of ρ1 (sparsity penalty) and ρ2 (fused penalty) on the predictive power of the model. We observe that when using chromatin features (Motif+Chromatin,

and ATAC+Motif+Chromatin), an increase in ρ1 (increasing the sparsity) and an increase in ρ2 (increasing the similarity of inferred networks) improve the predictive power of the method. Conversely, for the feature sets

that do not use chromatin (Motif, Q-Motif, and ATAC+Motif), an increase in ρ1 (sparser models) decreases

the predictive power of the model. An increase in ρ2 decreases the predictive power of the method for sparser

models, while for denser networks (lower values of ρ1) changes in ρ2 do not significantly change the

performance. Finally, we note that while ρ1 is the main driver of sparsity of the model, an increase in ρ2 also increases the sparsity of the model (second column in Supplementary Figure S9).

Input datasets

We applied DRMN to study regulatory program transitions during reprogramming of differentiated cells to

pluripotent cells. We obtained four datasets, one measured by array and three by sequencing. Three of these

assayed mouse cell types representing different stages of reprogramming, while the fourth one focused on

hepatocyte dedifferentiation. The reprogramming dataset included differentiated mouse embryonic

fibroblasts (MEFs), partially reprogrammed induced pluripotent stem cells (pre-iPSCs), and pluripotent stem cells

(iPSCs or embryonic stem cells, ESCs).

Reprogramming array data

We obtained measurements of gene expression and eight chromatin marks in three cell types that were pro-

cessed by [19]. The measurements spanned three cell types: MEF, pre-iPSC, and iPSC. The original data

were collected from multiple publications [19, 39–41]. The gene expression of 15,982 genes was measured

by microarray. Eight chromatin marks were measured by ChIP-on-chip (chromatin immunoprecipitation

followed by promoter microarray). For each gene promoter, each mark’s value was averaged across an

8000-bp region associated with the promoter. The chromatin marks were associated with active transcription

(H3K4me3, H3K9ac, H3K14ac, and H3K18ac), repression (H3K9me2, H3K9me3, H3K27me3), and tran-

scription elongation (H3K79me2).

Reprogramming data from Chronis et al

Chronis et al. [7] assayed gene expression with RNA-seq, nine chromatin marks with ChIP-seq, and chromatin

accessibility with ATAC-seq in different stages of reprogramming (MEF, MEF48

(48 hours after start of the reprogramming process), pre-iPSC, and embryonic stem cells (ESC)). We aligned

all sequencing reads to the mouse mm9 reference genome using Bowtie2 [42]. For RNA-seq data, we quan-

tified expression to TPMs using RSEM [43] and applied a log transform. After removing unexpressed genes

(TPM< 1), we had 17,358 genes. The epigenomic data spanned nine histone modification marks (H3K27ac,

H3K27me3, H3K36me3, H3K4me1, H3K4me2, H3K4me3, H3K79me2, H3K9ac, and H3K9me3), and

chromatin accessibility (ATAC-seq). For the epigenomic datasets, we obtained per-base pair read coverage

using BEDTools [44], aggregated counts within ±2,500 bp of genes’ transcription start sites, and applied

log transformation.

Hepatocyte dedifferentiation time course data

For the dedifferentiation time course, samples were extracted from adult mouse liver, and gene expression

and chromatin accessibility were assayed by RNA-seq and ATAC-seq at 0 hours, 0.5 hours, 1 hour, 2 hours,

and every 2 hours until 24 hours, and finally 36 hours (16 time points in total). All sequencing reads

were aligned to mouse mm10 reference genome using Bowtie2 [42], and gene expression was quantified

using RSEM [43]. Any gene with TPM= 0 in all time points was removed, resulting in 14,794 genes

with measurement in at least one time point. Per-base pair read coverage for ATAC-seq was obtained using

BEDTools [44], and counts were aggregated within ±2,500 bp of genes’ transcription start sites. Both gene

expression and accessibility data were quantile normalized across 16 time points and then log transformed.
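
The normalization step can be illustrated with a small sketch of quantile normalization across time points followed by a log transform. The tie handling (ties broken by row order) and the pseudocount of 1 in the log transform are assumptions for the example and are not specified in the text; the function names and the toy matrix are hypothetical.

```python
import math

def quantile_normalize(matrix):
    """Quantile-normalize the columns of a genes-by-time-points matrix."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    sorted_cols = [sorted(row[j] for row in matrix) for j in range(n_cols)]
    # Reference distribution: mean across columns at each rank.
    rank_means = [sum(col[r] for col in sorted_cols) / n_cols for r in range(n_rows)]
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        order = sorted(range(n_rows), key=lambda i: matrix[i][j])
        for rank, i in enumerate(order):
            result[i][j] = rank_means[rank]
    return result

def log_transform(matrix, pseudocount=1.0):
    """Apply log2(x + pseudocount) element-wise (pseudocount is an assumption)."""
    return [[math.log2(v + pseudocount) for v in row] for row in matrix]

# Three hypothetical genes measured at two time points.
tpm = [[5.0, 4.0], [2.0, 1.0], [3.0, 6.0]]
normalized = quantile_normalize(tpm)
logged = log_transform(normalized)
print(normalized)
```

After normalization every time point shares the same set of values, so differences between time points reflect changes in a gene's rank rather than sample-wide shifts in scale.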

Feature set generation for reprogramming datasets

We considered the following features for each gene to predict its expression. Feature datasets were processed

as described above. Features involving the accessibility data were only available for the sequencing dataset

(marked below with *). Any missing values in the feature data were set to 0.

• Motif network, scored by −log10(p-value) of motif instances (features for 353 TFs). We assembled a motif-based, context-independent network by computationally identifying links between transcrip-

tion factor motifs and the genes from the expression data based on the presence of a motif instance

within the gene’s promoter region. We downloaded a meta-compilation of mouse motif PWMs from

http://piq.csail.mit.edu/ [45], which were sourced from multiple databases [46–48]. From the full motif

list, we used only those annotated as transcription factors [49]. We applied FIMO [50] to

scan the mouse genome (Ensembl version NCBIM37.66) for significant motif instances (p < 1e-5).

This resulted in a total of 353 TFs. Each edge in the motif network between transcription factor r and

gene g is scored with the smallest p-value for any instance of r within 2kb of any TSS for g. We also

assessed a z-score transformation of the p-values, which performed poorly compared to the p-value

representation (not shown).

• Chromatin mark signal (feature for 8 marks in array dataset, 9 marks in sequencing dataset)

• Chromatin + motif: Concatenation of chromatin and motif features (361 features array, 362 sequenc-

ing)

• *ATAC: Log ATAC-seq signal (1 feature).

• *ATAC + motif: ATAC-seq signal and motif features (354 features)

• *Q-motif: Motif features scored by ATAC-seq signal (353 features). We quantified cell type-specific

motif networks using the ATAC-seq data from the sequencing dataset. We used BEDTools

(bedtools genomecov -ibam input.bam -bg -pc > output.counts) to obtain the aggregated

signal on each base pair. We defined the feature value as the log-transformed mean ATAC-seq count

under each motif instance.

• *Chromatin + ATAC + motif: Concatenation of chromatin, accessibility, and motif features (363

features).

• *Chromatin + Q-motif: Concatenation of chromatin and Q-motif features (362 features)

• *Chromatin + ATAC + Q-motif: Concatenation of chromatin, accessibility, and Q-motif features (363

features).
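
The motif and Q-motif feature definitions in the list above can be sketched as two small scoring functions. This is an illustration under stated assumptions: the function names, the input layout, and the pseudocount in the log transform are hypothetical, and the handling of several motif instances per gene for Q-motif features is not specified in the text and is therefore left out.

```python
import math

def motif_edge_score(instances, tss_positions, window=2000):
    """Score a TF-gene edge as -log10 of the smallest motif p-value near any TSS.

    instances:     list of (genomic position, p-value) for one TF on the gene's chromosome
    tss_positions: TSS coordinates of the gene
    Returns 0.0 when no instance lies within the window (missing values set to 0).
    """
    pvals = [p for pos, p in instances
             if any(abs(pos - tss) <= window for tss in tss_positions)]
    return -math.log10(min(pvals)) if pvals else 0.0

def q_motif_score(coverage, start, end, pseudocount=1.0):
    """Log-transformed mean per-base ATAC-seq count under one motif instance [start, end)."""
    counts = coverage[start:end]
    return math.log(sum(counts) / len(counts) + pseudocount)

# Hypothetical motif instances and per-base coverage.
edge = motif_edge_score([(100, 1e-6), (5000, 1e-9), (1500, 1e-7)], tss_positions=[0])
no_edge = motif_edge_score([(5000, 1e-9)], tss_positions=[0])
q_score = q_motif_score([0, 2, 4, 6, 0], start=1, end=4)
print(edge, no_edge, q_score)
```

The instance at position 5000 has the smallest p-value but lies outside the 2 kb window, so it does not contribute to the edge score.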

Feature set generation for dedifferentiation datasets

We used the PIQ package [45] to identify genome-wide motif instances using PWMs from the CIS-BP database

[51]. Motif instances were mapped to ±2,500 bp of genes’ TSS, creating 2,856 features. Motif instances

were scored by ATAC-seq signal (see Q-Motif above), and accessibility signal on promoter (±2,500 bp

of genes’ TSS) was added as an additional feature. Features were quantile normalized and log transformed

across the 16 time points.

Experiments to evaluate DRMN and feature combinations

In our experiments, we sought to evaluate (a) whether sharing information between cell types is beneficial,

and (b) which types of regulatory features are most useful to predict gene expression. We evaluated ex-

pression prediction in three-fold cross-validation, where a model was trained on two-thirds of the genes and

used to predict expression for the held-aside third. We ran each experiment over a range of the number of

modules, k ∈ {3, 5, 7, 9, 11}.

DRMN’s performance was compared to that of two baseline algorithms. RMN is simply DRMN run on

one cell type at a time, keeping all other experimental parameters identical. GMM applies Gaussian Mixture

Model clustering to gene expression values, and predicts the expression of a held-aside gene as the mean

of the module with the highest posterior probability. We ran GMM per-cell type (GMM-Indep) as well as

on a merged expression matrix with all three cell types (GMM-Merged). For GMM-Merged, the expression

predictions were scored per cell-type.

We compared DRMN to the baseline methods based on the ability to predict held-aside gene

expression in three-fold cross-validation. Expression prediction was evaluated using two metrics. First, the overall

expression prediction was assessed using the Pearson’s correlation between actual and predicted expression

of all held-aside genes. Second, we computed the average Pearson’s correlation in each module, which reflects a more

fine-grained view of the utility of using regulatory features in explaining the expression levels of individual

genes in a module. The first metric examines how each model (e.g., simple expression-based clustering

approach versus expression and regulatory feature-based model), explains the overall variation in the data.

The second metric assesses the value of using additional regulatory features, such as sequence and chromatin,

to predict expression. We assessed this ability using a 3-fold cross validation scheme for different values

of k, number of clusters, and compared the observed (true) expression in the three test sets to predicted

expression values.
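
The two evaluation metrics can be sketched as follows. The sketch assumes that held-aside genes come with a module assignment, and it returns a correlation of 0 when either vector has zero variance (a convention chosen for the example, not stated in the text); the function names and toy values are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input has zero variance (assumed convention)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def evaluate(actual, predicted, modules):
    """Return (overall correlation, average per-module correlation) for held-aside genes."""
    overall = pearson(actual, predicted)
    by_module = {}
    for x, p, m in zip(actual, predicted, modules):
        xs, ps = by_module.setdefault(m, ([], []))
        xs.append(x)
        ps.append(p)
    per_module = [pearson(xs, ps) for xs, ps in by_module.values()]
    return overall, sum(per_module) / len(per_module)

# A clustering-only predictor returns the module mean for every gene in a module.
actual = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
modules = [0, 0, 0, 1, 1, 1]
cluster_mean_prediction = [2.0, 2.0, 2.0, 8.0, 8.0, 8.0]
overall, within = evaluate(actual, cluster_mean_prediction, modules)
print(overall, within)
```

In this toy case the overall correlation is high while the per-module correlation is zero, which illustrates why the second metric is needed to see whether regulatory features explain variation within a module.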

Identification of transitioning genes

A transitioning gene was defined as a gene whose module assignment changed in at least one time

point/cell line. These genes were grouped into transitioning gene sets using a hierarchical clustering approach

(with city block as the distance metric and 0.05 as the distance threshold) using our in-house programs.
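
A minimal sketch of this step is shown below. It is not the in-house program: the linkage criterion (single linkage via union-find) and the scaling of the city block distance by the number of time points are assumptions made so that a 0.05 threshold is meaningful in the example, and the function names and gene labels are hypothetical.

```python
def transitioning_genes(assignments):
    """Keep genes whose module assignment is not constant across cell types/time points."""
    return {g: profile for g, profile in assignments.items() if len(set(profile)) > 1}

def cluster_profiles(profiles, threshold=0.05):
    """Group profiles whose scaled city block distance is within the threshold (single linkage)."""
    genes = list(profiles)
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path compression
            g = parent[g]
        return g

    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            dist = sum(abs(x - y) for x, y in zip(profiles[a], profiles[b])) / len(profiles[a])
            if dist <= threshold:
                parent[find(a)] = find(b)
    clusters = {}
    for g in genes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in clusters.values())

# Hypothetical module assignments across three cell types.
assignments = {"g1": [0, 0, 2], "g2": [0, 0, 2], "g3": [1, 1, 1], "g4": [2, 1, 0]}
moving = transitioning_genes(assignments)
gene_sets = cluster_profiles(moving)
print(gene_sets)
```

Here g3 is excluded because its assignment never changes, and g1 and g2 form one transitioning gene set because their profiles are identical.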

Prediction of regulators for transitioning genes

To identify regulators associated with transitioning gene sets in the dedifferentiation dataset, we used Multi-

Task Group LASSO. Briefly, for each transitioning gene set, we solve a linear regression for each gene to

predict its expression (across time points) using Q-motif features as predictors, with the group constraint to

enforce selection of similar predictors for all the genes in the gene set:

$$\min_{\Theta} \lVert X - Y\Theta \rVert_2^2 + \lambda \lVert \Theta \rVert_{2,1},$$

where X is the expression matrix of genes in the transitioning gene set, and Y corresponds to the feature

matrix of Q-motif features of regulators for genes in the transitioning gene set. We used the MATLAB

implementation of multi-task group LASSO from the SLEPP package [52]. We performed leave-one-out

cross-validation and selected regulators that were selected in at least 60% of the trained models. Additionally, we

used randomized feature data to train 40 random models, and asked if the frequency of selecting a regulator

was significantly higher than in the random models (z-test with p-value < 0.05 using mean and standard deviation from

the 40 random models).
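
The regulator filtering described above can be sketched as follows. The sketch assumes a one-sided z-test and the sample standard deviation of the random-model frequencies, neither of which is specified in the text; the function name, regulator labels, and frequencies are hypothetical, and only four random models are used to keep the example short.

```python
from statistics import NormalDist, mean, stdev

def stable_regulators(selection_freq, random_freqs, min_freq=0.6, alpha=0.05):
    """Keep regulators selected often in leave-one-out models and more often than at random.

    selection_freq: regulator -> fraction of leave-one-out models selecting it
    random_freqs:   regulator -> selection frequencies from models on randomized features
    """
    kept = []
    for regulator, freq in selection_freq.items():
        if freq < min_freq:
            continue
        mu, sd = mean(random_freqs[regulator]), stdev(random_freqs[regulator])
        if sd == 0.0:
            significant = freq > mu  # degenerate null: any excess counts (assumed convention)
        else:
            p_value = 1.0 - NormalDist().cdf((freq - mu) / sd)  # one-sided z-test
            significant = p_value < alpha
        if significant:
            kept.append(regulator)
    return kept

# Hypothetical frequencies for three regulators.
selection_freq = {"TF_a": 0.9, "TF_b": 0.65, "TF_c": 0.3}
random_freqs = {
    "TF_a": [0.10, 0.20, 0.15, 0.25],
    "TF_b": [0.50, 0.70, 0.60, 0.80],
    "TF_c": [0.10, 0.20, 0.30, 0.20],
}
kept = stable_regulators(selection_freq, random_freqs)
print(kept)
```

TF_b passes the 60% frequency cutoff but is selected just as often with randomized features, so only TF_a is retained.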

Availability

The DRMN code is available at https://github.com/Roy-lab/drmn

Acknowledgment

This work was made possible in part by NIH NIGMS grant 1R01GM117339 to S.R. The authors thank the

Center for High Throughput Computing (CHTC) for computing resources.

References

[1] Alvaro J. Gonzalez, Manu Setty, and Christina S. Leslie. Early enhancer establishment and regulatory

locus complexity shape transcriptional programs in hematopoietic differentiation. Nature genetics,

47(11):1249–1259, Nov 2015. PMID: 26390058.

[2] Hatice U. Osmanbeyoglu, Fumiko Shimizu, Angela Rynne-Vidal, Petar Jelinic, Samuel C. Mok,

Gabriela Chiosis, Douglas A. Levine, and Christina S. Leslie. Chromatin-informed inference of tran-

scriptional programs in gynecologic and basal breast cancers. bioRxiv, 2018.

[3] Richard A. Young. Control of the embryonic stem cell state. Cell, 144(6):940–954, 2011.

[4] Tong Ihn Lee and Richard A Young. Transcriptional regulation and its misregulation in disease. Cell,

152(6):1237–1251, Mar 2013.

[5] Joseph A Wamstad, Jeffrey M Alexander, Rebecca M Truty, Avanti Shrikumar, Fugen Li, Kirsten E

Eilertson, Huiming Ding, John N Wylie, Alexander R Pico, John A Capra, Genevieve Erwin, Steven J

Kattman, Gordon M Keller, Deepak Srivastava, Stuart S Levine, Katherine S Pollard, Alisha K Hol-

loway, Laurie A Boyer, and Benoit G Bruneau. Dynamic and coordinated epigenetic regulation of

developmental transitions in the cardiac lineage. Cell, 151:206–220, September 2012.

[6] David Lara-Astiaso, Assaf Weiner, Erika Lorenzo-Vivas, Irina Zaretsky, Diego Adhemar Jaitin, Eyal

David, Hadas Keren-Shaul, Alexander Mildner, Deborah Winter, Steffen Jung, , and Ido

Amit. Chromatin state dynamics during blood formation. Science (New York, N.Y.), 345:943–949,

August 2014.

[7] Constantinos Chronis, Petko Fiziev, Bernadett Papp, Stefan Butz, Giancarlo Bonora, Shan Sabri, Jason

Ernst, and Kathrin Plath. Cooperative binding of transcription factors orchestrates reprogramming.

Cell, 168:442–459.e20, January 2017.

[8] Esti Yeger-Lotem, Laura Riva, Linhui Julie Su, Aaron D Gitler, Anil G Cashikar, Oliver D King,

Pavan K Auluck, Melissa L Geddie, Julie S Valastyan, David R Karger, Susan Lindquist, and Ernest

Fraenkel. Bridging high-throughput genetic and transcriptional data reveals cellular responses to alpha-

synuclein toxicity. Nature genetics, 41:316–323, March 2009.

[9] Nir Yosef, Alex K. Shalek, Jellert T. Gaublomme, Hulin Jin, Youjin Lee, Amit Awasthi, Chuan Wu,

Katarzyna Karwacz, Sheng Xiao, Marsela Jorgolli, David Gennert, Rahul Satija, Arvind Shakya, Di-

ana Y. Lu, John J. Trombetta, Meenu R. Pillai, Peter J. Ratcliffe, Mathew L. Coleman, Mark Bix, Dean

Tantin, Hongkun Park, Vijay K. Kuchroo, and Aviv Regev. Dynamic regulatory network controlling

TH17 cell differentiation. Nature, 496(7446):461–468, Apr 2013.

[10] Xianjun Dong, Melissa C Greven, Anshul Kundaje, Sarah Djebali, James B Brown, Chao Cheng,

Thomas R Gingeras, Mark Gerstein, Roderic Guigó, , and . Modeling gene

expression using chromatin features in various cellular contexts. Genome biology, 13:R53, June 2012.

[11] Thais G do Rego, Helge G Roider, Francisco A T de Carvalho, and Ivan G Costa. Inferring epigenetic

and transcriptional regulation during blood cell development with a mixture of sparse linear models.

Bioinformatics (Oxford, England), 28:2297–2303, September 2012.

[12] Ankur P Parikh, Wei Wu, Ross E Curtis, and Eric P Xing. Treegl: reverse engineering tree-evolving

gene networks underlying developing biological lineages. (Oxford, England), 27:i196–

i204, July 2011.

[13] Sushmita Roy, Margaret Werner-Washburne, and Terran Lane. A multiple network learning approach

to capture system-wide condition-specific responses. Bioinformatics, 27(13):1832–1838, 2011.

[14] Vladimir Jojic, Tal Shay, Katelyn Sylvia, Or Zuk, Xin Sun, Joonsoo Kang, Aviv Regev, ,

Immunological Genome Project Consortium, Adam J. Best, Jamie Knell, Ananda Goldrath, Vladimir

Jojic, Daphne Koller, Tal Shay, Aviv Regev, Nadia Cohen, Patrick Brennan, Michael Brenner, Francis

Kim, Tata Nageswara Rao, Amy Wagers, Tracy Heng, Jeffrey Ericson, Katherine Rothamel, Adriana

Ortiz-Lopez, Diane Mathis, Christophe Benoist, Natalie A. Bezman, Joseph C. Sun, Gundula Min-

Oo, Charlie C. Kim, Lewis L. Lanier, Jennifer Miller, Brian Brown, Miriam Merad, Emmanuel L.

Gautier, Claudia Jakubzick, Gwendalyn J. Randolph, Paul Monach, David A. Blair, Michael L. Dustin,

Susan A. Shinton, Richard R. Hardy, David Laidlaw, Jim Collins, Roi Gazit, Derrick J. Rossi, Nidhi

Malhotra, Katelyn Sylvia, Joonsoo Kang, Taras Kreslavsky, Anne Fletcher, Kutlu Elpek, Angelique

Bellemarte-Pelletier, Deepali Malhotra, and Shannon Turley. Identification of transcriptional regulators

in the mouse immune system. Nat Immunol, 14(6):633–643, Jun 2013.

[15] Wuming Gong, Naoko Koyano-Nakagawa, Tongbin Li, and Daniel J Garry. Inferring dynamic gene

regulatory networks in cardiac differentiation through the integration of multi-dimensional data. BMC

bioinformatics, 16:74, March 2015.

[16] Emma Pierson, GTEx Consortium, Daphne Koller, Alexis Battle, Sara Mostafavi, Kristin G Ardlie,

Gad Getz, Fred A Wright, Manolis Kellis, Simona Volpi, and Emmanouil T Dermitzakis. Sharing and

specificity of co-expression networks across 35 human tissues. PLoS Comput Biol, 11(5):e1004220,

May 2015.

[17] Christopher Koch, Jay Konieczka, Toni Delorey, Ana Lyons, Amanda Socha, Kathleen Davis, Sara A.

Knaack, Dawn Thompson, Erin K. O’Shea, Aviv Regev, and Sushmita Roy. Inference and evolutionary

analysis of genome-scale regulatory networks in large phylogenies. Cell Systems, 4(5):543–558.e8,

2017.

[18] Jason Ernst, Oded Vainas, Christopher T Harbison, Itamar Simon, and Ziv Bar-Joseph. Reconstructing

dynamic regulatory maps. Molecular systems biology, 3:74, 2007.

[19] Sushmita Roy and Rupa Sridharan. Chromatin module inference on cellular trajectories identifies key

transition points and poised epigenetic states in diverse developmental processes. Genome research,

27:1250–1262, July 2017.

[20] , Michael Shapira, Aviv Regev, Dana Pe’er, , Daphne Koller, and Nir Fried-

man. Module networks: identifying regulatory modules and their condition-specific regulators from

gene expression data. Nat. Genet., 34(2):166–176, May 2003.

[21] Su-In Lee, Aimée M. Dudley, David Drubin, Pamela A. Silver, Nevan J. Krogan, Dana Pe’er,

and Daphne Koller. Learning a prior on regulatory potential from eQTL data. PLoS Genet,

5(1):e1000358, January 2009.

[22] R. Gherzi, M. Trabucchi, M. Ponassi, I.-E. Gallouzi, M. G. Rosenfeld, and P. Briata. Akt2-mediated

phosphorylation of controls ccnd1 mrna decay during muscle cell differentiation. Cell Death &

Differentiation, 17(6):975–983, Jun 2010.

[23] Anna B. Osipovich, Qiaoming Long, Elisabetta Manduchi, Rama Gangula, Susan B. Hipkens, Judsen

Schneider, Tadashi Okubo, Christian J. Stoeckert, Shinji Takada, and Mark A. Magnuson. Insm1 pro-

motes endocrine cell differentiation by modulating the expression of a network of genes that includes

neurog3 and ripply3. Development, 141(15):2939–2949, 2014.

[24] MF Cole, SE Johnstone, JJ Newman, MH Kagey, and RAE Young. Tcf3 is an integral component

of the core regulatory circuitry of embryonic stem cells. Genes & development, 22(6):746–755, Mar

2008.

[25] Frederic Lluis, Luigi Ombrato, Elisa Pedone, Stefano Pepe, Bradley J. Merrill, and Maria Pia Cosma.

T-cell factor 3 () deletion increases somatic cell reprogramming by inducing epigenome modifica-

tions. Proceedings of the National Academy of Sciences, 108(29):11912–11917, 2011.

[26] Greetje Elaut, Tom Henkens, Peggy Papeleu, Sarah Snykers, Mathieu Vinken, Tamara Vanhaecke, and

Vera Rogiers. Molecular mechanisms underlying the dedifferentiation process of isolated hepatocytes

and their cultures. Current Drug Metabolism, 7(6):629–660, 2006.

[27] Manman Wang, Ling Tang, Dahai Liu, Qi-Long Ying, and Shoudong Ye. The transcription factor

induces expression of kruppel-like factor 4 to maintain and induce naïve pluripotency of embryonic

stem cells. Journal of Biological Chemistry, 292(41):17121–17128, 2017.

[28] Ilaria Laudadio, Isabelle Manfroid, Younes Achouri, Dominic Schmidt, Michael D. Wilson, Sabine

Cordi, Lieven Thorrez, Laurent Knoops, Patrick Jacquemin, Frans Schuit, Christophe E. Pierreux,

Duncan T. Odom, Bernard Peers, and Frederic P. Lemaigre. A feedback loop between the liver-

enriched transcription factor network and mir-122 controls hepatocyte differentiation. Gastroenterol-

ogy, 142(1):119–129, Jan 2012.

[29] Avinash Thakur, Jasper C.H. Wong, Evan Y. Wang, Jeremy Lotto, Donghwan Kim, Jung-Chien

Cheng, Matthew Mingay, Rebecca Cullum, Vaishali Moudgil, Nafeel Ahmed, Shu-Huei Tsai, Wei

Wei, Colum P. Walsh, Tabea Stephan, Misha Bilenky, Bettina M. Fuglerud, Mohammad M. Karimi,

Frank J. Gonzalez, Martin Hirst, and Pamela A. Hoodless. Hepatocyte nuclear factor 4-alpha is essen-

tial for the active epigenetic state at enhancers in mouse liver. Hepatology, 70(4):1360–1376, 2019.

[30] Wenjie Yu, Xiao Li, Steven Eliason, Miguel Romero-Bustillos, Ryan J. Ries, Huojun Cao, and Brad A.

Amendt. Irx1 regulates dental outer enamel epithelial and lung alveolar type ii epithelial differentiation.

Developmental biology, 429(1):44–55, Sep 2017. PMID: 28746823.

[31] Chih-I Tai and Qi-Long Ying. Gbx2, a lif/ target, promotes reprogramming to and retention of the

pluripotent ground state. Journal of Cell Science, 126(5):1093–1098, 2013.

[32] Raven Diacou, Yilin Zhao, Deyou Zheng, Ales Cvekl, and Wei Liu. Six3 and six6 are jointly required

for the maintenance of multipotent retinal progenitors through both positive and negative regulation.

Cell Reports, 25(9):2510–2523.e4, 2018.

[33] Silvia Buonamici, Soumen Chakraborty, Vitalyi Senyuk, and Giuseppina Nucifora. The role of evi1 in

normal and leukemic cells. Blood Cells, Molecules, and Diseases, 31(2):206–212, 2003.

[34] Jian Ming Khor, Jennifer Guerrero-Santoro, and Charles A. Ettensohn. Genome-wide identification of

binding sites and gene targets of alx1, a pivotal regulator of echinoderm skeletogenesis. Development,

146(16), 2019.

[35] Manu Setty and Christina S. Leslie. Seqgl identifies context-dependent binding signals in genome-

wide regulatory element maps. PLOS Computational Biology, 11(5):1–21, 05 2015.

[36] Jiayu Zhou, Jianhui Chen, and Jieping Ye. Malsar: Multi-task learning via structural regularization –

users manual version 1.1, 2012.

[37] Yu. Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming,

103(1):127–152, May 2005.

[38] Yu. Nesterov. Gradient methods for minimizing composite objective function. CORE Discussion

Papers 2007076, Université catholique de Louvain, Center for Operations Research and Econometrics

(CORE), 2007.

[39] Nimet Maherali, Rupa Sridharan, Wei Xie, Jochen Utikal, Sarah Eminli, Katrin Arnold, Matthias

Stadtfeld, Robin Yachechko, Jason Tchieu, Rudolf Jaenisch, Kathrin Plath, and Konrad Hochedlinger.

Directly reprogrammed fibroblasts show global epigenetic remodeling and widespread tissue contribu-

tion. Cell stem cell, 1:55–70, June 2007.

[40] Rupa Sridharan, Jason Tchieu, Mike J Mason, Robin Yachechko, Edward Kuoy, Steve Horvath, Qing

Zhou, and Kathrin Plath. Role of the murine reprogramming factors in the induction of pluripotency.

Cell, 136:364–377, January 2009.

[41] Rupa Sridharan, Michelle Gonzales-Cope, Constantinos Chronis, Giancarlo Bonora, Robin McKee,

Chengyang Huang, Sanjeet Patel, David Lopez, Nilamadhab Mishra, Matteo Pellegrini, Michael

Carey, Benjamin A Garcia, and Kathrin Plath. Proteomic and genomic approaches reveal critical

functions of H3K9 methylation and heterochromatin protein-1γ in reprogramming to pluripotency.

Nature cell biology, 15:872–882, July 2013.

[42] Ben Langmead and Steven L Salzberg. Fast gapped-read alignment with bowtie 2. Nature methods,

9:357–359, March 2012.

[43] Bo Li and Colin N Dewey. Rsem: accurate transcript quantification from rna-seq data with or without

a reference genome. BMC bioinformatics, 12:323, August 2011.

[44] Aaron R. Quinlan and Ira M. Hall. BEDTools: a flexible suite of utilities for comparing genomic

features. Bioinformatics, 26(6):841–842, 01 2010.

[45] Richard I. Sherwood, Tatsunori Hashimoto, Charles W. O’Donnell, Sophia Lewis, Amira A. Barkal,

John Peter van Hoff, Vivek Karun, Tommi Jaakkola, and David K. Gifford. Discovery of directional

and nondirectional pioneer transcription factors by modeling dnase profile magnitude and shape. Nat

Biotechnol, 32(2):171–178, Feb 2014.

[46] V Matys, E Fricke, R Geffers, E Gössling, M Haubrock, R Hehl, K Hornischer, D Karas, A E Kel,

O V Kel-Margoulis, D-U Kloos, S Land, B Lewicki-Potapov, H Michael, R Münch, I Reuter, S Rotert,

H Saxel, M Scheer, S Thiele, and E Wingender. Transfac: transcriptional regulation, from patterns to

profiles. Nucleic acids research, 31:374–378, January 2003.

[47] Albin Sandelin, Wynand Alkema, Pär Engström, Wyeth W Wasserman, and Boris Lenhard. Jaspar:

an open-access database for eukaryotic transcription factor binding profiles. Nucleic acids research,

32:D91–D94, January 2004.

[48] Michael F Berger, Anthony A Philippakis, Aaron M Qureshi, Fangxue S He, Preston W Estep, and

Martha L Bulyk. Compact, universal microarrays to comprehensively determine transcription-

factor binding site specificities. Nature biotechnology, 24:1429–1435, November 2006.

[49] Timothy Ravasi, Harukazu Suzuki, Carlo V. Cannistraci, Shintaro Katayama, Vladimir B. Bajic, Kai

Tan, Altuna Akalin, Sebastian Schmeier, Mutsumi Kanamori-Katayama, Nicolas Bertin, Piero Carn-

inci, Carsten O. Daub, Alistair R. R. Forrest, Julian Gough, Sean Grimmond, Jung-Hoon Han, Takehiro

Hashimoto, Winston Hide, Oliver Hofmann, Atanas Kamburov, Mandeep Kaur, Hideya Kawaji, Atsutaka

Kubosaki, Timo Lassmann, Erik van Nimwegen, Cameron R. MacPherson, Chihiro Ogawa, Aleksandar

Radovanovic, Ariel Schwartz, Rohan D. Teasdale, Jesper Tegnér, Boris Lenhard, Sarah A. Teichmann,

Takahiro Arakawa, Noriko Ninomiya, Kayoko Murakami, Michihira Tagami, Shiro Fukuda,

Kengo Imamura, Chikatoshi Kai, Ryoko Ishihara, Yayoi Kitazume, Jun Kawai, David A. Hume, Trey

Ideker, and Yoshihide Hayashizaki. An atlas of combinatorial transcriptional regulation in mouse and

man. Cell, 140(5):744–752, March 2010.

[50] Charles E. Grant, Timothy L. Bailey, and William Stafford Noble. Fimo: scanning for occurrences of

a given motif. Bioinformatics, 27(7):1017–1018, Apr 2011.

[51] Matthew T. Weirauch, Ally Yang, Mihai Albu, Atina G. Cote, Alejandro Montenegro-Montero, Philipp

Drewe, Hamed S. Najafabadi, Samuel A. Lambert, Ishminder Mann, Kate Cook, Hong Zheng, Ale-

jandra Goity, Harm van Bakel, Jean-Claude Lozano, Mary Galli, Mathew G. Lewsey, Eryong Huang,

Tuhin Mukherjee, Xiaoting Chen, John S. Reece-Hoyes, Sridhar Govindarajan, Gad Shaulsky, Albertha

J M. Walhout, François-Yves Bouget, Gunnar Ratsch, Luis F. Larrondo, Joseph R. Ecker, and

Timothy R. Hughes. Determination and inference of eukaryotic transcription factor sequence speci-

ficity. Cell, 158(6):1431–1443, Sep 2014.

[52] Jolanta Jura, Paulina Wegrzyn, Michal Korostynski, Krzysztof Guzik, Malgorzata Oczko-

Wojciechowska, Michal Jarzab, Malgorzata Kowalska, Marcin Piechota, Ryszard Przewlocki, and

Aleksander Koj. Identification of interleukin-1 and interleukin-6-responsive genes in human

monocyte-derived macrophages using microarrays. Biochimica et Biophysica Acta (BBA) - Gene

Regulatory Mechanisms, 1779(6):383–389, 2008.

Main Figures

[Figure 1 graphic. Inputs: motif network, cell lineage tree, and cell type-specific expression and chromatin marks per gene (RNA-seq, H3K27ac, H3K4me3, ATAC-seq). Output: Dynamic Regulatory Module Networks (DRMN), shown as expression state gene modules (high and low expression genes) with predicted cis-regulators, and dynamic transition matrices Π_{i→j} marking genes with increased or decreased expression state.]

Figure 1: Outline of the Dynamic Regulatory Module Networks (DRMN) method. Inputs are a lineage tree over the cell types, cell type-specific expression levels, a shared skeleton regulatory network (e.g., a sequence-specific motif network), and optionally cell type-specific features such as histone modification marks or chromatin accessibility signal. The output is a learned DRMN, which consists of cell type-specific expression state modules, their regulatory programs, and transition matrices describing dynamics between the cell types.

[Figure 2 graphic. Panel A(i)–(iv): scatter plots of predicted vs. actual expression for iPSC/ESC with Histone+Motif features, annotated with avgCC and allCC values. Remaining panels: correlation (Corr) vs. number of modules (3, 5, 7, 9, 11) for DRMN-ST and DRMN-Fused on the array and sequencing datasets, and DRMN-ST vs. DRMN-Fused correlation. Legend: cell lines IPSC/ESC, preIPS, MEF48, MEF; feature sets Histone, Motif, Q-Motif, Histone+Motif, Histone+Q-Motif, ATAC+Motif, Histone+ATAC+Motif, Histone+ATAC+Q-Motif.]

Figure 2: (A) Predicted vs. observed expression for iPSC/ESC, for (i) DRMN-ST on the array dataset, (ii) DRMN-ST on the sequencing dataset, (iii) DRMN-FUSED on the array dataset, and (iv) DRMN-FUSED on the sequencing dataset. Average per-module correlation for individual cell lines as a function of the number of modules, for (B) DRMN-ST on the array dataset, (C) DRMN-ST on the sequencing dataset, (D) DRMN-FUSED on the array dataset, and (E) DRMN-FUSED on the sequencing dataset. Average per-module correlation for individual cell lines for DRMN-ST vs. DRMN-FUSED on (F) the array dataset and (G) the sequencing dataset. Each shape corresponds to a cell line and each color corresponds to a different feature set.
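The two summary statistics used in this figure can be illustrated with a short sketch. It assumes that the correlation is Pearson correlation between observed and predicted expression, computed within each module and then averaged (labeled avgCC in the panels), alongside one correlation over all genes (allCC). The function name `per_module_correlation` and the toy values are hypothetical and are not taken from the DRMN implementation.

```python
import numpy as np

def per_module_correlation(actual, predicted, modules):
    """Return (average per-module Pearson correlation, correlation over all genes)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    modules = np.asarray(modules)
    per_module = []
    for m in np.unique(modules):
        mask = modules == m
        # Skip modules where Pearson correlation is undefined
        # (fewer than two genes, or constant values).
        if mask.sum() < 2 or np.std(actual[mask]) == 0 or np.std(predicted[mask]) == 0:
            continue
        per_module.append(np.corrcoef(actual[mask], predicted[mask])[0, 1])
    avg_cc = float(np.mean(per_module)) if per_module else float("nan")
    all_cc = float(np.corrcoef(actual, predicted)[0, 1])
    return avg_cc, all_cc

# Toy data (hypothetical): two modules with well-separated expression levels.
actual = [0, 1, 2, 10, 11, 12]
predicted = [0, 1, 2, 12, 11, 10]
modules = [0, 0, 0, 1, 1, 1]
avg_cc, all_cc = per_module_correlation(actual, predicted, modules)
```

In this toy case the overall correlation is high because the two modules differ strongly in mean expression, while the per-module average is near zero because the ordering of genes within the second module is reversed. This is one reason a per-module average can be a stricter measure than a single correlation over all genes.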

[Figure 3 graphic. Twelve panels of correlation (Corr) vs. number of modules (3, 5, 7, 9, 11). Rows: (A–C) DRMN-ST, sequencing; (D–F) DRMN-Fused, sequencing; (G–I) DRMN-ST, array; (J–L) DRMN-Fused, array. Columns: Histone, Histone+Motif, and Motif features. Legend: cell lines IPSC/ESC, preIPS, MEF48, MEF; methods DRMN and RMN.]

Figure 3: Average per-module correlation for individual cell lines as a function of the number of modules, for single-task and multi-task versions of the method, for (A–C) DRMN-ST on the sequencing dataset, (D–F) DRMN-FUSED on the sequencing dataset, (G–I) DRMN-ST on the array dataset, and (J–L) DRMN-FUSED on the array dataset. Each shape corresponds to a cell line and each color corresponds to a different method.

[Figure 4 graphic. Panel A: heatmaps of expression, log(TPM+1), for Modules 1–7 in MEF, MEF 48hr, pre-iPSC, and ESC, with gene counts along each heatmap. Panel B: regression coefficients of regulators (transcription factor motifs such as Pou5f1, Esrrb, Sox2, Tcf3, and Hnf4a, and histone marks H3K4me1, H3K4me2, H3K4me3, H3K9ac, H3K9me3, H3K27ac, H3K27me3, H3K36me3, and H3K79me2) per module across ESC, pre-iPSC, MEF 48hr, and MEF. Panel C: pairwise Jaccard similarity (scale 0.0 to 1.0) between MEF, MEF 48h, pre-iPSC, and ESC for Modules 1–7 and overall.]

Figure 4: Annotation of DRMN modules for models trained on chromatin+q-motif features with k = 7 modules on the sequencing dataset. (A) Gene expression pattern of the 7 modules. The number above the heatmap corresponds to the number of genes in that module. (B) Inferred regulatory program for different modules and cell lines. (C) Similarity of modules across cell lines.
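The module similarity in panel C can be sketched as follows, assuming it is the Jaccard index between the gene sets assigned to the same module in two cell lines, as the color-scale label in the panel suggests. The function name `module_jaccard`, the gene identifiers, and the assignments below are hypothetical.

```python
def module_jaccard(assign_a, assign_b, module):
    """Jaccard index between the gene sets of one module in two cell lines.

    assign_a, assign_b: dicts mapping gene name -> module id in each cell line.
    """
    genes_a = {g for g, m in assign_a.items() if m == module}
    genes_b = {g for g, m in assign_b.items() if m == module}
    union = genes_a | genes_b
    # Define the similarity as 0.0 when the module is empty in both cell lines.
    return len(genes_a & genes_b) / len(union) if union else 0.0

# Hypothetical module assignments for three genes in two cell lines.
mef = {"g1": 1, "g2": 1, "g3": 2}
esc = {"g1": 1, "g2": 2, "g3": 2}
similarity_module1 = module_jaccard(mef, esc, module=1)  # shared {g1} over {g1, g2}
```

A value near 1 means a module keeps almost the same genes in both cell lines, while a low value means many genes changed their module assignment between them.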

[Figure 5 graphic. Panel A: gene groups (labeled C83 through C229) with average expression (expr scale 0.0 to 8.0), module assignments in MEF, MEF 48h, pre-iPSC, and ESC, and group size (size scale 0 to 200). Panel B: TCF3 gene set (including Nodal, Lin28a, Mycn, Dnmt3l, Tfap2c, Esrp1, and Cldn4) with module assignments, expression, TF binding, and histone mark values for H3K27me3, H3K79me2, and Bcl6b across MEF, MEF 48h, pre-iPSC, and ESC.]

Figure 5: (A) Bird’s eye view of groups of transitioning genes. On the left is the average expression of the genes in each group, in the middle are the module assignments, and on the right is the number of genes in the group. (B) Module assignments, gene expression, presence or absence of the TCF3 motif, and feature values for H3K27me3, H3K79me2, and Bcl6b, for genes that transition from low to high expression between differentiated or partially reprogrammed cells and ESCs/iPSCs.
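The notion of a transitioning gene group can be sketched as below. This is a simplified stand-in, not the authors' procedure: it treats a gene as transitioning if its module assignment changes in at least one cell type, and groups genes that share an identical assignment profile, whereas the groups in panel A may come from a different clustering step. The function name `transitioning_gene_groups`, the `min_size` parameter, and the gene names are hypothetical.

```python
from collections import defaultdict

def transitioning_gene_groups(assignments, min_size=1):
    """Group genes by module-assignment profile, keeping only genes that change module.

    assignments: dict mapping gene name -> sequence of module ids, one per
    cell type, in a fixed cell type order.
    """
    groups = defaultdict(list)
    for gene, profile in assignments.items():
        profile = tuple(profile)
        if len(set(profile)) > 1:  # the module assignment changes at least once
            groups[profile].append(gene)
    # Drop groups smaller than min_size and sort gene names for stable output.
    return {p: sorted(g) for p, g in groups.items() if len(g) >= min_size}

# Hypothetical profiles over four cell types.
assignments = {
    "geneA": (1, 1, 2, 2),
    "geneB": (1, 1, 2, 2),
    "geneC": (3, 3, 3, 3),  # never changes module, so not transitioning
    "geneD": (2, 1, 1, 1),
}
groups = transitioning_gene_groups(assignments)
```

Grouping on the full profile keeps the direction and timing of each change visible, which is the information the middle columns of panel A display for each group.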

[Figure graphic, panels A and B: heatmaps over the time course 0h, 0.5h, 1h, 2h, 4h, 6h, 8h, 10h, 12h, 14h, 16h, 18h, 20h, 22h, 24h, and 36h, with rows labeled by motif identifiers such as M3298_1.02_Foxa1::Foxa3::Foxa2, M6284_1.02_Hnf4g::Hnf4a, M5697_1.02_Onecut3::Onecut2::Onecut1, and M5847_1.02_Sox10::Sox8::Sox9.]
M3385_1.02_Foxa1::Foxa3::Foxa2 M0916_1.02_Evx1::Evx2 M0726_1.02_ENSMUSG00000090020::Foxa1::Foxa3 3854 M5354_1.02_E2f2 M6338_1.02_Nr3c2::Nr3c1::Ar M1887_1.02_Mzf1 M0999_1.02_NP_083278.1::Nkx6-2::Nkx6-3 M1091_1.02_Evx1::Evx2 M3226_1.02_Mecom M4530_1.02_Fosb::Fosl2::Fosl1 M5853_1.02_Sp8::Sp9::Sp1 3227 M6415_1.02_Hoxd4::Hoxb3::Hoxd3 M4703_1.02_Nr2f6::Nr2f1::Nr2f2 M3913_1.02_Sox13::Sox6::Sox5 M6226_1.02_Etv1::Fev::Etv3 M0918_1.02_Evx1::Evx2 M5972_1.02_Zfp410 M5528_1.02_Hnf4g::Hnf4a M1991_1.02_Lmx1a::Lmx1b 2122 M6026_1.02_Hoxd10::Hoxd11::Hoxc11 M1083_1.02_Alx3::Alx1::Uncx M6384_1.02_Nr1h4::Nr1h5::Nr1h3 M6339_1.02_Mecp2 M5775_1.02_Rfx8::Rfx4::Rfx5 M5413_1.02_Esrrg::Esrra::Esrrb M2102_1.02_Foxj1::Foxj3::Foxk2 M6040_1.02_Klf12::Klf7::Klf6 1229 M5535_1.02_Hoxc13::Hoxa13::Hoxd13 M3401_1.02_Hmx1::Hmx3::Hmx2 M3414_1.02_Hnf4g::Hnf4a M1970_1.02_Nfic::Nfib::Nfia M6504_1.02_Tbx2::Tbx3 M4964_1.02_Foxa1::Foxa3::Foxa2 M0742_1.02_Foxa1::Foxa3::Foxa2 M5099_1.02_Smad5::Smad9::Smad1 M0710_1.02_Etv1::Fev::Etv2 M5629_1.02_Mga M5776_1.02_Rfx8::Rfx4::Rfx5 M6547_1.02_Zfa::Zfx::Zfy1 M6412_1.02_Pbx4::Pbx2::Pbx3 M4945_1.02_En1::En2 M4968_1.02_Tcf4::Tcf3::Tcf12 M4951_1.02_Etv1::Fev::Etv3 M5276_1.02_Phf21a::Setbp1::Ahc�1 M6070_1.02_Rara::Rarb::Rarg M0412_1.02_Zic4::Zic2::Zic3 M1459_1.02_Rorc::Rorb::Rora M5540_1.02_Hoxc13::Hoxa13::Hoxd13 M0902_1.02_Lhx9::Lhx2 M6304_1.02_Hoxc9::Hoxc8::Hoxb8 M5097_1.02_Klf12::Klf7::Klf6 4319 M5692_1.02_Olig2::Olig1::Bhlhe22 M1902_1.02_Prrx2::Arx::Alx1 M6370_1.02_N�b2 M4539_1.02_Etv1::Fev::Etv3 M5524_1.02_Hnf4g::Hnf4a M6502_1.02_Tbp::Tbpl2 M5035_1.02_Nr2e3 M3926_1.02_Sp8::Sp9::Sp1 3790 M6516_1.02_Tcf4::Tcf3::Tcf12 M5502_1.02_Gsx2::Hoxa4::Gsx1 M5922_1.02_Tcfap2a::Tcfap2c::Tcfap2b M6529_1.02_Ubp1::Tcfcp2l1::Tcfcp2 3300 M0936_1.02_Gsc::Gsc2 M5205_1.02_Six1::Six2 M0641_1.02_Dmrta2::Dmrt2::Dmrta1 M5592_1.02_Klf13::Klf11::Klf16 M5859_1.02_Spdef M6102_1.02_Tcfap2a::Tcfap2c::Tcfap2b M5919_1.02_Tcfap2a::Tcfap2c::Tcfap2b M0421_1.02_Klf12::Klf7::Klf6 2142 M5674_1.02_Nr2e1 
M2041_1.02_Irx5::Irx4::Irx6 M2041_1.02_Irx5::Irx4::Irx6 M6324_1.02_Klf12::Klf7::Klf6 M6502_1.02_Tbp::Tbpl2 M5597_1.02_Lhx9::Lhx2 M6504_1.02_Tbx2::Tbx3 M4598_1.02_E2f5::E2f4 1242 M5879_1.02_Tbx10::Tbx20::Tbx1 M5360_1.02_E2f5::E2f4 M0889_1.02_Lhx9::Lhx2 M4376_1.02_Foxj1::Foxj3::Foxk2 4 M4462_1.02_Etv1::Fev::Etv3 M5192_1.02_Six4::Six5 M0902_1.02_Lhx9::Lhx2 M2008_1.02_Six6::Six3 M3990_1.02_Stat5b::Stat5a M5787_1.02_Rorc::Rorb::Rora M5502_1.02_Gsx2::Hoxa4::Gsx1 M3675_1.02_Pou3f1::Pou2f2::Pou2f1 h1 .h0h 0.5h 1h 2h M4603_1.02_Mbtps2::Zfp42::Yy1 M4994_1.02_Hlx M1582_1.02_Hmg20b M0757_1.02_Foxa1::Foxa3::Foxa2 M6154_1.02_A�5::A�4 M1065_1.02_Noto M0718_1.02_Foxa1::Foxa3::Foxa2 M0737_1.02_Foxa1::Foxa3::Foxa2 M6062_1.02_Pknox2::Meis3::Pknox1 4264 M5308_1.02_Olig2::Olig1::Bhlhe22 M6069_1.02_Rara::Rarb::Rarg Module 4 M5827_1.02_Sox21::Sox2 M3298_1.02_Foxa1::Foxa3::Foxa2 3734 M3679_1.02_Pou3f1::Pou2f2::Pou2f1 M5892_1.02_Tbx21::Eomes::Tbr1 Module 5 3365 M2041_1.02_Irx5::Irx4::Irx6 M5357_1.02_E2f3:: M5768_1.02_Rara::Rarb::Rarg M4489_1.02_Spib::Spic::Sfpi1 2187 M6095_1.02_Sox3 M4744_1.02_Irx5::Irx4::Irx6 M1023_1.02_Obox6::ENSMUSG00000040953 M0630_1.02_Dmrt1 1243 M5357_1.02_E2f3::E2f2 M2008_1.02_Six6::Six3 M1907_1.02_Spib::Spic::Sfpi1 M5040_1.02_Pknox2::Meis3::Meis2 M2955_1.02_Zeb1 M2917_1.02_Ahr::Ahrr M0905_1.02_Hoxb5::Hoxb8::Hoxd8 M5359_1.02_E2f3::E2f2 3.5 M1003_1.02_Hoxc9::Hoxc8::Hoxb8 M2585_1.02_Rbpj::Rbpsuh-rs3 0 0.3 M6401_1.02_Crx::Otx1::Otx2 M1912_1.02_Rbpj::Rbpsuh-rs3 M6245_1.02_Foxo3::Foxo1::Foxo6 M5847_1.02_Sox10::Sox8::Sox9 4285 M6136_1.02_Bcl6 3742 M6156_1.02_Bach2::Bach1 3336 M6077_1.02_Rfx8::Rfx4::Rfx5 Regression coefficients M5465_1.02_Foxo3::Foxo1::Foxo6 Module 3 2118 M4555_1.02_Zfp110::Zfp369 M5521_1.02_Hnf1b::Hnf1a 1312 M0897_1.02_Hoxc13::Hoxa13::Hoxd13 M5686_1.02_Nr4a2::Nr4a3::Nr4a1 M5359_1.02_E2f3::E2f2 0h 0.5h 1h 2h 4h 6h 8h 10h 12h 14h 16h 18h 20h 22h 24h 36h M5361_1.02_E2f5::E2f4 M4013_1.02_A�1::Crem::Creb1 M6062_1.02_Pknox2::Meis3::Pknox1 
M5843_1.02_Sox10::Sox8::Sox9 M5006_1.02_Heyl::Hey2::Hey1 M5303_1.02_Bcl6b 3 M6426_1.02_Pou5f1::Pou3f1::Pou2f2 M5847_1.02_Sox10::Sox8::Sox9 4306 M3666_1.02_Rest M6164_1.02_Tbx19::T 3697 M6075_1.02_Rfx8::Rfx4::Rfx5 3342 M3401_1.02_Hmx1::Hmx3::Hmx2 M5523_1.02_Hnf1b::Hnf1a Module 1 2133 M5836_1.02_Sox7 M5820_1.02_Sox15 1315 M5490_1.02_Glis1 M6299_1.02_Hoxc8::Hoxb8::Hoxd8 M6083_1.02_Sox10::Sox8::Sox9 M5878_1.02_Tbx10::Tbx20::Tbx1 M3004_1.02_Tbx19::T M1946_1.02_Ovol1 M2047_1.02_Pax5 4405 M5852_1.02_Sox10::Sox8::Sox9 M0941_1.02_Pitx2::Pitx3::Pitx1 3560 3388 2.5 M1031_1.02_Gm4830 M2585_1.02_Rbpj::Rbpsuh-rs3 2169 M1912_1.02_Rbpj::Rbpsuh-rs3 M0718_1.02_Foxa1::Foxa3::Foxa2 1271 M0757_1.02_Foxa1::Foxa3::Foxa2 M3781_1.02_Pparg::Ppard::Ppara M4726_1.02_Tgif2::Tgif1 M2031_1.02_Pbx4::Pbx2::Pbx3 M6164_1.02_Tbx19::T Module 2 4338 3635 3362 C 0h 0.5h 1h 2h 4h 8h 10h 14h 16h 18h 20h 22h 24h 36h 0h 0.5h 1h 2h 4h 8h 10h 14h 16h 18h 20h 22h 24h 36h 0h 0.5h 1h 2h 4h 8h 10h 14h 16h 18h 20h 22h 24h 36h 6h 12h 2166 2 6h 12h 6h 12h 1292 0h 1.0 0.5h 4344 1h 3684 3330 2h 2142 1293 4h

1.5 6h 4 2 0 h6 4h 6h 8h 10h 12h 14h 4470 8h 3681 3266 10h 2163 12h 1213 14h Jaccard 1 16h 4498 18h 3665 3375 20h 2087 22h 1168 24h 4405 0.5 36h 0.5 3768 3359 Module 1 Module 2 Module 3 2093 1168 0h 0.5h 4397 1h 3755 3381 0 2h 2092 4h 2092 6h 8h 4381 10h 3746 3410 12h 2091 1165 14h 16h 4604 18h 3522 3435 20h 2067 22h 1165 24h

36h 6 4 2 0 8 16h 18h 20h 22h 24h 36h Module 4 Module 5 Overall

Figure 6: Annotation of DRMN modules for the dedifferentiation dataset with k = 5 modules. (A) Gene expression patterns of the 5 modules. The number above each heatmap is the number of genes in that module. (B) Inferred regulatory programs for the different modules and time points. (C) Similarity of modules across time points.
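The module similarity in panel C is shown on a Jaccard scale. As a minimal sketch of how such a score could be computed, assuming each time point provides a gene-to-module assignment (the function names, gene names, and toy assignments below are hypothetical and not taken from the DRMN software):

```python
def jaccard(a, b):
    """Jaccard index of two collections: |A & B| / |A | B| (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def module_similarity(assign_t1, assign_t2, module):
    """Jaccard similarity of the gene sets assigned to `module` at two time points.

    assign_t1, assign_t2: dicts mapping gene name -> module id (hypothetical layout).
    """
    genes_t1 = {g for g, m in assign_t1.items() if m == module}
    genes_t2 = {g for g, m in assign_t2.items() if m == module}
    return jaccard(genes_t1, genes_t2)


# Toy assignments (illustrative only, not data from the paper).
assign_0h = {"g1": 1, "g2": 1, "g3": 2, "g4": 2}
assign_36h = {"g1": 1, "g2": 2, "g3": 2, "g4": 2}

sim_m1 = module_similarity(assign_0h, assign_36h, 1)  # {g1, g2} vs {g1}
sim_m2 = module_similarity(assign_0h, assign_36h, 2)  # {g3, g4} vs {g2, g3, g4}
```

A score of 1 means the module contains exactly the same genes at both time points, and a score of 0 means the two gene sets are disjoint.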

[Figure 7 image. Panel A: transitioning gene sets (rows, labeled C###) with module assignments at each time point from 0h to 36h and a Size column. Panel B: expression (Exp) and associated regulators from 0h to 36h for Genesets 544, 439, 420, 390, and 443.]

Figure 7: (A) Bird’s eye view of groups of transitioning genes. On the left is the average expression of the genes in each group, in the middle are the module assignments, and on the right is the number of genes in the group. (B) Regulators associated with some of the transitioning gene sets.
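To make the notion of a transitioning gene set concrete, the sketch below groups genes whose module assignment changes at least once across the time course. It is a simplification under stated assumptions: genes are grouped only when their assignment profiles are identical, which may differ from the grouping procedure used to produce the figure, and the function name, `min_size` parameter, and toy profiles are hypothetical.

```python
from collections import defaultdict


def transitioning_groups(profiles, min_size=1):
    """Group genes that change module at least once, keyed by assignment profile.

    profiles: dict mapping gene name -> sequence of module ids, one per time point.
    min_size: smallest group to report (hypothetical filter, not from the paper).
    """
    groups = defaultdict(list)
    for gene, profile in profiles.items():
        if len(set(profile)) > 1:  # assignment changes somewhere along the time course
            groups[tuple(profile)].append(gene)
    return {p: sorted(genes) for p, genes in groups.items() if len(genes) >= min_size}


# Toy profiles over four time points (illustrative only).
profiles = {
    "gA": [1, 1, 2, 2],
    "gB": [1, 1, 2, 2],
    "gC": [3, 3, 3, 3],  # never changes module, so it is not transitioning
    "gD": [2, 1, 1, 1],
}

all_groups = transitioning_groups(profiles)
large_groups = transitioning_groups(profiles, min_size=2)
```

Each key of the result is a module-assignment profile of the kind shown in the middle of panel A, and the length of each value corresponds to the group size shown on the right.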

Supplementary Figures

[Figure S1 image. Panels A-C, sequencing dataset. Legend: cell types IPSC/ESC, preIPS, MEF48, MEF; feature sets Motif, Histone, ATAC, Q-Motif, Histone+Motif, ATAC+Motif, Histone+ATAC+Motif. Panels A and B: Corr (y-axis, 0 to 1.0) vs. #Modules (x-axis; 3, 5, 7, 9, 11). Panel C: DRMN-ST (x-axis) vs. DRMN-Fused (y-axis).]

Figure S1: Performance of ATAC as a single feature (green) vs. Motif alone (blue), Histone+Motif (magenta), Q-Motif (light blue), ATAC+Motif (yellow), Histone alone (red), and Histone+ATAC+Motif (dark purple) using (A) DRMN-ST and (B) DRMN-Fused. (C) Performance of DRMN-ST vs. DRMN-Fused for the same features.

[Figure S2 image. Legend: cell types IPSC/ESC, preIPS, MEF48, MEF; models DRMN-ST, DRMN-Fused. Panels: (A) Array, Histone; (B) Array, Histone+Motif; (C) Array, Motif; (D) Sequencing, Histone; (E) Sequencing, Histone+Motif; (F) Sequencing, Motif; (G) Sequencing, ATAC+Motif; (H) Sequencing, Histone+ATAC+Motif; (I) Sequencing, Q-Motif. Y-axis: average regulators per module; x-axis: #Modules (3, 5, 7, 9, 11).]

Figure S2: Comparison of the average number of regulators per module between the DRMN-ST and DRMN-Fused models for different dataset/feature combinations. (A-C) Feature sets in the array dataset. (D-I) Feature sets in the sequencing dataset.
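One way to read the quantity on the y-axis is as the number of features with a nonzero regression coefficient in each module, averaged over modules. The sketch below follows that reading; it is an assumption about how the count is defined, and the function name, tolerance parameter, and toy coefficients are hypothetical.

```python
def avg_regulators_per_module(coefficients, tol=0.0):
    """Average number of regulators with a nonzero coefficient per module.

    coefficients: dict mapping module id -> dict of regulator name -> coefficient.
    tol: absolute value at or below which a coefficient counts as zero (assumed).
    """
    counts = [
        sum(1 for w in regs.values() if abs(w) > tol)
        for regs in coefficients.values()
    ]
    return sum(counts) / len(counts)


# Toy coefficients for two modules (illustrative only).
toy_coefficients = {
    1: {"Sox2": 0.3, "Klf4": 0.0, "H3K4me3": 0.2},  # 2 selected regulators
    2: {"Sox2": 0.0, "Klf4": 0.1, "H3K4me3": 0.0},  # 1 selected regulator
}

avg = avg_regulators_per_module(toy_coefficients)
```

Under this definition, a sparser regulatory program gives a lower value, which is what allows the ST and Fused models to be compared on model complexity.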

[Figure S3 image. Legend: algorithms DRMN-ST, DRMN-Fused, RMN-ST, RMN-Fused, GMM (Indep), GMM (Merged); cell types IPSC/ESC, preIPS, MEF48, MEF. Panels: (A-C) Sequencing and (D-F) Array, y-axis Overall Corr; (G-I) Sequencing and (J-L) Array, y-axis Avg Per Module Corr. Within each row the panels are Histone, Histone+Motif, and Motif. X-axis: #Modules (3, 5, 7, 9, 11).]

Figure S3: Comparing DRMN’s expression predictions to baseline approaches. Each panel compares six algorithms (legend) on the basis of one correlation metric (y-axis) across a range of k (x-axis). Each dot represents the result for one cell type, averaged over three folds of cross-validation. The columns show results for chromatin mark features (left), the combination of chromatin mark and motif features (middle), and motif features (right). (A-F) Comparison based on the Pearson correlation of predicted to true expression for held-aside genes. (G-L) Comparison of per-module correlation coefficients, averaged across the k modules.
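The two metrics in this caption can be sketched as follows: an overall Pearson correlation over all held-aside genes, and a per-module Pearson correlation averaged over modules. This is a minimal illustration under assumptions: the function names and toy values are hypothetical, and modules with zero variance are simply skipped, which may not match the authors' handling.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def per_module_average(true, pred, modules):
    """Average of within-module Pearson correlations, skipping undefined ones."""
    by_module = {}
    for t, p, m in zip(true, pred, modules):
        xs, ys = by_module.setdefault(m, ([], []))
        xs.append(t)
        ys.append(p)
    rs = [pearson(xs, ys) for xs, ys in by_module.values()]
    rs = [r for r in rs if not math.isnan(r)]
    return sum(rs) / len(rs) if rs else float("nan")


# Toy held-aside genes (illustrative only): module 2 is predicted in reverse order.
true_expr = [1, 2, 3, 4, 5, 6]
pred_expr = [1, 2, 3, 6, 5, 4]
module_of = [1, 1, 1, 2, 2, 2]

overall = pearson(true_expr, pred_expr)
per_module = per_module_average(true_expr, pred_expr, module_of)
```

In this toy example the overall correlation stays fairly high because the predictions capture the difference in mean expression between the two modules, while the per-module average is zero. This illustrates why the two metrics can diverge and why reporting both can be informative.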

[Figure S4 image. Panel A: log(Expression+1) heatmaps for Modules 1-7 in MEF, pre-iPSC, and iPSC. Panel B: regression coefficients of regulators (histone marks and motifs) for each module and cell type. Panel C: pairwise module similarity between cell types (scale 0.0 to 1.0), with values listed as MEF vs. pre-iPSC / MEF vs. iPSC / pre-iPSC vs. iPSC:
Overall: 0.78 / 0.71 / 0.85
Module 1: 0.92 / 0.88 / 0.92
Module 2: 0.88 / 0.77 / 0.83
Module 3: 0.81 / 0.62 / 0.69
Module 4: 0.78 / 0.57 / 0.64
Module 5: 0.77 / 0.58 / 0.67
Module 6: 0.83 / 0.68 / 0.77
Module 7: 0.96 / 0.88 / 0.91]

Figure S4: Annotation of DRMN modules for models trained on chromatin+motif features with k = 7 modules on the array dataset. (A) Gene expression patterns of the 7 modules. The number above each heatmap is the number of genes in that module. (B) Inferred regulatory programs for the different modules and cell lines. (C) Similarity of modules across cell lines.

[Figure S5 image. Panel A: transitioning gene sets (rows, labeled C###) with module assignments in iPSC, pre-iPSC, and MEF and a Size column; color scales for expression and set size. Panel B: module assignments, expression values, and TCF3 motif presence in iPSC, pre-iPSC, and MEF for individual genes, including Mael, Dazl, Ddx4, Hormad1, Sycp3, Prdm14, Dppa2, Lin28, Esrrb, and Nanog.]

Figure S5: (A) Bird’s eye view of groups of transitioning genes. On the left is the average expression of the genes in each group, in the middle are the module assignments, and on the right is the number of genes in the group. (B) Module assignments (left), gene expression (middle), and presence or absence of the TCF3 motif (right) for genes that transition from low to high expression between differentiated or partially reprogrammed cells and ESCs/iPSCs.

[Figure image: values for each process term at the 16 time points, in four panels listed left to right. Columns: 0h 0.5h 1h 2h 4h 6h 8h 10h 12h 14h 16h 18h 20h 22h 24h 36h]

serine_family_amino_acid_metabolic_process 4.906 4.896 4.896 4.840 4.838 3.133 3.127 2.610 2.562 2.561 0.000 0.000 0.000 0.000 0.000 0.000
positive_regulation_of_cellular_amine_metabolic_process 2.888 2.879 2.879 2.857 2.859 2.747 2.744 2.827 1.834 1.832 0.000 0.000 0.000 0.000 0.000 0.000
cysteine_metabolic_process 3.684 3.677 3.677 3.652 3.651 2.322 2.324 1.407 1.392 1.392 0.000 0.000 0.000 0.000 0.000 0.000
glycerolipid_metabolic_process 3.603 3.583 3.583 3.503 3.503 2.205 2.199 2.125 2.035 2.029 1.394 0.000 0.000 0.000 0.000 0.000
glutamate_metabolic_process 3.601 3.592 3.592 3.564 3.565 2.602 2.601 2.694 2.654 2.650 2.090 0.000 0.000 0.000 0.000 0.000
fat-soluble_vitamin_metabolic_process 3.384 3.371 3.371 3.338 3.339 3.111 3.104 2.679 2.621 2.617 2.306 0.000 0.000 0.000 0.000 0.000
glutamine_family_amino_acid_metabolic_process 3.867 3.851 3.851 3.816 3.812 2.993 2.991 2.610 2.555 2.553 2.285 1.472 1.472 1.472 0.000 0.000
nitrogen_cycle_metabolic_process 3.372 3.365 3.365 3.338 3.339 2.261 2.264 2.333 2.303 2.304 2.439 1.629 1.629 1.629 0.000 0.000
urea_metabolic_process 3.372 3.365 3.365 3.338 3.339 2.261 2.264 2.333 2.303 2.304 2.439 1.629 1.629 1.629 0.000 0.000
arginine_metabolic_process 2.389 2.382 2.382 2.366 2.366 2.261 2.264 2.333 2.303 2.304 2.439 1.629 1.629 1.629 0.000 0.000
xenobiotic_metabolic_process 2.478 2.470 2.470 2.437 2.438 1.743 1.740 1.356 1.788 1.788 1.459 1.514 1.514 1.514 0.000 0.000
negative_regulation_of_steroid_metabolic_process 1.847 1.841 1.841 1.827 1.827 1.738 1.736 1.804 1.775 1.777 1.912 1.957 1.957 1.957 1.959 1.959
dATP_metabolic_process 1.423 1.421 1.421 1.410 1.409 1.358 1.354 1.407 1.392 1.392 1.459 1.497 1.497 1.497 1.490 1.490
NADH_metabolic_process 1.557 1.554 1.554 1.537 1.535 1.466 1.466 1.523 1.503 1.504 1.587 1.629 1.629 1.629 1.628 1.628

succinate_metabolic_process 3.372 3.365 3.365 3.338 3.339 3.194 3.189 3.284 3.262 3.257 2.439 2.466 2.466 2.466 2.471 2.471
L-phenylalanine_metabolic_process 3.235 3.229 3.229 3.210 3.210 3.096 3.092 3.171 3.156 3.158 3.257 2.214 2.214 2.214 2.214 2.214
cellular_amide_metabolic_process 4.048 4.039 4.039 4.015 4.010 2.937 2.939 3.036 3.007 3.007 3.160 2.412 2.412 2.412 1.691 1.691
vitamin_metabolic_process 4.368 4.352 4.352 4.299 4.292 3.990 3.984 3.596 3.542 3.544 3.311 2.020 2.020 2.020 2.027 2.027
isoprenoid_metabolic_process 3.785 3.772 3.772 3.725 3.722 3.990 3.984 3.596 3.542 3.544 2.828 2.020 2.020 2.020 2.027 2.027
diterpenoid_metabolic_process 3.684 3.673 3.673 3.635 3.635 4.114 4.107 3.535 3.497 3.501 3.056 1.963 1.963 1.963 1.968 1.968
retinoid_metabolic_process 3.684 3.673 3.673 3.635 3.635 4.114 4.107 3.535 3.497 3.501 3.056 1.963 1.963 1.963 1.968 1.968
terpenoid_metabolic_process 3.724 3.713 3.713 3.669 3.667 4.108 4.100 3.572 3.530 3.532 3.131 2.100 2.100 2.100 2.101 2.101
sulfur_amino_acid_metabolic_process 4.354 4.342 4.342 4.299 4.295 3.286 3.277 2.688 1.992 1.989 1.545 1.603 1.603 1.603 1.604 1.604
triglyceride_metabolic_process 8.426 8.404 8.404 8.344 8.337 6.401 6.392 5.843 5.741 5.723 4.662 4.162 4.162 4.162 4.168 4.168
acylglycerol_metabolic_process 8.117 8.095 8.095 8.043 8.036 6.215 6.197 5.742 5.635 5.622 4.106 3.684 3.684 3.684 3.687 3.687
neutral_lipid_metabolic_process 7.793 7.771 7.771 7.711 7.703 5.948 5.930 5.488 5.374 5.368 3.926 3.508 3.508 3.508 3.512 3.512
acetyl-CoA_metabolic_process 7.283 7.257 7.257 7.193 7.187 6.130 6.113 6.323 6.219 6.213 5.059 3.850 3.850 3.850 3.859 3.859
cellular_aldehyde_metabolic_process 6.386 6.370 6.370 6.323 6.330 7.016 7.006 6.270 6.191 6.173 3.991 3.370 3.370 3.370 3.375 3.375

negative_regulation_of_phosphorus_metabolic_process 0.000 0.000 1.375 1.574 0.000 1.862 1.653 1.330 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
negative_regulation_of_phosphate_metabolic_process 0.000 0.000 1.375 1.574 0.000 1.862 1.653 1.330 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
DNA_metabolic_process 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.604 1.629 0.000 0.000 1.402 1.408 0.000 0.000
sphingoid_metabolic_process 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.336 0.000 0.000 1.338 1.363 1.331 0.000 0.000
heme_metabolic_process 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.423 1.323 1.808 1.788 0.000 0.000
glycerophospholipid_metabolic_process 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.589 1.641 1.820 1.769 1.461
tetrapyrrole_metabolic_process 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.553 1.963 1.940 1.526 1.479
porphyrin-containing_compound_metabolic_process 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.553 1.963 1.940 1.526 1.479
mannose_metabolic_process 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 2.196 2.101 2.113 2.109 2.128 2.088
sphingomyelin_metabolic_process 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.786 1.785 1.814 1.744 1.800 1.786 1.769 1.732
organophosphate_metabolic_process 1.579 1.551 1.537 1.359 1.680 2.507 2.475 2.133 2.242 2.516 3.724 4.482 4.300 4.489 4.062 3.928
phospholipid_metabolic_process 0.000 0.000 0.000 0.000 1.337 2.588 2.559 2.437 2.516 2.600 3.643 3.964 3.757 3.975 3.839 3.711
regulation_of_macromolecule_metabolic_process 0.000 0.000 0.000 0.000 1.724 2.592 2.661 2.920 2.843 2.619 2.259 2.428 3.090 3.034 3.195 2.973
regulation_of_RNA_metabolic_process 0.000 0.000 0.000 0.000 1.857 1.923 2.260 2.304 2.653 2.468 2.122 1.872 2.346 2.361 2.464 2.246

pons_development 0.000 1.404 1.422 1.408 1.467 1.475 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
striated_muscle_cell_development 2.147 1.967 1.510 2.096 2.228 1.940 1.311 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
skin_development 1.812 1.482 0.000 1.553 1.670 1.673 1.750 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
muscle_cell_development 1.436 1.433 0.000 1.542 1.682 1.904 1.361 0.000 0.000 0.000 0.000 0.000 0.000 1.314 1.337 0.000
thymus_development 0.000 0.000 0.000 0.000 1.408 0.000 1.477 0.000 1.488 1.488 0.000 1.521 1.466 1.490 0.000 1.338
positive_regulation_of_cGMP_metabolic_process 0.000 0.000 0.000 0.000 0.000 0.000 1.372 0.000 0.000 1.371 1.402 0.000 1.377 1.392 1.423 1.531
cochlea_development 0.000 0.000 0.000 0.000 0.000 0.000 1.329 0.000 1.329 1.324 0.000 0.000 1.324 1.337 1.366 1.504
genitalia_development 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.378 1.400 1.843 2.040
regulation_of_developmental_growth 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.598 1.892 1.906 1.749
regulation_of_muscle_organ_development 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.673 1.976 1.738 1.355
regulation_of_striated_muscle_tissue_development 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.673 1.976 1.738 1.355
positive_regulation_of_cell_development 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.418 0.000 1.618 1.861 1.679 1.370
embryo_development 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.603 0.000 1.478 1.821 1.734 2.006
respiratory_system_development 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.498 0.000 1.517 1.565 1.753 0.000

regula�on_of_glucose_metabolic_process 1.556 1.548 1.548 1.514 1.511 1.359 1.354 1.461 1.420 1.417 1.609 1.709 1.709 1.709 1.715 1.715 regula�on_of_lipid_metabolic_process 5.370 5.342 5.342 5.655 5.646 4.341 4.325 5.022 4.864 4.858 4.243 3.750 3.750 3.750 3.401 3.401 regula�on_of_nitrogen_compound_metabolic_process 0.000 0.000 0.000 0.000 1.721 1.984 2.243 2.336 2.427 2.117 1.718 1.504 1.916 1.928 2.109 1.870 trachea_development 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.402 1.392 0.000 1.392 1.423 0.000

superoxide_metabolic_process 1.777 1.772 1.772 1.752 1.751 1.642 1.641 1.721 1.684 1.687 1.327 1.380 1.380 1.380 1.376 1.376 glutathione_metabolic_process 5.298 5.283 5.283 5.243 5.235 4.961 4.950 4.393 5.031 5.033 4.583 4.004 4.004 4.004 4.010 4.010 regula�on_of_primary_metabolic_process 0.000 0.000 0.000 0.000 1.417 2.227 2.184 2.443 2.334 2.256 1.665 1.713 2.098 2.056 2.353 2.182 regula�on_of_nucleo�de_metabolic_process 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.646 1.673 1.458 1.473 0.000

glycerol_ether_metabolic_process 2.161 2.154 2.154 2.129 2.128 1.386 1.382 1.450 1.427 1.425 1.530 1.578 1.578 1.578 1.579 1.579 pep�de_metabolic_process 5.461 5.448 5.448 5.389 5.383 5.693 5.681 4.636 5.141 5.142 3.754 3.354 3.354 3.354 3.363 3.363 regula�on_of_metabolic_process 0.000 0.000 0.000 0.000 1.486 1.945 2.015 2.046 2.049 1.922 1.515 1.703 2.368 2.294 2.535 2.402 regula�on_of_purine_nucleo�de_metabolic_process 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.345 1.765 1.804 1.576 1.594 0.000

hormone_metabolic_process 1.356 1.647 1.647 1.940 1.938 2.426 2.418 2.591 2.514 2.512 1.427 0.000 0.000 0.000 0.000 0.000 regula�on_of_fa�y_acid_metabolic_process 4.586 4.573 4.573 4.514 4.509 4.218 4.206 4.393 4.324 4.316 3.428 2.510 2.510 2.510 2.050 2.050 phospha�dylinositol_metabolic_process 0.000 0.000 0.000 0.000 0.000 1.532 1.520 1.705 0.000 1.511 1.364 1.703 1.760 1.960 1.942 1.870 hemopoie�c_or_lymphoid_organ_development 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.623 1.653 1.729 1.664 1.443 1.613

vitamin_A_metabolic_process 2.814 2.809 2.809 2.783 2.783 2.617 2.611 2.115 2.076 2.072 1.674 0.000 0.000 0.000 0.000 0.000 regula�on_of_cellular_ketone_metabolic_process 5.142 5.122 5.122 5.049 5.046 4.706 4.689 4.890 4.780 4.779 3.613 2.842 2.842 2.842 2.412 2.412 nega�ve_regula�on_of_protein_metabolic_process 2.221 2.356 2.487 2.752 2.340 2.770 2.560 1.905 0.000 1.359 1.449 1.650 1.735 1.648 1.996 1.870 immune_system_development 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.625 1.541 1.594 1.553 1.440 1.644

very_long-chain_fa�y_acid_metabolic_process 2.502 2.493 2.493 2.465 2.466 2.331 2.331 2.427 2.385 2.384 1.394 0.000 0.000 0.000 0.000 0.000 sulfur_compound_metabolic_process 5.939 5.912 5.912 5.820 5.808 4.900 4.884 3.844 3.739 3.736 2.630 2.111 2.111 2.111 2.118 2.118 nega�ve_regula�on_of_cellular_protein_metabolic_process 2.059 2.162 2.307 2.827 2.609 3.247 3.011 2.314 1.732 1.813 1.865 1.958 2.043 1.960 2.353 2.246 kidney_vasculature_development 1.317 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.914 1.952 1.953 1.955 2.847 2.873 2.113

aroma�c_amino_acid_family_metabolic_process 2.020 2.013 2.013 1.994 1.993 1.889 1.889 1.970 1.933 1.930 2.090 0.000 0.000 0.000 0.000 0.000 organic_substance_metabolic_process 3.851 3.816 3.816 3.903 3.889 5.135 5.094 5.171 5.663 5.641 5.781 5.895 5.895 5.895 5.640 5.640 tRNA_metabolic_process 3.553 3.910 4.259 4.387 4.563 2.869 2.857 3.722 2.833 2.887 2.783 1.905 1.760 1.712 1.661 1.582 glomerulus_vasculature_development 1.317 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.914 1.952 1.953 1.955 2.847 2.873 2.113

regula�on_of_triglyceride_metabolic_process 2.095 2.088 2.088 2.060 2.059 1.938 1.938 2.033 1.992 1.989 1.545 0.000 0.000 0.000 0.000 0.000 carbohydrate_deriva�ve_metabolic_process 4.451 4.412 4.412 4.493 4.478 5.880 5.838 5.902 6.440 6.426 6.550 6.644 6.644 6.644 6.385 6.385 ncRNA_metabolic_process 4.171 4.393 4.916 4.500 6.048 4.543 4.507 5.478 4.744 4.929 3.866 2.851 2.662 2.574 2.518 2.394 renal_system_vasculature_development 1.317 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.914 1.952 1.953 1.955 2.847 2.873 2.113

phospha�dylcholine_metabolic_process 2.209 2.201 2.201 2.170 2.170 2.051 2.049 2.142 2.103 2.101 1.639 0.000 0.000 0.000 0.000 0.000 rRNA_metabolic_process 4.839 4.821 4.821 4.741 4.737 5.891 5.868 6.165 6.027 6.002 6.575 6.285 6.285 6.285 6.310 6.310 phosphorus_metabolic_process 6.152 6.363 6.676 7.420 6.689 5.329 5.245 5.359 3.997 3.011 3.015 2.709 2.344 2.344 2.532 2.332 muscle_�ssue_development 1.520 1.400 0.000 0.000 1.576 1.735 0.000 0.000 1.469 1.566 1.603 1.313 2.076 2.336 2.756 2.564

cellular_carbohydrate_metabolic_process 2.211 2.198 2.198 2.132 2.128 1.593 1.587 1.324 1.461 1.459 1.344 0.000 0.000 0.000 0.000 0.000 heterocycle_metabolic_process 9.274 9.208 9.208 9.315 9.635 9.610 9.870 11.043 11.339 11.339 12.118 11.497 11.497 11.497 11.157 11.157 phosphate-containing_compound_metabolic_process 6.152 6.363 6.676 7.420 6.689 5.329 5.245 5.359 3.997 3.011 3.015 2.709 2.344 2.344 2.532 2.332 diges�ve_tract_development 0.000 0.000 1.436 0.000 0.000 1.851 1.408 0.000 1.973 1.415 2.283 1.713 2.200 2.570 1.962 2.618

glycerol_metabolic_process 2.161 2.154 2.154 2.129 2.128 1.386 1.382 1.450 1.427 1.425 0.000 0.000 0.000 0.000 0.000 0.000 protein_metabolic_process 6.974 7.208 7.208 7.137 7.098 8.637 8.530 8.480 9.243 9.371 11.428 12.476 12.476 12.476 12.645 12.645 mRNA_metabolic_process 7.412 7.310 7.184 7.540 8.732 8.845 8.767 8.047 7.680 7.981 7.203 6.643 7.083 6.920 7.636 7.391 diges�ve_system_development 0.000 0.000 1.438 0.000 0.000 1.854 1.436 0.000 1.982 1.446 1.980 1.731 2.190 2.545 1.962 2.621

cellular_amine_metabolic_process 2.101 2.090 2.090 2.047 2.044 1.845 1.837 1.978 1.915 1.911 0.000 0.000 0.000 0.000 0.000 0.000 purine_nucleo�de_metabolic_process 4.129 4.097 4.097 4.278 4.267 5.693 5.662 6.865 7.357 7.341 7.674 7.901 7.901 7.901 7.950 7.950 nucleobase-containing_compound_metabolic_process 7.760 7.728 7.275 7.794 10.551 10.399 10.916 10.567 11.434 11.874 9.905 10.168 10.524 10.995 10.726 10.528 cell_migra�on_involved_in_heart_development 0.000 0.000 0.000 0.000 1.467 1.475 1.517 1.476 1.503 0.000 1.552 1.541 1.542 2.551 2.581 2.685

cellular_biogenic_amine_metabolic_process 2.101 2.090 2.090 2.047 2.044 1.845 1.837 1.978 1.915 1.911 0.000 0.000 0.000 0.000 0.000 0.000 purine-containing_compound_metabolic_process 5.178 5.141 5.141 5.301 5.287 6.747 6.717 7.651 8.073 8.052 8.138 8.059 8.059 8.059 8.112 8.112 cellular_nitrogen_compound_metabolic_process 6.880 6.976 6.676 7.420 9.928 10.608 11.107 10.567 11.054 11.208 10.393 11.240 11.709 12.219 11.775 11.529 metanephric_nephron_development 0.000 0.000 0.000 0.000 1.315 1.321 1.378 1.377 1.373 1.375 1.410 1.402 1.377 1.905 1.931 1.560

glucuronate_metabolic_process 1.697 1.693 1.693 1.678 1.679 1.615 1.616 1.671 1.647 1.649 0.000 0.000 0.000 0.000 0.000 0.000 cellular_protein_metabolic_process 4.730 4.957 4.957 4.940 4.915 6.213 6.124 6.314 7.158 7.321 8.863 9.822 9.822 9.822 9.895 9.895 nitrogen_compound_metabolic_process 6.980 7.059 6.983 7.746 10.291 10.834 11.324 10.708 11.006 11.193 10.700 11.762 12.287 12.759 12.191 12.018 coronary_vasculature_development 0.000 0.000 0.000 0.000 0.000 1.569 1.632 1.629 1.645 1.643 1.311 1.682 2.056 2.083 2.106 2.319

uronic_acid_metabolic_process 1.697 1.693 1.693 1.678 1.679 1.615 1.616 1.671 1.647 1.649 0.000 0.000 0.000 0.000 0.000 0.000 nucleoside_triphosphate_metabolic_process 5.602 5.565 5.565 5.437 5.427 7.261 7.232 8.525 9.131 9.115 9.372 9.967 9.967 9.967 10.020 10.020 nucleic_acid_metabolic_process 8.544 8.565 8.455 9.382 12.266 12.277 13.009 12.927 14.226 14.502 11.829 11.240 12.043 12.460 12.440 11.886 glomerulus_development 1.418 0.000 0.000 0.000 1.310 1.319 0.000 1.394 1.375 1.375 1.410 1.402 1.361 1.764 1.788 1.585

posi�ve_regula�on_of_fa�y_acid_metabolic_process 1.693 1.687 1.687 1.664 1.663 1.578 1.575 1.638 1.619 1.619 0.000 0.000 0.000 0.000 0.000 0.000 purine_nucleoside_triphosphate_metabolic_process 5.981 5.948 5.948 5.820 5.808 7.685 7.643 8.992 9.612 9.595 9.817 10.405 10.405 10.405 10.459 10.459 cellular_protein_metabolic_process 11.858 12.114 12.578 13.925 13.161 12.800 13.158 14.320 13.552 12.691 13.005 11.762 10.921 10.738 10.587 10.094 proteoglycan_metabolic_process 0.000 1.412 1.450 1.479 0.000 1.575 0.000 1.675 1.669 1.659 2.024 2.444 1.979 2.013 2.036 2.283

succinyl-CoA_metabolic_process 1.651 1.647 1.647 1.631 1.632 1.586 1.587 1.629 1.619 1.619 0.000 0.000 0.000 0.000 0.000 0.000 purine_ribonucleoside_triphosphate_metabolic_process 5.945 5.912 5.912 5.781 5.769 7.682 7.641 8.992 9.612 9.595 9.813 10.399 10.399 10.399 10.451 10.451 metabolic_process 9.709 9.984 10.019 10.664 12.096 12.506 13.009 12.052 13.194 12.341 13.626 17.694 17.385 17.758 16.979 17.180 metanephric_glomerulus_development 1.543 1.404 1.422 1.408 1.467 1.475 1.517 1.476 1.503 1.520 1.552 1.541 1.542 1.565 1.594 1.679

posi�ve_regula�on_of_triglyceride_metabolic_process 1.598 1.594 1.594 1.571 1.570 1.492 1.493 1.552 1.533 1.532 0.000 0.000 0.000 0.000 0.000 0.000 ribonucleoside_triphosphate_metabolic_process 5.808 5.775 5.775 5.653 5.641 7.525 7.492 8.813 9.454 9.425 9.649 10.226 10.226 10.226 10.279 10.279 primary_metabolic_process 10.243 10.472 10.499 11.513 13.161 12.800 13.412 13.159 14.226 14.017 14.249 17.694 16.931 17.166 16.482 16.599 regula�on_of_cGMP_metabolic_process 1.411 1.679 1.703 0.000 0.000 0.000 1.869 1.377 1.373 1.864 1.907 1.904 1.882 2.497 2.520 2.705

apocarotenoid_metabolic_process 1.557 1.554 1.554 1.537 1.535 1.466 1.466 1.523 1.503 1.504 0.000 0.000 0.000 0.000 0.000 0.000 purine_ribonucleo�de_metabolic_process 5.981 5.948 5.948 5.820 5.808 7.616 7.582 8.908 9.515 9.485 9.762 9.983 9.983 9.983 10.029 10.029 cellular_metabolic_process 12.198 12.559 12.895 13.673 14.949 14.901 15.511 15.193 16.051 15.079 15.317 17.820 17.563 18.077 17.497 17.304 dorsal_spinal_cord_development 0.000 1.462 1.486 1.480 1.561 1.558 2.318 2.305 2.303 2.291 2.336 2.359 2.348 3.299 3.333 3.513

re�nal_metabolic_process 1.557 1.554 1.554 1.537 1.535 1.466 1.466 1.523 1.503 1.504 0.000 0.000 0.000 0.000 0.000 0.000 ribonucleo�de_metabolic_process 6.155 6.125 6.125 6.006 5.990 7.740 7.696 9.008 9.612 9.595 9.903 10.120 10.120 10.120 10.175 10.175 regula�on_of_embryonic_development 1.829 1.714 1.769 1.543 1.944 1.673 2.038 0.000 1.781 1.765 1.805 1.803 2.303 2.685 2.706 2.720

ethanolamine-containing_compound_metabolic_process 1.554 1.548 1.548 1.522 1.520 1.414 1.411 1.491 1.458 1.458 0.000 0.000 0.000 0.000 0.000 0.000 nucleoside_phosphate_metabolic_process 6.074 6.026 6.026 6.166 6.458 7.400 7.355 8.360 8.713 8.710 9.372 9.822 9.822 9.822 9.503 9.503 muscle_organ_development 1.767 1.368 1.438 1.519 2.231 1.855 1.632 1.483 1.714 1.512 2.441 1.721 2.450 2.306 2.520 2.705

estrogen_metabolic_process 0.000 1.594 1.594 1.571 1.570 2.172 2.174 2.251 2.216 2.219 0.000 0.000 0.000 0.000 0.000 0.000 nucleo�de_metabolic_process 6.074 6.026 6.026 6.166 6.458 7.400 7.355 8.360 8.713 8.710 9.372 9.822 9.822 9.822 9.503 9.503 0h 0.5h 1h 2h 4h 6h 8h 10h 12h 14h 16h 18h 20h 22h 24h 36h eye_development 2.248 1.880 1.498 1.591 1.616 2.317 2.074 2.144 1.991 1.917 2.124 2.376 2.497 2.561 2.790 3.059 metabolic_process 22.328 22.312 22.160 19.731 20.317 20.632 19.855 23.219 21.640 22.116 19.488 16.291 16.980 16.880 16.684 16.499 regula�on_of_cellular_amine_metabolic_process 1.894 1.889 1.889 1.871 1.871 1.765 1.763 1.842 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 nucleobase-containing_small_molecule_metabolic_process 6.180 6.135 6.135 6.267 6.547 7.671 7.616 8.364 8.711 8.699 9.108 9.217 9.217 9.217 8.914 8.914 nephron_development 1.812 1.482 1.521 1.553 1.979 1.982 1.750 2.151 2.488 2.448 2.105 2.135 2.050 2.878 2.896 2.797 primary_metabolic_process 14.635 14.512 14.415 12.174 12.132 12.807 12.360 13.066 12.387 13.190 11.355 9.727 10.440 10.402 10.440 10.805 S-adenosylhomocysteine_metabolic_process 1.557 1.554 1.554 1.537 1.535 1.466 1.466 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 hydrogen_peroxide_metabolic_process 7.658 7.642 7.642 7.592 7.587 7.326 7.317 7.584 7.490 7.489 7.826 8.028 8.028 8.028 8.051 8.051 muscle_structure_development 1.812 1.631 1.863 2.176 2.595 2.596 2.594 1.897 2.549 1.942 2.666 1.898 3.133 3.401 3.415 3.513 small_molecule_metabolic_process 15.261 15.246 15.307 14.333 13.659 11.765 11.340 13.001 11.871 10.939 8.957 8.482 8.209 8.185 8.597 7.319 ethanol_metabolic_process 1.651 1.647 1.647 1.631 1.632 1.586 1.587 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 fa�y_acid_metabolic_process 9.453 9.410 9.410 9.747 9.732 7.167 7.137 5.438 5.255 4.858 4.068 1.702 1.702 1.702 1.715 1.715 appendage_development 2.037 1.746 1.605 2.126 2.281 2.303 2.695 1.601 2.775 1.930 2.744 2.239 2.585 2.925 3.562 2.864 
carboxylic_acid_metabolic_process 11.171 11.054 10.735 10.383 10.274 7.647 7.414 10.200 8.657 8.141 7.880 7.731 7.383 7.368 7.668 6.101 S-adenosylmethionine_metabolic_process 1.697 1.693 1.693 1.678 1.679 1.615 1.616 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 cellular_lipid_metabolic_process 11.281 11.210 11.210 11.343 11.337 7.951 7.894 6.509 6.207 5.898 4.553 2.204 2.204 2.204 2.219 2.219 limb_development 2.037 1.746 1.605 2.126 2.281 2.303 2.695 1.601 2.775 1.930 2.744 2.239 2.585 2.925 3.562 2.864 oxoacid_metabolic_process 11.171 11.054 10.735 10.383 10.274 7.647 7.414 10.200 8.657 8.141 7.880 7.731 7.383 7.368 7.668 6.101 alditol_metabolic_process 2.461 2.456 2.456 2.432 2.429 1.653 1.651 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 alcohol_metabolic_process 10.576 10.533 10.533 10.358 10.344 8.506 8.496 7.159 6.960 6.948 4.648 3.165 3.165 3.165 3.179 3.179 inner_ear_development 1.767 2.207 2.287 2.115 2.259 2.283 2.133 1.385 2.756 2.972 2.173 2.474 3.184 3.565 3.246 2.557 cellular_ketone_metabolic_process 11.969 11.847 11.511 11.139 11.000 7.815 7.532 10.317 8.805 8.311 7.991 8.156 7.794 7.771 8.098 6.517 fa�y-acyl-CoA_metabolic_process 1.423 1.421 1.421 1.410 1.409 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 cellular_modified_amino_acid_metabolic_process 9.838 9.806 9.806 9.689 9.679 8.429 8.420 6.451 6.895 6.886 5.731 3.943 3.943 3.943 3.492 3.492 ear_development 1.738 2.067 2.156 2.009 2.164 1.940 1.632 1.394 2.631 2.818 2.337 2.384 3.282 3.318 3.044 2.496 organic_acid_metabolic_process 11.948 11.827 11.493 11.123 10.990 8.505 8.255 11.139 9.491 8.742 8.678 8.115 7.767 7.752 8.056 6.467 polyol_metabolic_process 1.475 1.470 1.470 1.444 1.442 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 coenzyme_metabolic_process 10.678 10.636 10.636 10.466 11.077 8.132 8.097 8.133 7.946 7.928 6.781 4.472 4.472 4.472 4.489 4.489 epithelium_development 1.501 1.505 1.754 1.568 2.320 2.942 2.380 1.502 1.656 1.902 
2.325 2.394 3.017 3.108 3.990 4.114 mRNA_metabolic_process 10.373 10.286 10.050 9.377 8.817 7.974 7.740 9.045 9.529 9.817 9.377 10.339 10.367 10.353 10.355 10.691 acyl-CoA_metabolic_process 1.995 1.988 1.988 1.959 1.957 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 cofactor_metabolic_process 11.222 11.174 11.174 11.002 11.503 8.150 8.539 8.214 8.027 8.007 7.078 4.934 4.934 4.934 4.955 4.955 cardiac_chamber_development 2.187 1.712 1.535 1.590 1.729 2.535 2.661 1.673 2.401 2.046 1.887 1.893 3.232 3.266 2.954 2.762 cellular_macromolecule_metabolic_process 4.319 4.317 4.263 3.242 3.408 3.964 3.792 3.397 3.235 4.156 4.256 4.684 5.546 5.512 5.275 6.043 thioester_metabolic_process 1.995 1.988 1.988 1.959 1.957 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 cholesterol_metabolic_process 8.824 8.796 8.796 8.694 8.685 7.496 7.472 7.237 7.104 7.096 5.731 5.383 5.383 5.383 5.400 5.400 vasculature_development 1.923 1.932 1.690 1.380 1.824 3.035 2.266 1.650 1.609 1.985 2.336 2.546 3.899 3.602 2.877 2.811 monocarboxylic_acid_metabolic_process 4.458 4.402 4.263 4.116 4.065 3.490 3.374 4.424 3.276 3.229 3.288 4.084 3.836 3.828 3.800 2.740 taurine_metabolic_process 2.135 2.130 2.130 2.114 2.112 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 sterol_metabolic_process 9.995 9.965 9.965 9.841 9.831 8.542 8.525 7.704 7.563 7.559 6.256 5.935 5.935 5.935 5.941 5.941 blood_vessel_development 1.802 1.621 1.392 0.000 1.599 2.942 2.143 1.527 1.503 2.148 2.213 2.427 3.825 3.711 2.793 2.831 nitrogen_compound_metabolic_process 5.313 5.366 5.477 4.387 3.803 4.144 3.822 3.556 3.596 3.848 2.905 2.770 2.534 2.523 2.513 2.720 homocysteine_metabolic_process 2.135 2.130 2.130 2.114 2.112 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 dicarboxylic_acid_metabolic_process 8.836 8.815 8.815 8.737 8.730 10.003 9.981 8.541 8.485 8.493 5.661 5.077 5.077 5.077 5.089 5.089 cardiovascular_system_development 2.135 1.775 1.372 0.000 1.599 
2.509 1.985 1.394 1.371 1.622 1.966 2.135 3.378 3.320 2.888 2.801 cellular_nitrogen_compound_metabolic_process 5.994 6.075 6.186 5.071 4.419 4.775 4.374 4.092 4.216 4.486 3.390 3.224 3.095 3.070 3.050 3.275 glycine_metabolic_process 2.161 2.154 2.154 2.129 2.128 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 reac�ve_oxygen_species_metabolic_process 10.035 10.009 10.009 9.903 9.895 9.346 9.322 8.192 8.075 8.775 7.849 7.360 7.360 7.360 7.383 7.383 circulatory_system_development 2.135 1.775 1.372 0.000 1.599 2.509 1.985 1.394 1.371 1.622 1.966 2.135 3.378 3.320 2.888 2.801 coenzyme_metabolic_process 7.106 7.050 6.922 7.159 6.399 5.714 5.571 5.287 5.632 5.019 4.566 4.258 3.938 3.932 3.631 3.417 AMP_metabolic_process 0.000 0.000 0.000 0.000 0.000 1.331 1.328 1.387 1.371 1.370 1.459 1.497 1.497 1.497 1.490 1.490 cellular_amino_acid_metabolic_process 12.655 12.621 12.621 12.400 12.382 12.233 12.181 9.837 10.484 10.467 8.132 5.736 5.736 5.736 5.380 5.380 mesenchymal_cell_development 2.811 2.255 2.030 0.000 1.946 1.686 2.043 2.489 2.080 2.020 2.715 2.419 3.301 2.983 2.998 3.091 cofactor_metabolic_process 8.565 8.499 8.318 9.028 8.107 7.064 6.520 6.640 7.042 6.410 4.515 3.413 2.915 2.909 2.630 2.502 mRNA_metabolic_process 0.000 0.000 0.000 0.000 0.000 1.526 1.515 1.546 1.450 1.447 2.097 2.122 2.122 2.122 2.137 2.137 steroid_metabolic_process 12.166 12.719 12.719 13.206 13.190 10.361 10.321 10.249 10.043 10.029 7.483 5.940 5.940 5.940 5.958 5.958 connec�ve_�ssue_development 2.983 2.564 2.368 2.492 1.913 1.700 1.807 1.510 2.080 1.819 3.488 3.189 3.282 3.318 3.336 3.312 heterocycle_metabolic_process 6.399 6.518 6.747 6.444 5.717 4.600 4.206 3.810 3.994 3.425 2.399 0.000 0.000 0.000 0.000 0.000 regula�on_of_cellular_protein_metabolic_process 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.683 1.637 1.745 1.892 2.162 2.162 2.162 2.189 2.189 lipid_metabolic_process 13.845 14.165 14.165 14.623 14.592 9.917 9.870 8.407 8.075 7.795 5.187 2.979 2.979 2.979 3.018 3.018 
developmental_matura�on 2.150 2.744 2.577 2.180 1.905 1.705 2.492 1.557 3.118 3.329 4.059 3.125 4.111 3.819 4.539 3.910 rRNA_metabolic_process 2.364 2.341 2.313 2.237 2.081 1.944 1.897 1.762 1.747 2.109 3.033 3.763 3.781 3.776 3.758 3.865 nega�ve_regula�on_of_protein_metabolic_process 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.542 1.619 1.619 1.679 1.914 1.914 1.914 1.930 1.930 monocarboxylic_acid_metabolic_process 14.680 14.617 14.617 14.954 14.933 12.049 11.995 9.805 9.555 9.157 5.781 2.727 2.727 2.727 2.496 2.496 posi�ve_regula�on_of_purine_nucleo�de_metabolic_process 2.285 3.083 2.685 3.216 2.084 2.101 3.471 1.576 2.219 2.982 4.610 3.549 5.036 3.913 3.960 4.416 nucleobase-containing_compound_metabolic_process 3.543 3.597 3.721 2.997 2.528 3.050 2.812 2.644 2.685 2.858 2.412 2.679 2.611 2.589 2.512 2.756 nega�ve_regula�on_of_cellular_protein_metabolic_process 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.351 1.348 1.353 1.534 1.534 1.534 1.545 1.545 primary_metabolic_process 16.600 16.995 16.995 16.616 16.747 17.248 16.917 15.924 16.353 16.244 17.395 16.053 16.053 16.053 15.657 15.657 posi�ve_regula�on_of_nucleo�de_metabolic_process 2.285 3.083 2.685 3.216 2.084 2.101 3.471 1.576 2.219 2.982 4.610 3.549 5.036 3.913 3.960 4.416 fa�y_acid_metabolic_process 3.465 3.427 3.325 3.182 3.210 2.465 2.378 3.222 2.454 2.303 1.384 2.074 1.857 1.852 1.862 1.548 fructose_1,6-bisphosphate_metabolic_process 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.392 1.392 1.459 1.497 1.497 1.497 1.490 1.490 cellular_ketone_metabolic_process 31.228 31.098 31.098 31.128 31.101 27.336 27.220 22.329 22.673 22.117 16.008 9.183 9.183 9.183 8.590 8.590 posi�ve_regula�on_of_cyclic_nucleo�de_metabolic_process 1.931 2.705 2.326 2.841 1.780 1.775 3.062 0.000 1.882 2.608 4.166 3.121 4.589 3.527 3.549 3.953 cellular_amino_acid_metabolic_process 3.694 3.652 3.538 3.140 3.115 1.739 1.668 2.897 2.449 2.310 2.885 1.963 1.970 1.967 2.163 1.519 NAD_metabolic_process 0.000 0.000 0.000 0.000 1.313 
0.000 0.000 0.000 0.000 0.000 1.394 1.451 1.451 1.451 1.447 1.447 organic_acid_metabolic_process 31.481 31.352 31.352 31.378 31.336 27.211 27.096 22.144 22.492 21.937 15.348 9.326 9.326 9.326 8.739 8.739 car�lage_development 3.615 3.277 2.994 3.125 2.327 2.343 2.464 2.053 2.831 2.451 4.704 4.330 4.462 4.484 4.530 4.353 cellular_amine_metabolic_process 2.925 2.897 2.832 2.380 2.595 2.053 1.978 1.876 2.204 2.278 2.198 1.761 1.758 1.755 1.757 1.519 cellular_nitrogen_compound_metabolic_process 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.356 1.516 1.516 1.516 1.422 1.422 oxoacid_metabolic_process 31.481 31.352 31.352 31.378 31.336 27.605 27.490 22.503 22.868 22.304 16.061 9.822 9.822 9.822 9.193 9.193 regula�on_of_nervous_system_development 3.335 3.884 4.108 3.811 2.993 2.490 2.684 1.523 2.856 3.237 4.115 3.361 4.896 4.956 5.465 4.098 cellular_biogenic_amine_metabolic_process 2.925 2.897 2.832 2.380 2.595 2.053 1.978 1.876 2.204 2.278 2.198 1.761 1.758 1.755 1.757 1.519 nitrogen_compound_metabolic_process 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.476 1.529 1.529 1.529 1.437 1.437 carboxylic_acid_metabolic_process 31.481 31.352 31.352 31.378 31.336 27.605 27.490 22.503 22.868 22.304 16.061 9.822 9.822 9.822 9.193 9.193 regula�on_of_cell_development 3.233 3.417 3.447 3.043 2.528 2.399 2.749 1.465 2.784 3.467 3.361 2.755 5.026 5.101 5.341 3.814 purine_ribonucleoside_metabolic_process 2.582 2.569 2.752 2.641 2.432 1.981 1.912 2.191 1.725 1.837 1.718 1.810 1.807 1.802 1.993 1.868 post-embryonic_camera-type_eye_development 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.328 1.328 1.328 1.321 1.321 cellular_metabolic_process 26.140 26.580 26.580 26.421 26.540 27.237 27.115 26.107 26.590 26.748 27.406 24.509 24.509 24.509 24.031 24.031 cGMP_metabolic_process 2.448 3.402 2.740 3.496 2.897 4.424 5.434 2.402 2.974 5.393 6.528 6.525 4.463 5.389 4.522 4.831 purine_nucleoside_metabolic_process 2.459 2.427 2.599 2.481 2.306 
1.851 1.786 2.046 1.624 1.729 1.606 1.708 1.692 1.688 1.883 1.760 fructose_6-phosphate_metabolic_process 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.328 1.328 1.328 1.321 1.321 genera�on_of_precursor_metabolites_and_energy 31.481 31.352 31.352 31.128 31.101 30.758 30.679 30.919 31.236 31.208 28.275 27.721 27.721 27.721 27.803 27.803 mesenchyme_development 4.500 4.044 3.781 2.028 3.384 2.738 3.531 3.742 3.625 3.541 4.771 4.813 5.802 6.308 6.340 5.741 ribonucleoside_metabolic_process 2.207 2.175 2.353 2.249 2.053 2.053 1.977 2.267 1.809 1.914 1.798 1.878 1.876 1.871 2.099 1.971 tetrahydrobiopterin_metabolic_process 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.328 1.328 1.328 1.321 1.321 small_molecule_metabolic_process 45.942 45.709 45.709 45.865 46.331 41.184 40.974 36.285 37.403 36.845 30.139 23.180 23.180 23.180 22.094 22.094 metanephros_development 3.368 3.380 3.435 3.517 3.637 3.669 4.312 3.834 4.405 4.930 4.997 5.019 5.486 5.508 5.539 5.373 nucleobase-containing_small_molecule_metabolic_process 3.465 3.597 3.838 3.618 3.237 2.305 2.182 1.958 1.959 1.704 0.000 0.000 0.000 0.000 0.000 0.000 regula�on_of_metabolic_process 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.328 1.328 1.328 1.314 1.314 metabolic_process 44.291 44.822 44.822 44.545 44.722 43.358 43.108 38.821 39.195 39.360 38.078 34.760 34.760 34.760 34.316 34.316 embryonic_organ_development 3.561 4.044 4.480 3.985 4.033 3.785 3.067 2.443 5.342 3.772 4.358 4.690 4.186 4.771 4.019 4.301 lipid_metabolic_process 3.045 3.137 2.973 2.638 2.534 2.458 2.304 2.738 2.158 2.071 0.000 0.000 0.000 0.000 0.000 0.000 water-soluble_vitamin_metabolic_process 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.319 1.319 1.319 1.313 1.313 posi�ve_regula�on_of_developmental_process 4.014 3.452 3.868 3.405 3.640 3.163 3.286 2.614 4.459 5.100 5.622 5.116 7.212 8.072 9.088 7.836 nico�namide_nucleo�de_metabolic_process 3.543 3.533 3.462 3.932 
3.237 4.592 4.504 3.222 3.187 2.300 1.434 0.000 0.000 0.000 0.000 0.000 cellular_macromolecule_metabolic_process 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 2.076 2.635 2.635 2.635 2.629 2.629 regula�on_of_developmental_process 5.048 4.336 4.773 4.474 4.214 3.937 3.270 3.061 4.220 4.756 5.871 4.593 6.541 7.422 8.063 7.075 pyridine_nucleo�de_metabolic_process 3.148 3.134 3.078 3.485 2.862 4.108 4.037 2.846 2.863 2.016 0.000 0.000 0.000 0.000 0.000 0.000 macromolecule_metabolic_process 0.000 0.000 0.000 0.000 0.000 1.851 1.783 2.090 2.174 2.232 4.065 4.934 4.934 4.934 4.937 4.937 regula�on_of_cyclic_nucleo�de_metabolic_process 4.481 5.309 4.886 5.034 3.248 3.244 4.801 3.084 3.436 4.817 6.589 6.006 8.245 7.004 7.034 6.485 pyridine-containing_compound_metabolic_process 3.148 3.134 3.078 3.485 2.862 4.108 4.037 2.846 2.863 2.016 0.000 0.000 0.000 0.000 0.000 0.000 guanosine-containing_compound_metabolic_process 0.000 0.000 0.000 0.000 0.000 2.407 2.396 2.921 3.507 3.508 3.978 4.228 4.228 4.228 4.241 4.241 sensory_organ_development 5.066 5.028 4.504 4.840 4.968 5.481 4.685 4.520 5.512 5.495 5.348 5.655 6.997 6.782 7.095 7.160 oxidoreduc�on_coenzyme_metabolic_process 4.074 4.047 3.973 4.394 3.694 3.962 3.870 2.745 2.767 2.001 0.000 0.000 0.000 0.000 0.000 0.000 GTP_metabolic_process 0.000 0.000 0.000 0.000 0.000 2.483 2.471 3.013 3.620 3.614 4.089 4.334 4.334 4.334 4.346 4.346 regula�on_of_mul�cellular_organismal_development 7.516 6.784 7.067 6.446 7.038 6.427 5.994 5.232 6.949 7.227 8.063 7.060 9.874 10.359 11.699 10.474 cellular_aroma�c_compound_metabolic_process 4.109 4.070 3.973 3.481 3.258 0.000 0.000 0.000 0.000 0.000 1.346 0.000 0.000 0.000 0.000 0.000 ncRNA_metabolic_process 0.000 0.000 0.000 0.000 0.000 1.971 1.962 2.187 2.592 2.587 3.084 3.041 3.041 3.041 3.063 3.063 nervous_system_development 8.344 7.583 7.405 5.482 6.424 7.012 7.138 3.709 7.514 7.053 7.281 7.416 9.606 10.092 10.676 10.545 guanosine-containing_compound_metabolic_process 1.989 
[Figure S6 heatmap: enrichment (-log10 q-value, color scale 0.0 to 10.0) of GO process terms in Modules 0 to 4 across the time points 0h, 0.5h, 1h, 2h, 4h, 6h, 8h, 10h, 12h, 14h, 16h, 18h, 20h, 22h, 24h and 36h. Rows are GO terms, including developmental process terms and metabolic process terms.]

Figure S6: Enriched GO terms in the dedifferentiation dataset that show significant changes in enrichment between earlier and later time points.

[Figure S7 plots. A) Expression prediction: average correlation versus iteration (1 to 30), one series each for ESC, PIPS, MEF48 and MEF. B) Similarity to next iteration: average Jaccard index versus iteration (1 to 29). C) Similarity to first iteration: average Jaccard index versus iteration (2 to 30).]

Figure S7: The effect of the number of iterations on DRMN performance (Motif+Chromatin features, k=3, fused LASSO with ρ1 = 100, ρ2 = 50, ρ3 = 0). A) Average correlation (over all modules) for each cell line as a function of the number of iterations. Each marker corresponds to a cell line. B) Similarity of module assignments between consecutive iterations. C) Similarity of module assignments between iteration 1 and iteration i.
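The similarity measure in panels B and C can be illustrated with a small sketch. It assumes that module assignments are stored as one integer label per gene, that a Jaccard index is computed per module between the gene sets of two iterations, and that the per-module values are then averaged. The function names and this averaging scheme are illustrative assumptions and are not taken from the DRMN implementation.

```python
def jaccard(a, b):
    """Jaccard index of two sets; taken to be 1.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def average_module_jaccard(assign_a, assign_b, k):
    """Average per-module Jaccard index between two module assignments.

    assign_a, assign_b: dicts mapping gene -> module id in range(k).
    Module ids are assumed to be comparable across iterations
    (hypothetical convention for this sketch).
    """
    scores = []
    for m in range(k):
        genes_a = {g for g, mod in assign_a.items() if mod == m}
        genes_b = {g for g, mod in assign_b.items() if mod == m}
        scores.append(jaccard(genes_a, genes_b))
    return sum(scores) / k


# Toy assignments for two iterations (made-up genes and labels).
iter1 = {"g1": 0, "g2": 0, "g3": 1, "g4": 2}
iter2 = {"g1": 0, "g2": 1, "g3": 1, "g4": 2}
print(average_module_jaccard(iter1, iter2, k=3))
```

Under these assumptions, panel B would compare iteration i with iteration i+1, and panel C would compare iteration 1 with iteration i, averaging the result over cell lines.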

[Figure S8 plots. Fused LASSO, k=5. Panels: ATAC+Motif+Chromatin, Motif+Chromatin, ATAC+Motif, Q-Motif, Motif. y-axis: average correlation. x-axis: ρ1 (70 to 130), with ρ2 (0 to 50) nested within each value of ρ1. Colors: ρ3 (0, 10, 20, 30, 40, 50).]

Figure S8: The effect of the group penalty on DRMN performance. Each panel corresponds to one of the sequencing feature sets. In each panel, different colors correspond to different values of ρ3, the group LASSO penalty (enforcing selection of the same features for all cell lines).
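As a rough illustration of why a group penalty favors selecting the same features in all cell lines, the sketch below evaluates a standard group LASSO term: for each feature, the L2 norm of its coefficients across cell lines, summed over features and scaled by ρ3. This standard form is an assumption made for illustration; the exact DRMN objective is the one defined in the Methods, and the function name and toy weights are hypothetical.

```python
import math


def group_penalty(weights, rho3):
    """Standard group LASSO term (assumed form, for illustration only).

    weights[c][f] is the coefficient of feature f in cell line c.
    Each feature forms one group spanning all cell lines.
    """
    n_features = len(weights[0])
    total = 0.0
    for f in range(n_features):
        # L2 norm of feature f's coefficients across cell lines.
        total += math.sqrt(sum(row[f] ** 2 for row in weights))
    return rho3 * total


# Two cell lines, two features (toy numbers).
shared = [[3.0, 0.0],
          [4.0, 0.0]]   # both cell lines use feature 0
split = [[3.0, 0.0],
         [0.0, 4.0]]    # each cell line uses a different feature

print(group_penalty(shared, rho3=10))  # 10 * (5 + 0)
print(group_penalty(split, rho3=10))   # 10 * (3 + 4)
```

With equal coefficient magnitudes, the configuration in which both cell lines use the same feature incurs the smaller penalty, which is the behavior the caption attributes to ρ3.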

[Figure S9 plots. Fused LASSO, k=5. Rows: ATAC+Motif+Chromatin, Motif+Chromatin, ATAC+Motif, Q-Motif, Motif. Left column y-axis: average correlation. Right column y-axis: average feature size. x-axis (both columns): ρ1 (0.5 to 130). Colors: ρ2 (0, 10, 20, 30, 40, 50).]

Figure S9: The effect of the fused penalty on DRMN performance and on the number of selected features. Each panel corresponds to one of the sequencing feature sets. In each panel, different colors correspond to different values of ρ2, the fused LASSO penalty (enforcing similarity of the features selected for closer cell lines). The first column shows the average correlation of predicted expression, and the second column shows the average number of selected features (averaged over cell lines and modules).
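The two quantities discussed in this caption can be sketched as follows. The fused term is written in its standard form, the sum of absolute coefficient differences between cell lines that are adjacent on the lineage, scaled by ρ2. The feature count is the number of nonzero coefficients averaged over (cell line, module) pairs. Both the form of the penalty and all names below (cell line labels `c1` to `c3`, function names, the tolerance) are assumptions for illustration, not the DRMN code.

```python
def fused_penalty(weights, edges, rho2):
    """Standard fused LASSO term over lineage edges (assumed form).

    weights: dict mapping cell line -> list of coefficients.
    edges: list of (cell_line_a, cell_line_b) pairs adjacent on the lineage.
    """
    total = 0.0
    for a, b in edges:
        total += sum(abs(x - y) for x, y in zip(weights[a], weights[b]))
    return rho2 * total


def average_selected_features(models, tol=1e-12):
    """Mean number of nonzero coefficients over (cell line, module) models."""
    counts = [sum(1 for w in coefs if abs(w) > tol)
              for coefs in models.values()]
    return sum(counts) / len(counts)


# Hypothetical three-stage lineage c1 -> c2 -> c3 with three features.
weights = {
    "c1": [1.0, 0.0, 2.0],
    "c2": [1.0, 0.5, 2.0],
    "c3": [0.0, 0.5, 2.0],
}
edges = [("c1", "c2"), ("c2", "c3")]
print(fused_penalty(weights, edges, rho2=50))  # 50 * (0.5 + 1.0)

# Toy per-(cell line, module) coefficient vectors.
models = {("c1", 0): [1.0, 0.0, 2.0], ("c3", 0): [0.0, 0.5, 0.0]}
print(average_selected_features(models))
```

In this form, a larger ρ2 makes coefficient differences between neighboring cell lines more costly, which pulls their selected features toward each other.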
