bioRxiv preprint doi: https://doi.org/10.1101/082065; this version posted October 20, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.
Transcriptional landscape of cell lines and their tissues of origin
Camila M. Lopes-Ramos1,2,†, Joseph N. Paulson1,2,†, Cho-Yi Chen1,2, Marieke L. Kuijjer1,2, Maud
Fagny1,2, John Platig1,2, Abhijeet R. Sonawane3, Dawn L. DeMeo3,4, John Quackenbush1,2,3,5*,
Kimberly Glass1,2,3*
1Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA
2Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
3Channing Division of Network Medicine, Brigham and Women’s Hospital, and Harvard Medical School,
Boston, MA, USA
4Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital, Boston, MA, USA
5Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
*To whom correspondence should be addressed
†These authors contributed equally to this work
Contact Information
1 bioRxiv preprint doi: https://doi.org/10.1101/082065; this version posted October 20, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.
Summary
Cell lines are an indispensable tool in biomedical research and often used as surrogates for
tissues. An important question is how well a cell line’s transcriptional and regulatory processes
reflect those of its tissue of origin. We analyzed RNA-Seq data from GTEx for 127 paired
Epstein-Barr virus transformed lymphoblastoid cell lines and whole blood samples; and 244
paired fibroblast cell lines and skin biopsies. A combination of gene expression and network
analyses shows that while cell lines carry the expression signatures of their primary tissues,
albeit at reduced levels, they also exhibit changes in their patterns of transcription factor
regulation. Cell cycle genes are over-expressed in cell lines compared to primary tissue, and they
have a reduction of repressive transcription factor targeting. Our results provide insight into the
expression and regulatory alterations observed in cell lines and suggest that these changes should
be considered when using cell lines as models.
Keywords: Regulatory networks, Transcriptome, GTEx, lymphoblastoid cell lines, fibroblast
cell lines
Highlights
• Cell lines differ from their source tissues in gene expression and regulation
• Distinct cell lines share altered patterns of cell cycle regulation
• Cell cycle genes are less strongly targeted by repressive TFs in cell lines
• Cell lines share expression with their source tissue, but at reduced levels
2 bioRxiv preprint doi: https://doi.org/10.1101/082065; this version posted October 20, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.
Introduction
Cell lines are an essential tool in cellular and molecular biology, providing a lasting
resource, ideally matching a particular genotype and phenotype in a controllable and
reproducible setting. Cell lines have accelerated investigation of many biological processes,
benefiting drug development and biomarker identification (Hu et al., 2006; Tan et al., 2011).
Despite their merits as an experimental system, cell lines do not capture tissue complexity and
heterogeneity, mainly because they consist of a single cell type that is adapted to grow in culture
and lacks interactions with other cell types, the extracellular matrix, or paracrine signaling. There
are many other factors that can influence cell line properties, including the methods that are used
to establish and maintain them. Normal cells in culture divide a limited number of times and
reach a state of nonproliferation called senescence; these are known as finite cell lines (Stacey,
2006). However, some cells acquire the ability to proliferate indefinitely in a process called
transformation, which can be spontaneous or induced, and these are known as continuous cell
lines.
Normal human fibroblasts are a common type of finite cell line that are easily isolated
and grown in culture, but almost never spontaneously immortalize (Sherr and DePinho, 2000;
Wright and Shay, 2000). Because normal fibroblasts show no genetic alterations in oncogenes
and tumor suppressors, they are extensively used as model systems. For example, they have been
used to better understand the mechanisms associated with the development of diabetic
nephropathy, and to identify patients at risk of developing diabetic renal disease (Millioni et al.,
2012).
3 bioRxiv preprint doi: https://doi.org/10.1101/082065; this version posted October 20, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.
Epstein-Barr virus (EBV) transformed lymphoblastoid cell lines (LCLs) are among the
most widely created, archived, and analyzed continuous cell lines. LCLs have been extensively
genotyped and sequenced as part of large collaborative projects, such as the International
HapMap (International HapMap Consortium, 2003), 1000 Genomes (McVean et al., 2012),
ENCODE (Dunham et al., 2012) and Genotype-Tissue Expression (GTEx) (Lonsdale et al.,
2013) projects. EBV transforms resting B lymphocytes into proliferating lymphoblastoid cells,
which carry multiple extrachromosomal copies of the viral episome and constitutively express
six EBV nuclear antigens (EBNAs) and three latent membrane proteins (LMPs) (Young and
Rickinson, 2004). Besides being used for genetic association studies, LCLs are also used to
search for variants influencing gene expression (eQTLs), to measure response to radiation, and to
test chemotherapeutic drugs (Jen and Cheung, 2003; Shukla et al., 2008; Watters et al., 2004).
Despite their broad use, there has been concern about using LCLs as a model for primary tissues,
with two small-scale studies finding differences in gene expression profiles between LCLs and
primary B cells (Caliskan et al., 2011; Min et al., 2010).
While previous studies have analyzed the alterations in cell lines at the level of individual
genes, analyzing cell line expression in the context of other sources of regulatory information
may help us understand differences in gene regulatory processes. Complex cellular processes are
defined by a combination of signaling pathways and cell type-specific regulators that may be
represented in networks. By exploring regulatory networks, it has been possible to uncover
patterns of transcriptional regulation specific to different cell types, such as immune (Heinz et
al., 2010; Vandenbon et al., 2016) and stem cells (Müller et al., 2008), and to better understand
cell differentiation (Yun and Wold, 1996), pluripotency (Kim et al., 2008), and development
(Davidson et al., 2002). Moreover, commonly expressed transcription factors (TFs) may display
4 bioRxiv preprint doi: https://doi.org/10.1101/082065; this version posted October 20, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.
distinct regulation patterns within different cell types (Neph et al., 2012) or different disease state
(Glass et al., 2015).
The GTEx consortium generated a large multi-subject data set that offers an
unprecedented opportunity to understand how well a cell line’s regulatory processes recapitulate
those of its tissue of origin. GTEx version 6.0 includes RNA-Seq data for 127 paired LCL and
whole blood samples, and 244 paired fibroblast cell line and skin samples.
A detailed investigation of gene expression and gene regulatory networks using these
data indicates that, although many pathways are preserved between cell lines and their tissues of
origin, some biological processes that help define the function of the primary tissue are altered.
We also find that the cell lines exhibit changes in their apparent patterns of TF regulation. For
example, while cell cycle genes are over-expressed in cell lines compared to their tissues of
origin, they have an overall decrease in negative regulation by several TFs that are known to
function as repressors. These findings suggest that the properties of cell lines must be carefully
considered before using them as surrogates for primary tissue as some critical features of gene
regulation may be missing or altered.
Results
Transcriptional landscape of cell lines and their tissues of origin
The GTEx version 6.0 project collected post mortem biopsies from multiple tissues and
created LCLs and fibroblast cell lines. For the analysis described here, we used only data from
research subjects for whom primary tissue and matching cell lines were available. Data were
5 bioRxiv preprint doi: https://doi.org/10.1101/082065; this version posted October 20, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.
available for 127 paired whole blood samples and LCLs, and for 244 paired full-thickness skin
biopsies and primary fibroblast cell lines (Lonsdale et al., 2013); 89 subjects have data across all
four groups.
We found that cell lines and tissues have generally similar RNA-Seq transcriptomic
profiles in terms of numbers of genes and the general functional categories to which those genes
map (protein coding, antisense, pseudogene, lincRNA, and other; Figure 1A). We used principal
components analysis (PCA) on paired samples between the two tissues and cell lines to examine
the relationship between the expression levels of genes in cell lines and their tissues of origin. As
seen in the projection of the first two components, the gene expression easily distinguishes the
four groups (Figure 1B). The first principal component and the majority of the variability (37%)
separated blood and LCLs from the skin and fibroblast samples. The second component (22%)
separated tissues from cell lines. This indicates that while gene expression separates samples
based on their tissues of origin, there is also a significant separation between cell lines and
primary tissues.
Differential expression between cell lines and their tissues of origin
To investigate which biological pathways are enriched in differentially expressed genes
between cell lines and their tissues of origin, we used voom (Law et al., 2014) followed by Gene
Set Enrichment Analysis (GSEA) (Subramanian et al., 2005). We found 8,617 genes (32%) to be
differentially expressed between LCLs and blood paired samples (absolute log2 fold change > 2
and FDR < 0.05) with most of the differentially expressed genes (71%) over-expressed in LCLs
(Figure 2A, Table S1). In comparing fibroblasts and skin paired samples, we identified 5,655
6 bioRxiv preprint doi: https://doi.org/10.1101/082065; this version posted October 20, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.
differentially expressed genes (21%). In contrast to the LCL-vs-blood comparison, most of the
differentially expressed genes (68%) had increased expression in the primary tissue rather than in
the cell line. Previous studies have also shown that approximately 30% to 50% of genes have
significant differential expression between cell lines and tissue samples (Caliskan et al., 2011;
Sandberg and Ernberg, 2005).
Using GSEA, with genes ranked by the moderated t-statistic from voom, we identified
KEGG pathways (Kanehisa et al., 2015) enriched for differentially expressed genes between cell
lines and their tissues of origin. Consistent with the separation observed in the PCA, both cell
lines exhibit enrichment for pathways with similar biological functions compared to their tissues
of origin (Figure 2B). While immune processes are down-regulated in cell lines, the pathways
with positive enrichment are generally associated with cellular growth, and include cell cycle,
DNA replication and repair, and transcription processes. Cell cycle genes have been previously
found to be over-expressed in many cell line models (Lukk et al., 2010; Pan et al., 2009;
Sandberg and Ernberg, 2005).
When comparing blood to LCLs, we found that pathways enriched in blood were related
to immune system function, including complement and coagulation cascades, hematopoietic cell
lineage, chemokine signaling, and natural killer cell mediated cytotoxicity, while the pathways
enriched in LCLs were associated with cell growth and death, DNA replication and repair,
transcription, translation, amino acid and carbohydrate metabolism (FDR < 0.05, Figure 2B).
Similarly, when comparing skin to fibroblasts, the pathways enriched in skin were related to the
immune system, lipid and vitamin metabolism, cell adhesion, and melanogenesis, while the
pathways enriched in fibroblasts were associated with cell growth and death, DNA replication
7 bioRxiv preprint doi: https://doi.org/10.1101/082065; this version posted October 20, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.
and repair, transcription, and protein degradation (FDR < 0.05, Figure 2B). The significance of
all KEGG pathways is listed in Supplemental Table S2.
We also performed differential expression and KEGG pathway enrichment analysis
comparing the two tissues and comparing the two cell lines. We found a number of immune
signaling pathways enriched in blood compared to skin and in LCLs compared to fibroblasts
(Figure 2C). For example, pathways related to the biological function of B cells (B cell receptor
signaling, toll-like receptor signaling, antigen processing and presentation) were enriched in
LCLs and blood when comparing them to fibroblasts and skin, respectively. However, some
immune related pathways, including chemokine signaling and natural killer cell mediated
cytotoxicity, were also enriched in LCLs and blood compared to fibroblasts and skin, but they
were expressed at lower levels in LCLs compared to blood. The pathways enriched in fibroblasts
and skin compared to LCLs and blood were associated with biological processes related to
maintaining skin structure and organization and included cell-cell junction, extracellular matrix
interaction, and TGF-β signaling (Figure 2C).
Overall, we found that the preserved pathways in cell lines are mainly related to the cell
type specific functions (B cells or fibroblasts) rather than tissue-enriched functions. Further,
many of the genes in pathways that help define the function of the tissue are under-expressed in
cell lines relative to their tissues of origin.
Cell line and tissue-specific gene regulatory networks
Understanding the structure of gene regulation in cell lines compared to their tissues of
origin has the potential to help interpret differential expression results and to reveal important
8 bioRxiv preprint doi: https://doi.org/10.1101/082065; this version posted October 20, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.
regulatory differences. We used PANDA (Glass et al., 2013) to integrate gene expression,
predicted TF binding location and TF-TF interactions, and to estimate gene regulatory networks
in LCL, blood, fibroblast, and skin (Figure S1A).
In the PANDA gene regulatory networks, an edge connects a TF to a target gene, and the
associated edge weight indicates the strength of the inferred regulatory relationship. For each TF
we computed the difference between the “out-degree” (sum of edge weights from that TF) in the
cell line network and the corresponding tissue of origin network. We ranked TFs by their
absolute difference in out-degree (differential targeting) and found that TFs with the largest
differential targeting were involved in cellular responses to stress and DNA damage, and in the
control of cellular growth (Figure S1B, Table S3). For both the LCL-vs-blood and fibroblast-vs-
skin comparisons, many of the top differentially-targeting TFs, such as TP63, TOPORS, and
KLF15, belong to the p53 family or interact with p53 and are important mediators of DNA
damage response regulating cell cycle arrest, DNA repair and apoptosis (Candi et al., 2014;
Haldar et al., 2010; Levrero et al., 2000; Lin et al., 2005).
We found SP1 and SP3 had increased targeting in cell lines in both network comparisons,
LCL-vs-blood and fibroblast-vs-skin. These TFs have more than 12,000 binding sites in the
human genome and are involved in essential cellular processes, including proliferation,
differentiation, and DNA damage response (Beishline and Azizkhan-Clifford, 2015; Cawley et
al., 2004). It is important to note that SP1 and SP3 are not differentially expressed between cell
line and tissue samples. However, network comparisons captured the regulatory “rewiring” of
these TFs and their target genes, revealing potential differences in targeting even in cases where
the TFs themselves were not differentially expressed. Thus, these network models suggest that
TFs alter their patterns of regulation in cell lines, either through changing their expression,
9 bioRxiv preprint doi: https://doi.org/10.1101/082065; this version posted October 20, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.
altering the genes they target, or by participating in different regulatory contexts in controlling
their potential target genes (Figure S1C).
Cell cycle pathway genes are less strongly targeted by TFs in cell lines
We then tested whether or not changes in inferred TF targeting preferentially affected
genes belonging to specific biological pathways. Similar to the TF’s out-degree, we calculated
each gene’s “in-degree” as the sum of edge weights connected to a gene, which allows for a
comparison of how strongly targeted each gene is by the complete set of TFs. We compared the
in-degree differences for genes of a specific pathway against all other genes using an unpaired t-
test (Table S4). For the pathways over-expressed in the cell lines, such as cell cycle, DNA repair,
and DNA replication, we found a marked reduction of targeting in cell lines compared to their
tissues of origin (Figure S2).
To better understand these differences, we explored the network around 121 genes of the
KEGG cell cycle pathway (Figure 3, cell cycle gene names listed in Table S5). When comparing
the log2 fold change of the expression levels of these genes with their differential targeting, we
found a negative correlation (-0.48 to -0.62) for many TFs, including SMAD5 (Figure 4A),
E2F8, ZBTB14, ETV5, HLTF, USF1, IKZF1, and USF2 (Figure S3). This indicates that, even
though the cell cycle genes are over-expressed, they are less strongly targeted by these TFs in
LCLs compared to blood. To confirm this, we used a permutation analysis using random gene
sets equal in size to the cell cycle gene set and all negative correlations were significant (FDR <
0.05).
10 bioRxiv preprint doi: https://doi.org/10.1101/082065; this version posted October 20, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.
This analysis suggests that these TFs play a role as negative regulators of the cell cycle.
Indeed, these TFs are known regulators of the cell cycle, and many have documented roles in
repressing genes that promote the cell cycle. Given the top TFs found in the network analysis as
examples, SMAD5 can repress transcription, leading to proliferation inhibition after TGF-β
signaling (Bakkebø et al., 2010). E2F8 directly binds to E2F family target genes and repress their
transcription (Christensen et al., 2005; Ghazaryan et al., 2014; Li et al., 2008). ZBTB14 is a
transcriptional repressor of the mouse c-myc gene (Numoto et al., 1993).
To corroborate our network predictions, we looked for other biological evidence that
these TFs regulate cell cycle genes. We downloaded the LCL GM12878 ENCODE ChIP-Seq
assays for the available TFs (SMAD5, IKZF1, USF1, USF2). We then identified the genes with
peaks for these TFs in their promoter region. We also calculated the correlation between the
expression of each TF and the cell cycle genes with TF ChIP-Seq binding evidence.
SMAD5, the TF with the highest inverse correlation between the expression and targeting
of cell cycle genes (Figure 4A), binds to the promoter region of 113 out of the 121 cell cycle
genes based on ENCODE ChIP-Seq data. As expected, we found a higher negative correlation
between the expression of SMAD5 and the expression of its target genes in cell lines compared
to tissue samples in GTEx (p-value = 8.7×10-09, Figure 4B). The combined GTEx/ENCODE
results suggest that cell cycle regulation involves a complex interplay between changes in the
expression of regulatory TFs and alterations in the binding of these TFs to their targets.
There is also extensive functional evidence that SMAD5 targets genes to inhibit cellular
growth. SMAD proteins have a key role as signal transducers of the TGF-β family members to
mediate growth inhibition and apoptosis (Derynck and Zhang, 2003). SMAD5 negatively
regulates cell proliferation during embryonic hematopoiesis (Liu et al., 2003), in B-cell
11 bioRxiv preprint doi: https://doi.org/10.1101/082065; this version posted October 20, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.
lymphoma (Bakkebø et al., 2010), and it induces cell cycle arrest in response to shear stress in
tumor cell lines (Chang et al., 2008).
We repeated this analysis for the three other TFs with ChIP-Seq data available in
ENCODE. We found similar trends for IKZF1 and USF2, but not for USF1 (Figure S3). Figure
4C shows a summary visualization of the expression correlation between the four TFs and the
cell cycle genes with TF ChIP-Seq binding evidence, and supports the network-based conclusion
that these TFs are negative regulators of cell cycle genes. The smaller correlation observed for
USF1 might be due to the alteration of USF1 gene regulatory properties depending on its post-
translational modification (Corre et al., 2009).
Experimental evidence also demonstrates anti-proliferative roles for USF1 and USF2.
Loss or impairment of USF transcriptional activity is a common event in cancer cell lines and is
associated with increased proliferation (Ismail et al., 1999; Qyang et al., 1999). Additionally,
over-expression of USF, and in particular USF2, is known to suppress growth in a number of cell
lines (Luo and Sawadogo, 1996; Olave et al., 2010). CDK4, which controls the progression of
cells through G1, is transcriptionally regulated by USF1, USF2 and c-Myc in non-tumorigenic
mammary cells (Pawar et al., 2004). However, in breast cancer cell lines USF is transcriptionally
inactive, and the CDK4 promoter loses control by USF and c-Myc, altering CDK4 gene
regulation and its ability to respond to signals in cancer cells (Pawar et al., 2004). We found
similar differences in CDK4 regulation when we compared networks from LCL and blood.
Consistent with its higher expression in LCLs, CDK4 is also less strongly targeted by USF1,
USF2, and c-Myc.
When comparing fibroblasts and skin, we did not find the same strong negative
correlation between cell cycle gene expression and specific TF targeting (Figure S4A). This may
12 bioRxiv preprint doi: https://doi.org/10.1101/082065; this version posted October 20, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.
be due to the smaller changes we observed in expression of cell cycle genes in fibroblasts-vs-
skin, in contrast to the LCL-vs-blood comparison, which is potentially related to the fact that
LCL is transformed and fibroblast is not transformed. However, when we analyzed the
relationship between SMAD5 and the cell cycle genes expression, we found similar results for
fibroblasts (Figure S4B), indicating that some of the patterns we observe for LCLs may be true at
a smaller scale in other cell line models.
Discussion
Cell lines are widely used as experimental models to explore basic cellular biology, to
study gene regulation, and to test drugs and other compounds. One important question is whether
cell lines reflect the properties of the primary tissues from which they are derived. By studying
gene expression and gene regulatory networks, we were able to uncover patterns of
transcriptional regulation that differentiate cell lines from their tissues of origin. To the best of
our knowledge, this is the first study that compares the differences in regulatory networks
between cell lines and their tissues of origin and reveal differences not observed in differential
expression analyses.
In comparing two tissue/cell line pairs, blood/LCLs and skin/fibroblasts, we find that
while cell lines and their tissues of origin express similar numbers of genes and the functional
categories to which those genes map are similar, there are also important transcriptional
differences. We found that approximately 26% of the expressed genes were differentially
expressed between the cell lines and the corresponding tissue of origin. Both LCL and fibroblast
cell lines have increased expression of genes in pathways associated with proliferation and DNA
13 bioRxiv preprint doi: https://doi.org/10.1101/082065; this version posted October 20, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.
repair relative to their tissues of origin. Cell lines also have reduced expression of genes
associated with many of the primary functions that characterize the tissues. Our analysis shows
that while many tissue-enriched pathways are downregulated in cell lines, cell type-specific
pathways are preserved. This suggests that studies focusing on these cell type-specific pathways
can use cell lines as models, although users should be aware of the unique aspects of cell line
expression and regulation.
We investigated the potential drivers of the transcriptional changes by modeling gene
regulatory networks and comparing the regulatory networks of cell lines and their tissues of
origin. The most striking difference we observed was in regulation of processes associated with
cellular proliferation. Processes including cell cycle, DNA repair, and DNA replication were
more highly expressed in cell lines, and we found that they had lower overall targeting by a
number of cell cycle-associated TFs that are known to function as repressors. These results
indicate that cell lines switch off a number of transcriptional repressors that result in an overall
increase in cell cycle-related transcription.
Consistent with our findings that LCLs lose overall negative transcriptional regulation, a
recent study showed hypo-methylation of 250 genes after EBV transformation; in this case the
cellular machinery could not maintain DNA methylation (Hernando et al., 2013). While
alterations in the epigenetic profiles of LCLs have been demonstrated (Caliskan et al., 2011;
Hernando et al., 2014), our analysis is the first to explore the changes of TFs regulatory
targeting.
The fact that the GTEx cell lines were created in very different ways – one transformed
and the other a primary cell line – suggests that the global alterations in transcriptional patterns
may be associated with growing in culture, the lack of tissue context, and decreased cellular
14 bioRxiv preprint doi: https://doi.org/10.1101/082065; this version posted October 20, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.
heterogeneity. Some of the changes may also be associated with the transformation process as
we observed smaller changes in expression and regulation of cell cycle genes in fibroblasts
compared to LCLs.
Understanding that differences exist between cell lines and tissues in patterns of TF
targeting is important for designing and interpreting experimental studies using cell line models.
This is especially true for the development of targeted therapeutics, where failures might be
associated with therapeutics that target pathways that are altered in cell lines relative to the tissue
from which they are derived. Our analysis provides insights into which transcriptional processes
are altered and demonstrates that cell lines exhibit gene expression and regulatory changes that
distinguish them from their primary tissues and that these changes and the processes associated
with them should be considered when using cell lines as model.
Author contributions
All authors contributed to the conception and design of the study. CLR and JNP analyzed
the data. All authors contributed to writing and editing of the manuscript. All authors read and
approved the final manuscript.
Acknowledgments
This work was supported by grants from the US National institutes of Health, including
grants from the National Heart, Lung, and Blood Institute (5P01HL105339, 5R01HL111759,
5P01HL114501, K25HL133599), the National Cancer Institute (5P50CA127003,
1R35CA197449, 1U01CA190234, 5P30CA006516), and the National Institute of Allergy and
Infectious Disease (5R01AI099204). Additional funding was provided through a grant from the
15 bioRxiv preprint doi: https://doi.org/10.1101/082065; this version posted October 20, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.
NVIDIA foundation. CML was supported by Sao Paulo Research Foundation (FAPESP) grant
2014/19062-9. This work was conducted under dBGaP approved protocol #9112.
References
Bakkebø, M., Huse, K., Hilden, V.I., Smeland, E.B., and Oksvold, M.P. (2010). TGF-β-induced
growth inhibition in B-cell lymphoma correlates with Smad1/5 signalling and constitutively
active p38 MAPK. BMC Immunol. 11, 57.
Beishline, K., and Azizkhan-Clifford, J. (2015). Sp1 and the “hallmarks of cancer”. FEBS J. 282,
224–258.
Caliskan, M., Cusanovich, D.A., Ober, C., and Gilad, Y. (2011). The effects of EBV
transformation on gene expression levels and methylation profiles. Hum. Mol. Genet. 20, 1643–
1652.
Candi, E., Agostini, M., Melino, G., and Bernassola, F. (2014). How the TP53 family proteins
TP63 and TP73 contribute to tumorigenesis: regulators and effectors. Hum. Mutat. 35, 702–714.
Cawley, S., Bekiranov, S., Ng, H.H., Kapranov, P., Sekinger, E.A., Kampa, D., Piccolboni, A.,
Sementchenko, V., Cheng, J., Williams, A.J., et al. (2004). Unbiased mapping of transcription
factor binding sites along human chromosomes 21 and 22 points to widespread regulation of
noncoding RNAs. Cell 116, 499–509.
Chang, S.-F., Chang, C.A., Lee, D.-Y., Lee, P.-L., Yeh, Y.-M., Yeh, C.-R., Cheng, C.-K., Chien,
S., and Chiu, J.-J. (2008). Tumor cell cycle arrest induced by shear stress: Roles of integrins and
Smad. Proc. Natl. Acad. Sci. U. S. A. 105, 3927–3932.
Christensen, J., Cloos, P., Toftegaard, U., Klinkenberg, D., Bracken, A.P., Trinh, E., Heeran, M.,
16 bioRxiv preprint doi: https://doi.org/10.1101/082065; this version posted October 20, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.
Di Stefano, L., and Helin, K. (2005). Characterization of E2F8, a novel E2F-like cell-cycle
regulated repressor of E2F-activated transcription. Nucleic Acids Res. 33, 5458–5470.
Corre, S., Primot, A., Baron, Y., Le Seyec, J., Goding, C., and Galibert, M.-D. (2009). Target
gene specificity of USF-1 is directed via p38-mediated phosphorylation-dependent acetylation. J.
Biol. Chem. 284, 18851–18862.
Davidson, E.H., Rast, J.P., Oliveri, P., Ransick, A., Calestani, C., Yuh, C.-H., Minokawa, T.,
Amore, G., Hinman, V., Arenas-Mena, C., et al. (2002). A genomic regulatory network for
development. Science 295, 1669–1678.
Derynck, R., and Zhang, Y.E. (2003). Smad-dependent and Smad-independent pathways in
TGF-beta family signalling. Nature 425, 577–584.
Dunham, I., Kundaje, A., Aldred, S.F., Collins, P.J., Davis, C. a, Doyle, F., Epstein, C.B.,
Frietze, S., Harrow, J., Kaul, R., et al. (2012). An integrated encyclopedia of DNA elements in
the human genome. Nature 489, 57–74.
Ghazaryan, S., Sy, C., Hu, T., An, X., Mohandas, N., Fu, H., Aladjem, M.I., Chang, V.T.,
Opavsky, R., and Wu, L. (2014). Inactivation of Rb and E2f8 synergizes to trigger stressed DNA
replication during erythroid terminal differentiation. Mol. Cell. Biol. 34, 2833–2847.
Glass, K., Huttenhower, C., Quackenbush, J., and Yuan, G.-C. (2013). Passing messages
between biological networks to refine predicted interactions. PLoS One 8, e64832.
Glass, K., Quackenbush, J., Spentzos, D., Haibe-Kains, B., and Yuan, G.-C. (2015). A network
model for angiogenesis in ovarian cancer. BMC Bioinformatics 16, 1–17.
Grant, C.E., Bailey, T.L., and Noble, W.S. (2011). FIMO: scanning for occurrences of a given
motif. Bioinformatics 27, 1017–1018.
Haldar, S.M., Lu, Y., Jeyaraj, D., Kawanami, D., Cui, Y., Eapen, S.J., Hao, C., Li, Y.,
17 bioRxiv preprint doi: https://doi.org/10.1101/082065; this version posted October 20, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.
Doughman, Y.-Q., Watanabe, M., et al. (2010). Klf15 deficiency is a molecular link between
heart failure and aortic aneurysm formation. Sci. Transl. Med. 2, 26ra26.
Heinz, S., Benner, C., Spann, N., Bertolino, E., Lin, Y.C., Laslo, P., Cheng, J.X., Murre, C.,
Singh, H., and Glass, C.K. (2010). Simple combinations of lineage-determining transcription
factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell
38, 576–589.
Hernando, H., Shannon-Lowe, C., Islam, A.B., Al-Shahrour, F., Rodríguez-Ubreva, J.,
Rodríguez-Cortez, V.C., Javierre, B.M., Mangas, C., Fernández, A.F., Parra, M., et al. (2013).
The B cell transcription program mediates hypomethylation and overexpression of key genes in
Epstein-Barr virus-associated proliferative conversion. Genome Biol. 14, R3.
Hernando, H., Islam, A.B.M.M.K., Rodríguez-Ubreva, J., Forné, I., Ciudad, L., Imhof, A.,
Shannon-Lowe, C., and Ballestar, E. (2014). Epstein-Barr virus-mediated transformation of B
cells induces global chromatin changes independent to the acquisition of proliferation. Nucleic
Acids Res. 42, 249–263.
Hu, V.W., Frank, B.C., Heine, S., Lee, N.H., and Quackenbush, J. (2006). Gene expression
profiling of lymphoblastoid cell lines from monozygotic twins discordant in severity of autism
reveals differential regulation of neurologically relevant genes. BMC Genomics 7, 118.
International HapMap Consortium (2003). The International HapMap Project. Nature 426, 789–
796.
Ismail, P.M., Lu, T., and Sawadogo, M. (1999). Loss of USF transcriptional activity in breast
cancer cell lines. Oncogene 18, 5582–5591.
Jen, K.-Y., and Cheung, V.G. (2003). Transcriptional response of lymphoblastoid cells to
ionizing radiation. Genome Res. 13, 2092–2100.
18 bioRxiv preprint doi: https://doi.org/10.1101/082065; this version posted October 20, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.
Kanehisa, M., Sato, Y., Kawashima, M., Furumichi, M., and Tanabe, M. (2015). KEGG as a
reference resource for gene and protein annotation. Nucleic Acids Res. 44, D457-62.
Kim, J., Chu, J., Shen, X., Wang, J., and Orkin, S.H. (2008). An extended transcriptional
network for pluripotency of embryonic stem cells. Cell 132, 1049–1061.
Law, C.W., Chen, Y., Shi, W., and Smyth, G.K. (2014). voom: Precision weights unlock linear
model analysis tools for RNA-seq read counts. Genome Biol. 15, R29.
Levrero, M., De Laurenzi, V., Costanzo, A., Gong, J., Wang, J.Y., and Melino, G. (2000). The
p53/p63/p73 family of transcription factors: overlapping and distinct functions. J. Cell Sci.
1661–1670.
Li, J., Ran, C., Li, E., Gordon, F., Comstock, G., Siddiqui, H., Cleghorn, W., Chen, H.-Z.,
Kornacker, K., Liu, C.-G., et al. (2008). Synergistic function of E2F7 and E2F8 is essential for
cell survival and embryonic development. Dev. Cell 14, 62–75.
Lin, L., Ozaki, T., Takada, Y., Kageyama, H., Nakamura, Y., Hata, A., Zhang, J.-H., Simonds,
W.F., Nakagawara, A., and Koseki, H. (2005). topors, a p53 and topoisomerase I-binding RING
finger protein, is a coactivator of p53 in growth suppression induced by DNA damage. Oncogene
24, 3385–3396.
Liu, B., Sun, Y., Jiang, F., Zhang, S., Wu, Y., Lan, Y., Yang, X., and Mao, N. (2003). Disruption
of Smad5 gene leads to enhanced proliferation of high-proliferative potential precursors during
embryonic hematopoiesis. Blood 101, 124–133.
Lonsdale, J., Thomas, J., Salvatore, M., Phillips, R., Lo, E., Shad, S., Hasz, R., Walters, G.,
Garcia, F., Young, N., et al. (2013). The Genotype-Tissue Expression (GTEx) project. Nat.
Genet. 45, 580–585.
Lukk, M., Kapushesky, M., Nikkilä, J., Parkinson, H., Goncalves, A., Huber, W., Ukkonen, E.,
19 bioRxiv preprint doi: https://doi.org/10.1101/082065; this version posted October 20, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.
and Brazma, A. (2010). A global map of human gene expression. Nat. Biotechnol. 28, 322–324.
Luo, X., and Sawadogo, M. (1996). Antiproliferative properties of the USF family of helix-loop-
helix transcription factors. Proc. Natl. Acad. Sci. U. S. A. 93, 1308–1313.
McVean, G.A., Altshuler (Co-Chair), D.M., Durbin (Co-Chair), R.M., Abecasis, G.R., Bentley,
D.R., Chakravarti, A., Clark, A.G., Donnelly, P., Eichler, E.E., Flicek, P., et al. (2012). An
integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65.
Millioni, R., Puricelli, L., Iori, E., Trevisan, R., and Tessari, P. (2012). Skin fibroblasts as a tool
for identifying the risk of nephropathy in the type 1 diabetic population. Diabetes. Metab. Res.
Rev. 28, 62–70.
Min, J.L., Barrett, A., Watts, T., Pettersson, F.H., Lockstone, H.E., Lindgren, C.M., Taylor, J.M.,
Allen, M., Zondervan, K.T., and McCarthy, M.I. (2010). Variability of gene expression profiles
in human blood and lymphoblastoid cell lines. BMC Genomics 11, 96.
Müller, F.-J., Laurent, L.C., Kostka, D., Ulitsky, I., Williams, R., Lu, C., Park, I.-H., Rao, M.S.,
Shamir, R., Schwartz, P.H., et al. (2008). Regulatory networks define phenotypic classes of
human stem cell lines. Nature 455, 401–405.
Neph, S., Stergachis, A.B., Reynolds, A., Sandstrom, R., Borenstein, E., and
Stamatoyannopoulos, J.A. (2012). Circuitry and dynamics of human transcription factor
regulatory networks. Cell 150, 1274–1286.
Numoto, M., Niwa, O., Kaplan, J., Wong, K.K., Merrell, K., Kamiya, K., Yanagihara, K., and
Calame, K. (1993). Transcriptional repressor ZF5 identifies a new conserved domain in zinc
finger proteins. Nucleic Acids Res. 21, 3767–3775.
Olave, N.C., Grenett, M.H., Cadeiras, M., Grenett, H.E., and Higgins, P.J. (2010). Upstream
stimulatory factor-2 mediates quercetin-induced suppression of PAI-1 gene expression in human
20 bioRxiv preprint doi: https://doi.org/10.1101/082065; this version posted October 20, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.
endothelial cells. J. Cell. Biochem. 111, 720–726.
Pan, C., Kumar, C., Bohl, S., Klingmueller, U., and Mann, M. (2009). Comparative proteomic
phenotyping of cell lines and primary cells to assess preservation of cell type-specific functions.
Mol. Cell. Proteomics 8, 443–450.
Paulson, J.N., Chen, C.-Y., Lopes-Ramos, C.M., Kuijjer, M.L., Platig, J., Sonawane, A.R.,
Fagny, M., Glass, K., and Quackenbush, J. Tissue-aware RNA-Seq processing and normalization
for heterogeneous and sparse data. (manuscript accompanied submission).
Pawar, S.A., Szentirmay, M.N., Hermeking, H., and Sawadogo, M. (2004). Evidence for a
cancer-specific switch at the CDK4 promoter with loss of control by both USF and c-Myc.
Oncogene 23, 6125–6135.
Qyang, Y., Luo, X., Lu, T., Ismail, P.M., Krylov, D., Vinson, C., and Sawadogo, M. (1999).
Cell-type-dependent activity of the ubiquitous transcription factor USF in cellular proliferation
and transcriptional activation. Mol. Cell. Biol. 19, 1508–1517.
Sandberg, R., and Ernberg, I. (2005). The molecular portrait of in vitro growth by meta-analysis
of gene-expression profiles. Genome Biol. 6, R65.
Shannon, P., Markiel, A., Ozier, O., Baliga, N.S., Wang, J.T., Ramage, D., Amin, N.,
Schwikowski, B., and Ideker, T. (2003). Cytoscape: a software environment for integrated
models of biomolecular interaction networks. Genome Res. 13, 2498–2504.
Sherr, C.J., and DePinho, R.A. (2000). Cellular senescence: mitotic clock or culture shock? Cell
102, 407–410.
Shukla, S.J., Duan, S., Badner, J.A., Wu, X., and Dolan, M.E. (2008). Susceptibility loci
involved in cisplatin-induced cytotoxicity and apoptosis. Pharmacogenet. Genomics 18, 253–
262.
21 bioRxiv preprint doi: https://doi.org/10.1101/082065; this version posted October 20, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.
Stacey, G. (2006). Primary Cell Cultures and Immortal Cell Lines. eLS.
Subramanian, A., Tamayo, P., Mootha, V.K., Mukherjee, S., Ebert, B.L., Gillette, M.A.,
Paulovich, A., Pomeroy, S.L., Golub, T.R., Lander, E.S., et al. (2005). Gene set enrichment
analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc.
Natl. Acad. Sci. U. S. A. 102, 15545–15550.
Tan, X.-L., Moyer, A.M., Fridley, B.L., Schaid, D.J., Niu, N., Batzler, A.J., Jenkins, G.D., Abo,
R.P., Li, L., Cunningham, J.M., et al. (2011). Genetic variation predicting cisplatin cytotoxicity
associated with overall survival in lung cancer patients receiving platinum-based chemotherapy.
Clin. Cancer Res. 17, 5801–5811.
Vandenbon, A., Dinh, V.H., Mikami, N., Kitagawa, Y., Teraguchi, S., Ohkura, N., and
Sakaguchi, S. (2016). Immuno-Navigator, a batch-corrected coexpression database, reveals cell
type-specific gene networks in the immune system. Proc. Natl. Acad. Sci. U. S. A. 113, E2393-
402.
Watters, J.W., Kraja, A., Meucci, M.A., Province, M.A., and McLeod, H.L. (2004). Genome-
wide discovery of loci influencing chemotherapy cytotoxicity. Proc. Natl. Acad. Sci. U. S. A.
101, 11809–11814.
Weirauch, M.T., Yang, A., Albu, M., Cote, A.G., Montenegro-Montero, A., Drewe, P.,
Najafabadi, H.S., Lambert, S.A., Mann, I., Cook, K., et al. (2014). Determination and inference
of eukaryotic transcription factor sequence specificity. Cell 158, 1431–1443.
Wright, W.E., and Shay, J.W. (2000). Telomere dynamics in cancer progression and prevention:
fundamental differences in human and mouse telomere biology. Nat. Med. 6, 849–851.
Young, L.S., and Rickinson, A.B. (2004). Epstein–Barr virus: 40 years on. Nat. Rev. Cancer 4,
757–768.
22 bioRxiv preprint doi: https://doi.org/10.1101/082065; this version posted October 20, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.
Yun, K., and Wold, B. (1996). Skeletal muscle determination and differentiation: story of a core
regulatory network and its context. Curr. Opin. Cell Biol. 8, 877–889.
Figure Legends
Figure 1. Similarity between cell lines and their tissues of origin based on gene expression
(A) Number of genes identified for each group (LCL, whole blood, fibroblast, skin). Genes were
separated into biological classes using the definitions from GENCODE release 19
(GRCh37.p13).
(B) Principal component analysis (PCA) of paired samples between the two tissues and cell lines
(total of 89 subjects with all four samples) based on the normalized expression of all genes. The
primary axis separates samples by tissue; the secondary axis separates primary tissue from cell
lines.
Figure 2. Pathways are differentially expressed between cell lines and their tissues of origin
(A) Number of differentially expressed genes (absolute log2 fold change > 2 and FDR < 0.05)
using voom on paired samples.
(B) Results of GSEA reported based on the log10(FDR) significance scale, with one group in red
and the other one in blue. The 15 pathways most significantly different between one cell line and
its tissue of origin.
(C) Pathways enriched for at least two group comparisons (FDR < 0.05). The pathways
differentially expressed between the tissues that are also differentially expressed between the cell
23 bioRxiv preprint doi: https://doi.org/10.1101/082065; this version posted October 20, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.
lines (preserved pathways) are highlighted in red and blue. Pathways over-expressed in both cell
lines compared to their tissues of origin are highlighted in yellow.
See also Tables S1 and S2.
Figure 3. Cell cycle pathway genes are less strongly targeted by TFs in cell lines
(A) Group-specific gene regulatory networks were generated using PANDA. The illustrations
represent subnetworks of the 1000 edges with the highest edge weight difference between a cell
line and its tissue of origin around the cell cycle genes. The thickness and color indicate the edge
weight strength between the TF and target gene (the edges shown have a weight greater than 2 in
at least one network).
(B) Illustration of the gene in-degree difference between each cell line and its tissue of origin.
Positive values indicate higher targeting in cell lines, and negative values indicate higher
targeting in tissues.
(C) Boxplot of the gene in-degree differences for the genes in the KEGG cell cycle pathway and
for genes not in this pathway (t-test). Reduction of gene in-degree difference indicates that the
genes in the cell cycle pathway are less strongly targeted by TFs in the cell line compared to its
tissue of origin.
See also Figures S1 and S2, Tables S3 and S4.
Figure 4. SMAD5 is differentially regulating cell cycle pathway genes
(A) Spearman correlation between the log2 fold change in gene expression (LCL-blood
difference) of KEGG cell cycle pathway genes and the differential targeting they receive by the
TF SMAD5. Red: evidence of SMAD5 ChIP-Seq binding on the promoter of the gene, black: no
24 bioRxiv preprint doi: https://doi.org/10.1101/082065; this version posted October 20, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.
evidence of SMAD5 binding. The negative correlation observed indicates the cell cycle genes
are more highly expressed but less targeted by SMAD5 in LCL compared to blood.
(B) Boxplot of Spearman correlation coefficients between SMAD5 expression levels and
expression levels of all genes, and between SMAD5 expression levels and the expression levels
of cell cycle genes with SMAD5 ChIP-Seq binding evidence for LCL and blood samples.
Difference in magnitude was tested using a Wilcoxon rank-sum test comparing LCL and blood.
(C) Visualization of the correlation between TF and cell cycle gene expression for interactions
that have ChIP-Seq binding evidence. More positively-correlated associations are shown in red,
more negatively correlated are blue, and correlations near zero are gray.
See also Figures S3, S4 and Table 5.
Experimental Procedures
GTEx data
The GTEx version 6.0 RNA-Seq data set (phs000424.v6.p1, 2015-10-05 released) was
downloaded from dbGaP (approved protocol #9112). Using YARN R package (version 1.0.0) we
performed quality control, gene filtering, and normalization preprocessing (Paulson et al.). We
identified and removed GTEX-11ILO due to potential sex misannotation. We grouped related
body regions using gene expression similarity. For example, skin samples from the lower leg
(sun exposed) and from the suprapubic region (sun unexposed) were grouped as “skin.” We
filtered and normalized the data in a tissue-aware manner using smooth quantile normalization
[github.com/stephaniehicks/qsmooth]. The final data set contains 549 research subjects (188
25 bioRxiv preprint doi: https://doi.org/10.1101/082065; this version posted October 20, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.
females and 361 males) comprising 38 tissues (which included two cell lines), 30,333 genes, and
9,435 samples. We filtered sex-chromosome and mitochondrial genes (retaining 29,242 genes).
We reduced the data set to only cell line and tissue-specific paired samples, which
comprised 127 subjects with whole blood and EBV transformed lymphoblastoid cell line (LCL)
samples, and 244 subjects with skin and primary fibroblast cell line samples; 89 subjects have
data across all four groups. For the skin samples, an equivalent number of samples were obtained
from the lower leg (n=123), and from the suprapubic region (n=121). We kept only the 27,175
genes with at least one TF binding motif in its promoter region (see section: Gene regulatory
networks), so that we could use the same set of genes for differential expression and gene
regulatory network analysis.
GTEx RNA-Seq version 6 was annotated using the GENCODE release 19
(GRCh37.p13). Thus, we defined the different types of genes (protein coding, antisense,
pseudogene, lincRNA, and other) according to the same genome annotation downloaded from
http://www.gencodegenes.org/releases/19.html.
Principal Components Analysis
We performed principal component analysis (PCA) as implemented in the plotOrd
function on the R package metagenomeSeq 1.12.1. PCA was applied to the full expression data
matrix.
Differential expression analysis
26 bioRxiv preprint doi: https://doi.org/10.1101/082065; this version posted October 20, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.
Differential expression analysis was performed using voom R package (version 3.26.9)
(Law et al., 2014) and only paired samples between the two groups under comparison (LCL-vs-
Blood, Fibroblast-vs-Skin, LCL-vs-Fibroblast, Blood-vs-Skin). Genes with Benjamini-Hochberg
adjusted p-values of less than 0.05, and with absolute log2 fold change greater than 2 were
considered to be differentially expressed.
Pathway enrichment analysis
We performed Gene Set Enrichment Analysis (GSEA) to determine the biological
functions related to the differential expression between cell lines and tissues (Subramanian et al.,
2005). All genes were ranked by the moderated t-statistic produced by voom differential
expression analysis. We used pre-ranked GSEA program (Java command line version 2-2.0.13)
to calculate a running-sum statistic. We used the gene sets obtained from the KEGG pathway
database that was downloaded from the Molecular Signatures Database (MSigDB)
(http://www.broadinstitute.org/gsea/msigdb/collections.jsp) (“c2.cp.kegg.v5.0.symbols.gmt”).
We performed 1000 gene set permutations to assess the statistical significance, and considered
gene sets with FDR < 0.05 significant. We only considered gene sets of size greater than 15 and
less than 500 genes after filtering out those genes not in the expression dataset, or 176 gene sets
in total.
Gene regulatory networks
27 bioRxiv preprint doi: https://doi.org/10.1101/082065; this version posted October 20, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.
We reconstructed gene regulatory networks using PANDA, a message-passing model that
integrates multiple types of genomic data and infers the network of interactions between TFs and
their target genes (Glass et al., 2013). PANDA starts with a prior regulatory network inferred by
mapping TF binding sites to the genome, integrates protein-protein interaction and gene
expression data to iteratively refine the network structure and deduces a final consensus
regulatory network. In the regulatory networks estimated by PANDA, each edge connects a TF
to a target gene, and the edge weight indicates the strength of the inferred regulatory relationship.
We generated one PANDA network for each group: LCL, blood, fibroblasts, and skin
(Figure S1) (Glass et al., 2013). For each network, we used the same TF/target gene prior
regulatory network and the same protein-protein interaction (PPI) prior network (see below).
To generate the TF/target gene regulatory prior, we downloaded all position weight
matrices (PWM) for direct and inferred Homo sapiens motifs from the Catalog of Inferred
Sequence Binding Preferences (CIS-BP) (2015-07-07) (Weirauch et al., 2014). For each TF, we
selected the motif with the highest information content, total of 695 motifs. We mapped the
PWMs for these 695 motifs to promoter regions of Ensembl gene (ENSG) ids using FIMO
(Grant et al., 2011). Motif mappings were parsed to only retain those below p-value cut-off of
10-5 and ranging from -750bp to +250bp of the transcription start site (TSS). We then converted
TF motif identifiers to HGNC symbols and filtered out TFs for which no official gene symbol
was known. This resulted in a regulatory prior that included 652 TFs targeting 27,249 unique
Ensembl gene ids. Finally, we kept only genes that were found expressed in the GTEx filtered
and normalized data set, which included 27,175 target genes.
To generate the PPI prior, we downloaded Homo sapiens PPI interactions
(9606.protein.links.v10.txt.gz) and protein aliases (9606.protein.aliases.v10.txt.gz) from
28 bioRxiv preprint doi: https://doi.org/10.1101/082065; this version posted October 20, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.
StringDb v10 (2015-10-27). We parsed this PPI data set for the 652 TFs in our TF/target gene
prior. 628/652 TFs had interactions with other TFs (scores between 0-1). We added the
remaining 24 TFs to our PPI prior with self-interactions only, and set all self-interactions in this
PPI prior to edge scores of 1.
To run PANDA, for each sample group, we used the TF/target gene prior, the PPI prior,
and the sample group gene expression data. The TF/target gene edge weights emerging from
PANDA were then used to compare networks between each cell line and its tissue of origin. For
pairs of networks, we compared the TF out-degree, defined as the sum of edge weights from that
TF, and the gene in-degree, defined as the sum of all incoming edge weights a gene received
from all expressed TFs in the network. The illustrations of the subnetworks were done using
Cytoscape default yFiles Organic layout (version 3.4.0) (Shannon et al., 2003) where each edge
connects a TF to a target gene, and the edge weight is represented by the thickness and color
shade.
ENCODE Chip-Seq data
Chip-Seq on GM12878 (type of LCL) targeting the TFs SMAD5, IKZF1, USF1, and
USF2 were downloaded from the ENCODE Project (https://www.encodeproject.org, accessed
2016-06-03). We used the narrow peak data processed by ENCODE from 2 biological replicates
(accession: ENCFF553HHF, ENCFF001VEJ, ENCFF002CIB, ENCFF001VFQ). Then, to
identify genes bound by each of these TFs we used bedtools (v2.17) to annotate peaks that fall
within the promoter region of a gene (same promoter regions used for the network
reconstruction, ranging from -750bp to +250bp of the TSS).
29 bioRxiv preprint doi: https://doi.org/10.1101/082065; this version posted October 20, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.
A B
●●●●●●● Fibroblast ●●●●●●● ●●●● ●●●●●●●●●●●● ●●●●●●●●●●● Skin ●●●● ●●●●● ●●●●●●● ●●●●●●●● 20000 ● ● LCL ●●●●● ● ●● protein coding 100 ● Whole blood antisense ● lincRNA ●●●●● 0 Fibroblast ●●● ●●●●●●●●● pseudogene ●●●●● ●●●●●●●●●●● ●●●●●●●●●● Skin ●●●●●●●●● 10000 ●●●●●● ●●●●●●●●● other ●●●●●● ● LCL ●●●●●● ●●●●●●●● ● ● ●●● 100 ●●●●●●●●● ● Whole blood Number of genes ●●●●● 100 ● ● −
0 ●●
PC2 (22.29% variance) ● ●●●●●● ●●●●●●●●● 0 ●●●●●●●●●●
200 ● ●●●●●● − ●●● ●●●●●●●●● ●●●●●●●●● ●●●●● LCL Skin −200 −100100 0● 100 200 PC1− (37.35% variance) ●●
PC2 (22.29% variance) ● Fibroblast ●●●●●● ●●●●●●●●● Whole blood ●●●●●●●●●●
200 ● − −200 −100 0 100 200 PC1 (37.35% variance) samples Value Value − − 5 5 A B
Blank Fibroblast enrichment Skin enrichment Blank LCL enrichment Blood enrichment 89 Skin Blood Whole blood Skin pyrimidine metabolism protein export n glycanbiosynthesis biosynthesis aminoacyl trna polymerase rna factorsbasal transcription nucleotide excision repair ubiquitin mediatedproteolysis mismatch repair dna replication degradation rna one carbonpoolby folate cell cycle spliceosome amino sugarandnucleotide sugarmetabolism t cellreceptorsignalingpathway neuroactive ligandreceptorinteraction signalingpathwayfc epsilonri b cellreceptorsignalingpathway intestinal immune for network igaproduction aldosterone regulatedsodiumreabsorption retinol metabolism type idiabetesmellitus allograft rejection linoleic acidmetabolism graft versus hostdisease cell adhesionmoleculescams autoimmune thyroid disease asthma arachidonic acidmetabolism ribosome nucleotide excision repair mismatch repair homologous recombination glyoxylate anddicarboxylate metabolism dna replication citrate cycletca cell cycle base excision repair biosynthesis aminoacyl trna cysteine andmethioninemetabolism valine leucineandisoleucinedegradation polymerase rna leukocyte transendothelial migration olfactory transduction fc gammarmediatedphagocytosis endocytosis nod like receptorsignalingpathway arachidonic acidmetabolism neuroactive ligandreceptorinteraction glycosaminoglycan degradation natural killercellmediatedcytotoxicity leishmania infection chemokine signalingpathway cytokine receptorinteraction lysosome hematopoietic celllineage complement andcoagulationcascades spliceosome degradation rna Cell line vs tissue Cellline of tissue origin vs Differentially expressed genes expressed Differentially genes 8,964 pathway enrichment pathway bioRxiv preprint certified bypeerreview)istheauthor/funder,whohasgrantedbioRxivalicensetodisplaypreprintinperpetuity.Itmadeavailableunder 5 ,1 genes 8,617 , 244 paired paired 244 127 paired paired 127 655 samples samples genes doi: https://doi.org/10.1101/082065 samples 89 Fibroblast LCL genes 6,553 Skin/Fibroblast Skin/Fibroblast Blood/LCL Blood/LCL enriched enriched enriched Cell Cell line C Color Key − − 5 log(FDR) 0 Pathways enriched for comparisons at two least group enriched Pathways ; 5 this versionpostedOctober20,2016. a CC-BY-ND 4.0Internationallicense SKN FIB Skin WBL LCL Blood WBL SKN Tissue LCL FIB Cell leukocyte transendothelial migration jak statsignalingpathway toll like receptorsignalingpathway cytosolic dnasensingpathway ilikerig receptorsignalingpathway cell adhesionmoleculescams phosphatidylinositol signalingsystem viral myocarditis asthma graft versus hostdisease autoimmune thyroid disease allograft rejection type idiabetesmellitus signalingpathwayfc epsilonri intestinal immune for network igaproduction t cellreceptorsignalingpathway b cellreceptorsignalingpathway antigen processingandpresentation immunodeficiency primary tight junction hedgehog signalingpathway melanogenesis axon guidance basal cellcarcinoma adherens junction metabolismcytochromep450 drug metabolism ofxenobiotics by cytochromep450 arachidonic acidmetabolism neuroactive ligand receptor interaction linoleic acidmetabolism alpha linolenicacidmetabolism complement andcoagulationcascades lysosome glycosaminoglycan degradation glycosaminoglycan biosynthesischondroitinsulfate glycosaminoglycan biosynthesisheparan sulfate arrhythmogenic ventricular cardiomyopathy right arvc tgf betasignalingpathway focal adhesion ecm receptorinteraction biosynthesis aminoacyl trna pyrimidine metabolism n glycanbiosynthesis protein export factorsbasal transcription polymerase rna ubiquitin mediatedproteolysis one carbonpoolby folate propanoate metabolism andglutamatemetabolism alanine aspartate lysine degradation butanoate metabolism valine leucineandisoleucinedegradation backbonebiosynthesis terpenoid proximal tubule bicarbonatereclamation disease parkinsons oxidative phosphorylation ribosome proteasome nucleotide excision repair cell cycle base excision repair spliceosome degradation rna mismatch repair homologous recombination dna replication hematopoietic celllineage fc gammarmediatedphagocytosis systemic lupuserythematosus chemokine signalingpathway leishmania infection natural killercellmediatedcytotoxicity cytokine receptorinteraction . Pathway enrichment Pathway The copyrightholderforthispreprint(whichwasnot + / - log(FDR) matrix Extracellular signaling Immune repair DNA cycle Cell junction Cell - cell bioRxiv preprint doi: https://doi.org/10.1101/082065; this version posted October 20, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.
A B Cell line Tissue LCL Whole blood Transcription factor 0 2 4 6 8
SKP2 SKP2 Edge weight Target gene REST REST
BHLHE41 NR2F6 BHLHE41 NR2F6 ANAPC4 ANAPC4 NR2F1 NR2F1 BARX2 FOXF1 SMAD1 BARX2 FOXF1 SMAD1
HOXC6 CPEB1 HOXC6 CPEB1 NEUROD1 PATZ1 HNF1B TCF12 NEUROD1 PATZ1 TCF12 HNF1B MSX2 POU2F2 MSX2 PDX1 MSX1 NFATC1 POU2F2 POU3F2 PDX1 MSX1 NFATC1 WEE1 FOXD2 POU3F2 WEE1 FOXD2 LIN54 LIN54 NKX6-1 NKX6-1 SOX12 SOX12 LHX1 CTCFL FOXL1 ASCL1 LHX1 CTCFL FOXL1 ASCL1 FOXB1 BARX1 UNCX NR2F2 FOXA1 FOXB1 BARX1 TOPORS NFYC HAND1 BPTF REL UNCX NR2F2 FOXA1 HAND1 BPTF ALX4 TOPORS NFYC REL HOXB8 ALX4 HOXB8 POU3F1 PBX2 POU3F1 PBX2
HMX1 HIVEP1 HMX1 PITX2 PLAGL1 HIVEP1 POU2F1 FOXA2 PITX2 PLAGL1 PTF1A POU2F1 FOXA2 SMAD4 PTF1A PPARD SMAD4 SOX5 PPARD MEF2A SOX5 MEF2A ZFHX3 ZFHX3 NR4A3 DMRT2 ONECUT2 NR4A3 FOXG1 DMRT2 ONECUT2 FOXG1 STAG1 STAG1 BARHL2 SMAD2 MDM2 BARHL2 CCNE1 TP73 MDM2 POU5F1B CDKN1C MYOD1 SMAD2 TP73 YWHAG POU5F1B CCNE1 HNF4G RBX1 CDKN1C MYOD1 PKMYT1 HNF4G YWHAG RBX1 HDAC1 PKMYT1 HDAC1 MCM3 CDC45 MCM2 MCM3 RBL2 CDC25BANAPC5 CDC45 MCM2 CDC27 MCM7 E2F4 MCM5 RBL2 CDC25BANAPC5 CDC27 E2F4 ALX1 MCM7 MCM5 EMX2 FOXA3 PLK1 TP53 ALX1 CDC23 FOXA3 PLK1 TP53 CDC25C EMX2 TCF4 HIC2 CDK4 CDC25C CDC23 CREBBP TCF4 HIC2 CDK4 CREBBP E2F2 TGFB3 YY1 GSK3B TBP GADD45A TGFB3 E2F2 YY1 EP300 GSK3B GADD45A HNF1A gene gene TBP EP300 CDK2 ATR HNF1A CDKN1B CDKN2B GATA5 CDK2 ATR CDKN1B CDKN2B GATA5 GFI1 GFI1 ZNF35 CDC14A SKP1 PHOX2A HDAC2 ZNF35 CDC14A SKP1 HOXD9 PHOX2A RBL1 CCNE2 FUBP1 gene in-degree HDAC2 HOXD9 CCNE2 RBL1 FUBP1 RELB RAD21 SMC3 YWHAZ RAD21 RELB E2F3 SMC3 YWHAZ RB1 E2F3 RB1 ZNF784 ONECUT3 PAX4 CDC25A weighted weighted ZNF784 ONECUT3 NKX3-1 PAX4 ZNF32 NKX3-1 CDC25A ZNF32 YWHAH SMAD3CCNA1 EPAS1 AIRE ! ! YWHAH SMAD3 EPAS1 = CCNA1 AIRE CDX2 MAD1L1 cell cycle - difference MAD1L1 CDX2 ETS2 ATM POU3F3 ETS2 ATM POU3F3 HHEX DRGX TFAP4 HHEX DRGX TFAP4 DMRTC2 ANAPC10 DMRTC2 HOXB6 HEY2 edges edges ANAPC10 HOXB6 HEY2
FOXJ3 E2F6 ● FOXJ3 E2F6 ● ABL1 STAT2 LHX3 LMO2 ZEB1 ● ABL1 500 STAT2 LHX3 LMO2 ZEB1 EMX1 ZNF740 ZNF148 SP3 KLF15 ● TFAP2D ZNF740 ZNF148 ● CDC7 EMX1 KLF5 SP3 KLF15 ● TFAP2D YWHAQ ● KLF5 CDC7 ● SMAD5 KLF16 ● ZBTB7B ● YWHAQ PHOX2B ● KLF16 ● WT1 ● SMAD5 ● ZBTB7B ZNF263 ● TCF3 ● PHOX2B ● KLF3 ● WT1 RREB1 ● ZNF263 HOXB7 MYOG ZNF281 ● TCF3 KLF14 SP8 ● ALX3 ● MAZ MZF1 ● FOXP1 ● KLF3 ● FOXP4 ● ● CDKN2C RREB1 ● MYOG ● HOXB7 ZNF281 ● SP1 ● KLF14 ● SP8 ● ALX3 ● INSM1 ● MAZ MZF1 ● FOXP1 ● FOXP4 ● CDKN2C ● SP1 ● INSM1 ● TWIST1 ● KLF1 ● PURA ● ARID3A PLAG1 ● ZNF219 ● TWIST1 ● KLF1 ● PURA ● ARID3A PLAG1 ● SOX18 ● IRF1 ● ZNF219 POU2F3 SP4 ● SOX18 IRF1 KLF4 ● POU2F3 SP4 ● KLF4 NHLH2 ● HMG20B ● NHLH1 ● NHLH2 ● HMG20B ● NHLH1 ● SFN ● CCNB2 HOXD10 ● SFN ● TFAP2C ● POU4F1 ● CDC5L ● CCNB2 VTN ● HOXD10 ● TFAP2C NR5A1 VTN CDC5L POU4F1 POU4F2 PROP1 NR5A1 TFAP2A POU4F2 PROP1 ANAPC2 TFAP2B TFAP2A ANAPC2 TFAP2B POU1F1 NFATC2 SOX30 C NFATC2 MCM6 POU1F1 DMRTA2 SOX30 MCM6 POU5F1 DMRTA2 NFATC4 POU5F1 NFATC4 250
ZSCAN4 CCNB1 CDK6 ZNF232 ZSCAN4 CCNB1 CDK6 ZNF232 NKX2-1 NKX2-1 ORC6 ZBTB17 ISX ESPL1 ORC6 ZBTB17 ZNF683 ISX ESPL1 ZNF683 Pathway genes (LCL−WBL) CDC20 CDX1 CDC20 CDX1 CDKN1A ORC1 CDKN1A ORC1 LEF1 LEF1 TFDP1 TFDP1 NR2E1 CHEK1 CDC16 PTTG1 NR2E1 CHEK1 CDC16 PTTG1 CDKN2A Not in pathway genes (LCL−WBL) NRF1 E2F1 CDKN2A TGFB2 NRF1 E2F1 FEZF1 TGFB2 CUL1 FEZF1 CUL1 RUNX1 MCM4 IRX1 cell cycle ORC4 RUNX1 MCM4 IRX1 ORC4 0 SOX6 SOX6 Pathway genes (FIB−SKN) -9 -6 degree difference FDR=1.3x10 FDR=1. 6x10 − ● Not in pathway genes (FIB−SKN) 500 ● ● ● −250 ● ● ● ● ● ● ● ● Fibroblast Skin Gene in ● ● ● ● ● ● ● ● ● ● ● ● NR2E3 0 2 4 6 8 NR2E3 −500 ● ORC6 YWHAB ORC2 ORC6 YWHAB ORC2 250 KLF1 KLF1
ZEB1 Edge weight ZEB1
CDK7 CDK7 STAT6 STAT6 SOX15 SOX7 SOX15 SOX7 SOX9 SOX9 AIRE AIRE NR1D1 NR1D1
BPTF BPTF E2F5 E2F5 ZNF683 SKP2 ZNF683 SKP2
KLF16 KLF16 targeting Cell FOXP4 FOXP4 BARHL2 FOXC1 NEUROD1 BARHL2 FOXC1 NEUROD1 LHX1 POU5F1B LHX1 POU5F1B MIXL1 SOX12 MIXL1 SOX12 FOXJ3 HOXC6 FOXJ3 HOXC6 FOXP1 KLF4 FOXP1 KLF4 ANAPC11 ANAPC11 TGFB1 NFATC1 TGFB1 NFATC1 GABPA GABPA POU1F1 FOXL1 NFE2L1 POU1F1 FOXL1 NFE2L1 FUBP1 FOXA1 FOXJ2 FUBP1 FOXA1 FOXJ2 FOXB1 FOXO1 FOXB1 FOXO1 POU3F2 LHX3 FOXD2 CPEB1 DBF4 POU3F2 LHX3 FOXD2 CPEB1 DBF4 ONECUT3 ZNF32 ONECUT3 ZNF32 ARID3A BARX1 POU3F3 ARID3A BARX1 POU3F3 DRGX HOXD9 DRGX HOXD9 PRDM1 PRDM1 FOXA2 FOXM1 DMRT2 FOXA2 FOXM1 DMRT2 TBP TBP HMGA2 HMGA2 ZNF35 CDK4 ZNF35 CDK4 ONECUT2 IRF5 POU4F2 LHX9 ONECUT2 IRF5 POU4F2 LHX9 DMRT3 DMRT3 PRRX1 PRRX1 HOXB5 FOXO6 PROP1 FOXG1 HOXB5 FOXO6 PROP1 FOXG1 E2F4 CDC5L E2F4 CDC5L MEF2A FOXF1 MEF2A FOXF1 0 POU4F1 POU4F1 ZFHX3 ZFHX3 GADD45G GADD45G ARID5B PAX4 GATA5 ARID5B PAX4 GATA5 PBX2 HMX1 PBX2 HMX1 GATA4 MSX2 MAD2L2 DMRTC2 GATA4 MSX2 MAD2L2 DMRTC2 ZBTB17 ZBTB17 HOXD8 ATM HOXD8 ATM ANAPC10 IRX1 ANAPC10 IRX1 BCL6 FZR1 BCL6 FZR1 CUL1 HMG20BNKX2-1 CUL1 HMG20BNKX2-1 SMC3 SMC3 MAD1L1 PCNA TFDP2 FOXP2 MAD1L1 PCNA TFDP2 FOXP2 BUB3 MSX1 BUB3 MSX1 FOXA3 ANAPC4 FOXA3 ANAPC4 FOXF2 PBX1 FOXF2 PBX1 RB1 RB1 NKX6-1 CDC14A RBL1 NKX6-1 CDC14A RBL1 POU3F1 HMX2 POU3F1 HMX2 degree difference MAD2L1 BARX2 MAD2L1 BARX2 NKX3-1 CDK1 NKX3-1 CDK1 ANAPC5 SMAD4 ANAPC5 SMAD4 SMAD2 CCNHCDC27 SMAD2 CCNHCDC27 POU2F1 RBL2 RAD21 CCNE2 STAG1 POU2F1 RBL2 RAD21 CCNE2 STAG1 ZNF384 HOXB6 ZNF232 ANAPC2 ZNF384 HOXB6 ZNF232 ANAPC2 MNX1 GFI1 MNX1 GFI1
SKP1 ORC5 SKP1 ORC5 − NFATC3 ORC3 CDX1 NFATC3 ORC3 CDX1 ISX ORC4CCNA2 HHEX ISX ORC4CCNA2 HHEX CDKN2D CDKN2D STAT2 STAT2 YWHAZ YWHAZ SMC1B FOXD3 HDAC2 CCNA1 SMC1B FOXD3 HDAC2 CCNA1 NFATC2 NFATC2 YWHAQ SP4 YWHAQ SP4 EMX2 SOX5 EMX2 SOX5
SOX18 RUNX1 SOX18 RUNX1 PURA YWHAH PURA YWHAH KLF15 KLF15 GSX1 SP1 ZNF263 GSX1 SP1 ZNF263 HMX3 POU6F2 WT1 KLF5 IRF3 HMX3 POU6F2 WT1 KLF5 IRF3
NR2E1 NR2E1 NFATC4 SOX30 NFATC4 SOX30 POU2F2 POU2F2 SP3 ZNF148 NKX2-2 SP3 ZNF148 NKX2-2 PDX1 PDX1 ZBTB7B ZBTB7B MAZ MAZ ZNF281 HNF1A NFAT5 ZNF281 HNF1A NFAT5 SPI1 SPI1 −250
PHOX2A TEAD4 PHOX2A TEAD4 EMX1 CDKN1B SPIC EMX1 CDKN1B SPIC IRF1 HOXB7 IRF1 HOXB7 CDX2 CDX2 HOXA9 HOXA9 HOXD10 MECOM HOXD10 MECOM HNF1B HNF1B DMRTA2 UNCX DMRTA2 UNCX LIN54 LIN54 LBX2 LBX2 ● FEZF1 FEZF1 ● POU5F1 POU5F1 ● GSX2 GSX2 ● HOXB8 HOXB8 ● ALX3 ALX3 ● EN1 EN1 ● STAT4 STAT4 ● TTK TTK ● ONECUT1 ● ONECUT1 ● LEF1 LEF1 ● ALX4 ALX4 ● ALX1 ALX1 ● E2F3 E2F3 ● SOX10 FOXQ1 WEE1 SOX10 FOXQ1 WEE1 ● CDC23 CDC23 ● PHOX2B TEAD1 PHOX2B TEAD1 ● Gene in ● ● ● ● LHX2 LHX2 ● GSK3B GSK3B ● VTN VTN ● ● ● ● Tissue targeting −500 ● bioRxiv preprint doi: https://doi.org/10.1101/082065; this version posted October 20, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.
A B 2 1.0 ● ● p=8.7x10-9
● ● rho=−0.621 ● ● ● ● ● ● ● ● ● ● ● ● ● ● 1 ● ● ● ● ● 0.5 ● ● ● ● ● ● ● ● ● ●● ●● ●● ● ●● ● ●● ● 1.0 ● ● ● ● ● ●● ●● ● 0 ●● ● ● ●● ● ●●● 0.0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● Correlation ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ●● ● ● ●● ● ● ● ●● ● ● ● ● −1 ● ● ● ● ● ● ● ●● ●● 0.5 ● Edge weight difference Edge weight ●● ● ●● −0.5 ● ● ● ● ● ● ● ● ● ● ● ● ● ● LCL SMAD5 x target gene expression −2 value Whole blood −0.01.0 −2.5 0.0 2.5 5.0 7.5 Expression LFC all genes cell cycle genes
−0.5 C ● ● ● ● ● ● Correlation ● ●
cells_allGenestissues_allGenescells_cellcycleGenestissues_cellcycleGenes LCL Whole factor(variable) blood
SMAD5 SMAD5
IKZF1 IKZF1 USF1 USF1
USF2 USF2