COMPREHENSIVE CHARACTERIZATION OF THE TRANSCRIPTIONAL SIGNALING OF HUMAN PARTURITION THROUGH INTEGRATIVE ANALYSIS OF MYOMETRIAL TISSUES AND CELL LINES
by
ZACHARY STANFIELD
Submitted in partial fulfillment of the requirements
For the degree of Doctor of Philosophy
Systems Biology and Bioinformatics
CASE WESTERN RESERVE UNIVERSITY
August, 2019
Case Western Reserve University
School of Graduate Studies
We hereby approve the thesis of
ZACHARY STANFIELD
candidate for the degree of Doctor of Philosophy¹.
Committee Chair: Dr. Gurkan Bebek, Department of Nutrition
Committee Member, Adviser: Dr. Mehmet Koyutürk, Department of Computer Science and Electrical Engineering
Committee Member: Dr. Sam Mesiano, Department of Obstetrics and Gynecology
Committee Member: Dr. Mark Cameron, Department of Population and Quantitative Health Sciences
Date of Defense: May 10, 2019
¹ We certify that written approval has been obtained for any proprietary material contained therein.
Acknowledgements
I would first like to acknowledge my mentor, Dr. Mehmet Koyutürk. I knew from our first meeting that Dr. Koyutürk’s lab would be a great fit for me. He is a highly enthusiastic and ambitious researcher as well as a relaxed, supportive, and understanding mentor. Dr. Koyutürk gave me a lot of freedom in my research but was always present to provide useful feedback and ideas on how to move forward. This allowed me the opportunity to progress in research directions I personally found both interesting and rewarding. I would also like to mention Dr. Sam Mesiano, who essentially served as my co-mentor throughout the latter part of my PhD. His extensive knowledge and experience in the topic of my thesis research helped me progress quickly and better plan my project. Dr. Mesiano was always willing to offer advice and introduce me to potential collaborators at various research retreats and conferences. The amount of support and belief shown in me by both Dr. Koyutürk and Dr. Mesiano gave me the confidence and drive I needed throughout my PhD career.
Graduate school is just as much a lifestyle as a step in one’s career path. Therefore, it is important to build relationships with those around you and experience life outside of research. Thankfully, I have been able to meet a lot of great people through school events like intramural sports and grad student functions, as well as while exploring Cleveland. Some of these relationships have lasted since the very beginning and will continue long after graduation. I thank each and every one of these individuals for their friendship and the memories we were able to create.
Lastly, I would like to thank my family. My parents are extremely supportive of me in every aspect of my life and I am truly grateful for it. My brother also deserves much credit, having participated in countless conversations about every possible aspect of my personal and professional life throughout the years. Without my family’s continuous encouragement and guidance I would not be where I am today.
Table of Contents
Acknowledgements
List of Tables
List of Figures
Comprehensive Characterization of the Transcriptional Signaling of Human Parturition through Integrative Analysis of Myometrial Tissues and Cell Lines
Chapter 1. Introduction
Overview of Human Parturition
Preterm Labor
Processes Controlling Human Parturition
The Role of Transcriptomic Studies in Parturition Research
The Role of Experimental Cell Lines in Parturition Research
Hypotheses and Objectives
1.6.1. Characterizing Tissue Gene Expression
1.6.2. Characterizing Cell Line Gene Expression
1.6.3. Determining the Agreement between Tissue and Cell Lines
Chapter 2. Materials and Methods
Data
2.1.1. Tissue Data
2.1.2. Cell Line Data
Methods
2.2.1. Raw Data Processing, Calculating Differential Expression, and Study Concatenation
2.2.2. Classification
2.2.3. Singular Value Decomposition
2.2.4. Enrichment Analysis
2.2.5. Annotation of Vascular Smooth Muscle Contraction Pathway Diagram
2.2.6. Signaling Network Construction
2.2.7. Regulatory Network Analysis
2.2.8. Comparison of Myometrial Cell Line and Tissue
Chapter 3. Characterizing Transcriptional Alterations at the Onset of Labor in Myometrial Tissue
Differential Gene Expression in Quiescent and Laboring Myometrium
Within-study Classification
Cross-study Classification
Identification of Latent Transcriptional Regulation Patterns via Singular Value Decomposition
Global Identification of Pathways Altered During Parturition
Hypothesis-Driven Signature Identification
Building a Parturition Signaling Network
Summary
Chapter 4. Investigating Pathways Controlling Human Parturition in a Myometrial Cell Line
Progesterone Signaling
cAMP Signaling
IL-1β Signaling
Additional Insights from Single Stimulus Comparisons
Anti-inflammatory Actions of P4 and cAMP
Synergy Between P4 and cAMP
Summary
Chapter 5. Identifying Transcriptional Similarities of Human Myometrial Tissues and Cell Lines
Agreement of Transcriptional Alterations at the Gene, Pathway, and Network Levels
Assessing Similarity Using Tissue Similarity Index
Summary
Chapter 6. Discussion and Conclusions
Tissue Analysis
Cell Line Analysis
Tissue and Cell Line Comparison
Conclusions
Chapter 7. Suggested Future Research
Utilizing Clinical Data
Confirmational Studies for Validation of Findings
Myometrial Cell-Specific Signaling
Linking the Uterine Tissues
Appendix A. Supplementary Tables
Appendix. Complete References
List of Tables
2.1 Myometrial tissue gene expression studies analyzed in this chapter.
Three gene expression datasets collected from publicly available
sources or collaborators are shown. Sample groups and names are
consistent with the acronyms listed in column three throughout the
chapter. Clinical definitions of labor represent how each sample was
called phenotypically, based on the patient’s clinical presentation at
the time of cesarean section, as chosen by the researchers collecting
the samples.
List of Figures
2.1 Study design and conceptual illustration of the framework for
data analysis. A. The human myometrial hTERT-HMA/B cell
line, expressing the progesterone receptors (PRs), was treated
with progesterone (P4), forskolin (FSK, induces cAMP), and IL-1β
individually and in all combinations of these three stimuli. Treatment
was performed for 24 hours to establish steady-state cellular
transcription. RNA sequencing was performed in triplicate on each
condition, resulting in expression measurements for a total of 24
samples. B. Stimulation of the targeted pathways was confirmed by
examining the effect of different treatment combinations on mRNA
expression of IL-8, a commonly used downstream target of IL-1β
and indicator of inflammatory signaling. IL-8 induced expression is
achieved with IL-1β treatment. This induced expression is inhibited
by P4 in the presence of PR. cAMP inhibits IL-1β induced expression
of IL-8, and, finally, the strongest inhibition of IL-8 up-regulation is
achieved by treating cells with both FSK and P4. These experiments
were carried out by a collaborator, Dr. Peyvand Amini, and are shown
here to improve clarity. C. Comparisons performed to characterize the
signaling carried out by P4, cAMP, and IL-1β individually. The results
of these comparisons are shown in Figures 4.1-4.4. D. Comparisons
performed to examine the interplay between P4 and cAMP in the
context of their anti-inflammatory effects. Samples treated by IL-1β
and different combinations of P4 and FSK were first compared to
vehicle. Subsequently, the genes, transcription factors, and processes
identified in these comparisons were compared to each other to
understand how P4 and cAMP work together or individually to
suppress inflammatory processes. The results of these analyses are
shown in Figures 4.5 and 4.6.
2.2 Data and methods used in bioinformatics pipeline. Raw gene
expression was processed in multiple steps and combined with
curated information from public databases to carry out supervised
and unsupervised approaches to identify transcriptional signatures,
assess their reproducibility, and identify key signaling pathways and
interactions involved in parturition.
3.1 Venn diagram of differential expression in Warwick, Detroit, and
London. Differentially expressed genes between non-labor and
in-labor sample groups for each dataset are compared to calculate
similarity across studies.
3.2 AUCs, ROC Curves, and sample classification frequencies for K-fold
cross-validation and cross-study classification tasks on tissue
gene expression studies using LASSO and EN. A-C. K-fold cross
validation (CV) for each dataset (k = 3 for Warwick, 5 for London and
Detroit). D-I. Classification of one dataset with training performed on
another dataset. LASSO and EN were implemented using the glmnet
package in R. J-R. Corresponding sample class frequencies for the
classification tasks shown in panels A-I, specifically using the results
from the top performing algorithm, based on AUC. For all prediction
tasks, the early labor (L-ILEa) and established labor (L-ILEs) samples
were grouped together as a single laboring sample group except for
sample classification frequency for 5-fold CV on London, where the
three sample groups were kept separate in order to assess how the
new initial labor group was classified. All results shown were obtained
using 100 runs of training and testing. Sample names in panels J-R
follow the naming convention in Table 2.1.
3.3 Distribution of singular values and examination of genes and
samples in the space spanned by the first singular vector from SVD
analysis on aggregated gene expression matrix. A. Singular values
calculated by performing SVD on the normalized, study-combined
expression matrix (12,622 genes by 71 samples). B. Mapping of the
top 25 positively and negatively valued genes in the space represented
by the first right singular vector (eigen-gene). Bars representing genes
are colored based on their differential expression in each of the four
differential expression analyses (shown in Fig. 3.1; e.g. purple bars
indicate genes in the set of 126 high-confidence genes). C. Mapping
of all tissue samples to the first left singular vector (eigen-sample),
sorted by mapping value. Bars representing samples are colored
based on their clinically defined phenotype (blue = non-laboring, red
= laboring). Sample name abbreviations in panel C follow Table 2.1.
3.4 Global and hypothesis-driven GSVA pathway enrichment across
all tissue studies. A. Top 30 enriched pathways (by median adjusted
p-value) using the hallmark gene sets from MSigDB. B. Enrichment
results for a selected set of pathways from MSigDB related to
progesterone response, smooth muscle, inflammation, and cAMP
signaling. Heat map is colored by log2(fold change) where red means
up-regulation in laboring samples and blue down-regulation. Fold
changes are calculated using an empirical Bayes model fit to the
output (from the gsva() function) gene set by sample matrix. Rows
(gene sets) are clustered using the complete clustering method based
on correlation.
3.5 Pathway-specific SVD analysis. SVD results for the gene sets A-C.
GO response to progesterone, D-F. hallmark inflammatory response,
and G-I. KEGG cAMP signaling pathway. All studies were combined
(12,622 genes x 71 tissues) and then reduced to the genes within
each gene set. Panels A, D, and G show the distribution of singular
values, panels B, E, and H show the top 25 genes associated with each
phenotype based on loading value (genes are colored by their mean
log2(fold change) value across all four sample group comparisons in
Fig. 3.1), and panels C, F, and I show the gene set activity (loading
value) for each sample (samples are colored based on their originally
reported clinical phenotype).
3.6 Visualization of gene expression changes on KEGG Vascular Smooth
Muscle Contraction (VSMC) pathway diagram. The Homo sapiens
VSMC pathway diagram was downloaded in KGML format from
KEGG and loaded into Cytoscape. Nodes (genes) are colored based
on loading values for the dominant singular vector from performing
singular value decomposition on the reduced (to include only genes
within the VSMC pathway) Warwick expression matrix. These values
were directly loaded into Cytoscape as a node table and used in a
continuous manner in the style tab for node coloring. Blue nodes
indicate genes with higher expression in the W-NL group, red higher
expression in the W-IL group, and white no differential expression.
3.7 Parturition-specific signaling networks constructed using shortest
paths and differential expression data. Shortest paths were
calculated on a curated, directed, weighted signaling network.
Shortest paths were obtained from each of the three pathway initiator
proteins (PGR, IL1R1, ADCY; chosen to represent progesterone,
inflammatory, and cAMP signaling) to each downstream target
gene. Nodes in the network were weighted based on gene loadings
from the dominant singular vector when performing SVD on the
combined expression matrix (3 studies; 12,622 genes x 71 samples).
A. A general parturition signaling network. Target genes consisted
of the top 50 genes associated with the non-laboring and laboring
sample groups (100 total input genes). Nodes were weighted such
that those with strong phenotype associations had the lowest cost.
Paths falling in the top 50%, in terms of lowest costs, were retained.
B. A parturition signaling network prior to labor (i.e. quiescence).
Target genes consisted of the top 50 genes associated with the non-
laboring sample groups. Nodes were weighted such that genes highly
expressed in the non-laboring samples had low costs, those with
roughly equal expression across the two phenotypes had a medium
cost, and genes with higher expression in the laboring samples had
high cost. C. A parturition signaling network during labor. Target
genes consisted of the top 50 genes associated with the laboring
sample groups. Nodes were weighted opposite of the quiescent
network in panel B. All paths were retained for the networks in panels
B and C. Networks were visualized in Cytoscape. Node shapes are
based on function/pathway location, node colors correspond to their
respective weights, and edge width is proportional to the weights in
the original network (thicker meaning higher confidence).
4.1 Comparison of individual genes that are significantly differentially
expressed in response to each individual stimulus. Venn diagrams
depicting the overlap of genes that are A. significantly differentially
expressed, B. significantly up-regulated, and C. significantly down-
regulated in hTERT-HMA/B cell lines treated by P4 (green circle),
FSK (red circle), and IL-1β (blue circle) as compared to control
samples. D. Venn diagram depicting the overlap of genes that are
significantly up-regulated in IL-1β treated cells as compared to
control and significantly down-regulated in P4 and/or FSK treated
cells as compared to control. All comparisons are single stimulus
versus control and significance is defined as having |fold change| > 1.5
and p-value < 0.05 (after adjustment for multiple hypothesis testing
using the Benjamini-Hochberg procedure).
4.2 Pathways and transcription factors that exhibit significant
enrichment in differential expression in response to treatment
by P4, FSK, and IL-1β individually. The gene set collections for
Gene Ontology Biological Processes and transcription factor-targets
were obtained from the Molecular Signatures Database. Gene sets
significant (|fold change| > 1.5 and adjusted p-value < 0.05) in any
single stimulus condition versus control were identified using Gene
Set Variation Analysis (GSVA). For a subset of these significant A.
pathways and B. transcription factors, the magnitude and direction of
the collective fold change of the genes in the corresponding gene set,
as quantified by GSVA, is shown using three bars colored according to
the stimulus.
4.3 Regulatory networks representing the effects of each individual
stimulus. Regulatory network enrichment analysis (RNEA) was
performed and significant subnetworks were identified in samples
treated by A. P4, B. FSK (cAMP), and C. IL-1β based on gene expression
alterations compared to control. Nodes (genes and transcription
factors) are colored based on log2(fold change). Edges (regulatory
interactions) and node outlines are colored based on occurrence in
the 3 shown networks. Log2(fold change) cutoffs used for significance
in the RNEA function were 1.5 for P4 and IL-1β and 1.75 for FSK.
Visualization was performed using Cytoscape.
4.4 Regulation of shared targets for top enriched transcription factors
in response to individual stimuli. Comparison of how each stimulus
alters expression of targets of selected transcription factors showing
strong enrichment from GSVA calculation in at least one of the
conditions P4, FSK, and IL-1β compared to control. For each of these
transcription factors, the number of known targets that are up- or
down-regulated by each stimulus and the overlap between these
targets are shown, along with the total number of known targets for
each TF.
4.5 Pathways that are activated/inhibited by IL-1β are
inhibited/activated by different combinations of P4 and FSK. The
gene set collections of Gene Ontology Biological Processes and
Transcription factor-targets were obtained from the Molecular
Signatures Database. Gene sets significantly altered (|fold change| >
1.5 and adjusted p-value < 0.05) from control to IL-1β were identified
using GSVA. These gene sets were then re-evaluated by GSVA for the
conditions of P4+IL-1β, FSK+IL-1β, and FSK+P4+IL-1β compared
with control. These gene sets were classified into 4 categories
according to their regulation pattern: 1) No inhibition (activated by
IL-1β and not inhibited by the addition of P4 and/or FSK), 2)
Inhibition by P4 (activated by IL-1β but this activation was reversed
by the addition of P4), 3) Inhibition by FSK (activated/inhibited by
IL-1β but this activation/inhibition was reversed by the addition of
FSK), 4) Inhibition by P4+FSK (activated/inhibited by IL-1β, this
regulation was not reversed by the addition of P4 or FSK individually,
but it was reversed by the addition of P4 and FSK together). For each
of these four categories, the fold change of sample gene sets
(representing A. Gene Ontology Biological Processes, B. Transcription
Factor Targets) in response to different combinations of stimuli are
shown. See Supplemental files 10 and 11 for enrichment fold change
values for all GO BP and TF terms, respectively, significant in IL-1β
compared to control. These include enrichment fold changes for all
IL-1β conditions.
4.6 Regulation interplay of genomic signaling by IL-1β, FSK, and P4. A.
Inhibition of the regulatory network induced by IL-1β (same network
and inputs used for Fig. 4.3C) by different combinations of P4 and
FSK (e.g. an orange node is significant in IL-1β vs Ctrl and remains
significant in the same direction in P4+IL-1β vs Ctrl but becomes
insignificant or significant in the opposite direction in FSK+IL-1β
vs Ctrl and FSK+P4+IL-1β vs Ctrl). B. Regulatory patterns of P4 and
FSK. The network is obtained by performing RNEA for FSK+P4 vs Ctrl
with a log2(FC) cutoff of 2.4 and p-value cutoff of 0.05. Node colors
are based on P4 vs Ctrl, FSK vs Ctrl, and FSK+P4 vs Ctrl expression
comparisons. Node shape indicates differential expression of each
node, including direction of regulation.
5.1 Comparison of cell line and tissue gene set enrichment of GO
Biological Processes. Gene set variation analysis (GSVA) was
performed for the conditions P4 vs CTRL, FSK vs CTRL, IL-1β vs
CTRL and W-IL vs W-NL. Simple over-representation analysis was also
performed for the 126 high-confidence genes identified in Fig. 3.1
using Enrichr. A-D. Overlap in enriched GO BPs for different pairwise
comparisons of cell line-based and tissue-based enrichment. Shared
terms for A and D and their enrichment fold changes/p-values can be
found in Tables 7 and 8 of Appendix A.
5.2 Comparison of cell line and tissue samples using Tissue Similarity
Index. Singular Value Decomposition (SVD) was performed on
the tissue gene expression matrix and phenotype centroids were
calculated in the reduced SVD space. Cell line samples were then
projected to this space and their correlations to the labor centroid
were calculated. A. Median correlations and standard deviations per
sample group are shown for all genes shared between the two studies,
highly significant (|fold change| > 5 and adjusted p-value < 0.001)
genes from the individual stimulus treated cells compared to control,
a set of key genes selected from the comprehensive analyses of cell
lines reported in Figures 4.1-4.7, and highly significant genes from the
tissue study by comparing in-labor and not-in-labor samples. B. A
heatmap of the key genes with median expression across all sample
groups. C. Visualization of samples and tissue centroids using the
first two singular vectors (or components) from the SVD analysis on
the key genes selected based on the cell line analysis.
Comprehensive Characterization of the Transcriptional
Signaling of Human Parturition through Integrative Analysis of
Myometrial Tissues and Cell Lines
Abstract
by
ZACHARY STANFIELD
The process of parturition is driven by multiple transcriptional pathways that work in concert to transform the quiescent myometrium (uterine smooth muscle) to the highly contractile laboring state. Here, important genes and pathways altered at the onset of labor were identified using three transcriptomic tissue studies and computational analyses including classification, singular value decomposition (SVD), pathway enrichment, and signaling network inference. The interplay between three important signaling processes identified from the tissue analysis (progesterone (P4), cyclic AMP (cAMP), and inflammation) was then characterized through comprehensive analysis of a transcriptional study involving a human myometrial cell line (hTERT-HMA/B) treated with P4, forskolin (FSK, induces cAMP), and interleukin-1β (IL-1β, mimics inflammation) alone and in all combinations. Gene set enrichment and regulatory network analyses were then performed to identify pathways commonly, differentially, or synergistically regulated by these treatments. Finally, results from the cell line and tissue analyses were compared and used alongside the Tissue Similarity Index (TSI) to further characterize the correspondence between cell lines and tissue phenotypes. In the tissue analysis, a high-confidence set of 126 labor-associated genes (significant across all studies) was identified; signatures associated with labor included inflammation-related genes and pathways (e.g. signaling by interleukins, NF-κB, JAK-STAT), while the non-labor phenotype was associated with relaxation pathways (e.g. cyclic AMP and muscle relaxation); and a parturition signaling network was constructed. From the cell line analysis, we observed that P4 was strongly anti-inflammatory (mainly through the JUN transcription factor), cAMP was partially anti-inflammatory and promoted myometrial relaxation, and P4 and cAMP synergistically blocked specific inflammatory pathways/regulators including STAT3/6, CEBPA/B, and OCT1/7, but not NF-κB. TSI analysis indicated that forskolin+P4- and IL-1β-treated cells exhibit transcriptional signatures highly similar to non-laboring and laboring term myometrium, respectively. In summary, the identified tissue signatures help classify/characterize stages of labor, and the data-derived parturition signaling networks provide new genes/signaling interactions to aid in future studies of parturition, while our cell line results identify potential therapeutic targets to prevent preterm birth and show that the hTERT-HMA/B cell line provides an accurate transcriptional model for term myometrial tissue.
1 Introduction
1.1 Overview of Human Parturition
Parturition, the process of birth, involves dramatic changes, collectively referred to as labor, in the uterine tissues. The changes include the weakening and rupture of the fetal membranes, softening and dilation of the uterine cervix to open the gateway for birth, and activation of the myometrium (uterine smooth muscle) such that it contracts forcefully and rhythmically to become the engine for birth. Therefore, the process and timing of parturition involve transition of the smooth muscle of the myometrium from a state of quiescence to the highly contractile laboring state. For most of pregnancy, the myometrial cells are maintained in a relaxed and compliant state and undergo hypertrophy. At parturition, hormonal factors acting on myometrial cells and intracellular signaling pathways within these cells affect the expression of genes whose products increase excitability and contractility, transitioning the uterus to the laboring phenotype, wherein rhythmic and forceful myometrial contractions coupled with softening and dilation of the cervix expel the fetus and placenta. The hormonal and intracellular signaling pathways and gene expression networks involved in myometrial cell activation are not clearly understood.
1.2 Preterm Labor
This lack of understanding of normal parturition is a key reason for the lack of effective therapies to prevent preterm birth (delivery before 37 weeks; the World Health Organization (WHO) estimates 15 million babies are born prematurely each year) and is a major point of study in reproductive biology. While certain factors increase the likelihood of preterm birth (e.g. multiple pregnancies, infections, high blood pressure), the underlying biological mechanisms are unknown. Actions during pregnancy such as maintaining a healthy diet, avoiding harmful substances like tobacco, and attending multiple check-ups with a physician can help prevent preterm birth. However, the identification of clinical small-molecule interventions through biomedical research will go a long way toward achieving major reductions in preterm birth rates.
1.3 Processes Controlling Human Parturition
In all viviparous species examined to date the steroid hormone progesterone (P4) acting via the nuclear P4 receptors (PRs) is essential for the establishment and maintenance of
pregnancy. P4 signals via PR throughout pregnancy to keep the myometrium relaxed.
In most mammals, the abundance of P4 in the myometrium significantly decreases at
term, leading to the removal of P4/PR signaling. However, in humans the abundance of
P4 remains high through term and labor. Therefore, it has been hypothesized that labor
occurs due to a functional withdrawal of P4/PR signaling.
Part of the pregnancy maintenance carried out by P4/PR is achieved by inhibiting the response of myometrial cells to pro-labor/pro-inflammatory stimuli [1–5]. This is important because parturition is an inflammatory process that is triggered by pro-inflammatory stimuli inducing tissue-level inflammation within the myometrium, decidua, and cervix. For this reason, many preterm births may be attributed to high stress, infection, or pre-existing conditions that stimulate the inflammatory response. Desensitization of uterine cells, and especially myometrial cells, to pro-inflammatory stimuli by P4/PR is therefore considered to be a principal mechanism for the P4 block to labor [6,7].
Since the myometrium is a smooth muscle that exhibits states of relaxation and hypertrophy (phenotype for most of pregnancy) as well as contractility (phenotype during labor), pathways related to the former may also work to maintain quiescence and potentially block inflammatory signaling. The 3’,5’-cyclic adenosine monophosphate (cAMP) intracellular signaling cascade has been shown to promote uterine relaxation by inhibiting myometrial cell contractility [8,9]. Typically, cAMP functions as a second messenger, acting through protein kinase A (PKA), a kinase that phosphorylates a diverse set of downstream targets to produce various cellular responses [10]. PKA suppresses myometrial cell contractility by inhibiting myosin light chain kinase activity and decreasing the intracellular concentration of free Ca2+ [11–15]. Studies in airway smooth muscle cells have shown that cAMP signaling inhibits the response to pro-inflammatory stimuli by suppressing the activation of nuclear factor kappa-light-chain-enhancer of activated B cells (NF-κB) and specific mitogen-activated protein kinases (MAPKs) [16]. Thus, hormones whose actions are mediated by the cAMP/PKA pathway may decrease myometrial cell contractility by inhibiting the contractile apparatus and the response to pro-inflammatory stimuli.
A recent study found that cAMP/PKA and P4/PR signaling in myometrial cells exert a potent and synergistic anti-inflammatory effect [17]. Those findings were consistent with other studies showing that treatment of myometrial cells derived from term uterus with forskolin (FSK), a natural bicyclic diterpene that increases intracellular cAMP by stimulating adenylyl cyclase, increased the anti-inflammatory actions of P4 [3]. Thus, anti-inflammatory synergism between cAMP and P4/PR signaling in myometrial cells appears to be a major mechanism by which P4/PR promotes uterine quiescence for most of pregnancy.
1.4 The Role of Transcriptomic Studies in Parturition Research
Transition of the myometrium from quiescence to labor is thought to be controlled at the transcriptional level through changes in the expression of specific genes whose products increase contractility and excitability. High-dimensional transcriptome profiling technologies provide an opportunity to examine the gene expression landscape within laboring and non-laboring myometrium to identify the gene sets controlling labor. This approach allows unbiased discovery of regulatory pathways and mechanisms associated with parturition.
Transcriptional differences between laboring and non-laboring human myometrium have been examined by multiple studies in the last two decades. Early studies examined a predefined gene set or performed functional genomics via smaller expression arrays [18–20]. Over the last decade, microarray [21–26] and, more recently, RNA sequencing (RNA-seq) [27] platforms were used to determine the extent of expression of thousands of genes in samples of term myometrium obtained at the time of cesarean section delivery performed in women with no clinical signs of labor or in women in active labor. These studies identified multiple differentially expressed genes between laboring and non-laboring myometrium, especially with technologies that allowed greater coverage of the transcriptome. Genes with previous ties to parturition (e.g. PTGS2, IL8, OXTR) were validated in multiple studies using various experimental techniques. In aggregate, the data showed that labor is associated with inflammatory signals, including genes/pathways related to cytokine signaling, chemotaxis, and immune response, and, to a lesser extent, that the non-laboring state is associated with smooth muscle-related processes and cell adhesion. However, despite the generation of numerous comparable myometrium transcriptome datasets, a reproducible and reliable transcriptional signature and signaling network for human labor has not been identified, and a comparison of the variability and consistency between larger recent studies is lacking.
1.5 The Role of Experimental Cell Lines in Parturition Research
To expand human parturition research, cell line models have been employed in recent years. Cell lines open new avenues of investigation into pregnancy due to the level
of control allowed as opposed to the limited space for intervention in tissue samples.
Myometrial cell line models include term cultured cells28 and cell lines PHM1-4129,
PHM1-3130,31, and hTERT-HM.1,2,4,17,32 Studies using these cell lines examine various
aspects of myometrial bio-molecular actions relevant to parturition including interac-
tions/importance of the signaling and abundance of the PR isoforms (PR-A and PR-B)
and the effect of specific pro-labor stimuli (e.g. prostaglandins and oxytocin). Results
obtained from these studies have advanced the current understanding of reproductive
biology. However, most studies focus only on the effect of a single intervention and/or on a
very small set of genes/molecules of interest. Parturition is a highly complex physiological
event involving multiple genes and biological pathways. Therefore, research is needed
to understand not only how each of these pathways plays a role individually
in the myometrium but also how they work in concert to achieve the transition from
quiescence to contractility. Observation of these changes on a global scale (e.g. using
genome-wide transcriptional profiling) can also help identify important aspects of the
signaling machinery relevant to the transition to the labor state. Furthermore, to make
meaningful connections between cell line observations and myometrial tissue, it is im-
portant to establish how well cell line models align at all levels of bio-molecular activity
(gene expression, pathway activity, etc.) with true myometrial tissue phenotypes.
1.6 Hypotheses and Objectives
Motivated by these considerations, a comprehensive analysis was performed on tran-
scriptional data from both clinical myometrial tissue and a myometrial cell line. A va-
riety of complementary bioinformatics methods were employed to identify global gene
expression changes occurring at the onset of labor in tissue expression data. This analysis
utilized three top transcriptional datasets in the field to discover clear, reliable patterns
of expression in term myometrium. A similar set of techniques was applied to cell
line expression data to characterize the signaling of P4, cAMP, and inflammation individually
as well as the interplay of these processes in the context of parturition. Finally, to aid
in the extrapolation of the obtained cell line results to the in vivo process represented by
the tissues, we calculated the similarity of transcriptional landscapes across tissue phe-
notypes and a cell line with various exposure conditions. Therefore, the main hypoth-
esis of this work is that transcriptional changes in the myometrium between laboring
and non-laboring women are important in human parturition and that these changes
can be identified and characterized using a variety of bioinformatics techniques. Fur-
ther analyses involving a myometrial cell line can improve this characterization and the
overall understanding of parturition such that generation of testable hypotheses in both
clinical tissue samples and cell lines can be achieved.
1.6.1 Characterizing Tissue Gene Expression
Three tissue gene expression datasets were integrated to identify the core genes, signal-
ing networks, and biological pathways involved in the transition of term myometrium
from quiescence to the laboring state. This was achieved by applying a series of su-
pervised and unsupervised computational techniques to extract myometrial gene ex-
pression signatures involved in the onset of labor. First, standard differential expression
analysis was carried out to obtain a high-level view of the discrepancies in gene expres-
sion across phenotypes and identify a high-confidence set of genes whose expression
level is altered between labor and non-labor. Two machine learning algorithms were
then used to train and test gene expression based classifiers for predicting labor. This
approach also enabled the assessment of within-study sample quality and consistency
as well as cross-study agreement and sample group variability. To further enhance the
gene signatures identified at the first step, the learned model coefficients were used to
identify the transcriptional signatures that are most predictive of phenotype for each
dataset.
Unsupervised analyses were also used to investigate whether regulatory patterns extracted from integrated gene expression data contain information relating to given phenotypes. For this purpose, singular value decomposition (SVD) was applied to the combined expression matrix of all transcriptional studies to identify dominant patterns of expression, the genes that participate in the patterns, and the activity of the patterns in samples across studies. Overall, our analyses showed the presence of a robust transcriptional signature of labor in term myometrium, both when datasets were examined individually and together. This signature was also observed at the pathway level, where consistent enrichment was seen for pathways associated with labor (tumor necrosis factor alpha (TNFα) signaling via NF-κB, MTORC1 signaling, and inflammatory response) and non-labor (vascular smooth muscle contraction, progesterone response, and cyclic AMP (cAMP) signaling). Insights into the consistency of each dataset and sample were also obtained, as well as the variability within laboring and non-laboring clinical tissue sample groups. Finally, observed gene and pathway signatures were integrated with existing network information to build signaling networks of parturition, and these networks were validated with knowledge from the literature.
1.6.2 Characterizing Cell Line Gene Expression
The interplay of three signaling pathways known to affect the contractile phenotype of the human myometrium during pregnancy was examined: 1) the steroid hormone progesterone (P4) acting via the P4 receptors (PRs); 2) the pro-labor/pro-inflammatory cytokine interleukin-1β (IL-1β); and 3) the 3’,5’-cyclic adenosine monophosphate (cAMP) intracellular signaling cascade.
RNA sequencing followed by global, unbiased gene expression analysis was performed to determine the effects of each signaling pathway alone and in combination. Pathway enrichment analysis and regulatory network inference were used to fully characterize P4/PR, cAMP, and IL-1β signaling. The same techniques were then used to investigate the anti-inflammatory actions and synergistic effects of P4/PR and cAMP on responsiveness to IL-1β. Results show that the main action of P4/PR in myometrial cells is anti-inflammatory, working through the JUN transcription factor and suppressing pathways involved in muscle tissue development, leukocyte migration, and MAPK signaling. cAMP also exhibited anti-inflammatory effects and, interestingly, promoted pathways involved in cell hypertrophy and smooth muscle contraction. P4/PR and cAMP acted synergistically to down-regulate genes involved in cytokine production, the STAT signaling cascade, and smooth muscle cell proliferation and to up-regulate a set of genes controlled by transcription factors involved in developmental processes and suppression of inflammation. The data suggest that P4/PR, cAMP, and IL-1β signaling share a subset of regulatory machinery, allowing for the observed inter-pathway regulation.
1.6.3 Determining the Agreement between Tissue and Cell Lines
Differentially expressed genes, enriched pathways, and inferred signaling networks were compared across tissue and cell line results. Similarities were shared at all levels and provided insights into which actions of different signaling processes create the tissue phenotypes. To more directly characterize these similarities, a technique introduced in the literature, the Tissue Similarity Index (TSI), was employed. It was observed that, at the transcriptional level, the hTERT-HM myometrial cell line exposed to the combination of forskolin and P4 was an accurate approximation for non-laboring myometrium and IL-1β-treated cells best resembled laboring myometrium. Therefore, this observed similarity serves as an initial validation for all the results outlined in chapter 4, in terms of their extrapolation toward in vivo myometrial action, and supports the use of this cell line model in future experiments regarding the importance of various signaling pathways in human parturition.
2 Materials and Methods
2.1 Data
2.1.1 Tissue Data
This work analyzed three gene expression datasets generated from myometrial biopsies of women at term (≥38 weeks) undergoing cesarean section (Table 2.1). Raw data files for the Warwick dataset were downloaded from the NCBI Gene Expression Omnibus (GEO), accession number GSE50599; the Detroit study files were obtained directly from the study authors; and the London study raw data were published to GEO (GSE80172). Full details of tissue collection procedures, sample phenotyping, and mRNA expression measurement can be found in the original research papers (Chan et al., 2014; Mittal et al., 2010)24,27. The following two sections describe the methods used to generate the London dataset, which were carried out by the lab of Dr. Mark Johnson at Imperial College London.
Study    Platform    Sample Groups               Group Size  Clinical Definition of Labor
Warwick  RNA-seq     Non-Labor (W-NL)            N = 5       Contractions (<3 min apart), cervical
                     In-Labor (W-IL)             N = 5       dilation (>2 cm), membrane rupture
London   RNA-seq     Non-Labor (L-NL)            N = 8
                     Early-Labor (L-ILEa)        N = 8       Cervical dilation < 3 cm (L-ILEa)
                     Established-Labor (L-ILEs)  N = 6       Cervical dilation > 3 cm (L-ILEs)
Detroit  Microarray  Non-Labor (W-NL)            N = 20      Contractions (<10 min apart), cervical
                     In-Labor (W-IL)             N = 19      dilation requiring hospitalization
Table 2.1. Myometrial tissue gene expression studies analyzed in this chapter. Three gene expression datasets collected from publicly available sources or collaborators are shown. Sample groups and names are consistent with the acronyms listed in column three throughout the chapter. Clinical definitions of labor represent how each sample was called phenotypically, based on the patient’s clinical presentation at the time of cesarean section, as chosen by the researchers collecting the samples.
Sample Collection. Myometrial biopsies from women undergoing cesarean section were collected in accordance with the Declaration of Helsinki guidelines, and with approval from the local research ethics committee for Chelsea & Westminster Hospital (London, UK; Ethics No. 10/H0801/45). Informed written consent was obtained from all women who participated. Biopsies were excised from the upper margin of the incision made in the lower segment of the uterus, immediately washed with Dulbecco’s phosphate-buffered saline (Sigma) and dissected into pieces measuring approximately 2-3 mm³. For RNA study, biopsies were immersed in RNAlater (Sigma) within 6 minutes after biopsy excision from the uterus and stored at 4°C overnight, before being taken out of RNAlater solution to be frozen for long-term storage at -80°C. Labor was defined by the presence of regular uterine contractions associated with progressive cervical dilation. All specimens were categorized into three groups according to their labor stages: term not in labor (NL, n=8), term in initial or early labor (ILEa, defined as cervical dilatation < 3 cm, n=8) and term in established labor (ILEs, defined as cervical dilatation > 3 cm, n=6). The reasons NL patients underwent cesarean section included obstetric indications (e.g. persistent breech presentation) or maternal request due to tocophobia. Women in the labor group underwent cesarean section for indications including fetal distress, breech presentation and previous cesarean section. Women with gestational diabetes, preeclampsia, chorioamnionitis, multiple pregnancy, and/or given labor-augmenting drugs (prostaglandins and oxytocin) were excluded.
RNA Extraction, Library Preparation, and Sequencing. For each sample, 60-100 mg of myometrial tissue was extracted in TRIzol (Life Technologies) by mechanical homogenisation in a Precellys 24 bead-based homogeniser using 5 cycles of 5000 rpm for 20 seconds, before chloroform treatment and centrifugation at 4°C. RNA was extracted from the aqueous phase of centrifuged homogenates using the TRIzol Plus RNA Purification kit (Life Technologies) with on-column DNase treatment prior to elution, all according to manufacturer’s instructions. Final RNA samples were stored at -80°C. The quantity and quality of RNA were measured using a Nanodrop ND-1000 spectrophotometer (LabTech), Qubit fluorimeter (Life Technologies) and Bioanalyser 2100 (Agilent Technologies). Preparation of cDNA libraries was carried out using the TruSeq Stranded mRNA Sample Preparation kit (Illumina), following the high-throughput sample (HT) protocol. The quantity and quality of cDNA libraries were also tested by a Qubit fluorimeter and Bioanalyser 2100. TruSeq Stranded libraries were then multiplexed and sequenced with an average of 42 million DNA fragments per sample (100 bp paired-end reads). Quality control was performed using FastQC software (version 0.11.2).
2.1.2 Cell Line Data
Data were obtained from cell culture studies performed in a telomerase immortalized human myometrial cell line, hTERT-HMA/B, expressing stable and independently inducible PR-A and PR-B as described in Amini et al. (2018)17. Briefly, cells were induced to express equal levels of PR-A and PR-B, and exposed to progesterone (P4), IL-1β, and forskolin (FSK, to stimulate cyclic AMP) alone and in all combinations (Fig. 2.1A). All experimental conditions were performed in triplicate. Stimulation of these pathways was confirmed by qRT-PCR showing IL-1β induced expression of IL-8 and that this induced expression of IL-8 was inhibited by P4 in the presence of PR and by FSK (Fig. 2.1B). Figure 2.1C-D depicts the comparisons of various conditions performed throughout this chapter.
Figure 2.1. Study design and conceptual illustration of the framework for data analysis. A. The human myometrial hTERT-HMA/B cell line, expressing the progesterone receptors (PRs), was treated with progesterone (P4), forskolin (FSK, induces cAMP), and IL-1β individually and in all combinations of these three stimuli. Treatment was performed for 24 hours to establish steady-state cellular transcription. RNA sequencing was performed in triplicate on each condition, resulting in expression measurements for a total of 24 samples. B. Stimulation of the targeted pathways was confirmed by examining the effect of different treatment combinations on mRNA expression of IL-8, a commonly used downstream target of IL-1β and indicator of inflammatory signaling. IL-8 induced expression is achieved with IL-1β treatment. This induced expression is inhibited by P4 in the presence of PR. cAMP inhibits IL-1β induced expression of IL-8, and, finally, the strongest inhibition of IL-8 up-regulation is achieved by treating cells with both FSK and P4. These experiments were carried out by a collaborator, Dr. Peyvand Amini, and are shown here to improve clarity. C. Comparisons performed to characterize the signaling carried out by P4, cAMP, and IL-1β individually. The results of these comparisons are shown in Figures 4.1-4.4. D. Comparisons performed to examine the interplay between P4 and cAMP in the context of their anti-inflammatory effects. Samples treated by IL-1β and different combinations of P4 and FSK were first compared to vehicle. Subsequently, the genes, transcription factors, and processes identified in these comparisons were compared to each other to understand how P4 and cAMP work together or individually to suppress inflammatory processes. The results of these analyses are shown in Figures 4.5 and 4.6.
2.2 Methods
2.2.1 Raw Data Processing, Calculating Differential Expression, and Study Concatenation
RNA-seq datasets (Warwick, London, and cell line data) were processed separately using identical pipelines. FASTQ data files were analyzed using FASTQC (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc). Transcripts were aligned using tophat against the University of California Santa Cruz (UCSC) hg19 reference transcriptome.33 Transcripts were assembled and RNA abundance was estimated using cufflinks on the BAM files resulting from tophat alignment (version 2.2.1)34. The cuffdiff command was run once on all samples for each dataset. The Warwick and London tissue gene expression matrices were built using the genes.read_group_tracking cuffdiff output file. Genes having zero expression in three or more samples were removed prior to downstream analysis. For the cell line dataset, the cuffdiff output was further analyzed using cummeRbund in R (version 3.5.1).35 Differential expression was performed for laboring vs non-laboring phenotypes for the tissue studies and all sample group comparisons were performed for the cell line study. Genes exhibiting |fold change| > 1.5 and adjusted p-value < 0.05 (adjusted using the Benjamini-Hochberg procedure for multiple hypothesis test correction) were deemed significant. The cell line gene expression matrix was obtained using the repFpkmMatrix() function in the cummeRbund package. The microarray dataset (Detroit; Illumina) was processed as in Mittal et al. (2010)24. Briefly, the raw data file was log2 transformed followed by quantile normalization. Probes having expression within the background range in at least five samples were removed. Duplicate rows were also removed.
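The significance rule described above can be illustrated with a minimal sketch. It is not the cuffdiff or cummeRbund implementation; the function names are hypothetical, and because the text does not state whether the fold change is on a log2 or a linear scale, the sketch simply applies the absolute value to whatever values are supplied.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_genes(genes, fold_changes, p_values, fc_cutoff=1.5, alpha=0.05):
    """Keep genes with |fold change| > fc_cutoff and adjusted p-value < alpha.

    Assumption: fold_changes are signed values on the same scale as the cutoff.
    """
    adjusted = benjamini_hochberg(p_values)
    return [
        gene
        for gene, fc, q in zip(genes, fold_changes, adjusted)
        if abs(fc) > fc_cutoff and q < alpha
    ]


# Toy input; gene labels and numbers are illustrative only.
hits = significant_genes(
    ["A", "B", "C", "D"], [2.0, -3.0, 1.2, -1.6], [0.01, 0.04, 0.03, 0.005]
)
```

Both conditions must hold for a gene to be retained, so a large fold change with a weak adjusted p-value is excluded, and vice versa.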
As will be described below, certain analyses employed involved working with an expression matrix containing all tissue samples (i.e. concatenation of the expression matrices from all 3 studies). In these cases, additional processing and normalization was performed on the individual matrices. This is due to the differences in the RNA-seq and microarray platforms, which each have specific characteristics and expression distributions. For this reason, any classification or comparison of datasets from these different platforms required prior normalization. The two RNA-seq studies, Warwick and London, were transformed using log2(X + 1) in order to shift them to a scale similar to that of the Detroit microarray study, which, as mentioned above, was already log transformed. All datasets were then row and column normalized to achieve comparable distributions of expression across genes and samples. A descriptive diagram of the described normalization and downstream analysis pipeline (see next sections) performed on the tissue studies (results in chapter 3) is provided to improve clarity (Fig. 2.2).
Figure 2.2. Data and methods used in bioinformatics pipeline. Raw gene expression was processed in multiple steps and combined with curated information from public databases to carry out supervised and unsupervised approaches to identify transcriptional signatures, assess their reproducibility, and identify key signaling pathways and interactions involved in parturition.
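The transformation and normalization steps can be sketched as follows. The text does not specify how row and column normalization was carried out; this sketch assumes z-score standardization of rows (genes) followed by columns (samples), and all function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def log2_plus_one(matrix):
    """Apply log2(x + 1) to every entry of a genes-by-samples matrix."""
    return [[math.log2(x + 1.0) for x in row] for row in matrix]


def standardize_rows(matrix):
    """Z-score each row; constant rows are mapped to zeros."""
    result = []
    for row in matrix:
        mu, sd = mean(row), pstdev(row)
        result.append([(x - mu) / sd if sd > 0 else 0.0 for x in row])
    return result


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def normalize(matrix):
    """Assumed scheme: standardize genes (rows), then samples (columns)."""
    by_gene = standardize_rows(matrix)
    by_sample = standardize_rows(transpose(by_gene))
    return transpose(by_sample)


# Toy RNA-seq style values (illustrative only): 3 genes by 3 samples.
toy = [[1.0, 2.0, 3.0], [2.0, 4.0, 9.0], [5.0, 1.0, 0.0]]
normalized = normalize(log2_plus_one(toy))
```

With this ordering only the columns are exactly standardized at the end; a different ordering or an iterative scheme would be an equally plausible reading of the text.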
2.2.2 Classification
The R package glmnet was employed to perform classification with the least absolute shrinkage and selection operator (LASSO) and elastic net (EN) algorithms.36 Classification was performed using the cv.glmnet function, setting the “family” argument to “binomial” for Warwick and Detroit (two sample groups: labor and non-labor) and “multinomial” for London (three sample groups: non-labor, early labor, and established labor). Dataset and sample group variability were assessed by performing k-fold cross-validation on each study with k=3 for Warwick and k=5 for London and Detroit. A smaller k value was chosen for Warwick as the study consisted of only 10 samples, while 5 was chosen for London and Detroit as 5-fold cross-validation is common practice in many classification problems. For drawing of Receiver Operating Characteristic (ROC) curves and calculation of the Area Under the ROC Curve (AUC; measure of classification performance), the London dataset was converted to a binary problem by combining the early labor (L-ILEa) and established labor (L-ILEs) sample groups into a single laboring group, whereas the multinomial method was used to calculate sample classification frequencies. Due to the variability of cross-validation, one hundred rounds of training and prediction were carried out for each classification task. Model features (genes) and their coefficients were recorded for each training run. To assess the importance of each gene in discriminating labor from non-labor samples, the number of models in which a gene was retained as a feature and each gene’s mean model coefficient were computed.
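The per-gene summary over repeated training runs can be sketched as below. This is an illustration, not the glmnet workflow itself: each run is represented as a plain dictionary of gene coefficients, the function name is hypothetical, and the mean is assumed to be taken over the runs in which the gene was retained (the text does not say whether runs with a zero coefficient are included).

```python
from collections import defaultdict


def summarize_models(models):
    """Summarize sparse models from repeated training runs.

    models: list of dicts mapping gene -> coefficient for one run.
    Returns gene -> (number of runs retaining the gene, mean coefficient
    over those runs). Zero coefficients count as "not retained".
    """
    counts = defaultdict(int)
    totals = defaultdict(float)
    for coefficients in models:
        for gene, value in coefficients.items():
            if value != 0:
                counts[gene] += 1
                totals[gene] += value
    return {gene: (counts[gene], totals[gene] / counts[gene]) for gene in counts}


# Toy coefficients for three runs; the numbers are illustrative only.
runs = [{"PTGS2": 0.5, "OXTR": 0.25}, {"PTGS2": 1.5}, {"OXTR": 0.0}]
summary = summarize_models(runs)
```

Genes retained in many runs with a consistent coefficient sign are the ones such a summary would rank as most discriminative.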
To assess the reproducibility of the machine learning models, cross-classification was also performed, namely by making predictions on each of the three datasets using the model that was trained on each of the other two studies individually. This resulted in six cross-classification experiments, in which each dataset served as the training data in two experiments and as the test data in two experiments. Except for the London dataset, which was reduced to the L-NL and L-ILEs samples for consistency with the other two studies, the full datasets were used in training, which included k-fold cross-validation to obtain optimal model parameters as calculated by the cv.glmnet function within the glmnet package. All studies were binomial classification problems and the training and prediction were performed one hundred times.
2.2.3 Singular Value Decomposition
To extract common patterns and gene expression signatures across the three studies,
singular value decomposition (SVD) was used.37 SVD is an unsupervised mathematical
approach that maps a high dimensional matrix to lower dimensional space and identifies
patterns of variability within the data. For a given gene expression matrix M of
size m × n (m genes and n samples), SVD decomposes this matrix into three matrices U,
Σ, and V^T (the transpose of V) such that M = UΣV^T. Σ is a diagonal matrix consisting of singular values sorted from largest (first entry in matrix) to smallest (last entry).
These singular values correspond to the strength of each pattern in the data (i.e. how well the pattern approximates the variability within M). U and V contain the left and right singular vectors, where columns of U are considered eigen-assays and rows of V^T eigen-genes.38
Using the normalized gene expression matrices (see previous section) for the three studies, a combined gene expression matrix M consisting of all 71 samples (10 Warwick + 22 London + 39 Detroit) and the 12,622 genes measured in all three studies was constructed. Singular value decomposition was then applied to M. Singular values were assessed to determine the significant patterns within the expression matrix by observing the diagonal entries of Σ. The dominant gene pattern was obtained by plotting the first column of U and the dominant sample pattern is represented by the first row of V^T.
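To make the dominant pattern concrete, the following sketch recovers the largest singular value together with the first column of U (gene loadings) and the first row of V^T (sample pattern) by power iteration. Power iteration is a swapped-in technique chosen so the example needs no numerical library; it is not stated to be the routine used in this work, and the function names are hypothetical.

```python
import math
import random


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def norm(vector):
    return math.sqrt(sum(x * x for x in vector))


def dominant_singular_triplet(matrix, iterations=500, seed=0):
    """Power iteration on M^T M for a genes-by-samples matrix M.

    Returns (sigma, u, v): the largest singular value, the gene loadings
    (first column of U) and the sample pattern (first row of V^T).
    Assumes the random start vector is not orthogonal to the dominant pattern.
    """
    rng = random.Random(seed)
    matrix_t = transpose(matrix)
    v = [rng.random() for _ in matrix[0]]
    for _ in range(iterations):
        w = matvec(matrix_t, matvec(matrix, v))  # (M^T M) v
        length = norm(w)
        v = [x / length for x in w]
    mv = matvec(matrix, v)
    sigma = norm(mv)
    u = [x / sigma for x in mv]
    return sigma, u, v


# Toy rank-one matrix: 3 genes by 4 samples, two sample groups.
toy = [[2.0, 2.0, -2.0, -2.0], [-1.0, -1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]]
sigma, gene_loadings, sample_pattern = dominant_singular_triplet(toy)
```

In the toy matrix the first two samples and the last two samples receive opposite signs in the sample pattern, which is the kind of separation the dominant singular vector is used to detect between phenotypes. The overall sign of a singular vector pair is arbitrary.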
For pathway-specific SVD analysis, the same procedure as above was performed with the difference being the expression matrix on which the analysis was done. Genes for each pathway or gene set were extracted and the gene expression matrix was reduced from the 12,622 shared, measured genes to only contain the genes involved in the pathway. The procedure was done individually for each pathway.
2.2.4 Enrichment Analysis
Tissue Data. Pathway enrichment was performed in two ways to better interpret the gene expression signatures obtained from the differential expression analysis. A high-level approach was used by inputting the gene symbols of the 126 high-confidence genes, obtained from comparing differential expression across all studies (Fig. 3.1), into the web-tool Enrichr to determine which pathways were most strongly and consistently altered between laboring and non-laboring myometrium.39 Enrichment was assessed using the KEGG 2016 result under the Pathway tab on the Enrichr results webpage.40 Terms were sorted by adjusted p-value. A within-dataset approach was used by performing Gene Set Variation Analysis (GSVA) to identify the pathways most differentially expressed between labor and non-labor conditions specific to each dataset. GSVA is a publicly available tool that is implemented in R to identify the gene sets that are collectively differentially expressed between two different experimental conditions.41 GSVA is a non-parametric method applied to the full expression matrix as opposed to working with a list of differential expression scores for each gene. It transforms the gene by sample matrix to a gene set by sample matrix. This allows for use of expression information about every gene (including non-significant ones) in a given gene set to assess the collective differential expression of the set and to calculate an enrichment fold change for each gene set. Application of GSVA across all three tissue studies allowed for observation of signature consistency at the pathway level. The hallmark gene sets from the molecular signatures database (MSigDB; version 5.2) were obtained via the Broad Institute website and loaded into R using the getGmt() function42,43. GSVA was then applied individually to the expression matrices of the Warwick, London, and Detroit studies after IQR filtering to remove genes with low expression variance across all samples. The raw expression matrices from the Warwick and London studies were log2() transformed and run with the argument RNA-seq=FALSE, as recommended in the GSVA documentation. Gene sets of size 5 to 500 were allowed in the enrichment analysis. Finally, the limma package was used to fit models to each gene set, and significance was calculated by applying an empirical Bayes statistical test44. P-values were adjusted using the Benjamini-Hochberg procedure for multiple hypothesis testing. Clustering of the enrichment results was performed on the top 30 gene sets (selected by median p-value over the 4 study comparisons) using the pheatmap package with correlation between fold change enrichment as the distance measure and complete linkage as the clustering method.
For the hypothesis-driven pathway analysis, MSigDB was used to identify and build a set of parturition-relevant gene sets (P4, inflammation, cAMP/smooth muscle) from multiple sources (GO, KEGG, etc.). GSVA was then implemented on 32 gene sets for each study individually. Of this initial set, 26 met the minimum size requirement of 5 genes (previous paragraph) after IQR filtering. Results for all 26 gene sets were then clustered by GSVA fold change values as with the global analysis.
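The gene set size restriction can be sketched as follows: each gene set is first reduced to the genes that survive filtering, and only sets whose remaining size falls within the allowed range are kept. This is a minimal illustration of the rule, not GSVA's internal code; the function name and the toy gene set names are hypothetical.

```python
def restrict_gene_sets(gene_sets, measured_genes, min_size=5, max_size=500):
    """Keep gene sets whose overlap with the measured genes is within bounds.

    gene_sets: dict mapping gene set name -> iterable of gene symbols.
    measured_genes: genes remaining in the expression matrix after filtering.
    """
    measured = set(measured_genes)
    kept = {}
    for name, members in gene_sets.items():
        present = sorted(set(members) & measured)
        if min_size <= len(present) <= max_size:
            kept[name] = present
    return kept


# Toy example with placeholder gene symbols.
measured = ["g1", "g2", "g3", "g4", "g5", "g6"]
candidate_sets = {
    "SET_A": ["g1", "g2", "g3", "g4", "g5", "g6"],
    "SET_B": ["g1", "g2", "x1", "x2", "x3", "x4"],
}
usable_sets = restrict_gene_sets(candidate_sets, measured)
```

A set can therefore be dropped even when its full definition is large enough, because only the genes still present after filtering count toward its size.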
In all figures, the log2 enrichment fold change is used. This is achieved by calling the gsva() function on the gene expression matrix to obtain a matrix whose entries are the enrichment scores for a given pathway (row) in a given sample (column). These are then fit to an empirical Bayes model to identify significantly associated gene sets and calculate the enrichment fold change with respect to the condition of interest (e.g. labor, P4+FSK, etc.).
Cell Line Data. GSVA was used to identify pathways and transcription factors that were enriched in differentially expressed genes/targets between treated cell line conditions and the control condition. The input expression matrix derived from the cell line RNA-seq contained 16,500 genes and 24 samples after removal of genes exhibiting zero expression in more than 3 samples. The Gene Ontology Biological Processes (GO BPs) and Transcription Factor-Targets gene set collections were obtained from the molecular signatures database (MSigDB) via the Broad Institute website43. Gene sets for the FOS and JUN transcription factors were missing from the Transcription Factor-Targets gene sets. Considering these are of much interest and relevance in inflammatory signaling, and therefore parturition research, sets for FOS and JUN were created using the curated transcription factor network employed in the regulatory network enrichment analysis method (described below; includes data from the TRED, TRRUST, TFACTS, and OREGANNO databases). These gene sets were loaded into R using the getGmt() function. The same procedure employed for the tissue data was applied here (log2() transformation, IQR filtering, gene set size restriction, fitting models with limma, and p-value adjustment using the Benjamini-Hochberg procedure). GSVA was applied for each condition against the control sample group.
To gain insights into how targets of TFs enriched in multiple conditions were regulated, the overlap of their significantly differentially expressed targets in cells treated with P4, FSK, and IL-1β individually as compared to control was calculated. For each significant target of a given TF in a given comparison, its direction of fold change (up- or down-regulated) was identified using the original differential expression calculation.
The sets of up- and down-regulated targets for each TF were compared across different cell line treatment groups.
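The target comparison can be sketched with plain set operations. The sketch assumes that, for one TF, the significant targets and their log fold changes are available per treatment; the function names, gene labels, and values are hypothetical.

```python
def split_by_direction(targets, log_fold_changes):
    """Split significant targets of one TF into up- and down-regulated sets."""
    up = {gene for gene in targets if log_fold_changes[gene] > 0}
    down = {gene for gene in targets if log_fold_changes[gene] < 0}
    return up, down


def shared_targets(per_condition):
    """Intersect target sets across conditions.

    per_condition: dict mapping condition -> (up_set, down_set).
    Returns (targets up in every condition, targets down in every condition).
    """
    ups = [up for up, _ in per_condition.values()]
    downs = [down for _, down in per_condition.values()]
    return set.intersection(*ups), set.intersection(*downs)


# Toy targets of a single TF under two treatments (illustrative values).
p4 = split_by_direction(["g1", "g2", "g3"], {"g1": 1.0, "g2": -1.0, "g3": -2.0})
fsk = split_by_direction(["g1", "g2", "g4"], {"g1": 0.5, "g2": -0.3, "g4": 1.0})
common_up, common_down = shared_targets({"P4": p4, "FSK": fsk})
```

Keeping the two directions separate distinguishes targets that are regulated the same way across treatments from targets that merely appear in more than one comparison.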
2.2.5 Annotation of Vascular Smooth Muscle Contraction Pathway Diagram
The vascular smooth muscle contraction (VSMC) pathway diagram was retrieved from the KEGG pathway database. The pathway information was obtained by downloading the associated KEGG Markup Language (KGML) file for the “Homo sapiens (human)” option in the organism drop-down bar on the KEGG pathway website. This pathway was imported into Cytoscape version 3.3. Node (gene) colors were assigned by performing SVD on the Warwick gene expression matrix reduced to the genes in the VSMC pathway (see Singular Value Decomposition section of this chapter). The Warwick study was used due to its observed higher quality in the classification tasks and because it allowed for population of the greatest proportion of nodes in the pathway with a gene loading value. These gene loadings for the dominant singular vector were imported into Cytoscape as a node table and used in a continuous mapping for the node color (blue represents higher association with non-labor, red higher association with laboring samples) attribute within the “Style” tab.
2.2.6 Signaling Network Construction
Parturition signaling networks were created by performing shortest path calculations on a curated signaling network from Ritz et al. (2016) (PathLinker version 1.1).45 Briefly, this network was built by integrating multiple protein-protein interaction networks with pathway databases to create a high-confidence, directed signaling network. Edges in this network were weighted based on experimental evidence and commonality of gene ontology annotations. Nodes were then weighted based on the gene loadings from the first singular vector obtained from applying SVD on the normalized, combined tissue expression matrix (12,622 genes × 71 tissue samples). The raw gene loadings were normalized to the range of [0,1] after taking absolute values to obtain appropriate node costs (nodes with cost near zero were genes with highly negative and positive singular value loadings, the differentially expressed genes, making them more likely to be chosen in the final paths).
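One way to obtain node costs with the stated property is sketched below. The text specifies absolute values followed by normalization to [0,1] and states that strongly loaded genes end up with costs near zero; the inversion step (one minus the scaled value) is inferred from that outcome and is an assumption, as are the function name and the toy loadings.

```python
def node_costs_from_loadings(loadings):
    """Map gene loadings to node costs in [0, 1].

    Assumed scheme: take absolute values, min-max scale to [0, 1], then
    invert so that the largest |loading| receives cost 0.
    """
    magnitudes = {gene: abs(value) for gene, value in loadings.items()}
    low, high = min(magnitudes.values()), max(magnitudes.values())
    span = high - low
    if span == 0:
        return {gene: 0.0 for gene in magnitudes}
    return {gene: 1.0 - (m - low) / span for gene, m in magnitudes.items()}


# Toy loadings (illustrative): "a" is strongly negative, "c" moderately positive.
costs = node_costs_from_loadings({"a": -0.8, "b": 0.1, "c": 0.5})
```

Under this scheme a gene strongly associated with either phenotype is cheap to traverse, which matches the intent of favoring differentially expressed genes in the final paths.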
Shortest paths were then computed between upstream pathway initiator proteins to downstream target genes. Specifically, Dijkstra’s algorithm was used to calculate the shortest path between each pathway initiator and each target gene.46 Additionally, a requirement of these shortest paths was that the last step in the path be a known tran- scription factor-target gene interaction (this target being one of the downstream tar- get genes based on the gene expression studies), where transcription factor interactions were identified utilizing the TRED47, TFactS48, Oreganno49, and TRRUST50 databases
(combined interactions from Chouvardas et al., 201651). The pathway initiator pro-
teins chosen here were PGR, interleuking 1 receptor type 1 (IL1R1), and adenylyl cyclase
(ADCY) to represent the first protein in the signaling cascade for P4 signaling, inflam-
matory signaling, and cAMP signaling, respectively, which were believed to play a role in 24
controlling parturition through existing hypotheses and our targeted pathway analyses.
ADCY1-9 all connect to the same set of six proteins directly downstream in the signaling network, meaning that each ADCY has the same interactions or path options when building the network. Therefore, any of these ADCY nodes can be chosen as the start of the path. ADCY9 was chosen out of these nine possible proteins because its loading on the dominant singular vector was the closest in magnitude to the loadings for PGR and IL1R1. This helped ensure that the cAMP portion of the obtained networks was not unfairly diminished (increased) by a low (high) receptor node value when restricting the final network based on cost of paths.
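The shortest path search with node and edge costs can be sketched in Python as follows. This is an illustration under stated assumptions rather than the implementation used in the thesis: path cost is taken as the sum of edge costs plus the costs of all nodes on the path, the final transcription factor-target step is handled by a separate lookup with a hypothetical uniform cost (tf_edge_cost), and the toy network apart from PGR is invented.

```python
import heapq


def dijkstra(graph, node_cost, source):
    """Least-cost distances and predecessors from source.

    graph: {node: {neighbor: edge_cost}} (directed).
    A path's cost is the sum of its edge costs plus the cost of every
    node on it, including the source.
    """
    dist = {source: node_cost.get(source, 0.0)}
    prev = {}
    heap = [(dist[source], source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w + node_cost.get(v, 0.0)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev


def path_via_tf(graph, node_cost, tf_targets, source, target, tf_edge_cost=0.0):
    """Cheapest source-to-target path whose last step is a TF -> target edge.

    tf_targets: {tf: set of target genes}; tf_edge_cost is a hypothetical
    uniform cost assigned to the final regulatory edge.
    """
    dist, prev = dijkstra(graph, node_cost, source)
    best = None
    for tf, targets in tf_targets.items():
        if target in targets and tf in dist:
            total = dist[tf] + tf_edge_cost + node_cost.get(target, 0.0)
            if best is None or total < best[0]:
                best = (total, tf)
    if best is None:
        return None  # no transcription factor of the target is reachable
    total, node = best
    path = [target]
    while node != source:
        path.append(node)
        node = prev[node]
    path.append(source)
    return total, path[::-1]


# Toy network: K1, K2, TF1, TF2 and G are invented names.
graph = {"PGR": {"K1": 0.2, "K2": 0.1}, "K1": {"TF1": 0.1}, "K2": {"TF2": 0.5}}
node_cost = {"PGR": 0.0, "K1": 0.1, "K2": 0.1, "TF1": 0.0, "TF2": 0.3, "G": 0.0}
tf_targets = {"TF1": {"G"}, "TF2": {"G"}}
result = path_via_tf(graph, node_cost, tf_targets, "PGR", "G")
```

Here the route through K1 and TF1 is selected even though the first edge to K2 is cheaper, because the low node cost of TF1 outweighs it over the full path.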
Three separate gene sets were used to seed the shortest path calculations as the target genes. For the general parturition network, the top 50 genes associated with each phenotype, based on the loading value from the first singular vector of the full tissue expression matrix, were used (100 total nodes). For the non-laboring network, the top 50 genes associated with the non-labor phenotype (again, using gene values from the SVD analysis) were used as the downstream targets. Finally, the top 50 genes associated with the labor phenotype were used to create the laboring signaling network. For these latter two shortest path calculations, the node weights of all genes in the original network were altered to favor paths incorporating gene products related to the chosen downstream genes. Specifically, for the non-laboring network, the raw gene loadings from SVD were again normalized to the range [0,1], but without the prior application of absolute value. This ensured that genes highly associated with the non-labor signal had a low cost in the network while genes associated with the labor signal had a high cost. These node weights were then reversed for the laboring network. Finally, a path cost threshold was applied to the first of these three networks to obtain a network of reasonable size (Fig. 3.7A). For this set of target genes, the cost of each unique path (the sum of the node and edge weights making up the path from initiator protein to target gene) was treated as a distribution, ranked from least to most costly. The final network was then obtained by discarding the costliest 50% of paths.
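The path cost threshold can be sketched as below. The rounding rule for an odd number of paths (the number dropped is rounded down) is an assumption, as are the function names and the toy costs.

```python
def path_cost(path, node_cost, edge_cost):
    """Sum of node costs and edge costs along a path (list of node names)."""
    nodes = sum(node_cost[n] for n in path)
    edges = sum(edge_cost[(u, v)] for u, v in zip(path, path[1:]))
    return nodes + edges


def drop_costliest(paths, node_cost, edge_cost, drop_fraction=0.5):
    """Rank paths from least to most costly and discard the costliest share."""
    ranked = sorted(paths, key=lambda p: path_cost(p, node_cost, edge_cost))
    n_drop = int(len(ranked) * drop_fraction)  # assumption: round down
    return ranked[: len(ranked) - n_drop]


# Invented toy network with one initiator (S) and two targets (T1, T2).
node_cost = {"S": 0.0, "A": 0.1, "B": 0.9, "T1": 0.0, "T2": 0.2}
edge_cost = {
    ("S", "A"): 0.1, ("S", "B"): 0.3,
    ("A", "T1"): 0.1, ("B", "T1"): 0.2,
    ("A", "T2"): 0.4, ("B", "T2"): 0.1,
}
paths = [["S", "B", "T2"], ["S", "A", "T1"], ["S", "B", "T1"], ["S", "A", "T2"]]
kept = drop_costliest(paths, node_cost, edge_cost)
```

With these toy values the two paths through the high-cost node B are removed, leaving the two paths through A.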
2.2.7 Regulatory Network Analysis
Regulatory network enrichment analysis (RNEA) was employed on the cell line dataset
(results in chapter 4) to infer the genomic signaling induced by each stimulus, as well as to compare the networks of different stimuli.51 For RNEA, fold changes and adjusted p-values from differential expression were used to build a regulatory network specific to an experimental condition, drawing from curated transcription factor (TF) target data. Differentially expressed TFs and their targets were included in the network, and non-significant TFs along a pathway that have a significant gene both directly upstream and downstream were also included in the final output network. Differential expression statistics were obtained for all genes using the diffData() function in the cummeRbund package. These were analyzed using the RNEA() function in R, requiring a fold change greater than 1.5 and an adjusted p-value less than 0.05 for significance. Networks were visualized using Cytoscape, with fold change values mapped to the included nodes via the same input data that was used for RNEA.52 The networks in Figures 4.3 and 4.6 are subnetworks of the full RNEA result, obtained using stricter input arguments for fold change and p-value in order to identify the most important changes. The hierarchical layout was applied for all networks. Nodes in Figure 4.6 were colored based on gene regulation across multiple comparisons of differential expression (e.g. all IL-1β conditions compared to control for Figure 4.6A).
2.2.8 Comparison of Myometrial Cell Line and Tissue
To compare the tissue phenotypes and treated cell lines, the Warwick and cell line gene expression matrices were used. The two matrices were individually log2() normalized,
reduced to the set of genes measured in both studies, and concatenated such that the resulting expression matrix contained 34 samples (10 tissue, 24 cell line) and 15388 genes ($|G_{\mathrm{comb}}| = 15388 = |\mathrm{intersect}(G_{\mathrm{tissue}}, G_{\mathrm{cell\ line}})|$, where $G$ is the set of genes in a gene expression matrix). The R function removeBatchEffect() (part of the limma package) was then used to further normalize the expression matrix to make the two datasets comparable. The data was split back into the tissue and cell line matrices, which were then individually row (gene) centered using the function scale() with the input arguments center = TRUE and scale = FALSE.
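The gene intersection and row centering steps can be illustrated with a short Python sketch. It omits the log2 transformation and the batch correction performed with limma, and the data layout (a dictionary from gene name to a list of per-sample values), the function names, and the toy values are all assumptions made for illustration.

```python
def shared_genes(tissue, cell_line):
    """Genes measured in both matrices, in sorted order."""
    return sorted(set(tissue) & set(cell_line))


def row_center(matrix):
    """Subtract each gene's mean across samples (centering only, no scaling)."""
    centered = {}
    for gene, values in matrix.items():
        mean = sum(values) / len(values)
        centered[gene] = [x - mean for x in values]
    return centered


# Toy matrices: gene name -> expression values per sample.
tissue = {"G1": [1.0, 3.0], "G2": [2.0, 2.0], "G3": [5.0, 7.0]}
cell_line = {"G2": [4.0, 6.0, 8.0], "G3": [1.0, 1.0, 4.0], "G4": [0.0, 0.0, 0.0]}

genes = shared_genes(tissue, cell_line)
tissue_centered = row_center({g: tissue[g] for g in genes})
cell_centered = row_center({g: cell_line[g] for g in genes})
```

Centering each matrix separately means that every gene has mean zero within the tissue samples and, independently, within the cell line samples.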
To quantify the similarity in expression between the tissue and cell line samples, a
method introduced by Sandberg and Ernberg (2005), the tissue similarity index (TSI), was implemented in R.53 Briefly, singular value decomposition (SVD) was applied to
the tissue gene expression matrix after batch removal and row-centering. A phenotype
centroid was calculated for the laboring and non-laboring sample groups by averaging
the projected tissue samples using all singular vectors. The cell line samples were then
projected into the SVD space, again using all singular vectors derived from the tissue
data. Correlations between the projected cell line samples and the phenotype centroids were calculated and median and standard deviation values over treatment conditions
were obtained. Visualization of sample clustering was performed using the first two singular vectors.
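The centroid and correlation steps of the TSI calculation can be sketched in Python as follows. The sketch assumes that the samples have already been projected into the SVD space, so each sample is a list of coordinates; the use of Pearson correlation, the function names, and the toy coordinates are assumptions made for illustration.

```python
from statistics import median


def centroid(samples):
    """Coordinate-wise mean of a group of projected samples."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (non-constant inputs)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical projected coordinates (three singular vectors).
labor = [[2.0, 1.0, 0.0], [4.0, 1.0, 0.0]]
non_labor = [[-2.0, 0.0, 1.0], [-4.0, 0.0, 1.0]]
cell_samples = [[6.0, 2.0, 0.0], [3.0, 1.0, 0.5]]

labor_centroid = centroid(labor)
non_labor_centroid = centroid(non_labor)
r_labor = [pearson(s, labor_centroid) for s in cell_samples]
r_non_labor = [pearson(s, non_labor_centroid) for s in cell_samples]
median_r_labor = median(r_labor)
```

In this toy example both cell line samples correlate positively with the labor centroid and negatively with the non-labor centroid.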
The tissue similarity index was computed in the same manner on gene expression matrices reduced to specific sets of genes after batch removal and before row centering. Genes differentially expressed with |fold change| > 5 and adjusted p-value < 0.001 in P4, FSK, and IL-1β treated cells versus CTRL (145 genes) and in labor versus non-labor tissue (151 genes) were identified for TSI calculation. Key cell line genes were manually selected by identifying highly differentially expressed cell line genes across multiple individual stimulus versus control comparisons that additionally occurred in the visualized RNEA subnetworks. A heatmap of key genes was generated using the pheatmap package in R. Median expression values were obtained for each sample type using the same input matrix as was used with TSI. Rows (genes) were clustered by correlation using the complete clustering method. Sample clustering based on TSI results was visualized using ggplot2. Tissue samples, phenotype centroids, and cell line samples were projected onto the first two singular vectors from the SVD calculation using the key cell line genes.
3 Characterizing Transcriptional Alterations at the Onset of Labor in Myometrial Tissue
3.1 Differential Gene Expression in Quiescent and Laboring Myometrium
Differential gene expression analysis on three transcriptomic studies (Table 2.1) was used to assess the transcriptional landscape of quiescent and laboring myometrium. The transcriptome datasets included two previously published high-quality studies.24,27 The Chan et al. (2014) dataset is from an RNA-seq analysis of non-labor (NL; n=5) and in-labor (IL; n=5) term myometrium collected from patients in Warwick, UK. This dataset was designated Warwick (W)-NL and W-IL. The Mittal et al. (2010) dataset is from a microarray analysis of NL (n=20) and IL (n=19) term myometrium collected from patients in Detroit, MI, USA. This dataset was designated Detroit (D)-NL and D-IL. A third RNA-seq dataset using myometrium collected from 22 women at term cesarean section delivery undertaken at the Chelsea & Westminster Hospital, London, UK was also used. This study, designated London (L), included L-NL (n=8) and L-IL (n=14) myometrium. The L-IL tissue was further stratified into early (Ea) stages of active labor (L-ILEa: defined as contractions with cervical dilation <3 cm, n=8) and established (Es) labor (L-ILEs: defined as contractions with cervical dilation >3 cm, n=6). (See Materials and Methods for more detail regarding sample collection and RNA preparation and measurement.)
For the London dataset, significant (|fold change| > 1.5 and p-value < 0.05) differential expression was detected for 417 genes between L-NL and L-ILEa (397 up and 20 down in L-ILEa), 547 between L-NL and L-ILEs (440 up and 107 down in L-ILEs), and, surprisingly, none between L-ILEs and L-ILEa. The absence of a significant difference between the transcriptomes of L-ILEa and L-ILEs suggests that labor-associated changes in myometrial gene expression occur early in the labor process and persist through its later stages.
Analysis of the Warwick dataset identified 2468 genes differentially expressed (1227 increased and 1241 decreased based on mRNA abundance) in W-IL compared with W-NL myometrium. In the parent publication, the results of three separate software packages were used to calculate differential expression to improve confidence.27 Results shown in this section are comparable in the number of significant genes to one of these packages, edgeR.
Analysis of the Detroit dataset identified 463 genes that were differentially expressed (267 up and 196 down) in D-IL compared with D-NL myometrium. This is highly consistent with the number (471) obtained in the parent publication.24 Figure 3.1 shows the overlap of differentially expressed genes across the four within-study phenotype comparisons. For the Detroit and London comparisons, a similar number of genes were differentially expressed between the NL and IL tissues, with the majority of genes up-regulated in the IL myometrium, while the Warwick comparison showed a very strong change in expression, with almost five times as many differentially expressed genes as the other datasets. Pairwise comparisons between all labor conditions in the three studies identified 358 shared genes for Warwick and
Detroit, 339 for Warwick and London (L-ILEa), 410 for Warwick and London (L-ILEs),
156 for Detroit and London (L-ILEa), 180 for Detroit and London (L-ILEs), and 282 for
L-ILEa and L-ILEs.
A total of 126 genes (124 increased and 2 decreased) were identified to be differentially expressed in all comparisons (see Appendix A Table 1 for a list of these genes with their median log2(fold change) and median adjusted p-value). Thus, there is high confidence that these 126 genes are true positives for identifying biological processes that likely initiate and/or maintain the labor phenotype.
Figure 3.1. Venn diagram of differential expression in Warwick, Detroit, and London. Differentially expressed genes between non-labor and in-labor sample groups for each dataset are compared to calculate similarity across studies.
Using these high-confidence genes, pathway enrichment analysis was performed using the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways with the online tool EnrichR. This analysis identified multiple significant pathways including TNF signaling (adjusted p-value = 1.2e-9), cytokine-cytokine receptor interaction (4.6e-9), Jak-STAT signaling (2.4e-7), NOD-like receptor signaling (2.3e-4), and chemokine signaling (8.9e-4). These results indicate a strong inflammatory component, which is supported by previous studies and suggests that a significant portion of the transcriptional changes shared by all studies represents an increase in inflammatory signaling with the onset of labor.
3.2 Within-study Classification
To identify the best set of genes that distinguish labor from non-labor and assess the consistency of samples within and across datasets, two sparse regression-based classification approaches that are well-established in the area of machine learning were employed: 1) least absolute shrinkage and selection operator (LASSO), and 2) elastic net (EN).
LASSO utilizes a regularization term, the L1 norm of model coefficients, penalizing models that contain more features.54 This results in more parsimonious predictive models, which allows for easier interpretation of the final classifiers (i.e. a small number of genes in the model). EN is an extension of LASSO in that it includes an additional penalty term in the form of the L2 norm of the model weights.55 This serves to avoid shortcomings of LASSO, such as its restriction to choosing at most n (the number of samples) features in its model and its tendency to select only one feature from sets of highly correlated features. With this additional term in the objective function, EN models typically include more features (genes) than the models built by LASSO. Here, we employed both EN and LASSO to obtain the smallest set of genes that can predict labor (LASSO), while also allowing models with a larger number of genes (EN), some of which may exhibit highly similar patterns of expression. The latter can result in models with multiple genes that show similar differential expression across the IL and NL tissues, all of which can be of interest for our analysis. Besides providing a means to identify genes that can serve as predictive features, use of classifiers also provides a way to assess the consistency of sample groups within and across datasets.
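For reference, the penalized objective for a two-class (binomial) model can be written as below, in the parameterization documented for the glmnet package. The notation is introduced here for illustration and is not taken from the thesis: $x_i$ is the expression vector of sample $i$, $y_i \in \{0, 1\}$ its phenotype, $N$ the number of samples, $\lambda \ge 0$ the overall penalty strength, and $\alpha \in [0, 1]$ the mixing parameter.

```latex
\min_{\beta_0,\,\beta}\;
  -\frac{1}{N} \sum_{i=1}^{N}
    \Big[ y_i \,(\beta_0 + x_i^{\top}\beta)
          - \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big) \Big]
  \;+\; \lambda \Big[ \tfrac{1-\alpha}{2}\,\lVert \beta \rVert_2^2
                      + \alpha\,\lVert \beta \rVert_1 \Big]
```

Setting $\alpha = 1$ leaves only the L1 term and gives LASSO, while $0 < \alpha < 1$ mixes the L1 and squared L2 terms and gives an elastic net model.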
Using the R glmnet package, 100 iterations of k-fold cross validation (CV) were performed to train and test the models. Nested cross validation was used to optimize model parameters. Due to the varying number of samples in each study, we set k=3 for the Warwick dataset and k=5 for the London and Detroit datasets. The performance of LASSO and EN in classifying the samples from each study was assessed by visualization of ROC curves and calculation of AUC (Fig. 3.2A-C). Varying levels of performance were achieved across the datasets, with AUC being near perfect (>99%) for Warwick, good for Detroit (~90%), and fair for London (~75%). In these experiments, for the London dataset, we grouped the L-ILEa and L-ILEs samples into one L-IL group so that labor was represented as a single phenotype for the calculation of AUC.
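As an illustration of the AUC metric used here, the following Python sketch computes it from predicted scores using the rank-based (Mann-Whitney) formulation: the fraction of labor/non-labor sample pairs in which the labor sample receives the higher score, with ties counted as one half. The function name, the label coding (1 = labor, 0 = non-labor), and the toy scores are assumptions made for illustration.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for the positive class (labor), 0 for the negative class.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy example: three labor samples and two non-labor samples.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.1]
value = auc(labels, scores)
```

In the toy example five of the six labor/non-labor pairs are ordered correctly, giving an AUC of 5/6.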
The frequency with which each sample was classified into each phenotype over the 100 runs of training and testing was also examined (Fig. 3.2J-L). Based on these results, several interesting patterns were observed. First, for classification frequency calculations, the two L-IL sample types were ungrouped to assess how they separated based on the built classification models. The L-ILEa class appeared to be more variable and difficult to classify, seeming to have signatures partially represented in the L-NL and L-ILEs groups. Second, classification of the non-labor samples was highly accurate across all studies, suggesting the presence of a robust gene expression profile characteristic of the quiescent pre-labor myometrium. Finally, there were samples with a classification distribution that was highly contradictory to their given phenotype (e.g. L-ILEa6, D-NL18, D-IL7, and D-IL16).
Figure 3.2. AUCs, ROC curves, and sample classification frequencies for k-fold cross-validation and cross-study classification tasks on tissue gene expression studies using LASSO and EN. A-C. K-fold cross validation (CV) for each dataset (k = 3 for Warwick, 5 for London and Detroit). D-I. Classification of one dataset with training performed on another dataset. LASSO and EN were implemented using the glmnet package in R. J-R. Corresponding sample class frequencies for the classification tasks shown in panels A-I, specifically using the results from the top performing algorithm, based on AUC. For all prediction tasks, the early labor (L-ILEa) and established labor (L-ILEs) samples were grouped together as a single laboring sample group except for sample classification frequency for 5-fold CV on London, where the three sample groups were kept separate in order to assess how the new initial labor group was classified. All results shown were obtained using 100 runs of training and testing. Sample names in panels J-R follow the naming convention in Table 2.1.
To gather information on the genes that were used to achieve the classification results, the features selected in the models built during training were examined. For the Warwick dataset, 147 genes were selected in at least one EN model, and 22 of the genes were selected in all 100 EN models (Appendix A Table 2). For the London dataset, to further observe how the L-ILEa group was classified, the L-ILEa and L-ILEs samples were not combined for this sample classification frequency calculation. There are therefore three classes in the classification task, making it a multinomial problem. In this case, when assessing models and the gene coefficients, there are genes associated with each class (these are the genes that best separate each class from the other two). The results were 7 unique genes for the L-NL class, 12 for the L-ILEa class, and 12 for the L-ILEs class in at least one of the 100 LASSO models (Appendix A Tables 2-4). For the Detroit dataset, 98 genes were selected in at least one of the 100 EN models, 32 of which were selected in all models (Appendix A Table 5). The models were approximately evenly divided between genes increased in IL samples and those decreased in NL samples across the three studies. (Supplementary tables include model genes and average coefficients.)
Next, the genes selected by the classification models were compared to those identified via differential expression analysis. Of the 147 Warwick model genes, 97 were significantly differentially expressed in the Warwick dataset; two of these were high-confidence genes that were differentially expressed in all comparisons (total of 126). The two high-confidence genes were also among the 22 genes that were selected in all 100 Warwick models. In contrast, 19 of the 98 Detroit model genes were significantly differentially expressed in the Detroit dataset, with only one being a high-confidence gene. Of the 32 genes that were selected in all Detroit models, 12 were differentially expressed in the Detroit dataset, and one was a high-confidence gene. Finally, 2 of the 7 L-NL, 1 of the 12 L-ILEa, and none of the 12 L-ILEs genes for the London models were significantly differentially expressed in the London dataset. The number of significant genes in the models for each dataset correlates with the accuracy of classification (e.g. Warwick had the highest percentage of incorporated significantly called genes and the highest AUC).
Interestingly, models built on different datasets shared only two genes, MITF and CLK2, which were found in 100 and 87 of the Warwick EN models, respectively, and in 96 and 18 of the Detroit EN models, respectively. However, neither of these two genes was among the 126 high-confidence genes that were identified via differential expression analysis. It has been shown that MITF is involved in the suppression of IL-8, a classic inflammatory gene, in the cervix during pregnancy.56 This agrees with the classification models derived from the Warwick and Detroit datasets, as MITF has a negative model coefficient (higher expression in the non-labor group). No direct associations between CLK2 and parturition were found in the literature; however, data from the Human Protein Atlas show that CLK2 exhibits higher expression in leukocytes than in smooth muscle.57 Therefore, the positive model coefficient observed for CLK2 may represent leukocytes (accompanying increased inflammation) in the laboring tissue.
3.3 Cross-study Classification
The overlap between models learned from different studies is low in terms of gene content, although all models represent the difference between myometrial quiescence and labor. Based on this observation, it was hypothesized that the models learned using each dataset may be capturing a different aspect of the signaling processes that underlie labor. To test this hypothesis, the samples in each dataset were classified by using models learned from each of the other two datasets. Note that training on London data was done only using the L-NL and L-ILEs sample groups to be consistent with Warwick and Detroit protocols. As in the previous section, LASSO and EN were used for classification, nested k-fold cross validation was used to optimize model parameters, and 100 iterations of training and testing were performed.
The results of cross-study classification are shown in Fig. 3.2D-I. Since there are three studies and training on each dataset was followed by testing on the remaining two datasets, there were a total of six cross-study classification tasks. Assessing the average performance in each of the six classification tasks, it was observed that models learned on one dataset were overall successful in classifying samples from a separate dataset, with AUCs ranging from 0.676 to 1 and a median AUC of 0.817. Classification of the Warwick samples proved to be a simple task for the models built using each of the London and Detroit datasets. This is likely because the Warwick dataset has a larger set (about 2,000 more than the other two studies) of differentially expressed genes between the laboring and non-laboring groups. It was also observed that LASSO models performed better than EN for classification of the London samples. This could be because a larger number of genes in a model may create additional uncertainty when classifying the more volatile L-ILEa samples. Another interesting result is that models derived from the Warwick and Detroit datasets produced similar performance on the London dataset despite the fact that Detroit has almost four times as many samples.
As was done for the cross-validation analysis, the sample classification frequencies for the cross-classification tasks were computed (Fig. 3.2M-R). In other words, the confidence with which models trained on one dataset correctly called the phenotype of a given sample from another dataset was calculated. Samples not always classified into the same group may have a weaker signature representation than others. Interestingly, samples highly misclassified in the cross-validation analysis (e.g. L-ILEa6, D-IL7, and D-IL16) showed the same pattern when classified using other datasets.
Overall, the cross-study analysis indicates a robust transcriptional signature differentiating NL from IL across the three datasets, as performance in all classification tasks ranged from moderately accurate (AUC ≈ 0.7) to highly accurate (AUC = 1). This is notable for a gene expression based classifier, given that these types of classifiers often fail to generate reproducible models due to experimental noise, heterogeneity of samples, and the limitations of mRNA-level expression in capturing changes in protein activity and function. This observation suggests that transcriptional regulation may play a key role in the transition of myometrial tissue from quiescence to the laboring phenotype.
3.4 Identification of Latent Transcriptional Regulation Patterns via Singular Value Decomposition
To understand underlying patterns of gene expression associated with labor, singular value decomposition (SVD), an unsupervised dimensionality reduction and pattern identification method, was used. SVD has been previously employed to identify latent transcriptional regulation patterns in gene expression data.58–60
SVD was performed on the expression matrix containing all 71 samples (10 Warwick + 22 London + 39 Detroit) and 12,622 genes (those with measured expression in all studies). For a given gene expression matrix M, SVD decomposes the matrix into a set of components, each consisting of three parts: a pair of singular vectors, which can be interpreted as an eigen-sample and eigen-gene pair, and a relative strength value called a singular value. An eigen-gene represents a subset of genes with a given, similar pattern of expression. The eigen-gene then contains a value for each sample (n=71) indicating how the subset of genes is expressed in each of them. An eigen-sample represents a subset of samples with a given, similar pattern of activity across the corresponding eigen-gene. The eigen-sample then contains a value for each gene (12,622) indicating the activity of the subset of samples for a set of genes. The corresponding singular value indicates how well represented or how common that pattern of expression is within a gene expression matrix.58 Therefore, SVD was employed to determine the dominant signals or expression programs within the combined tissue expression matrix, identify the genes that had the strongest contribution to the observed patterns, and visualize the stratification of genes and samples in a reduced space.
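In standard notation, and assuming that genes index the rows and samples the columns of $M$ (so that $M$ is $12{,}622 \times 71$), the decomposition described above can be written as follows. The symbols $U$, $\Sigma$, $V$, $u_k$, $v_k$, $\sigma_k$, and $r$ are introduced here for illustration.

```latex
M \;=\; U \Sigma V^{\top} \;=\; \sum_{k=1}^{r} \sigma_k \, u_k v_k^{\top},
\qquad \sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0,
\qquad
\text{relative strength of pattern } k \;=\; \frac{\sigma_k^{2}}{\sum_{j=1}^{r} \sigma_j^{2}}
```

Under this assumption each $u_k$ holds one value per gene and each $v_k$ holds one value per sample, so in the terminology used here $v_k$ plays the role of the eigen-gene and $u_k$ the role of the eigen-sample, and $r \le 71$ is the rank of $M$.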
Examination of the singular values of the combined gene expression matrix indicated at least one and potentially two important patterns (Fig. 3.3A). The first singular vector was used for further analysis as it captures the main component of variation in the expression matrix and can be thought of as the dominant regulatory program. The dominant pattern among samples, obtained from the first eigen-gene, shows a clear separation of NL and IL samples, with the former having negative values in the singular vector and the latter having (mostly) positive values (Fig. 3.3C). The strength of this state of differential expression in each sample is determined by observing the magnitude of each sample’s loading in the first eigen-gene. Based on these values, varying levels of pattern consistency across studies and within phenotypes were identified. Figure 3.3C suggests that IL samples are more heterogeneous or potentially more difficult to call phenotypically (e.g. L-ILEs3 and D-IL7 are highly inconsistent) than non-labor samples, which always have the same sign of loading on the dominant singular vector. Based on these observations, it can be asserted that the dominant eigen-gene of the combined expression pattern represents the separation of IL and NL samples.
Figure 3.3. Distribution of singular values and examination of genes and samples in the space spanned by the first singular vector from SVD analysis on aggregated gene expression matrix. A. Singular values calculated by performing SVD on the normalized, study-combined expression matrix (12,622 genes by 71 samples). B. Mapping of the top 25 positively and negatively valued genes in the space represented by the first right singular vector (eigen-gene). Bars representing genes are colored based on their differential expression in each of the four differential expression analyses (shown in Fig. 3.1; i.e. purple bars indicate genes in the set of 126 high-confidence genes). C. Mapping of all tissue samples to the first left singular vector (eigen-sample), sorted by mapping value. Bars representing samples are colored based on their clinically defined phenotype (blue = non-laboring, red = laboring). Sample name abbreviations in panel C follow Table 2.1.
The most dominant eigen-sample was examined to identify the genes that contribute most to this singular vector, since these genes most accurately represent the labor phenotype. The 25 genes with the largest positive and negative values on the first eigen-sample are shown in Fig. 3.3B. To provide a context for the correspondence between differential expression and this latent transcriptional regulation pattern, each gene is represented as a colored bar where the color is based on the number of study comparisons in which it showed differential expression between phenotypes (e.g. purple bars indicate genes that were part of the 126 high-confidence genes from Fig. 3.1). The positively contributing genes in Fig. 3.3B were associated with higher expression in samples having a positive loading, mainly the IL samples. These genes, which include IL-8, RUNX1, and IL-1β, strongly represented inflammatory processes, consistent with previous studies. The negatively contributing genes were more highly expressed in the NL samples and were not as well-replicated across all studies, but had comparable magnitudes of pattern contribution when compared to the positively loaded genes. One of these genes, GPR161, increases intracellular cAMP and promotes protein kinase A (PKA)-dependent processes. This aligns with the hypothesis that the cAMP/PKA pathway plays a role in maintaining quiescence by promoting relaxation of the myometrium.61 Genes including JAZF1 and TCEAL4 are involved in transcriptional regulation, while little is known about the function of MAMDC2 and BIVM. Further investigation of these genes may provide additional hypotheses relating to the maintenance of pregnancy.
In summary, the dominant singular vector of the combined expression matrix represents the differential expression of genes between IL and NL myometrium. This result indicates that SVD provides an unsupervised way to classify the phenotype of a myometrial sample, via gene expression, using the first singular vector.
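The idea of calling a sample's phenotype from the sign of its loading on the first singular vector can be sketched in Python as follows. Power iteration is swapped in here for a full SVD routine so that the sketch needs no external libraries; it recovers only the first singular triplet. The function name, the starting vector, and the toy row-centered matrix are assumptions made for illustration. Because the overall sign of a singular vector is arbitrary, a reference sample of known phenotype would be needed to decide which sign corresponds to labor.

```python
def first_singular_triplet(matrix, iterations=200):
    """Approximate the first singular value and vectors by power iteration.

    matrix: list of rows (genes), each a list of values per sample.
    Returns (sigma, gene_loadings, sample_loadings).
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    # Non-uniform start: a constant vector is mapped to zero by a
    # row-centered matrix and would stall the iteration.
    v = [float(j + 1) for j in range(n_cols)]
    norm = sum(x * x for x in v) ** 0.5
    v = [x / norm for x in v]
    for _ in range(iterations):
        u = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]
        w = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            raise ValueError("start vector is orthogonal to the data")
        v = [x / norm for x in w]
    u = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]
    sigma = sum(x * x for x in u) ** 0.5
    u = [x / sigma for x in u]
    return sigma, u, v


# Toy row-centered matrix: 3 genes x 4 samples (NL1, NL2, IL1, IL2).
expr = [
    [-2.0, -1.0, 1.0, 2.0],
    [-1.0, -1.0, 1.0, 1.0],
    [1.0, 0.0, 0.0, -1.0],
]
sigma, gene_loadings, sample_loadings = first_singular_triplet(expr)
positive_side = [x > 0 for x in sample_loadings]
```

In the toy matrix the two NL samples fall on one side of zero and the two IL samples on the other, mirroring the separation described for Fig. 3.3C.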
3.5 Global Identification of Pathways Altered During Parturition
Based on the above results from comprehensive computational experiments using classification and SVD, it is clear that IL and NL myometrium exhibit strong differences in their transcriptional profiles. To better understand and interpret the signatures obtained, pathway enrichment analysis was performed on each study individually using Gene Set Variation Analysis (GSVA) with the hallmark gene sets from the Molecular Signatures Database (MSigDB; well-defined biological states or processes created by collating gene sets from multiple sources). GSVA is an unsupervised method that makes use of the full expression matrix as opposed to a simple list of differentially expressed genes (e.g. over-representation) and provides a directionality, as fold change, for each pathway. This permits GSVA to determine pathway enrichment similarly to other approaches, in that strong differential expression in a few pathway genes will be significant, but also to detect cases where there is potentially weak but highly consistent differential expression across many genes in a pathway. GSVA was performed separately on the studies to make use of GSVA’s enrichment fold change output (i.e. to assess how pathways change from NL to IL as opposed to simply calling them significant with no direction of regulation). As mentioned in chapter 2, an additional normalization step was required to combine all studies (as in the SVD analysis). This normalization shifts the expression distribution, diminishing the resulting fold change values and making it difficult to determine significance using traditional methods (i.e. a fold change cutoff of 1.5). Pathway enrichment in this form allows better interpretation of the strongest signal between IL and NL samples specific to each study, as well as identification of pathway-level consistency across all studies.
Figure 3.4A shows the GSVA results on the Warwick, London, and Detroit datasets, where rows (gene sets) are clustered by fold change enrichment. In the figure, pathways with higher activity in NL samples are shown in blue and pathways with higher activity in IL samples are shown in red. All three studies exhibited consistent enrichment, specifically upregulation, for multiple gene sets including TNFα, MYC targets,
MTORC1, inflammatory response, and JAK-STAT signaling gene sets. Note that not every pathway was significant in every study under traditional cutoffs (FC > 1.5 and adj. p-value < 0.05). Expression of TNFα has been shown to increase during labor in the cervix, fetal membranes, and myometrium, where it stimulates arachidonic acid release, activates phospholipid metabolism, and increases the production of prostaglandins in the myometrium (see review by Peltier, 2003).62 The JAK-STAT
pathway is a major cellular signaling pathway that has been shown to be activated by
mechanical stretch of myocytes.63 JAKs are also activated by cytokine ligands,
including interferons and interleukins, which are known to be associated with labor.64
Additionally, a study by Breuiller-Fouche et al. (2007) that combined results from existing studies examining labor in the myometrium showed that many of the consistently discovered genes were involved in the JAK-STAT pathway.65 From the above results, genes involved in TNFα signaling include SOCS3, IL6, CEBPB, IL1B,
BCL3, LIF, TNFAIP3, CCL2, PTGS2, SELE, and CXCL2, while the genes SOCS3, CSF3, IL6,
IL4R, CSF3R, MYC, OSM, PIM1, LIF, and IL7R make up the JAK-STAT pathway. Other pathways observed to be consistently enriched have links to parturition. Master et al.
(2002) showed that MYC expression in the mammary glands of pregnant mice decreases during pregnancy and rises just before labor, and work by Deng et al. (2016) suggested an important role for mTORC1 signaling in the control of parturition timing.66,67 Together, the results of the enrichment analysis both confirm and annotate the transcriptional signature identified in the above sections with specific pathways that can provide starting points for more targeted studies involving the transition of the myometrium from quiescence to labor.
3.6 Hypothesis-Driven Signature Identification
Thirty-two gene sets from MSigDB related to P4 signaling, inflammation, and muscle relaxation were selected based on previous findings to target the search to pathways implicated in parturition. GSVA was repeated with these sets for each study (Fig. 3.4B). As in the previous section, results are represented as a clustered heat map colored by GSVA fold change enrichment. There are clear areas of the heat map that indicate, for multiple gene sets, differential expression and, just as importantly, fair consistency across most or all study comparisons. These sets include TNFα signaling via NF-κB and inflammatory response, which were up-regulated in labor, while vascular smooth muscle contraction, regulation of cAMP metabolic process, and relaxation of muscle were down-regulated in labor. Such results indicate the importance of these pathways in parturition.
Figure 3.4. Global and hypothesis-driven GSVA pathway enrichment across all tissue studies. A. Top 30 enriched pathways (by median adjusted p-value) using the hallmark gene sets from MSigDB. B. Enrichment results for a selected set of pathways from MSigDB related to progesterone response, smooth muscle, inflammation, and cAMP signaling. Heat maps are colored by log2(fold change), where red means up-regulation in laboring samples and blue means down-regulation. Fold changes are calculated using an empirical Bayes model fit to the gene set by sample matrix output by the gsva() function. Rows (gene sets) are clustered using the complete clustering method based on correlation.
Next, SVD was applied to genes involved in signaling by progesterone, inflammation, and cAMP. The most representative gene sets selected for this analysis were response to progesterone from Gene Ontology, inflammatory response from the MSigDB hallmark sets, and cAMP signaling from KEGG. The combined gene expression matrix was reduced to the genes in each of the gene sets separately. The pathway-specific expression matrices were then concatenated and subjected to SVD. Results indicate that all three pathways exhibit differential expression between phenotypes, since the dominant regulatory program (eigen-gene and eigen-sample pair) for each gene set clearly separates the IL and NL samples (Fig. 3.5), similar to the global SVD result.
The relative strength of the dominant regulatory programs can be compared between these three pathways by using a measure introduced in Leskovec et al. (2014) called “energy”.68 The square of each singular value is a measure of the amount of variation accounted for by the corresponding eigen-gene and eigen-sample pair.60 It then follows that the sum of the squares of all singular values represents the total variation of the data. The energy measure of interest here is therefore the proportion of total variation (in the gene expression data) covered by the first singular vector pair, which can be calculated by dividing the square of the first singular value by the sum of squares of all singular values.
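The energy calculation described above can be sketched in a few lines. This is a minimal illustration, assuming the expression matrix is passed to the SVD exactly as given (no additional centering or scaling); the function name `dominant_energy` and the toy matrix are hypothetical and are not taken from the analysis itself.

```python
import numpy as np

def dominant_energy(expression):
    """Fraction of total variation captured by the first singular vector pair."""
    # Singular values are returned in descending order.
    singular_values = np.linalg.svd(expression, compute_uv=False)
    squared = singular_values ** 2
    return squared[0] / squared.sum()

# Hypothetical toy matrix (genes x samples): one strong two-group signal plus noise.
rng = np.random.default_rng(0)
gene_effects = rng.normal(size=(50, 1))
sample_groups = np.concatenate([np.ones(5), -np.ones(5)])[None, :]
toy_matrix = gene_effects @ sample_groups + 0.1 * rng.normal(size=(50, 10))

energy = dominant_energy(toy_matrix)
print(f"Energy of the dominant regulatory program: {energy:.2%}")
```

An energy close to 1 means that a single eigen-gene and eigen-sample pair explains nearly all of the variation, whereas a small value means the variation is spread across many singular vector pairs.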
Using this measure, the energies were 24.28% for response to progesterone, 23.26% for inflammatory response, and 15.89% for cAMP signaling. Therefore, the separation between IL and NL samples was strong for response to progesterone and inflammatory response and less so for cAMP signaling. Positive sample loadings were seen for the laboring phenotype for response to progesterone and inflammatory response, while positive sample loadings for the non-laboring phenotype were seen in the case of cAMP
Figure 3.5. Pathway-specific SVD analysis. SVD results for the gene sets GO response to progesterone (panels A-C), hallmark inflammatory response (panels D-F), and KEGG cAMP signaling pathway (panels G-I). All studies were combined (12,622 genes x 71 tissues) and then reduced to the genes within each gene set. Panels A, D, and G show the distribution of singular values; panels B, E, and H show the top 25 genes associated with each phenotype based on loading value (genes are colored by their mean log2(fold change) value across all four sample group comparisons in Fig. 3.1); and panels C, F, and I show the gene set activity (loading value) for each sample (samples are colored based on their originally reported clinical phenotype).
signaling. This is consistent with the understanding that inflammatory response is pro-labor and cAMP signaling is relaxatory, or an anti-labor process. The gene set response to progesterone has many inflammation-related genes because it is anti-inflammatory (i.e. expression of such genes is decreased in response to inflammation); therefore, the samples map in the same direction as the inflammatory response SVD result (Fig. 3.5F). However, gene loadings can provide insights into the pro-relaxation genes that progesterone regulates. Panels B, E, and H of Fig. 3.5 rank contributions of genes in these pathways, showing which genes contribute to the association of the pathway with each phenotype. Due to the strength of the inflammatory/laboring signal, the observed fold change is more consistent across studies for genes in this set than for those with higher expression in the non-laboring signal. Overall, these results indicate that there is some level of differential expression for these pathways in the three transcriptome studies. However, many pathways comprise various signaling processes. Certain genes in a pathway may be pro-quiescent and down-regulated at the onset of labor, while others in the same pathway promote labor and are up-regulated during labor. An example is cAMP signaling, which has canonical paths in various tissues that activate inflammatory-related processes (MAPK and PI3K-AKT signaling) as well as paths that activate muscle relaxation. Such pathways are clearly relevant to parturition but may have reduced significance (enrichment fold change) because the genes involved in these pathways are regulated in different directions.
Next, a pathway that was consistently, but not significantly, enriched in the targeted GSVA analysis, vascular smooth muscle contraction (VSMC), was examined in further detail. Gene loadings from the first eigen-sample obtained from performing SVD were overlaid on this pathway (Fig. 3.6). By analyzing a pathway in this manner, an improved
Figure 3.6. Visualization of gene expression changes on the KEGG Vascular Smooth Muscle Contraction (VSMC) pathway diagram. The Homo sapiens VSMC pathway diagram was downloaded in KGML format from KEGG and loaded into Cytoscape. Nodes (genes) are colored based on loading values for the dominant singular vector from performing singular value decomposition on the Warwick expression matrix reduced to include only genes within the VSMC pathway. These values were loaded directly into Cytoscape as a node table and used as a continuous mapping in the style tab for node coloring. Blue nodes indicate genes with higher expression in the W-NL group, red nodes indicate higher expression in the W-IL group, and white nodes indicate no differential expression.
understanding can be achieved in terms of how changes in expression may be influencing signaling paths, as opposed to just knowing whether or not the expression of a group of genes is changing in one direction or another. A large majority of the genes in this pathway have higher expression in the NL samples, suggesting that a significant part of the signaling machinery for contraction is used to keep myometrial cells in a relaxed, quiescent state during pregnancy. Then, at the time of labor, a small number of important genes in specific areas of the pathway are up-regulated to achieve myometrial contraction. The labor-triggering genes are involved in initiating inflammatory response (PLA2G4B), MAPK signaling (ARAF, MAP2K1), and calcium-induced contraction (CALML6, MYLK4).
3.7 Building a Parturition Signaling Network
The problem of reconstructing biochemical pathways is well studied, and the shortest
paths approach has been shown to be an accurate solution.45,69,70 A curated signaling
network constructed by Ritz et al. (2016) by integrating multiple protein interaction and
pathway databases as well as Gene Ontology (GO) was used (see chapter 2).45 This network is directed and weighted, where weights are calculated using a Bayesian approach to assign probabilities to edges based on experimental evidence and shared GO terms. Since progesterone signaling, inflammation, and cAMP signaling were identified as important pathways based on enrichment results and prior work in the parturition field, the network was reduced by selecting the nodes and edges that are downstream of the proteins PGR, IL1R1, and ADCY. These three proteins are the main upstream initiators of the three pathways of focus and are referred to as the “source” nodes in the parturition signaling network. Weights were then assigned to the nodes of the resulting network based on the broad gene expression signature derived from the global SVD analysis (i.e. the weight of a gene product in the network is the absolute value of the loading of the gene in the most dominant eigen-sample). “Sink” nodes were also chosen in the parturition network as the products of the top 50 genes associated with laboring and with non-laboring myometrium, based on the absolute values of the gene
loadings. This choice of the top 50 is somewhat arbitrary; however, it ensures an equal representation from each phenotype while capturing the most significant changes at the gene expression level. It also allows for a relatively broad signal (as there are 100 total genes) without being too large, which would result in very dense networks. Shortest paths were calculated from each of the source nodes (the three pathway initiator proteins) to each of the sink nodes (the highly differentially expressed genes), with the constraint that the last edge in the path is an edge between a transcription factor and a target node. All nodes that lie on at least one such shortest path are included in the final parturition signaling network.
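The source-to-sink procedure can be illustrated with a small, self-contained sketch. It is not the pipeline used in this work: the toy edges, the edge and node costs, and the way the two are summed along a path are all assumptions made for illustration, and the function name `shortest_path` is hypothetical. The sketch uses Dijkstra's algorithm and only allows a sink to be entered from a transcription factor.

```python
import heapq

def shortest_path(graph, node_cost, source, sink, tfs):
    """Dijkstra from source to sink; the final edge must leave a transcription factor."""
    best = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == sink:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, edge_cost in graph.get(node, {}).items():
            if neighbor == sink and node not in tfs:
                continue  # enforce TF -> target as the last edge
            # Assumption: path cost = edge costs + costs of the nodes entered.
            new_cost = cost + edge_cost + node_cost.get(neighbor, 0.0)
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    if sink not in best:
        return None
    path = [sink]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return path[::-1]

# Hypothetical toy network; edges and costs are illustrative only.
graph = {
    "PGR": {"AKT1": 1.0, "JUN": 3.0},
    "AKT1": {"RELA": 1.0, "IL1B": 0.5},  # AKT1 -> IL1B is skipped: AKT1 is not a TF
    "RELA": {"IL1B": 1.0},
    "JUN": {"IL1B": 1.0},
    "IL1R1": {"RELA": 2.0},
}
node_cost = {"AKT1": 0.5, "RELA": 0.2, "JUN": 0.4, "IL1B": 0.0}
tfs = {"RELA", "JUN"}
sources, sinks = ["PGR", "IL1R1"], ["IL1B"]

# Union of all nodes lying on a source-to-sink shortest path.
network_nodes = set()
for source in sources:
    for sink in sinks:
        path = shortest_path(graph, node_cost, source, sink, tfs)
        if path is not None:
            network_nodes.update(path)
```

In this toy example the cheaper direct edge from AKT1 to IL1B is never used because AKT1 is not a transcription factor, so the path from PGR is routed through RELA instead.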
The resulting network generated from this method has 66 nodes and 91 edges (Fig. 3.7A). The network contains four different node types: receptor, signaling protein, transcription factor, and target gene. The transcription factors and signaling proteins were selected by the shortest paths calculations. Nodes are colored based on their weight (gene loadings from the dominant regulatory program from our global SVD analysis), with blue representing higher expression in non-laboring samples and red representing higher expression in laboring samples. Most of the nodes in this network have a dark blue or red color, indicating that they exhibit some differential expression between laboring and non-laboring myometrium. Multiple genes in the network upstream of our target genes have been linked to parturition or are in some of the gene sets/pathways we investigated in previous sections of this work.
The parturition network provides signaling information about both quiescence and labor, that is, the machinery that is present in both phenotypes. To identify signaling pathways specific to each phenotype (IL vs. NL), phenotype-specific signaling networks were also constructed. These networks were built by setting the input target genes as the top 50
Figure 3.7. Parturition-specific signaling networks constructed using shortest paths and differential expression data. Shortest paths were calculated on a curated, directed, weighted signaling network. Shortest paths were obtained from each of the three pathway initiator proteins (PGR, IL1R1, ADCY; chosen to represent progesterone, inflammatory, and cAMP signaling) to each downstream target gene. Nodes in the network were weighted based on gene loadings from the dominant singular vector when performing SVD on the combined expression matrix (3 studies; 12,622 genes x 71 samples). A. A general parturition signaling network. Target genes consisted of the top 50 genes associated with the non-laboring and laboring sample groups (100 total input genes). Nodes were weighted such that those with strong phenotype associations had the lowest cost. Paths falling in the top 50%, in terms of lowest costs, were retained. B. A parturition signaling network prior to labor (i.e. quiescence). Target genes consisted of the top 50 genes associated with the non-laboring sample groups. Nodes were weighted such that genes highly expressed in the non-laboring samples had low costs, those with roughly equal expression across the two phenotypes had a medium cost, and genes with higher expression in the laboring samples had high cost. C. A parturition signaling network during labor. Target genes consisted of the top 50 genes associated with the laboring sample groups. Nodes were weighted opposite to the quiescent network in panel B. All paths were retained for the networks in panels B and C. Networks were visualized in Cytoscape. Node shapes (receptor, signaling protein, transcription factor, transcription factor and target gene, target gene) are based on function/pathway location, node colors correspond to their respective weights (SVD gene loading value, from non-labor to labor), and edge width is proportional to the weights in the original network (thicker meaning higher confidence).
genes associated with NL (NL network) or IL (IL network), based on their SVD gene loadings. For the remaining nodes in the network, weights were assigned to ensure that genes with expression associated with the corresponding phenotype were prioritized (i.e. had lower costs and were therefore more likely to be chosen in the shortest path). In other words, the values in the singular vectors associated with the genes were normalized such that genes up-regulated during labor had greater importance in the labor network (and vice versa for the non-labor network). The resulting networks are shown in Fig. 3.7B (non-labor) and Fig. 3.7C (labor). These two networks show very few similarities with each other, but both exhibit similarity with the network from Fig. 3.7A. Also of note is that the networks do not contain nodes of only one color, indicating the importance of certain proteins in the signaling of both biological states. The NL network is enriched for neutrophil signaling, EGF receptor signaling, cAMP signaling, regulation of circadian rhythm, and JUN phosphorylation, while the labor network shows enrichment for glucocorticoid receptor regulatory network, TNF signaling, signaling by interleukins, regulation of PI3K, and positive regulation of cell proliferation. The IL network has many downstream targets that are known to be classic inflammatory genes (IL-1β, RUNX1, PIM1, BCL3, etc.), whereas the target genes in the NL network give clues as to which genes may be most important during pregnancy and how they are regulated.
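One way to turn signed gene loadings into phenotype-specific node costs is sketched below. The exact normalization used for these networks is not spelled out here, so this is only an assumed scheme: a min-max rescaling in which positive loadings are taken to indicate association with labor. The function name `loadings_to_costs` and the sign convention are hypothetical.

```python
import numpy as np

def loadings_to_costs(loadings, phenotype="labor"):
    """Map signed loadings to costs in [0, 1]; genes tied to `phenotype` get low cost."""
    values = np.asarray(loadings, dtype=float)
    if phenotype == "non-labor":
        values = -values  # flip sign so the phenotype of interest is always positive
    span = values.max() - values.min()
    if span == 0:
        return np.full(values.shape, 0.5)  # no contrast: every gene gets a medium cost
    scaled = (values - values.min()) / span  # 0 = least associated, 1 = most associated
    return 1.0 - scaled

# Hypothetical loadings: non-labor gene, unchanged gene, labor gene.
toy_loadings = [-2.0, 0.0, 2.0]
labor_costs = loadings_to_costs(toy_loadings, "labor")
quiescent_costs = loadings_to_costs(toy_loadings, "non-labor")
```

Under this scheme a gene with roughly equal expression in both phenotypes receives a medium cost in either network, and the two networks assign opposite costs to strongly phenotype-associated genes.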
3.8 Summary
In this chapter, three gene expression studies derived from myometrial tissue samples of women at term (IL vs NL) were analyzed and compared. Differential expression analysis provided a high-level examination of transcriptome variation between sample phenotypes and similarities across the three studies, which produced a high-confidence set of 126 genes (differentially expressed in all studies) useful for characterizing labor. Classification through the supervised machine learning algorithms LASSO and EN was used to assess the strength of the transcriptional signature within each dataset and gain insight regarding the overall quality and consistency both within and between datasets. SVD on an aggregated expression matrix further confirmed a strong transcriptional signal separating laboring and non-laboring phenotypes. Finally, an integration of the knowledge gained from these approaches and pathway analyses with network information was performed to construct a signaling network of pregnancy and labor in human myometrium.
4 Investigating Pathways Controlling Human Parturition in a Myometrial Cell Line
Investigation of differential gene expression (comparisons in Fig. 2.1C) indicates that FSK/cAMP induced a major transcriptional response, altering the expression of 2210 genes, while P4 and IL-1β each affected a smaller but nearly equal number of genes (932 and 970, respectively) (Fig. 4.1A). cAMP and IL-1β share a substantial number of up-regulated genes (Fig. 4.1B), cAMP and P4 share a large set of down-regulated genes (Fig. 4.1C), and P4 down-regulates a number of genes that are up-regulated by IL-1β (Fig. 4.1D). These results suggest that there is inter-gene regulation between these processes and that P4 is anti-inflammatory, while cAMP has a more complicated relationship in that it likely both up- and down-regulates different sets of inflammatory-related genes.
Figure 4.1. Comparison of individual genes that are significantly differentially expressed in response to each individual stimulus. Venn diagrams depicting the overlap of genes that are A. significantly differentially expressed, B. significantly up-regulated, and C. significantly down-regulated in hTERT-HMA/B cell lines treated with P4 (green circle), FSK (red circle), and IL-1β (blue circle) as compared to control samples. D. Venn diagram depicting the overlap of genes that are significantly up-regulated in IL-1β treated cells as compared to control and significantly down-regulated in P4 and/or FSK treated cells as compared to control. All comparisons are single stimulus versus control, and significance is defined as having |fold change| > 1.5 and p-value < 0.05 (after adjustment for multiple hypothesis testing using the Benjamini-Hochberg procedure).
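The significance rule stated in this caption (|fold change| > 1.5 and Benjamini-Hochberg adjusted p-value < 0.05) can be sketched as follows. This is an illustrative implementation rather than the code used for the analysis; it assumes fold changes are signed, with down-regulation reported as negative values, and the function names and toy numbers are hypothetical.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted

def significant(fold_change, p_values, fc_cutoff=1.5, alpha=0.05):
    """Boolean mask of genes passing both the fold change and adjusted p-value cutoffs."""
    fc = np.asarray(fold_change, dtype=float)
    return (np.abs(fc) > fc_cutoff) & (benjamini_hochberg(p_values) < alpha)

# Hypothetical signed fold changes and raw p-values for four genes.
toy_fc = [2.0, -1.2, -3.0, 1.6]
toy_p = [0.01, 0.04, 0.03, 0.005]
mask = significant(toy_fc, toy_p)
```

Here the second gene fails only the fold change cutoff, which shows that both conditions must hold for a gene to be counted in the Venn diagrams.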
4.1 Progesterone Signaling
Treatment of the hTERT-HMA/B cells made to equally express PR-A and PR-B with P4 significantly (|fold change| > 1.5 and adjusted p-value < 0.05) altered the expression of 932 genes compared to untreated cells (CTRL). P4 induced more down-regulation (555 genes) than up-regulation (377 genes). At the pathway level, P4 down-regulated the GO BPs muscle tissue development, STAT cascade, chemokine signaling, and leukocyte migration, while the TNF signaling pathway and cellular hormone metabolic process were up-regulated (Fig. 4.2A). As for transcription factors, P4 up-regulated downstream targets of AR and ISRE and down-regulated the targets of STAT6, RSRFC4 (or MEF2A), CEBP, and HNF1 (Fig. 4.2B). These results clearly indicate anti-inflammatory actions by P4, which is supported by previous research71. Investigation of the inferred transcriptional regulatory machinery underlying the P4-induced gene expression changes (i.e. the transcription factor pathways that P4 uses to alter expression of certain genes) indicates that JUN, a part of the AP-1 complex, is an important component (Fig. 4.3A). Node colors in this network indicate strong anti-inflammatory action by P4 (the targets MMP1, CCL3, IL-8, and IL-1β are all down-regulated). Other genes of interest include KLF4, which is up-regulated by P4 and has multiple downstream targets in the network. KLF4 has been previously linked to P4 and appears to limit cell proliferation72. Furthermore, PLAT plays a role in tissue remodeling and cell migration and PTX3 is involved in the regulation of innate resistance to pathogens and inflammatory reactions73,74. Together, these results indicate that the majority of signaling by P4 in hTERT-HMA/B cells serves to produce a strong anti-inflammatory action, likely utilizing JUN to achieve this effect.
4.2 cAMP Signaling
Treatment of cells with FSK significantly altered the expression of 2210 genes, the majority of which were up-regulated compared to CTRL (1345 up and 865 down). At the pathway level, partial anti-inflammatory action was observed for cAMP, with response to
Figure 4.2. Pathways and transcription factors that exhibit significant enrichment in differential expression in response to treatment by P4, FSK, and IL-1β individually. The gene set collections for Gene Ontology Biological Processes and transcription factor-targets were obtained from the Molecular Signatures Database. Gene sets significant (|fold change| > 1.5 and adjusted p-value < 0.05) in any single stimulus condition versus control were identified using Gene Set Variation Analysis (GSVA). For a subset of these significant A. pathways and B. transcription factors, the magnitude and direction of the collective fold change of the genes in the corresponding gene set, as quantified by GSVA, is shown using three bars colored according to the stimulus.
Figure 4.3. Regulatory networks representing the effects of each individual stimulus. Regulatory network enrichment analysis (RNEA) was performed and significant subnetworks were identified in samples treated by A. P4, B. FSK (cAMP), and C. IL-1β based on gene expression alterations compared to control. Nodes (genes and transcription factors) are colored based on log2(fold change). Edges (regulatory interactions) and node outlines are colored based on occurrence in the 3 shown networks (individual stimulus, shared with P4, shared with FSK, shared with IL-1β, or shared between all). Log2(fold change) cutoffs used for significance in the RNEA function were 1.5 for P4 and IL-1β and 1.75 for FSK. Visualization was performed using Cytoscape.
IL-1 and FC epsilon receptor signaling significantly down-regulated (Fig. 4.2A). FSK also induced changes associated with myometrial cell quiescence; genes involved in smooth muscle contraction were down-regulated and those involved in muscle hypertrophy were up-regulated. At the transcription factor level, more anti-inflammatory action was apparent through the down-regulation of FOS and ETS1 targets (Fig. 4.2B). As expected, cAMP up-regulated targets of CREB, as well as targets of FOXM1 and NFY, which are involved in regulating cell proliferation. The inferred regulatory network based on these gene expression changes exhibits multiple points of interest (Fig. 4.3B). Many downstream genes were regulated by RELA, a subunit of the well-known inflammatory transcription factor NF-κB. RELA was in turn downstream of the most strongly up-regulated gene after FSK treatment, protein kinase C zeta (PRKCZ). There were some similarities with the P4 network, including down-regulation of the inflammatory genes MMP1 and CCL3. However, known inflammatory genes such as IL6, IL11, and CXCR4 were up-regulated by cAMP, indicating more diverse and complex action within the cAMP signaling pathway in myometrial cells. Further illustrating this point, some genes in this network correspond to the establishment of the quiescent phenotype, which was also observed from the pathway enrichment (e.g. NR4A2 and NR4A3 participate in cardiac hypertrophy and VTN promotes cell adhesion)75.
4.3 IL-1β Signaling
IL-1β significantly altered the expression of 970 genes in hTERT-HMA/B cells, most of which were up-regulated compared to CTRL (586 up and 384 down). At the pathway level, the gene expression alterations corresponded to a classic inflammatory response signal, with pathways related to chemokines, leukocytes, cytokines, and inflammatory molecule signaling (IL-1, FC epsilon receptor, JUN, and NF-κB) all up-regulated (Fig. 4.2A). Similarly for transcription factors, IL-1β up-regulated targets of GATA, NF-κB, SRF, and STAT6 (Fig. 4.2B). This pattern was also seen in the IL-1β RNEA network, with the majority of genes exhibiting increased expression (Fig. 4.3C). Genes such as CCL3, PTGS2, and IL11 were downstream of CSF2/3 and CEBPA/B/D transcription factors as well as JUN, which regulates expression of many genes altered by IL-1β treatment, suggesting an important role for this protein in regulating inflammation.
4.4 Additional Insights from Single Stimulus Comparisons
For the most enriched TFs in Fig. 4.2B, the co-regulation of their target genes by P4, FSK, and IL-1β was examined further. A substantial portion of the targets of these top TFs were regulated in specific directions (up- or down-regulated) by at least two of the individual stimuli. For example, the TF ISRE was up-regulated by P4 and FSK (Fig. 4.2B). Of the 247 known targets of ISRE, 34 were up-regulated by FSK, 15 by P4, and 7 by both (Fig. 4.4A). FSK and P4 also altered the same target genes of TFs whose activity was down-regulated, such as ETS1 (Fig. 4.4K). FSK and IL-1β activated many of the same targets of NF-κB, whereas TFs such as STAT6 illustrate that the anti-inflammatory actions of P4 are effective on some of the same targets that were up-regulated by IL-1β (Fig. 4.4I). These results indicate that P4, FSK, and IL-1β individually regulate the expression of the same genes, with P4 and FSK typically having the opposite effect to IL-1β on expression.
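The target-overlap comparison described in this section amounts to restricting each stimulus's regulated genes to the known targets of a TF and then intersecting the resulting sets. A minimal sketch is given below; the function name `shared_targets` and the gene identifiers are hypothetical placeholders, not genes or counts from this analysis.

```python
def shared_targets(targets, regulated_by):
    """Restrict each stimulus's regulated genes to the TF targets, then intersect them."""
    target_set = set(targets)
    per_stimulus = {name: set(genes) & target_set for name, genes in regulated_by.items()}
    shared = set.intersection(*per_stimulus.values()) if per_stimulus else set()
    return per_stimulus, shared

# Hypothetical target list for one TF and up-regulated genes for two stimuli.
tf_targets = ["g1", "g2", "g3", "g4", "g5"]
up_regulated = {
    "FSK": ["g1", "g2", "g3", "g9"],  # g9 is not a target of this TF
    "P4": ["g2", "g3", "g5"],
}
per_stimulus, shared = shared_targets(tf_targets, up_regulated)
counts = {name: len(genes) for name, genes in per_stimulus.items()}
```

The per-stimulus counts and the size of the shared set correspond to the numbers placed in each region of a two-circle Venn diagram.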
Figure 4.4. Regulation of shared targets for top enriched transcription factors in response to individual stimuli. Comparison of how each stimulus alters expression of targets of selected transcription factors showing strong enrichment from GSVA calculation in at least one of the conditions P4, FSK, and IL-1β compared to control. For each of these transcription factors, the number of known targets that are up- or down-regulated by each stimulus and the overlap between these targets are shown, along with the total number of known targets for each TF. Rows: ISRE-01 (247 total targets; panels A-C), NFKB_C (263; panels D-F), STAT6_02 (258; panels G-I), and ETS1_B (259; panels J-L). Columns: up-regulated, down-regulated, and IL-1β up with P4 and FSK down.
The regulatory network analysis also indicates clear similarities between the three individual stimuli. All three networks shared 15 nodes and 25 edges, indicated in red (edges and node borders in Fig. 4.3). A majority of the shared genes were upstream TFs, suggesting that each stimulus induced its cellular response through common regulatory machinery. The anti-inflammatory action by P4 is clearly represented in the shared sub-network between P4 and IL-1β (purple in Fig. 4.3A and blue in Fig. 4.3C). These two networks shared 12 nodes and 20 edges (in addition to the nodes and edges shared by all three conditions), where most genes were up-regulated in the IL-1β network and down-regulated in the P4 network. This indicates that the main action of P4 is anti-inflammatory, directly down-regulating many of the same genes that IL-1β stimulates. Although P4 and FSK both had anti-inflammatory action and acted on many of the same TF targets, little was shared between the top-enriched P4 and FSK regulatory networks that is not also shared with the IL-1β network. Taken together, it is clear that P4, FSK, and IL-1β are not only regulating the same genes, but also using similar transcriptional pathways to achieve the regulation.
4.5 Anti-inflammatory Actions of P4 and cAMP
Terms significantly enriched by IL-1β treatment were examined to better characterize the anti-inflammatory effects of P4 and FSK in the presence of inflammation (performing comparisons in Fig. 2.1D). Certain pathways up-regulated (chromatin access and organization) and down-regulated (inflammatory-related pathways including glucose starvation and type 1 interferon) by IL-1β were observed to be unaffected by the addition of P4 and/or FSK (Fig. 4.5A, top panel). The addition of P4 and FSK individually decreased specific IL-1β responsive pathways. P4 strongly inhibited GO terms related to cell dynamics. FSK again exhibited a dual action of promoting relaxation (reversal of pathways relating to muscle and calcium transport) and inhibiting inflammation (IL-1 signaling and monocyte chemotaxis). Interestingly, multiple pathways significantly up-regulated by IL-1β were inhibited only when both FSK and P4 were present, consisting of processes related to cytokine production and signaling cascades involved in inflammatory response (STAT, IL12, and LPS). As for transcription factors, it was observed that the activity of NF-κB, in the context of inflammatory signaling, was not inhibited by P4 and/or FSK (Fig. 4.5B, top panel). The extent to which P4 inhibited inflammation at the level of transcription factor activity was clear, almost completely reversing the expression of downstream targets. For most of these cases, FSK had relatively little or no effect, suggesting anti-inflammatory action specific to P4 signaling. FSK again seemed to oppose down-regulation by IL-1β, particularly on TFs related to relaxatory actions (GR and PR). Similar to results seen at the pathway level, there were cases where the combination of FSK and P4 had a stronger effect on IL-1β-altered gene sets than the two individually. Inhibited inflammatory TFs included GATA, SRF (part of the MAPK pathway), and STAT6, which was also a synergistically inhibited pathway.
Anti-inflammatory action was investigated further through annotation of the previously calculated IL-1β RNEA network (Fig. 4.3C). Figure 4.6A depicts how the expression of genes from the IL-1β RNEA network was influenced by addition of FSK and/or P4. A sizable portion of the network consists of IL-1β genes that were inhibited in the presence of P4 alone, FSK alone, and FSK + P4 (indicated in yellow). Note that all of these genes were up-regulated by IL-1β (boxed nodes), indicating the pro-inflammatory genes and regulatory interactions that P4 and FSK inhibit or inactivate individually and in combination. There were additional cases in blue, green, and orange where only one of P4 or FSK was needed to achieve inhibition. Genes in red indicate those where inhibition was only achieved in the presence of both P4 and FSK, representing the GSVA-observed synergistic inhibition. Finally, this network also highlights genes and regulatory interactions that are unaffected by P4 and/or FSK (in purple), meaning they remain active in the presence of anti-inflammatory stimuli.
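The annotation logic behind Fig. 4.5 and Fig. 4.6A can be sketched in a few lines of Python. This is an illustrative sketch only and not the code used in this work: the function names, the signed fold-change convention (negative values denote down-regulation), and the example values are assumptions. Following the Fig. 4.6 caption, an IL-1β response is treated as inhibited by a treatment when it is no longer significant in the same direction relative to control; the thresholds are those stated in the Fig. 4.5 caption.

```python
# Illustrative sketch only; names and example values are hypothetical.
FC_CUT = 1.5   # |fold change| threshold (Fig. 4.5 caption)
P_CUT = 0.05   # adjusted p-value threshold (Fig. 4.5 caption)

def is_altered(fold_change, padj):
    """True if a gene or gene set is significantly altered vs control.

    Assumes a signed fold change (negative values mean down-regulation).
    """
    return abs(fold_change) > FC_CUT and padj < P_CUT

def same_direction_hit(reference, other):
    """True if `other` is significant and changes in the same direction as `reference`."""
    ref_fc, _ = reference
    fc, padj = other
    return is_altered(fc, padj) and (fc > 0) == (ref_fc > 0)

def inhibition_pattern(il1b, p4_il1b, fsk_il1b, fsk_p4_il1b):
    """Label an IL-1B response by the treatments that reverse it.

    Each argument is a (fold_change, adjusted_p) pair for one comparison
    against control.
    """
    if not is_altered(*il1b):
        return "not IL-1B responsive"
    treatments = (("P4", p4_il1b), ("FSK", fsk_il1b), ("FSK+P4", fsk_p4_il1b))
    inhibitors = [name for name, stats in treatments
                  if not same_direction_hit(il1b, stats)]
    if not inhibitors:
        return "no inhibition"
    return "inhibited by " + ", ".join(inhibitors)

# Hypothetical example: the IL-1B response persists with P4 or FSK alone
# but is lost when both are present (the synergistic category).
label = inhibition_pattern((3.0, 0.001), (2.8, 0.002), (2.5, 0.004), (1.1, 0.60))
print(label)
```

With these hypothetical inputs the label is "inhibited by FSK+P4", which corresponds to the red nodes of Fig. 4.6A and the fourth category of Fig. 4.5.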
4.6 Synergy Between P4 and cAMP
Many of the previous results indicate a synergistic effect between cAMP signaling and P4. Such a relationship is important in the context of establishing and maintaining pregnancy. FSK and P4 each had anti-inflammatory actions and relaxatory actions (Figs. 4.1-4.4), but these did not fully overlap, meaning that they work similarly in certain biological contexts and in a complementary manner in others. This was examined further by annotating a regulatory enrichment network calculated from the FSK + P4 vs CTRL comparison of expression (Fig. 4.6B). A sizable portion of this network (indicated in red) shows the activated regulatory interactions present only under the combination treatment of FSK and P4. Many of the genes were up-regulated, implicating them as important molecules for pro-relaxatory signaling as opposed to the dominant anti-inflammatory signaling seen in Fig. 4.6A. Again, many of the nodes in this network are co-regulated by P4 and cAMP (indicated in yellow). These genes are mostly down-regulated (oval nodes), which is
Figure 4.5. Pathways that are activated/inhibited by IL-1β are inhibited/activated by different combinations of P4 and FSK. The gene set collections of Gene Ontology Biological Processes and Transcription factor-targets were obtained from the Molecular Signatures Database. Gene sets significantly altered (|fold change| > 1.5 and adjusted p-value < 0.05) from control to IL-1β were identified using GSVA. These gene sets were then re-evaluated by GSVA for the conditions of P4+IL-1β, FSK+IL-1β, and FSK+P4+IL-1β compared with control. These gene sets were classified into 4 categories according to their regulation pattern: 1) No inhibition (activated by IL-1β and not inhibited by the addition of P4 and/or FSK), 2) Inhibition by P4 (activated by IL-1β but this activation was reversed by the addition of P4), 3) Inhibition by FSK (activated/inhibited by IL-1β but this activation/inhibition was reversed by the addition of FSK), 4) Inhibition by P4+FSK (activated/inhibited by IL-1β, this regulation was not reversed by the addition of P4 or FSK individually, but it was reversed by the addition of P4 and FSK together). For each of these four categories, the fold change of sample gene sets (representing A. Gene Ontology Biological Processes, B. Transcription Factor Targets) in response to different combinations of stimuli are shown. See Supplemental files 10 and 11 for enrichment fold change values for all GO BP and TF terms, respectively, significant in IL-1β compared to control. These include enrichment fold changes for all IL-1β conditions.
[Figure 4.6. Panel A: Inhibition of IL-1B Regulatory Signaling. Panel B: FSK and P4 Regulatory Signaling. Panel A nodes: IL18, JUN, STAT3, CEBPB, CEBPD, CEBPA, POU2F1, IL1B, CCL2, SOCS3, SMAD4, CSF2, CSF3, PPARG, CCL3, EGR1, DDIT3, CCL4, AR, BCL2A1, SOD2, IL6, HIF1A, IL11, SMAD1, IL24, RARA, FLI1, HSD11B1, SAA2, SERPINB2, SAA1, ALDH1A3, MMP1, MMP3, ITGB8, CXCL2, SPRR2A, CYP27A1, MYOCD, PTGS2, IL23A, SERPINA1, CCL7, ANGPTL4, PDGFB, CCL20, KRT14, KCNN3, MMP12. Node color legend (A): Not Significant; IL1B-responsive not inhibited; Inhibited by P4 only; Inhibited by FSK only; Inhibited by FSK+P4 only; Inhibited by P4, FSK+P4; Inhibited by FSK, FSK+P4; Inhibited by P4, FSK, FSK+P4. Node color legend (B): Not Significant; Responsive to FSK only; Responsive to FSK+P4 only; Responsive to P4, FSK+P4; Responsive to FSK, FSK+P4; Responsive to P4, FSK, FSK+P4. Node shape legend: Not Significant; Up-regulated; Down-regulated.]
Figure 4.6. Regulation interplay of genomic signaling by IL-1β, FSK, and P4. A. Inhibition of the regulatory network induced by IL-1β (same network and inputs used for Fig. 4.3C) by different combinations of P4 and FSK (e.g. an orange node is significant in IL-1β vs Ctrl and remains significant in the same direction in P4+IL-1β vs Ctrl but becomes insignificant or significant in the opposite direction in FSK+IL-1β vs Ctrl and FSK+P4+IL-1β vs Ctrl). B. Regulatory patterns of P4 and FSK. The network is obtained by performing RNEA for FSK+P4 vs Ctrl with a log2(FC) cutoff of 2.4 and p-value cutoff of 0.05. Node colors are based on P4 vs Ctrl, FSK vs Ctrl, and FSK+P4 vs Ctrl expression comparisons. Node shape indicates differential expression of each node, including direction of regulation.
consistent with the idea that cAMP signaling and P4 are anti-inflammatory. It is important to understand how these two processes work individually and in concert to elucidate their role in maintaining quiescence. The results in this chapter identify pathways and regulatory mechanisms that may be important for fulfilling this role.
4.7 Summary
RNA-seq data derived from an immortalized human myometrial cell line, hTERT-HMA/B, induced to express PR-A and PR-B and treated with combinations of IL-1β, P4, FSK (which activates cAMP signaling), and vehicle were used to address two questions: 1) what genes and regulatory pathways are affected by IL-1β, P4, and FSK in myometrial cells, and 2) how do P4 and FSK affect responsiveness of myometrial cells to IL-1β? These questions were answered using global differential gene expression analysis, gene set enrichment, and regulatory network enrichment. These analyses suggest that 1) P4/PR signaling inhibits responsiveness to IL-1β primarily through AP-1 (JUN), 2) cAMP signaling exerts anti-inflammatory effects alone while also helping to establish a relaxed and hypertrophic state, and 3) P4 and cAMP signaling work together in a synergistic manner to further inhibit certain inflammatory processes. Specific regulatory molecules/interactions were observed to have altered activity under individual treatment with the three stimuli, providing context to the inter-process regulation observed from gene set enrichment.
5 Identifying Transcriptional Similarities of Human Myometrial Tissues and Cell Lines
5.1 Agreement of Transcriptional Alterations at the Gene, Pathway, and Network Levels
Similar methods of analysis, such as gene set enrichment and network construction, were used to study tissue data (chapter 3) and cell line data (chapter 4). Therefore, a simple examination of the results from these chapters can allow for a comparison between these two types of experimental specimens.
Enrichment analysis by GSVA, using a combination of hallmark and selected sets for tissue and GO BPs for cells, shows a level of similarity. For example, the term "TNFα signaling via NF-κB" is up-regulated in labor, and both TNFα signaling and NF-κB signaling are up-regulated by IL-1β. This pattern is also seen for pathways related to IL-2/6, STAT3/5, cytokine signaling, and muscle cell differentiation/migration. Interestingly, a number of these are inhibited by P4, either alone or in the presence of IL-1β. Smooth muscle contraction is up-regulated in laboring myometrium, not significantly altered by IL-1β, but significantly down-regulated by FSK alone.
A more direct comparison of term enrichment can be achieved by repeating the GSVA calculation on the tissue data using GO BPs. Overlap in enriched terms between the tissue data (Warwick study and the set of 126 high-confidence differentially expressed genes) and the cell line data (terms significant in response to at least one of P4, FSK, or IL-1β compared to control, and just the IL-1β altered terms) was examined in pairwise comparisons (Fig. 5.1). Interestingly, some of the terms in Fig. 5.1A are up-regulated in labor but vary in their response to FSK, with some being strongly up-regulated and others inhibited. Shared terms in Fig. 5.1D are of interest considering that the high-confidence gene set strongly represented inflammatory signaling. When comparing with cells treated with IL-1β in addition to P4 and/or FSK, hypotheses can be drawn about how parts of this labor signature are affected by pro-quiescent stimuli. For example, a number of these pathways are only inhibited by the combination of P4 + FSK, including regulation of leukocyte chemotaxis, inflammatory response, and regulation of STAT cascade. All shared terms in Fig. 5.1A and D and their enrichment statistics are listed in Appendix A Tables 7 and 8, which provide more detail for each pathway, indicating how pathway enrichment varies under different cell line treatment conditions.
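A pairwise comparison of this kind amounts to intersecting sets of significant terms. The following minimal sketch assumes each analysis has already been reduced to a set of significant GO BP term names. The function name and the toy term sets are hypothetical and are not taken from Tables 7 and 8, and the Jaccard index is included only as one convenient summary of overlap, not as a statistic reported in this work.

```python
def shared_terms(tissue_terms, cell_terms):
    """Return the sorted terms significant in both analyses and their Jaccard index."""
    tissue_terms, cell_terms = set(tissue_terms), set(cell_terms)
    shared = tissue_terms & cell_terms
    union = tissue_terms | cell_terms
    jaccard = len(shared) / len(union) if union else 0.0
    return sorted(shared), jaccard

# Toy inputs (hypothetical, for illustration only).
tissue = {"inflammatory response",
          "regulation of leukocyte chemotaxis",
          "muscle contraction"}
cells = {"inflammatory response",
         "regulation of leukocyte chemotaxis",
         "regulation of STAT cascade"}

shared, jaccard = shared_terms(tissue, cells)
print(shared, jaccard)
```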
Despite the use of two quite different methods (a shortest path-based method starting from a full signaling network for tissue, compared to a TF-target gene regulatory network method utilizing differential expression for cells), there is some agreement between the tissue and cell line-based networks. Comparing the non-labor network (Fig. 3.7B) and FSK + P4 RNEA network (Fig. 4.6B), there are a number of common TFs including RELA, MYC, TP53, PRKCZ, and ERG1. Likewise, for the labor network (Fig. 3.7C) and IL-1β RNEA network
[Figure 5.1, panels A-D. Set labels: "Significant in at least 1 of: P4 vs CTRL, FSK vs CTRL, IL-1B vs CTRL"; "Significant in IL-1B vs CTRL"; "Significant in Warwick Dataset"; "Tissue High-Confidence 126 Gene Set".]
Figure 5.1. Comparison of cell line and tissue gene set enrichment of GO Biological Processes. Gene set variation analysis (GSVA) was performed for the conditions P4 vs CTRL, FSK vs CTRL, IL-1β vs CTRL and W-IL vs W-NL. Simple over-representation analysis was also performed for the 126 high-confidence genes identified in Fig. 3.1 using Enrichr. A-D. Overlap in enriched GO BPs for different pairwise comparisons of cell line-based and tissue-based enrichment. Shared terms for A and D and their enrichment fold changes/p-values can be found in Tables 7 and 8 of Appendix A.
(Fig. 4.6A), shared TFs are IL6, STAT3, ERG1, AR, SMAD, and IL-1β. Despite these similarities, there are very few cases in which the downstream targets comprise the same genes (only SERPINA1 for the latter comparison).
Taken together, it appears that differences in myometrial cell gene regulation between labor and non-labor are not always the same in treated cell line comparisons despite the observed agreement at the pathway level (based on similarity in enriched pathways) and the TF level (based on comparisons of constructed networks). This suggests that a more global, unsupervised assessment of the alterations in gene expression will indicate a strong degree of similarity between the transcriptional landscapes of myometrial tissue and specifically treated cell lines. The analysis in the following section attempts to address this hypothesis.
5.2 Assessing Similarity Using Tissue Similarity Index
Tissue similarity analysis was used to determine the extent to which hTERT-HMA/B cells in response to various stimuli mimic term myometrium. To do this, the Warwick RNA-seq dataset from chapter 3 (term laboring and non-laboring myometrium, N = 5 for each) was compared to the treated cell line using a method introduced by Sandberg and Ernberg (2005) called Tissue Similarity Index (TSI)53. TSI employs singular value decomposition (SVD) to find an average profile for all samples of a given type (e.g. a specific cancer or tissue) based on the variability within the gene expression matrix. The cell line samples are then projected (using the calculated singular vectors) into the reduced SVD space and their correlations to the different average tissue profiles are calculated. The correlations produce the TSI values, which range from -1 (low similarity) to 1 (high similarity).
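The steps above (SVD of the tissue matrix, projection of cell line samples, correlation to phenotype centroids) can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions and not the original TSI implementation or the code used in this work: genes are centered on their tissue means, the correlation is taken as cosine similarity in the reduced space, and the function name, the default `k`, and the toy matrices are all hypothetical.

```python
import numpy as np

def tissue_similarity_index(tissue, labels, cells, k=2):
    """Similarity of each cell line sample to each tissue phenotype centroid.

    tissue : genes x tissue-samples expression matrix
    labels : phenotype label for each tissue sample (column of `tissue`)
    cells  : genes x cell-samples matrix with the same gene order
    k      : number of singular vectors kept (an assumed, tunable choice)
    """
    tissue = np.asarray(tissue, dtype=float)
    cells = np.asarray(cells, dtype=float)
    labels = np.asarray(labels)

    # Center every gene on its mean across the tissue samples.
    gene_mean = tissue.mean(axis=1, keepdims=True)
    # Left singular vectors span the dominant expression patterns in tissue.
    U, _, _ = np.linalg.svd(tissue - gene_mean, full_matrices=False)
    basis = U[:, :k]

    # Coordinates of tissue and cell line samples in the reduced SVD space.
    tissue_coords = basis.T @ (tissue - gene_mean)
    cell_coords = basis.T @ (cells - gene_mean)

    scores = {}
    for phenotype in np.unique(labels):
        centroid = tissue_coords[:, labels == phenotype].mean(axis=1)
        # Cosine similarity between each projected cell sample and the centroid.
        num = centroid @ cell_coords
        den = np.linalg.norm(centroid) * np.linalg.norm(cell_coords, axis=0)
        scores[str(phenotype)] = num / den
    return scores

# Toy data (hypothetical): 4 genes, 2 non-labor and 2 labor tissue samples.
tissue = [[1.0, 1.2, 5.0, 5.4],
          [2.0, 1.8, 6.0, 5.6],
          [5.0, 5.2, 1.0, 1.4],
          [6.0, 5.8, 2.0, 1.6]]
labels = ["non-labor", "non-labor", "labor", "labor"]
# Column 0 resembles the labor samples, column 1 the non-labor samples.
cells = [[5.0, 1.0],
         [6.0, 2.0],
         [1.0, 5.0],
         [2.0, 6.0]]

tsi = tissue_similarity_index(tissue, labels, cells)
print(tsi)
```

In this toy case the first cell sample scores close to 1 against the labor centroid and the second close to -1. Under the assumptions of the sketch (gene-wise centering and equal group sizes), the two centroids are exact opposites, so the scores against labor and non-labor are mirror images of each other.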
Correlations for each hTERT-HMA/B cell line sample were calculated with the averaged non-labor profile and labor profile and were symmetric (i.e. samples with a correlation of 1 to the non-labor representative had a correlation of -1 to the labor representative). Based on the current understanding of P4, cAMP, and inflammatory signaling in parturition, as well as the results presented in this work, a particular trend was expected for these cell line sample groups. IL-1β-treated cells should be the most labor-like, addition of P4 and/or FSK to IL-1β-treated cells should make them less labor-like, untreated cells should fall roughly midway between labor and non-labor, and FSK-, P4-, and FSK+P4-treated cells should be more non-labor-like. TSI was performed on 4 separate gene expression matrices containing: 1. all genes, 2. highly significant (|fold change| > 5 and adjusted p-value < 0.001) genes from P4, FSK, and IL-1β individual treatment (145 genes), 3. a set of key cell line genes selected based on the results presented in Figures 4.1-4.6 (17 genes), and 4. highly significant genes from the tissue dataset (non-labor vs labor; 151) (Fig. 5.2A, see chapter 2 for a detailed description of the four gene sets).
Using all genes shared between the two expression studies (gene set 1) in the TSI calculation produced fair TSI values (i.e. similar to the hypothesized pattern of similarity), with IL-1β-treated cells having the highest correlation, about 0.5, to the averaged labor profile, while FSK+P4-treated cells were the most non-labor-like, with a correlation of around -0.75 to the averaged labor profile. Using gene set 2, the correlation to labor increased to 0.85 for IL-1β and all IL-1β conditions were more labor-like. In contrast, P4-treated cells achieved the highest correlation to the non-laboring phenotype. This pattern indicates that the strong anti-inflammatory action of P4 is captured well by this gene set, but the relaxatory action of cAMP is less represented. Use of gene set 3 produced a result highly similar to the expected pattern of cell condition-tissue phenotype similarity. Additionally, the synergistic effect of FSK and P4 seen in the GSVA results was replicated, with FSK+P4 being more highly correlated with the non-laboring phenotype and FSK+P4+IL-1β correlated with non-labor, almost equivalent to the untreated cells. FSK and P4 were also clearly relaxatory stimuli in this result, seen by the substantial increase in correlation to non-labor compared with control. These patterns were also clearly visible in
Figure 5.2. Comparison of cell line and tissue samples using Tissue Similarity Index. Singular Value Decomposition (SVD) was performed on the tissue gene expression matrix and phenotype centroids were calculated in the reduced SVD space. Cell line samples were then projected to this space and their correlations to the labor centroid were calculated. A. Median correlations and standard deviations per sample group are shown for all genes shared between the two studies, highly significant (|fold change| > 5 and adjusted p-value < 0.001) genes from the individual stimulus treated cells compared to control, a set of key genes selected from the comprehensive analyses of cell lines reported in Figures 4.1-4.7, and highly significant genes from the tissue study by comparing in-labor and not-in-labor samples. B. A heatmap of the key genes with median expression across all sample groups. C. Visualization of samples and tissue centroids using the first two singular vectors (or components) from the SVD analysis on the key genes selected based on the cell line analysis.
the gene expression matrix (Fig. 5.2B). Finally, gene set 4 (tissue-derived) produced the same ordering as the key cell line genes, albeit with more variability.
Further visualization of the TSI results was achieved using the first two singular vectors, or components, of the SVD calculation. Figure 5.2C depicts this sample clustering for gene set 3 (selected key cell line genes). Tissue phenotype "centroids" indicate the representative profile for labor and non-labor. The top two SVD components indicate that the most dominant patterns in the gene expression data are first IL-1β/inflammatory signaling and second cAMP signaling. The former, component 1, represents the TSI result in that the FSK/P4 and non-labor samples are separated from the IL-1β and labor samples with opposite values. The latter, component 2, identifies cAMP signaling, separating FSK-treated samples from the rest. These results are interesting because they were clearly observed in the cell line data, but the components were identified from the tissue data only. The genes came from the observed expression in the cell line, but no direct information from the cells is included in the SVD calculation. This indicates a clear role for cAMP in parturition. These genes also capture the inherent diversity in the laboring phenotype, clearly exhibiting more variability in expression than the non-labor samples. These high correlations and the ordering of the cell line sample groups indicate that this experimental model of human myometrium accurately captures the transcriptional landscapes of myometrium at term and during parturition.
5.3 Summary
Results from various bioinformatics approaches carried out on RNA-seq data from multiple term myometrial tissue studies and on data derived from an immortalized human myometrial cell line, hTERT-HMA/B, induced to express PR-A and PR-B and treated with combinations of IL-1β, P4, forskolin (which activates cAMP signaling), and vehicle were used to address an essential question: how do the gene expression patterns in differently treated hTERT-HMA/B cells compare with expression patterns in laboring and non-laboring term myometrium? This question was answered implicitly through comparisons of global differential gene expression, gene set enrichment, and signaling/regulatory network analysis and then explicitly by TSI. Comparison of results from chapters 3 and 4 indicated a relative agreement at the pathway level and in the TFs active in regulation. A number of pathways were similarly regulated across certain cell conditions and tissue phenotypes, specifically inflammatory-related signaling pathways that were up-regulated in labor and up-regulated with IL-1β, often with inhibition by either P4, FSK, or P4 + FSK. Tissue and cell line networks contained a number of the same TFs exhibiting similar differential expression, but there was little similarity when observing their most highly altered targets in the networks. Lastly, TSI results further supported the hypothesis that hTERT-HMA/B cells exposed to IL-1β mimic the laboring phenotype whereas exposure to the combination of P4 and forskolin mimics the pre-labor phenotype.
6 Discussion and Conclusions
6.1 Tissue Analysis
Overlap between datasets reveals high-confidence transcriptional markers of labor. By examining multiple tissue datasets and employing various analysis techniques, more was learned about the transcriptional landscapes of myometrial tissue at different stages of parturition than in previous studies that examined only one dataset using simple differential expression analysis. Investigating the results in an integrative manner helps avoid drawing false conclusions or dismissing outliers of a given dataset as a consequence of the limitations of a single method. The benefit of using multiple datasets is clearly seen in the Venn diagram shown in Figure 3.1, where a minimum of 500 genes would be identified through any given study individually; however, upon comparison across datasets, a compact set of 126 high-confidence genes is obtained. Gene expression data are inherently noisy. In this study, there exist the added challenges of differing populations, study sizes, platforms, and clinical phenotypes. Therefore, the 126 identified genes meeting stringent significance criteria in all studies can be considered a reliable signature of labor-associated genes. Included in this set of 126 genes are some well-known contraction-associated proteins, including PTGS, PTGES, COX2, IL8, CXCL2, CCL2, and SOCS3. The oxytocin receptor (OXTR) is only significantly differentially expressed in the Warwick study.
Implications of misclassified samples. In addition to the identification of high-confidence transcriptional markers in chapter 3, application of multiple computational approaches to the same data helped identify potential outliers within a given study. Namely, the comparison of misclassified samples across LASSO/EN and SVD revealed multiple samples that showed expression patterns more consistent with the opposite phenotype (e.g. samples D-IL7 and D-IL16, which are incorrectly classified in all LASSO/EN classification tasks (training on any dataset) and scored more similarly to non-labor samples by the most dominant singular vector of SVD; the same result is seen for samples L-ILEa2, L-ILEa6, and L-ILEs3 from the London dataset). Together, these results bring into question the quality of the samples as they are repeatedly misclassified in all cases. In this context, each sample has a clear phenotype distinction. However, visible characteristics of laboring women may not match the biochemical actions within the myometrium. The initial stages of labor for certain individuals may still involve further biochemical changes to achieve the full activation characteristic of parturition. All three of the tissue studies analyzed in chapter 3 adopted different clinical definitions of labor to designate sample phenotype. These criteria were mainly based on contraction frequency and cervical dilation (Table 2.1). This observation calls for a clinical standard for determining the presenting phenotype, adding to the importance of the work carried out in this study, as it is possible to employ our discovered transcriptional signatures as a means to aid in phenotype calling of a given myometrial sample.
Gene expression signatures have greater variability in laboring samples. Analysis of samples via supervised learning also allowed for broader conclusions to be drawn about the data. One example is that the laboring phenotype has greater variability relative to the non-labor phenotype, as all “questionable” samples fall into the labor class. This is not surprising as it is inherently easier for physicians to determine the non-laboring phenotype, whereas there is a broad spectrum of laboring markers (cervical dilation, frequency of contractions, etc.). There are also many factors that can influence these phenotype markers of parturition at the time of tissue sample collection, such as age, weight, race, and number of prior pregnancies.
Data normalization and pre-processing. It is important to note that much of the analysis carried out in chapter 3 required an accurate procedure for creating comparable distributions of gene expression across the three transcriptome studies. It is well understood that raw data from RNA-seq and microarray platforms have dissimilar distributions and characteristics.76 Therefore, to perform cross-dataset classification and SVD, a careful normalization approach was needed. There are many proposed procedures for creating agreement between RNA-seq and microarray data.76 In the analysis presented above, a relatively simple approach was used and was highly effective, demonstrated by the performance of the classification models and singular vectors from the SVD analysis. It is important to keep in mind that more complicated study comparisons may require other, more sophisticated techniques for sufficiently accurate results.
Other areas of consideration may also be important when combining studies in similar analysis pipelines. Here both RNA-seq studies employed paired-end sequencing and aligned around 97% of reads. London averaged 53 million reads per sample and Warwick averaged 7.2 million. Here London and Warwick were aligned and processed in the same manner, leading to the identification of 23617 genes while 19897 genes were identified for the Detroit study (microarray). After removal of duplicate genes and those with very low expression, the gene expression matrices were similar in size (around 15000 genes). Finally, 410 genes are significant in both the W-IL vs W-NL and the L-ILEs vs L-NL, 48 of which are not tiled on the array in the Detroit study.
Comparing tissue studies with other experimental techniques. The expression signatures obtained in chapter 3 can also be applied to understand how well other experimental samples match to true phenotypical conditions within the myometrium, at least in terms of gene expression patterns. For example, the trained classifiers and SVD singular vectors can be employed to assess samples from new studies, cell lines, or even different uterine tissues. The similarity between cell line models and the tissues they are derived from is a very important aspect of consideration for in vitro research. Therefore, the consistencies and inconsistencies between tissue-based gene expression signatures and cell-line based expression signatures can serve as a means to understand the model system and assess the ability of the model system to accurately represent the biological state(s) being studied.
Results reveal differences between computational techniques. The results in chapter 3 also revealed certain strengths and weaknesses of different analysis techniques in the context of our research problem. One example can be seen from the LASSO and EN classification performance on different datasets and classification tasks. It was observed that models derived from the Warwick and Detroit datasets produced similar performance on the London dataset despite the fact that the number of samples in the Detroit study is much greater than that of the Warwick study. This may be simply due to the observed higher level of differential expression in the Warwick study or an artifact of comparing studies from different platforms. Additionally, the classification performance suggests that EN is better suited for cross-dataset classification. One probable reason for this is that EN, unlike LASSO, allows correlated features to be included in the final model. A LASSO model will tend to include only one of these correlated features, which may or may not be differentially expressed in the testing dataset. On the other hand, EN can include this gene and others with similar expression patterns. This increases the likelihood that at least one of the correlated genes will exhibit differential expression in the testing set, allowing it to improve classification accuracy. It is important to note that the genes selected by these algorithms are not necessarily the genes most associated with phenotypic changes. Rather, they represent sets of genes that can robustly distinguish different phenotypes when considered together. Many other genes may exhibit differential expression between phenotypes, their differential expression can be more significant than that of the selected genes, and some of them may be the genes most associated with phenotypic changes. However, such genes may be left out of the model if they do not provide additional information in the presence of other differentially expressed genes. Taken together with the previously mentioned sparsity induced by LASSO and EN, it is not surprising that there was little overlap between the differentially expressed genes and those genes incorporated in the classification models. We also observed little overlap between model genes for different datasets but good cross-study classification. This likely represents the identification of different genes that are part of similar biological systems/processes. Noise in gene expression data can affect the individual dataset signatures, but these algorithms can pick up a signal present in all studies. Another example comes from comparing the differential expression and pathway enrichment results with the SVD output. The former two analyses identified signatures that were almost exclusively associated with inflammatory processes (a majority of the 126 shared differentially expressed genes as well as most of the significant GSVA pathways were linked to inflammation). However, SVD identified genes highly associated with the non-labor phenotype (Fig. 3.3B), leading to improved characterization of the transcriptional programs utilized throughout parturition.
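The contrast between LASSO and EN on correlated features described above can be illustrated on a toy problem. The sketch below is a plain coordinate descent written for illustration, assuming the commonly used penalty form lam * (alpha * L1 + (1 - alpha) / 2 * L2) with no intercept; it is not the software or the models used in chapter 3, and the function names, penalty values, and data are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold` (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def elastic_net_cd(X, y, lam, alpha, n_passes=500):
    """Coordinate descent for
        (1 / 2n) * ||y - Xw||^2 + lam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||^2).
    alpha = 1 gives LASSO; 0 < alpha < 1 gives elastic net. No intercept is fit.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_passes):
        for j in range(p):
            # Covariance of feature j with the residual that excludes feature j.
            rho = 0.0
            for i in range(n):
                pred_without_j = sum(X[i][m] * w[m] for m in range(p) if m != j)
                rho += X[i][j] * (y[i] - pred_without_j)
            rho /= n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, lam * alpha) / (z + lam * (1 - alpha))
    return w

# Toy data (hypothetical): two identical "genes" and a response driven by them.
x = [-2.0, -1.0, 1.0, 2.0]
X = [[v, v] for v in x]
y = [2.0 * v for v in x]

lasso_w = elastic_net_cd(X, y, lam=0.5, alpha=1.0)
enet_w = elastic_net_cd(X, y, lam=0.5, alpha=0.5)
print(lasso_w, enet_w)
```

With two identical features, the LASSO fit places essentially all of the weight on the first feature visited and leaves the second at zero, whereas the elastic net fit splits the weight evenly between them. Real expression data are only partially correlated, so the effect is less extreme, but the same tendency is what allows EN models to retain groups of co-expressed genes.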
Toward characterizing the systems biology of labor. In chapter 3, robust transcriptional signatures that could be used to allocate a clinical phenotype to a given sample were identified. These signatures, as seen from gene and pathway enrichment analyses, were largely composed of inflammatory-related processes. This attests to the importance of inflammation in the onset of labor. However, certain pathways may be highly complex, with some sub-paths signaling during labor and others signaling during quiescence. This can influence analyses like pathway enrichment. For this reason, it is important, and as we showed can be very informative, to utilize network and pathway interaction information to better interpret how genes are altered in the overall scheme for a pathway of interest.
The London dataset in chapter 3 added a new clinical group, ILEa (early stage labor), to the two existing ones (IL and NL), allowing for more time points around the pivotal period of parturition. Although no significant gene expression changes were observed, it is still critical for future studies to assess biological changes during various stages of pregnancy and labor. Migale et al. (2016) carried out a temporal transcriptomics (RNA-seq) study on CD1 mice under conditions of term gestation, RU486-induced preterm labor, and lipopolysaccharide (LPS)-induced preterm labor.77 They identified that the LPS model (inflammatory) most closely resembled human labor, exhibiting specific changes that were also seen in this study (up-regulation of interferon, apoptosis, IL6, JAK-STAT, and cytokine signaling pathways). Salomonis et al. (2005) performed global gene expression analysis (microarray) on myometrial samples collected from mice at multiple stages of parturition (non-pregnant, mid-gestation, late gestation, and postpartum).78 Results from this study exhibit strong similarity with the results presented here, including up-regulation of cAMP and progesterone signaling during quiescence and up-regulation of contractile signaling and cell remodeling during labor. Further research in this area will help identify the timing mechanisms and responsible pathways for transitioning the quiescent myometrium to the contractile state, which involves many factors and processes (inflammatory load, functional P4 withdrawal, tissue cross-talk) working in concert.79 Understanding these timing mechanisms and the relationship between P4 and inflammation can then be used to model parturition and make predictions about preterm birth risk.80 Larger studies probing gene expression throughout pregnancy and labor in humans are much more difficult; however, steps in this direction would improve our understanding of how the uterine tissues establish and maintain quiescence, prepare for labor, and achieve parturition.
It is important to note that the transcriptome datasets were derived from total RNA extracted from whole-tissue samples, which are heterogeneous, containing myometrial cells, connective tissue, and immune cells such as leukocytes. Shynlova et al. (2013) showed in mice that the number of immune cells recruited to the myometrium increases at the time of parturition.81 This is therefore an important consideration when assessing changes observed in the transcriptional studies analyzed in this work. To obtain a full picture of transcriptional alterations during parturition, further studies utilizing advanced experimental techniques (e.g. single-cell transcriptomics) are needed to assess gene expression changes specific to myometrial cells as well as to determine the population dynamics of the non-myometrial cells during quiescence and labor.
6.2 Cell Line Analysis
Importance of the PR-A:PR-B Ratio. Human parturition is associated with an increase in the PR-A:PR-B ratio that is thought to cause functional P4 withdrawal2. Recent studies suggest that the PR-A:PR-B ratio is affected by P4 and pro-inflammatory stimuli due to differential effects of serine phosphorylation on the stability of PR-B (steady-state level decreases) and PR-A (steady-state level increases)5. Here it was proposed that the labor-associated increase in the myometrial PR-A:PR-B ratio occurs in response to pro-inflammatory stimuli inducing increased stability of PR-A17. Using tissue similarity index analyses, it was observed that hTERT-HMA/B cells exposed to P4 and forskolin mimic quiescent myometrium, whereas cells exposed to IL-1β mimic laboring myometrium. This suggests that transition to the labor phenotype (at least based on transcriptome analyses) could be caused by a loss of cAMP/PKA signaling and/or a gain of pro-inflammatory stimuli that alter P4/PR activity due to changes in PR phosphorylation. In this context, the PR-A:PR-B ratio in myometrial cells reflects the balance between cAMP/PKA signaling and inflammatory load.
Considerations Regarding Enrichment Analysis. Analysis of gene expression changes using gene set enrichment allows for better interpretation of the altered expression of hundreds of genes by grouping them into biologically meaningful sets (e.g. pathways or TF targets). While this analysis is highly useful, there are caveats to consider. One is that the top enriched sets are not always the definitive cause or effect of the observed expression changes. This is especially the case for TF enrichment. While many targets of a specific TF may exhibit altered expression, one cannot be certain that these alterations were due to increased or decreased activity of that TF. Furthermore, when many genes are changing, it is even less likely that a single TF is solely responsible; more likely, a combination of some of the significantly enriched TFs is involved. As an example, targets of P4/PR were significantly down-regulated in the IL-1β vs CTRL comparison (Fig. 4.5B). By experimental design, these cells express PR; however, PR is only expected to be active when bound to its ligand P4. Therefore, it is likely that other factors (or potentially ligand-independent actions of PRs) are responsible for changes in these genes, even though the enrichment shows PR as a top candidate.
GSVA was used to calculate enrichment partly because it provides directionality for each gene set in the form of an enrichment fold change, and our initial hypotheses suggested opposite regulation of similar pathways in different conditions (e.g. pro-inflammatory signaling by IL-1β and anti-inflammatory action of P4). GSVA takes into account the direction of expression change of each gene in the gene set, while other enrichment approaches calculate differential expression for each gene first and then tally the number of significant genes in the set, regardless of up- or down-regulation. Both analyses have their advantages and shortcomings, and the approach used should be based on the study design and research questions. For regulatory network enrichment analysis (RNEA), TFs that undergo significant changes in expression are included in the network. However, it is known that TF activity is not solely dependent on gene expression, as most TFs are activated by post-translational modifications. RNEA covers this case by allowing a TF exhibiting minimal or no change in expression into the network if it is both down-stream and up-stream of differentially expressed genes. TFs having no differentially expressed up-stream regulators may play a role in the observed down-stream changes in target expression but will not be incorporated into the network. GSVA and RNEA were combined in the analysis to circumvent these shortcomings. RNEA gives the regulatory pathway detail with targets and the directionality of their changes, while GSVA requires no information about the transcriptional alteration of the TFs themselves. Combining the techniques allows for identification of pathways, TFs, and targets of those TFs that fit specific relationships between P4, cAMP, and IL-1β signaling.
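The difference between direction-aware and direction-agnostic enrichment can be illustrated with a minimal Python sketch. It is a toy contrast only: the signed score below is a plain mean of log fold changes, not the statistic GSVA actually computes, and the gene names, values, and function names (count_based_hits, signed_set_score) are hypothetical and chosen for illustration.

```python
from statistics import mean


def count_based_hits(adj_p, gene_set, alpha=0.05):
    """Tally significant genes in the set, ignoring the direction of change."""
    return sum(1 for g in gene_set if g in adj_p and adj_p[g] < alpha)


def signed_set_score(log_fc, gene_set):
    """Mean log fold change over the measured genes of the set.

    Simplified stand-in for a direction-aware score: the sign indicates
    whether the set as a whole moves up or down.
    """
    values = [log_fc[g] for g in gene_set if g in log_fc]
    return mean(values) if values else 0.0


# Toy values (hypothetical; not taken from the datasets analyzed here).
log_fc = {"G1": 1.2, "G2": 0.9, "G3": -1.1, "G4": -1.0}
adj_p = {"G1": 0.01, "G2": 0.02, "G3": 0.01, "G4": 0.03}

mixed_set = ["G1", "G2", "G3", "G4"]  # half up-regulated, half down-regulated
up_set = ["G1", "G2"]                 # consistently up-regulated

mixed_hits = count_based_hits(adj_p, mixed_set)
up_hits = count_based_hits(adj_p, up_set)
mixed_score = signed_set_score(log_fc, mixed_set)
up_score = signed_set_score(log_fc, up_set)

print(mixed_hits, up_hits, round(mixed_score, 3), round(up_score, 3))
```

In this toy case every gene of the mixed set is counted as a hit by the tallying approach, while its signed score is close to zero because the up- and down-regulated genes cancel; the consistently up-regulated set keeps a clearly positive score.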
Observing results from these two approaches in the context of chapter 4 suggests two potential avenues for treating preterm birth. The transcription factor complex AP-1, specifically the JUN sub-unit, was seen to be a key part of the inflammatory signaling induced by IL-1β. This was distinctly inhibited with co-treatment of P4, and up-regulated targets of JUN were also strongly inhibited by P4 alone, suggesting that this is one of the key mechanisms influenced by P4 to maintain quiescence throughout pregnancy. On the other hand, NF-κB was clearly a major part of the IL-1β-induced inflammatory signaling, and this activity remained significant under co-treatment with P4 and/or FSK. Therefore, molecular targeting of JUN or NF-κB may reduce the likelihood of preterm delivery. Inhibition of NF-κB would provide a second anti-inflammatory action separate from that carried out by P4 and FSK, while inhibition of JUN would act alongside P4 and cAMP signaling in preventing signaling through this TF. Treatment with a JUN inhibitor during late stages of pregnancy may work most effectively, as it could act in place of P4 after its functional withdrawal. NF-κB is thought to cooperate with AP-1 and CEBP to activate labor genes due to the presence of both AP-1 and NF-κB binding sites on the promoters of these genes.82,83 Therefore, an inhibitor of just one of AP-1 or NF-κB could be successful in reducing the risk of preterm birth. Recent studies suggest that full inhibition of NF-κB signaling would be undesirable for a number of reasons and instead propose partial inhibition through a small noncompetitive IL-1R biased ligand or compounds targeting the non-canonical NF-κB signaling pathways.82,84 Finally, the analysis in chapter 4 identified an important role for cAMP signaling in promoting quiescence (smooth muscle relaxation and hypertrophy). Therefore, there are three main components to consider when developing any therapeutics for preterm birth, whether it be inhibiting pro-inflammatory actions or stimulating relaxatory pathways.
6.3 Tissue and Cell Line Comparison
The use of cell line models as an approximation of in vivo systems provides alternative and useful avenues of research to address various biological questions. The experimental model setup used in chapter 4 involving treated cell lines allows for isolated investigation of processes that are interconnected in a complex manner throughout parturition. Through the comparisons in chapter 4, synergistic effects of P4 and cAMP were identified, as well as how cAMP inhibits signaling by IL-1β. These insights are made more meaningful because the in vitro model was shown to effectively approximate term myometrium. Importantly, as expected, IL-1β- and FSK + P4-treated cells had similar gene expression profiles to laboring and non-laboring myometrial samples, respectively. The similarity (at least at the gene expression level) suggests that the hTERT-HMA/B cell line mimics the gravid myometrium in response to specific treatments. Interestingly, under basal conditions hTERT-HMA/B cells appear less relaxed than non-laboring myometrium but can be transformed to this phenotype by exposure to P4 and FSK.
6.4 Conclusions
Translational Implications. Parturition is a complex process involving significant changes at the cellular and tissue levels. This makes it difficult to understand and treat labor disorders such as preterm labor. Understanding how parturition is carried out by the uterine tissues is key to gaining insights into the cause of labor disorders. By utilizing three large-scale molecular datasets in combination with advanced bioinformatics techniques, it was shown that there are significant changes at the level of transcription within the myometrium during the transition from quiescence to labor.
Furthermore, genes and pathways representing these changes were identified and a signaling network of parturition was constructed. This work, detailed in chapter 3, demonstrates the utility of bioinformatics approaches as a means to probe complex biological events, as it has provided robust transcriptional signatures that can be useful both for characterizing parturition, by looking at the representation of these signatures in a given clinical sample, and for understanding it. Additionally, combining quality datasets can provide results that may not be seen through analysis of individual studies and allows for more powerful conclusions to be reached.
Experimental Implications. The cell line analysis presented in chapter 4 is, to my knowledge, the first to examine the relationship between major pathways that control human parturition at the genome-wide level in myometrial cells. P4, cAMP, and inflammatory signaling are believed to play a major role in pregnancy and parturition.
Through the presented analysis, testable hypotheses on how these processes interact were generated to guide future research efforts. This includes using the results presented in chapter 4 to find new ways to increase the anti-inflammatory actions of P4 and cAMP to promote uterine quiescence as a means for treating preterm birth. Top targets include JUN and NF-κB, whose inhibition may supplement the action of P4/PR after its functional withdrawal in the myometrium during the late stages of pregnancy.
Final Conclusions. When attempting to understand complex biological processes like parturition, it is important to pursue multiple avenues of research utilizing integrative and diverse techniques. Results can then be compared and corroborated to obtain stronger conclusions and more comprehensive knowledge. By studying parturition in myometrial tissues and a treated cell line, it was possible to strategically examine the system as a whole, using tissue to generate hypotheses about important pathways and then following up directly in cells by individually targeting these pathways and the relationships between them. Additionally, the use of multiple tissue datasets and complementary bioinformatics methods provided a means to establish hypotheses with higher confidence, while performing similar analyses in tissue and cells (e.g. pathway enrichment and network construction) allowed for better comparison between the two experimental models. This work demonstrates the utility and effectiveness of integrating well-designed studies and bioinformatic pipelines to investigate a given biological process or disease and will hopefully serve as a useful guide for similar studies in the future.
7 Suggested Future Research
7.1 Utilizing Clinical Data
Timing of various physiological and biomolecular events is highly important in parturition. There is a need to understand how the triggering of labor is achieved. If it is driven by biological changes in the cell (e.g. gene expression, post-translational modifications), do these changes occur gradually or very quickly? Do changes at the transcriptional level precede the observations that are made in the clinical setting? To answer these questions, accurate clinical data are needed. Having more information would allow for stratification of women into various stages of labor using biomolecular markers or signatures, as opposed to the typical two-class conditions of labor and non-labor. Additionally, it may be possible to identify a series of biomolecular changes that occur sequentially. Such information would strongly aid our ability to understand the role of these changes and which of them may be involved in preterm birth. Regarding results presented in this work, these additional variables would potentially explain the heterogeneity in the laboring tissue samples or the observed sample misclassification.
7.2 Confirmational Studies for Validation of Findings
The results from chapters 3 and 4 generated a number of hypotheses using various bioinformatics approaches that, if validated experimentally, could strongly support our findings. It is important to prioritize which findings to validate due to the expense and time involved in experimental studies. The strongest signal from the cell line analysis (chapter 4) was that P4 exhibited strong anti-inflammatory action mainly through the JUN transcription factor. Performing ChIP-qPCR for a subset of JUN target genes (identified from its known targets or the RNEA networks, e.g. MMP1/3/12, SERPINB2, IL6, BCL2A1, etc.) to see how P4/PR influences the binding of JUN to the promoter regions of its target genes could identify the important aspects of this inhibitory effect and how it is achieved. Another case where ChIP-qPCR could be beneficial would involve looking at certain targets of NF-κB, since it was observed that they are responsive to IL-1β and are not inhibited by P4 and/or FSK. To clarify the effects of P4 on the signaling mechanisms in pregnancy, it may be of interest to perform phosphorylation studies using the hTERT-HMA/B cell line to investigate specific interactions included in the non-labor-specific signaling network from chapter 3 (Fig. 3.7B). In other words, treat cells with P4 and see if signaling proteins down-stream of PR (e.g. COPS9, MAPK10, PRMT2) become phosphorylated. Experimental validation is an important follow-up step to bioinformatic analyses, as confirmations at different levels of biochemical signaling/regulation can help researchers understand the accuracy of these analyses and promote more targeted approaches in the future.
7.3 Myometrial Cell-Specific Signaling
Tissue samples collected from the myometrium are heterogeneous, including the cells of interest (smooth muscle myometrial cells) but also connective tissue and immune cells (e.g. leukocytes). Therefore, when performing global transcriptomic analysis on these tissue samples, it is important to keep this in mind and note that the changes may not be myometrium specific. It has been shown in this work and others that an increase in inflammation is a key part of the labor process, meaning that an increase in immune cells in the region of myometrial tissue is likely. Thus, it is important to characterize these changes, and their effect at the transcriptional level, to better understand signaling at the time of labor in myometrial cells. One potential approach for reducing sample heterogeneity is laser capture microdissection (LCM), which is a method for isolating specific cells of interest from microscopic regions of tissue with the help of a laser. This technique would allow selection of myometrial cells and removal of unwanted areas of the sample such as connective tissue, improving the specificity of the transcriptional changes we observe. Another experimental option gaining popularity in the area of transcriptomics is single-cell sequencing, which can examine the sequence information from individual cells. These technologies are important to consider in future research efforts to understand the process of human parturition.
7.4 Linking the Uterine Tissues
Human parturition involves major changes to multiple tissues: cervical dilation, membrane rupture, and myometrial contractions. Therefore, to fully understand parturition, it may be necessary to study all uterine tissues during pregnancy and at the onset of labor. Studies of transcriptional data similar to this one should be performed in the cervix, membranes, and decidua. Are the same changes identified in the myometrium occurring in these other tissues during the onset of labor? Is there cross-talk between the uterine tissues? These are important questions, and they can only be answered by looking at these tissues both individually and as a collective.
Appendix A
Supplementary Tables
This section contains tables referenced in the document.
Table 1: High-confidence set of 126 genes. Gene name, median log2(fold change), and median adjusted p-value for all 126 genes found to be significantly differentially expressed in the sample group comparisons W-IL vs. W-NL, D-IL vs. D-NL, L-ILEa vs. L-NL, and L-ILEs vs. L-NL.
Gene Median log2(FC) Median Adjusted P-value
PPP2R2C -0.95523 0.003191
PPP1R3C -0.90792 0.011512
SORL1 0.718524 0.012854
GTPBP4 0.621002 0.008805
CHSY1 0.820915 0.002171
MTHFD2 0.839168 0.001422
NCOA7 0.826025 0.00266
DUSP5 0.970045 0.002171
CD55 0.842211 0.005761
ETS2 0.791655 0.005359
LCP1 0.913972 0.003191
ELL2 1.002648 0.001425
LEFTY2 1.531495 0.002384
RNF149 0.918522 0.002384
BMP2 1.089915 0.013079
MNDA 0.899079 0.00475
MAP3K8 0.993665 0.002384
UAP1 0.969502 0.002384
NFKB2 0.825443 0.005728
GADD45A 1.09781 0.002074
SERPINB8 0.927669 0.003534
MAFF 1.298045 0.002171
NFKBIA 0.989885 0.004696
SIPA1L2 1.314768 0.002384
GCA 1.067897 0.005021
PTPRE 0.903658 0.002384
CCRN4L 1.214445 0.007029
UGCG 1.156174 0.003002
CSRNP1 1.449044 0.001511
FCGR3B 1.32533 0.003191
LDLR 0.999709 0.002384
ITPRIP 1.001999 0.002099
MXD1 1.30231 0.002312
TMEM2 1.311551 0.002384
HIF1A 1.016104 0.002384
TNFAIP3 1.398003 0.002384
KRT34 1.81075 0.002384
MYC 0.957301 0.002384
UPP1 1.254569 0.002384
TRIB1 1.221102 0.001926
IL4R 0.904563 0.002375
ZC3H12A 1.23521 0.002384
CEBPB 0.86979 0.003691
HBEGF 1.713045 0.002384
PNP 1.524425 0.002384
SBNO2 1.01676 0.002384
LMNB1 1.189158 0.006055
GFPT2 1.091361 0.002384
LMCD1 1.70608 0.002384
NNMT 1.05658 0.002099
PFKFB3 1.318825 0.00226
PIM1 1.274812 0.002361
SLC39A14 1.260558 0.002384
ALPL 1.072588 0.005215
ANPEP 1.424565 0.002384
CEBPD 1.011887 0.002384
ADAMTS9 1.339353 0.002384
SPHK1 1.695745 0.002384
IER3 1.523745 0.002384
SLA 1.459215 0.002384
SERPINE2 1.134225 0.008415
BCL3 1.122565 0.002384
SELL 1.664555 0.002384
FOSL1 2.26029 0.002384
ADM 1.45817 0.002384
ITGAX 1.8564 0.002384
FCN1 1.560249 0.002384
RDH10 1.270945 0.002384
SLC7A5 1.636065 0.002074
SOD2 1.615446 0.002384
MYO1G 1.410975 0.003191
PLAUR 1.787685 0.002384
IL1RN 2.601395 0.002384
RUNX1 1.214055 0.002384
LILRB3 1.187979 0.004659
IL1B 3.395985 0.002384
RGS16 1.92145 0.002384
FGR 1.540165 0.002384
ADAM8 1.872865 0.002384
SERPINA1 1.92345 0.002384
GK 2.43881 0.002384
FPR1 2.30624 0.002384
TNFAIP6 2.41967 0.002384
TREM1 2.676575 0.002384
STEAP1 2.032775 0.002384
IL7R 2.05551 0.002384
CLEC5A 2.819755 0.002384
CSF3R 1.885729 0.002384
PTGS2 2.61283 0.002099
HK2 1.750735 0.002207
CXCR2 2.19877 0.002384
DIO2 1.151498 0.002384
GREM1 1.995947 0.001935
FFAR2 2.589005 0.002384
SDS 3.219885 0.00448
RARRES1 1.36736 0.002384
CXCR1 2.646205 0.002384
MMP25 1.527125 0.006726
NAMPT 2.575605 0.002024
PTGES 1.739667 0.003534
CCL2 2.553615 0.002384
S100A9 2.016662 0.001535
MT1X 1.98029 0.002171
SOCS3 1.9309 0.002384
LIF 4.09664 0.002384
DIO3 1.334307 0.002099
LRG1 2.95882 0.002384
SLC16A10 4.189705 0.004162
PROK2 3.282925 0.002384
LILRA5 2.14944 0.001425
FPR2 3.385695 0.002384
NFE2 1.899855 0.003566
S100A8 2.43954 0.001896
AQP9 3.11008 0.00171
CXCL2 2.80759 0.002384
S100A12 2.31903 0.00475
IL6 3.09081 0.002384
MT2A 2.56653 0.001814
OSM 3.52356 0.003262
ANGPTL4 2.08278 0.002375
SELE 4.53256 0.003983
IL8 5.717645 0.002099
C19orf59 3.455735 0.002874
CCL11 3.078985 0.002384
CXCL6 2.84988 0.004692
CSF3 6.069955 0.008667
Table 2: Unique genes recruited in Warwick EN models. List of genes included in at least one of the 100 Warwick EN models in 3-fold cross-validation. Column 1 provides the gene name, column 2 the percentage of models in which the gene was recruited (95 = 95 of 100 models), and column 3 the average coefficient of the gene for models in which it was recruited.
Gene Model Occurrence % Mean Model Coefficient
MFAP2 100 0.055456654
PTGER3 100 -0.055312544
SLC43A3 100 0.042976227
KLF12 100 -0.033053288
ALDH1A2 100 -0.038536515
MLKL 100 0.048494287
SPHK1 100 0.068076662
SOCS3 100 0.05452764
C19orf48 100 0.04769892
PCYOX1 100 -0.064223546
SEC23B 100 0.028985156
HM13 100 0.04397553
MITF 100 -0.050448209
ATP1B3 100 0.030574544
SUMF1 100 -0.039376918
PCDH18 100 -0.060874845
ENPP1 100 -0.046371766
SASH1 100 -0.029205489
CAV2 100 -0.035534939
OSGIN2 100 -0.060924667
RAI2 100 -0.030830841
ARHGEF6 100 -0.032663317
RCC2 99 0.028576206
MARCH8 99 -0.025981063
CAB39L 99 -0.021524201
FUS 99 0.049451994
STXBP4 99 -0.031382984
LAMA1 99 0.03703682
EPB41L1 99 -0.022160437
IQCB1 99 0.050785036
H2AFY 99 0.025707875
SMAGP 98 0.035492593
PXMP4 98 -0.018151992
CAV1 98 -0.014656707
WDPCP 97 -0.029634657
C16orf59 95 0.035434844
DGCR5 95 0.027227933
LINGO2 95 -0.014788474
CRYZ 94 -0.015830414
SYT6 94 -0.017217741
CLK2 87 0.031805491
RELT 87 0.017522776
SH3D19 87 -0.014740035
CDK6 87 -0.018774441
CLSTN3 83 0.017187335
PLK3 76 0.017127736
FDPS 76 0.028329465
EMX2 76 -0.01010169
PITPNC1 76 0.02032546
HN1 76 0.011473405
DNAJB11 76 0.012164545
MYL3 76 -0.02793828
CYP2U1 76 -0.007713685
FBXO8 76 -0.031113747
LEPRE1 73 0.025994706
CHST6 73 0.018269392
CD109 73 -0.013500458
ATP8A1 71 -0.015912672
TSPAN5 71 -0.009770869
EPM2A 71 -0.013864931
FKBP11 70 0.015546604
GLYCTK 70 0.018336927
PALMD 69 -0.009585427
ALX3 69 -0.012381019
FAM189A2 69 -0.012373473
PTPRO 68 -0.016297166
UTP11L 65 0.020372246
MEAF6 65 -0.011551939
RABGGTB 62 0.017060963
TMEM200B 62 -0.017427013
BEND7 62 -0.022020567
TMOD2 62 -0.009782723
ZIK1 62 -0.018621767
HNRNPL 62 0.013285247
H1F0 62 -0.010936904
MAP3K4 62 0.014109961
SDC1 59 0.017158121
XBP1 59 0.01964373
MOGS 58 0.017222775
PRSS23 52 -0.012594117
HDAC3 52 0.014120383
TBL2 52 0.009382891
SCARF1 48 0.022630636
CNN2 48 0.012477531
SNN 46 -0.01048817
PTPN2 42 0.017123786
SLC24A6 36 0.020634464
KDM2B 36 0.016221773
SGPP2 36 -0.014881268
CERS6-AS1 36 -0.016614687
GGCT 36 0.011838866
MTERFD2 35 0.019410875
SGK1 35 0.018065784
KANK1 35 -0.011285577
NLRP6 33 0.012068898
POM121L9P 33 0.011308218
SEC61A1 33 0.007178117
ASTN2 33 -0.010339384
NONO 33 0.009800326
ZFAS1 32 0.008224224
AMPD2 28 0.0091122
HERPUD1 28 0.005229233
CSMD2 27 0.014017173
DUSP6 27 0.009835593
RNF14 27 -0.00929602
DCLRE1A 26 -0.014059007
EBPL 26 0.006635634
PDE8A 26 0.00993137
HSD17B13 26 -0.008513232
HNRNPH1 26 0.012253197
LGR4 24 -0.004400984
ZC3H6 24 -0.00919652
LOC254128 24 -0.006671542
NOL12 24 0.008317766
MYLK-AS1 24 -0.009262725
MSC 24 0.012266858
RIBC1 24 -0.002720396
DCK 23 -0.004691689
OLFML2A 23 -0.009578599
KIAA1462 22 -0.002656676
IFITM3 22 0.003251581
LOH12CR2 22 -0.002964264
NXPH3 22 -0.003316063
SRBD1 22 -0.007116577
AJUBA 18 -0.001556845
ZDHHC3 18 -0.002692914
MARCKSL1 13 0.002155927
SYTL2 13 0.001642151
MMP19 13 0.003509749
CCR7 13 0.004691744
LINC00152 13 0.00312735
PABPC1L 13 0.002441463
BCAS1 13 0.002617315
ABLIM3 13 -0.00197416
FPGS 13 0.004640536
LILRB1 11 0.000432662
C1QTNF7 10 -0.002632509
GUCY1A2 7 -0.00142709
GPR68 7 0.000998028
DHRS13 7 0.000432662
DTNA 7 -0.001155681
RPSAP58 7 0.001157481
GNL3 7 0.001356156
WDR46 7 0.001670878
C1orf21 1 -0.000258785
FZD4 1 -0.000530091
ENC1 1 0.000980655
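As a minimal sketch of how the two summary columns of Table 2 could be derived, the Python snippet below computes, for each gene, the percentage of models in which it was recruited and its mean coefficient over only those models. The representation of a fitted model as a dictionary of non-zero coefficients, the function name summarize_recruitment, and the toy gene names and values are all assumptions made for illustration; they are not the code or data used in this work.

```python
def summarize_recruitment(models):
    """Summarize gene recruitment across repeated sparse model fits.

    models: list of dicts, one per fitted model, mapping gene name to its
    non-zero coefficient (genes with a zero coefficient are simply absent).
    Returns {gene: (occurrence_percent, mean_coefficient_when_recruited)}.
    """
    n_models = len(models)
    genes = {gene for model in models for gene in model}
    summary = {}
    for gene in genes:
        coefs = [model[gene] for model in models if gene in model]
        occurrence = 100.0 * len(coefs) / n_models
        summary[gene] = (occurrence, sum(coefs) / len(coefs))
    return summary


# Four hypothetical model fits (toy values only).
toy_models = [
    {"GENE_A": 0.05, "GENE_B": -0.02},
    {"GENE_A": 0.07},
    {"GENE_A": 0.06, "GENE_B": -0.04},
    {"GENE_A": 0.06},
]

table = summarize_recruitment(toy_models)
for gene, (occurrence, mean_coef) in sorted(table.items()):
    print(gene, round(occurrence), round(mean_coef, 3))
```

Averaging only over the models that recruited a gene, rather than over all models, keeps the coefficient column from being diluted by the models in which the gene was dropped; this matches the column definition given in the caption of Table 2.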
Table 3: Unique genes recruited to distinguish the London NL samples in LASSO models. List of genes included in at least one of the 100 London LASSO models in 5-fold cross-validation, specifically for the NL class.
Gene Model Occurrence % Mean Model Coefficient
PER3 100 0.55719
MAGOH2 65 0.16145
LOC100505933 57 0.16289
CEBPA 36 0.16122
TEF 27 0.08654
FAM115C 10 -0.21290
C3orf49 7 0.05133
Table 4: Unique genes recruited to distinguish the London ILEa samples in LASSO models. List of genes included in at least one of the 100 London LASSO models in 5-fold cross-validation, specifically for the ILEa class.
Gene Model Occurrence % Mean Model Coefficient
C1orf229 87 -0.34234
VEPH1 45 0.08555
GEMIN7 36 0.14640
KLK13 36 0.27511
FLJ30403 34 0.12393
ABT1 27 0.15123
NFYC 23 0.16612
GRPR 22 0.05461
APLN 22 0.04980
CCDC144B 18 0.10765
DDX11L2 16 0.22352
NDST3 9 0.18653
Table 5: Unique genes recruited to distinguish the London ILEs samples in LASSO models. List of genes included in at least one of the 100 London LASSO models in 5-fold cross-validation, specifically for the ILEs class.
Gene Model Occurrence % Mean Model Coefficient
IFT43 69 0.42317
CHRNB2 47 0.06018
MAP2K1 46 0.29258
GOT2 27 0.06024
SPAG17 24 0.02246
SLC22A16 18 0.11233
ANKRD11 15 -0.14405
FGF17 11 0.12297
PGAM2 9 -0.09484
DDX59 5 0.04245
BOK-AS1 5 0.01617
CYB561D1 2 -0.00143
Table 6: Unique genes recruited in Detroit EN models. List of genes included in at least one of the 100 Detroit EN models in 5-fold cross-validation.
Gene Model Occurrence % Mean Model Coefficient
ADORA2B 100 0.259
AEBP1 100 0.108
AMPH 100 0.256
C21orf70 100 0.592
C5orf25 100 -0.208
C6 100 -0.137
C6orf222 100 0.765
CASC1 100 -0.215
CYP4B1 100 -0.294
DBP 100 -0.402
FKBP5 100 -0.171
GREM1 100 0.297
HIST1H3D 100 0.462
HOXA11 100 -0.052
HTR3D 100 0.671
IGFBP6 100 -0.233
KLHDC9 100 -0.062
KRT17 100 0.047
LOC338799 100 -0.165
LOC440934 100 -0.018
CDC14C 100 -0.978
MS4A4A 100 -0.369
NUP98 100 0.454
SHMT2 100 0.063
SNORD15B 100 0.457
SSFA2 100 -0.364
STRBP 100 -0.470
TCTA 100 -0.127
TLR7 100 -0.097
ZFHX3 100 -0.180
ZNF189 100 -0.175
ZNF235 100 -1.491
ANK3 99 0.924
DPM2 99 0.087
PLEK2 99 0.403
PTGER4 99 0.059
SPRED3 99 0.501
SUFU 99 -0.445
C16orf86 98 -0.063
CHCHD4 98 0.046
TPPP 98 -0.024
DRG1 96 0.273
MITF 96 -0.038
RPL13P5 94 0.348
C17orf97 94 -0.090
ZNF681 92 -0.390
FAM164C 89 -0.064
C9orf93 89 -0.045
KLK7 89 0.108
SLITRK4 89 -0.013
ZBTB26 89 -0.206
GPR126 85 -0.072
VMA21 85 0.067
FAM26F 81 -0.165
PAFAH1B2 81 -0.128
RPS6KL1 81 -0.065
ZNF655 81 0.216
C15orf38 78 -0.106
EIF4E1B 78 0.154
SH2D1A 78 0.322
FAN1 73 -0.024
LOC644714 71 -0.163
C11orf91 71 0.040
KDELR2 56 0.049
PCDHGA6 56 0.083
SFRP1 54 0.004
LYSMD4 47 -0.074
WNK3 47 0.103
SCN3B 41 -0.051
KRT6A 38 0.006
PAQR8 34 -0.018
ATP9B 31 0.089
GPR37 29 -0.008
ANGEL2 24 -0.054
CEND1 24 -0.031
GCOM1 20 -0.012
CLK2 18 0.060
PCDHGB1 18 -0.072
ARID4B 13 -0.157
DKK1 13 -0.031
ERMN 13 0.105
SRMS 13 -0.051
TNNI3K 11 -0.046
KCNG3 9 -0.077
C17orf39 8 -0.014
DCBLD1 8 0.003
LOC653720 8 0.051
SCPEP1 8 -0.017
TAGLN3 8 0.036
RERG 6 -0.005
ASAH2B 5 0.022
LRRC3 4 -0.041
FAM122A 3 -0.006
PRKG1 3 0.054
OR1N2 2 0.034
PDE3A 2 0.019
SRPX 2 -0.005
COPS7B 1 -0.001
Table 7: GO Biological Processes significantly enriched in at least one of the single stimulus conditions (P4, FSK, IL-1β) compared to CTRL and in the Warwick study (labor vs non-labor) determined by GSVA. Significance is determined by log(fold change) > 1.5 and adjusted p-value < 0.05. Log2(FC) values are reported for the relevant comparisons.
Columns: GeneSet, logFC P4 vs CTRL, logFC FSK vs CTRL, logFC IL1B vs CTRL, logFC L vs NL, adj.P.Val L vs NL
Gene sets:
GO POSITIVE REGULATION OF EPIDERMIS DEVELOPMENT
GO PYRIMIDINE CONTAINING COMPOUND BIOSYNTHETIC PROCESS
GO REGULATION OF CHOLESTEROL EFFLUX
GO TRNA TRANSPORT
GO POSTTRANSCRIPTIONAL GENE SILENCING
GO GLUCOSE 6 PHOSPHATE METABOLIC PROCESS
GO FATTY ACID HOMEOSTASIS
GO PERK MEDIATED UNFOLDED PROTEIN RESPONSE
GO DNA DAMAGE RESPONSE DETECTION OF DNA DAMAGE
GO CYTOPLASMIC TRANSLATION
GO POSITIVE REGULATION OF TRANSCRIPTION FROM RNA POLYMERASE II PROMOTER IN RESPONSE TO STRESS
GO MATURATION OF 5 8S RRNA
GO PYRIMIDINE NUCLEOSIDE BIOSYNTHETIC PROCESS
GO NCRNA PROCESSING
GO PROTEIN LOCALIZATION TO ENDOPLASMIC RETICULUM
GO MATURATION OF SSU RRNA FROM TRICISTRONIC RRNA TRANSCRIPT SSU RRNA 5 8S RRNA LSU RRNA
GO RRNA METABOLIC PROCESS
GO PROTEIN EXPORT FROM NUCLEUS
GO TRANSLATIONAL INITIATION
GO RIBOSOME BIOGENESIS
GO NCRNA 3 END PROCESSING
GO NCRNA METABOLIC PROCESS
GO POSITIVE REGULATION OF DENDRITE MORPHOGENESIS
GO ESTABLISHMENT OF PROTEIN LOCALIZATION TO ENDOPLASMIC RETICULUM
GO RESPONSE TO PAIN
GO TRNA MODIFICATION
GO RIBOSOMAL SMALL SUBUNIT BIOGENESIS
GO RIBONUCLEOPROTEIN COMPLEX LOCALIZATION
GO MULTI ORGANISM METABOLIC PROCESS
GO TRNA METABOLIC PROCESS
GO MATURATION OF SSU RRNA
GO TELOMERE CAPPING
GO RIBONUCLEOSIDE DIPHOSPHATE METABOLIC PROCESS
GO NUCLEAR EXPORT
GO NUCLEAR TRANSCRIBED MRNA CATABOLIC PROCESS NONSENSE MEDIATED DECAY
GO RIBONUCLEOPROTEIN COMPLEX BIOGENESIS
GO RIBOSOME ASSEMBLY
GO RESPONSE TO LAMINAR FLUID SHEAR STRESS
GO POSITIVE REGULATION OF ERYTHROCYTE DIFFERENTIATION
GO T CELL ACTIVATION INVOLVED IN IMMUNE RESPONSE
GO NUCLEOTIDE BINDING DOMAIN LEUCINE RICH REPEAT CONTAINING RECEPTOR SIGNALING PATHWAY
GO LIPOPOLYSACCHARIDE MEDIATED SIGNALING PATHWAY
GO FIBRINOLYSIS
GO NUCLEOTIDE PHOSPHORYLATION
GO NEGATIVE REGULATION OF LEUKOCYTE APOPTOTIC PROCESS
GO NUCLEOSIDE SALVAGE
GO REGULATION OF TRANSCRIPTION REGULATORY REGION DNA BINDING
GO NEGATIVE REGULATION OF POTASSIUM ION TRANSMEMBRANE TRANSPORTER ACTIVITY
GO POSITIVE REGULATION OF INTERLEUKIN 1 BETA PRODUCTION
GO POSITIVE REGULATION OF INTERLEUKIN 1 SECRETION
GO REGULATION OF ALPHA BETA T CELL PROLIFERATION
GO REGULATION OF MEIOTIC NUCLEAR DIVISION
GO NEGATIVE REGULATION OF POTASSIUM ION TRANSMEMBRANE TRANSPORT
GO EMBRYO IMPLANTATION
GO POSITIVE REGULATION OF GLIAL CELL DIFFERENTIATION
GO POSITIVE REGULATION OF GLIOGENESIS
GO REGULATION OF ASTROCYTE DIFFERENTIATION
GO POSITIVE REGULATION OF ASTROCYTE DIFFERENTIATION
GO POSITIVE REGULATION OF SMOOTH MUSCLE CELL PROLIFERATION
GO POSITIVE REGULATION OF NF KAPPAB IMPORT INTO NUCLEUS
GO POSITIVE REGULATION OF LIPID TRANSPORT
GO NEGATIVE REGULATION OF INTERLEUKIN 10 PRODUCTION
GO COPPER ION TRANSPORT
GO REGULATION OF MACROPHAGE ACTIVATION
GO REGULATION OF UBIQUITIN PROTEIN LIGASE ACTIVITY
GO CYTOPLASMIC PATTERN RECOGNITION RECEPTOR SIGNALING PATHWAY
GO ACUTE PHASE RESPONSE
GO POSITIVE REGULATION OF ACUTE INFLAMMATORY RESPONSE
GO CELLULAR RESPONSE TO INTERLEUKIN 6
GO DENDRITIC CELL DIFFERENTIATION
GO MICROTUBULE DEPOLYMERIZATION
GO REGULATION OF INTERLEUKIN 1 SECRETION
GO ALPHA BETA T CELL DIFFERENTIATION
GO POSITIVE REGULATION OF INTERLEUKIN 2 PRODUCTION
GO POSITIVE REGULATION OF ALPHA BETA T CELL PROLIFERATION
GO POSITIVE REGULATION OF ALPHA BETA T CELL ACTIVATION
GO REGULATION OF ALPHA BETA T CELL ACTIVATION
GO MYELOID DENDRITIC CELL ACTIVATION
GO REGULATION OF TRANSCRIPTION INVOLVED IN G1 S TRANSITION OF MITOTIC CELL CYCLE
GO ACUTE INFLAMMATORY RESPONSE
Table 8: GO Biological Processes significantly enriched in IL-1β compared to CTRL calculated by GSVA and using the 126 high-confidence set of differentially expressed genes (Fig. 3.1) calculated by EnrichR. Significance is determined by log(fold change) > 1.5 and adjusted p-value < 0.05 for GSVA. Columns 2-5 show log2(FC) values for all IL-1β conditions and the last column is the adjusted p-value from the EnrichR analysis.

Columns: GeneSet, logFC IL1B, logFC P4 + IL1B, logFC FSK + IL1B, logFC FSK + P4 + IL1B, Enrichment Adj P-val

Gene sets:
GO POSITIVE REGULATION OF TRANSCRIPTION FACTOR IMPORT INTO NUCLEUS
GO POSITIVE REGULATION OF LEUKOCYTE CHEMOTAXIS
GO CELLULAR RESPONSE TO DRUG
GO REGULATION OF LEUKOCYTE CHEMOTAXIS
GO POSITIVE REGULATION OF ACUTE INFLAMMATORY RESPONSE
GO INFLAMMATORY RESPONSE
GO POSITIVE REGULATION OF NEUTROPHIL MIGRATION
GO PATTERN RECOGNITION RECEPTOR SIGNALING PATHWAY
Values (in extraction order): 1.093 0.232 1.033 0.505 0.808 0.041 0.186 0.984 1.49E-02 0.113 0.578 0.913 1.02E-02 0.169 NA 0.068 0.886 0.958 1.33E-02 0.575 0.573 0.825 0.239 8.28E-08 1.054 0.220 0.651 9.69E-03 0.243 5.20E-03 0.883 1.131 0.190 0.083 1.33E-02 0.783 0.745 0.473 0.242 6.62E-13

Gene sets:
GO POSITIVE REGULATION OF VASCULATURE DEVELOPMENT
GO REGULATION OF TYROSINE PHOSPHORYLATION OF STAT PROTEIN
GO POSITIVE REGULATION OF STAT CASCADE
GO POSITIVE REGULATION OF LEUKOCYTE MIGRATION
GO MYELOID LEUKOCYTE MEDIATED IMMUNITY
GO RECEPTOR METABOLIC PROCESS
GO CELL CHEMOTAXIS
GO REGULATION OF TRANSCRIPTION FACTOR IMPORT INTO NUCLEUS
Values (in extraction order): 0.746 0.271 0.803 0.363 0.712 0.013 0.828 0.539 0.867 2.40E-04 0.161 0.764 0.748 0.651 1.02E-02 0.127 0.752 0.079 1.129 4.66E-02 0.124 0.805 1.90E-03 0.494 0.921 0.374 1.99E-02 0.287 0.018 5.20E-03 0.735 -0.187 0.216 0.170 0.759 0.378 4.36E-02 0.131 0.078 4.63E-02

Gene sets:
GO RESPONSE TO MOLECULE OF BACTERIAL ORIGIN
GO REGULATION OF SMOOTH MUSCLE CELL PROLIFERATION
GO CELLULAR RESPONSE TO CYTOKINE STIMULUS
GO MONOCYTE CHEMOTAXIS
GO REGULATION OF INSULIN RECEPTOR SIGNALING PATHWAY
GO POSITIVE REGULATION OF CHEMOTAXIS
GO POSITIVE REGULATION OF T CELL PROLIFERATION
GO ACTIVATION OF MAPK ACTIVITY
Values (in extraction order): 0.649 0.541 0.639 0.518 0.413 0.130 0.666 0.100 0.740 1.23E-03 -0.057 0.257 0.633 2.65E-02 0.235 NA 0.614 2.52E-12 0.578 0.321 0.627 0.146 0.038 0.338 -0.053 1.69E-02 0.650 3.09E-02 0.395 4.70E-02 0.680 0.846 0.712 -0.169 3.18E-03 0.605 -0.032 0.309 -0.206 1.29E-02

Gene sets:
GO NEGATIVE REGULATION OF HORMONE SECRETION
Values (in extraction order): 0.592 -0.183 0.520 0.189 2.23E-03
Complete References
[1] Daniel B Hardy, Bethany A Janowski, David R Corey, and Carole R Mendelson. Progesterone receptor plays a major antiinflammatory role in human myometrial cells by antagonism of nuclear factor-κb activation of cyclooxygenase 2 expression. Molecular endocrinology, 20(11):2724–2733, 2006.
[2] Huiqing Tan, Lijuan Yi, Neal S Rote, William W Hurd, and Sam Mesiano. Progesterone receptor-a and-b have opposite effects on proinflammatory gene expression in human myometrial cells: implications for progesterone actions in human pregnancy and parturition. The Journal of Clinical Endocrinology & Metabolism, 97(5):E719–E730, 2012.
[3] Li Chen, Kaiyu Lei, Johann Malawana, Angela Yulia, Suren R Sooranna, Phillip R Bennett, Zhiqing Liang, Dimitri Grammatopoulos, and Mark R Johnson. Cyclic amp enhances progesterone action in human myometrial cells. Molecular and cellular endocrinology, 382(1):334–343, 2014.
[4] Peyvand Amini, Daniel Michniuk, Kelly Kuo, Lijuan Yi, Yelenna Skomorovska-Prokvolit, Gregory A Peters, Huiqing Tan, Junye Wang, Charles J Malemud, and Sam Mesiano. Human parturition involves phosphorylation of progesterone receptor-a at serine-345 in myometrial cells. Endocrinology, 157(11):4434–4445, 2016.
[5] Gregory A Peters, Lijuan Yi, Yelenna Skomorovska-Prokvolit, Bansari Patel, Peyvand Amini, Huiqing Tan, and Sam Mesiano. Inflammatory stimuli increase progesterone receptor-a stability and transrepressive activity in myometrial cells. Endocrinology, 158(1):158–169, 2016.
[6] Sam Mesiano, Eng-Cheng Chan, John T Fitter, Kenneth Kwek, George Yeo, and Roger Smith. Progesterone withdrawal and estrogen activation in human parturition are coordinated by progesterone receptor a expression in the myometrium. The Journal of Clinical Endocrinology & Metabolism, 87(6):2924–2930, 2002.
[7] Sam Mesiano, Yuguang Wang, and Errol R Norwitz. Progesterone receptors in the human pregnancy uterus: do they hold the key to birth timing? Reproductive Sciences, 18(1):6–19, 2011.
[8] Sarah A Price and Andrés López Bernal. Uterine quiescence: the role of cyclic amp. Experimental physiology, 86(2):265–272, 2001.
[9] Boonsri Chanrachakul, Fiona Broughton Pipkin, and Raheela N Khan. Contribution of coupling between human myometrial β2-adrenoreceptor and the bkca channel to uterine quiescence. American Journal of Physiology-Cell Physiology, 287(6):C1747–C1752, 2004.
[10] BS Skalhegg and Kjetil Taskén. Specificity in the camp/pka signaling pathway. differential expression, regulation, and subcellular localization of subunits of pka. Front Biosci, 2:d331–d342, 1997.
[11] Manuel Morgado, Elisa Cairrão, António José Santos-Silva, and Ignacio Verde. Cyclic nucleotide-dependent relaxation pathways in vascular smooth muscle. Cellular and molecular life sciences, 69(2):247–266, 2012.
[12] Angela Yulia, Natasha Singh, Kaiyu Lei, Suren R Sooranna, and Mark R Johnson. Cyclic amp effectors regulate myometrial oxytocin receptor expression. Endocrinology, 157(11):4411–4422, 2016.
[13] Barbara M Sanborn, Chun-Ying Ku, Sergiy Shlykov, and Lidiya Babich. Molecular signaling through g-protein-coupled receptors and the control of intracellular calcium in myometrium. Journal of the Society for Gynecologic Investigation, 12(7):479–487, 2005.
[14] Barbara M Sanborn. Hormonal signaling and signal pathway crosstalk in the control of myometrial calcium dynamics. In Seminars in cell & developmental biology, volume 18, pages 305–314. Elsevier, 2007.
[15] Anouk Oldenburger, Sara S Roscioni, Esther Jansen, Mark H Menzen, Andrew J Halayko, Wim Timens, Herman Meurs, Harm Maarsingh, and Martina Schmidt. Anti-inflammatory role of the camp effectors epac and pka: implications in chronic obstructive pulmonary disease. PLoS One, 7(2):e31574, 2012.
[16] Charlotte K Billington, Oluwaseun O Ojo, Raymond B Penn, and Satoru Ito. camp regulation of airway smooth muscle function. Pulmonary pharmacology & therapeutics, 26(1):112–120, 2013.
[17] Peyvand Amini, Rachel Wilson, William Koeblitz, Junye Wang, Huiqing Tan, Lijuan Yi, Zachary Stanfield, Andrea Romani, Charles Malemud, and Sam Mesiano. Mechanism by which progesterone and camp synergize to maintain uterine quiescence during pregnancy. Molecular and cellular endocrinology, 2018.
[18] Kripamoy Aguan, Jorge A Carvajal, Loren P Thompson, and Carl P Weiner. Application of a functional genomics approach to identify differentially expressed genes in human myometrium during pregnancy and labour. Molecular human reproduction, 6(12):1141–1145, 2000.
[19] E Cheng Chan, Sarah Fraser, S Yin, G Yeo, K Kwek, RJ Fairclough, and R Smith. Human myometrial genes are differentially expressed in labor: a suppression subtractive hybridization study. The Journal of Clinical Endocrinology & Metabolism, 87(6):2435–2441, 2002.
[20] Gilles Charpigny, M-J Leroy, Michelle Breuiller-Fouché, Zarha Tanfin, Sakina Mhaouty-Kodja, Ph Robin, Denis Leiber, Joëlle Cohen-Tannoudji, Dominique Cabrol, Claude Barberis, et al. A functional genomic study to identify differential gene expression in the preterm and term human myometrium. Biology of reproduction, 68(6):2289–2296, 2003.
[21] M Sean Esplin, M Bardett Fausett, Morgan R Peltier, Steven Hamblin, Robert M Silver, D Ware Branch, Eli Y Adashi, and David Whiting. The use of cdna microarray to identify differentially expressed labor-associated genes within the human myometrium during labor. American journal of obstetrics and gynecology, 193(2):404–413, 2005.
[22] Jon C Havelock, Patrick Keller, Ndaya Muleba, Bobbie A Mayhew, Brian M Casey, William E Rainey, and R Ann Word. Human myometrial gene expression before and during parturition. Biology of reproduction, 72(3):707–719, 2005.
[23] Shrikant Bollopragada, Refaat Youssef, Fiona Jordan, Ian Greer, Jane Norman, and Scott Nelson. Term labor is associated with a core inflammatory response in human fetal membranes, myometrium, and cervix. American journal of obstetrics and gynecology, 200(1):104–e1, 2009.
[24] Pooja Mittal, Roberto Romero, Adi L Tarca, Juan Gonzalez, Sorin Draghici, Yi Xu, Zhong Dong, Chia-Ling Nhan-Chang, Tinnakorn Chaiworapongsa, Stephen Lye, et al. Characterization of the myometrial transcriptome and biological pathways of spontaneous human labor at term. Journal of perinatal medicine, 38(6):617–643, 2010.
[25] Carl P Weiner, Clifford W Mason, Yafeng Dong, Irina A Buhimschi, Peter W Swaan, and Catalin S Buhimschi. Human effector/initiator gene sets that regulate myometrial contractility during term and preterm labor. American journal of obstetrics and gynecology, 202(5):474–e1, 2010.
[26] Gemma C Sharp, James L Hutchinson, Nanette Hibbert, Tom C Freeman, Philippa TK Saunders, and Jane E Norman. Transcription analysis of the myometrium of labouring and non-labouring women. PloS one, 11(5):e0155413, 2016.
[27] Yi-Wah Chan, Hugo A Berg, Jonathan D Moore, Siobhan Quenby, and Andrew M Blanks. Assessment of myometrial transcriptome changes associated with spontaneous human labour by high-throughput rna-seq. Experimental physiology, 99(3):510–524, 2014.
[28] Doris Pieber, Victoria C Allport, Frank Hills, Mark Johnson, and Phillip R Bennett. Interactions between progesterone receptor isoforms in myometrial cells in human labour. MHR: Basic science of reproductive medicine, 7(9):875–879, 2001.
[29] Manju Monga, Chun-Ying Ku, Kimberly Dodge, and Barbara M Sanborn. Oxytocin-stimulated responses in a pregnant human immortalized myometrial cell line. Biology of reproduction, 55(2):427–432, 1996.
[30] Amy A Merlino, Toni N Welsh, Huiqing Tan, Li Juan Yi, Vernon Cannon, Brian M Mercer, and Sam Mesiano. Nuclear progesterone receptors in the human pregnancy myometrium: evidence that parturition involves functional progesterone withdrawal mediated by increased expression of progesterone receptor-a. The Journal of Clinical Endocrinology & Metabolism, 92(5):1927–1933, 2007.
[31] Gemma Madsen, Tamas Zakar, Chun Ying Ku, Barbara M Sanborn, Roger Smith, and Sam Mesiano. Prostaglandins differentially modulate progesterone receptor-a and-b expression in human myometrial cells: evidence for prostaglandin-induced functional progesterone withdrawal. The Journal of clinical endocrinology and metabolism, 89(2):1010–1013, 2004.
[32] Jennifer Condon, Su Yin, Bobbie Mayhew, R Ann Word, WE Wright, JW Shay, and William E Rainey. Telomerase immortalization of human myometrial cells. Biology of reproduction, 67(2):506–514, 2002.
[33] Cole Trapnell, Lior Pachter, and Steven L Salzberg. Tophat: discovering splice junctions with rna-seq. Bioinformatics, 25(9):1105–1111, 2009.
[34] Cole Trapnell, Brian A Williams, Geo Pertea, Ali Mortazavi, Gordon Kwan, Marijke J Van Baren, Steven L Salzberg, Barbara J Wold, and Lior Pachter. Transcript assembly and quantification by rna-seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature biotechnology, 28(5):511, 2010.
[35] Loyal A Goff, Cole Trapnell, and David Kelley. Cummerbund: visualization and exploration of cufflinks high-throughput sequencing data. R package version, 2(0), 2012.
[36] Jerome Friedman, Trevor Hastie, and Rob Tibshirani. Regularization paths for generalized linear models via coordinate descent. Journal of statistical software, 33(1):1, 2010.
[37] Gene H Golub and Christian Reinsch. Singular value decomposition and least squares solutions. In Linear Algebra, pages 134–151. Springer, 1971.
[38] Michael E Wall, Andreas Rechtsteiner, and Luis M Rocha. Singular value decomposition and principal component analysis. In A practical approach to microarray data analysis, pages 91–109. Springer, 2003.
[39] Maxim V Kuleshov, Matthew R Jones, Andrew D Rouillard, Nicolas F Fernandez, Qiaonan Duan, Zichen Wang, Simon Koplev, Sherry L Jenkins, Kathleen M Jagodnik, Alexander Lachmann, et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic acids research, 44(W1):W90–W97, 2016.
[40] Minoru Kanehisa and Susumu Goto. Kegg: kyoto encyclopedia of genes and genomes. Nucleic acids research, 28(1):27–30, 2000.
[41] Sonja Hänzelmann, Robert Castelo, and Justin Guinney. Gsva: gene set variation analysis for microarray and rna-seq data. BMC bioinformatics, 14(1):7, 2013.
[42] Arthur Liberzon, Chet Birger, Helga Thorvaldsdóttir, Mahmoud Ghandi, Jill P Mesirov, and Pablo Tamayo. The molecular signatures database hallmark gene set collection. Cell systems, 1(6):417–425, 2015.
[43] Aravind Subramanian, Pablo Tamayo, Vamsi K Mootha, Sayan Mukherjee, Benjamin L Ebert, Michael A Gillette, Amanda Paulovich, Scott L Pomeroy, Todd R Golub, Eric S Lander, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences, 102(43):15545–15550, 2005.
[44] Gordon K Smyth. Limma: linear models for microarray data. In Bioinformatics and computational biology solutions using R and Bioconductor, pages 397–420. Springer, 2005.
[45] Anna Ritz, Christopher L Poirel, Allison N Tegge, Nicholas Sharp, Kelsey Simmons, Allison Powell, Shiv D Kale, and TM Murali. Pathways on demand: automated reconstruction of human signaling networks. NPJ systems biology and applications, 2:16002, 2016.
[46] Edsger W Dijkstra. A note on two problems in connexion with graphs. Numerische mathematik, 1(1):269–271, 1959.
[47] C Jiang, Zhenyu Xuan, Fang Zhao, and Michael Q Zhang. Tred: a transcriptional regulatory element database, new entries and other development. Nucleic acids research, 35(suppl_1):D137–D140, 2007.
[48] Ahmed Essaghir, Federica Toffalini, Laurent Knoops, Anders Kallin, Jacques van Helden, and Jean-Baptiste Demoulin. Transcription factor regulation can be accurately predicted from the presence of target gene signatures in microarray gene expression data. Nucleic acids research, 38(11):e120–e120, 2010.
[49] Obi L Griffith, Stephen B Montgomery, Bridget Bernier, Bryan Chu, Katayoon Kasaian, Stein Aerts, Shaun Mahony, Monica C Sleumer, Mikhail Bilenky, Maximilian Haeussler, et al. Oreganno: an open-access community-driven resource for regulatory annotation. Nucleic acids research, 36(suppl_1):D107–D113, 2007.
[50] Heonjong Han, Hongseok Shim, Donghyun Shin, Jung Eun Shim, Yunhee Ko, Junha Shin, Hanhae Kim, Ara Cho, Eiru Kim, Tak Lee, et al. Trrust: a reference database of human transcriptional regulatory interactions. Scientific reports, 5:11432, 2015.
[51] Panagiotis Chouvardas, George Kollias, and Christoforos Nikolaou. Inferring active regulatory networks from gene expression data using a combination of prior knowledge and enrichment analysis. BMC bioinformatics, 17(5):181, 2016.
[52] Paul Shannon, Andrew Markiel, Owen Ozier, Nitin S Baliga, Jonathan T Wang, Daniel Ramage, Nada Amin, Benno Schwikowski, and Trey Ideker. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome research, 13(11):2498–2504, 2003.
[53] Rickard Sandberg and Ingemar Ernberg. Assessment of tumor characteristic gene expression in cell lines using a tissue similarity index (tsi). Proceedings of the National Academy of Sciences, 102(6):2052–2057, 2005.
[54] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1):267–288, 1996.
[55] Hui Zou and Trevor Hastie. Regularization and variable selection via the elastic net. Journal of the royal statistical society: series B (statistical methodology), 67(2):301–320, 2005.
[56] Xiang-Hong Li, A Hari Kishore, Doan Dao, Weiming Zheng, Christopher A Roman, and R Ann Word. A novel isoform of microphthalmia-associated transcription factor inhibits il-8 gene expression in human cervical stromal cells. Molecular Endocrinology, 24(8):1512–1528, 2010.
[57] Mathias Uhlén, Linn Fagerberg, Björn M Hallström, Cecilia Lindskog, Per Oksvold, Adil Mardinoglu, Åsa Sivertsson, Caroline Kampf, Evelina Sjöstedt, Anna Asplund, et al. Tissue-based map of the human proteome. Science, 347(6220):1260419, 2015.
[58] Orly Alter, Patrick O Brown, and David Botstein. Singular value decomposition for genome-wide expression data processing and modeling. Proceedings of the National Academy of Sciences, 97(18):10101–10106, 2000.
[59] Orly Alter, Patrick O Brown, and David Botstein. Generalized singular value decomposition for comparative analysis of genome-scale expression data sets of two different organisms. Proceedings of the National Academy of Sciences, 100(6):3351–3356, 2003.
[60] John Tomfohr, Jun Lu, and Thomas B Kepler. Pathway level analysis of gene expression using singular value decomposition. BMC bioinformatics, 6(1):225, 2005.
[61] Wei Yuan and Andrés López Bernal. Cyclic amp signalling pathways in the regulation of uterine relaxation. In BMC pregnancy and childbirth, volume 7, page S10. BioMed Central, 2007.
[62] Morgan R Peltier. Immunology of term and preterm labor. Reproductive Biology and Endocrinology, 1(1):122, 2003.
[63] Jing Pan, Keiichi Fukuda, Mikiyoshi Saito, Junichi Matsuzaki, Hiroaki Kodama, Motoaki Sano, Toshiyuki Takahashi, Takahiro Kato, and Satoshi Ogawa. Mechanical stretch activates the jak/stat pathway in rat cardiomyocytes. Circulation research, 84(10):1127–1136, 1999.
[64] Andrew J Brooks, Wei Dai, Megan L O’Mara, Daniel Abankwa, Yash Chhabra, Rebecca A Pelekanos, Olivier Gardon, Kathryn A Tunny, Kristopher M Blucher, Craig J Morton, et al. Mechanism of activation of protein kinase jak2 by the growth hormone receptor. Science, 344(6185):1249783, 2014.
[65] Michèle Breuiller-Fouche, Gilles Charpigny, and Guy Germain. Functional genomics of the pregnant uterus: from expectations to reality, a compilation of studies in the myometrium. In BMC pregnancy and childbirth, volume 7, page S4. BioMed Central, 2007.
[66] Stephen R Master, Jennifer L Hartman, Celina M D’cruz, Susan E Moody, Elizabeth A Keiper, Seung I Ha, James D Cox, George K Belka, and Lewis A Chodosh. Functional microarray analysis of mammary organogenesis reveals a developmental role in adaptive thermogenesis. Molecular endocrinology, 16(6):1185–1203, 2002.
[67] Wenbo Deng, Jeeyeon Cha, Jia Yuan, Hirofumi Haraguchi, Amanda Bartos, Emma Leishman, Benoit Viollet, Heather B Bradshaw, Yasushi Hirota, and Sudhansu K Dey. p53 coordinates decidual sestrin 2/ampk/mtorc1 signaling to govern parturition timing. The Journal of clinical investigation, 126(8):2941–2954, 2016.
[68] Jure Leskovec, Anand Rajaraman, and Jeffrey David Ullman. Mining of massive datasets. Cambridge university press, 2014.
[69] Didier Croes, Fabian Couche, Shoshana J Wodak, and Jacques Van Helden. Metabolic pathfinding: inferring relevant pathways in biochemical networks. Nucleic acids research, 33(suppl_2):W326–W330, 2005.
[70] Dana Silverbush and Roded Sharan. Network orientation via shortest paths. Bioinformatics, 30(10):1449–1455, 2014.
[71] John R Challis, Charles J Lockwood, Leslie Myatt, Jane E Norman, Jerome F Strauss III, and Felice Petraglia. Inflammation and pregnancy. Reproductive sciences, 16(2):206–215, 2009.
[72] Yutaka Shimizu, Takashi Takeuchi, Shizuka Mita, Tatsuto Notsu, Kiyoshi Mizuguchi, and Satoru Kyo. Krüppel-like factor 4 mediates anti-proliferative effects of progesterone with g0/g1 arrest in human endometrial epithelial cells. Journal of endocrinological investigation, 33(10):745–750, 2010.
[73] Marilyn Safran, Irina Dalah, Justin Alexander, Naomi Rosen, Tsippi Iny Stein, Michael Shmoish, Noam Nativ, Iris Bahir, Tirza Doniger, Hagit Krug, et al. Genecards version 3: the human gene integrator. Database, 2010, 2010.
[74] Cecilia Garlanda, Barbara Bottazzi, Antonio Bastone, and Alberto Mantovani. Pentraxins at the crossroads between innate immunity, inflammation, matrix deposition, and female fertility. Annu. Rev. Immunol., 23:337–366, 2005.
[75] Xiao-Jun Feng, Hui Gao, Si Gao, Zhuoming Li, Hong Li, Jing Lu, Jiao-Jiao Wang, Xiao-Yang Huang, Min Liu, Jian Zou, et al. The orphan receptor nor1 participates in isoprenaline-induced cardiac hypertrophy by regulating parp-1. British journal of pharmacology, 172(11):2852–2863, 2015.
[76] Jaclyn N Taroni and Casey S Greene. Cross-platform normalization enables machine learning model training on microarray and rna-seq data simultaneously. BioRxiv, page 118349, 2017.
[77] Roberta Migale, David A MacIntyre, Stefano Cacciatore, Yun S Lee, Henrik Hagberg, Bronwen R Herbert, Mark R Johnson, Donald Peebles, Simon N Waddington, and Phillip R Bennett. Modeling hormonal and inflammatory contributions to preterm and term labor using uterine temporal transcriptomics. BMC medicine, 14(1):86, 2016.
[78] Nathan Salomonis, Nathalie Cotte, Alexander C Zambon, Katherine S Pollard, Karen Vranizan, Scott W Doniger, Gregory Dolganov, and Bruce R Conklin. Identifying genetic networks underlying myometrial transition to labor. Genome biology, 6(2):R12, 2005.
[79] Ramkumar Menon, Elizabeth A Bonney, Jennifer Condon, Sam Mesiano, and Robert N Taylor. Novel concepts on pregnancy clocks and alarms: redundancy and synergy in human parturition. Human reproduction update, 22(5):535–560, 2016.
[80] Douglas Brubaker, Alethea Barbaro, Mark R Chance, and Sam Mesiano. A dynamical systems model of progesterone receptor interactions with inflammation in human parturition. BMC systems biology, 10(1):79, 2016.
[81] Oksana Shynlova, Tamara Nedd-Roderique, Yunqing Li, Anna Dorogin, and Stephen J Lye. Myometrial immune cells contribute to term parturition, preterm labour and post-partum involution in mice. Journal of cellular and molecular medicine, 17(1):90–102, 2013.
[82] Mathieu Nadeau-Vallée, Christiane Quiniou, Julia Palacios, Xin Hou, Atefeh Erfani, Ankush Madaan, Mélanie Sanchez, Kelycia Leimert, Amarilys Boudreault, François Duhamel, et al. Novel noncompetitive il-1 receptor-biased ligand prevents infection-and inflammation-induced preterm birth. The Journal of Immunology, page 1500758, 2015.
[83] Agata Sakowicz. The role of nfκb in the three stages of pregnancy: implantation, maintenance and labour; a review article. BJOG: An International Journal of Obstetrics & Gynaecology, 2018.
[84] Bingbing Wang, Nataliya Parobchak, Adriana Martin, Max Rosen, Lumeng Jenny Yu, Mary Nguyen, Kseniya Gololobova, and Todd Rosen. Screening a small molecule library to identify inhibitors of nf-κb inducing kinase and pro-labor genes in human placenta. Scientific reports, 8(1):1657, 2018.