COMPREHENSIVE CHARACTERIZATION OF THE TRANSCRIPTIONAL SIGNALING OF HUMAN PARTURITION THROUGH INTEGRATIVE ANALYSIS OF MYOMETRIAL TISSUES AND CELL LINES

by

ZACHARY STANFIELD

Submitted in partial fulfillment of the requirements

For the degree of Doctor of Philosophy

Systems Biology and Bioinformatics

CASE WESTERN RESERVE UNIVERSITY

August, 2019

Case Western Reserve University

School of Graduate Studies

We hereby approve the thesis of

ZACHARY STANFIELD

candidate for the degree of Doctor of Philosophy¹.

Committee Chair: Dr. Gurkan Bebek, Department of Nutrition

Committee Member, Adviser: Dr. Mehmet Koyutürk, Department of Computer Science and Electrical Engineering

Committee Member: Dr. Sam Mesiano, Department of Obstetrics and Gynecology

Committee Member: Dr. Mark Cameron, Department of Population and Quantitative Health Sciences

Date of Defense: May 10, 2019

¹ We certify that written approval has been obtained for any proprietary material contained therein.

Acknowledgements

I would first like to acknowledge my mentor, Dr. Mehmet Koyutürk. I knew from our first meeting that Dr. Koyutürk’s lab would be a great fit for me. He is a highly enthusiastic and ambitious researcher as well as a relaxed, supportive, and understanding mentor. Dr. Koyutürk gave me a lot of freedom in my research but was always present to provide useful feedback and ideas on how to move forward. This allowed me the opportunity to progress in research directions I personally found both interesting and rewarding. I would also like to mention Dr. Sam Mesiano, who essentially served as my co-mentor throughout the latter part of my PhD. His extensive knowledge and experience in the topic of my thesis research helped me progress quickly and better plan my project. Dr. Mesiano was always willing to offer advice and introduce me to potential collaborators at various research retreats and conferences. The amount of support and belief shown in me by both Dr. Koyutürk and Dr. Mesiano gave me the confidence and drive I needed throughout my PhD career.

Graduate school is just as much a lifestyle as a step in one’s career path. Therefore it is important to build relationships with those around you and experience life outside of research. Thankfully I have been able to meet a lot of great people through school events like intramural sports and grad student functions, as well as while exploring Cleveland.

Some of these relationships have lasted since the very beginning and will continue long after graduation. I thank each and every one of these individuals for their friendship and the memories we were able to create.

Lastly, I would like to thank my family. My parents are extremely supportive of me in every aspect of my life and I am truly grateful for it. My brother also deserves much credit, having participated in countless conversations about every possible aspect of my personal and professional life throughout the years. Without my family’s continuous encouragement and guidance I would not be where I am today.

Table of Contents

Acknowledgements

List of Tables

List of Figures

Comprehensive Characterization of the Transcriptional Signaling of Human Parturition through Integrative Analysis of Myometrial Tissues and Cell Lines

Chapter 1. Introduction

Overview of Human Parturition

Preterm Labor

Processes Controlling Human Parturition

The Role of Transcriptomic Studies in Parturition Research

The Role of Experimental Cell Lines in Parturition Research

Hypotheses and Objectives

1.6.1. Characterizing Tissue Expression

1.6.2. Characterizing Cell Line Gene Expression

1.6.3. Determining the Agreement between Tissue and Cell Lines

Chapter 2. Materials and Methods

Data

2.1.1. Tissue Data

2.1.2. Cell Line Data

Methods

2.2.1. Raw Data Processing, Calculating Differential Expression, and Study Concatenation

2.2.2. Classification

2.2.3. Singular Value Decomposition

2.2.4. Enrichment Analysis

2.2.5. Annotation of Vascular Smooth Muscle Contraction Pathway Diagram

2.2.6. Signaling Network Construction

2.2.7. Regulatory Network Analysis

2.2.8. Comparison of Myometrial Cell Line and Tissue

Chapter 3. Characterizing Transcriptional Alterations at the Onset of Labor in Myometrial Tissue

Differential Expression in Quiescent and Laboring Myometrium

Within-study Classification

Cross-study Classification

Identification of Latent Transcriptional Regulation Patterns via Singular Value Decomposition

Global Identification of Pathways Altered During Parturition

Hypothesis-Driven Signature Identification

Building a Parturition Signaling Network

Summary

Chapter 4. Investigating Pathways Controlling Human Parturition in a Myometrial Cell Line

Progesterone Signaling

cAMP Signaling

IL-1β Signaling

Additional Insights from Single Stimulus Comparisons

Anti-inflammatory Actions of P4 and cAMP

Synergy Between P4 and cAMP

Summary

Chapter 5. Identifying Transcriptional Similarities of Human Myometrial Tissues and Cell Lines

Agreement of Transcriptional Alterations at the Gene, Pathway, and Network Levels

Assessing Similarity Using Tissue Similarity Index

Summary

Chapter 6. Discussion and Conclusions

Tissue Analysis

Cell Line Analysis

Tissue and Cell Line Comparison

Conclusions

Chapter 7. Suggested Future Research

Utilizing Clinical Data

Confirmational Studies for Validation of Findings

Myometrial Cell-Specific Signaling

Linking the Uterine Tissues

Appendix A. Supplementary Tables

Appendix. Complete References

List of Tables

2.1 Myometrial tissue gene expression studies analyzed in this chapter.

Three gene expression datasets collected from publicly available

sources or collaborators are shown. Sample groups and names are

consistent with the acronyms listed in column three throughout the

chapter. Clinical definitions of labor represent how each sample was

called phenotypically, based on the patient’s clinical presentation at

the time of cesarean section, as chosen by the researchers collecting

the samples.

List of Figures

2.1 Study design and conceptual illustration of the framework for

data analysis. A. The human myometrial hTERT-HMA/B cell

line, expressing the progesterone receptors (PRs), was treated

with progesterone (P4), forskolin (FSK, induces cAMP), and IL-1β

individually and in all combinations of these three stimuli. Treatment

was performed for 24 hours to establish steady-state cellular

. RNA sequencing was performed in triplicate on each

condition, resulting in expression measurements for a total of 24

samples. B. Stimulation of the targeted pathways was confirmed by

examining the effect of different treatment combinations on mRNA

expression of IL-8, a commonly used downstream target of IL-1β

and indicator of inflammatory signaling. IL-8 induced expression is

achieved with IL-1β treatment. This induced expression is inhibited

by P4 in the presence of PR. cAMP inhibits IL-1β induced expression

of IL-8, and, finally, the strongest inhibition of IL-8 up-regulation is

achieved by treating cells with both FSK and P4. These experiments

were carried out by a collaborator, Dr. Peyvand Amini, and are shown

here to improve clarity. C. Comparisons performed to characterize the

signaling carried out by P4, cAMP, and IL-1β individually. The results

of these comparisons are shown in Figures 4.1-4.4. D. Comparisons

performed to examine the interplay between P4 and cAMP in the

context of their anti-inflammatory effects. Samples treated by IL-1β

and different combinations of P4 and FSK were first compared to

vehicle. Subsequently, the genes, transcription factors, and processes

identified in these comparisons were compared to each other to

understand how P4 and cAMP work together or individually to

suppress inflammatory processes. The results of these analyses are

shown in Figures 4.5 and 4.6.

2.2 Data and methods used in bioinformatics pipeline. Raw gene

expression was processed in multiple steps and combined with

curated information from public databases to carry out supervised

and unsupervised approaches to identify transcriptional signatures,

assess their reproducibility, and identify key signaling pathways and

interactions involved in parturition.

3.1 Venn diagram of differential expression in Warwick, Detroit, and

London. Differentially expressed genes between non-labor and

in-labor sample groups for each dataset are compared to calculate

similarity across studies.

3.2 AUCs, ROC Curves, and sample classification frequencies for K-fold

cross-validation and cross-study classification tasks on tissue

gene expression studies using LASSO and EN. A-C. K-fold

cross-validation (CV) for each dataset (k = 3 for Warwick, 5 for London and

Detroit). D-I. Classification of one dataset with training performed on

another dataset. LASSO and EN were implemented using the glmnet

package in R. J-R. Corresponding sample class frequencies for the

classification tasks shown in panels A-I, specifically using the results

from the top performing algorithm, based on AUC. For all prediction

tasks, the early labor (L-ILEa) and established labor (L-ILEs) samples

were grouped together as a single laboring sample group except for

sample classification frequency for 5-fold CV on London, where the

three sample groups were kept separate in order to assess how the

new initial labor group was classified. All results shown were obtained

using 100 runs of training and testing. Sample names in panels J-R

follow the naming convention in Table 2.1.

3.3 Distribution of singular values and examination of genes and

samples in the space spanned by the first singular vector from SVD

analysis on aggregated gene expression matrix. A. Singular values

calculated by performing SVD on the normalized, study-combined

expression matrix (12,622 genes by 71 samples). B. Mapping of the

top 25 positively and negatively valued genes in the space represented

by the first right singular vector (eigen-gene). Bars representing genes

are colored based on their differential expression in each of the four

differential expression analyses (shown in Fig. 3.1; i.e. purple bars

indicate genes in the set of 126 high-confidence genes). C. Mapping

of all tissue samples to the first left singular vector (eigen-sample),

sorted by mapping value. Bars representing samples are colored

based on their clinically defined phenotype (blue = non-laboring, red

= laboring). Sample name abbreviations in panel C follow Table 2.1.

3.4 Global and hypothesis-driven GSVA pathway enrichment across

all tissue studies. A. Top 30 enriched pathways (by median adjusted

p-value) using the hallmark gene sets from MSigDB. B. Enrichment

results for a selected set of pathways from MSigDB related to

progesterone response, smooth muscle, inflammation, and cAMP

signaling. Heat map is colored by log2(fold change) where red means

up-regulation in laboring samples and blue down-regulation. Fold

changes are calculated using an empirical Bayes model fit to the

output (from the gsva() function) gene set by sample matrix. Rows

(gene sets) are clustered using the complete clustering method based

on correlation.

3.5 Pathway-specific SVD analysis. SVD results for the gene sets A-C.

GO response to progesterone, D-F. hallmark inflammatory response,

and G-I. KEGG cAMP signaling pathway. All studies were combined

(12,622 genes x 71 tissues) and then reduced to the genes within

each gene set. Panels A, D, and G show the distribution of singular

values, panels B, E, and H show the top 25 genes associated with each

phenotype based on loading value (genes are colored by their mean

log2(fold change) value across all four sample group comparisons in

Fig. 3.1), and panels C, F, and I show the gene set activity (loading

value) for each sample (samples are colored based on their originally

reported clinical phenotype).

3.6 Visualization of gene expression changes on KEGG Vascular Smooth

Muscle Contraction (VSMC) pathway diagram. The Homo sapiens

VSMC pathway diagram was downloaded in KGML format from

KEGG and loaded into Cytoscape. Nodes (genes) are colored based

on loading values for the dominant singular vector from performing

singular value decomposition on the reduced (to include only genes

within the VSMC pathway) Warwick expression matrix. These values

were directly loaded into Cytoscape as a node table and used in a

continuous manner in the style tab for node coloring. Blue nodes

indicate genes with higher expression in the W-NL group, red higher

expression in the W-IL group, and white no differential expression.

3.7 Parturition-specific signaling networks constructed using shortest

paths and differential expression data. Shortest paths were

calculated on a curated, directed, weighted signaling network.

Shortest paths were obtained from each of the three pathway initiators

(PGR, IL1R1, ADCY; chosen to represent progesterone,

inflammatory, and cAMP signaling) to each downstream target

gene. Nodes in the network were weighted based on gene loadings

from the dominant singular vector when performing SVD on the

combined expression matrix (3 studies; 12,622 genes x 71 samples).

A. A general parturition signaling network. Target genes consisted

of the top 50 genes associated with the non-laboring and laboring

sample groups (100 total input genes). Nodes were weighted such

that those with strong phenotype associations had the lowest cost.

Paths falling in the top 50%, in terms of lowest costs, were retained.

B. A parturition signaling network prior to labor (i.e. quiescence).

Target genes consisted of the top 50 genes associated with the non-

laboring sample groups. Nodes were weighted such that genes highly

expressed in the non-laboring samples had low costs, those with

roughly equal expression across the two phenotypes had a medium

cost, and genes with higher expression in the laboring samples had

high cost. C. A parturition signaling network during labor. Target

genes consisted of the top 50 genes associated with the laboring

sample groups. Nodes were weighted opposite of the quiescent

network in panel B. All paths were retained for the networks in panels

B and C. Networks were visualized in Cytoscape. Node shapes are

based on function/pathway location, node colors correspond to their

respective weights, and edge width is proportional to the weights in

the original network (thicker meaning higher confidence).

4.1 Comparison of individual genes that are significantly differentially

expressed in response to each individual stimulus. Venn diagrams

depicting the overlap of genes that are A. significantly differentially

expressed, B. significantly up-regulated, and C. significantly down-

regulated in hTERT-HMA/B cell lines treated by P4 (green circle),

FSK (red circle), and IL-1β (blue circle) as compared to control

samples. D. Venn diagram depicting the overlap of genes that are

significantly up-regulated in IL-1β treated cells as compared to

control and significantly down-regulated in P4 and/or FSK treated

cells as compared to control. All comparisons are single stimulus

versus control and significance is defined as having |fold change| > 1.5

and p-value < 0.05 (after adjustment for multiple hypothesis testing

using the Benjamini-Hochberg procedure).

4.2 Pathways and transcription factors that exhibit significant

enrichment in differential expression in response to treatment

by P4, FSK, and IL-1β individually. The gene set collections for

Gene Ontology Biological Processes and Transcription factor-targets

were obtained from the Molecular Signatures Database. Gene sets

significant (|fold change| > 1.5 and adjusted p-value < 0.05) in any

single stimulus condition versus control were identified using Gene

Set Variation Analysis (GSVA). For a subset of these significant A.

pathways and B. transcription factors, the magnitude and direction of

the collective fold change of the genes in the corresponding gene set,

as quantified by GSVA, is shown using three bars colored according to

the stimulus.

4.3 Regulatory networks representing the effects of each individual

stimulus. Regulatory network enrichment analysis (RNEA) was

performed and significant subnetworks were identified in samples

treated by A. P4, B. FSK (cAMP), and C. IL-1β based on gene expression

alterations compared to control. Nodes (genes and transcription

factors) are colored based on log2(fold change). Edges (regulatory

interactions) and node outlines are colored based on occurrence in

the 3 shown networks. Log2(fold change) cutoffs used for significance

in the RNEA function were 1.5 for P4 and IL-1β and 1.75 for FSK.

Visualization was performed using Cytoscape.

4.4 Regulation of shared targets for top enriched transcription factors

in response to individual stimuli. Comparison of how each stimulus

alters expression of targets of selected transcription factors showing

strong enrichment from GSVA calculation in at least one of the

conditions P4, FSK, and IL-1β compared to control. For each of these

transcription factors, the number of known targets that are up- or

down-regulated by each stimulus and the overlap between these

targets are shown, along with the total number of known targets for

each TF.

4.5 Pathways that are activated/inhibited by IL-1β are

inhibited/activated by different combinations of P4 and FSK. The

gene set collections of Gene Ontology Biological Processes and

Transcription factor-targets were obtained from the Molecular

Signatures Database. Gene sets significantly altered (|fold change| >

1.5 and adjusted p-value < 0.05) from control to IL-1β were identified

using GSVA. These gene sets were then re-evaluated by GSVA for the

conditions of P4+IL-1β, FSK+IL-1β, and FSK+P4+IL-1β compared

with control. These gene sets were classified into 4 categories

according to their regulation pattern: 1) No inhibition (activated by

IL-1β and not inhibited by the addition of P4 and/or FSK), 2)

Inhibition by P4 (activated by IL-1β but this activation was reversed

by the addition of P4), 3) Inhibition by FSK (activated/inhibited by

IL-1β but this activation/inhibition was reversed by the addition of

FSK), 4) Inhibition by P4+FSK (activated/inhibited by IL-1β, this

regulation was not reversed by the addition of P4 or FSK individually,

but it was reversed by the addition of P4 and FSK together). For each

of these four categories, the fold change of sample gene sets

(representing A. Gene Ontology Biological Processes, B. Transcription

Factor Targets) in response to different combinations of stimuli are

shown. See Supplemental files 10 and 11 for enrichment fold change

values for all GO BP and TF terms, respectively, significant in IL-1β

compared to control. These include enrichment fold changes for all

IL-1β conditions.

4.6 Regulation interplay of genomic signaling by IL-1β, FSK, and P4. A.

Inhibition of the regulatory network induced by IL-1β (same network

and inputs used for Fig. 4.3C) by different combinations of P4 and

FSK (e.g. an orange node is significant in IL-1β vs Ctrl and remains

significant in the same direction in P4+IL-1β vs Ctrl but becomes

insignificant or significant in the opposite direction in FSK+IL-1β

vs Ctrl and FSK+P4+IL-1β vs Ctrl). B. Regulatory patterns of P4 and

FSK. The network is obtained by performing RNEA for FSK+P4 vs Ctrl

with a log2(FC) cutoff of 2.4 and p-value cutoff of 0.05. Node colors

are based on P4 vs Ctrl, FSK vs Ctrl, and FSK+P4 vs Ctrl expression

comparisons. Node shape indicates differential expression of each

node, including direction of regulation.

5.1 Comparison of cell line and tissue gene set enrichment of GO

Biological Processes. Gene set variation analysis (GSVA) was

performed for the conditions P4 vs CTRL, FSK vs CTRL, IL-1β vs

CTRL and W-IL vs W-NL. Simple over-representation analysis was also

performed for the 126 high-confidence genes identified in Fig. 3.1

using Enrichr. A-D. Overlap in enriched GO BPs for different pairwise

comparisons of cell line-based and tissue-based enrichment. Shared

terms for A and D and their enrichment fold changes/p-values can be

found in Tables 7 and 8 of Appendix A.

5.2 Comparison of cell line and tissue samples using Tissue Similarity

Index. Singular Value Decomposition (SVD) was performed on

the tissue gene expression matrix and phenotype centroids were

calculated in the reduced SVD space. Cell line samples were then

projected to this space and their correlations to the labor centroid

were calculated. A. Median correlations and standard deviations per

sample group are shown for all genes shared between the two studies,

highly significant (|fold change| > 5 and adjusted p-value < 0.001)

genes from the individual stimulus treated cells compared to control,

a set of key genes selected from the comprehensive analyses of cell

lines reported in Figures 4.1-4.7, and highly significant genes from the

tissue study by comparing in-labor and not-in-labor samples. B. A

heatmap of the key genes with median expression across all sample

groups. C. Visualization of samples and tissue centroids using the

first two singular vectors (or components) from the SVD analysis on

the key genes selected based on the cell line analysis.

Comprehensive Characterization of the Transcriptional

Signaling of Human Parturition through Integrative Analysis of

Myometrial Tissues and Cell Lines

Abstract

by

ZACHARY STANFIELD

The process of parturition is driven by multiple transcriptional pathways that work in concert to transform the quiescent myometrium (uterine smooth muscle) to the highly contractile laboring state. Here, important genes and pathways altered at the onset of labor were identified using three transcriptomic tissue studies and computational analyses including classification, singular value decomposition (SVD), pathway enrichment, and signaling network inference. The interplay between three important signaling processes identified from the tissue analysis (progesterone (P4), cyclic AMP (cAMP), and inflammation) was then characterized through comprehensive analysis of a transcriptional study involving a human myometrial cell line (hTERT-HMA/B) treated with P4, forskolin (FSK, induces cAMP), and interleukin-1β (IL-1β, mimics inflammation) alone and in all combinations. Gene set enrichment and regulatory network analyses were then performed to identify pathways commonly, differentially, or synergistically regulated by these treatments. Finally, results from the cell line and tissue analyses were compared and used alongside the Tissue Similarity Index (TSI) to further characterize the correspondence between cell lines and tissue phenotypes. In the tissue analysis, a high-confidence set (significant across all studies) of 126 labor-associated genes was identified; signatures associated with labor included inflammation-related genes and pathways (e.g. signaling by interleukins, NF-κB, JAK-STAT), while the non-labor phenotype was associated with relaxation pathways (e.g. cyclic AMP and muscle relaxation); and a parturition signaling network was constructed. From the cell line analysis, we observed that P4 was strongly anti-inflammatory (mainly through the JUN transcription factor), cAMP was partially anti-inflammatory and promoted myometrial relaxation, and P4 and cAMP synergistically blocked specific inflammatory pathways/regulators including STAT3/6, CEBPA/B, and OCT1/7, but not NF-κB. TSI analysis indicated that forskolin+P4- and IL-1β-treated cells exhibit transcriptional signatures highly similar to non-laboring and laboring term myometrium, respectively. In summary, the identified tissue signatures help classify and characterize stages of labor, and the data-derived parturition signaling networks provide new genes and signaling interactions to aid future studies of parturition, while our cell line results identify potential therapeutic targets to prevent preterm birth and show that the hTERT-HMA/B cell line provides an accurate transcriptional model for term myometrial tissue.


1 Introduction

1.1 Overview of Human Parturition

Parturition, the process of birth, involves dramatic changes, collectively referred to as labor, in the uterine tissues. The changes include the weakening and rupture of the fetal membranes, softening and dilation of the uterine cervix to open the gateway for birth, and activation of the myometrium (uterine smooth muscle) such that it contracts forcefully and rhythmically to become the engine for birth. Therefore, the process and timing of parturition involve the transition of the smooth muscle of the myometrium from a state of quiescence to the highly contractile laboring state. For most of pregnancy, the myometrial cells are maintained in a relaxed and compliant state and undergo hypertrophy. At parturition, hormonal factors acting on myometrial cells and intracellular signaling pathways within these cells affect the expression of genes whose products increase excitability and contractility, transitioning the uterus to the laboring phenotype, wherein rhythmic and forceful myometrial contractions coupled with softening and dilation of the cervix expel the fetus and placenta. The hormonal and intracellular signaling pathways and gene expression networks involved in myometrial cell activation are not clearly understood.

1.2 Preterm Labor

This lack of understanding of normal parturition is the reason for the lack of effective therapies to prevent preterm birth (delivery before 37 weeks; the World Health Organization (WHO) estimates 15 million babies are born prematurely each year) and is a major point of study in reproductive biology. While there are certain factors that increase the likelihood of preterm birth (e.g. multiple pregnancies, infections, high blood pressure), the biological mechanisms are unknown. Actions during pregnancy such as maintaining a healthy diet, avoiding harmful substances like tobacco, and attending multiple check-ups with a physician can help prevent preterm birth. However, the identification of clinical small-molecule interventions through biomedical research will go a long way toward achieving major reductions in preterm birth rates.

1.3 Processes Controlling Human Parturition

In all viviparous species examined to date, the steroid hormone progesterone (P4), acting via the nuclear P4 receptors (PRs), is essential for the establishment and maintenance of pregnancy. P4 signals via PR throughout pregnancy to keep the myometrium relaxed. In most mammals, the abundance of P4 in the myometrium significantly decreases at term, leading to the removal of P4/PR signaling. However, in humans the abundance of P4 remains high through term and labor. Therefore, it has been hypothesized that labor occurs due to a functional withdrawal of P4/PR signaling.

Part of the pregnancy maintenance carried out by P4/PR is achieved by inhibiting the response of myometrial cells to pro-labor/pro-inflammatory stimuli.¹⁻⁵ This is important because parturition is an inflammatory process that is triggered by pro-inflammatory stimuli inducing tissue-level inflammation within the myometrium, decidua, and cervix. For this reason, many preterm births may be attributed to high stress, infection, or pre-existing conditions that stimulate the inflammatory response. Desensitization of uterine cells, and especially myometrial cells, to pro-inflammatory stimuli by P4/PR is therefore considered to be a principal mechanism for the P4 block to labor.⁶,⁷

Since the myometrium is a smooth muscle that exhibits states of relaxation and hypertrophy (the phenotype for most of pregnancy) as well as contractility (the phenotype during labor), pathways related to the former may also work to maintain quiescence and potentially block inflammatory signaling. The 3’,5’-cyclic adenosine monophosphate (cAMP) intracellular signaling cascade has been shown to promote uterine relaxation by inhibiting myometrial cell contractility.⁸,⁹ Typically, cAMP functions as a second messenger, acting through protein kinase A (PKA), a kinase that phosphorylates a diverse set of downstream targets to produce various cellular responses.¹⁰ PKA suppresses myometrial cell contractility by inhibiting myosin light chain kinase activity and decreasing the intracellular concentration of free Ca²⁺.¹¹⁻¹⁵ Studies in airway smooth muscle cells have shown that cAMP signaling inhibits the response to pro-inflammatory stimuli by suppressing the activation of nuclear factor kappa-light-chain-enhancer of activated B cells (NF-κB) and specific mitogen-activated protein kinases (MAPKs).¹⁶ Thus, hormones whose actions are mediated by the cAMP/PKA pathway may decrease myometrial cell contractility by inhibiting the contractile apparatus and the response to pro-inflammatory stimuli.

A recent study found that cAMP/PKA and P4/PR signaling in myometrial cells exerts a potent and synergistic anti-inflammatory effect.¹⁷ Those findings were consistent with other studies showing that treatment of myometrial cells derived from term uterus with forskolin (FSK), a natural bicyclic diterpene that increases intracellular cAMP by stimulating adenylyl cyclase, increased the anti-inflammatory actions of P4.³ Thus, anti-inflammatory synergism between cAMP and P4/PR signaling in myometrial cells appears to be a major mechanism by which P4/PR promotes uterine quiescence for most of pregnancy.

1.4 The Role of Transcriptomic Studies in Parturition Research

Transition of the myometrium from quiescence to labor is thought to be controlled at the transcriptional level through changes in the expression of specific genes whose products increase contractility and excitability. High-dimensional transcriptome profiling technologies provide an opportunity to examine the gene expression landscape within laboring and non-laboring myometrium to identify the gene sets controlling labor. This approach allows unbiased discovery of regulatory pathways and mechanisms associated with parturition.

Transcriptional differences between laboring and non-laboring human myometrium have been examined by multiple studies in the last two decades. Early studies examined a predefined gene set or performed functional genomics via smaller expression arrays.¹⁸⁻²⁰ Over the last decade, microarray²¹⁻²⁶ and, more recently, RNA sequencing (RNA-seq)²⁷ platforms were used to determine the extent of expression of thousands of genes in samples of term myometrium obtained at the time of cesarean section delivery performed in women with no clinical signs of labor or in women in active labor. These studies identified multiple differentially expressed genes between laboring and non-laboring myometrium, especially with technologies that allowed greater coverage of the transcriptome. Genes with previous ties to parturition (e.g. PTGS2, IL8, OXTR) were validated in multiple studies using various experimental techniques. In aggregate, the data showed that labor is associated with inflammatory signals, including genes/pathways related to cytokine signaling, chemotaxis, and immune response, and, to a lesser extent, revealed genes/pathways associated with non-labor, such as smooth muscle-related processes and cell adhesion. However, despite the generation of numerous comparable myometrium transcriptome datasets, a reproducible and reliable transcriptional signature and signaling network for human labor has not been identified, and a comparison of the variability and consistency between larger recent studies is lacking.

1.5 The Role of Experimental Cell Lines in Parturition Research

To expand human parturition research, cell line models have been employed in recent years. Cell lines open new avenues of investigation into pregnancy due to the level of control they allow, as opposed to the limited space for intervention in tissue samples. Myometrial cell line models include term cultured cells²⁸ and the cell lines PHM1-41²⁹, PHM1-31³⁰,³¹, and hTERT-HM.¹,²,⁴,¹⁷,³² Studies using these cell lines examine various aspects of myometrial bio-molecular actions relevant to parturition, including interactions/importance of the signaling and abundance of the PR isoforms (PR-A and PR-B) and the effect of specific pro-labor stimuli (e.g. prostaglandins and oxytocin). Results obtained from these studies have advanced the current understanding of reproductive biology. However, most studies focus only on the effect of a single intervention and/or on a very small set of genes/molecules of interest. Parturition is a highly complex physiological event involving multiple genes and biological pathways. Therefore, research is needed to understand not only how each of these pathways plays a role individually in the myometrium but also how they work in concert to achieve the transition from quiescence to contractility. Observation of these changes on a global scale (e.g. using genome-wide transcriptional profiling) can also help identify important aspects of the signaling machinery relevant to the transition to the labor state. Furthermore, to make meaningful connections between cell line observations and myometrial tissue, it is important to establish how well cell line models align at all levels of bio-molecular activity (gene expression, pathway activity, etc.) with true myometrial tissue phenotypes.

1.6 Hypotheses and Objectives

Motivated by these considerations, a comprehensive analysis was performed on transcriptional data from both clinical myometrial tissue and a myometrial cell line. A variety of complementary bioinformatics methods were employed to identify global gene expression changes occurring at the onset of labor in tissue expression data. This analysis utilized three top transcriptional datasets in the field to discover clear, reliable patterns of expression in term myometrium. A similar set of techniques was applied to cell line expression data to characterize the signaling of P4, cAMP, and inflammation individually as well as the interplay of these processes in the context of parturition. Finally, to aid in the extrapolation of the obtained cell line results to the in vivo process represented by the tissues, we calculated the similarity of transcriptional landscapes across tissue phenotypes and a cell line with various exposure conditions. Therefore, the main hypothesis of this work is that transcriptional changes in the myometrium between laboring and non-laboring women are important in human parturition and that these changes can be identified and characterized using a variety of bioinformatics techniques. Further analyses involving a myometrial cell line can improve this characterization and the overall understanding of parturition such that testable hypotheses can be generated in both clinical tissue samples and cell lines.

1.6.1 Characterizing Tissue Gene Expression

Three tissue gene expression datasets were integrated to identify the core genes, signaling networks, and biological pathways involved in the transition of term myometrium from quiescence to the laboring state. This was achieved by applying a series of supervised and unsupervised computational techniques to extract myometrial gene expression signatures involved in the onset of labor. First, standard differential expression analysis was carried out to obtain a high-level view of the discrepancies in gene expression across phenotypes and identify a high-confidence set of genes whose expression level is altered between labor and non-labor. Two machine learning algorithms were then used to train and test gene expression based classifiers for predicting labor. This approach also enabled the assessment of within-study sample quality and consistency as well as cross-study agreement and sample group variability. To further enhance the gene signatures identified at the first step, the learned model coefficients were used to identify the transcriptional signatures that are most predictive of phenotype for each dataset.

Unsupervised analyses were also used to investigate whether regulatory patterns extracted from integrated gene expression data contain information relating to given phenotypes. For this purpose, singular value decomposition (SVD) was applied to the combined expression matrix of all transcriptional studies to identify dominant patterns of expression, the genes that participate in the patterns, and the activity of the patterns in samples across studies. Overall, our analyses showed the presence of a robust transcriptional signature of labor in term myometrium, both when datasets were examined individually and together. This signature was also observed at the pathway level, where consistent enrichment was seen for pathways associated with labor (tumor necrosis factor alpha (TNFα) signaling via NF-κB, MTORC1 signaling, and inflammatory response) and non-labor (vascular smooth muscle contraction, progesterone response, and cyclic AMP (cAMP) signaling). Insights into the consistency of each dataset and sample were also obtained, as well as the variability within laboring and non-laboring clinical tissue sample groups. Finally, observed gene and pathway signatures were integrated with existing network information to build signaling networks of parturition, and these networks were validated with knowledge from the literature.

1.6.2 Characterizing Cell Line Gene Expression

The interplay of three signaling pathways known to affect the contractile phenotype of the human myometrium during pregnancy was examined: 1) the steroid hormone progesterone (P4) acting via the P4 receptors (PRs); 2) the pro-labor/pro-inflammatory cytokine interleukin-1β (IL-1β); and 3) the 3’,5’-cyclic adenosine monophosphate (cAMP) intracellular signaling cascade.

RNA sequencing followed by global, unbiased gene expression analysis was performed to determine the effects of each signaling pathway alone and in combination. Pathway enrichment analysis and regulatory network inference were used to fully characterize P4/PR, cAMP, and IL-1β signaling. The same techniques were then used to investigate the anti-inflammatory actions and synergistic effects of P4/PR and cAMP on responsiveness to IL-1β. Results show that the main action of P4/PR in myometrial cells is anti-inflammatory, working through the JUN transcription factor, and suppressing pathways involved in muscle tissue development, leukocyte migration, and MAPK signaling. cAMP also exhibited anti-inflammatory effects and, interestingly, promoted pathways involved in cell hypertrophy and smooth muscle contraction. P4/PR and cAMP acted synergistically to down-regulate genes involved in cytokine production, the STAT signaling cascade, and smooth muscle cell proliferation and to up-regulate a set of genes controlled by transcription factors involved in developmental processes and suppression of inflammation. The data suggest that P4/PR, cAMP, and IL-1β signaling share a subset of regulatory machinery, allowing for the observed inter-pathway regulation.

1.6.3 Determining the Agreement between Tissue and Cell Lines

Differentially expressed genes, enriched pathways, and inferred signaling networks were compared across tissue and cell line results. Similarities were shared at all levels and provided insights into which actions of different signaling processes create the tissue phenotypes. To more directly characterize these similarities, a technique introduced in the literature, the Tissue Similarity Index (TSI), was employed. It was observed that, at the transcriptional level, the hTERT-HM myometrial cell line exposed to the combination of forskolin and P4 was an accurate approximation for non-laboring myometrium and IL-1β-treated cells best resembled laboring myometrium. Therefore, this observed similarity serves as an initial validation for all the results outlined in chapter 4, in terms of their extrapolation toward in vivo myometrial action, and supports the use of this cell line model in future experiments regarding the importance of various signaling pathways in human parturition.

2 Materials and Methods

2.1 Data

2.1.1 Tissue Data

This work analyzed three gene expression datasets generated from myometrial biopsies of women at term (≥38 weeks) undergoing cesarean section (Table 2.1). Raw data files for the Warwick dataset were downloaded from the NCBI Gene Expression Omnibus (GEO), accession number GSE50599, the Detroit study files were obtained directly from the study authors, and the London study raw data was published to GEO (GSE80172). Full details of tissue collection procedures, sample phenotyping, and mRNA expression measurement can be found in the original research papers (Chan et al., 2014; Mittal et al., 2010)24,27. The following two sections describe the methods used to generate the London dataset, which were carried out by the lab of Dr. Mark Johnson at Imperial College London.

Sample Collection. Myometrial biopsies from women undergoing cesarean section were collected in accordance with the Declaration of Helsinki guidelines, and with approval from the local research ethics committee for Chelsea & Westminster Hospital (London, UK; Ethics No. 10/H0801/45). Informed written consent was obtained from all women who participated. Biopsies were excised from the upper margin of the incision made in the lower segment of the uterus, immediately washed with Dulbecco’s phosphate-buffered saline (Sigma) and dissected into pieces measuring approximately 2-3 mm³. For RNA study, biopsies were immersed in RNAlater (Sigma) within 6 minutes after biopsy excision from the uterus and stored at 4°C overnight, before being taken out of RNAlater solution to be frozen for long-term storage at -80°C. Labor was defined by the presence of regular uterine contractions associated with progressive cervical dilation. All specimens were categorized into three groups according to their labor stages: term not in labor (NL, n=8), term in initial or early labor (ILEa, defined as cervical dilatation <3 cm, n=8) and term in established labor (ILEs, defined as cervical dilatation >3 cm, n=6). The reasons NL patients underwent cesarean section included obstetric indications (e.g. persistent breech presentation) or maternal request due to tocophobia. Women in the labor group underwent cesarean section for indications including fetal distress, breech presentation and previous cesarean section. Women with gestational diabetes, preeclampsia, chorioamnionitis, or multiple pregnancy, and/or those given labor-augmenting drugs (prostaglandins and oxytocin), were excluded.

Study    Platform    Sample Group                Group Size  Clinical Definition of Labor
Warwick  RNA-seq     Non-Labor (W-NL)            N = 5       Contractions (<3min apart), cervical
                     In-Labor (W-IL)             N = 5       dilation (>2cm), membrane rupture
London   RNA-seq     Non-Labor (L-NL)            N = 8
                     Early-Labor (L-ILEa)        N = 8       Cervical dilation < 3cm (L-ILEa)
                     Established-Labor (L-ILEs)  N = 6       Cervical dilation > 3cm (L-ILEs)
Detroit  Microarray  Non-Labor (W-NL)            N = 20      Contractions (<10min apart), cervical
                     In-Labor (W-IL)             N = 19      dilation requiring hospitalization

Table 2.1. Myometrial tissue gene expression studies analyzed in this chapter. Three gene expression datasets collected from publicly available sources or collaborators are shown. Sample groups and names are consistent with the acronyms listed in column three throughout the chapter. Clinical definitions of labor represent how each sample was called phenotypically, based on the patient’s clinical presentation at the time of cesarean section, as chosen by the researchers collecting the samples.

RNA Extraction, Library Preparation, and Sequencing. For each sample, 60-100 mg of myometrium tissue was extracted in TRIzol (Life Technologies) by mechanical homogenisation in a Precellys 24 bead-based homogeniser using 5 cycles of 5000 rpm for 20 seconds, before chloroform treatment and centrifugation at 4°C. RNA was extracted from the aqueous phase of centrifuged homogenates using the TRIzol Plus RNA Purification kit (Life Technologies) with on-column DNase treatment prior to elution, all according to manufacturer’s instructions. Final RNA samples were stored at -80°C. The quantity and quality of RNA were measured using a Nanodrop ND-1000 spectrophotometer (LabTech), Qubit fluorimeter (Life Technologies) and Bioanalyser 2100 (Agilent Technologies). Preparation of cDNA libraries was carried out using the TruSeq Stranded mRNA Sample Preparation kit (Illumina), following the high-throughput sample (HT) protocol. The quantity and quality of cDNA libraries were also tested by a Qubit fluorimeter and Bioanalyser 2100. TruSeq Stranded libraries were then multiplexed and sequenced with an average of 42 million DNA fragments per sample (100 bp paired-end reads). Quality control was performed using FastQC software (version 0.11.2).

2.1.2 Cell Line Data

Data were obtained from cell culture studies performed in a telomerase immortalized human myometrial cell line, hTERT-HMA/B, expressing stable and independently inducible PR-A and PR-B as described in Amini et al. (2018)17. Briefly, cells were induced to express equal levels of PR-A and PR-B, and exposed to progesterone (P4), IL-1β, and forskolin (FSK, to stimulate cyclic AMP) alone and in all combinations (Fig. 2.1A). All experimental conditions were performed in triplicate. Stimulation of these pathways was confirmed by qRT-PCR showing that IL-1β induced expression of IL-8 and that this induced expression of IL-8 was inhibited by P4 in the presence of PR and by FSK (Fig. 2.1B). Figure 2.1C-D depicts the comparisons of various conditions performed throughout this chapter.

Figure 2.1. Study design and conceptual illustration of the framework for data analysis. A. The human myometrial hTERT-HMA/B cell line, expressing the progesterone receptors (PRs), was treated with progesterone (P4), forskolin (FSK, induces cAMP), and IL-1β individually and in all combinations of these three stimuli. Treatment was performed for 24 hours to establish steady-state cellular transcription. RNA sequencing was performed in triplicate on each condition, resulting in expression measurements for a total of 24 samples. B. Stimulation of the targeted pathways was confirmed by examining the effect of different treatment combinations on mRNA expression of IL-8, a commonly used downstream target of IL-1β and indicator of inflammatory signaling. IL-8 induced expression is achieved with IL-1β treatment. This induced expression is inhibited by P4 in the presence of PR. cAMP inhibits IL-1β induced expression of IL-8, and, finally, the strongest inhibition of IL-8 up-regulation is achieved by treating cells with both FSK and P4. These experiments were carried out by a collaborator, Dr. Peyvand Amini, and are shown here to improve clarity. C. Comparisons performed to characterize the signaling carried out by P4, cAMP, and IL-1β individually. The results of these comparisons are shown in Figures 4.1-4.4. D. Comparisons performed to examine the interplay between P4 and cAMP in the context of their anti-inflammatory effects. Samples treated by IL-1β and different combinations of P4 and FSK were first compared to vehicle. Subsequently, the genes, transcription factors, and processes identified in these comparisons were compared to each other to understand how P4 and cAMP work together or individually to suppress inflammatory processes. The results of these analyses are shown in Figures 4.5 and 4.6.

2.2 Methods

2.2.1 Raw Data Processing, Calculating Differential Expression, and Study Concatenation

RNA-seq datasets (Warwick, London, and cell line data) were processed separately using identical pipelines. FASTQ data files were analyzed using FASTQC (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc). Transcripts were aligned using tophat against the University of California Santa Cruz (UCSC) hg19 reference transcriptome.33 Transcripts were assembled and RNA abundance was estimated using cufflinks on the BAM files resulting from tophat alignment (version 2.2.1)34. A single run of the cuffdiff command was performed on all samples for each dataset. The Warwick and London tissue gene expression matrices were built using the genes.read_group_tracking cuffdiff output file. Genes having zero expression in three or more samples were removed prior to downstream analysis. For the cell line dataset, the cuffdiff output was further analyzed using cummeRbund in R (version 3.5.1).35 Differential expression analysis was performed for laboring vs non-laboring phenotypes for the tissue studies, and all sample group comparisons were performed for the cell line study. Genes exhibiting $|\text{fold change}| > 1.5$ and adjusted $p$-value $< 0.05$ (adjusted using the Benjamini-Hochberg procedure for multiple hypothesis test correction) were deemed significant. The cell line gene expression matrix was obtained using the repFpkmMatrix() function in the cummeRbund package. The microarray dataset (Detroit; Illumina) was processed as in Mittal et al. (2010).24 Briefly, the raw data file was log2 transformed followed by quantile normalization. Probes having expression within the background range in at least five samples were removed. Duplicate rows were also removed.
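The significance filter described above can be sketched as follows. This is a minimal illustration in plain Python, not the cuffdiff/cummeRbund code used in this work; the function names, the toy gene labels, and the treatment of fold change as a signed value compared by magnitude are assumptions made for the example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_genes(results, fc_cutoff=1.5, alpha=0.05):
    """results: list of (gene, signed_fold_change, raw_p_value) tuples."""
    adjusted = benjamini_hochberg([p for _, _, p in results])
    return [
        gene
        for (gene, fold_change, _), adj_p in zip(results, adjusted)
        if abs(fold_change) > fc_cutoff and adj_p < alpha
    ]


# Hypothetical toy input; gene labels and numbers are illustrative only.
toy_results = [
    ("GENE_A", 2.0, 0.001),
    ("GENE_B", -3.0, 0.002),
    ("GENE_C", 0.5, 0.003),   # small p-value but fold change too small
    ("GENE_D", 2.5, 0.8),     # large fold change but not significant
]
print(significant_genes(toy_results))
```

Both criteria must hold, so a gene with a large fold change but a weak adjusted p-value (or the reverse) is not retained.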

As will be described below, certain analyses involved working with an expression matrix containing all tissue samples (i.e. concatenation of the expression matrices from all 3 studies). In these cases, additional processing and normalization were performed on the individual matrices. This is due to the differences in the RNA-seq and microarray platforms, which each have specific characteristics and expression distributions. For this reason, any classification or comparison of datasets from these different platforms required prior normalization. The two RNA-seq studies, Warwick and London, were transformed using $\log_2(X + 1)$ in order to shift them to a similar scale as the Detroit microarray study, which, as mentioned above, was already log transformed. All datasets were then row and column normalized to achieve comparable distributions of expression across genes and samples. A descriptive diagram of the described normalization and downstream analysis pipeline (see next sections) performed on the tissue studies (results in chapter 3) is provided to improve clarity (Fig. 2.2).
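A small sketch of this preprocessing is given below. The log2(X + 1) step follows the text directly; the exact form of the row and column normalization is not specified here, so the sketch assumes z-score standardization of each gene (row) followed by each sample (column). Function names and the toy matrix are hypothetical.

```python
import math
import statistics


def log2_plus_one(matrix):
    """Apply log2(x + 1) to every entry of a genes-by-samples matrix."""
    return [[math.log2(x + 1.0) for x in row] for row in matrix]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def zscore_rows(matrix):
    """Center each row to mean 0 and scale to unit standard deviation."""
    out = []
    for row in matrix:
        mu = statistics.fmean(row)
        sd = statistics.pstdev(row)
        # Constant rows carry no signal; map them to zeros (assumption).
        out.append([(x - mu) / sd if sd > 0 else 0.0 for x in row])
    return out


def row_and_column_normalize(matrix):
    """Assumed scheme: standardize genes (rows), then samples (columns)."""
    by_gene = zscore_rows(matrix)
    return transpose(zscore_rows(transpose(by_gene)))


# Hypothetical FPKM-like values: 3 genes (rows) x 3 samples (columns).
toy_fpkm = [[0.0, 1.0, 3.0], [7.0, 15.0, 0.0], [1.0, 0.0, 7.0]]
normalized = row_and_column_normalize(log2_plus_one(toy_fpkm))
```

Adding 1 before taking the logarithm keeps zero counts finite, which is why the transform is applied to the RNA-seq matrices before they are compared with the already log-transformed microarray data.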

Figure 2.2. Data and methods used in bioinformatics pipeline. Raw gene expression was processed in multiple steps and combined with curated information from public databases to carry out supervised and unsupervised approaches to identify transcriptional signatures, assess their reproducibility, and identify key signaling pathways and interactions involved in parturition.

2.2.2 Classification

The R package glmnet was employed to perform classification with the least absolute shrinkage and selection operator (LASSO) and elastic net (EN) algorithms.36 Classification was performed using the cv.glmnet function, setting the “family” argument to “binomial” for Warwick and Detroit (two sample groups: labor and non-labor) and “multinomial” for London (three sample groups: non-labor, early labor, and established labor). Dataset and sample group variability were assessed by performing k-fold cross-validation on each study with k=3 for Warwick and k=5 for London and Detroit. A smaller k value was chosen for Warwick as the study consisted of only 10 samples, while 5 was chosen for London and Detroit as 5-fold cross-validation is common practice in many classification problems. For drawing of Receiver Operating Characteristic (ROC) curves and calculation of the Area Under the ROC Curve (AUC; measure of classification performance), the London dataset was converted to a binary problem by combining the early labor (L-ILEa) and established labor (L-ILEs) sample groups into a single laboring group, whereas the multinomial method was used to calculate sample classification frequencies. Due to the variability of cross-validation, one hundred rounds of training and prediction were carried out for each classification task. Model features (genes) and their coefficients were recorded for each training run. To assess the importance of each gene in discriminating labor from non-labor samples, the number of models in which the gene was retained as a feature and the gene’s mean model coefficient were computed.
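The per-gene summary across repeated training runs can be illustrated with a short sketch. It assumes each fitted model has already been reduced to a mapping from gene to coefficient, and that the mean coefficient is taken over the models that retained the gene; both are assumptions for illustration, and the names are hypothetical rather than glmnet output.

```python
from collections import defaultdict


def summarize_features(models):
    """Summarize sparse models from repeated training runs.

    models: list of dicts mapping gene -> coefficient, one per run.
    Returns gene -> (number of models retaining the gene,
                     mean coefficient over those models).
    """
    counts = defaultdict(int)
    sums = defaultdict(float)
    for coefficients in models:
        for gene, beta in coefficients.items():
            if beta != 0.0:  # a zero coefficient means the gene was dropped
                counts[gene] += 1
                sums[gene] += beta
    return {gene: (counts[gene], sums[gene] / counts[gene]) for gene in counts}


# Hypothetical coefficients from three training runs.
toy_models = [
    {"GENE_A": 0.5, "GENE_B": -0.25},
    {"GENE_A": 1.5},
    {"GENE_A": 1.0, "GENE_C": 0.0},
]
summary = summarize_features(toy_models)
```

Genes retained in most runs with a coefficient of consistent sign are the most stable discriminators, while genes appearing in only a few runs are more likely to reflect a particular split of the samples.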

To assess the reproducibility of the machine learning models, cross-classification was also performed, namely by making predictions on each of the three datasets using the model that was trained on each of the other two studies individually. This resulted in six cross-classification experiments, in which each dataset served as the training data in two experiments and as the test data in two experiments. Except for the London dataset, which was reduced to the L-NL and L-ILEs samples for consistency with the other two studies, the full datasets were used in training, which included k-fold cross-validation to obtain optimal model parameters as calculated by the cv.glmnet function within the glmnet package. All studies were binomial classification problems and the training and prediction were performed one hundred times.

2.2.3 Singular Value Decomposition

To extract common patterns and gene expression signatures across the three studies, singular value decomposition (SVD) was used.37 SVD is an unsupervised mathematical approach that maps a high dimensional matrix to lower dimensional space and identifies patterns of variability within the data. For a given gene expression matrix $M$ of size $m \times n$ ($m$ genes and $n$ samples), SVD decomposes this matrix into three matrices $U$, $\Sigma$, and $V^T$ (the transpose of $V$) such that $M = U \Sigma V^T$. $\Sigma$ is a diagonal matrix consisting of singular values sorted from largest (first entry in matrix) to smallest (last entry). These singular values correspond to the strength of each pattern in the data (i.e. how well the pattern approximates the variability within $M$). $U$ and $V$ contain the left and right singular vectors, where columns of $U$ are considered eigen-assays and rows of $V^T$ eigen-genes.38

Using the normalized gene expression matrices (see previous section) for the three studies, a combined gene expression matrix $M$ consisting of all 71 samples (10 Warwick + 22 London + 39 Detroit) and the 12,622 genes measured in all three studies was constructed. Singular value decomposition was then applied to $M$. Singular values were assessed to determine the significant patterns within the expression matrix by observing the diagonal entries of $\Sigma$. The dominant gene pattern was obtained by plotting the first column of $U$ and the dominant sample pattern is represented by the first row of $V^T$.
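To make the dominant pattern concrete, the sketch below approximates the first column of U, the largest singular value, and the first row of V^T using power iteration. This is a swapped-in teaching technique in plain Python, not the SVD routine used in this work (a full decomposition from a numerical library would normally be used); the function names and toy matrix are hypothetical.

```python
import math
import random


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def norm(vector):
    return math.sqrt(sum(x * x for x in vector))


def dominant_singular_triplet(M, iterations=500, seed=0):
    """Approximate (u, sigma, v) for the largest singular value of M.

    M is a list of m rows (genes), each with n entries (samples).
    u has one loading per gene, v has one loading per sample.
    """
    Mt = transpose(M)
    rng = random.Random(seed)
    # Random start avoids beginning orthogonal to the dominant pattern.
    v = [rng.gauss(0.0, 1.0) for _ in range(len(Mt))]
    scale = norm(v)
    v = [x / scale for x in v]
    u, sigma = [0.0] * len(M), 0.0
    for _ in range(iterations):
        u = matvec(M, v)
        u_norm = norm(u)
        if u_norm == 0.0:  # M v = 0, nothing to extract
            break
        u = [x / u_norm for x in u]
        w = matvec(Mt, u)
        sigma = norm(w)
        if sigma == 0.0:
            break
        v = [x / sigma for x in w]
    return u, sigma, v


# Toy 3 genes x 2 samples matrix with known singular values 3 and 1.
toy_M = [[3.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
u1, sigma1, v1 = dominant_singular_triplet(toy_M)
```

The sign of a singular vector pair is arbitrary, so only the relative signs of the gene loadings in u and the sample loadings in v are meaningful when relating a pattern to a phenotype.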

For pathway-specific SVD analysis, the same procedure as above was performed, the only difference being the expression matrix on which the analysis was done. Genes for each pathway or gene set were extracted and the gene expression matrix was reduced from the 12,622 shared, measured genes to only the genes involved in the pathway. The procedure was done individually for each pathway.

2.2.4 Enrichment Analysis

Tissue Data. Pathway enrichment was performed in two ways to better interpret the gene expression signatures obtained from the differential expression analysis. A high-level approach was used by inputting the gene symbols of the 126 high-confidence genes, obtained from comparing differential expression across all studies (Fig. 3.1), into the web-tool Enrichr to determine which pathways were most strongly and consistently altered between laboring and non-laboring myometrium.39 Enrichment was assessed using the KEGG 2016 result under the Pathway tab on the Enrichr results webpage.40 Terms were sorted by adjusted p-value. A within-dataset approach was used by performing Gene Set Variation Analysis (GSVA) to identify the pathways most differentially expressed between labor and non-labor conditions specific to each dataset. GSVA is a publicly available tool that is implemented in R to identify the gene sets that are collectively differentially expressed between two different experimental conditions41. GSVA is a non-parametric method applied to the full expression matrix as opposed to working with a list of differential expression scores for each gene. It transforms the gene by sample matrix to a gene set by sample matrix. This allows expression information about every gene (including non-significant ones) in a given gene set to be used to assess the collective differential expression of the set and to calculate an enrichment fold change for each gene set. Application of GSVA across all three tissue studies allowed for observation of signature consistency at the pathway level. The hallmark gene sets from the molecular signatures database (MSigDB; version 5.2) were obtained via the Broad Institute website and loaded into R using the getGmt() function42,43. GSVA was then applied individually to the expression matrices of the Warwick, London, and Detroit studies after IQR filtering to remove genes with low expression variance across all samples. The raw expression matrices from the Warwick and London studies were log2() transformed and run with the argument RNA-seq=FALSE, as recommended in the GSVA documentation. Gene sets of size 5 to 500 were allowed in the enrichment analysis. Finally, the limma package was used to fit models to each gene set, and significance was calculated by applying an empirical Bayes statistical test44. P-values were adjusted using the Benjamini-Hochberg procedure for multiple hypothesis testing. Clustering of the enrichment results was performed on the top 30 gene sets (selected by median p-value over the 4 study comparisons) using the pheatmap package with correlation between fold change enrichment as the distance measure and complete linkage as the clustering method.

For the hypothesis-driven pathway analysis, MSigDB was used to identify and build a set of parturition-relevant gene sets (P4, inflammation, cAMP/smooth muscle) from multiple sources (GO, KEGG, etc.). GSVA was then implemented on 32 gene sets for each study individually. Of this initial set, 26 met the minimum size requirement of 5 genes (previous paragraph) after IQR filtering. Results for all 26 gene sets were then clustered by GSVA fold change values as with the global analysis.
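The gene set size restriction can be illustrated with a brief sketch: each gene set is first reduced to the genes that remain in the expression matrix after filtering, and only sets with 5 to 500 remaining genes are kept. This is an illustrative stand-in written in plain Python, not the GSVA implementation; the function name and toy identifiers are hypothetical.

```python
def filter_gene_sets(gene_sets, measured_genes, min_size=5, max_size=500):
    """Keep gene sets whose overlap with the measured genes is within bounds.

    gene_sets: dict mapping set name -> iterable of gene symbols.
    measured_genes: genes remaining in the expression matrix after filtering.
    """
    measured = set(measured_genes)
    kept = {}
    for name, genes in gene_sets.items():
        present = sorted(set(genes) & measured)
        if min_size <= len(present) <= max_size:
            kept[name] = present
    return kept


# Hypothetical example: six genes survive filtering.
toy_measured = ["G1", "G2", "G3", "G4", "G5", "G6"]
toy_sets = {
    "SET_TOO_SMALL": ["G1", "G2", "G3"],
    "SET_KEPT": ["G1", "G2", "G3", "G4", "G5", "G6"],
    # Six members, but two are not measured, leaving four.
    "SET_LOST_GENES": ["G1", "G2", "G3", "G4", "X1", "X2"],
}
kept_sets = filter_gene_sets(toy_sets, toy_measured)
```

A set can therefore be excluded even though its curated definition contains enough genes, which is consistent with only a subset of the initial gene sets remaining after filtering.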

In all figures, the log2 enrichment fold change is used. This is achieved by calling the gsva() function on the gene expression matrix to obtain a matrix whose entries are the enrichment scores for a given pathway (row) in a given sample (column). These are then fit to an empirical Bayes model to identify significantly associated gene sets and calculate the enrichment fold change with respect to the condition of interest (e.g. labor, P4+FSK, etc.).

Cell Line Data. GSVA was used to identify pathways and transcription factors that were enriched in differentially expressed genes/targets between treated cell line conditions and the control condition. The input expression matrix derived from the cell line RNA-seq contained 16,500 genes and 24 samples after removal of genes exhibiting zero expression in more than 3 samples. The Gene Ontology Biological Processes (GO BPs) and Transcription Factor-Targets gene set collections were obtained from the molecular signatures database (MSigDB) via the Broad Institute website43. Gene sets for the FOS and JUN transcription factors were missing from the Transcription Factor-Targets gene sets. Because these factors are of considerable interest and relevance in inflammatory signaling, and therefore parturition research, sets for FOS and JUN were created using the curated transcription factor network employed in the regulatory network enrichment analysis method (described below; includes data from the TRED, TRRUST, TFACTS, and OREGANNO databases). These gene sets were loaded into R using the getGmt() function. The same procedure employed for the tissue data was applied here (log2() transformation, IQR filtering, gene set size restriction, fitting models with limma, and p-value adjustment using the Benjamini-Hochberg procedure). GSVA was applied for each condition against the control sample group.

To gain insights into how targets of TFs enriched in multiple conditions were regulated, the overlap of their significantly differentially expressed targets in cells treated with P4, FSK, and IL-1β individually as compared to control was calculated. For each significant target of a given TF in a given comparison, its direction of fold change (up- or down-regulated) was identified using the original differential expression calculation. The sets of up- and down-regulated targets for each TF were compared across different cell line treatment groups.
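The target comparison can be sketched with simple set operations. The example assumes that, for each treatment versus control comparison, the significant genes are available as a mapping from gene to signed fold change; the function names, treatment labels, and target identifiers are hypothetical.

```python
def split_by_direction(tf_targets, fold_changes):
    """Split a TF's significant targets into up- and down-regulated sets.

    fold_changes: gene -> signed fold change, significant genes only.
    Targets absent from fold_changes are not significant and are ignored.
    """
    up = {g for g in tf_targets if fold_changes.get(g, 0.0) > 0}
    down = {g for g in tf_targets if fold_changes.get(g, 0.0) < 0}
    return up, down


def shared_direction(tf_targets, comparisons):
    """Targets regulated in the same direction in every comparison.

    comparisons: treatment name -> {gene: signed fold change}.
    """
    if not comparisons:
        return set(), set()
    ups, downs = [], []
    for fold_changes in comparisons.values():
        up, down = split_by_direction(tf_targets, fold_changes)
        ups.append(up)
        downs.append(down)
    return set.intersection(*ups), set.intersection(*downs)


# Hypothetical targets of one TF and toy results for three treatments.
toy_targets = {"T1", "T2", "T3", "T4"}
toy_comparisons = {
    "P4": {"T1": -2.0, "T2": 1.8, "T3": -1.6},
    "FSK": {"T1": -1.7, "T2": 2.2, "T4": 3.0},
    "IL1B": {"T1": -2.5, "T2": -1.9, "T3": 2.0},
}
shared_up, shared_down = shared_direction(toy_targets, toy_comparisons)
```

Targets that change in opposite directions between treatments (T2 and T3 in the toy data) drop out of both shared sets, which is how opposing regulation of the same TF targets becomes visible.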

2.2.5 Annotation of Vascular Smooth Muscle Contraction Pathway Diagram

The vascular smooth muscle contraction (VSMC) pathway diagram was retrieved from the KEGG pathway database. The pathway information was obtained by downloading the associated KEGG Markup Language (KGML) file for the “Homo sapiens (human)” option in the organism drop-down bar on the KEGG pathway website. This pathway was imported into Cytoscape version 3.3. Node (gene) colors were assigned by performing SVD on the Warwick gene expression matrix reduced to the genes in the VSMC pathway (see Singular Value Decomposition section of this chapter). The Warwick study was used due to its observed higher quality in the classification tasks and because it allowed for population of the greatest proportion of nodes in the pathway with a gene loading value. These gene loadings for the dominant singular vector were imported into Cytoscape as a node table and used in a continuous mapping for the node color attribute (blue represents higher association with non-labor, red higher association with laboring samples) within the “Style” tab.

2.2.6 Signaling Network Construction

Parturition signaling networks were created by performing shortest path calculations on a curated signaling network from Ritz et al. (2016) (PathLinker version 1.1).45 Briefly, this network was built by integrating multiple protein-protein interaction networks with pathway databases to create a high-confidence, directed signaling network. Edges in this network were weighted based on experimental evidence and commonality of gene ontology annotations. Nodes were then weighted based on the gene loadings from the first singular vector obtained from applying SVD on the normalized, combined tissue expression matrix (12,622 genes × 71 tissue samples). The absolute values of the raw gene loadings were normalized to the range [0,1] to obtain appropriate node costs (nodes with cost near zero were genes with highly negative or highly positive singular vector loadings, i.e. the differentially expressed genes, making them more likely to be chosen in the final paths).
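One way to realize this node weighting is sketched below. The text states that absolute loadings are scaled to [0,1] and that strongly loaded genes end up with costs near zero; the sketch therefore assumes a min-max scaling followed by inversion (cost = 1 - scaled value). That inversion, the function name, and the toy loadings are assumptions for illustration.

```python
def node_costs(loadings):
    """Map signed gene loadings to node costs in [0, 1].

    Genes with large absolute loadings (strongly associated with either
    phenotype) receive costs near 0; weakly loaded genes receive costs near 1.
    """
    magnitudes = {gene: abs(value) for gene, value in loadings.items()}
    lowest = min(magnitudes.values())
    span = max(magnitudes.values()) - lowest
    costs = {}
    for gene, magnitude in magnitudes.items():
        scaled = (magnitude - lowest) / span if span > 0 else 0.0
        costs[gene] = 1.0 - scaled  # assumed inversion: strong loading, low cost
    return costs


# Hypothetical loadings from a dominant singular vector.
toy_loadings = {"GENE_A": -0.8, "GENE_B": 0.1, "GENE_C": 0.8, "GENE_D": 0.45}
toy_costs = node_costs(toy_loadings)
```

Because the sign is discarded, genes associated with labor and genes associated with non-labor are treated as equally attractive intermediate nodes in this general network.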

Shortest paths were then computed from upstream pathway initiator proteins to downstream target genes. Specifically, Dijkstra’s algorithm was used to calculate the shortest path between each pathway initiator and each target gene.46 Additionally, the last step in each path was required to be a known transcription factor-target gene interaction (the target being one of the downstream target genes based on the gene expression studies), where transcription factor interactions were identified using the TRED47, TFactS48, Oreganno49, and TRRUST50 databases (combined interactions from Chouvardas et al., 201651). The pathway initiator proteins chosen here were PGR, interleukin 1 receptor type 1 (IL1R1), and adenylyl cyclase (ADCY), representing the first protein in the signaling cascade for P4 signaling, inflammatory signaling, and cAMP signaling, respectively. These pathways were believed to play a role in controlling parturition based on existing hypotheses and our targeted pathway analyses.
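A minimal Python sketch of a node- and edge-weighted shortest path search with the transcription factor constraint is given below. It is illustrative only. The graph, costs, and names are invented, and two design choices are assumptions rather than details from the text: the constraint is enforced by finding the cheapest route to any transcription factor known to regulate the target and then appending the target, and the final transcription factor-target edge carries a hypothetical fixed cost (`tf_edge_cost`).

```python
import heapq


def dijkstra(graph, node_cost, source):
    """Cheapest cost from source to each reachable node (node + edge weights)."""
    dist = {source: node_cost.get(source, 1.0)}
    prev = {}
    heap = [(dist[source], source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, edge_weight in graph.get(u, ()):
            candidate = d + edge_weight + node_cost.get(v, 1.0)
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                prev[v] = u
                heapq.heappush(heap, (candidate, v))
    return dist, prev


def path_to_target(graph, node_cost, tf_targets, source, target, tf_edge_cost=0.0):
    """Cheapest path whose last step is a known TF -> target interaction."""
    dist, prev = dijkstra(graph, node_cost, source)
    best = None
    for tf, targets in tf_targets.items():
        if target in targets and tf in dist:
            total = dist[tf] + tf_edge_cost + node_cost.get(target, 1.0)
            if best is None or total < best[0]:
                best = (total, tf)
    if best is None:
        return None  # no regulator of the target is reachable
    path = [best[1]]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return best[0], path + [target]
```

Dijkstra's algorithm requires non-negative weights, which holds here because node costs are scaled to [0,1]; the edge weights are assumed to be non-negative as well.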

ADCY1-9 all connect to the same set of six proteins directly downstream in the signaling network, meaning that each ADCY has the same interactions or path options when building the network. Therefore, any of these ADCY nodes can be chosen as the start of the path. ADCY9 was chosen out of these nine possible proteins due to its loading for the dominant singular vector, which was the closest in magnitude to the loadings for PGR and IL1R1. This helped ensure that the cAMP portion of the obtained networks was not unfairly diminished (increased) by a low (high) receptor node value when restricting the final network based on cost of paths.

Three separate gene sets were used to seed the shortest path calculations as the target genes. For the general parturition network, the top 50 genes associated with each phenotype based on the loading value from the first singular vector on the full tissue expression matrix were used (100 total nodes). For the non-laboring network, the top 50 genes associated with the non-labor phenotype (again, using gene values from the SVD analysis) were used as the downstream targets. Finally, the top 50 genes associated with the labor phenotype were used to create the laboring signaling network. For these latter two shortest path calculations, the node weights of all genes in the original network were altered to favor paths incorporating gene products related to the chosen downstream genes. In other words, for the non-laboring network, the raw gene loadings from SVD were again normalized to the range [0,1] but without first taking absolute values. This ensured that genes highly associated with the non-labor signal have a low cost in the network while genes associated with the labor signal have a high cost. These node weights were then reversed for the laboring network. Finally, a path cost threshold was applied to the first of these three networks to obtain a network of reasonable size (Fig. 3.7A). For this set of target genes, the cost of each unique path (the sum of node and edge weights making up the path from initiator protein to target gene) was treated as a distribution and ranked from least to most costly. The final network was then obtained by discarding the most costly 50% of paths.
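The phenotype-specific node costs and the path cost threshold can be sketched as follows. This is a hedged illustration with invented names and values. It assumes that negative loadings correspond to the non-labor signal (consistent with the sample loadings reported in Chapter 3), and the handling of an odd number of paths (keeping the extra path) is an arbitrary choice not specified in the text.

```python
def signed_costs(loadings, favor="non-labor"):
    """Min-max scale raw (signed) loadings to [0, 1] as node costs.

    Assumption: negative loadings mark non-labor genes, so they become cheap
    when favor="non-labor"; the costs are reversed when favor="labor".
    """
    low, high = min(loadings.values()), max(loadings.values())
    span = (high - low) or 1.0
    scaled = {gene: (value - low) / span for gene, value in loadings.items()}
    if favor == "labor":
        return {gene: 1.0 - cost for gene, cost in scaled.items()}
    return scaled


def keep_cheapest_half(paths):
    """Rank (cost, path) pairs by cost and discard the most costly half."""
    ranked = sorted(paths, key=lambda item: item[0])
    return ranked[: len(ranked) - len(ranked) // 2]
```

For example, `keep_cheapest_half([(3.0, ["a"]), (1.0, ["b"]), (2.0, ["c"]), (4.0, ["d"])])` retains the two paths with costs 1.0 and 2.0.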

2.2.7 Regulatory Network Analysis

Regulatory network enrichment analysis (RNEA) was employed on the cell line dataset (results in chapter 4) to infer the genomic signaling induced by each stimulus, as well as to compare the networks of different stimuli.51 For RNEA, fold changes and adjusted p-values from differential expression were used to build a regulatory network specific to an experimental condition, drawing from curated transcription factor (TF) target data. Differentially expressed TFs and their targets were included in the network, and non-significant TFs along a pathway that have a significant gene both directly upstream and downstream were also included in the final output network. Differential expression statistics were obtained for all genes using the diffData() function in the cummeRbund package. These were analyzed using the RNEA() function in R, requiring a fold change greater than 1.5 and an adjusted p-value less than 0.05 for significance. Networks were visualized using Cytoscape, with fold change values mapped to the included nodes via the same input data that was used for RNEA.52 The networks in Figures 4.3 and 4.6 are subnetworks of the full RNEA result, obtained using stricter input arguments for fold change and p-value for the sake of identifying the most important changes. The hierarchical layout was applied for all networks. Nodes in Figure 4.6 were colored based on gene regulation across multiple comparisons of differential expression (e.g. all IL-1β conditions compared to control for Figure 4.6A).

2.2.8 Comparison of Myometrial Cell Line and Tissue

To compare the tissue phenotypes and treated cell lines, the Warwick and cell line gene expression matrices were used. The two matrices were individually log2() normalized, reduced to the set of genes measured in both studies, and concatenated such that the resulting expression matrix contained 34 samples (10 tissue, 24 cell line) and 15,388 genes ($|G_{\mathrm{comb}}| = 15388$ with $G_{\mathrm{comb}} = G_{\mathrm{tissue}} \cap G_{\mathrm{cell\ line}}$, where $G$ is the set of genes in a gene expression matrix). The R function removeBatchEffect() (part of the limma package) was then used to further normalize the expression matrix to make the two datasets comparable. The data were split back into the tissue and cell line matrices, which were then individually row (gene) centered using the function scale() with the input arguments center = TRUE and scale = FALSE.
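The gene intersection, concatenation, and row-centering steps can be illustrated with a short Python sketch. It is a simplified stand-in for the R workflow, not a reproduction of it: the log2 transformation and the limma batch-effect removal are deliberately omitted, matrices are represented as hypothetical dictionaries mapping gene names to lists of sample values, and all function names are invented.

```python
def shared_genes(tissue, cell_line):
    """Genes measured in both studies (G_tissue intersected with G_cell_line)."""
    return sorted(set(tissue) & set(cell_line))


def concatenate(tissue, cell_line):
    """Combined matrix restricted to shared genes; tissue samples come first."""
    return {g: tissue[g] + cell_line[g] for g in shared_genes(tissue, cell_line)}


def center_rows(matrix):
    """Subtract each gene's mean across samples (centering without scaling)."""
    centered = {}
    for gene, values in matrix.items():
        mean = sum(values) / len(values)
        centered[gene] = [v - mean for v in values]
    return centered
```

Centering each gene around zero means that the subsequent decomposition reflects variation between samples rather than differences in baseline expression level between genes.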

To quantify the similarity in expression between the tissue and cell line samples, a method introduced by Sandberg and Ernberg (2005) called tissue similarity index (TSI) was implemented in R.53 Briefly, singular value decomposition (SVD) was applied to the tissue gene expression matrix after batch removal and row-centering. A phenotype centroid was calculated for the laboring and non-laboring sample groups by averaging the projected tissue samples using all singular vectors. The cell line samples were then projected into the SVD space, again using all singular vectors derived from the tissue data. Correlations between the projected cell line samples and the phenotype centroids were calculated, and median and standard deviation values over treatment conditions were obtained. Visualization of sample clustering was performed using the first two singular vectors.
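One way to write the TSI calculation compactly is given below. This is a sketch under stated assumptions: the symbols are introduced here for illustration, the tissue matrix $T$ is taken to have genes in rows and samples in columns, and the use of Pearson correlation and of an unscaled projection onto the gene-space singular vectors are assumptions, since the text does not specify either.

```latex
% Row-centered tissue matrix T (genes x samples) and its SVD
T = U \Sigma V^{\top}

% Projection of any sample x (a tissue or cell line expression vector)
p(x) = U^{\top} x

% Centroid of phenotype k (labor or non-labor), with S_k its tissue samples
c_k = \frac{1}{|S_k|} \sum_{s \in S_k} p(t_s)

% Similarity of a cell line sample y to phenotype k
\mathrm{TSI}_k(y) = \operatorname{corr}\bigl(p(y),\, c_k\bigr)
```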

Tissue Similarity Index was computed in the same manner on gene expression matrices reduced to specific sets of genes after batch removal and before row centering. Genes differentially expressed with |fold change| > 5 and adjusted p-value < 0.001 in P4, FSK, and IL-1β treated cells versus CTRL (145 genes) and in labor versus non-labor tissue (151 genes) were identified for TSI calculation. Key cell line genes were manually selected by identifying highly differentially expressed cell line genes across multiple individual stimuli versus control comparisons that additionally occurred in the visualized RNEA subnetworks. A heatmap of key genes was generated using the pheatmap package in R. Median expression values were obtained for each sample type using the same input matrix as was used with TSI. Rows (genes) were clustered by correlation using the complete clustering method. Sample clustering based on TSI results was visualized using ggplot2. Tissue samples, phenotype centroids, and cell line samples were projected onto the first two singular vectors from the SVD calculation using the key cell line genes.

3 Characterizing Transcriptional Alterations at the Onset of Labor in Myometrial Tissue

3.1 Differential Gene Expression in Quiescent and Laboring Myometrium

Differential gene expression analysis on three transcription studies (Table 2.1) was used to assess the transcriptional landscape of quiescent and laboring myometrium. The transcriptome datasets included two previously published high-quality studies.24,27 The Chan et al. (2014) dataset is from an RNA-seq analysis of non-labor (NL; n=5) and in labor (IL; n=5) term myometrium collected from patients in Warwick, UK. This dataset was designated Warwick (W)-NL and W-IL. The Mittal et al. (2010) dataset is from a microarray analysis of NL (n=20) and IL (n=19) term myometrium collected from patients in Detroit, MI, USA. This dataset was designated Detroit (D)-NL and D-IL. A third RNA-seq dataset using myometrium collected from 22 women at term cesarean section delivery undertaken at the Chelsea & Westminster Hospital, London, UK was also used. This study, designated London (L), included L-NL (n=8) and L-IL (n=14) myometrium. The L-IL tissue was further stratified into early (Ea) stages of active labor (L-ILEa: defined as contractions with cervical dilation <3 cm, n=8) and established (Es) labor (L-ILEs: defined as contractions with cervical dilation >3 cm, n=6). (See Materials and Methods for more detail regarding sample collection and RNA preparation and measurement.)

For the London dataset, significant (|fold change| > 1.5 and p-value < 0.05) differential expression was detected for 417 genes between L-NL and L-ILEa (397 up and 20 down in L-ILEa), 547 between L-NL and L-ILEs (440 up and 107 down in L-ILEs), and, surprisingly, none between L-ILEs and L-ILEa. The absence of a significant difference between the transcriptomes of L-ILEa and L-ILEs suggests that labor-associated changes in myometrial gene expression occur early in the labor process and persist through its later stages.

Analysis of the Warwick dataset identified 2468 genes differentially expressed (1227 increased and 1241 decreased based on mRNA abundance) in W-IL compared with W-NL myometrium. In the parent publication, the results of three separate software packages were used to calculate differential expression to improve confidence.27 Results shown in this section are comparable in the number of significant genes to one of these packages, edgeR.

Analysis of the Detroit dataset identified 463 genes that were differentially expressed (267 up and 196 down) in D-IL compared with D-NL myometrium. This is highly consistent with the number (471) obtained in the parent publication.24 Figure 3.1 shows the overlap of differentially expressed genes across the four within-study phenotype comparisons. For the Detroit and London comparisons, a similar number of genes were differentially expressed between the NL and IL tissues, with the majority of genes up-regulated in the IL myometrium. The Warwick comparison showed a much stronger change in expression, with almost five times as many differentially expressed genes as the other datasets. Pairwise comparisons between all labor conditions in the three studies identified 358 shared genes for Warwick and Detroit, 339 for Warwick and London (L-ILEa), 410 for Warwick and London (L-ILEs), 156 for Detroit and London (L-ILEa), 180 for Detroit and London (L-ILEs), and 282 for L-ILEa and L-ILEs.

126 genes (124 increased and 2 decreased) were identified to be differentially expressed in all comparisons (see Appendix A Table 1 for a list of these genes with their median log2(fold change) and median adjusted p-value). Thus, there is high confidence that these 126 genes are true positives for identifying biological processes that likely initiate and/or maintain the labor phenotype.

Figure 3.1. Venn diagram of differential expression in Warwick, Detroit, and London. Differentially expressed genes between non-labor and in-labor sample groups for each dataset are compared to calculate similarity across studies.
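The overlap calculations summarized in Figure 3.1 amount to set intersections over the significant gene lists. A minimal Python sketch follows; the function names and the toy gene lists in the usage note are invented for illustration and do not reflect the actual data.

```python
from itertools import combinations


def high_confidence_genes(comparisons):
    """Genes called differentially expressed in every comparison."""
    gene_sets = [set(genes) for genes in comparisons.values()]
    return set.intersection(*gene_sets)


def pairwise_overlaps(comparisons):
    """Number of shared differentially expressed genes per pair of comparisons."""
    return {
        (a, b): len(set(comparisons[a]) & set(comparisons[b]))
        for a, b in combinations(comparisons, 2)
    }
```

Given, for instance, `{"W": ["g1", "g2", "g3"], "D": ["g2", "g3"], "L-ILEa": ["g3", "g4"]}`, the first function returns `{"g3"}` and the second reports the size of each pairwise intersection.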

Using these high-confidence genes, pathway enrichment analysis was performed using the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways with the online tool EnrichR. This analysis identified multiple significant pathways including TNF signaling (adjusted p-value = 1.2e-9), cytokine-cytokine receptor interaction (4.6e-9), Jak-STAT signaling (2.4e-7), NOD-like receptor signaling (2.3e-4), and chemokine signaling (8.9e-4). These results indicate a strong inflammatory component, which is supported by previous studies and suggests that a significant portion of the transcriptional changes shared by all studies represents an increase of inflammatory signaling with the onset of labor.

3.2 Within-study Classification

To identify the best set of genes that distinguish labor from non-labor and to assess the consistency of samples within and across datasets, two sparse regression-based classification approaches that are well established in machine learning were employed: 1) least absolute shrinkage and selection operator (LASSO), and 2) elastic net (EN).

LASSO utilizes a regularization term, the L1 norm of model coefficients, penalizing models that contain more features.54 This results in more parsimonious predictive models, which allows for easier interpretation of the final classifiers (i.e. small number of genes in the model). EN is an extension of LASSO in that it includes an additional penalty term in the form of the L2 norm of the model weights.55 This serves to avoid shortcomings of LASSO, such as its restriction to choosing at most n (the number of samples) features in its model and its tendency to select only one feature from sets of highly correlated features. With this additional term in the objective function, EN models typically include more features (genes) than the models built by LASSO. Here, we employed both EN and LASSO to obtain the smallest set of genes that can predict labor (LASSO), while also allowing models with a larger number of genes (EN), some of which may exhibit highly similar patterns of expression. The latter can result in models with multiple genes that show similar differential expression across the IL and NL tissues, all of which can be of interest for our analysis. Besides providing a means to identify genes that can serve as predictive features, use of classifiers also provides a way to assess the consistency of sample groups within and across datasets.
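For reference, the two penalties can be written in a single objective. The form below follows the parameterization documented for the glmnet package with a binomial (logistic) loss; the choice of loss for the two-class tasks and the coding of the phenotypes as $y_i \in \{0, 1\}$ are assumptions, as the text does not state them.

```latex
\min_{\beta_0,\,\beta}\;
  -\frac{1}{n} \sum_{i=1}^{n}
    \Bigl[\, y_i\,(\beta_0 + x_i^{\top}\beta)
      - \log\bigl(1 + e^{\beta_0 + x_i^{\top}\beta}\bigr) \Bigr]
  \;+\; \lambda \Bigl[\, \alpha\,\lVert\beta\rVert_1
      + \tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 \Bigr]
```

Setting $\alpha = 1$ leaves only the L1 term and gives LASSO, while $0 < \alpha < 1$ mixes the L1 and L2 terms and gives EN; $\lambda$ controls the overall strength of the penalty and is the kind of parameter typically tuned by cross validation.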

Using the R glmnet package, 100 iterations of k-fold cross validation (CV) were performed to train and test the models. Nested cross validation was used to optimize model parameters. Due to the varying number of samples in each study, we set k=3 for the Warwick dataset and k=5 for the London and Detroit datasets. The performance of LASSO and EN in classifying the samples from each study was assessed by visualization of ROC curves and calculation of AUC (Fig. 3.2A-C). Varying levels of performance were achieved across the datasets, with AUC being near perfect (>99%) for Warwick, good for Detroit (~90%), and fair for London (~75%). In these experiments, for the London dataset, we grouped the L-ILEa and L-ILEs samples into one L-IL group so that labor was represented as a single phenotype for the calculation of AUC.
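The AUC reported here can be understood as the probability that a randomly chosen laboring sample receives a higher predicted score than a randomly chosen non-laboring sample. The Python sketch below computes it directly from that definition. It is an illustration only (the study's values came from ROC analysis in R), and the label coding (1 for labor, 0 for non-labor) is an assumption.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for laboring samples, 0 for non-laboring samples (assumed coding).
    scores: predicted probability of labor for each sample.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    # Each positive/negative pair counts 1 if ranked correctly, 0.5 if tied.
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))
```

An AUC of 0.5 corresponds to random ranking and an AUC of 1 to perfect separation of the two phenotypes.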

The frequency with which each sample was classified into each phenotype over the 100 runs of training and testing was also examined (Fig. 3.2J-L). Based on these results, several interesting patterns were observed. First, for classification frequency calculations, the two L-IL sample types were ungrouped to assess how they separated based on the built classification models. The L-ILEa class appeared to be more variable and difficult to classify, seeming to have signatures partially represented in the L-NL and L-ILEs group. Second, classification of the non-labor samples was highly accurate across all studies, suggesting the presence of a robust gene expression profile characteristic of the quiescent pre-labor myometrium. Finally, there were samples with a classification distribution that was highly contradictory to their given phenotype (e.g. L-ILEa6, D-NL18, D-IL7, and D-IL16).

Figure 3.2. AUCs, ROC Curves, and sample classification frequencies for K-fold cross-validation and cross-study classification tasks on tissue gene expression studies using LASSO and EN. A-C. K-fold cross validation (CV) for each dataset (k = 3 for Warwick, 5 for London and Detroit). D-I. Classification of one dataset with training performed on another dataset. LASSO and EN were implemented using the glmnet package in R. J-R. Corresponding sample class frequencies for the classification tasks shown in panels A-I, specifically using the results from the top performing algorithm, based on AUC. For all prediction tasks, the early labor (L-ILEa) and established labor (L-ILEs) samples were grouped together as a single laboring sample group except for sample classification frequency for 5-fold CV on London, where the three sample groups were kept separate in order to assess how the new initial labor group was classified. All results shown were obtained using 100 runs of training and testing. Sample names in panels J-R follow the naming convention in Table 2.1.

To gather information on the genes that were used to achieve the classification results, the features selected in the models built during training were examined. For the Warwick dataset, 147 genes were selected in at least one EN model, and 22 of the genes were selected in all 100 EN models (Appendix A Table 2). For the London dataset, to further observe how the L-ILEa group was classified, the L-ILEa and L-ILEs samples for this sample classification frequency calculation were not combined. For this reason, there are three classes in the classification task, meaning it is a multinomial problem. In this case, when assessing models and the gene coefficients, there are genes associated with each class (these are the genes that best separate each class from the other two). The results were 7 unique genes for the L-NL class, 12 for the L-ILEa class, and 12 for the L-ILEs class in at least one of the 100 LASSO models (Appendix A Tables 2-4). For the Detroit dataset, 98 genes were selected in at least one of the 100 EN models, 32 of which were selected in all models (Appendix A Table 5). The models were approximately evenly divided between genes increased in IL samples and those decreased in NL samples across the three studies. (Supplementary tables include model genes and average coefficients.)

Next, the genes selected by the classification models were compared to those identified via differential expression analysis. 97 of 147 Warwick model genes were significantly differentially expressed in the Warwick dataset; two of these were among the 126 high-confidence genes that were differentially expressed in all comparisons. The two high-confidence genes were also among the 22 genes that were selected in all 100 Warwick models. In contrast, 19 of the 98 Detroit model genes were significantly differentially expressed in the Detroit dataset, with only one being a high-confidence gene. Of the 32 genes that were selected in all Detroit models, 12 were differentially expressed in the Detroit dataset, and one was a high-confidence gene. Finally, 2 of the 7 L-NL, 1 of the 12 L-ILEa, and none of the 12 L-ILEs genes for the London models were significantly differentially expressed in the London dataset. The number of significant genes in the models for each dataset correlates with the accuracy of classification (e.g. Warwick had the highest percentage of incorporated significantly called genes and the highest AUC).

Interestingly, models built on different datasets shared only two genes, MITF and CLK2, which were found in 100 and 87 of the Warwick EN models, respectively, and in 96 and 18 of the Detroit EN models, respectively. However, neither of these two genes was among the 126 high-confidence genes that were identified via differential expression analysis. It has been shown that MITF is involved in the suppression of IL-8, a classic inflammatory gene, in the cervix during pregnancy.56 This agrees with the classification models derived from the Warwick and Detroit datasets, as MITF has a negative model coefficient (higher expression in non-labor group). No direct associations between CLK2 and parturition were found in the literature; however, data from the Human Protein Atlas show that CLK2 exhibits higher expression in leukocytes compared to smooth muscle.57 Therefore, the positive model coefficient observed for CLK2 may represent leukocytes (accompanying increased inflammation) in the laboring tissue.

3.3 Cross-study Classification

The overlap between models learned from different studies is low in terms of gene content, although all models represent the difference between myometrial quiescence and labor. Based on this observation, it was hypothesized that the models learned using each dataset may be capturing a different aspect of the signaling processes that underlie labor. To test this hypothesis, the samples in each dataset were classified by using models learned from each of the other two datasets. Note that training on London data was done only using the L-NL and L-ILEs sample groups to be consistent with Warwick and Detroit protocols. As in the previous section, LASSO and EN were used for classification, nested k-fold cross validation was used to optimize model parameters, and 100 iterations of training and testing were performed.

The results of cross-study classification are shown in Fig. 3.2D-I. Since there are three studies and training on each dataset was followed by testing on the remaining two datasets, there were a total of six cross-study classification tasks. Assessing the average performance in each of the six classification tasks, it was observed that models learned on one dataset were overall successful in classifying samples from a separate dataset, with AUCs ranging from 0.676 to 1 and a median AUC of 0.817. Classification of the Warwick samples proved to be a simple task for the models built using each of the London and Detroit datasets. This is likely because the Warwick dataset has a larger set (about 2,000 more than the other two studies) of differentially expressed genes between the laboring and non-laboring groups. It was also observed that LASSO models performed better for classification of the London samples relative to EN. This could be due to the fact that many genes in a model may create additional uncertainty for classifying the more volatile L-ILEa samples. Another interesting result is that models derived from the Warwick and Detroit datasets produced similar performance on the London dataset despite the fact that Detroit has almost four times as many samples.

As was done for the cross-validation analysis, the sample classification frequencies for the cross-classification tasks were computed (Fig. 3.2M-R). In other words, the confidence with which one dataset correctly called the phenotype of a given sample from another dataset was calculated. Samples not always classified into the same group may have a weaker signature representation than others. Interestingly, samples highly misclassified in the cross-validation analysis (e.g. L-ILEa6, D-IL7, and D-IL16) showed the same pattern when classified using other datasets.

Overall, the cross-study analysis indicates a robust transcriptional signature differentiating NL from IL across the three datasets, as performance in all classification tasks ranged from moderately accurate (AUC = 0.7) to highly accurate (AUC = 1). This is fairly impressive for a gene expression based classifier given that these types of classifiers often fail to generate reproducible models due to experimental noise, heterogeneity of samples, and the limitations of mRNA-level expression in capturing changes in protein activity and function. This observation suggests that transcriptional regulation may play a key role in the transition of myometrial tissue from quiescence to the laboring phenotype.

3.4 Identification of Latent Transcriptional Regulation Patterns via Singular Value Decomposition

To understand underlying patterns of gene expression associated with labor, singular value decomposition (SVD), an unsupervised dimensionality reduction and pattern identification method, was used. SVD has been previously employed to identify latent transcriptional regulation patterns in gene expression data.58–60

SVD was performed on the expression matrix containing all 71 samples (10 Warwick + 22 London + 39 Detroit) and 12,622 genes (those with measured expression in all studies). For a given gene expression matrix M, SVD decomposes the matrix into three parts: a pair of singular vectors, which can be interpreted as an eigen-sample and eigen-gene pair, and a relative strength value called a singular value. An eigen-gene represents a subset of genes with a given, similar pattern of expression. The eigen-gene then contains a value for each sample (n=71) indicating how the subset of genes is expressed in each of them. An eigen-sample represents a subset of samples with a given, similar pattern of activity across the corresponding eigen-gene. The eigen-sample then contains a value for each gene (12,622) indicating the activity of the subset of samples for a set of genes. The corresponding singular value indicates how well represented or how common that pattern of expression is within a gene expression matrix.58 Therefore, SVD was employed to determine the dominant signals or expression programs within the combined tissue expression matrix, identify the genes that had the strongest contribution to the observed patterns, and visualize the stratification of genes and samples in a reduced space.
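In matrix notation, the decomposition described above can be written as follows. The orientation of M (genes in rows, samples in columns) is an assumption made here for concreteness, and the symbols are introduced only for illustration.

```latex
M \in \mathbb{R}^{12622 \times 71}, \qquad
M = U \Sigma V^{\top} = \sum_{k} \sigma_k \, u_k v_k^{\top}, \qquad
\sigma_1 \ge \sigma_2 \ge \dots \ge 0

% u_k has one entry per gene (12,622 values)
% v_k has one entry per sample (71 values)
% sigma_k is the strength of the k-th pattern
```

Under this orientation, and following the naming used in the paragraph above, the vector with one value per sample ($v_k$) corresponds to the eigen-gene and the vector with one value per gene ($u_k$) corresponds to the eigen-sample. The relative size of $\sigma_k^2$ among all squared singular values indicates how much of the total variation the k-th pattern captures.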

Examination of the singular values of the combined gene expression matrix indicated at least one and potentially two important patterns (Fig. 3.3A). The first singular vector was used for further analysis as it captures the main component of variation in the expression matrix and can be thought of as the dominant regulatory program. The dominant pattern among samples, obtained from the first eigen-gene, shows a clear separation of NL and IL samples, with the former having negative values in the singular vector and the latter having (mostly) positive values (Fig. 3.3C). The strength of this state of differential expression in each sample is determined by observing the magnitude of each sample’s loading in the first eigen-gene. Based on these values, varying levels of pattern consistency across studies and within phenotypes were identified. Figure 3.3C suggests that IL samples are more heterogeneous or potentially more difficult to call phenotypically (e.g. L-ILEs3 and D-IL7 are highly inconsistent) than non-labor samples, which consistently load in the same direction on the dominant singular vector. Based on these observations, it can be asserted that the dominant eigen-gene of the combined expression pattern represents the separation of IL and NL samples.

Figure 3.3. Distribution of singular values and examination of genes and samples in the space spanned by the first singular vector from SVD analysis on aggregated gene expression matrix. A. Singular values calculated by performing SVD on the normalized, study-combined expression matrix (12,622 genes by 71 samples). B. Mapping of the top 25 positively and negatively valued genes in the space represented by the first right singular vector (eigen-gene). Bars representing genes are colored based on their differential expression in each of the four differential expression analyses (shown in Fig. 3.1; i.e. purple bars indicate genes in the 126 set of high-confidence genes). C. Mapping of all tissue samples to the first left singular vector (eigen-sample), sorted by mapping value. Bars representing samples are colored based on their clinically defined phenotype (blue = non-laboring, red = laboring). Sample name abbreviations in panel C follow Table 2.1.

The most dominant eigen-sample was examined to identify the genes that contribute most to this singular vector, since these genes most accurately represent the labor phenotype. The 25 genes with the largest positive and negative values on the first eigen-sample are shown in Fig. 3.3B. To provide a context for the correspondence between differential expression and this latent transcriptional regulation pattern, each gene is represented as a colored bar where the color is based on the number of study comparisons in which its expression pattern showed differential expression between phenotypes (e.g. purple bars indicate genes that were part of the 126 high-confidence genes from Fig. 3.1). The positively contributing genes in Fig. 3.3B were associated with higher expression in samples having a positive loading, mainly the IL samples. These genes, which include IL-8, RUNX1, and IL-1β, strongly represented inflammatory processes, consistent with previous studies. The negatively contributing genes were more highly expressed in the NL samples and were not as well replicated across all studies, but had comparable magnitudes of pattern contribution when compared to the positively loaded genes. One of these genes, GPR161, increases intracellular cAMP and promotes protein kinase A (PKA)-dependent processes. This aligns with the hypothesis that the cAMP/PKA pathway plays a role in maintaining quiescence by promoting relaxation of the myometrium.61 Genes including JAZF1 and TCEAL4 are involved in transcriptional regulation, while little is known about the function of MAMDC2 and BIVM. Further investigation of these genes may provide additional hypotheses relating to the maintenance of pregnancy.

In summary, the dominant singular vector of the combined expression matrix represents a differential expression of genes between IL and NL myometrium. This result indicates that SVD directly presents an unsupervised way to classify the phenotype of any myometrial sample, via gene expression, using the first singular vector.
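The idea of calling phenotype from the sign of a sample's loading on the first singular vector can be sketched in Python. This is an illustration, not the analysis code: it substitutes power iteration for a full SVD routine to keep the example dependency-free, assumes a matrix with genes in rows and samples in columns, and resolves the arbitrary sign of singular vectors by orienting against one sample known to be non-laboring. All function names are hypothetical.

```python
import math
import random


def first_singular_triplet(matrix, iterations=500, seed=0):
    """Approximate (sigma, u, v) for the largest singular value by power iteration.

    matrix: list of rows (genes), each a list of values (samples).
    u has one entry per gene, v has one entry per sample.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n_cols)]  # random start vector
    for _ in range(iterations):
        u = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]  # M v
        w = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]  # M^T u
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    u = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]
    sigma = math.sqrt(sum(x * x for x in u))
    return sigma, [x / sigma for x in u], v


def call_phenotypes(sample_loadings, reference_nl_index):
    """Label samples by loading sign, oriented so the reference NL sample is negative."""
    flip = -1.0 if sample_loadings[reference_nl_index] > 0 else 1.0
    return ["IL" if flip * x > 0 else "NL" for x in sample_loadings]
```

In practice a library SVD routine would be preferable to power iteration; the sketch is only meant to show that the per-sample loadings on the dominant vector, once oriented, separate the two phenotypes by sign.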

3.5 Global Identification of Pathways Altered During Parturition

Based on the above results from comprehensive computational experiments using classification and SVD, it is clear that IL and NL myometrium exhibit strong differences in their transcriptional profiles. To better understand and interpret the signatures obtained, pathway enrichment analysis was performed on each study individually using Gene Set Variation Analysis (GSVA) with the hallmark gene sets from the Molecular Signatures Database (MSigDB; well-defined biological states or processes created by collating gene sets from multiple sources). GSVA is an unsupervised method that makes use of the full expression matrix as opposed to a simple list of differentially expressed genes (e.g. over-representation) and provides a directionality, as fold change, for each pathway. This permits GSVA to determine pathway enrichment similarly to other approaches, in that strong differential expression in a few pathway genes will be significant, but also in cases where there is potentially weak but highly consistent differential expression across many genes in a pathway. GSVA was performed separately on the studies to make use of GSVA’s enrichment fold change output (i.e. to assess how pathways change from NL to IL as opposed to simply calling them significant with no direction of regulation). As mentioned in chapter 2, an additional normalization step was required to combine all studies (as in the SVD analysis). This normalization shifts the expression distribution, diminishing the resulting fold change values and making it difficult to determine significance using traditional methods (i.e. fold change cutoff of 1.5). Pathway enrichment in this form allows better interpretation of the strongest signal between IL and NL samples specific to each study, as well as identification of pathway-level consistency across all studies.

Figure 3.4A shows the GSVA results on the Warwick, London, and Detroit datasets, where rows (gene sets) are clustered by fold change enrichment. In the figure, pathways with higher activity in NL samples are shown in blue and pathways with higher activity in IL samples are shown in red. All three studies exhibited consistent enrichment, specifically upregulation, for multiple gene sets including TNFα, MYC targets, MTORC1, inflammatory response, and JAK-STAT signaling. Note that not every pathway was significant in every study under traditional cutoffs (FC > 1.5 and adj. p-value < 0.05). Expression of TNFα has been shown to increase during labor in the cervix, fetal membranes, and myometrium, where it stimulates arachidonic acid release, activates phospholipid metabolism, and increases the production of prostaglandins in the myometrium (see review by Peltier, 200362). The JAK-STAT pathway is a major cellular signaling pathway that has been shown to be activated by mechanical stretch of myocytes.63 JAKs are also activated by cytokine ligands, including interferons and interleukins, which are known to be associated with labor.64 Additionally, a study by Breuiller-Fouche et al. (2007) that combined results from existing studies examining labor in the myometrium showed that many of the consistently discovered genes were involved in the JAK-STAT pathway.65 From the above results, genes involved in TNFα signaling include SOCS3, IL6, CEBPB, IL1B, BCL3, LIF, TNFAIP3, CCL2, PTGS2, SELE, and CXCL2, while the genes SOCS3, CSF3, IL6, IL4R, CSF3R, MYC, OSM, PIM1, LIF, and IL7R make up the JAK-STAT pathway. Other pathways observed to be consistently enriched have links to parturition. Master et al. (2002) showed that MYC expression in the mammary glands of pregnant mice decreases during pregnancy and rises just before labor, and work by Deng et al. (2016) suggested an important role for mTORC1 signaling in the control of parturition timing.66,67 Together, the results of enrichment analysis both confirm and annotate the transcriptional signature identified in the above sections with specific pathways that can provide starting points for more targeted studies involving the transition of the myometrium from quiescence to labor.

3.6 Hypothesis-Driven Signature Identification

Thirty-two gene sets from MSigDB related to P4 signaling, inflammation, and muscle relaxation were selected based on previous findings to target the search to pathways implicated in parturition. GSVA was repeated with these sets for each study (Fig. 3.4B). As in the previous section, results are represented as a clustered heat map colored by GSVA fold change enrichment. There are clear areas of the heat map that indicate, for multiple gene sets, differential expression and, just as importantly, fair consistency across most or all study comparisons. These sets include TNFα signaling via NF-κB and inflammatory response, which were up-regulated in labor, while vascular smooth muscle contraction, regulation of cAMP metabolic process, and relaxation of muscle were down-regulated in labor. Such results indicate the importance of these pathways in parturition.

Figure 3.4. Global and hypothesis-driven GSVA pathway enrichment across all tissue studies. A. Top 30 enriched pathways (by median adjusted p-value) using the hallmark gene sets from MSigDB. B. Enrichment results for a selected set of pathways from MSigDB related to progesterone response, smooth muscle, inflammation, and cAMP signaling. Heat map is colored by log2(fold change) where red means up-regulation in laboring samples and blue down-regulation. Fold changes are calculated using an empirical Bayes model fit to the output (from the gsva() function) gene set by sample matrix. Rows (gene sets) are clustered using the complete clustering method based on correlation.

Next, SVD was applied to genes involved in signaling by progesterone, inflammation, and cAMP. The most representative gene sets selected for this analysis were response to progesterone from Gene Ontology, inflammatory response from the MSigDB hallmark sets, and cAMP signaling from KEGG. The combined gene expression matrix was reduced to the genes in each of the gene sets separately. The pathway-specific expression matrices were then concatenated and subjected to SVD. Results indicate that all three pathways exhibit differential expression between phenotypes, since the dominant regulatory program (eigen-gene and eigen-sample pair) for each gene set clearly separates the IL and NL samples (Fig. 3.5), similar to the global SVD result.

The relative strength of the dominant regulatory programs can be compared between these three pathways by using a measure introduced in Leskovec et al. (2014) called “energy”.68 The square of each singular value is a measure of the amount of variation accounted for by the corresponding eigen-gene and eigen-sample pair.60 It then follows that the sum of the squares of all singular values represents the total variation of the data. The energy measure of interest here is the proportion of total variation (in the gene expression data) covered by the first singular vector pair, which can be calculated by dividing the square of the first singular value by the sum of squares of all singular values.
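The energy calculation described above can be illustrated with a minimal Python sketch. This is not the code used in this work: the function name and the toy singular values are assumptions made for illustration, and in practice the singular values would come from an SVD routine such as numpy.linalg.svd applied to the pathway-specific expression matrix.

```python
def energy_of_first_component(singular_values):
    """Share of total variation captured by the largest singular value.

    Computes max(sigma_i^2) / sum_i(sigma_i^2).
    """
    squares = [s * s for s in singular_values]
    total = sum(squares)
    if total == 0:
        raise ValueError("all singular values are zero")
    return max(squares) / total


# Toy singular values (hypothetical; not taken from the thesis data).
toy_singular_values = [4.0, 2.0, 2.0, 1.0]

# 16 / (16 + 4 + 4 + 1) = 0.64
print(f"{energy_of_first_component(toy_singular_values):.2%}")
```

Because the measure is a ratio of squared singular values, it is unaffected by a uniform rescaling of the expression matrix, which makes it convenient for comparing gene sets of different sizes.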

Using this measure, the energies were 24.28% for response to progesterone, 23.26% for inflammatory response, and 15.89% for cAMP signaling. Therefore, the separation between IL and NL samples was strong for response to progesterone and inflammatory response and less so for cAMP signaling. Positive sample loadings were seen for the laboring phenotype for response to progesterone and inflammatory response, while positive sample loadings for the non-laboring phenotype were seen in the case of cAMP signaling. This is consistent with the understanding that inflammatory response is pro-labor and cAMP signaling is a relaxatory, or anti-labor, process. The gene set response to progesterone has many inflammation-related genes due to the fact that it is anti-inflammatory (i.e. expression of such genes is decreased in response to inflammation); therefore, the samples map in the same direction as the inflammatory response SVD result (Fig. 3.5F). However, gene loadings can provide insights into the pro-relaxation genes that progesterone regulates. Panels B, E, and H of Fig. 3.5 rank contributions of genes in these pathways, showing which genes contribute to the association of the pathway with each phenotype. Due to the strength of the inflammatory/laboring signal, the observed fold change is more consistent across studies for genes in this set than those with higher expression in the non-laboring signal. Overall, these results indicate that there is some level of differential expression for these pathways in the three transcriptome studies. However, many pathways are composed of various signaling processes. Certain genes in a pathway may be pro-quiescent and down-regulated at the onset of labor, while others in the same pathway promote labor and are up-regulated during labor. An example is cAMP signaling, which has canonical paths in various tissues that activate inflammatory-related processes (MAPK and PI3K-AKT signaling) as well as paths that activate muscle relaxation. Such pathways are clearly relevant to parturition but may have reduced significance (enrichment fold change) due to the regulation of the genes involved in these pathways in different directions.

Figure 3.5. Pathway-specific SVD analysis. SVD results for the gene sets A. GO response to progesterone, B. hallmark inflammatory response, and C. KEGG cAMP signaling pathway. All studies were combined (12,622 genes x 71 tissues) and then reduced to the genes within each gene set. Panels A, D, and G show the distribution of singular values, panels B, E, and H show the top 25 genes associated with each phenotype based on loading value (genes are colored by their mean log2(fold change) value across all four sample group comparisons in Fig. 3.1), and panels C, F, and I show the gene set activity (loading value) for each sample (samples are colored based on their originally reported clinical phenotype).

Next, a pathway that was consistently, but not significantly, enriched in the targeted GSVA analysis, vascular smooth muscle contraction (VSMC), was examined in further detail. Gene loadings from the first eigen-sample obtained from performing SVD were overlaid on this pathway (Fig. 3.6). By analyzing a pathway in this manner, an improved understanding can be achieved in terms of how changes in expression may be influencing signaling paths as opposed to just knowing whether or not the expression of a group of genes is changing in one direction or another. A large majority of the genes in this pathway have higher expression in the NL samples, suggesting that a significant part of the signaling machinery for contraction is used to keep myometrial cells in a relaxed, quiescent state during pregnancy. Then, at the time of labor, a small number of important genes in specific areas of the pathway are up-regulated to achieve myometrial contraction. The labor-triggering genes are involved in initiating inflammatory response (PLA2G4B), MAPK signaling (ARAF, MAP2K1), and calcium-induced contraction (CALML6, MYLK4).

Figure 3.6. Visualization of gene expression changes on KEGG Vascular Smooth Muscle Contraction (VSMC) pathway diagram. The Homo sapiens VSMC pathway diagram was downloaded in KGML format from KEGG and loaded into Cytoscape. Nodes (genes) are colored based on loading values for the dominant singular vector from performing singular value decomposition on the reduced (to include only genes within the VSMC pathway) Warwick expression matrix. These values were directly loaded into Cytoscape as a node table and used in a continuous manner in the style tab for node coloring. Blue nodes indicate genes with higher expression in the W-NL group, red higher expression in the W-IL group, and white no differential expression.

3.7 Building a Parturition Signaling Network

The problem of reconstructing biochemical pathways is well studied, and the shortest paths approach has been shown to be an accurate solution.45,69,70 A curated signaling network constructed by Ritz et al. (2016) by integrating multiple protein interaction and pathway databases as well as Gene Ontology (GO) was used (see chapter 2).45 This network is directed and weighted, where weights are calculated using a Bayesian approach to assign probabilities to edges based on experimental evidence and shared GO terms. Since progesterone signaling, inflammation, and cAMP signaling were identified as important pathways based on enrichment results and prior work in the parturition field, the network was reduced by selecting the nodes and edges that are downstream of the proteins PGR, IL1R1, and ADCY. These three proteins are the main upstream initiators of the three pathways of focus and are referred to as the “source” nodes in the parturition signaling network. Weights were then assigned to the nodes of the resulting network based on the broad gene expression signature derived from the global SVD analysis (i.e. the weight of a gene product in the network is the absolute value of the loading of the gene in the most dominant eigen-sample). “Sink” nodes are also chosen in the parturition network as the products of the top 50 genes associated with laboring and non-laboring myometrium based on the absolute values of the gene loadings. This choice of the top 50 is somewhat arbitrary; however, it ensures an equal representation from each phenotype while capturing the most significant changes at the gene expression level. It also allows for a relatively broad signal (as there are 100 total genes) without being too large, which would result in very dense networks. Shortest paths were calculated from each of the source nodes (the three pathway initiator proteins) to each of the sink nodes (the highly differentially expressed genes), while ensuring that the last edge in the path is an edge between a transcription factor and a target node. All nodes that lie on one such shortest path are included in the final parturition signaling network.
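The source-to-sink search described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the implementation used in this work: node costs are taken to be 1 minus the absolute loading (assuming loadings scaled to the range 0 to 1), the edge confidence weights of the curated network are ignored, target genes are assumed to have no outgoing edges, and all node names, loadings, and function names are hypothetical.

```python
import heapq


def dijkstra(successors, node_cost, source):
    """Cheapest cost to reach each node, paying node_cost[v] on entering v."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v in successors.get(u, ()):
            nd = d + node_cost[v]
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev


def path_to_sink(successors, node_cost, tfs, source, sink):
    """Cheapest source-to-sink path whose last edge leaves a transcription factor."""
    dist, prev = dijkstra(successors, node_cost, source)
    candidates = [u for u in sorted(tfs)
                  if sink in successors.get(u, ()) and u in dist]
    if not candidates:
        return None
    best_tf = min(candidates, key=lambda u: dist[u])
    path = [best_tf]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1] + [sink]


# Hypothetical toy network: receptor R, signaling proteins S1/S2,
# transcription factors TF1/TF2, and target gene G.
successors = {"R": ["S1", "S2"], "S1": ["TF1"], "S2": ["TF2", "G"],
              "TF1": ["G"], "TF2": ["G"]}
loading = {"R": 0.1, "S1": 0.8, "S2": 0.9, "TF1": 0.9, "TF2": 0.2, "G": 0.95}
# Assumption: strongly associated genes (large |loading|) are cheap to traverse.
node_cost = {n: 1.0 - abs(x) for n, x in loading.items()}
tfs = {"TF1", "TF2"}

print(path_to_sink(successors, node_cost, tfs, "R", "G"))
```

In this toy network the cheapest unconstrained route to G runs directly through S2, but the requirement that the final edge originate from a transcription factor selects the path through S1 and TF1 instead. Collecting the nodes of such paths over all source and sink pairs would give the node set of the final network.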

The resulting network generated from this method has 66 nodes and 91 edges (Fig. 3.7A). The network contains four different node types: receptor, signaling protein, transcription factor, and target gene. The transcription factors and signaling proteins were selected by the shortest paths calculations. Nodes are colored based on their weight (gene loadings from the dominant regulatory program from our global SVD analysis) with blue representing higher expression in non-laboring samples and red representing higher expression in laboring samples. Most of the nodes in this network have a dark blue or red color, indicating that they exhibit some differential expression between laboring and non-laboring myometrium. Multiple genes in the network upstream of our target genes have been linked to parturition or are in some of the gene sets/pathways we investigated in previous sections of this work.

The parturition network provides signaling information about quiescence and labor, or the machinery that is present in both phenotypes. To identify signaling pathways specific to each phenotype (IL vs. NL), phenotype-specific signaling networks were also constructed. These networks were built by setting the input target genes as the top 50 genes associated with NL (NL network) or IL (IL network), based on their SVD gene loadings. For the remaining nodes in the network, weights were assigned to ensure that genes with expression associated with the corresponding phenotype were prioritized (i.e. have lower costs and were therefore more likely to be chosen in the shortest path). In other words, the values in the singular vectors associated with the genes were normalized such that genes up-regulated during labor had greater importance in the labor network (and vice versa for the non-labor network). The resulting networks are shown in Fig. 3.7B (non-labor) and Fig. 3.7C (labor). These two networks show very few similarities with each other, but both exhibit similarity with the network from Fig. 3.7A. Also of note is that the networks do not contain nodes of only one color, indicating the importance of certain proteins in the signaling of both biological states. The NL network is enriched for neutrophil signaling, EGF receptor signaling, cAMP signaling, regulation of circadian rhythm, and JUN phosphorylation, while the labor network shows enrichment for glucocorticoid receptor regulatory network, TNF signaling, signaling by interleukins, regulation of PI3K, and positive regulation of cell proliferation. The IL network has many downstream targets that are known to be classic inflammatory genes (IL-1β, RUNX1, PIM1, BCL3, etc.) whereas the target genes in the NL network give clues as to which genes may be most important during pregnancy and how they are regulated.

Figure 3.7. Parturition-specific signaling networks constructed using shortest paths and differential expression data. Shortest paths were calculated on a curated, directed, weighted signaling network. Shortest paths were obtained from each of the three pathway initiator proteins (PGR, IL1R1, ADCY; chosen to represent progesterone, inflammatory, and cAMP signaling) to each downstream target gene. Nodes in the network were weighted based on gene loadings from the dominant singular vector when performing SVD on the combined expression matrix (3 studies; 12,622 genes x 71 samples). A. A general parturition signaling network. Target genes consisted of the top 50 genes associated with the non-laboring and laboring sample groups (100 total input genes). Nodes were weighted such that those with strong phenotype associations had the lowest cost. Paths falling in the top 50%, in terms of lowest costs, were retained. B. A parturition signaling network prior to labor (i.e. quiescence). Target genes consisted of the top 50 genes associated with the non-laboring sample groups. Nodes were weighted such that genes highly expressed in the non-laboring samples had low costs, those with roughly equal expression across the two phenotypes had a medium cost, and genes with higher expression in the laboring samples had high cost. C. A parturition signaling network during labor. Target genes consisted of the top 50 genes associated with the laboring sample groups. Nodes were weighted opposite of the quiescent network in panel B. All paths were retained for the networks in panels B and C. Networks were visualized in Cytoscape. Node shapes are based on function/pathway location, node colors correspond to their respective weights, and edge width is proportional to the weights in the original network (thicker meaning higher confidence).

3.8 Summary

In this chapter, three gene expression studies derived from myometrial tissue samples of women at term (IL vs NL) were analyzed and compared. Differential expression analysis provided a high-level examination of transcriptome variation between sample phenotypes and similarities across the three studies, which produced a high confidence set of 126 genes (differentially expressed in all studies) useful for characterizing labor. Classification through the supervised machine learning algorithms LASSO and EN was used to assess the strength of the transcriptional signature within each dataset and gain insight regarding the overall quality and consistency both within and between datasets. SVD on an aggregated expression matrix further confirmed a strong transcriptional signal separating laboring and non-laboring phenotypes. Finally, an integration of the knowledge gained from these approaches and pathway analyses with network information was performed to construct a signaling network of pregnancy and labor in human myometrium.

4 Investigating Pathways Controlling Human Parturition in a Myometrial Cell Line

Investigation of differential gene expression (comparisons in Fig. 2.1C) indicates that FSK/cAMP induced a major transcriptional response, altering the expression of 2210 genes, while P4 and IL-1β each affected a smaller, nearly equal number of genes (932 and 970, respectively) (Fig. 4.1A). cAMP and IL-1β share a substantial number of up-regulated genes (Fig. 4.1B), cAMP and P4 share a large set of down-regulated genes (Fig. 4.1C), and P4 down-regulates a number of genes that are up-regulated by IL-1β (Fig. 4.1D). These results suggest that there is inter-gene regulation between these processes and that P4 is anti-inflammatory, while cAMP has a more complicated relationship in that it likely both up- and down-regulates different sets of inflammatory-related genes.

Figure 4.1. Comparison of individual genes that are significantly differentially expressed in response to each individual stimulus. Venn diagrams depicting the overlap of genes that are A. significantly differentially expressed, B. significantly up-regulated, and C. significantly down-regulated in hTERT-HMA/B cell lines treated by P4 (green circle), FSK (red circle), and IL-1β (blue circle) as compared to control samples. D. Venn diagram depicting the overlap of genes that are significantly up-regulated in IL-1β treated cells as compared to control and significantly down-regulated in P4 and/or FSK treated cells as compared to control. All comparisons are single stimulus versus control and significance is defined as having |fold change| > 1.5 and p-value < 0.05 (after adjustment for multiple hypothesis testing using the Benjamini-Hochberg procedure).
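The significance rule stated in the caption above can be illustrated with a short Python sketch. It is not the code used in this work. It assumes that fold changes are stored on the log2 scale, so that |fold change| > 1.5 is read as |log2 fold change| > log2(1.5); the function names, gene labels, and numbers are hypothetical.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_genes(genes, log2_fc, p_values, fc_cutoff=1.5, alpha=0.05):
    """Genes passing both the fold change and adjusted p-value cutoffs."""
    adjusted = benjamini_hochberg(p_values)
    threshold = math.log2(fc_cutoff)
    return [g for g, lfc, q in zip(genes, log2_fc, adjusted)
            if abs(lfc) > threshold and q < alpha]


# Toy input (hypothetical genes and values).
genes = ["g1", "g2", "g3", "g4"]
log2_fc = [2.0, -1.2, 0.3, 1.0]
p_values = [0.001, 0.01, 0.02, 0.5]

print(significant_genes(genes, log2_fc, p_values))  # ['g1', 'g2']
```

Here g3 fails the fold change cutoff despite a small adjusted p-value, and g4 fails the adjusted p-value cutoff despite a sufficient fold change, so only genes meeting both criteria would enter the Venn diagram comparisons.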

4.1 Progesterone Signaling

Treatment of the hTERT-HMA/B cells made to equally express PR-A and PR-B with P4 significantly (|fold change| > 1.5 and adjusted p-value < 0.05) altered the expression of 932 genes compared to untreated cells (CTRL). P4 induced more down-regulation (555 genes) than up-regulation (377 genes). At the pathway level, P4 down-regulated the GO BPs muscle tissue development, STAT cascade, chemokine signaling, and leukocyte migration, while the TNF signaling pathway and cellular hormone metabolic process were up-regulated (Fig. 4.2A). As for transcription factors, P4 up-regulated downstream targets of AR and ISRE and down-regulated the targets of STAT6, RSRFC4 (or MEF2A), CEBP, and HNF1 (Fig. 4.2B). These results clearly indicate anti-inflammatory actions by P4, which is supported by previous research71. Investigation of the inferred transcriptional regulatory machinery underlying the P4-induced gene expression changes (i.e. the transcription factor pathways that P4 uses to alter expression of certain genes) indicates that JUN, a part of the AP-1 complex, is an important component (Fig. 4.3A). Node colors in this network indicate strong anti-inflammatory action by P4 (the targets MMP1, CCL3, IL-8, and IL-1β are all down-regulated). Other genes of interest include KLF4, which is up-regulated by P4 and has multiple down-stream targets in the network. KLF4 has been previously linked to P4 and appears to limit cell proliferation72. Furthermore, PLAT plays a role in tissue remodeling and cell migration and PTX3 is involved in the regulation of innate resistance to pathogens and inflammatory reactions73,74. Together these results indicate that the majority of signaling by P4 in hTERT-HMA/B cells is to produce a strong anti-inflammatory action, likely utilizing JUN to achieve this effect.

4.2 cAMP Signaling

Treatment of cells with FSK significantly altered the expression of 2210 genes, the majority of which were up-regulated compared to CTRL (1345 up and 865 down). At the pathway level, partial anti-inflammatory action was observed for cAMP, with response to IL-1 and FC epsilon receptor signaling significantly down-regulated (Fig. 4.2A). FSK also induced changes associated with myometrial cell quiescence; genes involved in smooth muscle contraction were down-regulated and those involved in muscle hypertrophy were up-regulated. At the transcription factor level, more anti-inflammatory action was apparent through the down-regulation of FOS and ETS1 targets (Fig. 4.2B). As expected, cAMP up-regulated targets of CREB, as well as targets of FOXM1 and NFY, which are involved in regulating cell proliferation. The inferred regulatory network based on these gene expression changes exhibits multiple points of interest (Fig. 4.3B). Many down-stream genes were regulated by RELA, a sub-unit of the well known inflammatory transcription factor NF-κB. RELA was in turn down-stream of the most strongly up-regulated gene after FSK treatment, protein kinase C zeta (PRKCZ). There were some similarities with the P4 network, including down-regulation of inflammatory genes MMP1 and CCL3. However, known inflammatory genes such as IL6, IL11, and CXCR4 were up-regulated by cAMP, indicating more diverse and complex action within the cAMP signaling pathway in myometrial cells. Further illustrating this point, some genes in this network correspond to the establishment of the quiescent phenotype, which was also observed from the pathway enrichment (e.g. NR4A2 and NR4A3 participate in cardiac hypertrophy and VTN promotes cell adhesion)75.

Figure 4.2. Pathways and transcription factors that exhibit significant enrichment in differential expression in response to treatment by P4, FSK, and IL-1β individually. The gene set collections for Gene Ontology Biological Processes and transcription factor-targets were obtained from the Molecular Signatures Database. Gene sets significant (|fold change| > 1.5 and adjusted p-value < 0.05) in any single stimulus condition versus control were identified using Gene Set Variation Analysis (GSVA). For a subset of these significant A. pathways and B. transcription factors, the magnitude and direction of the collective fold change of the genes in the corresponding gene set, as quantified by GSVA, is shown using three bars colored according to the stimulus.

Figure 4.3. Regulatory networks representing the effects of each individual stimulus. Regulatory network enrichment analysis (RNEA) was performed and significant subnetworks were identified in samples treated by A. P4, B. FSK (cAMP), and C. IL-1β based on gene expression alterations compared to control. Nodes (genes and transcription factors) are colored based on log2(fold change). Edges (regulatory interactions) and node outlines are colored based on occurrence in the 3 shown networks. Log2(fold change) cutoffs used for significance in the RNEA function were 1.5 for P4 and IL-1β and 1.75 for FSK. Visualization was performed using Cytoscape.

4.3 IL-1β Signaling

IL-1β significantly altered the expression of 970 genes in hTERT-HMA/B cells, most of which were up-regulated compared to CTRL (586 up and 384 down). At the pathway level, the gene expression alterations corresponded to a classic inflammatory response signal, with pathways related to chemokines, leukocytes, cytokines, and inflammatory molecule signaling (IL-1, FC epsilon receptor, JUN, and NF-κB) all up-regulated (Fig. 4.2A). Similarly for transcription factors, IL-1β up-regulated targets of GATA, NF-κB, SRF, and STAT6 (Fig. 4.2B). This pattern was also seen in the IL-1β RNEA network, with the majority of genes exhibiting increased expression (Fig. 4.3C). Genes such as CCL3, PTGS2, and IL11 were down-stream of CSF2/3 and CEBPA/B/D transcription factors as well as JUN, which regulates expression of many genes altered by IL-1β treatment, suggesting an important role for this protein in regulating inflammation.

4.4 Additional Insights from Single Stimulus Comparisons

For the most enriched TFs in Fig. 4.2B, the co-regulation of their target genes by P4, FSK, and IL-1β was examined further. A substantial portion of the targets of these top TFs were regulated in specific directions (up- or down-regulated) by at least two of the individual stimuli. For example, the TF ISRE was up-regulated by P4 and FSK (Fig. 4.2B). Of the 247 known targets of ISRE, 34 were up-regulated by FSK, 15 by P4, and 7 of these were up-regulated by both (Fig. 4.4A). FSK and P4 also altered the same genes of TFs whose activity was down-regulated, such as ETS1 (Fig. 4.4K). FSK and IL-1β activated many of the same targets of NF-κB, whereas TFs such as STAT6 show that the anti-inflammatory actions of P4 are effective on some of the same targets that were up-regulated by IL-1β (Fig. 4.4I). These results indicate that P4, FSK, and IL-1β individually regulate the expression of the same genes, with P4 and FSK typically having the opposite effect to IL-1β on expression.
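The target overlap counts reported above amount to set intersections between a TF's known targets and the genes up-regulated by each stimulus. A minimal Python sketch of that bookkeeping is given below; the function name and all identifiers are hypothetical placeholders and do not correspond to the actual ISRE target list or to the code used in this work.

```python
def overlap_summary(targets, up_by_stimulus):
    """Count TF targets up-regulated by each stimulus and list those shared by all."""
    target_set = set(targets)
    hits = {name: target_set & set(genes)
            for name, genes in up_by_stimulus.items()}
    shared = set.intersection(*hits.values()) if hits else set()
    counts = {name: len(found) for name, found in hits.items()}
    return counts, sorted(shared)


# Hypothetical identifiers (not real gene names).
tf_targets = ["t1", "t2", "t3", "t4", "t5", "t6"]
up_regulated = {
    "FSK": ["t1", "t2", "t3", "x9"],  # x9 is not a target of this TF
    "P4": ["t2", "t3", "t6"],
}

counts, shared = overlap_summary(tf_targets, up_regulated)
print(counts)  # {'FSK': 3, 'P4': 3}
print(shared)  # ['t2', 't3']
```

The same function could be applied to down-regulated gene lists, or to a mix of up- and down-regulated lists, to reproduce the other panel types of the comparison.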

Figure 4.4. Regulation of shared targets for top enriched transcription factors in response to individual stimuli. Comparison of how each stimulus alters expression of targets of selected transcription factors showing strong enrichment from GSVA calculation in at least one of the conditions P4, FSK, and IL-1β compared to control. For each of these transcription factors, the number of known targets that are up- or down-regulated by each stimulus and the overlap between these targets are shown, along with the total number of known targets for each TF. Panel rows: ISRE-01 (247 total targets; A-C), NFKB_C (263 total targets; D-F), STAT6_02 (258 total targets; G-I), and ETS1_B (259 total targets; J-L). Panel columns: up-regulated; down-regulated; IL-1β up, P4 and FSK down.

The regulatory network analysis also indicates clear similarities between the three individual stimuli. All three networks shared 15 nodes and 25 edges, indicated in red (edges and node borders in Fig. 4.3). A majority of the shared genes were upstream TFs, suggesting that each stimulus induced its cellular response through common regulatory machinery. The anti-inflammatory action by P4 is clearly represented in the shared sub-network between P4 and IL-1β (purple in Fig. 4.3A and blue in Fig. 4.3C). These two networks shared 12 nodes and 20 edges (in addition to the nodes and edges shared by all three conditions), where most genes were up-regulated in the IL-1β network and down-regulated in the P4 network. This indicates that the main action of P4 is anti-inflammatory, directly down-regulating many of the same genes that IL-1β stimulates. Although P4 and FSK had anti-inflammatory action and equally acted on many of the same TF targets, little was shared between top-enriched P4 and FSK regulatory networks that is not also shared with the IL-1β network. Taken together, it is clear that P4, FSK, and IL-1β are not only regulating the same genes, but also using similar transcriptional pathways to achieve the regulation.

4.5 Anti-inflammatory Actions of P4 and cAMP

Terms significantly enriched by IL-1β treatment were examined to better characterize the anti-inflammatory effects of P4 and FSK in the presence of inflammation (performing comparisons in Fig. 2.1D). Certain pathways up- (chromatin access and organization) and down-regulated (inflammatory-related pathways including glucose starvation and type 1 interferon) by IL-1β were observed to be unaffected by the addition of P4 and/or FSK (Fig. 4.5A, top panel). The addition of P4 and FSK individually decreased specific IL-1β responsive pathways. P4 strongly inhibited GO terms related to cell dynamics. FSK again exhibited a dual action of promoting relaxation (reversal of pathways relating to muscle and calcium transport) and inhibiting inflammation (IL-1 signaling and monocyte chemotaxis). Interestingly, multiple pathways significantly up-regulated by IL-1β were inhibited only when both FSK and P4 were present, consisting of processes related to cytokine production and signaling cascades involved in inflammatory response (STAT, IL12, and LPS). As for transcription factors, it was observed that the activity of NF-κB, in the context of inflammatory signaling, was not inhibited by P4 and/or FSK (Fig. 4.5B, top panel). The extent to which P4 inhibited inflammation at the level of transcription factor activity was clear, almost completely reversing the expression of downstream targets. For most of these cases, FSK had relatively little or no effect, suggesting anti-inflammatory action specific to P4 signaling. FSK again seemed to oppose down-regulation by IL-1β, particularly on TFs related to relaxatory actions (GR and PR). Similar to results seen at the pathway level, there were cases where the combination of FSK and P4 had a stronger effect on IL-1β-altered gene sets than the two individually. Inhibited inflammatory TFs included GATA, SRF (part of the MAPK pathway), and STAT6, which was also a synergistically inhibited pathway.

Anti-inflammatory action was investigated further through annotation of the previously calculated IL-1β RNEA network (Fig. 4.3C). Figure 4.6A depicts how genes from the IL-1β RNEA network were influenced, in terms of their expression, by addition of FSK and/or P4. A sizable portion of the network includes IL-1β genes that were inhibited in the presence of P4 alone, FSK alone, and FSK + P4 (indicated in yellow). Note that all of these genes were up-regulated by IL-1β (boxed nodes), indicating the pro-inflammatory genes and regulatory interactions that P4 and FSK inhibit or inactivate individually and in combination. There were additional cases in blue, green, and orange where only one of P4 or FSK was needed to achieve inhibition. Genes in red indicate those where inhibition was only achieved in the presence of both P4 and FSK, representing the GSVA-observed synergistic inhibition. Finally, this network also highlights genes and regulatory interactions that are unaffected by P4 and/or FSK (in purple), meaning they remain active in the presence of anti-inflammatory stimuli.
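The node annotation described above can be written as a small decision rule. The sketch below assumes each comparison against control has been reduced to a signed call per gene (+1 significantly up, -1 significantly down, 0 not significant). It treats a gene as inhibited under a treatment when it is no longer significant in the IL-1β direction, following the caption of Fig. 4.6A. The function name, the encoding, and the label strings are hypothetical.

```python
def annotate_inhibition(il1b, p4_il1b, fsk_il1b, fsk_p4_il1b):
    """Label a gene by which treatments reverse its IL-1β response.

    Arguments are signed significance calls versus control:
    +1 (up), -1 (down), or 0 (not significant).
    """
    if il1b == 0:
        return "Not significant"
    # A treatment inhibits the response if the gene is no longer
    # significant in the same direction as under IL-1β alone.
    treatments = (("P4", p4_il1b), ("FSK", fsk_il1b), ("FSK+P4", fsk_p4_il1b))
    inhibitors = [name for name, call in treatments if call != il1b]
    if not inhibitors:
        return "IL1B-responsive, not inhibited"
    if len(inhibitors) == 1:
        return f"Inhibited by {inhibitors[0]} only"
    return "Inhibited by " + ", ".join(inhibitors)


# A gene up-regulated by IL-1β that stays up with P4 or FSK alone
# but loses significance only under the combination.
label = annotate_inhibition(il1b=1, p4_il1b=1, fsk_il1b=1, fsk_p4_il1b=0)
```

Under this encoding, the synergistic cases are exactly those labeled "Inhibited by FSK+P4 only".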

4.6 Synergy Between P4 and cAMP

Many of the previous results indicate a synergistic effect between cAMP signaling and P4. Such a relationship is important in the context of establishing and maintaining pregnancy. FSK and P4 each had anti-inflammatory and relaxatory actions (Figs. 4.1-4.4), but these did not fully overlap, meaning that they work similarly in certain biological contexts and in a complementary manner in others. This was examined further by annotating a regulatory enrichment network calculated from the FSK + P4 vs CTRL comparison of expression (Fig. 4.6B). A sizable portion of this network (indicated in red) shows the activated regulatory interactions present only under the combination treatment of FSK and P4. Many of the genes were up-regulated, implicating them as important molecules for pro-relaxatory signaling as opposed to the dominant anti-inflammatory signaling seen in Fig. 4.6A. Again, many of the nodes in this network are co-regulated by P4 and cAMP (indicated in yellow). These genes are mostly down-regulated (ovular nodes), which is consistent with the idea that cAMP signaling and P4 are anti-inflammatory. It is important to understand how these two processes work individually and in concert to elucidate their role in maintaining quiescence. The results in this chapter identify pathways and regulatory mechanisms that may be important for fulfilling this role.

Figure 4.5. Pathways that are activated/inhibited by IL-1β are inhibited/activated by different combinations of P4 and FSK. The gene set collections of Gene Ontology Biological Processes and Transcription factor-targets were obtained from the Molecular Signatures Database. Gene sets significantly altered (|fold change| > 1.5 and adjusted p-value < 0.05) from control to IL-1β were identified using GSVA. These gene sets were then re-evaluated by GSVA for the conditions of P4+IL-1β, FSK+IL-1β, and FSK+P4+IL-1β compared with control. These gene sets were classified into 4 categories according to their regulation pattern: 1) No inhibition (activated by IL-1β and not inhibited by the addition of P4 and/or FSK), 2) Inhibition by P4 (activated by IL-1β but this activation was reversed by the addition of P4), 3) Inhibition by FSK (activated/inhibited by IL-1β but this activation/inhibition was reversed by the addition of FSK), 4) Inhibition by P4+FSK (activated/inhibited by IL-1β, this regulation was not reversed by the addition of P4 or FSK individually, but it was reversed by the addition of P4 and FSK together). For each of these four categories, the fold changes of sample gene sets (representing A. Gene Ontology Biological Processes, B. Transcription Factor Targets) in response to different combinations of stimuli are shown. See Supplemental files 10 and 11 for enrichment fold change values for all GO BP and TF terms, respectively, significant in IL-1β compared to control. These include enrichment fold changes for all IL-1β conditions.

Figure 4.6. Regulation interplay of genomic signaling by IL-1β, FSK, and P4. A. Inhibition of IL-1β regulatory signaling: inhibition of the regulatory network induced by IL-1β (same network and inputs used for Fig. 4.3C) by different combinations of P4 and FSK (e.g. an orange node is significant in IL-1β vs Ctrl and remains significant in the same direction in P4+IL-1β vs Ctrl but becomes insignificant or significant in the opposite direction in FSK+IL-1β vs Ctrl and FSK+P4+IL-1β vs Ctrl). Node color categories: not significant; IL-1β-responsive, not inhibited; inhibited by P4 only; inhibited by FSK only; inhibited by FSK+P4 only; inhibited by P4 and FSK+P4; inhibited by FSK and FSK+P4; inhibited by P4, FSK, and FSK+P4. B. FSK and P4 regulatory signaling: regulatory patterns of P4 and FSK. The network is obtained by performing RNEA for FSK+P4 vs Ctrl with a log2(FC) cutoff of 2.4 and p-value cutoff of 0.05. Node colors are based on P4 vs Ctrl, FSK vs Ctrl, and FSK+P4 vs Ctrl expression comparisons (not significant; responsive to FSK only; responsive to FSK+P4 only; responsive to P4 and FSK+P4; responsive to FSK and FSK+P4; responsive to P4, FSK, and FSK+P4). Node shape indicates differential expression of each node, including direction of regulation (not significant, up-regulated, or down-regulated).

4.7 Summary

RNA-seq data derived from an immortalized human myometrial cell line, hTERT-HMA/B induced to express PR-A and PR-B, treated with combinations of IL-1β, P4, and FSK (activates cAMP signaling) and vehicle were used to address two questions: 1) what genes and regulatory pathways are affected by IL-1β, P4, and FSK in myometrial cells and 2) how do P4 and FSK affect responsiveness of myometrial cells to IL-1β. These questions were answered using global differential gene expression analysis, gene set enrichment, and regulatory network enrichment. Those analyses suggest that 1) P4/PR signaling inhibits responsiveness to IL-1β primarily through AP-1 (JUN), 2) cAMP signaling exerts anti-inflammatory effects alone while also helping to establish a relaxed and hypertrophic state, and 3) P4 and cAMP signaling work together in a synergistic manner to further inhibit certain inflammatory processes. Specific regulatory molecules/interactions were observed to have altered activity under individual treatment with the three stimuli, providing context to the inter-process regulation observed from gene set enrichment.

5 Identifying Transcriptional Similarities of Human Myometrial Tissues and Cell Lines

5.1 Agreement of Transcriptional Alterations at the Gene, Pathway, and Network Levels

Similar methods of analysis, such as gene set enrichment and network construction, were used to study tissue data (chapter 3) and cell line data (chapter 4). Therefore, a simple examination of the results from these chapters allows for a comparison between these two types of experimental specimens.

Enrichment analysis by GSVA, using a combination of hallmark and selected sets for tissue and GO BPs for cells, shows a level of similarity. For example, the term "TNFα signaling via NF-κB" is up-regulated in labor and both TNFα signaling and NF-κB signaling are up-regulated by IL-1β. This pattern is also seen for pathways related to IL-2/6, STAT3/5, cytokine signaling, and muscle cell differentiation/migration. Interestingly, a number of these are inhibited by P4, either alone or in the presence of IL-1β. Smooth muscle contraction is up-regulated in laboring myometrium, not significantly altered by IL-1β, but significantly down-regulated by FSK alone.

A more direct comparison of term enrichment can be achieved by repeating the GSVA calculation on the tissue data using GO BPs. Overlap in enriched terms from the tissue data (Warwick study and the high-confidence set of 126 differentially expressed genes) and cell line data (terms significant in response to at least one of P4, FSK, or IL-1β compared to control, and just the IL-1β-altered terms) was examined in pairwise comparisons (Fig. 5.1). Interestingly, some of the terms in Fig. 5.1A are up-regulated in labor, but vary in terms of their response to FSK, with some being strongly up-regulated and others inhibited. Shared terms in Fig. 5.1D are of interest considering that the high-confidence gene set strongly represented inflammatory signaling. When comparing with cells treated with IL-1β in addition to P4 and/or FSK, hypotheses can be drawn about how parts of this labor signature are affected by pro-quiescent stimuli. For example, a number of these pathways are only inhibited by the combination of P4 + FSK, including regulation of leukocyte chemotaxis, inflammatory response, and regulation of STAT cascade. All shared terms in Fig. 5.1A and D and their enrichment statistics are listed in Appendix A Tables 7 and 8, which provide more detail for each pathway, indicating how pathway enrichment varies under different cell line treatment conditions.

Despite using two quite different methods (a shortest path-based method starting from a full signaling network for tissue, compared to a TF-target gene regulatory network method utilizing differential expression for cell lines), there are some agreements between the tissue and cell line-based networks. Comparing the non-labor network (Fig. 3.7B) and FSK + P4 RNEA network (Fig. 4.6B), there are a number of common TFs including RELA, MYC, TP53, PRKCZ, and EGR1. Likewise, for the labor network (Fig. 3.7C) and IL-1β RNEA network (Fig. 4.6A), shared TFs are IL6, STAT3, EGR1, AR, SMAD, and IL-1β. Despite these similarities, there are very few cases of the downstream targets comprising the same genes (only SERPINA1 for the latter comparison).

Figure 5.1. Comparison of cell line and tissue gene set enrichment of GO Biological Processes. Gene set variation analysis (GSVA) was performed for the conditions P4 vs CTRL, FSK vs CTRL, IL-1β vs CTRL and W-IL vs W-NL. Simple over-representation analysis was also performed for the 126 high-confidence genes identified in Fig. 3.1 using Enrichr. A-D. Overlap in enriched GO BPs for different pairwise comparisons of cell line-based enrichment (terms significant in at least one of P4 vs CTRL, FSK vs CTRL, or IL-1β vs CTRL; terms significant in IL-1β vs CTRL) and tissue-based enrichment (terms significant in the Warwick dataset; terms from the tissue high-confidence 126-gene set). Shared terms for A and D and their enrichment fold changes/p-values can be found in Tables 7 and 8 of Appendix A.

Taken together, it appears that differences in myometrial cell gene regulation between labor and non-labor are not always the same in treated cell line comparisons despite the observed agreement at the pathway level (based on similarity in enriched pathways) and the TF level (based on comparisons of constructed networks). This suggests that a more global, unsupervised assessment of the alterations in gene expression will indicate a strong degree of similarity between the transcriptional landscapes of myometrial tissue and specifically treated cell lines. The analysis in the following section attempts to address this hypothesis.

5.2 Assessing Similarity Using Tissue Similarity Index

Tissue similarity analysis was used to determine the extent to which hTERT-HMA/B cells in response to various stimuli mimic term myometrium. To do this, the Warwick RNA-seq dataset from chapter 3 (term laboring and non-laboring myometrium, N = 5 for each) was compared to the treated cell line using a method introduced by Sandberg and Ernberg (2005) called Tissue Similarity Index (TSI).53 TSI employs singular value decomposition (SVD) to find an average profile for all samples of a given type (e.g. a specific or tissue) based on the variability within the gene expression matrix. The cell line samples are then projected (using the calculated singular vectors) into the reduced SVD space and their correlations to the different average tissue profiles are calculated. The correlations produce the TSI values, which range from -1 (low similarity) to 1 (high similarity).
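The TSI procedure outlined above can be sketched with NumPy. This is an illustrative sketch under stated assumptions, not the implementation used in this work: genes are centered on their tissue means, three components are retained, similarity is the Pearson correlation across the retained components, and the data are synthetic. The function name and all parameter choices are hypothetical.

```python
import numpy as np


def tissue_similarity_index(tissue, labels, cells, n_components=3):
    """Correlate projected cell line samples with tissue phenotype centroids.

    tissue: genes x tissue-samples matrix; labels: phenotype per tissue sample;
    cells: genes x cell-line-samples matrix on a comparable scale.
    Returns {phenotype: array of correlations, one per cell line sample}.
    """
    tissue = np.asarray(tissue, dtype=float)
    cells = np.asarray(cells, dtype=float)
    labels = np.asarray(labels)

    # Center each gene on its mean across the tissue samples.
    gene_means = tissue.mean(axis=1, keepdims=True)
    centered = tissue - gene_means

    # The leading left singular vectors span the reduced expression space.
    u, _, _ = np.linalg.svd(centered, full_matrices=False)
    basis = u[:, :n_components]

    tissue_coords = basis.T @ centered            # components x tissue samples
    cell_coords = basis.T @ (cells - gene_means)  # components x cell samples

    scores = {}
    for phenotype in sorted(set(labels.tolist())):
        centroid = tissue_coords[:, labels == phenotype].mean(axis=1)
        scores[phenotype] = np.array(
            [np.corrcoef(centroid, sample)[0, 1] for sample in cell_coords.T]
        )
    return scores


# Synthetic example: 20 hypothetical genes, 5 labor and 5 non-labor samples.
rng = np.random.default_rng(0)
signature = np.r_[np.ones(10), -np.ones(10)]
labels = ["labor"] * 5 + ["non-labor"] * 5
tissue = np.column_stack(
    [(2 if lab == "labor" else -2) * signature + rng.normal(0, 0.3, 20) for lab in labels]
)
# One labor-like and one non-labor-like cell line sample.
cells = np.column_stack([2 * signature, -2 * signature])
tsi = tissue_similarity_index(tissue, labels, cells)
```

With two equally sized phenotype groups and gene-wise centering, the two centroids are mirror images of each other in the reduced space, so in this sketch a sample's correlation to one centroid is the negative of its correlation to the other.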

Correlation for each hTERT-HMA/B cell line sample was calculated with the averaged non-labor profile and labor profile, and the two were symmetric (i.e. samples with a correlation of 1 to the non-labor representative had a correlation of -1 to the labor representative). Based on the current understanding of P4, cAMP, and inflammatory signaling in parturition, as well as the results presented in this work, a specific trend was expected for these cell line sample groups. IL-1β-treated cells should be most labor-like, addition of P4 and/or FSK to IL-1β-treated cells should make them less labor-like, untreated cells should fall roughly midway between labor and non-labor, and FSK-, P4-, and FSK+P4-treated cells should be more non-labor-like. TSI was performed on 4 separate gene expression matrices containing: 1. all genes, 2. highly significant (|fold change| > 5 and adjusted p-value < 0.001) genes from P4, FSK, and IL-1β individual treatment (145 genes), 3. a set of key cell line genes selected based on the results presented in Figures 4.1-4.6 (17 genes), and 4. highly significant genes from the tissue dataset (non-labor vs labor; 151 genes) (Fig. 5.2A, see chapter 2 for a detailed description of the four gene sets).

Using all genes shared between the two expression studies (gene set 1) in the TSI calculation produced fair TSI values (i.e. similar to the hypothesized pattern of similarity), with IL-1β-treated cells having the highest correlation, about 0.5, to the averaged labor profile while FSK+P4-treated cells were most similar to the averaged non-labor profile (a correlation of around -0.75 to the labor profile). Using gene set 2, the correlation to labor increased to 0.85 for IL-1β and all IL-1β conditions were more labor-like. In contrast, P4-treated cells achieved the highest correlation to the non-laboring phenotype. This pattern indicates that the strong anti-inflammatory action of P4 is captured well by this gene set, but the relaxatory action of cAMP is less represented. Use of gene set 3 produced a result highly similar to the expected similarity between cell conditions and tissue phenotypes. Additionally, the synergistic effect of FSK and P4 seen in the GSVA results was replicated, with FSK+P4 being more highly correlated with the non-laboring phenotype and FSK+P4+IL-1β correlated with non-labor, almost equivalent to the untreated cells. FSK and P4 were also clearly relaxatory stimuli in this result, seen by the substantial increase in correlation to non-labor compared with control. These patterns were also clearly visible in the gene expression matrix (Fig. 5.2B). Finally, gene set 4 (tissue-derived) produced the same ordering as the key cell line genes, albeit with more variability.

Figure 5.2. Comparison of cell line and tissue samples using Tissue Similarity Index. Singular Value Decomposition (SVD) was performed on the tissue gene expression matrix and phenotype centroids were calculated in the reduced SVD space. Cell line samples were then projected to this space and their correlations to the labor centroid were calculated. A. Median correlations and standard deviations per sample group are shown for all genes shared between the two studies, highly significant (|fold change| > 5 and adjusted p-value < 0.001) genes from the individual stimulus treated cells compared to control, a set of key genes selected from the comprehensive analyses of cell lines reported in Figures 4.1-4.7, and highly significant genes from the tissue study by comparing in-labor and not-in-labor samples. B. A heatmap of the key genes with median expression across all sample groups. C. Visualization of samples and tissue centroids using the first two singular vectors (or components) from the SVD analysis on the key genes selected based on the cell line analysis.

Further visualization of the TSI results was achieved using the first two singular vectors, or components, of the SVD calculation. Figure 5.2C depicts this sample clustering for gene set 3 (selected key cell line genes). Tissue phenotype "centroids" indicate the representative profile for labor and non-labor. The top two SVD components indicate that the most dominant patterns in the gene expression data are first IL-1β/inflammatory signaling and second cAMP signaling. The former, component 1, represents the TSI result in that the FSK/P4 and non-labor samples are separated from the IL-1β and labor samples with opposite values. The latter, component 2, identifies cAMP signaling, separating FSK-treated samples from the rest. These results are interesting because they were clearly observed in the cell line data, but the components were identified from the tissue data only. The genes came from the observed expression in the cell line, but no direct information from the cells is included in the SVD calculation. This indicates a clear role for cAMP in parturition. These genes also capture the inherent diversity in the laboring phenotype, which clearly exhibits more variability in expression than the non-labor samples. These high correlations and the ordering of the cell line sample groups indicate that this experimental model of human myometrium accurately captures the transcriptional landscapes of myometrium at term and during parturition.

5.3 Summary

Results from various bioinformatics approaches carried out on RNA-seq data from multiple term myometrial tissue studies and data derived from an immortalized human myometrial cell line, hTERT-HMA/B induced to express PR-A and PR-B, treated with combinations of IL-1β, P4, and forskolin (activates cAMP signaling) and vehicle were used to address an essential question: how do the gene expression patterns in differently treated hTERT-HMA/B cells compare with expression patterns in laboring and non-laboring term myometrium? This question was answered implicitly through comparisons of global differential gene expression, gene set enrichment, and signaling/regulatory network analysis and then explicitly by TSI. Comparison of results from chapters 3 and 4 indicated a relative agreement at the pathway level and in the TFs active in regulation. A number of pathways were similarly regulated across certain cell conditions and tissue phenotypes, specifically inflammation-related signaling pathways that were up-regulated in labor and up-regulated with IL-1β, often with inhibition by either P4, FSK, or P4 + FSK. Tissue and cell line networks contained a number of the same TFs exhibiting similar differential expression, but there was little similarity when observing their most highly altered targets in the networks. Lastly, TSI results further supported the hypothesis that hTERT-HMA/B cells exposed to IL-1β mimic the laboring phenotype whereas exposure to the combination of P4 and forskolin mimics the pre-labor phenotype.

6 Discussion and Conclusions

6.1 Tissue Analysis

Overlap between datasets reveals high-confidence transcriptional markers of labor.

By examining multiple tissue datasets and employing various analysis techniques, more was learned about the transcriptional landscapes of myometrial tissue at different stages of parturition than in previous studies that examined only one dataset using simple differential expression analysis. Investigating the results in an integrative manner helps avoid drawing false conclusions or dismissing outliers of a given dataset as a consequence of the limitations of a single method. The benefit of using multiple datasets is clearly seen in the Venn diagram shown in Figure 3.1: a minimum of 500 genes would be identified through any given study individually, but comparison across datasets yields a compact set of 126 high-confidence genes. Gene expression data are inherently noisy. In this study, there exist the added challenges of differing populations, study sizes, platforms, and clinical phenotypes. Therefore, the 126 identified genes meeting stringent significance criteria in all studies can be considered a reliable signature of labor-associated genes. Included in this set of 126 genes are some well-known contraction associated proteins, including PTGS, PTGES, COX2, IL8, CXCL2, CCL2, and SOCS3. The oxytocin receptor (OXTR) is only significantly differentially expressed in the Warwick study.
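The cross-study filtering described above can be illustrated as a threshold step followed by a set intersection. The sketch below is hypothetical throughout: the function names, the default cutoffs, and every fold change and p-value are invented for illustration and are not the criteria or results of chapter 3.

```python
def significant_genes(results, min_abs_log2fc, max_adj_p):
    """Genes passing both thresholds in one study's differential expression results.

    results maps gene -> (log2 fold change, adjusted p-value).
    """
    return {
        gene
        for gene, (log2fc, adj_p) in results.items()
        if abs(log2fc) >= min_abs_log2fc and adj_p <= max_adj_p
    }


def high_confidence_set(studies, min_abs_log2fc=1.0, max_adj_p=0.05):
    """Genes that are significant in every study (placeholder cutoffs)."""
    per_study = [
        significant_genes(results, min_abs_log2fc, max_adj_p)
        for results in studies.values()
    ]
    return set.intersection(*per_study)


# Invented values, chosen only so that one gene is significant in a single study.
studies = {
    "Warwick": {"SOCS3": (3.1, 1e-6), "CCL2": (2.4, 1e-4), "OXTR": (1.6, 0.01)},
    "London": {"SOCS3": (2.2, 1e-3), "CCL2": (1.9, 0.02), "OXTR": (0.4, 0.60)},
    "Detroit": {"SOCS3": (1.8, 1e-5), "CCL2": (1.2, 0.03), "OXTR": (0.2, 0.80)},
}
shared = high_confidence_set(studies)
```

A gene that passes in only one study drops out of the intersection, which is how a per-study list of several hundred genes can shrink to a compact shared signature.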

Implications of misclassified samples. In addition to the identification of high confidence transcriptional markers in chapter 3, application of multiple computational approaches on the same data helped identify potential outliers within a given study. Namely, the comparison of misclassified samples across LASSO/EN and SVD revealed multiple samples that showed expression patterns more consistent with the opposite phenotype (e.g. samples D-IL7 and D-IL16, which are incorrectly classified in all LASSO/EN classification tasks (training on any dataset) and scored more similarly to non-labor samples by the most dominant singular vector of SVD; the same result is seen for samples L-ILEa2, L-ILEa6, L-ILEs3 from the London dataset). Together, these results bring into question the quality of the samples as they are repeatedly misclassified in all cases. In this context, each sample has a clear phenotype distinction. However, visible characteristics of laboring women may not match the biochemical actions within the myometrium. The initial stages of labor for certain individuals may still involve further biochemical changes to achieve full activation characteristic of parturition. All three of the tissue studies analyzed in chapter 3 adopted different clinical definitions of labor to designate sample phenotype. These criteria were mainly based on contraction frequency and cervical dilation (Table 2.1). This observation calls for a clinical standard for determining the presenting phenotype, adding to the importance of the work carried out in this study as it is possible to employ our discovered transcriptional signatures as a means to aid in phenotype calling of a given myometrial sample.

Gene expression signatures have greater variability in laboring samples. Analysis of samples via supervised learning also allowed for broader conclusions to be drawn about the data. One example is that the laboring phenotype has greater variability relative to the non-labor phenotype, as all “questionable” samples fall into the labor class. This is not surprising as it is inherently easier for physicians to determine the non-laboring phenotype, whereas there is a broad spectrum of laboring markers (cervical dilation, frequency of contractions, etc.). There are also many factors that can influence these phenotype markers of parturition at the time of tissue sample collection, such as age, weight, race, and number of prior pregnancies.

Data normalization and pre-processing. It is important to note that much of the analysis carried out in chapter 3 required an accurate procedure for creating comparable distributions of gene expression across the three transcriptome studies. It is well understood that raw data from RNA-seq and microarray platforms have dissimilar distributions and characteristics.76 Therefore, to perform cross-dataset classification and SVD, a careful normalization approach was needed. There are many proposed procedures for creating agreement between RNA-seq and microarray data.76 In the analysis presented above, a relatively simple approach was used and was highly effective, as demonstrated by the performance of the classification models and singular vectors from the SVD analysis. It is important to keep in mind that more complicated study comparisons may require other, more sophisticated techniques for sufficiently accurate results.

Other areas of consideration may also be important when combining studies in similar analysis pipelines. Here both RNA-seq studies employed paired-end sequencing and aligned around 97% of reads. London averaged 53 million reads per sample and Warwick averaged 7.2 million. London and Warwick were aligned and processed in the same manner, leading to the identification of 23617 genes, while 19897 genes were identified for the Detroit study (microarray). After removal of duplicate genes and those with very low expression, the gene expression matrices were similar in size (around 15000 genes). Finally, 410 genes are significant in both the W-IL vs W-NL and the L-ILEs vs L-NL comparisons, 48 of which are not tiled on the array in the Detroit study.

Comparing tissue studies with other experimental techniques. The expression signatures obtained in chapter 3 can also be applied to understand how well other experimental samples match true phenotypical conditions within the myometrium, at least in terms of gene expression patterns. For example, the trained classifiers and SVD singular vectors can be employed to assess samples from new studies, cell lines, or even different uterine tissues. The similarity between cell line models and the tissues they are derived from is a very important consideration for in vitro research. Therefore, the consistencies and inconsistencies between tissue-based gene expression signatures and cell line-based expression signatures can serve as a means to understand the model system and assess its ability to accurately represent the biological state(s) being studied.

Results reveal differences between computational techniques. The results in chapter 3 also revealed certain strengths and weaknesses of different analysis techniques in the context of our research problem. One example can be seen from the LASSO and EN classification performance on different datasets and classification tasks. It was observed that models derived from the Warwick and Detroit datasets produced similar performance on the London dataset despite the fact that the number of samples in the Detroit study is much greater than that of the Warwick study. This may be simply due to the observed higher level of differential expression of the Warwick study or an artifact of comparing studies from different platforms. Additionally, the classification performance suggests that EN is better suited for cross-dataset classification. One probable reason for this is that EN, unlike LASSO, allows correlated features to be included in the final model. A LASSO model tends to include only one of a group of correlated features, which may or may not be differentially expressed in the testing dataset. On the other hand, EN can include this gene and others with similar expression patterns. This increases the likelihood that at least one of the correlated genes will exhibit differential expression in the testing set, allowing it to improve classification accuracy. It is important to note that the genes selected by these algorithms are not necessarily the genes most associated with phenotypic changes. Rather, they represent sets of genes that can robustly distinguish different phenotypes when considered together. Many other genes may exhibit differential expression between different phenotypes, the differential expression of these genes can be more significant than that of the genes that are selected, and some of those genes may be the genes most associated with phenotypic changes. However, such genes may be left out of the model if they do not provide additional information in the presence of other differentially expressed genes. Taken together with the previously mentioned sparsity induced by LASSO and EN, it is not surprising that there was little overlap between the differentially expressed genes and those genes incorporated in the classification models. We also observed little overlap between model genes for different datasets but good cross-study classification. This likely represents the identification of different genes that are part of similar biological systems/processes. Noise in gene expression data can affect the individual dataset signatures, but these algorithms can pick up a signal present in all studies. Another example comes from comparing the differential expression and pathway enrichment results with the SVD output. The former two analyses identified signatures that were almost exclusively associated with inflammatory processes (a majority of the 126 shared differentially expressed genes as well as most of the significant GSVA pathways were linked to inflammation). However, SVD identified genes highly associated with the non-labor phenotype (Fig. 3.3B), leading to improved characterization of the transcriptional programs utilized throughout parturition.
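The different handling of correlated features by LASSO and EN discussed above can be demonstrated on a toy regression problem. The sketch below is a plain coordinate descent written for illustration, not the classification pipeline used in chapter 3. The objective, penalty values, and data are assumptions, and the two "genes" are made identical to exaggerate the effect.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(columns, y, lam, l1_ratio, sweeps=2000):
    """Coordinate descent for the penalized least-squares objective
    (1/2n)||y - Xw||^2 + lam * (l1_ratio * |w|_1 + (1 - l1_ratio)/2 * |w|_2^2).

    columns is a list of feature columns; l1_ratio=1 gives LASSO.
    """
    n = len(y)
    weights = [0.0] * len(columns)
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            # Residual with feature j's own contribution left out.
            partial = [
                y[i] - sum(w * c[i] for k, (w, c) in enumerate(zip(weights, columns)) if k != j)
                for i in range(n)
            ]
            rho = sum(col[i] * partial[i] for i in range(n)) / n
            scale = sum(v * v for v in col) / n
            weights[j] = soft_threshold(rho, lam * l1_ratio) / (scale + lam * (1 - l1_ratio))
    return weights


# Two perfectly correlated toy "genes" and a response driven by them.
gene = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
y = [2.0 * v for v in gene]

lasso_w = elastic_net([gene, gene], y, lam=0.1, l1_ratio=1.0)
enet_w = elastic_net([gene, gene], y, lam=0.1, l1_ratio=0.5)
```

In this toy setting the LASSO fit places essentially all weight on the first copy of the feature, whereas the EN fit splits the weight evenly between the two copies, so either copy alone would still carry signal in a new dataset.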

Toward characterizing the systems biology of labor. In chapter 3, robust transcriptional signatures that could be used to allocate a clinical phenotype to a given sample were identified. These signatures, as seen from gene and pathway enrichment analyses, were largely composed of inflammatory-related processes. This attests to the importance of inflammation in the onset of labor. However, certain pathways may be highly complex, with some sub-paths signaling during labor and others signaling during quiescence. This can influence analyses like pathway enrichment. For this reason, it is important, and as we showed can be very informative, to utilize network and pathway interaction information to better interpret how genes are altered in the overall scheme for a pathway of interest.

The London dataset in chapter 3 added a new clinical group to the two existing ones (IL and NL), ILEa: early stage labor, allowing for more time points around the pivotal period of parturition. Although no significant gene expression changes were observed, it is still critical for future studies to assess biological changes during various stages of pregnancy and labor. Migale et al. (2016) carried out a temporal transcriptomics (RNA-seq) study on CD1 mice under conditions of term gestation, RU486-induced preterm labor, and lipopolysaccharide (LPS)-induced preterm labor.77 They identified that the LPS model (inflammatory) most closely resembled human labor, exhibiting specific changes that were also seen in this study (up-regulation of interferon, IL6, JAK-STAT, and cytokine signaling pathways). Salomonis et al. (2005) performed global gene expression analysis (microarray) on myometrial samples collected from mice at multiple stages of parturition (non-pregnant, mid-gestation, late gestation, and postpartum).78 Results from this study exhibit strong similarity with the results presented here, including up-regulation of cAMP and progesterone signaling during quiescence and up-regulation of contractile signaling and cell remodeling during labor. Further research in this area will help identify the timing mechanisms and responsible pathways for transitioning the quiescent myometrium to the contractile state, which involves many factors and processes (inflammatory load, functional P4 withdrawal, tissue cross-talk) working in concert.79 Understanding these timing mechanisms and the relationship between P4 and inflammation can then be used to model parturition and make predictions about preterm birth risk.80 Larger studies probing gene expression throughout pregnancy and labor in humans are much more difficult; however, steps in this direction would improve our understanding of how the uterine tissues establish and maintain quiescence, prepare for labor, and achieve parturition.

It is important to note that the transcriptome datasets were derived from total RNA extracted from whole-tissue samples, which are heterogeneous, containing myometrial cells, connective tissue, and immune cells such as leukocytes. Shynlova et al. (2013) showed in mice that the number of immune cells recruited to the myometrium increases at the time of parturition.81 This is therefore an important consideration when assessing changes observed in the transcriptional studies analyzed in this work. To obtain a full picture of transcriptional alterations during parturition, further studies utilizing advanced experimental techniques (e.g. single-cell transcriptomics) are needed to assess gene expression changes specific to myometrial cells as well as determine the population dynamics of the non-myometrial cells during quiescence and labor.
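The effect of tissue heterogeneity on a bulk measurement can be made concrete with a small worked example. It assumes, as a simplification, that the bulk signal for a gene is the fraction-weighted sum of per-cell-type expression; the cell-type fractions and expression values below are hypothetical and chosen only to keep the arithmetic simple.

```python
def bulk_expression(fractions, profiles, gene):
    """Bulk signal for one gene as the fraction-weighted sum over cell types."""
    return sum(fractions[cell] * profiles[cell][gene] for cell in fractions)


# Hypothetical per-cell-type expression (arbitrary units). The myocyte value is
# the same in both conditions, so only the tissue composition changes.
profiles = {
    "myocyte": {"IL1B": 1.0},
    "leukocyte": {"IL1B": 21.0},
}
non_labor = {"myocyte": 0.95, "leukocyte": 0.05}
labor = {"myocyte": 0.80, "leukocyte": 0.20}

nl_signal = bulk_expression(non_labor, profiles, "IL1B")  # 0.95*1 + 0.05*21
il_signal = bulk_expression(labor, profiles, "IL1B")      # 0.80*1 + 0.20*21
apparent_fold_change = il_signal / nl_signal
```

Under these assumptions the bulk sample shows an apparent 2.5-fold increase even though expression within myometrial cells is unchanged, which is why cell-type-resolved measurements are needed to separate the two effects.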

6.2 Cell Line Analysis

Importance of the PR-A:PR-B Ratio. Human parturition is associated with an increase in the PR-A:PR-B ratio that is thought to cause functional P4 withdrawal2. Recent studies suggest that the PR-A:PR-B ratio is affected by P4 and pro-inflammatory stimuli due to differential effects of serine phosphorylation on the stability of PR-B (steady-state level decreases) and PR-A (steady-state level increases)5. Here it was proposed that the labor-associated increase in the myometrial PR-A:PR-B ratio occurs in response to pro-inflammatory stimuli inducing increased stability of PR-A17. Using tissue similarity index analyses, it was observed that hTERT-HMA/B cells exposed to P4 and forskolin mimic quiescent myometrium whereas cells exposed to IL-1β mimic laboring myometrium. This suggests that the transition to the labor phenotype (at least based on transcriptome analyses) could be caused by a loss of cAMP/PKA signaling and/or a gain of pro-inflammatory stimuli that alter P4/PR activity due to changes in PR phosphorylation. In this context, the PR-A:PR-B ratio in myometrial cells reflects the balance between the cAMP/PKA and inflammatory load stimuli.

Considerations Regarding Enrichment Analysis. Analysis of gene expression changes using gene set enrichment allows for better interpretation of the altered expression of hundreds of genes by grouping them in biologically meaningful sets (e.g. pathways or TF targets). While this analysis is highly useful, there are caveats to consider. One is that the top enriched sets are not always the definitive cause/effect of the observed expression changes. This is especially the case for TF enrichment. While many targets of a specific TF may exhibit altered expression, one cannot be certain that these alterations were due to increased or decreased activity of the TF in question. Furthermore, when many genes are changing, it is even less likely that a single TF is solely responsible; more likely, a combination of the significantly enriched TFs is involved. As an example, targets of P4/PR were significantly down-regulated in the IL-1β vs CTRL comparison (Fig. 4.5B). By experimental design, these cells express PR; however, PR is only expected to be active when bound to its ligand P4. Therefore, it is likely that other factors are responsible for changes in these genes (or potentially ligand-independent actions of PRs), but the enrichment shows PR as a top candidate.
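The caveat about TF enrichment can be sketched with a count-based over-representation test. The example assumes a hypergeometric upper-tail p-value, which is one common choice and not necessarily the statistic used in chapter 4; the gene universe, the changed genes, and the two TFs (TF_X, TF_Y) with their target sets are all hypothetical.

```python
from math import comb


def hypergeom_upper_tail(population, successes, draws, observed):
    """P(X >= observed) when `draws` genes are sampled without replacement."""
    total = comb(population, draws)
    upper = min(successes, draws)
    return sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    ) / total


# Hypothetical universe of 100 genes, 10 of which changed expression.
changed = {f"g{i}" for i in range(10)}
# Two hypothetical TFs whose annotated target sets share most members.
targets = {
    "TF_X": {f"g{i}" for i in range(8)} | {"g50", "g51"},
    "TF_Y": {f"g{i}" for i in range(2, 10)} | {"g60", "g61"},
}

p_values = {
    tf: hypergeom_upper_tail(100, len(genes), len(changed), len(genes & changed))
    for tf, genes in targets.items()
}
```

Both hypothetical TFs come out as strongly enriched from the same set of changed genes, so the test alone cannot say which of them, if either, drove the change.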

GSVA was used to calculate enrichment partly because it provides directionality for each gene set in the form of an enrichment fold change, and our initial hypotheses suggested opposite regulation of similar pathways in different conditions (e.g. pro-inflammatory signaling by IL-1β and anti-inflammatory action of P4). GSVA takes into account the direction of expression change in each gene in the gene set, while other enrichment approaches calculate differential expression for each gene first, followed by tallying the number of significant genes in the set, regardless of up- or down-regulation. Both analyses have their advantages and shortcomings, and the approach used should be based on the study design and research questions. For regulatory network enrichment analysis (RNEA), TFs that undergo significant changes in expression are included in the network. However, TF activity is not solely dependent on gene expression, as most TFs are activated by post-translational modifications. RNEA covers this case by allowing a TF exhibiting minimal or no change in expression into the network if it lies between two differentially expressed genes, one up-stream and one down-stream of it. TFs having no differentially expressed up-stream regulators may play a role in the observed down-stream changes in target expression but will not be incorporated into the network. GSVA and RNEA were combined in the analysis to circumvent these shortcomings. RNEA gives the regulatory pathway detail with targets and the directionality of their changes, while GSVA requires no information about the transcriptional alteration of the TFs themselves. Combining the techniques allows for identification of pathways, TFs, and targets of those TFs that fit specific relationships between P4, cAMP, and IL-1β signaling.
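The difference between a direction-aware score and a tally of significant genes can be sketched as follows. The signed mean used here is only a simplified stand-in for a directional enrichment score and is not the GSVA statistic; the gene names, fold changes, and threshold are hypothetical.

```python
def tally_significant(gene_set, log_fc, threshold=1.0):
    """Count set members whose absolute log2 fold change passes the threshold."""
    return sum(1 for g in gene_set if abs(log_fc[g]) >= threshold)


def signed_set_score(gene_set, log_fc):
    """Mean log2 fold change of the set; the sign gives the net direction."""
    return sum(log_fc[g] for g in gene_set) / len(gene_set)


# Hypothetical log2 fold changes for two four-gene sets.
log_fc = {
    "a": 2.0, "b": 1.5, "c": 1.0, "d": 1.5,     # coherent set, all up
    "e": 2.0, "f": -2.0, "g": 1.5, "h": -1.5,   # mixed set, half up and half down
}
coherent = ["a", "b", "c", "d"]
mixed = ["e", "f", "g", "h"]

counts = (tally_significant(coherent, log_fc), tally_significant(mixed, log_fc))
scores = (signed_set_score(coherent, log_fc), signed_set_score(mixed, log_fc))
```

The tally treats the two sets identically, whereas the signed score separates a coherently up-regulated set from one whose members move in opposite directions. Which behavior is preferable depends on the question being asked, as noted above.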

Observing results from these two approaches in the context of chapter 4 suggests two potential avenues for treating preterm birth. The transcription factor complex AP-1, specifically the JUN sub-unit, was seen to be a key part of the inflammatory signaling induced by IL-1β. This was distinctly inhibited with co-treatment of P4, and up-regulated targets of JUN were also strongly inhibited by P4 alone, suggesting that this is one of the key mechanisms influenced by P4 to maintain quiescence throughout pregnancy. On the other hand, NF-κB was clearly a major part of the IL-1β-induced inflammatory signaling, and this activity remained significant under co-treatment with P4 and/or FSK. Therefore, molecular targeting of JUN or NF-κB may reduce the likelihood of preterm delivery. Inhibition of NF-κB would provide a second anti-inflammatory action separate from that carried out by P4 and FSK, while inhibition of JUN would act alongside P4 and cAMP signaling in preventing signaling through this TF. Treatment with a JUN inhibitor during late stages of pregnancy may work most effectively, as it could work in place of P4 after its functional withdrawal. NF-κB is thought to cooperate with AP-1 and CEBP to activate labor genes due to the presence of both AP-1 and NF-κB binding sites on the promoters of these genes.82,83 Therefore, an inhibitor of just one of AP-1 or NF-κB could be successful in reducing the risk of preterm birth. Recent studies suggest that full inhibition of NF-κB signaling would be undesirable for a number of reasons and instead propose partial inhibition through a small noncompetitive IL-1R biased ligand or compounds targeting the non-canonical NF-κB signaling pathways.82,84 Finally, the analysis in chapter 4 identified an important role for cAMP signaling in promoting quiescence (smooth muscle relaxation and hypertrophy). Therefore, there are three main components to consider when developing any therapeutics for preterm birth, whether it be inhibiting pro-inflammatory actions or stimulating relaxatory pathways.

6.3 Tissue and Cell Line Comparison

The use of cell line models as an approximation of in vivo systems provides alternative and useful avenues of research to address various biological questions. The experimental model setup used in chapter 4 involving treated cell lines allows for isolated investigation of processes that are interconnected in a complex manner throughout parturition. Through the comparisons in chapter 4, synergistic effects of P4 and cAMP were identified, as well as how cAMP inhibits signaling by IL-1β. These insights are more meaningful because the in vitro model was shown to effectively approximate term myometrium. Importantly, as expected, IL-1β- and FSK + P4-treated cells had similar gene expression profiles to laboring and non-laboring myometrial samples, respectively. The similarity, at least at the gene expression level, suggests that the hTERT-HMA/B cell line mimics the gravid myometrium in response to specific treatments. Interestingly, under basal conditions hTERT-HMA/B cells appear less relaxed than non-laboring myometrium but can be transformed to this phenotype by exposure to P4 and FSK.

6.4 Conclusions

Translational Implications. Parturition is a complex process involving significant changes at the cellular and tissue levels. This makes it difficult to understand and treat labor disorders such as preterm labor. Understanding how parturition is carried out by the uterine tissues is key to gaining insights into the cause of labor disorders. By utilizing three large-scale molecular datasets in combination with advanced bioinformatics techniques, it was shown that there are significant changes at the level of transcription within the myometrium during the transition from quiescence to labor. Furthermore, genes and pathways representing these changes were identified and a signaling network of parturition was constructed. This work, detailed in chapter 3, demonstrates the utility of bioinformatics approaches as a means to probe complex biological events, as it has provided robust transcriptional signatures that can be useful both for characterizing parturition, by looking at the representation of these signatures in a given clinical sample, and for understanding it. Additionally, combining quality datasets can provide results that may not be seen through analysis of individual studies and allows for more powerful conclusions to be reached.

Experimental Implications. The cell line analysis presented in chapter 4 is, to my knowledge, the first to examine the relationship between major pathways that control human parturition at the genome-wide level in myometrial cells. P4, cAMP, and inflammatory signaling are believed to play a major role in pregnancy and parturition. Through the presented analysis, testable hypotheses on how these processes interact were generated to guide future research efforts. This includes using the results presented in chapter 4 to find new ways to increase the anti-inflammatory actions of P4 and cAMP to promote uterine quiescence as a means for treating preterm birth. Top targets include JUN and NF-κB, whose inhibition may supplement the action of P4/PR after its functional withdrawal in the myometrium during the late stages of pregnancy.

Final Conclusions. When attempting to understand complex biological processes like parturition, it is important to pursue multiple avenues of research utilizing integrative and diverse techniques. Results can then be compared and corroborated to obtain stronger conclusions and more comprehensive knowledge. By studying parturition in myometrial tissues and a treated cell line, it was possible to strategically examine the system as a whole, using tissue to generate hypotheses about important pathways and then following up directly in cells by individually targeting these pathways and the relationships between them. Additionally, the use of multiple tissue datasets and complementary bioinformatics methods provided a means to establish hypotheses with higher confidence, while performing similar analyses in tissue and cells (e.g. pathway enrichment and network construction) allowed for better comparison between the two experimental models. This work demonstrates the utility and effectiveness of integrating well-designed studies and bioinformatic pipelines to investigate a given biological process or disease and will hopefully serve as a useful guide for similar studies in the future.

7 Suggested Future Research

7.1 Utilizing Clinical Data

Timing of various physiological and biomolecular events is highly important in parturition. There is a need to understand how the triggering of labor is achieved. If it is driven by biological changes in the cell (e.g. gene expression, post-translational modifications), do these changes occur gradually or very quickly? Do changes at the transcriptional level precede the observations that are made in the clinical setting? To answer these questions, accurate clinical data are needed. Having more information would allow for stratification of women into various stages of labor using biomolecular markers or signatures, as opposed to the typical two-class conditions of labor and non-labor. Additionally, it may be possible to identify a series of biomolecular changes that occur sequentially. Such information would strongly aid our ability to understand the role of these changes and which of them may be involved in preterm birth. Regarding results presented in this work, these additional variables could potentially explain the heterogeneity in the laboring tissue samples or the observed sample misclassification.

7.2 Confirmational Studies for Validation of Findings

The results from chapters 3 and 4 generated a number of hypotheses using various bioinformatics approaches that, if validated experimentally, could strongly support our findings. It is important to prioritize which findings to validate due to the expense and time involved in experimental studies. The strongest signal from the cell line analysis (chapter 4) was that P4 exhibited strong anti-inflammatory action mainly through the JUN transcription factor. Performing ChIP-qPCR for a subset of JUN target genes (identified from its known targets or the RNEA networks, e.g. MMP1/3/12, SERPINB2, IL6, BCL2A1, etc.) to see how P4/PR influences the binding of JUN to the promoter regions of its target genes could identify the important aspects of this inhibitory effect and how it is achieved. Another case where ChIP-qPCR could be beneficial would involve looking at certain targets of NF-κB, since it was observed that they are responsive to IL-1β but not inhibited by P4 and/or FSK. To inform the effects of P4 on the signaling mechanisms in pregnancy, it may be of interest to perform phosphorylation studies using the hTERT-HMA/B cell line to investigate specific interactions included in the non-labor specific signaling network from chapter 3 (Fig. 3.7B). In other words, treat cells with P4 and see if signaling proteins down-stream of PR (e.g. COPS9, MAPK10, PRMT2) become phosphorylated. Experimental validation is an important follow-up step to bioinformatic analyses, as confirmations at different levels of biochemical signaling/regulation can help researchers understand the accuracy of these analyses and promote more targeted approaches in the future.

7.3 Myometrial Cell-Specific Signaling

Tissue samples collected from the myometrium are heterogeneous, including the cells of interest (smooth muscle myometrial cells) but also connective tissue and immune cells (e.g. leukocytes). Therefore, when performing global transcriptomic analysis on these tissue samples, it is important to keep this in mind and note that the changes may not be myometrium specific. It has been shown in this work and others that an increase in inflammation is a key part of the labor process, meaning that an increase in immune cells in the region of myometrial tissue is likely. Thus, it is important to characterize these changes, and their effect at the transcriptional level, to better understand signaling at the time of labor in myometrial cells. One potential approach for reducing sample heterogeneity is laser capture microdissection (LCM), which is a method for isolating specific cells of interest from microscopic regions of tissue with the help of a laser. This technique would allow selection of myometrial cells and removal of unwanted areas of the sample such as connective tissue, improving the specificity of the transcriptional changes we observe. Another experimental option gaining popularity in the area of transcriptomics is single-cell sequencing, which can examine the sequence information from individual cells. These technologies are important to consider in future research efforts to understand the process of human parturition.

7.4 Linking the Uterine Tissues

Human parturition involves major changes to multiple tissues: cervical dilation, membrane rupture, and myometrial contractions. Therefore, to fully understand parturition, it may be necessary to study all uterine tissues during pregnancy and at the onset of labor. Similar studies involving transcriptional data like this one should be performed in the cervix, membranes, and decidua. Are the same changes identified in the myometrium occurring in these other tissues during the onset of labor? Is there cross-talk between the uterine tissues? These are important questions to answer, and this can only be achieved by looking at these tissues both individually and as a collective.

Appendix A

Supplementary Tables

This section contains tables referenced in the document.

Table 1: High-confidence set of 126 genes. Gene name, median log2(fold change), and median adjusted p-value for all 126 genes found to be significantly differentially expressed in the sample group comparisons W-IL vs. W-NL, D-IL vs. D-NL, L-ILEa vs. L-NL, and L-ILEs vs. L-NL.

Gene Median log2(FC) Median Adjusted P-value

PPP2R2C -0.95523 0.003191

PPP1R3C -0.90792 0.011512

SORL1 0.718524 0.012854

GTPBP4 0.621002 0.008805

CHSY1 0.820915 0.002171

MTHFD2 0.839168 0.001422

NCOA7 0.826025 0.00266

DUSP5 0.970045 0.002171

CD55 0.842211 0.005761

ETS2 0.791655 0.005359

LCP1 0.913972 0.003191

ELL2 1.002648 0.001425

LEFTY2 1.531495 0.002384

RNF149 0.918522 0.002384

BMP2 1.089915 0.013079

MNDA 0.899079 0.00475

MAP3K8 0.993665 0.002384

UAP1 0.969502 0.002384

NFKB2 0.825443 0.005728

GADD45A 1.09781 0.002074

SERPINB8 0.927669 0.003534

MAFF 1.298045 0.002171

NFKBIA 0.989885 0.004696

SIPA1L2 1.314768 0.002384

GCA 1.067897 0.005021

PTPRE 0.903658 0.002384

CCRN4L 1.214445 0.007029

UGCG 1.156174 0.003002

CSRNP1 1.449044 0.001511

FCGR3B 1.32533 0.003191

LDLR 0.999709 0.002384

ITPRIP 1.001999 0.002099

MXD1 1.30231 0.002312

TMEM2 1.311551 0.002384

HIF1A 1.016104 0.002384

TNFAIP3 1.398003 0.002384

KRT34 1.81075 0.002384

MYC 0.957301 0.002384

UPP1 1.254569 0.002384

TRIB1 1.221102 0.001926

IL4R 0.904563 0.002375

ZC3H12A 1.23521 0.002384

CEBPB 0.86979 0.003691

HBEGF 1.713045 0.002384

PNP 1.524425 0.002384

SBNO2 1.01676 0.002384

LMNB1 1.189158 0.006055

GFPT2 1.091361 0.002384

LMCD1 1.70608 0.002384

NNMT 1.05658 0.002099

PFKFB3 1.318825 0.00226

PIM1 1.274812 0.002361

SLC39A14 1.260558 0.002384

ALPL 1.072588 0.005215

ANPEP 1.424565 0.002384

CEBPD 1.011887 0.002384

ADAMTS9 1.339353 0.002384

SPHK1 1.695745 0.002384

IER3 1.523745 0.002384

SLA 1.459215 0.002384

SERPINE2 1.134225 0.008415

BCL3 1.122565 0.002384

SELL 1.664555 0.002384

FOSL1 2.26029 0.002384

ADM 1.45817 0.002384

ITGAX 1.8564 0.002384

FCN1 1.560249 0.002384

RDH10 1.270945 0.002384

SLC7A5 1.636065 0.002074

SOD2 1.615446 0.002384

MYO1G 1.410975 0.003191

PLAUR 1.787685 0.002384

IL1RN 2.601395 0.002384

RUNX1 1.214055 0.002384

LILRB3 1.187979 0.004659

IL1B 3.395985 0.002384

RGS16 1.92145 0.002384

FGR 1.540165 0.002384

ADAM8 1.872865 0.002384

SERPINA1 1.92345 0.002384

GK 2.43881 0.002384

FPR1 2.30624 0.002384

TNFAIP6 2.41967 0.002384

TREM1 2.676575 0.002384

STEAP1 2.032775 0.002384

IL7R 2.05551 0.002384

CLEC5A 2.819755 0.002384

CSF3R 1.885729 0.002384

PTGS2 2.61283 0.002099

HK2 1.750735 0.002207

CXCR2 2.19877 0.002384

DIO2 1.151498 0.002384

GREM1 1.995947 0.001935

FFAR2 2.589005 0.002384

SDS 3.219885 0.00448

RARRES1 1.36736 0.002384

CXCR1 2.646205 0.002384

MMP25 1.527125 0.006726

NAMPT 2.575605 0.002024

PTGES 1.739667 0.003534

CCL2 2.553615 0.002384

S100A9 2.016662 0.001535

MT1X 1.98029 0.002171

SOCS3 1.9309 0.002384

LIF 4.09664 0.002384

DIO3 1.334307 0.002099

LRG1 2.95882 0.002384

SLC16A10 4.189705 0.004162

PROK2 3.282925 0.002384

LILRA5 2.14944 0.001425

FPR2 3.385695 0.002384

NFE2 1.899855 0.003566

S100A8 2.43954 0.001896

AQP9 3.11008 0.00171

CXCL2 2.80759 0.002384

S100A12 2.31903 0.00475

IL6 3.09081 0.002384

MT2A 2.56653 0.001814

OSM 3.52356 0.003262

ANGPTL4 2.08278 0.002375

SELE 4.53256 0.003983

IL8 5.717645 0.002099

C19orf59 3.455735 0.002874

CCL11 3.078985 0.002384

CXCL6 2.84988 0.004692

CSF3 6.069955 0.008667
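The two summary columns of Table 1 can be reproduced along the following lines. This is a minimal sketch that assumes a gene is retained only if it is significant in every comparison and that medians are taken across the comparisons; the significance cutoff, gene names, and values are hypothetical, and only the comparison labels come from the caption.

```python
from statistics import median


def summarise_shared_genes(results, alpha=0.05):
    """Return {gene: (median log2FC, median adjusted p)} for genes that are
    significant in every comparison.

    results maps comparison name -> {gene: (log2_fc, adjusted_p)}.
    """
    shared = set.intersection(*(
        {g for g, (_, p) in table.items() if p < alpha}
        for table in results.values()
    ))
    summary = {}
    for gene in sorted(shared):
        fcs = [results[c][gene][0] for c in results]
        ps = [results[c][gene][1] for c in results]
        summary[gene] = (median(fcs), median(ps))
    return summary


# Hypothetical values for two placeholder genes.
results = {
    "W-IL vs W-NL":   {"GENE1": (3.0, 0.001), "GENE2": (0.4, 0.300)},
    "D-IL vs D-NL":   {"GENE1": (4.0, 0.002), "GENE2": (1.2, 0.010)},
    "L-ILEa vs L-NL": {"GENE1": (3.5, 0.004), "GENE2": (1.0, 0.020)},
    "L-ILEs vs L-NL": {"GENE1": (5.0, 0.003), "GENE2": (0.9, 0.030)},
}
summary = summarise_shared_genes(results)
```

Here GENE2 is dropped because it misses the cutoff in one comparison, while GENE1 is reported with the median of its four fold changes and the median of its four adjusted p-values.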

Table 2: Unique genes recruited in Warwick EN models. List of genes included in at least one of the 100 Warwick EN models in 3-fold cross-validation. Column 1 provides the gene name, column 2 the percentage of models in which the gene was recruited (95 = 95 of 100 models), and column 3 the average coefficient of the gene for models in which it was recruited.

Gene Model Occurrence % Mean Model Coefficient

MFAP2 100 0.055456654

PTGER3 100 -0.055312544

SLC43A3 100 0.042976227

KLF12 100 -0.033053288

ALDH1A2 100 -0.038536515

MLKL 100 0.048494287

SPHK1 100 0.068076662

SOCS3 100 0.05452764

C19orf48 100 0.04769892

PCYOX1 100 -0.064223546

SEC23B 100 0.028985156

HM13 100 0.04397553

MITF 100 -0.050448209

ATP1B3 100 0.030574544

SUMF1 100 -0.039376918

PCDH18 100 -0.060874845

ENPP1 100 -0.046371766

SASH1 100 -0.029205489

CAV2 100 -0.035534939

OSGIN2 100 -0.060924667

RAI2 100 -0.030830841

ARHGEF6 100 -0.032663317

RCC2 99 0.028576206

MARCH8 99 -0.025981063

CAB39L 99 -0.021524201

FUS 99 0.049451994

STXBP4 99 -0.031382984

LAMA1 99 0.03703682

EPB41L1 99 -0.022160437

IQCB1 99 0.050785036

H2AFY 99 0.025707875

SMAGP 98 0.035492593

PXMP4 98 -0.018151992

CAV1 98 -0.014656707

WDPCP 97 -0.029634657

C16orf59 95 0.035434844

DGCR5 95 0.027227933

LINGO2 95 -0.014788474

CRYZ 94 -0.015830414

SYT6 94 -0.017217741

CLK2 87 0.031805491

RELT 87 0.017522776

SH3D19 87 -0.014740035

CDK6 87 -0.018774441

CLSTN3 83 0.017187335

PLK3 76 0.017127736

FDPS 76 0.028329465

EMX2 76 -0.01010169

PITPNC1 76 0.02032546

HN1 76 0.011473405

DNAJB11 76 0.012164545

MYL3 76 -0.02793828

CYP2U1 76 -0.007713685

FBXO8 76 -0.031113747

LEPRE1 73 0.025994706

CHST6 73 0.018269392

CD109 73 -0.013500458

ATP8A1 71 -0.015912672

TSPAN5 71 -0.009770869

EPM2A 71 -0.013864931

FKBP11 70 0.015546604

GLYCTK 70 0.018336927

PALMD 69 -0.009585427

ALX3 69 -0.012381019

FAM189A2 69 -0.012373473

PTPRO 68 -0.016297166

UTP11L 65 0.020372246

MEAF6 65 -0.011551939

RABGGTB 62 0.017060963

TMEM200B 62 -0.017427013

BEND7 62 -0.022020567

TMOD2 62 -0.009782723

ZIK1 62 -0.018621767

HNRNPL 62 0.013285247

H1F0 62 -0.010936904

MAP3K4 62 0.014109961

SDC1 59 0.017158121

XBP1 59 0.01964373

MOGS 58 0.017222775

PRSS23 52 -0.012594117

HDAC3 52 0.014120383

TBL2 52 0.009382891

SCARF1 48 0.022630636

CNN2 48 0.012477531

SNN 46 -0.01048817

PTPN2 42 0.017123786

SLC24A6 36 0.020634464

KDM2B 36 0.016221773

SGPP2 36 -0.014881268

CERS6-AS1 36 -0.016614687

GGCT 36 0.011838866

MTERFD2 35 0.019410875

SGK1 35 0.018065784

KANK1 35 -0.011285577

NLRP6 33 0.012068898

POM121L9P 33 0.011308218

SEC61A1 33 0.007178117

ASTN2 33 -0.010339384

NONO 33 0.009800326

ZFAS1 32 0.008224224

AMPD2 28 0.0091122

HERPUD1 28 0.005229233

CSMD2 27 0.014017173

DUSP6 27 0.009835593

RNF14 27 -0.00929602

DCLRE1A 26 -0.014059007

EBPL 26 0.006635634

PDE8A 26 0.00993137

HSD17B13 26 -0.008513232

HNRNPH1 26 0.012253197

LGR4 24 -0.004400984

ZC3H6 24 -0.00919652

LOC254128 24 -0.006671542

NOL12 24 0.008317766

MYLK-AS1 24 -0.009262725

MSC 24 0.012266858

RIBC1 24 -0.002720396

DCK 23 -0.004691689

OLFML2A 23 -0.009578599

KIAA1462 22 -0.002656676

IFITM3 22 0.003251581

LOH12CR2 22 -0.002964264

NXPH3 22 -0.003316063

SRBD1 22 -0.007116577

AJUBA 18 -0.001556845

ZDHHC3 18 -0.002692914

MARCKSL1 13 0.002155927

SYTL2 13 0.001642151

MMP19 13 0.003509749

CCR7 13 0.004691744

LINC00152 13 0.00312735

PABPC1L 13 0.002441463

BCAS1 13 0.002617315

ABLIM3 13 -0.00197416

FPGS 13 0.004640536

LILRB1 11 0.000432662

C1QTNF7 10 -0.002632509

GUCY1A2 7 -0.00142709

GPR68 7 0.000998028

DHRS13 7 0.000432662

DTNA 7 -0.001155681

RPSAP58 7 0.001157481

GNL3 7 0.001356156

WDR46 7 0.001670878

C1orf21 1 -0.000258785

FZD4 1 -0.000530091

ENC1 1 0.000980655
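The occurrence percentage and mean coefficient reported in Table 2 can be computed from a collection of fitted sparse models as sketched below. The sketch assumes each model is available as a mapping from gene name to coefficient and that a gene counts as recruited when its coefficient is non-zero; the four toy models and gene names are hypothetical and stand in for the 100 repeated fits.

```python
def summarise_recruitment(models):
    """Per gene: percent of models with a non-zero coefficient, and the mean
    coefficient over only those models that recruited the gene."""
    n_models = len(models)
    collected = {}
    for coefficients in models:
        for gene, value in coefficients.items():
            if value != 0.0:
                collected.setdefault(gene, []).append(value)
    return {
        gene: (100.0 * len(values) / n_models, sum(values) / len(values))
        for gene, values in collected.items()
    }


# Four hypothetical sparse models.
models = [
    {"GENE1": 0.5, "GENE2": -0.25},
    {"GENE1": 0.25, "GENE2": 0.0},
    {"GENE1": 0.75},
    {"GENE1": 0.5, "GENE3": 0.125},
]
table = summarise_recruitment(models)
```

Averaging only over the models that recruited a gene, as the caption describes, keeps rarely selected genes from being pulled toward zero by the models that excluded them.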

Table 3: Unique genes recruited to distinguish the London NL samples in LASSO models. List of genes included in at least one of the 100 London LASSO models in 5-fold cross-validation, specifically for the NL class.

Gene Model Occurrence % Mean Model Coefficient

PER3 100 0.55719

MAGOH2 65 0.16145

LOC100505933 57 0.16289

CEBPA 36 0.16122

TEF 27 0.08654

FAM115C 10 -0.21290

C3orf49 7 0.05133

Table 4: Unique genes recruited to distinguish the London ILEa samples in LASSO models. List of genes included in at least one of the 100 London LASSO models in 5-fold cross-validation, specifically for the ILEa class.

Gene Model Occurrence % Mean Model Coefficient

C1orf229 87 -0.34234

VEPH1 45 0.08555

GEMIN7 36 0.14640

KLK13 36 0.27511

FLJ30403 34 0.12393

ABT1 27 0.15123

NFYC 23 0.16612

GRPR 22 0.05461

APLN 22 0.04980

CCDC144B 18 0.10765

DDX11L2 16 0.22352

NDST3 9 0.18653

Table 5: Unique genes recruited to distinguish the London ILEs samples in LASSO models. List of genes included in at least one of the 100 London LASSO models in 5-fold cross-validation, specifically for the ILEs class.

Gene Model Occurrence % Mean Model Coefficient

IFT43 69 0.42317

CHRNB2 47 0.06018

MAP2K1 46 0.29258

GOT2 27 0.06024

SPAG17 24 0.02246

SLC22A16 18 0.11233

ANKRD11 15 -0.14405

FGF17 11 0.12297

PGAM2 9 -0.09484

DDX59 5 0.04245

BOK-AS1 5 0.01617

CYB561D1 2 -0.00143

Table 6: Unique genes recruited in Detroit EN models. List of genes included in at least one of the 100 Detroit EN models in 5-fold cross-validation.

Gene Model Occurrence % Mean Model Coefficient

ADORA2B 100 0.259

AEBP1 100 0.108

AMPH 100 0.256

C21orf70 100 0.592

C5orf25 100 -0.208

C6 100 -0.137

C6orf222 100 0.765

CASC1 100 -0.215

CYP4B1 100 -0.294

DBP 100 -0.402

FKBP5 100 -0.171

GREM1 100 0.297

HIST1H3D 100 0.462

HOXA11 100 -0.052

HTR3D 100 0.671

IGFBP6 100 -0.233

KLHDC9 100 -0.062

KRT17 100 0.047

LOC338799 100 -0.165

LOC440934 100 -0.018

CDC14C 100 -0.978

MS4A4A 100 -0.369

NUP98 100 0.454

SHMT2 100 0.063

SNORD15B 100 0.457

SSFA2 100 -0.364

STRBP 100 -0.470

TCTA 100 -0.127

TLR7 100 -0.097

ZFHX3 100 -0.180

ZNF189 100 -0.175

ZNF235 100 -1.491

ANK3 99 0.924

DPM2 99 0.087

PLEK2 99 0.403

PTGER4 99 0.059

SPRED3 99 0.501

SUFU 99 -0.445

C16orf86 98 -0.063

CHCHD4 98 0.046

TPPP 98 -0.024

DRG1 96 0.273

MITF 96 -0.038

RPL13P5 94 0.348

C17orf97 94 -0.090

ZNF681 92 -0.390

FAM164C 89 -0.064

C9orf93 89 -0.045

KLK7 89 0.108

SLITRK4 89 -0.013

ZBTB26 89 -0.206

GPR126 85 -0.072

VMA21 85 0.067

FAM26F 81 -0.165

PAFAH1B2 81 -0.128

RPS6KL1 81 -0.065

ZNF655 81 0.216

C15orf38 78 -0.106

EIF4E1B 78 0.154

SH2D1A 78 0.322

FAN1 73 -0.024

LOC644714 71 -0.163

C11orf91 71 0.040

KDELR2 56 0.049

PCDHGA6 56 0.083

SFRP1 54 0.004

LYSMD4 47 -0.074

WNK3 47 0.103

SCN3B 41 -0.051

KRT6A 38 0.006

PAQR8 34 -0.018

ATP9B 31 0.089

GPR37 29 -0.008

ANGEL2 24 -0.054

CEND1 24 -0.031

GCOM1 20 -0.012

CLK2 18 0.060

PCDHGB1 18 -0.072

ARID4B 13 -0.157

DKK1 13 -0.031

ERMN 13 0.105

SRMS 13 -0.051

TNNI3K 11 -0.046

KCNG3 9 -0.077

C17orf39 8 -0.014

DCBLD1 8 0.003

LOC653720 8 0.051

SCPEP1 8 -0.017

TAGLN3 8 0.036

RERG 6 -0.005

ASAH2B 5 0.022

LRRC3 4 -0.041

FAM122A 3 -0.006

PRKG1 3 0.054

OR1N2 2 0.034

PDE3A 2 0.019

SRPX 2 -0.005

COPS7B 1 -0.001

Table 7: GO Biological Processes significantly enriched in at least one of the single stimulus conditions (P4, FSK, IL-1β) compared to CTRL and in the Warwick study (labor vs non-labor) determined by GSVA. Significance is determined by log(fold change) > 1.5 and adjusted p-value < 0.05. Log2(FC) values are reported for the relevant comparisons.

Columns: GeneSet, logFC P4 vs CTRL, logFC FSK vs CTRL, logFC IL1B vs CTRL, logFC L vs NL, adj.P.Val L vs NL. [Numeric columns not recoverable from the extracted page layout; gene sets are listed in their original order.]

GO POSITIVE REGULATION OF EPIDERMIS DEVELOPMENT

GO PYRIMIDINE CONTAINING COMPOUND BIOSYNTHETIC PROCESS

GO REGULATION OF CHOLESTEROL EFFLUX

GO TRNA TRANSPORT

GO POSTTRANSCRIPTIONAL GENE SILENCING

GO GLUCOSE 6 PHOSPHATE METABOLIC PROCESS

GO FATTY ACID HOMEOSTASIS

GO PERK MEDIATED UNFOLDED PROTEIN RESPONSE

GO DNA DAMAGE RESPONSE DETECTION OF DNA DAMAGE

GO CYTOPLASMIC TRANSLATION

GO POSITIVE REGULATION OF TRANSCRIPTION FROM RNA POLYMERASE II PROMOTER IN RESPONSE TO STRESS

GO MATURATION OF 5 8S RRNA

GO PYRIMIDINE NUCLEOSIDE BIOSYNTHETIC PROCESS

GO NCRNA PROCESSING

GO PROTEIN LOCALIZATION TO ENDOPLASMIC RETICULUM

GO MATURATION OF SSU RRNA FROM TRICISTRONIC RRNA TRANSCRIPT SSU RRNA 5 8S RRNA LSU RRNA

GO RRNA METABOLIC PROCESS

GO PROTEIN EXPORT FROM NUCLEUS

GO TRANSLATIONAL INITIATION

GO RIBOSOME BIOGENESIS

GO NCRNA 3 END PROCESSING

GO NCRNA METABOLIC PROCESS

GO POSITIVE REGULATION OF DENDRITE MORPHOGENESIS

GO ESTABLISHMENT OF PROTEIN LOCALIZATION TO

GO RESPONSE TO PAIN

GO TRNA MODIFICATION

GO RIBOSOMAL SMALL SUBUNIT BIOGENESIS

GO RIBONUCLEOPROTEIN COMPLEX LOCALIZATION

GO MULTI ORGANISM METABOLIC PROCESS

GO TRNA METABOLIC PROCESS

GO MATURATION OF SSU RRNA

GO CAPPING

GO RIBONUCLEOSIDE DIPHOSPHATE METABOLIC PROCESS

GO NUCLEAR EXPORT

GO NUCLEAR TRANSCRIBED MRNA CATABOLIC PROCESS NONSENSE MEDIATED DECAY

GO RIBONUCLEOPROTEIN COMPLEX BIOGENESIS

GO RIBOSOME ASSEMBLY

GO RESPONSE TO LAMINAR FLUID SHEAR STRESS

GO POSITIVE REGULATION OF ERYTHROCYTE DIFFERENTIATION

GO T CELL ACTIVATION INVOLVED IN IMMUNE RESPONSE

GO NUCLEOTIDE BINDING DOMAIN RICH REPEAT CONTAINING RECEPTOR SIGNALING PATHWAY

GO LIPOPOLYSACCHARIDE MEDIATED SIGNALING PATHWAY

GO FIBRINOLYSIS

GO NUCLEOTIDE PHOSPHORYLATION

GO NEGATIVE REGULATION OF LEUKOCYTE APOPTOTIC PROCESS

GO NUCLEOSIDE SALVAGE

GO REGULATION OF TRANSCRIPTION REGULATORY REGION DNA BINDING

GO NEGATIVE REGULATION OF POTASSIUM ION TRANSMEMBRANE TRANSPORTER ACTIVITY

GO POSITIVE REGULATION OF INTERLEUKIN 1 BETA PRODUCTION

GO POSITIVE REGULATION OF INTERLEUKIN 1 SECRETION

GO REGULATION OF ALPHA BETA T CELL PROLIFERATION

GO REGULATION OF MEIOTIC NUCLEAR DIVISION

GO NEGATIVE REGULATION OF POTASSIUM ION TRANSMEMBRANE TRANSPORT

GO EMBRYO IMPLANTATION

GO POSITIVE REGULATION OF GLIAL CELL DIFFERENTIATION

GO POSITIVE REGULATION OF GLIOGENESIS

GO REGULATION OF ASTROCYTE DIFFERENTIATION

GO POSITIVE REGULATION OF ASTROCYTE DIFFERENTIATION

GO POSITIVE REGULATION OF SMOOTH MUSCLE CELL PROLIFERATION

GO POSITIVE REGULATION OF NF KAPPAB IMPORT INTO NUCLEUS

GO POSITIVE REGULATION OF LIPID TRANSPORT

GO NEGATIVE REGULATION OF INTERLEUKIN 10 PRODUCTION

GO COPPER ION TRANSPORT

GO REGULATION OF MACROPHAGE ACTIVATION

GO REGULATION OF UBIQUITIN PROTEIN ACTIVITY

GO CYTOPLASMIC PATTERN RECOGNITION RECEPTOR SIGNALING PATHWAY

GO ACUTE PHASE RESPONSE

GO POSITIVE REGULATION OF ACUTE INFLAMMATORY RESPONSE

GO CELLULAR RESPONSE TO INTERLEUKIN 6

GO DENDRITIC CELL DIFFERENTIATION

GO MICROTUBULE DEPOLYMERIZATION

GO REGULATION OF INTERLEUKIN 1 SECRETION

GO ALPHA BETA T CELL DIFFERENTIATION

GO POSITIVE REGULATION OF INTERLEUKIN 2 PRODUCTION

GO POSITIVE REGULATION OF ALPHA BETA T CELL PROLIFERATION

GO POSITIVE REGULATION OF ALPHA BETA T CELL ACTIVATION

GO REGULATION OF ALPHA BETA T CELL ACTIVATION

GO MYELOID DENDRITIC CELL ACTIVATION

GO REGULATION OF TRANSCRIPTION INVOLVED IN G1 S TRANSITION OF MITOTIC

GO ACUTE INFLAMMATORY RESPONSE

Table 8: GO Biological Processes significantly enriched in IL-1β compared to CTRL, calculated by GSVA and, using the 126 high-confidence set of differentially expressed genes (Fig. 3.1), by EnrichR. Significance is determined by log(fold change) > 1.5 and adjusted p-value < 0.05 for GSVA. Columns 2-5 show log2(FC) values for all IL-1β conditions, and the last column is the adjusted p-value from the EnrichR analysis.

[Columns: GeneSet; logFC IL1B; logFC P4 + IL1B; logFC FSK + IL1B; logFC FSK + P4 + IL1B; Enrichment Adj P-val. The numeric entries could not be aligned to rows in the flattened layout; the gene sets listed are:]

GO POSITIVE REGULATION OF TRANSCRIPTION FACTOR IMPORT INTO NUCLEUS
GO POSITIVE REGULATION OF LEUKOCYTE CHEMOTAXIS
GO CELLULAR RESPONSE TO DRUG
GO REGULATION OF LEUKOCYTE CHEMOTAXIS
GO POSITIVE REGULATION OF ACUTE INFLAMMATORY RESPONSE
GO INFLAMMATORY RESPONSE
GO POSITIVE REGULATION OF NEUTROPHIL MIGRATION
GO PATTERN RECOGNITION RECEPTOR SIGNALING PATHWAY
GO POSITIVE REGULATION OF VASCULATURE DEVELOPMENT
GO REGULATION OF TYROSINE PHOSPHORYLATION OF STAT PROTEIN
GO POSITIVE REGULATION OF STAT CASCADE
GO POSITIVE REGULATION OF LEUKOCYTE MIGRATION
GO MYELOID LEUKOCYTE MEDIATED IMMUNITY
GO RECEPTOR METABOLIC PROCESS
GO CELL CHEMOTAXIS
GO REGULATION OF TRANSCRIPTION FACTOR IMPORT INTO NUCLEUS
GO RESPONSE TO MOLECULE OF BACTERIAL ORIGIN
GO REGULATION OF SMOOTH MUSCLE CELL PROLIFERATION
GO CELLULAR RESPONSE TO CYTOKINE STIMULUS
GO MONOCYTE CHEMOTAXIS
GO REGULATION OF INSULIN RECEPTOR SIGNALING PATHWAY
GO POSITIVE REGULATION OF CHEMOTAXIS
GO POSITIVE REGULATION OF T CELL PROLIFERATION
GO ACTIVATION OF MAPK ACTIVITY
GO NEGATIVE REGULATION OF HORMONE SECRETION

Complete References

[1] Daniel B Hardy, Bethany A Janowski, David R Corey, and Carole R Mendelson. Progesterone receptor plays a major antiinflammatory role in human myometrial cells by antagonism of nuclear factor-κb activation of cyclooxygenase 2 expression. Molecular endocrinology, 20(11):2724–2733, 2006.

[2] Huiqing Tan, Lijuan Yi, Neal S Rote, William W Hurd, and Sam Mesiano. Progesterone receptor-a and-b have opposite effects on proinflammatory gene expression in human myometrial cells: implications for progesterone actions in human pregnancy and parturition. The Journal of Clinical Endocrinology & Metabolism, 97(5):E719–E730, 2012.

[3] Li Chen, Kaiyu Lei, Johann Malawana, Angela Yulia, Suren R Sooranna, Phillip R Bennett, Zhiqing Liang, Dimitri Grammatopoulos, and Mark R Johnson. Cyclic amp enhances progesterone action in human myometrial cells. Molecular and cellular endocrinology, 382(1):334–343, 2014.

[4] Peyvand Amini, Daniel Michniuk, Kelly Kuo, Lijuan Yi, Yelenna Skomorovska-Prokvolit, Gregory A Peters, Huiqing Tan, Junye Wang, Charles J Malemud, and Sam Mesiano. Human parturition involves phosphorylation of progesterone receptor-a at serine-345 in myometrial cells. Endocrinology, 157(11):4434–4445, 2016.

[5] Gregory A Peters, Lijuan Yi, Yelenna Skomorovska-Prokvolit, Bansari Patel, Peyvand Amini, Huiqing Tan, and Sam Mesiano. Inflammatory stimuli increase progesterone receptor-a stability and transrepressive activity in myometrial cells. Endocrinology, 158(1):158–169, 2016.

[6] Sam Mesiano, Eng-Cheng Chan, John T Fitter, Kenneth Kwek, George Yeo, and Roger Smith. Progesterone withdrawal and estrogen activation in human parturition are coordinated by progesterone receptor a expression in the myometrium. The Journal of Clinical Endocrinology & Metabolism, 87(6):2924–2930, 2002.

[7] Sam Mesiano, Yuguang Wang, and Errol R Norwitz. Progesterone receptors in the human pregnancy uterus: do they hold the key to birth timing? Reproductive Sciences, 18(1):6–19, 2011.

[8] Sarah A Price and Andrés López Bernal. Uterine quiescence: the role of cyclic amp. Experimental physiology, 86(2):265–272, 2001.

[9] Boonsri Chanrachakul, Fiona Broughton Pipkin, and Raheela N Khan. Contribution of coupling between human myometrial β2-adrenoreceptor and the bkca channel to uterine quiescence. American Journal of Physiology-Cell Physiology, 287(6):C1747–C1752, 2004.

[10] BS Skalhegg and Kjetil Taskén. Specificity in the camp/pka signaling pathway. differential expression, regulation, and subcellular localization of subunits of pka. Front Biosci, 2:d331–d342, 1997.

[11] Manuel Morgado, Elisa Cairrão, António José Santos-Silva, and Ignacio Verde. Cyclic nucleotide-dependent relaxation pathways in vascular smooth muscle. Cellular and molecular life sciences, 69(2):247–266, 2012.

[12] Angela Yulia, Natasha Singh, Kaiyu Lei, Suren R Sooranna, and Mark R Johnson. Cyclic amp effectors regulate myometrial oxytocin receptor expression. Endocrinology, 157(11):4411–4422, 2016.

[13] Barbara M Sanborn, Chun-Ying Ku, Sergiy Shlykov, and Lidiya Babich. Molecular signaling through g-protein-coupled receptors and the control of intracellular calcium in myometrium. Journal of the Society for Gynecologic Investigation, 12(7):479–487, 2005.

[14] Barbara M Sanborn. Hormonal signaling and signal pathway crosstalk in the control of myometrial calcium dynamics. In Seminars in cell & developmental biology, volume 18, pages 305–314. Elsevier, 2007.

[15] Anouk Oldenburger, Sara S Roscioni, Esther Jansen, Mark H Menzen, Andrew J Halayko, Wim Timens, Herman Meurs, Harm Maarsingh, and Martina Schmidt. Anti-inflammatory role of the camp effectors epac and pka: implications in chronic obstructive pulmonary disease. PLoS One, 7(2):e31574, 2012.

[16] Charlotte K Billington, Oluwaseun O Ojo, Raymond B Penn, and Satoru Ito. camp regulation of airway smooth muscle function. Pulmonary pharmacology & therapeutics, 26(1):112–120, 2013.

[17] Peyvand Amini, Rachel Wilson, William Koeblitz, Junye Wang, Huiqing Tan, Lijuan Yi, Zachary Stanfield, Andrea Romani, Charles Malemud, and Sam Mesiano. Mechanism by which progesterone and camp synergize to maintain uterine quiescence during pregnancy. Molecular and cellular endocrinology, 2018.

[18] Kripamoy Aguan, Jorge A Carvajal, Loren P Thompson, and Carl P Weiner. Application of a functional genomics approach to identify differentially expressed genes in human myometrium during pregnancy and labour. Molecular human reproduction, 6(12):1141–1145, 2000.

[19] E Cheng Chan, Sarah Fraser, S Yin, G Yeo, K Kwek, RJ Fairclough, and R Smith. Human myometrial genes are differentially expressed in labor: a suppression subtractive hybridization study. The Journal of Clinical Endocrinology & Metabolism, 87(6):2435–2441, 2002.

[20] Gilles Charpigny, M-J Leroy, Michelle Breuiller-Fouché, Zarha Tanfin, Sakina Mhaouty-Kodja, Ph Robin, Denis Leiber, Joëlle Cohen-Tannoudji, Dominique Cabrol, Claude Barberis, et al. A functional genomic study to identify differential gene expression in the preterm and term human myometrium. Biology of reproduction, 68(6):2289–2296, 2003.

[21] M Sean Esplin, M Bardett Fausett, Morgan R Peltier, Steven Hamblin, Robert M Silver, D Ware Branch, Eli Y Adashi, and David Whiting. The use of cdna microarray to identify differentially expressed labor-associated genes within the human myometrium during labor. American journal of obstetrics and gynecology, 193(2):404–413, 2005.

[22] Jon C Havelock, Patrick Keller, Ndaya Muleba, Bobbie A Mayhew, Brian M Casey, William E Rainey, and R Ann Word. Human myometrial gene expression before and during parturition. Biology of reproduction, 72(3):707–719, 2005.

[23] Shrikant Bollopragada, Refaat Youssef, Fiona Jordan, Ian Greer, Jane Norman, and Scott Nelson. Term labor is associated with a core inflammatory response in human fetal membranes, myometrium, and cervix. American journal of obstetrics and gynecology, 200(1):104–e1, 2009.

[24] Pooja Mittal, Roberto Romero, Adi L Tarca, Juan Gonzalez, Sorin Draghici, Yi Xu, Zhong Dong, Chia-Ling Nhan-Chang, Tinnakorn Chaiworapongsa, Stephen Lye, et al. Characterization of the myometrial transcriptome and biological pathways of spontaneous human labor at term. Journal of perinatal medicine, 38(6):617–643, 2010.

[25] Carl P Weiner, Clifford W Mason, Yafeng Dong, Irina A Buhimschi, Peter W Swaan, and Catalin S Buhimschi. Human effector/initiator gene sets that regulate myometrial contractility during term and preterm labor. American journal of obstetrics and gynecology, 202(5):474–e1, 2010.

[26] Gemma C Sharp, James L Hutchinson, Nanette Hibbert, Tom C Freeman, Philippa TK Saunders, and Jane E Norman. Transcription analysis of the myometrium of labouring and non-labouring women. PloS one, 11(5):e0155413, 2016.

[27] Yi-Wah Chan, Hugo A Berg, Jonathan D Moore, Siobhan Quenby, and Andrew M Blanks. Assessment of myometrial transcriptome changes associated with spontaneous human labour by high-throughput rna-seq. Experimental physiology, 99(3):510–524, 2014.

[28] Doris Pieber, Victoria C Allport, Frank Hills, Mark Johnson, and Phillip R Bennett. Interactions between progesterone receptor isoforms in myometrial cells in human labour. MHR: Basic science of reproductive medicine, 7(9):875–879, 2001.

[29] Manju Monga, Chun-Ying Ku, Kimberly Dodge, and Barbara M Sanborn. Oxytocin-stimulated responses in a pregnant human immortalized myometrial cell line. Biology of reproduction, 55(2):427–432, 1996.

[30] Amy A Merlino, Toni N Welsh, Huiqing Tan, Li Juan Yi, Vernon Cannon, Brian M Mercer, and Sam Mesiano. Nuclear progesterone receptors in the human pregnancy myometrium: evidence that parturition involves functional progesterone withdrawal mediated by increased expression of progesterone receptor-a. The Journal of Clinical Endocrinology & Metabolism, 92(5):1927–1933, 2007.

[31] Gemma Madsen, Tamas Zakar, Chun Ying Ku, Barbara M Sanborn, Roger Smith, and Sam Mesiano. Prostaglandins differentially modulate progesterone receptor-a and-b expression in human myometrial cells: evidence for prostaglandin-induced functional progesterone withdrawal. The Journal of clinical endocrinology and metabolism, 89(2):1010–1013, 2004.

[32] Jennifer Condon, Su Yin, Bobbie Mayhew, R Ann Word, WE Wright, JW Shay, and William E Rainey. Telomerase immortalization of human myometrial cells. Biology of reproduction, 67(2):506–514, 2002.

[33] Cole Trapnell, Lior Pachter, and Steven L Salzberg. Tophat: discovering splice junctions with rna-seq. Bioinformatics, 25(9):1105–1111, 2009.

[34] Cole Trapnell, Brian A Williams, Geo Pertea, Ali Mortazavi, Gordon Kwan, Marijke J Van Baren, Steven L Salzberg, Barbara J Wold, and Lior Pachter. Transcript assembly and quantification by rna-seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature biotechnology, 28(5):511, 2010.

[35] Loyal A Goff, Cole Trapnell, and David Kelley. Cummerbund: visualization and exploration of cufflinks high-throughput sequencing data. R package version, 2(0), 2012.

[36] Jerome Friedman, Trevor Hastie, and Rob Tibshirani. Regularization paths for generalized linear models via coordinate descent. Journal of statistical software, 33(1):1, 2010.

[37] Gene H Golub and Christian Reinsch. Singular value decomposition and least squares solutions. In Linear Algebra, pages 134–151. Springer, 1971.

[38] Michael E Wall, Andreas Rechtsteiner, and Luis M Rocha. Singular value decomposition and principal component analysis. In A practical approach to microarray data analysis, pages 91–109. Springer, 2003.

[39] Maxim V Kuleshov, Matthew R Jones, Andrew D Rouillard, Nicolas F Fernandez, Qiaonan Duan, Zichen Wang, Simon Koplev, Sherry L Jenkins, Kathleen M Jagodnik, Alexander Lachmann, et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic acids research, 44(W1):W90–W97, 2016.

[40] Minoru Kanehisa and Susumu Goto. Kegg: kyoto encyclopedia of genes and genomes. Nucleic acids research, 28(1):27–30, 2000.

[41] Sonja Hänzelmann, Robert Castelo, and Justin Guinney. Gsva: gene set variation analysis for microarray and rna-seq data. BMC bioinformatics, 14(1):7, 2013.

[42] Arthur Liberzon, Chet Birger, Helga Thorvaldsdóttir, Mahmoud Ghandi, Jill P Mesirov, and Pablo Tamayo. The molecular signatures database hallmark gene set collection. Cell systems, 1(6):417–425, 2015.

[43] Aravind Subramanian, Pablo Tamayo, Vamsi K Mootha, Sayan Mukherjee, Benjamin L Ebert, Michael A Gillette, Amanda Paulovich, Scott L Pomeroy, Todd R Golub, Eric S Lander, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences, 102(43):15545–15550, 2005.

[44] Gordon K Smyth. Limma: linear models for microarray data. In Bioinformatics and computational biology solutions using R and Bioconductor, pages 397–420. Springer, 2005.

[45] Anna Ritz, Christopher L Poirel, Allison N Tegge, Nicholas Sharp, Kelsey Simmons, Allison Powell, Shiv D Kale, and TM Murali. Pathways on demand: automated reconstruction of human signaling networks. NPJ systems biology and applications, 2:16002, 2016.

[46] Edsger W Dijkstra. A note on two problems in connexion with graphs. Numerische mathematik, 1(1):269–271, 1959.

[47] C Jiang, Zhenyu Xuan, Fang Zhao, and Michael Q Zhang. Tred: a transcriptional regulatory element database, new entries and other development. Nucleic acids research, 35(suppl_1):D137–D140, 2007.

[48] Ahmed Essaghir, Federica Toffalini, Laurent Knoops, Anders Kallin, Jacques van Helden, and Jean-Baptiste Demoulin. Transcription factor regulation can be accurately predicted from the presence of target gene signatures in microarray gene expression data. Nucleic acids research, 38(11):e120–e120, 2010.

[49] Obi L Griffith, Stephen B Montgomery, Bridget Bernier, Bryan Chu, Katayoon Kasaian, Stein Aerts, Shaun Mahony, Monica C Sleumer, Mikhail Bilenky, Maximilian Haeussler, et al. Oreganno: an open-access community-driven resource for regulatory annotation. Nucleic acids research, 36(suppl_1):D107–D113, 2007.

[50] Heonjong Han, Hongseok Shim, Donghyun Shin, Jung Eun Shim, Yunhee Ko, Junha Shin, Hanhae Kim, Ara Cho, Eiru Kim, Tak Lee, et al. Trrust: a reference database of human transcriptional regulatory interactions. Scientific reports, 5:11432, 2015.

[51] Panagiotis Chouvardas, George Kollias, and Christoforos Nikolaou. Inferring active regulatory networks from gene expression data using a combination of prior knowledge and enrichment analysis. BMC bioinformatics, 17(5):181, 2016.

[52] Paul Shannon, Andrew Markiel, Owen Ozier, Nitin S Baliga, Jonathan T Wang, Daniel Ramage, Nada Amin, Benno Schwikowski, and Trey Ideker. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome research, 13(11):2498–2504, 2003.

[53] Rickard Sandberg and Ingemar Ernberg. Assessment of tumor characteristic gene expression in cell lines using a tissue similarity index (tsi). Proceedings of the National Academy of Sciences, 102(6):2052–2057, 2005.

[54] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1):267–288, 1996.

[55] Hui Zou and Trevor Hastie. Regularization and variable selection via the elastic net. Journal of the royal statistical society: series B (statistical methodology), 67(2):301–320, 2005.

[56] Xiang-Hong Li, A Hari Kishore, Doan Dao, Weiming Zheng, Christopher A Roman, and R Ann Word. A novel isoform of microphthalmia-associated transcription factor inhibits il-8 gene expression in human cervical stromal cells. Molecular Endocrinology, 24(8):1512–1528, 2010.

[57] Mathias Uhlén, Linn Fagerberg, Björn M Hallström, Cecilia Lindskog, Per Oksvold, Adil Mardinoglu, Åsa Sivertsson, Caroline Kampf, Evelina Sjöstedt, Anna Asplund, et al. Tissue-based map of the human proteome. Science, 347(6220):1260419, 2015.

[58] Orly Alter, Patrick O Brown, and David Botstein. Singular value decomposition for genome-wide expression data processing and modeling. Proceedings of the National Academy of Sciences, 97(18):10101–10106, 2000.

[59] Orly Alter, Patrick O Brown, and David Botstein. Generalized singular value decomposition for comparative analysis of genome-scale expression data sets of two different organisms. Proceedings of the National Academy of Sciences, 100(6):3351–3356, 2003.

[60] John Tomfohr, Jun Lu, and Thomas B Kepler. Pathway level analysis of gene expression using singular value decomposition. BMC bioinformatics, 6(1):225, 2005.

[61] Wei Yuan and Andrés López Bernal. Cyclic amp signalling pathways in the regulation of uterine relaxation. In BMC pregnancy and childbirth, volume 7, page S10. BioMed Central, 2007.

[62] Morgan R Peltier. Immunology of term and preterm labor. Reproductive Biology and Endocrinology, 1(1):122, 2003.

[63] Jing Pan, Keiichi Fukuda, Mikiyoshi Saito, Junichi Matsuzaki, Hiroaki Kodama, Motoaki Sano, Toshiyuki Takahashi, Takahiro Kato, and Satoshi Ogawa. Mechanical stretch activates the jak/stat pathway in rat cardiomyocytes. Circulation research, 84(10):1127–1136, 1999.

[64] Andrew J Brooks, Wei Dai, Megan L O’Mara, Daniel Abankwa, Yash Chhabra, Rebecca A Pelekanos, Olivier Gardon, Kathryn A Tunny, Kristopher M Blucher, Craig J Morton, et al. Mechanism of activation of protein kinase jak2 by the growth hormone receptor. Science, 344(6185):1249783, 2014.

[65] Michèle Breuiller-Fouche, Gilles Charpigny, and Guy Germain. Functional genomics of the pregnant uterus: from expectations to reality, a compilation of studies in the myometrium. In BMC pregnancy and childbirth, volume 7, page S4. BioMed Central, 2007.

[66] Stephen R Master, Jennifer L Hartman, Celina M D’cruz, Susan E Moody, Elizabeth A Keiper, Seung I Ha, James D Cox, George K Belka, and Lewis A Chodosh. Functional microarray analysis of mammary organogenesis reveals a developmental role in adaptive thermogenesis. Molecular endocrinology, 16(6):1185–1203, 2002.

[67] Wenbo Deng, Jeeyeon Cha, Jia Yuan, Hirofumi Haraguchi, Amanda Bartos, Emma Leishman, Benoit Viollet, Heather B Bradshaw, Yasushi Hirota, and Sudhansu K Dey. p53 coordinates decidual sestrin 2/ampk/mtorc1 signaling to govern parturition timing. The Journal of clinical investigation, 126(8):2941–2954, 2016.

[68] Jure Leskovec, Anand Rajaraman, and Jeffrey David Ullman. Mining of massive datasets. Cambridge university press, 2014.

[69] Didier Croes, Fabian Couche, Shoshana J Wodak, and Jacques Van Helden. Metabolic pathfinding: inferring relevant pathways in biochemical networks. Nucleic acids research, 33(suppl_2):W326–W330, 2005.

[70] Dana Silverbush and Roded Sharan. Network orientation via shortest paths. Bioinformatics, 30(10):1449–1455, 2014.

[71] John R Challis, Charles J Lockwood, Leslie Myatt, Jane E Norman, Jerome F Strauss III, and Felice Petraglia. Inflammation and pregnancy. Reproductive sciences, 16(2):206–215, 2009.

[72] Yutaka Shimizu, Takashi Takeuchi, Shizuka Mita, Tatsuto Notsu, Kiyoshi Mizuguchi, and Satoru Kyo. Krüppel-like factor 4 mediates anti-proliferative effects of progesterone with g0/g1 arrest in human endometrial epithelial cells. Journal of endocrinological investigation, 33(10):745–750, 2010.

[73] Marilyn Safran, Irina Dalah, Justin Alexander, Naomi Rosen, Tsippi Iny Stein, Michael Shmoish, Noam Nativ, Iris Bahir, Tirza Doniger, Hagit Krug, et al. Genecards version 3: the human gene integrator. Database, 2010, 2010.

[74] Cecilia Garlanda, Barbara Bottazzi, Antonio Bastone, and Alberto Mantovani. Pentraxins at the crossroads between innate immunity, inflammation, matrix deposition, and female fertility. Annu. Rev. Immunol., 23:337–366, 2005.

[75] Xiao-Jun Feng, Hui Gao, Si Gao, Zhuoming Li, Hong Li, Jing Lu, Jiao-Jiao Wang, Xiao-Yang Huang, Min Liu, Jian Zou, et al. The orphan receptor nor1 participates in isoprenaline-induced cardiac hypertrophy by regulating parp-1. British journal of pharmacology, 172(11):2852–2863, 2015.

[76] Jaclyn N Taroni and Casey S Greene. Cross-platform normalization enables machine learning model training on microarray and rna-seq data simultaneously. BioRxiv, page 118349, 2017.

[77] Roberta Migale, David A MacIntyre, Stefano Cacciatore, Yun S Lee, Henrik Hagberg, Bronwen R Herbert, Mark R Johnson, Donald Peebles, Simon N Waddington, and Phillip R Bennett. Modeling hormonal and inflammatory contributions to preterm and term labor using uterine temporal transcriptomics. BMC medicine, 14(1):86, 2016.

[78] Nathan Salomonis, Nathalie Cotte, Alexander C Zambon, Katherine S Pollard, Karen Vranizan, Scott W Doniger, Gregory Dolganov, and Bruce R Conklin. Identifying genetic networks underlying myometrial transition to labor. Genome biology, 6(2):R12, 2005.

[79] Ramkumar Menon, Elizabeth A Bonney, Jennifer Condon, Sam Mesiano, and Robert N Taylor. Novel concepts on pregnancy clocks and alarms: redundancy and synergy in human parturition. Human reproduction update, 22(5):535–560, 2016.

[80] Douglas Brubaker, Alethea Barbaro, Mark R Chance, and Sam Mesiano. A dynamical systems model of progesterone receptor interactions with inflammation in human parturition. BMC systems biology, 10(1):79, 2016.

[81] Oksana Shynlova, Tamara Nedd-Roderique, Yunqing Li, Anna Dorogin, and Stephen J Lye. Myometrial immune cells contribute to term parturition, preterm labour and post-partum involution in mice. Journal of cellular and molecular medicine, 17(1):90–102, 2013.

[82] Mathieu Nadeau-Vallée, Christiane Quiniou, Julia Palacios, Xin Hou, Atefeh Erfani, Ankush Madaan, Mélanie Sanchez, Kelycia Leimert, Amarilys Boudreault, François Duhamel, et al. Novel noncompetitive il-1 receptor-biased ligand prevents infection-and inflammation-induced preterm birth. The Journal of Immunology, page 1500758, 2015.

[83] Agata Sakowicz. The role of nfκb in the three stages of pregnancy: implantation, maintenance and labour; a review article. BJOG: An International Journal of Obstetrics & Gynaecology, 2018.

[84] Bingbing Wang, Nataliya Parobchak, Adriana Martin, Max Rosen, Lumeng Jenny Yu, Mary Nguyen, Kseniya Gololobova, and Todd Rosen. Screening a small molecule library to identify inhibitors of nf-κb inducing kinase and pro-labor genes in human placenta. Scientific reports, 8(1):1657, 2018.