COMPUTATIONAL INVESTIGATION OF BIOLOGICAL NETWORKS AND

PROGESTERONE SIGNALING DYNAMICS IN PRETERM BIRTH

by

DOUGLAS K. BRUBAKER

Submitted in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

Systems Biology and Bioinformatics

CASE WESTERN RESERVE UNIVERSITY

May 2016

CASE WESTERN RESERVE UNIVERSITY

SCHOOL OF GRADUATE STUDIES

We hereby approve the thesis/dissertation of

Douglas K. Brubaker

candidate for the degree of Doctor of Philosophy

Committee Member

Jill Barnholtz-Sloan, PhD

Committee Member

Mark R. Chance, PhD

Committee Member

Sam Mesiano, PhD

Committee Member

Alethea Barbaro, PhD

Date of Defense

February 10, 2016

*We also certify that written approval has been obtained for any proprietary material contained therein.


Table of Contents

List of Tables

List of Figures

Acknowledgements

Abstract

Chapter 1. Spontaneous Preterm Birth Without Inflammatory Pathway Activation: Functional Genomics Investigation of Myometrial Gene Expression Data Characterizes Three Parturition Subtypes

Chapter 2. Finding Lost Genes in GWAS via Integrative-omics Analysis Reveals Novel Sub-networks Associated with Preterm Birth

Chapter 3. A Dynamical Systems Model of B Interactions with Inflammation in Pregnancy

Discussion

References


List of Tables

Table 1-1 Myometrial Transcriptome Datasets for Meta-analysis

Table 1-2 Confirmation of Studies of Term Labor Differential Expression

Table 1-3 Confirmation of Studies of HCA-indicated Preterm Birth Differential Expression

Table 1-4 Confirmation of Studies of Spontaneous Preterm Birth Differential Gene Expression

Table 2-1 Single Gene qRT-PCR Results for Term Myometrium Subnetworks

Table 2-2 Single Gene qRT-PCR Results for Preterm Myometrium Subnetworks

Table 2-3 Term and Preterm Myometrium Subnetwork Significance Testing Results

Table 3-1 Performance of Single Gene and Model Classifiers

Table 3-2 Characterization of Stability of Equilibrium Points of Dynamical Systems Model


List of Figures

Figure 1-1 Pipeline for Meta-analysis of Myometrium Transcriptome Data

Figure 1-2 Differentially Expressed Genes and Enriched Pathways in Term Labor, Preterm Labor with HCA, and Spontaneous Preterm Labor

Figure 1-3 Pathways Active in Both Term Labor and Preterm Labor with HCA

Figure 1-4 Pathways Active in Spontaneous Preterm Labor

Figure 2-1 Pipeline for Integrative Analysis of GWAS, PPI Network, and Myometrium Transcriptome Data

Figure 2-2 Preterm Birth SNP Enriched PPI Network

Figure 2-3 Coordinately Regulated Subnetworks in Term Laboring Myometrium

Figure 2-4 Coordinately Regulated Subnetworks in Preterm Laboring Myometrium

Figure 2-5 Validated Term Labor Myometrium Subnetworks Regulated by MEF2C

Figure 3-1 Illustration of the Basin of Attraction of the Laboring Equilibrium Point

Figure 3-2 Two-Gene Model Classifiers of Laboring Phenotype

Figure 3-3 Single-Gene Inflammatory Classifiers of Laboring Phenotype

Figure 3-4 Single-Gene PR-B Surrogate Classifiers of Laboring Phenotype

Figure 3-5 Phase Space Transition of Labor Bifurcation

Figure 4-1 Summary of Thematic Areas and Findings


Acknowledgements

This dissertation would not have been possible without the exemplary mentoring of my adviser, Dr. Mark R. Chance. He stimulated my scientific development by consistently challenging me and by teaching me how to effectively communicate and argue for my findings. He has played a formative role in my development as a scientist by honing my sense of how to effectively conceptualize and attack problems in systems biology. These skills will stay with me my entire career, and I owe him immeasurably for cultivating them.

Dr. Jill Barnholtz-Sloan taught me early on to seek out effective and fruitful collaborators. Because of this, I was able to initiate and benefit from the collaboration and mentoring of Dr. Alethea Barbaro. Their counsel in research and in navigating the world of academia has kept me on track through the program.

The direction and creative elements of this work owe a great debt to Dr. Sam Mesiano for always keeping his door open to talk and always being willing to try something new.

I benefitted greatly from working on projects beyond this dissertation with Dr. Gurkan Bebek and Elena Svenson. Junye Wang deserves recognition for his experimental contributions to this dissertation. Dr. Lindsay Stetson has been my sounding board for many projects and a good friend throughout my studies.

Perhaps most importantly, I thank my mother Carol, brother David, sister Sara, and girlfriend Stephanie Doran for keeping me grounded throughout my studies. Most of all, I thank my grandfather Robert K. Koehler, to whom this and all my future work is dedicated.


Computational Investigation of Biological Networks and Progesterone Signaling

Dynamics in Preterm Birth

Abstract

by

DOUGLAS K. BRUBAKER

Preterm birth (PTB) is a major public health issue and the leading cause of infant mortality worldwide. To identify dysregulation and therapeutic opportunities in PTB, a better understanding of healthy term labor is required. Further understanding of how the interaction of progesterone signaling with inflammatory pathways maintains quiescence is essential to assessing the therapeutic potential of progesterone for PTB. It has also been observed that PTB is strongly heritable from mother to daughter. This has motivated several genome wide association studies (GWAS) seeking single nucleotide polymorphisms (SNPs) with genome wide significance that predispose a woman to PTB. To date, no SNPs have been identified and replicated with genome wide significance, raising concerns about the effectiveness of GWAS in identifying the genetic predisposition to PTB. The myometrium, the uterine smooth muscle tissue, undergoes a dramatic phenotypic transition from quiescent to forcefully contracting to deliver the conceptus. Understanding the biological signaling networks driving this transition, how genetic factors may modulate it, and the role of progesterone signaling in labor is essential to addressing the challenge of PTB.

This dissertation addresses each of these issues to better characterize PTB. A meta-analysis approach is used to characterize the signaling events governing the quiescent-to-laboring transition of the myometrium. We show that while inflammatory pathways are crucial to term labor and inflammation-indicated PTB, spontaneous PTB has a unique set of signaling pathways governing the myometrium's transition. By organizing PTB-GWAS SNPs that did not reach genome wide significance in a protein-protein interaction (PPI) network context, groups of modest-effect SNPs were tested for combined effects on modules of a PPI network. Module function was assessed with term and preterm labor myometrium transcriptome data to identify modules dysregulated with labor onset. A module characterized by myocyte enhancer factor 2C (MEF2C) and 9 PTB-SNPs was implicated in term labor. Finally, we modeled the interaction of progesterone signaling with inflammation using a dynamical systems model and used the model to precisely predict laboring phenotypes. Progesterone signaling dynamics are well characterized by this competitive interaction model, but the lack of inflammatory pathway activation in spontaneous PTB suggests that progesterone modulation may have limited effectiveness as a PTB therapy in that context. This dissertation illustrates how understanding a complex disorder like PTB requires a systems level approach. Such an approach is only possible when high dimensional data are carefully modeled and assessed, and their dimensionality reduced, using appropriate and diverse computational approaches.


Chapter 1: Spontaneous Preterm Birth Without Inflammatory Pathway

Activation: Functional Genomics Investigation of Myometrial Gene

Expression Data Characterizes Three Parturition Subtypes.


Background

One strategy for preventing preterm birth is to better understand the signaling pathways of term labor in hopes of identifying dysregulation and therapeutic opportunities in preterm birth (PTB). For most of pregnancy the myometrium (uterine smooth muscle) is maintained in a relaxed and quiescent state. Labor is characterized by a drastic transformation of the myometrium from quiescent and hypertrophied to contractile. It is well accepted that inflammatory pathways become active in the myometrium with the onset of labor and that these are regulated by and interact with the relaxatory actions of the hormone progesterone [1-3].

While some preterm labor occurs spontaneously, other cases show signs of inflammation. The clinical indication for this is usually inflammation of the choriodecidua, referred to as histologic chorioamnionitis (HCA). This inflammatory event is a major risk factor for PTB, and the resulting labor signaling cascade is enriched with inflammatory pathways and genes [4, 5]. It is therefore widely believed that the labor cascade for all parturition subtypes is characterized by activation of inflammatory pathways and their peripheral interactions with other pathways.

To better characterize the signaling characteristics of parturition subtypes, Eidem and colleagues performed a systematic meta-analysis of 2,361 studies of gestational tissues. They compared results from 93 whole genome gene expression studies, 17 of which were studies of the pregnancy myometrium [6].

The main finding of their investigation was that there is a tremendous amount of heterogeneity between studies and individuals. They identified important gaps in the literature where more work was required to fully characterize the subtypes of parturition [6]. Though this work is an important contribution to the field, the authors relied upon the reported gene lists of the original studies and did not attempt to standardize the statistical approaches applied to the raw datasets. Thus, their analysis assumes that all reported gene lists are valid regardless of how the results were generated. This raises concerns about the validity of the gene lists reported in their analysis and indicates a need to re-analyze the raw data from the studies and standardize the statistical approaches to generate high-confidence gene lists.

We undertook a meta-analysis of all available transcriptome data of the myometrium describing spontaneous PTB, preterm birth with inflammatory indications (HCA), and term labor. By re-analyzing the raw datasets from the studies we are able to standardize the statistical approach to the data and generate comparable gene lists. Differential gene expression analysis was performed for each study individually for each parturition subtype present in the study. The differentially expressed genes were then used for pathway enrichment analysis thus enabling us to compare the parturition subtypes at both the single gene and pathway level. Our analysis identifies important areas of similarity as well as unique features of each labor subtype and is the most comprehensive analysis of gene expression in the pregnancy myometrium to date.


Methods

We performed a meta-analysis of raw gene expression datasets of the pregnancy myometrium (Figure 1-1). Differential expression analysis was standardized to ensure that the resulting gene lists were selected using identical criteria. Microarray studies were analyzed using nonparametric statistics, while RNA-seq studies were analyzed using the Cufflinks pipeline to account for read length bias [7, 8]. We assessed how well the original studies replicated and compared the gene lists for spontaneous PTB, HCA-indicated PTB, and term labor. Differentially expressed genes for each phenotype were then analyzed for pathway enrichment using Crosstalker [9].

Figure 1-1: Pipeline for the meta-analysis of raw gene expression datasets from parturition associated myometrium samples. Raw data was analyzed using a standardized set of differential expression methods for all studies to identify differentially expressed genes. Gene lists were then analyzed using a pathway and network analysis tool called Crosstalker. The resulting pathway signatures were then compared to each other to assess unique and shared characteristics of labor subtypes represented in the original transcriptome studies.


Datasets

We searched Gene Expression Omnibus (GEO) [10] and paper supplements for raw gene expression datasets of laboring myometrium analyzed in [6]. The authors of [3] provided their dataset of 38 term non-laboring (TNL) and 37 term in-labor (TIL) samples when contacted, and the authors of [5] provided their dataset of 5 TNL, 5 TIL, 5 preterm non-laboring (PTNL), and 5 preterm labor with histologic chorioamnionitis (HCA) samples in advance of publication. We also contributed an RNA sequencing dataset of 4 PTNL, 3 spontaneous preterm in-labor (PTIL), 4 TNL, and 3 TIL samples, hereafter referred to as the Mesiano cohort. A complete list of datasets obtained and phenotypes represented is shown in Table 1-1. A total of 15 PTNL, 6 PTIL, 8 HCA, 37 TNL, and 38 TIL samples were collected, for an overall cohort size of 104 myometrium samples.

Table 1-1: List of myometrium transcriptome datasets obtained for this meta-analysis. Profiling technology, phenotypes represented, and sample sizes are shown.

Cohort Name               Technology   PTNL   PTIL   HCA   TNL   TIL
Mesiano (2016)            RNA-seq         4      3     0     4     3
Chan, et al. (2014)       RNA-seq         0      0     0     5     5
Ackerman, et al. (2015)   RNA-seq         5      0     5     5     5
Mittal, et al. (2008)     Microarray      0      0     0    20    19
Weiner, et al. (2010)     Microarray      3      0     3     3     3
Bethin, et al. (2003)     Microarray      3      3     0     0     3

PTNL: preterm non-laboring; PTIL: preterm in-labor; HCA: preterm in-labor with histologic chorioamnionitis; TNL: term non-laboring; TIL: term in-labor.
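As a consistency check on the totals quoted above, the per-study counts in Table 1-1 can simply be summed by phenotype. The short Python sketch below transcribes the table into a dictionary (the variable and function names are illustrative) and reproduces the 15 PTNL, 6 PTIL, 8 HCA, 37 TNL, and 38 TIL totals and the overall 104 samples.

```python
# Per-study sample counts transcribed from Table 1-1 (column order below).
PHENOTYPES = ("PTNL", "PTIL", "HCA", "TNL", "TIL")
COHORTS = {
    "Mesiano (2016)": (4, 3, 0, 4, 3),
    "Chan, et al. (2014)": (0, 0, 0, 5, 5),
    "Ackerman, et al. (2015)": (5, 0, 5, 5, 5),
    "Mittal, et al. (2008)": (0, 0, 0, 20, 19),
    "Weiner, et al. (2010)": (3, 0, 3, 3, 3),
    "Bethin, et al. (2003)": (3, 3, 0, 0, 3),
}


def phenotype_totals(cohorts):
    """Sum sample counts over all studies for each phenotype column."""
    totals = dict.fromkeys(PHENOTYPES, 0)
    for counts in cohorts.values():
        for name, n in zip(PHENOTYPES, counts):
            totals[name] += n
    return totals


totals = phenotype_totals(COHORTS)
print(totals)                # {'PTNL': 15, 'PTIL': 6, 'HCA': 8, 'TNL': 37, 'TIL': 38}
print(sum(totals.values()))  # 104
```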


RNA seq Dataset Processing

FASTQ data files from the Mesiano cohort and Ackerman, et al. [5] were aligned using bowtie2 [11], samtools-0.1.18, and tophat-2.0.13 [12] against the University of California Santa Cruz (UCSC) hg19 reference transcriptome. This procedure is consistent with the approach outlined by Chan et al. [13], ensuring a standardized analysis pipeline. Transcript abundances were estimated using Cufflinks-2.2.1 [8]. Cufflinks implements a normalization to filter out artificial and noisy transcripts and to account for read length bias [7, 8]. The resulting transcript abundances were pooled using Cuffcompare to create a merged transcriptome assembly for all samples against which differential expression is measured [7]. For each study, Cuffcompare was run separately on the preterm and term myometrium samples, creating two merged transcriptome assemblies for differential expression analysis of the preterm and term samples, respectively.
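The alignment, quantification, and merging steps above can be laid out as command lines. The following Python sketch only builds argument lists and prints them; nothing is executed. The index prefix, annotation path, thread count, sample names, and output directories are hypothetical, and the options used (`-p`, `-G`, `-o` for TopHat; `-p`, `-o` for Cufflinks; `-r`, `-o` for Cuffcompare) are the tools' commonly documented flags rather than the exact invocations used in this study, so they should be checked against the manuals for tophat-2.0.13 and Cufflinks-2.2.1.

```python
from pathlib import Path

# Hypothetical paths and settings; replace with your own layout.
BOWTIE2_INDEX = "hg19/genome"      # bowtie2 index prefix (assumed)
REFERENCE_GTF = "hg19/genes.gtf"   # UCSC hg19 annotation (assumed path)
THREADS = "8"


def tophat_cmd(sample, fastq):
    """Align one sample's reads with TopHat2 (which calls bowtie2 internally)."""
    return ["tophat", "-p", THREADS, "-G", REFERENCE_GTF,
            "-o", f"tophat/{sample}", BOWTIE2_INDEX, fastq]


def cufflinks_cmd(sample):
    """Estimate transcript abundances from TopHat's accepted_hits.bam."""
    bam = str(Path("tophat") / sample / "accepted_hits.bam")
    return ["cufflinks", "-p", THREADS, "-o", f"cufflinks/{sample}", bam]


def cuffcompare_cmd(study, group, samples):
    """Pool one study's term or preterm assemblies into a merged transcriptome."""
    gtfs = [str(Path("cufflinks") / s / "transcripts.gtf") for s in samples]
    return ["cuffcompare", "-r", REFERENCE_GTF,
            "-o", f"merged/{study}_{group}"] + gtfs


# Example: one study, preterm group only (sample names are made up).
preterm = {"PTNL_1": "PTNL_1.fastq.gz", "PTIL_1": "PTIL_1.fastq.gz"}
commands = [tophat_cmd(s, fq) for s, fq in preterm.items()]
commands += [cufflinks_cmd(s) for s in preterm]
commands.append(cuffcompare_cmd("mesiano", "preterm", list(preterm)))
for cmd in commands:
    print(" ".join(cmd))
```

Keeping each command as a list rather than a shell string makes it straightforward to hand to `subprocess.run(cmd, check=True)` and avoids shell-quoting problems with file names.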

Differential Expression Analysis

Three differential expression tests were performed on the cohorts in Table 1-1. For each study with samples in the relevant phenotypes, we tested for differentially expressed genes comparing PTNL to HCA, PTNL to PTIL, and TNL to TIL.

Regardless of sequencing platform, genes were considered to be differentially expressed if two or more studies showed the gene to be differentially expressed with a false discovery rate q-value of q < 0.25.
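The cross-study criterion amounts to a vote count: each study contributes its own set of genes with q < 0.25, and a gene is retained if at least two studies agree. A minimal Python sketch, assuming per-study q-values are already available as dictionaries (the function name, gene names, and study names below are placeholders, not results from this chapter):

```python
from collections import Counter

Q_THRESHOLD = 0.25   # per-study FDR cutoff used in this chapter
MIN_STUDIES = 2      # a gene must pass in at least this many studies


def consensus_genes(per_study_q, q_threshold=Q_THRESHOLD, min_studies=MIN_STUDIES):
    """Return genes with q < q_threshold in at least min_studies studies.

    per_study_q maps a study name to a {gene: q_value} dict for one
    comparison (for example TNL vs. TIL).
    """
    votes = Counter()
    for q_values in per_study_q.values():
        # Each study casts one vote per gene that passes its own FDR cutoff.
        votes.update(g for g, q in q_values.items() if q < q_threshold)
    return sorted(g for g, n in votes.items() if n >= min_studies)


# Placeholder q-values, not results from the chapter.
example = {
    "study_1": {"GENE_A": 0.01, "GENE_B": 0.10, "GENE_C": 0.40},
    "study_2": {"GENE_A": 0.20, "GENE_B": 0.30, "GENE_C": 0.05},
    "study_3": {"GENE_A": 0.50, "GENE_B": 0.02, "GENE_C": 0.90},
}
print(consensus_genes(example))  # ['GENE_A', 'GENE_B']
```

Setting `min_studies=1` corresponds to the relaxation applied for HCA-indicated and spontaneous PTB, where only one usable study was available.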

For the RNA-seq studies we used Cuffdiff2 [7], a newer version of the Cuffdiff differential expression algorithm implemented in Cufflinks 2.2.1, which corrects for both gene- and transcript-level variability in gene expression estimates. Microarray studies were analyzed using a nonparametric Wilcoxon-Mann-Whitney test with a false discovery rate (FDR) correction. Differential expression lists were compared to the original studies to assess consistency of reported findings.
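One way to carry out the microarray tests is sketched below in Python: an exact two-sided Wilcoxon-Mann-Whitney test computed by enumerating every possible assignment of ranks, followed by a Benjamini-Hochberg correction to obtain q-values. The chapter does not state whether exact or normal-approximation p-values were used, so the exact form here is an assumption chosen because it is transparent for small cohorts. It also makes a practical limit visible: with three samples per group, the smallest achievable exact two-sided p-value is 0.1.

```python
from itertools import combinations


def midranks(values):
    """Ranks 1..n, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based positions i..j
        i = j + 1
    return ranks


# Exact enumeration is an assumption; the chapter does not say which p-value form was used.
def mann_whitney_exact(x, y):
    """Exact two-sided Wilcoxon-Mann-Whitney p-value by full enumeration.

    Every way of assigning len(x) of the pooled ranks to group x is equally
    likely under the null; only practical for small groups.
    """
    ranks = midranks(list(x) + list(y))
    n, m = len(ranks), len(x)
    expected = m * (n + 1) / 2               # mean rank sum of x under the null
    observed = abs(sum(ranks[:m]) - expected)
    sums = [sum(ranks[i] for i in idx) for idx in combinations(range(n), m)]
    extreme = sum(1 for s in sums if abs(s - expected) >= observed - 1e-12)
    return extreme / len(sums)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg FDR q-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    q, running = [0.0] * m, 1.0
    for position, i in enumerate(order):
        rank = m - position                  # 1-based rank in ascending order
        running = min(running, pvalues[i] * m / rank)
        q[i] = running
    return q


# Three non-laboring vs. three laboring samples with complete separation:
# the most extreme possible outcome, so p = 2 / C(6, 3) = 0.1.
print(mann_whitney_exact([1.0, 1.2, 0.9], [2.1, 2.5, 1.9]))  # 0.1
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

For real datasets, `scipy.stats.mannwhitneyu(x, y, alternative="two-sided", method="exact")` and `statsmodels.stats.multitest.multipletests(pvals, method="fdr_bh")` provide tested implementations of the same two steps.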

Pathway and Network Analysis

We used a network biology tool called Crosstalker [9] to identify subnetworks of topologically related proteins in a protein-protein interaction (PPI) network. Briefly, Crosstalker takes an input list of proteins and performs a random walk with restarts between those protein "seeds" on a given PPI network. This procedure computes the relative amount of time spent at any protein in the PPI network during an infinite random walk when restarting only at the input proteins. The result is statistically analyzed to identify proteins that are highly traversed for the input seeds and unlikely to be highly traversed for random inputs. All proteins that were traversed a statistically significant amount are added to the output network(s), which likely includes proteins that were not in the initial input. These added nodes, called crosstalkers, capture crosstalk between subnetworks and provide insight into how disparate signaling pathways may communicate with one another.
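The random walk with restarts described above can be illustrated with a generic power-iteration sketch in Python on a small unweighted network. At every step the walker either follows a random edge or, with probability `restart`, jumps back to one of the seed proteins; iterating to convergence gives the long-run fraction of time spent at each node. The restart probability of 0.3, the unweighted edges (STRING provides confidence scores that a real analysis might use as weights), and the random-seed permutation null in `empirical_pvalues` are all assumptions for illustration and are not taken from Crosstalker's implementation.

```python
import random


def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walk that restarts at `seeds`.

    adjacency: {node: [neighbours]} for an undirected, unweighted graph in which
               every node has at least one neighbour.
    restart:   probability of jumping back to a (uniformly chosen) seed at
               each step; 0.3 is an arbitrary illustrative value.
    """
    nodes = list(adjacency)
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {v: restart * p0[v] for v in nodes}
        for u in nodes:
            share = (1.0 - restart) * p[u] / len(adjacency[u])
            for v in adjacency[u]:
                nxt[v] += share
        delta = sum(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if delta < tol:
            break
    return p


def empirical_pvalues(adjacency, seeds, n_perm=200, rng_seed=0):
    """Share of random seed sets (same size) that visit each node at least as often.

    Null model: random seed sets drawn uniformly from all nodes (an assumption,
    not necessarily the null used by Crosstalker).
    """
    rng = random.Random(rng_seed)
    observed = random_walk_with_restart(adjacency, seeds)
    nodes = list(adjacency)
    exceed = dict.fromkeys(nodes, 0)
    for _ in range(n_perm):
        null = random_walk_with_restart(adjacency, set(rng.sample(nodes, len(seeds))))
        for v in nodes:
            if null[v] >= observed[v]:
                exceed[v] += 1
    # Add-one correction keeps every empirical p-value above zero.
    return {v: (exceed[v] + 1) / (n_perm + 1) for v in nodes}


# Toy undirected network; A and E stand in for input "seed" proteins.
ppi = {"A": ["B", "C"], "B": ["A", "C", "D"], "C": ["A", "B"],
       "D": ["B", "E"], "E": ["D"]}
scores = random_walk_with_restart(ppi, seeds={"A", "E"})
pvals = empirical_pvalues(ppi, {"A", "E"})
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 3), round(pvals[node], 3))
```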

Crosstalker then tests for pathway enrichment using Fisher's exact test, which compares the representation of genes in a subnetwork against a reference database of curated signaling pathways. These pathways can then be annotated on the networks with other data, such as the fold change of the gene encoding each protein in an experiment, to give a detailed picture of biological activity and regulation both within and between pathways. By representing proteins in the networks with their coding genes, different data types at the genomic and transcriptomic scale can be analyzed at the functional PPI network level to identify driver pathways and networks of biological processes.
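The enrichment step can be written as a hypergeometric upper-tail probability, which is the one-sided Fisher's exact test for over-representation. In this Python sketch the gene universe (the background against which a subnetwork is compared) is an assumption, since the chapter does not specify Crosstalker's background; the function name and gene names are placeholders.

```python
from math import comb


def fisher_enrichment_p(subnetwork, pathway, universe):
    """One-sided Fisher's exact test (hypergeometric upper tail) for over-representation.

    Probability of drawing at least the observed number of pathway genes when
    a subnetwork of the same size is sampled at random from `universe`.
    The choice of universe (background) is an assumption.
    """
    universe = set(universe)
    subnetwork, pathway = set(subnetwork) & universe, set(pathway) & universe
    N, K, n = len(universe), len(pathway), len(subnetwork)
    k = len(subnetwork & pathway)            # observed overlap
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


# Toy example: 20 annotated genes, a 5-gene pathway, and a 4-gene subnetwork
# containing 3 of the pathway genes.
universe = [f"G{i}" for i in range(20)]
pathway = universe[:5]
subnetwork = ["G0", "G1", "G2", "G10"]
print(fisher_enrichment_p(subnetwork, pathway, universe))  # 155 / 4845, about 0.032
```

`scipy.stats.fisher_exact` with `alternative="greater"` on the corresponding 2 x 2 table returns the same one-sided p-value.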

We input the resulting differential expression lists from the three comparisons of non-laboring and laboring myometrium into Crosstalker to identify pathways and networks associated with HCA-indicated PTB, spontaneous PTB, and term labor. The random walk was performed on the STRING protein-protein interaction network database [14], and pathway enrichment was tested against the National Cancer Institute's Pathway Interaction Database (NCI-PID) [15, 16].

Results

Comparison With Original Study Findings

We performed differential gene expression analysis for all available myometrium datasets to identify differentially expressed genes in term labor (TNL vs. TIL), preterm labor with histologic chorioamnionitis (PTNL vs. HCA), and spontaneous preterm labor (PTNL vs. PTIL). Nonparametric testing was used to test for differential expression in microarray datasets while the Cufflinks protocol was used to analyze RNA-seq datasets (Table 1-1) [7, 8]. We first compared our findings to those made in the original studies [1, 3-5, 13] to assess the extent to which the findings of the original studies could be replicated with alternative methods. Genes were considered to be differentially expressed in an individual cohort if they passed an FDR correction of q < 0.25.


For term labor we were able to confirm 68% of the genes reported as differentially expressed by Chan et al. [13] but only 36% of those reported by Ackerman et al. [5] (Table 1-2). In contrast, the best we could confirm from the microarray datasets was 3% of genes from Mittal et al. [3]. The best nominal p-value obtained when analyzing the TNL and TIL samples from Weiner [4] and Bethin [1] was 0.0725; therefore, no genes were differentially expressed in those cohorts by our statistical criteria. In the Mesiano cohort we found 185 genes to be differentially expressed in term labor.

Table 1-2: Genes reported as differentially expressed in term labor (TNL vs. TIL) by five myometrial transcriptome studies. Shown are the number of genes reported in the original paper (Reported), the number found differentially expressed by our statistical methodology (Measured), the number of reported genes we confirmed (Confirmed), and the percent overlap with the original study.

Cohort Name               Reported   Measured   Confirmed   Percent Overlap
Chan, et al. (2014)            764       2690         519                68
Ackerman, et al. (2015)        864        866         309                36
Mittal, et al. (2008)          100        335           3                 3
Weiner, et al. (2010)          174          0           0                 0
Bethin, et al. (2003)          478          0           0                 0
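The Percent Overlap column is consistent with the number of confirmed genes divided by the number reported in the original study: for example, 519 / 764 is about 68% for Chan et al. A minimal Python sketch of that calculation, with the confirmed set taken as the intersection of the reported genes and the genes our reanalysis found differentially expressed (the function name is illustrative):

```python
def replication_summary(reported, measured):
    """Count reported genes that were re-detected and express it as a percentage."""
    reported = set(reported)
    confirmed = reported & set(measured)
    percent = 100 * len(confirmed) / len(reported) if reported else 0.0
    return len(confirmed), round(percent)


# Counts transcribed from Table 1-2: (reported, confirmed).
TABLE_1_2 = {"Chan": (764, 519), "Ackerman": (864, 309), "Mittal": (100, 3)}
for study, (reported, confirmed) in TABLE_1_2.items():
    print(study, round(100 * confirmed / reported))  # Chan 68, Ackerman 36, Mittal 3
```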

For both subtypes of preterm labor, spontaneous PTB and HCA, we once again found no genes in Weiner [4] and Bethin [1] to be differentially expressed (Tables 1-3 and 1-4). We confirmed 27% of the differentially expressed genes in HCA from Ackerman et al. [5]. Due to the lack of differentially expressed genes in Weiner [4] and Bethin [1], the only dataset remaining to assess genes differentially expressed in HCA is Ackerman et al. [5]. For spontaneous PTB we only had the Mesiano cohort, where 94 genes were differentially expressed. For these phenotypes we could not enforce the criterion of requiring a gene to be differentially expressed in two or more studies.

We therefore accepted genes as significant if they were differentially expressed in our analysis of the Ackerman et al. [5] data (for HCA) or of the Mesiano cohort (for spontaneous PTB).

Table 1-3: Genes reported as differentially expressed in HCA-indicated preterm labor (PTNL vs. HCA) by two myometrial transcriptome studies. Shown are the number of genes reported in the original paper (Reported), the number found differentially expressed by our statistical methodology (Measured), the number of reported genes we confirmed (Confirmed), and the percent overlap with the original study.

Cohort Name               Reported   Measured   Confirmed   Percent Overlap
Ackerman, et al. (2015)        448        244         123                27
Weiner, et al. (2010)           34          0           0                 0

Table 1-4: Genes reported as differentially expressed in spontaneous preterm labor (PTNL vs. PTIL) by one myometrial transcriptome study. Shown are the number of genes reported in the original paper (Reported), the number found differentially expressed by our statistical methodology (Measured), the number of reported genes we confirmed (Confirmed), and the percent overlap with the original study.

Cohort Name               Reported   Measured   Confirmed   Percent Overlap
Bethin, et al. (2003)          478          0           0                 0

Comparing Laboring Phenotypes By Gene and Pathway Signatures

In total, 685 genes were differentially expressed in term labor in two or more studies. We found 94 genes differentially expressed in the Mesiano cohort for spontaneous preterm labor and 244 genes differentially expressed in HCA-indicated preterm labor in Ackerman et al. [5]. We compared these gene lists to identify genes shared between the different laboring phenotypes and found that HCA-indicated PTB and spontaneous PTB were each more similar to healthy term labor than to each other (Figure 1-2). In fact, HCA-indicated PTB and spontaneous PTB shared only 7 differentially expressed genes.
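Comparing the three phenotype gene lists is a set-overlap calculation. The Python sketch below counts the genes shared by each pair of lists and by all three; the function names are illustrative and the lists are toy placeholders rather than the actual 685, 244, and 94 gene signatures.

```python
from itertools import combinations


def pairwise_overlaps(gene_lists):
    """Number of genes shared by every pair of named gene lists."""
    sets = {name: set(genes) for name, genes in gene_lists.items()}
    return {(a, b): len(sets[a] & sets[b]) for a, b in combinations(sets, 2)}


def shared_by_all(gene_lists):
    """Genes present in every list."""
    return set.intersection(*(set(g) for g in gene_lists.values()))


# Toy lists with made-up gene names, not the chapter's results.
lists = {
    "term": ["G1", "G2", "G3", "G4"],
    "HCA": ["G2", "G3", "G5"],
    "spontaneous": ["G3", "G6"],
}
print(pairwise_overlaps(lists))  # {('term', 'HCA'): 2, ('term', 'spontaneous'): 1, ('HCA', 'spontaneous'): 1}
print(shared_by_all(lists))      # {'G3'}
```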


Figure 1-2: Differentially expressed genes and enriched pathways for HCA-indicated PTB (red), term labor (blue), and spontaneous preterm labor (green).

Though different gene sets can sometimes activate similar pathways, pathway analysis with Crosstalker shows that this is not the case for these parturition subtypes (Figure 1-2). Spontaneous preterm birth is characterized by a single pathway, the VEGF and VEGFR signaling network, and shares no functional similarities with HCA-indicated PTB or term labor (Figure 1-4). In contrast, HCA-indicated PTB and term labor share a core set of 11 pathways, including several chemokine and integrin pathways with potential crosstalk with targets of the AP1 transcription factor (Figure 1-3).


Figure 1-3: Crosstalker results showing the common pathways characterizing term labor and HCA-indicated preterm labor. Nodes are colored by the fold change of the genes in laboring relative to non-laboring myometrium and edges are colored by associated signaling pathways.

Figure 1-4: Crosstalker results showing the pathway characterizing spontaneous preterm labor. Nodes are colored by the fold change of the genes in laboring relative to non-laboring myometrium and edges are colored by associated signaling pathways.

Conclusions and Future Work

In the present study we performed a systematic meta-analysis of parturition-associated myometrial transcriptome data to characterize three laboring subtypes. The seminal finding of our investigation is that the dogma that labor is the same biological process regardless of whether it occurs before or after term is incorrect. We show that spontaneous PTB has a distinct signaling pathway signature compared to HCA-indicated PTB and term labor. The shared pathways between HCA-indicated PTB and term labor are representative of the canonical signaling events commonly seen in the myometrium during labor.

Surprisingly, spontaneous PTB shares no enriched pathways with HCA-indicated PTB or term labor, despite the myometrium undergoing the same phenotypic transition from hypertrophied and quiescent to contractile. While this could suggest that the mechanism of labor may be independent of the inflammatory pathways seen in the HCA and term labor samples, the lack of overlapping differentially expressed genes suggests an alternative explanation.

As products of evolution, biological signaling networks have developed an extraordinary amount of redundancy, and it is possible that deactivation of the VEGF and VEGFR signaling network may have evolved as an alternative mechanism for labor.

The Ackerman [5] and Mesiano datasets also enabled us to compare the PTNL and TNL phenotypes to identify transcriptional changes between the early and late pregnancy quiescent uterus. The transcriptional changes between these two phenotypes are likely to be associated with the growth and remodeling of the uterus throughout pregnancy, dysregulation of which could explain a portion of preterm birth phenotypes. Between PTNL and TNL, 165 genes were differentially expressed in the Ackerman dataset and 51 in the Mesiano cohort at an FDR threshold of 0.25. However, only 11 genes overlapped between these two lists, too few for pathway analysis with Crosstalker. This is likely due to all of the Ackerman PTNL samples having signs of preeclampsia, whereas only one Mesiano PTNL sample had preeclampsia.

Interestingly, analyzing the 51 differentially expressed genes from the Mesiano cohort with Crosstalker shows that VEGF and VEGFR signaling is upregulated in TNL samples relative to PTNL samples. This suggests that activation of VEGF signaling may be necessary for pregnancy to advance, while inhibition of VEGF signaling may trigger preterm labor.

An important consideration in undertaking a meta-analysis of parturition subtypes is the phenotypic uncertainty in the different cohorts. Though different laboratories follow comparable tissue procurement and sequencing protocols, the definition of labor presents a problem for precisely defining phenotypes. The characterization of samples as term or preterm carries some uncertainty due to the delay between a woman’s previous menstrual cycle and determination of pregnancy. This difficulty can be overcome by collecting samples from earlier in pregnancy to account for up to four weeks of uncertainty in the start of the pregnancy.

The determination of labor is much more uncertain. Samples are classified as in-labor if the cervix has dilated beyond 4 cm, the maternal-fetal membranes rupture, or myometrial contractions occur. Myometrial contractions are a particularly important marker of labor for this meta-analysis. The length and severity of patient-reported contractions can be subjective, and, absent other labor indications from the cervix or membranes, the determination of labor can be challenging. In the Mesiano cohort we determined samples to be in labor if cervical and myometrial indications were present, but the lack of detailed records from other cohorts introduces some phenotypic uncertainty into our analysis.

Further work is required to confirm these findings. Immunohistochemistry work is underway to assess whether leukocytes and macrophages are present in the spontaneous preterm birth samples from the Mesiano cohort. If they are present, an important comparison will be to assess how their abundance compares to that in the term labor samples of the Mesiano cohort, which are expected to have more infiltrating macrophages than the spontaneous PTB samples. Similarly, we will assess by qRT-PCR whether the VEGF signaling events characterizing spontaneous PTB are present in the spontaneous PTB and term labor samples of the Mesiano cohort. These experiments will be important for confirming the validity of the bioinformatics results from differential gene expression and pathway analysis.

An issue that became apparent while conducting this analysis was the consistency and reliability of the findings from the gene expression studies done using microarray technology [1, 3, 4]. Though Mittal et al. reported 471 genes as differentially expressed in the original study, only 100 were actually referenced in the manuscript, and no full list of their findings was published [3]. This made assessing the consistency of our findings with the original publication difficult, since we could only compare the 335 genes found differentially expressed by our analysis to a small subset of their total findings. Bethin et al. applied a parametric t-test when assessing differential expression despite having only 3 samples in each phenotype [1]. A parametric test is difficult to justify at this sample size, and a nonparametric approach such as the one employed in the present investigation would have been more appropriate. Weiner et al. employed a nonstandard methodology for identifying labor-associated genes and did not statistically assess differential expression in their cohort [4]. These issues explain why we were unable to replicate any genes from the Bethin and Weiner studies and why our confirmation of the Mittal group's findings was limited.

Our findings represent a significant contribution to the field of myometrium signaling in parturition. The present work confirms and solidifies the important role of inflammatory pathways in term labor and shows that these pathways are similarly altered in preterm labor when inflammation is present. We have also shown that spontaneous PTB has a completely different biological signaling character than either of these parturition subtypes. The present work shows that our understanding of term labor is not likely to yield insight into spontaneous PTB and that further work is required to characterize spontaneous PTB. Our work indicates that alternative hypotheses and therapeutic approaches are required for spontaneous PTB.


Chapter 2: A GWAS Rescue Mission Reveals MEF2C and TWIST1

Associated Sub-Networks Related to Preterm Labor

Submitted for Publication


Finding Lost Genes in GWAS via Integrative-omics Analysis Reveals

Novel Sub-networks Associated with Preterm Birth

Douglas Brubaker1, Yu Liu1*, Junye Wang2, Huijing Tan2, Jonas Bacelis3, Ge Zhang4,5, Louis J. Muglia4, Sam Mesiano2, Mark R. Chance1**

1 Center for Proteomics and Bioinformatics and 2 Department of Reproductive Biology and Department of Obstetrics and Gynecology, University Hospitals Case Medical Center, Case Western Reserve University, 11900 Euclid Avenue, Cleveland, OH 44106

3 Department of Obstetrics and Gynecology, Sahlgrenska University Hospital Östra (East), Gothenburg, Sweden

4 Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45229

5 Center for Prevention of Preterm Birth, Perinatal Institute, Cincinnati Children's Hospital Medical Center, Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45229

Corresponding Author*

Mark R. Chance

Center for Proteomics and Bioinformatics

Case Western Reserve University

11900 Euclid Avenue

Cleveland, OH, 44106, USA

Email: [email protected]


Abstract

Genome-wide association studies (GWAS) have failed to identify replicable single nucleotide polymorphisms (SNPs) associated with preterm birth (PTB), likely due to insufficient power (e.g., rarity of SNPs with significant functional effects) or complexities in defining PTB phenotypes. Some genetically associated PTB phenotypes may result from multiple modest-effect variants acting through common pathways. We explore the hypothesis that genetic variants act together to disrupt the expression of groups of genes that cluster in modules of a tissue-specific protein-protein interaction (PPI) network. A PPI network enriched with PTB-SNP genes of modest significance from a meta-analysis of three PTB-GWAS was constructed using a Steiner Tree algorithm. Dysregulated gene expression modules associated with labor were identified in this network using mutual information scoring of transcriptome data from term and preterm myometrium samples. The dysregulated gene expression in these subnetworks was confirmed for selected sets of genes by qRT-PCR in an independent cohort.

Myocyte enhancer factor 2C (MEF2C) was identified in 15 of 22 significant term labor subnetworks involved in the regulation of ion channels, muscle function, and inflammatory genes carrying significant PTB SNPs. A significant PTB-SNP associated gene, twist basic helix-loop-helix transcription factor 1 (TWIST1), a direct repressor of MEF2C, was also frequently identified in the preterm labor subnetworks. qRT-PCR confirmed dysregulation of 8 term labor associated subnetworks. In addition, MEF2C was coordinately expressed with the PTB-SNP associated genes phospholipase A2 group IVC (PLA2G4C) and lactose binding lectin-2 (LGALS2) in both the term and preterm subnetworks. PLA2G4C is the only subnetwork gene with a prior association with preterm birth in the literature curation database for preterm birth (dbPTB).

Dysregulated interactions in subnetworks involving TWIST1 and MEF2C may affect the function of the myometrium and increase the risk of PTB. This study demonstrates that GWAS data of moderate significance may be highly enriched for functionally relevant information that helps reveal the pathways and networks underlying complex disease.

Author Summary

The genetic component of preterm birth (PTB) has remained elusive due to the complex nature of the disorder. Genome-wide association studies (GWAS) of PTB have failed to identify replicable single nucleotide polymorphisms (SNPs) associated with PTB. We explore the hypothesis that multiple smaller-effect variants implicated by PTB-GWAS may act through tissue-specific protein-protein interaction (PPI) network modules to disrupt the expression of groups of genes and drive PTB. Using mutual information scoring of term and preterm myometrial gene expression data, we identified 21 subnetworks associated with term and preterm labor that are enriched with the transcription factor myocyte enhancer factor 2C (MEF2C) and its direct repressor, twist basic helix-loop-helix transcription factor 1 (TWIST1). We confirmed eight subnetworks with qRT-PCR in an independent cohort of myometrium samples. This study identifies a novel pathway associated with PTB-GWAS variants. It also demonstrates how GWAS hits of modest significance may be enriched with functionally relevant SNPs that can help explain complex disease when viewed in a tissue-specific PPI network context.


Background

Spontaneous preterm birth (PTB) is a complex disorder that accounts for the majority of neonatal mortality worldwide [17]. Multiple biological and environmental factors are believed to converge upon common signaling pathways that result in either spontaneous premature rupture of the fetal membranes or spontaneous preterm uterine contractions [18]. Though environmental factors are significant, maternal genomic influences are associated with 40% of all preterm births, and this strong inheritance of PTB tendencies indicates that genetic components contribute to the disorder [18-22]. Many genome-wide association studies (GWAS) have attempted to identify single nucleotide polymorphisms (SNPs) associated with PTB [18, 23-25]. Though some SNPs achieve statistical significance within study cohorts, no study has identified a SNP that replicates in an independent cohort after correction for multiple hypothesis testing [18, 23, 25].

There are several plausible explanations for why the genetic component of preterm birth has eluded detection in previous GWAS of PTB. If genetically indicated PTB is the result of a single, large-effect SNP, then it may be detectable in a larger cohort or meta-analysis that provides more statistical power [18]. Alternatively, PTB could be the consequence of interactions between multiple smaller-effect variants that do not rise to significance after multiple hypothesis testing correction [18, 26]. Further, SNPs associated with complex disorders (especially non-coding SNPs) may perturb gene expression only in selected tissues [27-30]. Thus, an approach that accounts for perturbations in gene expression at the tissue level and analyzes interactions among SNPs may be required to address the complex genetics (and attendant complex environmental influences) mediating PTB.

One approach to identifying genetic influences on preterm birth has been integrated literature curation and pathway analysis [26, 31, 32]. The database of preterm birth (dbPTB) is a resource in which all the literature on the genetics of preterm birth is curated and assembled into a list of candidate SNPs and genes [26]. Pathway and network analyses using gene set enrichment analysis (GSEA) [33] and Ingenuity Pathway Analysis (IPA) have been undertaken on these candidates to identify causal pathways and networks associated with preterm birth. Studies of these curated data have confirmed the inflammatory character of labor, implicating cytokines and inflammatory pathways enriched with SNP-associated genes [31, 32].

This approach has several limitations. First, it does not address the issue of studies being underpowered to detect significant, replicable genetic variants associated with preterm birth. Second, dbPTB accepts the statistical significance reported in the curated manuscripts and does not undertake a rigorous meta-analysis of the raw data [26]. Lastly, the reliance on GSEA restricts the discovery of enriched functions among the candidate genes to curated pathways, while IPA network models are neither disease- nor tissue-specific and limit integration with other data sets.

Although GWAS have identified many genes of functional interest across many diseases, these studies, as in the case of PTB, generally identify few significant genes and thus provide a low level of functionally relevant information. The conventional understanding of this phenomenon is that “common” variants, which are targeted in major GWAS, are not always associated with major functional effects, and that increased cohort sizes, higher-resolution mapping of the genome (e.g., sequencing), or both are needed to fully understand the genetic basis of conditions having substantial heritable components [34].

Although this view has merit, we suggest that interactions among common genetic variants affect phenotype, and that the interactions of multiple variants of modest effect size explain an important subset of heritable function. In this scenario, top-scoring SNPs associated with disease phenotypes in GWAS may not survive multiple hypothesis testing correction to reach individual significance, yet may be highly enriched for functional associations relevant to the phenotype, providing key leads to the basic biology of the disease. We can therefore use “top” SNP candidates as a group of genes to seed bioinformatics workflows that identify tissue-specific, GWAS-associated functions of relevance to disease.

Network-based approaches have become increasingly popular for identifying novel biological associations and as a platform for integrating diverse molecular data types in integrative -omics analyses [35-38]. Protein-protein interaction (PPI) networks are particularly powerful as an analysis framework since PPIs directly reflect the functional actions of genes [38]. Gene expression, in turn, can be modulated by underlying SNPs in and around the coding region of a gene [27-30]. A network biology approach that combines candidate genes from GWAS of PTB with tissue-specific transcriptome data therefore has the potential to reveal tissue-specific causal pathways of preterm birth and to “rescue” GWAS that identify few SNPs of interest.

We assembled a PPI network of genes carrying maternal SNPs associated with preterm birth, together with topologically associated interacting genes. Candidate SNPs were obtained from a meta-analysis of three independent preterm birth GWAS constituting a cohort of 3,485 mother-child pairs [39]. This network was then scored with myometrium transcriptome data from women undergoing term and preterm labor to identify subnetworks of coordinately regulated genes associated with labor onset. Candidate subnetworks were then confirmed by qRT-PCR analysis of coordinately differentially expressed genes in an independent cohort. Overall, the approach suggests a novel method for “rescuing” modestly significant SNPs in GWAS that yields potentially important gene candidates ripe for future study.

Methods

Multiple high dimensional PTB datasets were analyzed in a protein-protein interaction network framework to identify genetically driven subnetworks associated with the preterm myometrial contraction phenotype (Figure 2-1).

Candidate genes carrying SNPs potentially associated with PTB were used to seed a PPI network enriched with these genes and their interacting partners. The search space was thus constrained to a set of high-confidence protein interactions in the network neighborhood of the PTB SNPs. This tissue-agnostic network was then scored with transcriptome data from term and preterm myometrium to identify subnetworks associated with the onset of the term and preterm labor phenotypes. Subnetworks were then confirmed with qRT-PCR and compared against dbPTB to search for known associations with PTB.

Figure 2-1: Candidate preterm birth associated SNP-carrying genes were mapped onto a protein-protein interaction network and connected along the shortest paths between them using a Steiner Tree algorithm. The resulting network is enriched for the preterm birth associated candidates and includes topologically associated interacting proteins. Transcriptome data were used to score the enriched PPI network and identify groups of coordinately differentially expressed genes whose expression discriminated between in-labor and non-laboring samples in datasets of term and preterm myometrium samples. Networks were then confirmed by qRT-PCR.

Datasets

A list of 250 SNPs was obtained from a meta-analysis of three GWAS of preterm birth [39]. The meta-analysis analyzed cohorts of women from Denmark [24], Norway [23], and Finland [25]. The top 250 SNPs had p-values ranging from 10⁻⁷ to 10⁻⁴ and were associated with 237 candidate genes, defined as genes within 20 kb of the SNPs. Though the SNPs were not statistically significant per se, they served as a list of “genomic seed genes” with the highest statistical significance in the meta-analysis.

Data from three transcriptome studies of the human pregnancy myometrium, one RNA-seq [13] and two microarray datasets [1, 4], were obtained to assemble a cohort of 5 term non-laboring (TNL), 5 term in-labor (TIL), 6 preterm non-laboring (PTNL), and 6 preterm in-labor (PTIL) samples. Expression values for each gene in the microarray datasets were converted to z-scores for each sample to facilitate combining the studies into one gene expression matrix for subnetwork scoring.
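A minimal sketch of the per-sample z-score step, assuming the standardization is taken across all genes within each sample and uses the population standard deviation. The function name `zscore_within_sample`, the dict-of-dicts layout, and the study/sample IDs are hypothetical, not taken from the original pipeline. It illustrates why the transformation makes arrays with different intensity scales comparable.

```python
from statistics import mean, pstdev

def zscore_within_sample(matrix):
    """Standardize each sample's expression values across genes.

    matrix: dict mapping sample ID -> dict mapping gene -> expression value.
    Returns the same layout with z-scores (population SD is an assumption).
    """
    out = {}
    for sample, values in matrix.items():
        vals = list(values.values())
        mu, sd = mean(vals), pstdev(vals)
        out[sample] = {gene: (v - mu) / sd for gene, v in values.items()}
    return out

# Hypothetical samples from two microarray studies with different intensity scales.
data = {
    "study_A_s1": {"MEF2C": 8.0, "TWIST1": 6.0, "KLF12": 7.0},
    "study_B_s1": {"MEF2C": 120.0, "TWIST1": 40.0, "KLF12": 80.0},
}
z = zscore_within_sample(data)
# After standardization MEF2C has the same z-score (about 1.22) in both samples.
```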

Expanded Network Construction

A PPI network neighborhood was constructed from the seed genes using the STRING Protein-Protein Interaction Network Database [40]. STRING curates thousands of protein interactions and assesses the confidence of each based on the type and amount of evidence for the interaction. We obtained the full database and filtered for high-confidence edges (edge weight > 0.5) to obtain a network of 51,256 physical protein-protein interactions among 10,174 proteins, with proteins represented by their coding genes. The seed genes from the GWAS meta-analysis were mapped onto the STRING PPI network and connected along shortest paths using a Steiner Tree algorithm [41], such that a minimum number of edges and recruited genes (Steiner nodes) were added to the network. The resulting network contains both seed genes carrying PTB-associated SNPs and potentially important interacting partners not in our original list. The network was visualized using Cytoscape [42].
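The network expansion can be illustrated with the classic shortest-path (Takahashi-Matsuyama) heuristic for Steiner trees: start from one seed and repeatedly attach the nearest remaining seed along an unweighted shortest path. This is an assumed stand-in and not necessarily the algorithm of [41]. The toy edge list, its scores (assumed already on a 0-1 scale), and all helper names are hypothetical.

```python
from collections import deque

def build_graph(edges, min_score=0.5):
    """Undirected adjacency sets from (gene_a, gene_b, score) tuples with score > min_score."""
    graph = {}
    for a, b, score in edges:
        if score > min_score:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph

def path_to_nearest_target(graph, tree_nodes, targets):
    """BFS from all tree nodes at once; return the shortest path to the closest target."""
    parent = {n: None for n in tree_nodes}
    queue = deque(tree_nodes)
    while queue:
        node = queue.popleft()
        if node in targets:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]  # starts at a tree node, ends at the target
        for nbr in sorted(graph.get(node, ())):
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    return None  # no remaining target is reachable

def shortest_path_steiner_tree(graph, seeds):
    """Greedy shortest-path Steiner heuristic; returns (tree nodes, tree edges)."""
    mapped = [s for s in seeds if s in graph]  # seeds absent from the filtered network are dropped
    if not mapped:
        return set(), set()
    nodes, edges = {mapped[0]}, set()
    remaining = set(mapped) - nodes
    while remaining:
        path = path_to_nearest_target(graph, sorted(nodes), remaining)
        if path is None:
            break  # remaining seeds lie in other connected components
        edges.update(frozenset(e) for e in zip(path, path[1:]))
        nodes.update(path)
        remaining -= nodes
    return nodes, edges

# Toy edge list with made-up scores (not real STRING values).
toy_edges = [
    ("TWIST1", "EP300", 0.90), ("EP300", "MEF2C", 0.95), ("MEF2C", "HDAC9", 0.80),
    ("HDAC9", "CACNB2", 0.30), ("MEF2C", "CACNB2", 0.60), ("EP300", "RELA", 0.40),
]
graph = build_graph(toy_edges)
seeds = ["TWIST1", "CACNB2", "HDAC9"]
nodes, edges = shortest_path_steiner_tree(graph, seeds)
steiner_nodes = nodes - set(seeds)  # recruited genes: {"EP300", "MEF2C"}
```

The greedy heuristic does not guarantee a minimum tree, but it adds only genes that lie on shortest paths between seeds, which matches the intent described above.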


Subnetwork Analysis and Scoring

The Subnetwork Analysis and Scoring System (SASSy) [36-38] was used to mine the PTB-enriched network for subnetworks of genes coordinately differentially expressed between non-laboring and laboring samples. Coordinate differential expression of a subnetwork was assessed by aggregating the gene expression (inferred from mRNA abundance) of the subnetwork and comparing it across phenotypes. To do this, the expression of a subnetwork in each phenotype was treated as a random variable. Mutual information was used to assess the dependence of a subnetwork's expression in one phenotype (e.g., laboring) on its expression in another related phenotype (e.g., non-laboring). A subnetwork scoring highly in this way was considered biologically relevant, since it implicated a tight module of proteins performing a coordinate function.

First, SASSy was applied to term myometrial transcriptome data (TNL vs. TIL) and then to preterm myometrial transcriptome data (PTNL vs. PTIL). The resulting subnetworks were then compared to assess functional connections and differences between term and preterm labor. SASSy was used to search the network for all possible combinations of 2-5 genes and compute the subnetwork activity of each group of genes [36, 37]. The restriction to small subnetworks of coordinately differentially expressed genes was purely for computational reasons; in principle, larger subnetworks could be searched with sufficient computing power. Subnetwork activity was defined as the aggregated gene expression (i.e., mRNA abundance) of the subnetwork's genes in a given sample. Then the mutual information between phenotypes (laboring and non-laboring samples) was computed as a measure of the ability of the subnetwork to distinguish between phenotypes [38], on the assumption that the higher the mutual information score, the better a subnetwork distinguishes between phenotypes.
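To make the scoring concrete, the sketch below uses one common formulation from network-based marker studies, assumed here rather than taken from SASSy: a subnetwork's activity in a sample is the mean z-score of its genes, activities are discretized into equal-width bins, and mutual information (in bits) is computed between the binned activity and the phenotype label. The mean aggregation, the two-bin discretization, and the example z-scores are all hypothetical.

```python
import math
from collections import Counter

def subnetwork_activity(samples, genes):
    """Per-sample activity: mean z-score of the subnetwork's genes (assumed aggregation)."""
    return [sum(s[g] for g in genes) / len(genes) for s in samples]

def discretize(values, n_bins=2):
    """Equal-width binning into integers 0..n_bins-1 (bin count is an assumption)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against constant activity
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def mutual_information(xs, ys):
    """Plug-in mutual information, in bits, between two discrete sequences."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * math.log2(c * n / (px[x] * py[y])) for (x, y), c in joint.items())

# Hypothetical z-scores: three non-laboring (label 0) then three laboring (label 1) samples.
samples = [
    {"MEF2C": 1.0, "KLF12": 0.8}, {"MEF2C": 1.2, "KLF12": 1.0}, {"MEF2C": 0.9, "KLF12": 1.1},
    {"MEF2C": -1.0, "KLF12": -0.8}, {"MEF2C": -1.1, "KLF12": -0.9}, {"MEF2C": -0.9, "KLF12": -1.2},
]
labels = [0, 0, 0, 1, 1, 1]
bins = discretize(subnetwork_activity(samples, ["MEF2C", "KLF12"]))
mi = mutual_information(bins, labels)  # 1.0 bit: activity perfectly separates the phenotypes
```

With two balanced phenotype groups the score is bounded by 1 bit, reached only when the binned activity fully separates the groups.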

To assess the significance of the mutual information score for a subnetwork, and to test the null hypothesis that the subnetwork is not associated with a particular phenotype, two permutation tests were performed.

The first test permutes the sample labels 100,000 times between phenotypes to randomize the patient groups (laboring and non-laboring) while preserving the expression correlations between genes within each sample. The second test permutes the gene labels 1,000,000 times while keeping patients in their respective phenotype groups. In each test, a null distribution of mutual information is estimated for each subnetwork size (2-5 genes), and the cumulative distribution function (CDF) of that distribution is computed. The significance of the mutual information for a real network is then determined by evaluating the CDF at that value of mutual information [38]. For example, a CDF value of 95% indicates a 5% chance (p = 0.05) of observing a higher mutual information value under the null hypothesis. Subnetworks were considered significant only if they passed both permutation tests with p < 0.05.
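The sample-label permutation test can be sketched as follows: shuffle the phenotype labels, recompute the mutual information, and report the fraction of permuted scores at least as large as the observed one, i.e., one minus the empirical CDF, with the usual +1 correction. This is a hedged illustration; the permutation count, seed, and function names are assumptions, and the gene-label test (random gene sets of the same size) is not shown.

```python
import math
import random
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in mutual information, in bits, between two discrete sequences."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * math.log2(c * n / (px[x] * py[y])) for (x, y), c in joint.items())

def label_permutation_pvalue(binned_activity, labels, n_perm=10_000, seed=0):
    """Estimate P(MI_null >= MI_observed) by shuffling the sample phenotype labels."""
    rng = random.Random(seed)
    observed = mutual_information(binned_activity, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # gene-gene correlations within each sample are untouched
        if mutual_information(binned_activity, shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)

labels = [0] * 5 + [1] * 5
p_sep = label_permutation_pvalue([1, 1, 1, 1, 1, 0, 0, 0, 0, 0], labels)   # about 2/252
p_null = label_permutation_pvalue([0, 1, 0, 1, 0, 1, 0, 1, 0, 1], labels)  # close to 1
```

With only ten samples, a perfectly separating subnetwork cannot reach a p-value below 2/252, which is worth keeping in mind when interpreting the small cohorts used here.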

Sometimes SASSy identifies coordinately regulated genes that are not direct neighbors in the Steiner network used for scoring (i.e., the PTB-SNP enriched PPI network). When this occurred, the coordinately regulated candidate genes were connected along all possible shortest paths in the PTB-SNP enriched PPI network.
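Connecting two co-regulated genes along all possible shortest paths amounts to a breadth-first search that records every predecessor at the previous BFS level, followed by backtracking. The sketch below assumes an unweighted, undirected adjacency-set graph; the toy interactions are hypothetical and the function name is illustrative.

```python
from collections import deque

def all_shortest_paths(graph, source, target):
    """Enumerate every shortest path between source and target in an unweighted graph."""
    dist = {source: 0}
    preds = {source: []}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                preds[nbr] = [node]
                queue.append(nbr)
            elif dist[nbr] == dist[node] + 1:
                preds[nbr].append(node)  # another equally short route into nbr
    if target not in dist:
        return []
    paths = []

    def backtrack(node, suffix):
        if node == source:
            paths.append([source] + suffix)
            return
        for p in preds[node]:
            backtrack(p, [node] + suffix)

    backtrack(target, [])
    return paths

# Toy network: two equally short routes link MEF2C and TWIST1.
graph = {
    "MEF2C": {"EP300", "HDAC9"},
    "EP300": {"MEF2C", "TWIST1"},
    "HDAC9": {"MEF2C", "TWIST1"},
    "TWIST1": {"EP300", "HDAC9"},
}
paths = all_shortest_paths(graph, "MEF2C", "TWIST1")
subnetwork_genes = set().union(*paths)  # all genes on any shortest path
```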

Network Validation

Total RNA was extracted from uterine tissue obtained from the lower uterine segment at the time of cesarean delivery. Samples were collected at term (≥ 37 weeks) before (n=5) and after (n=5), and preterm (< 37 weeks) before (n=5) and after (n=3), the onset of active labor, defined by forceful and rhythmic uterine contractions and ≥ 4 cm cervical dilation. Tissue was collected with patient consent (IRB approval # 11-04-06) at MacDonald Women's Hospital, University Hospitals of Cleveland. Total RNA was isolated as previously described [43]. Genomic DNA was degraded by DNase treatment (Applied Biosystems). RNA was ethanol precipitated, resuspended in water, and quantified by absorbance at 260 nm. For quantitative RT-PCR, total RNA (400 ng) was reverse transcribed with random primers using Superscript II reverse transcriptase (Life Technologies).

Primers for specific target mRNAs were designed using Primer Express software (Applied Biosystems) based on published sequences. Assays were optimized and validated for all primer sets by confirming that single amplicons of appropriate size and sequence were generated and that the priming and amplification efficiencies of all primer pairs were identical. PCR was performed in the presence of SYBR Green (Applied Biosystems) in an ABI PRISM 7500 Sequence Detector (Applied Biosystems). The cycling conditions were 50°C for 2 minutes, 95°C for 10 minutes, and 40 cycles of 95°C for 15 seconds and 60°C for 1 minute. The cycle at which the fluorescence reached a preset threshold (cycle threshold, CT) was used for quantitative analyses. The threshold in each assay was set at a level where the rate of exponential increase in amplicon abundance was approximately parallel across all samples. Messenger RNA abundance data were expressed relative to the abundance of the constitutively expressed glyceraldehyde 3-phosphate dehydrogenase (GAPDH) using the ΔCT method (i.e., relative mRNA abundance = $2^{-(C_{T,\text{gene of interest}} - C_{T,\text{18S rRNA}})}$).
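The ΔCT arithmetic and the laboring/non-laboring fold change reported in Tables 2-1 and 2-2 can be sketched as below. The reference CT is whichever constitutively expressed transcript is used for normalization, and computing the fold change as a ratio of group means of relative abundance is an assumption, since the averaging scheme is not stated. All CT values are hypothetical.

```python
from statistics import mean

def relative_abundance(ct_gene, ct_reference):
    """Delta-CT method: 2^-(CT_gene - CT_reference)."""
    return 2.0 ** -(ct_gene - ct_reference)

def fold_change(labor, nonlabor):
    """Ratio of mean relative abundance, laboring over non-laboring (assumed averaging)."""
    return mean(labor) / mean(nonlabor)

# Hypothetical (gene CT, reference CT) pairs for two samples per group.
tnl = [relative_abundance(24.0, 20.0), relative_abundance(23.0, 19.0)]  # each 2^-4 = 0.0625
til = [relative_abundance(25.0, 20.0), relative_abundance(24.5, 19.5)]  # each 2^-5 = 0.03125
fc = fold_change(til, tnl)  # 0.5: one extra cycle to threshold halves the relative abundance
```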

The significance of the qRT-PCR results was assessed at both the single-gene and network levels. Single-gene differential expression was assessed using the Wilcoxon-Mann-Whitney test on the relative mRNA abundance values of the laboring and non-laboring samples. We introduce a metric for quantifying subnetwork activity for significance testing, the network activity norm (NAN): the activity of the subnetwork in each sample was aggregated as the Euclidean norm of the relative mRNA abundance values of the genes in the subnetwork. Let $n$ be the vector of abundance values for a $k$-gene subnetwork. The NAN of $n$ is defined as

$$\|n\| = \sqrt{n_1^2 + n_2^2 + \cdots + n_k^2},$$

where $n_i$ is the expression of gene $i$ in subnetwork $n$. A Wilcoxon-Mann-Whitney test was then performed between laboring and non-laboring samples using the NANs for each subnetwork identified by SASSy, testing the null hypothesis that the NANs for a given network are the same in each phenotype. In both the single-gene and network cases, a p-value less than 0.1 was considered confirmatory for the qRT-PCR. In addition to assessing network activation statistically, we also queried each gene identified by SASSy in dbPTB to assess whether any candidate genes had a prior association with PTB.
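The NAN and the network-level test can be sketched together. The code computes the Euclidean norm per sample and an exact two-sided Wilcoxon-Mann-Whitney p-value by enumerating every split of the pooled NANs. Using the exact permutation distribution without tie correction is an assumption about how the test was run. The abundance values are hypothetical.

```python
import math
from itertools import combinations

def network_activity_norm(abundances):
    """NAN: Euclidean norm of one sample's relative mRNA abundances across subnetwork genes."""
    return math.sqrt(sum(a * a for a in abundances))

def mann_whitney_u(x, y):
    """U statistic for x versus y; ties contribute 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)

def exact_wmw_two_sided(x, y):
    """Exact two-sided Wilcoxon-Mann-Whitney p-value by enumerating all group splits."""
    pooled = list(x) + list(y)
    center = len(x) * len(y) / 2.0
    observed = abs(mann_whitney_u(x, y) - center)
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(x)):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(mann_whitney_u(gx, gy) - center) >= observed - 1e-12:
            extreme += 1
    return extreme / total

# Hypothetical relative abundances for a two-gene subnetwork, five samples per group.
tnl = [[1.00, 0.90], [1.10, 1.00], [0.95, 1.05], [1.20, 0.85], [1.05, 0.98]]
til = [[0.50, 0.40], [0.62, 0.55], [0.45, 0.60], [0.58, 0.47], [0.66, 0.52]]
p = exact_wmw_two_sided([network_activity_norm(s) for s in til],
                        [network_activity_norm(s) for s in tnl])
```

With five samples per group, complete separation of the NANs gives the smallest attainable two-sided p-value, 2/252 ≈ 0.0079, which matches the smallest p-value reported in Tables 2-1 and 2-3.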


Results

Constructing the PTB-SNP Enriched PPI Network

A Steiner Tree algorithm produced a PPI network seeded with the 237 candidate PTB genes derived from the meta-analysis of PTB-GWAS. Of the 237 candidate genes, 56 either encoded microRNAs or did not have any associated protein in the full STRING database, and 67 were excluded because they were not part of the filtered set of 10,174 proteins with interaction confidence greater than 0.5. The remaining 114 candidate genes mapped to the filtered STRING PPI network, 91 topologically related Steiner nodes were added, and 327 edges were required to connect them (Figure 2-2).

[Figure 2-2: PPI network diagram]

Figure 2-2: PPI Network of SNP carrying seed genes and interacting partners recruited by the Steiner Tree Algorithm. Green nodes indicate PTB-SNP genes mapped onto the STRING PPI Network (114), and red nodes indicate proteins added by the Steiner Tree algorithm (93).


Term and Preterm Myometrium Scored Networks

SASSy was used to analyze all groups of 2-5 genes in our PTB-enriched network and identify subnetworks with high mutual information. We first scored the PTB-enriched PPI network using a transcriptome dataset of 5 TNL and 5 TIL myometrium samples [13]. Twenty-two significant subnetworks of coordinately regulated genes were identified that passed both SASSy permutation tests (p < 0.05). All networks consisted of two genes with high mutual information, connected along all possible shortest paths between them in the PTB-enriched network. Fifteen of the twenty-two subnetworks scored highly due to high mutual information between MEF2C and a second coordinately differentially expressed gene. MEF2C is a transcription factor that has previously been shown to repress inflammatory pathways in endothelial cells [44] and is downregulated with the onset of labor in myometrial transcriptome datasets [1, 4, 13]. Though MEF2C does not itself carry a significant preterm birth SNP, it was found to be coordinately expressed with several significant SNP-carrying seed genes.

Querying the genes in the MEF2C networks in the STRING [40] and GeneCards [45] databases revealed three broad functional groups of networks involving MEF2C (Figure 2-3). The networks defined by MEF2C and the genes CACNB2, DPP6, KCNAB1, and KCNJ9 contain genes associated with ion channel function (Figure 2-3a). A fifth ion channel associated network, HDAC9 and CACNB2, scored highly as a subnetwork of the network defined by MEF2C and CACNB2 [40, 45] (Figure 2-3a). Muscle cell function associated networks were defined by MEF2C and the associated genes FHL1 (skeletal muscle), GRP (smooth muscle), LGALS2 (cardiac muscle), and MYBPC1 (skeletal muscle) [40, 45] (Figure 2-3b). FHL1 is also associated with ion channel binding [40, 45]. The MEF2C networks containing the genes KLF12, RPS6KA5, and PLA2G4C are involved in inflammatory processes and prostaglandin synthesis [40, 45] (Figure 2-3c). Within the inflammatory process networks, KLF12 also had high mutual information with RPS6KA5 in addition to being co-regulated with MEF2C (Figure 2-3c).

Figure 2-3: MEF2C subnetworks are shown grouped by the associated function of the coordinately differentially expressed genes. Green nodes carry PTB-SNPs and red nodes were recruited by the Steiner Tree algorithm. Five ion channel (2-3a), four muscle function (2-3b), and four inflammation associated (2-3c) subnetworks were identified.

Scoring the PTB-enriched network with transcriptome data from two microarray studies [1, 4] of the preterm myometrium identified 38 subnetworks that discriminated between the PTNL and PTIL samples. Six of these subnetworks were related to those identified by scoring with term myometrium data and were subjected to further investigation. The most frequently identified gene in networks with high mutual information was TWIST1. TWIST1 is a transcription factor activated by TNF-α and IL-1β via the NF-κB signaling pathway that acts as a negative feedback regulator of inflammatory cytokines [46]. Upregulation of TWIST1 has previously been noted in the human myometrium during term labor [46], though here it is coordinately downregulated with two other genes in four identified subnetworks (Figure 2-4a). TWIST1 is also a direct repressor of MEF2C [40] and, unlike MEF2C, carries a significant preterm birth associated SNP (p = 2.74 × 10⁻⁵).


Figure 2-4: Six networks coordinately regulated in preterm myometrium that are related to networks identified in term myometrium. Green nodes carry PTB-SNPs and red nodes were recruited by the Steiner Tree algorithm. Four TWIST1 networks (2-4a) and two MEF2C networks (2-4b) were identified.

Two preterm subnetworks contained MEF2C (Figure 2-4b). In these, MEF2C was coordinately expressed with two other genes rather than with only one other gene as in the term networks. The two networks consisted of MEF2C with LGALS2 and PLA2G4C in one network, and MEF2C with RUNX1 and PLA2G4C in the other. All three genes (RUNX1, LGALS2, and PLA2G4C) carry PTB associated SNPs. LGALS2 and PLA2G4C are the only genes that were coordinately regulated with MEF2C in both term and preterm myometrium.

Validated Subnetworks

qRT-PCR was performed on an independent cohort of 5 TNL and 5 TIL myometrium samples to confirm the expression changes of 13 genes identified as coordinately regulated in the term myometrium networks. After two attempts, the primers for KCNJ9, MYBPC1, and GRP failed to generate identical melt curves, and these genes were excluded from further analysis. Table 2-1 shows the gene expression fold changes observed for the term network genes in the original transcriptome study, along with the fold changes of the genes in the qRT-PCR cohort (TIL/TNL). SASSy indicates that all term myometrium subnetworks (Figure 2-3) except MEF2C-LGALS2 should exhibit coordinate downregulation of gene expression with the onset of labor; LGALS2 is expected to be upregulated based on the original study's observed fold changes [13]. The qRT-PCR confirmed that all expected downregulation occurred with the onset of labor, and indicated that LGALS2 is also downregulated with labor, contrary to the discovery data set (Table 2-1). MEF2C (p ≈ 0.03), HDAC9 (p ≈ 0.095), KLF12 (p ≈ 0.008), and CACNB2 (p ≈ 0.055) were significantly differentially expressed at the single-gene level. They also had similar fold-change levels in the discovery and validation experiments.


Table 2-1: qRT-PCR results for the term myometrium subnetworks. Fold changes are shown for each gene in the original transcriptome study and the qRT-PCR cohort. All fold changes are calculated as laboring expression relative to non-laboring expression. The significance of the qRT-PCR cohort fold change is assessed by the Wilcoxon-Mann-Whitney test, with p < 0.1 considered significant.

Gene       Chan et al. fold change   qRT-PCR fold change   p-value
CACNB2     0.45                      0.47                  0.055
DPP6       0.36                      0.57                  0.15
FHL1       0.36                      0.60                  0.22
HDAC9      0.25                      0.59                  0.095
KCNAB1     0.39                      0.38                  0.22
KLF12      0.29                      0.54                  0.0079
LGALS2     1.4                       0.69                  0.15
MEF2C      0.53                      0.62                  0.032
PLA2G4C    0.52                      0.75                  0.31
RPS6KA5    0.32                      0.43                  0.15

qRT-PCR was performed on a cohort of 5 PTNL and 3 PTIL samples to confirm the expression changes of the 12 coordinately regulated genes identified in the preterm myometrium network analysis. After two attempts, the primers for PAX6 and DLG1 failed to generate identical melt curves, and those genes were excluded from further analysis. Table 2-2 shows the predicted gene expression fold changes and the qRT-PCR fold changes (PTIL/PTNL) for each gene. No genes were significantly differentially expressed at the single-gene level, though the qRT-PCR fold changes tended to be similar to those of at least one of the original transcriptome studies.


Table 2-2: qRT-PCR results for the preterm myometrium subnetworks. Fold changes are shown for each gene in the original transcriptome studies and the qRT-PCR cohort. All fold changes are calculated as laboring expression relative to non-laboring expression. The significance of the qRT-PCR cohort fold change is assessed by the Wilcoxon-Mann-Whitney test, with p < 0.1 considered significant.

Gene       Weiner et al. fold change   Bethin et al. fold change   qRT-PCR fold change   p-value
CDC42      1.2                         0.83                        0.85                  0.39
COG4       0.95                        0.95                        0.84                  0.57
LGALS2     1.4                         2.1                         1.2                   0.39
MEF2C      1.1                         0.51                        0.94                  0.79
NR2C2      0.89                        0.98                        0.85                  0.57
PLA2G4C    0.94                        0.85                        1.3                   0.57
PLAT       1.05                        1.5                         0.60                  0.14
PYGO1      1.02                        0.95                        1.01                  0.79
RUNX1      1.2                         1.5                         1.4                   0.79
TWIST1     0.97                        1.2                         0.70                  0.14

Despite the paucity of significant single-gene differential expression in the term and preterm myometrium networks, subnetwork-level analysis showed that eight of the term myometrium networks were significant, with p-values ranging from 0.0079 to 0.055 (Table 2-3).

Since not all primers succeeded in measuring gene expression in the independent cohort, we calculated the NANs using only the genes successfully measured by qRT-PCR. Four validated networks are associated with ion channel function (MEF2C-DPP6, MEF2C-CACNB2, MEF2C-KCNAB1, HDAC9-CACNB2), one with muscle function (MEF2C-LGALS2), and three with inflammation (MEF2C-RPS6KA5, KLF12-RPS6KA5, MEF2C-KLF12). The validated networks are shown combined as one network colored by coordinate differential expression in labor (Figure 2-5). PLA2G4C is included because, although its coordinate expression with MEF2C was not confirmed by qRT-PCR, it is the only subnetwork gene that replicates in dbPTB as having a prior association with PTB [26].

Table 2-3: Term and preterm myometrium network significance, assessed by aggregating subnetwork activity for each patient with a Network Activity Norm (NAN) and testing with the Wilcoxon-Mann-Whitney test.

Network                  Phenotype   p-value
MEF2C-CACNB2             Term        0.055
MEF2C-DPP6               Term        0.016
MEF2C-KCNAB1             Term        0.032
MEF2C-FHL1               Term        0.22
HDAC9-CACNB2             Term        0.055
MEF2C-LGALS2             Term        0.032
MEF2C-PLA2G4C            Term        0.22
MEF2C-KLF12              Term        0.0079
KLF12-RPS6KA5            Term        0.0079
MEF2C-RPS6KA5            Term        0.032
TWIST1-DLG1-PYGO1        Preterm     1.0
TWIST1-PAX6-DLG1         Preterm     0.14
TWIST1-COG4-PLAT         Preterm     0.57
TWIST1-NR2C2-CDC42       Preterm     0.25
MEF2C-RUNX1-PLA2G4C      Preterm     0.79
MEF2C-PLA2G4C-LGALS2     Preterm     0.79

Figure 2-5: Subnetworks of coordinately differentially expressed genes validated by qRT-PCR or by querying dbPTB in term and preterm labor. Term labor is characterized by coordinate differential expression of MEF2C with the ion channel genes DPP6, CACNB2, and KCNAB1, the muscle function gene LGALS2, and the inflammatory genes KLF12 and RPS6KA5. Network genes are colored by (5a) predicted fold change (up-red, not tested-grey, down-blue), (5b) actual qRT-PCR fold change (up-red, not tested-grey, down-blue), and PTB-SNP status (has SNP-green, no SNP-red).


Discussion

We performed an integrative network analysis of preterm birth GWAS and myometrium tissue transcriptome studies to identify key subnetworks of genes and proteins potentially driving the onset of term and preterm labor. Our approach leverages the high coverage of myometrial gene expression data to infer the functional activity of modules of proteins that interact with protein-coding genes within 20 kb of preterm birth-associated SNPs. Considering GWAS data in the context of a PPI network increases the power of traditional GWAS analysis by enabling analysis of the combined effects of multiple SNPs in a functional context.

The strength and novelty of our approach lie in its ability to take candidate SNPs that failed to achieve statistical significance in a traditional GWAS and identify tissue-specific dysregulation of genes near these previously discarded SNPs, genes whose biological functions are highly relevant to disease pathophysiology.

The subnetworks identified in the preterm and term myometrium include the transcription factors TWIST1 and MEF2C, which are known to interact with each other: TWIST1 is a direct repressor of MEF2C via EP300 [40, 45]. The downstream effects of these genes include ion channel function, muscle cell function, prostaglandin synthesis, and inflammatory pathways [40, 45], all of which are involved in the transition of the myometrium from the quiescent to the laboring state. TWIST1 and MEF2C have not previously been associated with PTB, and this analysis suggests that a subset of preterm birth-associated SNPs may act through TWIST1- and MEF2C-associated mechanisms that manifest as premature uterine contractions (Figure 2-5).

The term myometrium subnetworks MEF2C-LGALS2 and MEF2C-PLA2G4C are the only ones also identified by SASSy in the preterm myometrium. While these genes were all downregulated with the onset of term labor, both LGALS2 and PLA2G4C were upregulated in preterm labor according to qRT-PCR, although neither change was significant (Table 2-2). Additionally, the 3-gene preterm subnetwork of MEF2C, LGALS2, and PLA2G4C points to stronger co-regulation of these genes in the preterm myometrium than in the term myometrium, where they formed two distinct subnetworks. This module of MEF2C, PLA2G4C, and LGALS2 may therefore be worthy of further investigation as a differentially regulated subnetwork in term and preterm labor.

PLA2G4C is the only gene in our subnetworks that replicated in dbPTB [26]. PLA2G4C is a calcium-independent phospholipase that has been shown to regulate prostaglandin synthesis independently of other labor signals such as oxytocin [47]. Disruption of PLA2G4C function, both by SNPs within the gene and by upstream disruption of MEF2C through TWIST1 SNPs, may produce a synergistic effect that primes the myometrium for preterm labor. Although PLA2G4C acts independently of intracellular calcium signaling pathways, our results suggest that SNPs in upstream regulators of PLA2G4C could influence calcium and potassium signaling pathways in parallel with PLA2G4C to prematurely initiate the myometrium's contractile phenotype.

We also introduce the network activity norm (NAN), a metric that quantifies subnetwork activity for significance testing. Because our study focused on subnetworks in which the expected differential expression pattern had the same direction for all genes, the NAN is an appropriate metric for capturing the overall activation or deactivation of a subnetwork between phenotypes. Subnetworks in which some genes increase and others decrease in expression between phenotypes would not be appropriate candidates for evaluation via the NAN: a two-gene subnetwork in which one gene is activated and the other deactivated between phenotypes could produce the same NAN in each phenotype despite a drastic change in the activity of its components. One possible solution would be to subdivide such networks into smaller units whose genes share the same differential expression pattern, but we do not assess this extension of the NAN in this work.
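The cancellation problem described above is easy to reproduce. The sketch below assumes, for illustration only, that a patient's NAN is the Euclidean norm of that patient's expression values over the subnetwork's measured genes. Genes with missing values are skipped, mirroring the use of whichever genes qRT-PCR measured successfully. The gene names and values are hypothetical. The sketch also shows the suggested, untested extension: splitting a subnetwork by the expected direction of change.

```python
import math


def network_activity_norm(expression, genes):
    """Assumed NAN: Euclidean norm of one patient's expression values over
    the subnetwork genes that were actually measured (missing genes skipped)."""
    values = [expression[g] for g in genes if expression.get(g) is not None]
    return math.sqrt(sum(v * v for v in values))


def split_by_direction(genes, expected_change):
    """Split a subnetwork into units whose genes share an expected direction
    of change in labor (+1 up, -1 down)."""
    up = [g for g in genes if expected_change[g] > 0]
    down = [g for g in genes if expected_change[g] < 0]
    return up, down


genes = ["GENE_A", "GENE_B"]                 # hypothetical two-gene subnetwork
not_in_labor = {"GENE_A": 1.0, "GENE_B": 2.0}
in_labor = {"GENE_A": 2.0, "GENE_B": 1.0}    # A activates, B deactivates

# Opposite changes cancel: both phenotypes give the same NAN, sqrt(5).
print(network_activity_norm(not_in_labor, genes), network_activity_norm(in_labor, genes))

# Scoring each same-direction unit separately recovers the change.
up, down = split_by_direction(genes, {"GENE_A": +1, "GENE_B": -1})
print(network_activity_norm(not_in_labor, up), network_activity_norm(in_labor, up))  # 1.0 2.0
```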

There are several possible improvements to and extensions of our study. A larger cohort of patients, or the recruitment of additional cohorts into a preterm birth GWAS meta-analysis, would increase the power of the analysis and enable identification of higher-confidence candidate genes for network construction and scoring. Another option would be to focus on a different population of women for the initial GWAS. Though we expect our conclusions, based on a PTB-GWAS meta-analysis of a Scandinavian population, to generalize to most other populations, candidate genes could instead be obtained from a PTB-GWAS of women from minority backgrounds. One could also be stricter or more lenient with the 20 kb threshold for associating candidate SNPs with genes.
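The candidate-gene step, in which protein-coding genes within 20 kb of a preterm birth-associated SNP are carried forward, can be sketched as a simple interval-distance check. This is an illustrative Python sketch only. The tuple layout, coordinates, and gene names are hypothetical, and the gene list is assumed to be pre-filtered to protein-coding genes on a single genome build. The `window` parameter is the threshold one would tighten or relax to be stricter or more lenient.

```python
from typing import Dict, List, Tuple

# (gene name, chromosome, start, end); coordinates are 1-based and inclusive.
Gene = Tuple[str, str, int, int]


def genes_near_snp(chrom: str, pos: int, genes: List[Gene], window: int = 20_000) -> List[str]:
    """Names of genes lying within `window` base pairs of a SNP at chrom:pos."""
    hits = []
    for name, g_chrom, start, end in genes:
        if g_chrom != chrom:
            continue
        # Distance from the SNP to the nearest gene boundary; 0 if the SNP is inside the gene.
        distance = max(start - pos, pos - end, 0)
        if distance <= window:
            hits.append(name)
    return hits


def map_snps(snps: Dict[str, Tuple[str, int]], genes: List[Gene],
             window: int = 20_000) -> Dict[str, List[str]]:
    """Map each SNP identifier to its candidate genes for a given window size."""
    return {snp: genes_near_snp(c, p, genes, window) for snp, (c, p) in snps.items()}


# Hypothetical, pre-filtered protein-coding genes and SNP positions.
GENES: List[Gene] = [
    ("GENE_X", "chr1", 100_000, 120_000),
    ("GENE_Y", "chr1", 150_000, 160_000),
    ("GENE_Z", "chr2", 100_000, 110_000),
]
SNPS = {"snp1": ("chr1", 130_000), "snp2": ("chr2", 125_000)}

for w in (20_000, 10_000):
    print(w, map_snps(SNPS, GENES, window=w))
```

Halving the window in this toy example drops GENE_Y from snp1 and GENE_Z from snp2, which illustrates how sensitive the candidate set can be to the threshold.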

To create a merged cohort of preterm myometrium samples for subnetwork scoring, we combined data from two microarray studies. One of these studies contained spontaneous preterm labor samples [1], while the other contained preterm in-labor samples with HCA [4]. At the time, we were unaware of the differing biological signaling characteristics of these two parturition subtypes, and the three HCA samples therefore present a potential confounding factor for our results.

However, our methodology may mitigate this difficulty. We focused on scoring a PPI network enriched for candidate genes related to a particular biological hypothesis, namely that PTB-SNPs regulate gene expression in labor. Therefore, the differing global signaling characteristics of HCA and spontaneous PTB, while significant in whole-transcriptome analysis, are less significant in the restricted context of scoring a particular PPI network. Still, a potential extension of this study would be to obtain a gene expression dataset of 5 PTNL and 5 spontaneous PTIL samples for subnetwork scoring. To our knowledge, no such dataset currently exists, making this an area for further study should one become available.

Further, though we focused on the myometrium and on preterm labor as defined by premature uterine contractions, premature rupture of fetal membranes accounts for a significant portion of preterm births [4]. An advantage of our approach is that the PPI network we constructed, enriched with PTB-SNP-carrying candidate genes, is tissue-agnostic and can be applied in studies of other tissues. For example, the PTB-SNP-enriched PPI network could be scored with other maternal gene expression data (e.g., decidua, cervix) to identify tissue-specific PPI subnetworks associated with preterm birth. However, the transcriptome data available in Gene Expression Omnibus [10] for non-laboring and laboring fetal membranes are sparse, which makes such an analysis impossible at this time.

The overall hypothesis of our approach is that a complex disorder like preterm birth is usually not the product of a single large genetic disturbance but rather of several perturbations acting through common functional pathways and networks. Our identification of a set of overlapping MEF2C-TWIST1 subnetworks and a set of six validated MEF2C regulatory subnetworks (Figure 2-5) implicates these transcription factors as key drivers of parturition. Further studies are required to assess the therapeutic potential of targeting this pathway, in particular TWIST1, PLA2G4C, and MEF2C, to prevent preterm labor. Furthermore, our approach demonstrates that the apparent graveyard of non-significant SNPs in GWAS data may in fact be highly enriched for relevant candidate genes that may explain a complex disorder when viewed in a network context.

ACKNOWLEDGEMENTS

This work was supported through the March of Dimes Ohio Prematurity Research Collaborative and the Case Western Reserve University Clinical and Translational Science Collaborative. We would also like to thank William Ackerman, Alethea Barbaro and Jill Barnholtz-Sloan for their invaluable discussions.

Chapter 3. A Dynamical Systems Model of Progesterone Receptor Interactions with Inflammation in Human Parturition

Submitted for Publication

A Dynamical Systems Model of Progesterone Receptor Interactions with Inflammation in Human Parturition

Douglas Brubaker1, Alethea Barbaro2, Mark Chance1 and Sam Mesiano3*

1Center for Proteomics and Bioinformatics, Case Western Reserve University, 11900 Euclid Avenue, 44106 Cleveland, OH, USA.

2Department of Mathematics, Applied Mathematics, and Statistics, Case Western Reserve University, 11900 Euclid Avenue, 44106 Cleveland, OH, USA.

3Department of Reproductive Biology, Case Western Reserve University, 11900 Euclid Avenue, 44106 Cleveland, OH, USA.

Corresponding Author*

Sam Mesiano

Department of Reproductive Biology

Case Western Reserve University,

11900 Euclid Avenue,

Cleveland, OH, 44106 USA



Summary

Background Progesterone promotes uterine relaxation and is essential for the maintenance of pregnancy. Withdrawal of progesterone activity and increased inflammation within the uterine tissues are key triggers for parturition.

Progesterone actions in myometrial cells are mediated by two progesterone receptor (PR) isoforms, PR-A and PR-B, that function as ligand-activated transcription factors. PR-B mediates the relaxatory actions of progesterone, in part, by decreasing myometrial cell responsiveness to pro-inflammatory stimuli. These same pro-inflammatory stimuli promote the expression of PR-A, which inhibits the anti-inflammatory activity of PR-B. Competitive interaction between the progesterone receptors then augments myometrial responsiveness to pro-inflammatory stimuli. We examined the interaction between PR-B transcriptional activity and inflammation in the pregnancy myometrium using dynamical systems modeling in which quiescence and labor are represented as equilibrium points in phase space. Our model shows that PR-B transcriptional activity and the inflammatory load determine the stability of the quiescent and laboring phenotypes. We tested the model using published transcriptome datasets describing mRNA abundances in myometrium before and after the onset of labor at term, selecting surrogate transcripts to reflect PR-B transcriptional activity and inflammation status.

Results The model predicts contractile status (i.e., laboring or quiescent) with high precision and recall and generally outperforms single-gene classifiers. Linear stability analysis shows that phase-space bifurcations exist in our model that may reflect the phenotypic states of the pregnancy uterus. The model describes a possible tipping point for the transition from the quiescent to the contractile laboring phenotype.

Conclusions Our mathematical model describes the functional interaction between the PR-A:PR-B hypothesis and tissue-level inflammation in the pregnancy uterus and is a first step toward more sophisticated dynamical systems modeling of human parturition. The model explains observed biochemical dynamics and, as such, will be useful for developing a range of systems-based models that use emerging data to predict preterm birth and identify strategies for its prevention.

Background

Preterm birth (PTB) causes the majority of neonatal mortality and morbidity and is a major public health and socioeconomic problem worldwide [17, 48]. Preventing PTB requires a clear understanding of the hormonal interactions and signaling pathways that control the contractile state of the pregnancy uterus. For most of pregnancy the myometrium (uterine muscle) is maintained in a relaxed and quiescent state to accommodate the growing conceptus. Parturition is initiated by a dramatic phenotypic transformation of the myometrium to the laboring state, in which it becomes the rhythmically contracting engine for birth. The contractile state of the myometrium is generally considered to be controlled by the balance between the relaxatory influences of the steroid hormone progesterone and pro-labor stimuli, especially tissue-level inflammatory stimuli within the myometrial compartment. Progesterone is essential for the establishment and maintenance of pregnancy, and its withdrawal is the principal trigger for parturition [49-53]. Multiple studies support the concept that parturition is an inflammatory process associated with increased tissue-level inflammation within the myometrium, decidua, and cervix [2, 54, 55].

Actions of progesterone in myometrial cells are mediated by two progesterone receptor (PR) isoforms, designated PR-A and PR-B, that function as ligand-activated transcription factors, with PR-B exhibiting stronger transcriptional activity than PR-A. In vitro studies show that, at some gene promoters, PR-A acts as a repressor of progesterone responsiveness by inhibiting the transcriptional activity of PR-B [56-58]. In most species, progesterone withdrawal occurs through a decrease in circulating progesterone levels [59-63]. Human parturition, however, occurs without systemic progesterone withdrawal and is instead thought to involve decreased responsiveness of the myometrial cell to PR-mediated progesterone actions, resulting in a functional progesterone withdrawal [2, 43]. We have proposed that for most of human pregnancy, progesterone acting via PR-B promotes uterine quiescence, in part by inhibiting the responsiveness of myometrial cells to pro-inflammatory stimuli and preventing tissue-level inflammation, and that functional progesterone withdrawal at parturition is caused by increased PR-A-mediated trans-repression of PR-B [2, 43, 64]. This mechanism is referred to as the PR-A/PR-B hypothesis for functional progesterone withdrawal. Thus, a key mechanism by which progesterone/PR-B promotes uterine quiescence is the prevention of labor-inducing uterine inflammation [2, 64, 65].

As pregnancy advances, the capacity of PR-B to mediate the relaxatory and anti-inflammatory actions of progesterone on the pregnancy myometrium decreases due to increased repression by PR-A [64]. Interestingly, the amount and trans-repressive activity of PR-A in myometrial cells are increased by pro-inflammatory stimuli, suggesting a causal link between inflammation and PR-A-mediated functional progesterone withdrawal [66]. Thus, our working model for functional progesterone withdrawal in the control of human parturition posits that PR-B-mediated progesterone actions in the myometrium gradually decrease with advancing gestation as PR-A increases in response to a rising inflammatory load. A threshold nadir of functional progesterone withdrawal is eventually reached at which PR-B is functionally suppressed, progesterone no longer promotes uterine quiescence, and pro-labor inflammatory influences prevail to induce labor and delivery. Because inflammation and PR-B signaling are temporally and functionally related, this hypothesis is amenable to dynamical systems modeling techniques that assess system behavior in response to changes in inflammatory load and PR-B activity.

Dynamical systems modeling uses fixed rules to describe how a system's interacting components change with time. This framework has been used to examine the temporal activity of multiple biological systems, including epidemics [67], predator-prey population interactions [68], chemical kinetics, protein phosphorylation, and cell signaling pathways [69, 70]. These models are effective at predicting sudden qualitative changes in system behavior, referred to as bifurcations. Furthermore, when the mechanism underlying the dynamics of a system is not well understood, a dynamical systems model can help determine whether the hypotheses underlying the model constitute a plausible mechanism, if the model's predictions are borne out by the data. For these reasons, dynamical systems are well suited to modeling parturition, in which the myometrium undergoes a dramatic phenotypic bifurcation as it changes from the quiescent to the laboring phenotype and the precise mechanism of this transformation is not yet known.

Herein we present a dynamical systems model consistent with the PR-A:PR-B hypothesis that links PR-B activity and inflammatory status in the myometrium at term. PR-B and inflammation were each modeled with a differential equation describing their activation and generation rates, their limiting behavior, and how they interact in association with the onset of labor. The model was robust when tested using published transcriptome datasets from quiescent and laboring myometrium and, using a novel classifier developed from the model, predicted contractile status (i.e., laboring or quiescent) with high precision.

Results and Discussion

Computing a Patient Specific Probability of Labor

Two coupled differential equations were used to model the change in transcriptionally active PR-B with time, $d\hat{P}/d\hat{t}$, and the change in inflammation with time, $d\hat{I}/d\hat{t}$, as functions of the growth and depletion of each variable and the interactions between them. The equations we propose to model the transcriptional activity of PR-B and inflammation status are

$$\frac{d\hat{P}}{d\hat{t}} = \hat{\alpha}\hat{P}\left(1 - \frac{\hat{P}}{K_P}\right) - \delta_P\hat{P}\hat{I}, \qquad \frac{d\hat{I}}{d\hat{t}} = \hat{\beta}\hat{I}\left(1 - \frac{\hat{I}}{K_I}\right) - \delta_I\hat{P}\hat{I} \qquad (1)$$

The growth of PR-B and inflammation is modeled by the first terms to the right of the equals signs, $\hat{\alpha}\hat{P}(1 - \hat{P}/K_P)$ and $\hat{\beta}\hat{I}(1 - \hat{I}/K_I)$, respectively. The levels of PR-B and inflammation increase in the presence of PR-B, $\hat{P}$, and inflammation, $\hat{I}$, at rates $\hat{\alpha}$ and $\hat{\beta}$, respectively. The terms $1 - \hat{P}/K_P$ and $1 - \hat{I}/K_I$ impose a maximum, or critical, value on the levels of PR-B and inflammation. At any given time there is a finite level of PR-B-induced activation of the transcriptional machinery; this critical level of PR-B is represented by the parameter $K_P$. Analogously, there is a saturable level of inflammatory drivers active in a myometrial cell, and this critical level of inflammation is represented by the parameter $K_I$. The growth terms $\hat{\alpha}\hat{P}$ and $\hat{\beta}\hat{I}$ increase as the amounts of PR-B and inflammation increase, but in a way that is limited by the critical values for PR-B and inflammation: if $\hat{P} = K_P$ or $\hat{I} = K_I$, the corresponding limiting term in parentheses equals zero, and so does the growth term.

The depletion of PR-B and inflammation is modeled by the terms after the negative signs, $\delta_P\hat{P}\hat{I}$ and $\delta_I\hat{P}\hat{I}$. Qualitatively, the rate of depletion of PR-B is the product of $\hat{P}$, $\hat{I}$, and a rate constant $\delta_P$; the depletion of inflammation follows the same form with a different rate constant, $\delta_I$. The values of $\delta_P$ and $\delta_I$ set the relative strengths of the two repressive interactions: the loss of PR-B activity in the presence of inflammation and the repression of inflammation by PR-B. While we know that PR-B represses inflammation, the exact mechanism of this repressive activity is not well understood. By allowing $\delta_P$ and $\delta_I$ to take different values relative to one another, we can explore multiple possible models of PR-B repression of inflammation. Figure 3-1 illustrates how the pro-labor actions of PR-A and inflammation can be mathematically combined and modeled as a competitive interaction with the pro-relaxatory actions of PR-B.

Figure 3-1: Illustration of the basin of attraction of the labor equilibrium point in the PR-B-inflammation dynamics model. The two panels show the system under two parameter conditions: two of the model parameters are equal in the left panel, and one exceeds the other in the right panel. The labor equilibrium point is shown in black, with its associated basin of attraction shaded in grey. The blue and orange lines are the nullclines, the curves on which $dP/dt = 0$ and $dI/dt = 0$.

The dimensionless version of the model substitutes three dimensionless groups for the six parameters currently in the model. The resulting dimensionless model is,

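$$\frac{dP}{dt} = \alpha P(1 - P) - PI, \qquad \frac{dI}{dt} = \beta I(1 - I) - \gamma PI \qquad (2)$$

and the dimensionless parameters are

$$\alpha = \frac{\hat{\alpha}}{\delta_P K_I}, \qquad \beta = \frac{\hat{\beta}}{\delta_P K_I}, \qquad \gamma = \frac{\delta_I K_P}{\delta_P K_I}, \qquad (3)$$

where $P = \hat{P}/K_P$ and $I = \hat{I}/K_I$ are the scaled levels of PR-B and inflammation and time is scaled as $t = \delta_P K_I \hat{t}$. The values of $\alpha$ and $\beta$ are determined by the normalized dimensionless values of the PR-B and inflammatory surrogate genes, respectively, and the value of $\gamma$ corresponds to the strength of PR-B's anti-inflammatory actions.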

The core result, on which the predictive modeling is based, is the interpretation of the phase space of our model as a probability space and the application of this measure in a patient-specific manner to compute a probability of labor for each patient. In the phase space of a dynamical system, each point represents a particular state of the system, trajectories through each point show how the system would evolve in time from that starting point, and bolded points are equilibrium points where the system ceases to change and reaches a steady state. Figure 3-1 shows an example phase space of our dimensionless model with the parameters set to $\gamma = 1$ and $\alpha = \beta = 0.5$. The phase space here has four equilibrium points, each of which corresponds to a particular steady-state level of PR-B and inflammatory activity in the myometrium.

The equilibrium point at (0, 1), corresponding to maximal inflammation and minimal PR-B, represents the myometrium's laboring phenotype in our model. The set of initial states whose trajectories converge to that point is the basin of attraction of the laboring equilibrium point. In the bounded domain of the unit square, the area of the laboring equilibrium point's basin of attraction divided by the area of the domain constitutes a probability measure on the domain, and this probability corresponds to the likelihood that the system is in labor. The area of the basin changes as $\alpha$, $\beta$, and $\gamma$ change (Figure 3-1). This probabilistic interpretation of the phase space is reasonable under the assumption that all possible pairs of values of $P$ and $I$ in the domain occur with equal likelihood. For example, when $\beta = \gamma$ and $\alpha$ is fixed at 0.5, the probability of labor equals 1: the entire domain is the basin of attraction of the laboring equilibrium, so quiescence is impossible. To ensure that quiescence remains possible, we set $\gamma = 1$ so that only one value of $\beta$ in the unit interval, $\beta = 1$, yields a probability of labor equal to 1, which lets us explore the full range of values of $\alpha$ and $\beta$. Thus, the model we apply to patient data is

$$\frac{dP}{dt} = \alpha P(1 - P) - PI, \qquad \frac{dI}{dt} = \beta I(1 - I) - PI \qquad (4)$$

Given values of $(\alpha, \beta)$ for a particular patient, the model's phase space reflects the state of that patient's pregnancy myometrium. Interpreting the phase space as a probability space gives us a metric for predicting the likelihood of the patient going into labor. It is then possible to test the predictive power of particular sets of biomarkers for the laboring phenotype by assigning the values of $\alpha$ and $\beta$ from molecular markers of PR-B activity and inflammation.
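The basin-area probability can also be estimated numerically. The sketch below is our own illustration and not necessarily the procedure used to produce Figure 3-2. It uses the dimensionless model dP/dt = αP(1 − P) − PI, dI/dt = βI(1 − I) − γPI, with γ = 1 as in Equation 4 by default. Trajectories start from the midpoints of an n × n grid on the unit square and are integrated with a fixed-step fourth-order Runge-Kutta scheme. The function reports the fraction that end nearer the laboring equilibrium (0, 1) than the quiescent equilibrium (1, 0). The grid size, step size, integration horizon, and convergence tolerance are illustrative assumptions. For α = β = 0.5 with γ = 1, the system is symmetric under exchanging P and I, so the diagonal P = I separates the two basins and the estimate should be close to 0.5.

```python
import math


def rhs(p, i, alpha, beta, gamma):
    """Dimensionless PR-B (p) / inflammation (i) model:
    dP/dt = alpha*P*(1-P) - P*I,  dI/dt = beta*I*(1-I) - gamma*P*I."""
    return (alpha * p * (1.0 - p) - p * i,
            beta * i * (1.0 - i) - gamma * p * i)


def rk4_step(p, i, dt, alpha, beta, gamma):
    """Advance (p, i) by one classical fourth-order Runge-Kutta step."""
    k1 = rhs(p, i, alpha, beta, gamma)
    k2 = rhs(p + 0.5 * dt * k1[0], i + 0.5 * dt * k1[1], alpha, beta, gamma)
    k3 = rhs(p + 0.5 * dt * k2[0], i + 0.5 * dt * k2[1], alpha, beta, gamma)
    k4 = rhs(p + dt * k3[0], i + dt * k3[1], alpha, beta, gamma)
    return (p + dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0,
            i + dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0)


def probability_of_labor(alpha, beta, gamma=1.0, n=20, t_max=100.0, dt=0.1, tol=1e-3):
    """Fraction of an n x n grid of starting states in the unit square whose
    trajectories end nearer the laboring equilibrium (0, 1) than the
    quiescent equilibrium (1, 0)."""
    steps = int(round(t_max / dt))
    in_labor = 0
    for a in range(n):
        for b in range(n):
            # Cell midpoints keep starting states off the invariant axes P = 0 and I = 0.
            p, i = (a + 0.5) / n, (b + 0.5) / n
            for _ in range(steps):
                p, i = rk4_step(p, i, dt, alpha, beta, gamma)
                if math.hypot(p, i - 1.0) < tol or math.hypot(p - 1.0, i) < tol:
                    break  # close enough to a candidate stable equilibrium
            if math.hypot(p, i - 1.0) < math.hypot(p - 1.0, i):
                in_labor += 1
    return in_labor / (n * n)


print(probability_of_labor(0.5, 0.5))  # symmetric case: close to 0.5
print(probability_of_labor(0.5, 0.8))  # larger inflammation parameter
```

Grid points lying exactly on the separatrix flow to the intermediate saddle and are counted as not in labor, so the estimate carries an error of order 1/n. Raising β from 0.5 to 0.8 with α fixed at 0.5 moves the saddle toward (1, 0) and increases the estimated probability of labor.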

Predictive Modeling of Parturition Datasets

We used the normalized expression values of genes from two publicly available parturition transcriptome datasets as surrogates for the values of $\alpha$ and $\beta$ in Equation 4 to assess the predictive power of our model for identifying patient phenotypes as in labor (IL) or not in labor (NIL). Six predictors, each consisting of a PR-B surrogate paired with an inflammation surrogate, were assessed for each patient in two independent datasets [1, 13]. The PR-B surrogates were FOXO1A and FKBP5 [64, 71], and the inflammatory surrogates were IL-1β, IL-6, and IL-8 [72]. The probability of labor was calculated for each patient in each dataset using each of the six predictor pairs, (FOXO1A, IL-1β), (FKBP5, IL-1β), (FOXO1A, IL-6), (FKBP5, IL-6), (FOXO1A, IL-8), and (FKBP5, IL-8) (Figure 3-2), and for each of the genes individually (Figures 3-3 and 3-4).
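The text does not spell out how normalized expression values are turned into the parameters α (PR-B surrogate) and β (inflammatory surrogate) of Equation 4. One simple possibility, assumed purely for illustration, is min-max rescaling of each surrogate gene across the patients of a dataset so that both parameters lie in the unit interval. The function names and expression values below are hypothetical.

```python
def min_max(values):
    """Rescale expression values to [0, 1] across patients (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot rescale a gene with constant expression")
    return [(v - lo) / (hi - lo) for v in values]


def patient_parameters(pr_b_values, inflammation_values):
    """Pair each patient's normalized PR-B surrogate (alpha) with the
    normalized inflammatory surrogate (beta)."""
    if len(pr_b_values) != len(inflammation_values):
        raise ValueError("both surrogates must cover the same patients")
    return list(zip(min_max(pr_b_values), min_max(inflammation_values)))


# Hypothetical expression values for three patients.
fkbp5 = [5.0, 7.0, 9.0]
il1b = [2.0, 3.0, 6.0]
print(patient_parameters(fkbp5, il1b))  # [(0.0, 0.0), (0.5, 0.25), (1.0, 1.0)]
```

Under this scheme the patient with the highest inflammatory surrogate always receives β = 1, the value at which the probability of labor reaches 1 when γ = 1. A different normalization would place patients differently relative to that boundary.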

Figure 3-2: Patient-specific probabilities of labor. Probabilities of labor calculated for each patient in the RNA seq and microarray datasets, plotted for each predictor pair of genes. Non-laboring samples are shown in blue and laboring samples in red. One-sided nonparametric confidence intervals are displayed for the predictors that had significant Wilcoxon-Mann-Whitney p-values and separated confidence intervals. These p-values are shown for each classifier along the right side of the plot.

Figure 3-3: Single-gene inflammatory surrogate predictors. Single-gene inflammatory surrogate classifiers for each patient in the RNA seq and microarray datasets, plotted for each gene. Non-laboring samples are shown in blue and laboring samples in red. One-sided nonparametric confidence intervals are displayed for the gene that had a significant Wilcoxon-Mann-Whitney p-value and separated confidence intervals. These p-values are shown for each classifier along the right side of the plot.

Figure 3-4: Single-gene PR-B surrogate predictors. Single-gene PR-B surrogate classifiers for each patient in the RNA seq and microarray datasets, plotted for each gene. Non-laboring samples are shown in blue and laboring samples in red. One-sided nonparametric confidence intervals are displayed for the gene that had a significant Wilcoxon-Mann-Whitney p-value and separated confidence intervals. These p-values are shown for each classifier along the right side of the plot.

All of the model's predictors validated across datasets except the two constructed using IL-8 as the inflammatory surrogate gene (Figure 3-2). None of the single-gene classifiers except IL-6 produced confidence intervals on expression that separated in both datasets. Thus, only IL-6 functioned as a cross-platform single-gene predictor that could be assessed in both datasets (Figure 3-3). Neither FOXO1A nor FKBP5 functioned as a single-gene predictor for both the microarray and RNA seq datasets (Figure 3-4). Once classifiers were constructed for each dataset, cross-validation across datasets (training on one dataset and classifying the other) was used to assess the performance of each classifier. The results of this analysis are shown in Table 3-1.

Table 3-1: Performance of the classifiers built using the RNA seq dataset at classifying the microarray dataset and of the classifiers built using the microarray dataset at classifying the RNA seq dataset.

Predictor RNA seq Classifier Microarray Classifier

Gene(s) Precision Recall Precision Recall

(FOXO1A, IL-1β) 1.0 0.88 1.0 0.7

(FKBP5, IL-1β) 1.0 1.0 1.0 0.7

(FOXO1A, IL-6) 1.0 0.38 1.0 1.0

(FKBP5, IL-6) 1.0 0.75 1.0 0.8

(FOXO1A, IL-8) 0.67 0.75 NA NA

(FKBP5, IL-8) 0.5 1.0 NA NA

FOXO1A 0.63 1.0 NA NA

FKBP5 NA NA NA NA

IL-1β NA NA 0.89 0.90

IL-6 1.0 0.63 1.0 1.0

IL-8 0.38 1.0 NA NA


IL-6 is the only single gene that functioned as a predictor for both the RNA seq and microarray datasets. It performed well as a classifier, with a precision of 1.0 regardless of which dataset was used for model training. The average recall of IL-6 as a single-gene classifier was 0.82, making it a very strong predictor of the laboring or quiescent phenotype. However, our model's classifiers built with FKBP5 as the PR-B surrogate and IL-1β as the inflammatory surrogate performed with an average precision and recall of 1.0 and 0.85, respectively. The pairing of FKBP5 and IL-1β is the only predictor pair in our model to outperform IL-6 as a single-gene classifier. With the exception of the pairings that include IL-6, all of our model's classifiers built with a pair of genes outperformed the single-gene classifiers built with a gene from that predictor pair.
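The averages quoted above can be reproduced from Table 3-1, treating the laboring phenotype as the positive class. The helpers below are our own. One computes precision and recall from predicted and actual labels, and the other averages the two cross-dataset results for each predictor. Only the table values come from the source. For example, IL-6's average recall is (0.63 + 1.0)/2 = 0.815, reported as 0.82.

```python
def precision_recall(predicted, actual):
    """Precision and recall with 'in labor' (True) as the positive class."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall


def average_scores(table):
    """Average (precision, recall) over the two cross-dataset directions."""
    return {name: ((p1 + p2) / 2, (r1 + r2) / 2)
            for name, ((p1, r1), (p2, r2)) in table.items()}


# (precision, recall) for the RNA seq classifier and the microarray classifier,
# copied from Table 3-1 for the predictors with values in both directions.
TABLE_3_1 = {
    "(FOXO1A, IL-1β)": ((1.0, 0.88), (1.0, 0.7)),
    "(FKBP5, IL-1β)": ((1.0, 1.0), (1.0, 0.7)),
    "(FOXO1A, IL-6)": ((1.0, 0.38), (1.0, 1.0)),
    "(FKBP5, IL-6)": ((1.0, 0.75), (1.0, 0.8)),
    "IL-6": ((1.0, 0.63), (1.0, 1.0)),
}

# Print predictors from highest to lowest average recall.
for name, (prec, rec) in sorted(average_scores(TABLE_3_1).items(), key=lambda kv: -kv[1][1]):
    print(f"{name}: precision {prec:.2f}, recall {rec:.3f}")
```

Because every predictor listed has an average precision of 1.0, the ranking is decided by recall. FKBP5 paired with IL-1β (0.85) is the only entry above IL-6 alone (0.815), and both IL-6 pairings fall below it (0.69 and 0.775).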

Identifying the specific inflammatory drivers that induce labor is an important step in identifying upstream and downstream therapeutic targets to delay the onset of premature labor. Interestingly, each of the inflammatory surrogate genes we used to construct our model's classifiers has a subtly different biological function in the pregnancy uterus. IL-8 functions as a chemokine, drawing neutrophils and macrophages to tissues where it is expressed [73]. In our model, IL-8 was a poor single-gene predictor and a weak inflammatory surrogate (Table 3-1). The lack of phenotypic predictability by IL-8, even when paired with a PR-B surrogate, may suggest that IL-8 does not play an important role in the inflammatory process of labor. In contrast, IL-1β and IL-6 [73] performed well as classifiers in our model when paired with PR-B surrogates, suggesting that the inflammatory processes associated with these genes are more important for labor onset than those of IL-8. In particular, IL-6 is a cytokine that plays an important role in both canonical and non-canonical JAK-STAT signaling [74, 75], a pathway that may integrate the effects of circulating and local myometrial cytokines. Recent work examining IL-6 as a blood-based biomarker for labor [76] provides further evidence that understanding and modeling this cytokine in the myometrium could be key to elucidating the driver pathways of labor.

The small sample sizes of the transcriptome datasets (N = 18) call for caution in the data analysis. The nonparametric methods used to construct confidence intervals and test the separation of the NIL and IL groups are less powerful than comparable parametric tests, but were ultimately more appropriate because of their resistance to outliers, the non-normality of the data, and the small sample size. The reliance on nonparametric testing also produced wide confidence intervals for the NIL and IL groups and necessitated increased stringency in the construction of classifiers.

Calculating the Tipping Point For Labor

The model has four equilibrium solutions, values of $P$ and $I$ such that $dP/dt = dI/dt = 0$. At these values of $P$ and $I$, the levels of PR-B and inflammation remain constant. Each of these solutions corresponds to a physiological condition in the myometrium, and the solutions can be stable, where the trajectories in the neighborhood of the equilibrium point toward it, or unstable, where the trajectories in the neighborhood of the equilibrium point away from it. The values of the parameters in the model can influence the size of the basin of attraction for the labor equilibrium point and alter the stability of the equilibrium points (see Characterizing the Stability of the Quiescent and Laboring Equilibria for a full discussion of how $\alpha$ and $\beta$ affect the size of the basin of attraction and the stability of the laboring and quiescent equilibria). A tipping point exists in our model where the quiescent equilibrium point transitions from stable to unstable and the phase space is completely biased toward the laboring equilibrium point.

This change in stability is called a bifurcation, and linear stability analysis allows us to compute the exact parameter values at which the stability changes. The end result of this analysis is a quantitative prediction of the tipping point for labor: the conditions under which the myometrial cell is permanently in the laboring phenotype, leading to uterine emptying.
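The tipping point can be read directly from the Jacobian. Write the dimensionless model as dP/dt = αP(1 − P) − PI and dI/dt = βI(1 − I) − γPI, where (P, I) are the scaled PR-B and inflammation levels. Evaluated at the quiescent equilibrium (1, 0) and the laboring equilibrium (0, 1), the Jacobian is triangular, so its eigenvalues are its diagonal entries. The short derivation below is our own worked illustration from these equations:

```latex
\Im(1,0) = \begin{pmatrix} -\alpha & -1 \\ 0 & \beta - \gamma \end{pmatrix}
\;\Rightarrow\; \lambda_1 = -\alpha,\quad \lambda_2 = \beta - \gamma,
\qquad
\Im(0,1) = \begin{pmatrix} \alpha - 1 & 0 \\ -\gamma & -\beta \end{pmatrix}
\;\Rightarrow\; \lambda_1 = \alpha - 1,\quad \lambda_2 = -\beta .
```

Quiescence is therefore stable only while β < γ, and the tipping point is β = γ, which is β = 1 when γ = 1. At that value the intermediate equilibrium (β(1 − α)/(γ − αβ), α(γ − β)/(γ − αβ)) reaches (1, 0) and exchanges stability with it, a transcritical bifurcation. Meanwhile the laboring equilibrium remains stable as long as α < 1.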

Characterizing the Stability of the Quiescent and Laboring Equilibria

We analyzed the dimensionless form of our model in Equation 2 and computed the eigenvalues for the four equilibrium points,

$$(0, 0), \quad (1, 0), \quad (0, 1), \quad \left(\frac{\beta(1 - \alpha)}{\gamma - \alpha\beta}, \frac{\alpha(\gamma - \beta)}{\gamma - \alpha\beta}\right) \qquad (5)$$

where $(0, 0)$ corresponds to a myometrial cell that is not expressing any PR-B and has no inflammation. The solution $(1, 0)$ is the quiescent equilibrium, corresponding to a physiological state where PR-B is at its maximal level with no inflammation. The solution $(0, 1)$ is the laboring equilibrium, where inflammation is maximized and no PR-B is present. The last solution corresponds to an intermediate point between quiescent and laboring from which the myometrial cell can pass into either phenotype.

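In order to characterize the stability of these solutions, we need to solve for the eigenvalues λ₁ and λ₂ of the linearized model at each equilibrium point in (5). The sign of each eigenvalue, positive or negative, determines the stability, and the formula for each eigenvalue tells us whether it can ever change sign. The eigenvalues are obtained by solving the characteristic polynomial equation of the Jacobian matrix, $\mathcal{J}$. We begin by computing $\mathcal{J}$ for our model at a general equilibrium point (P*, I*),

$$ \mathcal{J} = \begin{pmatrix} \dfrac{\partial \dot P}{\partial P} & \dfrac{\partial \dot P}{\partial I} \\[6pt] \dfrac{\partial \dot I}{\partial P} & \dfrac{\partial \dot I}{\partial I} \end{pmatrix}_{(P^*,\,I^*)} = \begin{pmatrix} \alpha - 2\alpha P^* - I^* & -P^* \\ -\gamma I^* & \beta - 2\beta I^* - \gamma P^* \end{pmatrix} \tag{6} $$

where $\dot P = dP/d\tau$ and $\dot I = dI/d\tau$ are the right-hand sides of the dimensionless model. The characteristic polynomial is obtained by taking the determinant of the matrix

$$ \lambda\,\mathrm{Id} - \mathcal{J} = \begin{pmatrix} \lambda - \alpha + 2\alpha P^* + I^* & P^* \\ \gamma I^* & \lambda - \beta + 2\beta I^* + \gamma P^* \end{pmatrix} \tag{7} $$

where Id is the 2 × 2 identity matrix. Setting this determinant to zero gives the eigenvalues, the roots of the characteristic polynomial whose signs determine the stability of the equilibrium solutions:

$$ 0 = \lambda^2 + b\lambda + c \tag{8} $$

where $b = -\operatorname{tr}\mathcal{J}$,

$$ b = 2(\alpha P^* + \beta I^*) + \gamma P^* + I^* - (\alpha + \beta) \tag{9} $$

and $c = \det\mathcal{J}$,

$$ c = \alpha\beta\left(1 - 2P^* - 2I^* + 4P^*I^*\right) + \gamma P^*\left(-\alpha + 2\alpha P^*\right) + I^*\left(-\beta + 2\beta I^*\right). \tag{10} $$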

By applying the quadratic formula, we obtain expressions for both λ₁

$$ \lambda_1 = \frac{-b + \sqrt{b^2 - 4c}}{2} \tag{11} $$

and λ₂

$$ \lambda_2 = \frac{-b - \sqrt{b^2 - 4c}}{2}. \tag{12} $$

The signs of λ₁ and λ₂ determine the type and stability of each equilibrium point (Table 3-2). The trivial equilibrium, (0,0), has two positive, real eigenvalues for all positive values of α and β, indicating that it is always unstable; it is classified as a source node. Physiologically, a myometrial cell in this state is not exposed to inflammatory stimuli and is not expressing PR-B. This state cannot persist: because the equilibrium point is unstable, all nearby trajectories point away from it. The quiescent and laboring equilibria each have two real, negative eigenvalues and are thus stable sink nodes. Both are stable so long as α < 1 and β < γ (with γ = 1, as in the model applied to patient data, α < 1 and β < 1). The intermediate equilibrium point allows us to identify the tipping point as α and β change. This equilibrium has two real eigenvalues, one positive and one negative, so the intermediate equilibrium is a saddle point and is unstable.
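The saddle classification of the intermediate equilibrium can also be checked by hand. The short derivation below is our own worked step, not part of the original analysis; it uses only the Jacobian (6) and the equilibrium conditions I* = α(1 − P*) and γP* = β(1 − I*) that define the intermediate point. Substituting these relations into the diagonal entries of (6) gives

$$ \mathcal{J}(P^*, I^*) = \begin{pmatrix} -\alpha P^* & -P^* \\ -\gamma I^* & -\beta I^* \end{pmatrix}, \qquad \det \mathcal{J}(P^*, I^*) = P^* I^* (\alpha\beta - \gamma). $$

Whenever the intermediate equilibrium lies inside the unit square (P*, I* > 0) and γ > αβ, the determinant is negative, so the two eigenvalues are real with opposite signs and the point is a saddle. For example, α = β = 0.5 and γ = 1 give (P*, I*) = (1/3, 1/3) and eigenvalues 1/6 and −1/2.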

Table 3-2: Equilibrium solutions to the PR-B/inflammation model along with eigenvalues and stability conditions for each equilibrium point

| Equilibrium | (P, I) | Eigenvalues | Stability | Condition |
|---|---|---|---|---|
| Trivial | (0, 0) | (α, β) | Unstable | α > 0, β > 0 |
| Quiescent | (1, 0) | (β − γ, −α) | Stable | γ > β, α > 0 |
| Laboring | (0, 1) | (α − 1, −β) | Stable | 1 > α, β > 0 |
| Intermediate | (β(1 − α)/(γ − αβ), α(γ − β)/(γ − αβ)) | (λ₁, λ₂) = ((−b + √(b² − 4c))/2, (−b − √(b² − 4c))/2) | Unstable | α > 0 and β > 0 |
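As a numerical cross-check of Table 3-2, the Jacobian (6) can be evaluated at each equilibrium and its eigenvalues computed directly. This is our own illustrative sketch, not the authors' code; it assumes NumPy and the dimensionless model dP/dτ = αP(1 − P) − PI, dI/dτ = βI(1 − I) − γPI, and all function names are hypothetical.

```python
import numpy as np


def jacobian(P, I, alpha, beta, gamma):
    """Jacobian (6) of dP/dtau = alpha*P*(1-P) - P*I, dI/dtau = beta*I*(1-I) - gamma*P*I."""
    return np.array([
        [alpha - 2 * alpha * P - I, -P],
        [-gamma * I, beta - 2 * beta * I - gamma * P],
    ])


def equilibria(alpha, beta, gamma):
    """The four equilibria of (5); the intermediate one requires gamma != alpha*beta."""
    d = gamma - alpha * beta
    return {
        "trivial": (0.0, 0.0),
        "quiescent": (1.0, 0.0),
        "laboring": (0.0, 1.0),
        "intermediate": (beta * (1 - alpha) / d, alpha * (gamma - beta) / d),
    }


def stability_label(eigs, tol=1e-12):
    """Classify a 2-D equilibrium by the signs of the eigenvalues' real parts."""
    re = np.real(eigs)
    if np.all(re < -tol):
        return "stable"
    if np.all(re > tol):
        return "unstable (source)"
    if np.any(re > tol) and np.any(re < -tol):
        return "unstable (saddle)"
    return "non-hyperbolic"  # a zero eigenvalue: the bifurcation case


def stability_table(alpha, beta, gamma):
    """Map each equilibrium name to (sorted real parts of eigenvalues, stability label)."""
    table = {}
    for name, (P, I) in equilibria(alpha, beta, gamma).items():
        eigs = np.linalg.eigvals(jacobian(P, I, alpha, beta, gamma))
        table[name] = (np.sort(np.real(eigs)), stability_label(eigs))
    return table


for name, (eigs, label) in stability_table(0.5, 0.5, 1.0).items():
    print(f"{name:>12}: eigenvalues {np.round(eigs, 4)} -> {label}")
```

With (α, β, γ) = (0.5, 0.5, 1) this reproduces the pattern of the table: a source at (0, 0), two sinks, and a saddle at (1/3, 1/3). Setting β = γ = 1 makes the quiescent equilibrium non-hyperbolic (one zero eigenvalue), which is the first bifurcation described in the text.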

The formula for the eigenvalues of the intermediate equilibrium indicates that three bifurcations are possible as α and β change. Firstly, if β = γ (with γ = 1, this means β = 1 for any α ≤ 1), then the quiescent equilibrium has one negative and one zero eigenvalue. In this case, the intermediate equilibrium point has moved through the phase space to collide with the quiescent equilibrium (Figure 3-5). When these two equilibrium points combine, the quiescent equilibrium point changes stability from stable, where all trajectories in the neighborhood of the equilibrium point toward it, to unstable, where the trajectories point away from it. Similarly, if α = 1 and β ≤ 1, the intermediate equilibrium collides with the laboring equilibrium, resulting in a bifurcation where the laboring equilibrium point changes from stable to unstable. When γ = αβ, the denominator of the intermediate equilibrium vanishes, so the intermediate equilibrium becomes a singularity and is non-physiological. The third bifurcation occurs when α and β equal 0 and the intermediate equilibrium collides with the trivial equilibrium point.

Figure 3-5: The phase space bifurcation which occurs as the intermediate equilibrium point approaches the quiescent equilibrium point. This occurs as the ordered pair (α, β) changes from (left) (α, β) = (0.5, 0.5), to (center) (α, β) = (0.5, 0.75), to the bifurcation (right) at (α, β) = (0.5, 1).
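The collision shown in Figure 3-5 can be traced from the closed form of the intermediate equilibrium in (5): with γ = 1 and α = 0.5, increasing β moves (P*, I*) toward the quiescent point (1, 0). This plain-Python sketch is our own illustration and the function name is hypothetical.

```python
def intermediate_equilibrium(alpha, beta, gamma=1.0):
    """Intermediate equilibrium (P*, I*) of the dimensionless model, from (5)."""
    denom = gamma - alpha * beta
    if denom == 0:
        raise ValueError("gamma == alpha*beta: the intermediate equilibrium is singular")
    return beta * (1 - alpha) / denom, alpha * (gamma - beta) / denom


# The three panels of Figure 3-5, with gamma fixed at 1
for alpha, beta in [(0.5, 0.5), (0.5, 0.75), (0.5, 1.0)]:
    P_star, I_star = intermediate_equilibrium(alpha, beta)
    print(f"(alpha, beta) = ({alpha}, {beta}) -> (P*, I*) = ({P_star:.3f}, {I_star:.3f})")
# (alpha, beta) = (0.5, 0.5) -> (P*, I*) = (0.333, 0.333)
# (alpha, beta) = (0.5, 0.75) -> (P*, I*) = (0.600, 0.200)
# (alpha, beta) = (0.5, 1.0) -> (P*, I*) = (1.000, 0.000)
```

At β = 1 the intermediate point coincides with (1, 0), which is the bifurcation in the right panel.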

The three bifurcations show that certain changes in the relative values of the parameters α, β, and γ cause significant qualitative changes in the phase space of the model. The first bifurcation, the collision of the intermediate equilibrium with the quiescent equilibrium, corresponds to the physiological condition when the myometrium moves from quiescent to laboring. As noted in the Methods section, this condition corresponds to a probability of labor equal to one. Conversely, the second bifurcation occurs when the intermediate equilibrium point collides with the laboring equilibrium point and the probability of labor is zero. Physiologically, this could correspond to a therapeutic intervention that preserves quiescence and prevents labor. The third bifurcation may have physiological significance in the transition of the myometrial cell from non-pregnant to pregnant as inflammation and PR-B transition from inactive to active in the pregnancy uterus. However, since our model is based upon the activity of PR-B and inflammatory drivers during pregnancy, this bifurcation, though interesting, is beyond the scope of the present investigation. These bifurcations show that our model captures the conditions under which dynamic, qualitative changes in the myometrial cellular phenotype occur.

The dynamical systems model was designed to explore the functional interaction between the anti-inflammatory actions of progesterone mediated by PR-B and the effect of inflammatory load on the contractile state of the human pregnancy uterus. In addition, we sought to identify the conditions that induce a bifurcation in the model similar to the one that occurs when the uterus transitions to the laboring state. The dimensionless version of our model simplifies this task by enabling us to identify how changes in the three dimensionless parameters, α, β, and γ, influence the trajectories in the phase space of a hypothetical pregnancy. The underlying rationale was that bifurcations correspond to physiologically important events in the timeline of pregnancy, representing uterine quiescence and its transition to the laboring state. The interaction between the dimensionless model parameters β and γ appears to be the most significant for initiating the labor bifurcation. This interaction means that as long as the repressive capacity of PR-B, γ, is greater than the activation rate of inflammation, β, quiescence will be maintained. This finding supports the PR-A:PR-B hypothesis, since it recapitulates the important role of PR-B-mediated anti-inflammatory activity and shows how interference with this function, possibly by PR-A, destabilizes quiescence in favor of labor. Though we fixed γ in the present analysis, we hypothesize that the level of γ relative to α and β may reflect the trans-repressive activity of PR-A on PR-B. The phase space dynamics of modulating γ seem to support this, with higher values of γ producing lower probabilities of labor, and may be an interesting avenue for further investigation.

Conclusions

This mathematical model of the PR-A:PR-B hypothesis of human parturition produces qualitative dynamics which mimic those observed in vitro and in vivo. A novel interpretation of the phase space of a dynamical system as a probability space enables predictive modeling of all possible phenotypic states of the pregnancy myometrium in a patient-specific manner. Predictive modeling of patient datasets shows that our model makes accurate predictions of the laboring phenotype in patients, performing best when the PR-B surrogate FKBP5 and the inflammatory surrogate IL-1β are used to parameterize the dimensionless model. Linear stability analysis shows that three phenotypically interesting phase space bifurcations exist in our model and provides a quantitative tipping point for the transition of the myometrium to the contractile phenotype. This dynamical systems model of progesterone receptor interactions in the pregnancy myometrium provides a plausible explanation for the observed biochemical dynamics in the literature and is a first step toward more sophisticated dynamical systems modeling of human parturition.

Methods

Datasets

We obtained data from two published studies which examined gene transcriptional changes in the myometrium (obtained at the time of cesarean section delivery) of women who were not in labor (NIL: closed and rigid cervix and no indication of uterine contractions) and in labor (IL: cervix dilated greater than 4 cm and rhythmic contractions). One study [1], in which transcriptome analysis was performed by microarray technology (henceforth referred to as the microarray dataset or microarray data), comprised 3 myometrial samples from NIL women at term (greater than 37 weeks gestation), 3 samples from IL women at term, and 3 samples from IL women undergoing preterm (less than 37 weeks gestation) cesarean section delivery. Since publication, some confusion has emerged about whether one of the term IL samples was truly in labor. We excluded this sample from our analysis of this dataset and combined the preterm and term IL samples into one IL group. The other study [13] used RNA sequencing (henceforth referred to as the RNAseq dataset or RNAseq data) of myometrium from 5 NIL women and 5 IL women at term. The samples in both datasets were collected from different women and are not paired samples from which we could model temporal changes in RNA levels at two independent time points. Rather, the cohort of 18 samples (10 IL, 8 NIL; 5 IL and 3 NIL from the microarray dataset, 5 IL and 5 NIL from the RNAseq dataset) was used to test whether or not our model can distinguish the laboring and non-laboring phenotypes in actual women.

Model Definition

A host of experimental data links PR-A, PR-B, and inflammatory drivers in the pregnancy uterus [2, 52, 64-66, 73, 77], the totality of which has resulted in the formulation of the PR-A:PR-B hypothesis of functional progesterone withdrawal. In an effort to understand the functional dynamics of how the progesterone receptors interact with inflammation during pregnancy, we translated the principles of the PR-A:PR-B hypothesis into equations which could be used to mathematically explore the dynamics and consistency of the biological hypothesis. According to the hypothesis, PR-B is the principal mediator of the anti-inflammatory, relaxatory actions of progesterone in the myometrium [64]. At term, increased PR-A function inhibits PR-B activity, an event shown to be mediated by inflammation [66]. In essence, the PR-A:PR-B hypothesis describes a standard competitive interaction between the pro-pregnancy actions of PR-B and the pro-labor actions of PR-A, where the activity of PR-A is related to the level of inflammation in the myometrium. As such, we chose to consider PR-B and inflammation and incorporated the effects of PR-A into the inflammatory terms of the model. The model is formulated as two coupled differential equations for the transcriptional activity of PR-B, $\hat P$, and inflammation status, $\hat I$:

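$$ \frac{d\hat P}{dt} = \hat r_P \hat P\left(1 - \frac{\hat P}{K_P}\right) - a_P \hat P \hat I, \qquad \frac{d\hat I}{dt} = \hat r_I \hat I\left(1 - \frac{\hat I}{K_I}\right) - a_I \hat P \hat I \tag{13} $$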

Model Nondimensionalization and Simplification

There are six parameters in our model, $K_P$, $K_I$, $a_P$, $a_I$, $\hat r_P$, and $\hat r_I$, with various units. Nondimensionalization is a tool for simplifying our model whereby these parameters are replaced with dimensionless constants. While nondimensionalization can make it difficult to pinpoint the influence of individual parameters on the system's behavior, this concern is minimal since we have only six parameters in our model. Unpacking the influence of particular dimensionless constants, and of the parameters that constitute those constants, is thus straightforward for our model. The units for $\hat P$, $\hat I$, $K_P$, and $K_I$ are the amount of PR-B or inflammation present, similar to a concentration. Time t is given in weeks. The rate constants $\hat r_P$ and $\hat r_I$ are in units 1/weeks, while the rate constants $a_P$ and $a_I$ are in units 1/(concentration ∗ weeks). We define three dimensionless variables for our model,

$$ P = \frac{\hat P}{K_P}, \qquad I = \frac{\hat I}{K_I}, \qquad \tau = t\, a_P K_I. \tag{14} $$

Substituting these for $\hat P$, $\hat I$, and t makes the PR-B equation,

$$ K_P a_P K_I \frac{dP}{d\tau} = \hat r_P K_P P(1 - P) - a_P K_P K_I P I \tag{15} $$

and the inflammation equation,

$$ K_I^2 a_P \frac{dI}{d\tau} = \hat r_I K_I I(1 - I) - a_I K_P K_I P I \tag{16} $$

We simplify the equations by dividing by $K_P a_P K_I$ in the equation for dP/dτ and by $K_I^2 a_P$ in the equation for dI/dτ. The result is the dimensionless model,

$$ \frac{dP}{d\tau} = \frac{\hat r_P}{a_P K_I} P(1 - P) - P I, \qquad \frac{dI}{d\tau} = \frac{\hat r_I}{a_P K_I} I(1 - I) - \frac{a_I K_P}{a_P K_I} P I. \tag{17} $$

We can now define three dimensionless constants, $c_1 = \dfrac{\hat r_P}{a_P K_I}$, $c_2 = \dfrac{\hat r_I}{a_P K_I}$, and $c_3 = \dfrac{a_I K_P}{a_P K_I}$. Substituting these yields the final version of the dimensionless model,

$$ \frac{dP}{d\tau} = c_1 P(1 - P) - P I, \qquad \frac{dI}{d\tau} = c_2 I(1 - I) - c_3 P I. \tag{18} $$
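The rescaling can be sanity-checked numerically: at any state, the dimensional rates of the PR-B and inflammation equations, divided by $K_P a_P K_I$ and $K_I^2 a_P$ respectively, should equal the dimensionless rates with constants $c_1, c_2, c_3$ as defined above. This is our own plain-Python sketch. We write r_P and r_I for the intrinsic rates (1/weeks), a_P and a_I for the interaction rates (1/(concentration ∗ weeks)), and K_P and K_I for the critical levels; these identifiers, and the numerical values, are illustrative assumptions rather than fitted quantities.

```python
def dimensionless_constants(r_P, r_I, a_P, a_I, K_P, K_I):
    """(c1, c2, c3) of the dimensionless model, using the time scale tau = a_P * K_I * t."""
    scale = a_P * K_I
    return r_P / scale, r_I / scale, a_I * K_P / scale


def dimensional_rates(P_hat, I_hat, r_P, r_I, a_P, a_I, K_P, K_I):
    """Right-hand sides of the dimensional model (assumed logistic growth plus PI coupling)."""
    dP = r_P * P_hat * (1 - P_hat / K_P) - a_P * P_hat * I_hat
    dI = r_I * I_hat * (1 - I_hat / K_I) - a_I * P_hat * I_hat
    return dP, dI


def dimensionless_rates(P, I, c1, c2, c3):
    """Right-hand sides of the dimensionless model."""
    return c1 * P * (1 - P) - P * I, c2 * I * (1 - I) - c3 * P * I


params = dict(r_P=0.8, r_I=1.2, a_P=0.5, a_I=0.3, K_P=2.0, K_I=4.0)  # arbitrary values
c1, c2, c3 = dimensionless_constants(**params)
P_hat, I_hat = 1.0, 3.0
dP_hat, dI_hat = dimensional_rates(P_hat, I_hat, **params)
K_P, K_I, a_P = params["K_P"], params["K_I"], params["a_P"]
dP, dI = dimensionless_rates(P_hat / K_P, I_hat / K_I, c1, c2, c3)
# Each pair below should agree up to floating-point rounding
print(dP_hat / (K_P * a_P * K_I), dP)
print(dI_hat / (K_I**2 * a_P), dI)
```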

We infer the activity of PR-B and pro-inflammatory drivers using two PR-B responsive genes, FOXO1A and FKBP5 [64, 71], as surrogates for PR-B, and three pro-inflammatory genes, IL-1β, IL-6, and IL-8 [72], as surrogates for inflammation. The normalized value $\tilde x_i$ of each of these genes is calculated using the equation

$$ \tilde x_i = \frac{x_i - m}{M - m} \tag{19} $$

where $x_i$ is the value of the gene for patient i, M is the maximum expression value for that gene across patients in the dataset, and m is the minimum value of that gene across patients in the dataset. One consequence of this normalization procedure is that the transcriptome data has been nondimensionalized.

Therefore, the dimensionless constants and dimensionless transcriptome data can be combined directly so that the surrogate genes parameterize the dimensionless model for each patient. This equation bounds the values for the PR-B and inflammation surrogates to the interval [0, 1] and makes $K_P = K_I = 1$ the natural choice for the critical levels of PR-B and inflammation. The dimensionless parameters then become,

$$ c_1 = \frac{\hat r_P}{a_P} = \alpha, \qquad c_2 = \frac{\hat r_I}{a_P} = \beta, \qquad c_3 = \frac{a_I}{a_P} = \gamma. \tag{20} $$

Now our model can be rewritten as,

$$ \frac{dP}{d\tau} = \alpha P(1 - P) - P I, \qquad \frac{dI}{d\tau} = \beta I(1 - I) - \gamma P I \tag{21} $$

where the values of α and β are determined by the normalized dimensionless values for the PR-B and inflammatory surrogate genes, respectively, and the value of γ corresponds to the strength of PR-B's anti-inflammatory actions.
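The per-patient parameterization can be sketched as follows: each surrogate gene is min-max normalized across the patients of one dataset with equation (19), the PR-B surrogate giving α and the inflammatory surrogate giving β, and the six predictor pairs are simply every (PR-B surrogate, inflammatory surrogate) combination. This is a minimal NumPy sketch with our own function names; the expression values are toy numbers for three hypothetical patients, not data from [1] or [13].

```python
from itertools import product

import numpy as np


def min_max_normalize(values):
    """Equation (19): rescale one gene's expression across patients to [0, 1]."""
    x = np.asarray(values, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        raise ValueError("constant expression: min-max normalization is undefined")
    return (x - lo) / (hi - lo)


def patient_parameters(prb_surrogate, inflammation_surrogate):
    """Per-patient (alpha, beta) for one predictor pair of surrogate genes."""
    return min_max_normalize(prb_surrogate), min_max_normalize(inflammation_surrogate)


# The six predictor pairs (PR-B surrogate, inflammatory surrogate)
predictors = list(product(["FOXO1A", "FKBP5"], ["IL-1β", "IL-6", "IL-8"]))

# Toy expression values for three hypothetical patients (not study data)
expression = {"FKBP5": [5.0, 7.0, 9.0], "IL-1β": [2.0, 6.0, 4.0]}
alpha, beta = patient_parameters(expression["FKBP5"], expression["IL-1β"])
print(predictors)
print(alpha, beta)
```

Because the scaling is per dataset, the same patient's α and β depend on which other patients are in that dataset; this is one reason the model is trained on one dataset and tested on the other.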

Calculating the Probability of Labor for Each Patient

Next we quantify the behavior of γ in order to apply our model to patient data. To do this, we derive the steady-state solutions of our model. These solutions are the values of P and I for which dP/dτ = dI/dτ = 0, and they correspond to a state where the system undergoes no change. Three steady states, or equilibrium points, are easy to derive. These occur when the ordered pair for PR-B and inflammation, (P, I), is equal to,

$$ (0, 0), \quad (1, 0), \quad (0, 1) \tag{22} $$

where (0,0) is the trivial equilibrium point where neither PR-B nor inflammation is present, (1,0) is the quiescent equilibrium where PR-B is maximal and there is no inflammation, and (0,1) is the laboring equilibrium where there is no PR-B and inflammation is maximal. The quiescent equilibrium corresponds to a PR-B dominant state and the laboring equilibrium corresponds to an inflammatory dominant state. A fourth equilibrium point exists for some values of α and β between the quiescent and laboring equilibria, which we designate the intermediate equilibrium, (P*, I*). We obtain this equilibrium point by first setting our model equations equal to zero,

$$ \frac{dP}{d\tau} = 0 = \alpha P^*(1 - P^*) - P^* I^*, \qquad \frac{dI}{d\tau} = 0 = \beta I^*(1 - I^*) - \gamma P^* I^*. \tag{23} $$

Dividing the first equation by P* and the second by I* (both are nonzero at the intermediate equilibrium) gives,

$$ 0 = \alpha(1 - P^*) - I^*, \qquad 0 = \beta(1 - I^*) - \gamma P^*. \tag{24} $$

This results in two equations, one for I* and one for P*,

$$ I^* = \alpha(1 - P^*), \qquad P^* = \frac{\beta}{\gamma}(1 - I^*). \tag{25} $$

Depending on how we choose to perform the substitution, P* into the equation for I* or I* into the equation for P*, we can derive two algebraically equivalent forms of the same intermediate equilibrium point. Both reduce to

$$ (P^*, I^*) = \left( \frac{\beta(1-\alpha)}{\gamma - \alpha\beta},\ \frac{\alpha(\gamma - \beta)}{\gamma - \alpha\beta} \right). \tag{26} $$

Since the values of α and β are determined by PR-B and inflammatory surrogate genes scaled from 0 to 1, these terms are bounded to that interval. Furthermore, since P and I are bounded by $K_P = 1$ and $K_I = 1$, P* and I* are bounded to the square domain with vertices (0,0), (0,1), (1,1), and (1,0), which has area 1. So the intermediate equilibrium point, in both forms, should satisfy the constraints 0 ≤ P* ≤ 1 and 0 ≤ I* ≤ 1. By considering how these constraints affect both forms of the intermediate equilibrium, we can derive a set of constraints for the values of γ. In order to allow for the full range of values of α, we find that β and γ must satisfy

0 ≤ β < γ ≤ 1.

In the limiting case where β = γ, the intermediate equilibrium point equals the quiescent equilibrium point (1,0). If we visualize this state in phase space (Figure 3-2), we see that all the vectors point away from the quiescent equilibrium point toward the laboring equilibrium. In phase space, these vectors define trajectories that indicate how the system would evolve in time given a certain starting point. The set of starting points whose trajectories approach the laboring equilibrium point is known as the basin of attraction for the laboring equilibrium point. We compute the probability of labor as the area of the laboring equilibrium point's basin of attraction divided by the area of the domain, which in our case is 1. The area of the basin changes as α, β, and γ change (Figure 3-2). This probabilistic interpretation of a phase space is reasonable under the assumption that all possible pairs of values of P and I in the domain occur with equal likelihood. For example, in the case where β = γ and α is fixed at 0.5, the probability of labor is equal to 1: the entire domain is the basin of attraction for the laboring equilibrium, which means that quiescence is impossible. In order to ensure that quiescence is a possibility, we set γ = 1 so that only one value of β, β = 1, results in a probability of labor equal to 1, enabling us to explore the full range of values for α and β. Thus, the model we apply to patient data is

$$ \frac{dP}{d\tau} = \alpha P(1 - P) - P I, \qquad \frac{dI}{d\tau} = \beta I(1 - I) - P I \tag{27} $$
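This section does not spell out how the basin area was computed numerically (the phase space plots were generated in Mathematica). One simple way to estimate it, shown here as our own sketch rather than the authors' procedure, is to integrate model (27) from a uniform grid of initial states in the unit square and count the fraction that end at the laboring equilibrium. It assumes NumPy; forward Euler, the grid size n, the step dt and the number of steps are arbitrary choices, and γ is exposed as a parameter with default 1 to match (27).

```python
import numpy as np


def probability_of_labor(alpha, beta, gamma=1.0, n=50, dt=0.05, steps=4000):
    """Estimate the area of the laboring basin of attraction in the unit square.

    Each initial state (P, I) at the centre of an n x n grid cell is integrated
    with forward Euler up to tau = dt * steps; the fraction of states that end
    nearer the laboring equilibrium (0, 1) than the quiescent equilibrium (1, 0)
    approximates the basin's area, i.e. the probability of labor.
    """
    centres = (np.arange(n) + 0.5) / n
    P, I = np.meshgrid(centres, centres)
    for _ in range(steps):
        dP = alpha * P * (1 - P) - P * I
        dI = beta * I * (1 - I) - gamma * P * I
        P, I = P + dt * dP, I + dt * dI
    # Being closer to (0, 1) than to (1, 0) is equivalent to I > P
    return float(np.mean(I > P))


for beta in (0.25, 0.5, 0.75):
    print(f"alpha = 0.5, beta = {beta}: P(labor) ≈ {probability_of_labor(0.5, beta):.2f}")
```

For α = β = 0.5 the model is symmetric under exchanging P and I, so the separatrix is the diagonal P = I and the estimate is 0.49 for n = 50 (the 50 grid points lying exactly on the diagonal are counted as neither basin). Increasing β moves the intermediate saddle toward (1, 0) and enlarges the laboring basin.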

Each patient in the microarray and RNAseq datasets has an expression value for each of the surrogate genes, FOXO1A, FKBP5, IL-1β, IL-6, and IL-8. In the absence of proteomic data precisely quantifying the protein-level activity of these genes in vivo, the mRNA expression levels can be combined with the framework of our mathematical model to approximate the functional activity of these genes at the time of labor: FOXO1A or FKBP5 for PR-B, and IL-1β, IL-6, or IL-8 for inflammation. We calculated a probability of labor for each patient in each dataset using all six possible combinations of surrogate genes, (FOXO1A, IL-1β), (FKBP5, IL-1β), (FOXO1A, IL-6), (FKBP5, IL-6), (FOXO1A, IL-8), and (FKBP5, IL-8), where each surrogate was used to set the parameters α and β in our model.

We will hereafter refer to a pair of surrogate genes as a predictor. A probability of labor was computed for each patient, corresponding to the size of the basin of attraction for the laboring equilibrium point given a predictor pair of surrogate genes for α and β.

Classifier Construction and Assessment

A cross-validation procedure was used to assess the predictive power of our model's probability of labor. One dataset (microarray or RNAseq) was used as a training set to build a classifier, and the other dataset was used as a test set to assess the training set's classifier. Classifiers were assessed for each predictor pair of genes in each dataset. The probability of labor was calculated for each patient in the training set for a given predictor pair of genes. These patients were then separated by phenotype, in labor (IL) and not in labor (NIL). A nonparametric one-sided 95% confidence interval on the median was constructed on the probabilities for each phenotype by transforming the probabilities into ranks and calculating the nonparametric standard error. This error indicated the rank of the probability to be used as the lower or upper bound of the confidence interval. For the NIL group, the lower bound of the confidence interval was set to 0 and the upper bound at the calculated rank. For the IL group, the upper bound of the confidence interval was set at 1 and the lower bound at the calculated rank. We hereafter refer to the one-sided confidence intervals for the IL and NIL groups as a classifier. The significance of each classifier was assessed using the Wilcoxon-Mann-Whitney (WMW) test on the group medians, and a classifier was only used for testing if the confidence intervals separated.

Once the classifier was built for the training set, it was used to classify the test set of patients. The probability of labor for each patient in the test set was computed for the same predictor as in the training set. Each test sample was then classified as IL, NIL, or a no-call depending on whether its probability fell into the training set confidence interval for IL, into the interval for NIL, or between the two. Precision and recall metrics were used to assess the classifier, where precision and recall are defined as

$$ \text{Precision} = \frac{\text{correctly classified samples}}{\text{total classified samples}}, \qquad \text{Recall} = \frac{\text{total classified samples}}{\text{total samples}}. \tag{28} $$

This procedure of creating and assessing classifiers was performed using each dataset as both the training set and the test set, and for all possible predictors. Classifiers for FOXO1A, FKBP5, IL-1β, IL-6, and IL-8 as single-gene predictors were also assessed to determine whether the two-gene classifier built by our model is an improvement over classifiers based on the expression of the individual genes.
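The classifier construction and its assessment can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the rank of each one-sided bound is taken from the common normal approximation n/2 ± z·√n/2 for a confidence interval on the median (the text does not give its exact formula), z = 1.645 for a one-sided 95% interval, and the WMW significance check is left out. Function names and the toy probabilities are hypothetical.

```python
import math

import numpy as np

Z_ONE_SIDED_95 = 1.6448536269514722  # 95th percentile of the standard normal


def bound_rank(n, upper, z=Z_ONE_SIDED_95):
    """1-based rank of a one-sided CI bound on the median (normal approximation)."""
    half_width = z * math.sqrt(n) / 2  # standard error of the median's rank is sqrt(n)/2
    rank = math.ceil(n / 2 + half_width) if upper else math.floor(n / 2 - half_width)
    return min(max(rank, 1), n)  # clamp to a valid rank


def build_classifier(p_nil, p_il):
    """Return (NIL interval, IL interval), or None if the intervals do not separate."""
    nil, il = np.sort(p_nil), np.sort(p_il)
    nil_upper = float(nil[bound_rank(len(nil), upper=True) - 1])
    il_lower = float(il[bound_rank(len(il), upper=False) - 1])
    if nil_upper >= il_lower:
        return None
    return (0.0, nil_upper), (il_lower, 1.0)


def call_sample(p, classifier):
    """Classify one test-set probability of labor as 'NIL', 'IL' or 'no-call'."""
    (_, nil_upper), (il_lower, _) = classifier
    if p <= nil_upper:
        return "NIL"
    if p >= il_lower:
        return "IL"
    return "no-call"


def precision_recall(calls, truth):
    """Equation (28): precision over called samples, recall as the fraction called."""
    called = [(c, t) for c, t in zip(calls, truth) if c != "no-call"]
    precision = sum(c == t for c, t in called) / len(called) if called else float("nan")
    recall = len(called) / len(truth)
    return precision, recall


# Toy probabilities of labor (illustrative, not study results)
clf = build_classifier(p_nil=[0.1, 0.2, 0.3], p_il=[0.6, 0.7, 0.8, 0.9])
calls = [call_sample(p, clf) for p in [0.25, 0.45, 0.75]]
print(calls, precision_recall(calls, truth=["NIL", "IL", "IL"]))
```

With these toy inputs the calls are NIL, no-call and IL, giving precision 1.0 and recall 2/3. Note that for groups as small as those here (3 to 5 samples), this approximation places each bound at the group's extreme value, so the interval for NIL ends at the largest NIL probability and the interval for IL starts at the smallest IL probability.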

Competing Interests

The authors declare that they have no competing interests.


Authors’ Contributions

Douglas Brubaker conceived of, analyzed, and implemented the mathematical model for predictive testing. He performed data collection and analyses and led much of the project. He authored the manuscript, managing and implementing revisions from the co-authors.

Alethea Barbaro, Ph.D. is an expert in dynamical systems modeling and guided Mr. Brubaker as he designed, analyzed, and implemented the model. Her expertise ensured that the linear stability analysis and phase space interpretation were correctly performed. She was instrumental in ensuring that the mathematical form of the model was justified by the existing biological data and was extensively involved in writing and revising the manuscript.

Mark R. Chance, Ph.D. oversaw the design of the predictive modeling and ensured that the model's predictions were compared to other comparable predictors. He guided the overall design and implementation of the predictive modeling, ensured biological justification of the model, and provided important feedback during the manuscript writing and revision process.

Sam Mesiano, Ph.D. is an expert in myometrial physiology who studies the function of the progesterone receptor isoforms in parturition. His expertise ensured that the dynamics of the progesterone receptor interactions with inflammation were accurately described by the equations and his work provided much of the basis for the model itself. His contributions to the manuscript writing and revision process ensured that it faithfully reflected the biological reality of the progesterone receptors in human parturition.


Acknowledgements

The authors wish to thank Drs. Jill Barnholtz-Sloan and Jenny Brynjarsdottir for their guidance in implementing nonparametric statistical testing for the predictive modeling. We would also like to thank Mr. Gavin Brown for his helpful discussion of interpreting a dynamical system phase space as a probability space for predictive modeling. This work was supported by NIH Grant T32HL007567, the Clinical and Translational Science Collaborative in the Case Western Reserve School of Medicine, the March of Dimes Ohio Prematurity Research Collaborative, The Global Alliance to Prevent Prematurity and Stillbirth, and the Eunice Kennedy Shriver National Institute of Child Health and Human Development (HD069819). Phase space plots were generated using Mathematica.


Discussion

This dissertation addresses the complex disorder of preterm birth from multiple biological angles each of which necessitates the use of particular computational approaches. Taken together, the major findings of Chapters 1, 2, and 3 provide a more complete understanding of parturition and prematurity and suggest potentially important avenues for future study (Figure 4-1).

Characterizing the signaling pathways of parturition subtypes in Chapter 1 provides a framework through which the MEF2C regulated network in Chapter 2 and the dynamical systems model of Chapter 3 should be interpreted.

Figure 4-1: Summary of major findings of this dissertation. Chapter 1 characterizes the signaling pathway landscape of three parturition subtypes (inflammation-indicated PTB (HCA), term labor (Term), and spontaneous PTB (sPTB)). Chapter 2 uses integrative network analysis to identify a network module regulated by MEF2C that is enriched with preterm birth SNP-carrying genes. Chapter 3 uses dynamical systems modeling techniques to investigate PR-B interactions with inflammation and uses this model to predict laboring phenotypes with high precision. Together the findings provide a more complete understanding of preterm birth.

In Chapter 1 we undertook a meta-analysis of all publicly available transcriptome datasets for the human laboring myometrium. Using the raw datasets in our analysis enabled us to standardize the statistical approaches to the microarray [1, 3, 4] and RNA-seq [5, 13] datasets. We examined the parturition subtypes of spontaneous preterm birth, preterm birth with an inflammatory indication, and term labor by performing differential gene expression and pathway analysis for each phenotype. Our results show that while term labor and preterm labor with inflammatory indications share a core set of inflammatory, integrin, and AP1 transcription factor signaling events, spontaneous preterm birth is a unique subtype that shows only deactivation of VEGF and VEGFR signaling with labor onset. This finding challenges the conventional understanding of parturition by identifying a parturition subtype where the myometrium transitions to labor without activation of the inflammatory pathways.

In Chapter 2 we investigated the genetic predisposition to preterm birth by taking SNPs that failed to show genome-wide significance in a GWAS meta-analysis [39] and analyzing their combined effects in a PPI network framework.

The organizing framework of a PPI network has been successfully used to group candidate genes associated with diseases and to consider their combined impact in driving phenotypes ranging from breast cancer to Alzheimer's disease [78, 79]. Our approach used candidate genes within 20 kb of candidate PTB-SNPs to seed a PPI network, which we then scored to assess whether subnetworks of genes in that network were coordinately regulated with the onset of labor [35, 37]. Our identification of a network module in term labor that is regulated by MEF2C and enriched with eight PTB-SNP genes is the first time MEF2C has been implicated in PTB. Further studies are required to assess the prognostic value of the SNPs in the MEF2C network for preterm birth risk and the therapeutic potential of modulating MEF2C function.

In Chapter 1 we found that VEGF and VEGFR signaling was downregulated in spontaneous PTB, and we have subsequently become aware that VEGF is a direct regulator of MEF2C expression in endothelial cells [80]. Furthermore, though the prostaglandin- and inflammation-associated genes PLA2G4C and RPS6KA5 were identified in the MEF2C network, none of the canonical inflammatory pathway genes associated with term labor described in Chapter 1 were found in the MEF2C network. Together, these observations suggest that dysregulation of the MEF2C module is likely not associated with inflammatory labor subtypes. The MEF2C module may be dysregulated by associated PTB-SNP genes and the upstream deactivation of VEGF and VEGFR signaling. This link between spontaneous PTB signaling pathways and genetic indications of PTB suggests that developing strategies to prevent spontaneous PTB may require a combined investigation of MEF2C and VEGF signaling. The vasculature of the uterus may itself be a risk factor for spontaneous PTB, and inflammatory signaling may not be a relevant therapeutic target for preventing spontaneous PTB.

The characterization of the role of inflammatory signaling in parturition subtypes also has implications for the scope of progesterone signaling in parturition. In Chapter 3 we presented a dynamical systems model of progesterone receptor B's interactions with inflammatory signaling in the myometrium. The model dynamics were similar to what is expected in vivo, and the model's predictions of the laboring phenotype had high precision. The results from Chapter 1 affirm the inflammatory character of term labor and preterm labor with inflammatory indications. This suggests that our model from Chapter 3 would be valid in those two phenotypic contexts, but the lack of inflammatory signaling in spontaneous PTB suggests the model would fail in that context. The lack of inflammatory signaling in spontaneous PTB may also suggest that progesterone signaling plays a limited role in the spontaneous PTB phenotype.

If progesterone signaling is not a significant factor in spontaneous preterm birth, this may also explain why progesterone supplementation has had limited effectiveness in preventing PTB. In spite of previous evidence showing that progesterone supplementation has little effect in preventing preterm birth [81], it is still pursued as a therapy. Our results from Chapters 1 and 2 offer a partial explanation of this phenomenon by indicating that the mechanism driving spontaneous preterm birth may proceed independently of the progesterone block. If this is the case, then therapeutic strategies to modulate MEF2C and VEGF signaling could be more effective in preventing spontaneous PTB than progesterone supplementation.

In this dissertation we analyzed data from the genomic, transcriptomic, and proteomic scales of biology. The bulk of our analysis is based upon gene expression studies where the number of samples in each phenotype rarely exceeds five. We address this issue by using nonparametric statistical approaches to analyze the raw datasets. We also employed consistent criteria for gene expression significance, including a false discovery rate q-value threshold of 0.25 and, when possible, the requirement that a gene be differentially expressed in two or more studies before we considered it differentially expressed. This helped ensure confidence in the results when comparing differentially expressed gene lists from different microarray and RNA-seq platforms. For the purpose of comparing differentially expressed genes, RNA-seq and microarray pipelines have been observed to generate comparable results [82, 83], so the differing technologies likely did not play a significant role in our analysis.

The preceding work demonstrates the power of integrating multiple data and computational approaches in understanding a complex disorder like preterm birth. The agnostic functional genomics analysis in Chapter 1 provided a framework within which the results of our targeted investigations in Chapters 2 and 3 could be interpreted. We show that different parturition subtypes may interact differently with progesterone signaling and genetic factors. Each investigation provided novel insights on its own, but the insights gained by considering the results together are far more compelling. In conclusion, the parallel mathematical and computational approaches to PTB synergize to yield insights that one approach alone could never have achieved.


References

1. Bethin, K.E., et al., Microarray analysis of uterine gene expression in mouse and human pregnancy. Mol Endocrinol, 2003. 17(8): p. 1454-69.
2. Mesiano, S., et al., Progesterone withdrawal and estrogen activation in human parturition are coordinated by progesterone receptor A expression in the myometrium. J Clin Endocrinol Metab, 2002. 87(6): p. 2924-30.
3. Mittal, P., et al., Characterization of the myometrial transcriptome and biological pathways of spontaneous human labor at term. (1619-3997 (Electronic)).
4. Weiner, C.P., et al., Human effector/initiator gene sets that regulate myometrial contractility during term and preterm labor. Am J Obstet Gynecol, 2010. 202(5): p. 474.e1-20.
5. Ackerman, W.E., IV, et al., Integrated microRNA and mRNA profiling in human myometrium following term and preterm labor. Submitted to Mol Hum Reprod, 2016.
6. Eidem, H.R., et al., Gestational tissue transcriptomics in term and preterm human pregnancies: a systematic review and meta-analysis. (1755-8794 (Electronic)).
7. Trapnell, C., et al., Differential analysis of gene regulation at transcript resolution with RNA-seq. (1546-1696 (Electronic)).
8. Trapnell, C., et al., Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. (1750-2799 (Electronic)).
9. Nibbe, R.K., M. Koyuturk, and M.R. Chance, An integrative -omics approach to identify functional sub-networks in human colorectal cancer. (1553-7358 (Electronic)).
10. Edgar, R., M. Domrachev, and A.E. Lash, Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res, 2002. 30(1): p. 207-10.
11. Langmead, B., et al., Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. (1474-760X (Electronic)).
12. Trapnell, C., L. Pachter, and S.L. Salzberg, TopHat: discovering splice junctions with RNA-Seq. (1367-4811 (Electronic)).
13. Chan, Y.W., et al., Assessment of myometrial transcriptome changes associated with spontaneous human labour by high-throughput RNA-seq. Exp Physiol, 2014. 99(3): p. 510-24.
14. Szklarczyk, D., et al., STRING v10: protein-protein interaction networks, integrated over the tree of life. (1362-4962 (Electronic)).
15. Cerami, E.G., et al., Pathway Commons, a web resource for biological pathway data. (1362-4962 (Electronic)).
16. Schaefer, C.F., et al., PID: the Pathway Interaction Database. (1362-4962 (Electronic)).
17. Ananth, C.V. and A.M. Vintzileos, Epidemiology of preterm birth and its clinical subtypes. J Matern Fetal Neonatal Med, 2006. 19(12): p. 773-82.
18. Zhang, H., et al., A genome-wide association study of early spontaneous preterm delivery. Genet Epidemiol, 2015. 39(3): p. 217-26.
19. Chaudhari, B.P., et al., The genetics of birth timing: insights into a fundamental component of human development. Clin Genet, 2008. 74(6): p. 493-501.
20. Plunkett, J., et al., Population-based estimate of sibling risk for preterm birth, preterm premature rupture of membranes, placental abruption and pre-eclampsia. BMC Genet, 2008. 9: p. 44.
21. Plunkett, J., et al., Mother's genome or maternally-inherited genes acting in the fetus influence gestational age in familial preterm birth. Hum Hered, 2009. 68(3): p. 209-19.
22. Svensson, A.C., et al., Maternal effects for preterm birth: a genetic epidemiologic study of 630,000 families. Am J Epidemiol, 2009. 170(11): p. 1365-72.
23. Myking, S., et al., X-chromosomal maternal and fetal SNPs and the risk of spontaneous preterm delivery in a Danish/Norwegian genome-wide association study. PLoS One, 2013. 8(4): p. e61781.
24. Olsen, J., et al., The Danish National Birth Cohort--its background, structure and aim. Scand J Public Health, 2001. 29(4): p. 300-7.
25. Plunkett, J., et al., An evolutionary genomic approach to identify genes involved in human birth timing. PLoS Genet, 2011. 7(4): p. e1001365.
26. Uzun, A., et al., dbPTB: a database for preterm birth. Database (Oxford), 2012. 2012: p. bar069.
27. Chen, Y., et al., Variations in DNA elucidate molecular networks that cause disease. Nature, 2008. 452(7186): p. 429-35.
28. Dixon, A.L., et al., A genome-wide association study of global gene expression. Nat Genet, 2007. 39(10): p. 1202-7.
29. Schadt, E.E., et al., Mapping the genetic architecture of gene expression in human liver. PLoS Biol, 2008. 6(5): p. e107.
30. Nicolae, D.L., et al., Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS. PLoS Genet, 2010. 6(4): p. e1000888.
31. Uzun, A., et al., Pathway-based genetic analysis of preterm birth. Genomics, 2013. 101(3): p. 163-70.
32. Uzun, A., S. Sharma, and J. Padbury, A bioinformatics approach to preterm birth. Am J Reprod Immunol, 2012. 67(4): p. 273-7.
33. Subramanian, A., et al., Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A, 2005. 102(43): p. 15545-50.
34. Visscher, P.M., et al., Five years of GWAS discovery. Am J Hum Genet, 2012. 90(1): p. 7-24.
35. Liu, Y., et al., Gene, pathway and network frameworks to identify epistatic interactions of single nucleotide polymorphisms derived from GWAS data. BMC Syst Biol, 2012. 6 Suppl 3: p. S15.
36. Liu, Y., et al., Integrative analysis of common neurodegenerative diseases using gene association, interaction networks and mRNA expression data. AMIA Jt Summits Transl Sci Proc, 2012. 2012: p. 62-71.
37. Liu, Y., et al., Systems biology analyses of gene expression and genome wide association study data in obstructive sleep apnea. Pac Symp Biocomput, 2011: p. 14-25.
38. Nibbe, R.K., et al., Discovery and scoring of protein interaction subnetworks discriminative of late stage human colon cancer. Mol Cell Proteomics, 2009. 8(4): p. 827-45.
39. Zhang, G., et al., Assessing the causal relationship of maternal height on birth size and gestational age at birth: a Mendelian randomization analysis. PLoS Med, 2015. 12(8): p. e1001865.
40. Szklarczyk, D., et al., The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res, 2011. 39(Database issue): p. D561-8.
41. Klein, P. and R. Ravi, A nearly best-possible approximation algorithm for node-weighted Steiner trees. J Algorithms, 1995. 19(1): p. 104-115.
42. Smoot, M.E., et al., Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics, 2011. 27(3): p. 431-2.
43. Merlino, A.A., et al., Nuclear progesterone receptors in the human pregnancy myometrium: evidence that parturition involves functional progesterone withdrawal mediated by increased expression of progesterone receptor-A. J Clin Endocrinol Metab, 2007. 92(5): p. 1927-33.
44. Xu, Z., et al., Transcription factor MEF2C suppresses endothelial cell inflammation via regulation of NF-kappaB and KLF2. J Cell Physiol, 2015. 230(6): p. 1310-20.
45. Rebhan, M., et al., GeneCards: integrating information about genes, proteins and diseases. (0168-9525 (Print)).
46. O'Brien, M., J.J. Morrison, and T.J. Smith, Upregulation of PSCDBP, TLR2, TWIST1, FLJ35382, EDNRB, and RGS12 gene expression in human myometrium at labor. Reprod Sci, 2008. 15(4): p. 382-93.
47. Plunkett, J., et al., Primate-specific evolution of noncoding element insertion into PLA2G4C and human preterm birth. BMC Med Genomics, 2010. 3: p. 62.
48. Beck, S., et al., The worldwide incidence of preterm birth: a systematic review of maternal mortality and morbidity. Bull World Health Organ, 2010. 88(1): p. 31-8.
49. Corner, G.W. and A. Csapo, Action of the ovarian hormones on uterine muscle. Br Med J, 1953. 1(4812): p. 687-93.
50. Corner, G.C., A., The hormones in human reproduction. Princeton University Press, 1946.
51. Csapo, A., Progesterone block. Am J Anat, 1956. 98(2): p. 273-91.
52. Norman, J.E., et al., Inflammatory pathways in the mechanism of parturition. BMC Pregnancy Childbirth, 2007. 7 Suppl 1: p. S7.
53. Romero, R., et al., Inflammation in preterm and term labour and delivery. Semin Fetal Neonatal Med, 2006. 11(5): p. 317-26.
54. Kim, J., et al., Transcriptome landscape of the human placenta. (1471-2164 (Electronic)).
55. Larsen, B. and J. Hwang, Progesterone interactions with the cervix: translational implications for term and preterm birth. (1098-0997 (Electronic)).
56. Giangrande, P.H., et al., The opposing transcriptional activities of the two isoforms of the human progesterone receptor are due to differential cofactor binding. Mol Cell Biol, 2000. 20(9): p. 3102-15.
57. Kastner, P., et al., Two distinct estrogen-regulated promoters generate transcripts encoding the two functionally different human progesterone receptor forms A and B. EMBO J, 1990. 9(5): p. 1603-14.
58. Vegeto, E., et al., Human progesterone receptor A form is a cell- and promoter-specific repressor of human progesterone receptor B function. Mol Endocrinol, 1993. 7(10): p. 1244-55.
59. Boroditsky, R.S., et al., Maternal serum estrogen and progesterone concentrations preceding normal labor. Obstet Gynecol, 1978. 51(6): p. 686-91.
60. Tulchinsky, D., et al., Plasma estradiol, estriol, and progesterone in human pregnancy. II. Clinical applications in Rh-isoimmunization disease. Am J Obstet Gynecol, 1972. 113(6): p. 766-70.
61. Tulchinsky, D., et al., Plasma estrone, estradiol, estriol, progesterone, and 17-hydroxyprogesterone in human pregnancy. I. Normal pregnancy. Am J Obstet Gynecol, 1972. 112(8): p. 1095-100.
62. Tulchinsky, D. and D.M. Okada, Hormones in human pregnancy. IV. Plasma progesterone. Am J Obstet Gynecol, 1975. 121(3): p. 293-9.
63. Walsh, S.W., G.W. Kittinger, and M.J. Novy, Maternal peripheral concentrations of estradiol, estrone, cortisol, and progesterone during late pregnancy in rhesus monkeys (Macaca mulatta) and after experimental fetal anencephaly and fetal death. Am J Obstet Gynecol, 1979. 135(1): p. 37-42.
64. Tan, H., et al., Progesterone receptor-A and -B have opposite effects on proinflammatory gene expression in human myometrial cells: implications for progesterone actions in human pregnancy and parturition. J Clin Endocrinol Metab, 2012. 97(5): p. E719-30.
65. Hardy, D.B., et al., Progesterone receptor plays a major antiinflammatory role in human myometrial cells by antagonism of nuclear factor-kappaB activation of cyclooxygenase 2 expression. Mol Endocrinol, 2006. 20(11): p. 2724-33.
66. Madsen, G., et al., Prostaglandins differentially modulate progesterone receptor-A and -B expression in human myometrial cells: evidence for prostaglandin-induced functional progesterone withdrawal. J Clin Endocrinol Metab, 2004. 89(2): p. 1010-3.
67. Britton, N.F., Essential Mathematical Biology. Springer, 2005.
68. Calvetti, D. and E. Somersalo, Computational Mathematical Modeling. SIAM, 2013.
69. Iber, D. and G. Fengos, Predictive models for cellular signaling networks. Methods Mol Biol, 2012. 880: p. 1-22.
70. Rangamani, P. and R. Iyengar, Modelling cellular signalling systems. Essays Biochem, 2008. 45: p. 83-94.
71. Brosens, J.J. and E.W. Lam, Progesterone and FOXO1 signaling: harnessing cellular senescence for the treatment of ovarian cancer. Cell Cycle, 2013. 12(11): p. 1660-1.
72. Golightly, E., H.N. Jabbour, and J.E. Norman, Endocrine immune interactions in human parturition. Mol Cell Endocrinol, 2011. 335(1): p. 52-9.
73. Gotkin, J.L., et al., Progesterone reduces lipopolysaccharide induced interleukin-6 secretion in fetoplacental chorionic arteries, fractionated cord blood, and maternal mononuclear cells. Am J Obstet Gynecol, 2006. 195(4): p. 1015-9.
74. Li, W.X., Canonical and non-canonical JAK–STAT signaling. (0962-8924 (Print)).
75. Shuai, K. and B. Liu, Regulation of JAK-STAT signalling in the immune system. (1474-1733 (Print)).
76. Neal, J.L., et al., Differences in inflammatory markers between nulliparous women admitted to hospitals in preactive vs active labor. Am J Obstet Gynecol. 212(1): p. 68.e1-68.e8.
77. Mesiano, S., Y. Wang, and E.R. Norwitz, Progesterone receptors in the human pregnancy uterus: do they hold the key to birth timing? Reprod Sci, 2011. 18(1): p. 6-19.
78. Auffray, C., Protein subnetwork markers improve prediction of cancer outcome. Mol Syst Biol, 2007. 3: p. 141.
79. Vanunu, O., et al., Associating genes and protein complexes with disease via network propagation. PLoS Comput Biol, 2010. 6(1): p. e1000641.
80. Maiti, D., Z. Xu, and E.J. Duh, Vascular endothelial growth factor induces MEF2C and MEF2-dependent activity in endothelial cells. Invest Ophthalmol Vis Sci, 2008. 49(8): p. 3640-8.
81. Manuck, T.A., et al., Predictors of response to 17-alpha hydroxyprogesterone caproate for prevention of recurrent spontaneous preterm birth. doi: 10.1016/j.ajog.2015.12.010. (1097-6868 (Electronic)).
82. Perkins, J.R., et al., A comparison of RNA-seq and arrays for whole genome transcription profiling of the L5 spinal nerve transection model of neuropathic pain in the rat. Mol Pain, 2014. 10: p. 7.
83. Su, Z., et al., An investigation of biomarkers derived from legacy microarray data for their utility in the RNA-seq era. Genome Biol, 2014. 15(12): p. 3273.