Discovery of Putative STAT5 Transcription Factor Binding Sites in Mice with Diabetic Nephropathy

A thesis presented to

the faculty of

the Russ College of Engineering and Technology of Ohio University

In partial fulfillment

of the requirements for the degree

Master of Science

Jens Schmidt

December 2013

© 2013 Jens Schmidt. All Rights Reserved.

This thesis titled

Discovery of Putative STAT5 Transcription Factor Binding Sites in Mice with Diabetic Nephropathy

by

JENS SCHMIDT

has been approved for

the School of Electrical Engineering and Computer Science

and the Russ College of Engineering and Technology by

Lonnie R. Welch

Professor of Electrical Engineering and Computer Science

Dennis Irwin

Dean, Russ College of Engineering and Technology

ABSTRACT

SCHMIDT, JENS, M.S., December 2013, Computer Science

Discovery of Putative STAT5 Transcription Factor Binding Sites in Mice with Diabetic Nephropathy

Director of Thesis: Lonnie R. Welch

Type 1 diabetes mellitus has become a major disease and impacts patients’ lives significantly. Because the underlying pathways and the genetic causes have not been exhaustively examined yet, this thesis focuses on identifying potential binding sites for STAT5, a transcription factor that is hypothesized to play a role in the process of diabetic nephropathy, a complication of type 1 diabetes. In this study, motif finding was applied to determine a set of putative STAT5 binding sites. This set was filtered by comparison to three gene ontology terms that are associated with processes that can occur in diabetic nephropathy and by comparison to experimentally validated STAT5 binding sites. This work generated a short list of six genes and their associated sites that should be given the highest priority for experimental validation in the laboratory in an effort to demonstrate a direct, repressive role for STAT5 in diabetic nephropathy.

DEDICATION

I would like to dedicate this thesis to Gudrun and Werner Schmidt and Ellen Lubbers.

ACKNOWLEDGMENTS

First, I would like to thank Dr. Lonnie Welch for offering me the opportunity to specialize in the field of bioinformatics and for welcoming me into the bioinformatics laboratory. I would like to thank Dr. Welch for giving me helpful advice and for sharing his vision about solving bioinformatics problems and general philosophical content with me. Dr. Welch always encouraged me to work hard, to continuously build new skills and to extend my knowledge inside and outside the classroom.

Second, I would like to thank Dr. Karen Coschigano, as well as her collaborator Dr. K. Wyatt McMahon, for collaborating with our bioinformatics laboratory and me on her diabetic nephropathy project, for proofreading this thesis several times and giving valuable feedback, for explaining the biological background and for providing feedback for our intermediate results.

Third, I would like to thank Dr. Chang Liu and Dr. Frank Drews for being on my committee.

Fourth, I would like to thank Russ College of Engineering and Ohio University for accepting my application for the computer science program and giving me, a non-native speaker, the opportunity to experience United States university education at a high level and for getting to know the American culture better.

Fifth, I would like to thank Xiaoyu Liang (Veronica) and Andrew Bissell for working on the diabetic nephropathy project as well.

Finally, I would like to thank the bioinformatics laboratory (Rami Al-Ouran, Richard Wolfe, Kevin Plis, Yichao Li and Ashwini Naik) for supporting me. And I would like to thank my family and Ellen Lubbers for always supporting and being there for me. You are great!

TABLE OF CONTENTS

Abstract

Dedication

Acknowledgments

List of Tables

List of Figures

List of Abbreviations

1. Introduction

1.1 Diabetes mellitus

1.2 Diabetic Nephropathy

1.3 Transcription factors and their mechanism

1.4 Transcription factor STAT5

1.5 Previous laboratory experiments

2. Hypothesis and Specific Aims

2.1 Hypothesis

2.2 Specific Aims

3. Literature Review

4. Results

4.1 Identification of putative STAT5 binding sites

4.2 Filtering genes by GO terms

4.3 Comparison of putative STAT5 binding sites to published ChIP-seq results

5. Discussion

5.1 Significance

5.2 Future work

6. Methods

6.1 Microarray experiments

6.1.1 Description and interpretation of summarized microarray data sets

6.1.2 Analysis of data sets

6.2 Literature search for experimentally validated STAT5 binding sites

6.2.1 Creation of position weight matrices

6.3 Data storage (MySQL database)

6.4 Framework for program execution and to display results (Galaxy)

6.5 Perl programs

6.5.1 Sequence retrieval from the Ensembl database

6.5.2 Motif finding (FIMO)

6.5.3 Filter of potential STAT5 binding sites

6.5.3.1 Comparison of potential STAT5 binding sites to experimentally validated STAT5 binding sites (ChIP-seq)

6.5.3.2 Filter potential STAT5 binding sites via database

6.6 System architecture and configuration

6.7 Limitations

References

Appendix 1: Galaxy GUI screenshots

Appendix 2: Galaxy configuration files (XML format)

Appendix 3: SQL script “create_db.sql”

Appendix 4: Additional charts of potential STAT5 binding sites

Long STAT5 binding motif

Short STAT5 binding motif

Appendix 5: Potential STAT5 binding sites (short motif) validated against ChIP-seq data set with GEO accession number GSM784027

Appendix 6: Potential STAT5 binding sites (long motif) validated against ChIP-seq data set with GEO accession number GSM784027

Appendix 7: Position weight matrix file “stat5_pwm__short_gas__default_bg.txt” (short STAT5 motif)

Appendix 8: Position weight matrices file “stat5MemeLong” (long STAT5A and STAT5B motifs)

Appendix 9: 43 Genes with potential STAT5 binding sites (long motif for STAT5A or STAT5B and short motif) related to term "Inflammatory Response"

Appendix 10: 58 Genes with potential STAT5 binding sites (long motif for STAT5A or STAT5B and short motif) related to term "Apoptotic process"

Appendix 11: 24 Genes with potential STAT5 binding sites (long motif for STAT5A or STAT5B and short motif) related to term "Cytokine-mediated signaling pathway"

Appendix 12: Against ChIP-seq data set (GEO: GSM784027) validated potential STAT5 binding sites (80 genes, long motif)

Appendix 13: Against ChIP-seq data set (GEO: GSM784027) validated potential STAT5 binding sites (74 genes, short motif)

Appendix 14: Against ChIP-seq data set (GEO: GSM784027) validated potential STAT5 binding sites (61 genes in intersection between long and short motif)

Appendix 15: 190 potential STAT5 binding sites for gene Prkcb (short motif)

Appendix 16: 169 potential STAT5A binding sites for gene Wipf1 (long motif)

Appendix 17: 174 potential STAT5B binding sites for gene Wipf1 (long motif)

LIST OF TABLES

Table 1 Five potential STAT5 binding sites with the lowest p values (long motif)

Table 2 Five potential STAT5 binding sites with the lowest p values (short motif)

Table 3 Important Galaxy folders

Table 4 Programs

Table 5 Potential STAT5 binding sites (short motif) validated against ChIP-seq data set with GEO accession number GSM784027

Table 6 Potential STAT5 binding sites (long motif) validated against ChIP-seq data set with GEO accession number GSM784027

Table 7 43 Genes with potential STAT5 binding sites (long motif for STAT5A or STAT5B and short motif) related to term "Inflammatory Response"

Table 8 58 Genes with potential STAT5 binding sites (long motif for STAT5A or STAT5B and short motif) related to term "Apoptotic process"

Table 9 24 Genes with potential STAT5 binding sites (long motif for STAT5A or STAT5B and short motif) related to term "Cytokine-mediated signaling pathway"

Table 10 Against ChIP-seq data set (GEO: GSM784027) validated potential STAT5 binding sites (80 genes, long motif)

Table 11 Against ChIP-seq data set (GEO: GSM784027) validated potential STAT5 binding sites (74 genes, short motif)

Table 12 Against ChIP-seq data set (GEO: GSM784027) validated potential STAT5 binding sites (61 genes in intersection between long and short motif)

Table 13 190 potential STAT5 binding sites for gene Prkcb (short motif)

Table 14 169 potential STAT5A binding sites for gene Wipf1 (long motif)

Table 15 174 potential STAT5B binding sites for gene Wipf1 (long motif)

LIST OF FIGURES

Figure 1. Sequence logo (long STAT5A motif)

Figure 2. Sequence logo (long STAT5B motif)

Figure 3. All genes with potential STAT5A binding sites (every tenth gene name shown)

Figure 4. All genes with potential STAT5B binding sites (every eleventh gene name shown)

Figure 5. Ten genes with highest number of potential STAT5A binding sites

Figure 6. Ten genes with highest number of potential STAT5B binding sites

Figure 7. Sequence logo (short STAT5 motif)

Figure 8. All genes with potential STAT5 binding sites (short motif, every tenth name printed)

Figure 9. Ten genes with highest number of potential STAT5 binding sites

Figure 10. Genes with STAT5 binding sites distinguished via covered motifs

Figure 11. Genes with potential STAT5 binding sites related to term "Inflammatory Response"

Figure 12. Genes with potential STAT5 binding sites related to term "Apoptotic process"

Figure 13. Genes with potential STAT5 binding sites related to term "Cytokine-mediated signaling pathway"

Figure 14. Relationship between genes related to GO terms

Figure 15. Relationship between successfully compared STAT5 binding sites (GEO: GSM784027)

Figure 16. Relationship between genes linked to GO terms and ChIP-seq data comparison

Figure 17. High level concept

Figure 18. Microarray experiments and data sets (Original mouse picture [64] modified)

Figure 19. Microarray data set matrix

Figure 20. Relationship between upND and upDB data sets

Figure 21. Data base scheme (keys with grey background)

Figure 22. Galaxy welcome screen

Figure 23. Galaxy: Retrieve DNA sequences (program “retrieve_sequences.pl”)

Figure 24. Galaxy: Upload sequences/gene names (program “upload_seqs.py”)

Figure 25. Perform motif finding (FIMO, program “find_motifs.pl”)

Figure 26. Galaxy: Import BED file (program “import_bed_file.pl”)

Figure 27. Galaxy: Perform ChIP-seq validation (program “chip_seq__comparison.pl”)

Figure 28. Galaxy: Potential STAT5 binding site visualization (program “visualize.pl”)

Figure 29. Genes 11 to 60 with next highest number of potential STAT5A binding sites

Figure 30. Genes 61 to 370 with next highest number of potential STAT5A binding sites (every tenth name shown)

Figure 31. Genes 11 to 60 with next highest number of potential STAT5B binding sites

Figure 32. Genes 61 to 364 with next highest number of potential STAT5B binding sites (every eleventh name shown)

Figure 33. Genes 11 to 60 with next highest number of potential STAT5 binding sites

Figure 34. Genes 61 to 357 with next highest number of potential STAT5 binding sites (every tenth name shown)

LIST OF ABBREVIATIONS

API Application programming interface

ChIP Chromatin immunoprecipitation

DNA Deoxyribonucleic acid

EMSA Electrophoretic mobility shift assay

FIMO Find individual motif occurrences

GAS Gamma-activated sequence

GEO Gene Expression Omnibus

GO Gene ontology

GUI Graphical user interface

PWM Position weight matrix

SKO Stat5 knock out

SOAP Simple object access protocol

TFBS Transcription factor binding site

TSS Transcription start site

UTR Untranslated region

XML Extensible markup language

1. INTRODUCTION

1.1 Diabetes mellitus

Diabetes mellitus is regarded as “one of the most important health problems in the world” [1]. Patients who suffer from this disease are typically impacted by diabetes for the remainder of their lives [2]. Diabetes is characterized by the presence of large amounts of sugar in the patient’s bloodstream [2]. Two main types of diabetes exist. Type 1 diabetes mellitus results when “...the beta cells of the pancreas no longer make insulin...” [3], and the “[t]reatment for type 1 diabetes includes taking insulin...” [3]. Type 2 diabetes mellitus, on the other hand, “...usually begins with insulin resistance...” [3]. Insulin is produced by the pancreas and causes the uptake of glucose from the blood into fat cells, liver cells and muscle cells [2]. Glucose is an important source of energy in these cells [2]. Insulin resistance manifests itself in muscle, fat and liver cells not processing glucose correctly [3]. The pancreas produces more insulin to help signal these tissues to process glucose, but eventually the pancreas cannot keep pace with the higher demand for insulin and high blood glucose results [3].

1.2 Diabetic Nephropathy

Diabetic nephropathy is a condition that harms the kidneys and can develop from diabetes mellitus [4]. Patients with diabetic nephropathy typically show increased protein levels in urine. An essential part of diabetes mellitus is inflammation [2]. A typical progression of this disease includes the decrease of the kidney's function until it eventually fails [5]. The kidney is an important organ that is involved in removing undesired components such as urea or other “waste products” [6] via urine [6]. The kidney also controls the amounts of potassium, acid and salt in the human body [7]. Characteristic symptoms of patients who are impacted by diabetic nephropathy are headaches, swollen legs, poor appetite, nausea, and tiredness [5]. Healthy kidneys result in low protein levels in urine, whereas kidneys in a disease state are typically smaller, have a “granular surface” [5], do not perform as well and result in high protein levels in urine [5].

1.3 Transcription factors and their mechanism

Transcription factors are proteins that regulate gene expression, often by binding to DNA sequences [8]. For this purpose, transcription factors often bind to specific and relatively short (typically about 6-8 nt long [8]) DNA sequences to activate or deactivate gene expression. The set of transcription factor binding sites (TFBSs) for a given transcription factor is typically summarized as a motif. A motif represents a set of TFBSs and enables an intuitive graphical representation. A motif can be based on experimentally validated TFBSs. Electrophoretic mobility shift assays (EMSA) [9] are one means to experimentally validate a potential TFBS.

1.4 Transcription factor STAT5

STAT5 is a transcription factor [10] that belongs to the protein family “[s]ignal transducers and activators of transcription” and exists in two different variants: STAT5A and STAT5B [10]. Both variants share 91% identity at the protein level [10] in humans.

Based on preliminary results from Dr. Karen Coschigano’s laboratory, STAT5 is hypothesized to repress the expression of inflammation-related genes in kidney cells of diabetic mice [11].

E. Soldaini et al. found that STAT5A and STAT5B homodimers “share similar core TTC(T/C)N(G/A)GAA gamma-activated sequence (GAS) motifs” [10].

STAT5 tetramers show different binding patterns that do not necessarily follow a consensus motif [10] and are not part of this study. The STAT5 binding motifs used in this study are depicted in Figure 7 (short motif) and Figure 1 and Figure 2 (long motif) in chapter “4. Results”. The long STAT5 binding motif consists of a 9-nucleotide (nt) binding motif in the center and a 3 nt flanking sequence on each end.

STAT5 has been shown to bind to promoter, intron, exon, 3’ untranslated region (UTR) and 5’ UTR sequences [12]. 3’ UTR and 5’ UTR sequences are part of messenger RNA (mRNA) sequences [13]. 3’ UTR and 5’ UTR sequences are not translated into protein but they are targets of regulatory elements [13].

1.5 Previous laboratory experiments

To investigate the specific role of STAT5 in diabetic nephropathy, Dr. Coschigano designed experiments that are explained in more detail in sub-chapter “6.1 Microarray experiments”. Four different groups of mice were studied: two groups of wild type mice with and without diabetes, and two groups of STAT5 knock out (KO) mice with and without diabetes. The diabetic states in mice were created by injecting streptozotocin (STZ), which destroys the pancreatic cells that produce insulin [14], creating a type 1 diabetes model. Gene expression levels were determined via microarray experiments for all four types of mice. Sub-chapter “6.1.2 Analysis of data sets” further describes how these data sets were processed.

2. HYPOTHESIS AND SPECIFIC AIMS

2.1 Hypothesis

Dr. Karen Coschigano developed the hypothesis that STAT5 A/B acts directly to repress RNA expression of inflammation-related genes in diabetic kidneys. This hypothesis is the central topic of this thesis; the following chapters therefore explain the methods applied and the results obtained in support of the hypothesis.

2.2 Specific Aims

The hypothesis was explored via three Specific Aims. The first Specific Aim entails the construction of position weight matrices (PWMs) that represent STAT5 binding motifs and the identification of genes that are up-regulated during Stat5 knock out and that contain potential STAT5 binding sites. A PWM is a means to encode the motif of transcription factor binding sites, usually of one transcription factor. For each position of the motif, the PWM contains the probability of each of the four nucleotides (A, C, G, T) occurring at that position. Applying computational methods to determine a set of potential STAT5 binding sites is advantageous over applying experimental methods at the initiation of a study because the computational methods are independent of cell type or state. In addition, they can be applied on a larger scale because they are less expensive than experimental laboratory procedures. In the next step, a small set of interesting potential STAT5 binding sites can be experimentally validated in the laboratory.
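The PWM definition above can be illustrated with a minimal sketch that turns a set of aligned binding sites into per-position nucleotide probabilities. The function name `build_pwm`, the optional `pseudocount` parameter and the four example sites are assumptions made for illustration only; they are not the binding sites or the procedure used in this thesis.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"

def build_pwm(sites, pseudocount=0.0):
    """Return one {nucleotide: probability} dict per motif position."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all binding sites must have the same length")
    total = len(sites) + pseudocount * len(NUCLEOTIDES)
    pwm = []
    for position in range(length):
        # Count how often each nucleotide occurs at this position.
        counts = Counter(site[position].upper() for site in sites)
        pwm.append({n: (counts[n] + pseudocount) / total for n in NUCLEOTIDES})
    return pwm

# Hypothetical aligned 9 nt sites, chosen only to illustrate the calculation.
example_sites = ["TTCCAGGAA", "TTCTAGGAA", "TTCCTGGAA", "TTCTCAGAA"]
pwm = build_pwm(example_sites)
print(pwm[3])  # probabilities at the fourth motif position
```

A non-zero pseudocount avoids zero probabilities, which matters as soon as the matrix is converted to log-odds scores.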

Two types of motifs were applied to search for potential STAT5 binding sites: a short motif and two long STAT5 motifs. Each motif is represented by a PWM and is typically based on known STAT5 binding sites. Both types of motifs were used to search a set of sequences of interest for potential STAT5 binding sites.

The long STAT5 motifs [10] distinguish between STAT5A and STAT5B. Both motifs are 15nt long and contain three flanking nucleotides on each side of the nine nt long STAT5 binding site. The consensus motifs are

N(A/g)(T/A)TTC(C/T)N(G/a)GAA(A/t/c)(T/c)N for STAT5A and

N(A/t/g)(T/A)TTC(C/T)(T/c/a/g)(G/a)GAA(T/A)(T/c/a)N for STAT5B [10].

The short STAT5 motif [15] does not distinguish between STAT5A and STAT5B and is based on experimentally validated STAT5 binding sites. This motif is nine nt long and was applied to determine a set of potential STAT5 binding sites. The short STAT5 motif is TTC(C/T/g)N(A/G)GAA.
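As a rough illustration of how the consensus notation above can be read, the following sketch converts a consensus string into a regular expression and scans both strands of a sequence. It is not the PWM-based search performed with FIMO in this thesis. The function names and the test sequence are hypothetical, and treating lower-case letters as allowed nucleotides (rather than as less frequent ones) is an assumption.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def consensus_to_regex(consensus):
    """Translate notation such as TTC(C/T/g)N(A/G)GAA into a regex string."""
    parts = []
    for group, single in re.findall(r"\(([^)]+)\)|([A-Za-z])", consensus):
        if group:
            # Assumption: lower-case alternatives are simply allowed.
            letters = sorted(set(group.upper().replace("/", "")))
            parts.append("[" + "".join(letters) + "]")
        elif single.upper() == "N":
            parts.append("[ACGT]")
        else:
            parts.append(single.upper())
    return "".join(parts)

def reverse_complement(sequence):
    return sequence.upper().translate(COMPLEMENT)[::-1]

def find_sites(consensus, sequence):
    """Return sorted (0-based start on the input strand, strand, site) hits."""
    # The lookahead lets overlapping matches be reported as well.
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    sequence = sequence.upper()
    hits = []
    for strand, text in (("+", sequence), ("-", reverse_complement(sequence))):
        for match in pattern.finditer(text):
            site = match.group(1)
            start = match.start()
            if strand == "-":
                start = len(sequence) - start - len(site)
            hits.append((start, strand, site))
    return sorted(hits)

SHORT_MOTIF = "TTC(C/T/g)N(A/G)GAA"
print(find_sites(SHORT_MOTIF, "ACGTTCCAGGAATT"))  # hypothetical input sequence
```

Because the GAS core is nearly palindromic, a single location is often reported on both strands, as in this example.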

The second Specific Aim entails the determination of the processes and pathways in which the genes identified in Specific Aim 1 are involved. Because the initially determined set of potential STAT5 binding sites is large, filters help to create a smaller subset that is more manageable and whose sites are associated with genes that are more interesting to examine further. The basis of this filter is a set of genes linked to terms that are associated with processes that can occur in diabetic nephropathy. This set of genes can be determined by searching a gene ontology (GO) database. The filtered set is the intersection between the set of genes with potential STAT5 binding sites and the set of genes related to the GO terms of interest.
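The intersection described above can be sketched with plain set operations. The gene names below are placeholders and the annotation dictionary is hypothetical; in practice the gene sets would come from the motif search and from a GO database query.

```python
def filter_by_go_terms(genes_with_sites, go_annotations):
    """For each GO term, keep only the genes that also carry a potential site."""
    return {
        term: sorted(genes_with_sites & annotated_genes)
        for term, annotated_genes in go_annotations.items()
    }

# Placeholder gene names, used only to show the set logic.
genes_with_sites = {"GeneA", "GeneB", "GeneC", "GeneD"}
go_annotations = {
    "Inflammatory response": {"GeneA", "GeneC", "GeneX"},
    "Apoptotic process": {"GeneB", "GeneY"},
    "Cytokine-mediated signaling pathway": {"GeneZ"},
}

filtered = filter_by_go_terms(genes_with_sites, go_annotations)
for term, genes in filtered.items():
    print(term, genes)
```

Keeping the result per term, instead of one merged set, makes it easy to see afterwards which genes are shared between terms.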

The third Specific Aim entails the validation of potential STAT5 binding sites against experimentally validated binding sites. The initial set of potential STAT5 binding sites can be narrowed by comparing the potential binding sites to STAT5 binding sites experimentally validated via ChIP-seq technology. These data are available via public databases.
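One simple way to compare predicted sites with ChIP-seq results is an interval overlap test, sketched below. The overlap criterion (at least one shared base on the same chromosome, closed intervals in one common coordinate convention) is an assumption and not necessarily the rule applied in this thesis. The two predicted intervals are taken from Table 1; the peak is hypothetical.

```python
def overlaps(site, peak):
    """True if two (chromosome, start, end) closed intervals share a base."""
    return site[0] == peak[0] and site[1] <= peak[2] and peak[1] <= site[2]

def supported_sites(sites, peaks):
    """Keep the predicted sites that overlap at least one ChIP-seq peak."""
    return [site for site in sites if any(overlaps(site, peak) for peak in peaks)]

# Predicted sites (chromosome, start, end) from Table 1, mm10 coordinates.
predicted = [("12", 104214420, 104214434), ("5", 87001306, 87001320)]
# Hypothetical ChIP-seq peak, for illustration only.
peaks = [("12", 104214000, 104214500)]

print(supported_sites(predicted, peaks))
```

BED files use 0-based, half-open coordinates, so peak coordinates read from a BED file would need to be converted before applying a closed-interval test like this one.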

3. LITERATURE REVIEW

The work of this thesis is centered around the hypothesis that transcription factor STAT5 directly inhibits the expression of inflammation-related genes in diabetic nephropathy in mice. Gaining more insight into the causes of diabetes is valuable because diabetes is an important health problem today [1]. Transcription factors contribute to disease development because they can activate or deactivate genes by binding to their DNA sequences [8]. STAT5 binds to promoter, exon, intron, 3’ UTR and 5’ UTR sequences as shown before [12]. To examine the hypothesis, Dr. Coschigano previously performed microarray experiments. The design of microarray experiments in general and further processing of their results were described by Slonim and Yanai [16]. “Gene expression microarrays provide a snapshot of all the transcriptional activity in a biological sample” [16]. “Unlike most traditional molecular biology tools, which generally allow the study of a single gene or a small set of genes, microarrays facilitate the discovery of totally novel and unexpected functional roles of genes” [16]. Microarray experiments can be applied to identify “novel disease subtypes” [16] or they can be utilized to determine differences in gene expression between healthy state and disease state.

DNA sequences to be investigated and associated with genes of interest can be retrieved from the Ensembl database [17] by accessing Ensembl’s application programming interface (API). “The Ensembl project (http://www.ensembl.org) provides genome information for sequenced chordate genomes with a particular focus on human, mouse, zebrafish and rat” [17]. Ensembl’s “resources include evidenced-based gene sets for all supported species” [17]. Ensembl requires the Bioperl toolkit [18] to be installed to access its API. Alternatively, DNA sequences can be retrieved via the web portals of the University of California Santa Cruz (UCSC) [19] and the National Center for Biotechnology Information (NCBI) [20].

A framework to process biological data is Galaxy [21] [22] [23]. “Galaxy (http://galaxyproject.org) is a software system” that “allows experimentalists without informatics or programming expertise to perform complex large-scale analysis with just a Web browser” [21]. Galaxy is available as “a downloadable package that can be deployed in individual laboratories” [21] and can be customized as well to add additional functions.

A typical method to search for TFBSs applies PWMs [24]. PWMs and a set of sequences of interest can be fed into a motif finding tool such as FIMO [25] to discover a set of putative TFBSs which are typically represented by motifs [26]. FIMO can be accessed via its API which requires Opal2 [27] and SOAP::Lite [28] to be installed.
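The idea behind PWM-based motif finding can be sketched as a sliding-window log-odds scan. This is a simplified illustration under stated assumptions (a uniform background of 0.25 and a toy three-position matrix with made-up probabilities); it is not FIMO's implementation, which additionally reports p values and q values for each match.

```python
import math

def log_odds_score(pwm, window, background=0.25):
    """Sum of log2(probability / background) over the motif positions."""
    return sum(math.log2(column[base] / background)
               for column, base in zip(pwm, window))

def best_window(pwm, sequence):
    """Return (score, start) of the highest-scoring window on the given strand."""
    width = len(pwm)
    return max((log_odds_score(pwm, sequence[i:i + width]), i)
               for i in range(len(sequence) - width + 1))

# Toy matrix with made-up probabilities that favours the word "TTC".
toy_pwm = [
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]

score, start = best_window(toy_pwm, "GATTCA")
print(start, round(score, 3))
```

All probabilities in the matrix must be greater than zero for the logarithm to be defined, which is one reason pseudocounts are commonly added when a PWM is built from few sites.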

If TFBSs are not known in advance, motif discovery [29] can be performed via tools such as the “Multiple Em for Motif Elicitation” [30] (MEME) suite [31] or Wordseeker [32]. Motif discovery might be helpful in addition to motif finding to discover new potential TFBSs. “The purpose of motif discovery is to discover patterns in biopolymer (nucleotide or protein) sequences in order to better understand the structure and function of the molecules the sequences represent” [29]. “Sequence motifs can represent … TFBSs …” [29].

A set of potential TFBSs might be statistically relevant, but these sequences do not necessarily bind a transcription factor. Therefore, the associated genes can be filtered depending on their association with a condition of interest such as diabetic nephropathy. GO databases based on annotations by the Gene Ontology Consortium [33] provide associations between genes and a controlled vocabulary that describes the functions of a gene and the pathways in which it is involved. The GO database accessed by the tool AmiGO [34] is based on the Gene Ontology Consortium’s annotation. If GO terms or the functions of a gene are not known in advance, tools such as GO::TermFinder [35] can be utilized to identify GO terms that are associated with a gene or a set of genes.

A different method to filter a set of potential TFBSs is the comparison of each potential TFBS to experimentally validated TFBSs. Typical methods to perform experimental validation of a TFBS are chromatin immunoprecipitation sequencing (ChIP-seq) [36] and electrophoretic mobility shift assays (EMSAs) [9]. “Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is a technique for genome-wide profiling of DNA-binding proteins, histone modifications or nucleosomes” [36]. “Model-based analysis of ChIP-Seq” (MACS) [37] is a tool that can be used to determine peaks in ChIP-seq data. “The gel electrophoresis mobility shift assay (EMSA) is used to detect protein complexes with nucleic acids” [9]. “In the classical [EMSA] assay, solutions of protein and nucleic acid are combined and the resulting mixtures are subjected to electrophoresis under native conditions through polyacrylamide or agarose gel” [9]. Alternatively, potential TFBSs can be compared to the JASPAR database [38] which contains “matrix profiles describing the DNA-binding patterns of transcription factors” [38].

Sequence logos [39] of TFBS motifs are helpful to visualize the binding characteristics of a transcription factor. WebLogo [40] is a sequence logo generator that can be accessed via its website to visualize a given binding motif.

4. RESULTS

The following results support our hypothesis that STAT5 A/B directly inhibits RNA expression of inflammation-related genes in diabetic kidneys as further elaborated in chapter “5. Discussion”.

4.1 Identification of putative STAT5 binding sites

Potential STAT5 binding sites were determined by performing motif finding via the FIMO software [25] and PWMs that represent known STAT5 binding sites on a set of DNA sequences that were hypothesized to contain STAT5 binding sites based on the microarray experiment result analysis as described in chapter “6.1.2 Analysis of data sets”.

The PWMs that represent the long binding motifs for STAT5A and STAT5B are shown in Appendix 8. Both long binding motifs for STAT5A and STAT5B depicted in Figure 1 and Figure 2 were taken from an article by E. Soldaini et al. [10]. The long binding motif logos for STAT5A and STAT5B [40] [39] were created via the WebLogo website [41].

Figure 1. Sequence logo (long STAT5A motif)

Figure 2. Sequence logo (long STAT5B motif)

Overviews of all 11,022 identified potential STAT5A and STAT5B binding sites based on the long motif are depicted in Figure 3 and Figure 4. Of these potential STAT5 binding sites, 5,634 belong to the STAT5A motif and 5,388 belong to the STAT5B motif. The charts for STAT5A and STAT5B (Figure 3 and Figure 4) look very similar. The largest number of potential binding sites for a single gene (169 for STAT5A, 174 for STAT5B) belongs to sequences that are associated with gene Wipf1.

All potential STAT5A binding sites for gene Wipf1 are shown in Table 14 (Appendix 16). Of the potential STAT5A binding sites associated with gene Wipf1, 89 are located on the positive strand and 80 are located on the negative strand. The closest potential STAT5A binding site is 222 nt away from the transcription start site (TSS). The largest distance is 100778 nt. Most potential STAT5A binding sites associated with gene Wipf1 are located in intronic regions (based on the input sequence).

All potential STAT5B binding sites for gene Wipf1 are shown in Table 15 (Appendix 17). Of the potential STAT5B binding sites associated with gene Wipf1, 84 are located on the positive strand and 90 are located on the negative strand. The closest potential STAT5B binding site is 222 nt away from the TSS. The largest distance is 100778 nt. Most potential STAT5B binding sites for gene Wipf1 are located in intronic regions (based on the input sequence).

Figure 3. All genes with potential STAT5A binding sites (every tenth gene name shown)

Figure 5 and Figure 6 focus on the 10 genes with the highest number of potential binding sites for STAT5A and STAT5B. The five genes with the highest number of potential binding sites are in identical order for STAT5A and STAT5B. In the second half, three genes are identical but appear at different positions in the charts for STAT5A and STAT5B. Two genes do not match, but the number of potential STAT5 binding sites decreases more slowly after the five genes with the highest number of potential STAT5 binding sites. This is an indicator of the similarity of STAT5A and STAT5B. The remaining genes with potential STAT5 binding sites belonging to the long motif are depicted in Appendix 4 in Figure 29, Figure 30, Figure 31 and Figure 32.

Figure 4. All genes with potential STAT5B binding sites (every eleventh gene name shown)

Figure 5. Ten genes with highest number of potential STAT5A binding sites

Figure 6. Ten genes with highest number of potential STAT5B binding sites

Table 1 shows five potential STAT5 binding sites for the long motif with the smallest p values. All five potential STAT5 binding sites were found by FIMO on the opposite strand compared to the input sequence and were assigned the sequence type (promoter, exon, intron, 3’ UTR, 5’ UTR) and gene name of the input sequence.

Table 1

Five potential STAT5 binding sites with the lowest p values (long motif)

Binding site ID | Transcription factor | Gene | Sequence type | Chromosome | Str. (input seq.) | Str. | Potential binding site start (mm10) | Potential binding site end (mm10) | p value | q value | Score | Potential STAT5 binding site
23307 | STAT5A | Serpina3f | Prom. | 12 | -1 | 1 | 104214420 | 104214434 | 0.00000015 | 0.427 | 17.9957 | GATTTCTAGGAAATA
23310 | STAT5B | Ugt2b35 | Exon | 5 | -1 | 1 | 87001306 | 87001320 | 0.000000159 | 0.475 | 18.5276 | GGATTCCTGGAATTT
23312 | STAT5A | Vav1 | Intron | 17 | -1 | 1 | 57323711 | 57323725 | 0.00000019 | 0.427 | 17.8906 | CAATTCCAGGAAATT
23319 | STAT5A | Dcdc2a | Intron | 13 | -1 | 1 | 25091907 | 25091921 | 0.000000246 | 0.427 | 17.7855 | TAATTCTTGGAAATC
23320 | STAT5B | Ddx60 | Intron | 8 | -1 | 1 | 62033543 | 62033557 | 0.000000275 | 0.475 | 18.1995 | CATTTCTTGGAATTT

The PWM that represents the short STAT5 binding motif is shown in Appendix 7. The short STAT5 binding motif as depicted in Figure 7 is not specific to STAT5A or STAT5B and is based on 26 experimentally validated STAT5 binding sites [15]. The short STAT5 binding motif logo [40] [39] was created via the WebLogo website [41].

Figure 7. Sequence logo (short STAT5 motif)

The distribution of potential STAT5 binding sites per gene belonging to the short motif is depicted in Figure 8 and is similar to the distributions of potential STAT5A and STAT5B binding sites per gene belonging to the long motif. The highest number of potential STAT5 binding sites associated with the short motif (190 binding sites) is associated with gene Prkcb. All binding sites are shown in Table 13 (Appendix 15). Of the potential STAT5 binding sites associated with gene Prkcb, 96 are located on the positive strand and 94 are located on the negative strand. The closest potential STAT5 binding site is 8394 nt away from the TSS. The largest distance is 339801 nt. Most potential STAT5 binding sites for gene Prkcb are located in intronic regions (based on input sequences). The ten genes with the largest number of potential STAT5 binding sites are depicted in the chart in Figure 9 and show strong similarities to the ten genes with the largest number of binding sites for STAT5A and STAT5B (long motif). All three sets have eight genes in common. The remaining genes with potential STAT5 binding sites belonging to the short motif are depicted in Appendix 4 in Figure 33 and Figure 34.


Figure 8. All genes with potential STAT5 binding sites (short motif, every tenth name printed)


Figure 9. Ten genes with highest number of potential STAT5 binding sites

Table 2 shows five potential STAT5 binding sites for the short motif with the smallest p values. Because 582 potential STAT5 binding sites share the same smallest p value, the same q value and the same highest score, these sites were sorted alphabetically in ascending order by gene name, and the first five sites in this order (those closest to the letter “a”) were chosen for Table 2. Potential STAT5 binding sites that were found by FIMO on the strand opposite to the input sequence were assigned the sequence type (promoter, exon, intron, 3’ UTR, 5’ UTR) and gene name of the input sequence.


Table 2

Five potential STAT5 binding sites with the lowest p values (short motif)

| Binding site ID | Gene | Sequence type | Chromosome | Str. (input seq.) | Str. | Potential binding site start (mm10) | Potential binding site end (mm10) | p value | q value | Score | Potential STAT5 binding site |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 2810459M11Rik | Exon/3’ UTR | 1 | -1 | 1 | 86054362 | 86054370 | 0.00000403 | 0.262 | 12.9748 | TTCCAGGAA |
| 8 | A230050P20Rik | Intron/Prom. | 9 | 1 | 1 | 20869194 | 20869202 | 0.00000403 | 0.262 | 12.9748 | TTCCAGGAA |
| 19 | Amph | Intron | 13 | -1 | 1 | 18975927 | 18975935 | 0.00000403 | 0.262 | 12.9748 | TTCCAGGAA |
| 20 | Amph | Intron | 13 | -1 | 1 | 18978374 | 18978382 | 0.00000403 | 0.262 | 12.9748 | TTCCAGGAA |
| 21 | Amph | Intron | 13 | -1 | 1 | 19012019 | 19012027 | 0.00000403 | 0.262 | 12.9748 | TTCCAGGAA |
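The selection rule for Table 2 (smallest p value first, ties broken alphabetically) can be sketched in a few lines of Python. This is only an illustration, not the thesis code: the `Site` record, the `top_sites` helper and the two rows marked as made up are assumptions, and the tie-break on gene name is inferred from the ordering visible in Table 2.

```python
from typing import NamedTuple


class Site(NamedTuple):
    site_id: int
    gene: str
    p_value: float


def top_sites(sites, n=5):
    """Return the n sites with the smallest p values.

    Ties are broken by case-insensitive gene name, then by site ID
    (an assumed rule that reproduces the order shown in Table 2).
    """
    return sorted(sites, key=lambda s: (s.p_value, s.gene.lower(), s.site_id))[:n]


sites = [
    Site(21, "Amph", 4.03e-06),
    Site(8, "A230050P20Rik", 4.03e-06),
    Site(900, "HypotheticalGeneZ", 4.03e-06),  # made-up row for illustration
    Site(2, "2810459M11Rik", 4.03e-06),
    Site(20, "Amph", 4.03e-06),
    Site(19, "Amph", 4.03e-06),
    Site(901, "HypotheticalGeneY", 9.0e-06),   # made-up row for illustration
]

selected = top_sites(sites)
print([s.site_id for s in selected])
```

Because digits sort before letters, gene names that start with a number (such as 2810459M11Rik) come first under this rule.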

All three sets of genes with potential STAT5 binding sites are depicted in the Venn diagram in Figure 10. The majority of all genes with potential STAT5 binding sites belong to all three sets of potential STAT5 binding sites.

Figure 10. Genes with STAT5 binding sites distinguished via covered motifs

4.2 Filtering genes by GO terms

The results of comparing the names of genes with potential STAT5 binding sites to the names of genes that are associated with the GO terms of interest (defined in sub-chapter “6.5.3.2 Filter potential STAT5 binding sites via gene ontology database” in the “Methods” chapter) were identical for the short motif and the long STAT5 motif (combined result set for STAT5A and STAT5B), with small differences between the STAT5A and STAT5B results.

Figure 11 depicts the relationship between genes with potential STAT5A or STAT5B binding sites and genes that are associated with the term “Inflammatory Response”. 43 genes (Table 7, Appendix 9) are included in the intersection of these two sets. Out of those 43 genes, 41 genes match the GO term and have both potential STAT5A and STAT5B binding sites. Gene Cd14 matches the GO term but contains only potential STAT5A binding sites. Gene Ccl8 matches the GO term but contains only potential STAT5B binding sites.

Figure 11. Genes with potential STAT5 binding sites related to term "Inflammatory Response"

Figure 12 depicts the relationship between genes with potential STAT5A or STAT5B binding sites and genes that are associated with the term “Apoptotic process”. 58 genes (Table 8, Appendix 10) are included in the intersection of these two sets. Out of those 58 genes, 54 genes match the GO term and have both potential STAT5A and STAT5B binding sites. Genes Birc5 and Lcn2 match the GO term but contain only potential STAT5A binding sites. Genes Aif1 and Casp1 match the GO term but contain only potential STAT5B binding sites.

Figure 12. Genes with potential STAT5 binding sites related to term "Apoptotic process"

Figure 13 depicts the relationship between genes with potential STAT5A or STAT5B binding sites and genes that are associated with the term “Cytokine-mediated signaling pathway”. 24 genes (Table 9, Appendix 11) are included in the intersection of these two sets. Out of those 24 genes, 23 genes match the GO term and have both potential STAT5A and STAT5B binding sites. Gene Cxcr3 matches the GO term but contains only potential STAT5A binding sites. No gene that matches the GO term contains only potential STAT5B binding sites.


Figure 13. Genes with potential STAT5 binding sites related to term "Cytokine-mediated signaling pathway"

Figure 14 depicts the relationship between all three aforementioned intersection sets. All three sets have the five genes Adipoq, Ccl2, Ccl5, Ccr5 and Il1b in common.
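The overlap shown in Figure 14 amounts to a three-way set intersection. The sketch below illustrates this with abbreviated gene sets: each set lists only genes that the text names for that GO term, whereas the full lists in Tables 7 to 9 contain more genes, and a gene shown in only one set here may well belong to others. The variable names are illustrative assumptions.

```python
from collections import Counter

# Abbreviated, illustrative subsets of the genes that matched each GO term.
inflammatory_response = {"Adipoq", "Ccl2", "Ccl5", "Ccr5", "Il1b", "Cd14", "Ccl8", "P2rx7"}
apoptotic_process = {"Adipoq", "Ccl2", "Ccl5", "Ccr5", "Il1b",
                     "Birc5", "Lcn2", "Aif1", "Casp1", "P2rx7"}
cytokine_signaling = {"Adipoq", "Ccl2", "Ccl5", "Ccr5", "Il1b", "Cxcr3"}

go_sets = [inflammatory_response, apoptotic_process, cytokine_signaling]

# Genes that matched all three GO terms.
matched_all = set.intersection(*go_sets)

# Number of GO terms matched per gene, to find genes matching exactly two.
term_counts = Counter(gene for go_set in go_sets for gene in go_set)
matched_two = {gene for gene, count in term_counts.items() if count == 2}

print(sorted(matched_all))
print(sorted(matched_two))
```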

Considering all identified potential STAT5 binding sites based on the long motif, the lowest p value is about 0.00000004 and the largest p value is about 0.0001. For potential STAT5 binding sites associated with Adipoq, the p values range from about 0.0000003 to about 0.0001. For Ccl2, potential STAT5 binding sites have p values between about 0.000004 and about 0.00007. For Ccl5, the p values range from about 0.000007 to about 0.00001. For Ccr5, potential STAT5 binding sites have p values between about 0.00001 and about 0.0001. For Il1b, the p values range from about 0.000008 to 0.0001.

Considering all identified potential STAT5 binding sites based on the short motif, the lowest p value is about 0.000004 and the largest p value is about 0.0001. For potential STAT5 binding sites associated with Adipoq, the p values range from about 0.000004 to about 0.00005. For Ccl2, potential STAT5 binding sites have p values between about 0.000004 and about 0.00001. For Ccl5, the p values range from about 0.00004 to about 0.00007. For Ccr5, potential STAT5 binding sites have p values between about 0.000004 and about 0.00009. For Il1b, the p values range from about 0.00005 to 0.00006.

Figure 14. Relationship between genes related to GO terms

4.3 Comparison of putative STAT5 binding sites to published ChIP-seq results

Potential STAT5 binding sites were compared to the ChIP-seq data set with GEO accession number GSM784027 identifying STAT5 binding sites in mouse liver [42]. 412 potential binding sites for STAT5A and STAT5B belonging to the long motif could be successfully compared. A successful comparison is defined in sub-chapter “6.5.3.1 Comparison of potential STAT5 binding sites to experimentally validated STAT5 binding sites (ChIP-seq)” in the “Methods” chapter. The sites are associated with 80 genes (Table 10, Appendix 12). These results indicate that these 412 putative STAT5 binding sites identified in a kidney experiment could bind STAT5 in the liver. Likewise, 257 potential binding sites for STAT5 belonging to the short motif could be successfully compared to the ChIP-seq results. These sites are associated with 74 genes (Table 11, Appendix 13). Figure 15 depicts the intersection of both sets, which contains 61 genes (Table 12, Appendix 14).
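A comparison of predicted sites against ChIP-seq peaks can be pictured as an interval-overlap test. The sketch below is a minimal illustration under stated assumptions: it treats a comparison as successful when a site shares at least one base with a peak on the same chromosome, which may differ from the criterion defined in the “Methods” chapter, and all coordinates and helper names are hypothetical.

```python
def overlaps(site, peak):
    """Return True if two (chrom, start, end) intervals share at least one base.

    Coordinates are treated as 1-based and inclusive (an assumption; files in
    BED format use 0-based, half-open coordinates and would need converting).
    """
    return site[0] == peak[0] and site[1] <= peak[2] and peak[1] <= site[2]


def sites_in_peaks(sites, peaks):
    """Keep only those predicted sites that overlap at least one peak."""
    return [site for site in sites if any(overlaps(site, peak) for peak in peaks)]


# Hypothetical coordinates, not taken from GSM784027 or from the FIMO results.
peaks = [("chr9", 1000, 1200), ("chr6", 500, 650)]
sites = [("chr9", 1190, 1204), ("chr9", 1201, 1215), ("chr6", 100, 114)]

confirmed = sites_in_peaks(sites, peaks)
print(confirmed)
```

For data sets of the size reported here a nested loop is sufficient; larger inputs would usually be sorted first or handled with an interval tree.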

Figure 15. Relationship between successfully compared STAT5 binding sites (GEO: GSM784027)

Gene Ccr5, one of the five genes that matched all three GO terms, is one of these 61 genes (Figure 16). Ccr5’s corresponding potential binding sites are indexed via binding site IDs 30759, 30760, 24833 and 24834 (long motif, Table 6 in Appendix 6) and 735 and 736 (short motif, Table 5 in Appendix 5). Gene P2rx7 matched GO terms “Inflammatory response” and “Apoptotic process”, and one associated potential STAT5 binding site could be successfully compared to the ChIP-seq data set (Figure 16). This binding site is indexed via binding site ID 31408 (Table 6 in Appendix 6). P2rx7 is one of the 80 genes whose associated potential STAT5 binding sites could be successfully compared against the ChIP-seq data set (Table 10, Appendix 12).

Figure 16. Relationship between genes linked to GO terms and ChIP-seq data comparison


As a brief overview of the findings, a total of 11,022 potential STAT5 binding sites were identified initially based on the PWMs for the long motifs for STAT5A and STAT5B. This initial set was filtered to 52 potential STAT5 binding sites via comparison to three GO terms and eventually yielded one potential STAT5 binding site which appeared in a ChIP-seq data set with experimentally validated STAT5 binding sites and whose associated gene matched all three GO terms (Figure 16). In addition, another potential STAT5 binding site associated with gene P2rx7 appeared in the ChIP-seq data set and matched two GO terms (“Inflammatory response” and “Apoptotic process”).

Likewise, a total of 6,135 potential STAT5 binding sites were initially determined based on the PWM for the short motif for STAT5. This set was filtered to 30 potential STAT5 binding sites via comparison to three GO terms and yielded one potential STAT5 binding site by additional comparison to experimentally validated (ChIP-seq) STAT5 binding sites. This potential STAT5 binding site based on the short motif is identical to the 9 nt core of the potential STAT5 binding site based on the long motif described in the previous paragraph and is associated with gene Ccr5. The potential binding site based on the long STAT5 motif contains an additional 3 nt of flanking sequence on each end compared to the same potential site based on the short STAT5 motif.


5. DISCUSSION

The findings in this thesis are based on computational methods that apply statistical models. Therefore, the large set of potential STAT5 binding sites should be interpreted carefully with the knowledge that STAT5 does not necessarily bind to all of them. For example, the actual binding of a transcription factor such as STAT5 to potential binding sites might require assistance by other transcription factors bound to neighboring sites or be prevented by competitive factors bound at overlapping sites [43]. The chromatin state might be an additional factor that limits access of a transcription factor to a binding site such as a promoter [44].

The methods applied in this study did not validate potential STAT5 binding sites, but they helped to create a more manageable list of potential binding sites which is feasible to validate in the laboratory. The initial set of 11,022 potential STAT5 binding sites (long motif) was narrowed to 52 potential STAT5 binding sites of particular interest (GO filter) and eventually to one potential STAT5 binding site (ChIP-seq data set filter) whose associated gene matched all three GO terms. In addition, one potential STAT5 binding site associated with gene P2rx7, which matched two GO terms (“Inflammatory response”, “Apoptotic process”), was found in the ChIP-seq data as well. The initial set of 6,135 potential STAT5 binding sites (short motif) was narrowed to 30 potential STAT5 binding sites (GO filter) and eventually to one potential STAT5 binding site (ChIP-seq data set filter) whose associated gene matched all three GO terms. The short potential STAT5 binding site is identical to the 9 nt core of the previously mentioned long potential STAT5 binding site and therefore belongs to gene Ccr5. This site seems to be the most suitable potential STAT5 binding site to validate experimentally in the laboratory. However, it should be kept in mind when these results are interpreted that the conditions of the ChIP-seq experiment (liver cells, no disease state) were different from the conditions in Dr. Coschigano’s microarray experiment (kidney tissue, diabetes).

Gene Ccr5 [45] (“chemokine (C-C motif) [receptor] 5”, human) is “a member of the beta [chemokine receptor] family” and “is known to be an important co-receptor for macrophage-tropic virus, including HIV” [46]. Ccr5 is associated with diabetes [46] and its expression affects the phosphorylation of STAT5 [47]. These data support our hypothesis that STAT5 plays a role in diabetes and that it has a protective function. It still needs to be determined whether STAT5 binds to the Ccr5 gene in kidney cells in a diabetic state. A successful experimental validation of STAT5 binding to Ccr5 in kidney cells of diabetic animals would further support our hypothesis that STAT5 inhibits the RNA expression of inflammation-related genes in diabetic nephropathy.

Potential STAT5 binding sites were identified in four additional genes that matched all three GO terms but were not found in the ChIP-seq results.

Gene Adipoq [48] (“adiponectin, C1Q and collagen domain containing”, human) is “expressed in adipose tissue exclusively” and “[t]he encoded protein circulates in the plasma and is involved with metabolic and hormonal processes” [49]. Adipoq is associated with type 2 diabetes [49] and its “regulatory regions … contain the Stat5 DNA-binding sequences and were demonstrated, by gel shift experiments in vitro” previously [50]. This information supports the connections between STAT5 and diabetes on the one hand and between Adipoq and STAT5 on the other hand. These data do not support STAT5’s hypothesized role in kidney cells, but they confirm STAT5’s role in diabetes in general.

Another gene that matched all three GO terms is Ccl2 (“CCL2 chemokine (C-C motif) [ligand] 2”, human) [51]. “Chemokines are a superfamily of secreted proteins involved in immunoregulatory and inflammatory processes” [51]. Ccl2 is connected to diabetes, kidney disease and the JAK-STAT pathway [52]. Ccl2 is also known as “monocyte chemoattractant protein-1 (MCP-1)” [53]. Tanimoto et al. previously identified two STAT5 binding sites, one of which was located upstream of the TSS and could be experimentally validated via EMSA [53]. These findings support our hypothesis.

Gene Ccl5 [54] (“chemokine (C-C motif) ligand 5”, human), also known as RANTES (“regulated upon activation, normal T cells expressed and secreted”), belongs, as does Ccl2, to the chemokine “superfamily” [55]. Ccl5 “is one of the major HIV-suppressive factors” and “functions as one of the natural ligands for the chemokine receptor chemokine (C-C motif) receptor 5 (CCR5)” [55]. In addition, it has been shown that “CCL5 functioned by receptor-mediated activation of the STAT5-Cyclin D1 pro-proliferative pathway” [56]. These findings support our prediction that STAT5 and Ccl5 are connected, but evidence is missing as to how STAT5 and Ccl5 interact in diabetic kidneys.

Gene Il1b (“interleukin 1, beta”, human) codes for a protein that “is a member of the interleukin 1 cytokine family” [57]. “This cytokine is an important mediator of the inflammatory response, and is involved in a variety of cellular activities, including cell proliferation, differentiation, and [apoptosis]” [57]. Il1b is connected to diabetes and kidney disease as well [58], supporting our hypothesis.

Gene P2rx7 (“purinergic receptor P2X, ligand-gated ion channel, 7”, human) matched GO terms “Inflammatory response” and “Apoptotic process” and codes for a protein that “belongs to the family of purinoceptors for ATP” [59]. P2rx7 is connected to “pancreatic diseases” [60]. These data suggest that additional experimental validation is required to determine the role of STAT5 in the regulation of P2rx7 RNA expression.

Many potential STAT5 binding sites are located distal to the TSS and are associated with intronic sequences. This is a rather atypical location for transcription factor binding. However, previous experiments showed preferential binding of STAT5 to intronic sequences as well [12]. It is possible that transcription factors bound to intronic sequences regulate transcription from a distance, made possible by the three-dimensional structure of DNA, as suggested previously by Hubler and Scammell [61].

The results for STAT5A and STAT5B are very similar with regard to the genes with the largest number of potential STAT5 binding sites and the application of the GO terms filter. This is consistent with the initially mentioned functional similarity between STAT5A and STAT5B. Small differences were identified between the two sets of genes associated with either potential STAT5A or STAT5B binding sites and the GO terms to which they correlate. This seems to be due to the small differences between the long STAT5A and STAT5B consensus binding motifs that contain flanking sequences. All genes with potential STAT5 binding sites based on the short motif were in the lists of genes that matched GO terms and whose potential binding sites were based on the long STAT5 motif.

Genes with a relatively large number of potential STAT5 binding sites (Wipf1, Prkcb) are not necessarily more likely to bind STAT5 per se. The large number of potential sites might be due to a repetitive pattern that is statistically significant, so binding still needs to be experimentally validated in the laboratory. The relatively large number of potential STAT5 binding sites (short motif) with identical p values, q values and scores might be due to the small length of the short motif and the small set of 26 STAT5 binding sites that was used to build the PWM for the short motif.

5.1 Significance

Because diabetes mellitus is considered to be “one of the most important health problems in the world” [1] and because this disease negatively impacts patients during their whole lives [2], it is important to research the cause of diabetes mellitus, its resulting complications and the biological pathways involved. As researchers try to examine what causes the disease on a molecular level, computational methods are well suited to gain more insight because they are less expensive than laboratory experiments and independent of cell type and condition. Transcription factor STAT5 regulates RNA expression of genes, and its dysregulation presumably allows the expression of inflammation-related genes and therefore contributes to the development of diabetic nephropathy. This hypothesis is supported by previous findings [62] [63]. Dysregulation of gene expression by transcription factors, caused, for example, by alterations in the genes that code for the transcription factors themselves or in their target binding sites, can lead to diseases [62] [63]. Inflammation-related genes that bind STAT5 and are repressed in healthy organisms could be potential targets of new drugs. Determining these genes could also identify new pathways that play a role in the development of diabetic nephropathy.

5.2 Future work

A small set of interesting potential STAT5 binding sites should be experimentally validated via EMSAs in the laboratory. Interesting candidates are the potential STAT5 binding sites associated with gene Ccr5 because Ccr5 matches all three GO terms, some of Ccr5’s associated potential binding sites were successfully compared to ChIP-seq data (these sites are of particular interest), and Ccr5 was up-regulated in the absence of STAT5 in diabetic kidney cells (but not in non-diabetic kidney cells in the absence of STAT5) in mice in Dr. Coschigano’s microarray experiments as described in sub-chapter “6.1 Microarray experiments”. Further candidates for experimental validation are the potential STAT5 binding sites associated with genes Adipoq, Ccl2, Ccl5, and Il1b because these genes matched all three GO terms. Gene P2rx7 matched two GO terms (“Inflammatory response”, “Apoptotic process”) and one associated potential STAT5 TFBS could be successfully compared to ChIP-seq data, which makes this site an interesting validation candidate as well.

The validation of potential STAT5 binding sites via ChIP-seq data should be extended by retrieving further ChIP-seq data sets with experimentally validated STAT5 binding sites and comparing them to the set of potential STAT5 binding sites.

In addition, Table 1 and Table 2 each show five potential STAT5 binding sites, for the long and the short motif respectively. These potential STAT5 binding sites are associated with genes Serpina3f, Ugt2b35, Vav1, Dcdc2a and Ddx60 for the long motif and with genes 2810459M11Rik, A230050P20Rik and Amph for the short motif. These sites are of interest as well because they have the lowest p values and are therefore the most statistically significant.

Early diagnosis of diabetic nephropathy could be supported by analyzing expression levels of inflammation-related genes in kidney cells. Unusually high expression levels of inflammation-related genes could be an early indicator of diabetic nephropathy and could help to stop its development.

Another future task is the retrieval and analysis of the 20 genes, 518 5’ UTR sequences and 613 3’ UTR sequences that were not found in the Ensembl database [17].

6. METHODS

The whole process, from identifying genes up-regulated in the absence of STAT5 in diabetic kidneys in mice through the experimental validation of a small subset of potential STAT5 TFBSs in the laboratory, is depicted in Figure 17.

Figure 17. High level concept


6.1 Microarray experiments

Four different types of mice were utilized for microarray experiments as illustrated in Figure 18: mice without diabetes and with Stat5 active, mice with type 1 diabetes mellitus and with Stat5 active, mice without diabetes and with Stat5 inactive, and mice with type 1 diabetes mellitus and with Stat5 inactive.


Figure 18. Microarray experiments and data sets (Original mouse picture [64] modified)

Type 1 diabetes mellitus mice were produced by Dr. Coschigano via inducing diabetes in male wild type mice at the age of two months using STZ [11]. The mice developed diabetes approximately one week after STZ injection. Kidney samples of all four mouse types were collected after 11 weeks of diabetes and analyzed.


6.1.1 Description and interpretation of summarized microarray data sets

RNA expression in the kidney samples of all four types of mice was compared via microarray analysis by Dr. Coschigano and Dr. K. Wyatt McMahon to determine the role of transcription factor STAT5 in diabetic nephropathy. RNA expression was measured for each of these four types of mice and compared in the four ways shown in Figure 18: altered in non-diabetic Stat5 knock out (SKO) mice in comparison to non-diabetic wild type mice (data set ND), altered in diabetic wild type mice in comparison to non-diabetic wild type mice (data set WT), altered in diabetic SKO mice in comparison to diabetic wild type mice (data set DB), and altered in diabetic SKO mice in comparison to non-diabetic SKO mice (data set SKO). Whereas data set ND shows the impact of the Stat5 knock out in non-diabetic mice, data set DB shows the impact of the Stat5 knock out in mice with type 1 diabetes mellitus. Similarly, data set WT examines the impact of diabetes on wild type mice whereas the SKO data set highlights the impact of diabetes on SKO mice. Dr. K. Wyatt McMahon performed analyses of the microarray results which yielded two subsets of genes for each data set: genes up-regulated and genes down-regulated in the respective comparison. Only the up-regulated subsets were used for this project.


Figure 19. Microarray data set matrix

6.1.2 Analysis of data sets

To determine genes that were solely up-regulated in kidneys of mice with diabetic nephropathy and the Stat5 deletion, and not in non-diabetic mice with the Stat5 deletion, the intersection (SKO, 104 genes) between the up-regulated genes in data sets ND (upND) and DB (upDB) was built by Xiaoyu Liang prior to this thesis, as illustrated in Figure 20. These 104 genes were subtracted from the 553 genes in data set upDB, which yielded 449 genes that were solely up-regulated in kidneys of Stat5 knock out mice with diabetic nephropathy.
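The subtraction described above is a plain set difference. The following sketch reproduces only the arithmetic (553 - 104 = 449) with synthetic placeholder gene names; the real gene lists are not part of this chapter, and the size chosen for the upND set is an arbitrary assumption.

```python
# Synthetic placeholders standing in for the real gene lists.
up_db = {f"gene_{i}" for i in range(553)}            # up-regulated in data set DB
up_nd = ({f"gene_{i}" for i in range(104)}           # genes shared with upDB
         | {f"nd_only_{i}" for i in range(10)})      # arbitrary number of upND-only genes

# Genes up-regulated by the Stat5 knock out regardless of diabetes.
shared = up_db & up_nd

# Genes up-regulated by the Stat5 knock out only in diabetic mice.
diabetes_specific = up_db - shared

print(len(shared), len(diabetes_specific))
```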


Figure 20. Relationship between upND and upDB data sets

6.2 Literature search for experimentally validated STAT5 binding sites

A literature search showed that STAT5A and STAT5B homodimers share “gamma-activated sequence (GAS) motifs” [10] with the same base motif “TTC(T/C)N(G/A)GAA” [10]. More specifically, STAT5A prefers to bind to the motif "(A/g)(T/A)TTC(C/T)N(G/a)GAA(A/tc)(T/c)" [10] and STAT5B preferably binds to the motif "(A/tg)(T/A)TTC(C/T)(T/cag)(G/a)GAA(T/A)(T/ca)".

In addition to this long STAT5 binding motif (15nt), there is evidence (namely 26 experimentally validated STAT5 binding sites) that STAT5 binds to short (9nt) motifs [15].
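To make the motif notation concrete, the base motif can be translated into a regular expression and matched against a sequence. This is a simplified sketch and not the method used in this thesis, which relies on PWMs and FIMO: the helper names `motif_to_regex` and `find_sites` are hypothetical, and all alternatives inside parentheses are treated as equally acceptable regardless of letter case.

```python
import re


def motif_to_regex(motif):
    """Translate notation such as 'TTC(T/C)N(G/A)GAA' into a regex string."""
    # "(T/C)" becomes the character class "[TC]"; lower-case letters are upper-cased.
    pattern = re.sub(
        r"\(([^)]*)\)",
        lambda m: "[" + m.group(1).replace("/", "").upper() + "]",
        motif,
    )
    # "N" stands for any nucleotide.
    return pattern.replace("N", "[ACGT]")


def find_sites(motif, sequence):
    """Return (0-based start, matched text) for each non-overlapping match."""
    regex = re.compile(motif_to_regex(motif))
    return [(m.start(), m.group()) for m in regex.finditer(sequence.upper())]


base_motif = "TTC(T/C)N(G/A)GAA"
print(motif_to_regex(base_motif))
print(find_sites(base_motif, "GATTTCTAGGAAATA"))  # first site listed in Table 1
```

A regular expression only answers yes or no, whereas a PWM weights each position, which is why PWM-based scanning is preferred for ranking sites.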


6.2.1 Creation of position weight matrices

Two PWMs that represent the long STAT5A and the long STAT5B binding motifs were taken from an article [10] found in the literature. These PWMs were converted into the MEME format using the MEME software suite [31] and are listed in Appendix 8.

The PWM that represents the short STAT5 motif (Appendix 7) was constructed based on 26 experimentally validated STAT5 binding sites [15]. For each position in the 9 nt long motif, the number of occurrences of each nucleotide was counted and divided by the total number of validated binding sites (26). The PWM file (MEME format) for the long motif was used as a template for the PWM file of the short STAT5 motif.
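The counting procedure described above can be sketched as follows. The four example sequences are invented for illustration and are not the 26 validated binding sites from [15]; the function name `position_frequencies` is likewise an assumption.

```python
from collections import Counter


def position_frequencies(sites, alphabet="ACGT"):
    """Build a position frequency matrix from equal-length binding sites.

    Returns one dict per motif position that maps each nucleotide to its
    count divided by the number of sites.
    """
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for column in zip(*sites):  # iterate over motif positions
        counts = Counter(column)
        matrix.append({base: counts[base] / len(sites) for base in alphabet})
    return matrix


# Invented 9 nt sites for illustration only.
toy_sites = ["TTCCAGGAA", "TTCTAGGAA", "TTCCTGGAA", "TTCTCAGAA"]
pwm = position_frequencies(toy_sites)

for position, row in enumerate(pwm, start=1):
    print(position, row)
```

Each row sums to 1, which matches the letter-probability layout that the MEME format expects for a motif matrix.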

6.3 Data storage (MySQL database)

All relevant DNA sequences that were retrieved and the FIMO results (potential STAT5 binding sites) were stored in a MySQL 5.5.31-0 (Debian Linux Wheezy edition) database. The database scheme is illustrated in Figure 21. The database was created using the SQL script “create_db.sql” shown in Appendix 3.


Figure 21. Data base scheme (keys with grey background)

6.4 Framework for program execution and to display results (Galaxy)

Galaxy [23] [21] [22] (downloaded on July 4th, 2013) was used locally as a platform to run Perl programs and to document the inputs and outputs of program runs. Figure 22 shows the Galaxy welcome screen with a menu pane on the left containing all available Perl programs described in Table 4 to aid discovery and validation of potential STAT5 binding sites. The right pane on the Galaxy welcome screen shows the history of program runs in the Galaxy framework. The center of the welcome screen is used to display the GUI elements that are connected to the underlying programs.


Figure 22. Galaxy welcome screen

Figure 23 shows the Galaxy GUI connected to Perl program “retrieve_sequences.pl”. The GUI is defined via XML file “retrieve_sequences.xml” (shown in Appendix 2), which, like all other XML and program files of this project, is located in folder “galaxy-dist/tools/stat5_analysis”. The user selects a list of genes that were uploaded via link “0) Upload Sequences”, the sequence types to retrieve, an organism and the promoter length, chooses whether one promoter per gene or per transcript should be retrieved, and selects a database in which the retrieved sequences will be stored. After clicking “Execute”, the job is started and added to the history pane on the right of the Galaxy GUI.

Table 3 shows important folders of the Galaxy framework. File “tool_conf.xml” (Appendix 2) defines the content of the left pane of the Galaxy GUI.

Table 3

Important Galaxy folders

| Folder | Relevant content |
|---|---|
| galaxy-dist/tools/stat5_analysis | STAT5 project programs and XML files |
| galaxy-dist/ | Galaxy left pane configuration file “tool_conf.xml” |


Figure 23. Galaxy: Retrieve DNA sequences (program “retrieve_sequences.pl”)

Screenshots of the GUIs for remaining programs are located in Appendix 1.

6.5 Perl programs

The programs relevant to this thesis are listed in Table 4. Five of these six programs are Perl programs. The remaining program was written in Python and was taken from the Galaxy default installation. All programs receive their input parameters from the corresponding Galaxy GUI screen and write their output either to specified files or to the standard output device, which is shown by the Galaxy GUI.


Table 4

Programs

| Program name | Description |
|---|---|
| chip_seq__comparison.pl | A Perl program that compares potential STAT5 binding sites from a database table to a ChIP-seq data set from a file in BED format and writes the results to an output file. |
| find_motifs.pl | This Perl program performs motif finding by running the web version of FIMO [25] with a passed PWM and optionally on a selection of sequence types. The returned potential STAT5 binding sites are stored in a MySQL database table (“Fimo_results” or “Fimo_results_short_gas__def_bg”) for further processing. This program utilizes the client libraries version 1.9.2 of the Opal service architecture [27] for Perl, which were installed locally. The program also utilizes a set of Perl modules that belong to the simple object access protocol (SOAP) [28]. Package “libsoap-lite-perl-0.714-1” for Debian Linux was installed. Access to the FIMO API was based on a reference Perl program (http://meme.nbcr.net/meme/doc/examples/sample_opal_scripts/FimoClient.pl, accessed: 11/20/2012, 7:05pm). Parts of this Perl program were used as solutions for assignments/exams in Ohio University course BME 5170. |
| import_bed_file.pl | A Perl program that imports converted coordinates of potential STAT5 binding sites (file in BED format) into the database with potential STAT5 binding sites (table “Fimo_results” or “Fimo_results_short_gas__def_bg”). |
| retrieve_sequences.pl | This Perl program retrieves promoter, exon, intron, 3’ UTR and/or 5’ UTR sequences for a set of genes from the Ensembl database over the internet and stores the returned DNA sequences in the MySQL database table “Sequences”. The user can specify the length of a promoter sequence and whether one promoter will be retrieved per gene or per transcript. This program utilizes BioPerl [18] version 1.2.3. BioPerl is a set of Perl modules that were tailored to create programs in the area of bioinformatics. This program also uses the Ensembl Perl API version 69. The content of this program is based on an Ensembl tutorial (http://useast.ensembl.org/info/docs/api/core/core_tutorial.html) and on Rami Al Ouran's Perl script that accesses the Ensembl database and is part of the Wordseeker tool. |
| upload_seqs.py | This is a default Galaxy Python program that uploads data such as sequences or gene names into Galaxy and makes it accessible via the Galaxy history pane. This program was part of the default Galaxy installation and was slightly modified. |
| visualize.pl | A Perl program that visualizes potential STAT5 binding sites from table “Fimo_results” or from table “Fimo_results_short_gas__def_bg” as a bar chart. The user can filter the list of potential STAT5 binding sites per gene, which is sorted in descending order, according to sequence type, the number of genes to display, the position of the first gene in the list, the largest p value allowed and the transcription factor. The user can further specify the frequency of labels on the x axis (gene names) and the color of the bars. Parts of this Perl program were used as solutions for assignments/exams in Ohio University course BME 5170. |

6.5.1 Sequence retrieval from the Ensembl database

DNA sequences were retrieved via Galaxy GUI link “1) Retrieve Sequences (Ensembl) and Update Database”, which is connected to Perl program “retrieve_sequences.pl”. Promoter, exon, intron, 3’ UTR and 5’ UTR sequences were retrieved for 429 out of 449 genes for the organism “Mus musculus” via the Ensembl database Perl API. 20 genes were missing in Ensembl database release 69 (mus_musculus_core_69_38), as were 518 5’ UTR and 613 3’ UTR sequences. A promoter sequence was defined as the 1000 bp upstream of the start site (toward the 5’ end) of each transcript. 13,852 DNA sequences were retrieved in total. 1,272 of these are promoter sequences, 6,358 are exon sequences, 4,707 are intron sequences, 691 are 3’ UTR sequences and 824 are 5’ UTR sequences.
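The promoter definition above depends on the strand of the transcript: “upstream” means lower coordinates on the forward strand and higher coordinates on the reverse strand. The sketch below illustrates this with a hypothetical helper `promoter_interval`; it assumes 1-based inclusive coordinates and excludes the start-site base itself, details that the text does not specify, and it is not the logic of “retrieve_sequences.pl”.

```python
def promoter_interval(start_site, strand, length=1000):
    """Return (start, end) of the promoter region upstream of a transcript start.

    Coordinates are 1-based and inclusive on the forward strand of the
    chromosome; strand is 1 (forward) or -1 (reverse). The start-site base
    itself is excluded (an assumption).
    """
    if strand == 1:
        start, end = start_site - length, start_site - 1
    elif strand == -1:
        start, end = start_site + 1, start_site + length
    else:
        raise ValueError("strand must be 1 or -1")
    # Clip at the beginning of the chromosome.
    return max(start, 1), end


print(promoter_interval(5000, 1))    # forward-strand transcript
print(promoter_interval(5000, -1))   # reverse-strand transcript
print(promoter_interval(300, 1))     # clipped at the chromosome start
```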

6.5.2 Motif finding (FIMO)

Potential STAT5 binding sites were retrieved via Galaxy GUI link “2) Find Motifs and Update Database (FIMO)”, which is connected to Perl program “find_motifs.pl”, for promoter, exon, intron, 3’ UTR and 5’ UTR sequences. Two different types of PWMs were used and the results were stored in different tables: the PWM representing the short STAT5 binding motif (Appendix 7) and the PWMs representing the two long STAT5 binding motifs for STAT5A and STAT5B. Motif finding was performed by accessing the internet Perl API of FIMO and included the search on both strands. Potential STAT5 binding sites that were found by FIMO on the strand opposite to the input sequence were assigned the sequence type (promoter, exon, intron, 3’ UTR, 5’ UTR) and the gene name of the input sequence.

FIMO identified 11,022 potential STAT5 binding sites for the long STAT5 binding motif and 6,135 potential STAT5 binding sites for the short STAT5 binding motif. For the long STAT5 binding motif, 896 potential STAT5 binding sites are associated with promoter sequences, 1,057 with exon sequences, 8,617 with intron sequences, 394 with 3’ UTR sequences and 58 with 5’ UTR sequences. For the short STAT5 binding motif, 443 potential STAT5 binding sites are associated with promoter sequences, 625 with exon sequences, 4,789 with intron sequences, 245 with 3’ UTR sequences and 33 with 5’ UTR sequences.
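The search on both strands can be illustrated with a small Python sketch. It is deliberately not FIMO: FIMO scores a PWM and reports p values, whereas this sketch only matches the consensus form TTCNNNGAA, a simplification inferred from the short-motif sites listed in Table 5. The function names and the 0-based, end-exclusive coordinates are assumptions for illustration.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Lookahead so that overlapping occurrences are also reported.
GAS_LIKE = re.compile(r"(?=(TTC[ACGT]{3}GAA))")

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def scan_both_strands(seq):
    """Return sorted (start, end, strand, word) hits on both strands.

    start and end are 0-based, end-exclusive positions on the input
    sequence; strand is 1 for the input strand and -1 for the opposite one.
    """
    seq = seq.upper()
    width = 9
    hits = []
    for match in GAS_LIKE.finditer(seq):
        hits.append((match.start(), match.start() + width, 1, match.group(1)))
    # Scan the opposite strand and map positions back to the input sequence.
    for match in GAS_LIKE.finditer(reverse_complement(seq)):
        start = len(seq) - (match.start() + width)
        hits.append((start, start + width, -1, match.group(1)))
    return sorted(hits)

hits = scan_both_strands("AATTCTGAGAAAG")
```

Because the reverse complement of TTCNNNGAA again has the form TTCNNNGAA, every position is reported once per strand with a different word, which matches the pairs of entries with identical coordinates and opposite strands in Table 5.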

6.5.3 Filter of potential STAT5 binding sites

6.5.3.1 Comparison of potential STAT5 binding sites to experimentally validated STAT5 binding sites (ChIP-seq)

The ChIP-seq data set with GEO [65] accession number GSM784027 [42] was retrieved from the GEO website. This data set was produced by the Department of Biology of Boston University, Boston, MA, and published in December 2011 [42]. Liver cells from mice that were about seven or eight weeks old were used to perform the experiment in vivo, and the “[s]equence reads were obtained and mapped to the mouse mm9 (July 2007) genome” [42].

Because the potential STAT5 binding sites were retrieved in the mm10 annotation format, their coordinates were converted into the mm9 annotation format using the online liftOver tool [66]. Two sets of potential STAT5 binding sites were compared to the retrieved ChIP-seq data set via the Galaxy GUI link “4) Compare potential STAT5 binding sites to ChIP-seq data”, which is linked to the Perl program “chip_seq__comparison.pl”: one set resulting from the short STAT5 binding motif and one set resulting from the long STAT5 binding motifs. Successfully compared potential STAT5 binding sites are listed in Table 5 (Appendix 5, short STAT5 binding motif) and Table 6 (Appendix 6, long STAT5 binding motifs). A comparison was considered successful if chromosome, strand and sequence matched and the coordinates of a potential STAT5 binding site overlapped completely with one item of the ChIP-seq data set.
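The comparison criterion can be sketched in Python as follows. This is an illustrative reading of the criterion, not the code of chip_seq__comparison.pl: “complete overlap” is interpreted here as the site lying entirely inside the ChIP-seq region, and the dictionary keys are hypothetical. The example values are copied from the first A230050P20Rik row of Table 5 (mm9 coordinates).

```python
def matches_chip_region(site, region):
    """Return True if a potential binding site is confirmed by a region.

    Assumed criterion: same chromosome, same strand, the site coordinates
    lie completely inside the region, and the site sequence occurs in the
    region sequence.
    """
    same_place = (site["chrom"] == region["chrom"]
                  and site["strand"] == region["strand"])
    contained = (region["start"] <= site["start"]
                 and site["end"] <= region["end"])
    same_sequence = site["word"] in region["sequence"]
    return same_place and contained and same_sequence

site = {"chrom": "9", "strand": 1,
        "start": 20673638, "end": 20673646, "word": "TTCCAGGAA"}
region = {"chrom": "9", "strand": 1,
          "start": 20673616, "end": 20673651,
          "sequence": "ATGCCTGGAGAACTTCGGATTTTTCCAGGAAATGG"}

confirmed = matches_chip_region(site, region)
# Offset of the site inside the region sequence (0-based).
offset = region["sequence"].find(site["word"])
```

For this forward-strand row, the offset equals the value in the last column of Table 5, and the region start plus the offset gives the site start.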

6.5.3.2 Filter potential STAT5 binding sites via gene ontology database

“The Gene Ontology website” [67] [33] was searched for the following gene ontology (GO) terms for the organism Mus musculus: “Inflammatory Response”, “Apoptotic process” (GO accession number “0006915” and children) and “Cytokine-mediated signaling pathway” (GO accession number “0019221” and children). These GO terms were suggested by Dr. Coschigano because they are associated with processes that can occur in diabetic nephropathy and because they are of interest to her. A list of unique gene names was created for each GO term. Each of these three lists was intersected with the list of genes that have potential STAT5 binding sites for the long binding motif (379 genes) and for the short binding motif (357 genes). All possible intersections between the three resulting sets were built and depicted in a Venn diagram (Figure 14).
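The intersection step can be sketched with Python sets. The gene names below are placeholders, not genes from this study, and the function name venn_regions is hypothetical; the sketch only shows how three GO gene lists, once restricted to genes with potential STAT5 binding sites, split into the regions of a three-set Venn diagram.

```python
from itertools import combinations

def venn_regions(named_sets):
    """Map each combination of set names to the genes found in exactly
    those sets and in no other set (one entry per Venn diagram region)."""
    names = sorted(named_sets)
    regions = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            inside = set.intersection(*(named_sets[n] for n in combo))
            others = [named_sets[n] for n in names if n not in combo]
            regions[combo] = inside - set().union(*others)
    return regions

# Placeholder data: genes with potential binding sites and three GO lists.
site_genes = {"G1", "G2", "G3", "G4"}
go_lists = {
    "inflammatory": {"G1", "G2", "G9"},
    "apoptotic": {"G2", "G3"},
    "cytokine": {"G2", "G4", "G8"},
}

# Step 1: keep only GO genes that also have potential binding sites.
filtered = {name: genes & site_genes for name, genes in go_lists.items()}
# Step 2: split the three filtered sets into Venn diagram regions.
regions = venn_regions(filtered)
```

Genes in the region shared by all three sets are annotated with all three GO terms and have at least one potential binding site.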

6.6 System architecture and configuration

The Galaxy software was downloaded from the official Galaxy project website on July 4, 2013 and installed on Debian Linux 7.0.0 “Wheezy” (32-bit). Debian Linux itself was installed inside a virtual machine under VirtualBox 4.2.14 r86644. On Debian Linux, Perl v5.14.2 was installed to run the Perl programs, and a MySQL 5.5.31-0 (Debian Wheezy version) database was used to store the retrieved DNA sequences as well as the retrieved motif finding (FIMO) results. The Debian package “libdbd-mysql-perl-4.021-1+b1 (32-bit)” was installed to provide an interface between Perl and the MySQL database. The Debian package “libgd-graph-perl-1.44-6” was installed to allow the creation of charts in Perl.

6.7 Limitations

DNA sequences for 20 out of 449 genes could not be retrieved from the Ensembl database. For the remaining 429 genes, 518 5’ UTR and 613 3’ UTR sequences could not be retrieved. In addition, sequence overlaps, including overlaps with promoter sequences, were not determined. Duplicate sequences might be included in the retrieved DNA sequences or in the identified potential STAT5 binding sites and therefore in the created database. Finally, although Mus musculus is a model organism that researchers frequently use to study human diseases [68], the results of this study might not be entirely applicable to humans.


REFERENCES

[1] D. Luis-Rodríguez et al., "Pathophysiological role and therapeutic implications of

inflammation in diabetic nephropathy," World J. Diabetes, vol. 3, no. 1, pp. 7-18,

Jan. 2012.

[2] "Diabetes," 27 Jun. 2012. [Online]. Available:

http://www.ncbi.nlm.nih.gov/pubmedhealth/PMH0002194/. [Accessed 06 Feb.

2013].

[3] "National diabetes information clearinghouse (NDIC)," 6 Dec. 2011. [Online].

Available: http://diabetes.niddk.nih.gov/dm/pubs/type1and2/what.aspx. [Accessed

6 Jun. 2013].

[4] K. Coschigano, private communication, Mar. 2013.

[5] "Diabetes and kidney disease," 27 Jun. 2012. [Online]. Available:

http://www.ncbi.nlm.nih.gov/pubmedhealth/PMH0001524/. [Accessed 06 Feb.

2013].

[6] "National kidney and urologic diseases information clearinghouse (NKUDIC)," 23

Mar. 2012. [Online]. Available:

http://kidney.niddk.nih.gov/kudiseases/pubs/yourkidneys/. [Accessed 19 Oct.

2013].

[7] "How your kidneys work," [Online]. Available: http://www.kidney.org/kidneydisease/howkidneyswrk.cfm%20?utm_source=nkfhome&utm_medium=static&utm_campaign=HowYourKidneysWork. [Accessed 12 Oct. 2013].

[8] "Transcription factors and transcriptional control in eukaryotic cells," [Online]. Available: http://www.nature.com/scitable/topicpage/transcription-factors-and-transcriptional-control-in-eukaryotic-1046. [Accessed 19 Oct. 2013].

[9] L. M. Hellman and M. G. Fried, "Electrophoretic mobility shift assay (EMSA) for detecting protein-nucleic acid interactions," Nat. Protoc., vol. 2, no. 8, pp. 1849-1861, 2007.

[10] E. Soldaini et al., "DNA binding site selection of dimeric and tetrameric Stat5

proteins reveals a large repertoire of divergent tetrameric Stat5a binding sites,"

Mol. Cell. Biol., vol. 20, no. 1, pp. 389-401, Jan. 2000.

[11] K. Coschigano, private communication, Apr. 2012.

[12] J. X. Lin et al., "Critical role of STAT5 transcription factor tetramerization for

cytokine responses and normal immune function," Immunity, vol. 36, no. 4, pp.

586-599, Apr. 2012.

[13] B. Mazumder, V. Seshadri and P. L. Fox, "Translational control by the 3′-UTR: the

ends specify the means," Trends in Biochemical Sciences, vol. 28, no. 2, pp. 91-98,

Feb. 2003.

[14] M. L. Graham et al., "The Streptozotocin-induced diabetic nude mouse model:

differences between animals from different sources," Comp Med., vol. 61, no. 4,

pp. 356-360, Aug. 2011.

[15] B. Basham et al., "In vivo identification of novel STAT5 target genes," Nucleic

Acids Res., vol. 36, no. 11, pp. 3802-3818, May 2008.

[16] D. K. Slonim and I. Yanai, "Getting started in gene expression microarray

analysis," PLoS Comput. Biol., vol. 5, no. 10, 2009.

[17] P. Flicek et al., "Ensembl 2013," Nucleic Acids Res., vol. 41, pp. 48-55, 2013.

[18] J. E. Stajich, "The Bioperl toolkit: Perl modules for the life sciences," Genome

Res., vol. 12, no. 10, pp. 1611-1618, Oct. 2002.

[19] "UCSC Genome Bioinformatics," [Online]. Available: http://genome.ucsc.edu/.

[Accessed 28 Oct. 2013].

[20] "NCBI National Center for Biotechnology Information," [Online]. Available:

http://www.ncbi.nlm.nih.gov/. [Accessed 28 Oct. 2013].

[21] D. Blankenberg, "Galaxy: a web-based genome analysis tool for experimentalists,"

in Current Protocols in Molecular Biology, 2010, pp. 1-21.

[22] B. Giardine et al., "Galaxy: a platform for interactive large-scale genome analysis,"

Genome Research, vol. 15, no. 10, pp. 1451-1455, Oct. 2005.

[23] J. Goecks et al., "Galaxy: a comprehensive approach for supporting accessible,

reproducible, and transparent computational research in the life sciences," Genome

Biol., vol. 11, no. 8, pp. R86-98, Aug. 2010.

[24] R. Osada, E. Zaslavsky and M. Singh, "Comparative analysis of methods for

representing and searching for transcription factor binding sites," Bioinformatics,

vol. 20, no. 18, pp. 3516-3525, 2004.

[25] C. E. Grant, T. L. Bailey and W. S. Noble, "FIMO: Scanning for occurrences of a

given motif," Bioinformatics, vol. 27, no. 7, pp. 1017-1018, Apr. 2011.

[26] M. K. Das and H. Dai, "A survey of DNA motif finding algorithms," BMC

Bioinformatics, vol. 8, p. S21, 2007.

[27] S. Krishnan et al., "Design and evaluation of Opal2: A toolkit for scientific

software as a service," in Proc. 2009 IEEE Congress on Services, Los Angeles,

CA, 2009.

[28] "SOAP::Lite for Perl," [Online]. Available: http://www.soaplite.com/. [Accessed

14 Sep. 2013].

[29] T. L. Bailey, "Discovering sequence motifs," in Bioinformatics, vol. 452, J. Keith,

Ed., Humana Press, 2008, pp. 231-251.

[30] "The MEME Suite," [Online]. Available: http://meme.nbcr.net/meme/meme-intro.html. [Accessed 17 Nov. 2013].

[31] T. L. Bailey et al., "MEME suite: tools for motif discovery and searching," Nucleic

Acids Res., vol. 37, pp. 202-208, 2009.

[32] J. Lichtenberg et al., "WordSeeker: concurrent bioinformatics software for

discovering genome-wide patterns and word-based genomic signatures," in BMC

Bioinformatics, Boston, MA, 2010.

[33] The Gene Ontology Consortium, "Gene ontology: tool for the unification of

biology," Nat. Genet., vol. 25, no. 1, pp. 25-29, May 2000.

[34] S. Carbon et al., "AmiGO: online access to ontology and annotation data,"

Bioinformatics, vol. 25, no. 2, pp. 288-289, Jan. 2009.

[35] E. I. Boyle et al., "GO::TermFinder--open source software for accessing gene

ontology information and finding significantly enriched gene ontology terms

associated with a list of genes," Bioinformatics, vol. 20, no. 18, pp. 3710-3715,

Dec. 2004.

[36] P. J. Park et al., "ChIP–seq: advantages and challenges of a maturing technology,"

Nature Reviews Genetics, vol. 10, pp. 669-680, Oct. 2009.

[37] Y. Zhang et al., "Model-based analysis of ChIP-seq (MACS)," Genome Biology,

vol. 9, no. 9, Sep. 2008.

[38] E. Portales-Casamar et al., "JASPAR 2010: the greatly expanded open-access

database of transcription factor binding profiles," Nucleic Acids Res., vol. 38, pp.

D105-D110, Jan. 2010.

[39] T. D. Schneider and R. M. Stephens, "Sequence logos: a new way to display

consensus sequences," Nucleic Acids Res., vol. 18, pp. 6097-6100, 1990.

[40] G. E. Crooks et al., "WebLogo: a sequence logo generator," Genome Research,

vol. 14, pp. 1188-1190, 2004.

[41] "Weblogo," [Online]. Available: http://weblogo.berkeley.edu/. [Accessed 27 Aug.

2013].

[42] Y. Zhang, E. V. Laz and D. J. Waxman, "Dynamic, sex-differential STAT5 and

BCL6 binding to sex-biased, growth hormone-regulated genes in adult mouse

liver," Mol. Cell. Biol., vol. 32, no. 4, pp. 880-896, Feb. 2012.

[43] R. Hermsen, S. Tans and P. R. ten Wolde, "Transcriptional regulation by

competing transcription factor modules," PLoS Comput. Biol., vol. 2, no. 12, 2006.

[44] T. Phillips, "Regulation of transcription and gene expression in eukaryotes," Nature

Education, vol. 1, no. 1, 2008.

[45] "Chemokine (C-C Motif) Receptor 5 (Gene/Pseudogene)," [Online]. Available:

http://www.genecards.org/cgi-bin/carddisp.pl?gene=CCR5. [Accessed 20 Nov.

2013].

[46] "CCR5 chemokine (C-C motif) receptor 5 (gene/pseudogene) [ Homo sapiens

(human) ]," 8 Nov. 2013. [Online]. Available:

http://www.ncbi.nlm.nih.gov/gene/1234. [Accessed 17 Nov. 2013].

[47] J. F. Camargo et al., "CCR5 expression levels influence NFAT translocation, IL-2

production, and subsequent signaling events during T lymphocyte activation," J.

Immunol., vol. 182, pp. 171-182, 2009.

[48] "Adiponectin, C1Q And Collagen Domain Containing," 20 Nov. 2013. [Online]. Available: http://www.genecards.org/cgi-bin/carddisp.pl?gene=ADIPOQ&search=adipoq

[49] "ADIPOQ adiponectin, C1Q and collagen domain containing [ Homo sapiens

(human) ]," [Online]. Available: http://www.ncbi.nlm.nih.gov/gene/9370.

[Accessed 18 Nov. 2013].

[50] A. Davoodi-Semiromi et al., "Influence of Tyrphostin AG490 on the expression of

diabetes-associated markers in human adipocytes," Immunogenetics, vol. 65, no. 1,

pp. 83-90, Jan. 2013.

[51] "CCL2 chemokine (C-C motif) ligand 2 [ Homo sapiens (human) ]," 17 Nov. 2013.

[Online]. Available: http://www.ncbi.nlm.nih.gov/gene/6347. [Accessed 18 Nov.

2013].

[52] "Chemokine (C-C Motif) Ligand 2," [Online]. Available:

http://www.genecards.org/cgi-bin/carddisp.pl?gene=CCL2&search=.

[Accessed 18 Nov. 2013].

[53] A. Tanimoto et al., "Monocyte chemoattractant protein-1 expression is enhanced

by granulocyte-macrophage colony-stimulating factor via Jak2-Stat5 signaling and

inhibited by atorvastatin in human monocytic U937 cells," J. Biol. Chem., vol. 283,

no. 8, pp. 4643-4651, 2008.

[54] "Chemokine (C-C Motif) Ligand 5," 20 Nov. 2013. [Online]. Available:

http://www.genecards.org/cgi-bin/carddisp.pl?gene=CCL5&search=ccl5

[55] "CCL5 chemokine (C-C motif) ligand 5 [ Homo sapiens (human) ]," 17 Nov. 2013.

[Online]. Available: http://www.ncbi.nlm.nih.gov/gene/6352. [Accessed 18 Nov.

2013].

[56] M. Colombatti et al., "The prostate specific membrane antigen regulates the

expression of IL-6 and CCL5 in prostate tumour cells by activating the MAPK

pathways," PLoS One, vol. 4, no. 2, 2009.

[57] "IL1B interleukin 1, beta [ Homo sapiens (human) ]," 17 Nov. 2013. [Online].

Available: http://www.ncbi.nlm.nih.gov/gene/3553. [Accessed 18 Nov. 2013].

[58] "Interleukin 1, Beta," 18 Nov. 2013. [Online]. Available:

http://www.genecards.org/cgi-bin/carddisp.pl?gene=IL1B&search=IL1B

[59] "P2RX7 purinergic receptor P2X, ligand-gated ion channel, 7 [ Homo sapiens

(human) ]," 17 Nov. 2013. [Online]. Available:

http://www.ncbi.nlm.nih.gov/gene/5027. [Accessed 18 Nov. 2013].

[60] "Purinergic Receptor P2X, Ligand-Gated Ion Channel, 7," [Online]. Available:

http://www.genecards.org/cgi-bin/carddisp.pl?gene=P2RX7&search=P2RX7.

[Accessed 18 Nov. 2013].

[61] T. R. Hubler and J. G. Scammell, "Intronic hormone response elements mediate

regulation of FKBP5 by progestins and glucocorticoids," Cell Stress Chaperones,

vol. 9, no. 3, pp. 243-252, Jul. 2004.

[62] J. Villard, "Transcription regulation and human diseases," Swiss Med. Wkly, vol.

134, pp. 571-579, 2004.

[63] T. I. Lee and R. A. Young, "Transcriptional regulation and its misregulation in

disease," Cell, vol. 152, no. 6, pp. 1237-1251, Mar. 2013.

[64] "File:Black 6 mouse eating.jpg," 18 May 2010. [Online]. Available:

http://en.wikipedia.org/wiki/File:Black_6_mouse_eating.jpg. [Accessed 13 Feb.

2013].

[65] T. Barrett et al., "NCBI GEO: archive for functional genomics data sets-update,"

Nucleic Acids Res., vol. 41, pp. D991-995, Jan. 2013.

[66] "Lift genome annotations," [Online]. Available: http://genome.ucsc.edu/cgi-bin/hgLiftOver. [Accessed 23 Jul. 2013].

[67] "Search the gene ontology database," [Online]. Available:

http://amigo.geneontology.org. [Accessed 20 Jan. 2013].

[68] "Why the mouse?," [Online]. Available:

http://genome.wellcome.ac.uk/doc_WTD023552.html. [Accessed 27 Oct. 2013].


APPENDIX 1: GALAXY GUI SCREENSHOTS

Figure 24. Galaxy: Upload sequences/gene names (program “upload_seqs.py”)

Figure 25. Perform motif finding (FIMO, program “find_motifs.pl”)

Figure 26. Galaxy: Import BED file (program “import_bed_file.pl”)

Figure 27. Galaxy: Perform ChIP-seq validation (program “chip_seq__comparison.pl”)

Figure 28. Galaxy: Potential STAT5 binding site visualization (program “visualize.pl”)

APPENDIX 2: GALAXY CONFIGURATION FILES (XML FORMAT)


retrieve_sequences.pl $output $genes $sequence_types $promoter_length $organism $output_fasta_files $one_promoter $db_name


This function retrieves sequences for the selected sequence types for all selected genes.


chip_seq__comparison.pl $output $separator $bed_path $db_name $table_name $output_folder

ATTENTION: Please make sure that the bed file annotation matches the data base annotation (currently mm10)!


find_motifs.pl $output $pwm $fimo_std $fimo_err $db_name $one_promoter $sequence_types

This runs FIMO on the public server of the MEME suite, thus it may take a bit for the motif finding process to finish and update the database.


import_bed_file.pl $bed_file_path $db_name $output $table_name $mm_db


samtools

upload_seqs.py $GALAXY_ROOT_DIR $GALAXY_DATATYPES_CONF_FILE $paramfile
#set $outnum = 0
#while $varExists('output%i' % $outnum):
    #set $output = $getVar('output%i' % $outnum)
    #set $outnum += 1
    #set $file_name = $output.file_name
    ## FIXME: This is not future-proof for other uses of external_filename (other than for use by the library upload's "link data" feature)
    #if $output.dataset.dataset.external_filename:
        #set $file_name = "None"
    #end if
    ${output.dataset.dataset.id}:${output.files_path}:${file_name}
#end while

not ( ( isinstance( value, unicode ) or isinstance( value, str ) ) and value != "" )

**Auto-detect**

The system will attempt to detect Axt, Fasta, Fastqsolexa, Gff, Gff3, Html, Lav, Maf, Tabular, Wiggle, Bed and Interval (Bed with headers) formats. If your file is not detected properly as one of the known formats, it most likely means that it has some format problems (e.g., different number of columns on different rows). You can still coerce the system to set your data to the format you think it should be. You can also upload compressed files, which will automatically be decompressed.

-----

**Ab1**

A binary sequence file in 'ab1' format with a '.ab1' file extension. You must manually select this 'File Format' when uploading the file.

-----

**Axt**

blastz pairwise alignment format. Each alignment block in an axt file contains three lines: a summary line and 2 sequence lines. Blocks are separated from one another by blank lines. The summary line contains chromosomal position and size information about the alignment. It consists of 9 required fields.

-----

**Bam**

A binary file compressed in the BGZF format with a '.bam' file extension.

-----


**Bed**

* Tab delimited format (tabular)
* Does not require header line
* Contains 3 required fields:

  - chrom - The name of the chromosome (e.g. chr3, chrY, chr2_random) or contig (e.g. ctgY1).
  - chromStart - The starting position of the feature in the chromosome or contig. The first base in a chromosome is numbered 0.
  - chromEnd - The ending position of the feature in the chromosome or contig. The chromEnd base is not included in the display of the feature. For example, the first 100 bases of a chromosome are defined as chromStart=0, chromEnd=100, and span the bases numbered 0-99.

* May contain 9 additional optional BED fields:

  - name - Defines the name of the BED line. This label is displayed to the left of the BED line in the Genome Browser window when the track is open to full display mode or directly to the left of the item in pack mode.
  - score - A score between 0 and 1000. If the track line useScore attribute is set to 1 for this annotation data set, the score value will determine the level of gray in which this feature is displayed (higher numbers = darker gray).
  - strand - Defines the strand - either '+' or '-'.
  - thickStart - The starting position at which the feature is drawn thickly (for example, the start codon in gene displays).
  - thickEnd - The ending position at which the feature is drawn thickly (for example, the stop codon in gene displays).
  - itemRgb - An RGB value of the form R,G,B (e.g. 255,0,0). If the track line itemRgb attribute is set to "On", this RBG value will determine the display color of the data contained in this BED line. NOTE: It is recommended that a simple color scheme (eight colors or less) be used with this attribute to avoid overwhelming the color resources of the Genome Browser and your Internet browser.
  - blockCount - The number of blocks (exons) in the BED line.
  - blockSizes - A comma-separated list of the block sizes. The number of items in this list should correspond to blockCount.
  - blockStarts - A comma-separated list of block starts. All of the blockStart positions should be calculated relative to chromStart. The number of items in this list should correspond to blockCount.

* Example::

    chr22 1000 5000 cloneA 960 + 1000 5000 0 2 567,488, 0,3512
    chr22 2000 6000 cloneB 900 - 2000 6000 0 2 433,399, 0,3601

-----

**Fasta**

A sequence in FASTA format consists of a single-line description, followed by lines of sequence data. The first character of the description line is a greater-than (">") symbol in the first column. All lines should be shorter than 80 characters::

    >sequence1
    atgcgtttgcgtgc
    gtcggtttcgttgc
    >sequence2
    tttcgtgcgtatag
    tggcgcggtga

-----

**FastqSolexa**

FastqSolexa is the Illumina (Solexa) variant of the Fastq format, which stores sequences and quality scores in a single file::

    @seq1
    GACAGCTTGGTTTTTAGTGAGTTGTTCCTTTCTTT
    +seq1
    hhhhhhhhhhhhhhhhhhhhhhhhhhPW@hhhhhh
    @seq2
    GCAATGACGGCAGCAATAAACTCAACAGGTGCTGG
    +seq2
    hhhhhhhhhhhhhhYhhahhhhWhAhFhSIJGChO

Or::

    @seq1
    GAATTGATCAGGACATAGGACAACTGTAGGCACCAT
    +seq1
    40 40 40 40 35 40 40 40 25 40 40 26 40 9 33 11 40 35 17 40 40 33 40 7 9 15 3 22 15 30 11 17 9 4 9 4
    @seq2
    GAGTTCTCGTCGCCTGTAGGCACCATCAATCGTATG
    +seq2
    40 15 40 17 6 36 40 40 40 25 40 9 35 33 40 14 14 18 15 17 19 28 31 4 24 18 27 14 15 18 2 8 12 8 11 9

-----

**Gff**

GFF lines have nine required fields that must be tab-separated.

-----

**Gff3**

The GFF3 format addresses the most common extensions to GFF, while preserving backward compatibility with previous formats.

-----

**Interval (Genomic Intervals)**

- Tab delimited format (tabular)
- File must start with definition line in the following format (columns may be in any order).::

    #CHROM START END STRAND

- CHROM - The name of the chromosome (e.g. chr3, chrY, chr2_random) or contig (e.g. ctgY1).
- START - The starting position of the feature in the chromosome or contig. The first base in a chromosome is numbered 0.
- END - The ending position of the feature in the chromosome or contig. The chromEnd base is not included in the display of the feature. For example, the first 100 bases of a chromosome are defined as chromStart=0, chromEnd=100, and span the bases numbered 0-99.
- STRAND - Defines the strand - either '+' or '-'.

- Example::

    #CHROM START END STRAND NAME COMMENT
    chr1 10 100 + exon myExon
    chrX 1000 10050 - gene myGene

-----

**Lav**

Lav is the primary output format for BLASTZ. The first line of a .lav file begins with #:lav..

-----

**MAF**

TBA and multiz multiple alignment format. The first line of a .maf file begins with ##maf. This word is followed by white-space-separated "variable=value" pairs. There should be no white space surrounding the "=".

-----

**Scf**

A binary sequence file in 'scf' format with a '.scf' file extension. You must manually select this 'File Format' when uploading the file.

-----

**Sff**

A binary file in 'Standard Flowgram Format' with a '.sff' file extension.

-----

**Tabular (tab delimited)**

Any data in tab delimited format (tabular)

-----

**Wig**

The wiggle format is line-oriented. Wiggle data is preceded by a track definition line, which adds a number of options for controlling the default display of this track.

-----

**Other text type**

Any text file


visualize.pl $output $p_value $sequence_types $transcription_factor $from $to $db_name $table_name $gene_label_freq $bar_color


Visualize FIMO data as a bar chart.


APPENDIX 3: SQL SCRIPT “CREATE_DB.SQL”

CREATE DATABASE IF NOT EXISTS `stat5_1`;
USE `stat5_1`;

DROP TABLE IF EXISTS `Fimo_results`;

CREATE TABLE `Fimo_results` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `pattern_name` varchar(100) NOT NULL,
  `gene_name` varchar(100) NOT NULL,
  `type` varchar(1) NOT NULL,
  `chromosome` varchar(2) NOT NULL,
  `strand` int(11) NOT NULL,
  `start_mm9` int(11) NOT NULL,
  `end_mm9` int(11) NOT NULL,
  `start_mm10` int(11) NOT NULL,
  `end_mm10` int(11) NOT NULL,
  `p_value` float NOT NULL,
  `q_value` float NOT NULL,
  `score` float NOT NULL,
  `word` varchar(50) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

CREATE INDEX Fimo_results__start_mm9 ON stat5_1.Fimo_results (start_mm9);
CREATE INDEX Fimo_results__end_mm9 ON stat5_1.Fimo_results (end_mm9);
CREATE INDEX Fimo_results__strand ON stat5_1.Fimo_results (strand);
CREATE INDEX Fimo_results__chr ON stat5_1.Fimo_results (chromosome);

DROP TABLE IF EXISTS `Fimo_results_short_gas__def_bg`;

CREATE TABLE `Fimo_results_short_gas__def_bg` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `pattern_name` varchar(100) NOT NULL,
  `gene_name` varchar(100) NOT NULL,
  `type` varchar(1) NOT NULL,
  `chromosome` varchar(2) NOT NULL,
  `strand` int(11) NOT NULL,
  `start_mm8` int(11) NOT NULL,
  `end_mm8` int(11) NOT NULL,
  `start_mm9` int(11) NOT NULL,
  `end_mm9` int(11) NOT NULL,
  `start_mm10` int(11) NOT NULL,
  `end_mm10` int(11) NOT NULL,
  `p_value` float NOT NULL,
  `q_value` float NOT NULL,
  `score` float NOT NULL,
  `word` varchar(50) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

CREATE INDEX Fimo_results_short_gas__def_bg__start_mm8 ON stat5_1.Fimo_results_short_gas__def_bg (start_mm8);
CREATE INDEX Fimo_results_short_gas__def_bg__end_mm8 ON stat5_1.Fimo_results_short_gas__def_bg (end_mm8);
CREATE INDEX Fimo_results_short_gas__def_bg__start_mm9 ON stat5_1.Fimo_results_short_gas__def_bg (start_mm9);
CREATE INDEX Fimo_results_short_gas__def_bg__end_mm9 ON stat5_1.Fimo_results_short_gas__def_bg (end_mm9);
CREATE INDEX Fimo_results_short_gas__def_bg__strand ON stat5_1.Fimo_results_short_gas__def_bg (strand);
CREATE INDEX Fimo_results_short_gas__def_bg__chr ON stat5_1.Fimo_results_short_gas__def_bg (chromosome);

DROP TABLE IF EXISTS `Sequence`;

CREATE TABLE `Sequence` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `gene_name` varchar(100) NOT NULL,
  `type` varchar(1) NOT NULL,
  `chromosome` varchar(2) NOT NULL,
  `strand` int(11) NOT NULL,
  `start_mm9` int(11) NOT NULL,
  `end_mm9` int(11) NOT NULL,
  `start_mm10` int(11) NOT NULL,
  `end_mm10` int(11) NOT NULL,
  `sequence` text NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

APPENDIX 4: ADDITIONAL CHARTS OF POTENTIAL STAT5 BINDING SITES

Long STAT5 binding motif

Figure 29. Genes 11 to 60 with next highest number of potential STAT5A binding sites

Figure 30. Genes 61 to 370 with next highest number of potential STAT5A binding sites (every tenth name shown)

Figure 31. Genes 11 to 60 with next highest number of potential STAT5B binding sites

Figure 32. Genes 61 to 364 with next highest number of potential STAT5B binding sites (every eleventh name shown)

Short STAT5 binding motif

Figure 33. Genes 11 to 60 with next highest number of potential STAT5 binding sites

Figure 34. Genes 61 to 357 with next highest number of potential STAT5 binding sites (every tenth name shown)

APPENDIX 5: POTENTIAL STAT5 BINDING SITES (SHORT MOTIF) VALIDATED AGAINST CHIP-SEQ DATA SET WITH GEO ACCESSION NUMBER GSM784027

Potential STAT5 binding sites that were found by FIMO on the opposite strand compared to the input sequence were assigned the sequence type (promoter, exon, intron, 3’ UTR, 5’ UTR) and gene name of the input sequence.

Table 5

Potential STAT5 binding sites (short motif) validated against ChIP-seq data set with GEO accession number GSM784027

Columns, in order (one site per line, fields separated by spaces): Binding site ID; Gene; Sequence type; Chromosome; Str.; Str. (input seq.); Potential binding site start (mm9); Potential binding site end (mm9); Potential STAT5 binding site; ChIP-seq region start; ChIP-seq region end; ChIP-seq region; Position of potential STAT5 BS*

3813 2810459M11Rik 3' UTR 1 -1 1 87950176 87950184 TTCCCAGAA 87950170 87950205 AGGAAACCTAAGGGGATATCTTCCCAGAATGACAG 20
3080 2810459M11Rik Exon 1 1 1 87949536 87949544 TTCTCGGAA 87949512 87949547 TCGAATGTTCTAGAACACCTCCAGTTCTCGGAAGA 24
3812 2810459M11Rik Exon 1 -1 1 87950176 87950184 TTCCCAGAA 87950170 87950205 AGGAAACCTAAGGGGATATCTTCCCAGAATGACAG 20
5025 2810459M11Rik Intron 1 -1 1 87944206 87944214 TTCTCAGAA 87944206 87944241 AGCCAGGTAAGGAAGCCAATGAATGATTCTCAGAA 26
8 A230050P20Rik Intron 9 1 1 20673638 20673646 TTCCAGGAA 20673616 20673651 ATGCCTGGAGAACTTCGGATTTTTCCAGGAAATGG 22
9 A230050P20Rik Promoter 9 1 1 20673638 20673646 TTCCAGGAA 20673616 20673651 ATGCCTGGAGAACTTCGGATTTTTCCAGGAAATGG 22
6063 Adcy7 Intron 8 -1 1 90837332 90837340 TTCGGGGAA 90837317 90837352 CCACAGGAGTGTTCGGGGAAGGAGGGCCCTGGATG 11
4523 Amph Intron 13 1 1 19077479 19077487 TTCTTAGAA 19077456 19077491 CTAAATGGTCTAAGTCCTCAGCCTTCTTAGAATTC 23
21 Amph Intron 13 -1 1 19103888 19103896 TTCCAGGAA 19103884 19103919 CTCGAACAGATCATCAGAGGCCTTCCAGGAAATCA 22
2331 Amph Intron 13 -1 1 19149644 19149652 TTCTAAGAA 19149631 19149666 AAGTAATTTATCTTTCTAAGAATATTACTCCTACA 13
627 Ankrd6 3' UTR 4 -1 -1 32892044 32892052 TTCCTGGAA 32892030 32892065 GGATGTAGACATTCCTGGAAACATACAGGGTAGAT 11
5086 Anxa8 Intron 14 1 1 34901629 34901637 TTCTCAGAA 34901620 34901655 CTTGTTCCTTTCTCAGAAGCAGGACCCTTATGGAG 9
3874 Apol6 5' UTR 15 -1 1 76875406 76875414 TTCCCAGAA 76875380 76875415 TTCCCAGAATAGGGACCTTGCTGGGAAGTATACTG 0
3872 Apol6 Exon 15 -1 1 76875406 76875414 TTCCCAGAA 76875380 76875415 TTCCCAGAATAGGGACCTTGCTGGGAAGTATACTG 0
3880 Apol6 Promoter 15 -1 1 76875406 76875414 TTCCCAGAA 76875380 76875415 TTCCCAGAATAGGGACCTTGCTGGGAAGTATACTG 0
676 Batf Intron 12 1 1 87028195 87028203 TTCTAGGAA 87028187 87028222 GTGAAACTTTCTAGGAAGCTGGGGGCAGGGTGAAT 8
735 Ccr5 Exon 9 -1 1 124039884 124039892 TTCCTGGAA 124039874 124039909 TATTCAGTCCAAAGAATTCCTGGAAGGTGGTCAGG 16
736 Ccr5 Intron 9 -1 1 124039884 124039892 TTCCTGGAA 124039874 124039909 TATTCAGTCCAAAGAATTCCTGGAAGGTGGTCAGG 16
6070 Cenpe Intron 3 -1 1 134875763 134875771 TTCGGGGAA 134875751 134875786 AGTCGGACCGTGTTTTCGGGGAAACGGAGTCCTGC 14
6070 Cenpe Intron 3 -1 1 134875763 134875771 TTCGGGGAA 134875751 134875786 AGTCGGACCATGTTTTCGGGGAAACGGAGTCCTGC 14
5240 Cfi Intron 3 1 1 129541046 129541054 TTCTGAGAA 129541046 129541081 TTCTGAGAAAGGCTCTGGTCTAGTCACAGCTGAGG 0
5241 Cfi Intron 3 -1 1 129541046 129541054 TTCTCAGAA 129541035 129541070 ACTAGACCAGAGCCTTTCTCAGAATTAGCAAAGAC 15
5241 Cfi Intron 3 -1 1 129541046 129541054 TTCTCAGAA 129541038 129541073 GTGACTAGACCAGAGCCTTTCTCAGAATTAGCAAA 18
5241 Cfi Intron 3 -1 1 129541046 129541054 TTCTCAGAA 129541041 129541076 GCTGTGACTAGACCAGAGCCTTTCTCAGAATTAGC 21
5241 Cfi Intron 3 -1 1 129541046 129541054 TTCTCAGAA 129541042 129541077 AGCTGTGACTAGACCAGAGCCTTTCTCAGAATTAG 22
5241 Cfi Intron 3 -1 1 129541046 129541054 TTCTCAGAA 129541043 129541078 CAGCTGTGACTAGACCAGAGCCTTTCTCAGAATTA 23
5241 Cfi Intron 3 -1 1 129541046 129541054 TTCTCAGAA 129541045 129541080 CGCAGCTGTGACTAGACCAGAGCCTTTCTCAGAAT 25
4577 Cfi Intron 3 -1 1 129546859 129546867 TTCTTAGAA 129546853 129546888 ACAGTGAGGTCCCTACAGAATTCTTAGAATGGTTG 20
761 Cfi Intron 3 -1 1 129550000 129550008 TTCCTGGAA 129549990 129550025 GTGTGGTGTGACATTTTTCCTGGAAAGACAAAGAT 16
761 Cfi Intron 3 -1 1 129550000 129550008 TTCCTGGAA 129550000 129550035 ACATAACTTTGTGTGGTGTGACATTTTTCCTGGAA 26
3228 Cftr 3' UTR 6 -1 1 18272064 18272072 TTCTGGGAA 18272058 18272093 AGTCTTAAAGATCTGTTGCTTTCTGGGAACATAAG 20
3230 Cftr Exon 6 -1 1 18272064 18272072 TTCTGGGAA 18272058 18272093 AGTCTTAAAGATCTGTTGCTTTCTGGGAACATAAG 20
2424 Cftr Intron 6 1 1 18196405 18196413 TTCTAAGAA 18196399 18196434 AAGCTCTTCTAAGAATCTGAGTTATAAGAGACTTC 6
1853 Cftr Intron 6 1 1 18246893 18246901 TTCCGGGAA 18246880 18246915 CTGCAAGTCCTGTTTCCGGGAAGGATGGGCTCCTT 13
2023 Cftr Intron 6 1 1 18257678 18257686 TTCTTGGAA 18257665 18257700 CTTTTCCTATCTTTTCTTGGAACGTGAGGATTGCA 13
2430 Cftr Intron 6 -1 1 18169330 18169338 TTCTAAGAA 18169323 18169358 TTAGTAAACTATTCTACAGTTCTAAGAAATATATG 19
1852 Cftr Intron 6 -1 1 18246893 18246901 TTCCCGGAA 18246892 18246927 AAGTCAGAAAATAAGGAGCCCATCCTTCCCGGAAA 25
788 Creb5 Intron 6 1 1 53605231 53605239 TTCCTGGAA 53605207 53605242 ACAACAAAATAAAGGGCGTTTGTTTTCCTGGAACC 24
2040 Ctsc Exon 7 1 1 95437362 95437370 TTCTTGGAA 95437347 95437382 GCCCTCCTGAAGGATTTCTTGGAAAACAGTAATCA 15
1587 Ctsc Exon 7 -1 1 95437362 95437370 TTCCAAGAA 95437355 95437390 CACAGTGGTGATTACTGTTTTCCAAGAAATCCTTC 19
793 Ctsc Exon 7 -1 1 95445366 95445374 TTCTAGGAA 95445352 95445387 AACAAAATGTATTTCTAGGAATGAAAGAATGGCTG 12
793 Ctsc Exon 7 -1 1 95445366 95445374 TTCTAGGAA 95445353 95445388 AAACAAAATGTATTTCTAGGAATGAAAGAATGGCT 13
2038 Ctsc Intron 7 1 1 95437362 95437370 TTCTTGGAA 95437347 95437382 GCCCTCCTGAAGGATTTCTTGGAAAACAGTAATCA 15
5276 Ctsc Intron 7 1 1 95440722 95440730 TTCTGAGAA 95440702 95440737 GATTAAGGAAGTTGGCTTCTTTCTGAGAAAGGTAT 20
5276 Ctsc Intron 7 1 1 95440722 95440730 TTCTGAGAA 95440710 95440745 AAGTTGGCTTCTTTCTGAGAAAGGTATGGTAGTCT 12
3973 Ctsc Intron 7 1 1 95454382 95454390 TTCCCAGAA 95454358 95454393 GTCATTGATTAACAGTTGACAATCTTCCCAGAAAG 24
3973 Ctsc Intron 7 1 1 95454382 95454390 TTCCCAGAA 95454366 95454401 TTAACAGTTGACAATCTTCCCAGAAAGTCAGATAC 16
3973 Ctsc Intron 7 1 1 95454382 95454390 TTCCCAGAA 95454380 95454415 TCTTCCCAGAAAGTCAGATACTGAAAGCAAATTCT 2
1586 Ctsc Intron 7 1 1 95456104 95456112 TTCCAAGAA 95456097 95456132 ACCTGGCTTCCAAGAATATGCTTTCAATGAATTCC 7
2458 Ctsc Intron 7 -1 1 95427133 95427141 TTCTAAGAA 95427133 95427168 GGTCAAGAAAGCAGGTCTCCACTCGCTTCTAAGAA 26
1585 Ctsc Intron 7 -1 1 95437362 95437370 TTCCAAGAA 95437355 95437390 CACAGTGGTGATTACTGTTTTCCAAGAAATCCTTC 19
5277 Ctsc Intron 7 -1 1 95440722 95440730 TTCTCAGAA 95440703 95440738 CATACCTTTCTCAGAAAGAAGCCAACTTCCTTAAT 7
1856 Ctsc Intron 7 -1 1 95440985 95440993 TTCCCGGAA 95440968 95441003 TAAGTCTCATTCCCGGAATAGTACTTCAGAGAGGT 9
1856 Ctsc Intron 7 -1
1 95440985 95440993 TTCCCGGAA 95440968 95441003 TAAGTCTCCTTCCCGGAATAGTACTTCAGAGAGGT 9 792 Ctsc Intron 7 -1 1 95445366 95445374 TTCTAGGAA 95445352 95445387 AACAAAATGTATTTCTAGGAATGAAAGAATGGCTG 12 792 Ctsc Intron 7 -1 1 95445366 95445374 TTCTAGGAA 95445353 95445388 AAACAAAATGTATTTCTAGGAATGAAAGAATGGCT 13 2457 Ctsc Intron 7 -1 1 95452665 95452673 TTCTAAGAA 95452651 95452686 GAACATTTACAATTCTAAGAAACTGTTTGAGAACC 12 146 Ctss Intron 3 -1 1 95358087 95358095 TTCCAGGAA 95358085 95358120 GACCTTGACTTACAGAGCCTGTATTTCCAGGAAGC 24 2052 Dcdc2a Intron 13 -1 1 25228047 25228055 TTCTTGGAA 25228031 25228066 GGAAAATCAGTTCTTGGAAGAGGTCACTGAAGCAT 10 2052 Dcdc2a Intron 13 -1 1 25228047 25228055 TTCTTGGAA 25228044 25228079 ATCTGTGTAGGAAGGAAAATCAGTTCTTGGAAGAG 23 113

Bin- Gene Sequence Chro- Str. Str. Potential Potential Potential ChIP-seq ChIP-seq ChIP-seq region Position of ding type mo- (input binding binding STAT5 binding region region end potential site some seq.) site start site end site start STAT5 BS* ID (mm9) (mm9) 1601 Ddx60 Exon 8 -1 1 64427696 64427704 TTCCAAGAA 64427680 64427715 AGTGATACATTTCCAAGAAGAGAGAACATGAGGTG 10 5293 Ddx60 Intron 8 -1 1 64475126 64475134 TTCTGAGAA 64475114 64475149 CTAGGCAATTAATTTTCTGAGAAAAGAGGGAGACT 14 4657 Emr1 Intron 17 1 1 57597174 57597182 TTCTTAGAA 57597157 57597192 CTTCTCTATTCCAATCATTCTTAGAATTGGTCTTT 17 5356 Emr1 Intron 17 -1 1 57601044 57601052 TTCTCAGAA 57601026 57601061 CTGCTTATTTCTCAGAATTTCTCATCTCTCTTCTA 8 4036 Fbln1 Intron 15 -1 1 85040022 85040030 TTCCCAGAA 85040021 85040056 GTTCTTGCCCAGAATGTGAGTTTAGTTCCCAGAAC 25 886 Fga Promoter 3 1 1 82829841 82829849 TTCCTGGAA 82829815 82829850 ATCTCCCCAGCTTCCAAGGCCCCCATTTCCTGGAA 26 886 Fga Promoter 3 1 1 82829841 82829849 TTCCTGGAA 82829817 82829852 CTCCCCAGCTTCCAAGGCCCCCATTTCCTGGAATG 24 886 Fga Promoter 3 1 1 82829841 82829849 TTCCTGGAA 82829820 82829855 CCCAGCTTCCAAGGCCCCCATTTCCTGGAATGTAG 21 886 Fga Promoter 3 1 1 82829841 82829849 TTCCTGGAA 82829824 82829859 GCTTCCAAGGCCCCCATTTCCTGGAATGTAGATTC 17 886 Fga Promoter 3 1 1 82829841 82829849 TTCCTGGAA 82829828 82829863 CCAAGGCCCCCATTTCCTGGAATGTAGATTCCCCC 13 886 Fga Promoter 3 1 1 82829841 82829849 TTCCTGGAA 82829829 82829864 CAAGGCCCCCATTTCCTGGAATGTAGATTCCCCCC 12 886 Fga Promoter 3 1 1 82829841 82829849 TTCCTGGAA 82829834 82829869 CCCCCATTTCCTGGAATGTAGATTCCCCCCACCCC 7 886 Fga Promoter 3 1 1 82829841 82829849 TTCCTGGAA 82829837 82829872 CCATTTCCTGGAATGTAGATTCCCCCCACCCCCCA 4 192 Fga Promoter 3 -1 1 82829841 82829849 TTCCAGGAA 82829818 82829853 ACATTCCAGGAAATGGGGGCCTTGGAAGCTGGGGA 3 192 Fga Promoter 3 -1 1 82829841 82829849 TTCCAGGAA 82829828 82829863 GGGGGAATCTACATTCCAGGAAATGGGGGCCTTGG 13 192 Fga Promoter 3 -1 1 82829841 82829849 TTCCAGGAA 82829832 82829867 GGTGGGGGGAATCTACATTCCAGGAAATGGGGGCC 17 
192 Fga Promoter 3 -1 1 82829841 82829849 TTCCAGGAA 82829836 82829871 GGGGGGTGGGGGGAATCTACATTCCAGGAAATGGG 21 5393 Fyb 3' UTR 15 -1 1 6610802 6610810 TTCTGAGAA 6610801 6610836 ACCAAATCGTTTTCTGTCTTTGCATTTCTGAGAAA 25 5391 Fyb Exon 15 -1 1 6610802 6610810 TTCTGAGAA 6610801 6610836 ACCAAATCGTTTTCTGTCTTTGCATTTCTGAGAAA 25 5387 Fyb Intron 15 -1 1 6610802 6610810 TTCTGAGAA 6610801 6610836 ACCAAATCGTTTTCTGTCTTTGCATTTCTGAGAAA 25 3330 Gbp1 Intron 3 -1 1 142263701 142263709 TTCTGGGAA 142263688 142263723 GAAGTTTAAATTTTTCTGGGAAATGAGGCAAACTC 13 4685 Gbp5 Intron 3 -1 1 142169549 142169557 TTCTTAGAA 142169545 142169580 TCAAAAGAATCTACATTTTATTTTCTTAGAAACCT 22 5403 Gbp5 Promoter 3 -1 1 142159846 142159854 TTCTGAGAA 142159825 142159860 CTTGGTTCTGAGAAATAACGGCCAGCCCTGGGAAT 5 5416 Gmfg Exon 7 -1 1 29231697 29231705 TTCTGAGAA 29231680 29231715 TCCCTTGGTTTCTGAGAAACAGGTGTCTGTTCTCC 9 5416 Gmfg Exon 7 -1 1 29231697 29231705 TTCTGAGAA 29231683 29231718 CCCTCCCTTGGTTTCTGAGAAACAGGTGTCTGTTC 12 5413 Gmfg Intron 7 1 1 29222671 29222679 TTCTGAGAA 29222650 29222685 GGAGACAAGGCTCAGACGGAATTCTGAGAAATGAG 21 5414 Gmfg Intron 7 -1 1 29222671 29222679 TTCTCAGAA 29222670 29222705 CTTTTCTCAACTCATGTGCCCTCATTTCTCAGAAT 25 3336 Gmfg Promoter 7 -1 1 29222074 29222082 TTCTCGGAA 29222059 29222094 ATTTCTTGGAGTTCTCGGAACAAGCCACGTTGCTG 11 3336 Gmfg Promoter 7 -1 1 29222074 29222082 TTCTCGGAA 29222062 29222097 TGTATTTCTTGGAGTTCTCGGAACAAGCCACGTTG 14 3336 Gmfg Promoter 7 -1 1 29222074 29222082 TTCTCGGAA 29222066 29222101 TGACTGTATTTCTTGGAGTTCTCGGAACAAGCCAC 18 3339 Gsdmd Intron 15 -1 1 75693021 75693029 TTCTGGGAA 75693021 75693056 AAACTGTCTTCAAGGTATAGTGGGGCTTCTGGGAA 26 1881 Icam1 Promoter 9 1 1 20820342 20820350 TTCCCGGAA 20820329 20820364 GAAGGCGCGAGGTTTCCCGGAAAGTGGCCCCGACA 13 1880 Icam1 Promoter 9 -1 1 20820342 20820350 TTCCGGGAA 20820319 20820354 ACTTTCCGGGAAACCTCGCGCCTTCCCCTCCGGAA 3 1880 Icam1 Promoter 9 -1 1 20820342 20820350 TTCCGGGAA 20820326 20820361 CGGGGCCACTTTCCGGGAAACCTCGCGCCTTCCCC 10 1880 Icam1 Promoter 9 -1 1 
20820342 20820350 TTCCGGGAA 20820330 20820365 CTGTCGGGGCCACTTTCCGGGAAACCTCGCGCCTT 14 3365 Ifit3 Exon 19 -1 1 34662705 34662713 TTCTGGGAA 34662694 34662729 CGATTGACCTTTCATTTCTGGGAAATTGCAACAAA 15 5497 Ifngr1 Intron 10 1 1 19314207 19314215 TTCTCAGAA 19314199 19314234 TGGAGGAATTCTCAGAAGGAGAAGCACACTGTTCC 8 5497 Ifngr1 Intron 10 1 1 19314207 19314215 TTCTCAGAA 19314200 19314235 GGAGGAATTCTCAGAAGGAGAAGCACACTGTTCCT 7 2100 Igdcc3 Intron 9 1 1 64994323 64994331 TTCTTGGAA 64994300 64994335 GTGTGTAGTCCCAGGCTTTTTATTTCTTGGAAGCC 23 216 Ikzf1 Intron 11 -1 1 11665203 11665211 TTCCAGGAA 11665190 11665225 AAAAGAAACTTAATTCCAGGAAGAGGAGCAATGAA 13 1887 Il17rc Exon 6 1 1 113428903 113428911 TTCCGGGAA 113428898 113428933 GCCCCTTCCGGGAAGGTGAGCTAGTCTCCTAAGGC 5 114

Bin- Gene Sequence Chro- Str. Str. Potential Potential Potential ChIP-seq ChIP-seq ChIP-seq region Position of ding type mo- (input binding binding STAT5 binding region region end potential site some seq.) site start site end site start STAT5 BS* ID (mm9) (mm9) 4093 Il17rc Intron 6 -1 1 113430554 113430562 TTCCCAGAA 113430537 113430572 CGCACTGTATTCCCAGAAGCAGTACTGGCCCTGGC 9 5533 Inpp5d Intron 1 -1 1 89564180 89564188 TTCTGAGAA 89564176 89564211 AGTACCTCCTAGGGTCAGTTGGTTCTGAGAAGCAC 22 958 Irf1 3' UTR 11 1 1 53591281 53591289 TTCCTGGAA 53591267 53591302 GTGTATCCGTGGCTTTCCTGGAACTCACTGTGTAG 14 957 Irf1 Exon 11 1 1 53591281 53591289 TTCCTGGAA 53591267 53591302 GTGTATCCGTGGCTTTCCTGGAACTCACTGTGTAG 14 6088 Irf1 Promoter 11 -1 1 53583426 53583434 TTCGGGGAA 53583411 53583446 GGCCTCATCATTTCGGGGAAATCAGGCTGTTGTAG 11 6088 Irf1 Promoter 11 -1 1 53583426 53583434 TTCGGGGAA 53583424 53583459 CATTGGCCCACTCGGCCTCATCATTTCGGGGAAAT 24 3399 Irf8 Promoter 8 1 1 123260087 123260095 TTCTCGGAA 123260061 123260096 CGGGGTCGGGGACGTGCAAAAGTGATTTCTCGGAA 26 3399 Irf8 Promoter 8 1 1 123260087 123260095 TTCTCGGAA 123260064 123260099 GGTCGGGGACGTGCAAAAGTGATTTCTCGGAAAGA 23 3399 Irf8 Promoter 8 1 1 123260087 123260095 TTCTCGGAA 123260068 123260103 GGGGACGTGCAAAAGTGATTTCTCGGAAAGAGAGC 19 3399 Irf8 Promoter 8 1 1 123260087 123260095 TTCTCGGAA 123260081 123260116 AGTGATTTCTCGGAAAGAGAGCGCTTCAGAGAAGG 6 4114 Irf8 Promoter 8 -1 1 123260087 123260095 TTCCGAGAA 123260075 123260110 CTGAAGCGCTCTCTTTCCGAGAAATCACTTTTGCA 14 970 Irf9 5' UTR 14 1 1 56222961 56222969 TTCTAGGAA 56222954 56222989 ACGTGGTTTCTAGGAAATGCACCTCCCCGGAGGAG 7 2665 Irf9 5' UTR 14 -1 1 56222961 56222969 TTCCTAGAA 56222935 56222970 TTCCTAGAAACCACGTGGTCTGAGTTGCAGGGCAA 0 2665 Irf9 5' UTR 14 -1 1 56222961 56222969 TTCCTAGAA 56222939 56222974 GCATTTCCTAGAAACCACGTGGTCTGAGTTGCAGG 4 2665 Irf9 5' UTR 14 -1 1 56222961 56222969 TTCCTAGAA 56222946 56222981 GGGAGGTGCATTTCCTAGAAACCACGTGGTCTGAG 11 2665 Irf9 5' UTR 14 -1 1 56222961 56222969 TTCCTAGAA 56222948 
56222983 CGGGGAGGTGCATTTCCTAGAAACCACGTGGTCTG 13 2665 Irf9 5' UTR 14 -1 1 56222961 56222969 TTCCTAGAA 56222949 56222984 CCGGGGAGGTGCATTTCCTAGAAACCACGTGGTCT 14 2665 Irf9 5' UTR 14 -1 1 56222961 56222969 TTCCTAGAA 56222950 56222985 TCCGGGGAGGTGCATTTCCTAGAAACCACGTGGTC 15 2665 Irf9 5' UTR 14 -1 1 56222961 56222969 TTCCTAGAA 56222951 56222986 CTCCGGGGAGGTGCATTTCCTAGAAACCACGTGGT 16 2665 Irf9 5' UTR 14 -1 1 56222961 56222969 TTCCTAGAA 56222952 56222987 CCTCCGGGGAGGTGCATTTCCTAGAAACCACGTGG 17 2665 Irf9 5' UTR 14 -1 1 56222961 56222969 TTCCTAGAA 56222952 56222987 CCTCCGTGGAGGTGCATTTCCTAGAAACCACGTGG 17 2665 Irf9 5' UTR 14 -1 1 56222961 56222969 TTCCTAGAA 56222952 56222987 CCTCCGGGGAGGTGCATTTCCTAGAAACCACGTGG 17 2665 Irf9 5' UTR 14 -1 1 56222961 56222969 TTCCTAGAA 56222952 56222987 CCTCCGTGGAGGTGCATTTCCTAGAAACCACGTGG 17 2665 Irf9 5' UTR 14 -1 1 56222961 56222969 TTCCTAGAA 56222953 56222988 TCCTCCGGGGAGGTGCATTTCCTAGAAACCACGTG 18 2665 Irf9 5' UTR 14 -1 1 56222961 56222969 TTCCTAGAA 56222956 56222991 TTCTCCTCCGGGGAGGTGCATTTCCTAGAAACCAC 21 969 Irf9 Exon 14 1 1 56222961 56222969 TTCTAGGAA 56222954 56222989 ACGTGGTTTCTAGGAAATGCACCTCCCCGGAGGAG 7 2664 Irf9 Exon 14 -1 1 56222961 56222969 TTCCTAGAA 56222935 56222970 TTCCTAGAAACCACGTGGTCTGAGTTGCAGGGCAA 0 2664 Irf9 Exon 14 -1 1 56222961 56222969 TTCCTAGAA 56222939 56222974 GCATTTCCTAGAAACCACGTGGTCTGAGTTGCAGG 4 2664 Irf9 Exon 14 -1 1 56222961 56222969 TTCCTAGAA 56222946 56222981 GGGAGGTGCATTTCCTAGAAACCACGTGGTCTGAG 11 2664 Irf9 Exon 14 -1 1 56222961 56222969 TTCCTAGAA 56222948 56222983 CGGGGAGGTGCATTTCCTAGAAACCACGTGGTCTG 13 2664 Irf9 Exon 14 -1 1 56222961 56222969 TTCCTAGAA 56222949 56222984 CCGGGGAGGTGCATTTCCTAGAAACCACGTGGTCT 14 2664 Irf9 Exon 14 -1 1 56222961 56222969 TTCCTAGAA 56222950 56222985 TCCGGGGAGGTGCATTTCCTAGAAACCACGTGGTC 15 2664 Irf9 Exon 14 -1 1 56222961 56222969 TTCCTAGAA 56222951 56222986 CTCCGGGGAGGTGCATTTCCTAGAAACCACGTGGT 16 2664 Irf9 Exon 14 -1 1 56222961 56222969 TTCCTAGAA 56222952 56222987 
CCTCCGGGGAGGTGCATTTCCTAGAAACCACGTGG 17 2664 Irf9 Exon 14 -1 1 56222961 56222969 TTCCTAGAA 56222952 56222987 CCTCCGTGGAGGTGCATTTCCTAGAAACCACGTGG 17 2664 Irf9 Exon 14 -1 1 56222961 56222969 TTCCTAGAA 56222952 56222987 CCTCCGGGGAGGTGCATTTCCTAGAAACCACGTGG 17 2664 Irf9 Exon 14 -1 1 56222961 56222969 TTCCTAGAA 56222952 56222987 CCTCCGTGGAGGTGCATTTCCTAGAAACCACGTGG 17 2664 Irf9 Exon 14 -1 1 56222961 56222969 TTCCTAGAA 56222953 56222988 TCCTCCGGGGAGGTGCATTTCCTAGAAACCACGTG 18 2664 Irf9 Exon 14 -1 1 56222961 56222969 TTCCTAGAA 56222956 56222991 TTCTCCTCCGGGGAGGTGCATTTCCTAGAAACCAC 21 972 Irf9 Promoter 14 1 1 56222961 56222969 TTCTAGGAA 56222954 56222989 ACGTGGTTTCTAGGAAATGCACCTCCCCGGAGGAG 7 2667 Irf9 Promoter 14 -1 1 56222961 56222969 TTCCTAGAA 56222935 56222970 TTCCTAGAAACCACGTGGTCTGAGTTGCAGGGCAA 0 115

Bin- Gene Sequence Chro- Str. Str. Potential Potential Potential ChIP-seq ChIP-seq ChIP-seq region Position of ding type mo- (input binding binding STAT5 binding region region end potential site some seq.) site start site end site start STAT5 BS* ID (mm9) (mm9) 2667 Irf9 Promoter 14 -1 1 56222961 56222969 TTCCTAGAA 56222939 56222974 GCATTTCCTAGAAACCACGTGGTCTGAGTTGCAGG 4 2667 Irf9 Promoter 14 -1 1 56222961 56222969 TTCCTAGAA 56222946 56222981 GGGAGGTGCATTTCCTAGAAACCACGTGGTCTGAG 11 2667 Irf9 Promoter 14 -1 1 56222961 56222969 TTCCTAGAA 56222948 56222983 CGGGGAGGTGCATTTCCTAGAAACCACGTGGTCTG 13 2667 Irf9 Promoter 14 -1 1 56222961 56222969 TTCCTAGAA 56222949 56222984 CCGGGGAGGTGCATTTCCTAGAAACCACGTGGTCT 14 2667 Irf9 Promoter 14 -1 1 56222961 56222969 TTCCTAGAA 56222950 56222985 TCCGGGGAGGTGCATTTCCTAGAAACCACGTGGTC 15 2667 Irf9 Promoter 14 -1 1 56222961 56222969 TTCCTAGAA 56222951 56222986 CTCCGGGGAGGTGCATTTCCTAGAAACCACGTGGT 16 2667 Irf9 Promoter 14 -1 1 56222961 56222969 TTCCTAGAA 56222952 56222987 CCTCCGGGGAGGTGCATTTCCTAGAAACCACGTGG 17 2667 Irf9 Promoter 14 -1 1 56222961 56222969 TTCCTAGAA 56222952 56222987 CCTCCGTGGAGGTGCATTTCCTAGAAACCACGTGG 17 2667 Irf9 Promoter 14 -1 1 56222961 56222969 TTCCTAGAA 56222952 56222987 CCTCCGGGGAGGTGCATTTCCTAGAAACCACGTGG 17 2667 Irf9 Promoter 14 -1 1 56222961 56222969 TTCCTAGAA 56222952 56222987 CCTCCGTGGAGGTGCATTTCCTAGAAACCACGTGG 17 2667 Irf9 Promoter 14 -1 1 56222961 56222969 TTCCTAGAA 56222953 56222988 TCCTCCGGGGAGGTGCATTTCCTAGAAACCACGTG 18 2667 Irf9 Promoter 14 -1 1 56222961 56222969 TTCCTAGAA 56222956 56222991 TTCTCCTCCGGGGAGGTGCATTTCCTAGAAACCAC 21 4116 Irg1 Promoter 14 1 1 103445775 103445783 TTCCCAGAA 103445772 103445807 TCATTCCCAGAAGCTGTTTCATGGTTGGTTGGTGA 3 236 Itga4 Intron 2 -1 1 79109494 79109502 TTCCAGGAA 79109491 79109526 TATGTGTTACTAGGCAAAATTAGTTCCAGGAATAG 23 1891 Itgb2 Intron 10 1 1 77025018 77025026 TTCCGGGAA 77025018 77025053 TTCCGGGAAACCACTAGAAATTTCCAGAAAGTGGC 0 1901 Klf6 3' UTR 13 1 1 5868899 5868907 TTCCCGGAA 5868889 
5868924 AAGGAGTACTTTCCCGGAATTTGGCATACCACAGC 10 1899 Klf6 Exon 13 1 1 5868899 5868907 TTCCCGGAA 5868889 5868924 AAGGAGTACTTTCCCGGAATTTGGCATACCACAGC 10 5547 Klf6 Intron 13 -1 1 5865427 5865435 TTCTGAGAA 5865416 5865451 ACTGAAATCATGCGTTTCTGAGAAAAAACCTCATT 15 5582 Lcp1 Intron 14 1 1 75550370 75550378 TTCTCAGAA 75550347 75550382 GAAAGAGTGGAGAGTAGGGGTTATTCTCAGAATTC 23 1019 Lcp1 Intron 14 1 1 75578779 75578787 TTCCTGGAA 75578771 75578806 CACATGACTTCCTGGAATCCTCTCCAGTGCCTTGA 8 5574 Lcp1 Intron 14 1 1 75601325 75601333 TTCTGAGAA 75601315 75601350 CCACTCCCTCTTCTGAGAACCTGAGCCTTGAGGGT 10 271 Lcp1 Intron 14 -1 1 75578779 75578787 TTCCAGGAA 75578769 75578804 AAGGCACTGGAGAGGATTCCAGGAAGTCATGTGGT 16 1018 Lcp1 Intron 14 -1 1 75585331 75585339 TTCCTGGAA 75585307 75585342 ATTTCCTGGAACTGGAGTTAAGGGTTCTGAGCTGC 2 1018 Lcp1 Intron 14 -1 1 75585331 75585339 TTCCTGGAA 75585325 75585360 GGCCAGAGAGGTCATCAGATTTCCTGGAACTGGAG 20 4168 Loxl2 3' UTR 14 -1 1 70093392 70093400 TTCCCAGAA 70093372 70093407 GCTGTGTTCCCAGAAGAAAACGGCCCCTCTGTGCA 6 4167 Loxl2 Exon 14 -1 1 70093392 70093400 TTCCCAGAA 70093372 70093407 GCTGTGTTCCCAGAAGAAAACGGCCCCTCTGTGCA 6 4749 Loxl2 Intron 14 1 1 70026345 70026353 TTCTTAGAA 70026327 70026362 AAACGAATGAGTGGTTTGTTCTTAGAATTTTCCAT 18 4749 Loxl2 Intron 14 1 1 70026345 70026353 TTCTTAGAA 70026330 70026365 CGAATGAGTGGTTTGTTCTTAGAATTTTCCATCCA 15 4749 Loxl2 Intron 14 1 1 70026345 70026353 TTCTTAGAA 70026333 70026368 ATGAGTGGTTTGTTCTTAGAATTTTCCATCCAATG 12 2711 Loxl2 Intron 14 -1 1 70026345 70026353 TTCTAAGAA 70026332 70026367 ATTGGATGGAAAATTCTAAGAACAAACCACTCATT 13 2711 Loxl2 Intron 14 -1 1 70026345 70026353 TTCTAAGAA 70026341 70026376 GAGAAAAACATTGGATGGAAAATTCTAAGAACAAA 22 2711 Loxl2 Intron 14 -1 1 70026345 70026353 TTCTAAGAA 70026343 70026378 ATGAGAAAAACATTGGATGGAAAATTCTAAGAACA 24 2712 Loxl2 Intron 14 -1 1 70028698 70028706 TTCTAAGAA 70028674 70028709 TATTCTAAGAAAATAATTGATCTGTGAACTTGCTC 2 2712 Loxl2 Intron 14 -1 1 70028698 70028706 TTCTAAGAA 70028675 70028710 
GTATTCTAAGAAAATAATTGATCTGTGAACTTGCT 3 2712 Loxl2 Intron 14 -1 1 70028698 70028706 TTCTAAGAA 70028682 70028717 AAAAAAGGTATTCTAAGAAAATAATTGATCTGTGA 10 2712 Loxl2 Intron 14 -1 1 70028698 70028706 TTCTAAGAA 70028684 70028719 AAAAAAAAGGTATTCTAAGAAAATAATTGATCTGT 12 2712 Loxl2 Intron 14 -1 1 70028698 70028706 TTCTAAGAA 70028685 70028720 AAAAAAAAAGGTATTCTAAGAAAATAATTGATCTG 13 2712 Loxl2 Intron 14 -1 1 70028698 70028706 TTCTAAGAA 70028692 70028727 GTAGGCCAAAAAAAAAGGTATTCTAAGAAAATAAT 20 2712 Loxl2 Intron 14 -1 1 70028698 70028706 TTCTAAGAA 70028694 70028729 TAGTAGGCCAAAAAAAAAGGTATTCTAAGAAAATA 22 2719 Lrrc17 Intron 5 1 1 21073790 21073798 TTCTAAGAA 21073777 21073812 TCACAGTTCCTGGTTCTAAGAATCTCTGCTTCCTC 13 2719 Lrrc17 Intron 5 1 1 21073790 21073798 TTCTAAGAA 21073777 21073812 TTACAGTTCCTGGTTCTAAGAATCTCTGCTTCCTC 13 116

Bin- Gene Sequence Chro- Str. Str. Potential Potential Potential ChIP-seq ChIP-seq ChIP-seq region Position of ding type mo- (input binding binding STAT5 binding region region end potential site some seq.) site start site end site start STAT5 BS* ID (mm9) (mm9) 2738 Ly86 Intron 13 1 1 37473789 37473797 TTCCTAGAA 37473766 37473801 CAGGAAGTGATTCTGGACAGCAATTCCTAGAACAT 23 2738 Ly86 Intron 13 1 1 37473789 37473797 TTCCTAGAA 37473766 37473801 CAGGAAGTGATTCTGGACAGCCATTCCTAGAACAT 23 2130 Ly86 Intron 13 -1 1 37481260 37481268 TTCTTGGAA 37481247 37481282 CACACCACTGTTTTTCTTGGAATTCTCTTAGATTT 13 4173 Lyl1 Promoter 8 -1 1 87225038 87225046 TTCCCAGAA 87225032 87225067 TGTCAAAGCCATATCTGAGTTTCCCAGAAGTTGCT 20 1923 Maff 5' UTR 15 1 1 79177977 79177985 TTCCGGGAA 79177972 79178007 AGTAATTCCGGGAAGCTCGCCTTACAACTGCGCGC 5 1922 Maff 5' UTR 15 -1 1 79177977 79177985 TTCCCGGAA 79177966 79178001 AGTTGTAAGGCGAGCTTCCCGGAATTACTCACGCA 15 1922 Maff 5' UTR 15 -1 1 79177977 79177985 TTCCCGGAA 79177967 79178002 CAGTTGTAAGGCGAGCTTCCCGGAATTACTCACGC 16 1921 Maff Exon 15 1 1 79177977 79177985 TTCCGGGAA 79177972 79178007 AGTAATTCCGGGAAGCTCGCCTTACAACTGCGCGC 5 1920 Maff Exon 15 -1 1 79177977 79177985 TTCCCGGAA 79177966 79178001 AGTTGTAAGGCGAGCTTCCCGGAATTACTCACGCA 15 1920 Maff Exon 15 -1 1 79177977 79177985 TTCCCGGAA 79177967 79178002 CAGTTGTAAGGCGAGCTTCCCGGAATTACTCACGC 16 1925 Maff Promoter 15 1 1 79177977 79177985 TTCCGGGAA 79177972 79178007 AGTAATTCCGGGAAGCTCGCCTTACAACTGCGCGC 5 1924 Maff Promoter 15 -1 1 79177977 79177985 TTCCCGGAA 79177966 79178001 AGTTGTAAGGCGAGCTTCCCGGAATTACTCACGCA 15 1924 Maff Promoter 15 -1 1 79177977 79177985 TTCCCGGAA 79177967 79178002 CAGTTGTAAGGCGAGCTTCCCGGAATTACTCACGC 16 2745 Map4k1 Intron 7 1 1 29774604 29774612 TTCTAAGAA 29774593 29774628 GCCTGGCATGGTTCTAAGAAGGAGGGTGTGAGGTG 11 1090 Mical1 Exon 10 -1 1 41202076 41202084 TTCCTGGAA 41202068 41202103 TGTTCACCTGTGGTTGTCTTCCTGGAATCCGGCTC 18 3473 Mmp14 Exon 14 -1 1 55055546 55055554 TTCTCGGAA 55055544 55055579 
CTCCCGGATGTAGGCATAGGGCACTTCTCGGAAGC 24 5641 Mmp2 Intron 8 1 1 95365665 95365673 TTCTCAGAA 95365646 95365681 AGTTAATATCACATCACTGTTCTCAGAAGAAGTAC 19 327 Mrc1 Intron 2 -1 1 14213159 14213167 TTCCAGGAA 14213138 14213173 ACCCATTCCAGGAAGGATAGCCTAGACCACTAAAG 5 1112 Ms4a4b Exon 19 1 1 11529521 11529529 TTCCTGGAA 11529497 11529532 GAGTTAAACATTGACACAACTGATTTCCTGGAAAG 24 331 Ms4a4b Exon 19 -1 1 11529521 11529529 TTCCAGGAA 11529520 11529555 CCCTATGATTGATTAGCAATGAGCTTTCCAGGAAA 25 1114 Ms4a4b Exon 19 -1 1 11537546 11537554 TTCCTGGAA 11537521 11537556 TTTCCTGGAACATTGGTCCCTTGGTGTTCTGATGG 1 1114 Ms4a4b Exon 19 -1 1 11537546 11537554 TTCCTGGAA 11537537 11537572 GTGGTTCTTGTACACATTTCCTGGAACATTGGTCC 17 1111 Ms4a4b Intron 19 1 1 11529521 11529529 TTCCTGGAA 11529497 11529532 GAGTTAAACATTGACACAACTGATTTCCTGGAAAG 24 330 Ms4a4b Intron 19 -1 1 11529521 11529529 TTCCAGGAA 11529520 11529555 CCCTATGATTGATTAGCAATGAGCTTTCCAGGAAA 25 4791 Ncapg Intron 5 1 1 46082712 46082720 TTCTTAGAA 46082700 46082735 CTTTAATTTTAATTCTTAGAAATGTTCACACTTAA 12 4792 Ncapg Promoter 5 1 1 46082712 46082720 TTCTTAGAA 46082700 46082735 CTTTAATTTTAATTCTTAGAAATGTTCACACTTAA 12 1132 Ncf2 Intron 1 -1 1 154655668 154655676 TTCCTGGAA 154655648 154655683 CGTAAGTTCCTGGAAGCCATAATAGGCATTTTACA 6 1700 Nckap1l Intron 15 1 1 103314156 103314164 TTCCAAGAA 103314147 103314182 CCTCCTGCTTTCCAAGAAGCCCATTTGGTGTTTCT 9 2153 Nckap1l Intron 15 -1 1 103314156 103314164 TTCTTGGAA 103314135 103314170 TGGGCTTCTTGGAAAGCAGGAGGATTTAGGCCAAG 5 4222 Oas2 Intron 5 -1 -1 121197194 121197202 TTCCCAGAA 121197185 121197220 CATTCCCAGAAAGGTGGCATCCCTGGGCCCTTCCC 2 1193 Plaur Intron 7 1 1 25260127 25260135 TTCCTGGAA 25260101 25260136 GTGTGAGTCTGACGCACTGCCAGGGGTTCCTGGAA 26 1193 Plaur Intron 7 1 1 25260127 25260135 TTCCTGGAA 25260101 25260136 GTGTGAGTCTGAGGCACTGCCAGGGGTTCCTGGAA 26 1193 Plaur Intron 7 1 1 25260127 25260135 TTCCTGGAA 25260103 25260138 GTGAGTCTGAGGCACTGCCAGGGGTTCCTGGAACA 24 394 Plaur Intron 7 -1 1 25260127 25260135 TTCCAGGAA 25260109 25260144 
AAAGTGTGTTCCAGGAACCCCTGGCAGTGCCTCAG 8 394 Plaur Intron 7 -1 1 25260127 25260135 TTCCAGGAA 25260115 25260150 NTTCCCAAAGTGTGTTCCAGGAACCCCTGGCAGTG 14 394 Plaur Intron 7 -1 1 25260127 25260135 TTCCAGGAA 25260119 25260154 AGGGCTTCCCAAAGTGTGTTCCAGGAACCCCTGGC 18 394 Plaur Intron 7 -1 1 25260127 25260135 TTCCAGGAA 25260123 25260158 GAGCAGGGCTTCCCAAAGTGTGTTCCAGGAACCCC 22 1194 Plaur Promoter 7 1 1 25246888 25246896 TTCCTGGAA 25246863 25246898 CCTTGTGTTTCCTGCTGTGTTCCAGTTCCTGGAAC 25 4256 Pou2af1 Intron 9 -1 1 51030325 51030333 TTCCCAGAA 51030305 51030340 CAGCTTTTCCCAGAAAGTAGTACTAATGAAACCAT 6 4257 Pou2af1 Intron 9 -1 1 51030809 51030817 TTCCCAGAA 51030791 51030826 CAGTATATTTCCCAGAATCTAAGCAATGAACTTGA 8 3582 Prkcb Intron 7 1 1 129495227 129495235 TTCTGGGAA 129495221 129495256 CTCCATTTCTGGGAAGGAAGCTCTTTGGAGGACTT 6 117

Bin- Gene Sequence Chro- Str. Str. Potential Potential Potential ChIP-seq ChIP-seq ChIP-seq region Position of ding type mo- (input binding binding STAT5 binding region region end potential site some seq.) site start site end site start STAT5 BS* ID (mm9) (mm9) 4274 Prkcb Intron 7 1 1 129663907 129663915 TTCCCAGAA 129663895 129663930 CCTTAACTCAGTTTCCCAGAAGTTCACATCTTTGC 12 4274 Prkcb Intron 7 1 1 129663907 129663915 TTCCCAGAA 129663896 129663931 CTTAACTCAGTTTCCCAGAAGTTCACATCTTTGCA 11 1223 Prkcb Intron 7 -1 1 129710708 129710716 TTCTAGGAA 129710685 129710720 CGATTCTAGGAAGTAGACACTGCCGTGCATCTAGC 3 6102 Prune2 Intron 19 1 1 17282643 17282651 TTCGGGGAA 17282617 17282652 ATATAGAAAAGAACAGCAAGAAGGCTTTCGGGGAA 26 2859 Prune2 Intron 19 -1 1 17032425 17032433 TTCTAAGAA 17032404 17032439 GTCTATTCTAAGAAAAGACCAATTTAGTTATAATT 5 1240 Prune2 Intron 19 -1 1 17055160 17055168 TTCTAGGAA 17055142 17055177 TCTGTTCTTTCTAGGAACTGACAAAGAAATTGGAT 8 5772 Ptpre Intron 7 -1 1 142846266 142846274 TTCTGAGAA 142846253 142846288 CTTGAGTTAAGAGTTCTGAGAATGACTATGGAGAC 13 3618 Rnf213 Intron 11 -1 1 119316559 119316567 TTCTGGGAA 119316549 119316584 TGGTTGCCTCTTAGATTTCTGGGAACAGGATAGGC 16 2934 Serpina3f Promoter 12 1 1 105452633 105452641 TTCCTAGAA 105452621 105452656 TATCAGTTCTATTTCCTAGAAATCACCCATTTCCC 12 5006 Serpina3g Intron 12 1 1 105476721 105476729 TTCGTGGAA 105476714 105476749 GTTATAATTCGTGGAACACACACATTACCCATGCC 7 3661 Slc11a1 Intron 1 -1 1 74428204 74428212 TTCTGGGAA 74428194 74428229 CTTCTCTGAGTCACCGTTCTGGGAAAGTCCCCTAT 16 3665 Slc11a1 Promoter 1 -1 1 74428204 74428212 TTCTGGGAA 74428194 74428229 CTTCTCTGAGTCACCGTTCTGGGAAAGTCCCCTAT 16 3666 Slc25a24 Intron 3 -1 1 108936457 108936465 TTCTGGGAA 108936435 108936470 ACTGTTCTGGGAATGCACACAAACCAGTGTGCTAC 4 1382 Socs3 Intron 11 1 -1 117830544 117830552 TTCCTGGAA 117830530 117830565 GGCCAGTACGCCCCGCCCCCCGATTCCTGGAACTG 23 1382 Socs3 Intron 11 1 -1 117830544 117830552 TTCCTGGAA 117830530 117830565 GGCCAGGACGCCCCGCCCCCCGATTCCTGGAACTG 23 1382 Socs3 Intron 11 1 -1 
117830544 117830552 TTCCTGGAA 117830530 117830565 GGCCAGTACGCCCCGCCCCCCGATTCCTGGAACTG 23 1382 Socs3 Intron 11 1 -1 117830544 117830552 TTCCTGGAA 117830531 117830566 GCCAGTACGCCCCGCCCCCCGATTCCTGGAACTGC 22 1382 Socs3 Intron 11 1 -1 117830544 117830552 TTCCTGGAA 117830532 117830567 CCAGTACGCCCCGCCCCCCGATTCCTGGAACCGCC 21 1382 Socs3 Intron 11 1 -1 117830544 117830552 TTCCTGGAA 117830532 117830567 CCAGTACGCCCCGCCCCCCGATTCCTGGAACTGCC 21 1382 Socs3 Intron 11 1 -1 117830544 117830552 TTCCTGGAA 117830533 117830568 CAGTACGCCCCGCCCCCCGATTCCTGGAACTGCCC 20 1382 Socs3 Intron 11 1 -1 117830544 117830552 TTCCTGGAA 117830534 117830569 AGTACGCCCCGCCCCCCGATTCCTGGAACTGCCCG 19 1382 Socs3 Intron 11 1 -1 117830544 117830552 TTCCTGGAA 117830541 117830576 CCCGCCCCCCGATTCCTGGAACTGCCCGGCCGGTC 12 519 Socs3 Intron 11 -1 -1 117830544 117830552 TTCCAGGAA 117830536 117830571 GCCGGGCAGTTCCAGGAATCGGGGGGCGGGGCGTA 9 2988 Stat2 Intron 10 -1 1 127708699 127708707 TTCTAAGAA 127708690 127708725 TTATATGCCTGTATTGCTTCTAAGAAACAAAAACT 17 4393 Stat2 Intron 10 -1 1 127724232 127724240 TTCCCAGAA 127724221 127724256 AGGACTGGGGTTTGGTTCCCAGAACCTGCATGGAG 15 534 Stk10 Intron 11 -1 1 32485939 32485947 TTCCAGGAA 32485916 32485951 GAATTCCAGGAATGGAGCACGGGTGGCCAGGCTTG 3 5924 Syt13 Intron 2 -1 1 92783873 92783881 TTCTCAGAA 92783859 92783894 ATCCAGGATAAGTTCTCAGAAATGTTTCTGACCTT 12 1406 Syt6 Intron 3 1 1 103423986 103423994 TTCTAGGAA 103423981 103424016 TGGCTTTCTAGGAAGGGACACTTTCTGCTCACAAG 5 1425 Trem2 Exon 17 1 1 48492268 48492276 TTCCTGGAA 48492262 48492297 CAGGTATTCCTGGAAGGCAGGCTGGGGCTGAAAAA 6 3744 Trpv2 Intron 11 -1 1 62392552 62392560 TTCTGGGAA 62392533 62392568 ATGAGGATTCTGGGAAGTCAGGTTCAGTACCCTAG 7 557 Ugt2b35 Exon 5 1 1 87430334 87430342 TTCCAGGAA 87430308 87430343 TTTCAAACAAACAGCTCATGACAAAATTCCAGGAA 26 1437 Ugt2b35 Exon 5 -1 1 87430334 87430342 TTCCTGGAA 87430320 87430355 ATCAAACTTGGATTCCTGGAATTTTGTCATGAGCT 12 2268 Vav1 Intron 17 1 1 57420534 57420542 TTCTTGGAA 57420523 57420558 TCTCTCTGACTTTCTTGGAAATCCTACATCTGGTG 11

*in ChIP-seq region

APPENDIX 6: POTENTIAL STAT5 BINDING SITES (LONG MOTIF) VALIDATED AGAINST CHIP-SEQ DATA SET WITH GEO ACCESSION NUMBER GSM784027

Potential STAT5 binding sites that FIMO found on the strand opposite to the input sequence were assigned the sequence type (promoter, exon, intron, 3’ UTR, 5’ UTR) and the gene name of that input sequence.
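The relationship between sites reported on opposite strands can be illustrated with a minimal Python sketch. It is not the code used in this thesis: the helper names `reverse_complement` and `site_offset` are hypothetical, and reading the last table column as the 0-based offset of the site within the listed ChIP-seq region sequence is an assumption inferred from the rows themselves. The example uses two Ctsc rows of Table 6 that share the coordinates 95437359 to 95437373 but lie on opposite strands.

```python
# Illustrative helpers; the names are hypothetical and not taken from the thesis.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def site_offset(region_seq: str, site_seq: str) -> int:
    """Return the 0-based index of the first occurrence of site_seq
    in region_seq, or -1 if the site does not occur."""
    return region_seq.upper().find(site_seq.upper())


# Two Ctsc rows of Table 6 with identical coordinates (95437359-95437373).
plus_site = "GATTTCTTGGAAAAC"    # reported on strand 1
minus_site = "GTTTTCCAAGAAATC"   # reported on strand -1
plus_region = "GCCCTCCTGAAGGATTTCTTGGAAAACAGTAATCA"
minus_region = "CACAGTGGTGATTACTGTTTTCCAAGAAATCCTTC"

# The two sites are reverse complements of each other.
print(reverse_complement(plus_site) == minus_site)

# Offsets of each site inside its listed ChIP-seq region sequence
# (assumed to correspond to the "Position of potential STAT5 BS" column).
print(site_offset(plus_region, plus_site))
print(site_offset(minus_region, minus_site))
```

For these two rows the computed offsets agree with the values in the last column of Table 6 (12 and 16), which supports the assumed reading of that column for at least these entries.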

Table 6

Potential STAT5 binding sites (long motif) validated against ChIP-seq data set with GEO accession number GSM784027

Table 6: continued

Bin- Gene Transcrip- Se- Chro- Str. Str. Potential Potential Potential STAT5 binding ChIP-seq ChIP-seq ChIP-seq region Position of ding tion factor quence mo- (input binding binding site site region start region end potential site ID type some seq.) site start end (mm9) STAT5 (mm9) BS* 32169 2810459M11Rik STAT5B 3' UTR 1 1 1 87950197 87950211 GGTTTCCTGGGATCT 87950183 87950218 AAGATATCCCCTTAGGTTTCCTGGGATCTTGGCTC 14 32168 2810459M11Rik STAT5B Exon 1 1 1 87950197 87950211 GGTTTCCTGGGATCT 87950183 87950218 AAGATATCCCCTTAGGTTTCCTGGGATCTTGGCTC 14 24880 A230050P20Rik STAT5A Intron 9 1 1 20673635 20673649 TTTTTCCAGGAAATG 20673616 20673651 ATGCCTGGAGAACTTCGGATTTTTCCAGGAAATGG 19 24881 A230050P20Rik STAT5A Promoter 9 1 1 20673635 20673649 TTTTTCCAGGAAATG 20673616 20673651 ATGCCTGGAGAACTTCGGATTTTTCCAGGAAATGG 19 23728 A230050P20Rik STAT5B Intron 9 1 1 20673635 20673649 TTTTTCCAGGAAATG 20673616 20673651 ATGCCTGGAGAACTTCGGATTTTTCCAGGAAATGG 19 23729 A230050P20Rik STAT5B Promoter 9 1 1 20673635 20673649 TTTTTCCAGGAAATG 20673616 20673651 ATGCCTGGAGAACTTCGGATTTTTCCAGGAAATGG 19 31156 Adcy7 STAT5A Intron 8 1 1 90813448 90813462 GAATTCTGTGAACTG 90813443 90813478 ACATGGAATTCTGTGAACTGCCTGCTCTTAGATGC 5 24715 Amph STAT5A Intron 13 -1 1 19103885 19103899 GCCTTCCAGGAAATC 19103884 19103919 CTCGAACAGATCATCAGAGGCCTTCCAGGAAATCA 19 29748 Amph STAT5A Intron 13 -1 1 19149641 19149655 TCTTTCTAAGAATAT 19149631 19149666 AAGTAATTTATCTTTCTAAGAATATTACTCCTACA 10 28236 Amph STAT5A Intron 13 1 1 19077476 19077490 GCCTTCTTAGAATTC 19077456 19077491 CTAAATGGTCTAAGTCCTCAGCCTTCTTAGAATTC 20 33333 Amph STAT5A Intron 13 1 1 19185035 19185049 CATTTCTGGGCACTT 19185027 19185062 CTGCTCCTCATTTCTGGGCACTTAATATGGGCCAC 8 27909 Amph STAT5B Intron 13 -1 1 19103885 19103899 GCCTTCCAGGAAATC 19103884 19103919 CTCGAACAGATCATCAGAGGCCTTCCAGGAAATCA 19 28270 Amph STAT5B Intron 13 -1 1 19149641 19149655 TCTTTCTAAGAATAT 19149631 19149666 AAGTAATTTATCTTTCTAAGAATATTACTCCTACA 10 33217 Amph STAT5B Intron 13 -1 1 19221496 19221510 
CATTTCCTGGGAGTT 19221476 19221511 CATTTCCTGGGAGTTAACTAAGATGGAAAGAGCTC 0 29308 Amph STAT5B Intron 13 1 1 19077476 19077490 GCCTTCTTAGAATTC 19077456 19077491 CTAAATGGTCTAAGTCCTCAGCCTTCTTAGAATTC 20 32669 Ankrd6 STAT5A 3' UTR 4 -1 -1 32892041 32892055 ACATTCCTGGAAACA 32892030 32892065 GGATGTAGACATTCCTGGAAACATACAGGGTAGAT 8 28340 Anxa8 STAT5A Intron 14 1 1 34901626 34901640 CCTTTCTCAGAAGCA 34901620 34901655 CTTGTTCCTTTCTCAGAAGCAGGACCCTTATGGAG 6 30588 Anxa8 STAT5B Intron 14 1 1 34901626 34901640 CCTTTCTCAGAAGCA 34901620 34901655 CTTGTTCCTTTCTCAGAAGCAGGACCCTTATGGAG 6 31550 Bst1 STAT5A Intron 5 -1 1 44217018 44217032 GATATCCTGGAACCC 44217000 44217035 ATGATATCCTGGAACCCTCCAAGTAGACTAGTCTG 2 119

Binding site ID | Gene | Transcription factor | Sequence type | Chromosome | Str. | Str. (input seq.) | Potential binding site start (mm9) | Potential binding site end (mm9) | Potential STAT5 binding site | ChIP-seq region start | ChIP-seq region end | ChIP-seq region | Position of potential STAT5 BS*
33229 | Bst1 | STAT5B | Intron | 5 | -1 | 1 | 44217018 | 44217032 | GATATCCTGGAACCC | 44217000 | 44217035 | ATGATATCCTGGAACCCTCCAAGTAGACTAGTCTG | 2
32696 | Casp4 | STAT5A | Intron | 9 | -1 | 1 | 5334541 | 5334555 | TAATTCCTGGGATCC | 5334533 | 5334568 | ATCTTCTTGTTTTAATTCCTGGGATCCACATGTGA | 12
33080 | Casp4 | STAT5A | Intron | 9 | 1 | 1 | 5317484 | 5317498 | TATATCCAGGAATTG | 5317478 | 5317513 | TGATGGTATATCCAGGAATTGCTATGGTGGGAGAG | 6
32837 | Casp4 | STAT5B | Intron | 9 | -1 | 1 | 5334541 | 5334555 | TAATTCCTGGGATCC | 5334533 | 5334568 | ATCTTCTTGTTTTAATTCCTGGGATCCACATGTGA | 12
33505 | Casp4 | STAT5B | Intron | 9 | 1 | 1 | 5317484 | 5317498 | TATATCCAGGAATTG | 5317478 | 5317513 | TGATGGTATATCCAGGAATTGCTATGGTGGGAGAG | 6
30759 | Ccr5 | STAT5A | Exon | 9 | -1 | 1 | 124039881 | 124039895 | GAATTCCTGGAAGGT | 124039874 | 124039909 | TATTCAGTCCAAAGAATTCCTGGAAGGTGGTCAGG | 13
30760 | Ccr5 | STAT5A | Intron | 9 | -1 | 1 | 124039881 | 124039895 | GAATTCCTGGAAGGT | 124039874 | 124039909 | TATTCAGTCCAAAGAATTCCTGGAAGGTGGTCAGG | 13
24833 | Ccr5 | STAT5B | Exon | 9 | -1 | 1 | 124039881 | 124039895 | GAATTCCTGGAAGGT | 124039874 | 124039909 | TATTCAGTCCAAAGAATTCCTGGAAGGTGGTCAGG | 13
24834 | Ccr5 | STAT5B | Intron | 9 | -1 | 1 | 124039881 | 124039895 | GAATTCCTGGAAGGT | 124039874 | 124039909 | TATTCAGTCCAAAGAATTCCTGGAAGGTGGTCAGG | 13
26330 | Cenpe | STAT5A | Intron | 3 | -1 | 1 | 134875760 | 134875774 | GTTTTCGGGGAAACG | 134875751 | 134875786 | AGTCGGACCGTGTTTTCGGGGAAACGGAGTCCTGC | 11
26330 | Cenpe | STAT5A | Intron | 3 | -1 | 1 | 134875760 | 134875774 | GTTTTCGGGGAAACG | 134875751 | 134875786 | AGTCGGACCATGTTTTCGGGGAAACGGAGTCCTGC | 11
27918 | Cenpe | STAT5B | Intron | 3 | -1 | 1 | 134875760 | 134875774 | GTTTTCGGGGAAACG | 134875751 | 134875786 | AGTCGGACCGTGTTTTCGGGGAAACGGAGTCCTGC | 11
27918 | Cenpe | STAT5B | Intron | 3 | -1 | 1 | 134875760 | 134875774 | GTTTTCGGGGAAACG | 134875751 | 134875786 | AGTCGGACCATGTTTTCGGGGAAACGGAGTCCTGC | 11
27072 | Cfi | STAT5A | Intron | 3 | -1 | 1 | 129541043 | 129541057 | CCTTTCTCAGAATTA | 129541043 | 129541078 | CAGCTGTGACTAGACCAGAGCCTTTCTCAGAATTA | 20
27072 | Cfi | STAT5A | Intron | 3 | -1 | 1 | 129541043 | 129541057 | CCTTTCTCAGAATTA | 129541042 | 129541077 | AGCTGTGACTAGACCAGAGCCTTTCTCAGAATTAG | 19
27072 | Cfi | STAT5A | Intron | 3 | -1 | 1 | 129541043 | 129541057 | CCTTTCTCAGAATTA | 129541041 | 129541076 | GCTGTGACTAGACCAGAGCCTTTCTCAGAATTAGC | 18
27072 | Cfi | STAT5A | Intron | 3 | -1 | 1 | 129541043 | 129541057 | CCTTTCTCAGAATTA | 129541038 | 129541073 | GTGACTAGACCAGAGCCTTTCTCAGAATTAGCAAA | 15
27072 | Cfi | STAT5A | Intron | 3 | -1 | 1 | 129541043 | 129541057 | CCTTTCTCAGAATTA | 129541035 | 129541070 | ACTAGACCAGAGCCTTTCTCAGAATTAGCAAAGAC | 12
27559 | Cfi | STAT5B | Intron | 3 | -1 | 1 | 129541043 | 129541057 | CCTTTCTCAGAATTA | 129541043 | 129541078 | CAGCTGTGACTAGACCAGAGCCTTTCTCAGAATTA | 20
27559 | Cfi | STAT5B | Intron | 3 | -1 | 1 | 129541043 | 129541057 | CCTTTCTCAGAATTA | 129541042 | 129541077 | AGCTGTGACTAGACCAGAGCCTTTCTCAGAATTAG | 19
27559 | Cfi | STAT5B | Intron | 3 | -1 | 1 | 129541043 | 129541057 | CCTTTCTCAGAATTA | 129541041 | 129541076 | GCTGTGACTAGACCAGAGCCTTTCTCAGAATTAGC | 18
27559 | Cfi | STAT5B | Intron | 3 | -1 | 1 | 129541043 | 129541057 | CCTTTCTCAGAATTA | 129541038 | 129541073 | GTGACTAGACCAGAGCCTTTCTCAGAATTAGCAAA | 15
27559 | Cfi | STAT5B | Intron | 3 | -1 | 1 | 129541043 | 129541057 | CCTTTCTCAGAATTA | 129541035 | 129541070 | ACTAGACCAGAGCCTTTCTCAGAATTAGCAAAGAC | 12
27008 | Cfi | STAT5B | Intron | 3 | -1 | 1 | 129546856 | 129546870 | GAATTCTTAGAATGG | 129546853 | 129546888 | ACAGTGAGGTCCCTACAGAATTCTTAGAATGGTTG | 17
25688 | Cfi | STAT5B | Intron | 3 | -1 | 1 | 129549997 | 129550011 | TTTTTCCTGGAAAGA | 129549990 | 129550025 | GTGTGGTGTGACATTTTTCCTGGAAAGACAAAGAT | 13
27201 | Cftr | STAT5A | 3' UTR | 6 | -1 | 1 | 18272061 | 18272075 | GCTTTCTGGGAACAT | 18272058 | 18272093 | AGTCTTAAAGATCTGTTGCTTTCTGGGAACATAAG | 17
27202 | Cftr | STAT5A | Exon | 6 | -1 | 1 | 18272061 | 18272075 | GCTTTCTGGGAACAT | 18272058 | 18272093 | AGTCTTAAAGATCTGTTGCTTTCTGGGAACATAAG | 17
25172 | Cftr | STAT5A | Intron | 6 | -1 | 1 | 18169327 | 18169341 | CAGTTCTAAGAAATA | 18169323 | 18169358 | TTAGTAAACTATTCTACAGTTCTAAGAAATATATG | 16
29903 | Cftr | STAT5A | Intron | 6 | 1 | 1 | 18196402 | 18196416 | CTCTTCTAAGAATCT | 18196399 | 18196434 | AAGCTCTTCTAAGAATCTGAGTTATAAGAGACTTC | 3
33369 | Cftr | STAT5A | Intron | 6 | 1 | 1 | 18260405 | 18260419 | GATTTCTAGAAAACT | 18260400 | 18260435 | CTCTGGATTTCTAGAAAACTAAACTTTAACTATCC | 5
26796 | Cftr | STAT5B | 3' UTR | 6 | -1 | 1 | 18272061 | 18272075 | GCTTTCTGGGAACAT | 18272058 | 18272093 | AGTCTTAAAGATCTGTTGCTTTCTGGGAACATAAG | 17
26797 | Cftr | STAT5B | Exon | 6 | -1 | 1 | 18272061 | 18272075 | GCTTTCTGGGAACAT | 18272058 | 18272093 | AGTCTTAAAGATCTGTTGCTTTCTGGGAACATAAG | 17
28026 | Cftr | STAT5B | Intron | 6 | -1 | 1 | 18169327 | 18169341 | CAGTTCTAAGAAATA | 18169323 | 18169358 | TTAGTAAACTATTCTACAGTTCTAAGAAATATATG | 16
31939 | Cftr | STAT5B | Intron | 6 | 1 | 1 | 18196402 | 18196416 | CTCTTCTAAGAATCT | 18196399 | 18196434 | AAGCTCTTCTAAGAATCTGAGTTATAAGAGACTTC | 3
28625 | Cftr | STAT5B | Intron | 6 | 1 | 1 | 18246890 | 18246904 | TGTTTCCGGGAAGGA | 18246880 | 18246915 | CTGCAAGTCCTGTTTCCGGGAAGGATGGGCTCCTT | 10
28117 | Cftr | STAT5B | Intron | 6 | 1 | 1 | 18257675 | 18257689 | CTTTTCTTGGAACGT | 18257665 | 18257700 | CTTTTCCTATCTTTTCTTGGAACGTGAGGATTGCA | 10
31352 | Creb5 | STAT5A | Intron | 6 | -1 | 1 | 53595206 | 53595220 | GAGTTCACGGAACTA | 53595199 | 53595234 | TTATCTTGCTTTAGAGTTCACGGAACTAGTTGCAT | 13
31352 | Creb5 | STAT5A | Intron | 6 | -1 | 1 | 53595206 | 53595220 | GAGTTCACGGAACTA | 53595195 | 53595230 | CTTGCTTTAGAGTTCACGGAACTAGTTGCATCATA | 9
33996 | Creb5 | STAT5B | Intron | 6 | 1 | 1 | 53575331 | 53575345 | GATCTCTTAGAATTC | 53575322 | 53575357 | TAAAAGTCAGATCTCTTAGAATTCACCCCTGGCAT | 9
26085 | Ctsc | STAT5A | Exon | 7 | -1 | 1 | 95437359 | 95437373 | GTTTTCCAAGAAATC | 95437355 | 95437390 | CACAGTGGTGATTACTGTTTTCCAAGAAATCCTTC | 16
31356 | Ctsc | STAT5A | Exon | 7 | -1 | 1 | 95445363 | 95445377 | TATTTCTAGGAATGA | 95445353 | 95445388 | AAACAAAATGTATTTCTAGGAATGAAAGAATGGCT | 10
31356 | Ctsc | STAT5A | Exon | 7 | -1 | 1 | 95445363 | 95445377 | TATTTCTAGGAATGA | 95445352 | 95445387 | AACAAAATGTATTTCTAGGAATGAAAGAATGGCTG | 9

Table 6: continued

Binding site ID | Gene | Transcription factor | Sequence type | Chromosome | Str. | Str. (input seq.) | Potential binding site start (mm9) | Potential binding site end (mm9) | Potential STAT5 binding site | ChIP-seq region start | ChIP-seq region end | ChIP-seq region | Position of potential STAT5 BS*
23997 | Ctsc | STAT5A | Exon | 7 | 1 | 1 | 95437359 | 95437373 | GATTTCTTGGAAAAC | 95437347 | 95437382 | GCCCTCCTGAAGGATTTCTTGGAAAACAGTAATCA | 12
26084 | Ctsc | STAT5A | Intron | 7 | -1 | 1 | 95437359 | 95437373 | GTTTTCCAAGAAATC | 95437355 | 95437390 | CACAGTGGTGATTACTGTTTTCCAAGAAATCCTTC | 16
29085 | Ctsc | STAT5A | Intron | 7 | -1 | 1 | 95440982 | 95440996 | TCCTTCCCGGAATAG | 95440968 | 95441003 | TAAGTCTCCTTCCCGGAATAGTACTTCAGAGAGGT | 6
31353 | Ctsc | STAT5A | Intron | 7 | -1 | 1 | 95445363 | 95445377 | TATTTCTAGGAATGA | 95445353 | 95445388 | AAACAAAATGTATTTCTAGGAATGAAAGAATGGCT | 10
31353 | Ctsc | STAT5A | Intron | 7 | -1 | 1 | 95445363 | 95445377 | TATTTCTAGGAATGA | 95445352 | 95445387 | AACAAAATGTATTTCTAGGAATGAAAGAATGGCTG | 9
24614 | Ctsc | STAT5A | Intron | 7 | -1 | 1 | 95452662 | 95452676 | CAATTCTAAGAAACT | 95452651 | 95452686 | GAACATTTACAATTCTAAGAAACTGTTTGAGAACC | 9
23996 | Ctsc | STAT5A | Intron | 7 | 1 | 1 | 95437359 | 95437373 | GATTTCTTGGAAAAC | 95437347 | 95437382 | GCCCTCCTGAAGGATTTCTTGGAAAACAGTAATCA | 12
29377 | Ctsc | STAT5A | Intron | 7 | 1 | 1 | 95456101 | 95456115 | GGCTTCCAAGAATAT | 95456097 | 95456132 | ACCTGGCTTCCAAGAATATGCTTTCAATGAATTCC | 4
24075 | Ctsc | STAT5B | Exon | 7 | -1 | 1 | 95437359 | 95437373 | GTTTTCCAAGAAATC | 95437355 | 95437390 | CACAGTGGTGATTACTGTTTTCCAAGAAATCCTTC | 16
26563 | Ctsc | STAT5B | Exon | 7 | -1 | 1 | 95445363 | 95445377 | TATTTCTAGGAATGA | 95445353 | 95445388 | AAACAAAATGTATTTCTAGGAATGAAAGAATGGCT | 10
26563 | Ctsc | STAT5B | Exon | 7 | -1 | 1 | 95445363 | 95445377 | TATTTCTAGGAATGA | 95445352 | 95445387 | AACAAAATGTATTTCTAGGAATGAAAGAATGGCTG | 9
23376 | Ctsc | STAT5B | Exon | 7 | 1 | 1 | 95437359 | 95437373 | GATTTCTTGGAAAAC | 95437347 | 95437382 | GCCCTCCTGAAGGATTTCTTGGAAAACAGTAATCA | 12
24074 | Ctsc | STAT5B | Intron | 7 | -1 | 1 | 95437359 | 95437373 | GTTTTCCAAGAAATC | 95437355 | 95437390 | CACAGTGGTGATTACTGTTTTCCAAGAAATCCTTC | 16
33520 | Ctsc | STAT5B | Intron | 7 | -1 | 1 | 95440719 | 95440733 | CCTTTCTCAGAAAGA | 95440703 | 95440738 | CATACCTTTCTCAGAAAGAAGCCAACTTCCTTAAT | 4
30146 | Ctsc | STAT5B | Intron | 7 | -1 | 1 | 95440982 | 95440996 | TCCTTCCCGGAATAG | 95440968 | 95441003 | TAAGTCTCCTTCCCGGAATAGTACTTCAGAGAGGT | 6
26562 | Ctsc | STAT5B | Intron | 7 | -1 | 1 | 95445363 | 95445377 | TATTTCTAGGAATGA | 95445353 | 95445388 | AAACAAAATGTATTTCTAGGAATGAAAGAATGGCT | 10
26562 | Ctsc | STAT5B | Intron | 7 | -1 | 1 | 95445363 | 95445377 | TATTTCTAGGAATGA | 95445352 | 95445387 | AACAAAATGTATTTCTAGGAATGAAAGAATGGCTG | 9
25775 | Ctsc | STAT5B | Intron | 7 | -1 | 1 | 95452662 | 95452676 | CAATTCTAAGAAACT | 95452651 | 95452686 | GAACATTTACAATTCTAAGAAACTGTTTGAGAACC | 9
23375 | Ctsc | STAT5B | Intron | 7 | 1 | 1 | 95437359 | 95437373 | GATTTCTTGGAAAAC | 95437347 | 95437382 | GCCCTCCTGAAGGATTTCTTGGAAAACAGTAATCA | 12
32955 | Ctsc | STAT5B | Intron | 7 | 1 | 1 | 95440719 | 95440733 | TCTTTCTGAGAAAGG | 95440702 | 95440737 | GATTAAGGAAGTTGGCTTCTTTCTGAGAAAGGTAT | 17
32955 | Ctsc | STAT5B | Intron | 7 | 1 | 1 | 95440719 | 95440733 | TCTTTCTGAGAAAGG | 95440710 | 95440745 | AAGTTGGCTTCTTTCTGAGAAAGGTATGGTAGTCT | 9
29317 | Ctsc | STAT5B | Intron | 7 | 1 | 1 | 95456101 | 95456115 | GGCTTCCAAGAATAT | 95456097 | 95456132 | ACCTGGCTTCCAAGAATATGCTTTCAATGAATTCC | 4
27387 | Dcdc2a | STAT5A | Intron | 13 | -1 | 1 | 25228044 | 25228058 | CAGTTCTTGGAAGAG | 25228044 | 25228079 | ATCTGTGTAGGAAGGAAAATCAGTTCTTGGAAGAG | 20
27387 | Dcdc2a | STAT5A | Intron | 13 | -1 | 1 | 25228044 | 25228058 | CAGTTCTTGGAAGAG | 25228031 | 25228066 | GGAAAATCAGTTCTTGGAAGAGGTCACTGAAGCAT | 7
33094 | Dcdc2a | STAT5A | Intron | 13 | -1 | 1 | 25259372 | 25259386 | CATTTCCCTGAATTT | 25259354 | 25259389 | CTCATTTCCCTGAATTTGCAAGCTTGTAAAAATAA | 2
27653 | Dcdc2a | STAT5B | Intron | 13 | -1 | 1 | 25228044 | 25228058 | CAGTTCTTGGAAGAG | 25228044 | 25228079 | ATCTGTGTAGGAAGGAAAATCAGTTCTTGGAAGAG | 20
27653 | Dcdc2a | STAT5B | Intron | 13 | -1 | 1 | 25228044 | 25228058 | CAGTTCTTGGAAGAG | 25228031 | 25228066 | GGAAAATCAGTTCTTGGAAGAGGTCACTGAAGCAT | 7
31671 | Dcdc2a | STAT5B | Intron | 13 | -1 | 1 | 25259372 | 25259386 | CATTTCCCTGAATTT | 25259354 | 25259389 | CTCATTTCCCTGAATTTGCAAGCTTGTAAAAATAA | 2
27968 | Ddx60 | STAT5A | Exon | 8 | -1 | 1 | 64427693 | 64427707 | CATTTCCAAGAAGAG | 64427680 | 64427715 | AGTGATACATTTCCAAGAAGAGAGAACATGAGGTG | 7
27466 | Ddx60 | STAT5B | Exon | 8 | -1 | 1 | 64427693 | 64427707 | CATTTCCAAGAAGAG | 64427680 | 64427715 | AGTGATACATTTCCAAGAAGAGAGAACATGAGGTG | 7
25282 | Emr1 | STAT5A | Intron | 17 | -1 | 1 | 57601041 | 57601055 | TATTTCTCAGAATTT | 57601026 | 57601061 | CTGCTTATTTCTCAGAATTTCTCATCTCTCTTCTA | 5
27280 | Emr1 | STAT5A | Intron | 17 | 1 | 1 | 57597171 | 57597185 | TCATTCTTAGAATTG | 57597157 | 57597192 | CTTCTCTATTCCAATCATTCTTAGAATTGGTCTTT | 14
23576 | Emr1 | STAT5B | Intron | 17 | -1 | 1 | 57601041 | 57601055 | TATTTCTCAGAATTT | 57601026 | 57601061 | CTGCTTATTTCTCAGAATTTCTCATCTCTCTTCTA | 5
26568 | Emr1 | STAT5B | Intron | 17 | 1 | 1 | 57597171 | 57597185 | TCATTCTTAGAATTG | 57597157 | 57597192 | CTTCTCTATTCCAATCATTCTTAGAATTGGTCTTT | 14
33827 | Epsti1 | STAT5A | Intron | 14 | -1 | 1 | 78323126 | 78323140 | CATTTCCTGTAACTT | 78323125 | 78323160 | GATGAATAAAAGCTGGTCCCATTTCCTGTAACTTC | 19
31852 | Fga | STAT5A | Promoter | 3 | -1 | 1 | 82829838 | 82829852 | ACATTCCAGGAAATG | 82829836 | 82829871 | GGGGGGTGGGGGGAATCTACATTCCAGGAAATGGG | 18
31852 | Fga | STAT5A | Promoter | 3 | -1 | 1 | 82829838 | 82829852 | ACATTCCAGGAAATG | 82829832 | 82829867 | GGTGGGGGGAATCTACATTCCAGGAAATGGGGGCC | 14
31852 | Fga | STAT5A | Promoter | 3 | -1 | 1 | 82829838 | 82829852 | ACATTCCAGGAAATG | 82829828 | 82829863 | GGGGGAATCTACATTCCAGGAAATGGGGGCCTTGG | 10
31852 | Fga | STAT5A | Promoter | 3 | -1 | 1 | 82829838 | 82829852 | ACATTCCAGGAAATG | 82829818 | 82829853 | ACATTCCAGGAAATGGGGGCCTTGGAAGCTGGGGA | 0
30980 | Fga | STAT5A | Promoter | 3 | 1 | 1 | 82829838 | 82829852 | CATTTCCTGGAATGT | 82829820 | 82829855 | CCCAGCTTCCAAGGCCCCCATTTCCTGGAATGTAG | 18
30980 | Fga | STAT5A | Promoter | 3 | 1 | 1 | 82829838 | 82829852 | CATTTCCTGGAATGT | 82829824 | 82829859 | GCTTCCAAGGCCCCCATTTCCTGGAATGTAGATTC | 14
30980 | Fga | STAT5A | Promoter | 3 | 1 | 1 | 82829838 | 82829852 | CATTTCCTGGAATGT | 82829828 | 82829863 | CCAAGGCCCCCATTTCCTGGAATGTAGATTCCCCC | 10

Table 6: continued

Bin- Gene Transcrip- Se- Chro- Str. Str. Potential Potential Potential STAT5 binding ChIP-seq ChIP-seq ChIP-seq region Position of ding tion factor quence mo- (input binding binding site site region start region end potential site ID type some seq.) site start end (mm9) STAT5 (mm9) BS* 30980 Fga STAT5A Promoter 3 1 1 82829838 82829852 CATTTCCTGGAATGT 82829829 82829864 CAAGGCCCCCATTTCCTGGAATGTAGATTCCCCCC 9 30980 Fga STAT5A Promoter 3 1 1 82829838 82829852 CATTTCCTGGAATGT 82829834 82829869 CCCCCATTTCCTGGAATGTAGATTCCCCCCACCCC 4 30980 Fga STAT5A Promoter 3 1 1 82829838 82829852 CATTTCCTGGAATGT 82829837 82829872 CCATTTCCTGGAATGTAGATTCCCCCCACCCCCCA 1 24346 Fga STAT5B Promoter 3 1 1 82829838 82829852 CATTTCCTGGAATGT 82829820 82829855 CCCAGCTTCCAAGGCCCCCATTTCCTGGAATGTAG 18 24346 Fga STAT5B Promoter 3 1 1 82829838 82829852 CATTTCCTGGAATGT 82829824 82829859 GCTTCCAAGGCCCCCATTTCCTGGAATGTAGATTC 14 24346 Fga STAT5B Promoter 3 1 1 82829838 82829852 CATTTCCTGGAATGT 82829828 82829863 CCAAGGCCCCCATTTCCTGGAATGTAGATTCCCCC 10 24346 Fga STAT5B Promoter 3 1 1 82829838 82829852 CATTTCCTGGAATGT 82829829 82829864 CAAGGCCCCCATTTCCTGGAATGTAGATTCCCCCC 9 24346 Fga STAT5B Promoter 3 1 1 82829838 82829852 CATTTCCTGGAATGT 82829834 82829869 CCCCCATTTCCTGGAATGTAGATTCCCCCCACCCC 4 24346 Fga STAT5B Promoter 3 1 1 82829838 82829852 CATTTCCTGGAATGT 82829837 82829872 CCATTTCCTGGAATGTAGATTCCCCCCACCCCCCA 1 33398 Fyb STAT5A Intron 15 1 1 6594164 6594178 TAATTCTGGGACCCC 6594154 6594189 CTCATACCTATAATTCTGGGACCCCAGAGGCTTAG 10 31196 Galnt12 STAT5A Intron 4 -1 1 47116155 47116169 CAATGCTTGGAAATT 47116153 47116188 GTGGCGCTCACTATGGGGCAATGCTTGGAAATTGT 18 33263 Galnt12 STAT5B Intron 4 -1 1 47116155 47116169 CAATGCTTGGAAATT 47116153 47116188 GTGGCGCTCACTATGGGGCAATGCTTGGAAATTGT 18 33112 Gbp5 STAT5A Intron 3 1 1 142182240 142182254 GAGCTCCAGGAAATG 142182223 142182258 TAGACAACAAAGCCTATGAGCTCCAGGAAATGCCA 17 26201 Gbp5 STAT5A Promoter 3 -1 1 142159843 142159857 TGGTTCTGAGAAATA 142159825 142159860 
CTTGGTTCTGAGAAATAACGGCCAGCCCTGGGAAT 2 28402 Gbp5 STAT5B Promoter 3 -1 1 142159843 142159857 TGGTTCTGAGAAATA 142159825 142159860 CTTGGTTCTGAGAAATAACGGCCAGCCCTGGGAAT 2 24895 Gmfg STAT5A Exon 7 -1 1 29231694 29231708 GGTTTCTGAGAAACA 29231683 29231718 CCCTCCCTTGGTTTCTGAGAAACAGGTGTCTGTTC 9 24895 Gmfg STAT5A Exon 7 -1 1 29231694 29231708 GGTTTCTGAGAAACA 29231680 29231715 TCCCTTGGTTTCTGAGAAACAGGTGTCTGTTCTCC 6 23721 Gmfg STAT5A Intron 7 1 1 29222668 29222682 GAATTCTGAGAAATG 29222650 29222685 GGAGACAAGGCTCAGACGGAATTCTGAGAAATGAG 18 25739 Gmfg STAT5A Promoter 7 -1 1 29222071 29222085 GAGTTCTCGGAACAA 29222066 29222101 TGACTGTATTTCTTGGAGTTCTCGGAACAAGCCAC 15 25739 Gmfg STAT5A Promoter 7 -1 1 29222071 29222085 GAGTTCTCGGAACAA 29222062 29222097 TGTATTTCTTGGAGTTCTCGGAACAAGCCACGTTG 11 25739 Gmfg STAT5A Promoter 7 -1 1 29222071 29222085 GAGTTCTCGGAACAA 29222059 29222094 ATTTCTTGGAGTTCTCGGAACAAGCCACGTTGCTG 8 33114 Gmfg STAT5A Promoter 7 -1 1 29222080 29222094 TATTTCTTGGAGTTC 29222066 29222101 TGACTGTATTTCTTGGAGTTCTCGGAACAAGCCAC 6 33114 Gmfg STAT5A Promoter 7 -1 1 29222080 29222094 TATTTCTTGGAGTTC 29222062 29222097 TGTATTTCTTGGAGTTCTCGGAACAAGCCACGTTG 2 34159 Gmfg STAT5A Promoter 7 1 1 29222080 29222094 GAACTCCAAGAAATA 29222078 29222113 GAGAACTCCAAGAAATACAGTCAGGGTGGAGTGTC 2 25783 Gmfg STAT5B Exon 7 -1 1 29231694 29231708 GGTTTCTGAGAAACA 29231683 29231718 CCCTCCCTTGGTTTCTGAGAAACAGGTGTCTGTTC 9 25783 Gmfg STAT5B Exon 7 -1 1 29231694 29231708 GGTTTCTGAGAAACA 29231680 29231715 TCCCTTGGTTTCTGAGAAACAGGTGTCTGTTCTCC 6 24321 Gmfg STAT5B Intron 7 1 1 29222668 29222682 GAATTCTGAGAAATG 29222650 29222685 GGAGACAAGGCTCAGACGGAATTCTGAGAAATGAG 18 26691 Gmfg STAT5B Promoter 7 -1 1 29222071 29222085 GAGTTCTCGGAACAA 29222066 29222101 TGACTGTATTTCTTGGAGTTCTCGGAACAAGCCAC 15 26691 Gmfg STAT5B Promoter 7 -1 1 29222071 29222085 GAGTTCTCGGAACAA 29222062 29222097 TGTATTTCTTGGAGTTCTCGGAACAAGCCACGTTG 11 26691 Gmfg STAT5B Promoter 7 -1 1 29222071 29222085 GAGTTCTCGGAACAA 29222059 29222094 
ATTTCTTGGAGTTCTCGGAACAAGCCACGTTGCTG 8 31266 Gmfg STAT5B Promoter 7 -1 1 29222080 29222094 TATTTCTTGGAGTTC 29222066 29222101 TGACTGTATTTCTTGGAGTTCTCGGAACAAGCCAC 6 31266 Gmfg STAT5B Promoter 7 -1 1 29222080 29222094 TATTTCTTGGAGTTC 29222062 29222097 TGTATTTCTTGGAGTTCTCGGAACAAGCCACGTTG 2 27928 Havcr2 STAT5B Intron 11 1 1 46270830 46270844 CGTTTCCTCGAAGCT 46270826 46270861 GTCCCGTTTCCTCGAAGCTATCTATTACCTCTGGC 4 33116 Hmha1 STAT5A Intron 10 -1 1 79486056 79486070 GAGCTCTCGGAAATC 79486036 79486071 GAGCTCTCGGAAATCAATAACAATCCCCTGTGTCA 0 32735 Icam1 STAT5A Promoter 9 -1 1 20820339 20820353 ACTTTCCGGGAAACC 20820330 20820365 CTGTCGGGGCCACTTTCCGGGAAACCTCGCGCCTT 11 32735 Icam1 STAT5A Promoter 9 -1 1 20820339 20820353 ACTTTCCGGGAAACC 20820326 20820361 CGGGGCCACTTTCCGGGAAACCTCGCGCCTTCCCC 7 32735 Icam1 STAT5A Promoter 9 -1 1 20820339 20820353 ACTTTCCGGGAAACC 20820319 20820354 ACTTTCCGGGAAACCTCGCGCCTTCCCCTCCGGAA 0 30781 Icam1 STAT5A Promoter 9 1 1 20820339 20820353 GGTTTCCCGGAAAGT 20820329 20820364 GAAGGCGCGAGGTTTCCCGGAAAGTGGCCCCGACA 10 24763 Icam1 STAT5B Promoter 9 1 1 20820339 20820353 GGTTTCCCGGAAAGT 20820329 20820364 GAAGGCGCGAGGTTTCCCGGAAAGTGGCCCCGACA 10 30986 Icos STAT5A Exon 1 1 1 61050604 61050618 GATTTCTTGTAAATA 61050600 61050635 TACAGATTTCTTGTAAATACCCTGAGACTGTCCAG 4 31753 Icos STAT5B Exon 1 1 1 61050604 61050618 GATTTCTTGTAAATA 61050600 61050635 TACAGATTTCTTGTAAATACCCTGAGACTGTCCAG 4 32081 Ifi35 STAT5A 5' UTR 11 -1 1 101309730 101309744 CATTTCTGTGAATTC 101309727 101309762 GGCCTCACTTCCACTTTCATTTCTGTGAATTCTGC 17 122

Table 6: continued

Bin- Gene Transcrip- Se- Chro- Str. Str. Potential Potential Potential STAT5 binding ChIP-seq ChIP-seq ChIP-seq region Position of ding tion factor quence mo- (input binding binding site site region start region end potential site ID type some seq.) site start end (mm9) STAT5 (mm9) BS* 32082 Ifi35 STAT5A 5' UTR 11 1 1 101309730 101309744 GAATTCACAGAAATG 101309712 101309747 GCTGTATGCTCAGAAGCAGAATTCACAGAAATGAA 18 32082 Ifi35 STAT5A 5' UTR 11 1 1 101309730 101309744 GAATTCACAGAAATG 101309713 101309748 CTGTATGCTCAGAAGCAGAATTCACAGAAATGAAA 17 32082 Ifi35 STAT5A 5' UTR 11 1 1 101309730 101309744 GAATTCACAGAAATG 101309728 101309763 CAGAATTCACAGAAATGAAAGTGGAAGTGAGGCCG 2 32079 Ifi35 STAT5A Exon 11 -1 1 101309730 101309744 CATTTCTGTGAATTC 101309727 101309762 GGCCTCACTTCCACTTTCATTTCTGTGAATTCTGC 17 32080 Ifi35 STAT5A Exon 11 1 1 101309730 101309744 GAATTCACAGAAATG 101309712 101309747 GCTGTATGCTCAGAAGCAGAATTCACAGAAATGAA 18 32080 Ifi35 STAT5A Exon 11 1 1 101309730 101309744 GAATTCACAGAAATG 101309713 101309748 CTGTATGCTCAGAAGCAGAATTCACAGAAATGAAA 17 32080 Ifi35 STAT5A Exon 11 1 1 101309730 101309744 GAATTCACAGAAATG 101309728 101309763 CAGAATTCACAGAAATGAAAGTGGAAGTGAGGCCG 2 32085 Ifi35 STAT5A Promoter 11 -1 1 101309730 101309744 CATTTCTGTGAATTC 101309727 101309762 GGCCTCACTTCCACTTTCATTTCTGTGAATTCTGC 17 32086 Ifi35 STAT5A Promoter 11 1 1 101309730 101309744 GAATTCACAGAAATG 101309712 101309747 GCTGTATGCTCAGAAGCAGAATTCACAGAAATGAA 18 32086 Ifi35 STAT5A Promoter 11 1 1 101309730 101309744 GAATTCACAGAAATG 101309713 101309748 CTGTATGCTCAGAAGCAGAATTCACAGAAATGAAA 17 32086 Ifi35 STAT5A Promoter 11 1 1 101309730 101309744 GAATTCACAGAAATG 101309728 101309763 CAGAATTCACAGAAATGAAAGTGGAAGTGAGGCCG 2 23381 Ifit3 STAT5A Exon 19 -1 1 34662702 34662716 CATTTCTGGGAAATT 34662694 34662729 CGATTGACCTTTCATTTCTGGGAAATTGCAACAAA 12 23459 Ifit3 STAT5B Exon 19 -1 1 34662702 34662716 CATTTCTGGGAAATT 34662694 34662729 CGATTGACCTTTCATTTCTGGGAAATTGCAACAAA 12 29530 Ifngr1 STAT5B Intron 10 1 1 19314204 19314218 
GAATTCTCAGAAGGA 19314199 19314234 TGGAGGAATTCTCAGAAGGAGAAGCACACTGTTCC 5 29530 Ifngr1 STAT5B Intron 10 1 1 19314204 19314218 GAATTCTCAGAAGGA 19314200 19314235 GGAGGAATTCTCAGAAGGAGAAGCACACTGTTCCT 4 24004 Igdcc3 STAT5A Intron 9 1 1 64994320 64994334 TATTTCTTGGAAGCC 64994300 64994335 GTGTGTAGTCCCAGGCTTTTTATTTCTTGGAAGCC 20 23700 Igdcc3 STAT5B Intron 9 1 1 64994320 64994334 TATTTCTTGGAAGCC 64994300 64994335 GTGTGTAGTCCCAGGCTTTTTATTTCTTGGAAGCC 20 32739 Ikzf1 STAT5A Intron 11 -1 1 11594225 11594239 TAATCCCAGGAACCC 11594208 11594243 GTTTAATCCCAGGAACCCTCAGAGAAAGCCCAGTA 3 32739 Ikzf1 STAT5A Intron 11 -1 1 11594225 11594239 TAATCCCAGGAACCC 11594208 11594243 GTTTAATCCCAGGAACCCTCAGAGAAAGCCCCGTA 3 25446 Ikzf1 STAT5A Intron 11 -1 1 11665200 11665214 TAATTCCAGGAAGAG 11665190 11665225 AAAAGAAACTTAATTCCAGGAAGAGGAGCAATGAA 10 25224 Ikzf1 STAT5B Intron 11 -1 1 11665200 11665214 TAATTCCAGGAAGAG 11665190 11665225 AAAAGAAACTTAATTCCAGGAAGAGGAGCAATGAA 10 28067 Il17rc STAT5A Intron 6 -1 1 113430551 113430565 GTATTCCCAGAAGCA 113430537 113430572 CGCACTGTATTCCCAGAAGCAGTACTGGCCCTGGC 6 26574 Il17rc STAT5B Intron 6 -1 1 113430551 113430565 GTATTCCCAGAAGCA 113430537 113430572 CGCACTGTATTCCCAGAAGCAGTACTGGCCCTGGC 6 28356 Inpp5d STAT5A Intron 1 -1 1 89564177 89564191 TGGTTCTGAGAAGCA 89564176 89564211 AGTACCTCCTAGGGTCAGTTGGTTCTGAGAAGCAC 19 31681 Inpp5d STAT5B Intron 1 -1 1 89564177 89564191 TGGTTCTGAGAAGCA 89564176 89564211 AGTACCTCCTAGGGTCAGTTGGTTCTGAGAAGCAC 19 23887 Irf1 STAT5A 3' UTR 11 1 1 53591278 53591292 GCTTTCCTGGAACTC 53591267 53591302 GTGTATCCGTGGCTTTCCTGGAACTCACTGTGTAG 11 23886 Irf1 STAT5A Exon 11 1 1 53591278 53591292 GCTTTCCTGGAACTC 53591267 53591302 GTGTATCCGTGGCTTTCCTGGAACTCACTGTGTAG 11 23484 Irf1 STAT5A Promoter 11 -1 1 53583423 53583437 CATTTCGGGGAAATC 53583411 53583446 GGCCTCATCATTTCGGGGAAATCAGGCTGTTGTAG 8 23959 Irf1 STAT5B 3' UTR 11 1 1 53591278 53591292 GCTTTCCTGGAACTC 53591267 53591302 GTGTATCCGTGGCTTTCCTGGAACTCACTGTGTAG 11 23958 Irf1 STAT5B Exon 11 1 1 53591278 53591292 
GCTTTCCTGGAACTC 53591267 53591302 GTGTATCCGTGGCTTTCCTGGAACTCACTGTGTAG 11 26040 Irf1 STAT5B Promoter 11 -1 1 53583423 53583437 CATTTCGGGGAAATC 53583411 53583446 GGCCTCATCATTTCGGGGAAATCAGGCTGTTGTAG 8 25660 Irf8 STAT5A Promoter 8 -1 1 123260084 123260098 TCTTTCCGAGAAATC 123260075 123260110 CTGAAGCGCTCTCTTTCCGAGAAATCACTTTTGCA 11 29991 Irf8 STAT5A Promoter 8 1 1 123260084 123260098 GATTTCTCGGAAAGA 123260064 123260099 GGTCGGGGACGTGCAAAAGTGATTTCTCGGAAAGA 20 29991 Irf8 STAT5A Promoter 8 1 1 123260084 123260098 GATTTCTCGGAAAGA 123260068 123260103 GGGGACGTGCAAAAGTGATTTCTCGGAAAGAGAGC 16 29991 Irf8 STAT5A Promoter 8 1 1 123260084 123260098 GATTTCTCGGAAAGA 123260081 123260116 AGTGATTTCTCGGAAAGAGAGCGCTTCAGAGAAGG 3 25788 Irf8 STAT5B Promoter 8 -1 1 123260084 123260098 TCTTTCCGAGAAATC 123260075 123260110 CTGAAGCGCTCTCTTTCCGAGAAATCACTTTTGCA 11 25135 Irf8 STAT5B Promoter 8 1 1 123260084 123260098 GATTTCTCGGAAAGA 123260064 123260099 GGTCGGGGACGTGCAAAAGTGATTTCTCGGAAAGA 20 25135 Irf8 STAT5B Promoter 8 1 1 123260084 123260098 GATTTCTCGGAAAGA 123260068 123260103 GGGGACGTGCAAAAGTGATTTCTCGGAAAGAGAGC 16 25135 Irf8 STAT5B Promoter 8 1 1 123260084 123260098 GATTTCTCGGAAAGA 123260081 123260116 AGTGATTTCTCGGAAAGAGAGCGCTTCAGAGAAGG 3 24116 Irf9 STAT5A 5' UTR 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222956 56222991 TTCTCCTCCGGGGAGGTGCATTTCCTAGAAACCAC 18 24116 Irf9 STAT5A 5' UTR 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222953 56222988 TCCTCCGGGGAGGTGCATTTCCTAGAAACCACGTG 15 24116 Irf9 STAT5A 5' UTR 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222952 56222987 CCTCCGGGGAGGTGCATTTCCTAGAAACCACGTGG 14 123

Table 6: continued

Bin- Gene Transcrip- Se- Chro- Str. Str. Potential Potential Potential STAT5 binding ChIP-seq ChIP-seq ChIP-seq region Position of ding tion factor quence mo- (input binding binding site site region start region end potential site ID type some seq.) site start end (mm9) STAT5 (mm9) BS* 24116 Irf9 STAT5A 5' UTR 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222952 56222987 CCTCCGTGGAGGTGCATTTCCTAGAAACCACGTGG 14 24116 Irf9 STAT5A 5' UTR 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222952 56222987 CCTCCGGGGAGGTGCATTTCCTAGAAACCACGTGG 14 24116 Irf9 STAT5A 5' UTR 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222952 56222987 CCTCCGTGGAGGTGCATTTCCTAGAAACCACGTGG 14 24116 Irf9 STAT5A 5' UTR 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222951 56222986 CTCCGGGGAGGTGCATTTCCTAGAAACCACGTGGT 13 24116 Irf9 STAT5A 5' UTR 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222950 56222985 TCCGGGGAGGTGCATTTCCTAGAAACCACGTGGTC 12 24116 Irf9 STAT5A 5' UTR 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222949 56222984 CCGGGGAGGTGCATTTCCTAGAAACCACGTGGTCT 11 24116 Irf9 STAT5A 5' UTR 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222948 56222983 CGGGGAGGTGCATTTCCTAGAAACCACGTGGTCTG 10 24116 Irf9 STAT5A 5' UTR 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222946 56222981 GGGAGGTGCATTTCCTAGAAACCACGTGGTCTGAG 8 24116 Irf9 STAT5A 5' UTR 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222939 56222974 GCATTTCCTAGAAACCACGTGGTCTGAGTTGCAGG 1 23503 Irf9 STAT5A 5' UTR 14 1 1 56222958 56222972 GGTTTCTAGGAAATG 56222954 56222989 ACGTGGTTTCTAGGAAATGCACCTCCCCGGAGGAG 4 24115 Irf9 STAT5A Exon 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222956 56222991 TTCTCCTCCGGGGAGGTGCATTTCCTAGAAACCAC 18 24115 Irf9 STAT5A Exon 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222953 56222988 TCCTCCGGGGAGGTGCATTTCCTAGAAACCACGTG 15 24115 Irf9 STAT5A Exon 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222952 56222987 CCTCCGGGGAGGTGCATTTCCTAGAAACCACGTGG 14 24115 Irf9 STAT5A Exon 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222952 56222987 
CCTCCGTGGAGGTGCATTTCCTAGAAACCACGTGG 14 24115 Irf9 STAT5A Exon 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222952 56222987 CCTCCGGGGAGGTGCATTTCCTAGAAACCACGTGG 14 24115 Irf9 STAT5A Exon 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222952 56222987 CCTCCGTGGAGGTGCATTTCCTAGAAACCACGTGG 14 24115 Irf9 STAT5A Exon 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222951 56222986 CTCCGGGGAGGTGCATTTCCTAGAAACCACGTGGT 13 24115 Irf9 STAT5A Exon 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222950 56222985 TCCGGGGAGGTGCATTTCCTAGAAACCACGTGGTC 12 24115 Irf9 STAT5A Exon 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222949 56222984 CCGGGGAGGTGCATTTCCTAGAAACCACGTGGTCT 11 24115 Irf9 STAT5A Exon 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222948 56222983 CGGGGAGGTGCATTTCCTAGAAACCACGTGGTCTG 10 24115 Irf9 STAT5A Exon 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222946 56222981 GGGAGGTGCATTTCCTAGAAACCACGTGGTCTGAG 8 24115 Irf9 STAT5A Exon 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222939 56222974 GCATTTCCTAGAAACCACGTGGTCTGAGTTGCAGG 1 23502 Irf9 STAT5A Exon 14 1 1 56222958 56222972 GGTTTCTAGGAAATG 56222954 56222989 ACGTGGTTTCTAGGAAATGCACCTCCCCGGAGGAG 4 24118 Irf9 STAT5A Promoter 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222956 56222991 TTCTCCTCCGGGGAGGTGCATTTCCTAGAAACCAC 18 24118 Irf9 STAT5A Promoter 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222953 56222988 TCCTCCGGGGAGGTGCATTTCCTAGAAACCACGTG 15 24118 Irf9 STAT5A Promoter 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222952 56222987 CCTCCGGGGAGGTGCATTTCCTAGAAACCACGTGG 14 24118 Irf9 STAT5A Promoter 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222952 56222987 CCTCCGTGGAGGTGCATTTCCTAGAAACCACGTGG 14 24118 Irf9 STAT5A Promoter 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222952 56222987 CCTCCGGGGAGGTGCATTTCCTAGAAACCACGTGG 14 24118 Irf9 STAT5A Promoter 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222952 56222987 CCTCCGTGGAGGTGCATTTCCTAGAAACCACGTGG 14 24118 Irf9 STAT5A Promoter 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222951 56222986 
CTCCGGGGAGGTGCATTTCCTAGAAACCACGTGGT 13 24118 Irf9 STAT5A Promoter 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222950 56222985 TCCGGGGAGGTGCATTTCCTAGAAACCACGTGGTC 12 24118 Irf9 STAT5A Promoter 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222949 56222984 CCGGGGAGGTGCATTTCCTAGAAACCACGTGGTCT 11 24118 Irf9 STAT5A Promoter 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222948 56222983 CGGGGAGGTGCATTTCCTAGAAACCACGTGGTCTG 10 24118 Irf9 STAT5A Promoter 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222946 56222981 GGGAGGTGCATTTCCTAGAAACCACGTGGTCTGAG 8 24118 Irf9 STAT5A Promoter 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222939 56222974 GCATTTCCTAGAAACCACGTGGTCTGAGTTGCAGG 1 23505 Irf9 STAT5A Promoter 14 1 1 56222958 56222972 GGTTTCTAGGAAATG 56222954 56222989 ACGTGGTTTCTAGGAAATGCACCTCCCCGGAGGAG 4 23864 Irf9 STAT5B 5' UTR 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222956 56222991 TTCTCCTCCGGGGAGGTGCATTTCCTAGAAACCAC 18 23864 Irf9 STAT5B 5' UTR 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222953 56222988 TCCTCCGGGGAGGTGCATTTCCTAGAAACCACGTG 15 23864 Irf9 STAT5B 5' UTR 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222952 56222987 CCTCCGGGGAGGTGCATTTCCTAGAAACCACGTGG 14 23864 Irf9 STAT5B 5' UTR 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222952 56222987 CCTCCGTGGAGGTGCATTTCCTAGAAACCACGTGG 14 23864 Irf9 STAT5B 5' UTR 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222952 56222987 CCTCCGGGGAGGTGCATTTCCTAGAAACCACGTGG 14 23864 Irf9 STAT5B 5' UTR 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222952 56222987 CCTCCGTGGAGGTGCATTTCCTAGAAACCACGTGG 14 124

Table 6: continued

Bin- Gene Transcrip- Se- Chro- Str. Str. Potential Potential Potential STAT5 binding ChIP-seq ChIP-seq ChIP-seq region Position of ding tion factor quence mo- (input binding binding site site region start region end potential site ID type some seq.) site start end (mm9) STAT5 (mm9) BS* 23864 Irf9 STAT5B 5' UTR 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222951 56222986 CTCCGGGGAGGTGCATTTCCTAGAAACCACGTGGT 13 23864 Irf9 STAT5B 5' UTR 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222950 56222985 TCCGGGGAGGTGCATTTCCTAGAAACCACGTGGTC 12 23864 Irf9 STAT5B 5' UTR 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222949 56222984 CCGGGGAGGTGCATTTCCTAGAAACCACGTGGTCT 11 23864 Irf9 STAT5B 5' UTR 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222948 56222983 CGGGGAGGTGCATTTCCTAGAAACCACGTGGTCTG 10 23864 Irf9 STAT5B 5' UTR 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222946 56222981 GGGAGGTGCATTTCCTAGAAACCACGTGGTCTGAG 8 23864 Irf9 STAT5B 5' UTR 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222939 56222974 GCATTTCCTAGAAACCACGTGGTCTGAGTTGCAGG 1 23787 Irf9 STAT5B 5' UTR 14 1 1 56222958 56222972 GGTTTCTAGGAAATG 56222954 56222989 ACGTGGTTTCTAGGAAATGCACCTCCCCGGAGGAG 4 23863 Irf9 STAT5B Exon 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222956 56222991 TTCTCCTCCGGGGAGGTGCATTTCCTAGAAACCAC 18 23863 Irf9 STAT5B Exon 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222953 56222988 TCCTCCGGGGAGGTGCATTTCCTAGAAACCACGTG 15 23863 Irf9 STAT5B Exon 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222952 56222987 CCTCCGGGGAGGTGCATTTCCTAGAAACCACGTGG 14 23863 Irf9 STAT5B Exon 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222952 56222987 CCTCCGTGGAGGTGCATTTCCTAGAAACCACGTGG 14 23863 Irf9 STAT5B Exon 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222952 56222987 CCTCCGGGGAGGTGCATTTCCTAGAAACCACGTGG 14 23863 Irf9 STAT5B Exon 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222952 56222987 CCTCCGTGGAGGTGCATTTCCTAGAAACCACGTGG 14 23863 Irf9 STAT5B Exon 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222951 56222986 
CTCCGGGGAGGTGCATTTCCTAGAAACCACGTGGT 13 23863 Irf9 STAT5B Exon 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222950 56222985 TCCGGGGAGGTGCATTTCCTAGAAACCACGTGGTC 12 23863 Irf9 STAT5B Exon 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222949 56222984 CCGGGGAGGTGCATTTCCTAGAAACCACGTGGTCT 11 23863 Irf9 STAT5B Exon 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222948 56222983 CGGGGAGGTGCATTTCCTAGAAACCACGTGGTCTG 10 23863 Irf9 STAT5B Exon 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222946 56222981 GGGAGGTGCATTTCCTAGAAACCACGTGGTCTGAG 8 23863 Irf9 STAT5B Exon 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222939 56222974 GCATTTCCTAGAAACCACGTGGTCTGAGTTGCAGG 1 23786 Irf9 STAT5B Exon 14 1 1 56222958 56222972 GGTTTCTAGGAAATG 56222954 56222989 ACGTGGTTTCTAGGAAATGCACCTCCCCGGAGGAG 4 23866 Irf9 STAT5B Promoter 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222956 56222991 TTCTCCTCCGGGGAGGTGCATTTCCTAGAAACCAC 18 23866 Irf9 STAT5B Promoter 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222953 56222988 TCCTCCGGGGAGGTGCATTTCCTAGAAACCACGTG 15 23866 Irf9 STAT5B Promoter 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222952 56222987 CCTCCGGGGAGGTGCATTTCCTAGAAACCACGTGG 14 23866 Irf9 STAT5B Promoter 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222952 56222987 CCTCCGTGGAGGTGCATTTCCTAGAAACCACGTGG 14 23866 Irf9 STAT5B Promoter 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222952 56222987 CCTCCGGGGAGGTGCATTTCCTAGAAACCACGTGG 14 23866 Irf9 STAT5B Promoter 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222952 56222987 CCTCCGTGGAGGTGCATTTCCTAGAAACCACGTGG 14 23866 Irf9 STAT5B Promoter 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222951 56222986 CTCCGGGGAGGTGCATTTCCTAGAAACCACGTGGT 13 23866 Irf9 STAT5B Promoter 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222950 56222985 TCCGGGGAGGTGCATTTCCTAGAAACCACGTGGTC 12 23866 Irf9 STAT5B Promoter 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222949 56222984 CCGGGGAGGTGCATTTCCTAGAAACCACGTGGTCT 11 23866 Irf9 STAT5B Promoter 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222948 56222983 
CGGGGAGGTGCATTTCCTAGAAACCACGTGGTCTG 10 23866 Irf9 STAT5B Promoter 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222946 56222981 GGGAGGTGCATTTCCTAGAAACCACGTGGTCTGAG 8 23866 Irf9 STAT5B Promoter 14 -1 1 56222958 56222972 CATTTCCTAGAAACC 56222939 56222974 GCATTTCCTAGAAACCACGTGGTCTGAGTTGCAGG 1 23789 Irf9 STAT5B Promoter 14 1 1 56222958 56222972 GGTTTCTAGGAAATG 56222954 56222989 ACGTGGTTTCTAGGAAATGCACCTCCCCGGAGGAG 4 28253 Irg1 STAT5A Promoter 14 1 1 103445772 103445786 TCATTCCCAGAAGCT 103445772 103445807 TCATTCCCAGAAGCTGTTTCATGGTTGGTTGGTGA 0 28132 Irg1 STAT5B Promoter 14 1 1 103445772 103445786 TCATTCCCAGAAGCT 103445772 103445807 TCATTCCCAGAAGCTGTTTCATGGTTGGTTGGTGA 0 26476 Itga4 STAT5A Intron 2 -1 1 79109491 79109505 TAGTTCCAGGAATAG 79109491 79109526 TATGTGTTACTAGGCAAAATTAGTTCCAGGAATAG 20 31389 Itga4 STAT5A Intron 2 -1 1 79154314 79154328 GAATTCTGGGTACTT 79154307 79154342 AAGTGTGGTGCAGGAATTCTGGGTACTTAGCTTAG 13 31867 Itga4 STAT5A Intron 2 1 1 79116512 79116526 GATTTCCTGGAGGTC 79116503 79116538 GAGAGTAATGATTTCCTGGAGGTCACAACAAAAGA 9 25789 Itga4 STAT5B Intron 2 -1 1 79109491 79109505 TAGTTCCAGGAATAG 79109491 79109526 TATGTGTTACTAGGCAAAATTAGTTCCAGGAATAG 20 31272 Itga4 STAT5B Intron 2 1 1 79116512 79116526 GATTTCCTGGAGGTC 79116503 79116538 GAGAGTAATGATTTCCTGGAGGTCACAACAAAAGA 9 31202 Itgal STAT5A Exon 7 -1 1 134444712 134444726 GAATTCCAGGATTTT 134444699 134444734 CCTTCATGAATTCCAGGATTTTTTCAAAGTCCTTT 7 33408 Itgal STAT5A Intron 7 -1 1 134465016 134465030 CATTTCTGGGCACTT 134465004 134465039 CCTACCCTCATTTCTGGGCACTTGGTAACACATCA 8 125

Table 6: continued

Bin- Gene Transcrip- Se- Chro- Str. Str. Potential Potential Potential STAT5 binding ChIP-seq ChIP-seq ChIP-seq region Position of ding tion factor quence mo- (input binding binding site site region start region end potential site ID type some seq.) site start end (mm9) STAT5 (mm9) BS* 31760 Itgal STAT5B Exon 7 -1 1 134444712 134444726 GAATTCCAGGATTTT 134444699 134444734 CCTTCATGAATTCCAGGATTTTTTCAAAGTCCTTT 7 32501 Itgb2 STAT5A Intron 10 -1 1 77025066 77025080 GACTTCAAGGAAACA 77025061 77025096 TACAGACAAAATAAAGACTTCAAGGAAACACAGCA 15 33410 Itpr3 STAT5A Intron 17 -1 1 27227968 27227982 GAATTCCCAGGAATC 27227952 27227987 TCATGAATTCCCAGGAATCCTAGTGACAGACACTT 4 27527 Klf6 STAT5A Intron 13 -1 1 5865424 5865438 CGTTTCTGAGAAAAA 5865416 5865451 ACTGAAATCATGCGTTTCTGAGAAAAAACCTCATT 12 33549 Klf6 STAT5B 3' UTR 13 1 1 5868896 5868910 ACTTTCCCGGAATTT 5868889 5868924 AAGGAGTACTTTCCCGGAATTTGGCATACCACAGC 7 33548 Klf6 STAT5B Exon 13 1 1 5868896 5868910 ACTTTCCCGGAATTT 5868889 5868924 AAGGAGTACTTTCCCGGAATTTGGCATACCACAGC 7 28036 Klf6 STAT5B Intron 13 -1 1 5865424 5865438 CGTTTCTGAGAAAAA 5865416 5865451 ACTGAAATCATGCGTTTCTGAGAAAAAACCTCATT 12 32755 Lcp1 STAT5A 3' UTR 14 -1 1 75630173 75630187 GAATTCCAGGCAGTT 75630169 75630204 CTGTCCTCTTTGTTCTGAATTCCAGGCAGTTCCTA 16 32754 Lcp1 STAT5A Exon 14 -1 1 75630173 75630187 GAATTCCAGGCAGTT 75630169 75630204 CTGTCCTCTTTGTTCTGAATTCCAGGCAGTTCCTA 16 23564 Lcp1 STAT5A Intron 14 -1 1 75578776 75578790 GGATTCCAGGAAGTC 75578769 75578804 AAGGCACTGGAGAGGATTCCAGGAAGTCATGTGGT 13 23347 Lcp1 STAT5A Intron 14 -1 1 75585328 75585342 GATTTCCTGGAACTG 75585325 75585360 GGCCAGAGAGGTCATCAGATTTCCTGGAACTGGAG 17 24013 Lcp1 STAT5A Intron 14 1 1 75578776 75578790 GACTTCCTGGAATCC 75578771 75578806 CACATGACTTCCTGGAATCCTCTCCAGTGCCTTGA 5 29760 Lcp1 STAT5A Intron 14 1 1 75601322 75601336 CTCTTCTGAGAACCT 75601315 75601350 CCACTCCCTCTTCTGAGAACCTGAGCCTTGAGGGT 7 24151 Lcp1 STAT5B Intron 14 -1 1 75578776 75578790 GGATTCCAGGAAGTC 75578769 75578804 AAGGCACTGGAGAGGATTCCAGGAAGTCATGTGGT 13 
23329 Lcp1 STAT5B Intron 14 -1 1 75585328 75585342 GATTTCCTGGAACTG 75585325 75585360 GGCCAGAGAGGTCATCAGATTTCCTGGAACTGGAG 17 24352 Lcp1 STAT5B Intron 14 1 1 75578776 75578790 GACTTCCTGGAATCC 75578771 75578806 CACATGACTTCCTGGAATCCTCTCCAGTGCCTTGA 5 33415 Lgals3 STAT5A Exon 14 -1 1 48005281 48005295 GATTTCCCGGAGGTT 48005264 48005299 GCTGATTTCCCGGAGGTTCTTCATCCGATGGTTGT 3 31687 Lgals3 STAT5B Exon 14 -1 1 48005281 48005295 GATTTCCCGGAGGTT 48005264 48005299 GCTGATTTCCCGGAGGTTCTTCATCCGATGGTTGT 3 30554 Loxl2 STAT5A 3' UTR 14 -1 1 70093389 70093403 GTGTTCCCAGAAGAA 70093372 70093407 GCTGTGTTCCCAGAAGAAAACGGCCCCTCTGTGCA 3 30553 Loxl2 STAT5A Exon 14 -1 1 70093389 70093403 GTGTTCCCAGAAGAA 70093372 70093407 GCTGTGTTCCCAGAAGAAAACGGCCCCTCTGTGCA 3 29212 Loxl2 STAT5A Intron 14 -1 1 70028695 70028709 GTATTCTAAGAAAAT 70028694 70028729 TAGTAGGCCAAAAAAAAAGGTATTCTAAGAAAATA 19 29212 Loxl2 STAT5A Intron 14 -1 1 70028695 70028709 GTATTCTAAGAAAAT 70028692 70028727 GTAGGCCAAAAAAAAAGGTATTCTAAGAAAATAAT 17 29212 Loxl2 STAT5A Intron 14 -1 1 70028695 70028709 GTATTCTAAGAAAAT 70028685 70028720 AAAAAAAAAGGTATTCTAAGAAAATAATTGATCTG 10 29212 Loxl2 STAT5A Intron 14 -1 1 70028695 70028709 GTATTCTAAGAAAAT 70028684 70028719 AAAAAAAAGGTATTCTAAGAAAATAATTGATCTGT 9 29212 Loxl2 STAT5A Intron 14 -1 1 70028695 70028709 GTATTCTAAGAAAAT 70028682 70028717 AAAAAAGGTATTCTAAGAAAATAATTGATCTGTGA 7 29212 Loxl2 STAT5A Intron 14 -1 1 70028695 70028709 GTATTCTAAGAAAAT 70028675 70028710 GTATTCTAAGAAAATAATTGATCTGTGAACTTGCT 0 29449 Loxl2 STAT5A Intron 14 1 1 70026342 70026356 TTGTTCTTAGAATTT 70026327 70026362 AAACGAATGAGTGGTTTGTTCTTAGAATTTTCCAT 15 29449 Loxl2 STAT5A Intron 14 1 1 70026342 70026356 TTGTTCTTAGAATTT 70026330 70026365 CGAATGAGTGGTTTGTTCTTAGAATTTTCCATCCA 12 29449 Loxl2 STAT5A Intron 14 1 1 70026342 70026356 TTGTTCTTAGAATTT 70026333 70026368 ATGAGTGGTTTGTTCTTAGAATTTTCCATCCAATG 9 29686 Loxl2 STAT5B 3' UTR 14 -1 1 70093389 70093403 GTGTTCCCAGAAGAA 70093372 70093407 GCTGTGTTCCCAGAAGAAAACGGCCCCTCTGTGCA 3 29685 Loxl2 STAT5B 
Exon 14 -1 1 70093389 70093403 GTGTTCCCAGAAGAA 70093372 70093407 GCTGTGTTCCCAGAAGAAAACGGCCCCTCTGTGCA 3 26291 Loxl2 STAT5B Intron 14 -1 1 70028695 70028709 GTATTCTAAGAAAAT 70028694 70028729 TAGTAGGCCAAAAAAAAAGGTATTCTAAGAAAATA 19 26291 Loxl2 STAT5B Intron 14 -1 1 70028695 70028709 GTATTCTAAGAAAAT 70028692 70028727 GTAGGCCAAAAAAAAAGGTATTCTAAGAAAATAAT 17 26291 Loxl2 STAT5B Intron 14 -1 1 70028695 70028709 GTATTCTAAGAAAAT 70028685 70028720 AAAAAAAAAGGTATTCTAAGAAAATAATTGATCTG 10 26291 Loxl2 STAT5B Intron 14 -1 1 70028695 70028709 GTATTCTAAGAAAAT 70028684 70028719 AAAAAAAAGGTATTCTAAGAAAATAATTGATCTGT 9 26291 Loxl2 STAT5B Intron 14 -1 1 70028695 70028709 GTATTCTAAGAAAAT 70028682 70028717 AAAAAAGGTATTCTAAGAAAATAATTGATCTGTGA 7 26291 Loxl2 STAT5B Intron 14 -1 1 70028695 70028709 GTATTCTAAGAAAAT 70028675 70028710 GTATTCTAAGAAAATAATTGATCTGTGAACTTGCT 0 25986 Loxl2 STAT5B Intron 14 1 1 70026342 70026356 TTGTTCTTAGAATTT 70026327 70026362 AAACGAATGAGTGGTTTGTTCTTAGAATTTTCCAT 15 25986 Loxl2 STAT5B Intron 14 1 1 70026342 70026356 TTGTTCTTAGAATTT 70026330 70026365 CGAATGAGTGGTTTGTTCTTAGAATTTTCCATCCA 12 25986 Loxl2 STAT5B Intron 14 1 1 70026342 70026356 TTGTTCTTAGAATTT 70026333 70026368 ATGAGTGGTTTGTTCTTAGAATTTTCCATCCAATG 9 28364 Lrrc17 STAT5A Intron 5 1 1 21073787 21073801 TGGTTCTAAGAATCT 21073777 21073812 TCACAGTTCCTGGTTCTAAGAATCTCTGCTTCCTC 10 28364 Lrrc17 STAT5A Intron 5 1 1 21073787 21073801 TGGTTCTAAGAATCT 21073777 21073812 TTACAGTTCCTGGTTCTAAGAATCTCTGCTTCCTC 10 126

Table 6: continued

Bin- Gene Transcrip- Se- Chro- Str. Str. Potential Potential Potential STAT5 binding ChIP-seq ChIP-seq ChIP-seq region Position of ding tion factor quence mo- (input binding binding site site region start region end potential site ID type some seq.) site start end (mm9) STAT5 (mm9) BS* 28720 Lrrc17 STAT5B Intron 5 1 1 21073787 21073801 TGGTTCTAAGAATCT 21073777 21073812 TCACAGTTCCTGGTTCTAAGAATCTCTGCTTCCTC 10 28720 Lrrc17 STAT5B Intron 5 1 1 21073787 21073801 TGGTTCTAAGAATCT 21073777 21073812 TTACAGTTCCTGGTTCTAAGAATCTCTGCTTCCTC 10 25670 Ly86 STAT5A Intron 13 -1 1 37481257 37481271 TTTTTCTTGGAATTC 37481247 37481282 CACACCACTGTTTTTCTTGGAATTCTCTTAGATTT 10 26979 Ly86 STAT5A Intron 13 1 1 37473786 37473800 CAATTCCTAGAACAT 37473766 37473801 CAGGAAGTGATTCTGGACAGCAATTCCTAGAACAT 20 23425 Ly86 STAT5B Intron 13 -1 1 37481257 37481271 TTTTTCTTGGAATTC 37481247 37481282 CACACCACTGTTTTTCTTGGAATTCTCTTAGATTT 10 25485 Ly86 STAT5B Intron 13 1 1 37473786 37473800 CAATTCCTAGAACAT 37473766 37473801 CAGGAAGTGATTCTGGACAGCAATTCCTAGAACAT 20 23771 Maff STAT5A 5' UTR 15 1 1 79177974 79177988 TAATTCCGGGAAGCT 79177972 79178007 AGTAATTCCGGGAAGCTCGCCTTACAACTGCGCGC 2 23770 Maff STAT5A Exon 15 1 1 79177974 79177988 TAATTCCGGGAAGCT 79177972 79178007 AGTAATTCCGGGAAGCTCGCCTTACAACTGCGCGC 2 23772 Maff STAT5A Promoter 15 1 1 79177974 79177988 TAATTCCGGGAAGCT 79177972 79178007 AGTAATTCCGGGAAGCTCGCCTTACAACTGCGCGC 2 23961 Maff STAT5B 5' UTR 15 1 1 79177974 79177988 TAATTCCGGGAAGCT 79177972 79178007 AGTAATTCCGGGAAGCTCGCCTTACAACTGCGCGC 2 23960 Maff STAT5B Exon 15 1 1 79177974 79177988 TAATTCCGGGAAGCT 79177972 79178007 AGTAATTCCGGGAAGCTCGCCTTACAACTGCGCGC 2 23962 Maff STAT5B Promoter 15 1 1 79177974 79177988 TAATTCCGGGAAGCT 79177972 79178007 AGTAATTCCGGGAAGCTCGCCTTACAACTGCGCGC 2 27090 Mical1 STAT5A Exon 10 -1 1 41202073 41202087 GTCTTCCTGGAATCC 41202068 41202103 TGTTCACCTGTGGTTGTCTTCCTGGAATCCGGCTC 15 25607 Mical1 STAT5B Exon 10 -1 1 41202073 41202087 GTCTTCCTGGAATCC 41202068 41202103 
TGTTCACCTGTGGTTGTCTTCCTGGAATCCGGCTC 15 32101 Mmp2 STAT5A Intron 8 1 1 95365662 95365676 CTGTTCTCAGAAGAA 95365646 95365681 AGTTAATATCACATCACTGTTCTCAGAAGAAGTAC 16 33289 Mmp2 STAT5B Intron 8 1 1 95365662 95365676 CTGTTCTCAGAAGAA 95365646 95365681 AGTTAATATCACATCACTGTTCTCAGAAGAAGTAC 16 34062 Mrc1 STAT5B Intron 2 -1 1 14213156 14213170 CCATTCCAGGAAGGA 14213138 14213173 ACCCATTCCAGGAAGGATAGCCTAGACCACTAAAG 2 24909 Ms4a4b STAT5A Exon 19 -1 1 11537543 11537557 CATTTCCTGGAACAT 11537537 11537572 GTGGTTCTTGTACACATTTCCTGGAACATTGGTCC 14 23751 Ms4a4b STAT5B Exon 19 -1 1 11537543 11537557 CATTTCCTGGAACAT 11537537 11537572 GTGGTTCTTGTACACATTTCCTGGAACATTGGTCC 14 32373 Ms4a4d STAT5B Intron 19 1 1 11620929 11620943 TATGTCCTAGAATTT 11620910 11620945 CTCAAAGACAAGAGTTGTTTATGTCCTAGAATTTT 19 24193 Ncapg STAT5A Intron 5 1 1 46082709 46082723 TAATTCTTAGAAATG 46082700 46082735 CTTTAATTTTAATTCTTAGAAATGTTCACACTTAA 9 24194 Ncapg STAT5A Promoter 5 1 1 46082709 46082723 TAATTCTTAGAAATG 46082700 46082735 CTTTAATTTTAATTCTTAGAAATGTTCACACTTAA 9 23924 Ncapg STAT5B Intron 5 1 1 46082709 46082723 TAATTCTTAGAAATG 46082700 46082735 CTTTAATTTTAATTCTTAGAAATGTTCACACTTAA 9 23925 Ncapg STAT5B Promoter 5 1 1 46082709 46082723 TAATTCTTAGAAATG 46082700 46082735 CTTTAATTTTAATTCTTAGAAATGTTCACACTTAA 9 27408 Nckap1l STAT5A Intron 15 1 1 103314153 103314167 GCTTTCCAAGAAGCC 103314147 103314182 CCTCCTGCTTTCCAAGAAGCCCATTTGGTGTTTCT 6 30056 Nckap1l STAT5B Intron 15 -1 1 103314153 103314167 GGCTTCTTGGAAAGC 103314135 103314170 TGGGCTTCTTGGAAAGCAGGAGGATTTAGGCCAAG 2 28145 Nckap1l STAT5B Intron 15 1 1 103314153 103314167 GCTTTCCAAGAAGCC 103314147 103314182 CCTCCTGCTTTCCAAGAAGCCCATTTGGTGTTTCT 6 31408 P2rx7 STAT5A Intron 5 1 1 123116808 123116822 GATTTCCAGCAACTA 123116790 123116825 TTGAGAGGACCCAGGTTTGATTTCCAGCAACTACA 18 31891 Pik3r5 STAT5A Intron 11 1 1 68273757 68273771 CAATTCCCTGAAACT 68273743 68273778 ACAAAAGAAGGCATCAATTCCCTGAAACTGGAGTT 14 31608 Pla2g7 STAT5A Intron 17 1 1 43745581 43745595 GATTTCTTGGTACTA 43745561 43745596 
TGTTACAAATGTTGGCTATTGATTTCTTGGTACTA 20 26500 Plaur STAT5A Intron 7 -1 1 25260124 25260138 GTGTTCCAGGAACCC 25260123 25260158 GAGCAGGGCTTCCCAAAGTGTGTTCCAGGAACCCC 19 26500 Plaur STAT5A Intron 7 -1 1 25260124 25260138 GTGTTCCAGGAACCC 25260119 25260154 AGGGCTTCCCAAAGTGTGTTCCAGGAACCCCTGGC 15 26500 Plaur STAT5A Intron 7 -1 1 25260124 25260138 GTGTTCCAGGAACCC 25260115 25260150 NTTCCCAAAGTGTGTTCCAGGAACCCCTGGCAGTG 11 26500 Plaur STAT5A Intron 7 -1 1 25260124 25260138 GTGTTCCAGGAACCC 25260109 25260144 AAAGTGTGTTCCAGGAACCCCTGGCAGTGCCTCAG 5 27169 Plaur STAT5B Intron 7 -1 1 25260124 25260138 GTGTTCCAGGAACCC 25260123 25260158 GAGCAGGGCTTCCCAAAGTGTGTTCCAGGAACCCC 19 27169 Plaur STAT5B Intron 7 -1 1 25260124 25260138 GTGTTCCAGGAACCC 25260119 25260154 AGGGCTTCCCAAAGTGTGTTCCAGGAACCCCTGGC 15 27169 Plaur STAT5B Intron 7 -1 1 25260124 25260138 GTGTTCCAGGAACCC 25260115 25260150 NTTCCCAAAGTGTGTTCCAGGAACCCCTGGCAGTG 11 27169 Plaur STAT5B Intron 7 -1 1 25260124 25260138 GTGTTCCAGGAACCC 25260109 25260144 AAAGTGTGTTCCAGGAACCCCTGGCAGTGCCTCAG 5 25455 Pou2af1 STAT5A Intron 9 -1 1 51030806 51030820 TATTTCCCAGAATCT 51030791 51030826 CAGTATATTTCCCAGAATCTAAGCAATGAACTTGA 5 28422 Pou2af1 STAT5B Intron 9 -1 1 51030322 51030336 CTTTTCCCAGAAAGT 51030305 51030340 CAGCTTTTCCCAGAAAGTAGTACTAATGAAACCAT 3 23615 Pou2af1 STAT5B Intron 9 -1 1 51030806 51030820 TATTTCCCAGAATCT 51030791 51030826 CAGTATATTTCCCAGAATCTAAGCAATGAACTTGA 5 24021 Prkcb STAT5A Intron 7 -1 1 129710705 129710719 CGATTCTAGGAAGTA 129710685 129710720 CGATTCTAGGAAGTAGACACTGCCGTGCATCTAGC 0 127

Table 6: continued

Bin- Gene Transcrip- Se- Chro- Str. Str. Potential Potential Potential STAT5 binding ChIP-seq ChIP-seq ChIP-seq region Position of ding tion factor quence mo- (input binding binding site site region start region end potential site ID type some seq.) site start end (mm9) STAT5 (mm9) BS* 31895 Prkcb STAT5A Intron 7 1 1 129495224 129495238 CATTTCTGGGAAGGA 129495221 129495256 CTCCATTTCTGGGAAGGAAGCTCTTTGGAGGACTT 3 26711 Prkcb STAT5B Intron 7 -1 1 129710705 129710719 CGATTCTAGGAAGTA 129710685 129710720 CGATTCTAGGAAGTAGACACTGCCGTGCATCTAGC 0 29000 Prkcb STAT5B Intron 7 1 1 129495224 129495238 CATTTCTGGGAAGGA 129495221 129495256 CTCCATTTCTGGGAAGGAAGCTCTTTGGAGGACTT 3 29457 Prune2 STAT5A Intron 19 -1 1 17032422 17032436 CTATTCTAAGAAAAG 17032404 17032439 GTCTATTCTAAGAAAAGACCAATTTAGTTATAATT 2 25188 Prune2 STAT5A Intron 19 -1 1 17055157 17055171 TCTTTCTAGGAACTG 17055142 17055177 TCTGTTCTTTCTAGGAACTGACAAAGAAATTGGAT 5 33905 Prune2 STAT5A Intron 19 -1 1 17202867 17202881 CAATTCTTGGCATCC 17202865 17202900 TTCCAGAGGACCCAGGTACAATTCTTGGCATCCAC 18 29148 Prune2 STAT5B Intron 19 -1 1 17032422 17032436 CTATTCTAAGAAAAG 17032404 17032439 GTCTATTCTAAGAAAAGACCAATTTAGTTATAATT 2 27043 Prune2 STAT5B Intron 19 -1 1 17055157 17055171 TCTTTCTAGGAACTG 17055142 17055177 TCTGTTCTTTCTAGGAACTGACAAAGAAATTGGAT 5 31134 Ptpre STAT5B Intron 7 -1 1 142846263 142846277 GAGTTCTGAGAATGA 142846253 142846288 CTTGAGTTAAGAGTTCTGAGAATGACTATGGAGAC 10 24640 Rnf213 STAT5A Intron 11 -1 1 119316556 119316570 GATTTCTGGGAACAG 119316549 119316584 TGGTTGCCTCTTAGATTTCTGGGAACAGGATAGGC 13 32788 Rnf213 STAT5A Promoter 11 1 1 119254262 119254276 GATTACCGGGAATCT 119254254 119254289 CTGGTTTCGATTACCGGGAATCTTCTGAGTCCCCA 8 32788 Rnf213 STAT5A Promoter 11 1 1 119254262 119254276 GATTACCGGGAATCT 119254261 119254296 CGATTACCGGGAATCTTCTGAGTCCCCATCACTTC 1 24698 Rnf213 STAT5B Intron 11 -1 1 119316556 119316570 GATTTCTGGGAACAG 119316549 119316584 TGGTTGCCTCTTAGATTTCTGGGAACAGGATAGGC 13 31987 Rnf213 STAT5B Promoter 11 1 1 119254262 119254276 
GATTACCGGGAATCT 119254254 119254289 CTGGTTTCGATTACCGGGAATCTTCTGAGTCCCCA 8 31987 Rnf213 STAT5B Promoter 11 1 1 119254262 119254276 GATTACCGGGAATCT 119254261 119254296 CGATTACCGGGAATCTTCTGAGTCCCCATCACTTC 1 32552 Sema3c STAT5A Intron 5 1 1 17086027 17086041 TATTTCCGGAAATTA 17086023 17086058 ATATTATTTCCGGAAATTAGTGCTCTGTGACTTGA 4 33700 Sema3c STAT5B Intron 5 1 1 17086027 17086041 TATTTCCGGAAATTA 17086023 17086058 ATATTATTTCCGGAAATTAGTGCTCTGTGACTTGA 4 23849 Serpina3f STAT5A Promoter 12 1 1 105452630 105452644 TATTTCCTAGAAATC 105452621 105452656 TATCAGTTCTATTTCCTAGAAATCACCCATTTCCC 9 23393 Serpina3f STAT5B Promoter 12 1 1 105452630 105452644 TATTTCCTAGAAATC 105452621 105452656 TATCAGTTCTATTTCCTAGAAATCACCCATTTCCC 9 25865 Serpina3g STAT5A Intron 12 1 1 105476718 105476732 TAATTCGTGGAACAC 105476714 105476749 GTTATAATTCGTGGAACACACACATTACCCATGCC 4 27945 Serpina3g STAT5B Intron 12 1 1 105476718 105476732 TAATTCGTGGAACAC 105476714 105476749 GTTATAATTCGTGGAACACACACATTACCCATGCC 4 31996 Slc25a24 STAT5B Intron 3 -1 1 108936454 108936468 CTGTTCTGGGAATGC 108936435 108936470 ACTGTTCTGGGAATGCACACAAACCAGTGTGCTAC 1 34282 Slc7a6 STAT5A Intron 8 1 1 108705997 108706011 CATTTCTTGGCATTG 108705987 108706022 AGCACCTGCACATTTCTTGGCATTGAGTTTCACAA 10 24302 Socs3 STAT5A Intron 11 -1 -1 117830541 117830555 CAGTTCCAGGAATCG 117830536 117830571 GCCGGGCAGTTCCAGGAATCGGGGGGCGGGGCGTA 6 23594 Socs3 STAT5A Intron 11 1 -1 117830541 117830555 CGATTCCTGGAACTG 117830530 117830565 GGCCAGTACGCCCCGCCCCCCGATTCCTGGAACTG 20 23594 Socs3 STAT5A Intron 11 1 -1 117830541 117830555 CGATTCCTGGAACTG 117830530 117830565 GGCCAGGACGCCCCGCCCCCCGATTCCTGGAACTG 20 23594 Socs3 STAT5A Intron 11 1 -1 117830541 117830555 CGATTCCTGGAACTG 117830530 117830565 GGCCAGTACGCCCCGCCCCCCGATTCCTGGAACTG 20 23594 Socs3 STAT5A Intron 11 1 -1 117830541 117830555 CGATTCCTGGAACTG 117830531 117830566 GCCAGTACGCCCCGCCCCCCGATTCCTGGAACTGC 19 23594 Socs3 STAT5A Intron 11 1 -1 117830541 117830555 CGATTCCTGGAACTG 117830532 117830567 
CCAGTACGCCCCGCCCCCCGATTCCTGGAACTGCC 18 23594 Socs3 STAT5A Intron 11 1 -1 117830541 117830555 CGATTCCTGGAACTG 117830533 117830568 CAGTACGCCCCGCCCCCCGATTCCTGGAACTGCCC 17 23594 Socs3 STAT5A Intron 11 1 -1 117830541 117830555 CGATTCCTGGAACTG 117830534 117830569 AGTACGCCCCGCCCCCCGATTCCTGGAACTGCCCG 16 23594 Socs3 STAT5A Intron 11 1 -1 117830541 117830555 CGATTCCTGGAACTG 117830541 117830576 CCCGCCCCCCGATTCCTGGAACTGCCCGGCCGGTC 9 26056 Socs3 STAT5B Intron 11 -1 -1 117830541 117830555 CAGTTCCAGGAATCG 117830536 117830571 GCCGGGCAGTTCCAGGAATCGGGGGGCGGGGCGTA 6 24513 Socs3 STAT5B Intron 11 1 -1 117830541 117830555 CGATTCCTGGAACTG 117830530 117830565 GGCCAGTACGCCCCGCCCCCCGATTCCTGGAACTG 20 24513 Socs3 STAT5B Intron 11 1 -1 117830541 117830555 CGATTCCTGGAACTG 117830530 117830565 GGCCAGGACGCCCCGCCCCCCGATTCCTGGAACTG 20 24513 Socs3 STAT5B Intron 11 1 -1 117830541 117830555 CGATTCCTGGAACTG 117830530 117830565 GGCCAGTACGCCCCGCCCCCCGATTCCTGGAACTG 20 24513 Socs3 STAT5B Intron 11 1 -1 117830541 117830555 CGATTCCTGGAACTG 117830531 117830566 GCCAGTACGCCCCGCCCCCCGATTCCTGGAACTGC 19 24513 Socs3 STAT5B Intron 11 1 -1 117830541 117830555 CGATTCCTGGAACTG 117830532 117830567 CCAGTACGCCCCGCCCCCCGATTCCTGGAACTGCC 18 24513 Socs3 STAT5B Intron 11 1 -1 117830541 117830555 CGATTCCTGGAACTG 117830533 117830568 CAGTACGCCCCGCCCCCCGATTCCTGGAACTGCCC 17 24513 Socs3 STAT5B Intron 11 1 -1 117830541 117830555 CGATTCCTGGAACTG 117830534 117830569 AGTACGCCCCGCCCCCCGATTCCTGGAACTGCCCG 16 24513 Socs3 STAT5B Intron 11 1 -1 117830541 117830555 CGATTCCTGGAACTG 117830541 117830576 CCCGCCCCCCGATTCCTGGAACTGCCCGGCCGGTC 9 27633 Stat2 STAT5A Intron 10 -1 1 127708696 127708710 TGCTTCTAAGAAACA 127708690 127708725 TTATATGCCTGTATTGCTTCTAAGAAACAAAAACT 14 128

Table 6: continued

Bin- Gene Transcrip- Se- Chro- Str. Str. Potential Potential Potential STAT5 binding ChIP-seq ChIP-seq ChIP-seq region Position of ding tion factor quence mo- (input binding binding site site region start region end potential site ID type some seq.) site start end (mm9) STAT5 (mm9) BS* 27908 Stat2 STAT5A Intron 10 -1 1 127724229 127724243 TGGTTCCCAGAACCT 127724221 127724256 AGGACTGGGGTTTGGTTCCCAGAACCTGCATGGAG 12 32001 Stat2 STAT5B Intron 10 -1 1 127708696 127708710 TGCTTCTAAGAAACA 127708690 127708725 TTATATGCCTGTATTGCTTCTAAGAAACAAAAACT 14 28543 Stat2 STAT5B Intron 10 -1 1 127724229 127724243 TGGTTCCCAGAACCT 127724221 127724256 AGGACTGGGGTTTGGTTCCCAGAACCTGCATGGAG 12 30121 Stk10 STAT5A Intron 11 -1 1 32485936 32485950 GAATTCCAGGAATGG 32485916 32485951 GAATTCCAGGAATGGAGCACGGGTGGCCAGGCTTG 0 25711 Stk10 STAT5B Intron 11 -1 1 32485936 32485950 GAATTCCAGGAATGG 32485916 32485951 GAATTCCAGGAATGGAGCACGGGTGGCCAGGCTTG 0 31521 Syt6 STAT5B Intron 3 1 1 103423983 103423997 GCTTTCTAGGAAGGG 103423981 103424016 TGGCTTTCTAGGAAGGGACACTTTCTGCTCACAAG 2 30814 Timp1 STAT5A Exon X -1 1 20449923 20449937 GATTTCTGGGGAACC 20449909 20449944 CTCGTTGATTTCTGGGGAACCCATGAATTTAGCCC 6 26730 Trem2 STAT5B Exon 17 1 1 48492265 48492279 GTATTCCTGGAAGGC 48492262 48492297 CAGGTATTCCTGGAAGGCAGGCTGGGGCTGAAAAA 3 23643 Trpv2 STAT5A Intron 11 -1 1 62392549 62392563 GGATTCTGGGAAGTC 62392533 62392568 ATGAGGATTCTGGGAAGTCAGGTTCAGTACCCTAG 4 24585 Trpv2 STAT5B Intron 11 -1 1 62392549 62392563 GGATTCTGGGAAGTC 62392533 62392568 ATGAGGATTCTGGGAAGTCAGGTTCAGTACCCTAG 4 23535 Ugt2b35 STAT5A Exon 5 -1 1 87430331 87430345 GGATTCCTGGAATTT 87430320 87430355 ATCAAACTTGGATTCCTGGAATTTTGTCATGAGCT 9 23310 Ugt2b35 STAT5B Exon 5 -1 1 87430331 87430345 GGATTCCTGGAATTT 87430320 87430355 ATCAAACTTGGATTCCTGGAATTTTGTCATGAGCT 9 33949 Unc93b1 STAT5A Intron 19 -1 1 3938019 3938033 GGATTCCTTGAAACT 3938004 3938039 GTGTTGGATTCCTTGAAACTGGAATTACAGCTGGT 5 33606 Unc93b1 STAT5B Intron 19 -1 1 3938019 3938033 GGATTCCTTGAAACT 3938004 3938039 
GTGTTGGATTCCTTGAAACTGGAATTACAGCTGGT 5 32821 Vav1 STAT5A Intron 17 1 1 57420531 57420545 ACTTTCTTGGAAATC 57420523 57420558 TCTCTCTGACTTTCTTGGAAATCCTACATCTGGTG 8

* in ChIP-seq region
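For the rows shown in Table 6, the last column is consistent with the 0-based offset of the potential STAT5 binding site inside the ChIP-seq region sequence. The following Python sketch illustrates that reading with plain string matching. The function name `position_in_region` is hypothetical and is not part of the thesis pipeline; the two example rows (Maff and Lgals3) are taken from Table 6.

```python
def position_in_region(region_seq: str, site_seq: str) -> int:
    """Return the 0-based offset of site_seq within region_seq, or -1 if absent."""
    return region_seq.upper().find(site_seq.upper())


# Example rows copied from Table 6 (ChIP-seq region, potential STAT5 binding site).
maff_offset = position_in_region(
    "AGTAATTCCGGGAAGCTCGCCTTACAACTGCGCGC", "TAATTCCGGGAAGCT"
)
lgals3_offset = position_in_region(
    "GCTGATTTCCCGGAGGTTCTTCATCCGATGGTTGT", "GATTTCCCGGAGGTT"
)
print(maff_offset, lgals3_offset)
```

Under this assumption, the computed offsets can be compared directly with the last column of the corresponding table rows.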

APPENDIX 7: POSITION WEIGHT MATRIX FILE “STAT5_PWM__SHORT_GAS__DEFAULT_BG.TXT” (SHORT STAT5 MOTIF)

MEME version 4.4

ALPHABET= ACGT

strands: + -

Background letter frequencies (from uniform background):
A 0.25000 C 0.25000 G 0.25000 T 0.25000

MOTIF stat5_short
letter-probability matrix: alength= 4 w= 9 nsites= 26 E= 0
0.000000 0.000000 0.000000 1.000000
0.000000 0.000000 0.000000 1.000000
0.076923 0.730769 0.076923 0.115385
0.076923 0.384615 0.192308 0.346154
0.346154 0.192308 0.192308 0.269231
0.423077 0.000000 0.500000 0.076923
0.038462 0.076923 0.769231 0.115385
0.961538 0.000000 0.000000 0.038462
0.961538 0.000000 0.038462 0.000000
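As an illustration of how the letter-probability matrix above can be read, the following Python sketch derives the consensus sequence and a plain log-odds score against the uniform background. It is a minimal sketch, not FIMO's implementation: the function names are hypothetical, no pseudocounts are applied, and scores reported by FIMO can therefore differ slightly from the values computed here.

```python
import math

ALPHABET = "ACGT"   # column order of the matrix rows
BACKGROUND = 0.25   # uniform background frequency from the file header

# Rows of the short STAT5 motif (probabilities for A, C, G, T per position).
STAT5_SHORT = [
    [0.000000, 0.000000, 0.000000, 1.000000],
    [0.000000, 0.000000, 0.000000, 1.000000],
    [0.076923, 0.730769, 0.076923, 0.115385],
    [0.076923, 0.384615, 0.192308, 0.346154],
    [0.346154, 0.192308, 0.192308, 0.269231],
    [0.423077, 0.000000, 0.500000, 0.076923],
    [0.038462, 0.076923, 0.769231, 0.115385],
    [0.961538, 0.000000, 0.000000, 0.038462],
    [0.961538, 0.000000, 0.038462, 0.000000],
]


def consensus(matrix):
    """Pick the most probable letter at every motif position."""
    return "".join(ALPHABET[row.index(max(row))] for row in matrix)


def log_odds_score(matrix, site):
    """Sum of log2(p / background) over all positions; -inf if any p is 0."""
    if len(site) != len(matrix):
        raise ValueError("site length must equal motif width")
    total = 0.0
    for row, base in zip(matrix, site.upper()):
        p = row[ALPHABET.index(base)]
        if p == 0.0:
            return float("-inf")
        total += math.log2(p / BACKGROUND)
    return total


best_site = consensus(STAT5_SHORT)
best_score = log_odds_score(STAT5_SHORT, best_site)
print(best_site, round(best_score, 3))
```

The consensus follows the TTC...GAA pattern of the motif, with the first two positions fixed to T in the matrix.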


APPENDIX 8: POSITION WEIGHT MATRICES FILE “STAT5MEMELONG” (LONG STAT5A AND STAT5B MOTIFS)

MEME version 4.4

ALPHABET= ACGT

strands: + -

Background letter frequencies (from uniform background):
A 0.25000 C 0.25000 G 0.25000 T 0.25000

MOTIF stat5a_long
letter-probability matrix: alength= 4 w= 15 nsites= 32 E= 0
0.000000 0.300000 0.400000 0.300000
0.600000 0.100000 0.233333 0.066667
0.424242 0.090909 0.121212 0.363636
0.000000 0.000000 0.000000 1.000000
0.000000 0.000000 0.000000 1.000000
0.000000 1.000000 0.000000 0.000000
0.000000 0.454545 0.151515 0.393939
0.272727 0.212121 0.242424 0.272727
0.212121 0.000000 0.787879 0.000000
0.000000 0.000000 1.000000 0.000000
1.000000 0.000000 0.000000 0.000000
1.000000 0.000000 0.000000 0.000000
0.406250 0.218750 0.125000 0.250000
0.093750 0.312500 0.000000 0.593750
0.300000 0.266667 0.200000 0.233333

MOTIF stat5b_long
letter-probability matrix: alength= 4 w= 15 nsites= 43 E= 0
0.000000 0.166667 0.466667 0.366667
0.585366 0.048780 0.170732 0.195122
0.355556 0.022222 0.066667 0.555556
0.000000 0.000000 0.000000 1.000000
0.000000 0.000000 0.000000 1.000000
0.000000 1.000000 0.000000 0.000000
0.000000 0.600000 0.022222 0.377778
0.155556 0.244444 0.155556 0.444444
0.222222 0.044444 0.733333 0.000000
0.000000 0.000000 1.000000 0.000000
1.000000 0.000000 0.000000 0.000000
1.000000 0.000000 0.000000 0.000000
0.377778 0.111111 0.111111 0.400000
0.177778 0.222222 0.022222 0.577778
0.190476 0.238095 0.166667 0.404762


APPENDIX 9: 43 GENES WITH POTENTIAL STAT5 BINDING SITES (LONG MOTIF FOR STAT5A OR STAT5B AND SHORT MOTIF) RELATED TO TERM "INFLAMMATORY RESPONSE"

Table 7

43 Genes with potential STAT5 binding sites (long motif for STAT5A or STAT5B and short motif) related to term "Inflammatory Response"

Gene
Adam8
Adipoq
Alox5ap
C3
Ccl12
Ccl2
Ccl5
Ccl7
Ccl8
Ccr2
Ccr5
Cd14
Cd44
Clec7a
Csf1
Cxcl11
Cxcl13
Cxcl9
Fcer1g
Fcgr1
Fcgr3
Hck
Il17rc
Il1b
Il34
Itgb2
Lat
Ly86
Masp1
Nfkbiz
P2rx7
Pik3cd
Pla2g7
S100a9
Saa3
Serpina3n
Slc11a1
Spn
Tlr1
Tlr2
Tlr7
Tnfaip8l2
Tnfrsf1b


APPENDIX 10: 58 GENES WITH POTENTIAL STAT5 BINDING SITES (LONG MOTIF FOR STAT5A OR STAT5B AND SHORT MOTIF) RELATED TO TERM "APOPTOTIC PROCESS"

Table 8

58 Genes with potential STAT5 binding sites (long motif for STAT5A or STAT5B and short motif) related to term "Apoptotic process"

Gene
Adam8
Adipoq
Aif1
Akr1c18
Ankrd1
Bak1
Bcl3
Birc3
Birc5
Card11
Casp1
Casp12
Casp4
Ccl19
Ccl2
Ccl5
Ccr5
Cd44
Cd5
Clec5a
Cx3cr1
Dnase1l3
Dusp2
Fcer1g
Hck
Hcls1
Hk2
Ifi204
Ifit3
Il1b
Il2rb
Inpp5d
Irf1
Lck
Lcn2
Med1
Mical1
Mmp2
Ncf2
Nckap1l
P2rx7
Pdcd1
Plaur
Plk2
Prkcb
Prune2
Ptprc
Ripk3
S100a9
Scin
Serpina3g
Snca
Sp110
Spn
Timp1
Tmem173
Traf1
Trpv2


APPENDIX 11: 24 GENES WITH POTENTIAL STAT5 BINDING SITES (LONG MOTIF FOR STAT5A OR STAT5B AND SHORT MOTIF) RELATED TO TERM "CYTOKINE-MEDIATED SIGNALING PATHWAY"

Table 9

24 Genes with potential STAT5 binding sites (long motif for STAT5A or STAT5B and short motif) related to term "Cytokine-mediated signaling pathway"

Gene
Adipoq
Ccdc88b
Ccl2
Ccl5
Ccr2
Ccr5
Csf2rb
Cx3cr1
Cxcl13
Cxcr3
Cxcr6
Duox2
Ifitm3
Il10ra
Il1b
Il2rb
Irf1
Klf6
Lilrb3
Med1
Osmr
Ptprc
Socs1
Tnfrsf1b


APPENDIX 12: POTENTIAL STAT5 BINDING SITES VALIDATED AGAINST THE CHIP-SEQ DATA SET GEO: GSM784027 (80 GENES, LONG MOTIF)

Table 10

Potential STAT5 binding sites validated against the ChIP-seq data set GEO: GSM784027 (80 genes, long motif)

Gene
2810459M11Rik
A230050P20Rik
Adcy7
Amph
Ankrd6
Anxa8
Bst1
Casp4
Ccr5
Cenpe
Cfi
Cftr
Creb5
Ctsc
Dcdc2a
Ddx60
Emr1
Epsti1
Fga
Fyb
Galnt12
Gbp5
Gmfg
Havcr2
Hmha1
Icam1
Icos
Ifi35
Ifit3
Ifngr1
Igdcc3
Ikzf1
Il17rc
Inpp5d
Irf1
Irf8
Irf9
Irg1
Itga4
Itgal
Itgb2
Itpr3
Klf6
Lcp1
Lgals3
Loxl2
Lrrc17
Ly86
Maff
Mical1
Mmp2
Mrc1
Ms4a4b
Ms4a4d
Ncapg
Nckap1l
P2rx7
Pik3r5
Pla2g7
Plaur
Pou2af1
Prkcb
Prune2
Ptpre
Rnf213
Sema3c
Serpina3f
Serpina3g
Slc25a24
Slc7a6
Socs3
Stat2
Stk10
Syt6
Timp1
Trem2
Trpv2
Ugt2b35
Unc93b1
Vav1


APPENDIX 13: POTENTIAL STAT5 BINDING SITES VALIDATED AGAINST THE CHIP-SEQ DATA SET GEO: GSM784027 (74 GENES, SHORT MOTIF)

Table 11

Potential STAT5 binding sites validated against the ChIP-seq data set GEO: GSM784027 (74 genes, short motif)

Gene
2810459M11Rik
A230050P20Rik
Adcy7
Amph
Ankrd6
Anxa8
Apol6
Batf
Ccr5
Cenpe
Cfi
Cftr
Creb5
Ctsc
Ctss
Dcdc2a
Ddx60
Emr1
Fbln1
Fga
Fyb
Gbp1
Gbp5
Gmfg
Gsdmd
Icam1
Ifit3
Ifngr1
Igdcc3
Ikzf1
Il17rc
Inpp5d
Irf1
Irf8
Irf9
Irg1
Itga4
Itgb2
Klf6
Lcp1
Loxl2
Lrrc17
Ly86
Lyl1
Maff
Map4k1
Mical1
Mmp14
Mmp2
Mrc1
Ms4a4b
Ncapg
Ncf2
Nckap1l
Oas2
Plaur
Pou2af1
Prkcb
Prune2
Ptpre
Rnf213
Serpina3f
Serpina3g
Slc11a1
Slc25a24
Socs3
Stat2
Stk10
Syt13
Syt6
Trem2
Trpv2
Ugt2b35
Vav1


APPENDIX 14: POTENTIAL STAT5 BINDING SITES VALIDATED AGAINST THE CHIP-SEQ DATA SET GEO: GSM784027 (61 GENES IN INTERSECTION BETWEEN LONG AND SHORT MOTIF)

Table 12

Potential STAT5 binding sites validated against the ChIP-seq data set GEO: GSM784027 (61 genes in intersection between long and short motif)

Gene
2810459M11Rik
A230050P20Rik
Adcy7
Amph
Ankrd6
Anxa8
Ccr5
Cenpe
Cfi
Cftr
Creb5
Ctsc
Dcdc2a
Ddx60
Emr1
Fga
Fyb
Gbp5
Gmfg
Icam1
Ifit3
Ifngr1
Igdcc3
Ikzf1
Il17rc
Inpp5d
Irf1
Irf8
Irf9
Irg1
Itga4
Itgb2
Klf6
Lcp1
Loxl2
Lrrc17
Ly86
Maff
Mical1
Mmp2
Mrc1
Ms4a4b
Ncapg
Nckap1l
Plaur
Pou2af1
Prkcb
Prune2
Ptpre
Rnf213
Serpina3f
Serpina3g
Slc25a24
Socs3
Stat2
Stk10
Syt6
Trem2
Trpv2
Ugt2b35
Vav1
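
The intersection reported in Table 12 can be illustrated with a set operation. The Python sketch below uses only short excerpts of the gene lists in Tables 10 and 11; the variable names are hypothetical, and the thesis does not state which tool was used for this step.

```python
# Excerpts only: four genes from Table 10 (long motif) and four from Table 11 (short motif).
long_motif_genes = {"Bst1", "Casp4", "Ccr5", "Cenpe"}
short_motif_genes = {"Apol6", "Batf", "Ccr5", "Cenpe"}

# Genes with ChIP-seq validated sites for both the long and the short motif.
genes_in_both = sorted(long_motif_genes & short_motif_genes)
print(genes_in_both)
```

Both genes in the result of this excerpt, Ccr5 and Cenpe, appear in Table 12.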


APPENDIX 15: 190 POTENTIAL STAT5 BINDING SITES FOR GENE PRKCB (SHORT MOTIF)

The distance of each potential STAT5 binding site from the TSS was computed in the same orientation for both strands: the start coordinate of each potential STAT5 binding site was subtracted from the TSS coordinate. The table is sorted in descending order by the field “Start position (mm10)*”.
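
The calculation described above can be sketched in Python as follows. The TSS coordinate used here is an assumption: it is not stated in the text and was inferred from the first row of Table 13 (site start 122297144 with a start position of -8394). The function name is hypothetical.

```python
# Assumed TSS coordinate (mm10), inferred from Table 13: 122297144 - 8394.
ASSUMED_TSS_MM10 = 122288750


def start_position_relative_to_tss(site_start_mm10, tss=ASSUMED_TSS_MM10):
    """Subtract the binding site start coordinate from the TSS coordinate."""
    return tss - site_start_mm10


# Three binding site start coordinates (mm10) taken from Table 13.
site_starts = [122299037, 122297144, 122297291]

# Sort in descending order, as in the table.
offsets = sorted(
    (start_position_relative_to_tss(s) for s in site_starts), reverse=True
)
print(offsets)
```

Because Prkcb lies on the positive strand, sites downstream of the TSS receive negative values under this convention.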

Potential STAT5 binding sites that FIMO found on the strand opposite to the input sequence were assigned the sequence type (promoter, exon, intron, 3’ UTR, 5’ UTR) and gene name of the input sequence. The gene Prkcb is located on the positive strand.
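
In Table 13, each position is listed once per strand, and the two reported sequences are reverse complements of each other (for example TTCCAAGAA and TTCTTGGAA). The following Python sketch shows this relationship; the helper name `reverse_complement` is hypothetical and is not taken from FIMO.

```python
# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the sequence as read on the opposite DNA strand."""
    return seq.translate(_COMPLEMENT)[::-1]


# Strand -1 sequence from Table 13 and its opposite-strand counterpart.
minus_strand_site = "TTCCAAGAA"
plus_strand_site = reverse_complement(minus_strand_site)
print(plus_strand_site)
```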

Table 13

190 potential STAT5 binding sites for gene Prkcb (short motif)

Binding site ID | Sequence type | Chromosome | Strand | Potential binding site start (mm9) | Potential binding site end (mm9) | Potential binding site start (mm10) | Potential binding site end (mm10) | Start position (mm10)* | p_value | q_value | Score | Potential STAT5 binding site
1736 | Intron | 7 | -1 | 129440658 | 129440666 | 122297144 | 122297152 | -8394 | 0.0000179 | 0.369 | 12.4785 | TTCCAAGAA
1747 | Intron | 7 | -1 | 129440658 | 129440666 | 122297144 | 122297152 | -8394 | 0.0000179 | 0.369 | 12.4785 | TTCCAAGAA
1751 | Intron | 7 | -1 | 129440658 | 129440666 | 122297144 | 122297152 | -8394 | 0.0000179 | 0.369 | 12.4785 | TTCCAAGAA
1756 | Intron | 7 | -1 | 129440658 | 129440666 | 122297144 | 122297152 | -8394 | 0.0000179 | 0.369 | 12.4785 | TTCCAAGAA
2191 | Intron | 7 | 1 | 129440658 | 129440666 | 122297144 | 122297152 | -8394 | 0.0000294 | 0.486 | 12.1806 | TTCTTGGAA

2202 | Intron | 7 | 1 | 129440658 | 129440666 | 122297144 | 122297152 | -8394 | 0.0000294 | 0.486 | 12.1806 | TTCTTGGAA
2206 | Intron | 7 | 1 | 129440658 | 129440666 | 122297144 | 122297152 | -8394 | 0.0000294 | 0.486 | 12.1806 | TTCTTGGAA
2211 | Intron | 7 | 1 | 129440658 | 129440666 | 122297144 | 122297152 | -8394 | 0.0000294 | 0.486 | 12.1806 | TTCTTGGAA
4820 | Intron | 7 | -1 | 129440805 | 129440813 | 122297291 | 122297299 | -8541 | 0.0000664 | 0.509 | 11.6843 | TTCTTAGAA
4827 | Intron | 7 | -1 | 129440805 | 129440813 | 122297291 | 122297299 | -8541 | 0.0000664 | 0.509 | 11.6843 | TTCTTAGAA
4828 | Intron | 7 | -1 | 129440805 | 129440813 | 122297291 | 122297299 | -8541 | 0.0000664 | 0.509 | 11.6843 | TTCTTAGAA
4830 | Intron | 7 | -1 | 129440805 | 129440813 | 122297291 | 122297299 | -8541 | 0.0000664 | 0.509 | 11.6843 | TTCTTAGAA
2836 | Intron | 7 | 1 | 129440805 | 129440813 | 122297291 | 122297299 | -8541 | 0.0000403 | 0.497 | 12.0814 | TTCTAAGAA
2850 | Intron | 7 | 1 | 129440805 | 129440813 | 122297291 | 122297299 | -8541 | 0.0000403 | 0.497 | 12.0814 | TTCTAAGAA
2852 | Intron | 7 | 1 | 129440805 | 129440813 | 122297291 | 122297299 | -8541 | 0.0000403 | 0.497 | 12.0814 | TTCTAAGAA
2857 | Intron | 7 | 1 | 129440805 | 129440813 | 122297291 | 122297299 | -8541 | 0.0000403 | 0.497 | 12.0814 | TTCTAAGAA
5731 | Intron | 7 | -1 | 129442551 | 129442559 | 122299037 | 122299045 | -10287 | 0.0000852 | 0.532 | 11.4857 | TTCTGAGAA
5737 | Intron | 7 | -1 | 129442551 | 129442559 | 122299037 | 122299045 | -10287 | 0.0000852 | 0.532 | 11.4857 | TTCTGAGAA
5741 | Intron | 7 | -1 | 129442551 | 129442559 | 122299037 | 122299045 | -10287 | 0.0000852 | 0.532 | 11.4857 | TTCTGAGAA
5745 | Intron | 7 | -1 | 129442551 | 129442559 | 122299037 | 122299045 | -10287 | 0.0000852 | 0.532 | 11.4857 | TTCTGAGAA
5730 | Intron | 7 | 1 | 129442551 | 129442559 | 122299037 | 122299045 | -10287 | 0.0000852 | 0.532 | 11.4857 | TTCTCAGAA
5736 | Intron | 7 | 1 | 129442551 | 129442559 | 122299037 | 122299045 | -10287 | 0.0000852 | 0.532 | 11.4857 | TTCTCAGAA
5740 | Intron | 7 | 1 | 129442551 | 129442559 | 122299037 | 122299045 | -10287 | 0.0000852 | 0.532 | 11.4857 | TTCTCAGAA
5744 | Intron | 7 | 1 | 129442551 | 129442559 | 122299037 | 122299045 | -10287 | 0.0000852 | 0.532 | 11.4857 | TTCTCAGAA
1737 | Intron | 7 | -1 | 129448344 | 129448352 | 122304830 | 122304838 | -16080 | 0.0000179 | 0.369 | 12.4785 | TTCCAAGAA
1748 | Intron | 7 | -1 | 129448344 | 129448352 | 122304830 | 122304838 | -16080 | 0.0000179 | 0.369 | 12.4785 | TTCCAAGAA
1752 | Intron | 7 | -1 | 129448344 | 129448352 | 122304830 | 122304838 | -16080 | 0.0000179 | 0.369 | 12.4785 | TTCCAAGAA
1757 | Intron | 7 | -1 | 129448344 | 129448352 | 122304830 | 122304838 | -16080 | 0.0000179 | 0.369 | 12.4785 | TTCCAAGAA
2192 | Intron | 7 | 1 | 129448344 | 129448352 | 122304830 | 122304838 | -16080 | 0.0000294 | 0.486 | 12.1806 | TTCTTGGAA
2203 | Intron | 7 | 1 | 129448344 | 129448352 | 122304830 | 122304838 | -16080 | 0.0000294 | 0.486 | 12.1806 | TTCTTGGAA
2207 | Intron | 7 | 1 | 129448344 | 129448352 | 122304830 | 122304838 | -16080 | 0.0000294 | 0.486 | 12.1806 | TTCTTGGAA
2212 | Intron | 7 | 1 | 129448344 | 129448352 | 122304830 | 122304838 | -16080 | 0.0000294 | 0.486 | 12.1806 | TTCTTGGAA
3557 | Intron | 7 | -1 | 129456838 | 129456846 | 122313324 | 122313332 | -24574 | 0.0000524 | 0.509 | 11.9821 | TTCTCGGAA

3571 | Intron | 7 | -1 | 129456838 | 129456846 | 122313324 | 122313332 | -24574 | 0.0000524 | 0.509 | 11.9821 | TTCTCGGAA
3575 | Intron | 7 | -1 | 129456838 | 129456846 | 122313324 | 122313332 | -24574 | 0.0000524 | 0.509 | 11.9821 | TTCTCGGAA
3579 | Intron | 7 | -1 | 129456838 | 129456846 | 122313324 | 122313332 | -24574 | 0.0000524 | 0.509 | 11.9821 | TTCTCGGAA
4269 | Intron | 7 | 1 | 129456838 | 129456846 | 122313324 | 122313332 | -24574 | 0.0000604 | 0.509 | 11.8828 | TTCCGAGAA
4283 | Intron | 7 | 1 | 129456838 | 129456846 | 122313324 | 122313332 | -24574 | 0.0000604 | 0.509 | 11.8828 | TTCCGAGAA
4287 | Intron | 7 | 1 | 129456838 | 129456846 | 122313324 | 122313332 | -24574 | 0.0000604 | 0.509 | 11.8828 | TTCCGAGAA
4291 | Intron | 7 | 1 | 129456838 | 129456846 | 122313324 | 122313332 | -24574 | 0.0000604 | 0.509 | 11.8828 | TTCCGAGAA
4266 | Intron | 7 | -1 | 129459662 | 129459670 | 122316148 | 122316156 | -27398 | 0.0000604 | 0.509 | 11.8828 | TTCCCAGAA
4280 | Intron | 7 | -1 | 129459662 | 129459670 | 122316148 | 122316156 | -27398 | 0.0000604 | 0.509 | 11.8828 | TTCCCAGAA
4284 | Intron | 7 | -1 | 129459662 | 129459670 | 122316148 | 122316156 | -27398 | 0.0000604 | 0.509 | 11.8828 | TTCCCAGAA
4288 | Intron | 7 | -1 | 129459662 | 129459670 | 122316148 | 122316156 | -27398 | 0.0000604 | 0.509 | 11.8828 | TTCCCAGAA
3558 | Intron | 7 | 1 | 129459662 | 129459670 | 122316148 | 122316156 | -27398 | 0.0000524 | 0.509 | 11.9821 | TTCTGGGAA
3572 | Intron | 7 | 1 | 129459662 | 129459670 | 122316148 | 122316156 | -27398 | 0.0000524 | 0.509 | 11.9821 | TTCTGGGAA
3576 | Intron | 7 | 1 | 129459662 | 129459670 | 122316148 | 122316156 | -27398 | 0.0000524 | 0.509 | 11.9821 | TTCTGGGAA
3580 | Intron | 7 | 1 | 129459662 | 129459670 | 122316148 | 122316156 | -27398 | 0.0000524 | 0.509 | 11.9821 | TTCTGGGAA
1217 | Intron | 7 | -1 | 129463789 | 129463797 | 122320275 | 122320283 | -31525 | 0.000013 | 0.328 | 12.5777 | TTCCTGGAA
1230 | Intron | 7 | -1 | 129463789 | 129463797 | 122320275 | 122320283 | -31525 | 0.000013 | 0.328 | 12.5777 | TTCCTGGAA
1233 | Intron | 7 | -1 | 129463789 | 129463797 | 122320275 | 122320283 | -31525 | 0.000013 | 0.328 | 12.5777 | TTCCTGGAA
1238 | Intron | 7 | -1 | 129463789 | 129463797 | 122320275 | 122320283 | -31525 | 0.000013 | 0.328 | 12.5777 | TTCCTGGAA
411 | Intron | 7 | 1 | 129463789 | 129463797 | 122320275 | 122320283 | -31525 | 0.00000403 | 0.262 | 12.9748 | TTCCAGGAA
417 | Intron | 7 | 1 | 129463789 | 129463797 | 122320275 | 122320283 | -31525 | 0.00000403 | 0.262 | 12.9748 | TTCCAGGAA
419 | Intron | 7 | 1 | 129463789 | 129463797 | 122320275 | 122320283 | -31525 | 0.00000403 | 0.262 | 12.9748 | TTCCAGGAA
421 | Intron | 7 | 1 | 129463789 | 129463797 | 122320275 | 122320283 | -31525 | 0.00000403 | 0.262 | 12.9748 | TTCCAGGAA
2189 | Intron | 7 | -1 | 129466014 | 129466022 | 122322500 | 122322508 | -33750 | 0.0000294 | 0.486 | 12.1806 | TTCTTGGAA
2200 | Intron | 7 | -1 | 129466014 | 129466022 | 122322500 | 122322508 | -33750 | 0.0000294 | 0.486 | 12.1806 | TTCTTGGAA
2204 | Intron | 7 | -1 | 129466014 | 129466022 | 122322500 | 122322508 | -33750 | 0.0000294 | 0.486 | 12.1806 | TTCTTGGAA
2209 | Intron | 7 | -1 | 129466014 | 129466022 | 122322500 | 122322508 | -33750 | 0.0000294 | 0.486 | 12.1806 | TTCTTGGAA
1738 | Intron | 7 | 1 | 129466014 | 129466022 | 122322500 | 122322508 | -33750 | 0.0000179 | 0.369 | 12.4785 | TTCCAAGAA

1749 | Intron | 7 | 1 | 129466014 | 129466022 | 122322500 | 122322508 | -33750 | 0.0000179 | 0.369 | 12.4785 | TTCCAAGAA
1753 | Intron | 7 | 1 | 129466014 | 129466022 | 122322500 | 122322508 | -33750 | 0.0000179 | 0.369 | 12.4785 | TTCCAAGAA
1758 | Intron | 7 | 1 | 129466014 | 129466022 | 122322500 | 122322508 | -33750 | 0.0000179 | 0.369 | 12.4785 | TTCCAAGAA
2190 | Intron | 7 | -1 | 129468905 | 129468913 | 122325391 | 122325399 | -36641 | 0.0000294 | 0.486 | 12.1806 | TTCTTGGAA
2201 | Intron | 7 | -1 | 129468905 | 129468913 | 122325391 | 122325399 | -36641 | 0.0000294 | 0.486 | 12.1806 | TTCTTGGAA
2205 | Intron | 7 | -1 | 129468905 | 129468913 | 122325391 | 122325399 | -36641 | 0.0000294 | 0.486 | 12.1806 | TTCTTGGAA
2210 | Intron | 7 | -1 | 129468905 | 129468913 | 122325391 | 122325399 | -36641 | 0.0000294 | 0.486 | 12.1806 | TTCTTGGAA
1739 | Intron | 7 | 1 | 129468905 | 129468913 | 122325391 | 122325399 | -36641 | 0.0000179 | 0.369 | 12.4785 | TTCCAAGAA
1750 | Intron | 7 | 1 | 129468905 | 129468913 | 122325391 | 122325399 | -36641 | 0.0000179 | 0.369 | 12.4785 | TTCCAAGAA
1754 | Intron | 7 | 1 | 129468905 | 129468913 | 122325391 | 122325399 | -36641 | 0.0000179 | 0.369 | 12.4785 | TTCCAAGAA
1759 | Intron | 7 | 1 | 129468905 | 129468913 | 122325391 | 122325399 | -36641 | 0.0000179 | 0.369 | 12.4785 | TTCCAAGAA
5733 | Intron | 7 | -1 | 129490200 | 129490208 | 122346686 | 122346694 | -57936 | 0.0000852 | 0.532 | 11.4857 | TTCTCAGAA
5739 | Intron | 7 | -1 | 129490200 | 129490208 | 122346686 | 122346694 | -57936 | 0.0000852 | 0.532 | 11.4857 | TTCTCAGAA
5743 | Intron | 7 | -1 | 129490200 | 129490208 | 122346686 | 122346694 | -57936 | 0.0000852 | 0.532 | 11.4857 | TTCTCAGAA
5747 | Intron | 7 | -1 | 129490200 | 129490208 | 122346686 | 122346694 | -57936 | 0.0000852 | 0.532 | 11.4857 | TTCTCAGAA
5732 | Intron | 7 | 1 | 129490200 | 129490208 | 122346686 | 122346694 | -57936 | 0.0000852 | 0.532 | 11.4857 | TTCTGAGAA
5738 | Intron | 7 | 1 | 129490200 | 129490208 | 122346686 | 122346694 | -57936 | 0.0000852 | 0.532 | 11.4857 | TTCTGAGAA
5742 | Intron | 7 | 1 | 129490200 | 129490208 | 122346686 | 122346694 | -57936 | 0.0000852 | 0.532 | 11.4857 | TTCTGAGAA
5746 | Intron | 7 | 1 | 129490200 | 129490208 | 122346686 | 122346694 | -57936 | 0.0000852 | 0.532 | 11.4857 | TTCTGAGAA
4267 | Intron | 7 | -1 | 129494153 | 129494161 | 122350639 | 122350647 | -61889 | 0.0000604 | 0.509 | 11.8828 | TTCCGAGAA
4281 | Intron | 7 | -1 | 129494153 | 129494161 | 122350639 | 122350647 | -61889 | 0.0000604 | 0.509 | 11.8828 | TTCCGAGAA
4285 | Intron | 7 | -1 | 129494153 | 129494161 | 122350639 | 122350647 | -61889 | 0.0000604 | 0.509 | 11.8828 | TTCCGAGAA
4289 | Intron | 7 | -1 | 129494153 | 129494161 | 122350639 | 122350647 | -61889 | 0.0000604 | 0.509 | 11.8828 | TTCCGAGAA
3559 | Intron | 7 | 1 | 129494153 | 129494161 | 122350639 | 122350647 | -61889 | 0.0000524 | 0.509 | 11.9821 | TTCTCGGAA
3573 | Intron | 7 | 1 | 129494153 | 129494161 | 122350639 | 122350647 | -61889 | 0.0000524 | 0.509 | 11.9821 | TTCTCGGAA
3577 | Intron | 7 | 1 | 129494153 | 129494161 | 122350639 | 122350647 | -61889 | 0.0000524 | 0.509 | 11.9821 | TTCTCGGAA
3581 | Intron | 7 | 1 | 129494153 | 129494161 | 122350639 | 122350647 | -61889 | 0.0000524 | 0.509 | 11.9821 | TTCTCGGAA
4268 | Intron | 7 | -1 | 129495227 | 129495235 | 122351713 | 122351721 | -62963 | 0.0000604 | 0.509 | 11.8828 | TTCCCAGAA

4282 | Intron | 7 | -1 | 129495227 | 129495235 | 122351713 | 122351721 | -62963 | 0.0000604 | 0.509 | 11.8828 | TTCCCAGAA
4286 | Intron | 7 | -1 | 129495227 | 129495235 | 122351713 | 122351721 | -62963 | 0.0000604 | 0.509 | 11.8828 | TTCCCAGAA
4290 | Intron | 7 | -1 | 129495227 | 129495235 | 122351713 | 122351721 | -62963 | 0.0000604 | 0.509 | 11.8828 | TTCCCAGAA
3560 | Intron | 7 | 1 | 129495227 | 129495235 | 122351713 | 122351721 | -62963 | 0.0000524 | 0.509 | 11.9821 | TTCTGGGAA
3574 | Intron | 7 | 1 | 129495227 | 129495235 | 122351713 | 122351721 | -62963 | 0.0000524 | 0.509 | 11.9821 | TTCTGGGAA
3578 | Intron | 7 | 1 | 129495227 | 129495235 | 122351713 | 122351721 | -62963 | 0.0000524 | 0.509 | 11.9821 | TTCTGGGAA
3582 | Intron | 7 | 1 | 129495227 | 129495235 | 122351713 | 122351721 | -62963 | 0.0000524 | 0.509 | 11.9821 | TTCTGGGAA
1218 | Intron | 7 | -1 | 129499200 | 129499208 | 122355686 | 122355694 | -66936 | 0.000013 | 0.328 | 12.5777 | TTCTAGGAA
1231 | Intron | 7 | -1 | 129499200 | 129499208 | 122355686 | 122355694 | -66936 | 0.000013 | 0.328 | 12.5777 | TTCTAGGAA
1234 | Intron | 7 | -1 | 129499200 | 129499208 | 122355686 | 122355694 | -66936 | 0.000013 | 0.328 | 12.5777 | TTCTAGGAA
1239 | Intron | 7 | -1 | 129499200 | 129499208 | 122355686 | 122355694 | -66936 | 0.000013 | 0.328 | 12.5777 | TTCTAGGAA
2837 | Intron | 7 | 1 | 129499200 | 129499208 | 122355686 | 122355694 | -66936 | 0.0000403 | 0.497 | 12.0814 | TTCCTAGAA
2851 | Intron | 7 | 1 | 129499200 | 129499208 | 122355686 | 122355694 | -66936 | 0.0000403 | 0.497 | 12.0814 | TTCCTAGAA
2853 | Intron | 7 | 1 | 129499200 | 129499208 | 122355686 | 122355694 | -66936 | 0.0000403 | 0.497 | 12.0814 | TTCCTAGAA
2858 | Intron | 7 | 1 | 129499200 | 129499208 | 122355686 | 122355694 | -66936 | 0.0000403 | 0.497 | 12.0814 | TTCCTAGAA
2213 | Intron | 7 | -1 | 129564038 | 129564046 | 122420524 | 122420532 | -131774 | 0.0000294 | 0.486 | 12.1806 | TTCTTGGAA
1760 | Intron | 7 | 1 | 129564038 | 129564046 | 122420524 | 122420532 | -131774 | 0.0000179 | 0.369 | 12.4785 | TTCCAAGAA
2193 | Intron | 7 | -1 | 129572753 | 129572761 | 122429239 | 122429247 | -140489 | 0.0000294 | 0.486 | 12.1806 | TTCTTGGAA
1740 | Intron | 7 | 1 | 129572753 | 129572761 | 122429239 | 122429247 | -140489 | 0.0000179 | 0.369 | 12.4785 | TTCCAAGAA
4821 | Intron | 7 | -1 | 129575027 | 129575035 | 122431513 | 122431521 | -142763 | 0.0000664 | 0.509 | 11.6843 | TTCTTAGAA
2839 | Intron | 7 | 1 | 129575027 | 129575035 | 122431513 | 122431521 | -142763 | 0.0000403 | 0.497 | 12.0814 | TTCTAAGAA
2838 | Intron | 7 | -1 | 129575936 | 129575944 | 122432422 | 122432430 | -143672 | 0.0000403 | 0.497 | 12.0814 | TTCCTAGAA
1220 | Intron | 7 | 1 | 129575936 | 129575944 | 122432422 | 122432430 | -143672 | 0.000013 | 0.328 | 12.5777 | TTCTAGGAA
1219 | Intron | 7 | -1 | 129583808 | 129583816 | 122440294 | 122440302 | -151544 | 0.000013 | 0.328 | 12.5777 | TTCTAGGAA
2840 | Intron | 7 | 1 | 129583808 | 129583816 | 122440294 | 122440302 | -151544 | 0.0000403 | 0.497 | 12.0814 | TTCCTAGAA
4270 | Intron | 7 | -1 | 129588888 | 129588896 | 122445374 | 122445382 | -156624 | 0.0000604 | 0.509 | 11.8828 | TTCCCAGAA
3561 | Intron | 7 | 1 | 129588888 | 129588896 | 122445374 | 122445382 | -156624 | 0.0000524 | 0.509 | 11.9821 | TTCTGGGAA
2194 | Intron | 7 | -1 | 129591999 | 129592007 | 122448485 | 122448493 | -159735 | 0.0000294 | 0.486 | 12.1806 | TTCTTGGAA

1741 Intron 7 1 129591999 129592007 122448485 122448493 -159735 0.0000179 0.369 12.4785 TTCCAAGAA
5735 Intron 7 -1 129604377 129604385 122460863 122460871 -172113 0.0000852 0.532 11.4857 TTCTGAGAA
5734 Intron 7 1 129604377 129604385 122460863 122460871 -172113 0.0000852 0.532 11.4857 TTCTCAGAA
4997 Intron 7 1 129610102 129610110 122466588 122466596 -177838 0.0000704 0.532 11.585 TTCGTGGAA
3562 Intron 7 -1 129632936 129632944 122489422 122489430 -200672 0.0000524 0.509 11.9821 TTCTGGGAA
4271 Intron 7 1 129632936 129632944 122489422 122489430 -200672 0.0000604 0.509 11.8828 TTCCCAGAA
2195 Intron 7 -1 129635550 129635558 122492036 122492044 -203286 0.0000294 0.486 12.1806 TTCTTGGAA
1742 Intron 7 1 129635550 129635558 122492036 122492044 -203286 0.0000179 0.369 12.4785 TTCCAAGAA
412 Intron 7 -1 129635674 129635682 122492160 122492168 -203410 0.00000403 0.262 12.9748 TTCCAGGAA
1221 Intron 7 1 129635674 129635682 122492160 122492168 -203410 0.000013 0.328 12.5777 TTCCTGGAA
3563 Intron 7 -1 129644176 129644184 122500662 122500670 -211912 0.0000524 0.509 11.9821 TTCTGGGAA
4272 Intron 7 1 129644176 129644184 122500662 122500670 -211912 0.0000604 0.509 11.8828 TTCCCAGAA
3564 Intron 7 -1 129648147 129648155 122504633 122504641 -215883 0.0000524 0.509 11.9821 TTCTGGGAA
4273 Intron 7 1 129648147 129648155 122504633 122504641 -215883 0.0000604 0.509 11.8828 TTCCCAGAA
2841 Intron 7 -1 129648992 129649000 122505478 122505486 -216728 0.0000403 0.497 12.0814 TTCTAAGAA
4822 Intron 7 1 129648992 129649000 122505478 122505486 -216728 0.0000664 0.509 11.6843 TTCTTAGAA
4823 Intron 7 -1 129658198 129658206 122514684 122514692 -225934 0.0000664 0.509 11.6843 TTCTTAGAA
2842 Intron 7 1 129658198 129658206 122514684 122514692 -225934 0.0000403 0.497 12.0814 TTCTAAGAA
3565 Intron 7 -1 129663907 129663915 122520393 122520401 -231643 0.0000524 0.509 11.9821 TTCTGGGAA
4274 Intron 7 1 129663907 129663915 122520393 122520401 -231643 0.0000604 0.509 11.8828 TTCCCAGAA
3566 Intron 7 -1 129666397 129666405 122522883 122522891 -234133 0.0000524 0.509 11.9821 TTCTGGGAA
4275 Intron 7 1 129666397 129666405 122522883 122522891 -234133 0.0000604 0.509 11.8828 TTCCCAGAA
2196 Intron 7 -1 129666911 129666919 122523397 122523405 -234647 0.0000294 0.486 12.1806 TTCTTGGAA
1743 Intron 7 1 129666911 129666919 122523397 122523405 -234647 0.0000179 0.369 12.4785 TTCCAAGAA
2843 Intron 7 -1 129668108 129668116 122524594 122524602 -235844 0.0000403 0.497 12.0814 TTCCTAGAA
1222 Intron 7 1 129668108 129668116 122524594 122524602 -235844 0.000013 0.328 12.5777 TTCTAGGAA
4998 Intron 7 1 129683479 129683487 122539965 122539973 -251215 0.0000704 0.532 11.585 TTCGTGGAA
1744 Intron 7 -1 129686637 129686645 122543123 122543131 -254373 0.0000179 0.369 12.4785 TTCCAAGAA

2197 Intron 7 1 129686637 129686645 122543123 122543131 -254373 0.0000294 0.486 12.1806 TTCTTGGAA
4824 Intron 7 -1 129690577 129690585 122547063 122547071 -258313 0.0000664 0.509 11.6843 TTCTTAGAA
2845 Intron 7 1 129690577 129690585 122547063 122547071 -258313 0.0000403 0.497 12.0814 TTCTAAGAA
413 Intron 7 -1 129696318 129696326 122552804 122552812 -264054 0.00000403 0.262 12.9748 TTCCAGGAA
1224 Intron 7 1 129696318 129696326 122552804 122552812 -264054 0.000013 0.328 12.5777 TTCCTGGAA
2844 Intron 7 -1 129702448 129702456 122558934 122558942 -270184 0.0000403 0.497 12.0814 TTCTAAGAA
4825 Intron 7 1 129702448 129702456 122558934 122558942 -270184 0.0000664 0.509 11.6843 TTCTTAGAA
414 Intron 7 -1 129707818 129707826 122564304 122564312 -275554 0.00000403 0.262 12.9748 TTCCAGGAA
1225 Intron 7 1 129707818 129707826 122564304 122564312 -275554 0.000013 0.328 12.5777 TTCCTGGAA
1223 Intron 7 -1 129710708 129710716 122567194 122567202 -278444 0.000013 0.328 12.5777 TTCTAGGAA
2846 Intron 7 1 129710708 129710716 122567194 122567202 -278444 0.0000403 0.497 12.0814 TTCCTAGAA
3567 Intron 7 -1 129710881 129710889 122567367 122567375 -278617 0.0000524 0.509 11.9821 TTCTGGGAA
4276 Intron 7 1 129710881 129710889 122567367 122567375 -278617 0.0000604 0.509 11.8828 TTCCCAGAA
2198 Intron 7 -1 129713119 129713127 122569605 122569613 -280855 0.0000294 0.486 12.1806 TTCTTGGAA
1745 Intron 7 1 129713119 129713127 122569605 122569613 -280855 0.0000179 0.369 12.4785 TTCCAAGAA
3568 Intron 7 -1 129716132 129716140 122572618 122572626 -283868 0.0000524 0.509 11.9821 TTCTGGGAA
4278 Intron 7 1 129716132 129716140 122572618 122572626 -283868 0.0000604 0.509 11.8828 TTCCCAGAA
4277 Intron 7 -1 129717107 129717115 122573593 122573601 -284843 0.0000604 0.509 11.8828 TTCCGAGAA
3570 Intron 7 1 129717107 129717115 122573593 122573601 -284843 0.0000524 0.509 11.9821 TTCTCGGAA
3569 Intron 7 -1 129721621 129721629 122578107 122578115 -289357 0.0000524 0.509 11.9821 TTCTGGGAA
4279 Intron 7 1 129721621 129721629 122578107 122578115 -289357 0.0000604 0.509 11.8828 TTCCCAGAA
1226 Intron 7 -1 129745699 129745707 122602185 122602193 -313435 0.000013 0.328 12.5777 TTCTAGGAA
1235 Intron 7 -1 129745699 129745707 122602185 122602193 -313435 0.000013 0.328 12.5777 TTCTAGGAA
2849 Intron 7 1 129745699 129745707 122602185 122602193 -313435 0.0000403 0.497 12.0814 TTCCTAGAA
2856 Intron 7 1 129745699 129745707 122602185 122602193 -313435 0.0000403 0.497 12.0814 TTCCTAGAA
2847 Intron 7 -1 129761438 129761446 122617924 122617932 -329174 0.0000403 0.497 12.0814 TTCTAAGAA
2854 Intron 7 -1 129761438 129761446 122617924 122617932 -329174 0.0000403 0.497 12.0814 TTCTAAGAA
4826 Intron 7 1 129761438 129761446 122617924 122617932 -329174 0.0000664 0.509 11.6843 TTCTTAGAA

4829 Intron 7 1 129761438 129761446 122617924 122617932 -329174 0.0000664 0.509 11.6843 TTCTTAGAA
2848 Intron 7 -1 129764728 129764736 122621214 122621222 -332464 0.0000403 0.497 12.0814 TTCCTAGAA
2855 Intron 7 -1 129764728 129764736 122621214 122621222 -332464 0.0000403 0.497 12.0814 TTCCTAGAA
1227 Intron 7 1 129764728 129764736 122621214 122621222 -332464 0.000013 0.328 12.5777 TTCTAGGAA
1236 Intron 7 1 129764728 129764736 122621214 122621222 -332464 0.000013 0.328 12.5777 TTCTAGGAA
1746 Intron 7 -1 129766235 129766243 122622721 122622729 -333971 0.0000179 0.369 12.4785 TTCCAAGAA
1755 Intron 7 -1 129766235 129766243 122622721 122622729 -333971 0.0000179 0.369 12.4785 TTCCAAGAA
2199 Intron 7 1 129766235 129766243 122622721 122622729 -333971 0.0000294 0.486 12.1806 TTCTTGGAA
2208 Intron 7 1 129766235 129766243 122622721 122622729 -333971 0.0000294 0.486 12.1806 TTCTTGGAA
418 3' UTR 7 -1 129772065 129772073 122628551 122628559 -339801 0.00000403 0.262 12.9748 TTCCAGGAA
416 Exon 7 -1 129772065 129772073 122628551 122628559 -339801 0.00000403 0.262 12.9748 TTCCAGGAA
415 Intron 7 -1 129772065 129772073 122628551 122628559 -339801 0.00000403 0.262 12.9748 TTCCAGGAA
420 Intron 7 -1 129772065 129772073 122628551 122628559 -339801 0.00000403 0.262 12.9748 TTCCAGGAA
1232 3' UTR 7 1 129772065 129772073 122628551 122628559 -339801 0.000013 0.328 12.5777 TTCCTGGAA
1229 Exon 7 1 129772065 129772073 122628551 122628559 -339801 0.000013 0.328 12.5777 TTCCTGGAA
1228 Intron 7 1 129772065 129772073 122628551 122628559 -339801 0.000013 0.328 12.5777 TTCCTGGAA
1237 Intron 7 1 129772065 129772073 122628551 122628559 -339801 0.000013 0.328 12.5777 TTCCTGGAA

* relative to TSS (122288751, Strand: 1, mm10)


APPENDIX 16: 169 POTENTIAL STAT5A BINDING SITES FOR GENE WIPF1 (LONG MOTIF)

The distance between each potential STAT5A binding site and the TSS was calculated in the same orientation for both strands: the end coordinate (mm10) of each potential STAT5A binding site was subtracted from the TSS coordinate. The table is sorted in ascending order by the field “End position (mm10)*”.
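
As a minimal sketch of this calculation (the function name `offset_from_tss` and the tuple layout are assumptions made for illustration, not part of the thesis pipeline), the following Python snippet subtracts the mm10 end coordinate of a site from the TSS coordinate given in the footnote of Table 14 and sorts the results in ascending order. The three example rows are the first three promoter sites listed in Table 14.

```python
# Hypothetical helper: offset of a site relative to the TSS,
# computed as TSS minus end coordinate, regardless of strand.
def offset_from_tss(tss: int, site_end: int) -> int:
    return tss - site_end

# TSS of Wipf1 (mm10) as stated in the footnote of Table 14.
WIPF1_TSS_MM10 = 73530734

# (binding site ID, end coordinate in mm10) for three rows of Table 14.
sites = [(24031, 73529808), (24029, 73530512), (24030, 73530002)]

# Attach the offset to each site and sort ascending by that offset.
with_offsets = sorted(
    ((site_id, offset_from_tss(WIPF1_TSS_MM10, end)) for site_id, end in sites),
    key=lambda pair: pair[1],
)
```

Because Wipf1 lies on the negative strand, sites downstream of the TSS have smaller genomic coordinates, so the subtraction yields positive offsets for them.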

Potential STAT5 binding sites that FIMO found on the strand opposite to the input sequence were assigned the sequence type (promoter, exon, intron, 3’ UTR, 5’ UTR) and the gene name of the input sequence. The gene Wipf1 is located on the negative strand.
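
A site reported on both strands at the same coordinates appears in the table as two rows whose sequences are reverse complements of each other. The sketch below illustrates this relationship; the helper `reverse_complement` is an assumption made for illustration and is not code from the thesis. The example sequences are taken from two rows of Table 14 that share the same coordinates (IDs 33484 and 34319).

```python
# Translation table mapping each DNA base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

# Site 33484 (strand 1) and site 34319 (strand -1) share coordinates.
plus_strand_site = "CGTTTACAGGAAATA"
minus_strand_site = reverse_complement(plus_strand_site)
```

Here `minus_strand_site` evaluates to `TATTTCCTGTAAACG`, the sequence listed for the site on the opposite strand.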

Table 14

169 potential STAT5A binding sites for gene Wipf1 (long motif)

Columns: Binding site ID; Sequence type; Chromosome; Strand; Potential binding site start (mm9); Potential binding site end (mm9); Potential binding site start (mm10); Potential binding site end (mm10); End position (mm10)*; p_value; q_value; Score; Potential STAT5 binding site

24029 Promoter 2 1 73368555 73368569 73530498 73530512 222 0.00000477 0.49 15.4732 GGATTCTTGGAATCT
24030 Promoter 2 1 73368045 73368059 73529988 73530002 732 0.00000477 0.49 15.4732 GGATTCTTGGAATCT
24031 Promoter 2 1 73367851 73367865 73529794 73529808 926 0.00000477 0.49 15.4732 GGATTCTTGGAATCT
25875 Exon 2 1 73367728 73367742 73529671 73529685 1049 0.0000184 0.546 13.3711 TTTTTCTGGGAAACT
25877 Exon 2 1 73367473 73367487 73529416 73529430 1304 0.0000184 0.546 13.3711 TTTTTCTGGGAAACT
33207 Intron 2 -1 73366948 73366962 73528891 73528905 1829 0.0000866 0.65 6.64431 TATTTGTAGGAAATG
33484 Intron 2 1 73366130 73366144 73528073 73528087 2647 0.0000905 0.658 6.53921 CGTTTACAGGAAATA
34319 Intron 2 -1 73366130 73366144 73528073 73528087 2647 0.0000987 0.662 6.329 TATTTCCTGTAAACG

31654 Intron 2 1 73365245 73365259 73527188 73527202 3532 0.0000719 0.634 7.16984 AACTTCCAGGAACTC
24072 Intron 2 -1 73365245 73365259 73527188 73527202 3532 0.00000519 0.49 15.3681 GAGTTCCTGGAAGTT
26536 Intron 2 1 73363499 73363513 73525442 73525456 5278 0.000023 0.547 12.8455 CAGTTCTAAGAACCA
27553 Intron 2 -1 73363499 73363513 73525442 73525456 5278 0.0000321 0.57 11.8996 TGGTTCTTAGAACTG
25765 Intron 2 1 73362864 73362878 73524807 73524821 5913 0.0000175 0.546 13.4762 TCATTCTAGGAAGCC
32161 Intron 2 1 73362645 73362659 73524588 73524602 6132 0.000077 0.637 6.95963 GATTTCTGGGACCTT
23364 Intron 2 -1 73359250 73359264 73521193 73521207 9527 0.000000465 0.427 17.4702 GAATTCTGGGAACTT
31454 Intron 2 -1 73357873 73357887 73519816 73519830 10904 0.0000695 0.631 7.27495 GAATTGCTGGAACTT
27550 Intron 2 1 73356563 73356577 73518506 73518520 12214 0.0000321 0.57 11.8996 TCCTTCCTGGAAGCC
31651 Intron 2 -1 73355723 73355737 73517666 73517680 13054 0.0000719 0.634 7.16984 TATTTCTCGGACATA
29745 Intron 2 -1 73355592 73355606 73517535 73517549 13185 0.0000527 0.605 9.16685 GTGTTCGGAGAAGTC
27002 Intron 2 1 73355244 73355258 73517187 73517201 13533 0.000027 0.56 12.4251 TGTTTCCTGGAAGAG
26915 Intron 2 -1 73355244 73355258 73517187 73517201 13533 0.000026 0.554 12.5302 CTCTTCCAGGAAACA
31656 Intron 2 -1 73353934 73353948 73515877 73515891 14843 0.0000719 0.634 7.16984 CAATTCCAGGCACTG
25564 Intron 2 -1 73350914 73350928 73512857 73512871 17863 0.0000157 0.535 13.6864 CATTTCTAAGAACCT
30829 Intron 2 1 73349939 73349953 73511882 73511896 18838 0.0000641 0.626 7.59026 GAATACCTGGAAACT
33962 Intron 2 -1 73348189 73348203 73510132 73510146 20588 0.0000946 0.659 6.4341 GGATGCTCGGAATTA
31235 Intron 2 1 73346788 73346802 73508731 73508745 21989 0.0000675 0.631 7.38005 CATTTCCTGGAGACC
32570 Intron 2 -1 73346788 73346802 73508731 73508745 21989 0.00008 0.641 6.85453 GGTCTCCAGGAAATG
33960 Intron 2 1 73345466 73345480 73507409 73507423 23311 0.0000946 0.659 6.4341 AAGTTCTTGGAATCC
24824 Intron 2 -1 73345466 73345480 73507409 73507423 23311 0.0000104 0.496 14.4221 GGATTCCAAGAACTT
26535 Intron 2 1 73344861 73344875 73506804 73506818 23916 0.000023 0.547 12.8455 CATTTCCCAGAAAAG
24936 Intron 2 -1 73344861 73344875 73506804 73506818 23916 0.0000111 0.496 14.317 CTTTTCTGGGAAATG
32158 Intron 2 1 73342947 73342961 73504890 73504904 25830 0.000077 0.637 6.95963 AGATTCTAGGAATCC
25559 Intron 2 -1 73342947 73342961 73504890 73504904 25830 0.0000157 0.535 13.6864 GGATTCCTAGAATCT
26534 Intron 2 1 73341794 73341808 73503737 73503751 26983 0.000023 0.547 12.8455 CAGTTCCTAGAAGTG
24406 Intron 2 -1 73341794 73341808 73503737 73503751 26983 0.00000723 0.49 14.9476 CACTTCTAGGAACTG
32163 Intron 2 1 73340800 73340814 73502743 73502757 27977 0.000077 0.637 6.95963 GATTTCTGGGACCTT
29072 Intron 2 1 73338347 73338361 73500290 73500304 30430 0.0000463 0.598 10.2179 CTTTTCCGAGAAAAA
28227 Intron 2 -1 73338347 73338361 73500290 73500304 30430 0.0000379 0.578 11.269 TTTTTCTCGGAAAAG
26116 Intron 2 1 73337270 73337284 73499213 73499227 31507 0.0000202 0.547 13.1609 TGCTTCTGGGAAGTG
27308 Intron 2 -1 73337270 73337284 73499213 73499227 31507 0.00003 0.565 12.1098 CACTTCCCAGAAGCA
29744 Intron 2 1 73334813 73334827 73496756 73496770 33964 0.0000527 0.605 9.16685 TTGTTCTGAGAATCT
27552 Intron 2 1 73334718 73334732 73496661 73496675 34059 0.0000321 0.57 11.8996 TCCTTCCTGGAAGCC
25373 Intron 2 1 73334462 73334476 73496405 73496419 34315 0.0000142 0.535 13.8966 GAGTTCCAGGAACAA

26533 Intron 2 -1 73334462 73334476 73496405 73496419 34315 0.000023 0.547 12.8455 TTGTTCCTGGAACTC
31655 Intron 2 -1 73333878 73333892 73495821 73495835 34899 0.0000719 0.634 7.16984 TATTTCTCGGACATA
30832 Intron 2 1 73331555 73331569 73493498 73493512 37222 0.0000641 0.626 7.59026 GAATACCTGGAAACT
23775 Intron 2 1 73331276 73331290 73493219 73493233 37501 0.00000329 0.49 15.8936 GAATTCTGGGAAAAA
27637 Intron 2 -1 73331276 73331290 73493219 73493233 37501 0.0000331 0.572 11.7945 TTTTTCCCAGAATTC
25558 Intron 2 1 73329157 73329171 73491100 73491114 39620 0.0000157 0.535 13.6864 CGATTCTGAGAATTT
25567 Intron 2 -1 73329069 73329083 73491012 73491026 39708 0.0000157 0.535 13.6864 CATTTCTAAGAACCT
31238 Intron 2 1 73328404 73328418 73490347 73490361 40373 0.0000675 0.631 7.38005 CATTTCCTGGAGACC
32573 Intron 2 -1 73328404 73328418 73490347 73490361 40373 0.00008 0.641 6.85453 GGTCTCCAGGAAATG
33205 Intron 2 -1 73325719 73325733 73487662 73487676 43058 0.0000866 0.65 6.64431 TATTTGTAGGAAATG
33482 Intron 2 1 73324901 73324915 73486844 73486858 43876 0.0000905 0.658 6.53921 CGTTTACAGGAAATA
34317 Intron 2 -1 73324901 73324915 73486844 73486858 43876 0.0000987 0.662 6.329 TATTTCCTGTAAACG
32162 Intron 2 1 73324563 73324577 73486506 73486520 44214 0.000077 0.637 6.95963 AGATTCTAGGAATCC
25566 Intron 2 -1 73324563 73324577 73486506 73486520 44214 0.0000157 0.535 13.6864 GGATTCCTAGAATCT
25561 Promoter 2 1 73324420 73324434 73486363 73486377 44357 0.0000157 0.535 13.6864 CGATTCTGAGAATTT
31648 Intron 2 1 73324016 73324030 73485959 73485973 44761 0.0000719 0.634 7.16984 AACTTCCAGGAACTC
24070 Intron 2 -1 73324016 73324030 73485959 73485973 44761 0.00000519 0.49 15.3681 GAGTTCCTGGAAGTT
33963 Intron 2 1 73323398 73323412 73485341 73485355 45379 0.0000946 0.659 6.4341 AAGTTCTTGGAATCC
24825 Intron 2 -1 73323398 73323412 73485341 73485355 45379 0.0000104 0.496 14.4221 GGATTCCAAGAACTT
26539 Intron 2 1 73322793 73322807 73484736 73484750 45984 0.000023 0.547 12.8455 CATTTCCCAGAAAAG
24937 Intron 2 -1 73322793 73322807 73484736 73484750 45984 0.0000111 0.496 14.317 CTTTTCTGGGAAATG
26523 Intron 2 1 73322270 73322284 73484213 73484227 46507 0.000023 0.547 12.8455 CAGTTCTAAGAACCA
27549 Intron 2 -1 73322270 73322284 73484213 73484227 46507 0.0000321 0.57 11.8996 TGGTTCTTAGAACTG
25763 Intron 2 1 73321635 73321649 73483578 73483592 47142 0.0000175 0.546 13.4762 TCATTCTAGGAAGCC
33958 Intron 2 1 73320903 73320917 73482846 73482860 47874 0.0000946 0.659 6.4341 AAGTTCTTGGAATCC
24822 Intron 2 -1 73320903 73320917 73482846 73482860 47874 0.0000104 0.496 14.4221 GGATTCCAAGAACTT
26530 Intron 2 1 73320298 73320312 73482241 73482255 48479 0.000023 0.547 12.8455 CATTTCCCAGAAAAG
24934 Intron 2 -1 73320298 73320312 73482241 73482255 48479 0.0000111 0.496 14.317 CTTTTCTGGGAAATG
33956 Intron 2 1 73320146 73320160 73482089 73482103 48631 0.0000946 0.659 6.4341 AAGTTCTTGGAATCC
24820 Intron 2 -1 73320146 73320160 73482089 73482103 48631 0.0000104 0.496 14.4221 GGATTCCAAGAACTT
26538 Intron 2 1 73319726 73319740 73481669 73481683 49051 0.000023 0.547 12.8455 CAGTTCCTAGAAGTG
24407 Intron 2 -1 73319726 73319740 73481669 73481683 49051 0.00000723 0.49 14.9476 CACTTCTAGGAACTG
26526 Intron 2 1 73319541 73319555 73481484 73481498 49236 0.000023 0.547 12.8455 CATTTCCCAGAAAAG
24932 Intron 2 -1 73319541 73319555 73481484 73481498 49236 0.0000111 0.496 14.317 CTTTTCTGGGAAATG
26119 Intron 2 1 73318886 73318900 73480829 73480843 49891 0.0000202 0.547 13.1609 TGCTTCTGGGAAGTG

27311 Intron 2 -1 73318886 73318900 73480829 73480843 49891 0.00003 0.565 12.1098 CACTTCCCAGAAGCA
23362 Intron 2 -1 73318021 73318035 73479964 73479978 50756 0.000000465 0.427 17.4702 GAATTCTGGGAACTT
26529 Intron 2 1 73317231 73317245 73479174 73479188 51546 0.000023 0.547 12.8455 CAGTTCCTAGAAGTG
24405 Intron 2 -1 73317231 73317245 73479174 73479188 51546 0.00000723 0.49 14.9476 CACTTCTAGGAACTG
31452 Intron 2 -1 73316644 73316658 73478587 73478601 52133 0.0000695 0.631 7.27495 GAATTGCTGGAACTT
26525 Intron 2 1 73316474 73316488 73478417 73478431 52303 0.000023 0.547 12.8455 CAGTTCCTAGAAGTG
24404 Intron 2 -1 73316474 73316488 73478417 73478431 52303 0.00000723 0.49 14.9476 CACTTCTAGGAACTG
29073 Intron 2 1 73316279 73316293 73478222 73478236 52498 0.0000463 0.598 10.2179 CTTTTCCGAGAAAAA
28228 Intron 2 -1 73316279 73316293 73478222 73478236 52498 0.0000379 0.578 11.269 TTTTTCTCGGAAAAG
29740 Intron 2 -1 73314363 73314377 73476306 73476320 54414 0.0000527 0.605 9.16685 GTGTTCGGAGAAGTC
27000 Intron 2 1 73314015 73314029 73475958 73475972 54762 0.000027 0.56 12.4251 TGTTTCCTGGAAGAG
26913 Intron 2 -1 73314015 73314029 73475958 73475972 54762 0.000026 0.554 12.5302 CTCTTCCAGGAAACA
29071 Intron 2 1 73313784 73313798 73475727 73475741 54993 0.0000463 0.598 10.2179 CTTTTCCGAGAAAAA
28226 Intron 2 -1 73313784 73313798 73475727 73475741 54993 0.0000379 0.578 11.269 TTTTTCTCGGAAAAG
29070 Intron 2 1 73313027 73313041 73474970 73474984 55750 0.0000463 0.598 10.2179 CTTTTCCGAGAAAAA
28225 Intron 2 -1 73313027 73313041 73474970 73474984 55750 0.0000379 0.578 11.269 TTTTTCTCGGAAAAG
23778 Intron 2 1 73312892 73312906 73474835 73474849 55885 0.00000329 0.49 15.8936 GAATTCTGGGAAAAA
27640 Intron 2 -1 73312892 73312906 73474835 73474849 55885 0.0000331 0.572 11.7945 TTTTTCCCAGAATTC
29746 Intron 2 1 73312745 73312759 73474688 73474702 56032 0.0000527 0.605 9.16685 TTGTTCTGAGAATCT
31650 Intron 2 -1 73312705 73312719 73474648 73474662 56072 0.0000719 0.634 7.16984 CAATTCCAGGCACTG
25374 Intron 2 1 73312394 73312408 73474337 73474351 56383 0.0000142 0.535 13.8966 GAGTTCCAGGAACAA
26537 Intron 2 -1 73312394 73312408 73474337 73474351 56383 0.000023 0.547 12.8455 TTGTTCCTGGAACTC
25565 Intron 2 1 73310773 73310787 73472716 73472730 58004 0.0000157 0.535 13.6864 CGATTCTGAGAATTT
29742 Intron 2 1 73310250 73310264 73472193 73472207 58527 0.0000527 0.605 9.16685 TTGTTCTGAGAATCT
25372 Intron 2 1 73309899 73309913 73471842 73471856 58878 0.0000142 0.535 13.8966 GAGTTCCAGGAACAA
26528 Intron 2 -1 73309899 73309913 73471842 73471856 58878 0.000023 0.547 12.8455 TTGTTCCTGGAACTC
29741 Intron 2 1 73309493 73309507 73471436 73471450 59284 0.0000527 0.605 9.16685 TTGTTCTGAGAATCT
30833 Intron 2 1 73309487 73309501 73471430 73471444 59290 0.0000641 0.626 7.59026 GAATACCTGGAAACT
25371 Intron 2 1 73309142 73309156 73471085 73471099 59635 0.0000142 0.535 13.8966 GAGTTCCAGGAACAA
26524 Intron 2 -1 73309142 73309156 73471085 73471099 59635 0.000023 0.547 12.8455 TTGTTCCTGGAACTC
33206 Intron 2 -1 73307335 73307349 73469278 73469292 61442 0.0000866 0.65 6.64431 TATTTGTAGGAAATG
30831 Intron 2 1 73306992 73307006 73468935 73468949 61785 0.0000641 0.626 7.59026 GAATACCTGGAAACT
33955 Intron 2 -1 73306960 73306974 73468903 73468917 61817 0.0000946 0.659 6.4341 GGATGCTCGGAATTA
33483 Intron 2 1 73306517 73306531 73468460 73468474 62260 0.0000905 0.658 6.53921 CGTTTACAGGAAATA
34318 Intron 2 -1 73306517 73306531 73468460 73468474 62260 0.0000987 0.662 6.329 TATTTCCTGTAAACG

31239 Intron 2 1 73306336 73306350 73468279 73468293 62441 0.0000675 0.631 7.38005 CATTTCCTGGAGACC
32574 Intron 2 -1 73306336 73306350 73468279 73468293 62441 0.00008 0.641 6.85453 GGTCTCCAGGAAATG
30830 Intron 2 1 73306235 73306249 73468178 73468192 62542 0.0000641 0.626 7.59026 GAATACCTGGAAACT
31652 Intron 2 1 73305632 73305646 73467575 73467589 63145 0.0000719 0.634 7.16984 AACTTCCAGGAACTC
24071 Intron 2 -1 73305632 73305646 73467575 73467589 63145 0.00000519 0.49 15.3681 GAGTTCCTGGAAGTT
26532 Intron 2 1 73303886 73303900 73465829 73465843 64891 0.000023 0.547 12.8455 CAGTTCTAAGAACCA
27551 Intron 2 -1 73303886 73303900 73465829 73465843 64891 0.0000321 0.57 11.8996 TGGTTCTTAGAACTG
31237 Intron 2 1 73303841 73303855 73465784 73465798 64936 0.0000675 0.631 7.38005 CATTTCCTGGAGACC
32572 Intron 2 -1 73303841 73303855 73465784 73465798 64936 0.00008 0.641 6.85453 GGTCTCCAGGAAATG
25764 Intron 2 1 73303251 73303265 73465194 73465208 65526 0.0000175 0.546 13.4762 TCATTCTAGGAAGCC
31236 Intron 2 1 73303084 73303098 73465027 73465041 65693 0.0000675 0.631 7.38005 CATTTCCTGGAGACC
32571 Intron 2 -1 73303084 73303098 73465027 73465041 65693 0.00008 0.641 6.85453 GGTCTCCAGGAAATG
32164 Intron 2 1 73302495 73302509 73464438 73464452 66282 0.000077 0.637 6.95963 AGATTCTAGGAATCC
25569 Intron 2 -1 73302495 73302509 73464438 73464452 66282 0.0000157 0.535 13.6864 GGATTCCTAGAATCT
32160 Intron 2 1 73300000 73300014 73461943 73461957 68777 0.000077 0.637 6.95963 AGATTCTAGGAATCC
25563 Intron 2 -1 73300000 73300014 73461943 73461957 68777 0.0000157 0.535 13.6864 GGATTCCTAGAATCT
23363 Intron 2 -1 73299637 73299651 73461580 73461594 69140 0.000000465 0.427 17.4702 GAATTCTGGGAACTT
32157 Intron 2 1 73299571 73299585 73461514 73461528 69206 0.000077 0.637 6.95963 GATTTCTGGGACCTT
32159 Intron 2 1 73299243 73299257 73461186 73461200 69534 0.000077 0.637 6.95963 AGATTCTAGGAATCC
25560 Intron 2 -1 73299243 73299257 73461186 73461200 69534 0.0000157 0.535 13.6864 GGATTCCTAGAATCT
31453 Intron 2 -1 73298260 73298274 73460203 73460217 70517 0.0000695 0.631 7.27495 GAATTGCTGGAACTT
26120 Intron 2 1 73296818 73296832 73458761 73458775 71959 0.0000202 0.547 13.1609 TGCTTCTGGGAAGTG
27312 Intron 2 -1 73296818 73296832 73458761 73458775 71959 0.00003 0.565 12.1098 CACTTCCCAGAAGCA
29743 Intron 2 -1 73295979 73295993 73457922 73457936 72798 0.0000527 0.605 9.16685 GTGTTCGGAGAAGTC
27001 Intron 2 1 73295631 73295645 73457574 73457588 73146 0.000027 0.56 12.4251 TGTTTCCTGGAAGAG
26914 Intron 2 -1 73295631 73295645 73457574 73457588 73146 0.000026 0.554 12.5302 CTCTTCCAGGAAACA
26118 Intron 2 1 73294323 73294337 73456266 73456280 74454 0.0000202 0.547 13.1609 TGCTTCTGGGAAGTG
27310 Intron 2 -1 73294323 73294337 73456266 73456280 74454 0.00003 0.565 12.1098 CACTTCCCAGAAGCA
31653 Intron 2 -1 73294321 73294335 73456264 73456278 74456 0.0000719 0.634 7.16984 CAATTCCAGGCACTG
26117 Intron 2 1 73293566 73293580 73455509 73455523 75211 0.0000202 0.547 13.1609 TGCTTCTGGGAAGTG
27309 Intron 2 -1 73293566 73293580 73455509 73455523 75211 0.00003 0.565 12.1098 CACTTCCCAGAAGCA
27548 Intron 2 1 73293489 73293503 73455432 73455446 75288 0.0000321 0.57 11.8996 TCCTTCCTGGAAGCC
31649 Intron 2 -1 73292649 73292663 73454592 73454606 76128 0.0000719 0.634 7.16984 TATTTCTCGGACATA
23779 Intron 2 1 73290824 73290838 73452767 73452781 77953 0.00000329 0.49 15.8936 GAATTCTGGGAAAAA
27641 Intron 2 -1 73290824 73290838 73452767 73452781 77953 0.0000331 0.572 11.7945 TTTTTCCCAGAATTC

25568 Intron 2 1 73288705 73288719 73450648 73450662 80072 0.0000157 0.535 13.6864 CGATTCTGAGAATTT
33961 Intron 2 -1 73288576 73288590 73450519 73450533 80201 0.0000946 0.659 6.4341 GGATGCTCGGAATTA
23777 Intron 2 1 73288329 73288343 73450272 73450286 80448 0.00000329 0.49 15.8936 GAATTCTGGGAAAAA
27639 Intron 2 -1 73288329 73288343 73450272 73450286 80448 0.0000331 0.572 11.7945 TTTTTCCCAGAATTC
33957 Intron 2 1 73288247 73288261 73450190 73450204 80530 0.0000946 0.659 6.4341 AAGTTCTTGGAATCC
24821 Intron 2 -1 73288247 73288261 73450190 73450204 80530 0.0000104 0.496 14.4221 GGATTCCAAGAACTT
25557 Intron 2 -1 73287840 73287854 73449783 73449797 80937 0.0000157 0.535 13.6864 CATTTCTAAGAACCT
26527 Intron 2 1 73287642 73287656 73449585 73449599 81135 0.000023 0.547 12.8455 CATTTCCCAGAAAAG
24933 Intron 2 -1 73287642 73287656 73449585 73449599 81135 0.0000111 0.496 14.317 CTTTTCTGGGAAATG
23776 Intron 2 1 73287572 73287586 73449515 73449529 81205 0.00000329 0.49 15.8936 GAATTCTGGGAAAAA
27638 Intron 2 -1 73287572 73287586 73449515 73449529 81205 0.0000331 0.572 11.7945 TTTTTCCCAGAATTC
33959 Intron 2 1 73286914 73286928 73448857 73448871 81863 0.0000946 0.659 6.4341 AAGTTCTTGGAATCC
24823 Intron 2 -1 73286914 73286928 73448857 73448871 81863 0.0000104 0.496 14.4221 GGATTCCAAGAACTT
26531 Intron 2 1 73286309 73286323 73448252 73448266 82468 0.000023 0.547 12.8455 CATTTCCCAGAAAAG
24935 Intron 2 -1 73286309 73286323 73448252 73448266 82468 0.0000111 0.496 14.317 CTTTTCTGGGAAATG
25876 5' UTR 2 1 73286301 73286315 73448244 73448258 82476 0.0000184 0.546 13.3711 TTTTTCTGGGAAACT
25562 Intron 2 1 73286210 73286224 73448153 73448167 82567 0.0000157 0.535 13.6864 CGATTCTGAGAATTT
25950 Exon 2 1 73268484 73268498 73430427 73430441 100293 0.0000192 0.547 13.266 TGGTTCGTGGAAACA
25951 3' UTR 2 1 73268428 73268442 73430371 73430385 100349 0.0000192 0.547 13.266 TGGTTCGTGGAAACA
25948 Exon 2 1 73268055 73268069 73429998 73430012 100722 0.0000192 0.547 13.266 TGGTTCGTGGAAACA
25949 3' UTR 2 1 73267999 73268013 73429942 73429956 100778 0.0000192 0.547 13.266 TGGTTCGTGGAAACA

* relative to TSS (73530734, Strand: -1, mm10)


APPENDIX 17: 174 POTENTIAL STAT5B BINDING SITES FOR GENE WIPF1 (LONG MOTIF)

The distance between each potential STAT5B binding site and the TSS was calculated in the same orientation for both strands: the end coordinate (mm10) of each potential STAT5B binding site was subtracted from the TSS coordinate. The table is sorted in ascending order by the field “End position (mm10)*”.

Potential STAT5 binding sites that FIMO found on the strand opposite to the input sequence were assigned the sequence type (promoter, exon, intron, 3’ UTR, 5’ UTR) and the gene name of the input sequence. The gene Wipf1 is located on the negative strand.

Table 15

174 potential STAT5B binding sites for gene Wipf1 (long motif)

Columns: Binding site ID; Sequence type; Chromosome; Strand; Potential binding site start (mm9); Potential binding site end (mm9); Potential binding site start (mm10); Potential binding site end (mm10); End position (mm10)*; p value; q value; Score; Potential STAT5 binding site

23547 Promoter 2 1 73368555 73368569 73530498 73530512 222 0.00000181 0.475 16.559 GGATTCTTGGAATCT
23548 Promoter 2 1 73368045 73368059 73529988 73530002 732 0.00000181 0.475 16.559 GGATTCTTGGAATCT
23549 Promoter 2 1 73367851 73367865 73529794 73529808 926 0.00000181 0.475 16.559 GGATTCTTGGAATCT
24094 Exon 2 1 73367728 73367742 73529671 73529685 1049 0.00000547 0.475 15.1372 TTTTTCTGGGAAACT
24099 Exon 2 1 73367473 73367487 73529416 73529430 1304 0.00000547 0.475 15.1372 TTTTTCTGGGAAACT

33613 Intron 2 -1 73366130 73366144 73528073 73528087 2647 0.0000915 0.675 6.93481 TATTTCCTGTAAACG
23625 Intron 2 -1 73365245 73365259 73527188 73527202 3532 0.00000239 0.475 16.2309 GAGTTCCTGGAAGTT
28662 Intron 2 -1 73363499 73363513 73525442 73525456 5278 0.0000423 0.596 10.3251 TGGTTCTTAGAACTG
31150 Intron 2 1 73363499 73363513 73525442 73525456 5278 0.0000672 0.648 8.2472 CAGTTCTAAGAACCA
28454 Intron 2 1 73362864 73362878 73524807 73524821 5913 0.0000401 0.582 10.5439 TCATTCTAGGAAGCC
32427 Intron 2 -1 73362864 73362878 73524807 73524821 5913 0.0000799 0.662 7.48164 GGCTTCCTAGAATGA
30958 Intron 2 -1 73362666 73362680 73524609 73524623 6111 0.0000644 0.641 8.46593 CACTTCCCCGAAGTT
34105 Intron 2 1 73362433 73362447 73524376 73524390 6344 0.0000972 0.682 6.71608 TTTTTCCTGGACTCT
23624 Intron 2 -1 73359250 73359264 73521193 73521207 9527 0.00000239 0.475 16.2309 GAATTCTGGGAACTT
32288 Intron 2 -1 73357873 73357887 73519816 73519830 10904 0.0000777 0.662 7.59101 GAATTGCTGGAACTT
29982 Intron 2 1 73356563 73356577 73518506 73518520 12214 0.0000548 0.626 9.23149 TCCTTCCTGGAAGCC
33734 Intron 2 -1 73355723 73355737 73517666 73517680 13054 0.0000945 0.68 6.82545 TATTTCTCGGACATA
33321 Intron 2 1 73355592 73355606 73517535 73517549 13185 0.0000887 0.672 7.04418 GACTTCTCCGAACAC
24595 Intron 2 1 73355244 73355258 73517187 73517201 13533 0.00000873 0.475 14.3717 TGTTTCCTGGAAGAG
29574 Intron 2 -1 73355244 73355258 73517187 73517201 13533 0.000051 0.622 9.55959 CTCTTCCAGGAAACA
26736 Intron 2 -1 73350914 73350928 73512857 73512871 17863 0.0000249 0.526 12.075 CATTTCTAAGAACCT
31091 Intron 2 1 73349939 73349953 73511882 73511896 18838 0.0000659 0.644 8.35657 GAATACCTGGAAACT
33737 Intron 2 -1 73349491 73349505 73511434 73511448 19286 0.0000945 0.68 6.82545 CACTTCCTAGAAGGT
33316 Intron 2 1 73346788 73346802 73508731 73508745 21989 0.0000887 0.672 7.04418 CATTTCCTGGAGACC
25337 Intron 2 -1 73345466 73345480 73507409 73507423 23311 0.0000137 0.482 13.4967 GGATTCCAAGAACTT
24594 Intron 2 -1 73344861 73344875 73506804 73506818 23916 0.00000873 0.475 14.3717 CTTTTCTGGGAAATG
25066 Intron 2 1 73344861 73344875 73506804 73506818 23916 0.0000117 0.475 13.8248 CATTTCCCAGAAAAG
23942 Intron 2 -1 73342947 73342961 73504890 73504904 25830 0.00000439 0.475 15.4653 GGATTCCTAGAATCT
27048 Intron 2 1 73341794 73341808 73503737 73503751 26983 0.0000278 0.544 11.7469 CAGTTCCTAGAAGTG
29270 Intron 2 -1 73341794 73341808 73503737 73503751 26983 0.0000484 0.62 9.77832 CACTTCTAGGAACTG
24712 Intron 2 -1 73338347 73338361 73500290 73500304 30430 0.00000931 0.475 14.2623 TTTTTCTCGGAAAAG
27190 Intron 2 1 73338347 73338361 73500290 73500304 30430 0.0000289 0.544 11.6375 CTTTTCCGAGAAAAA
30957 Intron 2 -1 73338126 73338140 73500069 73500083 30651 0.0000644 0.641 8.46593 CCTTTCCTAGAAAGG

Table 15: continued

31531 | Intron | 2 | 1 | 73338126 | 73338140 | 73500069 | 73500083 | 30651 | 0.0000705 | 0.654 | 8.02847 | CCTTTCTAGGAAAGG
29712 | Intron | 2 | 1 | 73337270 | 73337284 | 73499213 | 73499227 | 31507 | 0.0000522 | 0.622 | 9.45022 | TGCTTCTGGGAAGTG
30950 | Intron | 2 | -1 | 73337270 | 73337284 | 73499213 | 73499227 | 31507 | 0.0000644 | 0.641 | 8.46593 | CACTTCCCAGAAGCA
32657 | Intron | 2 | -1 | 73336468 | 73336482 | 73498411 | 73498425 | 32309 | 0.000082 | 0.666 | 7.37228 | GAATCCCTAGAATTT
28661 | Intron | 2 | 1 | 73334813 | 73334827 | 73496756 | 73496770 | 33964 | 0.0000423 | 0.596 | 10.3251 | TTGTTCTGAGAATCT
29983 | Intron | 2 | 1 | 73334718 | 73334732 | 73496661 | 73496675 | 34059 | 0.0000548 | 0.626 | 9.23149 | TCCTTCCTGGAAGCC
25263 | Intron | 2 | -1 | 73334462 | 73334476 | 73496405 | 73496419 | 34315 | 0.0000131 | 0.475 | 13.6061 | TTGTTCCTGGAACTC
26737 | Intron | 2 | 1 | 73334462 | 73334476 | 73496405 | 73496419 | 34315 | 0.0000249 | 0.526 | 12.075 | GAGTTCCAGGAACAA
33736 | Intron | 2 | -1 | 73333878 | 73333892 | 73495821 | 73495835 | 34899 | 0.0000945 | 0.68 | 6.82545 | TATTTCTCGGACATA
31094 | Intron | 2 | 1 | 73331555 | 73331569 | 73493498 | 73493512 | 37222 | 0.0000659 | 0.644 | 8.35657 | GAATACCTGGAAACT
23979 | Intron | 2 | -1 | 73331276 | 73331290 | 73493219 | 73493233 | 37501 | 0.00000471 | 0.475 | 15.356 | TTTTTCCCAGAATTC
24095 | Intron | 2 | 1 | 73331276 | 73331290 | 73493219 | 73493233 | 37501 | 0.00000547 | 0.475 | 15.1372 | GAATTCTGGGAAAAA
28545 | Intron | 2 | -1 | 73329239 | 73329253 | 73491182 | 73491196 | 39538 | 0.0000412 | 0.589 | 10.4345 | TATTTCGTGGAATGC
28885 | Intron | 2 | 1 | 73329239 | 73329253 | 73491182 | 73491196 | 39538 | 0.0000449 | 0.61 | 10.1064 | GCATTCCACGAAATA
25908 | Intron | 2 | 1 | 73329157 | 73329171 | 73491100 | 73491114 | 39620 | 0.0000184 | 0.517 | 12.8406 | CGATTCTGAGAATTT
26738 | Intron | 2 | -1 | 73329069 | 73329083 | 73491012 | 73491026 | 39708 | 0.0000249 | 0.526 | 12.075 | CATTTCTAAGAACCT
33320 | Intron | 2 | 1 | 73328404 | 73328418 | 73490347 | 73490361 | 40373 | 0.0000887 | 0.672 | 7.04418 | CATTTCCTGGAGACC
33611 | Intron | 2 | -1 | 73324901 | 73324915 | 73486844 | 73486858 | 43876 | 0.0000915 | 0.675 | 6.93481 | TATTTCCTGTAAACG
23945 | Intron | 2 | -1 | 73324563 | 73324577 | 73486506 | 73486520 | 44214 | 0.00000439 | 0.475 | 15.4653 | GGATTCCTAGAATCT
28546 | Promoter | 2 | -1 | 73324502 | 73324516 | 73486445 | 73486459 | 44275 | 0.0000412 | 0.589 | 10.4345 | TATTTCGTGGAATGC
28886 | Promoter | 2 | 1 | 73324502 | 73324516 | 73486445 | 73486459 | 44275 | 0.0000449 | 0.61 | 10.1064 | GCATTCCACGAAATA
25909 | Promoter | 2 | 1 | 73324420 | 73324434 | 73486363 | 73486377 | 44357 | 0.0000184 | 0.517 | 12.8406 | CGATTCTGAGAATTT
23621 | Intron | 2 | -1 | 73324016 | 73324030 | 73485959 | 73485973 | 44761 | 0.00000239 | 0.475 | 16.2309 | GAGTTCCTGGAAGTT
25338 | Intron | 2 | -1 | 73323398 | 73323412 | 73485341 | 73485355 | 45379 | 0.0000137 | 0.482 | 13.4967 | GGATTCCAAGAACTT
24596 | Intron | 2 | -1 | 73322793 | 73322807 | 73484736 | 73484750 | 45984 | 0.00000873 | 0.475 | 14.3717 | CTTTTCTGGGAAATG
25067 | Intron | 2 | 1 | 73322793 | 73322807 | 73484736 | 73484750 | 45984 | 0.0000117 | 0.475 | 13.8248 | CATTTCCCAGAAAAG
28657 | Intron | 2 | -1 | 73322270 | 73322284 | 73484213 | 73484227 | 46507 | 0.0000423 | 0.596 | 10.3251 | TGGTTCTTAGAACTG
31148 | Intron | 2 | 1 | 73322270 | 73322284 | 73484213 | 73484227 | 46507 | 0.0000672 | 0.648 | 8.2472 | CAGTTCTAAGAACCA

28452 | Intron | 2 | 1 | 73321635 | 73321649 | 73483578 | 73483592 | 47142 | 0.0000401 | 0.582 | 10.5439 | TCATTCTAGGAAGCC
32425 | Intron | 2 | -1 | 73321635 | 73321649 | 73483578 | 73483592 | 47142 | 0.0000799 | 0.662 | 7.48164 | GGCTTCCTAGAATGA
30949 | Intron | 2 | -1 | 73321437 | 73321451 | 73483380 | 73483394 | 47340 | 0.0000644 | 0.641 | 8.46593 | CACTTCCCCGAAGTT
34103 | Intron | 2 | 1 | 73321204 | 73321218 | 73483147 | 73483161 | 47573 | 0.0000972 | 0.682 | 6.71608 | TTTTTCCTGGACTCT
25335 | Intron | 2 | -1 | 73320903 | 73320917 | 73482846 | 73482860 | 47874 | 0.0000137 | 0.482 | 13.4967 | GGATTCCAAGAACTT
24591 | Intron | 2 | -1 | 73320298 | 73320312 | 73482241 | 73482255 | 48479 | 0.00000873 | 0.475 | 14.3717 | CTTTTCTGGGAAATG
25064 | Intron | 2 | 1 | 73320298 | 73320312 | 73482241 | 73482255 | 48479 | 0.0000117 | 0.475 | 13.8248 | CATTTCCCAGAAAAG
25333 | Intron | 2 | -1 | 73320146 | 73320160 | 73482089 | 73482103 | 48631 | 0.0000137 | 0.482 | 13.4967 | GGATTCCAAGAACTT
27049 | Intron | 2 | 1 | 73319726 | 73319740 | 73481669 | 73481683 | 49051 | 0.0000278 | 0.544 | 11.7469 | CAGTTCCTAGAAGTG
29271 | Intron | 2 | -1 | 73319726 | 73319740 | 73481669 | 73481683 | 49051 | 0.0000484 | 0.62 | 9.77832 | CACTTCTAGGAACTG
24589 | Intron | 2 | -1 | 73319541 | 73319555 | 73481484 | 73481498 | 49236 | 0.00000873 | 0.475 | 14.3717 | CTTTTCTGGGAAATG
25062 | Intron | 2 | 1 | 73319541 | 73319555 | 73481484 | 73481498 | 49236 | 0.0000117 | 0.475 | 13.8248 | CATTTCCCAGAAAAG
29715 | Intron | 2 | 1 | 73318886 | 73318900 | 73480829 | 73480843 | 49891 | 0.0000522 | 0.622 | 9.45022 | TGCTTCTGGGAAGTG
30956 | Intron | 2 | -1 | 73318886 | 73318900 | 73480829 | 73480843 | 49891 | 0.0000644 | 0.641 | 8.46593 | CACTTCCCAGAAGCA
32660 | Intron | 2 | -1 | 73318084 | 73318098 | 73480027 | 73480041 | 50693 | 0.000082 | 0.666 | 7.37228 | GAATCCCTAGAATTT
23620 | Intron | 2 | -1 | 73318021 | 73318035 | 73479964 | 73479978 | 50756 | 0.00000239 | 0.475 | 16.2309 | GAATTCTGGGAACTT
27047 | Intron | 2 | 1 | 73317231 | 73317245 | 73479174 | 73479188 | 51546 | 0.0000278 | 0.544 | 11.7469 | CAGTTCCTAGAAGTG
29269 | Intron | 2 | -1 | 73317231 | 73317245 | 73479174 | 73479188 | 51546 | 0.0000484 | 0.62 | 9.77832 | CACTTCTAGGAACTG
32286 | Intron | 2 | -1 | 73316644 | 73316658 | 73478587 | 73478601 | 52133 | 0.0000777 | 0.662 | 7.59101 | GAATTGCTGGAACTT
27046 | Intron | 2 | 1 | 73316474 | 73316488 | 73478417 | 73478431 | 52303 | 0.0000278 | 0.544 | 11.7469 | CAGTTCCTAGAAGTG
29268 | Intron | 2 | -1 | 73316474 | 73316488 | 73478417 | 73478431 | 52303 | 0.0000484 | 0.62 | 9.77832 | CACTTCTAGGAACTG
24713 | Intron | 2 | -1 | 73316279 | 73316293 | 73478222 | 73478236 | 52498 | 0.00000931 | 0.475 | 14.2623 | TTTTTCTCGGAAAAG
27191 | Intron | 2 | 1 | 73316279 | 73316293 | 73478222 | 73478236 | 52498 | 0.0000289 | 0.544 | 11.6375 | CTTTTCCGAGAAAAA
30960 | Intron | 2 | -1 | 73316058 | 73316072 | 73478001 | 73478015 | 52719 | 0.0000644 | 0.641 | 8.46593 | CCTTTCCTAGAAAGG
31532 | Intron | 2 | 1 | 73316058 | 73316072 | 73478001 | 73478015 | 52719 | 0.0000705 | 0.654 | 8.02847 | CCTTTCTAGGAAAGG
33315 | Intron | 2 | 1 | 73314363 | 73314377 | 73476306 | 73476320 | 54414 | 0.0000887 | 0.672 | 7.04418 | GACTTCTCCGAACAC
24588 | Intron | 2 | 1 | 73314015 | 73314029 | 73475958 | 73475972 | 54762 | 0.00000873 | 0.475 | 14.3717 | TGTTTCCTGGAAGAG
29572 | Intron | 2 | -1 | 73314015 | 73314029 | 73475958 | 73475972 | 54762 | 0.000051 | 0.622 | 9.55959 | CTCTTCCAGGAAACA

24711 | Intron | 2 | -1 | 73313784 | 73313798 | 73475727 | 73475741 | 54993 | 0.00000931 | 0.475 | 14.2623 | TTTTTCTCGGAAAAG
27189 | Intron | 2 | 1 | 73313784 | 73313798 | 73475727 | 73475741 | 54993 | 0.0000289 | 0.544 | 11.6375 | CTTTTCCGAGAAAAA
30954 | Intron | 2 | -1 | 73313563 | 73313577 | 73475506 | 73475520 | 55214 | 0.0000644 | 0.641 | 8.46593 | CCTTTCCTAGAAAGG
31530 | Intron | 2 | 1 | 73313563 | 73313577 | 73475506 | 73475520 | 55214 | 0.0000705 | 0.654 | 8.02847 | CCTTTCTAGGAAAGG
24710 | Intron | 2 | -1 | 73313027 | 73313041 | 73474970 | 73474984 | 55750 | 0.00000931 | 0.475 | 14.2623 | TTTTTCTCGGAAAAG
27188 | Intron | 2 | 1 | 73313027 | 73313041 | 73474970 | 73474984 | 55750 | 0.0000289 | 0.544 | 11.6375 | CTTTTCCGAGAAAAA
23982 | Intron | 2 | -1 | 73312892 | 73312906 | 73474835 | 73474849 | 55885 | 0.00000471 | 0.475 | 15.356 | TTTTTCCCAGAATTC
24100 | Intron | 2 | 1 | 73312892 | 73312906 | 73474835 | 73474849 | 55885 | 0.00000547 | 0.475 | 15.1372 | GAATTCTGGGAAAAA
30952 | Intron | 2 | -1 | 73312806 | 73312820 | 73474749 | 73474763 | 55971 | 0.0000644 | 0.641 | 8.46593 | CCTTTCCTAGAAAGG
31529 | Intron | 2 | 1 | 73312806 | 73312820 | 73474749 | 73474763 | 55971 | 0.0000705 | 0.654 | 8.02847 | CCTTTCTAGGAAAGG
28663 | Intron | 2 | 1 | 73312745 | 73312759 | 73474688 | 73474702 | 56032 | 0.0000423 | 0.596 | 10.3251 | TTGTTCTGAGAATCT
25264 | Intron | 2 | -1 | 73312394 | 73312408 | 73474337 | 73474351 | 56383 | 0.0000131 | 0.475 | 13.6061 | TTGTTCCTGGAACTC
26739 | Intron | 2 | 1 | 73312394 | 73312408 | 73474337 | 73474351 | 56383 | 0.0000249 | 0.526 | 12.075 | GAGTTCCAGGAACAA
28548 | Intron | 2 | -1 | 73310855 | 73310869 | 73472798 | 73472812 | 57922 | 0.0000412 | 0.589 | 10.4345 | TATTTCGTGGAATGC
28888 | Intron | 2 | 1 | 73310855 | 73310869 | 73472798 | 73472812 | 57922 | 0.0000449 | 0.61 | 10.1064 | GCATTCCACGAAATA
25911 | Intron | 2 | 1 | 73310773 | 73310787 | 73472716 | 73472730 | 58004 | 0.0000184 | 0.517 | 12.8406 | CGATTCTGAGAATTT
28659 | Intron | 2 | 1 | 73310250 | 73310264 | 73472193 | 73472207 | 58527 | 0.0000423 | 0.596 | 10.3251 | TTGTTCTGAGAATCT
25262 | Intron | 2 | -1 | 73309899 | 73309913 | 73471842 | 73471856 | 58878 | 0.0000131 | 0.475 | 13.6061 | TTGTTCCTGGAACTC
26735 | Intron | 2 | 1 | 73309899 | 73309913 | 73471842 | 73471856 | 58878 | 0.0000249 | 0.526 | 12.075 | GAGTTCCAGGAACAA
28658 | Intron | 2 | 1 | 73309493 | 73309507 | 73471436 | 73471450 | 59284 | 0.0000423 | 0.596 | 10.3251 | TTGTTCTGAGAATCT
31095 | Intron | 2 | 1 | 73309487 | 73309501 | 73471430 | 73471444 | 59290 | 0.0000659 | 0.644 | 8.35657 | GAATACCTGGAAACT
25261 | Intron | 2 | -1 | 73309142 | 73309156 | 73471085 | 73471099 | 59635 | 0.0000131 | 0.475 | 13.6061 | TTGTTCCTGGAACTC
26734 | Intron | 2 | 1 | 73309142 | 73309156 | 73471085 | 73471099 | 59635 | 0.0000249 | 0.526 | 12.075 | GAGTTCCAGGAACAA
33733 | Intron | 2 | -1 | 73308262 | 73308276 | 73470205 | 73470219 | 60515 | 0.0000945 | 0.68 | 6.82545 | CACTTCCTAGAAGGT
31093 | Intron | 2 | 1 | 73306992 | 73307006 | 73468935 | 73468949 | 61785 | 0.0000659 | 0.644 | 8.35657 | GAATACCTGGAAACT
33612 | Intron | 2 | -1 | 73306517 | 73306531 | 73468460 | 73468474 | 62260 | 0.0000915 | 0.675 | 6.93481 | TATTTCCTGTAAACG
33322 | Intron | 2 | 1 | 73306336 | 73306350 | 73468279 | 73468293 | 62441 | 0.0000887 | 0.672 | 7.04418 | CATTTCCTGGAGACC
31092 | Intron | 2 | 1 | 73306235 | 73306249 | 73468178 | 73468192 | 62542 | 0.0000659 | 0.644 | 8.35657 | GAATACCTGGAAACT

23623 | Intron | 2 | -1 | 73305632 | 73305646 | 73467575 | 73467589 | 63145 | 0.00000239 | 0.475 | 16.2309 | GAGTTCCTGGAAGTT
28660 | Intron | 2 | -1 | 73303886 | 73303900 | 73465829 | 73465843 | 64891 | 0.0000423 | 0.596 | 10.3251 | TGGTTCTTAGAACTG
31149 | Intron | 2 | 1 | 73303886 | 73303900 | 73465829 | 73465843 | 64891 | 0.0000672 | 0.648 | 8.2472 | CAGTTCTAAGAACCA
33318 | Intron | 2 | 1 | 73303841 | 73303855 | 73465784 | 73465798 | 64936 | 0.0000887 | 0.672 | 7.04418 | CATTTCCTGGAGACC
28453 | Intron | 2 | 1 | 73303251 | 73303265 | 73465194 | 73465208 | 65526 | 0.0000401 | 0.582 | 10.5439 | TCATTCTAGGAAGCC
32426 | Intron | 2 | -1 | 73303251 | 73303265 | 73465194 | 73465208 | 65526 | 0.0000799 | 0.662 | 7.48164 | GGCTTCCTAGAATGA
33317 | Intron | 2 | 1 | 73303084 | 73303098 | 73465027 | 73465041 | 65693 | 0.0000887 | 0.672 | 7.04418 | CATTTCCTGGAGACC
30955 | Intron | 2 | -1 | 73303053 | 73303067 | 73464996 | 73465010 | 65724 | 0.0000644 | 0.641 | 8.46593 | CACTTCCCCGAAGTT
34104 | Intron | 2 | 1 | 73302820 | 73302834 | 73464763 | 73464777 | 65957 | 0.0000972 | 0.682 | 6.71608 | TTTTTCCTGGACTCT
23946 | Intron | 2 | -1 | 73302495 | 73302509 | 73464438 | 73464452 | 66282 | 0.00000439 | 0.475 | 15.4653 | GGATTCCTAGAATCT
23944 | Intron | 2 | -1 | 73300000 | 73300014 | 73461943 | 73461957 | 68777 | 0.00000439 | 0.475 | 15.4653 | GGATTCCTAGAATCT
23622 | Intron | 2 | -1 | 73299637 | 73299651 | 73461580 | 73461594 | 69140 | 0.00000239 | 0.475 | 16.2309 | GAATTCTGGGAACTT
23943 | Intron | 2 | -1 | 73299243 | 73299257 | 73461186 | 73461200 | 69534 | 0.00000439 | 0.475 | 15.4653 | GGATTCCTAGAATCT
32287 | Intron | 2 | -1 | 73298260 | 73298274 | 73460203 | 73460217 | 70517 | 0.0000777 | 0.662 | 7.59101 | GAATTGCTGGAACTT
29716 | Intron | 2 | 1 | 73296818 | 73296832 | 73458761 | 73458775 | 71959 | 0.0000522 | 0.622 | 9.45022 | TGCTTCTGGGAAGTG
30959 | Intron | 2 | -1 | 73296818 | 73296832 | 73458761 | 73458775 | 71959 | 0.0000644 | 0.641 | 8.46593 | CACTTCCCAGAAGCA
32661 | Intron | 2 | -1 | 73296016 | 73296030 | 73457959 | 73457973 | 72761 | 0.000082 | 0.666 | 7.37228 | GAATCCCTAGAATTT
33319 | Intron | 2 | 1 | 73295979 | 73295993 | 73457922 | 73457936 | 72798 | 0.0000887 | 0.672 | 7.04418 | GACTTCTCCGAACAC
24593 | Intron | 2 | 1 | 73295631 | 73295645 | 73457574 | 73457588 | 73146 | 0.00000873 | 0.475 | 14.3717 | TGTTTCCTGGAAGAG
29573 | Intron | 2 | -1 | 73295631 | 73295645 | 73457574 | 73457588 | 73146 | 0.000051 | 0.622 | 9.55959 | CTCTTCCAGGAAACA
29714 | Intron | 2 | 1 | 73294323 | 73294337 | 73456266 | 73456280 | 74454 | 0.0000522 | 0.622 | 9.45022 | TGCTTCTGGGAAGTG
30953 | Intron | 2 | -1 | 73294323 | 73294337 | 73456266 | 73456280 | 74454 | 0.0000644 | 0.641 | 8.46593 | CACTTCCCAGAAGCA
29713 | Intron | 2 | 1 | 73293566 | 73293580 | 73455509 | 73455523 | 75211 | 0.0000522 | 0.622 | 9.45022 | TGCTTCTGGGAAGTG
30951 | Intron | 2 | -1 | 73293566 | 73293580 | 73455509 | 73455523 | 75211 | 0.0000644 | 0.641 | 8.46593 | CACTTCCCAGAAGCA
32659 | Intron | 2 | -1 | 73293521 | 73293535 | 73455464 | 73455478 | 75256 | 0.000082 | 0.666 | 7.37228 | GAATCCCTAGAATTT
29981 | Intron | 2 | 1 | 73293489 | 73293503 | 73455432 | 73455446 | 75288 | 0.0000548 | 0.626 | 9.23149 | TCCTTCCTGGAAGCC
32658 | Intron | 2 | -1 | 73292764 | 73292778 | 73454707 | 73454721 | 76013 | 0.000082 | 0.666 | 7.37228 | GAATCCCTAGAATTT
33732 | Intron | 2 | -1 | 73292649 | 73292663 | 73454592 | 73454606 | 76128 | 0.0000945 | 0.68 | 6.82545 | TATTTCTCGGACATA

23983 | Intron | 2 | -1 | 73290824 | 73290838 | 73452767 | 73452781 | 77953 | 0.00000471 | 0.475 | 15.356 | TTTTTCCCAGAATTC
24101 | Intron | 2 | 1 | 73290824 | 73290838 | 73452767 | 73452781 | 77953 | 0.00000547 | 0.475 | 15.1372 | GAATTCTGGGAAAAA
33735 | Intron | 2 | -1 | 73289878 | 73289892 | 73451821 | 73451835 | 78899 | 0.0000945 | 0.68 | 6.82545 | CACTTCCTAGAAGGT
28549 | Intron | 2 | -1 | 73288787 | 73288801 | 73450730 | 73450744 | 79990 | 0.0000412 | 0.589 | 10.4345 | TATTTCGTGGAATGC
28889 | Intron | 2 | 1 | 73288787 | 73288801 | 73450730 | 73450744 | 79990 | 0.0000449 | 0.61 | 10.1064 | GCATTCCACGAAATA
25912 | Intron | 2 | 1 | 73288705 | 73288719 | 73450648 | 73450662 | 80072 | 0.0000184 | 0.517 | 12.8406 | CGATTCTGAGAATTT
23981 | Intron | 2 | -1 | 73288329 | 73288343 | 73450272 | 73450286 | 80448 | 0.00000471 | 0.475 | 15.356 | TTTTTCCCAGAATTC
24098 | Intron | 2 | 1 | 73288329 | 73288343 | 73450272 | 73450286 | 80448 | 0.00000547 | 0.475 | 15.1372 | GAATTCTGGGAAAAA
25334 | Intron | 2 | -1 | 73288247 | 73288261 | 73450190 | 73450204 | 80530 | 0.0000137 | 0.482 | 13.4967 | GGATTCCAAGAACTT
26733 | Intron | 2 | -1 | 73287840 | 73287854 | 73449783 | 73449797 | 80937 | 0.0000249 | 0.526 | 12.075 | CATTTCTAAGAACCT
24590 | Intron | 2 | -1 | 73287642 | 73287656 | 73449585 | 73449599 | 81135 | 0.00000873 | 0.475 | 14.3717 | CTTTTCTGGGAAATG
25063 | Intron | 2 | 1 | 73287642 | 73287656 | 73449585 | 73449599 | 81135 | 0.0000117 | 0.475 | 13.8248 | CATTTCCCAGAAAAG
23980 | Intron | 2 | -1 | 73287572 | 73287586 | 73449515 | 73449529 | 81205 | 0.00000471 | 0.475 | 15.356 | TTTTTCCCAGAATTC
24097 | Intron | 2 | 1 | 73287572 | 73287586 | 73449515 | 73449529 | 81205 | 0.00000547 | 0.475 | 15.1372 | GAATTCTGGGAAAAA
25336 | Intron | 2 | -1 | 73286914 | 73286928 | 73448857 | 73448871 | 81863 | 0.0000137 | 0.482 | 13.4967 | GGATTCCAAGAACTT
24592 | Intron | 2 | -1 | 73286309 | 73286323 | 73448252 | 73448266 | 82468 | 0.00000873 | 0.475 | 14.3717 | CTTTTCTGGGAAATG
25065 | Intron | 2 | 1 | 73286309 | 73286323 | 73448252 | 73448266 | 82468 | 0.0000117 | 0.475 | 13.8248 | CATTTCCCAGAAAAG
24096 | 5' UTR | 2 | 1 | 73286301 | 73286315 | 73448244 | 73448258 | 82476 | 0.00000547 | 0.475 | 15.1372 | TTTTTCTGGGAAACT
28547 | Intron | 2 | -1 | 73286292 | 73286306 | 73448235 | 73448249 | 82485 | 0.0000412 | 0.589 | 10.4345 | TATTTCGTGGAATGC
28887 | Intron | 2 | 1 | 73286292 | 73286306 | 73448235 | 73448249 | 82485 | 0.0000449 | 0.61 | 10.1064 | GCATTCCACGAAATA
25910 | Intron | 2 | 1 | 73286210 | 73286224 | 73448153 | 73448167 | 82567 | 0.0000184 | 0.517 | 12.8406 | CGATTCTGAGAATTT
29865 | Exon | 2 | -1 | 73268484 | 73268498 | 73430427 | 73430441 | 100293 | 0.0000536 | 0.622 | 9.34086 | TGTTTCCACGAACCA
30353 | Exon | 2 | 1 | 73268484 | 73268498 | 73430427 | 73430441 | 100293 | 0.0000587 | 0.632 | 8.90339 | TGGTTCGTGGAAACA
29866 | 3' UTR | 2 | -1 | 73268428 | 73268442 | 73430371 | 73430385 | 100349 | 0.0000536 | 0.622 | 9.34086 | TGTTTCCACGAACCA
30354 | 3' UTR | 2 | 1 | 73268428 | 73268442 | 73430371 | 73430385 | 100349 | 0.0000587 | 0.632 | 8.90339 | TGGTTCGTGGAAACA
29863 | Exon | 2 | -1 | 73268055 | 73268069 | 73429998 | 73430012 | 100722 | 0.0000536 | 0.622 | 9.34086 | TGTTTCCACGAACCA
30351 | Exon | 2 | 1 | 73268055 | 73268069 | 73429998 | 73430012 | 100722 | 0.0000587 | 0.632 | 8.90339 | TGGTTCGTGGAAACA
29864 | 3' UTR | 2 | -1 | 73267999 | 73268013 | 73429942 | 73429956 | 100778 | 0.0000536 | 0.622 | 9.34086 | TGTTTCCACGAACCA

30352 | 3' UTR | 2 | 1 | 73267999 | 73268013 | 73429942 | 73429956 | 100778 | 0.0000587 | 0.632 | 8.90339 | TGGTTCGTGGAAACA

* relative to TSS (73530734, Strand: -1, mm10)
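
The footnote does not spell out how the TSS-relative position is computed. The short sketch below shows one reading that reproduces the tabulated values: the distance from the TSS to the mm10 end coordinate of a site, counted along the minus strand. The function name tss_relative_position and the strand-multiplication formula are assumptions made for illustration, inferred from the table rows rather than taken from the thesis workflow.

```python
# Minimal sketch (assumed formula): TSS-relative position of a binding site.
# The TSS coordinate and strand come from the table footnote; everything else
# here (names, the general strand handling) is hypothetical.

TSS_MM10 = 73530734   # TSS coordinate given in the footnote (mm10)
GENE_STRAND = -1      # gene strand given in the footnote


def tss_relative_position(site_end_mm10, tss=TSS_MM10, gene_strand=GENE_STRAND):
    """Return the offset of a site's mm10 end coordinate from the TSS.

    For a minus-strand gene this is tss - site_end_mm10, so sites inside
    the gene body (lower coordinates) get positive offsets.
    """
    return gene_strand * (site_end_mm10 - tss)


# Three rows from the table: binding site ID -> mm10 end coordinate.
example_sites = {33613: 73528087, 23625: 73527202, 30352: 73429956}

for site_id, site_end in example_sites.items():
    print(site_id, tss_relative_position(site_end))
```

For these three rows the sketch yields 2647, 3532, and 100778, which match the "End position (mm10)*" column.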
