Insights into in vivo transcription factor targeting through studies of the archetypal zinc finger

KLF3.

Jon Burdach

A thesis submitted for the degree of Doctor of Philosophy

Biochemistry and molecular genetics

School of Biotechnology and Biomolecular Sciences

University of New South Wales

May 2013

Originality statement

'I hereby declare that this submission is my own work and to the best of my knowledge it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at UNSW or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by others, with whom I have worked at UNSW or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project's design and conception or in style, presentation and linguistic expression is acknowledged.'

Signed ·· ~············· · ····· ··· ······HH H

Date .... ~/1-lcP.!'f ......

Table of Contents

Acknowledgements ...... i Publications arising from this thesis ...... ii Journal articles ...... ii Conference abstracts ...... ii Abstract ...... iii Abbreviations ...... iv Chapter 1. General introduction ...... 1 1.1. Transcription factors in regulation ...... 1 1.1.1. Transcription factor domain structure ...... 2 1.1.2. Transcription factor families ...... 4 1.1.3. The role of DNA sequence in determining TF occupancy...... 9 1.1.4. The role of chromatin in determining TF occupancy ...... 11 1.1.5. The role of cofactors and cobinding in determining TF occupancy ...... 18 1.2. Krüppel-like factors ...... 23 1.2.1. DNA binding by KLFs ...... 25 1.2.2. KLF3 ...... 28 1.3. Aims of this thesis ...... 31 Chapter 2. Materials and methods ...... 32 2.1. Materials ...... 32 2.1.1. Reagents and kits ...... 32 2.1.2. Enzymes ...... 32 2.1.3. Cell lines ...... 33 2.1.4. Antibodies ...... 33 2.1.5. Oligonucelotides ...... 33 2.1.6. Vectors ...... 33 2.2. Laboratory methods ...... 34 2.2.1. General methods ...... 34 2.2.2. Cell culture ...... 34 2.2.3. Generation of retroviral and expression vectors ...... 34 2.2.4. Retroviral transduction ...... 34 2.2.5. Transient transfections for protein production ...... 35 2.2.6. Nuclear extracts...... 35 2.2.7. SDS-PAGE ...... 35 2.2.8. Western blot ...... 35 2.2.9. Electrophoretic mobility shift assay ...... 36 2.2.10. Immunoflourescence microscopy ...... 36 2.2.11. Chromatin immunopreciptiation ...... 37 2.2.12. High-throughput sequencing ...... 37 2.2.13. Real-time PCR ...... 37 2.2.14. Gene expression microarrays ...... 38 2.3. Bioinformatics methods ...... 38 2.3.1. Alignment ...... 38 2.3.2. Peak calling, peak overlap and genomic annotation...... 38 2.3.3. Quantification of ChIP tags...... 39

2.3.4. Visualization ...... 39 2.3.5. ENCODE data sets ...... 39 2.3.6. Gene expression microarray analysis ...... 40 Chapter 3. Genome-wide profile of KLF3 occupancy...... 41 3.1. Introduction ...... 41 3.2. Experimental model ...... 42 3.3. ChIP-seq ...... 43 3.4. Evaluation of replicates ...... 45 3.5. KLF3 peak characterisation ...... 46 3.5.1. Confirmation of peaks by qPCR ...... 46 3.5.2. Distribution of KLF3 peaks across the genome ...... 49 3.5.3. Peak height and sequence conservation at different genomic regions ...... 50 3.6. Overlap with differentially expression ...... 52 3.7. RNA polymerase II and KLF3 ...... 59 3.8. Discussion ...... 63 Chapter 4. KLF3 DNA binding and chromatin state ...... 68 4.1. Introduction ...... 68 4.1.1. DNA binding by KLF3 ...... 68 4.1.2. KLF3 and chromatin state ...... 69 4.2. KLF3 DNA binding preference in vivo ...... 70 4.3. Validation of the de novo generated KLF3 consensus motif ...... 72 4.4. Distribution of KLF3 motifs within peaks ...... 74 4.5. Search for known motifs of other TFs ...... 76 4.5.1. KLF3 associated motifs in promoter peaks ...... 77 4.5.2. KLF3 associated motifs in intronic and intergenic peaks ...... 79 4.6. KLF3 is bound to nucleosome depleted regions...... 82 4.7. KLF3 co-associates with modified histones ...... 84 4.8. Discussion ...... 91 Chapter 5. Non-DNA binding domains of KLF3 specify chromatin occupancy...... 96 5.1. Introduction ...... 96 5.2. Experimental model ...... 98 5.3. ChIP-seq ...... 100 5.4. Evaluation of replicates ...... 102 5.5. Confirmation of peaks by ChIP-PCR ...... 106 5.6. Qualitative peak comparisons ...... 106 5.7. Comparison of peak distribution...... 110 5.8. Comparison of DNA consensus motifs ...... 112 5.9. Conservation of peaks for bona fide KLF3 target genes ...... 113 5.10. Discussion ...... 114 Chapter 6. General discussion ...... 119 6.1. The biological role of KLF3 ...... 119 6.2. The molecular role of KLF3 ...... 120 6.3. The in vivo specificity of KLF3 ...... 122 References ...... 124 Appendix ...... 138

Acknowledgements

Firstly, I would like to express my appreciation to my supervisor, Merlin Crossley and my co- supervisor Richard Pearson. Both have been instrumental in guiding me through my PhD and in preparing this thesis. I would also like to thank Alister Funnell for the many discussions we have had over both technical and theoretical issues throughout the course of my PhD. Thanks also must go to the team in the Crossley Lab for all the day to day contributions that have collectively enabled this thesis to be completed. I would also like to acknowledge our collaborators including the Mackay-Mathews lab and the Nicholas Lab at the University of Sydney and the Wilkins lab at the University of New South Wales.

Naturally, I would also like to thank my friends and family for their support over the last four years. Most of all however, I give my thanks to Sophie as she has been there through it all.

i

Publications arising from this thesis

Journal articles

Burdach J, Funnell AP, Artuz CM, Mak KS, Tan LT, Pearson RC, Crossley M. (2013) Regions outside the DNA-binding domain are critical for proper in vivo specificity of an archetypal zinc finger transcription factor. Nucleic Acids Research, under revision.

Bell-Anderson KS, Funnell AP, Williams H, Jusoh HM, Scully T, Lim WF, Burdach JG, Mak KS, Knights AJ, Hoy AJ, Nicholas HR, Sainsbury A, Turner N, Pearson RC, Crossley M. Loss of Krüppel-like Factor 3 (KLF3/BKLF) leads to upregulation of the insulin-sensitizing factor adipolin (FAM132A/CTRP12/C1qdc2). Diabetes. Accepted.

Funnell AP, Norton LJ, Mak KS, Burdach J, Artuz CM, Twine NA, Wilkins MR, Power CA, Hung TT, Perdomo J, Koh P, Bell-Anderson KS, Orkin SH, Fraser ST, Perkins AC, Pearson RC, Crossley M. (2012). The CACCC-binding protein KLF3/BKLF represses a subset of KLF1/EKLF target genes and is required for proper erythroid maturation in vivo. Molecular and Cellular Biology, 32:16, 3281-3292.

Sibling Rivalry: the Repressive Transcription Factor KLF3 Keeps the Activator Erythroid KLF on Target. Editorial comment on Funnell et al. 2012: The CACCC-binding protein KLF3/BKLF represses a subset of KLF1/EKLF target genes and is required for proper erythroid maturation in vivo. Molecular and Cellular Biology (2012), 32:16, 3281-3292.

Conference abstracts

Burdach J, Funnell A, Pearson R, Crossley M. (2013) Transcription factor specificity: beyond the DNA-binding domain. In: The 34th Annual Lorne Genome Conference 2013, Lorne, Australia. Oral presentation.

Burdach J, Funnell A, Pearson R, Crossley M. (2012) Transcription factor specificity: beyond the DNA-binding domain. In: The 4th EMBO Meeting 2012, Nice, France. Poster presentation.

Burdach J, Crossley M. (2012) How do transcription factors find their target genes? In: The 33rd Annual Lorne Genome Conference 2012, Lorne, Australia. Poster presentation.

ii

Abstract

Background: Transcription factors (TFs) are major drivers of gene regulatory programs and underpin cellular processes including differentiation and development. TFs are often regarded as being minimally comprised of a DNA-binding domain and a functional domain. The two domains are considered separable and autonomous, with the DNA-binding domain directing the factor to its target genes and the functional domain imparting transcriptional regulation. We examined an archetypal zinc finger (ZF) transcription factor, Krüppel-like factor 3 (KLF3) with an N-terminal domain that binds the co-repressor CtBP, and a DNA-binding domain comprised of three ZFs at its C-terminus.

Methods: A system was established to compare the genomic occupancy profile of wild-type

KLF3 with two mutants affecting the N-terminal functional domain: a mutant unable to contact the cofactor CtBP and a mutant lacking the entire N-terminal domain, but retaining the ZFs intact.

Chromatin immunoprecipitation followed by high-throuput sequencing (CHIP-seq) was used to assess binding across the genome in murine embryonic fibroblasts.

Results: The ChIP-seq experiments revealed that KLF3 occupancy is strongly enriched at gene promoters and that KLF3 peak height is related to degree of nucleosome depletion. De novo motif discovery showed that the KLF3 DNA-consensus motif was highly similar to known motifs for

KLF1 and KLF4. Unexpectedly, we observed that mutations to the N-terminal domain of KLF3 dramatically altered KLF3 occupancy. Deletion of the N-terminal domain or disruption of the

CtBP recruitment motif generally reduced binding, but also enabled retention or even an increase in binding at certain loci.

Conclusion: These results provide a clear demonstration that the correct localisation of transcription factors to their target genes is not solely dependent on their DNA-contact domains.

This informs our understanding of how transcription factors operate and is of relevance to the design of artificial ZF .

iii

Abbreviations

3T3-L1 3-day transfer, inoculum 3 x 105 cells – L1

5mC 5-methyl cytosine

Ac Acetylation

ANOVA Analysis of variance

Anxa2 Annexin A2

AP Alkaline phosphatase

AP-1 Activator protein 1

AR Androgen receptor

Birc6 Baculoviral IAP repeat-containing 6 bp

Cbf1 Centromere binding factor 1

CDS Coding DNA sequence

CGI CpG island

ChIP-chip Chromatin immunoprecipitation followed by microarray

ChIP-PCR Chromatin immunoprecipitation followed by polymerase chain reaction

ChIP-seq Chromatin immunoprecipitation followed by high throughput sequencing

ChRC Chromatin remodelling complex

CoREST Corepressor of REST

COS CV-1 (simian) in origin, and carrying the SV40 genetic material

iv

CpG Cytosine-phosphate-guanine dinucleotide

CtBP C-terminal binding protein

DAPI 4',6-diamidino-2-phenylindole

DBD DNA-binding domain

ΔDL Substitution of aspartic acid and leucine

DMEM Dulbecco’s modified eagle medium

DNA Deoxyribonucleic acid

DNase-seq DNase I hypersensitive site sequencing

ECL Electrochemiluminescence

EDTA Ethylenediaminetetraacetic acid

EMSA Electrophoretic mobility shift assay

ENCODE Encyclopedia of DNA elements

Epgn Epithelial mitogen

ERG ETS related gene

ES Embryonic stem

ETS E-twenty six

Exd Extradenticle

Fam132a Family with sequence similarity 132, member A

FCS Foetal calf serum

FDR False-discovery rate

v

Fez2 Fasciculation and elongation protein zeta 2 (zygin II)

FHL3 Four and a half LIM domains 3

FITC Fluorescein isothiocyanate

FOG-1 Friend of GATA-1

G1ME GATA1-null megakaryocytic cells

GATA-1 GATA binding factor 1

Gb Gigabase

GLP/Eu-HMT-1 Euchromatic Histone Methyltransferase 1

Grin1 Glutamate receptor, ionotropic, N-methyl D-aspartate 1

GTF General transcription factor

H2A Histone 2A

H2A.Z Histone 2A variant Z

H2B Histone 2B

H3 Histone 3

H3K27ac Histone 3 lysine 27 acetylation

H3K4me1 Histone 3 lysine 4 monomethyl

H3K4me3 Histone 3 lysine 4 trimethylation

H3K4me3 Histone 3 lysine 4 trimethyl

H4 Histone 4

HAT Histone acetyltransferase

vi

HDAC Histone deacetylase

HDAC1/2 Histone deacetylase 1/2

HLSD Histone lysine specific demethylase

HMT Histone methyltransferases

HOMER Hypergeometric Optimization of Motif EnRichment

HOX Homeobox

HRP Horseradish peroxidase

HS Hypersensitive

IgG Immunglobulin type G

IN Input

IP Immunoprecipitant iPS Induced pluripotent stem

Kb Kilobase

KLF Krüppel-like factor

KLF3 Krüppel-like factor 3

KO Knock-out

Lgals3 Galectin 3

Lmna Lamin A

Map3k6 Mitogen-activated protein kinase kinase kinase 6

Mb Megabase

vii

MEF Murine embryonic fibroblast

MEME Multiple EM for motif elicitation

Met4 Methionine requiring 4

Met8 Methionine requiring 8

Mgst3 Glutathione-S-transferase 3

Mir3109 MicroRNA 3109

NAD+ Nicotinamide adenine dinucleotide (oxidised)

NADH Nicotinamide adenine dinucleotide (reduced)

NCBI National centre for biotechnology information

NDR Nucleosome depleted region

NFY Nuclear transcription factor Y

NRF1 Nuclear respiratory factor 1

PBS Phosphate buffered saline

PCR Polymerase chain reaction

Pqlc3 PQ loop repeat containing 3

PSG Penicillin Streptomycin Glutamate

PTM Post-translational modification

Rc3h1 RING CCCH (C3H) domains 1

RMA Robust multi-array average

RNA Ribonucleic acid

viii

RNAP II RNA-polymerase II

RT Room temperature

RUNX Runt-related transcription factor

SCL/TAL1 T-cell acute lymphocytic leukemia protein 1

SDS-PAGE Sodium dodecyl sulfate polyacrylamide gel electrophoresis

SELEX Systematic evolution of ligands by exponential enrichment

Sin3A SIN3 transcription regulator homolog A

SP Specificity Protein

SPDEF SAM pointed domain containing ETS transcription factor

SPI1 Spleen focus forming virus (SFFV) proviral integration oncogene

Stard4 StAR-related lipid transfer protein 4

SV5 Simian virus 5

SWI/SNF Switch/sucrose non-fermentable

TBST Tris-Buffered Saline, tween-20

TEF Transcriptional enhancer factor

TF Transcription factor

Thada Thyroid adenoma associated

TSS Transcription start site

TTS Transcription termination site

UTR Untranslated region

ix

UV Ultraviolet

VCaP Vertebral-cancer of the prostate

VP16 Virus encoded protein 16

WT Wild-type

ZF Zinc finger

ZIF268 Zinc finger 268

x

Chapter 1. General introduction

This thesis examines the issue of how transcription factors achieve their specific chromatin occupancy throughout the genome. In order to establish the context around this issue, the introduction is split into two sections. The first section deals principally with how transcription factors function, both as members of larger transcription factor families and as individual regulators. In this section, particular focus is given to the known mechanisms by which TFs achieve their specific in vivo occupancy and areas are highlighted where these mechanisms are still poorly understood. The second section of the introduction focuses on one particular family of TFs, the Krüppel-like factors (KLFs). KLFs are archetypal zinc finger (ZF) proteins that provide an excellent platform for investigating the specific chromatin occupancy of TFs. Relevant information regarding the DNA-binding specificity within the family is discussed. Finally, these concepts are drawn together around the TF KLF3 as means to investigate how typical ZF transcription factors achieve their distinct functions in vivo.

1.1. Transcription factors in gene regulation

TFs are proteins which can regulate the process of transcribing a DNA template into RNA by the action of RNA polymerases (Latchman 1997; Brivanlou and Darnell 2002). These proteins promote or repress transcription by affecting the recruitment and initiation of RNA polymerases.

TFs achieve this regulation by binding to particular DNA sequences and interacting either directly, or indirectly via other TFs or cofactors with RNA polymerases (Roeder 1996; Nikolov and Burley 1997; Lee and Young 2000). TFs typically act together in large complexes to bring about their changes to gene expression, thereby creating a diverse regulatory system capable of achieving the wide ranging effects required to enable proper cellular functions. They can act over both short distances, by binding to the proximal promoter of target genes, and can also act over long distances by binding to distal regulatory elements such as enhancers, silencers or insulators

(Lee and Young 2000). There are no clear rules to define where these distal elements may lie in

1

relation to the gene they regulate. Enhancers for instance can be anywhere from 1 kb to up to 1

Mb away from their target genes and can lie either upstream or downstream of their related gene

(Blackwood and Kadonaga 1998; Lettice et al. 2003; Schoenfelder et al. 2010).

TFs are amongst the most powerful regulators in the cell and underpin a large portion of all cellular processes. The process of transcription was recognised early on to be one of the principal forces in the transfer of cellular information. This concept was posited by Francis Crick as the

“central dogma of molecular biology” where information content tended to flow from DNA to

RNA to protein (Crick 1958). Although the central dogma failed to capture some more subtle or complex regulatory processes such as post-translational modification of proteins and DNA methylation, it remains largely accurate and highlights the central nature of transcription in cellular processes.

As agents that regulate this process, TFs are therefore central to the development and function of living cells. The most striking evidence for their central role has been the advent of cellular reprogramming where the forced expression of limited numbers of particular transcription factors has enabled the conversion of cells from one type to another. This approach has been used to successfully convert mouse fibroblasts to induced pluripotent stem (iPS) cells (Takahashi and

Yamanaka 2006), murine embryonic fibroblasts (MEFs) to cardiomyocytes (Efe et al. 2011), human dermal fibroblasts to multipotent blood progenitors (Szabo et al. 2010) and mouse fibroblasts to neuronal progenitors (Kim et al. 2011).

1.1.1. Transcription factor domain structure

Transcription factors are generally considered to be minimally composed of at least two domains; a DNA-binding domain and a functional domain. Cofactors, chromatin modifying enzymes and other similar regulatory proteins are thus differentiated from transcription factors on the basis that they lack a DNA-binding domain (Brivanlou and Darnell 2002). Under this model the DNA- binding domain guides the TF to appropriate cis-regulatory elements in genomic DNA, whilst the functional domain influences gene expression in trans, either by directly contacting RNA

2

polymerase or through the recruitment of cofactors and other complexes. The modularity and independence of these domains is exemplified by the development of the yeast two-hybrid system

(Fields and Song 1989), where the yeast TF Gal4, is split into separate DNA-binding and activation domains. These are fused to putative interacting proteins termed “bait and prey” and are used in a reporter assay to test for interactions between the bait and prey (Figure 1.1A). The activation of the reporter gene occurs when the modular and separated Gal4 domains are reunited to recreate a functional TF capable of activating gene transcription.

Further evidence for the modular nature of these domains comes from the ability to create artificial transcription factors by coupling designer ZF DNA-binding domains that target a sequence of interest with functional domains such as the VP16 activation domain (Figure 1.1B) (Mandell and

Barbas 2006b; Papworth et al. 2006; Klug 2010). The creation of such proteins shows that these domains are capable of acting independently and are by their essence, modular and autonomous.

3

Figure 1.1 Engineered proteins demonstrate the modular nature of transcription factor domains. (A) The yeast- 2-hybrid assay involves the reunification of the DNA-binding domain and activation domain of the yeast transcription factor Gal4 to detect interactions between bait and prey (Fields and Song 1989). (B) Artificial transcription factors can be created by fusing a ZF domain designed to target a particular DNA sequence, to a functional domain such as the VP16 activation domain (Mandell and Barbas 2006b; Papworth et al. 2006; Klug 2010).

1.1.2. Transcription factor families

The has around 2,000 - 3,000 transcription factors meaning that they are the largest functional family of proteins and constitute around 10% of all protein coding genes

(Brivanlou and Darnell 2002; Vaquerizas et al. 2009). The number of TFs generally increases with organism complexity with approximately 300 transcription factors in yeast, approximately

1000 transcription factors in Drosophila and C. elegans and approximately 2000 – 3000 in humans (Levine and Tjian 2003; Rodriguez-Caso et al. 2005). In lower eukaryotes such as yeast, there is approximately one TF for every 20 protein genes, whereas in humans this ratio changes to approximately one TF for 10 protein coding genes (Levine and Tjian 2003). Thus TFs (along with the entire gene regulatory system) have been central in the evolutionary processes that have shaped higher organisms.

The general expansion in the number of TFs that has occurred with increasing organism complexity is largely attributable to gene duplication events (Babu et al. 2004). This process has

4

resulted in the development and expansion of related TF classes that are typically grouped based on the presence of a common DNA-binding motif (Figure 1.2). The most prominent of these motifs in humans are the ZF motif, the homeobox motif and the helix-loop-helix motif

(Vaquerizas et al. 2009). Within these classes are closely related families of TFs where the members have diverged more recently in evolutionary time. Whereas classes of TFs are loosely grouped based on the presence of a certain motif, TF families usually carry the same DNA binding domain and typically exhibit highly similar DNA-binding preferences. The fact that large groups of TFs carry the same DNA-binding preferences is interesting as analysis of such families may provide insights into whether these DNA-binding preferences alone can determine TF occupancy in vivo.

Figure 1.2 Vaquerizas et al show the classes and prevalence of various DNA-binding motifs amongst sequence specific human transcription factors. (Vaquerizas et al. 2009).

5

There are numerous TF families in mammals, but one that has been studied extensively is the ETS family. 28 ETS factors have been reported in humans to date (Wei et al. 2010; Hollenhorst et al.

2011) and these proteins exhibit extremely diverse functions ranging from neuronal development, angiogenesis, hematopoiesis, cellular proliferation, differentiation, migration and oncogenesis

(Bartel et al. 2000; Sharrocks 2001; Schober et al. 2005; Vrieseling and Arber 2006; Kumar-Sinha et al. 2008). Proteins of the ETS family are all linked by a common DNA-binding domain called the ETS domain, which is a structural variant of the winged helix-turn-helix motif (Sharrocks

2001). The common ETS DNA-binding domain shared amongst these proteins means that they bind to highly similar DNA motifs in vitro, yet they are capable of achieving diverse functions

(Wei et al. 2010). Whether this is through differential occupancy or other means is discussed below.

The DNA-binding preferences of 27 human and 26 mouse ETS proteins have been thoroughly investigated by one group of authors using a combination of high-throughput microwell-based

TF DNA-binding specificity assays and protein binding microarrays (Wei et al. 2010). These data revealed very minor divergences in the DNA binding preferences of some ETS factors, but nevertheless enabled them to be further classified into four classes (Figure 1.3A). Although classes could be defined on varying DNA sequence preference, the motif divergences were rather small and where a nucleotide was preferred in one class it was still generally tolerated by the consensus motif of another class (Figure 1.3A).

Having established that the in vitro DNA-binding preferences of the ETS factors were highly similar, Wei et al. then went on to perform ChIP-seq on three of the class I ETS factors and one each of classes II, III and IV (Figure 1.3B). They then analysed the in vivo binding sites to determine which peaks showed similar differential enrichment of class I-IV motifs. Although there was some enrichment for class I motifs in class I ETS factor ChIP peaks, there were still many peaks that exhibited enrichment of class II motifs, indicating that DNA sequence differences between class I and II motifs do not seem to be the only constraint on ETS chromatin

6

occupancy. This trend continued across the other ETS proteins tested, with the exception of SPI1 and SPDEF which uniquely showed enrichment of class III and class IV motifs respectively. In these cases, it would appear that DNA-sequence preference provides much greater specificity.

These findings clearly show that there is little divergence in sequence preference in vivo amongst the bulk of ETS factors (especially classes I and II; 26 proteins in total) and that ETS factors from a particular class are still able to bind to motifs that are characteristic of other classes.

Next, the overlap in chromatin binding sites across the various ETS factors was tested (Figure

1.3C). Different factors in the same cell type and the same factors across different cell types were analysed. Related cell types generally showed a greater degree of overlap in ETS factor binding, however the majority of peaks were specific to individual cell-line and factor combinations. The degree of overlap ranged from 0.9% to 25.8% across the various ETS factors and cell lines tested

(Figure 1.3C). Indeed the control ChIP-seq on the non-ETS factor androgen receptor (AR), displayed much greater overlap with the ETS factor ERG (44.4%), than any of the ETS factors did to each other. These results suggests that while there is some degree of overlap between ETS factors, the majority of their binding sites do not overlap suggesting that DNA-sequences preferences (while important) are not the sole constraint on occupancy.

On the other hand, there are certainly instances where TFs within a family can show highly similar chromatin occupancy profiles. The p53 homologues p63 and p73 share around 85% amino acid conservation in their DNA-binding domains but have divergent functions; p63-/- mice show defects in epithelial development and DNA-damage responses (Mills et al. 1999; Suh et al. 2006) whilst p73-/- mice show defects in neurogenesis, inflammation and osteoblastic differentiation

(Yang et al. 2000; Kommagani et al. 2010). Despite their functional differences, these proteins exhibit close to identical binding profiles in ME180 (human cervical carcinoma) cells (Yang et al. 2010). The ratio of p63:p73 at these binding sites is consistent across the genome suggesting that their differential functions are not attributable to differential occupancy; rather they are likely to be attributable to cell-specific expression or differential recruitment of cofactors.

7

Figure 1.3 Wei et al analyse DNA consensus sequence preferences and genome-wide occupancy within the ETS family using both in vivo and in vitro assays. (A) Consensus DNA-binding motifs enable classifcation of ETS factors into four classes (I-IV). Motifs were determined using high-throughput microwell-based TF DNA-binding specificity assays. (B) Enrichment of class I-IV consensus motifs found within ChIP-seq peaks of different classes of ETS factors. Bars are sorted into classes based on colour as indicated. (C) Overlap of genomic occupancy of various ETS factors and androgen receptor (AR) control as measured by ChIP-seq in a range of different cell lines. All overlaps were significantly enriched (P<0.001), with exception of ELF1 versus AR (in gray, significance P=0.0013). Note that the highest observed overlap (red bold typeface) in all experiments is between AR and ERG. Other overlaps over 10% are coloured in brown. Panels A-C were all sourced or adapted from (Wei et al. 2010)

Although some TFs within families can share overlapping occupancy profiles, it seems that many

TFs within families are able to achieve different patterns of chromatin occupancy whilst exhibiting a near identical DNA-sequence preference. As such these observations point to a

8

mechanism of determining TF occupancy that lies beyond the sequence preference of the DNA- binding domains of these factors. These issues are further explored in the following sections.

1.1.3. The role of DNA sequence in determining TF occupancy

The issue of whether TFs can achieve distinct occupancy profiles within families has been partially addressed above; however there is a larger question that could provide important insight.

How is it that TFs achieve specificity at all? One of the initial surprises with the advent of ChIP- chip was how few potential TF binding sites seemed to be occupied. The authors of an early ChIP- chip paper on the erythroid transcription factor GATA-1 noted that there was no correlation between predicted GATA motif density and observed GATA-1 binding sites in the human β- globin (Horak et al. 2002). With the development of ChIP-seq technologies these observations have been extended genome-wide where experiments have revealed that less than

1% of potential GATA sites are actually bound by GATA-1 in erythroid cells (Fujiwara et al.

2009). To better understand the issues behind the low level of occupancy exhibited by TFs in vivo, it is useful to consider how prokaryotic and eukaryotic TFs differ in the lengths of their binding site motifs and DNA targeting mechanisms.

Prokaryotic TFs exhibit a mean DNA consensus motif length of 15 nucleotides (Stewart and

Plotkin 2012) (Figure 1.4A). Given the relatively small size of bacterial genomes (4.6-5.9 Mb in

Escherichia coli (Lukjancenko et al. 2010)), motifs of this length are generally sufficiently unique to enable highly specific binding. For instance, the prokaryotic TF LexA binds to a motif 21 nucleotides in length and exhibits a high level of restricted occupancy (Figure 1.4C) (Wade et al.

2005). LexA is involved in the SOS system in E. coli which is responsible for controlling the cellular response to DNA damage (Little 1993). Remarkably, LexA exhibits an almost universal occupancy of every predicted LexA binding site in the E. coli genome (Wade et al. 2005). This close to perfect relationship between DNA sequence and occupancy is so robust that if artificial

LexA binding sequences are introduced into the E. coli genome, they too become occupied, regardless of genomic location (Wade et al. 2005). It is clear that genomic DNA in E. coli is

9

highly permissive to binding by TFs regardless of genomic region and that DNA-sequence preferences alone are sufficient to direct LexA to its binding sites.

Such a high degree of correlation between the presence of particular DNA sequence and observed transcription factor binding does not carry over to eukaryotes. Typical eukaryotic motifs are shorter with a mean length of less than 10 nucleotides (Figure 1.4B) and typically show widespread degeneracy in their motif sequences (Stewart and Plotkin 2012; Weirauch et al. 2013).

Given the much larger relative size of the human genome at 3.2 Gb (Morton 1991), it would be expected that longer motifs would be required to define a highly specific set of sites. Thus it seems that evolution has not favoured increasing the length of TF DNA-consensus motifs to maintain specificity, but has rather developed alternative mechanisms to target TFs within a dramatically larger genome.

Figure 1.4 Differences in DNA consensus motifs between prokaryotes and eukaryotes. Stewart and Plokington show that motif lengths differ between prokaryotes and eukaryotes. (A) Histogram of 454 curated eukaryotic transcription factors motifs. (B) Histogram of 79 curated prokaryotic transcription factors motifs. (Stewart and Plotkin 2012) (C) Wade et al. derive the LexA consensus DNA-binding motif from ChIP-chip data (Wade et al. 2005).

10

The observation that eukaryotic TF motifs carry insufficient information content to specify highly restricted binding sites is perhaps best evidenced by the difficulties in trying to predict such binding sites de novo. As mentioned earlier, prediction of prokaryotic TF binding is rather straight-forward as most predicted sites are bound in vivo (Wade et al. 2005). In eukaryotes however, the poor correlation between predicted and observed TF occupancy has led one group of authors to describe this relationship as the “Futility Theorem”. Their observations are essentially that the use of a binding model (based on the consensus sequence) for an individual eukaryotic TF will produce so many false positives that this approach will have almost no useful application (Wasserman and Sandelin 2004).

1.1.4. The role of chromatin in determining TF occupancy

So how is it that eukaryotic TFs can distinguish their ‘genuine’ occupied sites from other potential sites? The obvious difference between prokaryotic and eukaryotic genomic DNA is the absence of chromatin in prokaryotes. In eukaryotes, DNA is packed into nucleosomes comprised of a

145-147 bp stretch of dsDNA wrapped around a histone octamer (Luger et al. 1997). The histone octamer generally consists of two molecules of each of the histone proteins H2A, H2B, H3 and

H4 (Luger et al. 1997). Typically, nucleosomal DNA is much less permissive to interactions with

DNA binding proteins than stretches of DNA which are not bound by histones (Struhl and Segal

2013). Thus, it could be reasonably suggested that the presence of nucleosomes might dramatically reduce the number of potential TF binding sites available. Under these assumptions, only the subset of sites that are marked by open chromatin and nucleosome depletion could be bound by TFs. Such a notion would appear to fit rather neatly with other observations, however it is notable that nucleosome position is at least partially determined by the binding of TFs. In order to understand the interplay between nucleosome position and the occupancy of DNA by

TFs, it will be useful to explain the current state of knowledge around how nucleosome positioning is established in the cell.

11

1.1.4.1. Nucleosome position

Nucleosome position is dynamic in vivo and can be modified by various distinct mechanisms at particular genomic regions. Here we will only focus on promoters and enhancers as these regions are of primary relevance to understanding the binding of sequence specific TFs. The mechanisms that determine chromatin state across other parts of the genome are complex and are beyond the scope of this thesis. Active promoters and enhancers are typically characterised by nucleosome depleted regions (NDRs) and contain binding sites for multiple TFs. In the following paragraphs, we will cover some of the known mechanisms by which nucleosome position is determined at these enhancer and promoter NDRs.

For some time it has been known that DNA sequence can affect the likelihood of nucleosomes spontaneously forming on genomic DNA (Trifonov and Sussman 1980; Drew and Travers 1985;

Shrader and Crothers 1989). The ability for a stretch of dsDNA to wrap around a histone core and form a nucleosome is known to be impacted by the flexibility of certain DNA sequences. The optimal configuration for spontaneous nucleosome formation is the presence of a flexible A-T repeat every 10 bp in the DNA that wraps around the histone core (Struhl and Segal 2013). In addition, tracts of poly(dA:dT) or poly(dC:dG) are particularly inflexible and can inhibit nucleosome formation (Nelson et al. 1987; Suter et al. 2000). Poly(dA:dT) tracts are abundant in eukaryotes, especially at gene promoters and may account for the fact that yeast promoter regions and transcription termination sites are far less conducive to spontaneous nucleosomal occupancy in vitro than are gene bodies (Suter et al. 2000; Zhang et al. 2009). The relationship between DNA sequence and nucleosome occupancy is only part of the narrative however, as genes that are involved in stress responses in yeast, notably have fewer such poly(dA:dT) tracts, and other mechanisms discussed below are likely to be the primary force in defining nucleosome positions in these regions (Field et al. 2008; Tirosh and Barkai 2008). Support for the minor role for DNA sequence in directly determining nucleosome position comes from the experimental determination that sequence accounts for only 20% of all nucleosome positions when comparing spontaneous positioning in vitro and observed positions in vivo (Zhang et al. 2010).

12

In addition to the contribution of sequence to the position of nucleosomes, dynamic processes also exist to affect nucleosome position in promoters and enhancers. In different cell types, promoter nucleosome occupancy patterns differ dynamically in association with gene expression profiles (Bargaje et al. 2012). The fact that these NDRs can change dynamically suggests that local sequence cannot be the sole determinant of nucleosome density at these locations, and that other mechanisms must contribute. The dynamic modulation of NDRs at promoters is brought about by three primary mechanisms; the direct binding of TFs, the recruitment of chromatin remodelling complexes and the post-translational modification of histones.

Recent work in yeast has revealed that transcription factors are able to displace nucleosomes from particular regions by simply competing with histones for DNA (Charoensawan et al. 2012).

Transcription factors in yeast tend to bind to the same groups of sequences that are favoured for the spontaneous formation of nucleosomal DNA. Moreover, activating factors are more likely to disrupt nucleosome position via this mechanism than repressors. Repressors tend to exhibit sequence preferences for regions that lie between nucleosomes (linker regions) rather than for histone-bound sequences (Charoensawan et al. 2012). More granular experiments have revealed that the binding of sequence-specific TFs are critical for the generation of the NDR upstream of a particular gene (CLN2) in yeast (Bai et al. 2011). At this locus, intrinsic sequence preference does not establish the NDR, rather eight transcription factors that act redundantly and simultaneously are responsible for the formation of the NDR (Bai et al. 2011).

How TFs might disrupt the relatively stable nucleosomal complex to bind and outcompete histones seems also to have been elucidated. Studies of histone:DNA interactions have found that nucleosomes can “breathe” periodically with nucleosomal DNA dissociating from histones 10-

50ms out of every 260-300ms before reforming (Li et al. 2005). This property of spontaneous nucleosome dissolution and reformation may facilitate the competitive binding by TFs and displacement of histones.

13

In addition to the direct role of TFs in establishing NDRs, indirect effects via cofactors are also of importance. Chromatin remodelling complexes can shuffle nucleosomes to either create NDRs or nucleosome enriched regions (reviewed in (Clapier and Cairns 2009)). Again TFs play a primary role in controlling this process as these generic chromatin remodelling complexes are recruited to particular loci by sequence specific TFs (Peterson and Logie 2000; Peterson and

Workman 2000; Ko et al. 2008).

Perhaps the most extensively studied of the chromatin remodelling complexes is the SWI/SNF complex. This complex is conserved from yeast to humans and shows a high level of diversity in the composition of its subunits facilitating differential regulatory effects dependent on gene context, (Vignali et al. 2000; Gangaraju and Bartholomew 2007). In particular contexts,

SWI/SNF can act as either a co-activator or co-repressor depending on the precise subunit composition (Sudarsanam and Winston 2000; Zhang et al. 2007). The SWI/SNF complex is capable of interacting with a large number of sequence specific TFs which enable the recruitment of the complex to specific loci (Peterson and Logie 2000; Peterson and Workman 2000; Ko et al.

2008). After being recruited, SWI/SNF remodels chromatin structure by either shifting nucleosomes along the DNA or twisting or bending DNA to modulate the nucleosome structure in an ATP-dependent manner (Sudarsanam and Winston 2000; Saha et al. 2002; Lorch et al.

2005). As such it can either positively or negatively affect the subsequent binding of TFs by altering nucleosome density.

Nucleosomes are removed from promoters to create NDRs but they can also be reconstituted with different histone variants to create nucleosome diversity with functional significance. In particular, histone variants have a known role in establishing chromatin structure around the TSS

(Guillemette and Gaudreau 2006). It has been shown in mammalian cells that the NDR in promoters of active genes is flanked on both sides by histone H2A.Z–containing nucleosomes

(Barski et al. 2007; Kelly et al. 2010). In addition, more recent work has shown that the NDR is not necessarily “nucleosome-free” but rather incorporates labile double variant nucleosomes

14

which include histone H3.3–H2A.Z (Jin et al. 2009). It is thought that these unstable nucleosomes play a role in promoting transcription factor binding but a full understanding of their interaction with DNA-binding transcription factors remains unknown at the present time.

1.1.4.2. Histone post-translational modifications

Histone post-translational modifications (PTMs) can also affect the affinity of nucleosomes and

TFs for DNA within promoters and enhancers. A huge diversity of histone PTMs have been reported with well over 100 unique modifications discovered to date (Zentner and Henikoff 2013).

Again, we will primarily focus on promoters and enhancers to better understand how histone

PTMs in these regions affect TF binding. At promoters or enhancers, histone PTMs are often associated with either permissive or closed chromatin states and PTMs are known to affect the binding of TFs to these regions. For instance, acetylation of lysine residues is known to affect the DNA binding affinity of histones by cumulatively neutralising the charge of the histone tails

(Allfrey et al. 1964; Hong et al. 1993; Dion et al. 2005). Thus, when histones become acetylated, they typically adopt a looser configuration with DNA allowing TFs to bind more easily to such regions (Lee et al. 1993; Vettese-Dadey et al. 1996). Acetylation may also destabilise histone:DNA interactions allowing chromatin remodelling complexes to shuffle nucleosomes more readily (Chatterjee et al. 2011).

The major issue when examining the role of histone modifications in transcriptional regulation is one of causality. The majority of histone PTMs at promoters and enhancers are brought about by the recruitment of generic histone modifying enzymes by sequence-specific TFs (Brown et al.

2000). For instance, activating TFs typically recruit histone acetyltransferases which result in the acetylation of lysine residues. As such, sequence-specific binding by TFs is the key specifying event for gene activation (Bell et al. 2011) initiating a combination of secondary mechanisms including nucleosome shuffling and eviction and post-translational modification of histone proteins (Figure 1.5). In a similar manner, repressor TFs are also central to the induction and

15

maintenance of repressive chromatin states by recruiting cofactors to modify histones and compact chromatin (Hathaway et al. 2012).

1. Becoming active

...... • . . .••• •• ...... · ...... ·~tY .--I-+~ TSS ~@@@@@@@@@@@@@~

2. Low expression

Q) E ~

3. High expression

' Ac /'

Figure 1.5 Mechanisms of nucleosome remodelling at promoters or enhancers by activating transcription factors. (1) When gene expression is low, nucleosomes tend to be more dense and chromatin is less permissive to DNA-binding proteins. TFs can still bind to nucleosomal DNA in these contexts and can recruit cofactors such as histone acetyltransferases and chromatin remodelling complexes to assist in forming nucleosome depleted regions. (2) TF binding, acetylation of histones and shuffling by chromatin remodelling complexes begins to establish a nucleosome depleted region around the transcription factor binding site. This leads to increased occupancy by the transcription factor and greater gene activation. (3) The successive binding events of activating TFs and their co-factors have heavily

16

depleted the binding site of nucleosomes resulting in greater occupancy and even higher levels of gene expression. Note that the additional effects of TF-binding on DNA methylation are not displayed here.

Abbreviations: HAT=histone acetyltransferase, ChRC=chromatin remodelling complex, TF(+)=activating transcription factor, GTFs=general transcription factors, Ac=acetylation of histone lysine residues, RNAP=RNA polymerase, TSS=transcription start site.

1.1.4.3. DNA methylation

DNA methylation can also affect chromatin state and the accessibility of DNA to TFs. The major modification in higher organisms is the attachment of a methyl group at the 5 position to cytosine residues (5mC). This modification mostly occurs at CpG dinucleotides and around 60-80% of these CpG dinucleotides show evidence of methylation in humans (Smith and Meissner 2013).

Multiple CpG dinucleotides clustered together are termed CpG islands (CGIs) and these CGIs contain around 10% of the total CpGs in the human genome (Smith and Meissner 2013). Again the sole focus here will be on mechanisms of methylation at promoters and enhancers. Around

60% of human genes have CGIs in their promoters (Jones 2012) but most of these are constitutively hypomethylated (Smith and Meissner 2013). The limited number of promoters that do show CpG methylation are generally associated with genes that show long-term and stable repressed states such as imprinted genes (Jones 2012). The general hypomethylation at most promoters is thought to occur by multiple mechanisms that exclude DNA-methyltransferases from these regions. The histone variant H2A.Z and the histone mark H3K4me3 are strongly associated with the nucleosomes that border NDRs in promoters and these two nucleosome variations are thought to block de novo CpG methylation (Ooi et al. 2007; Conerly et al. 2010).

TFs also appear to block methylation at promoters as a number of studies have shown that the deletion of TF binding sites from promoters leads to increased levels of methylation (Brandeis et al. 1994; Macleod et al. 1994). Moreover artificial insertion of TF binding sites has been shown to lead to hypomethylation at these regions (Macleod et al. 1994). Similar trends are observed at enhancers where low levels of methylation are also observed. These low levels of methylation are known to be established and maintained by the binding of sequence specific TFs (Stadler et al. 2011).

17

The observation that TFs play a major role in determining nucleosome position, histone PTMs and methylation patterns at promoters and enhancers would appear to challenge the view that chromatin state is the sole determinant of available sites for TF binding. Rather a more complex interplay is likely at work, where a combination of incumbent chromatin state and the ability of

TFs to alter chromatin state work against each together to determine TF occupancy. If chromatin accessibility does not always restrict potential transcription factor DNA-binding sites, then the short degenerate motifs of eukaryotic TFs are of insufficient complexity alone to specify observed in vivo occupancy. In the next section, we will examine the potential additional sources of specificity that might enable TFs to achieve such a highly restricted occupancy profile in vivo.

1.1.5. The role of cofactors and cobinding in determining TF occupancy

Eukaryotic genes are well known to be combinatorially regulated by multiple transcription factors in a developmental and cell-type specific manner (Maston et al. 2006). The ENCODE project has recently found widespread evidence of both TF cobinding (where two factors bind nearby to each other on the DNA strand, often in the presence of additional cofactors) and TF tethering (where one TF is recruited to DNA indirectly by a protein:protein interaction with another TF) (Wang et al. 2012). In 70 out of 87 (80.4%) of sequence specific TFs investigated, secondary motifs were found within TF peaks (indicating either cobinding or tethering). From these data, 151 potential tethering pairs were inferred based on both measured occupancy of a TF and the lack of a consensus motif for that factor at that particular site. 104 cobinding pairs were identified from these same data based on the dual occupancy and motif enrichment of two TFs at the same genomic location (Wang et al. 2012). These results were by no means unexpected as the tendency of multiple TFs to act together at cis-regulatory modules is well documented. However, these recent results show the very high prevalence of inter-factor relations.

The widespread nature of TF cobinding may be a major contributor to TF specificity as interactions between various TFs may bolster specificity by increasing the number of nucleotides contacted (Figure 1.6). Whereas a single eukaryotic TF may bind to a somewhat degenerate motif

18

of around 10 nucleotides in length, multiple interacting TFs would show a dramatically increased level of specificity, both by the additional sequence preference of the bridged TF, and also by the spacing between the motifs.

Figure 1.6 Interactions between TFs may assist in improving specificity and stability at regulatory elements. (A) DNA-binding sequence specificity by a single eukaryotic TF is restricted by the short length and the degenerate nature of DNA-consensus motifs. (B) Interaction and cobinding by two TFs depends on a more specific combined DNA- consensus motif and increases DNA binding affinity by providing multiple contact surfaces. (C) Cofactors can also allow cobinding through the formation of bridges.

There are a number of lines of evidence that support this idea. Earlier the concept of the futility theorem was introduced, suggesting that predicting TF binding sites based on the consensus motif of a single factor would generate so many false positives as to be of little use (Wasserman and

Sandelin 2004). A more recent approach to predicting eukaryotic TF binding sites may give insights into how TFs might achieve specificity by binding combinatorially. This approach makes use of the assumption that TFs often bind within close proximity to each other in cis-regulatory modules. If the consensus motifs of multiple TFs, particular those that have known

19

protein:protein interactions are found to be co-localised in a stretch of DNA, then a site will carry a much better predictive value (Mysickova and Vingron 2012). By looking for clusters of potential

TF binding sites, such predictive models become much more useful.

Experimental evidence also supports the importance of cobinding as being a major determinant of TF occupancy. Some TFs are able to bind to a subset of their genomic targets in the absence of an intact DNA binding domain. A DNA-binding domain mutant of the haematopoietic TF,

SCL/TAL1 has been shown to rescue a KO phenotype in haematopoietic cells. ChIP-seq studies revealed that the DBD mutant could still occupy around 20% of the binding sites bound by the wild-type protein (Porcher et al. 1999; Kassouf et al. 2008; Kassouf et al. 2010). This would imply that another interaction must be involved in either recruiting or stabilising SCL/TAL1 at particular loci.

Perhaps a more elegant way of investigating how binding partners might affect TF specificity involves the mutation of domains that recruit these partners. The genome-wide profiles of both wild-type GATA-1 and a naturally occurring GATA-1 mutant (GATA-1V205G), that carries a non- functional binding domain for the co-factor Friend Of GATA-1 (FOG-1) were compared in

G1ME cells (a Gata1-/- megakaryocyte-erythroid progenitor cell line) (Chlon et al. 2012). GATA-

1 occupancy was shown to be dependent on its interaction with the co-factor FOG-1 as the GATA-

1V205G mutant showed a differential chromatin binding pattern to the wild-type protein. In addition, cells expressing the GATA-1V205G mutant showed a marked deficiency in megakaryocyte differentiation, a hallmark of GATA-1 expression in these cells. Chlon et al. went on to establish that in the absence of FOG-1, GATA-1 occupies mast cell specific genes and forced expression of FOG-1 can displace GATA-1 from these targets. Again these results imply that cofactors are capable of tuning TF specificity.

Cofactors are also known to influence the specificity of Hox proteins. The 8 known Hox proteins are patterning TFs that control the body plan of the embryo and the segment structures that form along the anterior-posterior axis (Pearson et al. 2005). All 8 of these proteins are characterised

20

by a common DNA-binding domain and show similar DNA-binding preferences in vitro. As discussed earlier, this presents a challenge in that these proteins are known to regulate separate genes and their DNA binding domains do not seem capable of conferring this specificity.

However in the case of the Hox family, the mechanism by which these proteins achieve specificity may have come to light. The shared Hox cofactor extra-denticle (Exd), enables the 8 Hox proteins to recognise different sequences (Lelli et al. 2011; Slattery et al. 2011). When bound to Exd, each of the Hox factors show differences in their consensus motifs as measured by high-throughput

SELEX, achieved by extended DNA-binding motif lengths, comprising a highly conserved 6 nucleotide core and loosely conserved 9 nucleotide flanking regions on either side (Stormo and

Zhao 2010; Lelli et al. 2011; Slattery et al. 2011).

Similarly in Saccharomyces cerevisiae, certain cofactors are able to influence the specificity of a transcriptional regulatory complex. The non-DNA-binding cofactors Met4 and Met8 achieve this by extending the DNA-binding motif of the TF Cbf1 by 5 nucleotides. This lengthened motif enhances DNA-binding specificity and allows the Met4-Met8-Cbf1 complex to specifically target sulphur metabolism genes (Siggers et al. 2011).

Another class of proteins, the high-mobility group (HMG) proteins may also offer insights into co-operative targeting. HMG proteins are characterised by their non-sequence-specific DNA- binding activity. These highly abundant proteins localise to nucleosome cores where they bend

DNA and alter nucleosome stability as a prelude to transcription initiation (Ragab and Travers

2003; Travers 2003; Stros 2010). In addition, HMG proteins may serve as chaperones for transcription factors, bending DNA to bring transcription factor binding sites into close proximity to facilitate interactions (Travers et al. 1994).

Other remarkable effects may also assist TFs in gaining specificity. An intriguing study by (Kim et al. 2013) has reported that allosteric effects can be transmitted through protein:DNA interactions. The authors examined the kinetics of the binding of a number of protein pairs including T7 RNA polymerase and the Lac repressor. They found that the binding of each of

21

these proteins can be either stabilised or destabilised depending on the spacing between their binding sites (Figure 1.7). This was shown not to be via protein:protein interactions, but rather through allostery transferred via the DNA strand. The authors proposed that this allostery is linked to the periodicity of the DNA-helix, such that different major groove widths are generated proximal to the binding of a DNA binding protein. As such, clusters of TF binding sites may stabilise the binding of multiple factors via DNA:protein allosteric modulation. This may be a key mechanism that allows TFs to occupy only a subset of sites.

Figure 1.7 Allosteric coupling between transcription factors through DNA. Kim et al. show that changes in the major groove width of DNA causes the variation of the allosteric coupling between two DNA-binding proteins A and B. If protein B binds better at increase values of R, it would favour binding at positions where R is already widened (dR > 0) by protein A (left), but disfavor where R is narrowed (dR < 0) (right). R indicates major groove width. Sourced from (Crothers 2013), adapted from (Kim et al. 2013)

Although some evidence that implicates cobinding of TFs as a major component of TF specificity is accumulating, no consistent paradigms have yet emerged. Most likely the binding of a TF to a particular genomic region is influenced by multiple parameters including the DNA sequence preference of individual TFs, interactions with cofactors, cobinding of TFs, local TF concentration, competition from related TFs and chromatin effects such as DNA methylation and

DNA accessibility. Additionally, the cooperative binding and remodelling of chromatin by a number of TFs at a particular genomic location may be sufficient to define a promoter or enhancer region, whereas the occasional spurious binding of a single TF at a random location will have little effect. What is clear however is that the short degenerate motifs, typical of eukaryotic TFs,

22

are insufficient alone to specify in vivo binding sites. As such, it seems reasonable to suggest that additional determinants of specificity must be at play, and that a large component of these may involve the nearby binding of other TFs.

In order to study these additional mechanisms of targeting, we turned to the KLF family of TFs.

The KLFs are a large family of archetypal ZF TFs that share common DNA-binding consensus sequences but have distinct functions in vivo and therefore provide a good system for testing the specificity determinants of TFs, both in general, and within a TF family.

1.2. Krüppel-like factors

KLFs are a 17 member family of mammalian TFs that are grouped together based on a highly conserved DNA binding domain (DBD) (McConnell and Yang 2010). The DBD is located at the

C-terminus of all KLF proteins and is comprised of three tandem classical (C2H2) ZF motifs. The

KLF DBD shares homology with the DBD of the SP family of TFs and these two TF families are often grouped together as the larger SP/KLF family. There are nine known SP factors, giving a total number of 26 SP/KLF factors reported to date. In contrast to their highly conserved C- terminal DBD, KLFs exhibit highly variable N-terminal domains that give rise to wide functional diversity within the family (Nagashima et al. 2009). A schematic of the common domain configuration of a generic KLF protein is shown in Figure 1.8A.

The conservation of the DBD across the family has provided a basis on which to examine the evolutionary divergence of these proteins (Figure 1.8B) (Nagashima et al. 2009). Such analysis has led to the classification of KLFs into related groups: group 1 KLFs are predominantly activators with some members exhibiting additional repressive functions (Wang et al. 2004;

Siatecka and Bieker 2011); group 2 KLFs all recruit the corepressor CtBP and are predominantly repressors (Turner and Crossley 1998; van Vliet et al. 2000; Schuierer et al. 2001) and group 3

KLFs recruit the cofactor Sin3A and although usually discussed in the context of repression, can also act as activators in some situations (Kaczynski et al. 2001; Fernandez-Zapico et al. 2003).

23

The KLFs have a diverse range of functions and show differential patterns of tissue specific expression. Many of the Klf genes have been studied in gene ablation models in mice, revealing effects ranging from embryonic lethality to relatively mild phenotypic defects (reviewed in

(Pearson et al. 2008)). Such studies have also revealed the extremely widespread function of these proteins across most aspects of physiology including but not limited to: pluripotency, cancer, development, proliferation, apoptosis and inflammation.

24

Figure 1.8 The Krüppel-like factor (KLF) family of mammalian proteins. (A) A schematic of the generic domain structure of KLFs showing the common C-terminal DNA binding domain and variable N-terminal functional domains. (B) A phylogenetic tree showing the conservation of the KLF DNA binding domain across the family. Adapted from (Nagashima et al. 2009)

1.2.1. DNA binding by KLFs

The common element uniting the KLFs is the highly conserved C-terminal ZF DNA binding domain. Around 3% of human genes contain at least one ZF motif making it the most prevalent

DNA-binding domain in the human proteome (Tadepally et al. 2008). A number of subclasses

25

of ZF motif exist, with the most abundant form being the ‘classical’ or C2H2 ZF, which commonly occurs in tandem repeats in eukaryotic proteins (Brayer and Segal 2008). The general consensus sequence for classical ZF motifs is (F/Y)XCX2–5CX3(F/Y)X5ΨX2HX3–5H/C, where X is any amino acid and Ψ is a hydrophobic residue (Wolfe et al. 2000). This sequence forms a ββα fold with the two cysteine and two histidine residues coordinating a zinc ion to stabilize the motif (Lee et al. 1989; Rhodes and Klug 1993). Most classical ZFs are automatically assumed to be DNA- binding domains and the vast majority of experimental data supports this, however ZF domains have also been reported to interact with protein and RNA binding partners (Brayer and Segal

2008; Burdach et al. 2012). Understanding of the structural basis for DNA binding by classical

ZFs has advanced to the point where it is possible to design and develop ZF proteins to target a particular DNA sequence of interest (Mandell and Barbas 2006a; Papworth et al. 2006; Klug

2010). As a result, the ZF motif has commonly been used as an engineered DNA binding domain for a range of applications including generation of artificial transcription factors and zinc finger nucleases (Mandell and Barbas 2006a; Papworth et al. 2006; Klug 2010).

The crystal structure of the ZF domain of KLF4 bound to DNA was recently published revealing that the KLF ZFs follow the structure of other well studied classical ZF domains including ZIF268 and SP1 (Pavletich and Pabo 1991; Oka et al. 2004; Schuetz et al. 2011). A canonical model for

DNA-binding by classical ZFs was initially developed based on the crystal structure of ZIF268

(Pavletich and Pabo 1991). Under this model, residues in positions –1, +3 and +6 of the α-helix contact the 3′, middle and 5′ nucleotides of the target strand, respectively. Position 2 in the helix interacts with DNA bases on the reverse strand in certain cases (Payre and Vincent 1988) meaning that a single ZF recognises four nucleotides; three on the sense strand and one on the antisense strand which overlaps with the next ZF (if one is present) as depicted in Figure 1.9A. The

KLF4:DNA crystal structure revealed the presence of similar contacts to those predicted in the canonical model (Schuetz et al. 2011).

26

An alignment of all 17 KLFs reveals that the contact amino acids identified by the canonical

DNA-binding model are almost entirely conserved across the family (Figure 1.9B). Based on this model, the amino acids that contact DNA are predicted to bind to the site 5’-NRG-GNG-NGR-3’

(where R can be either A or G) (for a more detailed review of KLF3 predicted DNA binding preferences see (Pearson et al. 2011)).

The DNA binding preferences of KLFs have been extensively studied in a wide range of experiments. KLFs have long been known to bind to CACCC boxes and GC-rich sequences in

DNA (Miller and Bieker 1993; Anderson et al. 1995; Crossley et al. 1996; Kaczynski et al. 2003;

Suske et al. 2005). Indeed the initial cloning of KLF1 identified a potential binding site in the β- globin promoter with the sequence CCA CAC CCT based on in vitro assays (Miller and Bieker

1993). Subsequently, the binding preferences of other KLFs were further established using in vitro assays at first (Nagashima et al. 2009), and later with the identification of a limited number of in vivo target sites using ChIP-PCR (Eaton et al. 2008). In more recent times the DNA consensus motifs of KLF1 and KLF4 have been determined by ChIP-seq (Figure 1.9C-D). These

ChIP-seq experiments have established that KLF1 and KLF4 exhibit highly similar DNA-binding sequence preferences in vivo. This thesis adds a third KLF consensus sequence for KLF3 which also displays a high degree of similarity to the consensus motifs of KLF1 and KLF4.

27

Figure 1.2

Figure 1.9 DNA binding by KLF proteins. (A) The current model for predicting ZF DNA binding. Adapted from (Klug 2010). (B) Conservation of DNA interacting residues within the KLF family. DNA contact residues are highlighted in green. Amino acids are coloured based on their degree of conservation with black indicating perfect conservation and light grey indicating no conservation. (C) KLF1 consensus binding motif from a ChIP-seq experiment on erythroid cells (Tallack et al. 2012). (D) KLF4 consensus binding site from ChIP-seq on ES cells (Chen et al. 2008).

1.2.2. KLF3

For the studies in this thesis, we chose to primarily focus on the TF KLF3. KLF3 is a widely expressed protein that is most closely related to KLF8 and to KLF12, all three of which are thought to act predominantly as repressors of transcription (Pearson et al. 2011). Our lab and others have worked toward an understanding of KLF3 biology using molecular and biochemical techniques to reveal roles for KLF3 in a number of tissues.

KLF3 was originally cloned from erythroid cells and the Klf3 gene shows high levels of expression in erythroid tissues, due in part to an upstream erythroid specific promoter which is activated by KLF1 (Crossley et al. 1996; Funnell et al. 2007). Given the specific activation of

28

this promoter, it is not unexpected that KLF3 has been shown to have important roles in erythropoiesis. Klf3 expression increases during erythroid maturation allowing it to regulate certain aspects of terminal erythrocyte differentiation. Klf3 ablation leads to a mild compensated anaemia, potentially attributable to KLF3 repressing a subset of KLF1 target genes in the late stages of erythroid maturation (Funnell et al. 2012). The fact that KLF3 seems to be able to suppress a subset of KLF1 target genes, but leave many other KLF1 targets unaffected, suggests the presence of differential specification mechanisms beyond the DNA binding preference of these proteins.

The generation of a Klf3-/- mouse by disruption of the ZF region of the Klf3 gene has implicated

KLF3 in adipogenesis, where Klf3 ablation leads to a reduction in white adipose tissue in male mice (Sue et al. 2008). These mice exhibit a lean body composition and low bodyweight. On a high fat diet they are resistant to diet induced obesity and show reduced susceptibility to insulin resistance. Additionally, 3T3-L1 cells (murine preadipocytes) show a decrease in Klf3 expression through differentiation and forced expression of Klf3 blocks differentiation into mature adipocytes (Sue et al. 2008), showing that KLF3 is important for proper adipogenesis.

A role for KLF3 has also been identified in B-cell development where it is important for proper maturation of B-cells in bone marrow, spleen and the peritoneum (Turchinovich et al. 2011; Vu et al. 2011). In Klf3-/- mice, B-cell maturation defects are observed across these three sites, with each regional population of B-cells exhibiting different effects. A reduction in the number of immature B-cells is found in Klf3-/- animals, whilst circulating mature B-cell numbers are increased. The structure of the marginal zone that accommodates B-cells in the spleen is disrupted and the B1 B cells from the peritoneum showed significant defects in development (Vu et al.

2011).

The molecular mechanisms by which KLF3 can regulate gene expression have been investigated in detail (Figure 1.10). Like the other group 2 KLFs (KLF8 and KLF12), KLF3 acts predominantly as a repressor by recruiting the co-factor C-terminal binding protein (CtBP)

29

(Turner and Crossley 1998; van Vliet et al. 2000; Schuierer et al. 2001). CtBP is recruited by

KLF3 via a PVDLT domain in the N-terminus of KLF3 (Turner and Crossley 1998). CtBP in turn recruits a range of factors including the histone lysine-specific demethylase HLSD, the transcriptional corepressor CoREST, the class 1 histone deacetylases HDAC1/2 and the histone methyltransferases GLP/Eu-HMT1 and G9a (Laherty et al. 1997; Kaczynski et al. 2001; Sif et al.

2001; Yang et al. 2003). These co-factors are able to remodel chromatin to reduce accessibility for activating transcription factors, thus limiting gene activation. CtBP can also self-associate in the presence of NAD(H) (Nardini et al. 2009) leading to the suggestion that CtBP may function as a metabolic sensor (Jack et al. 2011), a concept supported by the metabolic effects observed in

Klf3-/- mice (Sue et al. 2008). CtBP can act as a co-factor for over 30 mammalian transcription factors (Chinnadurai 2007), and may therefore provide a scaffold for larger transcriptionally repressive complexes to form at cis-regulatory elements.

Figure 1.10 KLF3 is a transcription factor that represses target genes by recruiting the cofactor CtBP. CtBP is capable of dimerising in the presence of NADH and can recruit histone methyltransferases (HMTs), histone deacetylases (HDACs), and histone lysine specific demethylases (HLSDs). These cofactors operate in combination to repress transcription.

Deletion of the N-terminal 75 amino acids of KLF3 (a region that includes the CtBP contact domain) results in full disruption of KLF3’s repressor activity in reporter assays (Turner and

Crossley 1998). However the mutation of the CtBP contact domain alone only reduces the

30

repression potential of KLF3 in these assays suggesting that other cofactors may also be bound by the N-terminal domain (Turner and Crossley 1998).

Yeast two-hybrid analysis and co-immunoprecipitations have revealed FHL3 as a potential binding partner for KLF3 (Turner et al. 2003). FHL3 is also known to associate with CtBP, suggesting that these proteins may form a larger complex (Turner et al. 2003). Despite the evidence of interaction, little is known about the functional role of FHL3 recruitment in KLF3 directed gene regulation.

1.3. Aims of this thesis

The genome-wide in vivo occupancy of KLF3 has not yet been established in any cell type.

Moreover, the mechanisms by which TFs achieve their restricted chromatin occupancy similarly remain unknown. Evidence points largely to the role of cofactors and combinatorial binding of

TFs at promoters and distal regulatory elements as the primary mechanisms by which TFs achieve specificity. In this thesis, the genome-wide occupancy of KLF3 in murine embryonic fibroblasts

(MEFs) is established and is then linked with gene expression and RNA polymerase II (RNAP II)

ChIP data to better understand the correlation between KLF3 occupancy and KLF3 gene regulation. The association between KLF3 and other cobinding TFs is also investigated, as is the relationship between KLF3 occupancy and chromatin state. Finally, the question of KLF3 specificity is addressed by mutating non-DNA binding domains of KLF3 to investigate their contribution to TF targeting.

31

Chapter 2. Materials and methods

2.1. Materials

2.1.1. Reagents and kits

Below is a list of important reagents and kits used with details of the suppliers or manufacturers.

For brevity, common laboratory reagents are not mentioned as they are easily obtainable from numerous suppliers in any region.

 Fluoroshield™ with DAPI (Sigma-Aldrich, MO, USA).

 cOmplete EDTA free protease inhibitor cocktail (Roche Applied Science, Switzerland)

 Adenosine 5-[γ-32P] triphosphate ([γ-32P] ATP) (PerkinElmer Life Sciences, MA, USA)

 Polybrene (Sigma-Aldrich, MO, USA)

 FuGENE6 transfection reagent (Roche Applied Science, Switzerland)

 Dynabeads Protein G (Life Technologies, CA, USA)

 TruSeq DNA Sample Prep Kit (Illumina, CA, USA)

 WT Sense Labelling Kit (Affymetrix, CA, USA)

 MinElute PCR Purification Kit (Qiagen, Netherlands)

 RNeasy Mini Kit (Qiagen, Netherlands)

 Immunobiolion Western HRP and AP Chemiluminescent HRP Substrate (Millipore, MA,

USA)

2.1.2. Enzymes

 PfuUltra™ Hotstart DNA polymerase (Stratagene, CA, USA)

 Proteinase K (Roche Applied Science, Switzerland)

 T4 DNA ligase (New England Biolabs, MA, USA)

 T4 polynucleotide kinase (New England Biolabs, MA, USA)

32

2.1.3. Cell lines

 Klf3-/- and Klf3+/+ MEFs were a gift from Crisbel Artuz, University of New South Wales

and were generated from C57BL6 mice.

 EcoPack 2-293 cells (Clontech Laboratories, CA, USA)

 COS cells were gifts from Alister Funnell

2.1.4. Antibodies

Antibodies used for western blot, immunofluorescence microscopy, electrophoretic mobility shift assay and chromatin immunoprecipitation are listed below.

2.1.4.1. Primary antibodies

 Anti-KLF3 – rabbit polyclonal – generated in-house against amino acids 1-268.

 Anti-V5 – mouse monoclonal - R960-CUS (Invitrogen, CA, USA)

 Anti-V5-HRP – mouse monoclonal - R961-25 (Invitrogen, CA, USA)

 Anti-V5-FITC – mouse monoclonal - R963-25 (Invitrogen, CA, USA)

 Anti-β-actin – mouse monoclonal - clone AC-74 (Sigma-Aldrich, MO, USA)

2.1.4.2. Secondary antibodies

 ECL™ Anti – Mouse IgG – NA931V (GE Life Sciences, UK)

 ECL™ Anti – Rat IgG – NA 9350 (GE Life Sciences, UK)

2.1.5. Oligonucelotides

All oligonucleotides were synthesised by Sigma-Aldrich, Australia. Sequences are given in the

Appendix in Tables A1 and A2.

2.1.6. Vectors

 pMSCV-puro (Clontech Laboratories, CA, USA)

 pMT3 – gift from Alister Funnell

 pCDNA3 (Invitrogen, CA, USA)

 pGEX-6P (GE Life Sciences, UK)

33

 pGEM-T-easy (Promega, WI, USA)

2.2. Laboratory methods

2.2.1. General methods

Standard molecular biology techniques were carried out as described in ‘Molecular Cloning, A

Laboratory Manual’ (Sambrook et al. 1989)

2.2.2. Cell culture

All cell lines were maintained at 37°C with 5% CO2. Cells were cultured in a standard medium

(HG DMEM, 10% FCS, 1% PSG) and were maintained under selection in 2.0 µg/mL puromycin where appropriate.

2.2.3. Generation of retroviral and expression vectors

Retroviral vectors were created by inserting the coding sequence of murine Klf3, ∆DL or DBD into the multiple cloning site of the pMSCV-puro vector (Clontech Laboratories, CA, USA). The

Klf3 construct included the full length murine cDNA for Klf3. ∆DL included the same full length

Klf3 cDNA with a two amino acid substitution, with AS replacing DL in the CtBP contact motif

– PVDLT – within the N-terminal domain of KLF3 (Turner and Crossley 1998). The DBD construct included amino acids 240-344 of the full length protein. Subcloning was performed as described in (Sambrook et al. 1989)

2.2.4. Retroviral transduction

To generate competent retroviruses, Ecopack2-293 cells were transfected with 5μg of plasmid

DNA using FuGENE6 transfection reagent (Roche Applied Science, Switzerland) according to the manufacturer’s instructions. After 48 hours the media containing the virus particles was harvested from the EcoPak 2-293 cells and was filtered (pore size 45μm). Klf3-/- MEF cells were seeded at 1.5x105 cells per 60mm dish and were incubated for 12 hours before transduction. At this point, the media was removed and was replaced with the viral particle containing media from the EcoPak 2-293 cells. 8μg/mL Polybrene (Sigma-Aldrich, MO, USA) was added to the media

34

and cells were incubated for 24 hours. Transduced cells were selected with puromycin (2.0μg/mL) for 48 hrs to allow non-infected cells to lift-off. The media and non-infected cells were removed and the remaining adherent cells were plated in 24x well plates in serial dilutions such that single colonies could be recovered from a single well.

2.2.5. Transient transfections for protein production

Transient transfection in COS cells for protein production were performed as previously described

(Crossley et al. 1996)

2.2.6. Nuclear extracts

Nuclear extracts were prepared as described in (Andrews and Faller 1991).

2.2.7. SDS-PAGE

Protein concentration was determined by UV-light absorbance at 280nm using a Nanodrop

(Thermo Fischer Scientific, MA, USA) such that equal loading could be achieved in each lane.

Protein extracts were run at 200V for 55mins on NuPAGE Novex 10% Bis-Tris gels (Life

Technologies, CA, USA) using X-Cell modules (Life Technologies, CA, USA) as per the manufacturer’s instructions.

2.2.8. Western blot

Proteins were transferred to a nitrocellulose membrane using the X-Cell blot module (Life

Technologies, CA, USA). Membranes were then blocked in in TBST (50mM Tris pH7.4, 150mM

NaCl, 0.05% Tween-20) with 5% (w/v) skim milk powder for 30 minutes. Membranes were then probed using the antibodies and conditions described below

2.2.8.1. Anti-V5-HRP

The anti-V5-HRP primary antibody (Invitrogen, CA, USA) was diluted 1/5000 in TBST and was incubated for 2 hrs at 4⁰C. The membrane was washed five times for 10 mins in TBST before visualisation.

35

2.2.8.2. Anti-KLF3(1-268)

The anti-KLF3 (1-268) polyclonal antibody was diluted 1/1000 in TBST with 0.5% skim milk powder and was incubated overnight at 4ºC. The membrane was then washed five times for 10 mins in TBST and secondary anti-Rabbit IgG diluted 1/10000 was applied and incubated for 1 hour. The membrane was washed five times for 10 mins in TBST before visualisation.

2.2.8.3. Anti-β-Actin

The anti-β-actin primary antibody (Sigma-Aldrich, MO, USA) was diluted 1/10000 in TBST and was incubated for 1 h at 4⁰C. The membrane was then washed five times for 10 mins in TBST and secondary anti-mouse IgG diluted 1/10000 was applied and incubated for 1 hour. The membrane was washed five times for 10 mins in TBST before visualisation.

2.2.8.4. Visualisation

HRP labelled antibodies were detected using Immunobiolion Western HRP and AP

Chemiluminescent HRP Substrate (Millipore, MA, USA) according the manufacturer’s instructions and chemiluminescent bands were detected wither using film or ImageQaunt LAS

4000 (GE Healthcare, UK). Rainbow molecular weight markers (GE Healthcare, UK) were included for size estimation.

2.2.9. Electrophoretic mobility shift assay

Nuclear extracts were prepared as described in (Andrews and Faller 1991) and EMSAs were carried out as described previously (Crossley et al. 1996). Sequences of oligonucleotides used in the synthesis of radiolabelled probes (representing the mouse β-major globin promoter, Klf8 promoter and Stard4 promoter) are displayed in the Appendix in Table A2.

2.2.10. Immunoflourescence microscopy

Cells were grown on glass slides coated with coating buffer (DMEM supplemented with 10ng/mL fibronectin, 2% collagen, 100μg/mL BSA, filter sterilised). Cells were fixed with 3% formaldehyde in Phosphate Buffered Saline (PBS) at room temperature for 10 mins. Slides were

36

washed twice with PBS. Cells were permeablised with 0.5% Triton X-100 in PBS for 5 mins at room temperature. Slides were blocked by soaking in PBS supplemented with 10% foetal calf serum and 0.05% triton-X for 30 mins at room temperature. Antigens were probed with anti-V5-

FITC (2μg/mL in PBS) in the absence of light for 1 h at RT. Slides were washed 3 times with

PBS for 5 mins each and were then mounted in Fluoroshield™ with DAPI (Sigma-Aldrich, MO,

USA). Slides were examined using an FXS-100 confocal microscope (Olympus, Japan).

2.2.11. Chromatin immunopreciptiation

ChIP was conducted in duplicate on Klf3-/- MEFs expressing recombinant Klf3-V5, ΔDL-V5 or

DBD-V5. Approximately 5x107 cells were used for each experiment and ChIP was conducted as previously described (Schmidt et al. 2009) using an anti-V5 antibody (Cat# R960-CUS, Life

Technologies, Carlsbad, CA). Library preparation was performed using the TruSeq DNA Sample

Preparation Kit (Cat# FC-121-2001, Illumina, San Diego, CA) according to the manufacturer’s instructions with minor modifications. Adapter sequences were diluted 1/40 before use and following adapter ligation, the library size extracted from the gel was 100-280 bp (excluding adapters) in line with the size of sonicated fragments. Libraries were prepared using multiplexed adapters to enable multiple samples to be run in a single lane. Library preparation was performed by the Ramaciotti Centre, University of New South Wales, New South Wales, Australia.

2.2.12. High-throughput sequencing

Libraries (6 inputs and 6 IP samples) were multiplexed into 4 lanes such that there were 3 samples per lane. Samples were sequenced using 50 bp chemistry on the HiSeq 2000 (Illumina, San

Diego, CA). Sequencing was performed by the Ramaciotti Centre, University of New South

Wales, New South Wales, Australia.

2.2.13. Real-time PCR

Real-time PCR was performed using duplicate wells for every sample on genomic DNA templates using Power SYBR green PCR Master Mix and the 7500 fast real-time PCR system (Applied

Biosystems, USA) as described previously (Hancock et al. 2010)

37

2.2.14. Gene expression microarrays

Total RNA was purified from Klf3-/- or Klf3-V5 rescued MEF cells using Tri-reagent according to the manufacturer’s instructions (Sigma-Aldrich, St Louis, MO). RNA was subsequently ethanol precipitated and washed with 75% ethanol in DEPC-treated deionized water for further purification.

RNA was then subjected to whole transcript sense labelling and hybridized to Affymetrix

GeneChip 1.0 ST mouse gene arrays (Affymetrix, CA, USA). Microarray preparation and scanning were performed by the Ramaciotti Centre, University of New South Wales, New South

Wales, Australia.

2.3. Bioinformatics methods

2.3.1. Alignment

Reads were aligned to the mm9/NCBI37 Mus musculus genome using Bowtie2 v2.0.0-beta7

(Langmead and Salzberg 2012). In the first round, Bowtie2 was set to --very-sensitive and –D 40.

Non-aligned reads were subjected to a second round of alignment where the read could be soft clipped by running Bowtie2 with the switch --very-sensitive-local. Resulting alignments were sorted, merged and indexed using Samtools v0.1.18 (Li et al. 2009).

2.3.2. Peak calling, peak overlap and genomic annotation.

Peak calling and downstream analysis was primarily performed using the HOMER software package v4.1 (available from http://biowhat.ucsd.edu/homer/ngs/index.html) (Heinz et al. 2010).

The script findPeaks.pl was used to for peak discovery using the paired input sample as a control with the settings -style factor, -F 5 and -L 5, requiring 5x fold enrichment over input and 5x fold enrichment over background (surrounding 10 kb) to call a peak. Peaks were subjected to an FDR cut-off of 0.001. Peaks were merged using mergePeaks using the switch -d given meaning that peaks had to literally overlap in genomic space to be considered overlapping. Peak lists were annotated using annotatePeaks.pl using the HOMER annotation set for mm9/NCBI37. HOMER

38

was also used to determine sequence conservation around peaks using the mouse PhastCons data supplied with the software package.

2.3.3. Quantification of ChIP tags.

HOMER was used to quantify ChIP tag density at peak locations across the genome. Unless otherwise noted, tags were counted within 400 bp around the peak centre (as peak widths could vary across the three different samples). All tag counts were normalized to 100M reads and were thus expressed as reads/100M reads to allow comparison across samples. Histograms of tag densities around various genomic features were also derived using HOMER. Bin sizes varied depending on the application and are given with each result.

2.3.4. Visualization

HOMER was used to create bedgraph files using the makeUCSCfile program. These were viewed using IGV v2.2 (Thorvaldsdottir et al. 2012). Venn diagrams were produced using BioVenn

(Hulsen et al. 2008), Venn Diagram Plotter v1.4.3740 (available from http://omics.pnl.gov/software/VennDiagramPlotter.php) and eulerAPE v2.0 (available from http://www.eulerdiagrams.org/eulerAPE/).

2.3.5. ENCODE data sets

An ENCODE DNase-seq dataset produced from murine lung fibroblasts by the

Stamatoyannopoulos Lab at the University of Washington was downloaded from GEO

(Accession# GSM1014199) (Encode Project Consortium 2011; Neph et al. 2012). An ENCODE

RNA-pol II ChIP-seq dataset produced from MEFs by the Ren Lab at the Ludwig Institute for

Cancer Research was also downloaded from GEO (Accession# GSM918761) (Encode Project

Consortium 2011; Shen et al. 2012). The raw sequencing reads from these datasets were processed using the ChIP-seq pipeline described above to make bedgraph files for visualisation in IGV and to quantify sequencing tags at genomic locations of interest.

39

2.3.6. Gene expression microarray analysis

Microarray data were analysed using Partek genomic suite v6.5 (Partek Inc., St. Louis, MO).

Microarray CEL files were imported into Partek and normalized using the robust multi array average (RMA) algorithm. After confirming array quality (Affymetrix built-in controls and principal components analysis), differential gene expression was calculated and tested for significance using a 1-way analysis of variance (ANOVA). Gene expression P values were corrected for multiple testing using a false discovery rate (FDR) threshold of 0.2.

40

Chapter 3. Genome-wide profile of KLF3 occupancy.

3.1. Introduction

As discussed in the general introduction, KLF3 is a mammalian transcription factor which has been shown to act predominantly as a repressor of transcription (Turner and Crossley 1998, Sue

2008, Funnell 2012). A number of physiological roles have been attributed to KLF3 encompassing erythropoiesis, lymphopoiesis and adipogenesis, reviewed in (Pearson et al. 2011).

Like other KLFs, KLF3 has three tandem classical zinc finger domains at the C-terminus that are known to bind to CACCC boxes and GC-rich motifs in DNA (Kaczynski et al. 2003; Suske et al.

2005). The DNA binding domain is very highly conserved across the KLF family and there is evidence of overlapping specificity between different KLF proteins (Dang et al. 2002; Funnell et al. 2012). KLF3 utilizes its N-terminal domain to recruit the co-repressor C-terminal binding protein (CtBP) (Turner and Crossley 1998) and CtBP can recruit a range of factors including histone methyltransferases, histone deacetylases and histone-lysine specific demethylases that remodel chromatin to repress gene expression (Laherty et al. 1997; Turner and Crossley 1998;

Kaczynski et al. 2001; Sif et al. 2001; Yang et al. 2003).

Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) allows the determination of transcription factor occupancy genome-wide (Johnson et al. 2007). The technique involves crosslinking living cells to preserve protein:DNA interactions and immunopreciptiation of chromatin fragments associated with a protein of interest. Enriched DNA is then extracted and libraries are created for high throughput DNA sequencing. Sequenced tags are then aligned back to the reference genome where significant accumulation of tags at a given region can be used to infer transcription factor binding. As such a genome-wide map of occupancy can be established, leading to new understanding of how particular factors operate to regulate gene transcription.

41

ChIP-seq experiments have been previously published on KLF1 and KLF4 (Chen et al. 2008;

Tallack et al. 2010; Pilon et al. 2011) however, the genome-wide occupancy of KLF3 has not yet been published in any cell type. Thus we set out to profile KLF3 binding genome-wide using

ChIP-seq. Later in the chapter, we combine KLF3 ChIP-seq and expression microarray data to identify new targets of KLF3 and to gain insights into the characteristics of KLF3 mediated gene regulation. Finally, we investigate the association of KLF3 with RNA polymerase II (RNAP II) binding using RNAP II ChIP-seq data sourced from the ENCODE project (Encode Project

Consortium 2011).

3.2. Experimental model

In order to study the occupancy of KLF3 genome-wide, an epitope tagged model was selected to enable consistent immunoprecipitation of KLF3 and later various mutants (Chapter 5) (Fig 3.1A).

The tag selected was the V5 epitope from the P and V proteins of the paramyxovirus, simian virus

5 (SV5). We used the longer form of the epitope comprising 14 amino acids with the sequence

(GKPIPNPLLGLDST). The epitope tag was attached to the C-terminus of KLF3 using a flexible glycine-serine linker (with 5 repeats of GS). The selection of the V5 tag was based upon its known level of performance under formaldehyde crosslinking conditions and the favourable degree of efficiency when compared with Biotin-Streptavidin systems (Kolodziej et al. 2009).

We elected to rescue a Klf3-/- cell line with epitope-tagged KLF3 to avoid potential for interference from endogenous KLF3 protein. Klf3-/- MEFs were stably transduced with full-length, epitope- tagged Klf3 using the pMSCV retroviral delivery systems (as described in Methods). Stable clonal cell lines expressing ectopic Klf3 were then isolated under puromycin selection. A stable cell line that expressed similar levels of epitope-tagged KLF3 protein to the levels of endogenous

KLF3 found in wild-type MEFs was selected (Fig 3.1C). Selection of such a cell line was necessary to ensure that additional targets were not identified purely because of excessive ectopic expression.

42

Figure 3.1 Establishing epitope tagged model for KLF3 ChIP-seq experiments. (A) A schematic of the epitope tagged KLF3 construct used to rescue Klf3-/- MEFs. (B) A representative Western blot showing similar relative levels of endogenous KLF3 in wild-type MEFs and tagged KLF3 in rescued MEFs, using a polyclonal antibody made to the N-terminal region of KLF3. (C) Densitometry results for the Western blot shown in B (arbitrary units).

3.3. ChIP-seq

ChIP-seq was performed in duplicate on MEF cells expressing Klf3-V5. Approximately 5x107

MEF cells were used per experiment and ChIP was conducted as described in methods. Briefly, cells were crosslinked with 1% formaldehyde for 5 minutes, before being quenched with glycine and washed with PBS. Cell pellets were frozen in liquid nitrogen and stored at -80ºC. Cell pellets were then lysed and the lysate was sonicated to produce dsDNA fragments of 100-200 bp in length. KLF3-V5 containing complexes were pulled down using an anti-V5 antibody on magnetic beads. Pull-downs were washed and DNA was eluted from the beads. Library preparation was

43

performed on 2 inputs (KLF3_IN-1 and KLF3_IN-2) and 2 immunoprecipitated samples

(KLF3_IP-1 and KLF3_IP-2) using sample specific adapters such that 3 samples were run per lane (1 and 1/3 lanes used in total). Samples were sequenced using 50bp chemistry on the HiSeq

2000 (Illumina, San Diego, CA). Raw read counts per sample are given in Table 3.1.

Sequences were aligned to the Mus musculus mm9 genome using Bowtie2 (Langmead and

Salzberg 2012). In the first round, sequences were required to align completely from end to end across the full 50 bp length. As some of the IP sample reads did not align under the round one criterion (Table 3.1), a second round of alignment was then carried out. In this round, reads were aligned using the “--local” mode which allows reads to be soft clipped to achieve a better alignment score. Trimming a small number of bases from the edges of reads can remove sample adapter sequences which may prevent proper alignment of the read. Finally these two sets of alignments were merged to create a total alignment. In total, 161 million and 87 million reads were mapped for input (IN) and immunoprecipitation (IP) samples respectively. A reduced mapping efficiency was noted in the IP samples. This commonly occurs as the library staring material for an IP sample is usually of much lower concentration than an IN sample resulting in reduced yield compared to input.

Table 3.1 ChIP-seq sequencing and mapping results.

Mapped to mm9 Fraction Sample Reads Round 1 Round 2 Total mapped INPUTs Klf3_IN-1 71,393,072 69,239,545 1,047,776 70,287,321 98% Klf3_IN-2 91,750,375 89,870,419 902,061 90,772,480 99% Total 163,143,447 159,109,964 1,949,837 161,059,801 99% IPs Klf3_IP-1 67,059,305 36,117,799 4,178,265 40,296,064 60% Klf3_IP-2 68,883,891 44,497,337 2,517,949 47,015,286 68% Total 135,943,196 80,615,136 6,696,214 87,311,350 64%

44

3.4. Evaluation of replicates

Peaks were then called using the software package HOMER on each individual replicate IP sample using the paired IN sample as a control (Heinz et al. 2010). Initially HOMER searched for clusters of high tag density that fit the expected peak width based on fragment size. These high tag density regions were then subjected to a P-value cut-off of <0.0001 and an FDR cut-off of <0.001. Peaks were then filtered such that putative peaks had to be more than 5-fold enriched in IP over IN and had to be more than 5-fold enriched over the surrounding 10kb from the peak centre. The cut-offs used for peak calling were selected based on those used by the authors of the

HOMER software package in their original publication (Heinz et al. 2010). Based on these criteria, a total of 26,772 (P < 7.1x 10-8) and 23,872 (P < 1.1x10-6) peaks were called in KLF3_1 and KLF3_2 samples respectively.

The degree of overlap between these replicates was then established. Peak boundaries had to literally overlap in genomic space to be designated overlapping (rather than peaks falling within a given distance of one another being termed as overlapping). A large proportion of the peaks overlapped indicating that the replicates were fairly consistent (Figure 3.2A). Note that those peaks that did not overlap in the two data sets are most likely small peaks that were not enriched enough to satisfy the cut-offs in both replicates. Thus the non-overlapping peak sets likely contain lower affinity peaks that satisfy cut-offs in one sample but not in the other. The core set of 14,115 overlapping peaks were taken to be high confidence KLF3 peaks. The degree of correlation between the two replicates at these high confidence peaks was found to be quite strong with an

R2=0.6423 (Figure 3.2B). Based on the good correlation between replicates, the 14,115 high confidence KLF3 peaks were carried forward for further analysis.

45

Figure 3.2 Replicate KLF3 ChIP-seq samples (A) Overlap (O/LAP) of peaks called for each KLF3 ChIP-seq replicate. (B) Peak height correlation (R2) between replicates at overlapping peaks.

3.5. KLF3 peak characterisation

3.5.1. Confirmation of peaks by qPCR

Encouragingly, the ChIP-seq peak analysis revealed strong peaks at previously identified KLF3 targets including Klf8, Lgals3 and Fam132a (Eaton et al. 2008; Funnell et al. 2012; Bell-Anderson

46

et al. 2013). To further validate these results, a number of peaks were selected for confirmation by ChIP-PCR. These peaks included the known targets mentioned above and a new peak at the

Stard4 promoter. Two negative control regions in the Klf8 locus were also selected (Figure 3.3).

These regions have previously been shown to not be occupied by KLF3 (Eaton et al. 2008). New, independent ChIP assays were performed and the recovered DNA was subjected to amplification by quantitative real-time PCR using primers for the six specific loci (primer sequences and genomic coordinates of amplicons are available in appendix A). As shown in Figure 3.3, the ChIP-

PCR confirmed the presence of the expected peaks and the absence of peaks in negative control regions. IgG precipitated controls analysed with these primer sets have previously and consistently demonstrated that the IgG background signal is similar to unbound regions (data not shown).

We were also interested to see whether peaks present in the KLF3-V5 rescue were found in wild- type MEF cells. A number of KLF3 targets identified in the KLF3-V5 ChIP-seq were successfully verified in wild-type MEFs using an antibody to endogenous KLF3, validating the transgenic model (Burdach et al. 2013).

47

A 0.07%

0.06%

-[0.05% r:::: 0 0.04% -Q) C) ns 1: 0.03% Q) ....0 &. 0.02%

0.01%

0.00% (-) Ctrl 1 (-) Ctrl 2 KlfB 1 a Lgals3 Fam 132a Star04 Locus

(-) Ctrl 1 (-) Ctrl 2 KlfB (-4.5kb exon 1a) KlfB (+33kb exon 1a) KlfB promoter 1a B ChrX 3 kb ChrX 3 kb ChrX 3 kb 149.667.000 149,668.000 149.669.000 149.700.000 149,701.000 149,672,000 149.673.000 I I I I [0-500] INPUT [0-500] INPUT [0-500] INPUT [0-500] KLF3 [0-500] KLF3 [0-500] KLF3

- • Amplicon • Amplicon Amplicon

KlfB KlfB

Refseq Refseq Refseq

Lgals3 (-20kb) Fam132a Stard4 Chrl4 3 kb Chr4 3 kb Chr5 3 kb 47.973.000 47.974.000 47.975.000 155.335.000 155.336.000 155.337.000 33.373.000 33.374.000 I I I I I I I I I [0-500] INPUT [0-500] INPUT [0-500] INPUT [0-500] KLF3 [0-500] KLF3 [0-500] KLF3

!_ - Amplicon Amplicon Amplicon - I I I I ... Fam l32a Stard4

Refseq Refseq Refs eq

Figure 3.3 Confirmation of ChIP-seq peaks by ChIP-PCR. (A) Relative enrichment of KLF3 binding sites as measured by ChIP-PCR. Values are expressed as a fraction of input. (B) ChIP-seq tracks of the control peaks amplified in (A). The location of the qPCR amplicons is indicated by the black boxes. Panels in the top row assessed KLF3 binding in the Klf8 locus at sites which have previously been investigated (Eaton et al. 2008). Lgals3 and Fam132a are previously established targets based on studies of Klf3 ablation (Funnell et al. 2012; Bell-Anderson et al. 2013). Stard4 is a novel Klf3 target identified using ChIP-seq.

48

3.5.2. Distribution of KLF3 peaks across the genome

The distribution of KLF3 peaks across the mm9 Mus musculus genome was analysed based on

RefSeq gene annotations. Promoter regions were defined as being -1kb to +0.1kb from the transcription start site (TSS), intronic regions were those lying between exons and intergenic regions made up the rest of the genome that wasn’t otherwise annotated. Peaks that fell into CDS exons, 5’ and 3’ UTR exons and transcription termination sites (TTS) (-100 bp to +1kb from the

TTS) were all labelled as ‘other’. Just under one third of the peaks were found to lie in promoters, approximately one third in introns and just over one third in intergenic regions (Figure 3.4A).

Since promoters and introns constitute much less than a third of the total genome each, these results represent a very strong enrichment of KLF3 peaks in promoters and also a notable but lesser enrichment in introns. Intronic peaks were spread throughout introns and didn’t show a clear positional bias toward intron exon boundaries. Genome browser tracks Examples of intronic peaks are given in Chapter 5 (Figure 5.6)

The precise location of the promoter peaks relative to the transcription start site is shown in Figure

3.4B. The confluence of peak centres is located approximately 50 bp upstream from the TSS.

The TATA box is typically found 25bp upstream of the TSS suggesting that KLF3 might be in close physical proximity to RNAP II as it binds to initiate transcription. The remarkable proximity of KLF3 peaks to the TSS also fits with results from Drosophila, where it has been suggested that the cofactor CtBP functions as a short-range co-repressor for the transcription factors Krüppel,

Knirps, Giant and Snail and is typically found within 100 bp of promoters or of activating transcription factors within enhancer units (Nibu et al. 1998; Mannervik et al. 1999; Nibu and

Levine 2001).

49

Figure 3.4 Distribution of KLF3 peaks across the genome. (A) Relative proportions of genomic regions bound by KLF3. (B) Histogram of KLF3 peak centres within 1.5kb of the TSS of those genes with KLF3 peaks at the proximal promoter.

3.5.3. Peak height and sequence conservation at different genomic regions

The relative enrichment of KLF3 ChIP-seq tags at promoters, introns and intergenic regions was determined using HOMER. A histogram of KLF3 ChIP tag density around KLF3 peak centres is shown in Figure 3.5. Based on this analysis, promoter peaks exhibit an approximately 50% greater peak height than peaks found in intergenic and intronic regions. As discussed in the general introduction, active gene promoters are generally more nucleosome depleted than other regions of the genome. This nucleosome depletion may explain the greater KLF3 occupancy (and corresponding peak height) observed at gene promoters in comparison to other genomic regions.

The average DNA fragment size is also easily distinguishable here at approximately 130bp

(Figure 3.5).

The conservation of nucleotides around KLF3 peaks was also analysed. Conservation was evaluated using HOMER, based on a 30-way vertebrate conservation score (PhastCons) available for the mm9 genome from the UCSC genome browser. The score reflects mouse conservation versus 29 other vertebrate species with a value of 1 indicating total conservation and a value of 0 indicating no conservation. As can be seen in Figure 3.6, an enrichment of conserved nucleotides

50

around the centre of KLF3 peaks is evident, indicating that these sites are more evolutionarily constrained than the surrounding sequence and therefore likely to be functional regions.

Promoter binding sites are the most evolutionarily constrained followed by intronic and intergenic peak regions.

Figure 3.5 Histogram of sequencing coverage per nucleotide 1.5kb around KLF3 peak centres. Peaks were separated into either promoter, intron or intergenic categories based on their localisation within the genome.

51

Figure 3.6 Relative conservation of nucleotides around KLF3 peak centres at single nucleotide resolution. Scores are based on the PhastCons algorithm on a 30-way vertebrate dataset. A score of 1 indicates perfect conservation and a score of 0 indicates no conservation. 3.6. Overlap with differentially expression genes

Given that we had identified a large number of KLF3 peaks at proximal promoters, we wished to better understand how KLF3 occupancy related to changes in gene expression. To accomplish this, we performed gene expression microarrays on Klf3-/- MEFs and the same cell line rescued with Klf3-V5. RNA was extracted from the relevant cells and was then subjected to whole transcript sense labelling and hybridisation to Mouse Gene ST 1.0 arrays (Affymetrix, CA).

Microarray data were analyzed using Partek genomic suite v6.5 (Partek Inc., MO).

Microarray CEL files were imported into Partek and normalized using the robust multi array average (RMA) algorithm. After confirming array quality (Affymetrix built-in controls and principal components analysis), differential gene expression was calculated and tested for significance using a one-way analysis of variance (ANOVA). Volcano plots of these data are shown in Figure 3.7. A P-value cut-off of less than 0.05 was applied using 1-way ANOVA, and transcripts dysregulated more than 2-fold were selected. In total, 196 transcripts were repressed

52

and 201 were upregulated upon rescue with KLF3 according to these cut-offs. To further refine these putative KLF3 targets, we searched within these groups for genes that exhibited a KLF3 peak at the proximal promoter (-1 kb, +0.1 kb). When relating changes in gene expression to transcription factor occupancy, it is only possible to connect occupancy and gene expression for genes that exhibit proximal promoter binding as distal regulatory elements do not necessarily regulate the most proximal gene (Schoenfelder et al. 2010). A total of 65 genes showed >2.0 fold repression in the presence of KLF3 and exhibited KLF3 occupancy at their proximal promoters

(the top 20 transcripts ranked by fold change are shown in Table 3.2). Only 19 genes showed activation >2.0 fold in the presence of KLF3 and a KLF3 promoter peak, which is consistent with previous results (Funnell et al. 2012) and reinforces the view that KLF3 is predominantly a repressor of transcription.

Limited overlap is observed between genes that are bound by KLF3 at the promoter and genes that are repressed more than 2-fold on KLF3 rescue (65 out of 4263). This result may be explained by the fact that the related protein KLF8 can efficiently compensate for the absence of KLF3 at several loci (Eaton et al. 2008; Funnell et al. 2013). KLF3 and KLF8 show a high level of conservation and both recruit the co-repressor CtBP to silence target genes. KLF3 represses KLF8 in many tissues, such that loss of KLF3 results in upregulation of KLF8, leading to compensatory masking of effects on gene dysregulation. It is only when both factors are ablated that it is possible to interpret gene expression changes attributable to KLF3.

53

Figure 3.7 Effect of Klf3 rescue on transcript expression in Klf3-/- MEF cells. Blue data points represent those that satisfy a P < 0.05 cut-off as determined by one-way ANOVA whilst red and grey points fail this significance test. Vertical cut-offs represent a fold change of 2.0 in transcript expression. Positive fold changes indicate repression in the presence of KLF3.

The relatively symmetrical dysregulation of genes following rescue with KLF3 (196 repressed/201 upregulated) is quite surprising given KLF3’s known role as a repressor. This result could be explained based on indirect or secondary effects where KLF3 affects the expression of other gene regulators which in turn can lead to up or downregulation of other targets.

Support for this notion comes from the fact that if KLF3 promoter bound genes which are dysregulated more than 2-fold are quantitated, approximately three times as many genes are repressed as are upregulated. Representative examples of genes repressed in the presence of

KLF3 targets are given in Figure 3.8. All four of these genes (Lgals3, Pqlc3, Mgst3 and Fam132a)

54

have previously been identified as KLF3 targets based on expression changes upon Klf3 ablation in erythroid cells (Funnell et al. 2012; Bell-Anderson et al. 2013). Here we show KLF3 occupancy at the proximal promoter by ChIP-seq bolstering the case for these genes being bona fide KLF3 targets. Also included in this figure are tracks for DNase-seq (discussed in Chapter 4) and RNAP II (discussed later in this chapter). There is a good deal of overlap between KLF3 bound sites and DNase sensitive nucleosome depleted regions that mark the proximal promoter.

RNAP II is also localised near to the TSS, reflecting transcription initiation.

Table 3.2 Top 20 genes ranked by fold change that are repressed upon KLF3 rescue and that also exhibit a KLF3 ChIP peak at the proximal promoter.

Further insights into how KLF3 occupancy might relate to gene expression changes can be gleaned by focusing on those genes that exhibit KLF3 binding at the proximal promoter (here designated as -1kb to +100bp). 3,632 genes showed KLF3 peaks at the proximal promoter within the aforementioned spatial constraints. The fold change difference in gene expression for all genes was calculated based on expression microarray data discussed previously. These fold

55

changes were then plotted as a histogram (Figure 3.9, blue line) which revealed a shift to the right of centre, indicating that more KLF3 promoter-bound genes are repressed in the presence KLF3 rescue than are activated. Among these differentially expressed genes are likely to be both direct targets of KLF3 and secondary targets where some other regulator would play a role in the change of gene expression.

Figure 3.8 A selection of putative KLF3 target genes. Genes displayed were repressed more than 2-fold upon rescue with KLF3 and also exhibit a ChIP peak at the proximal promoter. DNase-seq and RNAP II ChIP-seq tracks generated from experiments from the Ren and Stamatoyannopoulos labs respectively are also displayed. Both were sourced from the ENCODE project (Encode Project Consortium 2011; Neph et al. 2012; Shen et al. 2012). Gene expression changes are based on microarray data and have passed a P < 0.05 cut-off as measured by 1-way analysis of variance (ANOVA). Error bars represent standard error of the mean (SEM).

It was then possible to look at how KLF3 promoter-bound genes behaved under the influence of

KLF3 versus all genes (Figure 3.9). Contingency tables were assembled based on fold change cut-offs of >log20.25 and < -log20.25. Chi-square statistical tests were run on these groups revealing that there was a highly statistically significant difference in fold-changes (activated and repressed) for genes that were bound by KLF3 versus those that were not. Odds-ratios revealed

56

that KLF3 promoter bound genes were more likely to be repressed (OR=2.433) than activated

(OR=1.500), consistent with the view that KLF3 is predominantly a repressor of transcription.

Figure 3.9 Histogram of changes in gene expression upon rescue with Klf3-V5 for all genes (blue line) and for those genes which have a KLF3 peak at the proximal promoter by ChIP-seq (red line). A positive fold change indicates higher expression in the empty vector Klf3-/- cell line (repressed in the presence of KLF3). Note that the y- axis is expressed as a proportion of genes rather than a raw count. The number of genes in each data series are given in parentheses. No statistical cut-off has been applied to this expression data as many of the transcripts that show small fold changes would logically not pass this cut-off. Bin sizes for the histogram are log2 0.1. In the lower portion of the figure are contingency tables which show the relationship between promoter occupancy and change in gene expression on KLF3 rescue. Tables were analysed with two-tailed chi-square tests using Prism 6.0. The higher odds ratio associated with KLF3 bound genes exhibiting > 0.25 log2 fold change indicates that in this model KLF3 acts more strongly as a repressor.

Now that a complete set of gene expression data had been generated, it was also possible to consider the level of expression of genes bound by KLF3 at the promoter in comparison to all the genes on the array for Klf3-V5 rescued MEFs (Figure 3.10). If all genes are analysed (blue series,

57

Figure 3.10), a bi-modal distribution is observed where genes fall into one of two groups; either highly expressed or weakly expressed. Genes that were bound by KLF3 at the promoter (red series, Figure 3.10) were consistently in the more highly expressed set of genes. In other words,

KLF3 seems to preferentially bind to highly expressed genes.

As mentioned in the general introduction, promoters of active genes are highly permissive to

DNA-binding proteins whereas less active or silent promoters are less permissive. Naturally, it would be expected that higher levels of occupancy by DNA binding proteins would occur at the promoters of active genes in comparison to less active or silent gene promoters. Thus it is not surprising the KLF3 is predominantly associated with regions of open chromatin found in the promoters of highly expressed genes (Figure 3.10). This may seem at odds with KLF3’s known role as a repressor, however the issue of whether KLF3 has an impact of the expression of the proximal gene is separate to whether it shows occupancy. These ideas are explored further in

Chapter 4 where the relationship between KLF3 binding and chromatin state is analysed.

Figure 3.10 Histogram of the expression level (by microarray) of all genes versus those that exhibited a KLF3 peak by ChIP-seq. The number of genes in each data series are given in parentheses. Bin sizes for the histogram are log2 0.5.

58

3.7. RNA polymerase II and KLF3

As noted earlier, KLF3 binds very close to the TSS of many genes with the confluence of KLF3 peaks occurring 50 bp upstream of the TSS. This fits with CtBP’s role as a short-range repressor in Drosophila where it is known to act as a repressive cofactor for TFs that operate within 100 bp of the proximal promoter and to counteract activators within 100 bp of enhancer units (Nibu et al.

1998). Given that the TATA box is usually found around 25 bp upstream of the TSS, it would appear that KLF3 and RNAP II may be in very close physical proximity to each other during transcription initiation. In order to obtain a better understanding of how KLF3 may interact or co- localise with RNAP II, an ENCODE RNAP II ChIP-seq dataset produced from MEFs by the Ren

Lab at the Ludwig Institute for Cancer Research was downloaded from NCBI gene expression omnibus (GEO) (Accession# GSM918761) (Encode Project Consortium 2011; Shen et al. 2012).

The raw sequencing reads from these datasets were processed using the ChIP-seq pipeline previously described to make bedgraph files for visualisation and to quantify sequencing tags at genomic locations of interest. Sequencing tags 1kb around the TSS of each annotated RefSeq gene were then counted to develop a profile of gene specific RNAP II occupancy in MEF cells.

These data could then be combined with gene expression data for each gene to investigate linkages between RNAP II occupancy and gene expression (Figure 3.11). Additionally, those genes showing a KLF3 peak at the proximal promoter could be segregated and contrasted to all genes in the genome in terms of RNAP II occupancy and gene expression. As previously discussed,

KLF3 bound genes tend to have a higher level of expression compared to non-bound genes (red versus blue Figure 3.11). KLF3 bound genes also seem to have a higher level of RNAP II occupancy at the promoters as measured by ChIP-seq in comparison to non KLF3 bound genes.

Indeed, if RNAP II occupancy of KLF3 bound genes is compared to non KLF3 bound genes using a histogram, there is a strong co-association of RNAP II at the promoters of KLF3 bound genes compared to non KLF3 bound genes (Fig 3.12).

59

Figure 3.11 Relationship between gene expression and RNAP II promoter occupancy. Genes that exhibit a KLF3 peak at the proximal promoter are highlighted in red. Data for all genes is shown in blue.

60

Figure 3.12 Histogram of RNA pol II occupancy (within 1 kb of the TSS) as measured by ChIP-seq for all genes and those genes that exhibited a KLF3 peak at the proximal promoter. The number of genes in each data series are given in parentheses. Bin sizes for the histogram are log2 0.5.

However, this simplisic analysis of RNAP II occupancy at the promoter does account for the level of gene expression. Logically, if a gene is more highly expressed there is an increased chance that RNAP II may be detected at the promoter regardless of whether it is pausing there or not.

Thus we developed the RNAP II pausing index which normalises RNAP II promoter occupancy based on the level of gene expression. The formula describing how the pausing index is calculated is shown in Figure 3.13A. It is possible to count the RNAP II ChIP-seq tags within 1kb of the

TSS for all RefSeq annotated genes. This RNAP II occupancy data can then be combined with global gene expression data as determined by microarray. Taking the log of the ratio of occupancy to gene expression gives an indication of how much RNAP II is dwelling at the promoter, with negative values indicating that RNAP II is not pausing at the promoter and positive values indicating that RNAP II is held up at the promoter.

Once the Pausing Index has been calculated for all annotated TSSs in the genome, it is possible to compare the RNAP II Pausing Index for all genes to those genes that are bound by KLF3. In

61

performing this analysis, it was observed that KLF3 bound genes showed more pausing than all genes, demonstrated as a shift to the right on the histogram displayed in Figure 3.13. Realising that KLF3 promoter occupancy was strongly associated with more highly expressed genes, it was thought that ‘all genes’ were not necessarily a good control gene set. Thus KLF3 bound genes were also compared to the top 20% of genes based on expression levels. These highly expressed genes showed an even greater divergence from KLF3 bound genes supporting the finding that

RNAP II pausing levels are increased for those genes bound by KLF3 at the proximal promoter.

Whether the increased RNAP II pausing at the promoters of KLF3 bound genes is due to a direct or indirect interaction remains to be seen, however there is at least an association between RNAP

II pausing and KLF3 occupancy.

62

Figure 3.13 RNA pol II pausing index. (A) The RNA pol II pausing index calculation. (B) Histogram of RNA pol II pausing index for the TSSs of all genes and for the subset of genes bound by KLF3 at the proximal promoter. The number of TSSs in each data series are given in parentheses. Bin sizes for the histogram were 0.5.

3.8. Discussion

Here we have reported the genome-wide occupancy of KLF3 for the first time. ChIP-seq data have previously been published for two other KLF family members; the erythroid TF KLF1

(Tallack et al. 2010; Pilon et al. 2011) and the pluripotency TF KLF4 (Chen et al. 2008). As the number of KLF datasets grows it is increasingly possible to appreciate the similarities and differences between the members of the KLF family. In the case of KLF3, we found that it is

63

largely bound to the proximal promoters of target genes, specifically around 50 bp upstream of the TSS. This is in contrast to KLF1 which is bound mostly at distal regulatory elements and has well explored functions in enhancers (Tallack et al. 2010; Su et al. 2013). The binding of KLF3 to proximal promoters fits with CtBP’s known role as a short range repressor in Drosophila, where it acts as a corepressor for transcription factors that bind within 100 bp of gene promoters or represses the action of activating factors within enhancer units (Nibu et al. 1998).

We elected to use a rescue model to enable a comparison of the binding profiles of WT KLF3 with different KLF3 mutants in an isogenic background. This model was designed to determine the contribution of KLF3 functional domains to in vivo specificity at a genome-wide level. It also enabled the use of an epitope tag for chromatin pull-downs, meaning that the same antibody could be used in a consistent manner across KLF3 WT and mutant transgenic lines (see Chapter 5). This system was hence ideal for comparing DNA binding profiles in a controlled setting with minimal external variables. It should be noted that this approach differs from ChIP in wild-type cells, where antibodies are used against endogenous proteins. The absence of KLF3 in development and subsequent generation of immortalised MEF cells may have resulted in a chromatin state that differs from a wild-type cell. As a result it is possible that ChIP-Seq peaks seen here may not be identical to profiles derived from wild-type cells interrogated using an anti-KLF3 antibody.

By combining gene expression data and genome-wide occupancy of KLF3, we have identified some novel KLF3 targets in MEF cells and have provided additional evidence to support previously reported targets of KLF3. We have validated a number of known KLF3 targets including Lgals3, and Fam132a which had previously been identified as KLF3 targets in erythroid cells based on changes in gene expression associated with ablation of Klf3 (Funnell et al. 2012;

Bell-Anderson et al. 2013). For the first time, we have shown KLF3 occupancy at the promoters of these genes in MEFs bolstering the evidence that these genes are directly regulated by KLF3.

Fam132a (Adipolin) is a secreted adipokine with roles in insulin sensitisation and metabolism

(Enomoto et al. 2011; Enomoto et al. 2012; Wei et al. 2012a; Wei et al. 2012b) and is the subject

64

of recent work in our lab. Male Klf3-/- mice show a metabolic phenotype with lean body composition and low bodyweight (Sue et al. 2008). On a high fat diet they are resistant to diet induced obesity and show reduced susceptibility to insulin resistance and it is probable that some of these metabolic defects are attributable to dysregulation of Fam132a and systemic elevation of plasma adipolin (Bell-Anderson et al. 2013). Another KLF3 target gene, Galectin 3 (Lgals3), is also involved in metabolism as Lgals3-/- mice exhibit an high-fat diet induced phenotype and dysregulated glucose regulation (Pang et al. 2013; Pejnovic et al. 2013). Galectin 3 has also been found to have roles in cell migration, inflammation and tumourogenesis (Dumic et al. 2006;

Radosavljevic et al. 2012). We also identified a novel KLF3 target Stard4, that is involved in cholesterol transport, particularly the movement of cholesterol to the endoplasmic reticulum and the formation of lipid droplets (Rodriguez-Agudo et al. 2011). These KLF3 target genes may each contribute to the metabolic defects previously identified in Klf3-/- mice (Sue et al. 2008).

Additionally, we have identified a range of novel KLF3 targets in MEF cells based on both microarray and ChIP-seq data. Some of these genes have roles in cell junction and migration and others are involved in apoptosis. Both groups are also the subject of future work in our lab.

Our data collectively support the assertion that KLF3 is predominantly a repressor of transcription. Comparing KO and rescue samples is complex as changes in gene expression are attributable both to the direct ablation of the gene of interest and also to the change in expression level of all of its downstream targets (secondary effects). Given that around one third of KLF3 occupancy is at proximal promoters, we have a rare opportunity with this data set to associate gene expression changes with occupancy. Regulation by occupancy at proximal promoters is widely accepted as being directed at the proximal gene, whereas binding at distal regulatory elements cannot be linked to nearest genes as these interactions often occur over large genomic distances (Schoenfelder et al. 2010). When KLF3 promoter-bound genes are separated from non- promoter-bound, it is clear that the promoter-bound genes predominantly show repression. Given the large number of peaks identified, there is a significant expectation of random association

65

between KLF3 bound genes and differentially expressed genes. Genes that are promoter bound and show up-regulation in the presence of KLF3 may be examples of such associations.

We find that KLF3 mostly occupies highly expressed genes. This may seem counter intuitive, as

KLF3 is predominantly a repressor, but it is increasingly being appreciated that occupancy does not always equal action. Given that TFs are known to act combinatorially in cis regulatory modules, and that many genes show poised states based on histone marks, it seems quite reasonable for a repressor to be bound at active genes, especially if repression is contingent on the convergence of multiple regulators to enact an effect on gene expression. Moreover, increased occupancy at active genes may also be a reflection of the fact that active genes have more permissive promoters, characterised by nucleosome depletion and active chromatin marks.

Despite the correlation of KLF3 occupancy with active genes, we found that KLF3 promoter bound genes tended to be repressed in the presence of KLF3 based on gene expression data.

Taken together, these data support the role of KLF3 as being predominantly a repressor. It remains possible that KLF3 can also directly activate genes, however, it is very difficult to make an absolute determination due to the complex and highly contextual nature of gene regulatory systems.

We have also shown a correlation between RNAP II pausing at promoters and KLF3 occupancy.

Given that KLF3 binds so closely to the TSS and to the TATA box, it is possible that KLF3 could interact either directly or indirectly with RNAP II. We have shown that KLF3 bound genes exhibit more RNAP II occupancy, even when normalised for expression level. Ideally, one would compare RNAP II occupancy at KLF3 target genes in the presence and absence of KLF3 to better understand the effects of KLF3 occupancy on RNAP II. It would also be of interest to examine the phosphorylation status of RNAP II at these target genes. During transcriptional initiation, serine 5 (ser5) in the C-terminal domain of RNAP II becomes phosphorylated whilst serine 2

(ser2) becomes phosphorylated upon elongation (Phatnani and Greenleaf 2006). One could examine the phosphorylation status at ser2 and ser5 of RNAP II by ChIP at KLF3 bound

66

promoters in the presence and absence of KLF3 to see whether KLF3 affects RNAP II initiation or elongation.

In summary, we have profiled KLF3 occupancy in MEF cells and have found that KLF3 is highly enriched at the proximal promoters of genes. Using microarray data, a number of novel KLF3 target genes have been identified and consistent with previous studies, we have found that KLF3 predominantly acts as a repressor of transcription. An association between increased RNAP II occupancy at the TSS and KLF3 occupancy at the proximal promoter was noted, suggesting some degree of interaction between KLF3 and RNAP II (whether this interaction is direct or indirect remains to be determined). Surprisingly, it was also noted that KLF3 seemed to preferentially occupy highly expressed genes. In the following chapter, we will examine this finding in more detail by investigating the chromatin state of regulatory regions bound by KLF3.

67

Chapter 4. KLF3 DNA binding and chromatin state

4.1. Introduction

4.1.1. DNA binding by KLF3

KLF proteins all share a highly conserved DNA binding domain consisting of three tandem classical ZF motifs located in the C-terminus of the protein. The classical ZF motif consists of a

ββα fold, with two cysteine and two histidine residues coordinating a zinc ion to stabilize the structure with the α-helix contacting DNA in a sequence specific manner (Lee et al. 1989; Rhodes and Klug 1993). A model for prediction of DNA binding by classical ZFs has been developed, that identifies particular residues in the α-helical regions of the motif which make sequence specific contacts with the DNA bases (Klug 2010; Pearson et al. 2011). Under this model, the great majority of the 17 known KLF transcription factors are predicted to bind the same DNA consensus sequence based on an almost total conservation of the predicted DNA contact residues within the KLF ZFs.

KLFs have long been known to bind to CACCC boxes and GC-rich sequences in DNA (Miller and Bieker 1993; Anderson et al. 1995; Crossley et al. 1996; Kaczynski et al. 2003; Suske et al.

2005). Indeed the initial cloning of KLF1 indentified a potential binding site in the β-globin promoter with the sequence CCA CAC CCT based on results from in vitro assays (Miller and

Bieker 1993). Subsequently, the binding preferences of other KLFs including KLF3 were further established using in vitro assays at first (Nagashima et al. 2009), and later with the identification of a limited number of in vivo target sites using ChIP-PCR (Eaton et al. 2008 ).

More recently, the DNA consensus motifs of other KLFs have been refined with genome-wide

ChIP-seq datasets. KLF1 ChIP-seq has been performed in erythroid cells by two groups (Tallack et al. 2010; Pilon et al. 2011) and KLF4 ChIP-seq data generated from ES cells has also been published (Chen et al. 2008). With the exception of the Pilon data, which failed to produce a reliable motif due to data quality issues, these datasets have all revealed very similar motifs across

68

KLF family members, which is consistent with predictions based on the conservation of contact residues in the DBD. In this chapter we define the KLF3 consensus DNA motif and investigate the colocalisation of motifs for other known TFs.

4.1.2. KLF3 and chromatin state

Chromatin state is determined by a number of factors including nucleosome position and density,

DNA methylation and post-translation modification (PTM) of histone proteins. These elements all contribute to the accessibility of DNA to DNA binding proteins, forming a key component of the regulatory processes which control transcription. Here we focus on two measures of chromatin state: nucleosome position, and a number of histone PTMs.

4.1.2.1. Nucleosome position

Nucleosome position affects the accessibility of DNA-binding proteins to DNA. Naked DNA that is not associated with histones is more readily bound by trans-acting factors whereas nucleosomal

DNA is less accessible due to the spatial and electrochemical interference brought about by contact with histone proteins. Thus nucleosome position can affect the availability of binding sites to DNA binding proteins and occupancy by DNA binding proteins is usually greatest in nucleosome free regions.

4.1.2.2. Post-translational modification of histones

A large number of residues within histone proteins can be modified post-translationally with a range of functional groups. Well studied modifications are largely limited to mono, di and tri- methylation and acetylation of lysine residues across various histone proteins. Some of these modifications are associated with particular regulatory regions (for instance promoters or enhancers) and some are associated with activation or repression of their associated gene. A number of histone marks are examined in this chapter using data from the ENCODE project

(Encode Project Consortium 2011).

69

In summary, within this chapter we define the KLF3 consensus DNA binding motif for the first time and show colocalisation of motifs for other known TFs. We also investigate the association of KLF3 with various markers of chromatin state including nucleosome position and density, and finally examine the co-occurrence of post-translationally modified histones with KLF3 binding sites.

4.2. KLF3 DNA binding preference in vivo

Given that we had established a genome-wide occupancy profile for KLF3, we then sought to define the DNA consensus motif for KLF3 in MEF cells de novo. KLF3 peaks were ranked based on peak height and the top 500 peaks were taken for further analysis. Sequences were extracted from the 100bp surrounding these peaks using HOMER to create a sequence database for de novo motif discovery using MEME (Bailey and Elkan 1994). The KLF3 consensus motif is shown in

Figure 4.1, alongside previously published motifs for KLF1 and KLF4. The KLF3 motif discovered is remarkably similar to those previously reported for other KLF transcription factors based on ChIP-seq experiments (Chen et al. 2008; Tallack et al. 2010). As discussed in the introduction, the amino acids predicted to contact DNA in KLF1, KLF3 and KLF4 are completely conserved, so it was not unexpected that they might exhibit similar sequence preferences in vivo.

The implications of this are rather interesting as although different KLFs exhibit significantly different biological functions, they seem to share a highly conserved DNA consensus sequence.

This might suggest that other factors in addition to their DNA-binding preference may contribute to in vivo occupancy, a concept which is further explored in Chapter 5. On the other hand, it seems quite possible that several KLFs in a cell may target at least some of the same cis-regulatory elements. This notion is supported by the previous observations that KLF3 can repress a subset of KLF1 target genes in erythroid cells and that KLF4 and KLF5 can share target genes (Dang et al. 2002; Funnell et al. 2012).

70

Figure 4.1 Characterization of the KLF3 consensus DNA binding site. (A) The KLF3 consensus binding site derived from KLF3 ChIP-seq peaks. De novo motif discovery was accomplished using MEME on a sequence database comprised of the 100 bp surround the top 500 peaks ranked by peak height. (B) KLF1 consensus binding motif from a ChIP-seq experiment on erythroid cells (Tallack et al. 2012). (C) KLF4 consensus binding site from ChIP-seq on ES cells (Chen et al. 2008).

We were also interested to determine whether the KLF3 consensus motif found in promoter peaks differed from the motifs found in intronic and intergenic peaks. The top 500 promoter, intron and intergenic peaks were analysed to see if there was any difference in the consensus motif from these regions (results shown in Figure 4.2). The motifs were quite similar, however two slight changes were observed. The C at position 2 carried more information content in promoters than in distal regulatory regions. Also, at promoters a G was preferred at position 6, whereas at introns and intergenic regions, an A was preferred. The intronic and intergenic KLF3 motifs bore a stronger resemblance to KLF1 with an A at position 6. KLF1 was found to predominantly occupy enhancer regions rather than proximal promoters (Tallack et al. 2010). Therefore two slightly different CACCC motifs may exist that delineate proximal promoters versus enhancers and other distal regulatory elements; however it remains to be seen whether these marginal differences have a functional role in specification or whether the sequence content of promoters simply tends to be more GC-rich than other genomic regions. In light of the minor divergences discussed, our analysis showed that there is very little difference in motif preference at these different genomic regions (Figure 4.2).

71

Figure 4.2 De novo KLF3 motifs derived from KLF3 peaks in different genomic regions. Motif discovery was performed using MEME (Bailey and Elkan 1994) on the 100 bp at the centre of the top 500 peaks in each region ranked by peak height. Frequencies of motifs are indicated below each position weight matrix.

4.3. Validation of the de novo generated KLF3 consensus motif

As discussed, the de novo generated KLF3 consensus motif was highly similar to sequences previously reported for other KLF family members. Given these similarities, we wished to investigate the individual contribution of a range of nucleotides to KLF3 DNA binding using electrophoretic mobility shift assay (EMSA) (Figure 4.3). A DNA probe from the Klf8 locus with the sequence 5´-GCCCCACCC-3´ had previously been established as a target of KLF3 (Eaton et al. 2008) and was included here as a control. A KLF3 ChIP-peak at the Stard4 promoter contained the same sequence as the consensus KLF3 DNA binding motif 5´-GCCCCGCCC-3´ and was also used. The importance of various nucleotides in the KLF3 motif were then analysed using probes containing various point mutations. A preference for a G at position 1 was observed in the KLF3 ChIP-seq results, however, the importance of this nucleotide has not been previously discussed or analysed to our knowledge. When the G at position 1 was changed to a T, a reduction of binding was observed, indicating that the G at position 1 is indeed important (Figure 4.3). As suggested by the de novo generated motif, an A can also be tolerated at this position and we found that this was indeed the case. In fact, the in vitro results suggest that an A may even be preferred at position 1. We also confirmed the importance of residue 3, which is typically a C in the de novo generated motif and showed that changing this residue to a T reduced binding. Given that the

72

flanking nucleotides in the Klf8 probe were different to those in the Stard4 probe, we also created a Stard4 point mutant to investigate the presence of A in position 6 (as occurs in the Klf8 probe).

This showed that the nucleotide in position 6 can be either an A or a G as previously reported

(Crossley et al. 1996) and again was consistent with the de novo motif. Taken together these results confirm that the in vitro binding preference matches the ChIP-seq generated consensus and highlights for the first time the importance of having a G or A at position 1.

Figure 4.3 EMSA showing the effect of various mutations on DNA binding by KLF3. The probe from the Klf8 locus (lanes 1-3) has previously been established as a target of KLF3 (Eaton et al. 2008). The probe from the Stard4 promoter has the KLF3 consensus sequence shown in Figure 4.1 (lanes 4-6). Mutations were made to the Stard4 probe to investigate the contribution of individual nucleotides to binding (lanes 7-10) revealing that the G at position 1 and the C at position 3 are critical for proper binding in vitro. Full probe sequences are given in Appendix A.

73

4.4. Distribution of KLF3 motifs within peaks

A search was undertaken to find instances of the KLF3 de novo motif within KLF3 peaks. This is a challenging task as the motif is largely comprised of Gs and Cs and therefore is not particularly distinguishable from the high GC sequence content of promoter regions. In order to designate the presence of a motif over background a cut-off must be used. A log-odds ratio cut-off of 3 was used meaning that approximately 1 mismatch was allowed for a candidate sequence to be labelled as a KLF3 motif. On this basis 10,554 (74%) of peaks had KLF3 motifs Importantly, peaks designated as not having a KLF3 motif, more likely have a less conserved motif rather than no motif at all and merely fail to satisfy the cut-off set. It is also worth considering that ChIP-seq data may theoretically include secondary peaks that reflect long-range DNA interactions. The immunoprecipitation of a KLF3 binding site may result in the co-precipitation of other regions of the genome where long range interactions have been preserved by formaldehyde crosslinking, leading to the identification of artifactual peaks. Moreover, it is also possible that KLF3 may be tethered to DNA via another protein or RNA without the use of its DNA binding domain. Such tethered interactions would also be found to lack the consensus binding motif. Either of these indirect sources of peaks may be contained within the 26% of peaks that do not exhibit the KLF3 consensus, based on the cut-offs used.

When comparison to a randomly shuffled set of background peaks, a large number of KLF3 peaks exhibited the presence of more than one KLF3 motif (Figure 4.4A), and as the number of KLF3 motifs present in the peak increased, so did the mean height of those peaks (Figure 4.4B). This increase in peak height with increased motif frequency did not occur in a one to one ratio, however there is a clear trend linking increased motif density with increased occupancy. The de novo generated KLF3 motif was also used to test the spatial localisation of the KLF3 consensus motif within KLF3 ChIP peaks. As expected, the motif was found to be centrally enriched within the pooled KLF3 ChIP peaks (Figure 4.4C).

74

We also examined the relative difference in KLF3 motif abundance in those genes that exhibited a strong transcriptional response to KLF3 promoter binding. Those genes that showed KLF3 binding at the promoter and were repressed by more than 2-fold in the presence of KLF3 were compared to other KLF3 promoter-bound genes that showed a lesser response to KLF3 occupancy

(≤2-fold repression in the presence of KLF3). Interestingly, there was a significant increase in the number of KLF3 motifs present in those genes that showed a strong transcriptional response to

KLF3. In light of the observation in Figure 4.4B that KLF3 peak height also tends to increase with KLF3 motif abundance, these results taken together suggest that the presence of multiple

KLF3 motifs within a promoter tends to lead to increased KLF3 occupancy and a stronger effect on transcriptional regulation.

75

Figure 4.4 Analysis of KLF3 motifs within KLF3 peaks. (A) The frequency of KLF3 motifs in KLF3 ChIP-seq peaks (red bars) was determined in comparison to a size-matched set of randomly selected background peaks (blue bars) revealing that KLF3 ChIP-seq peaks showed an enrichment of KLF3 motifs over background. (B) Relationship between peak height and KLF3 motif count within the peak. Motif counts were established using HOMER (Heinz et al. 2010) and the mean peak height was taken. Error bars represent the standard error of the mean (SEM). (C) Cumulative distribution of KLF3 consensus motifs within KLF3 ChIP-seq peaks. (D) Box plot showing the difference in motif count for those genes promoter-bound by KLF3 and also repressed more than 2-fold versus other KLF3 promoter- bound genes. Significance was established using the non-parametric Mann-Whitney-Wilcoxon test. *** denotes P< 0.0001. Error bars represent 95% confidence intervals

4.5. Search for known motifs of other TFs

HOMER was used to search for known motifs in KLF3 peaks using the motif library embedded in the HOMER package (Heinz et al. 2010). The HOMER motif library is predominantly derived from the JASPAR motif database and has been subject to quality filtering and is curated by the

76

HOMER package authors. The 200 bp of sequence surrounding KLF3 peak centres were analysed for these known motifs at KLF3 promoter, intronic and intergenic peaks using log-odds ratios defined for each motif by the HOMER motif database. A cut-off of P < 1x10-60 and a frequency in KLF3 peaks of >12% was applied to the results to limit the large number of low frequency results. When multiple factors with highly similar motifs belonging to a family of proteins were discovered, only the best motif was retained and designated as belonging to that family rather than to the specific factor. After the enriched motifs were identified, their distribution in the 1kb surrounding peak centres was determined at KLF3 promoter, intronic and intergenic peaks. The enrichments of known motifs in peaks from various genomic regions are discussed individually below.

4.5.1. KLF3 associated motifs in promoter peaks

The known motifs enriched that were detected as enriched within all KLF3 promoter peaks are shown in Figure 4.5. These enriched motifs are typical TFs or TF families that are associated with proximal promoters. Although none of these proteins are known to interact with KLF3 directly, they may work together with KLF3 in cis-regulatory modules to combinatorially control gene expression. Each of the enriched motifs are discussed individually below.

More than 30% of KLF3 promoter peaks were found to contain an ETS motif. The ETS family of TFs consists of at least 28 members with diverse physiological roles (reviewed here

(Hollenhorst et al. 2011)). ETS factors are known to bind to both proximal (promoters) and distal

(enhancers) regulatory elements (Hollenhorst et al. 2011). Given the size and diversity of this family, it is reasonable to expect that some ETS proteins may act together with KLF3 to modulate gene expression in certain promoter contexts. KLF3 promoter peaks also exhibited enrichment for the E2F motif. The eight members of the E2F family are involved in cellular proliferation, differentiation and apoptosis and dimerise with DP protein to form heterodimeric complexes which bind predominantly to gene promoters (van den Heuvel and Dyson 2008; Wu et al. 2009).

77

Again, KLF3 may share gene regulatory targets with E2F factors as KLF1 is known to regulate expression of KLF3, E2F2 and E2F4 (Eaton et al. 2008; Tallack et al. 2010).

A Promoter KLF3 Background Motif Name P-value peaks sequences with motif with motif

ETS family 1e-323 30.05% 9.18%

E2F family 1e-40 20.71 % 13.24%

CCAAT binding 1e- 166 20.24% 7.1 6% proteins

NRF1 1e-83 12.08% 4.64%

B 0.012 - KLF3 0.01 - ETSfamily ..¥:: NFY ~ 0.008 c. - E2Ffamily -c. ::e 0.006 -NRF1 ~ ~ ~ 0.004

0.002

0 -500 -300 -100 100 300 500 Distance from KLF3 peak centre (bp)

Figure 4.5 Co-occurrence of known transcription factor motifs enriched in KLF3 promoter peaks. (A) Known motifs and statistics on enrichment within KLF3 promoter peaks. (B) Distribution of motifs within 1 kb of promoter peaks. Motif occurrences are reported as motifs per base pair per peak around the centre of all KLF3 promoter peaks. Bins are 20 bp

78

Here we have exclusively examined KLF3 promoter peaks so it might be expected that enriched motifs may include typical promoter elements. We found that the core eukaryotic promoter element the CCAAT box was present within approximately 20% of KLF3 peaks. The CCAAT box is typically found around 80bp upstream from the TSS and would therefore be expected to be in close proximity to the KLF3 binding sites, which on average occur 50bp upstream of the

TSS (see Chapter 3) (Dolfini et al. 2009). SP1 is also known to interact physically with the

CCAAT box binding protein NFY (Roder et al. 1999; Wang et al. 2012). Given that SP factors and KLFs can compete for binding sites, this may explain the enrichment of the CCAAT box motif.

We also detected enrichment for the motif for nuclear respiratory factor 1 (NRF1) in approximately 12% of KLF3 peaks. NRF1 is heavily involved in the regulation of energy metabolism and regulates the expression of the enzyme cytochrome c oxidase which is the terminal enzyme in the mitochondrial electron transport chain(Dhar et al. 2008; Wong-Riley

2012). Given that KLF3 mice exhibit a metabolic phenotype with lean body composition and low body weight, there may be some functional cross-over between NRF1 and KLF3 gene regulation (Sue et al. 2008).

A histogram of the distribution of known motifs discussed above around KLF3 peak centres is shown in Figure 4.5B. KLF3 and the other enriched known factors exhibit a central bias. NRF1 uniquely shows a slight depletion at the point of maximal KLF3 binding, perhaps indicating that it cannot bind in close proximity to KLF3.

4.5.2. KLF3 associated motifs in intronic and intergenic peaks

After performing equivalent analyses on intronic and intergenic peaks, it was found that peaks from these two genomic regions showed enrichment for the same known TFs, so for simplicity

79

they are discussed here together. The enrichment of known TF motifs in intronic and intergenic peaks are shown in Figures 4.6 and 4.7 respectively.

Figure 4.6 Co-occurrence of known transcription factor motifs enriched in KLF3 intronic peaks. (A) Known motifs and statistics on enrichment within KLF3 intronic peaks. (B) Distribution of motifs within 1kb of the centre of all KLF3 intronic peaks. Motif occurrences are reported as motifs per base pair per peak around the centre of all KLF3 intronic peaks. Bins are 20 bp.

80

Figure 4.7 Co-occurrence of known transcription factor motifs enriched in KLF3 intergenic peaks. (A) Known motifs and statistics on enrichment within KLF3 intergenic peaks. (B) Distribution of motifs within 1kb of the centre of all KLF3 intergenic peaks. Motif occurrences are reported as motifs per base pair per peak around the centre of all KLF3 intergenic peaks. Bins are 20 bp.

The motif for the heterodimeric transcription factor AP-1 was detected in approximately 32% of intronic peaks and 33% of intergenic peaks. C-Jun, a subunit of AP-1, has previously been shown to interact with KLF5 (Liu et al. 2010) and KLF6 is known to functionally antagonise AP-1’s role

81

as an oncogene (Slavin et al. 2004). The AP-1 motif shows a very strong central enrichment within intronic peaks to a level similar to KLF3 itself. In intergenic peaks, the AP-1 motif shows an even greater enrichment than the KLF3 motif, suggesting that KLF3 might be tethered to DNA by AP-1 at intergenic regions.

The motif of the transcription enhancer factor (TEF) family was discovered in approximately 17% of intronic and intergenic peaks. The four known TEF proteins in humans are known to play a range of roles in tissue specific development (Anbanandam et al. 2006). As the name suggests, these proteins are long-range regulators of gene expression and principally bind to enhancers.

Given that intronic and intergenic KLF3 peaks are likely to contain enhancers, it is not unexpected that these motifs would be enriched in such genomic regions. The TEF motif was found to be centrally enriched in both KLF3 intronic and intergenic peaks (Figures 4.6 and 4.7).

The RUNX family of transcription factors was found to be enriched in both KLF3 intronic and intergeinc peaks with approximately 15% of intronic peaks and 18% of intergenic peaks showing this motif. The three mammalian proteins that form the RUNX family are principally involved in cellular differentiation and cell cycle progression (Durst and Hiebert 2004). Importantly, RUNX proteins are known to interact with c-Fos and c-Jun, two subunits of the AP-1 complex that also show motif enrichment in these peak locations (Hess et al. 2001; D'Alonzo et al. 2002). The

RUNX motifs were centrally located in both intronic and intergenic peaks.

4.6. KLF3 is bound to nucleosome depleted regions.

Nucleosome position can be inferred using a technique called DNase-seq (Wu et al. 1979;

Crawford et al. 2006). The endonuclease DNase I is used to partially digest chromatin resulting in the preferential liberation of DNA fragments from nucleosome depleted regions. These fragments can then be used to create a library for high-throughput sequencing. The net result is a confluence of reads mapping to locations in the genome that are free of nucleosomes with peak height being proportional to the degree of nucleosome depletion across a pool of cells. These

82

regions are termed DNase I hypersensitive (HS) regions and are highly accessible to DNA binding proteins. Such regions commonly demarcate cis-regulatory elements in the genome.

A DNase-seq dataset produced by the Stamatoyannopoulos laboratory at the University of

Washington was analysed using our in-house ChIP-seq pipeline as described in methods (Encode

Project Consortium 2011; Neph et al. 2012). Unfortunately, the mouse ENCODE datasets are not as comprehensive as the human datasets however it way possible to obtain a dataset from a reasonably close related cell type; mouse lung fibroblasts. Although these cell types are similar, it is possible that the DNase hypersensitivity of regulatory regions may vary across these tissues.

Briefly, reads were mapped to the Mus musculus (mm9) genome using Bowtie 2 and a coverage track was generated using HOMER. The co-occurrence of DNase I HS sites with KLF3 peaks was then analysed using HOMER. KLF3 peaks showed a strong enrichment for nucleosome depletion, and splitting KLF3 peaks into subsets based on genomic localization revealed a divergence in the extent of this depletion (Figure 4.8A). KLF3 promoter peaks were found to have almost double the nucleosome depletion compared to peaks in introns or intergenic regions.

As shown in figure 4.8B, the extent of KLF3 occupancy (peak height) was found to be fairly strongly correlated with the degree of nucleosome depletion (R2=0.46) suggesting that KLF3, like other DNA binding proteins, preferentially occupies nucleosome depleted regions.

83

Figure 4.8 Relationship between DNase I hypersensitive sites and KLF3 peaks. (A) Cumulative distribution of nucleosome depletion as measured by DNase-seq at single nucleotide resolution across all KLF3 peaks at promoters, intronic and intergenic regions. The DNase-seq dataset was produced by the Stamatoyannopoulos laboratory at the University of Washington from mouse lung fibroblasts and was released under the ENCODE consortium (Encode Project Consortium 2011; Neph et al. 2012). (B) Correlation (R2) between KLF3 peak height and nucleosome depletion across all KLF3 peaks genome-wide. The black line indicates the linear regression of DNase-seq and KLF3 tag counts,

4.7. KLF3 co-associates with modified histones

Specific histone PTMs are known to generally associate with active or repressed genes and their cis-regulatory elements. For instance, the PTM of histone 3, lysine 4 with a tri-methyl group

(H3K4me3) is associated with promoters of actively transcribed genes, whilst the modification of histone 3, lysine 4 with a mono-methyl (H3K4me1) group is associated with enhancers of poised and actively transcribed genes (Bernstein et al. 2005; Heintzman et al. 2007). Thus a better understanding of the role and state of various regulatory regions can be gleaned by profiling histone PTMs at various regions.

A number of murine histone ChIP-seq datasets are available through the ENCODE project and those available for MEF cells were used to investigate the co-occurrence of KLF3 binding sites with various histone modifications. Unfortunately to date, only a small number of histone marks have been profiled for MEF cells under the ENCODE project, and therefore the analysis here is limited to histone marks predominantly associated with active genes (Table 4.1). These datasets

84

were analysed using the ChIP-seq pipeline as described in methods. Briefly, reads were mapped to the Mus musculus (mm9) genome using Bowtie 2. Alignments were then used to create normalised tag counts across the genome using HOMER and a coverage track was generated using HOMER.

Table 4.1. List of histone modification datasets used and their association with gene expression (Bernstein et al. 2005; Heintzman et al. 2007; Heintzman et al. 2009; Creyghton et al. 2010; Lin et al. 2010; Encode Project Consortium 2011; Rada-Iglesias et al. 2011; Shen et al. 2012).

Histone Predominantly associated Cell Mouse Sequencing modification with line Strain technology H3K4me3 Activated promoters MEFs C57BL/6 Illumina GA II H3K4me1 Activated/poised enhancers MEFs C57BL/6 Illumina GA II H3K27ac Activated MEFs C57BL/6 Illumina GA II promoters/enhancers

A selection of histone profiles around particular proximal and distal regulatory regions is displayed in Figure 4.9. The PTM H3K4me1 is known to mark both poised and active enhancer regions and the presence of H3K27ac can separate poised and active enhancers (Heintzman et al.

2009; Creyghton et al. 2010; Lin et al. 2010). Enhancers that show H3K4me1 alone are typically poised, whilst those that exhibit both H3K4me1 and H3K27ac are active. Panels A and B of

Figure 4.9 show typical active enhancers. At these sites, the KLF3 peaks are centred over regions of nucleosome depletion as indicated by increased peak height in the DNase-seq track. The histones up and downstream of the KLF3 binding site are marked with the combined marks

H3K4me1 and H3K27ac which together mark active enhancers (Creyghton et al. 2010; Rada-

Iglesias et al. 2011). Panel C shows a poised enhancer which is marked by H3K4me1 but not by

H3K27ac.

The H3K4me3 mark is associated with proximal promoters of active genes and has been shown to recruit the TAF3 subunit of TFIID leading to an increase in transcriptional initiation (Bernstein et al. 2005; Heintzman et al. 2007; Vermeulen et al. 2007; Lauberth et al. 2013). Panel D shows an active bidirectional promoter strongly marked by H3K4me3 and panel E shows the promoter of the highly expressed Actn1 gene which is strongly marked both upstream and downstream of

85

the DNase I HS with H3K4me3. Finally, panel F shows a known KLF3 target gene Fam132a

(discussed in Chapter 3) which is known to be repressed by KLF3 in MEF cells. It is interesting to note that even here the promoter is marked with H3K4me3, albeit to a lesser degree.

The co-occurrence of the histone PTMs described in Table 4.1 at KLF3 peaks was then determined genome wide (Figure 4.10A). KLF3 peaks were split into promoter, intronic and intergenic peaks and the analysis was repeated to determine whether there was an association between KLF3 peaks at various genomic locations and particular histone marks (Figure 4.10B).

When the co-localisation of KLF3 peaks with H3K4me3 marks was analysed, a clear and strong enrichment of H3K4me3 was observed within KLF3 peaks compared to the surrounding background regions (Figure 4.10A). The H3K4me3 mark is associated with proximal promoters of active genes (Bernstein et al. 2005; Heintzman et al. 2007). As would be expected, the

H3K4me3 mark was much more enriched at promoters than at intronic or intergenic KLF3 peaks

(Figure 4.10B).

86

Figure 4.9 Histone marks differentiate various proximal and distal cis-regulatory elements. Note that the y-axis scales are identical between panels to allow direct comparison of peak heights. The DNase-seq track was generated from fibroblast data from the Stamatoyannopoulos lab. RNAP II, and histone ChIP-seq data were generated from experiments from the Ren lab at the Ludwig Institute of Cancer Research. Both were sourced from the ENCODE project (Encode Project Consortium 2011; Neph et al. 2012; Shen et al. 2012).

87

Figure 4.10 Co-localisation of histone marks across KLF3 peaks genome wide. Histone ChIP data were sourced from the ENCODE project and were produced by the Ren lab at the Ludwig Institute for Cancer Research (Encode Project Consortium 2011; Shen et al. 2012). Data is presented as a histogram of histone ChIP tag density around KLF3 peak centres. (A) The association of various histone marks with all KLF3 peaks regardless of genomic localisation. The histogram was created using 20 bp bins on the 5kb surrounding the peak centre. (B) Enrichment of various histone marks around the centre of KLF3 peaks from promoter, intronic and intergenic regions. Profiles are shown at single nucleotide resoultion.

H3K4me1 was also enriched at KLF3 binding sites genome-wide, but to a lesser extent than

H3K4me3 (Figure 4.10A). H3K4me1 is known to mark enhancer regions (Heintzman et al. 2009;

Lin et al. 2010) and separation of KLF3 promoter, intronic and intergenic peaks revealed a greater enrichment of H3K4me1 at intronic and intergenic peaks (Fig 10B). H3K27ac is associated with both the promoters and enhancers of active genes (Wang et al. 2008; Creyghton et al. 2010). We found that KLF3 promoters, intronic and intergenic peaks were marked with H3K27ac (Fig 10B).

KLF3 peaks were then split into two groups: those lying in the proximal promoters of genes

(termed proximal) and those peaks lying in either intergenic or intronic regions (termed distal).

It was then possible to create histograms of chromtain mark enrichments at these regions (Figure

88

4.11). The majority of KLF3 peaks lying in proximal elements were associated with high levels of H3K4me3 and low levels of H3K4me1 (which are logically mutually exclusie states).

H3K27ac showed a moderate enrichment at these peaks.

Figure 4.11 Histograms of histone marks reveal the spectrum of associations between histone marks and genomic regions. Histone ChIP data were sourced from the ENCODE project and were produced by the Ren lab at the Ludwig Institute for Cancer Research (Encode Project Consortium 2011; Shen et al. 2012). Data is presented as histograms of histone mark enrichment within 1kb of KLF3 peaks at either proximal or distal cis-elements. The histograms were created using 0.2 log2 tag count bins.

Distal KLF3 peaks showed a number of interesting characteristics. Firstly, a bi-modal distribution of the active promoter mark H3K4me3 was observed where the majority of distal KLF3 peaks showed low H3K4me3 as expected, however a subset of distal peaks showed high H3K4me3.

Given that H3K4me3 is associated exclusively with active gene promoters, these KLF3 peaks may lie in unannotated promoters of coding or non-coding genes, may mark sites of spurious but repetitive transcription or may reflect looping of distal elements to promoters. The majority of distal peaks were marked with H3K4me1, indicating that they were enhancers with potential to be active, but to some extent H3K4me1 was also bimodal, possibly reflecting the looping of promoters to distal regulatory elements. Peaks that were not marked with H3K4me1 are likely to

89

be silent enhancers, other distal regulatory elements or spurious KLF3 binding sites. A spectrum of H3K27ac marks at KLF3 peaks was noted.

The poised or active state of enhancers was then further investiagted by profiling the enrichment of both H3K4me1 and H3K27ac for each KLF3 peak in distal regulatory regions (Figure 4.12).

It should be noted here that the cut-offs that separate these peaks into high/low levels of H3K27ac and H3K4me1 are arbitrary. This simple analysis allows for the KLF3 peaks in distal regions to be grouped into four categories. Those peaks that exhibit H3K4me1hi /H3K27achi are likely to be active enhancers. Peaks showing H3K4me1hi /H3K27aclo are likely to be poised enhancers and all other peaks could be assumed to be silent enhancers, other distal regualtory elements or spurious KLF3 binding sites that have no (or an inconsequential) regulatory role. Most distal

KLF3 peaks are asociated with a chromatin state that suggests a role for these regulatory elements as active enhancers. There are relatively few distal KLF3 peaks with a poised enhancer chromatin configuration although the distinction between these two groups is of course reliant on the cut- off applied to the H3K27ac tag count. There are a reasonable number of peaks that show

H3K27lo/H3K4me1lo that may be inactive enhancers, other regulatory elements or spurious peaks.

90

Figure 4.12 Analysing the relationship between H3K4me1 and HK27ac at distal KLF3 peaks reveals active and poised enhancers. Histone ChIP data were sourced from the ENCODE project and were produced by the Ren lab at the Ludwig Institute for Cancer Research (Encode Project Consortium 2011; Shen et al. 2012). Data are presented as a scatter-plot of H3K4me1 and H3K27ac within 1kb of KLF3 peaks. Dashed lines are arbitrary cut-offs for high and low levels of H3K4me1 and H3K27ac. The percentage of distal KLF3 peaks lying in each of the four quadrants is given in parentheses. 4.8. Discussion

In this chapter we have defined the KLF3 consensus motif based on genome-wide ChIP-seq data for the first time. The KLF1 motif has been previously defined in erythroid cells and the KLF4 motif has previously been defined in ES cells (Chen et al. 2008; Tallack et al. 2010). Based on these three datasets, KLFs are one of the few families where the in vivo binding specificity of different family members can be compared (albeit in different cell types). There is a very high degree of similarity between the emerging consensus DNA motifs of KLF1, KLF3 and KLF4

91

(Figure 4.1). The preferred site matches previously published consensus sequences but we have identified an important role for a G or A at position 1, and confirmed this using in vitro binding experiments (Figure 4.3). This similarity of the binding sites identified for KLF1, KLF3 and KLF4 is not unexpected given the highly conserved nature of their ZF DBDs, where the precise DNA- contact amino acid residues are completely conserved within these three members of the KLF family. Thus the ZF DBD domain clearly plays a significant role in restricting the binding of KLF proteins to CACCC-like binding sites, however the question of how different KLFs achieve their specific and divergent functions remains (and is addressed further in chapter 5).

We observed minor differences in the KLF3 consensus sequence at promoter peaks versus intronic and intergenic peaks. KLFs can tolerate an A or a G at position 6 of the motif and we found that in promoter peaks a G is more prevalent, whereas in non-promoter peaks an A is more common. The sequence 5´-GCCCGCCCC-3´ is known as a GC-box and forms the in vivo consensus DNA binding site for SP factors (Terrados et al. 2012). This motif is almost identical to the KLF3 consensus which is 5´-GCCCCGCCC-3´ suggesting that KLFs and SP factors could compete for GC-boxes. On the other hand, when an A is found in position 6 of the KLF3 motif

(5´-GCCCCACCC -3´), the motif does not conform to the SP consensus sequence and SP binding affinity for this site might be expected to be reduced. Thus the presence of an A or G at position

6 may demarcate two distinct motifs with varying affinities for KLF and SP factors.

Another interesting implication of the presence of an A or G at position 6 of the KLF3 motif is that if this nucleotide is a G, then the upstream Cs on both the sense and antisense strand may become methylated under certain conditions (sense 5´-GCCCCGCCC-3´/ antisense 5´-

GGGCGGGGC-3´ - where methylated Cs are underlined). If this were to occur, such methylation may positively or negatively regulate the binding of TFs to these sites. As such, the KLF3 motif found predominantly in distal genomic regions 5´-GCCCCACCC-3´ may be less prone to methylation compared to the KLF3 motif found predominantly in promoters with the sequence

92

5´-GCCCCGCCC-3´, where residue 5 on the sense strand and residue 6 on the antisense strand may be differentially methylated. This is a topic of further investigation in our laboratory.

When searching for the motifs of known transcription factors within KLF3 peaks, we found that peaks from different genomic regions were enriched for different factors. These enrichments were particular to promoter peaks and non-promoter peaks (intronic and intergenic peaks). The array of factors that were enriched were typical promoter or enhancer motifs. The enrichment of the AP-1 motif at non-promoter peaks stood out due to the known interaction between KLF5 and c-Jun (He et al. 2009; Liu et al. 2010). In addition, KLF6 is known to functionally antagonise c-

Jun (Slavin et al. 2004). At intergenic peaks the central enrichment of the AP-1 motif exceeded that of KLF3, suggesting that KLF3 may be tethered or stabilised at some enhancers by AP-1.

Further work including immunoprecipitations and in vitro binding assays might shed light on whether an interaction can occur between AP-1 and KLF3.

As expected we found that KLF3 peaks were associated with nucleosome depleted regions and that there was a reasonable correlation between KLF3 peak height and the degree of nucleosome depletion. The modest correlation (R2=0.46) may be explained by the ENCODE DNase-seq data being obtained from a different cell line (adult mouse lung fibroblasts rather than MEFs).

We also investigated the co-occurrence of a small number of active and poised histone marks with

KLF3 peaks. Unfortunately we were unable to investigate links between KLF3 occupancy and the presence of repressive marks as these datasets were not currently available for MEF cells.

However, given that CtBP can recruit a range of factors including histone methyltransferases, histone deacetylases and histone-lysine specific demethylases (Pearson et al. 2011), this is a subject of strong future interest in our lab. We found an association between KLF3 peaks and enrichment of histone PTMs that mark active genes and active enhancers. It was shown earlier that KLF3 seems to preferentially occupy promoters of highly expressed genes (Chapter 3), and that KLF3 peak height is correlated with nucleosome depletion (Figure 4.8). Such observations

93

may merely reflect that KLF3 simply occupies permissive chromatin more readily, and that KLF3 peak height is not necessarily an important predictor of function.

An alternate interpretation of these data is that KLF3 may act as a molecular brake, dampening the effect of other activating TFs leading to only modest fold changes on Klf3 ablation. In the evolutionary development of TF families, repressors logically become necessary when systems wish to counteract gene activation by activators. This may either be to switch off bona fide target genes, or to silence or dampen inappropriate transcription brought about by these activators. As such KLF3’s primary function may be to act to temper the activity of other KLF or SP factors, which might explain its association with highly expressed genes and open chromatin. There are three principal observations in support of this notion. Firstly, KLF3 is known to dampen expression of a subset of genes which are activated by KLF1 in erythroid cells (Funnell et al.

2012). Secondly, a number of KLFs are known to compete for target genes and also cross-regulate other KLF family members creating feedback loops (Dang et al. 2002; Eaton et al. 2008) and finally, negative feedback gene regulatory systems of this style are ubiquitous, highly stable and are conserved across higher vertebrates (Becskei and Serrano 2000; Kiełbasa and Vingron 2008).

Thus the widespread occupancy of KLF3 and the association between KLF3 and highly expressed genes may be a combination of both the permissive state of chromatin at active genes, and KLF3’s role in tempering activation by other KLF and SP factors.

In summary, we have investigated the relationship between KLF3 DNA binding and chromatin state in MEF cells. In doing so, we have defined the consensus DNA motif of KLF3 as being highly similar to that of other KLFs and have confirmed important nucleotides using in vitro assays. Investigating the relationship between KLF3 occupancy and chromatin state has revealed that DNase I HS sites are strongly associated with KLF3 peaks and that peak height was proportional to nucleosome depletion. Analysis of histone PTMs showed that most KLF3 peaks at proximal promoters were associated with active states of gene expression based on association with H3K4me3. The majority of KLF3 peaks in intronic and intergenic regions were found to be

94

marked with both H3K4me1 and H3K27ac, suggesting that these KLF3 peaks are largely sites of active enhancers.

95

Chapter 5. Non-DNA binding domains of KLF3 specify chromatin

occupancy.

5.1. Introduction

Transcription factors (TFs) are typically regarded as having two distinct components: a sequence specific DNA-binding domain (DBD) and a trans-acting functional domain that is capable of activating or repressing gene expression. Under this model, the DBD acts to direct the TF to certain regulatory regions in the genome based on its affinity for a particular DNA sequence and the trans-acting domain then imposes regulatory effects on the appropriate gene. Recognizing the capability of the two distinct domains to function autonomously has been helpful in understanding transcription factor function and has led to the development of methodologies, such as the yeast two hybrid system, where two separable domains are reunited to recreate a functional transcription factor. Nevertheless, it is known that the situation is sometimes more complex:

DNA-binding domains can also make functional protein-protein interactions with coregulators, and several results imply that non-DNA-binding domains can contribute to the localization of transcription factors to their target genes (Tsang et al. 1997; Kassouf et al. 2010; Chlon et al.

2012).

Most strikingly, it is now becoming clear that the DNA-binding domains of transcription factors alone are unlikely to provide sufficient specificity to account for the highly selective in vivo genomic profiles being observed in chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) experiments. Occupancy profiles show that in vivo, transcription factors are far more discriminating about where they bind than in vitro. That is, in vitro most transcription factors bind to all sites that reasonably match their consensus binding sequence but in vivo only a small subset, are bound. For instance, ChIP-seq studies have revealed that GATA-1 binds to less that 1% of predicted consensus sites in erythroid cells (Fujiwara et al. 2009). The poor correlation between predicted and observed occupancy has been dubbed the “Futility Theorem” by one group of authors based on the assertion that essentially all in vivo TF binding site predictions generated

96

using binding consensus sequences for individual TFs will have no functional role (Wasserman and Sandelin 2004).

At the root of this problem is the length of the DNA-binding motif and the information content contained therein. Given the size of the human genome (~3.9 gigabases), a motif would need to be greater than 16 bp in length to be unique if a random nucleotide distribution is assumed.

Despite this, most eukaryotic TF motifs are rather short and only some positions carry strong sequence preference. The zinc finger (ZF) transcription factors of the KLF family, for instance, recognise a 10 bp sequence with only 4 of these positions being restricted to a single, specific nucleotide (see Chapter 4) (Chen et al. 2008; Tallack et al. 2010). Furthermore, the overall motif is mostly comprised of C and G nucleotides which are over-represented in promoter regions.

Taken together, these observations point to a level of information content far short of what might be expected, given the rather restricted nature of TF binding profiles in vivo. In other words, it does not seem that the DNA-binding surface within the ZF domain alone could provide sufficient specificity to explain observed chromatin occupancy. Thus other TF domains, particularly those outside the DBD, or other phenomena such as the availability of target sites and/or cofactors may also play a role.

To gain a better understanding of the mechanisms by which DNA-binding proteins are localised to particular cis-regulatory elements, we have focused on ZF transcription factors, as ZFs are the most prevalent DBD in the proteome. In particular, we wished to determine whether regions outside the DBD contributed to specificity. We have chosen to investigate the ZF TF KLF3, a factor with a range of biological roles in adipogenesis, erythropoiesis, and lymphopoiesis

(reviewed in (Pearson et al. 2011)). As discussed previously, KLF3 is a member of the KLF family of transcription factors, of which 17 members have been described to date. These proteins are characterised by a highly conserved C-terminal DBD comprised of three tandem classical ZF motifs and are known to bind to CACCC or GC-rich boxes in both proximal and distal cis- regulatory genomic elements (Kaczynski et al. 2003; Suske et al. 2005). The N-terminal domains

97

of different KLF family members vary significantly, such that some KLFs recruit activating cofactors while others bind repressors. Indeed, some KLFs are able to act as activators and/or repressors depending on promoter or cellular context (Kaczynski et al. 2003).

The molecular mechanisms by which KLF3 regulates gene expression have been extensively investigated. KLF3 utilizes its N-terminal, non-ZF domain, to recruit the co-repressor C-terminal binding protein (CtBP) (Turner and Crossley 1998). CtBP in-turn can recruit a range of factors including histone methyltransferases, histone deacetylases and histone-lysine specific demethylases (Laherty et al. 1997; Kaczynski et al. 2001; Sif et al. 2001; Yang et al. 2003) that remodel chromatin to repress gene expression. Thus KLF3 can be regarded as a typical ZF TF with an N-terminal functional domain and a C-terminal DBD.

In this chapter we compare the in vivo DNA-binding specificity of KLF3 and two KLF3 mutants with disrupted N-terminal domains using ChIP-seq to reveal that non-DNA binding domains of

KLF3 are critical for proper occupancy across the genome.

5.2. Experimental model

Having previously established the in vivo binding profile of KLF3 in MEF cells (see Chapter 3), we sought to build on this data set by examining two KLF3 mutants with disrupted N-terminal domains (Figure 5.1A). The first mutant designated ΔDL, contained a two amino acid substitution, with AS replacing DL in the CtBP contact motif – PVDLT – within the N-terminal domain of KLF3. This mutation effectively renders KLF3 unable to recruit its co-repressor CtBP

(Turner and Crossley 1998). The second mutant, designated DBD, involved the deletion of the entire N-terminal domain, leaving just the putative nuclear localisation signal and the DNA- binding domain intact. As was the case with the experimental model used in Chapter 3, the two mutants were tagged with the V5 epitope at the C-terminus such that the tag location was consistent across the mutants to allow direct comparison of binding profiles.

98

Figure 5.1. Design of KLF3 mutants and rescue of Klf3-/- MEFs. (A) Schematic showing the three constructs used to rescue Klf3-/- murine embryonic fibroblasts (MEFs). Wild-type KLF3 was previously examined in Chapter 3. (B) A representative Western blot showing relative levels of ectopic expression of the three constructs in rescued Klf3-/- MEFs. (C) Densitometry results from (B) shown in arbitrary units.

As was the case in Chapter 3, Klf3-/- murine embryonic fibroblasts (MEFs) were rescued with mutant, epitope-tagged Klf3 using the murine stem cell virus retroviral delivery systems (as described in Methods). We purposefully selected a rescue model for these experiments to avoid

99

any effects due to the presence of endogenous KLF3. Stable clonal cell lines expressing ΔDL and

DBD were then isolated under puromycin selection. In order to minimise the possibility that differences between the mutants and wild-type KLF3 might be affected by differential expression, cell lines with equivalent levels of KLF3, ΔDL and DBD protein were selected based on comparison by western blot (Figure 5.1B). As previously shown in Chapter 3, the level of ectopic

KLF3 was also shown to be similar to the level of endogenous KLF3 expressed in wild-type

MEFs (Figure 3.1B). Cell lines with physiological levels of wild-type or mutant Klf3 expression were deliberately selected to ensure that additional targets were not identified purely because of excessive ectopic expression. Immunofluorescence confocal microscopy was undertaken to confirm that the KLF3 mutant proteins were all correctly localised to the nucleus (Figure 5.2).

5.3. ChIP-seq

ChIP-seq was performed in duplicate on MEF cells expressing the mutant forms of KLF3; ΔDL or DBD. These ChIP-seq experiments were performed in parallel with those in Chapter 3 in the manner previously described. Briefly, approximately 5x107 MEF cells were crosslinked with 1% formaldehyde for 5 minutes before being lysed and sonicated to produce dsDNA fragments of

100-200 bp in length. Epitope tagged complexes containing either KLF3 or KLF3 mutants were pulled down using a V5 antibody on magnetic beads. Pull-downs were washed and DNA was eluted from the beads. Library preparation was performed on the 4 input (IN) and 4 immunoprecipitation (IP) samples using sample specific adapters such that 3 samples were run per lane (2 and 2/3 lanes used in total). Samples were sequenced using 50bp chemistry on the

HiSeq 2000 (Illumina, San Diego, CA). Raw read counts per sample are given in Table 5.1.

Sequences were then aligned to the Mus musculus (mm9/NCBI37) genome using Bowtie2

(Langmead and Salzberg 2012) under the same criteria discussed in Chapter 3 and detailed in

Methods. A reduced mapping efficiency was noted in the IP samples. This commonly occurs as the library staring material for an IP sample is usually of much lower concentration than an IN sample resulting in reduced yield compared to input.

100

Figure 5.2. Immunofluorescence confocal micrograph showing sub-cellular localisation of KLF3 and KLF3 mutant proteins. Green is anti-V5 FITC conjugated antibody, blue is nuclear DAPI stain.

101

Table 5.1 Sequencing read and alignment statistics (KLF3 wild-type data was previously reported in Chapter 3).

Mapped to mm9 Fraction Sample Reads Round 1 Round 2 Total mapped Inputs KLF3_IN-1 71,393,072 69,239,545 1,047,776 70,287,321 98% KLF3_IN-2 91,750,375 89,870,419 902,061 90,772,480 99% ΔDL_IN-1 73,683,716 71,846,509 883,882 72,730,391 99% ΔDL _IN-2 84,401,726 82,654,459 838,840 83,493,299 99% DBD_IN-1 76,069,171 74,726,669 643,595 75,370,264 99% DBD _IN-2 83,372,096 81,767,929 775,603 82,543,532 99% Total 480,670,156 470,105,530 5,091,757 475,197,287 99% Immunoprecipitants KLF3_IP-1 67,059,305 36,117,799 4,178,265 40,296,064 60% KLF3_IP-2 68,883,891 44,497,337 2,517,949 47,015,286 68% ΔDL _IP-1 57,033,726 14,968,689 5,220,457 20,189,146 35% ΔDL _IP-2 71,526,811 30,829,561 4,746,531 35,576,092 50% DBD _IP-1 69,247,176 38,472,342 4,062,922 42,535,264 61% DBD _IP-2 70,763,867 42,291,793 4,000,621 46,292,414 65% Total 404,514,776 207,177,521 24,726,745 231,904,266 57% IN=input, IP=immunoprecipitant, numerated suffixes indicate replicates (1 or 2).

5.4. Evaluation of replicates

Peaks were called on each individual Klf3 mutant IP sample using the paired IN sample as a control using the software package HOMER (Heinz et al. 2010). A total of 17,206 (P < 1.7x 10-

7) and 32,624 (P < 5.6x10-7) peaks were called in ΔDL_1 and ΔDL_2 samples respectively and

9,156 (P < 7.3x 10-8) and 8,105 (P < 9.4x10-8) peaks in DBD_1 and DBD_2 samples respectively

(Figure 5.3). As previously reported in Chapter 3, a total of 26,772 (P < 7.1x 10-8) and 23,872 (P

< 1.1x106) peaks were called in KLF3_1 and KLF3_2 samples respectively.

The degree of overlap between the replicates was then established (Figure 5.3) with the core sets of 12,248 and 4,955 overlapping peaks taken to be high confidence ΔDL or DBD peaks

102

respectively. The degree of correlation between the two replicates at these high confidence peaks was then investigated. The correlation between the replicates was quite strong for both cell lines and was comparable to the correlation for KLF3 (Chapter 3) (Fig 5.4). ΔDL showed some degree of divergence in replicate correlation compared with the other cell-lines and also showed fewer reads sequenced and mapped (Table 5.1). It is likely that for the ΔDL samples, particularly for replicate 1, that the library preparation process has not been as successful as for the other samples.

Nevertheless, a good correlation is still observed and results reported later in the chapter support the quality of this data. Based on the correlation between replicates, the overlapping high confidence peaks were carried forward for further analysis.

103

A Klf3 1 Klf3 2

Total 26,772 Total 23,873

p < 1.1 x 1o-6

FOR 0.001

B ~OL 1 ~OL 2

Total 17,206 Total 32,624

p < 1.7 X 10-7

FOR 0.001

080 1 080 2

Total9,156 Total8,105

p < 7.5 X 10-8

FOR 0.001

Figure 5.3 Overlap of peaks called for each KLF3 ChIP-seq replicate. Data for wild-type KLF3 (Chapter 3) are shown in (A) allowing comparison with new data for the mutant forms of KLF3 shown in (B). FDR cut-offs and P- values for each replicate are given. Venn diagrams are proportional.

104

A 12 N -C) 0 .J R2 = 0.642 -c: 10 ~ 0 (.) C) ~ 8 N I M LL .J :::.::: 6 7 8 9 10 11 12 KLF3-1 Tag Count (Log2)

8 12 N -C) 2 0 R = 0.300 .J -c::: 10 ~ 0 (.) C) ~ 8 N I .Jc

Figure 5.4. Correlation between replicates at overlapping peaks. Data for wild-type KLF3 (Chapter 3) are shown in (A) allowing comparison with new data for the mutant forms of KLF3 shown in (B). The overlapping peak regions between replicates were analysed to determine their correlation based on normalized tag counts within 400 bp of each peak centre.

105

5.5. Confirmation of peaks by ChIP-PCR

The ChIP-seq data for ΔDL and DBD was validated using the same ChIP-PCR approach used in

Chapter 3. A number of genomic regions were selected to test the binding of ΔDL and DBD at both known KLF3 target genes and negative control regions. These regions included the known targets Klf8, Lgals3 and Fam132a (Eaton et al. 2008; Funnell et al. 2012), a new peak at the

Stard4 promoter and two previously established unbound regions in the Klf8 locus (Eaton et al.

2008). Independent ChIP assays were performed on each of the ΔDL and DBD cell lines in duplicate and the recovered DNA was subjected to amplification by quantitative real-time PCR using primers for the six specific loci (primer sequences and genomic coordinates of amplicons are available in appendix A). As shown in Figure 5.5A, the ChIP-PCR confirmed the presence of the expected peaks and the absence of peaks in negative control regions. IgG controls analysed with these primer sets have previously and consistently demonstrated that the IgG background signal is similar to unbound regions (data not shown). Even at this preliminary stage of analysis divergent binding profiles were beginning to emerge between KLF3, ΔDL and DBD at particular loci.

5.6. Qualitative peak comparisons

Using the KLF3 data generated in Chapter 3, along with the KLF3 mutant data generated here, it was possible to visually compare the binding profiles of KLF3, ΔDL and DBD. In addition, the

DNase-seq data analysed in Chapter 4 from the ENCODE project was also included (Encode

Project Consortium 2011; Neph et al. 2012). A range of striking differences were observed in the binding profiles of the three proteins and a number of illustrative peaks are displayed in (Figure

5.6). Panels A-B show peaks where all three KLF3 constructs have similar binding profiles in the introns of the Grin1 and Map3k6 genes. Panel C shows an example of dramatic loss of binding by ΔDL and DBD at the promoter of the Rc3h1 gene. Panel D shows a new binding activity by

ΔDL that is not present in KLF3 3´ to the Epgn gene. Panel E shows similar binding by DBD in the Lmna gene in the region marked by red bars, but loss of binding at the peak to the immediate

106

left of the marked section. In panels F-I, both ΔDL and DBD show loss of binding at the promoters of various genes, again marked by red bars, but binding of ΔDL or a much reduced level of binding by DBD at other nearby peaks.

Figure 5.5 (Overleaf) Confirmation of ChIP-seq peaks by ChIP-PCR. (A) Relative enrichment of KLF3 and KLF3 mutant binding sites as measured by ChIP-PCR. Values are expressed as a fraction of input. Error bars represent SEM. (B) ChIP-seq tracks of the control peaks amplified in (A). The location of the qPCR amplicons is indicated by the black boxes. Panels in the top row assessed KLF3 binding in the Klf8 locus at sites which have previously been investigated (Eaton et al. 2008). Lgals3 and Fam132a are previously established targets based on studies of Klf3 ablation (Funnell et al. 2012). Stard4 is a novel Klf3 target identified using ChIP-seq.

107

A 7.00%

6.00%

-S, S.OO% • KLF3 c ~ • D.DL 04.00% Q) C) • DBD ca 'C 3.oo% Q) ...(.) :. 2.00%

1.00%

0.00% (-) Ctr/1 (-) Ctrl 2 Klf8 1a Lga/s3 Fam132a Star04 Locus

(-) Ctrll (-) Ctrl 2 Klf8 (-4.5kb exon l a) Klf8 (+33kb exon l a) Klf8 promoter l a B ChrX 3 kb ChrX 3 kb ChrX 3 kb 149,667,000 149,668,000 149,669,000 149,700,000 149,701,000 149,672,000 149,673,000 I I I I

(~500) KLF3 (~500) KLF3 (~500) KLF3

-(~500) t.Dl (~500) t.Dl [~500) t.DL

(~500) DBD (~500) DBD DBD

- - • Amplicon • Amplicon • Amplicon-

Klf8 Klf8

Refseq Refseq Refseq

Lgals3 (-20kb) Fam132a Stard4 Chr1 4 3kb Chr4 3 k:b Chr5 3 k:b 47,973,000 47,974.000 47,975,000 155,335,000 155,336,000 155,337,000 33,373,000 33,374,000 I I I I I I I I (~00) KLF3 (~500) KLF3 (~) KLF3

- !_ 1 (~00) t.Dl (~500) t.Dl (~) t.DL

-· -- l - oso · --..&. --~- (~00) DBD (~500) - (~500) DBD -- __. Amplicon • Amplicon • Amplicon

faml32a Stard4

Refseq Refseq Refseq

Figure 5.5 Confirmation of ChIP-seq peaks by ChIP-PCR. See legend on previous page.

108

A 6.2 kb Chr2 B 4.8 kb Chr4 C 10 kb Chr1

1JZ.IN,IOO. 1:a2.-..... U2,11117.oot. tU,IOI,OOU,. tu,u:t ~ 1u,I:M.. 1tz.IM.. 1u.&ll II* 1U.... .,. I I I I I I I I I [0-1200] KLF3 [0-900] KLF3 [0-900] KLF3 l _ _L __ [0-1200] j LlDL [0-900] LlDL [0-900] LlDL [0-1200] DBD [0-900] DBD [0-900] DBD j [0-250] DNase I HS [0-250] DNase I HS [0-250] DNase I HS

" 'I I 1 1 1 I .. 1 . 1 1 1 ...... Lrrc26 Grin1 Map3k6 Rc3h1 Grin1

Refseq Refseq Refseq

D 10 kb Chr5 E 9.4 kb Chr3 F 13 kb Chr17 ,,...., ...... M.M... Iop ...... M,lll:rlllo ...... "' ...... I I I I ...... I I I I I I [0-1200] KLF3 [0-700] KLF3 [0-:::J KLF3 - ... -. [0-1200] LlDL LlDL [0-500] LlDL -- ...! . j . __ _._ ____ .... [0-1200] DBD DBD (0-500] DBD

- ·"- [0-250] DNase I HS [0-250] DNase I HS [0-250] DNase I HS

~ Epgn Lmna Thada Lmna Refseq Lmna Refseq Refseq

G 13 kb Chr17 H 26 kb Chr17 8.5 kb Chr9 ,...... ,...... ,...... • ,...... I I I I I I [0-500] KLF3 [0-500] LF3 [0-9 OJ KLF3 ~ .. lk_--·-·~ -- - (0-500] LlDL [0-500] DL LlDL

- - · ....&- ••.• • • A .JL.-- ·. .~ .. [0-500] DBD [0-500] BD DBD

,.. - ~~ DNase I HS (0-250) DNase I HS [0-250] DNase I HS

I • I • I •I I I I • I I I I Birc6 Fez2 Anxa2 I Mir3109

Refseq Refseq Refseq

Figure 5.6 An illustrative range of peaks showing similarities and differences between the occupancy of KLF3, ΔDL and DBD. Notable changes in occupancy are highlighted by red vertical bars. Note that the y-axis scales may differ between panels but that within a panel, KLF3, ΔDL and DBD tracks use the same scale to allow direct comparison of peak height. The DNase-seq track was generated from fibroblast data from the Stamatoyannopoulos lab and was sourced from the ENCODE project (Encode Project Consortium 2011; Neph et al. 2012).

109

5.7. Comparison of peak distribution.

We were interested to see whether KLF3, ΔDL and DBD bound to the same regions in the genome. The overlap of peaks between the three mutants is shown in Figure 5.7. For peaks to be considered overlapping their boundaries had to literally overlap in genomic space. KLF3 and ΔDL show some degree of overlap however the majority of sites are not overlapping. Around half of the DBD peaks overlap with KLF3. ΔDL and DBD show a close relationship to each other with the vast majority of DBD peaks co-occurring with ΔDL.

Figure 5.7 Contrasting KLF3, ΔDL and DBD peaks. A proportional Venn diagram showing the overlap of peak locations between the three proteins.

The distribution of KLF3, ΔDL and DBD peaks were analysed based on genomic region. The raw numbers of peaks occurring within each genomic region are displayed in Figure 5.8A. It is immediately apparent that there is a dramatic reduction in the number of peaks in the ΔDL and

DBD experiments compared to KLF3 at the proximal promoters of genes, consistent with the data shown previously in Figure 5.6F-I. ΔDL shows around a 75% reduction in the number of promoter peaks, whilst DBD shows an almost complete loss in the number of promoter peaks

110

(Figure 5.8A). In intronic and intergenic regions, ΔDL and KLF3 showed similar numbers of peaks, while DBD simply showed fewer peaks overall.

Figure 5.8 Peak locations and peak heights. (A) Distribution of KLF3 and mutant peaks across the genome. Promoters are defined as the region -1000 bp, +100 bp around the TSS of Refseq genes. Peaks that fell into CDS exons, 5’ and 3’ UTR exons and transcription termination sites (-100 bp to +1 kb) were all labelled as ‘other’. (B) Histogram showing the distribution of peak heights for KLF3 and the two mutant proteins, ΔDL and DBD.

It was possible to quantify the number of sequencing tags falling under each peak in order to compare peak heights across KLF3, ΔDL and DBD. Sequencing tags had already been normalised and were expressed as tags per 100M reads to allow direct comparison between the three experiments. Reads were counted within a 400 bp region surrounding each peak centre for KLF3,

ΔDL and DBD genome-wide. A histogram of peak height across the three proteins is presented in Figure 5.8B. KLF3 exhibits the greatest peak heights evidenced by the shift of frequency profile to the right. ΔDL shows slightly less binding at higher peak heights but has more peaks showing weak binding than KLF3. DBD exhibits a strong loss of binding with a large shift to the left. It is also clear that there are far fewer DBD peaks called.

We then looked more closely at the differences in peak heights between KLF3, ΔDL and DBD by comparing their mean peak heights at various genomic regions (Figure 5.9). Both ΔDL and in particular DBD show a smaller mean peak height than KLF3 at the proximal promoter of KLF3 bound genes, reinforcing observations from previous analyses (Figure 5.6F-I, Figure 5.8A). At

111

intronic and intergenic regions, KLF3 and ΔDL showed similar mean peak heights, whilst DBD exhibited much weaker occupancy.

Figure 5.9 Differences in peak heights between KLF3 and KLF3 mutants at various genomic regions. Values are mean peak heights (reads/100M reads within 400 bp of peak centres). Error bars represent SEM. * P < 0.05 ** P < 0.00005. 5.8. Comparison of DNA consensus motifs

We also examined the KLF3, ΔDL and DBD peaks to see whether there were any differences in the KLF3 or KLF3 mutant consensus binding motifs (Figure 5.10). Consensus motifs were very similar between KLF3 and ΔDL although only 142/500 ΔDL peaks showed the presence of the motif compared with 448/500 for KLF3. The DBD motif was somewhat different to the KLF3 and ΔDL motifs, although some of the observed differences may be due to the small samples size as few DBD peaks (52/500) showed the presence of this consensus. The reduction in the number of occurrences of the motifs in the ΔDL and DBD peaks suggests that the specificity of these proteins has been compromised by the mutations introduced.

112

Figure 5.10 Motif differences between KLF3, ΔDL and DBD peaks. De novo motif discovery was performed on the central 100 bp of the top 500 peaks ranked by peak height using MEME (Bailey and Elkan 1994). Note that the frequency at which the motif occurs in the top 500 peaks is given below each motif.

5.9. Conservation of peaks for bona fide KLF3 target genes

We also examined occurrence of KLF3, ΔDL and DBD peaks within the promoters of KLF3- responsive genes (showing >2-fold repression on rescue) based on the microarray results presented in Chapter 3 (Table 3.2). The promoters of the 65 KLF3 bound and repressed genes were examined for peaks for each of the three KLF3 constructs. It is important to note that each gene can have more than one peak location within the promoter region (-1000 to +100 relative to the TSS). In total, 72 distinct peak locations were identified at these promoters and the overlap in binding for each of the mutants is shown in Figure 5.11 below. Interestingly, most were only bound by KLF3 and were not bound by ΔDL and or by DBD. This suggests that the mutation of the CtBP binding domain has a strong effect in reducing KLF3 occupancy at KLF3-responsive promoters. Similarly, deletion of the N-terminus also seems to have a strong effect in reducing occupancy at KLF3-responsive promoters. The peaks that showed unique ΔDL binding sites are presumably due to new binding specificities shown only by ΔDL.

113

Figure 5.11 KLF3, ΔDL and DBD peaks at the promoters of KLF3-responsive genes. The promoters of the 65 KLF3 promoter-bound genes that showed >2-fold repression on KLF3 rescue were analysed for occupancy by each of the three KLF3 constructs. Numbers of unique and overlapping peaks are displayed in a proportional Venn diagram. 72 distinct peak locations were identified.

5.10. Discussion

In this chapter, we have examined the role of non-DNA binding domains in determining the genome-wide occupancy profile of KLF3. We used a deletion and a point mutation in the N- terminus of KLF3 to test the contribution of non-DNA-contact regions to in vivo specificity and found that the mutations had a profound effect on binding in vivo. In general, deletion of the entire

N-terminus, leaving only the ZF domain, significantly reduced binding, and mutation of the CtBP- contact motif by the two amino acid substitution reduced binding to a lesser extent. However, the actual profiles were complex, with examples of some regions where the mutants bound as well as wild-type, and some where they bound better, as well as the more wide scale reduction in binding at many locations. The mutants retained the preference for typical CACCC-like motifs, consistent with the fact that they were still relying on an intact ZF DBD for binding but the stringency of binding and heights of peaks was often reduced. Nevertheless, the results argue strongly that these

N-terminal domains, hitherto thought to be dispensable for DNA-binding in vitro, are of considerable importance in vivo.

114

One hypothesis to explain this observation and the related finding that other KLF family members with different N-terminal domains bind different genes is that these KLFs and the mutants might differ in their ability to contact cofactor proteins that somehow enhance or increase the specificity of binding. KLF3 and KLF4, for example, bind CtBP but KLF1 does not (Pearson et al. 2008;

Liu et al. 2009). The KLF3 ΔDL construct that cannot bind CtBP only differs from wild-type

KLF3 by the mutation of two amino acids making the observed changes in occupancy quite remarkable. KLF3 and ΔDL both bind to DNA with similar affinity in vitro (Turner and Crossley

1998) suggesting that binding differences in vivo may be attributable to contextual factors such as the presence of CtBP. KLF1 recruits entirely different cofactors including CBP/p300 (Zhang and Bieker 1998), which may well contribute to the different specificities.

Precisely how CtBP contact may alter binding specificity in vivo is not currently clear, but several direct and indirect effects may be at play. Most simply, one should note that CtBP is capable of self-associating and contacting more than 30 other vertebrate TFs (Chinnadurai 2007). It may therefore act as a bridging molecule linking KLF3 to other DNA-bound transcription factors and enhancing targeting to specific loci already occupied by these factors (Figure 5.11A). In this way, the CtBP-binding motif may be important for directing KLF3 to specific sites and loss of the motif could result in loss of binding those sites. It may be particularly relevant that the CtBP binding mutant appears to have particularly lost the ability to target promoter regions, regions where additional TFs may well be bound.

Indirect effects may account for the curious observation that the CtBP contact point mutant actually bound better to certain loci. It is important to recall that KLF3 is a transcriptional repressor that appears to have the capacity to remodel chromatin domains to be less permissive by recruiting CtBP. However in Klf3-/- cells rescued with the KLF3 CtBP contact point mutant, these regions of chromatin will not be shut down and may remain open and accessible (Figure

5.11B). It is possible that the KLF3 mutant then has additional access to these loci, because they are more open rather than because the loss of CtBP contact facilitates KLF3’s ability to target

115

specific loci. Figure 5.6D provides a good example of such a circumstance. Here, ΔDL has acquired a new binding specificity that occurs at a region where there is nucleosome aggregation in WT fibroblasts (DNase-seq track shows a low level of tags at this newly acquired peak) but which may be open in the ΔDL rescued cell line.

The occupancy of DBD is reduced many fold at most peaks, again particularly at gene promoters.

It is difficult to tell whether to interpret the data as a significant loss of binding overall or a spreading of binding across more regions of the genome, giving lower peaks on average.

Progressive deletion of the N-terminus of KLF3 increases DNA binding affinity in vitro (Burdach et al. 2013). This increased affinity for DNA may lead to increased promiscuity by DBD and less specific binding in vivo (Figure 5.11C). If DBD’s binding affinity for DNA was increased, it may be redirected to what would usually be lower affinity sites, resulting in low occupancy at a greater number of genomic regions. This trend would result in a large reduction in peak height at KLF3 targets sites and agrees with the observed data. Further supporting this notion is the increased level of background in the qPCR negative controls in locations where KLF3 is not normally bound

(Figure 5.5A). Also of note is that the related protein KLF1 is known to have an autoinhibitory domain immediately N-terminal to the ZF region (Chen and Bieker 1996). This domain has been shown to inhibit DNA binding by interacting in cis with the DBD.

The observation that regions outside the ZF DBD of KLF3 are required for proper in vivo DNA binding is unexpected but suggests that the N-terminus may be involved in recruiting KLF3 to its target sites. The finding that regions outside the DNA-binding domain of a transcription factor may be involved in gene recognition fits with the observation that certain transcription factors retain functions even when their DNA-binding domain is mutated. For example, an SCL/TAL1 mutant with a non-functional DBD has been shown to partially rescue a knockout phenotype in haematopoietic cells (Porcher et al. 1999). ChIP-seq revealed that this DBD mutant could still occupy around 20% of the binding sites that were bound by the wild-type protein (Kassouf et al.

2008; Kassouf et al. 2010).

116

A Wild-type KLF3 .. •, ._...... <+.. ! r--- (J}J ~~~ ~ ~~~ (}JJ (}JJ (}JJ (jj) cc£((l ® (J}Lf}JJ TSS (Jj)_JJ}J._

ChiP-seq track A

b.DL mutant KLF3

· • 3!il I I I

ChiP-seq track

ChiP-seq track

b.DL mutant KLF3 ;4111 I I I 11 1 .J1!J. ® ® ccxU(l (JJJ (}j)_JJJJ

ChiP-seq track A

c Wild-type KLF3 ....~ . "'- 1!! r--- (J}J ~~~~~~~ (}JJ_f/J)_JJJ) (jj) cc£((( ® (jj) (jj) TSS (})J ®_

ChiP-seq track A

DBD mutant KLF3

080 I£II)

ChiP-seq track

117

Figure 5.11 Proposed models explaining observed changes in occupancy between wild-type KLF3 and mutants. Figure legend overleaf.

Figure 5.11 Proposed models explaining observed changes in occupancy between wild-type KLF3 and mutants (see previous page). (A) CtBP is known to dimerise and can associate with over 30 other mammalian transcription factors (Chinnadurai 2007). It is possible that such interactions may stabilize wild-type KLF3 at certain genomic regions. (B) CtBP can modify chromatin domains via recruitment of a range of histone modifying enzymes. CtBP’s action at some regulatory elements may reduce occupancy by making chromatin less permissive. When KLF3 cannot properly recruit CtBP due to the ΔDL mutation, occupancy may increase as a result of chromatin being more open. (C) The DBD mutant shows higher DNA binding in vitro (Tan unpublished results) and also lacks the N-terminal domain that recruits CtBP. These two changes may lead to a decreased level of DNA-binding specificity, and a concomitant increase in DNA-binding affinity, which offers a potential explanation for the reduced occupancy observed genome- wide.

Similarly, studies on GATA-1 have recently revealed how cofactors can influence in vivo DNA- binding specificity. GATA-1 occupancy was shown to be dependent on its interaction with the cofactor Friend of GATA-1 (FOG-1). A GATA-1 mutant carrying a non-functional binding domain for FOG-1 displayed a different occupancy profile than wild-type protein (Chlon et al.

2012). In the absence of FOG-1, GATA-1 occupies mast cell specific genes and forced expression of FOG-1 can displace GATA-1 from these targets. Again, the precise mechanisms by which

FOG-1 alters GATA-1 specificity are not yet clear.

Although we do not yet fully understand the mechanisms that determine the specificity of KLF3, it is clear that regions of the protein outside of the ZFs do influence targeting. In general, deletion of the entire N-terminus, significantly reduced binding, and mutation of the CtBP-contact motif reduced binding to a lesser extent. However, the dependence upon N-terminal domains for proper specificity is complex, with instances where the mutants showed similar binding to wild-type and other instances where binding was lost or gained. The finding that non-DNA binding domains can affect KLF3 occupancy in such a manner has broader implications for understanding how

TFs function in vivo. The ZF domain is the most common DBD in higher organisms and a large number of proteins show a high level of conservation with KLFs, including SP factors, the GLI family, TFIIIA, WT1, YY1 and others (Schuetz et al. 2011). Thus it is likely that the specificity of other factors may also be dependent on non-DNA-binding domains. Finally, understanding how these additional domains operate may help in the design of yet more effective and specific artificial ZF proteins.

118

Chapter 6. General discussion

In this thesis, the specific chromatin occupancy of KLF3 in MEF cells has been determined. These data have led to a number of insights into the mechanisms of KLF3 mediated gene repression and have enabled a deeper understanding of how the proteins of the KLF family may achieve their differential functions. The KLF3 occupancy map has allowed the definition of a set of putative

KLF3 target genes to be confirmed based on KLF3 promoter binding (Figure 3.8). These genes had previously been identified based on gene expression changes on Klf3 ablation (Eaton et al.

2008; Sue et al. 2008; Vu et al. 2011; Funnell et al. 2012; Bell-Anderson et al. 2013), and the revelation that a subset are also bound by Klf3 in MEFs has enhanced our understanding of which genes are directly regulated by KLF3. Importantly, in addition to validating previously identified

KLF3 targets, KLF3 occupancy and gene expression microarray analysis has led to the identification of novel KLF3 targets in MEFs (Table 3.2).

6.1. The biological role of KLF3

A number of the genes identified or validated in this thesis are the subject of current and future work in our lab and some of these genes may contribute to the metabolic defects previously identified in Klf3-/- mice (Sue et al. 2008). Male Klf3-/- mice show a metabolic phenotype with lean body composition and low bodyweight (Sue et al. 2008). On a high fat diet they are resistant to diet induced obesity and show reduced susceptibility to insulin resistance. Three genes in particular that we have identified may be important in determining this phenotype.

The Fam132a gene codes for an adipokine (a metabolic signalling molecule) called adipolin with roles in insulin sensitisation and metabolism (Enomoto et al. 2011; Enomoto et al. 2012; Wei et al. 2012a; Wei et al. 2012b). Recent work in our lab has determined that upon Klf3 ablation,

Fam132a becomes derepressed resulting in a systemic elevation of plasma adipolin (Bell-

Anderson et al. 2013). Another KLF3 target gene, Galectin 3 (Lgals3), is also involved in metabolism, with Lgals3-/- mice exhibiting a high-fat diet induced obesity phenotype and dysregulated glucose regulation (Pang et al. 2013; Pejnovic et al. 2013). A novel KLF3 target

119

Stard4, has also been identified in the experiments in this thesis. STARD4 is involved in cholesterol transport, particularly the movement of cholesterol to the endoplasmic reticulum and the formation of lipid droplets (Rodriguez-Agudo et al. 2011). These three genes may collectively contribute to the metabolic changes observed in Klf3-/- mice.

6.2. The molecular role of KLF3

In aggregate, the gene expression and KLF3 occupancy data analysed herein suggest that KLF3 is predominantly a repressor of transcription (Figure 3.9). Additionally, the combination of these data with previously reported RNAP II ChIP-seq data (Encode Project Consortium 2011; Shen et al. 2012) has revealed an association between RNAP II pausing at promoters bound by KLF3.

Given that KLF3 binds closely to the TSS and to the TATA box, it is possible that KLF3 could interact either directly or indirectly with RNAP II. KLF3 bound genes exhibit more RNAP II promoter pausing, even when normalised for expression level.

When searching for the motifs of known transcription factors within KLF3 peaks, it was found that peaks from different genomic regions were enriched for different factors that typically associate with either promoters or enhancers. The enrichment of the AP-1 motif at distal (non- promoter) peaks is of interest as another KLF family member, KLF5, has previously been shown to interact with the AP-1 protein c-Jun (He et al. 2009; Liu et al. 2010). At intergenic peaks the central enrichment of the AP-1 motif exceeded that of KLF3, suggesting that KLF3 may be tethered or stabilised at some enhancers by AP-1. Further work including co- immunoprecipitations and in vitro binding assays might shed light on whether an interaction can occur between AP-1 and KLF3.

When the relationship between KLF3 occupancy and chromatin state was analysed, it was revealed that KLF3 preferentially bound nucleosome depleted regions (NDRs), as revealed by previously reported DNase sensitivity mapping (Encode Project Consortium 2011; Neph et al.

2012 ). Interestingly, KLF3 occupancy (peak height) was roughly correlated to the degree of nucleosome depletion. Despite KLF3 being known to be a repressor of transcription, it was found

120

to associate with active chromatin marks and highly expressed genes. The examination of enhancers bound by KLF3 revealed that the majority were active enhancers rather than poised enhancers.

The simplest explanation for these observations is that like other DNA-binding factors, KLF3 preferentially binds to NDRs, which predominate in the promoters of highly expressed genes and active enhancers. An alternate interpretation of these data is that at some active loci KLF3 may act as a molecular brake, thereby explaining its association with highly expressed genes and open chromatin. In this role, KLF3’s primary function may be to temper the activity of other KLF or

SP factors that bind to the same site or region to drive transcription. This is supported by three observations. Firstly, KLF3 is known to repress a subset of genes that are activated by KLF1 in erythroid cells (Funnell et al. 2012). Secondly, KLFs are known to compete for target genes and also cross-regulate other KLF family members creating feedback loops (Dang et al. 2002; Eaton et al. 2008) and finally, negative feedback systems of this style are near ubiquitous, highly stable and are conserved across higher vertebrates (Becskei and Serrano 2000; Kiełbasa and Vingron

2008). Additionally, the increased level of RNAP II pausing at KLF3-bound genes also supports the idea that KLF3 may act as a transcriptional brake. Thus the widespread occupancy of KLF3 and the association between KLF3 and highly expressed genes may be a combination of both the permissive state of chromatin at active genes, and KLF3’s role in tempering activation by other

KLF and SP factors.

There is a very high degree of similarity between the emerging consensus DNA motifs of KLF1,

KLF3 and KLF4 (Figure 4.1). This is not unexpected given the highly conserved nature of their

ZF DBDs, where the precise DNA-contact amino acid residues are completely identical.

Furthermore, it seems that these sequence preferences will likely be shared across the rest of the

KLF family due to the extent of the DBD conservation. Thus the ZF DBD domain clearly plays a significant role in restricting the binding of KLF proteins to CACCC-like binding sites, however the question of how different KLFs achieve their specific and divergent functions remains.

121

6.3. The in vivo specificity of KLF3

To examine the specificity issue further, we investigated the role of non-DNA-binding domains of KLF3 in determining in vivo KLF3 occupancy. This was accomplished by examining a KLF3

N-terminal deletion mutant and a point mutant unable to bind the cofactor CtBP. When the occupancy profiles of wild-type KLF3 and the two mutants were compared, it was found that the mutations had a profound effect on binding in vivo. In general, deletion of the entire N-terminus, leaving only the ZFs, significantly reduced binding, and mutation of the CtBP-contact motif by the two amino acid substitution reduced binding to a lesser extent. However, the actual profiles were more complex. In some regions peaks exhibited near identical occupancy in the mutants compared with wild-type, whilst in other locations the mutants bound better. The most widespread effect however, was a general reduction in binding by the mutants at many locations, particularly promoters.

The two KLF3 mutants still exhibited a preference for typical CACCC-like motifs, consistent with the fact that they were relying on an intact ZF DBD for binding, however the stringency of binding and heights of peaks was generally reduced. Nevertheless, the results argue strongly that

N-terminal domains are of considerable importance in vivo. These findings have implications for how KLF proteins might achieve specificity within the KLF family. Different KLF family members are able to recruit different co-factors through their highly divergent N-terminal domains (McConnell and Yang 2010). Given that KLF3 occupancy seems to be in part determined by the binding of CtBP, other KLFs may similarly be affected by the binding of their cofactors.

As such, differential occupancy due to cofactor binding may account for the differential functions of KLF proteins in vivo.

The role of non-DNA binding domains and cofactors in influencing TF occupancy has previously been shown in a number of examples. These include the recruitment of SCL/TAL1 in the absence of a functional DBD (Kassouf et al. 2008; Kassouf et al. 2010), the altered occupancy of GATA-

1 when unable to recruit the cofactor FOG-1 protein (Chlon et al. 2012), the increase in divergent

122

specificities amongst Hox proteins upon binding to the cofactor Exd (Stormo and Zhao 2010;

Lelli et al. 2011; Slattery et al. 2011) and the extension of the binding motif of Cbf1 by the non-

DNA-binding cofactors Met 4 and Met 8 (Siggers et al. 2011).

Given that KLF3 is an archetypal representative of the largest class of DNA binding proteins in mammals, the classical ZF proteins, it might be expected that these phenomena may play an extensive role in determining TF occupancy in vivo. Supporting this notion is the observation that

TFs bind combinatorially in the vast majority of cases and that cis-regulatory modules

(embodying binding sites for multiple TFs) are nearly ubiquitous in regulatory regions (Wang et al. 2012). Such a highly configured relationship between TF binding sites implies some degree of function.

In conclusion, the data in this thesis has led to a deeper understanding of the functions of KLF3.

In particular it is clear that regions of KLF3 outside the ZF domain are critical for proper targeting.

The finding that non-DNA binding domains can affect KLF3 occupancy in such a manner has broader implications for understanding how TFs function.

123

References

Allfrey VG, Faulkner R, Mirsky AE. 1964. Acetylation and Methylation of Histones and Their Possible Role in the Regulation of Rna Synthesis. Proceedings of the National Academy of Sciences of the United States of America 51: 786-794.

Anbanandam A, Albarado DC, Nguyen CT, Halder G, Gao X, Veeraraghavan S. 2006. Insights into transcription enhancer factor 1 (TEF-1) activity from the solution structure of the TEA domain. Proceedings of the National Academy of Sciences of the United States of America 103(46): 17225-17230.

Anderson KP, Kern CB, Crable SC, Lingrel JB. 1995. Isolation of a gene encoding a functional zinc finger protein homologous to erythroid Kruppel-like factor: identification of a new multigene family. Mol Cell Biol 15(11): 5957-5965.

Andrews NC, Faller DV. 1991. A rapid micropreparation technique for extraction of DNA- binding proteins from limiting numbers of mammalian cells. Nucleic acids research 19(9): 2499.

Babu MM, Luscombe NM, Aravind L, Gerstein M, Teichmann SA. 2004. Structure and evolution of transcriptional regulatory networks. Curr Opin Struct Biol 14(3): 283-291.

Bai L, Ondracka A, Cross FR. 2011. Multiple sequence-specific factors generate the nucleosome-depleted region on CLN2 promoter. Mol Cell 42(4): 465-476.

Bailey TL, Elkan C. 1994. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proceedings / International Conference on Intelligent Systems for Molecular Biology ; ISMB International Conference on Intelligent Systems for Molecular Biology 2: 28-36.

Bargaje R, Alam MP, Patowary A, Sarkar M, Ali T, Gupta S, Garg M, Singh M, Purkanti R, Scaria V et al. 2012. Proximity of H2A.Z containing nucleosome to the transcription start site influences gene expression levels in the mammalian liver and brain. Nucleic acids research 40(18): 8965-8978.

Barski A, Cuddapah S, Cui K, Roh TY, Schones DE, Wang Z, Wei G, Chepelev I, Zhao K. 2007. High-resolution profiling of histone methylations in the human genome. Cell 129(4): 823- 837.

Bartel FO, Higuchi T, Spyropoulos DD. 2000. Mouse models in the study of the Ets family of transcription factors. Oncogene 19(55): 6443-6454.

Becskei A, Serrano L. 2000. Engineering stability in gene networks by autoregulation. Nature 405(6786): 590-593.

Bell-Anderson KS, Funnell AP, Williams H, Jusoh HM, Scully T, Lim WF, Burdach JG, Mak KS, Knights AJ, Hoy AJ et al. 2013. Loss of Kruppel-like Factor 3 (KLF3/BKLF) leads to upregulation of the insulin-sensitizing factor adipolin (FAM132A/CTRP12/C1qdc2). Diabetes.

Bell O, Tiwari VK, Thoma NH, Schubeler D. 2011. Determinants and dynamics of genome accessibility. Nature reviews Genetics 12(8): 554-564.

Bernstein BE, Kamal M, Lindblad-Toh K, Bekiranov S, Bailey DK, Huebert DJ, McMahon S, Karlsson EK, Kulbokas EJ, 3rd, Gingeras TR et al. 2005. Genomic maps and comparative analysis of histone modifications in human and mouse. Cell 120(2): 169-181.

124

Blackwood EM, Kadonaga JT. 1998. Going the distance: a current view of enhancer action. Science 281(5373): 60-63.

Brandeis M, Frank D, Keshet I, Siegfried Z, Mendelsohn M, Nemes A, Temper V, Razin A, Cedar H. 1994. Sp1 elements protect a CpG island from de novo methylation. Nature 371(6496): 435-438.

Brayer KJ, Segal DJ. 2008. Keep your fingers off my DNA: Protein-protein interactions mediated by C2H2 zinc finger domains. Cell Biochem Biophys 50(3): 111-131.

Brivanlou AH, Darnell JE, Jr. 2002. Signal transduction and the control of gene expression. Science 295(5556): 813-818.

Brown CE, Lechner T, Howe L, Workman JL. 2000. The many HATs of transcription coactivators. Trends in biochemical sciences 25(1): 15-19.

Burdach J, Funnell AP, Mak KS, Artuz CM, Wienert B, Lim WF, Tan LY, Pearson RC, Crossley M. 2013. Regions outside the DNA-binding domain are critical for proper in vivo specificity of an archetypal zinc finger transcription factor. Nucleic acids research.

Burdach J, O'Connell MR, Mackay JP, Crossley M. 2012. Two-timing zinc finger transcription factors liaising with RNA. Trends in biochemical sciences 37(5): 199-205.

Charoensawan V, Janga SC, Bulyk ML, Babu MM, Teichmann SA. 2012. DNA sequence preferences of transcriptional activators correlate more strongly than repressors with nucleosomes. Mol Cell 47(2): 183-192.

Chatterjee N, Sinha D, Lemma-Dechassa M, Tan S, Shogren-Knaak MA, Bartholomew B. 2011. Histone H3 tail acetylation modulates ATP-dependent remodeling through multiple mechanisms. Nucleic acids research 39(19): 8378-8391.

Chen X, Bieker JJ. 1996. Erythroid Kruppel-like factor (EKLF) contains a multifunctional transcriptional activation domain important for inter- and intramolecular interactions. EMBO J 15(21): 5888-5896.

Chen X, Xu H, Yuan P, Fang F, Huss M, Vega VB, Wong E, Orlov YL, Zhang W, Jiang J et al. 2008. Integration of external signaling pathways with the core transcriptional network in embryonic stem cells. Cell 133(6): 1106-1117.

Chinnadurai G. 2007. Transcriptional regulation by C-terminal binding proteins. The international journal of biochemistry & cell biology 39(9): 1593-1607.

Chlon TM, Dore LC, Crispino JD. 2012. Cofactor-mediated restriction of GATA-1 chromatin occupancy coordinates lineage-specific gene expression. Mol Cell 47(4): 608-621.

Clapier CR, Cairns BR. 2009. The biology of chromatin remodeling complexes. Annual review of biochemistry 78: 273-304.

Conerly ML, Teves SS, Diolaiti D, Ulrich M, Eisenman RN, Henikoff S. 2010. Changes in H2A.Z occupancy and DNA methylation during B-cell lymphomagenesis. Genome Res 20(10): 1383-1390.

Crawford GE, Holt IE, Whittle J, Webb BD, Tai D, Davis S, Margulies EH, Chen Y, Bernat JA, Ginsburg D et al. 2006. Genome-wide mapping of DNase hypersensitive sites using massively parallel signature sequencing (MPSS). Genome Res 16(1): 123-131.

125

Creyghton MP, Cheng AW, Welstead GG, Kooistra T, Carey BW, Steine EJ, Hanna J, Lodato MA, Frampton GM, Sharp PA et al. 2010. Histone H3K27ac separates active from poised enhancers and predicts developmental state. Proceedings of the National Academy of Sciences of the United States of America 107(50): 21931-21936.

Crick FH. 1958. On protein synthesis. Symposia of the Society for Experimental Biology 12: 138-163.

Crossley M, Whitelaw E, Perkins A, Williams G, Fujiwara Y, Orkin SH. 1996. Isolation and characterization of the cDNA encoding BKLF/TEF-2, a major CACCC-box-binding protein in erythroid cells and selected other cells. Mol Cell Biol 16(4): 1695-1705.

Crothers DM. 2013. Biophysics. Fine tuning gene regulation. Science 339(6121): 766-767.

D'Alonzo RC, Selvamurugan N, Karsenty G, Partridge NC. 2002. Physical interaction of the activator protein-1 factors c-Fos and c-Jun with Cbfa1 for collagenase-3 promoter activation. J Biol Chem 277(1): 816-822.

Dang DT, Zhao W, Mahatan CS, Geiman DE, Yang VW. 2002. Opposing effects of Kruppel- like factor 4 (gut-enriched Kruppel-like factor) and Kruppel-like factor 5 (intestinal-enriched Kruppel-like factor) on the promoter of the Kruppel-like factor 4 gene. Nucleic acids research 30(13): 2736-2741.

Dhar SS, Ongwijitwat S, Wong-Riley MT. 2008. Nuclear respiratory factor 1 regulates all ten nuclear-encoded subunits of cytochrome c oxidase in neurons. J Biol Chem 283(6): 3120-3129.

Dion MF, Altschuler SJ, Wu LF, Rando OJ. 2005. Genomic characterization reveals a simple histone H4 acetylation code. Proceedings of the National Academy of Sciences of the United States of America 102(15): 5501-5506.

Dolfini D, Zambelli F, Pavesi G, Mantovani R. 2009. A perspective of promoter architecture from the CCAAT box. Cell cycle 8(24): 4127-4137.

Drew HR, Travers AA. 1985. DNA bending and its relation to nucleosome positioning. J Mol Biol 186(4): 773-790.

Dumic J, Dabelic S, Flogel M. 2006. Galectin-3: an open-ended story. Biochim Biophys Acta 1760(4): 616-635.

Durst KL, Hiebert SW. 2004. Role of RUNX family members in transcriptional repression and gene silencing. Oncogene 23(24): 4220-4224.

Eaton SA, Funnell AP, Sue N, Nicholas H, Pearson RC, Crossley M. 2008. A network of Kruppel-like Factors (Klfs). Klf8 is repressed by Klf3 and activated by Klf1 in vivo. J Biol Chem 283(40): 26937-26947.

Efe JA, Hilcove S, Kim J, Zhou H, Ouyang K, Wang G, Chen J, Ding S. 2011. Conversion of mouse fibroblasts into cardiomyocytes using a direct reprogramming strategy. Nature cell biology 13(3): 215-222.

Encode Project Consortium. 2011. A user's guide to the encyclopedia of DNA elements (ENCODE). PLoS biology 9(4): e1001046.

Enomoto T, Ohashi K, Shibata R, Higuchi A, Maruyama S, Izumiya Y, Walsh K, Murohara T, Ouchi N. 2011. Adipolin/C1qdc2/CTRP12 protein functions as an adipokine that improves glucose metabolism. J Biol Chem 286(40): 34552-34558.

126

Enomoto T, Shibata R, Ohashi K, Kambara T, Kataoka Y, Uemura Y, Yuasa D, Murohara T, Ouchi N. 2012. Regulation of adipolin/CTRP12 cleavage by obesity. Biochemical and biophysical research communications 428(1): 155-159.

Fernandez-Zapico ME, Mladek A, Ellenrieder V, Folch-Puy E, Miller L, Urrutia R. 2003. An mSin3A interaction domain links the transcriptional activity of KLF11 with its role in growth regulation. EMBO J 22(18): 4748-4758.

Field Y, Kaplan N, Fondufe-Mittendorf Y, Moore IK, Sharon E, Lubling Y, Widom J, Segal E. 2008. Distinct modes of regulation by chromatin encoded through nucleosome positioning signals. PLoS computational biology 4(11): e1000216.

Fields S, Song O. 1989. A novel genetic system to detect protein-protein interactions. Nature 340(6230): 245-246.

Fujiwara T, O'Geen H, Keles S, Blahnik K, Linnemann AK, Kang YA, Choi K, Farnham PJ, Bresnick EH. 2009. Discovering hematopoietic mechanisms through genome-wide analysis of GATA factor chromatin occupancy. Mol Cell 36(4): 667-681.

Funnell AP, Mak KS, Twine NA, Pelka GJ, Norton LJ, Radziewic T, Power M, Wilkins MR, Bell-Anderson KS, Fraser ST et al. 2013. Generation of mice deficient in both KLF3/BKLF and KLF8 reveals a genetic interaction and a role for these factors in embryonic globin gene silencing. Mol Cell Biol 33(15): 2976-2987.

Funnell AP, Maloney CA, Thompson LJ, Keys J, Tallack M, Perkins AC, Crossley M. 2007. Erythroid Kruppel-like factor directly activates the basic Kruppel-like factor gene in erythroid cells. Mol Cell Biol 27(7): 2777-2790.

Funnell AP, Norton LJ, Mak KS, Burdach J, Artuz CM, Twine NA, Wilkins MR, Power CA, Hung TT, Perdomo J et al. 2012. The CACCC-binding protein KLF3/BKLF represses a subset of KLF1/EKLF target genes and is required for proper erythroid maturation in vivo. Mol Cell Biol 32(16): 3281-3292.

Gangaraju VK, Bartholomew B. 2007. Mechanisms of ATP dependent chromatin remodeling. Mutation research 618(1-2): 3-17.

Guillemette B, Gaudreau L. 2006. Reuniting the contrasting functions of H2A.Z. Biochem Cell Biol 84(4): 528-535.

Hancock D, Funnell A, Jack B, Johnston J. 2010. Introducing undergraduate students to real- time PCR. Biochemistry and molecular biology education : a bimonthly publication of the International Union of Biochemistry and Molecular Biology 38(5): 309-316.

Hathaway NA, Bell O, Hodges C, Miller EL, Neel DS, Crabtree GR. 2012. Dynamics and memory of heterochromatin in living cells. Cell 149(7): 1447-1460.

He M, Han M, Zheng B, Shu YN, Wen JK. 2009. Angiotensin II stimulates KLF5 phosphorylation and its interaction with c-Jun leading to suppression of p21 expression in vascular smooth muscle cells. J Biochem 146(5): 683-691.

Heintzman ND, Hon GC, Hawkins RD, Kheradpour P, Stark A, Harp LF, Ye Z, Lee LK, Stuart RK, Ching CW et al. 2009. Histone modifications at human enhancers reflect global cell-type- specific gene expression. Nature 459(7243): 108-112.

127

Heintzman ND, Stuart RK, Hon G, Fu Y, Ching CW, Hawkins RD, Barrera LO, Van Calcar S, Qu C, Ching KA et al. 2007. Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nat Genet 39(3): 311-318.

Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, Cheng JX, Murre C, Singh H, Glass CK. 2010. Simple combinations of lineage-determining transcription factors prime cis- regulatory elements required for macrophage and B cell identities. Mol Cell 38(4): 576-589.

Hess J, Porte D, Munz C, Angel P. 2001. AP-1 and Cbfa/runt physically interact and regulate parathyroid hormone-dependent MMP13 expression in osteoblasts through a new osteoblast- specific element 2/AP-1 composite element. J Biol Chem 276(23): 20029-20038.

Hollenhorst PC, McIntosh LP, Graves BJ. 2011. Genomic and biochemical insights into the specificity of ETS transcription factors. Annual review of biochemistry 80: 437-471.

Hong L, Schroth GP, Matthews HR, Yau P, Bradbury EM. 1993. Studies of the DNA binding properties of histone H4 amino terminus. Thermal denaturation studies reveal that acetylation markedly reduces the binding constant of the H4 "tail" to DNA. J Biol Chem 268(1): 305-314.

Horak CE, Mahajan MC, Luscombe NM, Gerstein M, Weissman SM, Snyder M. 2002. GATA- 1 binding sites mapped in the beta-globin locus by using mammalian chIp-chip analysis. Proceedings of the National Academy of Sciences of the United States of America 99(5): 2924- 2929.

Hulsen T, de Vlieg J, Alkema W. 2008. BioVenn - a web application for the comparison and visualization of biological lists using area-proportional Venn diagrams. BMC genomics 9: 488.

Jack BH, Pearson RC, Crossley M. 2011. C-terminal binding protein: A metabolic sensor implicated in regulating adipogenesis. The international journal of biochemistry & cell biology 43(5): 693-696.

Jin C, Zang C, Wei G, Cui K, Peng W, Zhao K, Felsenfeld G. 2009. H3.3/H2A.Z double variant-containing nucleosomes mark 'nucleosome-free regions' of active promoters and other regulatory regions. Nat Genet 41(8): 941-945.

Johnson DS, Mortazavi A, Myers RM, Wold B. 2007. Genome-wide mapping of in vivo protein-DNA interactions. Science 316(5830): 1497-1502.

Jones PA. 2012. Functions of DNA methylation: islands, start sites, gene bodies and beyond. Nature reviews Genetics 13(7): 484-492.

Kaczynski J, Cook T, Urrutia R. 2003. Sp1- and Kruppel-like transcription factors. Genome biology 4(2): 206.

Kaczynski J, Zhang JS, Ellenrieder V, Conley A, Duenes T, Kester H, van Der Burg B, Urrutia R. 2001. The Sp1-like protein BTEB3 inhibits transcription via the basic transcription element box by interacting with mSin3A and HDAC-1 co-repressors and competing with Sp1. J Biol Chem 276(39): 36749-36756.

Kassouf MT, Chagraoui H, Vyas P, Porcher C. 2008. Differential use of SCL/TAL-1 DNA- binding domain in developmental hematopoiesis. Blood 112(4): 1056-1067.

Kassouf MT, Hughes JR, Taylor S, McGowan SJ, Soneji S, Green AL, Vyas P, Porcher C. 2010. Genome-wide identification of TAL1's functional targets: insights into its mechanisms of action in primary erythroid cells. Genome Res 20(8): 1064-1083.

128

Kelly TK, Miranda TB, Liang G, Berman BP, Lin JC, Tanay A, Jones PA. 2010. H2A.Z maintenance during mitosis reveals nucleosome shifting on mitotically silenced genes. Mol Cell 39(6): 901-911.

Kiełbasa SM, Vingron M. 2008. Transcriptional Autoregulatory Loops Are Highly Conserved in Vertebrate Evolution. PloS one 3(9): e3210.

Kim J, Efe JA, Zhu S, Talantova M, Yuan X, Wang S, Lipton SA, Zhang K, Ding S. 2011. Direct reprogramming of mouse fibroblasts to neural progenitors. Proceedings of the National Academy of Sciences of the United States of America 108(19): 7838-7843.

Kim S, Brostromer E, Xing D, Jin J, Chong S, Ge H, Wang S, Gu C, Yang L, Gao YQ et al. 2013. Probing allostery through DNA. Science 339(6121): 816-819.

Klug A. 2010. The discovery of zinc fingers and their development for practical applications in gene regulation and genome manipulation. Quarterly reviews of biophysics 43(1): 1-21.

Ko M, Sohn DH, Chung H, Seong RH. 2008. Chromatin remodeling, development and disease. Mutation research 647(1-2): 59-67.

Kolodziej KE, Pourfarzad F, de Boer E, Krpic S, Grosveld F, Strouboulis J. 2009. Optimal use of tandem biotin and V5 tags in ChIP assays. BMC molecular biology 10: 6.

Kommagani R, Whitlatch A, Leonard MK, Kadakia MP. 2010. p73 is essential for vitamin D- mediated osteoblastic differentiation. Cell death and differentiation 17(3): 398-407.

Kumar-Sinha C, Tomlins SA, Chinnaiyan AM. 2008. Recurrent gene fusions in prostate cancer. Nature reviews Cancer 8(7): 497-511.

Laherty CD, Yang WM, Sun JM, Davie JR, Seto E, Eisenman RN. 1997. Histone deacetylases associated with the mSin3 corepressor mediate mad transcriptional repression. Cell 89(3): 349- 356.

Langmead B, Salzberg SL. 2012. Fast gapped-read alignment with Bowtie 2. Nature methods 9(4): 357-359.

Latchman DS. 1997. Transcription factors: an overview. The international journal of biochemistry & cell biology 29(12): 1305-1312.

Lauberth SM, Nakayama T, Wu X, Ferris AL, Tang Z, Hughes SH, Roeder RG. 2013. H3K4me3 Interactions with TAF3 Regulate Preinitiation Complex Assembly and Selective Gene Activation. Cell 152(5): 1021-1036.

Lee DY, Hayes JJ, Pruss D, Wolffe AP. 1993. A positive role for histone acetylation in transcription factor access to nucleosomal DNA. Cell 72(1): 73-84.

Lee MS, Gippert GP, Soman KV, Case DA, Wright PE. 1989. Three-dimensional solution structure of a single zinc finger DNA-binding domain. Science 245(4918): 635-637.

Lee TI, Young RA. 2000. Transcription of eukaryotic protein-coding genes. Annual review of genetics 34: 77-137.

Lelli KM, Noro B, Mann RS. 2011. Variable motif utilization in homeotic selector (Hox)- cofactor complex formation controls specificity. Proceedings of the National Academy of Sciences of the United States of America 108(52): 21122-21127.

129

Lettice LA, Heaney SJ, Purdie LA, Li L, de Beer P, Oostra BA, Goode D, Elgar G, Hill RE, de Graaff E. 2003. A long-range Shh enhancer regulates expression in the developing limb and fin and is associated with preaxial polydactyly. Hum Mol Genet 12(14): 1725-1735.

Levine M, Tjian R. 2003. Transcription regulation and animal diversity. Nature 424(6945): 147- 151.

Li G, Levitus M, Bustamante C, Widom J. 2005. Rapid spontaneous accessibility of nucleosomal DNA. Nature structural & molecular biology 12(1): 46-53.

Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, Genome Project Data Processing S. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25(16): 2078-2079.

Lin YC, Jhunjhunwala S, Benner C, Heinz S, Welinder E, Mansson R, Sigvardsson M, Hagman J, Espinoza CA, Dutkowski J et al. 2010. A global network of transcription factors, involving E2A, EBF1 and Foxo1, that orchestrates B cell fate. Nature immunology 11(7): 635-643.

Little JW. 1993. LexA cleavage and other self-processing reactions. Journal of bacteriology 175(16): 4943-4950.

Liu G, Zheng H, Ai W. 2009. C-terminal binding proteins (CtBPs) attenuate KLF4-mediated transcriptional activation. FEBS letters 583(19): 3127-3132.

Liu Y, Wen JK, Dong LH, Zheng B, Han M. 2010. Kruppel-like factor (KLF) 5 mediates cyclin D1 expression and cell proliferation via interaction with c-Jun in Ang II-induced VSMCs. Acta pharmacologica Sinica 31(1): 10-18.

Lorch Y, Davis B, Kornberg RD. 2005. Chromatin remodeling by DNA bending, not twisting. Proceedings of the National Academy of Sciences of the United States of America 102(5): 1329- 1332.

Luger K, Mader AW, Richmond RK, Sargent DF, Richmond TJ. 1997. Crystal structure of the nucleosome core particle at 2.8 A resolution. Nature 389(6648): 251-260.

Lukjancenko O, Wassenaar TM, Ussery DW. 2010. Comparison of 61 sequenced Escherichia coli genomes. Microbial ecology 60(4): 708-720.

Macleod D, Charlton J, Mullins J, Bird AP. 1994. Sp1 sites in the mouse aprt gene promoter are required to prevent methylation of the CpG island. Genes & development 8(19): 2282-2292.

Mandell JG, Barbas CF. 2006a. Zinc Finger Tools: custom DNA-binding domains for transcription factors and nucleases. Nucleic acids research 34(suppl 2): W516-W523.

Mandell JG, Barbas CF, 3rd. 2006b. Zinc Finger Tools: custom DNA-binding domains for transcription factors and nucleases. Nucleic acids research 34(Web Server issue): W516-523.

Mannervik M, Nibu Y, Zhang H, Levine M. 1999. Transcriptional coregulators in development. Science 284(5414): 606-609.

Maston GA, Evans SK, Green MR. 2006. Transcriptional regulatory elements in the human genome. Annual review of genomics and human genetics 7: 29-59.

McConnell BB, Yang VW. 2010. Mammalian Kruppel-like factors in health and diseases. Physiological reviews 90(4): 1337-1381.

130

Miller IJ, Bieker JJ. 1993. A novel, erythroid cell-specific murine transcription factor that binds to the CACCC element and is related to the Kruppel family of nuclear proteins. Mol Cell Biol 13(5): 2776-2786.

Mills AA, Zheng B, Wang XJ, Vogel H, Roop DR, Bradley A. 1999. p63 is a p53 homologue required for limb and epidermal morphogenesis. Nature 398(6729): 708-713.

Morton NE. 1991. Parameters of the human genome. Proceedings of the National Academy of Sciences of the United States of America 88(17): 7474-7476.

Mysickova A, Vingron M. 2012. Detection of interacting transcription factors in human tissues using predicted DNA binding affinity. BMC genomics 13 Suppl 1: S2.

Nagashima T, Hayashi F, Umehara T, Yokoyama S. 2009. Molecular Structures of Krüppel-like Factors. In The Biology of Krüppel-like Factors, (ed. R Nagai, S Friedman, M Kasuga), pp. 21- 31. Springer Japan.

Nardini M, Valente C, Ricagno S, Luini A, Corda D, Bolognesi M. 2009. CtBP1/BARS Gly172-->Glu mutant structure: impairing NAD(H)-binding and dimerization. Biochemical and biophysical research communications 381(1): 70-74.

Nelson HC, Finch JT, Luisi BF, Klug A. 1987. The structure of an oligo(dA).oligo(dT) tract and its biological implications. Nature 330(6145): 221-226.

Neph S, Stergachis AB, Reynolds A, Sandstrom R, Borenstein E, Stamatoyannopoulos JA. 2012. Circuitry and dynamics of human transcription factor regulatory networks. Cell 150(6): 1274-1286.

Nibu Y, Levine MS. 2001. CtBP-dependent activities of the short-range Giant repressor in the Drosophila embryo. Proceedings of the National Academy of Sciences of the United States of America 98(11): 6204-6208.

Nibu Y, Zhang H, Levine M. 1998. Interaction of short-range repressors with Drosophila CtBP in the embryo. Science 280(5360): 101-104.

Nikolov DB, Burley SK. 1997. RNA polymerase II transcription initiation: a structural view. Proceedings of the National Academy of Sciences of the United States of America 94(1): 15-22.

Oka S, Shiraishi Y, Yoshida T, Ohkubo T, Sugiura Y, Kobayashi Y. 2004. NMR structure of transcription factor Sp1 DNA binding domain. Biochemistry 43(51): 16027-16035.

Ooi SK, Qiu C, Bernstein E, Li K, Jia D, Yang Z, Erdjument-Bromage H, Tempst P, Lin SP, Allis CD et al. 2007. DNMT3L connects unmethylated lysine 4 of histone H3 to de novo methylation of DNA. Nature 448(7154): 714-717.

Pang J, Rhodes DH, Pini M, Akasheh RT, Castellanos KJ, Cabay RJ, Cooper D, Perretti M, Fantuzzi G. 2013. Increased Adiposity, Dysregulated Glucose Metabolism and Systemic Inflammation in Galectin-3 KO Mice. PloS one 8(2): e57915.

Papworth M, Kolasinska P, Minczuk M. 2006. Designer zinc-finger proteins and their applications. Gene 366(1): 27-38.

Pavletich NP, Pabo CO. 1991. Zinc finger-DNA recognition: crystal structure of a Zif268-DNA complex at 2.1 A. Science 252(5007): 809-817.

Payre F, Vincent A. 1988. Finger proteins and DNA-specific recognition: distinct patterns of conserved amino acids suggest different evolutionary modes. FEBS letters 234(2): 245-250.

131

Pearson JC, Lemons D, McGinnis W. 2005. Modulating Hox gene functions during animal body patterning. Nature reviews Genetics 6(12): 893-904.

Pearson R, Fleetwood J, Eaton S, Crossley M, Bao S. 2008. Kruppel-like transcription factors: a functional family. The international journal of biochemistry & cell biology 40(10): 1996-2001.

Pearson RC, Funnell AP, Crossley M. 2011. The mammalian zinc finger transcription factor Kruppel-like factor 3 (KLF3/BKLF). IUBMB Life 63(2): 86-93.

Pejnovic N, Pantic J, Jovanovic I, Radosavljevic G, Milovanovic M, Nikolic I, Zdravkovic N, Djukic A, Arsenijevic N, Lukic M. 2013. Galectin-3 Deficiency Accelerates High-Fat Diet Induced Obesity and Amplifies Inflammation in Adipose Tissue and Pancreatic Islets. Diabetes.

Peterson CL, Logie C. 2000. Recruitment of chromatin remodeling machines. Journal of cellular biochemistry 78(2): 179-185.

Peterson CL, Workman JL. 2000. Promoter targeting and chromatin remodeling by the SWI/SNF complex. Current opinion in genetics & development 10(2): 187-192.

Phatnani HP, Greenleaf AL. 2006. Phosphorylation and functions of the RNA polymerase II CTD. Genes & development 20(21): 2922-2936.

Pilon AM, Ajay SS, Kumar SA, Steiner LA, Cherukuri PF, Wincovitch S, Anderson SM, Mullikin JC, Gallagher PG, Hardison RC et al. 2011. Genome-wide ChIP-Seq reveals a dramatic shift in the binding of the transcription factor erythroid Kruppel-like factor during erythrocyte differentiation. Blood 118(17): e139-148.

Porcher C, Liao EC, Fujiwara Y, Zon LI, Orkin SH. 1999. Specification of hematopoietic and vascular development by the bHLH transcription factor SCL without direct DNA binding. Development 126(20): 4603-4615.

Rada-Iglesias A, Bajpai R, Swigut T, Brugmann SA, Flynn RA, Wysocka J. 2011. A unique chromatin signature uncovers early developmental enhancers in humans. Nature 470(7333): 279-283.

Radosavljevic G, Volarevic V, Jovanovic I, Milovanovic M, Pejnovic N, Arsenijevic N, Hsu DK, Lukic ML. 2012. The roles of Galectin-3 in autoimmunity and tumor progression. Immunologic research 52(1-2): 100-110.

Ragab A, Travers A. 2003. HMG-D and histone H1 alter the local accessibility of nucleosomal DNA. Nucleic acids research 31(24): 7083-7089.

Rhodes D, Klug A. 1993. Zinc fingers. Scientific American 268(2): 56-59, 62-55.

Roder K, Wolf SS, Larkin KJ, Schweizer M. 1999. Interaction between the two ubiquitously expressed transcription factors NF-Y and Sp1. Gene 234(1): 61-69.

Rodriguez-Agudo D, Calderon-Dominguez M, Ren S, Marques D, Redford K, Medina-Torres MA, Hylemon P, Gil G, Pandak WM. 2011. Subcellular localization and regulation of StarD4 protein in macrophages and fibroblasts. Biochim Biophys Acta 1811(10): 597-606.

Rodriguez-Caso C, Medina MA, Sole RV. 2005. Topology, tinkering and evolution of the human transcription factor network. The FEBS journal 272(24): 6423-6434.

Roeder RG. 1996. The role of general initiation factors in transcription by RNA polymerase II. Trends in biochemical sciences 21(9): 327-335.

132

Saha A, Wittmeyer J, Cairns BR. 2002. Chromatin remodeling by RSC involves ATP- dependent DNA translocation. Genes & development 16(16): 2120-2134.

Sambrook J, Fritsch EF, Maniatis T. 1989. Molecular Cloning, A Laboratory Manual. Cold Spring Harbour Laboratory Press.

Schmidt D, Wilson MD, Spyrou C, Brown GD, Hadfield J, Odom DT. 2009. ChIP-seq: using high-throughput sequencing to discover protein-DNA interactions. Methods 48(3): 240-248.

Schober M, Rebay I, Perrimon N. 2005. Function of the ETS transcription factor Yan in border cell migration. Development 132(15): 3493-3504.

Schoenfelder S, Clay I, Fraser P. 2010. The transcriptional interactome: gene expression in 3D. Current opinion in genetics & development 20(2): 127-133.

Schuetz A, Nana D, Rose C, Zocher G, Milanovic M, Koenigsmann J, Blasig R, Heinemann U, Carstanjen D. 2011. The structure of the Klf4 DNA-binding domain links to self-renewal and macrophage differentiation. Cellular and molecular life sciences : CMLS 68(18): 3121-3131.

Schuierer M, Hilger-Eversheim K, Dobner T, Bosserhoff AK, Moser M, Turner J, Crossley M, Buettner R. 2001. Induction of AP-2alpha expression by adenoviral infection involves inactivation of the AP-2rep transcriptional corepressor CtBP1. J Biol Chem 276(30): 27944- 27949.

Sharrocks AD. 2001. The ETS-domain transcription factor family. Nature reviews Molecular cell biology 2(11): 827-837.

Shen Y, Yue F, McCleary DF, Ye Z, Edsall L, Kuan S, Wagner U, Dixon J, Lee L, Lobanenkov VV et al. 2012. A map of the cis-regulatory sequences in the mouse genome. Nature 488(7409): 116-120.

Shrader TE, Crothers DM. 1989. Artificial nucleosome positioning sequences. Proceedings of the National Academy of Sciences of the United States of America 86(19): 7418-7422.

Siatecka M, Bieker JJ. 2011. The multifunctional role of EKLF/KLF1 during erythropoiesis. Blood 118(8): 2044-2054.

Sif S, Saurin AJ, Imbalzano AN, Kingston RE. 2001. Purification and characterization of mSin3A-containing Brg1 and hBrm chromatin remodeling complexes. Genes & development 15(5): 603-618.

Siggers T, Duyzend MH, Reddy J, Khan S, Bulyk ML. 2011. Non-DNA-binding cofactors enhance DNA-binding specificity of a transcriptional regulatory complex. Molecular systems biology 7: 555.

Slattery M, Riley T, Liu P, Abe N, Gomez-Alcala P, Dror I, Zhou T, Rohs R, Honig B, Bussemaker HJ et al. 2011. Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins. Cell 147(6): 1270-1282.

Slavin DA, Koritschoner NP, Prieto CC, Lopez-Diaz FJ, Chatton B, Bocco JL. 2004. A new role for the Kruppel-like transcription factor KLF6 as an inhibitor of c-Jun proto-oncoprotein function. Oncogene 23(50): 8196-8205.

Smith ZD, Meissner A. 2013. DNA methylation: roles in mammalian development. Nature reviews Genetics 14(3): 204-220.

133

Stadler MB, Murr R, Burger L, Ivanek R, Lienert F, Scholer A, van Nimwegen E, Wirbelauer C, Oakeley EJ, Gaidatzis D et al. 2011. DNA-binding factors shape the mouse methylome at distal regulatory regions. Nature 480(7378): 490-495.

Stewart AJ, Plotkin JB. 2012. Why transcription factor binding sites are ten nucleotides long. Genetics 192(3): 973-985.

Stormo GD, Zhao Y. 2010. Determining the specificity of protein-DNA interactions. Nature reviews Genetics 11(11): 751-760.

Stros M. 2010. HMGB proteins: interactions with DNA and chromatin. Biochim Biophys Acta 1799(1-2): 101-113.

Struhl K, Segal E. 2013. Determinants of nucleosome positioning. Nature structural & molecular biology 20(3): 267-273.

Su MY, Steiner LA, Bogardus H, Mishra T, Schulz VP, Hardison RC, Gallagher PG. 2013. Identification of biologically relevant enhancers in human erythroid cells. J Biol Chem 288(12): 8433-8444.

Sudarsanam P, Winston F. 2000. The Swi/Snf family nucleosome-remodeling complexes and transcriptional control. Trends in genetics : TIG 16(8): 345-351.

Sue N, Jack BH, Eaton SA, Pearson RC, Funnell AP, Turner J, Czolij R, Denyer G, Bao S, Molero-Navajas JC et al. 2008. Targeted disruption of the basic Kruppel-like factor gene (Klf3) reveals a role in adipogenesis. Mol Cell Biol 28(12): 3967-3978.

Suh EK, Yang A, Kettenbach A, Bamberger C, Michaelis AH, Zhu Z, Elvin JA, Bronson RT, Crum CP, McKeon F. 2006. p63 protects the female germ line during meiotic arrest. Nature 444(7119): 624-628.

Suske G, Bruford E, Philipsen S. 2005. Mammalian SP/KLF transcription factors: bring in the family. Genomics 85(5): 551-556.

Suter B, Schnappauf G, Thoma F. 2000. Poly(dA.dT) sequences exist as rigid DNA structures in nucleosome-free yeast promoters in vivo. Nucleic acids research 28(21): 4083-4089.

Szabo E, Rampalli S, Risueno RM, Schnerch A, Mitchell R, Fiebig-Comyn A, Levadoux- Martin M, Bhatia M. 2010. Direct conversion of human fibroblasts to multilineage blood progenitors. Nature 468(7323): 521-526.

Tadepally HD, Burger G, Aubry M. 2008. Evolution of C2H2-zinc finger genes and subfamilies in mammals: species-specific duplication and loss of clusters, genes and effector domains. BMC evolutionary biology 8: 176.

Takahashi K, Yamanaka S. 2006. Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell 126(4): 663-676.

Tallack MR, Magor GW, Dartigues B, Sun L, Huang S, Fittock JM, Fry SV, Glazov EA, Bailey TL, Perkins AC. 2012. Novel roles for KLF1 in erythropoiesis revealed by mRNA-seq. Genome Res 22(12): 2385-2398.

Tallack MR, Whitington T, Yuen WS, Wainwright EN, Keys JR, Gardiner BB, Nourbakhsh E, Cloonan N, Grimmond SM, Bailey TL et al. 2010. A global role for KLF1 in erythropoiesis revealed by ChIP-seq in primary erythroid cells. Genome Res 20(8): 1052-1063.

Tan LY. unpublished results.

134

Terrados G, Finkernagel F, Stielow B, Sadic D, Neubert J, Herdt O, Krause M, Scharfe M, Jarek M, Suske G. 2012. Genome-wide localization and expression profiling establish Sp2 as a sequence-specific transcription factor regulating vitally important genes. Nucleic acids research 40(16): 7844-7857.

Thorvaldsdottir H, Robinson JT, Mesirov JP. 2012. Integrative Genomics Viewer (IGV): high- performance genomics data visualization and exploration. Briefings in bioinformatics.

Tirosh I, Barkai N. 2008. Two strategies for gene regulation by promoter nucleosomes. Genome Res 18(7): 1084-1091.

Travers AA. 2003. Priming the nucleosome: a role for HMGB proteins? EMBO Rep 4(2): 131- 136.

Travers AA, Ner SS, Churchill ME. 1994. DNA chaperones: a solution to a persistence problem? Cell 77(2): 167-169.

Trifonov EN, Sussman JL. 1980. The pitch of chromatin DNA is reflected in its nucleotide sequence. Proceedings of the National Academy of Sciences of the United States of America 77(7): 3816-3820.

Tsang AP, Visvader JE, Turner CA, Fujiwara Y, Yu C, Weiss MJ, Crossley M, Orkin SH. 1997. FOG, a multitype zinc finger protein, acts as a cofactor for transcription factor GATA-1 in erythroid and megakaryocytic differentiation. Cell 90(1): 109-119.

Turchinovich G, Vu TT, Frommer F, Kranich J, Schmid S, Alles M, Loubert JB, Goulet JP, Zimber-Strobl U, Schneider P et al. 2011. Programming of marginal zone B-cell fate by basic Kruppel-like factor (BKLF/KLF3). Blood 117(14): 3780-3792.

Turner J, Crossley M. 1998. Cloning and characterization of mCtBP2, a co-repressor that associates with basic Kruppel-like factor and other mammalian transcriptional regulators. EMBO J 17(17): 5129-5140.

Turner J, Nicholas H, Bishop D, Matthews JM, Crossley M. 2003. The LIM protein FHL3 binds basic Kruppel-like factor/Kruppel-like factor 3 and its co-repressor C-terminal-binding protein 2. J Biol Chem 278(15): 12786-12795. van den Heuvel S, Dyson NJ. 2008. Conserved functions of the pRB and E2F families. Nature reviews Molecular cell biology 9(9): 713-724. van Vliet J, Turner J, Crossley M. 2000. Human Kruppel-like factor 8: a CACCC-box binding protein that associates with CtBP and represses transcription. Nucleic acids research 28(9): 1955-1962.

Vaquerizas JM, Kummerfeld SK, Teichmann SA, Luscombe NM. 2009. A census of human transcription factors: function, expression and evolution. Nature reviews Genetics 10(4): 252- 263.

Vermeulen M, Mulder KW, Denissov S, Pijnappel WW, van Schaik FM, Varier RA, Baltissen MP, Stunnenberg HG, Mann M, Timmers HT. 2007. Selective anchoring of TFIID to nucleosomes by trimethylation of histone H3 lysine 4. Cell 131(1): 58-69.

Vettese-Dadey M, Grant PA, Hebbes TR, Crane- Robinson C, Allis CD, Workman JL. 1996. Acetylation of histone H4 plays a primary role in enhancing transcription factor binding to nucleosomal DNA in vitro. EMBO J 15(10): 2508-2518.

135

Vignali M, Hassan AH, Neely KE, Workman JL. 2000. ATP-dependent chromatin-remodeling complexes. Mol Cell Biol 20(6): 1899-1910.

Vrieseling E, Arber S. 2006. Target-induced transcriptional control of dendritic patterning and connectivity in motor neurons by the ETS gene Pea3. Cell 127(7): 1439-1452.

Vu TT, Gatto D, Turner V, Funnell AP, Mak KS, Norton LJ, Kaplan W, Cowley MJ, Agenes F, Kirberg J et al. 2011. Impaired B cell development in the absence of Kruppel-like factor 3. Journal of immunology 187(10): 5032-5042.

Wade JT, Reppas NB, Church GM, Struhl K. 2005. Genomic analysis of LexA binding reveals the permissive nature of the Escherichia coli genome and identifies unconventional target sites. Genes & development 19(21): 2619-2630.

Wang H, Yang L, Jamaluddin MS, Boyd DD. 2004. The Kruppel-like KLF4 transcription factor, a novel regulator of urokinase receptor expression, drives synthesis of this binding site in colonic crypt luminal surface epithelial cells. J Biol Chem 279(21): 22674-22683.

Wang J, Zhuang J, Iyer S, Lin X, Whitfield TW, Greven MC, Pierce BG, Dong X, Kundaje A, Cheng Y et al. 2012. Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors. Genome Res 22(9): 1798-1812.

Wang Z, Zang C, Rosenfeld JA, Schones DE, Barski A, Cuddapah S, Cui K, Roh TY, Peng W, Zhang MQ et al. 2008. Combinatorial patterns of histone acetylations and methylations in the human genome. Nat Genet 40(7): 897-903.

Wasserman WW, Sandelin A. 2004. Applied bioinformatics for the identification of regulatory elements. Nature reviews Genetics 5(4): 276-287.

Wei GH, Badis G, Berger MF, Kivioja T, Palin K, Enge M, Bonke M, Jolma A, Varjosalo M, Gehrke AR et al. 2010. Genome-wide analysis of ETS-family DNA-binding in vitro and in vivo. EMBO J 29(13): 2147-2160.

Wei Z, Lei X, Seldin MM, Wong GW. 2012a. Endopeptidase cleavage generates a functionally distinct isoform of C1q/tumor necrosis factor-related protein-12 (CTRP12) with an altered oligomeric state and signaling specificity. J Biol Chem 287(43): 35804-35814.

Wei Z, Peterson JM, Lei X, Cebotaru L, Wolfgang MJ, Baldeviano GC, Wong GW. 2012b. C1q/TNF-related protein-12 (CTRP12), a novel adipokine that improves insulin sensitivity and glycemic control in mouse models of obesity and diabetes. J Biol Chem 287(13): 10301-10315.

Weirauch MT, Cote A, Norel R, Annala M, Zhao Y, Riley TR, Saez-Rodriguez J, Cokelaer T, Vedenko A, Talukder S et al. 2013. Evaluation of methods for modeling transcription factor sequence specificity. Nature biotechnology 31(2): 126-134.

Wolfe SA, Nekludova L, Pabo CO. 2000. DNA recognition by Cys2His2 zinc finger proteins. Annu Rev Biophys Biomol Struct 29: 183-212.

Wong-Riley MT. 2012. Bigenomic regulation of cytochrome c oxidase in neurons and the tight coupling between neuronal activity and energy metabolism. Advances in experimental medicine and biology 748: 283-304.

Wu C, Wong YC, Elgin SC. 1979. The chromatin structure of specific genes: II. Disruption of chromatin structure during gene activity. Cell 16(4): 807-814.

Wu Z, Zheng S, Yu Q. 2009. The E2F family and the role of E2F1 in apoptosis. The international journal of biochemistry & cell biology 41(12): 2389-2397.

136

Yang A, Walker N, Bronson R, Kaghad M, Oosterwegel M, Bonnin J, Vagner C, Bonnet H, Dikkes P, Sharpe A et al. 2000. p73-deficient mice have neurological, pheromonal and inflammatory defects but lack spontaneous tumours. Nature 404(6773): 99-103.

Yang A, Zhu Z, Kettenbach A, Kapranov P, McKeon F, Gingeras TR, Struhl K. 2010. Genome- wide mapping indicates that p73 and p63 co-occupy target sites and have similar dna-binding profiles in vivo. PloS one 5(7): e11572.

Yang L, Mei Q, Zielinska-Kwiatkowska A, Matsui Y, Blackburn ML, Benedetti D, Krumm AA, Taborsky GJ, Jr., Chansky HA. 2003. An ERG (ets-related gene)-associated histone methyltransferase interacts with histone deacetylases 1/2 and transcription co-repressors mSin3A/B. The Biochemical journal 369(Pt 3): 651-657.

Zentner GE, Henikoff S. 2013. Regulation of nucleosome dynamics by histone modifications. Nature structural & molecular biology 20(3): 259-266.

Zhang B, Chambers KJ, Faller DV, Wang S. 2007. Reprogramming of the SWI/SNF complex for co-activation or co-repression in prohibitin-mediated estrogen receptor regulation. Oncogene 26(50): 7153-7157.

Zhang W, Bieker JJ. 1998. Acetylation and modulation of erythroid Kruppel-like factor (EKLF) activity by interaction with histone acetyltransferases. Proceedings of the National Academy of Sciences of the United States of America 95(17): 9855-9860.

Zhang Y, Moqtaderi Z, Rattner BP, Euskirchen G, Snyder M, Kadonaga JT, Liu XS, Struhl K. 2009. Intrinsic histone-DNA interactions are not the major determinant of nucleosome positions in vivo. Nature structural & molecular biology 16(8): 847-852.

Zhang Y, Moqtaderi Z, Rattner BP, Euskirchen G, Snyder M, Kadonaga JT, Liu XS, Struhl K. 2010. Evidence against a genomic code for nucleosome positioning Reply to [ldquo]Nucleosome sequence preferences influence in vivo nucleosome organization[rdquo]. Nature structural & molecular biology 17(8): 920-923.

137

Appendix

Table A1. Primer sequences

Locus (F/R) Primer sequence 5’  3’

Klf8 -4.5kb exon 1a-F GGTTTCTGAGACCTAACACTTCACACT

Klf8 -4.5kb exon 1a-R CCATTTAGTCATCCAGCGAACAA

Klf8 +33kb exon 1a-F AACCTGGGTGCCTCCTTGTA

Klf8 +33kb exon 1a-R TCATGCCTTTGACTTTAGTGCTTT

Klf8 +0.25kb exon 1a-F CCAGCTCGTGCACACTGAA

Klf8 +0.25kb exon 1a-R GAAGCCTTAACATCAGGAGTGGAA

Lgals3-(-20kb)-F TGGAAAAACACCCGTGCCTCTGA

Lgals3-(-20kb)-R CAGTGCCTACGCCCAGATGACTC

Fam132a-prom-F GATTCGCTTCCCTGGAGGTGTGG

Fam132a-prom-R GCCCAGTCTCTGGTCTCCTCTCT

Stard4-prom-F TCCAGCCACAGCCAATCA

Stard4-prom-R TACTCCCGCTAACACCCCA

138

Table A2. EMSA probe sequences.

Description Oligo sequence 5’  3’

Klf8 sense GGGCCCGCCCCACCCCCTCCT

Klf8 anti-sense AGGAGGGGGTGGGGCGGGCCC

Stard4 sense CCTTCAGCCCCGCCCTGCCCG

Stard4 anti-sense CGGGCAGGGCGGGGCTGAAGG

Stard4 1G>T sense CCTTCATCCCCGCCCTGCCCG

Stard4 1G>T anti-sense CGGGCAGGGCGGGGATGAAGG

Stard4 1G>A sense CCTTCAACCCCGCCCTGCCCG

Stard4 1G>A anti-sense CGGGCAGGGCGGGGTTGAAGG

Stard4 3C>T sense CCTTCAGCTCCGCCCTGCCCG

Stard4 3C>T anti-sense CGGGCAGGGCGGAGCTGAAGG

Stard4 6G>A sense CCTTCAGCCCCACCCTGCCCG

Stard4 6G>A anti-sense CGGGCAGGGTGGGGCTGAAGG

139

Table A3. Genes repressed >2-fold on KLF3 rescue and promoter-bound by KLF3. 1 i e ) ) ac pro m c , x sh i ) i i r f d r t ti a r a h ma ) r zeb carre t 5 ( a l s osp r u g 2 h l ate l ke n p i n) i l i yea - n ( n 2 e ph i i so ace g 12 r ly 1 ( o ote l G membe s e r 8 o 2 ext phos n c p i 1 y conta A i l , lik i f r i ycoprot g n l ke i e a- i n l ase i hom g a h ri - am r f r p e n doma l mber i nd f spec l n i S1 e a ca i s 3 2 u b M dom l , n I ated 2C ription m t i member t) 4 on a , a i i fib T n r , e I in) r ng i y, ase oc 2 at l d 0/SP 3 4b r l i n te esc d ylt doma eas i n ng 1 ase u typ i n e fe D 13 r m l E2 eme (y h n e s a ac 2 l t 6) i 1 a ho - -ass pro y T 1 f - e n 1 e n ng t a g ub i n n) t ast l n F S l i a 0 i r ke yn oc ogenase n i ni r n se conta i ge gene , t r l h i s a A t a ) i (g ng - e l / o r so - atase i 22 i t d i g a Ge n i th t l i g 2 con h m n 1 5 OA a S 39 i (b n i - ( m a de rop i w p n 1 hy in 119 l i i t h i a heta s n e s n 1 i t t i ng, s 3 s ph i n i tor doma conta n m i 2 GDS e 1 sp 25 de ctase i a member ase t o a yn n 426 dy n u i n t ra doma nd i i ac n ( nes i s con n pho i ly i 1 f type associated prote h i ho n i l y 4 Ra i b k ( hol adeny ed prot p a ce d n h n t at co i r - i y ote 83 i Droso t co , l V, l HC r rin g ( t 3 ke co am ate 4 4831426119 u i D52 se e - ne l f quence l l n ac n be l prote H fam i i doma receptor i - in ~en~e o og owt a po - cytok tion g r l o n t r noreceptor c i D og r se e a l i ass eop l G d n, c mye i g l l u cc i , NA ch i non doma act of d i ri i te ac a epea l l ng c c i anhydrase r u ase l r i r sub reon ase cadhe 1 u oc u l , s ( d e a no yce d eDNA eD c i i yc or l t doma q ke ger e r n i A l 10 th t c with ga pro mm n homo / li n n i g asm i n n l lin i no i homo ox hera i e n fin ass 1 oop bon yl og i ly ni t ca e n, i l l A- ydro i rin p ne i ect i t in i ri bo i so derma l u Pase fide t s c c b y G osphat l r h i d1 crosoma r rc l r l l pt top s KEN KEN ac cell g i m i s I I T O - a ed h r y y in in a umor ec 1 sa se se so su d c ca eph cate c co ab a ep cannab OAF m l l y pe p L LA M PRELl P NIP Ra R R A f t z z T ge ik ik e 1 1 2 0 2 b l l 1 19R 19R 4 t2 am l 1 1 1 1 1 1 d 0a 32a i t s3 d a l 4 14b 52 i c3 l r st3 a cb p2 r n l l 1 x r13 1 1ip g dh2 ec12 na1 i pt ri p rp q l tk39 i 25a24 d e emp1 Oaf hd Gsn i tnn T f dhhc Cn Sg Ass c S C Lo S Pq Ef Lass Ca l n DockS Mg N Tpd Za Cpeb2 Lga am La Pre Rassf5 Se Pmp22 Atp Agpa Ppap2c Z C M E Cc Ab S F Ge 4831426 4831426 1 1 4 2 9 0 5 5 0 7 205 1 1 2 90 4 4 1 2699 q 1 6431 66635 8 45953 46325 18 6866 8885 9728 86 875 8761 58 0107 7726 7780 4495 1 1 1 1 0 04 26058 304 72574 78395 75937 1 1 1 1 1 1 460 46090 46 2 Se 1 1 1 102491 1 1 1040399 1042699 1 f 1 1 1 1 1 1 1 17 17 0 0 0 0 0 024223 0 02612 029631 021507 025569 02621 0 0 029942 007494 00 00 00 e ______00 00 00 00 00 00 00 00 00 R ______NM_172685 NM NM NM NM NM NM NM NM NM NM NM NM NM NM NM NM NM NM NM NM NM NM NM NM_009412 NM NM NM NM NM M M M M M M M M M N N N N N N N N N 1 1 5 5 0 5 5 ue 23 23 41 04 03 166 1 146 1 18 267 289 338 5 085 0003 0051 0092 0080 085 0111 0004 0313 0060 0208 0338 0232 006 ...... 0.0 0 0.0286 0.000 0 0.0 0.0086 0 0.007 0.0016 0 0 0 0.003 0 0.00 0.0000 0.0650 0 0 0.0081 0.0010 0.0 0.0 0.0156 0.0087 0.00 0 0.0 0.0 0 0.0 0.0067 0.0 0.0 0 0.0 0 0.008 p-val e ssion 1 1 1 1 2 2 5 3 9 5 ~e 26 20 62 65 88 59 54 54 5 00 5 06 93 06 92 74 77 4 .1 .1 .1 .1 .3 . . . . .61 . . . . .4 .4 .39 . .31 ...... old 1 . 2.90 2 2.87 2 2 2 2 2 2 2 2 2 2.42 2.39 2 2.39 2.37 2.63 2 2 2.43 6.29 3 3 5.34 5 3.87 3 5 3 3 5 3 F xpr chang E nk 1 4 2 6 3 3 3 0 5 8 2 9 0 1 1 2 7 5 8 3 9 3 7 4 1 1 20 21 22 23 25 26 27 28 29 24 31 32 33 35 36 37 38 39 3 34 1 1 1 1 1 1 1 17 Ra to 7 1 4 6 2 0 0 1 1 9 07 54 08 53 77 9 SS 38 77 48 40 20 23 62 59 22 240 29 393 358 306 345 365 1 1 1 1 1 1 1 1 208 277 882 665 937 - -42 - - -41 - - -61 -30 - - -4 -21 -40 -55 -89 T ------Dist. ) 2 5 1 1 9 7 3 57 ak . . 1 23 5 52 0 37 54 92 35 63 85 69 67 38 59 86 95 88 33 99 58 83 88 69 58 85 38 86 09 ight ...... 47 . .9 . . 8 9 e 7 9.68 8 9. 8 8 8 9 8 8 8 8 8 8.48 9.09 9 8 9 9 9 9 8 8 8.96 9.05 8 9 7 7 7 7 7 7 7 7 7 Pe 10. (log h + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + trand S 4 2 5 3 3 5 5 6 6 6 3 1 1 14 1 1 1 69 50 0 596 750 162 1 197 192 1 494 240 272 552 586 00 885 858 344 8 3 5 584 051 827 739 799 7 773 786 793 , , ,4 , , , , , , , , , , , , , , , , , ,0 , ,160 , , , , , ,00 ,40 1, 2 6 5 7 1 1 1 1, 1, 8 information 1 1 12 4 07 39 8 nd 707 14 460 493 0 0 509 624 993 996 378 945 59 92 00 325 752 753,8 ,1 ,039,22 , ,878 , , , ,0 , , , , , , , ,64 , , , , E ak 1, 1, 4 4 4 0,294,655 9 9 8 6,248,584 6,248 7, 55,336 56 3 58,684 34 7 73 78 1 1 1 14 43 42,6 47 40 42 4 43,0 43,664 28 22 26 35,1 68,310 65,913,694 68,55 69,324,43 62 84,257 06,35 33 1 1 1 1 1 1 10 10 10 P~e 6 6 5 1 0 0 9 16 27 84 62 84 5 5 52 9 85 4 255 294 605 603 65 914 316 034 094 944 399 3 38 00 393 373 872 339 712 797 713 792 760 ,1 ,1 ,1 ,1 ,1 , , , , , ,0 , , , , , , , , , , , ,4 , ,7 , , , , , 1, 1 1,66 1, 4 6 0 0 0 0,186 3 7,840 7 59 1 1 1 1 1 4 25 0 83,403 89,083 07 84 24 248 25 378 64 508 687 993 038 3 336 9 551,458 324 878 047 325 0 004 94 684 35 59 752 753,416 7 tart 493 ,1 ,1 ,1 , ,0 , , , , , , ,6 , , , , , , ,4 , , ,0 , , , , , , , , , , S 4 4 2 6 9 6 3 7 1 14, 14, 10, 294 28 26 68 65 68 31 8 62 0 58 56 35 89 06 34 08,9 0 74,038,825 78,996 73 43,624 41 42 47 43 40 4 4 1 1 1 133 155 169 122 1 1 4 4 2 2 2 2 6 9 8 0 9 8 8 1 1 4 4 2 2 2 2 2 8 8 5 5 5 3 3 1 9 1 3 3 3 7 X 1 1 1 1 1 1 1 1 1 1 1 1 1 1 10 r r r r r r r r r r r r4 r r r9 r r r r r r r r r r r r r r r r r r r r r r r h h h h h h h h Chr c chr c ch ch ch ch ch c c ch ch ch ch ch ch ch ch ch c c c c ch ch ch ch ch ch ch ch ch ch ch ch ch ch ch ch

140

Table A3. Genes repressed >2-fold on KLF3 rescue and promoter-bound by KLF3 (cont). A 1 r t i n 2 n ubu i g s membe a n 1 i 1 1 , h r c n n i 5 i c i t 2 o g g t a a y n p n n t l a i i l h te u n o n i c um in u i u i - a n co i pt t cata gro eta i l n , b m i ong mod l med 3 C 2 e con conta ma l , , 1 ase r r 2 y t n l b n e i i i a e u agy e Descr d doma k ase i ype h n nge t li l 1 e i n p 55 am do - f se f a n e 1 op ge ll a , t n l g ge , 1 i doma doma r r B 1 u 23 n ase ogenase 5 o o Ge i 1 ce r ro 1 ase m L a t t phosph r member 1 n 78 r r 1 gene i d d r n 2 , 3 i 1 k ydro n g o te - i 1 2K e T t ed eceptor h g t - n hy hy t me e i 1 r 4 3 3 i LCCL LCC ase n t e a C n l i e l n i recep recep type y t oma 3 d de l n d o or u in in pro ep i a t n se BC03 t i t n I, pro ecep e a A A u 4 t g r e e a and ke ed ed V eg e t ase i l l oge ac e l k n ote ami r r nt e r 99300 n i f ce i n a n r p p n f 2 - i ade I B n f con u y - d inos n mo l e pro p u u e l a h co in h h U t oaceta c n e u ge ge n i r r t er A2b ass i G i i e i p l s u zyme co co C CUB l n t h dy o b i - - c a i de t rri 2 nt nzym n nt , , o u p t t pro r , n n VCP r ne a med s a a eDNA n n i i ace a i e i i an n l r p t c r growt seq g t e e h be d d 1 o t t a damage ocato oca a i i N n a doma l l membr f acid p - d e Coe Coe i S 5 A n i e t - - s s s s doma i E l l l omo 1 T - i o u N n n n n rm os c i pro pro l K mary A sco sco e D59a D I H3 araoxonase e denos cy h i i u l o ra ra u s d d eD op dock d a acy a caveoli C S G C G G p pyruv p n p DNA MOCO R EH tra t t tra ATPase f fatty fetu k i R 1 1 23 l b 1 1 1 3 3 5 1 1 a2 b m 1 1 1 4 2 1 d d rf h p l l a2b 1 s d d2 p name 155 c5a vl m2 1 frl r tu t 2K1 3 r r ag am ok h spo d d l p 4k2b e a N 1 d59a d151 ab e r h i p p Fa Dse t c3 l E Tspo T P Pon2 D Cav P F Acad Og n M F C C G Mosc2 D S P Dcb Dcb Aca A G G mem S Ado e T BC03178 G 99300 5 5 1 3 1 060 079 1 1 44 1 1 564 1 90297 78 1 1 1 098230 08 1 53068 33684 72508 72788 83308 814 33739 1 1 1 1 1 1 1 1 1 1 1 028744 02 027878 02617 009775 010176 007382 009775 016900 019969 007381 009842 025705 025705 010786 010634 028022 029570 ______001004 00 00 00 00 C03 RefSeq M _ _ _ _ _ B NM_025286 NM NM NM NM NM NM NM NM NM NM NM_053246 NM NM NM NM NM NM NM NM NM_0074 NM NM NM NM NM NM N M M M NM NM N N N 9 ue 58 56 4 23 83 l 1 1 1 1 253 696 a 1 2087 0984 0071 0052 0206 0287 0165 0557 0052 09 0058 0 0831 0324 0339 0466 0466 0370 0 0041 0002 0311 0232 0767 0983 0001 0002 0 0 0352 0086 v .1 ...... - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0. 0 0 0 0 0.0061 0 0 0 0 0 p on i s s 5 2 6 5 8 8 5 3 e ld 1 1 1 1 16 1 29 25 29 29 27 27 24 24 23 27 08 06 35 36 07 04 04 04 02 02 02 00 08 0 33 03 r .1 .1 ...... o 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 F xp change E k 1 1 1 1 2 2 9 2 70 73 63 68 60 6 6 65 67 50 55 57 5 66 5 54 72 53 58 5 56 7 40 45 49 47 43 44 42 46 48 64 69 Ran to 7 7 2 3 S 1 1 2 1 1 1 68 33 7 27 2 1 36 50 50 54 28 - 2 90 84 86 6 94 46 3 1 3 1 1 1 10 267 4 206 263 5 377 205 259 296 236 226 939 st. - - - -84 ------i TS ------D ) 3 k 1 1 1 1 3 3 4 2 2 6 1 24 68 9 8 5 1 1 1 ght ...... 23 29 22 26 28 65 69 69 8 84 01 92 34 3 36 39 38 7 72 77 74 i .1 .1 .1 ...... 10 0 8 0 0 8 9 og2 e 9 8 8 8 8 8 8 9 8 8 9 8 8 8 7 7 7 7 8 8 8 8 8 8 Pea 1 1 1 (l h + + + + + + + + + + + 7 + + + + + + 7 + + + + + + + + + + + + + + + + rand t n S o i at 1 1 1 1 1 1 1 4 2 + 7 9 m 1 1 60 80 04 93 86 54 4 1 1 1 1 1 1 2 209 284 699 602 60 67 853 629 382 09 06 972 054 5 366 026 729 863 55 029 992 7 77 457 499 47 or , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , f 1 1 5 4 8 8 n d 8 33 00,496 n 11 1 14 393 553 764 754 4 23 276 248 224 012 653 653 01 390 559 607 394 936 062 670 953 07 953 893 947 910 927 820 920 7 865 k i , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , E 1, 1, 1, 1, 1 6 5 7 7 3 7 4 10 1 1 1 23 22 58 5 53 35 35 5 33 70 53 73 48 62 66 83 83 03 97 82 9 83 87 86 1 1 1 1 1 1 1 1 Pea 1 1 1 1 1 1 1 4 4 14 5 54 5 63 53 1 299 202 229 6 27 20 057 626 329 809 966 096 099 982 3 071 66 69 37 849 793 572 780 760 629 786 712 6 592 884 70 4 4 ,1 , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , 9 7 9 4 17 1 1 32 47 4 230 275 22 248 552 012 607 669 652 865 653 893 390 393 393 935 909 952 074 559 927 062 9 015 9 763 953 780 754 8 400 tart ,1 ,1 ,1 , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , S 1 1 6 3 7 5 7 4 0 1 1 17 1 1 51 91 61 53 86 83 83 03 66 53 97 82 83 58 33 62 35 35 5 73 70 23 22 87 48 1 1 1 1 1 1 1 1 1 1 5 5 4 0 6 0 0 0 0 0 2 2 3 6 7 3 8 1 1 1 1 1 1 1 1 1 1 1 1 r1 r1 r1 r1 r7 r4 r7 r4 r9 r r5 r r6 r r r6 r3 r r1 r4 r r r r r r r r r r r r r r h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h Chr c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c

141

Table A4. Genes activated >2-fold on KLF3 rescue and promoter-bound by KLF3. 3 de i t l ypep po g 2 , n i n i ase 1 r e e f k A i conta l 1 r - ans 3 n n r i l t l'l n e e 2 i i t y s e s epeat 9 t embe pt a r l i pro m e ne r pro , i d acto i dro l 2 35 y ea 0 l h ng 1 ga e i Descr 1 c 1 - ) n u 4 e a nd , tosam n se l i , member e n i ge rity 1 copept c y a b c u mino A l r i 12 a a ri i l l 1 a 1 1 i n Ge 1 fe ss nt e at member yl i r e t s m n in , t i h beta s , r s e 8A e omeobox t rosoph D-ga et an ge 25 to - eme na h l tor tr i D rot Ac d l g ( im e a l y k a 040 p n ac i embryo n i N ty d i v f i 1 superfam c , pha i h l l n 98 81 ve 2 i t!l ne i c e a e n a i a am i 2 3 G a n ne t - f C l act ace ri i og n l kem l 1 - e u r y a n os l nt u i t owth o n e r r AS sequence e C i r Gt!i e l g n l gene ob r g co i ty box l l rri :beta l l a ll ox l bu a ace e eDNA espons a hom - oske B2 og t c nogen r l 1 wtth A2 tu ce pro N Ga box Iik N e ead n cted l - - - e l - n n P i l l y a mi i i h i E l ri k P s a h ut k M c l e B r p methy a acen ex RY ansmemb omeobox omeob l l r m l bu l ri i am s so S d cA a eph muscu p h h pred p immun p p p UDP RIK UD FMS f fi fo tr k i e R 1 ! m 1 3 1 1 4 2a 1 a 2 2 l 8 2 e l t 1 1 t l 1 f a2 1 t 1 f9 tn n1 tc2 na t a nt l k 10 n x l l s s rr l xc13 bx c m9 x fnb eb3 Pg e g u p F l P r l ri P Ata c25a35 E Fb Foxc2 So G l Hoxa5 Tm Ddah S P Ga M Ho B4ga C P 0408A en S Fam 1 G 28 1 9 4 8 9 9 744 11 1 1 38 4 2 180 1 228 368 957 390 3348 904 45800 4 14 1 1 8783 3739 9264 8882 8827 8872 35 1 1 1 0 10464 104 53 1 280 274 20579 53560 1 1 10 10 10 8 Seq 1 f 1 1 177 17 0 0 0 0 0 0 0 0 0 0 0 00 00 00 00 00 ______00 001 001 001 001033217 Re _ _ _ _ _ NM NM NM NM NM NM NM NM NM NM NM NM NM NM NM NM NM NM NM NM M M M M N N NM N N 1 0 3 8 7 7 ue 23 54 30 1 1 1 1 1 l 140 170 256 367 0570 00 0027 . . . 0.0 0.00 0.0 0.0884 0.00 0.1005 0 0.0 0.0062 0.0652 0 0.00 0.00 0.0544 0.0795 0.0038 0.04 0.08 0.004 0.0048 0 0.0641 0.01 0.0497 0.0 p-va s.it'ln s. 1 1 1 1 2 6 6 3 28 67 95 37 65 56 01 39 72 46 .1 .1 .0 .0 .4 .04 . .04 . . . .00 . . . .0 . . . .7 .47 old 2 2 2 2 2 2.04 2 2 2 2.05 2 2 2 2 3.40 3 3 3 3 3 5 3 3.0 5 F change Exprt!~ k n 1 4 2 2 8 9 3 5 1 2 3 6 8 5 9 7 4 1 10 1 21 24 20 22 23 25 1 1 1 1 1 16 17 Ra to 1 1 4 5 26 2 5 36 45 47 50 77 1 45 40 28 2 68 65 28 34 229 227 379 350 359 385 733 1 1 1 1 1 1 830 st. ------i TSS ------t D ) h 1 4 2 2 8 24 24 0 84 52 54 g 08 93 59 69 96 72 8 i .1 .1 . . . .4 ...... 7 . . . 7 7 7 7 7.69 7 7 7 7 7 7.95 7.28 8.65 9.04 8 8 9 8 8.39 8 8.46 8 8 9.29 log2 Peak ( he + + + + + + + + + + + + + + + + + + + + + + + + + trand S 2 2 2 3 3 1 1 1 1 1 20 83 umatil'ln 758 185 198 168 154 124 430 671 6 836 590 800 567 9 5 8 t , , , , , , , , , , , ,490 , , , , ,0 1 1,4 1 nf i 1 54 90 96 36 nd 1 751,507 1 426 2 224 868,040 692 363 864 518 640 66 ,390,916 , , , , ,0 , , , ,0 , ,047, E ak 1, 1, 2 2 3,200 5 8 8,923 0,315,488 4,412 t11 3 2 52, 3 52 35, 36 1 1 48,537,9 23 23 69,714,728 9 9 86 85 68,78 96,446 93 170, 1 1 1 1 1 10 17 17 10 P 0 2 0 0 1 0 1 12 13 12 9 2 67 3 90 754 5 27 2 640 328 516 358 513 083 724 088 785 798 768 4 , , ,1 ,1 ,1 ,6 , , , , ,0 , , , , , , , , , , ,40 1,0 1 6 4 4 2 8 5 36 1 1 1 1 1 2 45 46 92 54 95 289 864 22 200 362 639 867 66 390 5 537 3 923,436 751,107 4 4 tart . . . ,1 ,1 ,0 , , , , ,4 , ,o ,6 , , , , , ,0 , , S s 2 5 8 5 8 8 10, 1 ~2 ~1 23 g6 31 3 36 52 38,781 39,7 32 73 ~3 4 0 170 123 1 152 136 1 10 174 1 1 7 4 2 5 5 g 5 1 1 1 1 2 2 6 8 8 5 8 3 7 r 11 11 1 1 1 1 1 r r r r r r r r r r r r r r r r10 r17 r r r r r r1 h h h h hr2 h h h h h h Ch c c ch ch ch c ch ch ch ch c ch c c c c c chr19 c c ch ch ch ch ch

142