Using genome editing to introduce naturally occurring mutations associated with elevated foetal haemoglobin

Beeke Wienert

A thesis submitted for the degree of Doctor in Philosophy (Biochemistry and Molecular Genetics)

School of Biotechnology and Biomolecular Sciences

The University of New South Wales

May 2016

Table of Contents

Originality statement
Table of Contents
Acknowledgements
Publications arising from this candidature
Abstract
List of Abbreviations
1 CHAPTER 1: Introduction
1.1 Mammalian gene regulation
1.1.1 Gene regulatory elements
1.1.2 Transcription factors (TFs)
1.2 Blood and Haematopoiesis
1.2.1 Erythropoiesis
1.3 Haemoglobin
1.4 The globin loci
1.4.1 The α-globin locus
1.4.2 The β-globin locus
1.5 Haemoglobin switching
1.5.1 Transcription factors in globin switching
1.6 Haemoglobinopathies
1.6.1 β-haemoglobinopathies
1.6.2 Current treatments
1.7 Hereditary persistence of Foetal Haemoglobin (HPFH)
1.7.1 Deletional HPFH
1.7.2 Non-deletional HPFH
1.8 Genome editing
1.8.1 DNA Nucleases
1.8.2 The CRISPR/Cas9 system
1.8.3 DNA repair mechanisms
1.8.4 Applications
1.9 Gene therapy
1.10 Aims and hypothesis
2 CHAPTER 2: Material and Methods
2.1 Materials
2.1.1 Chemicals and Reagents
2.1.2 Enzymes
2.1.3 Antibodies
2.1.4 Cytokines
2.1.5 Plasmids
2.1.6 Oligonucleotides
2.1.7 Bacterial strains and culture
2.1.8 Commercial services and kits
2.2 Methods
2.2.1 General methods
2.2.2 Mammalian cell culture and transfection
2.2.3 Overexpression of in E. coli
2.2.4 Electrophoretic mobility shift assays (EMSAs)
2.2.5 RNA extraction and cDNA synthesis
2.2.6 Quantitative real-time RT-PCR and Fluidigm
2.2.7 Western Blotting
2.2.8 Genome editing
2.2.9 Fluorescence-activated cell sorting and flow cytometry
2.2.10 Chromatin immunoprecipitation
2.2.11 Pyrosequencing
2.2.12 Chromatin conformation capture (3C)
2.2.13 Transient Transactivation Assays
2.2.14 Mitotic arrest and cell cycle analysis
2.2.15 Statistical analysis
3 CHAPTER 3: The -175 T to C mutation
3.1 Chapter 3 Introduction
3.1.1 The -175 T>C HPFH mutation elevates foetal haemoglobin levels in humans
3.2 The -175 T>C mutation does not affect GATA1 binding to that region in vitro
3.3 The -175 T>C mutation is a gain-of-function mutation
3.3.1 The -175 T>C mutation creates a TAL1 binding site
3.3.2 TAL1 and GATA1 can bind to the -175 mutation in a complex
3.4 Generation of murine erythroid cell lines carrying the -175 T to C mutation
3.5 TAL1 binds and activates γ-globin -175 T>C in transgenic mouse erythroid cells
3.6 Genome editing in a human erythroid cell line
3.7 TAL1 binds and activates γ-globin -175 T>C in K562 cells
3.8 -175 T>C increases enhancer looping to the γ-globin promoter
3.9 Chapter 3 Discussion
4 CHAPTER 4: The -198 T to C mutation
4.1 Chapter 4 Introduction
4.1.1 The -198 T>C HPFH mutation boosts foetal haemoglobin levels in vivo
4.1.2 The -198 T>C mutation creates a CACCC box
4.2 KLF1 binds to the -198 T>C mutation in vitro
4.3 Other KLF proteins also bind to the -198 T>C mutation in vitro
4.4 KLF3 and KLF1 are the most abundant KLFs in erythroid cells
4.5 KLF1 activates the γ-globin promoter in a reporter assay
4.6 The effect of the -198 T>C mutation on γ-globin expression in a human cell model
4.7 The -198 T>C mutation increases foetal globin expression in HUDEP2 cells
4.8 KLF1, KLF3 and SP1 bind to the -198 T>C mutation in vivo
4.9 KLFs compete with ZBTB7A for binding to the -198 T>C globin promoter
4.10 Chapter 4 Discussion
5 CHAPTER 5: The -200 site
5.1 Chapter 5 Introduction
5.1.1 A cluster of mutations at the -200 site causes HPFH in humans
5.1.2 HPFH mutations disrupt binding of ZBTB7a in vitro
5.1.3 ZBTB7a binds to the -200 region of the foetal globin promoter in vivo
5.2 ChIP-qPCR of ZBTB7A in HUDEP2 cells
5.3 ChIP-qPCR experiments of ZBTB7A at various stages of the cell cycle
5.4 Generation of human erythroid cell lines carrying the γ-globin promoter -195 C>G mutation
5.4.1 Genome editing in K562 cells
5.4.2 Genome editing in HUDEP2 cells
5.5 The -195 C>G mutation disrupts ZBTB7a binding in vivo
5.6 The -195 C>G mutation increases foetal globin expression
5.7 Modifying a ZF of ZBTB7a can restore binding to -195 C>G in vitro
5.8 Chapter 5 Discussion
6 CHAPTER 6: General discussion and conclusion
6.1 Summary
6.2 The molecular basis of HPFH mutations
6.3 Translation into the clinic
6.4 Conclusion
References
Appendix I - Primer lists
Appendix II - Controls
Appendix III - Additional data


Acknowledgements


I would like to start by thanking my supervisor, Merlin, for giving me the chance to complete my PhD here in Australia. Your critical mind was and always will be a true inspiration for me and has shaped my conception of science tremendously. You have been a fantastic mentor throughout my PhD and I cannot thank you enough for that.

A big thank you also goes to Richard and Kate, who have been incredibly helpful in the planning, fine tuning and trouble-shooting of this work. Especially you, Kate, have my sincerest respect and admiration for your professionalism and supervision skills and I believe that you are a wonderful asset to the Crossley lab.

Next, I would like to acknowledge our collaborators Jacqui Matthews, Joel Mackay, Jim Vadolas, Matt Porteus and Takahiro Maeda without whom this project would not have been possible. You all have my genuine gratitude for your exceptional work, donations and insights that have helped with the publication and progress of this work.

There are a few more people who have been enormously helpful throughout my PhD and who deserve to be mentioned here. Firstly there is Dale Wright from the Children’s Hospital in Westmead who gratuitously offered to karyotype our K562 cells and then there is Chris Brownlee from the Flow Cytometry Facility, who has always been immensely helpful, friendly and cooperative. Thank you!

Thanks also to all the members, and past members, of the Crossley lab. When I joined this lab in 2013 I had no idea how fun and wonderful it would be to work with you. The work environment that you have created always made me feel welcome and respected and I loved coming into the office every single day of my PhD. Thanks to Cath for the late night conversations, to Beth for the endless supply of chocolate, to Gabbie for always agreeing on how complicated things are and to Dixie for the finger buns. And finally, a special thank you goes to Laura for always supplying me with a second beer and for becoming such a good friend over the years.

Lastly, I want to thank my family, who have always supported me in every crazy decision that I have made in my life and who, even though I live on the other side of the world, never stopped loving me. The final thanks goes to my wonderful friends, old ones from back home and new ones that I have made here. Thank you for putting up with me for so long and being my family away from home!


Publications

Publications arising from this candidature

Journal articles

Wienert B, Funnell AP, Norton LJ, Pearson RC, Wilkinson-White LE, Lester K, Vadolas J, Porteus MH, Matthews JM, Quinlan KG, Crossley M: Editing the genome to introduce a beneficial naturally occurring mutation associated with increased fetal globin. Nat Comms. 2015

Wienert B, et al: KLF1 drives the expression of foetal haemoglobin in British HPFH. Manuscript in preparation.

Wienert B, et al: Mutations associated with Hereditary Persistence of Foetal Haemoglobin disrupt binding of the foetal globin repressor ZBTB7A/LRF. Manuscript in preparation.

Dewi V, Kwok A, Lee S, Lee MM, Tan YM, Nicholas HR, Isono KI, Wienert B, Mak KS, Knights AJ, Quinlan KG, Cordwell SJ, Funnell AP, Pearson RC, Crossley M.: Phosphorylation of Krüppel-like Factor 3 (KLF3/BKLF) and C-terminal Binding Protein 2 (CtBP2) by Homeodomain Interacting Protein Kinase 2 (HIPK2) Modulates KLF3 DNA Binding and Activity. J Biol Chem. 2015

Funnell AP, Vernimmen D, Lim WF, Mak KS, Wienert B, Martyn GE, Artuz CM, Burdach J, Quinlan KG, Higgs DR, Whitelaw E, Pearson RC, Crossley M.: Differential regulation of the α-globin locus by Krüppel-like Factor 3 in erythroid and non-erythroid cells. BMC Mol Biol. 2014

Burdach J, Funnell AP, Mak KS, Artuz CM, Wienert B, Lim WF, Tan LY, Pearson RC, Crossley M.: Regions outside the DNA-binding domain are critical for proper in vivo specificity of an archetypal zinc finger transcription factor. Nucleic Acids Res. 2014

Conference abstracts

Oral presentations

Wienert B, Norton LJ, Funnell AP, Pearson RC, Lester K, Wilkinson-White L, Vadolas J, Porteus MH, Matthews JM, Quinlan KG and Crossley M (September 2015): Boosting foetal haemoglobin levels by genome editing. Abstract presented at the CRISPR/Cas Cold Spring Harbor Meeting, NY, USA.

Wienert B, Norton LJ, Funnell AP, Pearson RC, Lester K, Wilkinson-White L, Vadolas J, Porteus MH, Matthews JM, Quinlan KG and Crossley M (June 2015): Editing the genome to introduce a beneficial naturally occurring mutation associated with increased fetal globin. Abstract presented at the Australian Society for Medical Research (ASMR) Scientific Meeting, Sydney, NSW.

Wienert B, Norton LJ, Funnell AP, Pearson RC, Lester K, Wilkinson-White L, Vadolas J, Porteus MH, Matthews JM, Quinlan KG and Crossley M (November 2014): Reactivation of γ-globin by genome editing as a potential therapy in Sickle-cell anaemia. Abstract presented at BABS Symposium, Sydney, NSW.

Poster presentations

Wienert B, Norton LJ, Martyn GE, Funnell AP, Pearson RC, Lester K, Wilkinson-White L, Vadolas J, Porteus MH, Matthews JM, Quinlan KG and Crossley M (January 2016): Naturally occurring gain-of-function mutations boost fetal hemoglobin levels. Poster presented at the Keystone Meeting for Molecular Biology, Snowbird, UT.

Wienert B, Norton LJ, Funnell AP, Pearson RC, Lester K, Wilkinson-White L, Vadolas J, Porteus MH, Matthews JM, Quinlan KG and Crossley M (February 2015): Reactivation of γ-globin via TALEN-mediated genome editing. Poster presented at Lorne Genome Conference, Lorne, VIC.

Wienert B, Norton LJ, Funnell AP, Pearson RC, Lester K, Wilkinson-White L, Vadolas J, Porteus MH, Matthews JM, Quinlan KG and Crossley M (September 2014): A novel globin regulatory complex induced by a naturally occurring mutation. Poster presented at Haemoglobin Switching Conference, Oxford, UK.

Wienert B, Norton LJ, Funnell AP, Pearson RC, Lester K, Wilkinson-White L, Vadolas J, Porteus MH, Matthews JM, Quinlan KG and Crossley M (June 2014): The molecular mechanism behind a regulatory mutation in the fetal hemoglobin gene with therapeutic potential in sickle cell disease and β-thalassemia. Poster presented at ASMR Scientific Meeting, Sydney, NSW.

Wienert B, Funnell AP, Pearson RC, Lester K, Wilkinson-White L, Matthews JM and Crossley M (February 2014): Novel regulatory complexes controlling globin gene expression. Poster presented at Lorne Genome Conference, Lorne and Asian Conference for Transcription, Melbourne, VIC.

Wienert B, Funnell AP, Pearson RC and Crossley M (October 2013): Novel regulatory complexes controlling globin gene expression. Poster presented at ComBio2013 conference, Perth, WA.


Abstract

β-haemoglobinopathies are amongst the most common inherited diseases in the world, with devastating prospects for the affected individuals if untreated. The discovery that high foetal haemoglobin (HbF) levels benefit patients by reducing the severity of symptoms has been one of the key drivers of haemoglobin research. Naturally occurring mutations in the promoter region of foetal γ-globin result in the continued expression of foetal haemoglobin into adulthood, a benign condition known as Hereditary Persistence of Foetal Haemoglobin (HPFH). Individuals with HPFH have foetal haemoglobin levels between 3 % and 40 %, whilst in normal adults HbF accounts for only about 1 %. The high foetal haemoglobin levels in individuals with HPFH are sufficient to ameliorate the symptoms in individuals with β-haemoglobinopathies such as β-thalassaemia and sickle cell anaemia. The purpose of our research is to explore reactivation of foetal globin expression in adult life as a therapeutic strategy by developing mechanistic understanding and by introducing these advantageous HPFH mutations in cell models.

Here we introduced three different naturally occurring HPFH mutations into various erythroid cell models by TALEN- and CRISPR/Cas9-mediated genome editing and found that this resulted in elevated levels of HbF. Thus, we propose that introducing these mutations into patients with β-haemoglobinopathies could represent a possible gene therapeutic approach to ameliorate symptoms.

Furthermore, we were able to uncover the molecular mechanisms underlying these HPFH mutations through in vitro and in vivo binding studies. We demonstrated that the -175 T>C and the -198 T>C mutations create de novo binding sites for the erythroid-specific activators TAL1 and KLF1, respectively. Chromatin conformation capture experiments revealed that TAL1 mediates looping of the LCR to the γ-globin promoter through recruitment of LMO2 and LDB1 to activate foetal globin expression. We also provide evidence that a cluster of HPFH mutations around -200 bp upstream of the γ-globin transcription start site decreases binding of the foetal globin repressor ZBTB7A.

Overall, we deliver three different mechanistic explanations for non-deletional HPFH in humans. By uncovering the molecular basis underlying these mutations we made a significant contribution to better understanding the foetal to adult haemoglobin switch.


Abbreviations

List of Abbreviations

3C chromatin conformation capture

A adenine

α alpha

aa amino acid

AGRF Australian Genome Research Facility

APS ammonium persulfate

ATP adenosine triphosphate

β beta

β⁰ beta-thalassemia major

β⁺ beta-thalassemia intermedia

BAC bacterial artificial chromosome

BCL11A B cell lymphoma 11A

BCL2L11 Bcl-2 Interacting Mediator of cell death

bHLH basic helix-loop-helix

BLAST basic Local Alignment Search Tool

bp base pair(s)

BRG1 Brahma-related gene-1

BSA bovine serum albumin

c centi

°C degrees Celsius

C cytosine

Cas9 CRISPR associated protein 9


spCas9 Cas9 protein from Streptococcus pyogenes

CBP CREB-binding protein

CCR5 C-C chemokine receptor type 5

CD34 cluster of differentiation 34 antigen

cDNA complementary DNA

CFU colony-forming unit

CHD3 Chromodomain helicase DNA-binding protein 3

CHD8 Chromodomain helicase DNA-binding protein 8

ChIP chromatin immunoprecipitation

ChIP-qPCR chromatin immunoprecipitation-quantitative polymerase chain reaction

ChIP-Seq chromatin immunoprecipitation followed by sequencing

Chr chromosome

CMP common myeloid progenitor cells

CMV cytomegalovirus

CO2 carbon dioxide

CO carbon monoxide

Cos CV-1 in Origin with SV40 cells

CRISPR clustered regularly interspaced short palindromic repeats

crRNA CRISPR RNA

CTBP C-terminal binding protein

D aspartic acid

δ delta

DEPC diethylpyrocarbonate

DMEM Dulbecco’s modified Eagle medium

DMSO dimethylsulfoxide

DNA 2’-deoxyribonucleic acid

dNTP deoxynucleoside triphosphate

DSB double-strand break

dsDNA double-stranded DNA

dsRED Discosoma sp. red fluorescent protein

DTT dithiothreitol

ε epsilon

E12/E47 E2A immunoglobulin enhancer-binding factors E12/E47

E. coli Escherichia coli

ECFP enhanced cyan fluorescent protein

EDTA ethylenediamine-tetraacetic acid disodium dihydrate

EGFP enhanced green fluorescent protein

EGTA ethylene glycol-bis[2-aminoethylether]-N,N,N′,N′-tetraacetic acid

EMSA electrophoretic mobility shift assay

ENCODE encyclopedia of DNA elements

EPO erythropoietin

ES embryonic stem cell(s)

EtOH ethanol

FACS fluorescence-activated cell sorting

FBI1 factor that Binds to Inducer of Short Transcripts Protein 1

FOG1 Friend of GATA-1

FCS foetal calf serum

g gram(s)

G glycine

G guanine

γ gamma

G1 Gap 1 phase of cell cycle

G1ER inducible GATA1-estrogen receptor cell line

G2/M Gap 2/mitosis phase of cell cycle

GAPDH glyceraldehyde-3-phosphate dehydrogenase

GATA-1 GATA-binding transcription factor 1

GATAD2B GATA Zinc Finger Domain Containing 2B

gDNA genomic DNA

GMP granulocyte-macrophage progenitor cells

GST glutathione S-transferase

GWAS genome-wide association study

h hour(s)

H histidine

HAT histone acetyl transferase

HbA adult haemoglobin protein

HBA1 human adult haemoglobin 1

HBA2 human adult haemoglobin 2

HBB human adult β-globin

HBD human adult β-like globin, δ

HBE human embryonic β-like globin, ε

HbF foetal haemoglobin protein

HBG1 human foetal β-like globin, Aγ

HBG2 human foetal β-like globin, Gγ

HbS sickle haemoglobin protein

HBZ human embryonic α-like globin, ζ

HCl hydrochloric acid

HDR homology-directed repair

HEPES N-[2-hydroxyethyl]piperazine-N′-[2-ethanesulfonic acid]

HIV human immunodeficiency virus

HTH helix-turn-helix

HMG high Mobility Group box

HMT histone methyl transferase

HPFH hereditary persistence of foetal haemoglobin

HR homologous recombination

HS hypersensitive site

HSC haematopoietic stem cell

HUDEP2 human umbilical cord blood-derived erythroid progenitor 2

I isoleucine

IDT Integrated DNA Technologies

IgG immunoglobulin G

IP immunoprecipitation

K lysine

K1ER KLF1-estrogen receptor cell line

kb kilobase pairs


kDa kilodalton(s)

KLF Krüppel-like factor

KLF1 Erythroid Krüppel-like factor

KLF1-ER Erythroid Krüppel-like factor-oestrogen receptor fusion protein

KLF2 Lung Krüppel-like factor

KLF3 Basic Krüppel-like factor

KLF4 Krüppel-like factor 4

KLF5 Krüppel-like factor 5

KLF6 Krüppel-like factor 6

KLF7 Krüppel-like factor 7

KLF8 Krüppel-like factor 8

KLF9 Krüppel-like factor 9

KLF10 Krüppel-like factor 10

KLF11 Krüppel-like factor 11

KLF12 Krüppel-like factor 12

KLF13 Krüppel-like factor 13

KLF14 Krüppel-like factor 14

KLF15 Krüppel-like factor 15

KLF16 Krüppel-like factor 16

KLF17 Krüppel-like factor 17

KO knock-out

L liter

LB Luria-Bertani

LCR locus control region

LDB1 Lim domain binding 1

LMO2 Lim-only protein 2

LRF Leukaemia/Lymphoma-Related Factor

LSD1 lysine-specific demethylase 1A

LZ leucine zipper

m milli

M molar

µ micro

MEL mouse erythroleukaemia cell line

MEME multiple Em for Motif Elicitation

MEP megakaryocytic-erythroid progenitor(s)

min minute(s)

MQW milli-Q water

mRNA messenger RNA

N asparagine

NaCl sodium chloride

NCBI National Center for Biotechnology Information

Neo neomycin resistance gene

NHEJ non-homologous end joining

NK natural killer cell(s)

NO nitric oxide

NTP nucleoside triphosphate

NuRD nucleosome remodelling and deacetylase complex

O2 oxygen

ORF open reading frame

p300/CBP E1A binding protein 300/CREB-binding protein

PAGE polyacrylamide gel electrophoresis

PAM protospacer adjacent motif

PBS phosphate-buffered saline

PCR polymerase chain reaction

PGK phosphoglycerate kinase

PIC transcription preinitiation complex

PMSF phenylmethylsulfonyl fluoride

POK POZ and Krüppel type

POKEMON POZ and Krüppel Erythroid Myeloid Ontogenic Factor

POZ pox virus and Zinc finger domain

PSG penicillin, streptomycin and glutamine solution

qPCR quantitative real time RT-PCR

RNA ribonucleic acid

RNase A ribonuclease A

RPMI Roswell Park Memorial Institute Media

rRNA ribosomal RNA

RT-PCR reverse transcription-polymerase chain reaction

S DNA synthesis phase of cell cycle

SCD sickle cell disease

SCF stem cell factor

SCL/TAL1 T-Cell Acute Lymphocytic Leukemia 1

SDM site-directed mutagenesis

SDS sodium dodecyl sulfate

SFEM Serum-Free Expansion Medium

sgRNA small guide RNA

SL2 Schneider 2 cells

SNP single nucleotide polymorphism

SOX6 SRY (Sex Determining Region Y)-Box 6

SP1 specificity protein 1

SP2 specificity protein 2

SP3 specificity protein 3

ssDNA single-stranded DNA

ssODN single-stranded oligodeoxynucleotide

SWI/SNF SWItch/Sucrose Non-Fermentable

T thymine

TALEN transcription activator-like effector nuclease

TBE tris-Borate-EDTA buffer

TBP TATA-binding protein

tdTomato Tandem tomato fluorescent protein

TF transcription factor

TFII general transcription factor of RNA polymerase II

tracrRNA transactivating CRISPR RNA

TRIS tris-hydroxymethyl-methylamine

Triton X-100 t-octylphenoxypolyethoxyethanol

TSS transcription start site

Tween™-20 polyoxyethylenesorbitanmonolaurate

U unit(s)

UCSC University of California Santa Cruz

UTR untranslated region

V volt(s)

WT wildtype

YAC yeast artificial chromosome

ζ zeta

ZBTB7A Zinc finger and BTB domain containing 7A

ZF zinc fingers

ZFN zinc finger nuclease

ZFP zinc finger protein


CHAPTER 1: Introduction

1 CHAPTER 1: Introduction

1.1 Mammalian gene regulation

Although every cell in our bodies carries the same genetic information, their morphology, phenotype and function are far from identical. This phenomenon has fascinated scientists since the discovery of the genetic code and has been one of the key drivers of molecular biology research. It has led to our current understanding that cellular diversity is caused by differential gene expression patterns. Genes are activated or silenced according to the destined function of the cell. Fine-tuned networks of proteins known as transcription factors are responsible for the up- or down-regulation of genes in different tissues and at different time points of development. The proportions and absolute quantities of different activating and repressing transcription factors define the expression levels of genes encoding other proteins, which then determine the functional fate of the cell¹. The first step of gene regulation occurs at the transcriptional level: gene activation increases, while gene repression reduces, the number of readily available pre-mRNA molecules in the nucleus. Although regulation at the level of transcription is considered to be particularly important and will be the focus of this work, regulation also occurs at other levels. For instance, the pre-mRNA can be processed post-transcriptionally by RNA splicing, and levels of mRNA can be altered by selective degradation or stabilisation². Once the mRNA has reached the cytoplasm and is ready to be translated, further factors such as mRNA longevity, localisation and selective translation regulate gene expression at the translational level³. Furthermore, once the mRNA has been translated into protein, enzymes can add or remove functional groups such as phosphate, sulphate or acetyl groups. These post-translational modifications can enhance or inhibit the function of the modified protein⁴.

1.1.1 Gene regulatory elements

Gene regulatory elements are defined as regions of DNA that are particularly important for the regulation of a certain gene. They can be close to the transcription start site (TSS) or be located further up- or downstream of the gene.

1.1.1.1 Promoters

Proximal gene regulatory elements are known as promoter regions. They usually consist of a basal promoter located approximately 100 bp upstream of the TSS and additional promoter-proximal elements located between 100 and 500 bp upstream of the TSS. The basal promoter contains recognition elements such as TATA, DPE and BRE that are bound by the general transcription factor (GTF) machinery, which in turn recruits RNA polymerase II. The proximal promoter lies adjacent to the basal promoter but does not contain any binding sites for GTFs⁵. Instead, it comprises recognition sites for specific transcription factors which are involved in fine-tuning the tissue-specific expression of the gene. These sites are known as cis-regulatory elements, and combinations of sites for different activators or repressors are often found in the same proximal promoter, allowing for synergistic activation or repression of the gene by multiple factors⁵.

1.1.1.2 Enhancers and silencers

Another group of gene regulatory elements includes enhancers and silencers, which are characterised by their distal location in relation to the gene that is regulated. They can be located kilobases (kb) up- or downstream of a gene and typically regulate transcription in a spatial- or temporal-specific manner⁵. Similar to promoters, enhancers and silencers contain binding sites for specific transcription factors that mediate activation or repression of the gene, respectively. They exert their function through long-range interactions with the proximal promoter independently of orientation or distance to the promoter⁶.

1.1.2 Transcription factors (TFs)

Transcription factors are proteins that specifically recognise DNA sequences. They can be broadly divided into two subgroups: general TFs and specific TFs. GTFs bind to recognition elements in the basal promoter and assemble a transcription preinitiation complex (PIC) including RNA polymerase II. Although the assembly of a PIC results in low levels of transcript in vitro, specific TFs are needed to boost expression levels to relevant quantities by binding to cis-regulatory elements in the proximal promoter⁷. These tissue-specific TFs interact with other proteins, so-called coregulators, to form complexes that then in turn recruit other proteins, enzymes and chromatin modulators.

TFs consist of a DNA-binding domain and a functional domain which can comprise repression and/or activation domains. TFs often also contain dimerization or ligand binding domains. They are grouped by the protein structure of their DNA-binding domain and the most common and well-described DNA-binding motifs are the basic helix-loop-helix (bHLH) motif, the basic leucine zipper (LZ) motif, the zinc finger (ZF) motif and helix-turn-helix (HTH) domains⁸.

Transcriptional activators can drive the expression of a gene in multiple ways. Firstly, they may be involved in recruiting and stabilising the PIC either through direct interaction or via interaction with cofactors. Secondly, they may induce long-range interactions with distal enhancer elements, bringing them into close proximity with the basal transcription apparatus⁹.

Transcriptional repressors, on the other hand, may act through steric hindrance of PIC assembly or by recruiting distal silencer regions to the promoter¹⁰.

Furthermore, both activators and repressors have the ability to alter post-translational modifications of histones through recruitment of histone-modifying enzymes. Active genes are usually associated with loosely packaged chromatin (euchromatin) whereas silenced genes are usually buried in tightly packed chromatin (heterochromatin). Histone-modifying enzymes can acetylate, methylate and ubiquitinylate nucleosomes to adjust the chromatin state according to the expression program driven by the bound transcription factors⁷.

Almost a third of human developmental disorders are a result of transcription factor dysfunction, underlining the importance of a fine-tuned network of TFs regulating gene expression¹¹. This study will focus on a range of transcription factors, both activators and repressors, which are involved in haematopoiesis and genetic disorders related to haemoglobin.

1.2 Blood and Haematopoiesis

Mammalian blood consists of different types of blood cells that are all derived from a very small population of pluripotent haematopoietic stem cells (HSCs). In addition to their capacity to differentiate and reconstitute the whole blood system, HSCs also self-renew to replenish their population. When HSCs differentiate they undergo various maturational stages, generating progenitor cells that are multipotent but lineage-committed (Figure 1.1). The process of differentiation of HSCs into mature blood cells is known as haematopoiesis¹². Two main pathways can be distinguished: lymphopoiesis and myelopoiesis. Lymphopoiesis gives rise to common lymphoid progenitor cells (CLPs) and lymphoblasts, which then differentiate into T- and B-cells and natural killer cells¹³. Myelopoiesis, on the other hand, produces all other cell types found in blood. Common myeloid progenitor cells (CMPs) commit to either the granulocyte-macrophage or the megakaryocytic-erythroid lineage. Granulocyte-macrophage progenitor cells (GMPs) are the precursors to monocytes and granulocytes, which can further differentiate into macrophages, dendritic cells, mast cells, neutrophils, eosinophils and basophils¹⁴. Megakaryocytic-erythroid progenitors (MEPs) can commit to either megakaryopoiesis or erythropoiesis. In megakaryopoiesis, large polyploid cells known as megakaryocytes are formed; these yield large amounts of anucleate platelets (or thrombocytes), which are involved in blood clotting and immune responses¹⁵. Erythropoiesis is the main focus of this thesis and is described in more detail in the next subchapter.

Haematopoiesis is driven by distinct programs of transcription factor expression that determine the lineage fate. For each lineage, the programs of transcription factor expression are driven by a small subset of transcription factors, known as master regulators¹⁶. Extracellular signals such as cytokines and growth factors have been shown to be important in sustaining haematopoiesis and cell viability¹⁶. Although alone they are not instructive for the pathway of differentiation, they are able to alter lineage fate by modifying transcription factor programs¹⁷,¹⁸.

Figure 1.1: Haematopoiesis. Haematopoiesis gives rise to a number of distinct cell lineages. There are two main pathways: lymphopoiesis and myelopoiesis. In brief, an HSC differentiates into either myeloid or lymphoid progenitors which then undergo distinct differentiation pathways to generate the various mature blood cells shown at the end of each pathway. Erythropoiesis is highlighted with a black box. Figure adapted from de Bruin et al., 2014¹⁹.

1.2.1 Erythropoiesis

Erythropoiesis yields large amounts of small enucleated cells called erythrocytes or red blood cells. Of all blood cells, erythrocytes are the most common type and their main function is to transport oxygen from the lungs to the tissues. They are derived from megakaryocytic-erythroid progenitors (MEPs), which differentiate into proerythroblasts driven by the expression of the master regulator GATA-binding protein 1 (GATA1). Once GATA1 is activated, expression of GATA-binding protein 2 (GATA2) is repressed. GATA2 is thought to drive proliferation of MEPs whereas onset of GATA1 expression specifies the erythroid fate of the cell²⁰,²¹. From the proerythroblast stage the cell undergoes further differentiation into various erythroblast stages with different levels of chromaticity. The cell then starts to condense and the nucleus is extruded. At this stage the cell is referred to as a reticulocyte, which still contains RNA and nuclear fragments. The last step of erythropoiesis is the complete enucleation, which results in a mature erythrocyte with the typical biconcave disc-like shape²².

In mammals, erythropoiesis occurs in two distinct waves. The initial wave is referred to as primitive erythropoiesis and commences in humans at around day 17 of development23. Primitive erythroblast progenitors are formed in blood islands of the yolk sac and mainly fulfil the function of producing primitive erythrocytes to supply the rapidly growing embryo with oxygen12,24. One distinct feature of primitive erythrocytes is the expression of embryonic haemoglobin. The initial wave of erythropoiesis is only transient and is soon replaced by definitive erythropoiesis. The location of red blood cell production moves from the yolk sac to the foetal liver – this step is initiated by circulating primitive HSCs that colonize the foetal liver at day 23 of human development25. Unlike in primitive erythropoiesis, where large nucleated erythroid cells can enter the bloodstream, in definitive erythropoiesis all erythrocytes undergo enucleation before they are released from the foetal liver26. From week 11 on the bone marrow starts to become the major site of erythropoiesis but the transition is not completed until after birth24,25.

1.3 Haemoglobin

Haemoglobin is the protein found in erythrocytes that accounts for the transport of oxygen from the lungs to the different tissues of the body. It is composed of four polypeptide chains: two α-like chains and two β-like chains27. α- and β-like chains form two heterodimers which then assemble with four heme groups to form the tetramer haemoglobin. Haemoglobin can specifically interact with four different gases: oxygen (O2), carbon dioxide (CO2), carbon monoxide (CO) and nitric oxide (NO). O2, CO and NO can covalently bind to the iron atom of the heme group whereas CO2 is transported in the blood in solution by weakly binding to the terminal amino groups of the polypeptides28. Although the transport of oxygen in mammals is the main function of haemoglobin, the interaction with the other gases has different important biological functions.

In humans there are three main forms of haemoglobin: embryonic, foetal and adult haemoglobin. As their names suggest, they are developmentally regulated, which will be described in detail in the next chapter. Their main quaternary structure is similar; however, they differ in the composition of α- and β-like chains. The differences in the chain composition result in slight differences in oxygen affinities of the haemoglobin molecules. For example, foetal haemoglobin, which is mainly present in the unborn foetus, has a higher affinity for oxygen than adult haemoglobin. This helps to transport oxygen across the placenta from the mother’s blood ensuring a sufficient oxygen supply to the baby29.

1.4 The globin loci

As mentioned above, the haemoglobin molecule consists of α- and β-like polypeptide chains. These proteins are encoded by two gene loci in mammals: the α-globin locus and the β-globin locus. The local arrangements of the genes are shown in Figure 1.2 and will be described in the chapters below.

Figure 1.2: Schematic of the globin loci. Depicted are the α-globin locus on chromosome 16 and the β-globin locus on chromosome 11. Both loci have strong enhancer elements, HS-40 and LCR, located several kb upstream of the most 5’ gene. The globin genes are sequentially expressed in the order of 5’ to 3’ throughout development.

1.4.1 The α-globin locus

In humans the α-globin locus is located on chromosome 16. It consists of three active genes named ζ, α1 and α2. ζ is the embryonic α-like globin whereas α1 and α2 are adult α-like globins. Although the α1 and α2 genes code for identical protein chains30, α2 gene expression is two-fold higher than that of α1, which is most likely due to sequence differences in the promoter regions of these two genes31. Approximately 40 kb upstream of the genes, there is a cluster of DNase hypersensitive sites called HS-40 that has been shown to play an important role in regulating the expression of the α-like globin genes30.

1.4.2 The β-globin locus

The β-like globin genes are arranged on chromosome 11 in the order ε, Gγ, Aγ, δ, β (5’ to 3’). There are five transcribed genes in this locus: ε, Gγ, Aγ, δ and β. ε is the embryonic β-like globin, Gγ and Aγ are foetal and δ and β are adult β-like globins. Gγ and Aγ are believed to have arisen via a tandem duplication event during evolution32. Their coding sequences only differ in one amino acid at position 136, which is a glycine in the case of Gγ and an alanine in the case of Aγ33. The 5’ regulatory regions of the two γ genes are identical up to position -221 upstream of the transcription start site. The rest of the 5 kb duplicated region differs on average in around 14 % of nucleotides32. During the foetal period, the Gγ-globin gene is expressed about twice as strongly as the Aγ gene34. The adult globin genes δ and β are both expressed in the adult human. However, roughly 97 % of all adult β-like globins are β-globin chains27. Situated about 6 kb upstream of the ε-globin gene is an important regulatory element known as the locus control region (LCR). This region consists of five DNase hypersensitive sites and is a hot spot for transcription factor binding. It has been shown to be a strong enhancer element that is involved in regulating the timely expression of embryonic, foetal or adult β-like globin genes by directly contacting the promoter regions of each gene35. The enhancer activity of the LCR is tissue-specific, confining the expression of the β-like globin genes to erythroid cells35,36.

1.5 Haemoglobin switching

The expression of the different globin chains in the α- and β-globin loci is developmentally regulated in order to meet the specific oxygen demands of the developing embryo37. The different genes in each locus are sequentially activated and silenced, a process that is known as haemoglobin switching.

In the α-globin locus, ζ-globin is expressed only in the very early stages of embryonic development (see Figure 1.3). ζ-globin is then silenced within the first trimester and both the α-globins start to be expressed persistently until adulthood38.

Figure 1.3: Human globin switching. This figure depicts the developmental switching of α-like (black and brown) and β-like (green, blue, pink, orange) globin gene expression in humans. The main locations of erythropoiesis at different stages of development are illustrated above the graph.

Globin switching in the β-globin locus is slightly more complex. There are two switches occurring in human development: the first switch takes place at a similar time point to the ζ- to α-globin switch and the second switch happens at the time of birth. Within the first trimester of pregnancy the dominant β-like globin chain being expressed is ε-globin, which pairs with ζ-globin to form functional embryonic haemoglobin E (Hb Gower-1). As soon as ε-globin expression is silenced, γ-globin chain production is upregulated. γ-globins pair with α-globins to form the main foetal form of haemoglobin, haemoglobin F (HbF). The second switch happens shortly after birth when γ-globin expression is silenced and adult β-globin becomes the most prominent β-like globin. Also expressed in adulthood is δ-globin; however, it represents only a small amount of the total globin present. β- and δ-globin assemble with α-globin, forming adult haemoglobins A (HbA) and A2 (HbA2), respectively.
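The chain pairings named in this section can be collected in a small lookup table. The following is only an illustrative sketch; the dictionary and function names are hypothetical, and only the four tetramers mentioned in the text are included.

```python
# (alpha-like chain, beta-like chain) for each haemoglobin tetramer named in the text.
# The identifiers below are illustrative and not part of any established library.
HAEMOGLOBINS = {
    "Hb Gower-1": ("zeta", "epsilon"),  # embryonic
    "HbF": ("alpha", "gamma"),          # foetal
    "HbA": ("alpha", "beta"),           # adult, major form
    "HbA2": ("alpha", "delta"),         # adult, minor form
}


def tetramer_formula(name):
    """Return the chain composition of a tetramer, e.g. 'alpha2gamma2'."""
    alpha_like, beta_like = HAEMOGLOBINS[name]
    return f"{alpha_like}2{beta_like}2"


def haemoglobins_containing(chain):
    """List every haemoglobin in the table that contains the given chain."""
    return [name for name, chains in HAEMOGLOBINS.items() if chain in chains]


print(tetramer_formula("HbF"))
print(haemoglobins_containing("alpha"))
```

Listing the haemoglobins that contain a given chain makes it easy to see which forms would be affected by a defect in that chain.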

The location of erythropoiesis changes simultaneously with the switch of the polypeptide chain composition in the haemoglobin molecule. First haemoglobin is produced in primitive erythroblasts in blood islands of the embryonic yolk sac. After the first trimester, when primitive erythropoiesis gives way to definitive erythropoiesis, the expression of globins switches to the foetal liver, later on to the spleen and finally to the bone marrow.

1.5.1 Transcription factors in globin switching

Globin switching has been investigated for decades and the β-globin locus is most likely one of the best-studied loci in the genome, often serving as a model for developmental gene regulation. Many transcription factors have been described to be important in the regulation of embryonic, foetal and adult globin expression. This subchapter will describe some of these factors and their roles in haemoglobin switching with the main focus on the β-globin locus.

1.5.1.1 Krüppel-like factors

The family of Krüppel-like transcription factors (KLFs) consists of 17 highly-related zinc finger proteins with diverse biological functions. Their carboxy-terminal DNA binding domains are highly conserved whereas their amino-terminal functional domains are highly divergent39. The first member of the family, KLF1, is also known as erythroid KLF (EKLF) because it is abundantly expressed in erythroid tissues40. It has been shown that KLF1 binds to a CACCC motif in the adult β-globin proximal promoter in both mice and humans41,42. Lack of KLF1 in mice results in lethal β-thalassemia but does not affect embryonic yolk sac erythropoiesis, demonstrating the importance of KLF1 in driving adult β-globin expression43. Furthermore, Klf1 knock-out mice failed to completely silence HbF production, indicating a more extensive role of KLF1 in the foetal to adult globin switch than just the regulation of adult β-globin expression. More recently, a study of a Maltese family revealed the missing link by showing that nonsense mutations in the KLF1 gene are associated with high foetal haemoglobin levels44. Simultaneously, another study identified KLF1 as one of the main drivers of the foetal globin repressor BCL11A45 (see below). This suggests that KLF1 has in fact a dual role in globin switching: the direct upregulation of adult β-globin through binding to its promoter and the indirect down-regulation of foetal γ-globin through upregulation of the repressor BCL11A.

The repressor KLF3, another member of the KLF family, is also highly abundant in erythroid tissues and is a direct target of KLF146. KLF1 and KLF3 have been shown to bind to very similar DNA sequences in vitro and in vivo including the CACCC element in the β-globin proximal promoter47. It is believed that KLF3 acts as a feedback repressor of a set of genes that are activated by KLF148.

1.5.1.2 BCL11A and SOX6

The zinc finger protein B-cell lymphoma 11A (BCL11A) is recognized as one of the main repressors of foetal haemoglobin49. A genome-wide association study has revealed a correlation between single nucleotide polymorphisms (SNPs) in the BCL11A gene and high HbF levels, identifying BCL11A as a modulator of HbF expression50. After its discovery, BCL11A’s function in erythropoiesis and globin switching was studied in more detail by creating Bcl11a knockout mice. Transgenic mice carrying the human β-globin locus fail to entirely silence foetal γ-globin expression in the absence of BCL11A51. Furthermore, inactivation of BCL11A in a sickle cell disease mouse model was sufficient to activate HbF expression and correct the disease52.

BCL11A is believed to silence γ-globin transcription through cooperation with SOX653. SOX6 is a member of the Sry-related HMG box transcription factor family and has been shown to bind to the embryonic and foetal globin promoters to silence their expression54. Although knockdown of SOX6 slightly elevated foetal haemoglobin levels in primary human erythroblasts, a much bigger effect was described upon knockdown of both BCL11A and SOX6. Thus it was suggested that BCL11A and SOX6 act cooperatively in silencing γ-globin expression through long-range interactions involving chromatin looping53.

1.5.1.3 GATA1

The zinc finger protein GATA1 is a transcription factor that recognises GATA motifs55 and is one of the master regulators in both erythropoiesis and megakaryopoiesis56. It has been shown to bind to regulatory regions of many erythroid genes including hypersensitive sites in the β-globin LCR and the adult β-globin promoter57,58. Genome-wide DNA binding studies have revealed that GATA1 tends to bind and act in synergy with the transcription factors KLF1 and T cell acute lymphocytic leukaemia 1 protein (TAL1, also known as SCL)59–61. When found in complexes with TAL1, GATA1 mainly acts as an activator46. However, depending on the gene context and interacting partners, GATA1 can also act as a repressor62,63.

There is evidence that GATA1 works in conjunction with other regulatory factors, such as NF-E2, Ikaros, Friend of GATA-1 (FOG1) and the nucleosome remodelling and deacetylase (NuRD) complex, in a stage-specific manner to silence γ-globin64,65. A region upstream of the γ-globin promoters is important for silencing the gene and was shown to be bound by GATA1 in a FOG1-dependent manner. Furthermore, recruitment of HDAC1 and Mi-2 (part of NuRD) to the γ-globin promoter enhances GATA1 binding to regulatory regions in the β-globin locus66.

1.5.1.4 ZBTB7A

Zinc finger and BTB domain containing 7A (ZBTB7A, also known as LRF, POKEMON, FBI-1) is another transcription factor that has been shown to play a role in erythropoiesis. It is a member of the broad complex, tramtrack, bric-à-brac and zinc finger (BTB-ZF) family of transcription factors67. The first studies on ZBTB7A focused on its role as an oncogene in many different cancers and its function in B-cell development68,69. Later it was discovered that ZBTB7A suppresses apoptosis during erythroid cell fate decision by repressing the proapoptotic factor Bim (BCL2L11). Hence, Zbtb7a knockout mice die in utero due to severe anaemia caused by an increase in apoptosis in erythroblasts70.

By binding to the consensus sequence 5’-G(A/G)GGG(T/C)(C/T)(T/C)(C/T)-3’71 ZBTB7A mainly acts as a repressor through recruiting the NuRD chromatin remodelling complex. Recently it was shown that ZBTB7A represses foetal haemoglobin through a mechanism that is independent of BCL11A72. In that study, knock-out of Zbtb7a in transgenic β-YAC mice resulted in reactivation of human foetal haemoglobin. Similar effects were seen upon knock-down of ZBTB7A in human CD34+ haematopoietic stem and progenitor cells (HSPCs), indicating that ZBTB7A is a potent repressor of foetal haemoglobin72. Interestingly, ZBTB7A expression is, similarly to that of BCL11A, driven by GATA170 and KLF173.
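To illustrate how a degenerate consensus such as the one quoted above can be searched for in a DNA sequence, the following minimal sketch translates it into a regular expression. The function name and the example sequence are hypothetical, and only the given strand is scanned; a real binding-site search would also consider the reverse complement and would typically use a position weight matrix rather than a strict consensus.

```python
import re

# Consensus from the text: 5'-G(A/G)GGG(T/C)(C/T)(T/C)(C/T)-3'
# A lookahead is used so that overlapping matches are also reported.
ZBTB7A_CONSENSUS = re.compile(r"(?=(G[AG]GGG[CT][CT][CT][CT]))")


def find_consensus_sites(sequence):
    """Return (0-based start, matched 9-mer) for every consensus match on the given strand."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in ZBTB7A_CONSENSUS.finditer(seq)]


# Hypothetical example sequence, not taken from the globin locus
example = "ttGAGGGTCCCaaGGGGGCTTTgg"
print(find_consensus_sites(example))
```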

1.6 Haemoglobinopathies

Haemoglobinopathies are defined as genetic disorders affecting the haemoglobin genes. They can result either from the production of an abnormal globin chain due to a genetic mutation in the gene or from the lack of globin chain production resulting in thalassemias.

1.6.1 β-haemoglobinopathies

Genetic disorders affecting the β-like globin genes are known as β-haemoglobinopathies. Two of the most common haemoglobinopathies are sickle-cell disease (SCD) and β-thalassemia, with about 250 000 babies born with SCD yearly74 and 1 in 100 000 individuals showing symptoms of β-thalassemia annually75.

SCD is caused by a single genetic mutation in the coding sequence of the β-globin chain. The point mutation results in an amino acid substitution at position 6 in the β-globin protein, leading to the formation of a mutant β-globin chain that forms haemoglobin S (HbS) when paired with α-globin. HbS is biochemically unstable and tends to polymerise when deoxygenated, forming long precipitates of haemoglobin molecules which force the erythrocytes into a long, sickled shape76,77. Clinically the disease manifests through vascular occlusion caused by the sickle-shaped red blood cells. This leads to tissue injury and haemolysis, both contributing to multi-organ dysfunction76. Even with current treatments, SCD usually results in early death, with life expectancy reduced by about 25 to 30 years compared to the average population78.

Unlike SCD, β-thalassemia is not due to a single mutation but can have diverse genetic causes with over 200 different mutations described79. β-thalassemias are characterised by anomalies in the expression of β-globin with varying levels of severity of the disease between individuals. β-thalassemia major (β0) is the most severe type with virtually no β-globin chain production. Patients with β-thalassemia intermedia (β+) present with reduced levels of β-globin and people with β-thalassemia minor are symptom-free (carrier state)75. The clinical severity of the disease is directly related to the extent of imbalance of α- to β-globin chains, as unassembled α-globin chains precipitate, causing oxidative damage and cell death. The premature cell death of erythrocytes leads to a compensatory increase in haematopoiesis and thus patients with β-thalassemia have a dramatically larger number of erythroid precursor cells in the bone marrow80. This is defined as ineffective erythropoiesis and results in both anaemia and bone marrow expansion, leading to extensive bone deformities. The defective development of erythrocytes also induces extramedullary haematopoiesis, which results in enlargement of the spleen and liver81.

Both SCD and β-thalassemia are autosomal recessive disorders with a clinically silent carrier state when heterozygous. In fact, heterozygotes have a selective advantage in certain areas of the world because they are less susceptible to infection by the parasite Plasmodium falciparum, which causes malaria82.

1.6.2 Current treatments

Currently the only readily available curative treatment for β-haemoglobinopathies is a haematopoietic stem cell transplant from a matched donor. If the transplant is successful it results in disease-free survival post transplantation in about 90 % of the cases. However, less than 15 % of patients have access to suitable bone marrow and transplants with matched but unrelated donors are limited by donor availability and complications like graft rejection and graft-versus-host disease83.

Other strategies for treatment of these diseases aim to manage the symptoms. Regular blood transfusions are used to correct the anaemia. In both SCD and β-thalassemia patients, this can result in transfusional iron overload and therefore needs to be accompanied by chelation therapy to prevent liver and heart damage. Nevertheless, liver failure due to excess iron is still one of the most frequent reasons for early death in β0 patients84.

As all β-like globins are able to pair with α chains to form functional haemoglobin but only the adult β-globin is genetically affected in β-haemoglobinopathies, the induction of the silent foetal haemoglobin poses an attractive therapeutic option. Many clinical and laboratory studies have shown that high foetal haemoglobin levels can ameliorate the symptoms. For instance, it was noted that individuals with β-thalassemia who show abnormally high HbF levels generally show a milder course of disease85. Additional studies have demonstrated that HbF interferes with the sickling properties of HbS86. These findings led to the conclusion that high HbF levels are beneficial. Accordingly, many drugs have been developed to induce the expression of foetal haemoglobin postnatally. 5-azacytidine87, hydroxyurea88 and butyrate89 are the most commonly used drugs. While hydroxyurea and butyrate act through non-specific activation of HbF expression, 5-azacytidine was believed to elevate HbF expression by decreasing DNA methylation of the foetal globin genes87, which, however, was later refuted90. Although quite effective, use of each of these drugs results in various side effects and their long-term clinical efficacy still needs to be determined77.

As monogenic diseases that affect the blood system, β-haemoglobinopathies have long been considered to be ideal candidates for gene therapy91. Bone marrow is readily removable from a patient and retransplantable and therefore has a significant advantage over other tissues as a target for gene therapy. So far technological challenges have prevented the clinical application of gene therapy as a broadly used curative treatment. With the recent advances in genome editing technologies, the idea of gene therapy to cure β-haemoglobinopathies has moved back into focus92 and a clinical trial targeting HSCs ex vivo has been started for β-thalassemia patients93.

1.7 Hereditary persistence of Foetal Haemoglobin (HPFH)

Normally HbF levels are downregulated to about 1 % of total haemoglobin in adult life. In people who have a condition called Hereditary Persistence of Foetal Haemoglobin (HPFH), the silencing of foetal haemoglobin is incomplete and results in HbF levels higher than the normal 1 %94. HPFH can be classified into two categories: deletional and non-deletional. This classification is based on the molecular causes for elevated HbF levels, which are described below.

1.7.1 Deletional HPFH

Deletional HPFH is associated with the deletion of large regions of DNA between the  and  globin genes within the β-globin locus. At least six different deletions have been described that result in high levels of HbF95 (Figure 1.4).

Figure 1.4: Deletional HPFH. Shown are the complete β-globin locus and six deletional HPFH mutations that have been described. Bars indicate the location and size of deletion. Adapted from Forget BG, 199895.

All of the deletions comprise the loss of the δ- and β-globin genes and, in the case of HPFH-6, also the Aγ-globin gene, thus resulting in (δβ)0-thalassemia or (Aγδβ)0-thalassemia, respectively. Deletions in HPFH-1 to -5 lead to upregulation of both Gγ- and Aγ-globin genes whereas in HPFH-6 only Gγ chains are present. It is not entirely clear why these deletions result in high HbF. It is, however, widely believed that enhancer elements from the 3’ regions of the β-globin gene become juxtaposed to the γ-globin promoters by the deletion and now drive γ-globin expression due to their newly obtained proximity96,97.

1.7.2 Non-deletional HPFH

The second type of HPFH is referred to as non-deletional HPFH and will be the main focus of this thesis. Unlike the deletional HPFH type, this type of HPFH is not associated with any large deletions in the β-globin locus and is characterised by overexpression of either one of the two γ-globin genes95. Sequence analyses of individuals with this condition revealed small genetic changes in the promoter regions of either their Gγ- or Aγ-globin genes that are linked to the overexpression of the respective gene98. Often these small indels or point mutations are only discovered during genetic screens for haemoglobinopathies by investigating unaffected family members of the patient, underlining the fact that they are benign and do not adversely affect the individual. However, when they occur in combination with other genetic disorders of the β-globin locus such as SCD and β-thalassemia, they become beneficial. Patients with SCD or β-thalassemia who have also inherited non-deletional HPFH show milder symptoms due to high levels of HbF, which compensates for the adult haemoglobin deficiencies and thus ameliorates the severity of disease99–101.

Many point mutations in the γ-globin promoter have been described that are associated with HPFH. Table 1.1 summarizes the reported location of the mutations relative to the transcription start site and HbF levels of the individuals carrying the mutation. Interestingly, the mutations are clustered around two distinct sites: the -200 and the -115 site. The -175 mutation stands out as a lone mutation not surrounded by other known HPFH mutations.

Table 1.1: List of non-deletional HPFH mutations in the γ-globin promoter. Highlighted in red are elevated HbF levels caused by the respective mutations. N/A indicates that data are unavailable for the genotype listed.

Site           5' to TSS      Gene   Mutation         % HbF in homozygote   % HbF in heterozygote   Ref
-115 cluster   -114           Aγ     C>T              N/A                   2.6-4.8                 102
               -114           Aγ     C>G              N/A                   38                      101
               -114           Gγ     C>G              N/A                   11-14                   103
               -114           Gγ     C>T              N/A                   8.6                     104
               -117           Aγ     G>A              20-26                 10-20                   105
               -102 to -114   Aγ     13 bp deletion   N/A                   30-32                   106
-175 site      -175           Aγ     T>C              N/A                   17-29                   107
               -175           Gγ     T>C              N/A                   38-41                   108
-200 cluster   -195           Aγ     C>G              N/A                   4.5-7                   109
               -196           Aγ     C>T              N/A                   12-38                   110,111
               -196           Gγ     C>T              N/A                   8.6                     112
               -197           Aγ     C>T              N/A                   6                       94
               -198           Aγ     T>C              18-21                 1.8-13                  113,114
               -201           Aγ     C>T              N/A                   10.2                    112
               -202           Aγ     C>T              N/A                   1.6-23                  115
               -202           Gγ     C>G              N/A                   15-25                   116

Although most of the HPFH mutations were discovered in the 1980s, the molecular basis for γ-globin activation by the mutations is still not understood. The fact that the mutations form two dense clusters around the -115 and the -200 sites has led to the hypothesis that they disrupt binding of a transcriptional repressor which is necessary for γ-globin silencing. Mutating any of the bases in the DNA binding motif of the repressor may permanently derepress γ-globin and result in elevated HbF levels. The -175 T>C site is different, as there are no other known HPFH mutations within 20 bp of the site, and it has been hypothesised that it may be a gain-of-function mutation creating a new binding site for a transcriptional activator.

Understanding the molecular basis of these HPFH mutations is important for therapeutic reasons. People with HPFH clearly demonstrate that these beneficial mutations alleviate the symptoms of SCD and β-thalassemia. Revealing how these mutations work may help shed light on molecular networks of transcription factors in globin switching and may lead to advances in targeted drug development. However, even more interestingly, HPFH mutations could be directly utilised therapeutically by introducing them into patients with haemoglobinopathies via genome editing (Chapter 1.8) to ameliorate symptoms.

1.8 Genome editing

For many years researchers have been fascinated with the idea of making site-specific small or large changes in the genome of living cells. When it was discovered that cells have endogenous DNA repair mechanisms to fix otherwise lethal DNA double-strand breaks (DSBs), the idea of genome editing became more feasible117.

Site-specific DNA nucleases can find a target sequence in the genome to induce a DSB. This increases the frequency of DNA repair events118,119 and can be exploited for genome editing. The first step in targeted genome editing is to introduce a DNA nuclease into cells that specifically recognises the desired site in the genome and induces a double-strand break at that specific locus. The second step then relies on the natural repair mechanisms of the cell. Mammalian cells have at least two different pathways to repair the break: non-homologous end joining (NHEJ) and homology-directed repair (HDR)120. An outline of the process of genome editing in mammalian cells is depicted in Figure 1.5. Both targetable nucleases and DNA repair pathways will be discussed in this chapter.

Figure 1.5: Schematic of a genome editing strategy. The first step in genome editing is the induction of a DSB by a targetable nuclease. In the second step there are two possible repair pathways that can occur: NHEJ and HDR. NHEJ results in small random insertions and deletions whereas HDR results in a precise change of the targeted DNA.

1.8.1 DNA Nucleases

One focus of genome editing research was to engineer proteins that recognise a specific DNA sequence and also have an effective DNA nuclease domain. Here I will discuss three types of engineered nucleases: zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs) and the CRISPR/Cas9 system.

1.8.1.1 Zinc finger nucleases (ZFNs)

The first nucleases that were engineered to enhance the efficiency of gene targeting were ZFNs. They are composed of a DNA recognition domain containing three or four Cys2-His2 zinc finger modules and a non-specific DNA cleavage domain121,122. Each Cys2-His2 zinc finger recognises approximately 3 bp of DNA, allowing a ZF protein (ZFP) with four ZFs to recognize a sequence of 12 bases in total123. The ZF modules can be engineered to bind to any 3 bp sequence, allowing researchers to design a ZFP that binds specifically to the desired target sequence in the genome. To allow DNA cleavage, ZFPs are often coupled to the DNA cleavage domain of the FokI endonuclease. FokI is a bacterial type IIS restriction enzyme and when its cleavage domain is uncoupled from its original DNA binding domain it can be linked to other DNA-binding proteins to cleave double-stranded DNA122,124. However, it was found that dimerisation of the endonuclease domain of FokI is required for effective cleavage125. Thus, by designing a pair of ZFPs it is possible to specifically target a unique genomic sequence of 24 bp (Figure 1.6).

Figure 1.6: Schematic of a pair of ZFNs bound to their target DNA. Each ZFP is designed to recognise a 12 bp sequence on either side of the desired DSB. When their FokI domains dimerise a DSB is made. Adapted from Urnov et al. 2010123.
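The arithmetic behind the 24 bp recognition site, and why a site of that length is expected to be unique, can be sketched as follows. This is a back-of-the-envelope illustration only: the genome size is an assumed round figure, bases are treated as independent and equally likely, and any spacer between the two half-sites is ignored. The function names are hypothetical.

```python
BP_PER_FINGER = 3  # each Cys2-His2 finger contacts approximately 3 bp


def zfn_pair_site_length(fingers_per_zfp):
    """Total number of bases recognised by a pair of ZFPs (two half-sites)."""
    return 2 * fingers_per_zfp * BP_PER_FINGER


def expected_random_occurrences(site_length, genome_size_bp):
    """Expected number of chance matches of one specific sequence of the given length.

    Assumes independent, equiprobable bases, which real genomes do not satisfy.
    """
    return genome_size_bp / 4 ** site_length


HUMAN_GENOME_BP = 3.1e9  # assumption: approximate haploid human genome size

site = zfn_pair_site_length(4)
print(site)  # 24
print(expected_random_occurrences(12, HUMAN_GENOME_BP))    # single four-finger ZFP
print(expected_random_occurrences(site, HUMAN_GENOME_BP))  # ZFN pair
```

Under these assumptions a single 12 bp half-site is expected to occur by chance many times in a genome of this size, whereas a 24 bp site is expected far less than once, which is consistent with the requirement for a pair of ZFPs.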

1.8.1.2 Transcription activator-like effector nucleases (TALENs)

Almost a decade after the discovery of ZFNs, transcription activator-like effectors (TALEs) started to be described in the literature126,127. They were found in a range of plant species that have been infected with pathogenic bacteria of the genus Xanthomonas128,129 and work as key virulence factors to reprogram host cells by mimicking eukaryotic transcription factors128. Their target specificity relies on a variable number of 34 amino acid repeats with each repeat recognizing one base pair of a DNA sequence130,131. The TALE repeat domains are highly conserved and only differ by a few amino acids. Two hypervariable amino acids at positions 12 and 13 in each individual repeat have been shown to be positioned in the major groove of DNA when the repeat binds its target132. The residue at position 13 can specifically interact with DNA and the combination of residues 12 and 13 determines its affinity for guanine, adenine, cytosine or thymine, respectively133,134. Table 1.2 summarizes combinations of hypervariable residues at positions 12 and 13 and their preferred base for interaction.

Table 1.2: Hypervariable residues in TALEs and their base recognition specificity.

Hypervariable residues   Base recognition
NN, NK, NH               G
NI                       A
HD                       C
NG                       T
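The one-repeat-per-base code in Table 1.2 lends itself to a simple lookup. The sketch below is illustrative only; the function names are hypothetical, and choosing NN to recognise guanine is an arbitrary choice among the three options listed in the table.

```python
# Preferred base for each pair of hypervariable residues (positions 12 and 13), from Table 1.2
RVD_TO_BASE = {"NN": "G", "NK": "G", "NH": "G", "NI": "A", "HD": "C", "NG": "T"}

# One residue pair per base for design purposes; NN for G is an arbitrary pick for this sketch
BASE_TO_RVD = {"G": "NN", "A": "NI", "C": "HD", "T": "NG"}


def rvds_for_target(target):
    """Return the ordered list of hypervariable residue pairs for a target sequence."""
    return [BASE_TO_RVD[base] for base in target.upper()]


def decode_rvds(rvds):
    """Return the DNA sequence preferentially recognised by a list of residue pairs."""
    return "".join(RVD_TO_BASE[rvd] for rvd in rvds)


repeat_array = rvds_for_target("GATTACA")  # hypothetical 7 bp target
print(repeat_array)
print(decode_rvds(repeat_array))
```

Because each repeat reads one base independently of its neighbours, designing an array reduces to a per-base lookup.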

Similarly to ZFNs, the DNA binding domain of a TALE can be coupled to a FokI nuclease domain, resulting in a TALE nuclease (TALEN). By assembling arrays of 34 aa repeats to facilitate binding to a desired target sequence, a pair of TALENs that recognises any unique sequence in the genome can be engineered. Like a pair of ZFNs, pairs of TALENs bind and cleave DNA as a dimer133 (Figure 1.7).

Figure 1.7: Schematic of a pair of TALENs binding to its target. Each TALEN recognises a unique sequence on either side of the desired DSB. By dimerisation of their FokI domains a DSB can be created.

Both ZFNs and TALENs represent useful systems to generate targeted double-strand breaks in the genome. However, the modularity of TALENs makes them more versatile and easier to assemble than ZFNs.

1.8.2 The CRISPR/Cas9 system

The CRISPR/Cas9 system of DNA cleavage works differently to that of ZFNs and TALENs. Unlike ZFNs and TALENs, the CRISPR/Cas9 system does not rely on protein/DNA contact to bind to DNA but instead utilises simple Watson-Crick base pairing between an RNA molecule and the genomic DNA.

CRISPR systems are naturally found as adaptable immune systems in many bacterial species to protect the cell from invading nucleic acids such as those of viruses or plasmids135–138. When a host cell is attacked, the CRISPR system will incorporate DNA sequences from the foreign DNA between Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR) sequences in the host genome. The foreign DNA is then transcribed into a CRISPR RNA (crRNA) before hybridising with another RNA, the transactivating CRISPR RNA (tracrRNA)139. This then forms a complex with a CRISPR-associated (Cas) nuclease, guiding it to complementary sequences in the foreign DNA. If the target sequence is adjacent to a short sequence known as the protospacer adjacent motif (PAM), the Cas nuclease will introduce a double-strand break at that locus to protect the bacterium from the invading nucleic acid140.

The CRISPR/Cas system from S. pyogenes has been exploited for genome editing140. The first component of this engineered system is known as the guide RNA (gRNA), a fusion of a crRNA and a fixed tracrRNA. The second component is the Cas protein 9 (Cas9) nuclease which is able to complex with the gRNA to form an active Cas9/gRNA complex (Figure 1.8). The 20 bp sequence in the crRNA component of the gRNA is interchangeable and can be chosen depending on the desired target sequence in the genome. The only limitation is that the target sequence must be 5’ adjacent to the short PAM sequence NGG. Thus the CRISPR/Cas9 system can be used to cleave any site in the genome with N20-NGG120.

Figure 1.8: Schematic of the CRISPR/Cas9 complex binding and cleaving its target. The Cas9 protein assembles with a guide RNA to form a functional targetable nuclease. It finds its target through complementary base-pairing of the 20-nt crRNA with the target sequence in the genome. If there is an adjacent PAM sequence present, a DSB is made. Adapted from Charpentier & Doudna, 2013141.
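The N20-NGG targeting rule described above can be illustrated with a short sketch. It scans both strands of a DNA string for 20 nucleotides followed directly by an NGG PAM. The function names and the toy sequence are hypothetical and serve only as illustration; they are not part of any tool used in this thesis.

```python
import re

# Hypothetical helper names; only the N20-NGG rule itself comes from the text.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
# A lookahead is used so that overlapping candidate sites are all reported.
_SITE = re.compile(r"(?=([ACGT]{20})([ACGT]GG))")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_cas9_sites(seq: str):
    """Return (strand, start, protospacer, pam) for each N20-NGG site.

    `start` is the 0-based index on the strand that was scanned, so
    minus-strand coordinates refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    sites = []
    for strand, template in (("+", seq), ("-", reverse_complement(seq))):
        for match in _SITE.finditer(template):
            sites.append((strand, match.start(), match.group(1), match.group(2)))
    return sites


if __name__ == "__main__":
    toy = "TTGACCAATAGCCTTGACAAGGCAAACTTGACC"  # arbitrary toy sequence
    for site in find_cas9_sites(toy):
        print(site)
```

Only the requirement for a 3’ NGG PAM is modelled here; guide quality and specificity are not considered.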

The ease of use and the low cost of only altering the 20 bp in the crRNA has made the CRISPR/Cas9 system the genome editing method of choice for many researchers over the last three years.

1.8.2.1 Off-target effects

As the genomes of eukaryotes are large, there are many opportunities for a nuclease to bind and cleave DNA outside its target site. Binding specificity is therefore an important factor in genome editing, especially when considering therapeutic applications. Studies have shown that all nucleases, ZFNs142,143, TALENs144,145 and CRISPR/Cas9145–147, show off-target activity to some extent. Software prediction of potential off-target sites can help to avoid extensive cleavage at undesired places in the genome. Additionally, researchers have engineered improved nucleases that enhance on-target binding and reduce off-target cleavage144,148–150.
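The idea behind software prediction of off-target sites can be sketched in a few lines. The example below is a deliberately simplified assumption of how such a tool might work: it reports every forward-strand position at which a 20-nt stretch followed by an NGG PAM differs from the guide by at most a chosen number of mismatches. Real prediction tools weight mismatch positions and search both strands of a whole genome; the names and the mismatch threshold here are hypothetical.

```python
def count_mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def candidate_off_targets(guide: str, sequence: str, max_mismatches: int = 3):
    """Return (start, protospacer, pam, mismatches) for NGG-flanked sites.

    Only the forward strand is scanned; a perfect match (0 mismatches)
    corresponds to the intended on-target site.
    """
    guide = guide.upper()
    sequence = sequence.upper()
    n = len(guide)
    hits = []
    # A site needs n protospacer bases plus 3 PAM bases.
    for i in range(len(sequence) - n - 2):
        protospacer = sequence[i:i + n]
        pam = sequence[i + n:i + n + 3]
        if pam[1:] != "GG":
            continue
        mismatches = count_mismatches(guide, protospacer)
        if mismatches <= max_mismatches:
            hits.append((i, protospacer, pam, mismatches))
    return hits
```

Sites with few mismatches and an intact PAM would be the first candidates to check experimentally.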


1.8.3 DNA repair mechanisms

Once a double-strand break has occurred the cell will initiate DNA repair mechanisms to avoid cell death. In eukaryotic cells there are two major pathways for the repair of DSBs: Non-homologous DNA end joining (NHEJ) and Homology-directed repair (HDR)151.

1.8.3.1 NHEJ

In humans and mice NHEJ is the more common DNA repair pathway and can occur at any time of the cell cycle. Following a DSB the DNA ends need to be modified to make the broken ends suitable to be joined together. Most frequently a few bases are lost or inserted at each end of the DSB before the ends are fused. The resulting sequence will be imperfect and slightly deviant from the original sequence making this pathway an imprecise but quick method of DNA repair151.

In genome editing NHEJ is mainly employed to generate knock-out cells or knock-out mice. A nuclease targeting the coding sequence of a gene of choice is introduced into the cell cleaving the DNA at its target site. Due to the imprecise repair mechanism of NHEJ the cell fixes the DSB with random insertions and deletions. Frameshift mutations or larger deletions then result in premature stop codons or degradation of the protein, creating a homozygous knock-out phenotype when biallelic.
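Whether an NHEJ-derived insertion or deletion within a coding sequence shifts the reading frame depends only on its net length. As a minimal sketch (the function name is hypothetical), an indel is classified from the lengths of the reference and edited alleles:

```python
def classify_indel(ref_len: int, alt_len: int) -> str:
    """Classify a coding-sequence indel by its net length change.

    A net change that is not a multiple of three shifts the reading frame;
    a multiple of three adds or removes whole codons and stays in frame.
    """
    delta = alt_len - ref_len
    if delta == 0:
        return "no length change"
    return "in-frame" if delta % 3 == 0 else "frameshift"


if __name__ == "__main__":
    for ref, alt in [(30, 29), (30, 27), (30, 34)]:
        print(ref, alt, classify_indel(ref, alt))
```

In-frame indels can still disrupt protein function, so this check identifies likely knock-out alleles rather than proving loss of function.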

1.8.3.2 HDR

The frequency of HDR is much lower than that of NHEJ. While NHEJ can occur at any time point in the cell cycle, HDR is mostly limited to G2 and S phase152,153. HDR is a DNA repair mechanism that results in precisely repaired DNA sequences because the DSB is fixed using a repair template. In diploid cells the repair template is the sister chromatid which is in close proximity during mitosis. When DSBs are detected by the DNA repair machinery and the cell prepares for HDR, the broken ends of DNA are processed to form an extended single-stranded DNA (ssDNA) overhang154. Next, a homology search and DNA-strand invasion is performed followed by the formation of a DNA loop from the sister chromatid which initiates DNA synthesis. The ssDNA overhang anneals with the homologous DNA and the DSB is repaired by filling in the missing base pairs according to the template strand152.

This natural repair mechanism can be exploited in genome editing when a precise change in the DNA sequence is needed. In addition to the nuclease that creates the targeted DSB the cell is also supplied with a DNA template incorporating the desired DNA change. The template needs arms of homology on either side of the modification ranging from 80 bp to 1-2 kb. Researchers have found that circular double-stranded DNA, linear PCR products155 and also single-stranded DNA in the form of oligonucleotides156 can be used as a donor template for HDR in genome editing.
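The layout of such a donor template can be sketched as follows. The example builds a donor for a single-base change flanked by two homology arms; the default arm length of 80 bp is taken from the lower end of the range given above, while the function name, the coordinates and the toy sequence are hypothetical.

```python
def design_point_mutation_donor(genomic: str, pos: int, new_base: str,
                                arm: int = 80) -> str:
    """Return left arm + new base + right arm for a single-base change.

    `pos` is the 0-based index of the base to change in `genomic`.
    """
    genomic = genomic.upper()
    new_base = new_base.upper()
    if new_base not in ("A", "C", "G", "T"):
        raise ValueError("new_base must be one of A, C, G or T")
    if pos - arm < 0 or pos + arm >= len(genomic):
        raise ValueError("not enough flanking sequence for the requested arms")
    if genomic[pos] == new_base:
        raise ValueError("the requested base is already present at this position")
    left_arm = genomic[pos - arm:pos]
    right_arm = genomic[pos + 1:pos + 1 + arm]
    return left_arm + new_base + right_arm


if __name__ == "__main__":
    toy_locus = "ACGT" * 50  # arbitrary 200-bp toy sequence
    donor = design_point_mutation_donor(toy_locus, pos=100, new_base="C")
    print(len(donor), donor[80])
```

In practice the donor is often also altered at the nuclease recognition site to prevent re-cutting; that step is not modelled here.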

1.8.4 Applications

Genome editing has led to the investigation of many genetic features. The rapid establishment of knockout or mutant animal models is one of the main advances that has come with improvements in genome editing157,158. However, the applications of targetable nucleases have become far more extensive. Engineered inactive mutants of the Cas9 protein tethered to repression or activation domains allow researchers to fine tune expression levels of their gene of interest159,160. In other studies the inactive Cas9 has been tethered to fluorescent proteins like EGFP enabling the visualization of DNA on a microscopic scale in living cells161 and tracking of endogenous RNA molecules162.

1.9 Gene therapy

Gene therapy is a field of medicine with the potential to treat or cure genetic diseases. This can be achieved by introducing foreign DNA into the patient’s cells to repair their mutation or by replacing the patient’s dysfunctional gene. Although this seems both attractive and straightforward, there are many hurdles that have to be overcome to put gene therapy into practice. Firstly, the genetic information has to be effectively delivered to the affected tissue where the therapeutic molecule must be expressed in sufficient quantities without disturbing the normal cell functions163. Secondly, gene transfer must be efficient enough to create a large number of permanently edited cells to be able to alleviate the symptoms of the disease.

Haematopoietic stem cell gene therapy has many advantages over gene therapy targeting other organs or tissues of the human body. HSCs have the potency of recreating the entire haematopoietic system and are readily accessible from the bone marrow of the patient. They can be isolated from the patient’s body, genetically modified and expanded ex vivo and then reintroduced by HSC transplantation164. Clinical trials targeting HSCs ex vivo have been performed to treat diseases like β-thalassemia93, Wiskott–Aldrich syndrome165,166, adrenoleukodystrophy167,168 and metachromatic leukodystrophy169, all of which used lentiviral vectors as the mode of gene transfer. Although most of the trials have had successful outcomes and benefited the patient, the use of lentiviral vectors bears certain risks170. Integration of lentiviruses into host DNA may not be random but seems to be enriched in active genes and near cancer-related genes171. Additionally, it is not limited to one integration event per cell which can eventually lead to the formation of ‘dead-end’ circular DNA molecules that may be toxic to the cell172.

With the advances in gene-targeting technologies it is now possible to use a more elegant and targeted approach for gene therapy: genome editing173. The gene therapeutic potential of gene editing by homology-directed repair is readily apparent. For example, diseases that are caused by point mutations can simply be corrected by designing nucleases that create a DSB close to the mutation. After introducing them into the affected cell along with a donor template DNA that contains the wildtype DNA sequence, the cell repairs the DSB by HDR and the mutation is permanently corrected174. This has been successfully demonstrated in human CD34+ haematopoietic stem cells to correct the sickle cell mutation175, making the treatment of SCD by genome editing possible. NHEJ can be applied in cases when the disruption of a genetic element leads to a clinical benefit for the patient. Those genetic elements can include coding sequences, regulatory elements like promoters and enhancers and other intergenic or intronic regions173. For example, downregulation of the repressor BCL11A activates γ-globin expression which is beneficial in β-thalassemia. Disrupting BCL11A or its regulatory elements in a β-thalassemia patient could permanently derepress γ-globin176,177. In another study, researchers have successfully treated human immunodeficiency virus (HIV) patients by targeting their CCR5 gene with ZFNs in T-cells, exploiting the knowledge that the disruption of CCR5 leads to resistance to HIV infection178. This study is now going into phase II clinical trials173.

1.10 Aims and hypothesis

Having high foetal haemoglobin levels in adult life has been shown to be beneficial in patients with β-haemoglobinopathies. People with non-deletional HPFH naturally have high foetal haemoglobin levels caused by small changes in the γ-globin promoter. We hypothesise that these mutations could be utilised to elevate foetal haemoglobin levels in patients with β-thalassemia to ameliorate and possibly cure the disease. Hence, the aims of this thesis were the following.

Firstly, we aimed to investigate the molecular mechanisms that underlie non-deletional HPFH. Understanding why silencing of γ-globin fails due to HPFH mutations may help unravel the importance of these regulatory elements in normal globin switching. There are three main clusters of mutations in the γ-globin promoter. Two different sites of HPFH mutations were studied in this thesis, the -175 site and the -200 cluster. This was done by investigating DNA-binding properties of candidate transcription factors to the regions of interest. Introducing the mutations into various cellular models by genome editing allowed us to observe TF binding in vivo.

Secondly, by introducing naturally occurring HPFH mutations into cellular models we performed a proof-of-principle study for the use of HPFH mutations in gene therapy applications. We determined promoter activity and expression levels of γ-globin to investigate whether the HPFH mutations introduced by gene editing can reproduce high levels of foetal haemoglobin in erythroid cells.


2 CHAPTER 2: Material and Methods

2.1 Materials

2.1.1 Chemicals and Reagents

Below is a list of chemicals and reagents used in the experimental work performed in this thesis, along with details of their suppliers or manufacturers. All chemicals and reagents used were of “molecular biology grade” unless specified otherwise.

. acetic acid (Asia Pacific Specialty Chemicals)

. acrylamide (electrophoresis grade) (Sigma Aldrich, Australia)

. adenosine triphosphate (ATP) (Sigma Aldrich, Australia)

. adenosine 5’-[γ-32P] triphosphate ([γ-32P] ATP) (PerkinElmer Life Sciences, Boston, MA, USA)

. agarose (DNA grade) (Progen Industries)

. agar (Amyl Media)

. albumin, bovine serum, fraction powder V (BSA) (Sigma Aldrich, Australia)

. ampicillin sodium salt (Progen Industries)

. aprotinin (Sigma Aldrich, Australia)

. β-mercaptoethanol (Sigma Aldrich, Australia)

. boric acid (Sigma Aldrich, Australia)

. buffer R (Neon Transfection Kit, Thermo Fisher Scientific, Scoresby, VIC, Australia)

. buffer T (Neon Transfection Kit, Thermo Fisher Scientific, Scoresby, VIC, Australia)

. calcium chloride (Sigma Aldrich, Australia)

. casein peptone (Amyl Media)

. chloroform (Biolab Scientific)

. deoxynucleotide triphosphates (dNTPs) (Sigma Aldrich, Australia)

. dexamethasone (Sigma Aldrich, Australia)


. diethylpyrocarbonate (DEPC) (Sigma Aldrich, Australia)

. dimethylsulfoxide (DMSO) (Sigma Aldrich, Australia)

. dipotassium hydrogen orthophosphate (Sigma Aldrich, Australia)

. dithiothreitol (DTT) (Sigma Aldrich, Australia)

. doxycycline hyclate (doxycycline) (Sigma Aldrich, Australia)

. Dulbecco’s modified Eagle medium (DMEM) (high and low glucose) (Gibco-BRL Thermo Fisher Scientific, Scoresby, VIC, Australia)

. Dynabeads Protein G (Thermo Fisher Scientific, Scoresby, VIC, Australia)

. ethanol (Ajax Finechem)

. ethidium bromide (Roche Molecular Biochemicals)

. ethylenediaminetetraacetic acid (EDTA) (Sigma Aldrich, Australia)

. ethylene glycol-bis[2-aminoethylether]-N,N,N’,N’-tetraacetic acid (EGTA) (Sigma Aldrich, Australia)

. ethylene glycol bis(succinimidyl succinate) (EGS) (Sigma Aldrich, Australia)

. Ficoll® 400 (Sigma Aldrich, Australia)

. foetal calf serum (FCS) (Thermo Fisher Scientific, Scoresby, VIC, Australia)

. formaldehyde (Sigma Aldrich, Australia)

. FuGENE® 6 transfection reagent (Thermo Fisher Scientific, Scoresby, VIC, Australia)

. GeneRuler™ DNA ladder mix (Thermo Fisher Scientific, Scoresby, VIC, Australia)

. glycerol (Asia Pacific Specialty Chemicals)

. glycine (Sigma Aldrich, Australia)

. N-[2-hydroxyethyl]piperazine-N’-[2-ethanesulfonic acid] (HEPES) (Roche Molecular Biochemicals)

. imidazole (Sigma Aldrich, Australia)

. isopropanol (Biolab Scientific, Australia)


. Iscove’s MDM (IMDM) (Thermo Fisher Scientific, Scoresby, VIC, Australia)

. kanamycin sulfate (Sigma Aldrich, Australia)

. leupeptin (Sigma Aldrich, Australia)

. lithium acetate dihydrate (Sigma Aldrich, Australia)

. lithium chloride (Sigma Aldrich, Australia)

. magnesium chloride (Sigma Aldrich, Australia)

. methanol (Sigma Aldrich, Australia)

. 3-[N-Morpholino]propanesulfonic acid (MOPS) (Sigma Aldrich, Australia)

. nickel-nitrilotriacetic acid (Ni-NTA) affinity resin (Qiagen, Chadstone, VIC, Australia)

. N-lauroylsarcosine (Sigma Aldrich, Australia)

. Nocodazole (Sigma Aldrich, Australia)

. NP-40 (Igepal CA-630) (Sigma Aldrich, Australia)

. nucleoside triphosphates (NTPs) (Roche Molecular Biochemicals)

. NuPAGE™ Novex™ 10 % Bis-Tris Protein Gels (Thermo Fisher Scientific, Scoresby, VIC, Australia)

. t-octylphenoxypolyethoxyethanol (Triton X-100) (Sigma Aldrich, Australia)

. penicillin, streptomycin and glutamine solution (Gibco-BRL Thermo Fisher Scientific, Scoresby, VIC, Australia)

. phenol:chloroform:isoamyl alcohol (25:24:1) (Progen Industries)

. phenylmethylsulfonyl fluoride (PMSF) (Sigma Aldrich, Australia)

. phosphate-buffered saline (PBS) tablets (Sigma Aldrich, Australia)

. poly(dI-dC) (Amersham Pharmacia Biotech, Little Chalfont, Buckinghamshire, UK)

. polyoxyethylenesorbitanmonolaurate (Tween™-20) (Sigma Aldrich, Australia)

. potassium chloride (Sigma Aldrich, Australia)


. potassium glutamate (L-Glutamic acid potassium salt monohydrate) (Sigma Aldrich, Australia)

. potassium hydroxide (Sigma Aldrich, Australia)

. propidium iodide (PI) (Thermo Fisher Scientific, Scoresby, VIC, Australia)

. Rainbow™ protein size standards (Amersham GE Healthcare Life Sciences)

. RNase-Free DNase Set (Qiagen, Chadstone, VIC, Australia)

. RNeasy mini Plus kit (Qiagen, Chadstone, VIC, Australia)

. RPMI 1640 medium (Gibco-BRL Thermo Fisher Scientific, Scoresby, VIC, Australia)

. Schneider's Drosophila Medium (Gibco-BRL Thermo Fisher Scientific, Scoresby, VIC, Australia)

. StemSpan™ Serum-Free Expansion Medium (SFEM) (Stemcell Technologies, Tullamarine, VIC, Australia)

. skim milk powder (No Frills)

. sodium acetate (Sigma Aldrich, Australia)

. sodium chloride (Sigma Aldrich, Australia)

. sodium dihydrogen orthophosphate (Sigma Aldrich, Australia)

. sodium-deoxycholate (Sigma Aldrich, Australia)

. sodium dodecyl sulfate (lauryl sulfate sodium salt) (SDS) (Sigma Aldrich, Australia)

. sodium hydrogen carbonate (Asia Pacific Specialty Chemicals)

. sodium hydroxide (Sigma Aldrich, Australia)

. SYBR green PCR master mix (Applied Biosystems, Foster City, CA, USA)

. TO-PRO®-3 live/dead stain (Thermo Fisher Scientific, Scoresby, VIC, Australia)

. tris-hydroxymethyl-methylamine (Tris) (Sigma Aldrich, Australia)

. TRI-REAGENT™ (Sigma Aldrich, Australia)


2.1.2 Enzymes

. alkaline phosphatase (from calf intestine) (Genesearch, Arundel, Qld, Australia)

. proteinase K (Genesearch, Arundel, Qld, Australia)

. Q5 High-Fidelity DNA Polymerase (Genesearch, Arundel, Qld, Australia)

. ribonuclease A (RNase A) (Sigma Aldrich, Australia)

. T4 DNA ligase (Genesearch, Arundel, Qld, Australia)

. T4 polynucleotide kinase (Genesearch, Arundel, Qld, Australia)

. type II restriction endonucleases (Genesearch, Arundel, Qld, Australia)

2.1.3 Antibodies

Antibodies used for Western blots, chromatin immunoprecipitation and supershift are listed below. A rabbit anti-KLF1 polyclonal antibody (anti-KLF1) (raised against amino acids 1-114) was used for KLF1 supershift in EMSAs and has been described previously47. The ChIP experiments were performed with the KLF1 antibody listed below.

2.1.3.1 Primary antibodies

. anti--actin (Cat#A1978, Sigma Aldrich, Australia) . anti-Flag (Cat# f3165, Sigma Aldrich, Australia) . anti-GATA1 (Cat# sc-265 X, Santa Cruz Biotechnology, Santa Cruz, CA) . anti-His (Cat# MA1-21315, Thermo Fisher Scientific, Scoresby, VIC, Australia) . anti-KLF1 (Cat# PA5-18031, Thermo Fisher Scientific, Scoresby, VIC, Australia) . anti-KLF3 (Cat# PA5-18030, Thermo Fisher Scientific, Scoresby, VIC, Australia) . anti-LDB1(Cat# sc-11198X, Santa Cruz Biotechnology, Santa Cruz, CA) . anti-LMO2 (Cat# AF2726, R&D Systems, Minneapolis, MN) . anti-SP1(Cat# sc-17824X, Santa Cruz Biotechnology, Santa Cruz, CA) . anti-TAL1 (Cat# sc-12984X, Santa Cruz Biotechnology, Santa Cruz, CA) . anti-rabbit IgG (Cat# sc-2027, Santa Cruz Biotechnology, Santa Cruz, CA) . anti-goat IgG (Cat# sc-2028, Santa Cruz Biotechnology, Santa Cruz, CA)

2.1.3.2 Secondary antibodies

. ECL™ Anti-Mouse IgG (Cat# NA931V, GE Life Sciences, UK)

2.1.4 Cytokines

. recombinant human stem cell factor (SCF) (Peprotech, NJ, USA)

. recombinant human erythropoietin (EPO) (Peprotech, NJ, USA)

2.1.5 Plasmids

Plasmids that were used for protein overexpression in Cos and SL2 cells are listed in the table below.

Table 2.1: Mammalian and insect cell expression plasmids.

Gene                 Species of origin   Vector            Made/Supplied by
KLF1                 mouse               pSG5              J. Bieker
KLF1                 mouse               pPac              S. Orkins
KLF1 (261-376 aa)    mouse               pcDNA3            M. Crossley
KLF2                 mouse               pc3HA             L. Glimcher
KLF3                 mouse               pMT2              J. Turner
KLF3                 mouse               pPac              A. Funnell
KLF4                 mouse               pMT3              V. Yang
KLF5                 human               N/A               N/A
KLF6                 human               pX                N. Koritschoner
KLF7                 mouse               pCMV-Tag1-Flag    F. Ramirez
KLF8                 human               pcDNA3-Flag       N/A
KLF9                 rat                 pRSV              Y. Fujii-Kuriyama
KLF10                human               pCDNA4/TO.Flag    T. Spelsberg
KLF11                human               pCDNA4/TO.Flag    T. Spelsberg
KLF12                mouse               pCMX-PL1          R. Buettner
KLF13                mouse               pSG5              H. Asano
GATA1                mouse               pRcCMV            N/A
ZBTB7A (370-500 aa)  human               pCDNA3-Flag       A. Funnell

2.1.6 Oligonucleotides

Single-stranded oligonucleotides up to 100 bp were synthesised by Sigma-Aldrich, NSW, Australia. Single-stranded oligonucleotides longer than 100 bp were synthesised by Integrated DNA Technologies (IDT), IA, USA. The names and sequences of oligonucleotides used are listed in the appendix.


2.1.7 Bacterial strains and culture

The bacterial strain used for all subcloning was Escherichia coli alpha select silver efficiency (genotype: F- deoR endA1 recA1 relA1 gyrA96 hsdR17(rk-, mk+) supE44 thi-1 phoA Δ(lacZYA-argF)U169 Φ80lacZΔM15 λ-) (Bioline (Aust) Pty Ltd, NSW, Australia). E. coli were cultured in Luria-Bertani (LB) broth (10 g/L casein peptone, 5 g/L yeast extract, 10 g/L sodium chloride) or on LB-agar plates. LB was made up with MQW and was sterilised by autoclaving. LB-agar plates were prepared by adding 15 g/L bacteriological agar prior to autoclaving. Filter-sterilised ampicillin (100 mg/mL in MQW) or kanamycin (50 mg/mL in MQW) was added to cooled, autoclaved broth to a final concentration of 100 μg/mL or 50 μg/mL, respectively. In the preparation of agar plates, this step was performed immediately prior to pouring.
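The recipe above reduces to simple proportional arithmetic, sketched below. The per-litre masses and the stock and final antibiotic concentrations are those stated in the text; the function names and the example batch volumes are hypothetical.

```python
# Grams per litre, as stated in the LB recipe above.
LB_G_PER_L = {
    "casein peptone": 10.0,
    "yeast extract": 5.0,
    "sodium chloride": 10.0,
}
AGAR_G_PER_L = 15.0  # added only for LB-agar plates


def lb_masses_g(volume_l: float, agar: bool = False) -> dict:
    """Mass of each component (g) needed for `volume_l` litres of LB."""
    masses = {name: g_per_l * volume_l for name, g_per_l in LB_G_PER_L.items()}
    if agar:
        masses["bacteriological agar"] = AGAR_G_PER_L * volume_l
    return masses


def stock_volume_ul(final_ug_per_ml: float, stock_mg_per_ml: float,
                    medium_ml: float) -> float:
    """Volume of antibiotic stock (in microlitres) for a given medium volume."""
    stock_ug_per_ml = stock_mg_per_ml * 1000.0
    stock_ml = final_ug_per_ml * medium_ml / stock_ug_per_ml
    return stock_ml * 1000.0


if __name__ == "__main__":
    print(lb_masses_g(0.5, agar=True))
    print(stock_volume_ul(100, 100, 500))  # ampicillin, 500 mL of medium
    print(stock_volume_ul(50, 50, 500))    # kanamycin, 500 mL of medium
```

Both antibiotics work out to a 1:1000 dilution of the stock.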

2.1.8 Commercial services and kits

Sanger sequencing was performed by the Australian Genome Research Facility Ltd or the Ramaciotti Centre for Genomics, both Sydney, NSW, Australia. Pyrosequencing was performed by the Australian Genome Research Facility Ltd. Techniques that involved the use of a commercial kit were carried out as advised in the manufacturers’ protocols. A list of commercial kits used is displayed below.

. cDNA synthesis was performed with SuperScript® VILO cDNA Synthesis Kit (Invitrogen)

. DNA-free™ (Ambion, Austin, TX, USA) for DNase-treatment of RNA

. RNeasy mini kit (Qiagen, Chadstone, VIC, Australia) for purification of RNA samples

. PureLink HiPure Plasmid Filter Maxi- and Midiprep Kit (Thermo Fisher Scientific, Scoresby, VIC, Australia)

. Wizard SV Gel and PCR Clean-up System (Promega, Alexandria, NSW, Australia)

. PureLink® Genomic DNA Mini Kit (Thermo Fisher Scientific, Scoresby, VIC, Australia)

. QuickExtract™ DNA Extraction Solution (Epicentre, WI, USA)

2.2 Methods

2.2.1 General methods

Routine molecular biological techniques were carried out as outlined in Sambrook et al.179. Page references for each technique are shown below.

. restriction endonuclease digestion of DNA: 5.24-5.32

. agarose gel electrophoresis: 6.1-6.20

. polyacrylamide gel electrophoresis: 6.36-6.43, 6.45, 18.47-18.55

. agarose gel DNA purification: 6.22-6.23

. DNA ligation: 1.63-1.69

. transformation of competent bacterial cells: 1.74, 1.76, 1.86

. phenol/chloroform extraction of DNA: E.3-E.4

. ethanol precipitation of DNA/RNA: E.10-E.14

. mini-preparations of plasmid DNA: 1.21-1.31

. polymerase chain reaction (PCR): 14.1-14.4, 14.14-14.21

. nuclear extracts from cultured cells: 17.8-17.10

2.2.2 Mammalian cell culture and transfection

2.2.2.1 K562 cells

K562 cells were maintained in RPMI1640 supplemented with 10 % foetal calf serum and 1 × penicillin, streptomycin and L-glutamine.

Cells were transfected by nucleofection using a Neon Transfection System (Thermo Fisher Scientific, Scoresby, VIC, Australia). Cells (10⁵) were resuspended in nucleofection buffer T (Neon Transfection Kit, Thermo Fisher Scientific, Scoresby, VIC, Australia) and given three pulses of 1450 V for 10 ms. Cells were then cultured for 48–72 h in RPMI1640 supplemented with 10 % FCS before selection or fluorescence-activated cell sorting (FACS).

2.2.2.2 MEL GdsREDEGFP cells

The mouse erythroleukaemia (MEL) GdsREDEGFP cells used in this work carry the human β-globin locus on a 188-kb bacterial artificial chromosome (BAC) with dsRED as a reporter under the control of the Gγ-globin promoter and EGFP under the control of the β-globin promoter180. These MEL GdsREDEGFP cells were maintained in RPMI1640 (Thermo Fisher Scientific, Scoresby, VIC, Australia) supplemented with 10 % foetal calf serum and 1 × penicillin, streptomycin and L-glutamine.

Cells were transfected by nucleofection using a Neon Transfection System (Thermo Fisher Scientific, Scoresby, VIC, Australia). Cells (10⁵) were resuspended in nucleofection buffer R (Neon Transfection Kit, Thermo Fisher Scientific, Scoresby, VIC, Australia) and given one pulse of 1500 V for 20 ms. Cells were then cultured for 48–72 h in RPMI1640 supplemented with 10 % FCS before selection or fluorescence-activated cell sorting (FACS).

Cells were differentiated by the addition of culture media containing 2 % DMSO for up to 10 days.


2.2.2.3 HUDEP2 cells

Human umbilical cord blood-derived erythroid progenitor 2 (HUDEP2) cells are an immortalised human erythroid progenitor cell line which was developed by Kurita et al. in 2013181. They carry an expression cassette for the fluorescent protein Kusabira-Orange which serves as a live marker in flow cytometry. HUDEP2 cells were grown in StemSpan SFEM (StemCell Technologies) supplemented with SCF (50 ng/mL), EPO (3 IU/mL) (both Peprotech), dexamethasone (10⁻⁶ M) and doxycycline (1 μg/mL) (both Sigma). Media was made up fresh every time before use and supplemented with 1 × penicillin, streptomycin and L-glutamine (Thermo Fisher Scientific, Scoresby, VIC, Australia). Cells were maintained at a concentration of 5 × 10⁵ cells/mL.

Cells were transfected by nucleofection using a Neon Transfection System (Thermo Fisher Scientific, Scoresby, VIC, Australia). Cells (10⁵) were resuspended in nucleofection buffer T (Neon Transfection Kit, Thermo Fisher Scientific, Scoresby, VIC, Australia) and given one pulse of 1100-1300 V for 20-40 ms. Cells were then cultured for 48–72 h in complete StemSpan SFEM lacking 1 × penicillin, streptomycin and L-glutamine. Cells were then sorted into a pool of cells by FACS and returned into culture. After 3-7 days cells were single-cell sorted into 96-well culture plates.
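For quick reference, the Neon nucleofection settings reported in sections 2.2.2.1 to 2.2.2.3 can be collected in a small lookup structure. The sketch below restates only the values given in the text; the class and field names are hypothetical, and fixed settings are stored as ranges with equal bounds.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class NeonSetting:
    buffer: str                  # Neon kit resuspension buffer
    pulses: int                  # number of pulses
    voltage_v: Tuple[int, int]   # (min, max) pulse voltage in volts
    width_ms: Tuple[int, int]    # (min, max) pulse width in milliseconds


NEON_SETTINGS = {
    "K562": NeonSetting("T", 3, (1450, 1450), (10, 10)),
    "MEL": NeonSetting("R", 1, (1500, 1500), (20, 20)),
    "HUDEP2": NeonSetting("T", 1, (1100, 1300), (20, 40)),
}


def describe(cell_line: str) -> str:
    """Return a one-line summary of the settings for a cell line."""
    s = NEON_SETTINGS[cell_line]

    def span(lo_hi: Tuple[int, int]) -> str:
        lo, hi = lo_hi
        return str(lo) if lo == hi else f"{lo}-{hi}"

    return (f"{cell_line}: buffer {s.buffer}, {s.pulses} pulse(s) of "
            f"{span(s.voltage_v)} V for {span(s.width_ms)} ms")


if __name__ == "__main__":
    for name in NEON_SETTINGS:
        print(describe(name))
```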

2.2.2.4 Cos cells

CV-1 in Origin with SV40 genes (Cos) cells are fibroblast-like monkey kidney cells and were used for overexpression of proteins in a mammalian system. Cells were grown in DMEM supplemented with 10 % foetal calf serum and 1 × penicillin, streptomycin and L-glutamine (all Thermo Fisher Scientific, Scoresby, VIC, Australia).

For transfection cells were seeded the day before and grown to a confluency of 50-60 %. Cells were then transfected with 1-5 μg of DNA by FuGENE®6 (Thermo Fisher Scientific, Scoresby, VIC, Australia) according to the manufacturer’s protocol. Cells were harvested for nuclear or cytoplasmic extracts 48 hours post-transfection as previously described182.

2.2.2.5 SL2 cells

SL2 cells are semi-adherent embryonic cells derived from Drosophila melanogaster. Cells were cultured in Schneider’s medium supplemented with 10 % FCS and 1 × penicillin, streptomycin and L-glutamine. They were maintained at 25 °C at a concentration of 3-5 × 10⁶ cells/mL.


For transient transfections cells were seeded at a density of 5 × 10⁵ cells/mL the day before transfection. Cells were then transfected with 1-5 μg of DNA by FuGENE®6 (Thermo Fisher Scientific, Scoresby, VIC, Australia) according to the manufacturer’s protocol. Cells were harvested for luciferase assays according to the manufacturer’s protocol (DualGlo Luciferase System, Promega, Alexandria, NSW, Australia).

2.2.3 Overexpression of proteins in E. coli

Bacterial protein expression and purification were conducted as follows. GATA1 ZF was overexpressed in Escherichia coli BL21 (DE3). LMO2-LDB1 was expressed as a tethered protein183. Proteins were purified using a cation exchange column. TAL1 bHLH and E47 bHLH were coexpressed in the same host. Bacterial lysates of TAL1/E47 were first purified by cation exchange and then further purified with a nickel-nitrilotriacetic acid (Ni-NTA) affinity resin, and bound heterodimers were eluted with imidazole. Before EMSA, proteins were dialysed in dialysis buffer (150 mM NaCl, 10 mM Tris-HCl and 1 mM dithiothreitol, pH 8.0).

2.2.4 Electrophoretic mobility shift assays (EMSAs)

2.2.4.1 EMSAs using radiolabelled probes

EMSAs using nuclear extracts from Cos cells were performed using probes radiolabelled with 32P. Binding reactions for gel shift assays were performed at 4 °C in 30 μL of 10 mM N-2-hydroxyethylpiperazine-N′-2-ethanesulfonic acid (HEPES; pH 7.8)–50 mM potassium glutamate–5 mM MgCl2–1 mM EDTA–1 mM dithiothreitol (DTT)–5 % glycerol–10 μg of bovine serum albumin–1 μg of poly(dI-dC)–0.5 ng of 32P-labelled double-stranded oligonucleotide probe and 5-10 μL of nuclear extract. If applicable, antibodies for supershift were added to the reaction. After 30 min at 4 °C, reaction mixtures were loaded onto a 6 % non-denaturing polyacrylamide gel in 0.5 × Tris-borate-EDTA and subjected to electrophoresis at 12 V/cm at 4 °C for 2 h. The gels were dried and subjected to autoradiography.

2.2.4.2 EMSAs using fluorescein labelled probes

DNA-binding assays with purified proteins were performed using 5′ fluorescein-labelled double-stranded DNA probes (Sigma-Aldrich). Before EMSA, proteins were dialysed in dialysis buffer (150 mM NaCl, 10 mM Tris-HCl and 1 mM dithiothreitol, pH 8.0). They were then incubated with labelled oligo (5 nM) in mobility shift buffer (MSB) (10 mM HEPES, pH 7.9, 30 mM NaCl and 1 mM MgCl2) for 30 min at 4 °C. Ficoll loading dye was added to the reaction and samples were loaded onto an 8 % non-denaturing polyacrylamide gel in 0.5 × Tris–borate–EDTA and subjected to electrophoresis at 17 mA for 2 h at room temperature (RT). After electrophoresis, gels were imaged using a Typhoon FLA-9500 imager (GE Healthcare Life Sciences).

2.2.5 RNA extraction and cDNA synthesis

Cell cultures were washed with PBS prior to RNA extraction. Total RNA was extracted with TRI-REAGENT (Sigma) as per the supplier’s protocol but with an additional centrifuge step at 12,000 g for 10 min at 4 °C following homogenisation to reduce possible genomic DNA contamination. To further reduce genomic and other contamination, RNA was cleaned by the use of RNeasy kits (Qiagen, Chadstone, VIC, Australia) and was subsequently ‘rigorously’ DNase-treated with DNA-free kits as instructed by the supplier (Thermo Fisher Scientific, Scoresby, VIC, Australia).

Approximately 10 μg of RNA was used for cDNA synthesis using SuperScript® VILO cDNA Synthesis Kit (Thermo Fisher Scientific, Scoresby, VIC, Australia) in which random hexamers were used as primers for generation of first strand cDNA. For each sample, a negative control was also set up in the absence of SuperScript III reverse transcriptase (-RT) to check for genomic DNA contamination.

2.2.6 Quantitative real-time RT-PCR and Fluidigm

Total RNA (up to 10 μg, assessed by Nanodrop) was used as a template for cDNA synthesis using the SuperScript III First-Strand Synthesis System kit. Reactions were primed with random hexamers rather than oligo (dT) primers to achieve more efficient reverse transcription from RNA templates with long 3’ untranslated regions (UTRs). The use of random hexamers also enabled cDNA synthesis from rRNA, thus allowing the use of 18S levels as standards for normalisation. Quantitative real-time PCRs (qPCR) (final volume 20 μL) were set up with Power SYBR Green PCR Master Mix (Thermo Fisher Scientific, Scoresby, VIC, Australia) and were run with the default cycle parameters of the Applied Biosystems® 7500 Fast Real-Time PCR System. Approximately 10 ng cDNA (assuming 100 % reverse transcription efficiency) was employed in each qPCR in duplicate. Expression levels of genes of interest were normalised against 18S rRNA levels. As negative controls, minus RT and no-template reactions were always included. Data from qPCRs were analysed with Applied Biosystems® 7500 Real-Time PCR System software.
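Normalisation against 18S rRNA can be illustrated with the widely used 2^-ΔCt calculation. The text does not state which calculation was applied, so the sketch below rests on two assumptions: that relative expression is derived from threshold cycle (Ct) values, and that amplification efficiency is close to 100 %. Function names and Ct values are hypothetical.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def relative_expression(ct_gene, ct_reference):
    """Expression of a gene relative to a reference gene (2^-dCt).

    Each argument is a list of replicate Ct values (for example duplicates);
    replicates are averaged before the difference is taken. Assumes that
    each PCR cycle doubles the amount of product.
    """
    delta_ct = mean(ct_gene) - mean(ct_reference)
    return 2.0 ** (-delta_ct)


if __name__ == "__main__":
    # Hypothetical duplicate Ct values for a gene of interest and 18S rRNA.
    print(relative_expression([25.0, 25.2], [15.0, 15.2]))
```

Because 18S rRNA is far more abundant than most mRNAs, the resulting values are small and are most useful when compared between samples.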

K562 tdTomato samples were assayed by quantitative real time PCR (qRT–PCR) with a FLEXSix Fluidigm Dynamic Array integrated fluidic circuit (Fluidigm) using EvaGreen dye on a BioMark System (Fluidigm).


2.2.6.1 qPCR primers

Primer Express™ software (Thermo Fisher Scientific, Scoresby, VIC, Australia) was used to design paired qPCR primers. Primer pairs were designed to straddle exon-exon junctions where possible to prevent amplification of any contaminating genomic DNA. Specificity of primers was verified by conducting genomic sequence searches using the Basic Local Alignment Search Tool (NCBI). Final primer concentrations lay between 200 nM and 600 nM. Amplification efficiencies of primer pairs were tested using standard curves and were all found to be within an acceptable range close to 1. A list of all qPCR primers used is found in Table I.1 in the Appendix.
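Amplification efficiency is commonly derived from the slope of a standard curve of Ct against log10 template amount, using E = 10^(-1/slope) - 1, so that E = 1 corresponds to perfect doubling in every cycle. The sketch below assumes this standard formula and a simple least-squares fit; the function names and the dilution series are hypothetical.

```python
import math


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def amplification_efficiency(template_amounts, ct_values):
    """Efficiency E from a dilution series; E = 1 means perfect doubling."""
    log_amounts = [math.log10(a) for a in template_amounts]
    slope = least_squares_slope(log_amounts, ct_values)
    return 10 ** (-1.0 / slope) - 1.0


if __name__ == "__main__":
    # Hypothetical ten-fold dilution series with idealised Ct values.
    amounts = [1.0, 0.1, 0.01, 0.001]
    cts = [20.0 + k * math.log2(10) for k in range(4)]
    print(amplification_efficiency(amounts, cts))
```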

2.2.7 Western Blotting

Nuclear extracts were prepared as previously described. To detect overexpressed Flag-tagged ZBTB7A (370-500 aa), 5 µL of COS-cell extract was loaded onto a NuPAGE™ Novex™ 10 % Bis-Tris Protein Gel (Thermo Fisher Scientific, Scoresby, VIC, Australia) and samples were run in MOPS buffer at 200 V for 50 min. Proteins were then transferred onto a nitrocellulose membrane at 30 V for 60 min. Blots were probed for Flag-ZBTB7A using an anti-Flag antibody (Sigma Aldrich, Australia) followed by 1 μL horseradish peroxidase-linked anti-mouse antibody in 15 mL TBST. Rainbow™ protein standards (Amersham GE Healthcare Life Sciences) were loaded in each gel for size estimation. Densitometry of relative band intensities was performed using the ImageJ Gel Analysis tool. Briefly, ImageJ was used to select and determine the background-subtracted density of the bands in the indicated blots.

2.2.8 Genome editing

2.2.8.1 Tal-Effector-Nucleases (TALENs)

γ-globin TALENs were kindly donated by Matthew H. Porteus (Stanford University, CA) and are described in Voit et al.184. They are expressed from a pcDNA3.1 (Invitrogen) vector driven by a cytomegalovirus (CMV) promoter. They were synthesized using a Golden Gate cloning strategy185 with a Δ152 N-terminal domain and a +63 C-terminal domain186.

2.2.8.2 Targeting vector construction

The targeting vector (tdTomato) was kindly donated by Matthew H. Porteus (Stanford University, CA). The −175T>C and the -195C>G mutations were introduced into the targeting vector by site-directed mutagenesis (Q5 SDM Kit, New England Biolabs), and their presence was confirmed by Sanger sequencing (Australian Genome Research Facility).

A targeting vector containing ECFP in place of tdTomato was generated by PCR from pECFP-C1 (Clontech) for the ECFP fragment (F: 5′-CTCCTAGTCCAGACGCCATGGTGAGCAAGGGCGAG-3′, R: 5′-ATTAATGCATTTACTTGTACAGCTCGTCCATGCC-3′) and PCR from the tdTomato targeting vector for the 5′ γ-promoter region (F: 5′-ATTAAAGCTTGATATCGAATTCGATT-3′, R: 5′-GGCGTCTGGACTAGGAG-3′), followed by overlap extension PCR of both fragments (F: 5′-ATTAAAGCTTGATATCGAATTCGATT-3′, R: 5′-ATTAATGCATTTACTTGTACAGCTCGT-3′). The resulting PCR product 5′-ECFP was then ligated into the target vector backbone via the restriction enzyme sites HindIII and NsiI.
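
The primer design for the overlap extension PCR can be checked with a short script. The sketch below uses only the primer sequences listed above; the variable and function names are illustrative. It confirms that the 5′ end of the ECFP forward primer is the reverse complement of the promoter reverse primer (the overlap that lets the two fragments anneal), and that the outer primers carry the HindIII (AAGCTT) and NsiI (ATGCAT) recognition sites.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


ECFP_F = "CTCCTAGTCCAGACGCCATGGTGAGCAAGGGCGAG"
ECFP_R = "ATTAATGCATTTACTTGTACAGCTCGTCCATGCC"
PROMOTER_F = "ATTAAAGCTTGATATCGAATTCGATT"
PROMOTER_R = "GGCGTCTGGACTAGGAG"

# The promoter fragment ends with the reverse complement of its R primer;
# the ECFP fragment must start with the same bases for the two to anneal.
overlap = reverse_complement(PROMOTER_R)
fragments_overlap = ECFP_F.startswith(overlap)

# Restriction sites used for ligation into the vector backbone.
has_hindiii_site = "AAGCTT" in PROMOTER_F
has_nsii_site = "ATGCAT" in ECFP_R

print(overlap, fragments_overlap, has_hindiii_site, has_nsii_site)
```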

2.2.8.3 Cloning of sgRNA plasmids for CRISPR/Cas9 genome editing

For CRISPR/Cas9 genome editing, a plasmid encoding both the Cas9 protein and the single guide RNA (sgRNA) was used. pSpCas9(BB)-2A-GFP (pX458) was a gift from Feng Zhang (Addgene plasmid # 48138)187. The Cas9 sequence is coupled to a T2A site and EGFP, so expression of Cas9 protein results in simultaneous expression of EGFP, allowing for selection of successfully transfected cells. We designed sgRNA sequences using the optimised CRISPR design online tool provided by the Zhang lab at the Massachusetts Institute of Technology, Cambridge, MA188. pX458 was cut with BbsI restriction enzyme and phosphatase treated. A pair of complementary oligos containing the 20-nt target sequence of the sgRNA was ordered from Sigma. The fwd oligo was ordered with a CACC overhang at its 5’ end and the rev oligo with an AAAC overhang at its 5’ end. These overhangs allow ligation of the annealed oligo with pX458 after BbsI digestion. If the 20-nt genomic sequence did not start with a guanine, an extra G was added to the 5’ end of the 20-nt sequence of the fwd oligo and an extra C was added to the 3’ end of the rev oligo.

Oligos were phosphorylated and annealed, resulting in a double-stranded piece of DNA:

5’ CACC(G)-NNNNNNNNNNNNNNNNNNNN 3’
3’     (C)-NNNNNNNNNNNNNNNNNNNN-CAAA 5’

A ligation was set up containing the annealed oligo and BbsI-digested pX458. E. coli were transformed with the ligation mixture and colonies were screened by Sanger sequencing for positive transformants.
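
The oligo design rules described in this section can be expressed as a short function. This is a sketch of those rules rather than the tool actually used; the function names and the example target sequence are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def design_sgrna_oligos(target):
    """Return (fwd, rev) oligos, written 5'->3', for cloning into BbsI-cut pX458.

    fwd: CACC + optional G + 20-nt target
    rev: AAAC + reverse complement of target + optional C
    """
    target = target.upper()
    if len(target) != 20 or set(target) - set("ACGT"):
        raise ValueError("target must be 20 nt of A, C, G or T")
    add_g = not target.startswith("G")
    fwd = "CACC" + ("G" if add_g else "") + target
    rev = "AAAC" + reverse_complement(target) + ("C" if add_g else "")
    return fwd, rev


# Hypothetical 20-nt target, for illustration only.
fwd, rev = design_sgrna_oligos("ACTTACAGATTACAGATTAC")
print(fwd)
print(rev)
```

Outside the 4-nt overhangs, the two oligos are reverse complements of each other, which is what allows them to anneal into the duplex shown above.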

2.2.8.4 ssODN design

Single-stranded oligodeoxynucleotides (ssODNs) were used as DNA donor templates in CRISPR/Cas9 genome editing strategies. Oligos were ordered as Ultramers from Integrated DNA Technologies (IDT) and included the desired base pair change flanked by 75 bp arms of homology on either side of the mutation. Oligos were diluted to a concentration of 50 µM and 0.2 µL of that stock (10 pmol) was used as a donor template in each nucleofection.
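
A donor of this layout (75 nt, one substituted base, 75 nt) can be assembled from a reference sequence as sketched below. The reference sequence here is a synthetic placeholder, not the γ-globin promoter, and the function names are hypothetical.

```python
def design_ssodn(reference, position, new_base, arm=75):
    """Build a donor: `arm` nt upstream + substituted base + `arm` nt downstream.

    `position` is the 0-based index of the base to change in `reference`.
    """
    if position - arm < 0 or position + arm >= len(reference):
        raise ValueError("reference too short for the requested homology arms")
    left = reference[position - arm:position]
    right = reference[position + 1:position + 1 + arm]
    return left + new_base + right


def donor_pmol(concentration_um, volume_ul):
    """Amount of oligo in pmol (µM multiplied by µL gives pmol)."""
    return concentration_um * volume_ul


# Synthetic placeholder reference with a T at index 80.
reference = "A" * 80 + "T" + "G" * 80
donor = design_ssodn(reference, 80, "C")
print(len(donor), donor[75], donor_pmol(50, 0.2))
```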

2.2.8.5 Screening for correctly edited clones

After single cells were sorted into 96-well plates, the plates were left in the incubator for 7-14 days or until small colonies could be detected by the naked eye. Clonal populations were then transferred into one master 96-well plate. From each well, 50 µL of a 70 % confluent culture was transferred into a PCR plate and spun down. The supernatant was discarded by flicking the plate. Each cell pellet was resuspended in 50 µL of QuickExtract™ DNA Extraction Solution (Epicentre, WI, USA). The plate was heated to 98 ºC for 2 min followed by incubation at 65 ºC for 6 min. 5 µL of this extract was used in a 30 µL PCR amplifying the edited region. Primers were designed to lie outside of the arms of homology to ensure that the DNA template had been inserted at the correct genomic locus. PCR products (300-1200 bp) were sent for Sanger sequencing (AGRF and Ramaciotti Centre) and the genotype was determined.

To avoid selecting clones that had a large deletion on the second allele, genomic DNA was extracted with the PureLink® Genomic DNA extraction kit (ThermoFisher). 10 ng of genomic DNA was used to run quantitative real-time PCR across the edited locus to determine copy number. Ct values were normalised against Ct values from a PCR over a control region in the Zfpm1 gene and compared to the known copy number of the WT cells.
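
One way to turn these Ct values into a copy number estimate is a ΔΔCt calculation relative to WT cells, sketched below. The thesis does not give the exact formula, so this calculation (which assumes doubling per cycle for both amplicons), the function name and the Ct values are assumptions.

```python
def copy_number(ct_locus_clone, ct_ref_clone, ct_locus_wt, ct_ref_wt, wt_copies):
    """Estimate copies of the edited locus in a clone relative to WT cells."""
    delta_clone = ct_locus_clone - ct_ref_clone   # normalise to control region
    delta_wt = ct_locus_wt - ct_ref_wt
    return wt_copies * 2.0 ** -(delta_clone - delta_wt)


# Hypothetical values: the clone amplifies one cycle later than WT at the
# edited locus, consistent with loss of half of the template copies.
print(copy_number(26.0, 24.0, 25.0, 24.0, wt_copies=2))
```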

2.2.9 Fluorescence-activated cell sorting and flow cytometry Cells were sorted with a BD Influx™ cell sorter (BD Biosciences, San Jose, CA) or a BD FACSAria™ III cell sorter (BD Biosciences, San Jose, CA). Post-transfection HUDEP2 cells were first pool sorted for EGFP and Kusabira-Orange positive cells into 15 mL falcon tubes containing SFEM media + supplements and then put back into culture for 6 days to allow the cells to recover. They were then single-cell sorted (EGFP negative, Kusabira-Orange positive) into 96-well plates containing the appropriate growth media to obtain monoclonal populations of cells. K562 and MEL GdsREDEGFP cells (live cells using TO-PRO®-3) were directly sorted into 96-well plates. Gates were set using the BD FACSDiva™ software (BD Biosciences, San Jose, CA). tdTomato, EGFP, ECFP and dsRED fluorescence was detected with a BD LSRFortessa™ cell analyser (BD Biosciences, San Jose, CA). Data was analysed using FlowJo software (FlowJo, LLC, Ashland, OR).

2.2.10 Chromatin immunoprecipitation

ChIP experiments were performed as previously described. Briefly, approximately 5 × 10⁷ cells were used for each IP. Cells were crosslinked with 1 % formaldehyde (Sigma-Aldrich) for 10 min at RT and the reaction was quenched with glycine at a final concentration of 125 mM. For LDB1 ChIP, cells were crosslinked with ethylene glycol bis(succinimidyl succinate) (EGS) at a final concentration of 1.5 mM for 30 min followed by 1 % formaldehyde crosslinking for 10 min. Crosslinked cells were then lysed and sonicated to obtain ~200–300 bp fragments of chromatin. DNA was pulled down at 4 °C overnight using the antibodies (15 μg) specified in Section 2.1.3.1. Chromatin was then reverse crosslinked and eluted at 65 °C overnight and DNA was purified. Real-time qPCR was performed on ChIP material on a 7500 Fast Real-Time PCR System (Applied Biosystems).

2.2.11 Pyrosequencing Quantification of WT or −175 T>C alleles before and after ChIP was determined by pyrosequencing. ChIP material was amplified with primers F: 5′-biotin- CAAGGCTATTGGTCAAGGCAA-3′ and R: 5′-TTCCCCACACTATCTCAATGCAAA -3′ on a 7500 Fast Real-Time PCR System (Applied Biosystems). Pyrosequencing was performed using AGRF’s PyroMark Sequencing Service (Qiagen, Chadstone, VIC, Australia) with sequencing primer 5′- CACACTATCTCAATGCAAA -3′.

2.2.12 Chromatin conformation capture (3C)

The 3C assay was performed using ~5 × 10⁶ cells per experiment. Cells were crosslinked with 1.5 % formaldehyde at room temperature for 10 min, followed by glycine quenching, cell lysis, HindIII (1,000 U) digestion overnight and T4 ligation (400 U) for 4–5 h at 16 °C followed by 30 min at room temperature (both New England Biolabs). 3C ligation products were quantified in triplicate by real-time qPCR. Primer sequences were previously described189 and are listed in Table I.3 in the Appendix. Primers were tested by serial dilution and gel electrophoresis to ensure specific and linear amplification (Figure II.1 in the Appendix). Digestion efficiencies were monitored by qPCR with primer pairs that amplify genomic regions spanning or avoiding HindIII digestion sites. Only samples with efficiencies >75 % were considered for analysis. A BAC containing the entire human β-globin locus (pEBACGdsREDEGFP)190 was digested with HindIII and religated to generate random ligation products of HindIII fragments for transgenic MEL cell experiments. For the 3C in K562 cells, we used a BAC containing the unmodified human β-globin locus (pBAC 148). The ligated BAC DNA was serially diluted and used to generate standard curves for each primer pair, to which all 3C products were normalised. The 3C signals at the β-globin locus were further normalised to those from an intervening genomic region.
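
The digestion efficiency check can be illustrated with a commonly used ΔΔCt calculation that compares a site-spanning amplicon with a site-avoiding control amplicon in digested and undigested samples. The thesis does not state its exact formula, so the calculation below, the function names and the Ct values are assumptions; only the 75 % cut-off comes from the text.

```python
def digestion_efficiency(ct_site_dig, ct_ctrl_dig, ct_site_und, ct_ctrl_und):
    """Fraction of HindIII sites cut, assuming doubling per cycle.

    'site' amplicons span a HindIII site; 'ctrl' amplicons avoid one.
    'dig' and 'und' denote digested and undigested samples.
    """
    ddct = (ct_site_dig - ct_ctrl_dig) - (ct_site_und - ct_ctrl_und)
    return 1.0 - 2.0 ** -ddct


def passes_threshold(efficiency, threshold=0.75):
    """Apply the >75 % inclusion criterion described in the text."""
    return efficiency > threshold


# Hypothetical Ct values, for illustration only.
eff = digestion_efficiency(28.0, 25.0, 25.0, 25.0)
print(eff, passes_threshold(eff))
```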

2.2.13 Transient Transactivation Assays

Luciferase reporter assays were performed 48 h after transfection using the Promega Luciferase Assay System according to the manufacturer's instructions. Briefly, SL2 cells were transfected with 1 μg of either pGL4.10 (Firefly luciferase) empty reporter plasmid, pGL4.10 reporter plasmid containing the WT globin promoter (-385/+50 from TSS) or pGL4.10 reporter plasmid containing the -198T>C globin promoter (-385/+50 from TSS). Additionally, different amounts of pPac-Empty, pPac-KLF1 or pPac-KLF3 and 100 ng of pGL4.74 (Renilla luciferase) were added to each transfection. The total amount of DNA was kept constant across all transfections.
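
Dual-reporter data of this kind are typically analysed by dividing the Firefly signal by the co-transfected Renilla signal and expressing the result relative to a baseline condition. The sketch below shows that calculation; the thesis does not describe its analysis in this detail, so the approach, the function names and the readings are assumptions.

```python
def normalised_activity(firefly, renilla):
    """Firefly signal corrected for transfection efficiency via Renilla."""
    return firefly / renilla


def fold_activation(sample, baseline):
    """Normalised activity of a sample relative to a baseline condition."""
    return sample / baseline


# Hypothetical luminometer readings, for illustration only.
empty_vector = normalised_activity(firefly=200.0, renilla=100.0)
with_factor = normalised_activity(firefly=800.0, renilla=100.0)
print(fold_activation(with_factor, empty_vector))
```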

2.2.14 Mitotic arrest and cell cycle analysis

HUDEP2 and K562 cells were treated with 100 ng/mL nocodazole to induce mitotic arrest. After 15 h, nocodazole was removed from the media by centrifuging the cells and washing them twice in PBS. Cells were then resuspended in complete media and grown further in the incubator at 37 °C until the time of harvest. Cells were harvested for ChIP and flow cytometry 1 h, 2 h, 3 h, 5 h, 7 h and 24 h after they had been released back into culture.

For cell cycle analysis by flow cytometry, cells were fixed with cold 70 % ethanol. They were treated with RNase to ensure that only DNA was stained. Propidium iodide was added to the cell suspension to stain the DNA content. Samples were run on a BD LSRFortessa™ cell analyser and data were analysed using the cell cycle analysis algorithm provided with the FlowJo software.

2.2.15 Statistical analysis

Statistical analysis was performed using GraphPad Prism software. Significance was determined by unpaired two-tailed t-tests, with correction for multiple comparisons using the Holm–Sidak method.
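
For readers unfamiliar with the Holm–Sidak method, the sketch below shows the standard step-down adjustment applied to a set of p-values from individual t-tests. It is a generic implementation and not GraphPad Prism's code; the function name and example p-values are hypothetical.

```python
def holm_sidak(p_values):
    """Return Holm-Sidak adjusted p-values in the original order.

    The k-th smallest p-value (k = 0, 1, ...) is adjusted to
    1 - (1 - p)^(m - k), and adjusted values are forced to be
    non-decreasing along the sorted order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        adj = 1.0 - (1.0 - p_values[idx]) ** (m - rank)
        running_max = max(running_max, adj)
        adjusted[idx] = min(running_max, 1.0)
    return adjusted


# Hypothetical p-values from three unpaired t-tests.
print(holm_sidak([0.01, 0.04, 0.03]))
```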

3 CHAPTER 3 – The -175 T to C mutation

3.1 Chapter 3 Introduction

The haemoglobin switch, and especially the developmental silencing of foetal haemoglobin, has been studied extensively for many years. Many of these studies relate to adult haemoglobin disorders such as SCD and β-thalassemia. As it has been shown that high foetal haemoglobin levels in adult life can ameliorate symptoms of β-globin disorders, one main target in the search for treatments for these diseases is to reactivate the silenced γ-globin gene. Hereditary Persistence of Foetal Haemoglobin (HPFH), the condition where expression of foetal haemoglobin continues into adulthood, was first described in the 1960s and 1970s by various groups of researchers191,192. Many HPFH variants were discovered in genetic screens of families affected by other disease-causing haemoglobin mutations. It was noticed that symptoms of disease were much milder in individuals who carried HPFH variants in addition to the disease-causing mutations, and it was suggested that high levels of foetal haemoglobin are beneficial193.

As described in Chapter 1, HPFH can be caused by deletional and non-deletional mutations. This thesis will focus solely on non-deletional variants. Most of the non-deletional variants that result in high foetal haemoglobin have been mapped to the proximal promoter of one of the two tandem foetal γ-globin genes95. These are often single base-pair substitutions. The HPFH mutations that will be investigated in this thesis are shown in Figure 3.1. The HPFH mutation at position -175 will be the focus of this chapter.

Figure 3.1: Non-deletional HPFH mutations in the proximal promoter of γ-globin. Shown is the DNA sequence of the proximal promoter of the γ-globin gene. HPFH mutations that will be discussed in this thesis are marked with arrows. Boxed is the -175 site which will be investigated in this chapter.

3.1.1 The -175 T>C HPFH mutation elevates foetal haemoglobin levels in humans

At least eight individuals from five unrelated families (with different ethnic backgrounds) have been described who carry a T to C substitution at position −175 in the foetal γ-globin promoter. The mutation was first described by Surrey et al. in 1988108 and was later confirmed by other groups to be associated with significantly elevated levels of foetal haemoglobin107,194,195. As shown in Table 3.1, clinical data indicate that the level of HbF in these individuals varies between 16 % and 41 % of total haemoglobin. Thus, this mutation is associated with HPFH in humans in vivo95,196. All individuals have been reported to be heterozygous for the -175 T>C mutation and three of them also carried other β-globin abnormalities.

The work described in this chapter uncovered the molecular mechanism of the single base-pair substitution at the -175 site in the foetal γ-globin promoter. We first investigated the binding of different transcription factors to the -175 site of the γ-globin promoter in vitro and then established two different cell line models carrying the -175 mutation. We also performed flow cytometry, ChIP and chromatin conformation capture (3C) experiments on those cells to investigate the molecular mechanism by which this mutation functions.

Table 3.1: Clinical data describing the -175 T to C HPFH mutation in humans

Ethnic background | Relation | Gender/Age | Health status | 5' Gγ | 5' Aγ | Other abnormalities | HbA2 [%] | HbA [%] | HbS/C [%] | HbF [%] | Composition of HbF | Ref
- | - | Adult | healthy | normal | normal | - | 2-3% | 95-98% | 0 | <2 % | 66% Aγ, 33% Gγ | 34
Italian (Sardinian) | Proposita | F49 | healthy | -175 T>C (Het) | Normal | - | 1.1 | ? | - | 17 | 90-92% Gγ | 195
Italian (Sardinian) | Daughter | F20 | healthy | -175 T>C (Het) | Normal | - | 1.3 | ? | - | 21 | 90-92% Gγ | 195
Black American | | F27 | healthy | -175 T>C (Het) | Normal | βS (Het, in trans to -175 T>C) ⱡ | 1.9 | 27.5 | 40.9 | 29.5 | 100% Gγ | 108, 192
Black | | M25 | mild microcytosis | -158 C>T (Het, in cis to -175 T>C) † | -175 T>C (Het), -369C>G‡, -16C>Gᶲ, +24A>C | βC (Het, in trans to -175 T>C) Ⱡ | 1.6 | 15.7 | 45.4 | 38 | ~80% Aγ, ~20% Gγ | 107
Black | | F30 | healthy | -158 C>T (Het, in cis to -175 T>C) † | -175T>C (Het) | βS (Het, in trans to -175 T>C) ⱡ | 1.9 | 16.8 | 40.4 | 40.9 | ~66% Aγ, ~34% Gγ | 194
British | Propositus | M3 | thrombocytopenic, macrocytic | -175 T>C (Het), -158 C>T | 4 bp deletion (Het, -222 to -225)* | Trisomy 8 (mosaic) | 1.1 | ? | - | 21.6 | >90% Gγ | 197
British | Mother | F | healthy | -175 T>C (Het) | 4 bp deletion (Het, -222 to -225)* | - | 1.5 | ? | - | 20.1 | >90% Gγ | 197
British | Aunt | F | healthy | -175 T>C (Het) | 4 bp deletion (Het, -222 to -225)* | - | ? | ? | - | 16.2 | ? | 197

† This variant (T at position -158) has been shown to result in high Gγ/Aγ ratios in patients with sickle cell disease or β thalassemia198. It was first described in the first gene sequenced (from a foetus) but it is likely that the presence of C at -158 is the more common polymorphism108,199.
‡ Appears to be common as it has been found in five additional cases from adults without ndHPFH32,200.
ᶲ This variant was also found in the mother, who is a HbC heterozygote.
* This deletion has been shown to result in low expression levels of Aγ in adults201.
ⱡ βS is a mutation in the β-globin gene that results in a change at position 6 in β-globin: glutamic acid is replaced with valine (sickle cell mutation).
Ⱡ βC is a mutation in the β-globin gene that results in an amino acid change from glutamic acid to lysine at position 6, resulting in the production of haemoglobin C (HbC).

3.2 The -175 T>C mutation does not affect GATA1 binding to that region in vitro

Since the -175 mutation was first described, researchers have speculated on the mechanism behind the activation of γ-globin by the T to C substitution. It has been shown that the -175 T>C mutation increases the expression of a reporter construct in erythroid cells but not in non-erythroid cells, suggesting an erythroid-specific mechanism of activation196,202,203. By analysing the DNA sequence of the proximal promoter around the -175 site, Martin et al.196 discovered two binding motifs for the erythroid factor GF-1 (Figure 3.2A), now known as GATA Binding Protein 1 (GATA1). GATA1 binds to (T/A)GATA(A/G) consensus sequences204 and can act as a repressor or activator depending on the gene context and interacting cofactors63. Martin et al. and others subsequently showed that the -175 T>C mutation does not abolish GATA1 binding to the γ-globin promoter; however, they suggested that the interaction of GATA1 with DNA is altered98,196,202,203,205.
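
A motif search of the kind described above can be sketched with a regular expression for the (T/A)GATA(A/G) consensus, applied to both strands. The test sequences below are short synthetic examples, not the γ-globin promoter, and the function names are hypothetical.

```python
import re

# Lookahead so that overlapping matches are also reported.
GATA_MOTIF = re.compile(r"(?=([TA]GATA[AG]))")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_gata_sites(seq):
    """Return sorted (start, strand, motif) tuples; start is 0-based on `seq`."""
    seq = seq.upper()
    hits = [(m.start(), "+", m.group(1)) for m in GATA_MOTIF.finditer(seq)]
    rc = reverse_complement(seq)
    for m in GATA_MOTIF.finditer(rc):
        start = len(seq) - (m.start() + 6)   # map back to forward coordinates
        hits.append((start, "-", m.group(1)))
    return sorted(hits)


# Synthetic examples: one site on the sense strand, one on the antisense strand.
print(find_gata_sites("TTTAGATAAGGG"))
print(find_gata_sites("CCCTTATCTAAA"))
```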

In order to confirm these previous results of GATA1 binding to the WT and mutant promoter, we extracted nuclear proteins from MEL cells and tested binding in electrophoretic mobility shift assays (Figure 3.2B). We wanted to investigate whether GATA1 binding is disrupted by the -175 T>C mutation when only one GATA site (-151 to -186) or when both GATA sites are present (-151 to -203). Interestingly, GATA1 binding to the probe containing only one GATA1 site is dramatically reduced (lane 5), indicating that the T to C change in the GATA motif does disrupt GATA1 binding. However, as described in the literature, when using a probe containing both GATA sites no difference can be detected in the binding strength of GATA1 to the WT and -175 T>C γ-globin promoters (lanes 8 and 11).

These data suggest that the GATA site further upstream is the more important site for mediating binding of GATA1 to the γ-globin promoter. The affinity of GATA1 for the GATA site closer to the transcription start site seems to be weaker (lane 2), and although the -175 T>C mutation disrupts the GATA1-DNA complex in vitro (lane 5), this disruption is unlikely to play an important role in the effects of the T>C substitution in vivo due to the presence of an intact second and stronger GATA site further upstream.

Figure 3.2: GATA1 binding to the γ-globin promoter is not affected by the -175 T>C mutation if both consensus sites are present. (A) Schematic of the γ-globin promoter indicating GATA1 binding sites (boxed). Also shown are the sequences and names of probes used in EMSA. (B) EMSA showing endogenous GATA1 from MEL cell NE (induced with 2% DMSO for 72h) binding to the γ-globin promoter. Probes in lanes 1-6 contain only the -175 GATA motif whereas probes in lanes 7-12 contain both GATA motifs (-175 and -185). Binding of GATA1 to the -175 T>C mutant probe is abolished when only the -175 GATA site is present (lane 5) but is retained when both GATA sites are available (lane 11). Supershift with anti-GATA1 (GATA1*) confirmed the identity of the protein (lanes 3, 6, 9 and 12).

3.3 The -175 T>C mutation is a gain-of-function mutation

3.3.1 The -175 T>C mutation creates a TAL1 binding site

We decided to explore an alternative to the previously proposed mechanism, in which the -175 T>C substitution disrupts binding of GATA1. The fact that the -175 T>C mutation is not found in a cluster of other HPFH mutations suggests that it could be a gain-of-function mutation creating a de novo transcription factor binding site rather than disrupting an existing binding motif of a repressor. Hence, we chose to compare the mutated DNA sequence to known erythroid transcriptional activator binding motifs. A close inspection revealed that the T>C substitution creates a consensus binding motif for the transcription factor T-Cell Acute Lymphocytic Leukaemia 1 (TAL1). The complete motif is best viewed on the antisense strand of the γ-globin promoter (Figure 3.3A and B). TAL1 is a member of the basic helix-loop-helix (bHLH) family of transcription factors and is required for normal erythropoiesis206. It binds to DNA E-Box motifs of the sequence CANNTG as an obligate heterodimer with ubiquitously expressed class I bHLH E-proteins (such as E47 and E12)207. TAL1 is often found as part of a multiprotein complex together with the LIM-only domain protein 2 (LMO2) and the LIM domain-binding protein 1 (LDB1), which in turn recruit other cofactors to regulate transcription59,208. To test binding of TAL1 to the mutated sequence, we expressed the DNA-binding domains of TAL1 and its partner E47 and compared binding to WT and -175 T>C γ-globin promoter probes in EMSA (Figure 3.3C). We found that although an E47 homodimer can interact with the WT promoter (lane 2), binding of a TAL1/E47 heterodimer could only be detected with the -175 T>C probe (lane 5). The identity of TAL1 in the complex was confirmed by supershift with anti-His antibody (lane 6).
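
The creation of the E-Box can be illustrated by testing the relevant hexamer against the CANNTG consensus. The mutant antisense hexamer CAGATG is taken from Figure 3.3B; the WT hexamer used here is inferred by reverting the T>C change on the sense strand and should be read as an assumption, as should the variable names.

```python
import re

EBOX = re.compile(r"CA[ACGT]{2}TG")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Sense-strand hexamers around position -175 (WT inferred, see lead-in).
wt_sense = "TATCTG"
mutant_sense = "CATCTG"

# The motif is read on the antisense strand.
wt_antisense = reverse_complement(wt_sense)          # contains a GATA core
mutant_antisense = reverse_complement(mutant_sense)  # motif from Figure 3.3B

wt_is_ebox = EBOX.fullmatch(wt_antisense) is not None
mutant_is_ebox = EBOX.fullmatch(mutant_antisense) is not None
print(wt_antisense, wt_is_ebox, mutant_antisense, mutant_is_ebox)
```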

Next we asked whether TAL1 is able to bind to the -175 T>C promoter in a complex with LMO2 and LDB1. To test this, we expressed LMO2 and LDB1 as a tethered protein and investigated the binding of TAL1/E47 and LMO2-LDB1 to the γ-globin promoter in EMSA (Figure 3.3D). As LMO2-LDB1 specifically interacts with TAL1 but not with E47209, only a complex containing TAL1 will supershift the probe upon addition of LMO2-LDB1. Lanes 3 and 6 show that this is only the case for the -175 T>C promoter probe. This finding supports the conclusion that the -175 HPFH mutation creates a TAL1 consensus motif, allowing TAL1 to bind together with its cofactors LMO2 and LDB1.

Figure 3.3: The -175 T>C mutation in the γ-globin promoter creates an E-Box motif. (A) TAL1/SCL consensus binding motif as previously analysed by MEME search from ChIP-Seq210. (B) The motif created by the mutation (CAGATG) is found on the antisense strand of the -175 T>C γ-globin promoter. (C) EMSA. The DNA binding domains (bHLH) of TAL1 and E47 were coexpressed in bacteria and purified by ion exchange chromatography. Binding of E47/E47 homodimer (E/E) and TAL1/E47 heterodimer (T/E) to the WT and -175 T>C γ-globin promoters is shown in lanes 2 and 5 respectively. Lanes 1 and 4 show probe alone; specific binding of TAL1/E47 to the mutant probe is confirmed by supershift using an anti-His antibody (lane 6, T*/E). The probe spans region -166 to -215. (D) EMSA showing interaction of TAL1/E47 with LMO2-LDB1 and the mutant -175 T>C promoter. LMO2 and LDB1 were bacterially expressed as a tethered protein183 and then purified by ion exchange chromatography. Binding of E47/E47 homodimer (E/E) and TAL1/E47 heterodimer (T/E) to the WT and -175 T>C γ-globin promoters is shown in lanes 2 and 5 respectively. Lanes 1 and 4 show probe alone. The retarded band in lane 5 supershifts upon addition of LMO2-LDB1 (lane 6), indicating an interaction of TAL1/E47 with LMO2-LDB1 (T/E/L-L). The probe spans region -163 to -195.

3.3.2 TAL1 and GATA1 can bind to the -175 mutation in a complex

GATA1 has also been shown to work in combination with TAL1 to activate erythroid genes62,63,209,211. A pentameric complex consisting of GATA1, TAL1, E47, LMO2 and LDB1 has been shown to bind to a bipartite motif comprising an E-box (CANNTG) motif followed 9-12 bp downstream by a GATA site208. Due to the spacing of the two GATA sites in the γ-globin promoter, the creation of an E-Box in the -175 T>C promoter simultaneously creates a composite E-Box-GATA site (Figure 3.4).

Figure 3.4: The -175 T>C mutation creates an E-Box-GATA consensus motif. Shown is a schematic of the γ-globin promoter (the antisense strand). Marked in red are GATA1 binding sites and marked in blue is the E-Box created by the -175 T>C mutation (highlighted in grey).

We next wanted to investigate whether we could see the pentameric complex binding to the -175 T>C promoter in vitro. We expressed all components of the complex bacterially and performed EMSAs comparing their binding to the WT and the mutant γ-globin promoter (Figure 3.5). We could show that simultaneous binding of TAL1 and GATA1 to the γ-globin promoter is possible when the bridging molecule LMO2/LDB1 is present (lanes 27-32), indicating the possibility of synergistic binding of the two factors to the -175 T>C promoter.

Figure 3.5: GATA1 and TAL1 can bind to the -175 T>C promoter in a pentameric complex. (A) Schematic of the pentameric complex binding to an E-box-GATA motif. It consists of the transcription factors TAL1/E47 and GATA1 and the cofactors LMO2 and LDB1. LMO2 acts as a bridging molecule between TAL1 and GATA1. (B) EMSA showing a complex of GATA1-TAL1-E47 and LMO2-LDB1 binding to the WT and -175 T>C mutant promoter. Proteins were bacterially expressed and purified by ion-exchange chromatography. E47 and TAL1 were incubated with increasing amounts of an LMO2-LDB1 fusion protein and either the WT γ-globin promoter probe (lanes 1-8) or the mutant -175 T>C γ-globin promoter probe (lanes 17-24). Only the -175 T>C probe shows binding of TAL1/E47 heterodimer (T/E) and TAL1/E47/LMO2-LDB1 complex (T/E/L-L). Both WT and -175 T>C probes show background binding of E47 homodimer (E/E). Addition of GATA1 to the complex (lanes 9-16 for WT and 25 to 32 for -175 T>C) results in a slower migrating retarded band only with the -175 T>C mutant probe (visible in lanes 28 to 32). This complex is likely to be GATA1/TAL1/E47/LMO2-LDB1 (G/T/E/L-L).

3.4 Generation of murine erythroid cell lines carrying the -175 T to C mutation

To confirm our findings from the in vitro binding studies in a cellular environment, we developed a strategy to introduce the -175 T>C mutation into the genome of transgenic mouse erythroleukemia (MEL) cells. Our collaborators in Melbourne have previously generated cells that carry a modified version of the human β-globin locus on a bacterial artificial chromosome (BAC) with dsRED and EGFP replacing the endogenous Gγ- and β-globin gene coding sequences, respectively180. This fluorescent reporter system can be used to study mechanisms of globin switching as these cells express EGFP under the control of the adult human β-globin promoter, but retain the potential for reactivation of the silenced foetal γ-globin promoters. To study the effect of the -175 T>C mutation on γ-globin gene expression, we further modified this cell line by using TALENs184 that specifically target the Aγ-globin gene at the natural Aγ ATG start codon (Figure 3.6A). We then used homologous recombination to integrate a γ-globin promoter carrying the -175 T>C substitution fused to ECFP. As a control we also generated a cell line expressing ECFP under the control of the wildtype Aγ-globin promoter (Figure 3.6B).

Figure 3.6: Genome editing strategy in transgenic MEL cells. (A) Schematic of Tal-Effector Nucleases (TALENs) targeting the translation start site (ATG) of the γ-globin gene. DNA sequences recognized by TALENs are highlighted and boxed. (B) Gene editing strategy in MEL cells (Gγ dsRED β EGFP): transfected TALENs create a DSB at the ATG start codon of the Aγ-globin gene on the transgenic BAC. Cotransfection of TALENs and a targeting vector with 1 kb arms of homology 5’ and 3’ of the DSB results in integration of ECFP via homologous recombination. ECFP is driven by the Aγ promoter whereas the Aγ gene lacks a promoter region and is not expressed.

Using the strategy described in Figure 3.6, we engineered a number of different clonal cell lines carrying ECFP under the control of either the WT or the -175 T>C mutant Aγ-globin promoter. The Gγ-globin promoter retained the WT sequence in both cases (Figure 3.7A and B). Successful targeting was confirmed by genomic PCR with one primer located outside of the region contained in the targeting vector and one primer located in either ECFP or the γ-globin gene. If the knock-in of ECFP was successful, we observed a PCR product of ~1200 bp, whereas if the locus was untargeted the PCR product was much larger (~2500 bp) (Figure 3.7C). We then Sanger sequenced the promoter regions of successfully modified clones to screen for the presence of the -175 T>C mutation.

Figure 3.7: The engineered -globin locus in transgenic MEL cells. (A) Schematic of the engineered -globin locus on the BAC in MEL cells. (B) Schematic showing the modified -globin genes in MEL WT:A or -175T>C:A cells, respectively. (C) Genomic PCR using primers to distinguish between unmodified (upper panel) and A-globin promoter-ECFP targeted cells (lower panel). Lane 1 shows genomic PCR product from unmodified MEL cells; lanes 2-7 show genomic PCR for successfully modified clonal populations of MEL WT/-175T>C:Acells.

3.5 TAL1 binds and activates -globin -175 T>C in transgenic mouse erythroid cells Next we investigated if the transgenic MEL WT and -175 T>C cells show differences in globin gene expression. To analyse these cell lines for -globin, A- and G -globin promoter activity, we performed flow cytometry to determine the expression levels of EGFP, ECFP and dsRED respectively (Figure 3.8). ECFP (A-globin) expression was significantly higher in -175 T>C cells compared to WT, whereas expression of EGFP (-globin) was significantly lower in those cells. The differences increased further upon differentiation of the cells for three days. In contrast to the A-globin gene that was modified, the unmodified G-globin locus marked by the expression of dsRED showed no significant changes in promoter activity.

Figure 3.8: The -175 T>C mutation increases promoter activity in transgenic mouse erythroid cells. (A) Expression of reporter genes as determined by flow cytometry. Shown is the percentage of ECFP (Aγ-globin, left panel) and EGFP (β-globin, right panel) over total measurable globin reporters (ECFP, EGFP and dsRED) comparing clonal MEL WT:Aγ and -175T>C:Aγ cell populations (n=3) in the uninduced state (Day 0) and after 72h of induction with 2% DMSO (Day 3). Significance was determined by unpaired two-tailed t-test (*p<0.05). Shown is mean ± SD. (B) Flow cytometry of clonal populations of MEL WT:Aγ or -175T>C:Aγ cells. Shown are superimposed representative histograms comparing expression levels of ECFP (Aγ-globin), dsRED (Gγ-globin) and EGFP (β-globin) between 72h induced MEL WT:Aγ or -175T>C:Aγ cells. Depicted is the median sample of the monitored monoclonal populations (n=3).

We next examined the molecular mechanism whereby the -175 T>C mutation promotes γ-globin expression by investigating whether the mutant promoter facilitates de novo binding of TAL1 to the γ-globin promoter in our cellular model. Chromatin immunoprecipitation (ChIP) experiments revealed significantly higher TAL1 occupancy at the γ-globin promoter in the presence of the -175 T>C mutation (Figure 3.9A). We then assayed for occupancy of the TAL1 partner proteins LMO2 and LDB1 at the mutated promoter. Both LMO2 and LDB1 ChIP revealed enrichment of these factors at the γ-globin promoter in cells carrying the -175T>C mutation (Figure 3.9B and C). Additionally, we performed ChIP experiments for GATA1 in both MEL WT:Aγ and -175T>C:Aγ cells. There was a modest 1.7-fold increase in GATA1 binding to the γ-globin promoter when the mutation was present, but this increase was not statistically significant (p=0.19 as determined by an unpaired two-tailed t-test, Figure 3.9D).

Figure 3.9: TAL1 and its cofactors LMO2 and LDB1 bind the -175 T>C promoter by ChIP. Anti- TAL1 (A), anti-LMO2 (B), anti-LDB1 (C) and anti-GATA-1 (D) ChIP in MEL AECFP cells carrying either the WT (left panel) promoter sequence or the -175 T>C mutation (right panel) in A. Only ChIPs performed in -175 T>C targeted cells showed binding of the factors to the -globin promoter.

3.6 Genome editing in a human erythroid cell line

Having demonstrated that the -175 T>C mutation creates a TAL1 binding site and results in elevated levels of γ-globin expression in a mouse cell model, we next investigated whether we could use the same TALEN-based approach in human erythroid K562 cells to engineer a -175 T>C mutation into the γ-globin promoter and thereby reactivate γ-globin expression. Our strategy here was to knock in a fluorescent tdTomato reporter followed by the same neomycin cassette as used in the transgenic mouse cells. Expression of tdTomato was to be under the control of either the WT or the -175 T>C γ-globin promoter. Because the two γ-globin genes are homologous, the TALENs cut both of them, and genome editing of the locus results in excision of the Gγ-globin gene, which lies between the cut sites (Figure 3.10).


Figure 3.10: Genome editing strategy in K562 cells. TALENs create a double strand break (DSB) at the ATG of both endogenous γ-globin genes. TdTomato is integrated by homologous recombination after cotransfection with a targeting vector with 1 kb arms of homology on the 5’ and 3’ end of the DSB. The tdTomato reporter is then driven by the endogenous Gγ promoter. The Gγ-globin gene is excised in this process and the Aγ-globin gene lacks a promoter region and hence cannot be expressed.

As K562 cell lines are often aneuploid212, we collaborated with D. Wright from the Children’s Hospital in Westmead (Australia) to karyotype our K562 cell line and found that the cells are triploid for chromosome 11, where the β-globin locus is located (Figure 3.11A). We nucleofected TALENs and donor plasmids into K562 cells and then established clonal cell lines. We screened the clones by genomic PCR amplifying the region across the γ-globin promoter. Firstly, to ensure the fluorescent reporter had integrated in the correct genomic locus, we designed a PCR strategy with primers spanning the junction between the donor plasmid sequence and the genomic sequence 5’ of the γ-globin promoter (PCR1). A second genomic PCR, with one primer recognising the endogenous γ-globin gene and the second primer binding to a region outside of the arms of homology, was used to determine whether any unmodified γ-globin loci remained (PCR2). We selected clones that showed a PCR product from PCR1 but none from PCR2, confirming that genome editing had introduced tdTomato into all endogenous γ-globin loci for both WT and -175 T>C cells (Figure 3.11B). Next we sequenced the PCR1 products by Sanger sequencing to determine whether the cells carried the -175 T>C mutation. We obtained five monoclonal populations that carried the -175 T>C mutation heterozygously, on either one or two of the three alleles. The cells will be referred to as K562 Gγ-Aγ tdTomato WT or -175 T>C, respectively.
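The selection logic described above (a PCR1 product, no PCR2 product, then genotype by Sanger sequencing) can be summarised in a short Python sketch. This is only an illustration of the decision rule; the `Clone` record, its field names and the example clones are hypothetical and do not correspond to data from this thesis.

```python
from dataclasses import dataclass


@dataclass
class Clone:
    name: str
    pcr1_product: bool   # junction PCR: donor sequence to genomic 5' flank
    pcr2_product: bool   # PCR that only amplifies an unmodified locus
    mutant_alleles: int  # alleles with C at -175 by Sanger sequencing (0-3)


def fully_targeted(clone):
    """True if the reporter integrated and no unmodified locus remains."""
    return clone.pcr1_product and not clone.pcr2_product


def classify(clone):
    """Return 'WT', '-175 T>C', or None for clones failing the PCR screen."""
    if not fully_targeted(clone):
        return None
    return "-175 T>C" if clone.mutant_alleles >= 1 else "WT"


# Hypothetical clones, for illustration only.
clones = [
    Clone("c1", True, False, 0),
    Clone("c2", True, False, 2),
    Clone("c3", True, True, 1),   # residual unmodified locus: rejected
    Clone("c4", False, True, 0),  # no correct integration: rejected
]
labels = {c.name: classify(c) for c in clones}
```

Here a clone counts as -175 T>C as soon as one targeted allele carries the substitution, which mirrors the criterion stated for the triploid K562 cells.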

Figure 3.11: Characterisation of K562 cells. (A) Karyotype of K562 cells used in this thesis. Cells were karyotyped by Giemsa banding in collaboration with D. Wright from the Children’s Hospital in Westmead, Australia. (B) Schematic showing the γ-globin locus before (normal) and after (K562 Gγ-Aγ tdTomato) genome editing. As K562 cells have three alleles of the γ-globin locus, only clones with three targeted alleles were selected. We considered clones to be -175 T>C cells if they had one or more alleles with the -175 T>C mutation (see enlarged area).

3.7 TAL1 binds and activates γ-globin -175 T>C in K562 cells

To analyse the effect of introducing the -175 T>C mutation, we performed flow cytometry on the K562 Gγ-Aγ tdTomato (WT/-175 T>C) cell lines to determine the expression levels of tdTomato (Figure 3.12A and B). On average, clones carrying the -175 T>C mutation in at least one allele showed a significantly higher mean fluorescence than clones with tdTomato under the control of the WT γ-globin promoter (Figure 3.12A and B). Consistent results were obtained by determining the percentage of mRNA expression for each of the β-like globins, with -175 T>C mutant clones showing on average 2-fold higher γ-globin expression than the WT clones (Figure 3.12C).


Figure 3.12: K562 Gγ-Aγ cells carrying -175 T>C show increased γ-globin promoter activity. (A) Bar chart showing mean ± SD γ-globin promoter activity for clonal populations (n=5) of K562 WT/-175T>C:Gγ-Aγ cells as determined by measuring median tdTomato fluorescence intensity (MFI). Significance was determined by unpaired two-tailed t-test (*p<0.05). (B) Histogram showing γ-globin promoter activity in representative clonal populations of K562 WT/-175T>C:Gγ-Aγ cells, as determined by tdTomato fluorescence. Depicted is the median out of five clonal populations (for K562 WT:Gγ-Aγ or -175T>C:Gγ-Aγ, respectively). (C) Shown is the percentage mRNA expression of β-like globins in unmodified K562 cells (left) and also expression of tdTomato and β-like globins in clonal populations of K562 WT/-175T>C:Gγ-Aγ cells as determined by qPCR. The graph on the right depicts mean mRNA levels for clonal K562 WT/-175T>C:Gγ-Aγ cell populations (n=4).

To confirm that these differences in γ-globin expression do not result from changed expression of transcription factors involved in globin gene regulation, we also analysed mRNA expression of TAL1, GATA1, GATA2 and two well-known silencers of foetal globin expression, SOX6 and BCL11A (Figure 3.13). We compared five clonal populations each of K562 Gγ-Aγ tdTomato (WT) and (-175 T>C) cells, along with unmodified K562 cells, and found no significant differences in transcription factor expression between samples.


Figure 3.13: Transcription factor expression is unchanged in clonal K562 Gγ-Aγ tdTomato (WT) or (-175 T>C) cells, respectively. Unmodified K562 cells (n=1), K562 WT:Gγ-Aγ (n=5) and K562 -175T>C:Gγ-Aγ cells (n=5). Total RNA was extracted, and qPCR was used to investigate the expression of SOX6, TAL1, GATA1 and GATA2. Levels were normalised to 18S rRNA levels. BCL11A expression levels were also assessed but were too low to be detected.

Having determined that introduction of the -175 T>C mutation results in higher γ-globin expression in this human cell model, we next performed TAL1 ChIPs in four different clonal K562 Gγ-Aγ tdTomato (WT) and (-175 T>C) cell lines (Figure 3.14A). We found that TAL1 binds to the γ-globin promoter in K562 Gγ-Aγ tdTomato (-175 T>C) but not WT cells. To test whether TAL1 preferentially binds to the -175 T>C γ-globin promoter, we took the ChIP PCR products and ran pyrosequencing reactions on input and IP material (Figure 3.14B). Pyrosequencing is a sequencing-by-synthesis technique that allows the detection of synthesised DNA sequence in real time and thus enables accurate quantification of sequence variation213. Before immunoprecipitation the allelic constitution of the promoter is heterozygous, with about 40 % T (WT) and 60 % C (mutation). After pulldown with TAL1 antibody the mutant allele was enriched (90 %), while control IgG antibody precipitated the same ratio of WT:mutant allele as the input. This finding strongly supports the hypothesis that the -175 T>C mutation creates a functional TAL1 binding site in vivo.
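One way to put a single number on this allelic shift is to compare the odds of the mutant allele before and after immunoprecipitation. The Python sketch below does this with the approximate fractions quoted in the text (60 % C in input, 90 % C after TAL1 ChIP). The odds-ratio metric and the function names are illustrative assumptions, not the analysis performed in this thesis, and the IgG value is simply set equal to the input, as described above.

```python
def odds(fraction):
    """Convert an allele fraction (strictly between 0 and 1) into odds."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    return fraction / (1.0 - fraction)


def allele_enrichment(input_mut, ip_mut):
    """Odds ratio of the mutant allele in ChIP material versus input."""
    return odds(ip_mut) / odds(input_mut)


# Approximate mutant (C) allele fractions quoted in the text.
tal1_enrichment = allele_enrichment(0.60, 0.90)  # (0.9/0.1) / (0.6/0.4)
igg_enrichment = allele_enrichment(0.60, 0.60)   # IgG modelled as equal to input
```

With these rounded fractions the mutant allele is about six times more likely, in odds terms, to be recovered by the TAL1 antibody than expected from the input, whereas the IgG control gives a ratio of one.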


Figure 3.14: Effects of the -175 T>C mutation in a human in vivo model. (A) Anti-TAL1 ChIP in K562 Gγ-Aγ tdTomato (WT/-175 T>C) cells (n=4). Shown is mean ± SD. Enrichment of TAL1 at the γ-globin promoter (HBG) was significantly higher in K562 -175T>C:Gγ-Aγ cells (p<0.005). ZFPM1, a known TAL1 target gene59, was included as a positive control region and the β-globin promoter (HBB) was included as a region that TAL1 does not bind. No significant difference in TAL1 occupancy was detectable between WT and -175T>C cells at the control regions. Significance was determined by unpaired two-tailed t-test. (B) Representative sequencing pyrograms (left) of PCR products derived from input and ChIP samples in K562 cells heterozygous for the -175 T>C mutation. Shown on the right is the mean frequency of cytosine (= mutated allele) at position -175 of the γ-globin promoter in input, control antibody ChIP (IgG) and TAL1 ChIP. The mutated allele is enriched only after TAL1 ChIP. Pyrosequencing was performed in triplicate and shown is the mean ± SD.

3.8 -175 T>C increases enhancer looping to the γ-globin promoter

As described in Chapter 1, developmental regulation of the β-globin locus is controlled by progressive looping of distal enhancer elements in the Locus Control Region (LCR) to the promoters of the embryonic, foetal and adult β-like globin genes214. Recently, it has been shown that forced looping of the LCR to the γ-globin promoter in cells that usually express adult haemoglobin can override the developmentally regulated gene expression program towards expression of foetal haemoglobin215,216. To achieve this, Gerd Blobel’s group engineered an artificial zinc finger protein that targets the γ-globin promoter and is tethered to the self-association domain of LDB1. The tethered LDB1 domain can then dimerise with endogenous LDB1 proteins that are bound to the hypersensitive sites in the LCR and hence draw the enhancer towards the foetal globin promoter. Our hypothesis is that the -175 T>C substitution similarly creates a new TAL1/LDB1 binding site and thus may also promote looping of the LCR to the γ-globin promoter.

To test this hypothesis we performed chromatin conformation capture (3C) experiments in the transgenic MEL cell lines and the modified K562 cells (Figure 3.15A and B). First we looked at relative crosslinking frequencies between hypersensitive site 2 (HS2) and the Aγ-globin promoter. In MEL cells with the -175 T>C mutation in the Aγ-globin promoter we detected higher relative crosslinking frequencies than in the WT controls. We saw the same effect in our modified K562 cells, this time changing the perspective and looking at crosslinking frequencies of the γ-globin promoter to the different hypersensitive sites in the LCR. Again, we saw an increase in crosslinking frequencies between the γ-globin promoter and all hypersensitive sites in -175 T>C modified cells compared to cells incorporating the WT promoter tdTomato construct. Thus we suggest that the -175 T>C mutation works mechanistically through enhanced chromatin looping to activate expression of foetal globin (Figure 3.15C).

Figure 3.15: -175 T>C enhances chromatin looping of the LCR to the γ-globin promoter. (A) 3C assay measuring locus-wide crosslinking frequencies in MEL WT:Aγ cells (grey), MEL -175 T>C:Aγ cells (black) and unmodified transgenic MEL cells (green). A schematic of the human β-globin locus is shown on top of the graph. The x-axis indicates distances in kb from the -gene. Vertical lines represent HindIII restriction sites. The dark brown bar denotes the anchor HindIII fragment containing hypersensitive site 2 (HS2). Beige bars denote analysed HindIII fragments. Replicates are from two independently generated clonal cell populations for WT:Aγ and -175T>C:Aγ cells (n=2), respectively. Shown is mean ± SEM. (B) 3C assay measuring relative crosslinking frequencies of Gγ-globin and LCR in K562 WT and -175T>C:Gγ-Aγ cells. Vertical lines represent HindIII restriction sites. The dark brown bar denotes the anchor HindIII fragment containing the Gγ-globin promoter. Replicates are from independently generated clonal cell populations of K562 WT (n=2) and -175T>C:Gγ-Aγ (n=3). Shown is mean ± SEM. (C) Model of LCR looping to the γ-globin promoter upon introduction of the -175T>C mutation in the γ-globin promoter. In the foetal environment, nuclear factors mediate looping of the LCR to the γ-globin genes (left panel). In the WT adult environment, the LCR loops to the β-globin gene and γ-globin is silenced. The -175T>C mutation drives recruitment of the LCR to the γ-globin promoter via assembly of a looping complex consisting of TAL1 and associated cofactors217.

3.9 Chapter 3 Discussion

In this chapter we have demonstrated that the -175 T>C HPFH mutation creates a de novo binding site for the transcriptional activator TAL1. Additionally, we showed that TAL1 then recruits the cofactors LMO2 and LDB1 to the γ-globin promoter in vitro and in cellular models. It has previously been described195,196 that the -175 T>C mutation alters GATA1 binding to the γ-globin promoter. Our findings from the in vitro binding assays showed that although the -175 T>C mutation disrupts GATA1 binding to that specific site, GATA1 binding to the γ-globin promoter is not affected due to the second GATA site present. We demonstrated in vitro that simultaneous binding of TAL1 and GATA1 to the γ-globin promoter is possible when the bridging molecule LMO2/LDB1 is present. However, altered GATA1 binding is not evident from our ChIP data in cellular models. We provide no evidence that GATA1 plays a critical role in activating γ-globin expression but are able to show that the -175 T>C mutation causes TAL1/LMO2/LDB1 binding. It has been shown that GATA1, like most transcription factors, can activate or repress genes depending on the cellular context218. Tripic et al. found that GATA1 association with the TAL1/LMO2/LDB1 complex distinguishes active from repressive GATA complexes62. Given these findings and our results on GATA1 binding to the WT or -175 T>C γ-globin promoter in vitro and in ChIP, we cannot rule out the possibility that GATA1 may be involved in both the repression of the WT promoter65 and the activation of the -175 T>C promoter in cooperation with TAL1. Earlier this year our findings were confirmed by a different group219. They generated a transgenic mouse model carrying the -175 T>C mutation that had high levels of HbF (36-41 %) and were able to confirm the formation of a TAL1/LMO2/LDB1 complex at the -175 T>C γ-globin promoter. However, similar to our results, they were unable to detect a change in binding of GATA1 to the WT or -175 T>C γ-globin promoter.


It has also been reported that the repressor OCT1/POU2F1 may bind to this region of the γ-globin promoter in vitro203,220,221. Researchers have suggested that the -175 T>C mutation disrupts binding of OCT1 and that γ-globin is thus derepressed. We did not investigate binding of OCT1 in this work and do not exclude the possibility of its involvement in derepression of γ-globin at the -175 T>C promoter. However, it has previously been described that the -175 T>C mutation can drive promoter activity in erythroid cells but not in non-erythroid cells196. As OCT1 is ubiquitously expressed222, this finding suggests that disruption of OCT1 binding cannot be solely responsible for the increased promoter activity and that an erythroid-specific factor is necessary for the activation of the -175 T>C promoter. Nevertheless, further work is needed to determine the transcriptional components that are involved in silencing the WT γ-globin promoter, and we intend to pursue this in the future.

Besides the clinical data on the -175 T>C mutation in humans, all investigations on -175 T>C promoter activity have been done in artificial reporter systems. To our knowledge, we were the first to introduce an HPFH mutation into the endogenous genomic locus of living cells. We successfully edited the genome of erythroid cell lines to introduce the -175 T>C HPFH mutation and found that this was associated with a significant increase in γ-globin promoter activity in both the transgenic MEL cells and the modified K562 cells. In addition, in our transgenic MEL cells with and without the -175 T>C mutation we could not only compare γ-globin promoter activity between the cell lines but also compare the expression of Gγ and Aγ within the same cell. The HPFH mutation was introduced into the Aγ-globin promoter, which drives ECFP, whereas the Gγ-globin promoter retained a WT sequence and drives the dsRED reporter gene. When comparing promoter activity of Aγ- and Gγ-globin in WT and -175 T>C MEL cell lines we saw an increase of Aγ-globin promoter activity in the -175 T>C cells but no significant change in expression of dsRED driven by the Gγ-globin promoter. However, it needs to be mentioned that Aγ-globin promoter activity is generally dramatically higher than Gγ activity in both WT and -175 T>C cells, as can be seen from the results of the flow cytometry experiments. We believe that introducing a neomycin cassette with a constitutively active PGK promoter into a genetic locus that is usually poised may have opened up the chromatin locally, thus allowing a higher background level of ECFP (Aγ-globin) expression in both WT and -175 T>C cells. This clearly shows that our model system has limitations. Both of our cell models monitor γ-globin expression in a fluorescent reporter system without actually measuring the levels of functional foetal haemoglobin. Furthermore, K562 cells recapitulate an embryonic/foetal stage of erythropoiesis, expressing high levels of foetal haemoglobin. As we are interested in reactivating the expression of γ-globin, a model that already has high expression of foetal haemoglobin is far from ideal. We were able to see a two-fold increase in promoter activity with the -175 T>C mutation in our modified K562 cells. That increase is small compared to the 10- to 40-fold increase in foetal haemoglobin levels in people with HPFH and is probably due to the already high foetal globin expression in the WT K562 cells. The transgenic MEL cells have the advantage of showing an adult-type globin expression pattern but face the problem of species-dependent differences in haemoglobin switching51,223.

The lack of a better model cell line to study haemoglobin switching has been an issue in the field of haemoglobin research for some time. When we started this project, the only readily available cell line models were K562 and transgenic MEL cells, which both have their limitations as outlined above. During the course of this PhD, and after we finished the work on the -175 T>C mutation, a new cell line was developed that circumvents most of the difficulties associated with K562 and transgenic MEL cell models. HUDEP2 cells are immortalized human umbilical cord blood-derived erythroid progenitor cells from a healthy donor and thus have a normal karyotype181. They resemble human adult erythroid progenitor cells and can be further differentiated into enucleated red blood cells. Their β-like globins mainly consist of adult β-globin chains (97 %) and only small amounts of foetal γ-globin (~1-2 % each), making them a great model to study the reactivation of foetal haemoglobin. On the downside, HUDEP2 cells are a fairly sensitive cell line that requires complex growth media supplemented with stem cell factors and cytokines. In the following projects of this work we aimed to perform most of our experiments in HUDEP2 cells, stepping away from the K562 and transgenic MEL cell line models. Ideally our experiments should be carried out in human CD34+ haematopoietic progenitor cells derived from bone marrow or cord blood. However, although the efficiency of genome editing is improving rapidly, frequencies of homologous recombination in haematopoietic stem and progenitor cells are still low, making editing of primary cells challenging.

Nonetheless, and most importantly, our models allowed us to determine the molecular mechanism that allows this HPFH mutation to facilitate persistent γ-globin expression. We showed that the -175 T>C mutation creates a novel binding site for the activator TAL1. Indeed, our data indicate that TAL1 binds to the mutant γ-globin promoter in a complex with LMO2 and LDB1. It has recently been shown that LDB1 is the key factor enabling LCR looping to the globin genes and that an artificial zinc finger LDB1 construct is sufficient to force LCR looping to either the foetal or adult globin genes215,216. Our data suggest that recruitment of LDB1 to the γ-globin promoter by de novo binding of TAL1 can similarly facilitate looping to the LCR via dimer- or multimerisation224 with LDB1 proteins. In addition, our transgenic MEL cell model showed that expression of foetal globin was upregulated whereas expression of adult globin was reduced. A common hypothesis is that the interaction of the LCR with the foetal or adult globin promoter is mutually exclusive214. This may explain the reduction of β-globin expression in the transgenic MEL cells carrying the -175 T>C mutation, as the LCR is predominantly looping to the foetal globin promoter. Although the chromatin conformation capture experiments show enhanced looping of the LCR to the γ-globin promoter in these cells, no strong interaction of the LCR with the β-globin promoter could be detected in the WT cells. This could be due to a number of factors. The transgenic MEL cells represent a model system of human haemoglobin switching in a murine cell. They carry the complete human β-globin locus on a BAC as a transgene. Whilst the expression of the globin genes seems to be regulated similarly to an adult human erythroid cell, it is not entirely clear whether developmental differences between mice and humans influence the chromatin structure of the locus. Given that the β-globin locus is located on a BAC and not integrated in its usual chromosomal context, chromatin folding and the 3D structure of the locus may be altered. Additionally, one may argue that the fluorescent reporter genes within the locus disrupt the native chromatin structure, again showing the limitations of our cell models. Nevertheless, as the 3C results of the modified K562 cells show the same trend as the MEL cells, that is, more looping of the LCR to the γ-globin promoter in the -175 T>C cells, we are confident that our findings provide a mechanistic explanation for how the -175 T>C mutation results in HPFH. We suggest that introducing the -175 T>C mutation could represent a new approach to reactivating γ-globin expression in adult erythroid cells. By reversing globin switching, the engineering of this HPFH mutation increases expression of beneficial γ-globin and at the same time reduces levels of defective β-globin chains. In future experiments our lab will try to avoid the knock-in of reporter genes or antibiotic resistance markers and aim to introduce HPFH mutations in the promoter region as point mutations only. Consequently we will be able to monitor endogenous γ-globin gene expression and sidestep the complication of altered chromatin structure.

Overall we propose that this study presents a proof-of-concept model of a novel gene therapeutic strategy to reactivate the expression of γ-globin in adulthood. By performing genome editing on adult haematopoietic stem cells derived from an individual affected by β-thalassemia, one could introduce the -175 T>C mutation and thus elevate foetal haemoglobin levels to alleviate symptoms. The -175 T>C mutation results in foetal haemoglobin levels of ~20 % in heterozygotes unaffected by other haemoglobinopathies and thus represents the strongest foetal globin inducer of all HPFH mutations mentioned in this thesis. Hence we suggest that -175 T>C would be the HPFH mutation of choice in clinical applications that aim to raise foetal globin levels. Nevertheless, further work will be required to overcome challenges in obtaining high-frequency recombination in haematopoietic stem cells, obtaining enough cells for transplant, and assessing the safety of potential off-target effects.



4 CHAPTER 4: The -198 T to C mutation

4.1 Chapter 4 Introduction

4.1.1 The -198 T>C HPFH mutation boosts foetal haemoglobin levels in vivo

In Chapter 3, we focussed on the mechanism underlying the -175 T>C mutation and, through use of in vitro techniques and genome editing in mouse and human erythroid cell lines, showed that this HPFH mutation causes elevated γ-globin by creating a de novo site for the transcriptional activator TAL1. In this chapter we turn our attention to another of the HPFH mutations. The -198 T>C mutation was first described in 1974 in a British family113,114 and is therefore also known as British type HPFH. One female member of the family was a control participant in a survey for the thalassemia trait and was found to have high foetal haemoglobin levels (21 % HbF). Following that discovery, 19 more family members were screened, 13 of whom were found to have elevated foetal haemoglobin levels. All of them, including the proposita, were completely healthy. From their haematological data the family members could be grouped into two classes: high levels of HbF (~20 %) and intermediate levels of HbF (~10 %). The fact that the proposita had three children, all of whom have intermediate levels of HbF, led to the assumption that the group with high levels of HbF are homozygotes while the group with intermediate levels are heterozygotes. This finding was later confirmed113. Further analysis of the foetal haemoglobin revealed that mostly Aγ-globin chains were present (~90 %), suggesting that a mutation causing high γ-globin expression is located in the Aγ-globin promoter. Ten years after the identification of the family, sequencing revealed that affected individuals had a T to C substitution at position -198 in their Aγ-globin promoter.

4.1.2 The -198 T>C mutation creates a CACCC box

The -198 T>C mutation is found amongst a cluster of other HPFH mutations around the -200 site in the γ-globin promoter (Figure 4.1). It was previously proposed that all of these HPFH mutations disrupt binding of a repressor. Our lab was able to show that this repressor is ZBTB7A and that indeed all the HPFH mutations disrupt ZBTB7A binding to the -200 site, with the exception of the -198 T>C mutation (this will be the subject of Chapter 5). As the -198 T>C mutation does not disrupt the binding of ZBTB7A to the γ-globin promoter, the mechanism underlying γ-globin activation in British HPFH must be different. Here we sought to uncover this mechanism.


Figure 4.1: HPFH mutations in the γ-globin promoter. The -198 T>C site is highlighted with a black box and will be discussed in this chapter.

In 1989 it was discovered that the -198 T>C mutation creates a binding site that resembles a CACCC motif. This motif is commonly found in other promoters and is known to be bound by the SP protein family and Krüppel-like factors225. EMSAs revealed that the transcription factor SP1, extracted from K562 cells and from non-erythroid cell lines, is capable of binding to the -198 T>C γ-globin promoter in vitro225,226. SP1 is expressed at similar levels in a large number of different tissues227. This is consistent with its important role in the regulation of housekeeping genes, cell growth factors and cell cycle genes228. However, transactivation assays using the WT or -198 T>C γ-globin promoter coupled to a reporter construct have shown that the -198 T>C mutation activates the reporter gene only when transfected into erythroid K562 cells, but not in non-erythroid cell lines225,226. This finding suggests that SP1 alone is not capable of activating the γ-globin promoter but requires interaction with other erythroid-specific cofactors to drive expression of foetal haemoglobin to the levels seen in people with British HPFH.

Given these results, we decided to re-evaluate what is known about the -198 T>C mutation and considered other factors in addition to SP1 that could be responsible for high expression of γ-globin. SP1 is one of nine SP proteins in the SP/KLF transcription factor family, whose members have highly conserved DNA-binding domains with sequence identity of more than 65 % across members. These domains are composed of three Cys2His2-type zinc fingers that recognise GC and/or GT-boxes228. The alignment of their DNA-binding domains is shown in Figure 4.2A. Notably, the 26 family members show high sequence identity in the three residues in each zinc finger (highlighted in red) that make direct contact with DNA. Hence, it is not surprising that their in vivo binding motifs are highly similar, as determined by chromatin immunoprecipitation experiments followed by sequencing (ChIP-Seq) (Figure 4.2B).
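For readers unfamiliar with how a figure such as "more than 65 % sequence identity" is typically derived from an alignment, the following minimal Python sketch computes pairwise percent identity over aligned columns. It is an illustration under stated assumptions: the function name is hypothetical, columns containing a gap in either sequence are ignored (other conventions exist), and the two short sequences are invented toy strings, not real SP/KLF zinc finger sequences.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Invented toy fragments (not real protein sequences), already aligned.
toy_a = "CHIQGCGKVY"
toy_b = "CHWDGCGKAY"
identity = percent_identity(toy_a, toy_b)  # 7 identical columns out of 10
```

Applied to every pair of DNA-binding domains in a real multiple alignment, the same calculation would give the matrix of identities from which a family-wide minimum can be read off.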


Figure 4.2: The DNA-binding domains of SP and KLF proteins are highly similar. (A) Aligned protein sequences of all members of the SP/KLF family of transcription factors. Shown is the three zinc finger DNA-binding domain of all proteins. Highlighted are residues -1, +3 and +6 of each zinc finger, which are responsible for making contact with DNA. (B) In vivo binding motifs of transcription factors SP1²²⁹, KLF1⁶⁰, KLF3²³⁰ and KLF4²³¹ as determined by ChIP-Seq. (Figure adapted from Lim W, 2016²³².)

If all members of the SP/KLF family recognise highly similar DNA sequences, it is inevitable that some of them also share the same target genes. For instance, KLF2, KLF4 and KLF5 are all involved in the regulation of Esrrb, Fbxo15, Nanog and Tcl1 in embryonic stem cells233. It is hard to imagine, though, that all SP and KLF proteins in the cell are competing for the same binding sites. This means that there must be other factors influencing the regulation of SP/KLF-regulated genes. While the DNA-binding domains of SP proteins and KLFs are highly similar, their functional domains vary and determine interactions with cofactors that can alter DNA-binding specificity and their activation or repression potential230,234. Some KLFs act mainly as activators, such as KLF1⁴¹, whereas others, such as KLF3⁴⁷, are predominantly repressors. Importantly, some of the family members show tissue-specific expression patterns, limiting their availability for gene regulation to only a few cell types.


In this chapter we reinvestigated the SP/KLF family of transcription factors and their potential to bind to and activate the -198 T>C γ-globin promoter. Because of its abundance in erythroid tissues41, we hypothesised that the erythroid-specific activator KLF1 was a promising candidate for the family member whose binding was altered by the -198 T>C mutation.

4.2 KLF1 binds to the -198T>C mutation in vitro

The first step in testing our hypothesis was to examine binding of KLF1 to the -198 T>C γ-globin promoter in vitro. We performed EMSAs using radiolabelled WT and -198 T>C γ-globin promoter probes (-209 to -187) and nuclear extracts from COS cells overexpressing full-length KLF1 (Figure 4.3A). With the WT probe we could not detect any binding of KLF1; incubation with the -198 T>C probe, however, resulted in a retarded protein-DNA complex upon addition of KLF1 (lane 5), which supershifted with an antibody against KLF1 (lane 6). The endogenous COS cell bands that appear in lanes 4-6 are presumably abundantly expressed proteins, one of which has been identified as SP1 in K562 cells and in non-erythroid cells225,226. Binding of full-length proteins is often difficult to visualise in EMSAs, while smaller fragments of the protein can be detected more easily. Hence, we tested binding of the DNA-binding domain of KLF1 alone to the γ-globin promoter (Figure 4.3B). Again we saw binding of KLF1-ZF only to the probe containing the -198 T>C mutation and not to the WT probe. These findings support our hypothesis that the -198 T>C mutation creates a de novo binding motif for the transcriptional activator KLF1.


Figure 4.3: KLF1 binds to the -198 T>C promoter in vitro. EMSAs showing binding of full-length KLF1 (A) and KLF1 zinc fingers (B) to WT and -198 T>C γ-globin promoter probes. KLF1 was overexpressed in COS cells and nuclear extracts were incubated with radiolabelled probe (-209 to -187) before gel electrophoresis. Incubation of KLF1 with the -198 T>C mutant probe resulted in a retarded band that was supershifted with an antibody against KLF1.

4.3 Other KLF proteins also bind to the -198T>C mutation in vitro

Although we showed that KLF1 can bind to the -198 T>C γ-globin promoter in vitro, the highly conserved DNA-binding domains of the KLF/SP family mean that other family members were likely to behave similarly. We next investigated the abilities of all readily available KLFs to bind to the -198 T>C γ-globin promoter in vitro. This was done by transiently overexpressing KLF2 to 13 in Cos cells. Nuclear extracts were then incubated with radiolabelled probes for the WT and the -198 T>C γ-globin promoter. As we did not have antibodies to all KLFs to allow us to perform Western blots to confirm protein expression, we performed an EMSA with a probe containing a consensus CACCC box from the β-globin promoter that should allow binding of most KLFs in vitro and provide expression information. The sequences of the CACCC β-globin promoter probe and the γ-globin promoter are shown in Figure 4.4A. We detected binding of all KLFs to the control CACCC probe, with some KLFs showing stronger interaction with DNA than others, suggesting that all proteins were expressed but that expression levels were variable between family members (Figure 4.4B). Binding of all KLFs to the WT γ-globin promoter was negligible, with most of them showing no interaction at all (Figure 4.4C). We detected weak DNA-protein interactions with the WT probe for KLF2, 4, 8, 9, 11 and 13, which are also the proteins that bound most strongly to the CACCC control probe, suggesting that there are higher levels of expressed protein in these samples. The -198 T>C promoter probe was bound by all KLFs with comparable or even stronger affinity than we observed with the CACCC β-globin promoter control probe (Figure 4.4D). These findings led us to the conclusion that the -198 T>C mutation creates a typical CACCC box that allows many KLFs to bind and potentially activate the gene.

Figure 4.4: The -198 T>C mutation creates a binding site for many KLFs. (A) Sequences of the DNA probes used in this experiment. The KLF/SP binding site is highlighted in red. The murine β-globin CACCC probe was used as a positive control for binding of KLFs. EMSAs showing binding of KLF1-13 to the β-globin CACCC probe (B), the WT γ-globin promoter (C) and the -198 T>C mutant γ-globin promoter (D). Probes were labelled with radioactive phosphorus-32 to the same activity. KLFs were overexpressed in Cos cells and nuclear extracts were incubated with equal amounts of radiolabelled probe before gel electrophoresis. All gels were exposed to a phosphorimager screen overnight. Red asterisks mark KLF transcription factor-DNA complexes.

4.4 KLF3 and KLF1 are the most abundant KLFs in erythroid cells

As outlined above, the -198 T>C mutation activates γ-globin expression only in erythroid cells. While KLF1 stands out as the obvious candidate, as it is known as one of the master transcription factors in red blood cells, other KLFs have also been identified as important regulators of erythropoiesis. For instance, KLF2, closely related to KLF1235, has been shown to be necessary for early erythropoiesis, driving the expression of embryonic globins in mice and humans236. We therefore decided to examine relative expression levels of all KLFs in erythroid cells and also included the first three members of the SP protein family, SP1, SP2 and SP3.

It is known that the mutation raises foetal haemoglobin levels in adult erythroid cells. Hence, we looked at expression levels of KLFs in human primary CD34+ cells derived from adult bone marrow. These cells can be differentiated from haematopoietic stem cells (week 1) to late stage erythroblasts (week 3) ex vivo237. The differentiation of the cells was done by Laura Norton (Crossley lab) and the differentiation procedure is described in detail in her unpublished PhD thesis73. We monitored mRNA levels of KLFs over a period of three weeks, sampling at weeks 1, 2 and 3 (Figure 4.5). We determined relative mRNA levels by quantitative real-time PCR using primers spanning exon-exon boundaries. To confirm equivalent primer efficiency we ran standard curves with serially diluted cDNA for each primer pair (Figure II.2 in the Appendix). However, some KLFs were expressed at such low levels that we were unable to determine primer efficiency by standard curves. Moreover, we failed to design a suitable primer pair for KLF15 due to the high GC content in the exon-exon boundary region. KLF14 is an intronless gene, so primers will also amplify any genomic DNA contamination in the sample. Ct values for KLF14 were very low in all samples and almost identical to Ct values from -RT samples, leading us to the conclusion that expression of KLF14 was very low to undetectable. KLF17 levels were also very low to undetectable at all stages of differentiation and are therefore not shown in the graph.
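The standard-curve check described above can be illustrated with a minimal sketch. It assumes the common convention that efficiency is derived from the slope of Ct against log10 of the relative template amount (efficiency = 10^(-1/slope) - 1); the thesis does not state the exact calculation used, and the function name and dilution values below are hypothetical.

```python
import math


def primer_efficiency(dilutions, ct_values):
    """Estimate qPCR primer efficiency from a standard curve.

    dilutions: relative template amounts (e.g. 1, 0.1, 0.01), all > 0
    ct_values: Ct measured at each dilution
    Returns (slope, efficiency); efficiency 1.0 means perfect doubling per cycle.
    """
    if len(dilutions) != len(ct_values) or len(dilutions) < 2:
        raise ValueError("need at least two matched dilution/Ct pairs")
    xs = [math.log10(d) for d in dilutions]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    # Ordinary least-squares slope of Ct versus log10(template amount).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    efficiency = 10 ** (-1.0 / slope) - 1.0
    return slope, efficiency


# Hypothetical ten-fold dilution series with ideal doubling per cycle.
slope, eff = primer_efficiency(
    [1.0, 0.1, 0.01, 0.001],
    [20.0, 23.321928, 26.643856, 29.965784],
)
print(round(slope, 3), round(eff, 3))
```

Primer pairs whose efficiencies are close to one another can then be compared directly by their Ct values, which is the premise of the relative expression comparison in Figure 4.5.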

KLF2, KLF5, KLF8 and KLF12 are expressed at very low levels both in uninduced HSCs and in HSCs after erythroid differentiation. KLF4 is expressed at low levels in HSCs and is downregulated upon differentiation at the week 2 time point and even further at week 3. The expression of most other KLFs was induced upon differentiation, with KLF1 and KLF3 showing the greatest fold induction compared to HSCs in week 1. At the week 2 and 3 time points they are by far the most abundant of the KLFs we were able to measure in this experiment. SP1 was expressed at low to moderate levels while SP2 and SP3 were more abundant. Overall, these findings were in agreement with what other researchers have found by RNA-Seq in adult bone marrow238 and in in vitro differentiated CD34+ cells239 (Appendix III, Figure III.1).

Figure 4.5: KLF1 and KLF3 become the most abundant KLFs during erythropoiesis. KLF and SP gene expression during differentiation of human CD34+ bone marrow cells. Three separate erythroid differentiations of CD34+ bone marrow cells were performed and RNA was extracted at week 1, 2 and 3 of differentiation. qPCR was used to investigate the expression of KLF1 to 13, KLF16 and SP1 to 3. Levels were normalised to 18S rRNA levels and thresholds were kept the same throughout the experiment. Shown is the mean ± S.D. for each target.

Next, we investigated expression levels in two human erythroid cell lines that are routinely used in our lab: HUDEP2 and K562 cells (Figure 4.6). While K562 cells are an erythroleukemia cell line that mostly expresses embryonic and foetal-type globins, HUDEP2 cells are immortalised human erythroblasts derived from CD34+ bone marrow of a healthy adult donor and express mainly adult-type haemoglobin240. The expression pattern of KLFs in HUDEP2 cells resembles differentiated CD34+ bone marrow cells (Figure 4.6A). KLF1 and KLF3 are highly expressed to equivalent levels. KLF6, KLF8, KLF9-13, KLF16 and SP1-3 are expressed in moderate amounts, whereas KLF2, KLF4, KLF5 and KLF7 are scarcely expressed. Levels of KLF12 and KLF17 were undetectable. Once differentiated, mature erythrocytes almost exclusively contain KLF1, KLF3 and SP3 mRNA239 (Appendix III, Figure III.1B). Hence, the presence of other KLF mRNA in HUDEP2 cells confirms the progenitor state of the cells but the high amounts of KLF1, KLF3 and SP3 indicate their commitment towards the erythroid lineage.

Historically, most of the experiments investigating the -198 T>C mutation were performed in K562 cells225,226. K562 cells show high levels of KLF3 and SP3; however, KLF1 is expressed only half as strongly as in differentiated HSCs or HUDEP2 cells (Figure 4.6B). KLF2, KLF4, KLF5, KLF7, KLF8 and KLF17 are only present in very low amounts. KLF12 levels were undetectable and are therefore not shown in the graph. We found that K562 cells express moderate levels of KLF6, KLF9, KLF10, KLF11, KLF13, KLF16, SP1 and SP2.

Figure 4.6: Klf and Sp gene expression in routinely used human erythroid cell lines HUDEP2 (A) and K562 (B). Three biological replicates were prepared for each cell line. qPCR was used to investigate the expression of KLF1 to 13, KLF16, 17 and SP1 to 3. Levels were normalised to 18S rRNA levels and thresholds were kept the same throughout the experiment. Shown is the mean ± S.D. for each target.

4.5 KLF1 activates the -globin promoter in a reporter assay As demonstrated above, KLF1 an abundant KLF protein in human erythroid cells and can bind to the -globin promoter in vitro. Accordingly, our next step was to investigate if KLF1 can activate the –198 T>C -globin promoter in vitro. We performed transient activation assays with reporter plasmids that express firefly luciferase when activated. We compared

Page | 73

CHAPTER 4: The -198 T to C mutation

KLF1’s potential to activate the WT and the -198 T>C mutant -globin promoter in SL2 cells (Figure 4.7A). Both constructs contained the -385 to +50 region of the G-globin promoter cloned into pGL4.10. The empty promoter-less pGL4.10 plasmid was used as a negative control.

Figure 4.7: KLF1 activates the -198 T>C γ-globin promoter in vitro. The effect of KLF1 (A) and KLF3 (B) on the activation of the -198 T>C γ-globin promoter was studied by transient transfection assays in SL2 cells. SL2 cells were transfected with pGL4.10 reporter plasmid containing no promoter (empty), or the WT or the -198 T>C γ-globin promoter (-385/+50). An expression plasmid for KLF1 or KLF3 was co-transfected in increasing amounts. After 48 h, the cells were harvested and assayed for luciferase activity. Data are the mean ± S.D. from three independent experiments. Luciferase activity is significantly higher in cells transfected with the maximum KLF1 dose and the -198 T>C promoter compared to the WT (*p=0.016, determined by unpaired two-tailed t-test). SL2 cells transfected with KLF3 did not show any significant difference in luciferase activity.

The empty pGL4.10 reporter showed very low background levels of luciferase activity and addition of KLF1 did not activate the reporter. The WT γ-globin promoter showed a generally higher level of background activation without KLF1, which increased further upon addition of KLF1 in a dose-dependent manner. Transfection of the -198 T>C γ-globin promoter also resulted in higher background activation than empty pGL4.10. However, adding increasing amounts of KLF1 elevated promoter activity further, and the maximum dose of KLF1 resulted in significantly higher firefly luciferase signals than with the WT promoter. When comparing the firefly signals achieved with the maximum dose of KLF1 to those with no KLF1 for each promoter individually, the WT promoter is activated around 9-fold, whereas the reporter output with the -198 T>C mutation is approximately 45-fold higher than without KLF1. These data suggest that KLF1 is indeed a potent activator of the -198 T>C γ-globin promoter in vitro.
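The fold-activation comparison above amounts to dividing the mean reporter signal at the maximum activator dose by the mean signal without activator, separately for each promoter. The following is a minimal sketch of that ratio; the function name and the luciferase readings are hypothetical placeholders, not data from this study.

```python
from statistics import mean


def fold_activation(with_activator, without_activator):
    """Ratio of mean reporter signal with activator to mean signal without it."""
    baseline = mean(without_activator)
    if baseline <= 0:
        raise ValueError("baseline luciferase signal must be positive")
    return mean(with_activator) / baseline


# Hypothetical firefly luciferase readings (arbitrary units), three experiments each.
wt_no_klf1 = [100.0, 110.0, 90.0]
wt_max_klf1 = [900.0, 950.0, 850.0]
print(fold_activation(wt_max_klf1, wt_no_klf1))
```

Computing the ratio per promoter, rather than comparing raw signals, separates the effect of the activator from differences in background activity between the WT and mutant constructs.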

As we found that KLF3 is also highly abundant in erythroid cells, we next investigated the activating potential of KLF3 in a similar assay (Figure 4.7B). Again, transfection of the WT and -198 T>C γ-globin promoter resulted in higher background activation of the luciferase reporter. However, this time co-transfection with increasing amounts of KLF3 did not change the level of luciferase activity of either the WT or the -198 T>C γ-globin promoter, indicating that KLF3 is not capable of transactivating the γ-globin promoter in this assay.

4.6 The effect of the -198 T>C mutation on γ-globin expression in a human cell model

To study the -198 T>C mutation in more detail we decided to engineer HUDEP2 cells to stably carry the mutation. HUDEP2 cells appear to resemble primary adult erythroblasts most closely in their expression patterns of globins and KLF/SP proteins and should represent a good cellular model of -198 T>C HPFH181.

In this study we wanted to keep the experimental design as close as possible to the actual state of HPFH in people. Technological advances that occurred throughout my PhD candidature allowed us to approach this HPFH mutation in a way that removed some of the limitations of the work on the -175 T>C mutation shown in Chapter 3. Hence, we decided to introduce the mutation solely by making a single base pair substitution using the CRISPR/Cas9 system and refrained from introducing artificial fluorescence or selection markers. We transfected the cells with a small guide RNA that leads the Cas9 protein close to the site of the -198 T>C mutation in the genome and additionally added a single-stranded oligodeoxynucleotide (ssODN) containing the -198 T>C mutation and 75 bp arms of homology on either side of the mutation (Figure 4.8A). As a negative control, we also mock nucleofected HUDEP2 cells without adding plasmid DNA or donor oligo to the nucleofection mixture. After establishing monoclonal populations, genomic DNA was extracted. The clones were screened by PCR across the γ-globin promoter and the presence of the mutation was confirmed by Sanger sequencing of the PCR product. Interestingly, successful integration of the -198 T>C mutation was much more frequent than the integration of other HPFH mutations in that region (see Chapter 5). Almost every clone that we screened contained at least one allele with the -198 T>C mutation; however, the second allele often displayed random insertions and/or deletions. Out of 30 clones that were screened, two showed a heterozygous and seven a homozygous genotype. We considered the possibility of large deletions across the promoter region of one allele such that our PCR primers would not detect the deleted allele; such clones would falsely appear as homozygous -198 T>C by sequencing. We tested for this possibility by running another genomic PCR with primers further up- and downstream of the targeted site in the promoter (Figure 4.8B). Without any deletions the PCR product would have an estimated size of 2.5 kb; if larger deletions were present, however, the product would be smaller. We detected large deletions of approximately 800-1500 bp in clones 12 and 14, while clone 19 showed an unexpected insertion. These three clones were excluded from further analysis. Clone 16 failed to be expanded beyond the 96-well cell culture scale and was therefore discarded.

As there are a total of four γ-globin promoters with identical sequence in the genome of HUDEP2 cells, the sgRNA/Cas9 complex is likely to target all of them. Thus, we next investigated whether our genome editing strategy had resulted in a collapsed locus, similar to what we observed in our K562 tdTomato cells (Chapter 3). We determined the copy number of the γ-globin promoter and of the intergenic region between the two γ-globin genes by running quantitative real-time PCR on genomic DNA of HUDEP2 WT cells and the clonal populations. We normalised the data against Ct values of an unrelated gene (ZFPM1) and then set the copy number of HUDEP2 WT cells to 2 for the intergenic region and to 4 for the γ-globin promoter (Figure 4.8C). Mock nucleofected clonal populations and heterozygous clones 15 and 17 showed an intact γ-globin locus with four copies of the promoter and two copies of the intergenic region. Data for the three homozygous clones indicated that the globin locus had collapsed, as they showed no amplification of the intergenic region and only half the copy number for the γ-globin promoter (Figure 4.8D). The Sanger sequencing tracks of the five clones that were analysed and used for further experiments are shown in Figure 4.8E.
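The copy number estimate described above can be sketched as a comparative Ct calculation: the target Ct is normalised to the reference gene (ZFPM1) and then scaled to the known copy number in WT cells. This sketch assumes roughly 100 % amplification efficiency for both amplicons, which the text does not state; the function name and Ct values are hypothetical.

```python
def copy_number(ct_target, ct_reference, ct_target_wt, ct_reference_wt, wt_copies):
    """Relative copy number by the comparative Ct method (assumes ~100% efficiency).

    ct_target / ct_reference: Ct of the target amplicon and the reference gene in the clone
    ct_target_wt / ct_reference_wt: the same measurements in WT cells
    wt_copies: known copy number of the target in WT cells (e.g. 4 for the promoter)
    """
    delta_sample = ct_target - ct_reference
    delta_wt = ct_target_wt - ct_reference_wt
    # One extra cycle relative to WT corresponds to half as many template copies.
    return wt_copies * 2.0 ** -(delta_sample - delta_wt)


# Hypothetical Ct values: the promoter amplifies one cycle later in the clone than in WT.
print(copy_number(ct_target=25.0, ct_reference=25.0,
                  ct_target_wt=24.0, ct_reference_wt=25.0, wt_copies=4))
```

Under these assumptions, a one-cycle delay of the promoter amplicon relative to WT corresponds to two copies instead of four, the pattern reported here for the homozygous clones.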

Figure 4.8: Establishment of clonal HUDEP2 cell populations carrying the -198 T>C mutation. (A) Genome editing strategy to introduce the -198 T>C mutation into HUDEP2 cells. The sgRNA is designed to cut close to the -198 site in the γ-globin promoter. An ssODN containing the T>C substitution at position -198 and 75 bp arms of homology on either side of the -198 site was co-transfected with the sgRNA/Cas9 plasmid and used as a donor template for HDR. (B) PCR from genomic DNA of monoclonal HUDEP2 populations spanning a region of ~2.5 kb across the γ-globin promoter. PCR on HUDEP2 WT cells (pool) was run as a control. Large deletions were detected in clones #12 and #14 and an insertion was identified in clone #19. (C) Copy number analysis in HUDEP2 WT cells (pool) and clonal populations of WT (Mock1 and 2) and -198 T>C cells. Copy number was determined by quantitative real-time PCR across the γ-globin promoter (HBG) and the intergenic region (-3.1 kb upstream of Aγ). Ct values were normalised against Ct values for PCR across ZFPM1 and then compared to known copy numbers in HUDEP2 WT cells. (D) Schematic of the γ-globin locus in unedited HUDEP2 WT cells and in heterozygous and homozygous -198 T>C HUDEP2 cells after genome editing. Blue arrows indicate qPCR primers at the γ-globin promoter (HBG), red arrows indicate qPCR primers in the intergenic region (-3.1 kb). A red cross indicates the presence of the -198 T>C mutation. Homozygous -198 T>C HUDEP2 cells are missing the Gγ-globin gene and the intergenic region. (E) Aligned Sanger sequencing tracks of HUDEP2 clones carrying at least one copy of the -198 T>C mutation. These five clones were used for further analysis.

4.7 The -198 T>C mutation increases foetal globin expression in HUDEP2 cells

To explore whether the -198 T>C HPFH mutation indeed increases foetal globin expression in our cellular model, we looked at mRNA expression levels of all β-like globins by quantitative RT-PCR (Figure 4.9). In HUDEP2 cells foetal globin accounts for only about 1-2 % of total β-like globin chains. Embryonic globin represents an even smaller fraction of total β-like globins (<0.2 %) and is therefore not shown in Figure 4.9A. The majority of β-like globins expressed in HUDEP2 cells are adult-type β-globin chains and a small percentage are δ-globin chains. This expression pattern resembles the composition of haemoglobin in adult humans.

Primary CD34+ cells often express a high basal level of γ-globin when cultured ex vivo240, which may be induced by cellular stress due to the culturing conditions241. HUDEP2 cells are derived from CD34+ bone marrow HSCs and we considered that they may display similarly elevated γ-globin levels when stressed. By mock nucleofecting HUDEP2 cells and establishing clonal populations by FACS we ensured that these control populations went through the same amount of stress as the -198 T>C edited cells. We investigated a total of five monoclonal WT populations and none of them showed significant upregulation of γ-globin expression compared to the pool of HUDEP2 cells that had not been nucleofected or sorted, leading us to the conclusion that our editing and screening procedures alone do not elevate foetal γ-globin expression in HUDEP2 cells (Figure 4.9 and data not shown). We then compared the expression of all β-like globins in WT cells (pool) and two of the mock WT populations to that of our HUDEP2 cells carrying the -198 T>C mutation. The two heterozygous clones 15 and 17, which have an intact globin locus, expressed up to 12 times more γ-globin than WT cells (Figure 4.9B). Up to 9 % of all β-like globins were γ-globin chains. Interestingly, γ-globin expression in the three homozygous clones was dramatically boosted, with between 250- and 460-fold upregulation compared to WT. Their globin expression profile consists almost exclusively of γ-globin chains (between 83 and 95 %) (Figure 4.9). The raw qPCR data can be found in Figure III.5 in the Appendix.
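The two readouts used above, globin share as a percentage of total β-like globins and fold change relative to WT, can be sketched from 18S-normalised Ct values. The sketch assumes the comparative Ct method with roughly 100 % primer efficiency, which is an assumption rather than the documented procedure; the gene symbols follow standard nomenclature (HBB, HBD, HBG) and all Ct values are hypothetical.

```python
def relative_level(ct_gene, ct_18s):
    """Transcript level relative to 18S rRNA, assuming ~100% primer efficiency."""
    return 2.0 ** -(ct_gene - ct_18s)


def globin_percentages(ct_by_gene, ct_18s):
    """Express each globin as a percentage of the summed beta-like globin signal."""
    levels = {gene: relative_level(ct, ct_18s) for gene, ct in ct_by_gene.items()}
    total = sum(levels.values())
    return {gene: 100.0 * level / total for gene, level in levels.items()}


def fold_change(ct_gene, ct_18s, ct_gene_wt, ct_18s_wt):
    """Fold change of one transcript in a clone relative to WT cells."""
    return relative_level(ct_gene, ct_18s) / relative_level(ct_gene_wt, ct_18s_wt)


# Hypothetical Ct values for a WT-like sample.
shares = globin_percentages({"HBB": 15.0, "HBD": 20.0, "HBG": 21.0}, ct_18s=10.0)
print({gene: round(pct, 2) for gene, pct in shares.items()})

# Hypothetical clone in which HBG amplifies three cycles earlier than in WT.
print(fold_change(ct_gene=18.0, ct_18s=10.0, ct_gene_wt=21.0, ct_18s_wt=10.0))
```

Because each cycle corresponds to a doubling under this assumption, a shift of three cycles is an 8-fold change.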

Figure 4.9: The -198 T>C mutation in the γ-globin promoter drives expression of γ-globin. (A) mRNA expression levels of β-, δ- and γ-globins displayed as a percentage of total β-like globins. mRNA levels were determined by qPCR and Ct values were normalised to 18S rRNA levels. Shown are HUDEP2 WT cells (pool), two mock nucleofected WT clones and five clones carrying at least one allele with the -198 T>C mutation. (B) Fold change of γ-globin transcript levels compared to HUDEP2 WT cells. mRNA levels were normalised to 18S rRNA levels. γ-globin expression of HUDEP2 WT cells (pool) was set to 1.

4.8 KLF1, KLF3 and SP1 bind to the -198 T>C mutation in vivo

After establishing that the -198 T>C mutation elevates foetal globin levels in human erythroid HUDEP2 cells, we continued this study by investigating the binding of transcription factors to the -198 T>C γ-globin promoter in living cells. We began by performing chromatin immunoprecipitation experiments for KLF1 using the three homozygous -198 T>C HUDEP2 clones and unmodified WT HUDEP2 cells as a control (Figure 4.10A). IgG pulldown experiments were also performed to establish the background level of DNA pulldown by the antibody, and enrichment of all targets was low. The SP1 promoter contains a typical CACCC element allowing KLF1 to bind, and thus we used this region as a non-erythroid positive control for KLF1 occupancy. We also examined binding of KLF1 to the β-globin promoter as a second positive control because KLF1 is known to drive the expression of β-globin in HUDEP2 cells. Moreover, we were interested to see if there were any changes in binding of KLF1 to the β-globin promoter in our -198 T>C cells. As a negative control we used the ε-globin promoter. To account for differences between experiments and in the efficiency of ChIP, we decided to normalise the enrichment of KLF1 at each target to the enrichment of KLF1 at the SP1 promoter. The raw qPCR data can be found in Figure III.2 in the Appendix. Binding of KLF1 to the SP1 promoter should not be affected by globin expression and thus represents an appropriate internal control for the efficiency of the pulldown in each experiment. In WT HUDEP2 cells we saw enrichment of KLF1 at the positive control region in the SP1 promoter and also, as expected, to even higher levels at the β-globin promoter (Figure 4.10A). The γ-globin promoter and the ε-globin promoter regions showed similarly low enrichment compared to the positive controls. In all of the -198 T>C cells we saw consistently higher enrichment of the γ-globin promoter (~3- to 7-fold) compared to the WT. Interestingly, clones 9 and 11 also showed a decrease in binding of KLF1 to the β-globin promoter. Enrichment of the negative control region was mainly unchanged. These findings support our hypothesis that the -198 T>C mutation creates a functional de novo binding site for the activator KLF1.
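The normalisation described above, in which enrichment at each locus is divided by enrichment at the SP1 promoter from the same pulldown, can be sketched as follows. How the raw enrichment values were obtained (for example percent of input) is not specified here, so the sketch takes them as given; the function name, locus labels and numbers are hypothetical.

```python
def normalise_to_control(enrichment, control_locus="SP1"):
    """Divide ChIP enrichment at every locus by enrichment at the control locus."""
    control = enrichment[control_locus]
    if control <= 0:
        raise ValueError("control locus enrichment must be positive")
    return {locus: value / control for locus, value in enrichment.items()}


# Hypothetical raw enrichment values from one pulldown in WT and one in edited cells.
wt = normalise_to_control({"SP1": 2.0, "HBB": 6.0, "HBG": 0.4, "HBE": 0.4})
mutant = normalise_to_control({"SP1": 1.0, "HBB": 2.0, "HBG": 1.0, "HBE": 0.2})

# Fold change in normalised occupancy at the HBG promoter, edited versus WT.
print(mutant["HBG"] / wt["HBG"])
```

Dividing by the control locus means a pulldown that was globally less efficient, as in the hypothetical edited sample above, does not by itself register as reduced occupancy.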

As outlined above, the transcriptional repressor KLF3 is also expressed at high levels in erythroid cells. Hence, we tested the occupancy of KLF3 by ChIP in the same clonal populations carrying the -198 T>C mutation and in WT HUDEP2 cells (Figure 4.10B). KLF3 binds to sequences similar to those bound by KLF1; we therefore used the same positive and negative controls as defined for the KLF1 ChIP experiment. We saw enrichment of the SP1 promoter control region and also strong enrichment of the β-globin promoter in HUDEP2 WT cells. The γ- and ε-globin promoters showed low levels of enrichment after pulldown with KLF3 antibody. However, in the -198 T>C cells KLF3 consistently occupied the γ-globin promoter more (2- to 5-fold) than in the WT cells. Once more we saw a decrease in enrichment of the β-globin promoter in the cells carrying the -198 T>C mutation.

The next step was to investigate binding of SP1 to the -198 T>C promoter in these cells (Figure 4.10C). Again, we were able to use the same positive and negative control regions for our ChIP experiment. We saw that SP1 strongly binds to its own promoter region and does not occupy the β-globin promoter in HUDEP2 WT cells. We were unable to see any strong enrichment of the γ- or ε-globin promoter after pulldown with SP1 antibody. In the -198 T>C cells there was an increase in binding of SP1 to the γ-globin promoter (~2- to 3-fold). No difference in enrichment was observed for the β-globin or the ε-globin promoter.

These data suggest that the -198 T>C mutation creates a site for binding of KLF1, KLF3 and SP1 in vivo with KLF1 showing the biggest increase in de novo binding.

Figure 4.10: The -198 T>C globin promoter mutation allows binding of KLF1, KLF3 and SP1 in vivo. ChIP-qPCR analysis of the relative enrichment of KLF1 (A), KLF3 (B) and SP1 (C) at various genomic loci in HUDEP2 WT and -198 T>C cells. The tested genomic loci were the globin promoter (HBG), the globin promoter (HBB) and the globin promoter (HBE, -ctrl). The SP1 promoter which all three factors bind in vivo served as a positive control (+ctrl) for successful pulldown with the respective antibody. Enrichment of other genomic loci was normalised against enrichment of the SP1 promoter to account for differences between experiments and pulldown efficiencies of antibodies.

4.9 KLFs compete with ZBTB7A for binding to the -198 T>C γ-globin promoter

Thus far, we have demonstrated that KLF1 and KLF3 are highly expressed in erythroid cells and cell lines, and that the -198 T>C γ-globin promoter mutation leads to elevated foetal γ-globin expression in human HUDEP2 cells and creates binding sites for KLF1, KLF3 and SP1 in vitro and in vivo. KLF1 and SP1 mainly function as transcriptional activators and we have shown above that KLF1 can activate the -198 T>C promoter in vitro. Taken together, these facts raised the question of how binding of the repressor KLF3 to the -198 T>C promoter in vivo can be reconciled with elevated γ-globin levels.

Our lab has recently shown that the foetal haemoglobin repressor ZBTB7A binds to the -200 site in the γ-globin promoter in vitro and in vivo (unpublished; this will be the focus of Chapter 5). Most HPFH mutations found around the -200 site disrupt binding of ZBTB7A; the -198 T>C mutation, however, does not. Nevertheless, the γ-globin gene is derepressed when the -198 T>C mutation is present, which can be explained by binding of the activator KLF1 but not by de novo binding of the repressor KLF3. Hence, we considered a model in which different KLF and SP transcription factors compete for binding to the same recognition site as ZBTB7A and binding of one may displace another. As a proof-of-principle experiment, we performed an in vitro DNA-binding assay investigating the binding of ZBTB7A zinc fingers and full-length KLF3 to the WT and the -198 T>C γ-globin promoter (Figure 4.11). As expected, we observed ZBTB7A bound to both the WT and the -198 T>C promoter in our assay (lanes 2 and 8). Interestingly, KLF3 also bound to both promoters, but much more weakly to the WT than to the -198 T>C promoter (lanes 3 and 9). Accordingly, increasing the amount of KLF3 was sufficient to disrupt binding of ZBTB7A to the -198 T>C promoter (lanes 10-12) but not to the WT promoter (lanes 4-6).

These findings indicate that KLF3 can compete with ZBTB7A for occupancy at the -198 T>C γ-globin promoter because they share an overlapping binding site.

Figure 4.11: KLF3 competes with ZBTB7A for binding to the -198 T>C γ-globin promoter. EMSA showing binding of ZBTB7A zinc fingers (ZBTB7A ZF) and full-length KLF3 to the WT and the -198 T>C γ-globin promoter. Proteins were overexpressed in Cos cells and nuclear extracts were incubated with radiolabelled γ-globin promoter probe (-209 to -187).

4.10 Chapter 4 Discussion

In this chapter we have reinvestigated the -198 T>C mutation in the γ-globin promoter that is known to cause British HPFH. Many HPFH mutations are found in the region around 200 bp upstream of the transcription start site of γ-globin and we were able to show that they disrupt binding of the foetal globin repressor ZBTB7A in vitro (see Chapter 5). However, the -198 T>C mutation does not affect the affinity of ZBTB7A for the γ-globin promoter in vitro. In the late 1980s it was suggested that the mutation creates a binding site for the widely expressed activator SP1225,226. SP1 is one of many members of a large transcription factor family that includes Krüppel-like factors 1-17 and SP proteins 1-9. As the DNA-binding domains of these factors share high levels of homology and are known to bind to very similar DNA motifs in vitro and in vivo, we reopened the question of which factor drives the expression of γ-globin in the case of British HPFH. The activator KLF1, the founding member of the KLF family, is known to be highly expressed in erythroid tissues and therefore stood out as an obvious candidate. We were able to show by in vitro binding assays that the -198 T>C mutation indeed creates a de novo binding site for KLF1. Additionally, we demonstrated in a similar in vitro assay that other KLFs are also able to bind to the -198 T>C promoter, from which we concluded that the -198 T>C promoter contains a typical CACCC box allowing recognition by many KLFs and SP proteins with highly similar DNA-binding domains.

However, it is hard to imagine that the -198 T>C γ-globin promoter is occupied by all KLF and SP proteins in vivo. There are two main factors that determine the in vivo binding capacity of a transcription factor (TF). Firstly, the absolute abundance of the TF controls the frequency with which the TF can make contact with regulatory elements in the genome and therefore directly influences the increase or decrease of target gene expression. Different tissues contain variable absolute amounts of TFs according to the desired target gene output in those cells. The -198 T>C mutation, when found in people, is known to raise foetal haemoglobin levels in red blood cells by 6- to 10-fold in heterozygotes and up to 23-fold in homozygotes113,114. Thus we concluded that only KLF/SP proteins that are expressed at moderate levels or higher in these cells would be able to affect foetal globin output to that extent. Our data suggested that KLF1, which we found is expressed at very high levels after erythroid differentiation of HSCs, can bind to the -198 T>C promoter in vivo. Secondly, ongoing studies in our lab have shown that although the DNA-binding domain is the part of the protein that makes direct contact with the DNA, it is not the only factor that mediates occupancy of transcription factors at regulatory elements230,234. Burdach and Lim et al. were able to show that regions outside of the DNA-binding domain are crucial in determining whether a transcription factor can find its target in vivo, likely due to protein-protein interactions with other DNA-binding proteins or cofactors. Again, KLF1 has been described to interact with other erythroid-specific transcription factors to regulate the expression of globins and other erythroid genes48,243.

Our HUDEP2 -198 T>C cells allowed us to study the -198 T>C mutation in a cellular environment that resembles adult erythrocytes. HUDEP2 cells normally express high levels (97-98 %) of adult β-globin chains, small amounts of adult δ-globin chains, no embryonic ε-globin chains and up to 1-2 % of foetal γ-globin chains. This is consistent with the composition of haemoglobin in healthy adults, which normally consists of 95-98 % HbA, 2-3 % HbA2 and less than 2 % HbF. People who are heterozygous for the -198 T>C mutation show elevated HbF levels of between 3.5 and 10 %. Two of the HUDEP2 clones we screened were heterozygous for the mutation, with γ-globin ratios of 6 and 9 %, respectively. Thus, the heterozygous phenotype of our cells is a good model for heterozygous individuals with British HPFH.

ChIP-qPCR experiments in homozygous HUDEP2 -198 T>C cells showed a 3- to 7-fold increase in binding of KLF1 to the promoter, further supporting the hypothesis that KLF1 activates it in vivo. KLF1 occupancy at the β-globin promoter went down in two of the clones carrying the -198 T>C mutation. This could be explained by the fact that KLF1, which usually drives the expression of β-globin, is not required at the β-globin promoter to the same extent as in the WT cells. Given that the -198 T>C clones mainly express γ-globin and show reduced levels of β-globin, this is a plausible explanation for decreased binding of KLF1 to the promoter. We also saw an increase in enrichment of SP1 at the -198 T>C promoter compared to the WT, which was slightly more modest than what we saw with KLF1. Interestingly, however, SP1 did not greatly occupy the globin promoter although it has been shown to display strong affinity in vitro225. This shows once more that binding of TFs to their targets is not solely dependent on their DNA-binding domain but also depends on other interacting factors and the promoter context.

Our data are insufficient to answer the question of whether KLF1 or SP1 is driving the expression of γ-globin in vivo. Both factors have the potential to drive the expression of the -198 T>C promoter in transient transactivation assays using SL2 cells, as shown by us and others244. In our experiments KLF1 was capable of activating the -198 T>C γ-globin promoter 46-fold whereas it activated the WT promoter only 9-fold; other groups have seen similar effects with SP1244. As SL2 cells lack endogenous SP-like proteins245, we can conclude from these experiments that KLF1 and SP1 can both act independently to activate the -198 T>C promoter. The fact that -198 T>C promoter activity is increased only in erythroid and not in non-erythroid cells226 suggests that erythroid-specific factors are required to reveal the full activating potential of the HPFH mutation. Binding of KLF1 and SP1 in SL2 cells may allow recruitment of the basal transcription machinery, thus elevating promoter activity, but that does not properly reflect the process of gene activation in erythroid cells.

As an erythroid-specific transcription factor, KLF1 is known to regulate the expression of many erythroid genes in conjunction with other erythroid TFs. For instance, ChIP-Seq data have revealed a striking overlap of KLF1 target genes with peaks from GATA1, LDB1 and TAL1 ChIP-seqs246. Binding sites for GATA1 and KLFs (including SP1) are often found in close proximity to each other in promoters and enhancers of many erythroid-specific genes, and these genes seem to be cooperatively activated by GATA1 and KLF1/SP1247. Furthermore, Merika et al. showed that GATA1 and KLF1/SP1 can physically interact via their DNA-binding domains, allowing recruitment of GATA1 even without the need for the presence of a GATA site248. We therefore propose that one possibility of how the -198 T>C mutation mechanistically activates γ-globin expression is through de novo recruitment of KLF1 and SP1, which in turn physically interact with and recruit GATA1 (Figure 4.11A). There are two GATA sites in close proximity to the -198 site (at positions -175 and -185) which could allow binding of GATA1 directly to the γ-globin promoter when recruited by KLF1 or SP1. However, as previously mentioned, interaction of GATA1 with DNA is not crucial for its recruitment to the promoter. Both scenarios should result in an increase in binding of GATA1, either directly or indirectly, to the -198 T>C γ-globin promoter and could be tested by chromatin immunoprecipitation experiments for GATA1. GATA1 binds to the locus control region and is often found in a pentameric complex with TAL1/E47, LMO2 and LDB1249. Recruitment of GATA1 by KLF1 or SP1 could therefore also induce looping of the LCR towards the γ-globin promoter, in the same way as we observed upon de novo binding of TAL1 in the -175 T>C cells250 (Chapter 3). This hypothesis could be verified by chromatin conformation capture assays in HUDEP2 -198 T>C cells.

Figure 4.11: Proposed mechanisms of γ-globin activation in British HPFH. (A) Schematic of de novo activation of the γ-globin gene by recruitment of KLF1 or SP1 to the -198 T>C promoter. Direct interaction with GATA1 could enhance looping of the LCR induced by GATA-associated factors. KLF1 and SP1 can also interact with the coactivator p300 and the chromatin remodelling complex SWI-SNF via the BRG1 subunit to activate transcription. (B) Derepression of the γ-globin gene by competing KLF/SP proteins. ZBTB7A binding is not disrupted by the -198 T>C mutation; however, KLFs and SP proteins can compete for the same binding site as ZBTB7A. Displacement of ZBTB7A and the associated NuRD complex may contribute to the derepression of the gene.

Another hypothesis is that KLF1 and SP1 directly affect the chromatin structure at the normally poised foetal globin locus. KLF1 and SP1 both specifically interact with the coactivator p300/CBP251. p300/CBP possesses intrinsic acetyltransferase activity which can alter histone tail modifications, affect the nucleosomal environment and result in an open chromatin structure252,253. Furthermore, p300 can also acetylate non-histone protein targets and has been shown to acetylate KLF1 at residue Lys-288. Acetylation of KLF1 is required for full activation of the globin promoter and it enhances KLF1’s affinity for the SWI-SNF chromatin remodelling complex254. Both KLF1 and SP1 can directly interact with this complex via the BRG1 subunit to structurally remodel the nucleosomes at the globin promoter and thus activate transcription255. We propose that similar mechanisms could apply in the case of the -198 T>C promoter.

Our data also indicated that KLF3 binds more strongly to the -198 T>C than to the WT promoter in vitro and in vivo. KLF3 has mainly been shown to repress genes in vivo through recruitment of the corepressor C-terminal binding protein (CtBP)256. Although it has the potential to activate some genes in vitro47, no direct target genes of KLF3 activation are known from in vivo studies. Furthermore, our results from the luciferase reporter assays indicated that KLF3 is incapable of transactivating the -198 T>C promoter in vitro. We therefore wondered what the role of KLF3 binding to the promoter in activating γ-globin expression could be. As outlined earlier, binding of the transcriptional repressor ZBTB7A is not disrupted by the -198 T>C mutation in vitro. In theory it could therefore bind to the mutant promoter, recruit the NuRD chromatin remodelling complex via direct interaction with the subunits GATAD2B, CHD3 and CHD8 (ref. 72) and silence expression (Figure 4.11B). However, the creation of a de novo binding site for KLF and SP proteins by the -198 T>C mutation now allows many new proteins to bind. Not all of them are potent erythroid activators like KLF1 or SP1, but they might still be able to reside on the binding site for long enough to sterically interfere with ZBTB7A occupancy at the promoter, which in turn could lead to the derepression of the globin gene. We tested this hypothesis in in vitro DNA-binding assays and were able to show that KLF3 and ZBTB7A indeed compete for binding to the mutant promoter.

Overall, it is likely that both mechanisms, direct activation by KLF1/SP1 and indirect derepression by competing TFs, occur at the same time in the cell. De novo binding of the activator TAL1 to the -175 T>C mutation results in much higher foetal haemoglobin levels in heterozygous individuals than has been observed with de novo binding of KLF1/SP1 in people with British HPFH. We suggest that competition for binding to the -198 T>C promoter by additional, non-activating KLFs could affect not only binding of the repressor ZBTB7A but also binding of the activators KLF1/SP1. This could be a reason for the comparatively lower activation potential of the -198 T>C promoter than the -175 T>C promoter in vivo. As a future direction, we are considering investigating further whether KLF1 is the key factor mediating HPFH. By introducing the -198 T>C mutation into the KLF1-inducible cell line K1ER we may be able to determine KLF1’s role in activating the -198 T>C promoter. K1ER cells are Klf1 null cells that were rescued with a KLF1 protein tethered to an estrogen receptor which is located in the cytoplasm257. Upon induction with tamoxifen, KLF1 is released into the nucleus to activate its target genes. If KLF1 is the main driver of -198 T>C HPFH, γ-globin levels should increase once the cells are induced.

In our screen for edited HUDEP2 cell clones we managed to isolate three clones that carried the -198 T>C mutation in all γ-globin promoters. γ-globin expression was dramatically increased in these clones, resulting in up to 95 % foetal γ-globin chains of total β-like globins. This large increase cannot solely be explained by the homozygous state of the cells. Copy number analysis allowed us to deduce that genome editing in all three homozygous clones led to a collapsed globin locus, which caused deletion of the Gγ gene and the intergenic region between Aγ and Gγ on both alleles. The two remaining Aγ-globin genes are now regulated by the Gγ-globin promoter. From the literature available on the three homozygous individuals described, it is evident that the mutation occurred in the Aγ-globin promoter on both alleles and resulted in ~20 % HbF. In theory, the foetal globin output seen in our homozygous cells with two mutant promoters should be comparable to what was observed in the homozygous individuals and twice as high as what we saw in our heterozygous clones. However, this was not the case. One possible explanation could be that the intergenic region between the globin genes plays a major role in silencing the expression of foetal globin. By collapsing the locus we deleted the intergenic region and all regulatory elements that are contained in it. This could have resulted in loss of binding of important transcription factors that normally keep the globin locus silenced, and could have further activated globin expression in addition to the effects of the -198 T>C mutation. For instance, the transcription factor BCL11A represses foetal haemoglobin expression via a mechanism that is not completely understood, but it is known that long-range interactions and cooperation with SOX6 are necessary to keep the foetal globin genes silenced53. It is possible that certain chromatin loops are formed that require anchor points within the intergenic region to stabilise the repressive state of the globin genes. To investigate how much of the globin activation stems from the deletion of the intergenic region and how much is due to the HPFH mutation, one would have to collapse the globin locus in HUDEP2 cells without introducing an HPFH mutation and then compare expression levels of foetal globin. This is currently one of the ongoing projects in our lab.
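
The gap between the expected and the observed output can be made explicit with a short calculation. The sketch below assumes, purely for illustration, that foetal globin output scales linearly with the dose of mutant promoters, which is the naive expectation stated above; the function name and the linear model are assumptions introduced here and are not part of the analysis performed in this thesis. The inputs are the γ-globin ratios of the two heterozygous clones (6 and 9 %) and the highest value seen in the homozygous clones (95 %).

```python
def expected_homozygous_percent(heterozygous_percent: float) -> float:
    """Naive linear model (an assumption): doubling the mutant dose doubles the output."""
    return 2.0 * heterozygous_percent


# Percentages of gamma-globin reported in the text for the HUDEP2 clones.
heterozygous_clones = [6.0, 9.0]
observed_homozygous_max = 95.0

expected = [expected_homozygous_percent(p) for p in heterozygous_clones]
excess = observed_homozygous_max - max(expected)

print(f"Expected under a linear model: {expected[0]:.0f}-{expected[1]:.0f} %")
print(f"Observed maximum exceeds the upper expectation by {excess:.0f} percentage points")
```

Under this simple model the homozygous clones would be expected to reach roughly 12-18 % γ-globin, so most of the observed increase remains unaccounted for by allele dose alone.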

Lastly, we noticed when performing genome editing in HUDEP2 cells that homologous recombination events introducing the -198 T>C mutation were unexpectedly common compared to those introducing other HPFH mutations in the same cell type. Almost every clone that we screened had at least one allele with the T to C substitution at position -198. We believe that the abundance of integration is not necessarily a result of a higher frequency of HDR but a consequence of a quick natural repair mechanism that seems to recurrently result in the -198 T>C mutation. The occurrence of a string of cytidines on each side of the mutation could have an effect on the frequency of integration, as opposite strands can easily “slide along” when annealing and repairing the break. Although we do not have an explicit explanation for this phenomenon, we interpret it as a clear advantage when considering introducing HPFH mutations into HSCs for therapeutic purposes. At this stage of technology, HDR in primary stem cells still represents a bottleneck, and thus the high frequency with which the -198 T>C mutation occurs could be beneficial in gene therapy.

5 CHAPTER 5: The -200 site

5.1 Chapter 5 Introduction

5.1.1 A cluster of mutations at the -200 site causes HPFH in humans

Mutations in the globin promoter that are located around -200 bp upstream of the TSS of globin have been described in various families with different genetic backgrounds. For instance, C to T substitutions were discovered in two Greek individuals at positions -196 and -201112, respectively, while a C to G substitution at position -195 was found in a Brazilian individual109. Furthermore, additional single base pair substitutions have been described at positions -19794 and -202115,116 (Figure 5.1). The mutations can occur in either the G- or Aglobin promoter with no particular preference. Most individuals that have been examined were heterozygous for one of the mutations. Currently only the -198 T>C mutation (described in detail in Chapter 4) has been seen to occur homozygously. Unlike the -198 T>C mutation, which was discovered in a healthy family without other haemoglobinopathies, most of the other mutations in the -200 cluster have been found in individuals with other abnormalities in the globin locus. Hence, the percentages of foetal haemoglobin in heterozygotes are more variable and harder to interpret. For example, two individuals with very high levels of HbF carrying the -196 C>T mutation were also affected by thalassemia which itself generally results in higher foetal globin output110. Data available from the small number of HPFH family members unaffected by other haemoglobin abnormalities indicates that the -200 cluster mutations elevate foetal haemoglobin to a moderate level of 3-10 % (-195, -197, -201)94,109,112 or a high level of 12-16 % in case of -196 C>T110110110110110110110110110110110110110110110109109109108108108(108). The mutation at position -202 have only been described in conjunction with the sickle cell trait.

Figure 5.1: Mutations around the -200 site in the γ-globin promoter cause HPFH. (A) Schematic of the γ-globin promoter showing HPFH mutations. The -200 cluster of mutations is highlighted with a black box and will be discussed in this chapter. (B) Mutations in the -200 site and the respective foetal haemoglobin percentages described in individuals carrying the mutations heterozygously.

5.1.2 HPFH mutations disrupt binding of ZBTB7A in vitro

The fact that the HPFH mutations in the -200 region are clustered has led to the hypothesis that they disrupt the binding of a repressor of foetal haemoglobin. Changing any one of the bases in the consensus sequence could impair binding and in turn inhibit proper silencing of the foetal globin gene at the time of the globin switch, leading to persistent expression of γ-globin.

As outlined in Chapter 1, the repressor ZBTB7A has been shown to play a major role in erythropoiesis. Conditional Zbtb7a knock-out in adult murine HSCs resulted in upregulation of the mouse embryonic globin genes, which are orthologues of the human foetal globin genes258. Thus we considered that ZBTB7A could also be responsible for human globin gene regulation. ZBTB7A’s consensus binding motif has been characterised by Pessler et al. in in vitro binding assays71 and by us (unpublished, L. Norton, A. Funnell and J. Burdach) and others210 by motif analysis of ChIP-Seq peaks. We found that it remarkably resembles the sequence found in the -200 region of the γ-globin promoter (Figure 5.2).

Figure 5.2: The ZBTB7A consensus motif resembles the γ-globin promoter sequence. (A) DNA sequence of the -200 site in the γ-globin promoter. HPFH mutations that disrupt binding of ZBTB7A in vitro are indicated with arrows. (B) Consensus motifs for ZBTB7A from different sources. The two motifs on the left are derived from motif search analysis in ChIP-Seq peaks determined by us (L. Norton, A. Funnell and J. Burdach, unpublished) and Wang et al.210. The motif on the right was described by Pessler et al.71 in in vitro binding assays of ZBTB7A.

Hence, we tested binding of ZBTB7A and were able to show that ZBTB7A indeed can bind to that site in vitro. This work was done by Gabriella Martyn from our lab and is part of her unpublished Honours thesis259. When we tested binding of ZBTB7A to the promoter containing each of the HPFH mutations described above, we found that all of them disrupt binding (Figure 5.3A). The -198 T>C mutation, which is also found in that cluster of mutations, was the exception and is described in detail in Chapter 4 (Figure 5.3B). This discovery has led us to hypothesise that ZBTB7A is the repressor of foetal haemoglobin acting through a binding motif at the -200 site.

Figure 5.3: HPFH mutations in the -200 cluster disrupt binding of ZBTB7A. (A) EMSA showing binding of the zinc finger domain of ZBTB7A (370-500 aa) to WT and mutant γ-globin promoter (-209 to -187 bp). Lane 1 contains extract from Cos cells transfected with pcDNA3-empty plasmid. Lanes 2-8 contain nuclear extracts from Cos cells overexpressing ZBTB7A; equal amounts of extract were incubated with radiolabelled probe in each lane. Lanes 1 and 2 contain radiolabelled WT γ-globin promoter probe whilst lanes 3-8 contain the promoter with the respective HPFH mutations. HPFH mutations in the -200 site disrupt binding of ZBTB7A. (B) EMSA showing binding of ZBTB7A to the WT and -198 T>C promoter. This work was performed by G. Martyn and is part of her unpublished Honours thesis259.

5.1.3 ZBTB7A binds to the -200 region of the foetal globin promoter in vivo

Our group has performed chromatin immunoprecipitation experiments followed by sequencing for ZBTB7A in K562 cells (Figure 5.4A). The wet lab experiment was done by A. Funnell from our lab and the data analysis by J. Burdach and L. Norton as outlined in her unpublished PhD dissertation73. Shown here are two replicate IP samples of pulldown with anti-ZBTB7A antibody and their respective inputs. Both IPs showed enrichment of the Aγ- and Gγ-globin promoter regions. Close inspection of the area under the highest point of the peak revealed that the -200 site is located in the centre of the peak (Figure 5.4B). This finding confirmed that ZBTB7A can bind to the -200 region in vivo.

Figure 5.4: ZBTB7A binds to the -200 site in K562 cells in vivo. (A) Snapshot of tracks from ZBTB7A ChIP-Seq in K562 cells (unpublished data from A. Funnell, L. Norton and J. Burdach). Two replicates of immunoprecipitation (IP) and their respective input samples are shown. Peaks across the γ-globin genes (~10 kb) were viewed using IGV v2.3. (B) Magnification of the DNA sequence in the centre of the peak, which revealed the -200 site of the γ-globin promoter and the ZBTB7A binding site (bold). (C) Snapshot of tracks from ZBTB7A ChIP-Seq in HUDEP2 cells and ATAC-Seq in WT and ZBTB7A KO HUDEP2 cells illustrated with IGV v2.3. Shown are two replicates of pulldown with anti-ZBTB7A antibody in HUDEP2 cells and ATAC-Seq peaks showing highly accessible chromatin in differentiated HUDEP2 WT and ZBTB7A KO cells. These data have been published by T. Maeda’s group72.

We collaborated with T. Maeda from Harvard Medical School and were able to confirm the importance of ZBTB7A in foetal globin silencing. This work was published earlier this year and showed that ZBTB7A is necessary for maintaining a condensed nucleosome structure at the globin locus in adults by recruiting the chromatin remodelling complex NuRD independent of BCL11A72. Knock-out of ZBTB7A in HUDEP2 cells resulted in up to 70 % foetal haemoglobin and in highly accessible chromatin as shown by ATAC-Seq (Figure 5.4C), a result that is very comparable to the knock-out of the repressor BCL11A in the same cell type177. Additionally, they performed ChIP-seq experiments for ZBTB7A in HUDEP2 cells and were able to show small but noticeable enrichment of ZBTB7A at the globin promoters and 3’ regions of the genes (Figure 5.4C).

In this chapter we will further investigate ZBTB7A’s role in γ-globin silencing to explore our hypothesis that loss of ZBTB7A binding is the key driver for elevated γ-globin levels in people carrying -200 cluster HPFH mutations.

5.2 ChIP-qPCR of ZBTB7A in HUDEP2 cells

K562 cells are an embryonic/foetal type erythroleukaemia cell line and express high levels of γ-globin. Thus it is fairly surprising that we can detect ZBTB7A binding strongly to the promoter in this cell type, given that we propose that ZBTB7A is a repressor of foetal globin. HUDEP2 cells, on the other hand, express low levels of γ-globin, indicating that the locus is successfully silenced. They therefore represent the preferred cell type in which to investigate foetal globin silencing. T. Maeda’s group was able to show binding of ZBTB7A to the promoter in ChIP-Seq experiments in this cell type.

We were interested in determining if there is a decrease in ZBTB7A binding to the -200 site in vivo when an HPFH mutation is present. To be able to confirm that the HPFH mutations in the -200 site disrupt binding of ZBTB7A, we first needed to show that it binds this site when no mutation is present. We began our investigations by trying to replicate, in our hands, the results of the ChIP-Seq experiment performed by T. Maeda’s group in HUDEP2 cells. We performed a ChIP experiment followed by qPCR for ZBTB7A in these cells using the same antibody as Maeda’s group (Figure 5.5A). We chose to compare the enrichment at the γ-globin promoter to three different positive controls: hypersensitive site 3 in the β-globin LCR, the promoter of the proapoptotic gene BCL2L11 (BIM1) and the KLF1 promoter, all of which are known ZBTB7A targets63,70,72. As negative controls we selected the -globin promoter and two intergenic regions upstream of the Aγ-globin gene (-1.1 kb and -2.1 kb). In our ChIP experiment, ZBTB7A bound only marginally to the γ-globin promoter, with no obvious difference to the adjacent intergenic regions -1.1 and -2.1 kb upstream of the Aγ-globin gene or the -globin promoter. Thus, we were unable to reproduce the finding that ZBTB7A binds to the γ-globin promoter in HUDEP2 cells.

We compared the obtained data to a similar experiment performed in K562 cells (Figure 5.5B). We ran qPCR for the same targets and this time we were able to see substantial enrichment of the γ-globin promoter in the ZBTB7A pulldown compared to the negative control regions.

Figure 5.5: ZBTB7A ChIP-qPCR experiment in HUDEP2 and K562 cells. ChIP was performed using an anti-ZBTB7A antibody on HUDEP2 (A) and K562 (B) cells, and qPCR was used to analyse the enrichment of ZBTB7A at various loci. The tested genomic loci include the BCL2L11 and KLF1 proximal promoters and hypersensitive site 3 in the β-globin LCR as positive controls. A negative control area was the globin proximal promoter. Other negative controls include regions -1.1 and -2.1 kb upstream of the Aγ-globin promoter.

5.3 ChIP-qPCR experiments of ZBTB7A at various stages of the cell cycle

Seeking to explain our findings described above, we wondered if the reason why we can see ZBTB7A binding in K562 cells is that the locus is constantly active and composed of open chromatin. This might allow ZBTB7A to easily find its target although the interaction with DNA is not functionally repressing the gene in these cells. Conversely, in HUDEP2 cells γ-globin is silenced and the chromatin is mainly in a condensed, inactive state72. Whilst activators may need to reside at the promoter to constantly recruit the basal transcription machinery to drive expression of a gene, we believe that repressors might work in a “hit-and-run”-like manner. That is, they might bind to their respective site in the promoter and recruit chromatin remodelling complexes which in turn transform the locus into tightly packed chromatin. Once the chromatin is condensed and transcription is shut down, the repressor might no longer be required to reside at the promoter. ChIP-qPCR and ChIP-Seq experiments are only snapshots of temporary transcription factor binding to their targets in the cell.

Hence, we wondered if there are certain times in the cell cycle when a repressor is found more frequently at its target sites than at others. In synthesis phase (S phase) the cell replicates the complete genome; chromatin is disassembled and, once replication is complete, reassembled. We reasoned that binding may happen directly after S phase, during mitosis (G2/M phase) or at very early stages of G1 phase, and decided to explore if ZBTB7A executes its function as a repressor of γ-globin at any of these time points. Cells in culture grow asynchronously, leading to a population of cells at different stages of the cell cycle at a given time. High concentrations of the drug nocodazole can prevent cells from proceeding through mitosis and can be used to synchronise cells grown in culture260. We treated asynchronously growing HUDEP2 cells with nocodazole for 15h, then removed the block and released the cells back into the cell cycle. We monitored the cells as they proceeded synchronously through the cell cycle by taking samples every couple of hours and analysing them by flow cytometry (Figure 5.6A).

First, we performed flow cytometry on asynchronously growing HUDEP2 cells to analyse their cell cycle stages and found that most cells were in S phase (46 %) or G1 phase (35 %) and only around 13 % in G2/M phase. Nocodazole treatment resulted in an accumulation of cells in G2/M phase (65 %). One hour after releasing the block, the remaining cells that had resided in G1 phase had moved forward in the cell cycle to S or G2/M phase. After two hours, about 13 % of the cells that had been in G2/M phase had entered G1 or S phase to start a new cell cycle. Samples at 3h, 5h and 7h after release showed further progression of the cells in the cell cycle until they reached asynchronicity again after 24h. Based on these data we chose to investigate binding of ZBTB7A by ChIP-qPCR in cells treated with nocodazole for 15h (representative of mitosis) and cells that had been released back into the cell cycle for one or two hours, respectively (representative of early G1 phase) (Figure 5.6B). Again, we chose the BCL2L11 and KLF1 promoters as positive control regions as well as the globin promoter as a negative control, but yet again we were unable to detect binding of ZBTB7A to the γ-globin promoter in any of the samples examined. Overall, the efficiency of the pulldown seemed to decrease over the course of the experiment, which may be associated with a lower viability of the cells after treatment with nocodazole. We repeated the same experiment in K562 cells and were unable to detect a noticeable change in ZBTB7A binding to the γ-globin promoter in early G1 phases (Figure III.3, Appendix). In the K562 cells we detected a modest decrease in binding of ZBTB7A to all targets during mitosis (15h nocodazole). These experiments suggest that our hypothesis is not correct, as the stage of the cell cycle did not seem to affect binding of ZBTB7A to the γ-globin promoter at the chosen time points.

Figure 5.6: ChIP-qPCR experiments of ZBTB7A in synchronised HUDEP2 cultures. (A) Cell cycle analysis of synchronised HUDEP2 cells by flow cytometry. Cells were arrested in G2/M phase by treatment with nocodazole for 15h. Then cells were washed and released back into the cell cycle. Samples were taken from asynchronous cells, after 15h of nocodazole treatment and 1h, 2h, 3h, 5h, 7h and 24h after release of the mitotic block. Cells were fixed and stained with propidium iodide before flow cytometry. Data were analysed with the cell cycle algorithm provided by the FlowJo software. (B) ChIP-qPCR experiment of ZBTB7A in asynchronous and synchronised HUDEP2 cells. ChIP samples were taken after 15h of nocodazole treatment and 1h and 2h after the mitotic block was released. Samples were analysed by qPCR and normalised over the input. Controls included the BCL2L11 and KLF1 proximal promoters as positive controls as well as the globin promoter as a negative control.

5.4 Generation of human erythroid cell lines carrying the γ-globin promoter -195 C>G mutation

Despite our unsuccessful attempts to verify binding of ZBTB7A to the γ-globin promoter, we proceeded to generate human erythroid cell lines that stably carry one of the -200 site HPFH mutations. We chose to start with the -195 C>G mutation as it had the most dramatic effect on ZBTB7A binding in vitro. HUDEP2 cells closely resemble human adult erythroid cells and thus would be the most suitable cell type to use. However, as outlined above, we were unable to detect binding of ZBTB7A to the γ-globin promoter in HUDEP2 cells. In K562 cells enrichment of ZBTB7A at the -200 site is detectable, but they do not represent a good model to investigate changes in expression levels of γ-globin as it is already expressed at high levels. Hence, we decided to introduce the -195 C>G mutation into both cell models with the aim of performing ChIP experiments in K562 cells and examining expression levels of γ-globin in HUDEP2 cells.

5.4.1 Genome editing in K562 cells

Our strategy to introduce the -195 C>G mutation into K562 cells was the same as for the -175 T>C mutation described in Chapter 3. We transfected K562 cells with plasmids encoding a pair of TALENs that recognise sequences flanking the ATG start codon of the γ-globin gene184. Additionally, we co-transfected donor plasmids that contain a tdTomato reporter sequence, a neomycin resistance gene driven by a PGK promoter and about 1 kb arms of homology to the Gγ-globin promoter region or the Aγ-globin gene, respectively (Figure 5.7). We used donor plasmids containing either the WT or the -195 C>G mutated promoter. Successful homologous recombination resulted in knock-in of a tdTomato reporter in conjunction with the WT or the -195 C>G γ-globin promoter. Due to the homology of the two γ-globin genes, genome editing of the locus resulted in excision of the Gγ-globin gene, which lies in between the cut sites.

Figure 5.7: Strategy of genome editing to introduce the γ-globin promoter -195 C>G mutation into K562 cells. TALENs create a double strand break (DSB) at the ATG of both endogenous γ-globin genes. TdTomato is integrated by homologous recombination after co-transfection with a donor plasmid with 1 kb arms of homology on the 5’ and 3’ end of the DSB. The Gγ gene is excised in this process and the Aγ gene lacks a promoter region and hence cannot be expressed.

Due to the antibiotic resistance cassette contained in the knocked-in sequence we were able to enrich for cells that had been modified with the donor DNA. However, not all cells that stably integrated tdTomato were modified in every one of the three globin loci present in K562 cells. Therefore, we stringently selected clones in which tdTomato had been knocked into all alleles. We then performed Sanger sequencing of the promoter region to determine the presence of the -195 C>G mutation. We managed to obtain two clones that were homozygous for the -195 C>G mutation. These cells and the ones containing the WT globin promoter (with tdTomato knocked into all alleles) will be referred to as K562 G-A tdTomato WT or -195 C>G, respectively.

5.4.2 Genome editing in HUDEP2 cells

In our genome editing strategy to introduce the -195 C>G mutation into HUDEP2 cells we aimed to avoid integration of a reporter or a selection marker. As for the generation of HUDEP2 -198 T>C cells (Chapter 4), we used the CRISPR/Cas9 system to engineer HUDEP2 cells carrying the -195 C>G mutation. As the -195 site lies close to the -198 mutation we were able to use the same small guide RNA to lead the Cas9 protein to its target site. Again, we enabled HDR by co-transfecting an ssODN as a donor template containing the -195 C>G mutation and 75 bp arms of homology 5’ and 3’ to the mutation (Figure 5.8A). Sixty clones were screened, of which eight were homozygous for the -195 C>G mutation as determined by genomic PCR across the promoter region followed by Sanger sequencing. We then performed qPCR on genomic DNA of those clones to determine the copy number of the γ-globin promoter and the intergenic region between the two γ-globin genes (Figure 5.8B). Comparison with the copy number of HUDEP2 WT cells revealed that three of the clones (#5, #43 and #60) showed almost exactly half the copy number for the promoter region and no amplification for the intergenic region. We propose that genome editing in these clones resulted in a collapsed globin locus similar to what we observed in the homozygous -198 T>C HUDEP2 cells (Figure 5.8C). The other clones showed loss of at least one of four promoter regions and some had residual intergenic regions. This indicates that they contain larger deletions around the γ-globin promoter that our first genomic PCR was unable to pick up. These clones were excluded from further experiments. The Sanger sequencing results for the three homozygous clones that we decided to include in further analyses are shown in Figure 5.8D. Unfortunately, we were unable to obtain any clones that carry the -195 C>G mutation in a heterozygous manner and still have an intact globin locus.
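
The logic of the copy number comparison can be illustrated with a minimal sketch. It assumes a standard ΔΔCt calculation with perfect amplification efficiency (a doubling per cycle), a reference amplicon of unchanged copy number (ZFPM1 in Figure 5.8B) and a known copy number in WT cells (four γ-globin promoters). The function name and all Ct values below are hypothetical and chosen only to show the arithmetic; they are not measured data.

```python
def relative_copy_number(ct_target: float, ct_reference: float,
                         ct_target_wt: float, ct_reference_wt: float,
                         wt_copies: float) -> float:
    """Estimate copy number by the delta-delta-Ct method.

    Assumes 100 % PCR efficiency, i.e. one Ct unit equals a two-fold difference.
    """
    delta_ct_sample = ct_target - ct_reference      # normalise clone to the reference amplicon
    delta_ct_wt = ct_target_wt - ct_reference_wt    # normalise WT to the reference amplicon
    delta_delta_ct = delta_ct_sample - delta_ct_wt
    return wt_copies * 2.0 ** -delta_delta_ct


# Hypothetical Ct values: the promoter amplicon appears one cycle later in the clone.
clone_copies = relative_copy_number(
    ct_target=23.0, ct_reference=24.0,        # edited clone
    ct_target_wt=22.0, ct_reference_wt=24.0,  # HUDEP2 WT
    wt_copies=4.0,                            # four gamma-globin promoters in WT cells
)
print(f"Estimated promoter copies in clone: {clone_copies:.1f}")
```

In this illustration a shift of one cycle relative to WT corresponds to half the WT copy number, which is the pattern described for clones #5, #43 and #60.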

CHAPTER 5: The -200 site

Figure 5.8: Genome editing in HUDEP2 cells to introduce the -195 C>G mutation. (A) Genome editing strategy to introduce the -195 C>G mutation into HUDEP2 cells. The sgRNA is designed to cut close to the -195 site in the γ-globin promoter. An ssODN containing the C>G substitution at position -195 and 75 bp arms of homology to either side of the mutation was co-transfected with the sgRNA/Cas9 plasmid and used as a donor template for HDR. (B) Copy number analysis in HUDEP2 WT cells and clonal populations of -195 C>G cells. Copy number was determined by quantitative real-time PCR across the γ-globin promoter (HBG) and the intergenic region (-3.1 kb upstream of Aγ). Ct values were normalised against Ct values for PCR across ZFPM1 and then compared to known copy numbers in HUDEP2 WT cells. (C) Schematic of the γ-globin locus in unedited HUDEP2 WT cells and in homozygous -195 C>G HUDEP2 cells after genome editing. Blue arrows indicate qPCR primers at the γ-globin promoter (HBG), red arrows indicate qPCR primers in the intergenic region (-3.1 kb). A red cross indicates the presence of the -195 C>G mutation. Homozygous -195 C>G HUDEP2 cells are missing the Gγ-globin gene and the intergenic region. (D) Aligned Sanger sequencing tracks of HUDEP2 clones carrying at least one copy of the -195 C>G mutation. These three clones were used for further analysis.
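The copy number estimate in panel B can be illustrated with a comparative Ct calculation. The following is a minimal sketch, assuming perfect amplification efficiency (a doubling per cycle); the function name and all Ct values are hypothetical, and the WT copy number of four promoter regions is taken from the text above.

```python
from typing import Optional


def copy_number(ct_target: Optional[float], ct_ref: float,
                ct_target_wt: float, ct_ref_wt: float,
                wt_copies: float, efficiency: float = 2.0) -> float:
    """Estimate copies of a target region relative to a WT sample of known copy number.

    Ct values are first normalised to a reference amplicon (ZFPM1 in the text),
    then compared to the WT sample. `None` stands for no amplification.
    """
    if ct_target is None:
        return 0.0
    delta_clone = ct_target - ct_ref
    delta_wt = ct_target_wt - ct_ref_wt
    return wt_copies * efficiency ** -(delta_clone - delta_wt)


# Hypothetical Ct values: the clone's promoter amplicon appears one cycle later than WT.
promoter_copies = copy_number(ct_target=25.0, ct_ref=25.0,
                              ct_target_wt=24.0, ct_ref_wt=25.0,
                              wt_copies=4)

# Hypothetical: no amplification of the intergenic amplicon in the clone
# (a WT copy number of two is an assumption for a diploid locus).
intergenic_copies = copy_number(ct_target=None, ct_ref=25.0,
                                ct_target_wt=26.0, ct_ref_wt=25.0,
                                wt_copies=2)
print(promoter_copies, intergenic_copies)
```

With these invented inputs, a one-cycle delay relative to WT corresponds to half the WT copy number, which is the pattern described for the clones with a collapsed locus.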

5.5 The -195 C>G mutation disrupts ZBTB7A binding in vivo

We suggest that the -200 site mutations disrupt binding of ZBTB7A to the γ-globin promoter in vivo and therefore prevent complete silencing of the foetal globin genes at the time of the globin switch. The successful generation of cell models that carry the -195 C>G mutation allowed us to investigate binding of ZBTB7A to the promoter in a cellular context. We are generally able to see enrichment of ZBTB7A at the γ-globin promoter in K562 WT cells and hence chose this cell line to examine the effect of the -195 C>G mutation on ZBTB7A binding. We performed ChIP-qPCR experiments for ZBTB7A on two clones each of K562 GA tdTomato WT and -195 C>G cells and compared enrichment of the γ-globin promoter to enrichment of other targets. We chose the same positive and negative controls as mentioned earlier and decided to normalise the enrichment of each target to the enrichment of the non-erythroid positive control region at the BCL2L11 promoter. This accounts for variability in pulldown efficiency between samples and allows a better comparison of relative enrichment of targets. The raw qPCR data for these samples can be found in Figure III.4 in the Appendix.

We found that binding of ZBTB7A is significantly reduced in the K562 cells carrying the -195 C>G mutation compared to the WT (Figure 5.9). Enrichment of other targets was not affected and showed no significant difference between WT and -195 C>G cells. This outcome supports our hypothesis that the -195 C>G mutation results in loss of ZBTB7A binding to the globin promoter.

Figure 5.9: The -195 C>G mutation decreases binding of ZBTB7A in K562 cells. ChIP-qPCR experiment of ZBTB7A in K562 GA tdTomato WT or -195 C>G cells (n=2). ChIP samples were analysed by qPCR and normalised over the input and the non-erythroid positive control region in the BCL2L11 promoter. Other controls included HS3 in the globin LCR as a positive and the globin promoter as a negative. Binding of ZBTB7A to the globin promoter was significantly reduced in the -195 C>G cells (*p = 0.039 as determined by an unpaired two-tailed t-test).
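The two-step normalisation described in this legend (first to input, then to the BCL2L11 control region) can be sketched as follows. This is an illustration only: it assumes a doubling per PCR cycle and an input sample representing 1 % of the chromatin, and the function names and Ct values are hypothetical rather than taken from the experiments.

```python
def fraction_of_input(ct_ip: float, ct_input: float, input_fraction: float = 0.01) -> float:
    """Fraction of input chromatin recovered in the IP, assuming doubling per cycle.

    `input_fraction` is the share of chromatin kept aside as input (assumed 1 % here).
    """
    return input_fraction * 2.0 ** (ct_input - ct_ip)


def relative_enrichment(target_recovery: float, control_recovery: float) -> float:
    """Express recovery at a target region relative to a positive control region."""
    return target_recovery / control_recovery


# Hypothetical Ct values for one sample.
promoter = fraction_of_input(ct_ip=28.0, ct_input=25.0)   # target region
bcl2l11 = fraction_of_input(ct_ip=26.0, ct_input=25.0)    # positive control region
ratio = relative_enrichment(promoter, bcl2l11)
print(ratio)
```

Dividing by the control region removes sample-to-sample differences in pulldown efficiency, so ratios from WT and mutant clones can be compared directly.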

5.6 The -195 C>G mutation increases foetal globin expression

Foetal haemoglobin expression is usually silenced in HUDEP2 cells. Hence, our next step was to examine expression levels of γ-globin in our engineered HUDEP2 -195 C>G cells to determine whether the HPFH mutation elevates foetal globin expression. We compared the expression of all β-like globins in HUDEP2 WT cells to that in the three clones carrying the -195 C>G mutation homozygously (Figure 5.10A). ε-globin was expressed at very low levels in all samples (<0.1 %) and was therefore not graphed. All three -195 C>G clones showed a dramatic increase in γ-globin expression, with 80-98 % of all β-like globins being γ-globin. Expression of γ-globin was elevated 80- to 145-fold in these cells compared to WT HUDEP2 cells, whilst δ- and β-globin expression was significantly decreased (Figure 5.10B). The raw qPCR data can be found in Figure III.5 in the Appendix.

As outlined above and discussed in detail in Chapter 4, the -195 C>G clones investigated here are most likely to have a collapsed γ-globin locus and are completely missing the intergenic region between the two γ-globin genes. The expression data for these cells are strikingly similar to what we observed in our three homozygous HUDEP2 -198 T>C cell lines, which we believe also have a collapsed γ-globin locus. Unfortunately, in the case of the -195 C>G mutation we were unable to generate cell lines that carry the mutation but still have an intact γ-globin locus. Hence, without an appropriate control that contains a collapsed γ-globin locus but the WT promoter region, it is difficult to draw a firm conclusion about the effect of the -195 C>G mutation on γ-globin expression in HUDEP2 cells. However, these results once more support our hypothesis that the intergenic region is crucial for proper foetal haemoglobin silencing.

Figure 5.10: HUDEP2 -195 C>G cells show an increase in γ-globin expression. (A) mRNA expression levels of γ-, δ- and β-globins displayed as a percentage of total β-like globins. mRNA levels were determined by qPCR and Ct values were normalised to 18S rRNA levels. Shown are HUDEP2 WT cells and three clones carrying the -195 C>G mutation homozygously. (B) Fold change of β-like globin expression in HUDEP2 -195 C>G cells compared to expression in WT cells.
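The two quantities plotted in this figure, the share of each globin among the graphed β-like globins and the fold change relative to WT, can be sketched from Ct values as follows. This is a minimal illustration assuming a doubling per cycle and normalisation to 18S; the function names and all Ct values are hypothetical and are not the thesis data.

```python
from typing import Dict


def relative_expression(ct_gene: float, ct_reference: float) -> float:
    """Expression relative to the reference transcript (18S in the text)."""
    return 2.0 ** -(ct_gene - ct_reference)


def percent_of_total(ct_by_gene: Dict[str, float], ct_reference: float) -> Dict[str, float]:
    """Express each transcript as a percentage of the summed transcripts."""
    rel = {gene: relative_expression(ct, ct_reference) for gene, ct in ct_by_gene.items()}
    total = sum(rel.values())
    return {gene: 100.0 * value / total for gene, value in rel.items()}


def fold_change(ct_gene_sample: float, ct_ref_sample: float,
                ct_gene_wt: float, ct_ref_wt: float) -> float:
    """Fold change of one transcript in a sample relative to WT."""
    return (relative_expression(ct_gene_sample, ct_ref_sample)
            / relative_expression(ct_gene_wt, ct_ref_wt))


# Hypothetical Ct values for a WT sample (18S Ct of 10 in both samples).
wt_ct = {"gamma": 20.0, "delta": 20.0, "beta": 19.0}
wt_percent = percent_of_total(wt_ct, ct_reference=10.0)

# Hypothetical edited clone in which gamma-globin appears six cycles earlier.
gamma_fold = fold_change(ct_gene_sample=14.0, ct_ref_sample=10.0,
                         ct_gene_wt=20.0, ct_ref_wt=10.0)
print(wt_percent, gamma_fold)
```

Because the percentages are ratios within one sample, they are insensitive to loading differences, whereas the fold change depends on the 18S normalisation in both samples.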

5.7 Modifying a ZF of ZBTB7A can restore binding to -195 C>G in vitro

Although our collaborators were able to show binding of ZBTB7A to the γ-globin promoter by ChIP-Seq, so far our group has been unable to replicate these results in ChIP-qPCR experiments with the HUDEP2 cells in our hands. However, the enrichment they observed was also comparatively small given that knock-out of ZBTB7A in HUDEP2 cells results in a dramatic increase in foetal globin expression. We propose that ZBTB7A is not constantly bound to the γ-globin promoter but instead fulfils its role as a repressor at an early time point and is then excluded from the silenced promoter.

Thus, we developed a strategy to test whether ZBTB7A indeed represses foetal haemoglobin through the -200 site. If the HPFH mutations in this site elevate γ-globin levels because ZBTB7A cannot bind, we should be able to restore γ-globin to WT levels by modifying the DNA-binding domain of ZBTB7A so that it is able to bind once more to the mutated promoter. ZBTB7A binds to DNA with four Cys2-His2-type zinc fingers located near the C-terminus of the protein. In collaboration with J. Mackay from Sydney University we predicted the DNA-binding specificities of the four zinc fingers of ZBTB7A using the Singh algorithm261,262 (Figure 5.11A). When we combined the predicted binding motifs for each zinc finger, the resulting consensus sequence closely resembled the -200 site in the γ-globin promoter. According to this model, zinc finger four (F4) makes direct contact with positions -194, -195, -196 and -197 via residues +6, +3, -1 and +2, respectively (Figure 5.11B). We then used the same algorithm and J. Mackay’s expertise to predict the amino acid sequence that is necessary to allow F4 to bind to the mutated -195 C>G sequence. We proposed that making a total of three amino acid changes in the -1 to +6 aa sequence of F4 would restore binding of ZBTB7A to the -195 C>G promoter.

Figure 5.11: Prediction of DNA-binding specificity of ZBTB7A zinc fingers. (A) Predicted DNA-binding specificity for the zinc fingers of WT ZBTB7A. Shown are the amino acid sequences of residues -1 to +6 in each zinc finger (F1 to F4) and their respective predicted consensus sequence. The combined binding motif closely resembles the -200 site of the γ-globin promoter. (B) Putative model of ZBTB7A binding to the -200 site in the γ-globin promoter. Zinc finger 4 (F4) makes direct contact with positions -194, -195, -196 and -197. Mutating three residues in the aa sequence of F4 may alter the binding specificity so that it recognises the -195 C>G promoter. The residues in F4 that make direct contact with DNA are -1, +2, +3 and +6 and have been colour coded in accordance with their bound base in the promoter. All predictions were made with the Singh algorithm using the Expanded Linear SVM prediction model261,262.

Experimentally, we began by testing the binding specificity of ZBTB7A containing the altered aa sequence in F4 (ZBTB7A F4 Mut) and comparing it to that of WT ZBTB7A. We performed EMSAs to investigate binding of ZBTB7A WT and F4 Mut to the WT and -195 C>G γ-globin promoter in vitro (Figure 5.12A). We used the DNA-binding domain of ZBTB7A (aa 370-500) fused to a Flag-tag and overexpressed both versions in Cos cells. As expected, WT ZBTB7A bound to the WT γ-globin promoter and binding was disrupted by the -195 C>G HPFH mutation (lanes 2 and 5). ZBTB7A F4 Mut also bound to the WT promoter with similar affinity to the WT protein (lane 3). Strikingly, binding of ZBTB7A F4 Mut to the -195 C>G promoter was completely restored (lane 6), indicating that our prediction of binding was accurate. We also tested binding of both WT and F4 Mut protein to the promoter containing another HPFH mutation, -196 C>T. Once more, binding of WT ZBTB7A was disrupted by the -196 C>T mutation, whilst the affinity for the promoter was almost completely restored in the case of the ZBTB7A F4 Mut protein. This finding was unexpected given that the predicted binding sequence for this ZF mutant was CCGC (-195 C>G) and not CTCC (-196 C>T). Interestingly, binding to the WT promoter was not affected by the aa changes in F4. To exclude the possibility that differences in binding affinity were due to variable protein levels in the samples, we checked the expression levels of ZBTB7A WT and F4 Mut by Western blot and found that they were overexpressed to equivalent amounts (Figure 5.12B).

Figure 5.12: Modifying a ZF restores binding of ZBTB7A to the -195 C>G and -196 C>T γ-globin promoter. (A) EMSA showing binding of WT and F4 Mut ZBTB7A (370-500 aa) to the WT, -195 C>G and -196 C>T γ-globin promoter. Proteins were overexpressed in Cos cells and nuclear extracts were incubated with either the WT γ-globin promoter probe (-209 to -187) or the probe containing the -195 C>G or -196 C>T HPFH mutation. (B) Western blot to confirm that equal amounts of ZBTB7A ZF protein are present in each sample. The blot was probed with anti-Flag as ZBTB7A WT and F4 Mut are N-terminally Flag-tagged.

Our next step will be to introduce the small changes in F4 into the endogenous ZBTB7A locus in HUDEP2 cells by CRISPR/Cas9 genome editing. As ZBTB7A F4 Mut is still capable of binding to the WT γ-globin promoter in vitro, this could be an asset for our planned experiments in cellular models, as ZBTB7A F4 Mut should still be able to recognise its normal target sites in the genome. We will then introduce the same aa changes in F4 into HUDEP2 cells carrying the -195 C>G mutation, aiming to rescue the HPFH phenotype of the cells by doing so. We hypothesise that the modified ZBTB7A will be able to bind to the γ-globin promoter in these cells and thus properly silence γ-globin expression. These experiments are ongoing and were not complete by the time this thesis was compiled.

5.8 Chapter 5 Discussion

Previous data generated by our lab and others have shown that ZBTB7A is one of the major foetal haemoglobin repressors alongside BCL11A and SOX672. Knock-out of ZBTB7A in HUDEP2 cells results in a dramatic increase in foetal haemoglobin expression and this effect seems to be independent of BCL11A. It was shown by our collaborators that ZBTB7A can interact with the GATAD2B and CHD3/8 subunits of the NuRD chromatin remodelling complex and that this interaction is likely to be responsible for ZBTB7A’s role as a repressor in globin switching. However, the site through which ZBTB7A represses foetal haemoglobin still remains elusive and hence was the main focus of this work.

An array of HPFH mutations around the -200 site in the γ-globin promoter has been shown to elevate foetal globin expression in humans, and previous work from our lab has shown that ZBTB7A is capable of binding to this site and that the HPFH mutations disrupt binding in vitro. This finding led us to hypothesise that ZBTB7A may exert its function as a repressor through the -200 site in the γ-globin promoter, and here we aimed to show that elevated foetal globin expression in individuals carrying these HPFH mutations is due to loss of ZBTB7A binding to the promoter.

As demonstrating that ZBTB7A binds to the -200 site in vivo necessarily precedes investigating the loss of binding, the first part of this chapter focussed on that. Our lab was able to show binding of ZBTB7A to the promoter by ChIP-Seq in K562 cells and in this chapter we confirmed the results by ChIP-qPCR. However, our attempts to see enrichment of the γ-globin promoter region in ZBTB7A ChIP-qPCR experiments in HUDEP2 cells failed. This finding may seem surprising given the fact that K562 cells express high levels of foetal haemoglobin whilst the foetal globin locus in HUDEP2 cells is silenced. Naturally, one would think that if ZBTB7A is a repressor of foetal haemoglobin our experimental results should be exactly the opposite. However, in our experience with a number of transcription factors, repressors behave very differently from activators in ChIP experiments. We propose that an activator tends to reside consistently at its target site to recruit coactivators and the basal transcription machinery to initiate and constantly drive transcription of the gene. Repressors, on the other hand, may be bound to their target sites only for a short amount of time to initiate conformational changes in the surrounding chromatin resulting in tightly packed heterochromatin. Once that role is fulfilled, the repressor may no longer be needed to actively repress the gene and may be excluded from the promoter. This hypothesis is consistent with the fact that ChIP-Seq data are available for most of the master activators in erythropoiesis, GATA1218, TAL159, KLF160 and even the cofactors LMO2 and LDB1249, and that their modes of activation have been studied in depth. However, little is known about the genome-wide binding patterns of the important repressors BCL11A or SOX6, which is mainly due to the lack of high-quality ChIP-Seq data. We describe this proposed mechanism of transcriptional repressor binding as a “hit-and-run”-like process. Others proposed this as a mechanism for transcriptional regulation decades ago263 but it has only been experimentally confirmed in a couple of instances264,265. One reason for this might be that the experimental framework often only allows researchers to take a snapshot of the processes in the cell. However, transcription factor binding is much more dynamic and techniques like ChIP-Seq and ChIP-qPCR cannot appropriately reflect this265.

We considered that ZBTB7A may act in a “hit-and-run”-type manner to silence the expression of γ-globin at the promoter. We propose that binding of ZBTB7A to the promoter in K562 cells is likely due to the higher accessibility of the chromatin in these cells (as the locus is active), which in turn increases the frequency with which ZBTB7A can make contact with its target site. If the interaction of ZBTB7A with the promoter is only transient, we wondered what the best time point is to investigate ZBTB7A binding. Studies have shown that transcriptional processes are mainly shut down during mitosis and are reactivated upon mitotic exit266. Several “bookmarking” factors have been described that remain bound to chromatin during mitosis to allow instant reactivation of gene expression266–269; however, most other TFs are ejected from mitotic chromatin and ZBTB7A could be one of them267. We reasoned that directly after mitosis could be the time point when ZBTB7A binds to the γ-globin promoter in HUDEP2 cells to silence expression early on in the cell cycle. However, our ChIP experiments following a mitotic arrest did not reveal any significant changes in binding of ZBTB7A to the promoter, or to any of the other target genes we investigated, throughout the cell cycle. We explored two more time points after the cells had been released from the mitotic arrest but again we were unable to see any changes. One possible explanation for this result is that the time points were not chosen appropriately and that binding actually occurred before or after the windows that were selected. Surprisingly, we did not see a complete loss of ZBTB7A binding to other targets during mitosis. There are two possible explanations for this: firstly, ZBTB7A could be a bookmarking factor that stays bound to the target genes investigated during mitosis or, secondly, the mitotic arrest may have been incomplete and many cells may not yet have entered mitosis at the time point of the ChIP. To test this, one could repeat the experiment taking samples in smaller time increments and aiming for a more thoroughly optimised synchronisation of the cells by FACS. Additionally, to investigate the “hit-and-run” mechanism further, we plan on generating inducible cell lines similar to GATA1-ER (G1ER)270 or KLF1-ER (K1ER)257 cells. In these cell lines GATA1 or KLF1 is tethered to an estrogen receptor, keeping it inactive in the cytoplasm. Upon induction with tamoxifen the TF is translocated into the nucleus and can then exert its transcriptional functions. Such a cell line would allow us to monitor ZBTB7A binding to its targets post-induction and may help to “catch” ZBTB7A in the act of silencing foetal globin expression.

As an alternative to the explanation above, that binding of ZBTB7A could be a transient process, we also considered that pulldown of ZBTB7A bound to the γ-globin promoter could simply be inhibited by epitope masking that prevents the antibody from recognising the protein. Although we confirmed that the antibody genuinely works in ChIP experiments by testing ZBTB7A binding to positive control regions, recognition may depend on the 3D structure that ZBTB7A adopts when it binds to its respective cofactors. As the interacting cofactors differ in a gene-specific manner, it is possible that at some genes the epitope is readily accessible to the antibody and at others it is not. Additionally, it is likely that ZBTB7A cofactors are different at genes that it is actively repressing. The antibody we routinely use for ZBTB7A ChIP experiments is a monoclonal antibody recognising only one epitope in the protein. It has been described in the literature that the use of monoclonal antibodies, although they are more specific, can result in a decreased signal in ChIP if the recognised epitope is masked by surrounding chromatin271. Thus, one future approach will be to test different antibodies in ChIP that recognise multiple or other epitopes of ZBTB7A. At the same time one could attempt to endogenously tag the N- or C-terminus of the protein by genome editing, which could remove the need for an antibody against the endogenous protein and might also make the epitope more accessible.

Despite the negative results of our mitotic arrest experiments, we continued to generate edited human erythroid cell lines that carried one of the -200 site HPFH mutations. The -195 C>G mutation disrupts binding of ZBTB7A most substantially in vitro and was therefore chosen as a representative mutation for loss of ZBTB7A binding in vivo. We successfully introduced the mutation into K562 and HUDEP2 cells. Our K562 cells served as a model to investigate binding of ZBTB7A to the promoter, as we can demonstrate binding to the WT promoter in this cell line. HUDEP2 cells, on the other hand, were used to examine reactivation of foetal globin expression, as the gene editing strategy used here left the endogenous globin genes free of a reporter and baseline γ-globin expression in these cells is low, allowing derepression to be readily observed.

Binding of ZBTB7A to the -200 site of γ-globin was indeed reduced in our K562 -195 C>G cells, supporting our hypothesis that the -200 site is where ZBTB7A binds and that HPFH mutations in this region function by disrupting ZBTB7A binding in vivo. We are currently working on generating additional clonal populations of K562 cells carrying the -195 C>G mutation and other HPFH mutations around the -200 site to increase the sample size and confirm our findings. However, the decrease in binding, although statistically significant, was modest, with some residual enrichment of ZBTB7A at the mutant γ-globin promoter. The effects of the HPFH mutations that we observed in vitro were dramatic, with almost complete disruption of ZBTB7A binding to the promoter. Interestingly, our in vivo experiment does not reflect this. Transcription factor specificity is determined initially by the DNA-binding domain; nevertheless, the functional domain of the protein is believed to also be important for transcription factor binding in vivo230,234. Hence, the fact that we saw differences in the binding affinity of ZBTB7A to the -195 C>G promoter between in vitro and in vivo experiments might not be surprising after all. Most likely ZBTB7A does not repress genes alone but instead recruits or is recruited by other transcription factors and cofactors that act in synergy with ZBTB7A to silence expression. Although ZBTB7A loses its ability to directly bind to the -200 site in vitro when the -195 C>G mutation is present, there might be other factors that are still capable of recruiting ZBTB7A to the promoter to a certain extent in vivo. This hypothesis appears feasible given that all the -200 site mutations (with the exception of -196 C>T) result only in moderate levels of γ-globin activation, with affected individuals having between 3 and 10 % foetal haemoglobin. If binding of this repressor were completely abolished by the mutations, the elevation of HbF levels might be more substantial.

Our attempts to examine expression levels of γ-globin in HUDEP2 cells that carry the mutation were complicated by the fact that genome editing most likely resulted in a collapsed γ-globin locus in all three homozygous -195 C>G HUDEP2 cell lines. The foetal globin expression levels were dramatically increased to an extent that we are unable to explain solely by their homozygous genotype, considering the magnitude of the effect of the -195 C>G mutation on γ-globin expression levels in individuals with HPFH. We made the same observation when we investigated the -198 T>C mutation (Chapter 4) but in the case of the -195 C>G mutation were unable to obtain heterozygous clones with an intact γ-globin locus. Without an appropriate control HUDEP2 cell line that contains a collapsed γ-globin locus but the WT γ-globin promoter sequence, we are unable to make a conclusive statement about the effect of the -195 C>G mutation on foetal globin expression levels in HUDEP2 cells. Although this may seem disappointing at first, our findings in -195 C>G and -198 T>C HUDEP2 cell lines raised an array of new scientific questions. On the basis of these experiments we now propose that the intergenic region between the two γ-globin genes may play a crucial role in γ-globin silencing, since all cell lines that were missing the intergenic region showed a marked increase in foetal globin expression. Hence, we are currently working on generating cell lines that lack the intergenic region, not only to obtain an appropriate control cell line for our previous experiments, but also to further study the intergenic region, the chromatin structure of the locus and potential TF binding sites in this region. Excision of the intergenic region could represent a potential therapeutic strategy for β-haemoglobinopathies. At the same time we will persist with our attempts to generate heterozygous HUDEP2 cell lines with the -195 C>G mutation that do not have a collapsed locus.

Finally, in this chapter we outlined a strategy to sidestep possible technical issues with epitope masking in ChIP and with ZBTB7A acting in a “hit-and-run”-type manner as described above. In this “rescue-type” experiment we plan to reverse the HPFH phenotype in cells with the -195 C>G mutation by using a modified version of ZBTB7A that is capable of binding to the -195 C>G mutant promoter and of once again silencing foetal globin. Our preliminary in vitro experiments with modified ZBTB7A protein showed that this strategy is feasible. In silico modelling of the zinc finger binding specificity of ZBTB7A suggested changing three amino acids in the last zinc finger (F4) of the protein and indeed this subtle change could restore binding of the protein in vitro. We now aim to introduce these changes into the ZBTB7A genomic locus by CRISPR/Cas9 genome editing in our -195 C>G HUDEP2 cells. Although the ZBTB7A F4 Mut protein showed no noticeable difference to the WT ZBTB7A protein in binding to the WT γ-globin promoter, there might be other targets in the genome that could be negatively affected. Furthermore, there is also the possibility that we create de novo targets by changing the DNA-binding specificity of ZBTB7A. Hence, to control for other changes that might be induced by the modified zinc finger, we will also generate WT HUDEP2 cell lines expressing the modified version of ZBTB7A. Surprisingly, the changes in F4 also restored ZBTB7A binding to the -196 C>T mutation in vitro. This was unexpected and shows that zinc finger-DNA interaction is complex and not always predictable. However, it will allow us to incorporate an additional control into this experiment by performing a similar rescue-type experiment in cells that carry the -196 C>T mutation. If we see a rescued HPFH phenotype in both cell lines, this will provide strong evidence that loss of ZBTB7A binding accounts for HPFH caused by mutations at the -200 γ-globin promoter site.

In conclusion, in this chapter we were able to show that one of the -200 site HPFH mutations, -195 C>G, decreases the affinity of ZBTB7A for this site in vivo. However, further experiments are necessary to confirm that ZBTB7A indeed acts through the -200 site to silence foetal globin expression.

CHAPTER 6: General discussion and conclusion

6 CHAPTER 6: General discussion and conclusion

6.1 Summary

β-haemoglobinopathies are amongst the most common genetic disorders in the world with devastating prospects for the affected individuals if untreated. The discovery that high foetal haemoglobin (HbF) levels are beneficial for patients, leading to less severe symptoms, has been a key driver for haemoglobin research. Naturally occurring mutations in the promoter region of foetal γ-globin result in the expression of foetal haemoglobin into adulthood - a benign condition called HPFH. Individuals with HPFH have foetal haemoglobin levels between 3 % and 40 % whilst normal adults only produce about 1 % HbF. There are two main reasons why these mutations are of particular scientific interest.

Firstly, understanding why these mutations result in high HbF levels may help elucidate basic principles and mechanisms of foetal to adult globin switching. Here we investigated different HPFH mutations in the γ-globin promoter and were able to report mechanistic explanations for γ-globin activation. Two of the mutations, -175 T>C (Chapter 3) and -198 T>C (Chapter 4), appear to create de novo binding sites for the transcriptional activators TAL1 and KLF1, respectively, which drive the expression of γ-globin. Additionally, we provide evidence that a cluster of mutations around -200 bp upstream of the TSS disrupts binding of the foetal globin repressor ZBTB7A (Chapter 5).

Secondly, individuals with HPFH and a β-haemoglobinopathy show a milder course of disease due to their naturally high HbF levels. This fact could be exploited and translated into the clinic by utilising these HPFH mutations in gene therapy. Here, we used genome editing for proof-of-principle studies to introduce various mutations into different erythroid cell lines and were able to show that foetal globin expression is subsequently elevated in each case. We have provided evidence that it is possible to reactivate foetal globin expression in adult-type erythroid cells via genome editing and thus consider this work an important pre-clinical study for the application of this technology to treating β-haemoglobinopathies.

6.2 The molecular basis of HPFH mutations

Despite the fact that many HPFH mutations were discovered decades ago, the mechanistic basis by which most of the mutations elevate HbF levels has remained elusive. Most attempts to examine the molecular basis of the mutations were performed in vitro because these mutations are rare. Recent advances in genome editing technology allowed us to reinvestigate the molecular mechanisms underlying HPFH in cell models. The first mutation we investigated was a T to C substitution at position -175 upstream of the TSS of γ-globin. The -175 T>C mutation is probably the best characterised HPFH mutation as it has occurred in five different families with various genetic backgrounds and results in the highest HbF levels out of all non-deletional HPFH mutations (16 %-41 %)108,194,195,220. Generally, there are two possible explanations for the activating effects of a promoter mutation: either the mutation creates a de novo binding site for an activator or binding of a repressive factor is disrupted. We found by in vitro and in vivo transcription factor binding assays that this mutation creates a de novo binding site for the activator TAL1 (Figure 6.1, red colour scheme). TAL1 is an erythroid transcription factor that induces target gene expression in conjunction with its cofactors LMO2, LDB1 and the transcription factor GATA1. It is necessary for terminal differentiation of red blood cells and therefore highly abundant in erythroid cells59. It has been shown by others that de novo recruitment of LDB1 to the γ- or β-globin promoter is sufficient to override the developmental globin gene expression program215,216. While their study was performed using an artificial zinc finger protein tethered to LDB1, the -175 T>C mutation recruits TAL1 and its cofactor LDB1 to the γ-globin promoter naturally and activates expression of γ-globin through LDB1-mediated interaction with the LCR.

Several other mutations clustered around the -200 site of the γ-globin promoter have been described to cause HPFH. However, prior to the work described in this thesis, an explanation for their potential to derepress γ-globin was unknown. Here, we provide evidence that the foetal globin repressor that usually binds this region and is disrupted by the HPFH mutations is ZBTB7A. We could show that ZBTB7A binding is decreased by the mutations in vitro and in vivo and our collaborators demonstrated that ZBTB7A interacts with the NuRD chromatin remodelling complex to silence foetal globin72 (Figure 6.1, green colour scheme). Interestingly, ZBTB7A is known to bind to the hypersensitive sites in the LCR and thus could, like BCL11A and SOX6, be involved in long-range chromatin looping to various regions of the β-globin locus. As it has also been found to bind to the β-globin promoter72, one could argue that it potentially has a dual role in haemoglobin switching: silencing foetal γ-globin expression by binding to the promoter and recruiting the NuRD complex, and potentially driving adult β-globin expression through a different activating mechanism. We suggest that the HPFH mutations in the -200 cluster decrease binding of ZBTB7A and hence prevent complete silencing of γ-globin. However, more experiments are needed to further elucidate the role of ZBTB7A in globin switching.

The -198 T>C mutation, although located in the middle of the cluster of other mutations, stands out because it does not disrupt binding of ZBTB7A. In vitro investigations by other researchers showed that the T to C substitution creates a binding site for the widely expressed activator SP1. Here, we confirmed binding of SP1 to the mutation in vivo and additionally demonstrated that binding of the erythroid activator KLF1 is also increased (Figure 6.1, blue colour scheme). We showed that KLF1 is more abundant than SP1 in erythroid tissues, and it is known that activation of the -198 T>C promoter is dependent on erythroid factors226. Hence, we propose that KLF1 is the key driver of γ-globin expression in -198 T>C HPFH. Due to its erythroid-specific nature it could directly interact with GATA1247,248 which in turn may facilitate LCR looping (indicated with a dotted double-headed arrow). KLF1 is also known to be able to interact with the BRG1 subunit of the SWI/SNF chromatin remodelling complex254,272 and thus could exert its activation potential through recruitment of chromatin modifiers (indicated with a dotted double-headed arrow). Lastly, as we found that other KLF and SP proteins can also bind the -198 T>C promoter in vitro and in vivo, we propose that KLFs could compete for binding with the repressor ZBTB7A, which binds to the same site, and displace it (indicated with a dotted arrow).

Figure 6.1: Model of the molecular basis of HPFH mutations in the γ-globin promoter. Mechanisms of γ-globin activation by the -175 T>C mutation (red), the -198 T>C mutation (blue) and the -200 cluster (green) are indicated in different colour schemes. De novo binding of TAL1 recruits LMO2 and LDB1 to the -175 T>C promoter, which facilitates looping of the LCR to drive expression of the gene. De novo binding of KLF1 (and SP1) to -198 T>C may also result in recruitment of GATA1 and the LCR or of the chromatin remodelling complex SWI/SNF (dotted double-headed arrows). Binding of KLF/SPs could also displace ZBTB7A (dotted arrow). Binding of the repressor ZBTB7A is disrupted by -200 cluster mutations.

6.3 Translation into the clinic

Reactivating the expression of HbF in adult life has been a major therapeutic target of haemoglobinopathy research for decades, and a number of different approaches to reactivation have been taken. The discovery that the small molecule hydroxyurea enhances foetal haemoglobin production led to one of the first treatments available for SCD88, and it is still being used today. Many other drugs that target cell adhesion, haemoglobin polymerisation and sickling have been developed and tested, but most of them work efficiently only in combination273. BCL11A has been described as one of the main repressors of foetal haemoglobin, and mutations in the BCL11A gene are associated with high levels of foetal globin50,51. It has been proposed that BCL11A could be a potential drug target in the treatment of haemoglobinopathies223. However, BCL11A is a transcription factor with many other functions and targets within the cell, which complicates the matter: interfering with BCL11A activity to increase foetal globin production will also affect its normal role in the transcriptional regulation of other targets in the genome.

The haematopoietic system has long been praised as the “ideal” tissue in which to implement gene therapy, as bone marrow from patients is easily accessible and stem cells can be cultured ex vivo to perform genome editing. Intuitively, reverting the disease mutation back to the normal sequence would be the best scenario. This may seem simple but ultimately proves difficult for two reasons. Firstly, whilst some diseases like SCD are caused by a single base pair substitution and could be treated by simply repairing the mutation, others, such as β-thalassemias, are genetically more heterogeneous, with many different mutations and/or deletions that cause disease. Patients may also carry two different mutations at the same time, meaning that every patient needs to be genotypically characterised before a suitable, personalised genome editing strategy can be designed to fix the β-globin gene. Secondly, the two pathways of DNA repair, NHEJ and HDR, have been shown not to be equally efficient. In quiescent cells NHEJ seems to be the dominant pathway, whereas HDR is rare274. Accordingly, genome editing strategies that rely on NHEJ have been a main focus of investigation. For example, one approach suggests targeting a regulatory enhancer region of BCL11A to elevate HbF levels177. Although this strategy showed a very promising increase in HbF levels, it needs to be considered that disrupting the enhancer region might have other, as yet unknown, side effects. We believe one elegant approach may be to introduce naturally occurring HPFH mutations to drive high HbF levels in adult red blood cells, as these mutations are known to naturally ameliorate the symptoms of SCD and β-thalassemia in humans275. This approach has significant advantages: only naturally occurring variants are introduced, and problems with epigenetic silencing of foreign genetic material or the unintended activation of nearby genes should be avoided.
Furthermore, introducing HPFH mutations presents a general strategy to treat β-thalassemias, independent of the patient’s genotype. With ongoing research aiming to increase the efficiency of HDR in HSCs, we are confident that HDR rates in primary cells will improve and that introducing small point mutations will become more feasible.

Here, we investigated several mutations and raised the question of which of them is the preferred mutation to translate into the clinic. Generally, we found that mutations that involve de novo binding of activators (-175 and -198) result in higher levels of HbF in individuals than the -200 site mutations that disrupt binding of a repressor. This is most likely because, although direct binding of ZBTB7A to DNA is disrupted, ZBTB7A might still interact with cofactors and other transcription factors bound to the promoter, allowing it to be recruited to the promoter to a certain extent. As people with the -175 T>C mutation show the highest levels of foetal haemoglobin (16 % - 40 %) and our studies have allowed the mechanism of γ-globin activation to be well understood, we suggest that this mutation represents an ideal candidate for use in gene therapy. We have proof that the -175 T>C mutation induces looping of the LCR towards the γ-globin genes and possibly away from the adult β-globin gene. The advantage of this is twofold. Firstly, γ-globin is activated, and secondly, this results in down-regulation of adult β-globin expression, which could be beneficial for patients with a faulty adult globin gene. The -198 T>C mutation, on the other hand, seems to have an additional interesting feature. We found that it integrated with a much higher frequency than any of the other HPFH mutations that we engineered into cells. This could be of significance when considering introducing this mutation into haematopoietic progenitor cells as part of a gene therapeutic strategy.

Furthermore, we discovered by coincidence that the intergenic region between the two γ-globin genes may play a major role in silencing γ-globin expression. For unknown reasons, HbF levels of cells missing the intergenic region were exceptionally high, with up to 95 % HbF of total haemoglobin, and thus one exciting future direction of this work will be to dissect and further investigate the intergenic region between the two γ-globin genes. The fact that simply deleting the intergenic region elevates HbF levels could also be exploited in gene therapy. Here, no precise DNA repair via HDR is necessary, as the deletion could be facilitated by NHEJ.

However, although HbF levels this high may be beneficial for the course of disease, they could also complicate the course of a pregnancy of the affected individual, as oxygen transfer between mother and foetus is compromised276. HPFH mutations, on the other hand, elevate foetal globin levels more subtly, to levels that are high enough to ameliorate symptoms but low enough to ensure sufficient oxygen transfer during pregnancy276.

6.4 Conclusion

Here we present evidence that HPFH mutations could be utilised in a gene therapeutic approach to treat β-haemoglobinopathies. Our study introduced three different naturally occurring HPFH mutations into cellular models, resulting in elevated levels of foetal haemoglobin in all cell lines. Thus, we propose that this work represents a proof-of-principle study for the use of HPFH mutations in gene therapy.

Furthermore, we were able to investigate the molecular mechanisms underlying these HPFH mutations through in vitro and in vivo binding studies. We demonstrated that the -175 T>C and the -198 T>C mutations create de novo binding sites for the erythroid-specific activators TAL1 and KLF1, respectively. Chromatin conformation capture experiments revealed that TAL1 mediates looping of the LCR to the γ-globin promoter through recruitment of LMO2 and LDB1 to activate foetal globin expression. We also provide evidence that a cluster of HPFH mutations around position -200 in the γ-globin promoter decreases binding of the foetal globin repressor ZBTB7A.

Overall, we deliver three different mechanistic explanations for non-deletional HPFH in humans. By uncovering the molecular basis underlying these mutations we have made a significant contribution to a better understanding of the foetal-to-adult haemoglobin switch.

References

1. Levine, M. & Tjian, R. Transcription regulation and animal diversity. Nature 424, 147–151 (2003).

2. Gilbert, S. F. in Developmental Biology (2000).

3. Jansen, M., de Moor, C. H., Sussenbach, J. S. & van den Brande, J. L. Translational control of gene expression. Pediatr Res 37, 681–686 (1995).

4. Beck-Sickinger, A. G. & Mörl, K. Posttranslational Modification of Proteins. Expanding Nature’s Inventory. By Christopher T. Walsh. Angew. Chem. Int. Ed. 45, 1020–1020 (2006).

5. Maston, G. A., Evans, S. K. & Green, M. R. Transcriptional regulatory elements in the human genome. Annu Rev Genomics Hum Genet 7, 29–59 (2006).

6. Blackwood, E. M. & Kadonaga, J. T. Going the distance: a current view of enhancer action. Science 281, 60–63 (1998).

7. Lee, T. I. & Young, R. A. Transcriptional regulation and its misregulation in disease. Cell 152, 1237–1251 (2013).

8. Villard, J. Transcription regulation and human diseases. Swiss Med Wkly 134, 571–579 (2004).

9. Lodish, H. et al. in Molecular Cell Biology. (2000).

10. Courey, A. J. & Jia, S. Transcriptional repression: the long and the short of it. Genes Dev 15, 2786–2796 (2001).

11. Boyadjiev, S. A. & Jabs, E. W. Online Mendelian Inheritance in Man (OMIM) as a knowledgebase for human developmental disorders. Clin Genet 57, 253–266 (2000).

12. Orkin, S. H. & Zon, L. I. Hematopoiesis: an evolving paradigm for stem cell biology. Cell 132, 631–644 (2008).

13. Zhang, Q., Iida, R., Yokota, T. & Kincade, P. W. Early events in lymphopoiesis: an update. Curr Opin Hematol 20, 265–272 (2013).

14. Skalnik, D. G. Transcriptional mechanisms regulating myeloid-specific genes. Gene 284, 1–21 (2002).

15. Semple, J. W., Italiano, J. E. & Freedman, J. Platelets and the immune continuum. Nat Rev Immunol 11, 264–274 (2011).

16. Orkin, S. H. Diversification of haematopoietic stem cells to specific lineages. Nat Rev Genet 1, 57–64 (2000).

17. Socolovsky, M., Lodish, H. F. & Daley, G. Q. Control of hematopoietic differentiation: lack of specificity in signaling by cytokine receptors. Proc Natl Acad Sci U S A 95, 6573–6575 (1998).

18. Robb, L. Cytokine receptors and hematopoietic differentiation. Oncogene 26, 6715–6723 (2007).

19. De Bruin, A. M., Voermans, C. & Nolte, M. A. Impact of interferon-γ on hematopoiesis. Blood 124, 2479–2486 (2014).

20. Ferreira, R., Ohneda, K., Yamamoto, M. & Philipsen, S. GATA1 function, a paradigm for transcription factors in hematopoiesis. Mol Cell Biol 25, 1215–1227 (2005).

21. Weiss, M. J. & Orkin, S. H. GATA transcription factors: key regulators of hematopoiesis. Exp Hematol 23, 99–107 (1995).

22. Hattangadi, S. M., Wong, P., Zhang, L., Flygare, J. & Lodish, H. F. From stem cell to red cell: regulation of erythropoiesis at multiple levels by multiple proteins, RNAs, and chromatin modifications. Blood 118, 6258–6268 (2011).

23. Tavian, M. & Péault, B. Embryonic development of the human hematopoietic system. Int J Dev Biol 49, 243–250 (2005).

24. Palis, J. Primitive and definitive erythropoiesis in mammals. Front Physiol 5, 3 (2014).

25. Baron, M. H., Isern, J. & Fraser, S. T. The embryonic origins of erythropoiesis in mammals. Blood 119, 4828–4837 (2012).

26. Rifkind, R. A., Chui, D. & Epler, H. An ultrastructural study of early morphogenetic events during the establishment of fetal hepatic erythropoiesis. J Cell Biol 40, 343–365 (1969).

27. Weatherall, D. J. & Clegg, J. B. Molecular genetics of human hemoglobin. Annu Rev Genet 10, 157–178 (1976).

28. Schechter, A. N. Hemoglobin research and the origins of molecular medicine. Blood 112, 3927–3938 (2008).

29. Huehns, E. R. & Faroqui, A. M. Oxygen dissociation properties of human embryonic red cells. Nature 254, 335–337 (1975).

30. Waye, J. S. & Chui, D. H. The alpha-globin gene cluster: genetics and disorders. Clin Invest Med 24, 103–109 (2001).

31. Liebhaber, S. A., Cash, F. E. & Ballas, S. K. Human alpha-globin gene expression. The dominant role of the alpha 2-locus in mRNA and protein synthesis. J Biol Chem 261, 15327–15333 (1986).

32. Shen, S. H., Slightom, J. L. & Smithies, O. A history of the human fetal globin gene duplication. Cell 26, 191–203 (1981).

33. Sankaran, V. G., Xu, J. & Orkin, S. H. Advances in the understanding of haemoglobin switching. Br J Haematol 149, 181–194 (2010).

34. Comi, P. et al. Globin chain synthesis in single erythroid bursts from cord blood: studies on gamma leads to beta and G gamma leads to A gamma switches. Proc Natl Acad Sci U S A 77, 362–365 (1980).

35. Li, Q., Peterson, K. R., Fang, X. & Stamatoyannopoulos, G. Locus control regions. Blood 100, 3077–3086 (2002).

36. Van Assendelft, G. B., Hanscombe, O., Grosveld, F. & Greaves, D. R. The β-globin dominant control region activates homologous and heterologous promoters in a tissue-specific manner. Cell 56, 969–977 (1989).

37. Oneal, P. A. et al. Fetal hemoglobin silencing in humans. Blood 108, 2081–2086 (2006).

38. Yagi, M. et al. Chromatin structure and developmental expression of the human alpha-globin cluster. Mol Cell Biol 6, 1108–1116 (1986).

39. Kaczynski, J., Cook, T. & Urrutia, R. Sp1- and Krüppel-like transcription factors. Genome Biol (2003).

40. Nuez, B., Michalovich, D., Bygrave, A., Ploemacher, R. & Grosveld, F. Defective haematopoiesis in fetal liver resulting from inactivation of the EKLF gene. Nature 375, 316–318 (1995).

41. Miller, I. J. & Bieker, J. J. A novel, erythroid cell-specific murine transcription factor that binds to the CACCC element and is related to the Krüppel family of nuclear proteins. Mol Cell Biol 13, 2776–2786 (1993).

42. Donze, D., Townes, T. M. & Bieker, J. J. Role of erythroid Kruppel-like factor in human gamma- to beta-globin gene switching. J Biol Chem 270, 1955–1959 (1995).

43. Perkins, A. C., Sharpe, A. H. & Orkin, S. H. Lethal beta-thalassaemia in mice lacking the erythroid CACCC-transcription factor EKLF. Nature 375, 318–322 (1995).

44. Borg, J. et al. Haploinsufficiency for the erythroid transcription factor KLF1 causes hereditary persistence of fetal hemoglobin. Nat Genet 42, 801–805 (2010).

45. Zhou, D., Liu, K., Sun, C. W., Pawlik, K. M. & Townes, T. M. KLF1 regulates BCL11A expression and gamma- to beta-globin gene switching. Nat Genet 42, 742–744 (2010).

46. Funnell, A. P. et al. Erythroid Krüppel-like factor directly activates the basic Krüppel-like factor gene in erythroid cells. Mol Cell Biol 27, 2777–2790 (2007).

47. Crossley, M. et al. Isolation and characterization of the cDNA encoding BKLF/TEF-2, a major CACCC-box-binding protein in erythroid cells and selected other cells. Mol Cell Biol 16, 1695–1705 (1996).

48. Funnell, A. P. et al. Differential regulation of the α-globin locus by Krüppel-like Factor 3 in erythroid and non-erythroid cells. BMC Mol Biol 15, 8 (2014).

49. Sankaran, V. G. et al. Human fetal hemoglobin expression is regulated by the developmental stage-specific repressor BCL11A. Science 322, 1839–1842 (2008).

50. Uda, M. et al. Genome-wide association study shows BCL11A associated with persistent fetal hemoglobin and amelioration of the phenotype of beta-thalassemia. Proc Natl Acad Sci U S A 105, 1620–1625 (2008).

51. Sankaran, V. G. et al. Developmental and species-divergent globin switching are driven by BCL11A. Nature 460, 1093–1097 (2009).

52. Xu, J. et al. Correction of sickle cell disease in adult mice by interference with fetal hemoglobin silencing. Science 334, 993–996 (2011).

53. Xu, J. et al. Transcriptional silencing of γ-globin by BCL11A involves long-range interactions and cooperation with SOX6. Genes Dev 24, 783–798 (2010).

54. Yi, Z. et al. Sox6 directly silences epsilon globin expression in definitive erythropoiesis. PLoS Genet 2, e14 (2006).

55. Ko, L. J. & Engel, J. D. DNA-binding specificities of the GATA transcription factor family. Mol Cell Biol 13, 4011–4022 (1993).

56. Cantor, A. B. & Orkin, S. H. Transcriptional regulation of erythropoiesis: an affair involving multiple partners. Oncogene 21, 3368–3376 (2002).

57. Wall, L., deBoer, E. & Grosveld, F. The human beta-globin gene 3’ enhancer contains multiple binding sites for an erythroid-specific protein. Genes Dev 2, 1089–1100 (1988).

58. Evans, T., Reitman, M. & Felsenfeld, G. An erythrocyte-specific DNA-binding factor recognizes a regulatory sequence common to all chicken globin genes. Proc Natl Acad Sci U S A 85, 5976–5980 (1988).

59. Kassouf, M. T. et al. Genome-wide identification of TAL1’s functional targets: insights into its mechanisms of action in primary erythroid cells. Genome Res 20, 1064–1083 (2010).

60. Tallack, M. R. et al. A global role for KLF1 in erythropoiesis revealed by ChIP-seq in primary erythroid cells. Genome Res 20, 1052–1063 (2010).

61. Cheng, Y. et al. Erythroid GATA1 function revealed by genome-wide analysis of transcription factor occupancy, histone modifications, and mRNA expression. Genome Res 19, 2172–2184 (2009).

62. Tripic, T. et al. SCL and associated proteins distinguish active from repressive GATA transcription factor complexes. Blood 113, 2191–2201 (2009).

63. Yu, M. et al. Insights into GATA-1-mediated gene activation versus repression via genome-wide chromatin occupancy analysis. Mol Cell 36, 682–695 (2009).

64. Harju-Baker, S., Costa, F. C., Fedosyuk, H., Neades, R. & Peterson, K. R. Silencing of Agamma-globin gene expression during adult definitive erythropoiesis mediated by GATA-1-FOG-1-Mi2 complex binding at the -566 GATA site. Mol Cell Biol 28, 3101–3113 (2008).

65. Bottardi, S. et al. Ikaros and GATA-1 combinatorial effect is required for silencing of human gamma-globin genes. Mol Cell Biol 29, 1526–1537 (2009).

66. Keys, J. R. et al. A mechanism for Ikaros regulation of human globin gene switching. Br J Haematol 141, 398–406 (2008).

67. Lee, S. U. & Maeda, T. POK/ZBTB proteins: an emerging family of proteins that regulate lymphoid development and function. Immunol Rev 247, 107–119 (2012).

68. Maeda, T., Hobbs, R. M. & Pandolfi, P. P. The transcription factor Pokemon: a new key player in cancer pathogenesis. Cancer Res 65, 8575–8578 (2005).

69. Sakurai, N. et al. The LRF transcription factor regulates mature B cell development and the germinal center response in mice. J Clin Invest 121, 2583–2598 (2011).

70. Maeda, T. et al. LRF is an essential downstream target of GATA1 in erythroid development and regulates BIM-dependent apoptosis. Dev Cell 17, 527–540 (2009).

71. Pessler, F. & Hernandez, N. Flexible DNA binding of the BTB/POZ-domain protein FBI-1. J Biol Chem 278, 29327–29335 (2003).

72. Masuda, T. et al. Transcription factors LRF and BCL11A independently repress expression of fetal hemoglobin. Science 351, 285–289 (2016).

73. Norton, L. Investigations of the gene regulatory protein, ZBTB7A, and its role in red blood cells. (UNSW, Australia, 2015).

74. Lervolino, L. G. et al. Prevalence of sickle cell disease and sickle cell trait in national neonatal screening studies. Rev Bras Hematol Hemoter 33, 49–54 (2011).

75. Cao, A. & Galanello, R. Beta-thalassemia. Genet Med 12, 61–76 (2010).

76. Bender, M. A. & Douthitt Seibel, G. in GeneReviews(®) (eds. Pagon, R. A. et al.) (University of Washington, Seattle, 1993).

77. Frenette, P. S. & Atweh, G. F. Sickle cell disease: old discoveries, new concepts, and future promise. J Clin Invest 117, 850–858 (2007).

78. Platt, O. S. et al. Mortality in sickle cell disease. Life expectancy and risk factors for early death. N Engl J Med 330, 1639–1644 (1994).

79. Giardine, B. et al. HbVar database of human hemoglobin variants and thalassemia mutations: 2007 update. Hum Mutat 28, 206 (2007).

80. Centis, F. et al. The importance of erythroid expansion in determining the extent of apoptosis in erythroid precursors in patients with beta-thalassemia major. Blood 96, 3624–3629 (2000).

81. Rund, D. & Rachmilewitz, E. Beta-thalassemia. N Engl J Med 353, 1135–1146 (2005).

82. Roberts, D. J. & Williams, T. N. Haemoglobinopathies and resistance to malaria. Redox Rep 8, 304–310 (2003).

83. Walters, M. C. et al. Barriers to bone marrow transplantation for sickle cell anemia. Biol Blood Marrow Transplant 2, 100–104 (1996).

84. Mokhtar, G. M., Gadallah, M., El Sherif, N. H. & Ali, H. T. Morbidities and mortality in transfusion-dependent Beta-thalassemia patients (single-center experience). Pediatr Hematol Oncol 30, 93–103 (2013).

85. Marinucci, M. et al. beta Thalassemia associated with increased HB F production. Evidence for the existence of a heterocellular hereditary persistence of fetal hemoglobin (HPFH) determinant linked to beta thalassemia in a southern Italian population. Hemoglobin 5, 1–17 (1981).

86. Nagel, R. L. et al. Structural bases of the inhibitory effects of hemoglobin F and hemoglobin A2 on the polymerization of hemoglobin S. Proc Natl Acad Sci U S A 76, 670–672 (1979).

87. DeSimone, J., Heller, P., Hall, L. & Zwiers, D. 5-Azacytidine stimulates fetal hemoglobin synthesis in anemic baboons. Proc Natl Acad Sci U S A 79, 4428–4431 (1982).

88. Platt, O. S. et al. Hydroxyurea enhances fetal hemoglobin production in sickle cell anemia. J Clin Invest 74, 652–656 (1984).

89. Atweh, G. F. et al. Sustained induction of fetal hemoglobin by pulse butyrate therapy in sickle cell disease. Blood 93, 1790–1797 (1999).

90. Mabaera, R. et al. Neither DNA hypomethylation nor changes in the kinetics of erythroid differentiation explain 5-azacytidine’s ability to induce human fetal hemoglobin. Blood 111, 411–420 (2008).

91. Bank, A., Markowitz, D. & Lerner, N. Gene Transfer. Ann N Y Acad Sci 565, 37–43 (1989).

92. Hoban, M. D., Orkin, S. H. & Bauer, D. E. Genetic treatment of a molecular disorder: gene therapy approaches to sickle cell disease. Blood 127, 839–848 (2016).

93. Cavazzana-Calvo, M. et al. Transfusion independence and HMGA2 activation after gene therapy of human β-thalassaemia. Nature 467, 318–322 (2010).

94. Amato, A. et al. Interpreting elevated fetal hemoglobin in pathology and health at the basic laboratory level: new and known γ-gene mutations associated with hereditary persistence of fetal hemoglobin. Int J Lab Hematol 36, 13–19 (2014).

95. Forget, B. G. Molecular basis of hereditary persistence of fetal hemoglobin. Ann N Y Acad Sci 850, 38–44 (1998).

96. Kosteas, T., Palena, A. & Anagnou, N. P. Molecular cloning of the breakpoints of the hereditary persistence of fetal hemoglobin type-6 (HPFH-6) deletion and sequence analysis of the novel juxtaposed region from the 3’ end of the beta-globin gene cluster. Hum Genet 100, 441–445 (1997).

97. Anagnou, N. P. et al. Sequences located 3’ to the breakpoint of the hereditary persistence of fetal hemoglobin-3 deletion exhibit enhancer activity and can modify the developmental expression of the human fetal A gamma-globin gene in transgenic mice. J Biol Chem 270, 10256–10263 (1995).

98. Ottolenghi, S., Mantovani, R., Nicolis, S., Ronchi, A. & Giglioni, B. DNA sequences regulating human globin gene transcription in nondeletional hereditary persistence of fetal hemoglobin. Hemoglobin 13, 523–541 (1989).

99. Wilber, A., Nienhuis, A. W. & Persons, D. A. Transcriptional regulation of fetal to adult hemoglobin switching: new therapeutic opportunities. Blood 117, 3945–3953 (2011).

100. Weatherall, D. J. The thalassaemias. BMJ 314, 1675–1678 (1997).

101. Akinbami, A. O. et al. Hereditary Persistence of Fetal Hemoglobin Caused by Single Nucleotide Promoter Mutations in Sickle Cell Trait and Hb SC Disease. Hemoglobin 40, 64–65 (2016).

102. Oner, R., Kutlar, F., Gu, L. H. & Huisman, T. H. The Georgia type of nondeletional hereditary persistence of fetal hemoglobin has a C---T mutation at nucleotide-114 of the A gamma-globin gene. Blood 77, 1124–1125 (1991).

103. Motum, P. I., Deng, Z. M., Huong, L. & Trent, R. J. The Australian type of nondeletional G gamma-HPFH has a C-->G substitution at nucleotide -114 of the G gamma gene. Br J Haematol 86, 219–221 (1994).

104. Fucharoen, S., Shimizu, K. & Fukumaki, Y. A novel C-T transition within the distal CCAAT motif of the G gamma-globin gene in the Japanese HPFH: implication of factor binding in elevated fetal globin expression. Nucleic Acids Res 18, 5245–5253 (1990).

105. Ottolenghi, S. et al. A frequent A gamma-hereditary persistence of fetal hemoglobin in northern Sardinia: its molecular basis and hematologic phenotype in heterozygotes and compound heterozygotes with beta-thalassemia. Hum Genet 79, 13–17 (1988).

106. Gilman, J. G. et al. Distal CCAAT box deletion in the A gamma globin gene of two black adolescents with elevated fetal A gamma globin. Nucleic Acids Res 16, 10635–10642 (1988).

107. Stoming, T. A. et al. An A gamma type of nondeletional hereditary persistence of fetal hemoglobin with a T----C mutation at position -175 to the cap site of the A gamma globin gene. Blood 73, 329–333 (1989).

108. Surrey, S., Delgrosso, K., Malladi, P. & Schwartz, E. A single-base change at position -175 in the 5’-flanking region of the G gamma-globin gene from a black with G gamma-beta+ HPFH. Blood 71, 807–810 (1988).

109. Costa, F. F. et al. The Brazilian type of nondeletional A gamma-fetal hemoglobin has a C----G substitution at nucleotide -195 of the A gamma-globin gene. Blood 76, 1896–1897 (1990).

110. Giglioni, B. et al. A molecular study of a family with Greek hereditary persistence of fetal hemoglobin and beta-thalassemia. EMBO J 3, 2641–2645 (1984).

111. Ottolenghi, S. et al. A frequent Aγ-persistence of fetal hemoglobin in northern Sardinia: its molecular basis and hematologic phenotype in heterozygotes and compound heterozygotes with β-thalassemia. Hum Genet 79, 13–17 (1988).

112. Tasiopoulou, M. et al. G gamma-196 C-->T, A gamma-201 C-->T: two novel mutations in the promoter region of the gamma-globin genes associated with nondeletional hereditary persistence of fetal hemoglobin in Greece. Blood Cells Mol Dis 40, 320–322 (2008).

113. Wood, W. G., MacRae, I. A., Darbre, P. D., Clegg, J. B. & Weatherall, D. J. The British type of non-deletion HPFH: characterization of developmental changes in vivo and erythroid growth in vitro. Br J Haematol 50, 401–414 (1982).

114. Weatherall, D. J. et al. A form of hereditary persistence of fetal haemoglobin characterized by uneven cellular distribution of haemoglobin F and the production of haemoglobins A and A2 in homozygotes. Br J Haematol 29, 205–220 (1975).

115. Gilman, J. G., Mishima, N., Wen, X. J., Kutlar, F. & Huisman, T. H. Upstream promoter mutation associated with a modest elevation of fetal hemoglobin expression in human adults. Blood 72, 78–81 (1988).

116. Collins, F. S., Stoeckert, C. J., Serjeant, G. R., Forget, B. G. & Weissman, S. M. G gamma beta+ hereditary persistence of fetal hemoglobin: cosmid cloning and identification of a specific mutation 5’ to the G gamma gene. Proc Natl Acad Sci U S A 81, 4894–4898 (1984).

117. Rudin, N., Sugarman, E. & Haber, J. E. Genetic and physical analysis of double-strand break repair and recombination in Saccharomyces cerevisiae. Genetics 122, 519–534 (1989).

118. Choulika, A., Perrin, A., Dujon, B. & Nicolas, J. F. Induction of homologous recombination in mammalian chromosomes by using the I-SceI system of Saccharomyces cerevisiae. Mol Cell Biol 15, 1968–1973 (1995).

119. Rouet, P., Smih, F. & Jasin, M. Introduction of double-strand breaks into the genome of mouse cells by expression of a rare-cutting endonuclease. Mol Cell Biol 14, 8096–8106 (1994).

120. Sander, J. D. & Joung, J. K. CRISPR-Cas systems for editing, regulating and targeting genomes. Nat Biotechnol 32, 347–355 (2014).

121. Bibikova, M., Beumer, K., Trautman, J. K. & Carroll, D. Enhancing gene targeting with designed zinc finger nucleases. Science 300, 764 (2003).

122. Durai, S. et al. Zinc finger nucleases: custom-designed molecular scissors for genome engineering of plant and mammalian cells. Nucleic Acids Res 33, 5978–5990 (2005).

123. Urnov, F. D., Rebar, E. J., Holmes, M. C., Zhang, H. S. & Gregory, P. D. Genome editing with engineered zinc finger nucleases. Nat Rev Genet 11, 636–646 (2010).

124. Li, L., Wu, L. P. & Chandrasegaran, S. Functional domains in Fok I restriction endonuclease. Proc Natl Acad Sci U S A 89, 4275–4279 (1992).

125. Bitinaite, J., Wah, D. A., Aggarwal, A. K. & Schildkraut, I. FokI dimerization is required for DNA cleavage. Proc Natl Acad Sci U S A 95, 10570–10575 (1998).

126. White, F. F. & Yang, B. Host and pathogen factors controlling the rice-Xanthomonas oryzae interaction. Plant Physiol 150, 1677–1686 (2009).

127. Römer, P. et al. Plant pathogen recognition mediated by promoter activation of the pepper Bs3 resistance gene. Science 318, 645–648 (2007).

128. Kay, S. & Bonas, U. How Xanthomonas type III effectors manipulate the host plant. Curr Opin Microbiol 12, 37–43 (2009).

129. Boch, J. & Bonas, U. Xanthomonas AvrBs3 family-type III effectors: discovery and function. Annu Rev Phytopathol 48, 419–436 (2010).

130. Boch, J. et al. Breaking the code of DNA binding specificity of TAL-type III effectors. Science 326, 1509–1512 (2009).

131. Moscou, M. J. & Bogdanove, A. J. A simple cipher governs DNA recognition by TAL effectors. Science 326, 1501 (2009).

132. Deng, D. et al. Structural basis for sequence-specific recognition of DNA by TAL effectors. Science 335, 720–723 (2012).

133. Joung, J. K. & Sander, J. D. TALENs: a widely applicable technology for targeted genome editing. Nat Rev Mol Cell Biol 14, 49–55 (2013).

134. Mak, A. N., Bradley, P., Cernadas, R. A., Bogdanove, A. J. & Stoddard, B. L. The crystal structure of TAL effector PthXo1 bound to its DNA target. Science 335, 716–719 (2012).

135. Wiedenheft, B., Sternberg, S. H. & Doudna, J. A. RNA-guided genetic silencing systems in bacteria and archaea. Nature 482, 331–338 (2012).

136. Fineran, P. C. & Charpentier, E. Memory of viral infections by CRISPR-Cas adaptive immune systems: acquisition of new information. Virology 434, 202–209 (2012).

137. Garneau, J. E. et al. The CRISPR/Cas bacterial immune system cleaves bacteriophage and plasmid DNA. Nature 468, 67–71 (2010).

138. Horvath, P. & Barrangou, R. CRISPR/Cas, the immune system of bacteria and archaea. Science 327, 167–170 (2010).

139. Deltcheva, E. et al. CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III. Nature 471, 602–607 (2011).

140. Jinek, M. et al. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science 337, 816–821 (2012).

141. Charpentier, E. & Doudna, J. A. Biotechnology: Rewriting a genome. Nature 495, 50–51 (2013).

142. Pattanayak, V., Ramirez, C. L., Joung, J. K. & Liu, D. R. Revealing off-target cleavage specificities of zinc-finger nucleases by in vitro selection. Nat Methods 8, 765–770 (2011).

143. Gabriel, R. et al. An unbiased genome-wide analysis of zinc-finger nuclease specificity. Nat Biotechnol 29, 816–823 (2011).

144. Guilinger, J. P. et al. Broad specificity profiling of TALENs results in engineered nucleases with improved DNA-cleavage specificity. Nat Methods 11, 429–435 (2014).

145. Wang, X. et al. Unbiased detection of off-target cleavage by CRISPR-Cas9 and TALENs using integrase-defective lentiviral vectors. Nat Biotechnol 33, 175–178 (2015).

146. O’Geen, H., Henry, I. M., Bhakta, M. S., Meckler, J. F. & Segal, D. J. A genome-wide analysis of Cas9 binding specificity using ChIP-seq and targeted sequence capture. Nucleic Acids Res 43, 3389–3404 (2015).

147. Hsu, P. D. et al. DNA targeting specificity of RNA-guided Cas9 nucleases. Nat Biotechnol 31, 827–832 (2013).

148. Doyon, Y. et al. Enhancing zinc-finger-nuclease activity with improved obligate heterodimeric architectures. Nat Methods 8, 74–79 (2011).

149. Miller, J. C. et al. An improved zinc-finger nuclease architecture for highly specific genome editing. Nat Biotechnol 25, 778–785 (2007).

150. Slaymaker, I. M. et al. Rationally engineered Cas9 nucleases with improved specificity. Science 351, 84–88 (2016).

151. Lieber, M. R. The mechanism of double-strand DNA break repair by the nonhomologous DNA end-joining pathway. Annu Rev Biochem 79, 181–211 (2010).

152. Panier, S. & Boulton, S. J. Double-strand break repair: 53BP1 comes into focus. Nat Rev Mol Cell Biol 15, 7–18 (2014).


153. Saleh-Gohari, N. & Helleday, T. Conservative homologous recombination preferentially repairs DNA double-strand breaks in the S phase of the cell cycle in human cells. Nucleic Acids Res 32, 3683–3688 (2004).

154. Heyer, W. D., Ehmsen, K. T. & Liu, J. Regulation of homologous recombination in eukaryotes. Annu Rev Genet 44, 113–139 (2010).

155. Chu, V. T. et al. Increasing the efficiency of homology-directed repair for CRISPR-Cas9-induced precise gene editing in mammalian cells. Nat Biotechnol 33, 543–548 (2015).

156. Chen, F. et al. High-frequency genome editing using ssDNA oligonucleotides with zinc-finger nucleases. Nat Methods 8, 753–755 (2011).

157. Riordan, S. M., Heruth, D. P., Zhang, L. Q. & Ye, S. Q. Application of CRISPR/Cas9 for biomedical discoveries. Cell Biosci 5, 33 (2015).

158. Segal, D. J. & Meckler, J. F. Genome engineering at the dawn of the golden age. Annu Rev Genomics Hum Genet 14, 135–158 (2013).

159. Qi, L. S. et al. Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression. Cell 152, 1173–1183 (2013).

160. Larson, M. H. et al. CRISPR interference (CRISPRi) for sequence-specific control of gene expression. Nat Protoc 8, 2180–2196 (2013).

161. Chen, B. et al. Dynamic imaging of genomic loci in living human cells by an optimized CRISPR/Cas system. Cell 155, 1479–1491 (2013).

162. Nelles, D. A. et al. Programmable RNA Tracking in Live Cells with CRISPR/Cas9. Cell 165, 488–496 (2016).

163. Naldini, L. Gene therapy returns to centre stage. Nature 526, 351–360 (2015).

164. Naldini, L. Ex vivo gene transfer and correction for cell-based therapies. Nat Rev Genet 12, 301–315 (2011).

165. Hacein-Bey Abina, S. et al. Outcomes following gene therapy in patients with severe Wiskott-Aldrich syndrome. JAMA 313, 1550–1563 (2015).

166. Aiuti, A. et al. Lentiviral hematopoietic stem cell gene therapy in patients with Wiskott-Aldrich syndrome. Science 341, 1233151 (2013).

167. Cartier, N. et al. Lentiviral hematopoietic cell gene therapy for X-linked adrenoleukodystrophy. Meth Enzymol 507, 187–198 (2012).

168. Cartier, N. et al. Hematopoietic stem cell gene therapy with a lentiviral vector in X-linked adrenoleukodystrophy. Science 326, 818–823 (2009).

169. Biffi, A. et al. Lentiviral hematopoietic stem cell gene therapy benefits metachromatic leukodystrophy. Science 341, 1233158 (2013).


170. Connolly, J. B. Lentiviruses in gene therapy clinical research. Gene Ther 9, 1730–1734 (2002).

171. Ronen, K. et al. Distribution of lentiviral vector integration sites in mice following therapeutic gene transfer to treat β-thalassemia. Mol Ther 19, 1273–1286 (2011).

172. Pang, S. et al. High levels of unintegrated HIV-1 DNA in brain tissue of AIDS dementia patients. Nature 343, 85–89 (1990).

173. Porteus, M. H. Towards a new era in medicine: therapeutic genome editing. Genome Biol 16, 286 (2015).

174. Porteus, M. Genome Editing: A New Approach to Human Therapeutics. Annu Rev Pharmacol Toxicol 56, 163–190 (2016).

175. Hoban, M. D. et al. Correction of the sickle cell disease mutation in human hematopoietic stem/progenitor cells. Blood 125, 2597–2604 (2015).

176. Bauer, D. E. et al. An erythroid enhancer of BCL11A subject to genetic variation determines fetal hemoglobin level. Science 342, 253–257 (2013).

177. Canver, M. C. et al. BCL11A enhancer dissection by Cas9-mediated in situ saturating mutagenesis. Nature 527, 192–197 (2015).

178. Tebas, P. et al. Gene editing of CCR5 in autologous CD4 T cells of persons infected with HIV. N Engl J Med 370, 901–910 (2014).

179. Sambrook, J., Fritsch, E. F. & Maniatis, T. Molecular cloning: a laboratory manual. (Cold Spring Harbor Laboratory Press, 1989).

180. Chan, K. S. et al. Generation of a genomic reporter assay system for analysis of γ- and β-globin gene regulation. FASEB J 26, 1736–1744 (2012).

181. Kurita, R. et al. Establishment of immortalized human erythroid progenitor cell lines able to produce enucleated red blood cells. PLoS ONE 8, e59890 (2013).

182. Andrews, N. C. & Faller, D. V. A rapid micropreparation technique for extraction of DNA-binding proteins from limiting numbers of mammalian cells. Nucleic Acids Res 19, 2499 (1991).

183. Ryan, D. P. et al. Identification of the key LMO2-binding determinants on Ldb1. J Mol Biol 359, 66–75 (2006).

184. Voit, R. A., Hendel, A., Pruett-Miller, S. M. & Porteus, M. H. Nuclease-mediated gene editing by homologous recombination of the human globin locus. Nucleic Acids Res 42, 1365–1378 (2014).

185. Cermak, T., Starker, C. G. & Voytas, D. F. Efficient design and assembly of custom TALENs using the Golden Gate platform. Methods Mol Biol 1239, 133–159 (2015).


186. Miller, J. C. et al. A TALE nuclease architecture for efficient genome editing. Nat Biotechnol 29, 143–148 (2011).

187. Ran, F. A. et al. Genome engineering using the CRISPR-Cas9 system. Nat Protoc 8, 2281–2308 (2013).

188. Zheng, F. Optimised CRISPR design tool. at

189. Woon Kim, Y., Kim, S., Geun Kim, C. & Kim, A. The distinctive roles of erythroid specific activator GATA-1 and NF-E2 in transcription of the human fetal γ-globin genes. Nucleic Acids Res 39, 6944–6955 (2011).

190. Vadolas, J. et al. Development of sensitive fluorescent assays for embryonic and fetal hemoglobin inducers using the human beta-globin locus in erythropoietic cells. Blood 100, 4209–4216 (2002).

191. Weatherall, D. J. & Clegg, J. B. Hereditary persistence of fetal haemoglobin. Br J Haematol 29, 191–198 (1975).

192. Friedman, S. & Schwartz, E. Hereditary persistence of foetal haemoglobin with beta-chain synthesis in cis position (Ggamma-beta+-HPFH) in a negro family. Nature 259, 138–140 (1976).

193. Powars, D. R., Weiss, J. N., Chan, L. S. & Schroeder, W. A. Is there a threshold level of fetal hemoglobin that ameliorates morbidity in sickle cell anemia? Blood 63, 921–926 (1984).

194. Coleman, M. B. et al. G gamma A gamma (beta+) hereditary persistence of fetal hemoglobin: the G gamma -158 C→T mutation in cis to the -175 T→C mutation of the A gamma-globin gene results in increased G gamma-globin synthesis. Am J Hematol 42, 186–190 (1993).

195. Gumucio, D. L. et al. The -175T→C mutation increases promoter strength in erythroid cells: correlation with evolutionary conservation of binding sites for two trans-acting factors. Blood 75, 756–761 (1990).

196. Martin, D. I., Tsai, S. F. & Orkin, S. H. Increased gamma-globin expression in a nondeletion HPFH mediated by an erythroid-specific DNA-binding factor. Nature 338, 435–438 (1989).

197. Craig, J. E., Sheerin, S. M., Barnetson, R. & Thein, S. L. The molecular basis of HPFH in a British family identified by heteroduplex formation. Br J Haematol 84, 106–110 (1993).

198. Labie, D. et al. The -158 site 5’ to the G gamma gene and G gamma expression. Blood 66, 1463–1465 (1985).

199. Gilman, J. G. & Huisman, T. H. DNA sequence variation associated with elevated fetal G gamma globin production. Blood 66, 783–787 (1985).


200. Slightom, J. L., Blechl, A. E. & Smithies, O. Human fetal G gamma- and A gamma-globin genes: complete nucleotide sequences suggest that DNA can be exchanged between these duplicated genes. Cell 21, 627–638 (1980).

201. Gilman, J. G., Johnson, M. E. & Mishima, N. Four base-pair DNA deletion in human A gamma globin-gene promoter associated with low A gamma expression in adults. Br J Haematol 68, 455–458 (1988).

202. Lloyd, J. A., Lee, R. F. & Lingrel, J. B. Mutations in two regions upstream of the A gamma globin gene canonical promoter affect gene expression. Nucleic Acids Res 17, 4339–4352 (1989).

203. Gumucio, D. L. et al. Nuclear proteins that bind the human gamma-globin gene promoter: alterations in binding produced by point mutations associated with hereditary persistence of fetal hemoglobin. Mol Cell Biol 8, 5310–5322 (1988).

204. Martin, D. I. & Orkin, S. H. Transcriptional activation and DNA binding by the erythroid factor GF-1/NF-E1/Eryf 1. Genes Dev 4, 1886–1898 (1990).

205. Mantovani, R. et al. The effects of HPFH mutations in the human gamma-globin promoter on binding of ubiquitous and erythroid specific nuclear factors. Nucleic Acids Res 16, 7783–7797 (1988).

206. Hall, M. A. et al. The critical regulator of embryonic hematopoiesis, SCL, is vital in the adult for megakaryopoiesis, erythropoiesis, and lineage choice in CFU-S12. Proc Natl Acad Sci U S A 100, 992–997 (2003).

207. Porcher, C., Liao, E. C., Fujiwara, Y., Zon, L. I. & Orkin, S. H. Specification of hematopoietic and vascular development by the bHLH transcription factor SCL without direct DNA binding. Development 126, 4603–4615 (1999).

208. Wadman, I. A. et al. The LIM-only protein Lmo2 is a bridging molecule assembling an erythroid, DNA-binding complex which includes the TAL1, E47, GATA-1 and Ldb1/NLI proteins. EMBO J 16, 3145–3157 (1997).

209. El Omari, K. et al. Structural basis for LMO2-driven recruitment of the SCL:E47bHLH heterodimer to hematopoietic-specific transcriptional targets. Cell Rep 4, 135–147 (2013).

210. Wang, J. et al. Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors. Genome Res 22, 1798–1812 (2012).

211. Fujiwara, T. et al. Discovering hematopoietic mechanisms through genome-wide analysis of GATA factor chromatin occupancy. Mol Cell 36, 667–681 (2009).

212. Naumann, S., Reutzel, D., Speicher, M. & Decker, H. J. Complete karyotype characterization of the K562 cell line by combined application of G-banding, multiplex-fluorescence in situ hybridization, fluorescence in situ hybridization, and comparative genomic hybridization. Leuk Res 25, 313–322 (2001).


213. Ronaghi, M. Pyrosequencing sheds light on DNA sequencing. Genome Res 11, 3–11 (2001).

214. Noordermeer, D. & de Laat, W. Joining the loops: beta-globin gene regulation. IUBMB Life 60, 824–833 (2008).

215. Deng, W. et al. Controlling long-range genomic interactions at a native locus by targeted tethering of a looping factor. Cell 149, 1233–1244 (2012).

216. Deng, W. et al. Reactivation of developmentally silenced globin genes by forced chromatin looping. Cell 158, 849–860 (2014).

217. Ryan, D. P., Duncan, J. L., Lee, C., Kuchel, P. W. & Matthews, J. M. Assembly of the oncogenic DNA-binding complex LMO2-Ldb1-TAL1-E12. Proteins 70, 1461–1474 (2008).

218. Welch, J. J. et al. Global regulation of erythroid gene expression by transcription factor GATA-1. Blood 104, 3136–3147 (2004).

219. Braghini, C. A. et al. Generation of non-deletional hereditary persistence of fetal hemoglobin β-globin locus yeast artificial chromosome transgenic mouse models: -175 Black HPFH and -195 Brazilian HPFH. Exp Biol Med (Maywood) (2016). doi:10.1177/1535370216636724

220. Liu, L. R. et al. T to C substitution at -175 or -173 of the gamma-globin promoter affects GATA-1 and Oct-1 binding in vitro differently but can independently reproduce the hereditary persistence of fetal hemoglobin phenotype in transgenic mice. J Biol Chem 280, 7452–7459 (2005).

221. Mantovani, R. et al. A protein factor binding to an octamer motif in the gamma-globin promoter disappears upon induction of differentiation and hemoglobin synthesis in K562 cells. Nucleic Acids Res 15, 9349–9364 (1987).

222. Sturm, R. A., Das, G. & Herr, W. The ubiquitous octamer-binding protein Oct-1 contains a POU domain with a homeo box subdomain. Genes Dev 2, 1582–1599 (1988).

223. Bauer, D. E., Kamran, S. C. & Orkin, S. H. Reawakening fetal hemoglobin: prospects for new therapies for the β-globin disorders. Blood 120, 2945–2953 (2012).

224. Cross, A. J., Jeffries, C. M., Trewhella, J. & Matthews, J. M. LIM domain binding proteins 1 and 2 have different oligomeric states. J Mol Biol 399, 133–144 (2010).

225. Ronchi, A., Nicolis, S., Santoro, C. & Ottolenghi, S. Increased Sp1 binding mediates erythroid-specific overexpression of a mutated (HPFH) gamma-globulin promoter. Nucleic Acids Res 17, 10231–10241 (1989).

226. Fischer, K. D. & Nowock, J. The T→C substitution at -198 of the A gamma-globin gene associated with the British form of HPFH generates overlapping recognition sites for two DNA-binding proteins. Nucleic Acids Res 18, 5685–5693 (1990).


227. Suske, G. The Sp-family of transcription factors. Gene 238, 291–300 (1999).

228. Wierstra, I. Sp1: emerging roles--beyond constitutive activation of TATA-less housekeeping genes. Biochem Biophys Res Commun 372, 1–13 (2008).

229. Chong, H. K., Biesinger, J., Seo, Y. K., Xie, X. & Osborne, T. F. Genome-wide analysis of hepatic LRH-1 reveals a promoter binding preference and suggests a role in regulating genes of lipid metabolism in concert with FXR. BMC Genomics 13, 51 (2012).

230. Burdach, J. et al. Regions outside the DNA-binding domain are critical for proper in vivo specificity of an archetypal zinc finger transcription factor. Nucleic Acids Res 42, 276–289 (2014).

231. Chen, X. et al. Integration of external signaling pathways with the core transcriptional network in embryonic stem cells. Cell 133, 1106–1117 (2008).

232. Lim, W. Directing an artificial zinc finger protein to new targets by fusion to a non-DNA binding domain. (UNSW, Australia, 2016).

233. Jiang, J. et al. A core Klf circuitry regulates self-renewal of embryonic stem cells. Nat Cell Biol 10, 353–360 (2008).

234. Lim, W. F. et al. Directing an artificial zinc finger protein to new targets by fusion to a non-DNA-binding domain. Nucleic Acids Res 44, 3118–3130 (2016).

235. Anderson, K. P., Kern, C. B., Crable, S. C. & Lingrel, J. B. Isolation of a gene encoding a functional zinc finger protein homologous to erythroid Krüppel-like factor: identification of a new multigene family. Mol Cell Biol 15, 5957–5965 (1995).

236. Basu, P. et al. KLF2 is essential for primitive erythropoiesis and regulates the human and murine embryonic beta-like globin genes in vivo. Blood 106, 2566–2571 (2005).

237. Schuller, C. E., Jankowski, K. & Mackenzie, K. L. Telomere length of cord blood-derived CD34(+) progenitors predicts erythroid proliferative potential. Leukemia 21, 983–991 (2007).

238. FANTOM Consortium and the RIKEN PMI and CLST (DGT) et al. A promoter-level mammalian expression atlas. Nature 507, 462–470 (2014).

239. Doss, J. F. et al. A comprehensive joint analysis of the long and short RNA transcriptomes of human erythrocytes. BMC Genomics 16, 952 (2015).

240. Fibach, E., Burke, L. P., Schechter, A. N., Noguchi, C. T. & Rodgers, G. P. Hydroxyurea increases fetal hemoglobin in cultured erythroid cells derived from normal individuals and patients with sickle cell anemia or beta-thalassemia. Blood 81, 1630–1635 (1993).

241. Mabaera, R. et al. A cell stress signaling model of fetal hemoglobin induction: what doesn’t kill red blood cells may make them stronger. Exp Hematol 36, 1057–1072 (2008).


242. Pearson, R. C., Funnell, A. P. & Crossley, M. The mammalian zinc finger transcription factor Krüppel-like factor 3 (KLF3/BKLF). IUBMB Life 63, 86–93 (2011).

243. Siatecka, M. & Bieker, J. J. The multifunctional role of EKLF/KLF1 during erythropoiesis. Blood 118, 2044–2054 (2011).

244. Gumucio, D. L. et al. Interaction of Sp1 with the human gamma globin promoter: binding and transactivation of normal and mutant promoters. Blood 78, 1853–1863 (1991).

245. Courey, A. J. & Tjian, R. Analysis of Sp1 in vivo reveals multiple transcriptional domains, including a novel glutamine-rich activation motif. Cell 55, 887–898 (1988).

246. Love, P. E., Warzecha, C. & Li, L. Ldb1 complexes: the new master regulators of erythroid gene transcription. Trends Genet 30, 1–9 (2014).

247. Gregory, R. C. et al. Functional interaction of GATA1 with erythroid Krüppel-like factor and Sp1 at defined erythroid promoters. Blood 87, 1793–1801 (1996).

248. Merika, M. & Orkin, S. H. Functional synergy and physical interactions of the erythroid transcription factor GATA-1 with the Krüppel family proteins Sp1 and EKLF. Mol Cell Biol 15, 2437–2447 (1995).

249. Li, L. et al. Ldb1-nucleated transcription complexes function as primary mediators of global erythroid gene activation. Blood 121, 4575–4585 (2013).

250. Wienert, B. et al. Editing the genome to introduce a beneficial naturally occurring mutation associated with increased fetal globin. Nat Commun 6, 7085 (2015).

251. Suzuki, T., Kimura, A., Nagai, R. & Horikoshi, M. Regulation of interaction of the acetyltransferase region of p300 and the DNA-binding domain of Sp1 on and through DNA binding. Genes Cells 5, 29–41 (2000).

252. Bannister, A. J. & Kouzarides, T. The CBP co-activator is a histone acetyltransferase. Nature 384, 641–643 (1996).

253. Ogryzko, V. V., Schiltz, R. L., Russanova, V., Howard, B. H. & Nakatani, Y. The transcriptional coactivators p300 and CBP are histone acetyltransferases. Cell 87, 953–959 (1996).

254. Zhang, W., Kadam, S., Emerson, B. M. & Bieker, J. J. Site-specific acetylation by p300 or CREB binding protein regulates erythroid Krüppel-like factor transcriptional activity via its interaction with the SWI-SNF complex. Mol Cell Biol 21, 2413–2422 (2001).

255. Kadam, S. et al. Functional selectivity of recombinant mammalian SWI/SNF subunits. Genes Dev 14, 2441–2451 (2000).

256. Turner, J. & Crossley, M. Cloning and characterization of mCtBP2, a co-repressor that associates with basic Krüppel-like factor and other mammalian transcriptional regulators. EMBO J 17, 5129–5140 (1998).


257. Coghill, E. et al. Erythroid Kruppel-like factor (EKLF) coordinates erythroid cell proliferation and hemoglobinization in cell lines derived from EKLF null mice. Blood 97, 1861–1868 (2001).

258. Lee, S. U. et al. LRF-mediated Dll4 repression in erythroblasts is necessary for hematopoietic stem cell maintenance. Blood 121, 918–929 (2013).

259. Martyn, G. A potential role for ZBTB7A in the foetal to adult globin gene switch. (UNSW, 2013).

260. De Brabander, M. J., Van de Veire, R. M., Aerts, F. E., Borgers, M. & Janssen, P. A. The effects of methyl (5-(2-thienylcarbonyl)-1H-benzimidazol-2-yl) carbamate, (R 17934; NSC 238159), a new synthetic antitumoral drug interfering with microtubules, on mammalian cells cultured in vitro. Cancer Res 36, 905–916 (1976).

261. Persikov, A. V., Osada, R. & Singh, M. Predicting DNA recognition by Cys2His2 zinc finger proteins. Bioinformatics 25, 22–29 (2009).

262. Persikov, A. V. & Singh, M. De novo prediction of DNA-binding specificities for Cys2His2 zinc finger proteins. Nucleic Acids Res 42, 97–108 (2014).

263. Schaffner, W. A hit-and-run mechanism for transcriptional activation? Nature 336, 427–428 (1988).

264. Para, A. et al. Hit-and-run transcriptional control by bZIP1 mediates rapid nutrient signaling in Arabidopsis. Proc Natl Acad Sci U S A 111, 10371–10376 (2014).

265. Charoensawan, V., Martinho, C. & Wigge, P. A. "Hit-and-run": Transcription factors get caught in the act. Bioessays 37, 748–754 (2015).

266. Zaret, K. S. Genome reactivation after the silence in mitosis: recapitulating mechanisms of development? Dev Cell 29, 132–134 (2014).

267. Kadauke, S. & Blobel, G. A. Mitotic bookmarking by transcription factors. Epigenetics Chromatin 6, 6 (2013).

268. Caravaca, J. M. et al. Bookmarking by specific and nonspecific binding of FoxA1 pioneer factor to mitotic chromosomes. Genes Dev 27, 251–260 (2013).

269. Egli, D., Birkhoff, G. & Eggan, K. Mediators of reprogramming: transcription factors and transitions through mitosis. Nat Rev Mol Cell Biol 9, 505–516 (2008).

270. Weiss, M. J., Yu, C. & Orkin, S. H. Erythroid-cell-specific properties of transcription factor GATA-1 revealed by phenotypic rescue of a gene-targeted cell line. Mol Cell Biol 17, 1642–1651 (1997).

271. Kidder, B. L., Hu, G. & Zhao, K. ChIP-Seq: technical considerations for obtaining high-quality data. Nat Immunol 12, 918–922 (2011).


272. Armstrong, J. A., Bieker, J. J. & Emerson, B. M. A SWI/SNF-related chromatin remodeling complex, E-RC1, is required for tissue-specific transcriptional regulation by EKLF in vitro. Cell 95, 93–104 (1998).

273. Telen, M. J. Beyond hydroxyurea: new and old drugs in the pipeline for sickle cell disease. Blood (2016). doi:10.1182/blood-2015-09-618553

274. Mohrin, M. et al. Hematopoietic stem cell quiescence promotes error-prone DNA repair and mutagenesis. Cell Stem Cell 7, 174–185 (2010).

275. Akinsheye, I. et al. Fetal hemoglobin in sickle cell anemia. Blood 118, 19–27 (2011).

276. Murji, A. et al. Pregnancy outcomes in women with elevated levels of fetal hemoglobin. J Matern Fetal Neonatal Med 25, 125–129 (2012).


Appendix

Appendix I - Primer lists

Table I.1: qPCR primers for gene expression

qPCR (RT PCR) primers

Database  Gene      F/R  Oligo sequence 5' to 3'
A6666     hKLF1     F    GGTTGCGGCAAGAGCTACA
A6667     hKLF1     R    ACACAGGGGAGAAGCCATACG
A2672     hKLF2     F    AGGTCGTCGTCGGTGC
A2673     hKLF2     R    TGCCGTCCTTCTCCACTTTCG
A3950     hKLF3     F    ACCCAGTTCCTGTCAAGCAA
A3951     hKLF3     R    TCAGGCAATGGTGTGGAGTA
A6598     hKLF4     F    CATTACCAAGAGCTCATGCCACC
A6599     hKLF4     R    CAGCCCGCGTAATCACAAG
A7315     hKLF5     F    TTTAAAAGCTCACCTGAGGACTCA
A7316     hKLF5     R    CAGCCTTCCCAGGTACACTTG
A7271     hKLF6     F    TCTGGAGGAGTACTGGCAACAG
A7272     hKLF6     R    GCTCGCTCTGGAGGTAACGT
A7273     hKLF7     F    CCGGCTACTTCTCAGCTTTACC
A7274     hKLF7     R    CGTTCCAATTCAAGGCATGTC
A3041     hKLF8     F    ACCAAAAGCTCTCACCTGAAAGC
A3042     hKLF8     R    TCTGAGCGAGCAAATTTCCA
A7275     hKLF9     F    CCATTACAGAGTGCATACAGGTGAA
A7276     hKLF9     R    TGAGCGGGAGAACTTTTTAAGG
A7277     hKLF10    F    GGTGCCTCTCTCCAGCAGACT
A7278     hKLF10    R    CTTTTGGCCTTTCAGAAATCATTT
A7317     hKLF11    F    AGCCACACCTGAACTACCAAAAG
A7318     hKLF11    R    GGCTCTGAGGAGGAGTTATGCA
A7281     hKLF12    F    AAAACAGAGCTTTTGGAATCTGAAC
A7282     hKLF12    R    GCTTCCATATCGGGATAGTTGTG
A7319     hKLF13    F    GAAATCTTCGCACCTCAAG
A7320     hKLF13    R    GAACTTCTTGTTGCAGTCCT
A7289     hKLF16    F    GCGCCAAAGCCTACTACAAGTC
A7290     hKLF16    R    TGCCAGTCACAAGCAAAAGG
A7291     hKLF17    F    CCAGGCTGCCCAGGATAA
A7292     hKLF17    R    CTTCCAGAAGATGAAGACATGTTCA
A7307     hSP1      F    CATGCACCTGCCCCTACTG
A7308     hSP1      R    TGCTGTTTCTTTTTGCCAGGAT
A7309     hSP2      F    CCTCCCAGAACTTTCAGATCCA
A7310     hSP2      R    GGAAGGCGTGCGGATGT
A7311     hSP3      F    CATCCAGGAGAGAATGCTGACA
A7312     hSP3      R    CACTCTTCAGGATCAGGTTCTTCTT
A5307     tdTomato  F    GCCGACATCCCCGATTACAAGA
A5308     tdTomato  R    CGATGGTGTAGTCCTCGTTGTGG
A2524     HBG       F    CCTGTCCTCTGCCTCTGCC
A2525     HBG       R    GGATTGCCAAAACGGTCAC
A4069     HBB       F    TGTCCACTCCTGATGCTGTTATG
A4070     HBB       R    GGCACCGAGCACTTTCTTG
A4141     HBE       F    TGCTGAGGAGAAGGCTGCCG
A4142     HBE       R    TGGGTCCAGGGGTAAACAACGAGG
A3994     HBD       F    AACCTCAAGGGCACTTTTTCT
A3995     HBD       R    GGAAACAGTCCAGGATCTCAA
A2682     HBA       F    GGGTGGACCCGGTCAACTT
A2683     HBA       R    GAGGTGGGCGGCCAGGGT
A3992     HBZ       F    GAGGACCATCATTGTGTCCA
A3993     HBZ       R    AGTGCGGGAAGTAGGTCTTG
A1560     18S       F    CACGGCCGGTACAGTGAAAC
A1561     18S       R    AGAGGAGCGAGCGACCAA
A4111     hBCL11A   F    CGAGCACAAACGGAAACAATG
A4112     hBCL11A   R    GATTAGAGCTCCATGTGCAGAACG
A4283     hGATA2    F    AAGGCTCGTTCCTGTTCAGA
A4284     hGATA2    R    TGCCCATTCATCTTGTGGTA
A5073     hGATA1    F    TGCTCTGGTGTCCTCCACAC
A5074     hGATA1    R    TGGGAGAGGAATAGGCTGCT
A5071     hTAL1     F    AGCCGGATGCCTTCCCTAT
A5072     hTAL1     R    CCGCACAACTTTGGTGTGG
A5480     hSOX6     F    GGCGTCCCCCTACCCTGTCATCC
A5481     hSOX6     R    TGCTGCACACGGCTCCTCACTG

Table I.2: ChIP primers

ChIP primers

Database  Locus                          F/R  Oligo sequence 5' to 3'
A3825     human HBB promoter             F    GGAGGGCTGAGGGTTTGAAGTCC
A3826     human HBB promoter             R    TGTCCTTGGCTCTTCTGGCACTG
A1925     human HBG promoter             F    CAAGGCTATTGGTCAAGGCAA
A1926     human HBG promoter             R    TTCCCCACACTATCTCAATGCAAA
A4994     human ZFPM1 (Fog1) intron 1    F    AGTCGATGTGAGCTCCGATAA
A4995     human ZFPM1 (Fog1) intron 1    R    GGCCAAAGATAAGGCCTCTT
A5106     mouse Klf1 promoter            F    AGCACACCACACACATATCG
A5107     mouse Klf1 promoter            R    ATGGGCTATGAGGCTAGGAA
A2756     mouse KLF8 -4.5 kb of exon 1   F    GGTTTCTGAGACCTAACACTTCACACT
A2757     mouse KLF8 -4.5 kb of exon 1   R    CCATTTAGTCATCCAGCGAACAA
A4376     human BIM1 (BCL2L11) promoter  F    CGGGTTGGGGTAGGTGAG
A4377     human BIM1 (BCL2L11) promoter  R    GGCGTGTTTACCGGAGTAAC
A4445     human HBE promoter             F    CACAAACTTAGTGTCCATCCATCAC
A4446     human HBE promoter             R    CCCTGTTCTCCATGGTACTTAAAAG
A7203     human KLF8 promoter            F    TCGCTGTCCTTGTTCGTGTTA
A7204     human KLF8 promoter            R    GCTGACAACTGGAAGCGATATTT
A7205     human HS3                      F    CCAGAACTGCTCATGCTTGGA
A7206     human HS3                      R    CTTAGTTCCTGTTACATTTCTGTGTGTCT
A4374     human KLF1 promoter            F    TCAAATTAGCCTGGCGTTCAA
A4375     human KLF1 promoter            R    AATGGTGGGCCAGTTGTCA
A1965     human HBG -1.1 kb              F    GAGATCATGGATCACTTTCAGAG
A1966     human HBG -1.1 kb              R    AAGTATTTATGGTGGTTTTTTGG
A1967     human HBG -2.1 kb              F    CCTGACCAGGAACCAGCAGAAAAG
A1968     human HBG -2.1 kb              R    AAGGTGCTATAACAAAATAGCATAG
A1969     human HBG -3.1 kb              F    ATGTGGGTTTTGATGAGCAAAT
A1970     human HBG -3.1 kb              R    ACCTTTTACTCCCACTTGCAGAAC


Table I.3: Chromatin conformation capture primers

3C primers

Primers for digestion efficiency

Database  Locus               F/R  Oligo sequence 5' to 3'
A5581     HS2 uncut           F    CATCACTCTAGGCTGAGAACATCTG
A5582     HS2 uncut           R    GGCTCAAGCACAGCAATGC
A5610     HBB cut             F    GTCAGTGGGGCTGGAATAAA
A5591     HBB cut             R    TGGTCAGAGCCTCAGTTTCA
A5614     HBE term cut        F    CGTAGAGGACTAGGAAAGACCAGA
A5595     HBE term cut        R    TGTGCACATAAGCAGATTACTTTTT

Primers for 3C assay

Database  Locus               F/R  Oligo sequence 5' to 3'
A5581     HS2 (internal ctrl) F    CATCACTCTAGGCTGAGAACATCTG
A5582     HS2 (internal ctrl) R    GGCTCAAGCACAGCAATGC
A5600     3C HS2 (constant)   F    CATAGTTGTCAGCACAATGCCTA
A5590     3C HS 3'            R    GCCTGGTGGTGACAAAATCT
A5591     3C HBB              R    TGGTCAGAGCCTCAGTTTCA
A5592     3C HBD              R    ATACTGAAACATAGGGGACGAG
A5784     3C HBG1             R    GGAGGCAAGCTGTATCTTCAAATT
A5787     3C HBG2             R    GAGCTGTGAGGTGAAACTACCA
A5595     3C HBE term         R    TGTGCACATAAGCAGATTACTTTTT
A5787     3C HBG2 (constant)  R    GAGCTGTGAGGTGAAACTACCA
A5597     3C HS1              R    CCTGATGAGTTTTTCCTCCA
A5598     3C HS2              R    GCTTGGACTATGGGAGGTCA
A6087     3C HS3              R    CAGCCTTTTGCTCAGGGTAG
A6088     3C HS4              R    CCAAATGGGTGACTGTAGGG
A6089     3C HS5              R    AGGACATGGCCATCAGTACA


Table I.4: Genome editing primers

Genome editing primers

ssODN donors

Locus     F/R  Oligo sequence 5' to 3'
-195 C>G  F    AGGCTGGCCAACCCATGGGTGGAGTTTAGCCAGGGACCGTTTCAGACAGATATTTGCATTGAGATAGTGTGCGGAAGGGGCCCCCAAGAGGATACTGCTAATTTTTTTTATAGCCTTTGCCTTGTTCCGATTCAGTCATTCCAGTTTTTC
-198 T>C  F    AGGCTGGCCAACCCATGGGTGGAGTTTAGCCAGGGACCGTTTCAGACAGATATTTGCATTGAGATAGTGTGGGGGAGGGGCCCCCAAGAGGATACTGCTAATTTTTTTTATAGCCTTTGCCTTGTTCCGATTCAGTCATTCCAGTTTTTC

sgRNA oligos

Database  Locus            F/R  Oligo sequence 5' to 3'
A6451     sgRNA -195/-198  F    CACCGCATTGAGATAGTGTGGGGAA
A6452     sgRNA -195/-198  R    AAACTTCCCCACACTATCTCAATGC

Genomic PCR screening primers

Database  Locus         F/R  Oligo sequence 5' to 3'
A5077     HBG-tdTomato  F    AGTGTGTGGACTATTAGTCAA
A5078     HBG-tdTomato  R    ATGAACTCTTTGATGACCTCC
A5077     HBG-ECFP      F    AGTGTGTGGACTATTAGTCAA
A5357     HBG-ECFP      R    ATGAACTTCAGGGTCAGCTT
A6969     HBG           F    ACGGCTGACAAAAGAAGTCC
A5220     HBG           R    GGCGTCTGGACTAGGAG
A5077     HBG wide      F    AGTGTGTGGACTATTAGTCAA
A5265     HBG wide      R    CAGTGGTATCTGGAGGACA
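The sgRNA oligo pair listed in Table I.4 (A6451/A6452) appears to follow a common overhang-cloning pattern: the forward oligo is CACCG followed by the 20-nt guide, and the reverse oligo is AAAC followed by the reverse complement of the guide and a terminal C. The Python sketch below reproduces the pair from the guide sequence under that assumption. The pattern is inferred from the two listed sequences rather than stated in the table, and the function names are illustrative only.

```python
from typing import Tuple

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))

def sgrna_cloning_oligos(guide: str) -> Tuple[str, str]:
    """Build a forward/reverse oligo pair for a guide sequence.

    Assumed pattern (inferred from A6451/A6452, not stated in the table):
    forward = CACC + G + guide, reverse = AAAC + revcomp(guide) + C.
    """
    guide = guide.upper()
    forward = "CACC" + "G" + guide
    reverse = "AAAC" + reverse_complement(guide) + "C"
    return forward, reverse

# 20-nt guide taken from A6451 with the leading CACCG removed
forward, reverse = sgrna_cloning_oligos("CATTGAGATAGTGTGGGGAA")
print(forward)
print(reverse)
```

Annealing the two oligos would leave 4-nt overhangs (CACC and AAAC) on either end, which is why the reverse oligo carries a prefix and a trailing C rather than being a plain reverse complement of the forward oligo.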


Appendix II - Controls

The following figures display supporting information and controls for the chromatin conformation capture experiments and the qPCR primer validation.

Figure II.1: 3C controls. (A) Representative gel electrophoresis of pEBACGdsREDEGFP and pBAC148DNA. BAC DNA was purified (lane 1), digested with HindIII (lane 2), and ligated with T4 ligase (lane 3) to generate random ligation products of HindIII fragments. Ligated DNA served as control DNA in the 3C assay. (B) Digestion efficiency of 3C samples at two different loci. Only samples with an efficiency greater than 75% were considered for analysis. (C) Gel electrophoresis of amplification products of 3C primers to verify primer specificity. (D) 3C primers were tested for linearity in qPCR using serially diluted ligated BAC DNA as template.
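Panel (B) reports digestion efficiency, but the caption does not give the calculation. A commonly used qPCR-based estimate compares a primer pair spanning a restriction site ("cut") with a pair in a region lacking a site ("uncut"), in digested and undigested template. The sketch below assumes that formula; the Ct values are hypothetical and are not data from this thesis.

```python
def digestion_efficiency(ct_cut_digested: float, ct_uncut_digested: float,
                         ct_cut_undigested: float, ct_uncut_undigested: float) -> float:
    """Percent digestion at a restriction site, estimated from qPCR Ct values.

    Assumed formula: 100 - 100 / 2^((Ct_cut - Ct_uncut)_digested
                                    - (Ct_cut - Ct_uncut)_undigested)
    """
    delta_digested = ct_cut_digested - ct_uncut_digested
    delta_undigested = ct_cut_undigested - ct_uncut_undigested
    return 100.0 - 100.0 / 2 ** (delta_digested - delta_undigested)

# Hypothetical Ct values for one sample at one locus
efficiency_percent = digestion_efficiency(
    ct_cut_digested=28.0, ct_uncut_digested=24.0,
    ct_cut_undigested=25.0, ct_uncut_undigested=24.0,
)
passes_threshold = efficiency_percent > 75.0  # 75% cutoff from the caption
print(efficiency_percent, passes_threshold)
```

Normalising each "cut" amplicon to an "uncut" amplicon in the same sample corrects for differences in template amount between the digested and undigested aliquots.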


Figure II.2: qPCR primer validation. Graphs display the standard curves that were run for different sets of qPCR primers to determine linear amplification of targets. cDNA from K562 (A-C) or HUDEP2 cells (D) was serially diluted, and standard curves were run and analysed with the software provided with the Applied Biosystems 7500 Fast Real-Time PCR System.
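The standard curves in this figure were analysed with the instrument software. As an illustration of what such an analysis typically reports, the sketch below fits Ct against log10 of the relative template amount and derives the amplification efficiency as 10^(-1/slope) - 1. The dilution series and Ct values are hypothetical, and the function name is an assumption made for this example.

```python
import math

def fit_standard_curve(relative_inputs, cts):
    """Least-squares fit of Ct against log10(relative template amount).

    Returns (slope, intercept, r_squared, efficiency), where
    efficiency = 10^(-1/slope) - 1 (1.0 corresponds to 100%).
    """
    xs = [math.log10(value) for value in relative_inputs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in cts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    efficiency = 10 ** (-1.0 / slope) - 1.0
    return slope, intercept, r_squared, efficiency

# Hypothetical 10-fold dilution series of cDNA and the measured Ct values
relative_inputs = [1.0, 0.1, 0.01, 0.001]
cts = [20.00, 23.32, 26.64, 29.96]
slope, intercept, r_squared, efficiency = fit_standard_curve(relative_inputs, cts)
print(round(slope, 2), round(r_squared, 4), round(efficiency, 3))
```

A slope near -3.32 corresponds to a doubling of product per cycle, and an R² close to 1 indicates that amplification is linear over the tested dilution range.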


Appendix III - Additional data

Figure III.1: Published RNA-Seq analyses. (A) RNA-Seq CAGE analysis of human tissues in the FANTOM5 project (ref. 238). Data was accessed through http://www.ebi.ac.uk. Numbers display relative expression levels of KLFs and SP proteins in foetal umbilical cord blood and adult bone marrow. (B) RNA-Seq data of KLF/SP expression levels from Doss et al. 2015 (ref. 239). Pre-analysed expression data from the study in red blood cells (top) and 8-day differentiated CD34+ cells (bottom) was graphed using Excel to display relative expression levels of KLFs.


Figure III.2: ChIP-qPCR in HUDEP2 cells carrying the -198 T>C mutation (raw data). ChIPs for SP1, KLF1 and KLF3 were performed in HUDEP2 WT (A) and HUDEP2 -198 T>C cells from homozygous clones #9 (B), #11 (C) and #21 (D). The tested genomic loci were the γ-globin promoter (HBG), the β-globin promoter (HBB) and the ε-globin promoter (HBE, -ctrl). The SP1 promoter, which all three factors bind in vivo, served as a positive control (+ctrl) for successful pulldown with the respective antibody. Cts for each target were normalised against Cts of the input.
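The caption states that target Cts were normalised against input Cts without giving the formula. One widely used form of this normalisation is the percent-input method, sketched below. The input fraction, the Ct values and the function name are assumptions made for illustration and are not taken from the thesis.

```python
def percent_input(ct_ip: float, ct_input: float, input_fraction: float) -> float:
    """ChIP signal as a percentage of input chromatin.

    input_fraction is the share of chromatin kept as the input sample
    (for example 0.01 for a 1% input); this value is an assumption here.
    Equivalent to adjusting the input Ct by log2(1 / input_fraction)
    and computing 100 * 2^(adjusted input Ct - IP Ct).
    """
    return 100.0 * input_fraction * 2 ** (ct_input - ct_ip)

# Hypothetical Ct values for one antibody at one locus
signal = percent_input(ct_ip=27.0, ct_input=25.0, input_fraction=0.01)
print(signal)
```

Expressing each pulldown relative to its own input makes signals comparable between clones that may differ in chromatin yield.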


Figure III.3: Mitotic arrest followed by ChIP-qPCR for ZBTB7A in K562 cells. (A) Cell cycle analysis of synchronised K562 cells by flow cytometry. Cells were arrested in G2/M phase by treatment with nocodazole for 15 h. Cells were then washed and released back into the cell cycle. Samples were taken from asynchronous cells, after 15 h of nocodazole treatment, and 1 h, 2 h, 3 h, 5 h and 7 h after release of the mitotic block. Cells were fixed and stained with propidium iodide before flow cytometry. Data was analysed with the cell cycle algorithm provided by the FlowJo software. (B) ChIP-qPCR experiment of ZBTB7A in asynchronous and synchronised K562 cells. ChIP samples were taken after 15 h of nocodazole treatment and 1 h and 2 h after the mitotic block was released. Samples were analysed by qPCR and normalised over the input. Controls included the KLF1 proximal promoter as a positive control and the regions -1.1 kb and -2.1 kb upstream of the γ-globin promoter as negative controls.


Figure III.4: ChIP-qPCR in K562 G-A tdTomato cells (raw data). ChIP-qPCR experiment of ZBTB7A in K562 G-A tdTomato WT clones (A) or -195 C>G clones (B). ChIP samples were analysed by qPCR and normalised over the input. Controls included the BCL2L11 promoter and HS3 as positives and the epsilon-globin promoter as a negative.


Figure III.5: Raw qPCR data for γ-globin expression in -198 T>C (A) and -195 C>G (B) HUDEP2 cells. Amplification plots display the threshold cycle (Ct) of the loading control 18S and of γ-globin. Expression levels of γ-globin RNA in HUDEP2 -198 T>C and -195 C>G cells were compared to expression in WT HUDEP2 cells.
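The caption describes a comparison of target Ct values against the 18S loading control and against WT cells. A minimal sketch of how such a comparison is often computed is given below, assuming the 2^-ΔΔCt method and roughly 100% primer efficiency. The method choice, function name and Ct values are assumptions for illustration, not results from this thesis.

```python
def relative_expression(ct_target_sample: float, ct_ref_sample: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Fold change of a target gene versus a control sample (2^-ddCt).

    ref is the loading control (18S in the caption); control is the WT sample.
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_control = ct_target_control - ct_ref_control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2 ** (-delta_delta_ct)

# Hypothetical Ct values: edited clone versus WT cells
fold_change = relative_expression(
    ct_target_sample=25.0, ct_ref_sample=10.0,
    ct_target_control=28.0, ct_ref_control=10.0,
)
print(fold_change)
```

A lower target Ct in the edited clone at an unchanged 18S Ct translates into a fold change above 1, meaning higher expression than in WT cells.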
