Precise and sustained silencing in CD4+ cells using designer epigenome modifiers as a therapeutic approach to treat HIV infection

Inaugural-Dissertation

to obtain the Doctoral Degree of the Faculty of Biology, Albert-Ludwigs-Universität Freiburg im Breisgau

presented by

Tafadzwa Mlambo born in Harare, Zimbabwe

Freiburg im Breisgau February 2018

Dean: Prof. Dr. Bettina Warscheid

Chair of the Doctoral Committee: Prof. Dr. Andreas Hiltbrunner

Supervisor: Dr. Claudio Mussolino

First Referee: Prof. Dr. Toni Cathomen

Second Referee: Prof. Dr. Peter Stäheli

Third Examiner: Dr. Giorgos Pyrowolakis

Date of oral examination: 27.04.2018

iii Table of Contents

TABLE OF CONTENTS

ABSTRACT

1. INTRODUCTION

1.1 HIV BURDEN AND EPIDEMIOLOGY

1.2 HIV LIFE CYCLE AND TROPISM

1.3 HIV TREATMENT

1.4 CCR5 AND CXCR4 AS TARGETS OF ANTI-HIV THERAPY

1.5 DESIGNER NUCLEASE TECHNOLOGY

1.5.1 Zinc finger nucleases

1.5.2 Transcription activator-like effector nucleases

1.5.3 RNA-guided endonucleases

1.6 HIV GENE THERAPY

1.7 OFF-TARGET EFFECTS

1.8 DELIVERY

1.9 EPIGENETIC REGULATION

1.9.1 Gene expression

1.9.2 Transcriptional regulation of gene expression

1.9.3 Targeted transcription activation

1.9.4 Targeted transcription repression

1.9.5 Epigenome editing

1.9.6 DNA methylation

1.9.7 Designer epigenome modifiers

1.10 AIM AND OBJECTIVES OF PHD THESIS

2. MATERIALS AND METHODS

2.1 STANDARD MOLECULAR BIOLOGY METHODS

2.1.1 Restriction digest

2.1.2 Ligation

2.1.3 Polymerase Chain Reaction (PCR)

2.1.4 Transformation of chemically competent E. coli cells

2.1.5 Colony PCR

2.1.6 DNA extraction from bacteria (Mini and Midi preparations)

2.1.7 Agarose gel electrophoresis

2.1.8 PCR purification

2.1.9 Gel purification

2.1.10 Oligo cloning

2.1.11 Gibson Assembly

2.2 GENERATION OF THE TALE-BASED DESIGNER TRANSCRIPTION FACTORS

2.2.1 Generation of the TALE arrays

2.2.2 Generation of the TALE-based designer transcription activators and repressors

2.2.3 Generation of DEM expression plasmids

2.2.4 Cloning of the Firefly Luciferase reporter

2.2.5 Cloning of the EGFP reporter

2.3 MRNA PRODUCTION

2.3.1 Plasmid linearization

2.3.2 In vitro transcription

2.4 CELL CULTURE METHODS

2.4.1 Culture conditions

2.4.2 DNA PEI transfection

2.4.3 RNA lipofection

2.4.4 Dual Luciferase assay

2.4.5 Virus production and transduction

2.4.6 Generation of reporter cells

2.4.7 Reactivation

2.4.8 Flow cytometry

2.5 PRIMARY CELL CULTURE

2.5.1 PBMC extraction from LRS Chamber

2.5.2 MACS Isolation of CD4+ cells from PBMCs

2.5.3 Thawing and activation of primary human CD4+ T cells

2.5.4 Nucleofection of primary CD4+ T cells

2.6 EPIGENETIC AND EXPRESSION ANALYSES

2.6.1 Bisulfite sequencing

2.7 IN SILICO PREDICTION OF DEM #6 OFF-TARGET SITES

2.7.1 Bisulfite sequencing via Next Generation Sequencing

2.7.2 RNA isolation

2.7.3 Reverse transcription

2.7.4 Quantitative real-time PCR analysis

2.8 CHROMATIN-BASED STUDIES AND WHOLE TRANSCRIPTOME ANALYSIS

2.8.1 Chromatin immunoprecipitation

2.8.2 ATAC-seq

2.8.3 RNA-seq

2.9 STATISTICAL ANALYSIS

3. RESULTS

3.1 FUNCTIONALITY OF THE DESIGNER TRANSCRIPTION ACTIVATORS

3.1.1 CCR5 target sites and Firefly Luciferase-based reporter

3.1.2 Functionality and synergy of the DTAs

3.2 GENERATION OF A GFP-BASED REPORTER AND REPORTER CELL LINE

3.3 REPRESSION OF CCR5 EXPRESSION IN A REPORTER CELL LINE

3.4 GENERATION OF THE DEM CONSTRUCTS

3.5 MRNA PRODUCTION VIA IN VITRO TRANSCRIPTION

3.6 LONG-TERM REPRESSION OF EGFP EXPRESSION IN A REPORTER CELL LINE

3.6.1 mRNA vs DNA delivery of DEMs resulted in potent silencing of gene expression

3.7 DNA METHYLATION ANALYSIS IN A REPORTER CELL LINE

3.8 DEM-INDUCED GENE SILENCING IN HEK293T CELLS

3.9 DEM-INDUCED GENE SILENCING IN CD4+ CELLS

3.9.1 Multiplex gene silencing in CD4+ cells

3.9.2 DNA methylation analysis in CD4+ cells

3.9.3 DEM-mediated histone modifications

3.10 SPECIFICITY PROFILE OF THE DEMS

4. DISCUSSION

4.1 TRANSCRIPTIONAL REGULATION ACHIEVED WITH THE DESIGNER TRANSCRIPTION FACTORS IS MODEST AND TRANSIENT

4.2 DEMS MEDIATE POTENT, STABLE GENE SILENCING ACCOMPANIED BY DNA METHYLATION

4.3 DEMS EXHIBIT FUNCTIONALITY AT CLINICALLY RELEVANT, ENDOGENOUS

4.4 DEMS EXHIBIT A BENIGN SPECIFICITY PROFILE

4.5 THE RELEVANCE OF DEM-BASED APPROACHES FOR HIV GENE THERAPY

5. REFERENCES

6. APPENDIX

6.1 LIST OF ABBREVIATIONS

6.2 LIST OF FIGURES

6.3 LIST OF TABLES

6.4 SUPPLEMENTARY MATERIALS AND METHODS

6.5 SUPPLEMENTARY TABLES

6.6 CURRICULUM VITAE

6.7 ACKNOWLEDGMENTS

6.8 DECLARATION

viii Abstract

ABSTRACT

The human immunodeficiency virus (HIV) is a major global health burden which has claimed over 25 million lives in the past 30 years. Although current treatment strategies have proven effective, several concerns remain to be addressed. The CCR5 and CXCR4 co-receptors are necessary for HIV entry into host cells and have therefore gained interest as possible targets for therapeutics against HIV infection. Designer nuclease technology has not only revolutionized the fields of biology and medicine but has also provided tools which enable the specific targeting and inactivation of the CCR5 and CXCR4 genes. However, nuclease-based approaches are associated with significant off-target effects and improving their safety remains an important consideration. Transcriptional repression via epigenetic modification provides a viable alternative to nuclease-mediated gene knock-out and may represent a safer approach, as the genomic sequence remains unchanged and the severity of off-target effects may therefore be reduced. To investigate this, designer epigenome modifiers (DEMs) were generated by fusing transcription activator-like effector (TALE)-based DNA-binding domains to effectors capable of inducing gene silencing through DNA methylation and by altering chromatin conformation. DEM constructs targeted to the CCR5 and CXCR4 genes were tested in a reporter cell line harbouring an integrated EGFP expression cassette under control of the CCR5 proximal promoter or in HEK293T cells, respectively. EGFP silencing was observed in 80% of the cells and for up to 65 days following the delivery of a CCR5-specific DEM. Silencing was associated with up to 80% CpG methylation with minimal spreading from the target site. Furthermore, EGFP expression could be restored via treatment with the non-specific demethylating agent 5-AZA, thus demonstrating the reversible nature of DEM-induced silencing.

CXCR4-specific DEMs induced a 3.4- and 1.8-fold reduction in CXCR4 mRNA and protein levels, respectively, as measured by quantitative RT-PCR and flow cytometry. Silencing was again accompanied by up to 22% CpG methylation, demonstrating that DEM-induced silencing is retained following cell division. To highlight the translational potential of DEMs, their functionality was tested in CD4+ primary T cells. Four days post-delivery, CCR5-specific DEMs resulted in a 1.8- or 1.6-fold reduction in CCR5 transcript or protein levels, respectively, whereas moderate silencing was observed for CXCR4. Evaluation of the cells at a second time-point 18 days post-transfection revealed a 1.6-fold reduction in both CXCR4 transcript and protein levels whereas CCR5 silencing had been lost, most likely due to the T cell-culturing conditions. Next-generation bisulfite sequencing carried out for CCR5 revealed significant DNA methylation within 5 kb of the target site. Similarly, up to a 12-fold increase in methylation was observed at the CXCR4 gene. Silencing of both genes was associated with the H3K9me3 repressive histone modification as evaluated via chromatin immunoprecipitation. To address the safety concerns associated with designer nuclease technology, whole transcriptome analysis via RNA-seq was carried out following delivery of a CCR5-specific DEM into CD4+ cells. In addition to CCR5, 84 genes showed differential expression (more than 1.5-fold) but an evaluation of potential DEM off-target sites revealed that this was not a result of DEM off-target binding. To evaluate possible changes in chromatin accessibility, ATAC-seq was carried out in cells which received the CCR5-specific DEM. Reduced chromatin accessibility was observed in the region in direct proximity to the target site. In addition, 324 further sites showed reduced chromatin accessibility but these regions did not correlate with the differentially-expressed genes identified via RNA-seq. Finally, evaluation of DNA methylation at the top 10 computationally-predicted off-target sites revealed no significant differences between DEM- and control-treated cells for all but one site, which was later shown to be intergenic and inaccessible in CD4+ cells. Therefore, this study establishes DEMs as a powerful epigenome editing tool and provides a possible next step towards developing a safe therapy to treat HIV infection.

11 Introduction

1. INTRODUCTION

1.1 HIV BURDEN AND EPIDEMIOLOGY

The human immunodeficiency virus (HIV) is a major global health concern and it is estimated that approximately 36.7 million people were living with HIV and 1 million people had died of acquired immune deficiency syndrome (AIDS)-related illnesses in 2016 (1) (Figure 1.1). AIDS is the name given to the disease state caused by HIV infection and is typically characterized by opportunistic infections such as tuberculosis (2). HIV can be spread through exposure to infected bodily fluids, for example via sexual contact, needle sharing amongst drug users and from mother to child during pregnancy, delivery and breastfeeding (3). The virus exists as HIV-1 or HIV-2, originating from apes and sooty mangabey monkeys, respectively (4). HIV-1 comprises four groups (M, N, O and P), of which M is the pandemic form of the virus that has spread globally. The N, O and P groups are less prevalent and are collectively restricted to Cameroon, Gabon and the neighbouring countries (5-7). HIV-2 is largely restricted to West Africa (2) and is less pathogenic than HIV-1, with only 20-25% of infections developing to AIDS (8). Possible reasons for this phenomenon include decreased viral replication as well as enhanced host immune control; however, the exact mechanisms are still to be elucidated (9). In 2016, Africa accounted for approximately 69% of HIV infections (10), highlighting the devastation caused by the virus in this region in particular and the urgent need for effective treatment strategies.

Figure 1.1: HIV prevalence in 2016

The World Health Organization (WHO) reported that 36.7 million people globally were infected with HIV at the end of 2016. Amongst adults aged 15-49 the global prevalence was 0.8%. In Africa nearly 1 in every 25 adults was living with HIV. Adapted from WHO Global Health Observatory (GHO) data.

1.2 HIV LIFE CYCLE AND TROPISM

HIV is a lentivirus of the family Retroviridae and possesses an RNA genome contained within a capsid core particle. Viral particles are enveloped by a lipid bilayer which is derived from the host cell membrane and contains two viral glycoproteins: gp120 and gp41. In addition to macrophages, monocytes and dendritic cells, HIV primarily targets activated CD4 T lymphocytes (2). The viral life cycle begins with the binding of gp120 to the CD4 receptor and to either the CCR5 or CXCR4 co-receptor (Figure 1.2). This binding facilitates the fusion of the viral particle with the target cell membrane and allows for the release of the viral core and enzymes including reverse transcriptase, integrase and protease into the cytoplasm. Following viral entry, the reverse transcriptase reverse transcribes the single-strand viral RNA into full-length double-strand DNA (11). A pre-integration complex comprising the newly generated viral DNA is directed to the nucleus, where integration of the reverse-transcribed DNA into the host genome takes place via the viral integrase (10). Subsequently, the integrated viral DNA is transcribed and translated, giving rise to new viral RNA and proteins which translocate to the cell surface and assemble to form new virus particles which bud off as immature virions. Further processing of the structural polyprotein occurs via the viral protease enzyme, ultimately resulting in the assembly of an infectious viral particle (12).

The depletion of CD4+ cells that follows viral infection is one of the hallmarks of HIV infection and is thought to occur via several mechanisms, including direct viral attack, apoptosis through chronic immune activation and pyroptosis of abortively-infected cells (13) (reviewed in (14)).

Figure 1.2: HIV life cycle

Following binding of the virion to the CD4 receptor and co-receptors, fusion with the host cell membrane occurs, allowing the viral core and enzymes to be released into the cell. The viral capsid undergoes uncoating, then the viral RNA is reverse transcribed into DNA. The pre-integration complex is formed and translocates to the nucleus where integration of viral DNA into the host genome occurs. Transcription and translation give rise to new viral RNA and proteins which translocate to the cell surface and are assembled to form immature virions. Maturation occurs through processing of a structural polyprotein and new infectious virions are released. Source (15).

As already mentioned, viral entry requires the presence of the CD4 receptor as well as the CCR5 or CXCR4 co-receptor. HIV strains can use either the CCR5 or the CXCR4 receptor and are referred to as R5/M- or X4/T-tropic, respectively. The M- and T-tropic classification came from early in vitro observations that R5 viruses infected primary cultures of macrophages and lymphocytes whereas X4 isolates also infected T cell lines (16); however, this did not take into account that all primary isolates replicate in activated, primary CD4+ T lymphocytes (17). Some strains have the capacity to use both receptors for entry and are thus referred to as dual tropic (R5/X4). It is known that early after infection HIV-1 strains typically use the CCR5 receptor and that X4-tropic strains emerge at later disease stages. This switch from R5 to X4 tropism is not yet fully understood but has been associated with a faster decline in CD4 T cells and accelerated disease progression (18).

1.3 HIV TREATMENT

The immune system is not capable of efficiently clearing HIV infection due to the integrated provirus which acts as a permanent template for the production of infectious virions. In addition, the provirus may become latent and remain undetectable by the host immune system for several years (19). Furthermore, the virus has a very high genetic variability resulting in strains which are significantly different to the original lineage (20). This diversity can be attributed to high rates of replication, recombination and mutation. Notably, the viral reverse transcriptase is highly error-prone (approximately 10⁻⁴ to 10⁻⁵ mutations per nucleotide per cycle) and does not possess proof-reading activity (21).

The current treatment for HIV involves a combination of antiretroviral drugs and is referred to as antiretroviral therapy (ART). These drugs are targeted to different processes which occur during the HIV life cycle and a typical regimen will include two nucleoside reverse transcriptase inhibitors and one non-nucleoside reverse transcriptase, protease or integrase inhibitor (2). In 2017 the WHO reported that approximately 20.9 million people had access to treatment compared to 17.1 million in 2015 (1). While ART has had a positive impact on the morbidity and mortality related to HIV infection there remain several concerns which must be addressed. These include the side effects and high cost associated with treatment, the emergence of viral resistance as well as the inability to eradicate latent viral reservoirs which persist throughout infection with the virus (22). In addition, patient adherence to treatment is an important factor which will influence the prognosis of HIV-related disease.

It is evident that novel treatment strategies for HIV are necessary which will allow for the sustained management of the virus with limited side effects.

1.4 CCR5 AND CXCR4 AS TARGETS OF ANTI-HIV THERAPY

The CCR5 and CXCR4 co-receptors, which are necessary for HIV entry into host cells, have gained interest as possible targets for antiviral therapeutics. CCR5 is involved in the regulation of cell migration and local immune activation whereas CXCR4 is involved in several crucial processes including stem cell migration, organogenesis and neuronal development (23, 24). In 1996 Feng and colleagues (25) demonstrated that recombinant CXCR4 enabled CD4-expressing nonhuman cells to support T-tropic HIV infection. A few months later Deng and colleagues observed that the natural ligands of the CCR5 receptor, regulated upon activation, normal T cell expressed and secreted (RANTES), macrophage inflammatory protein 1α (MIP-1α) and MIP-1β, inhibited infection by M-tropic HIV strains (26). These two key discoveries revealed the roles played by CCR5 and CXCR4 in HIV-1 infection.

Further evidence for the contribution of CCR5 to HIV infection came with the discovery of a 32 bp deletion in the CCR5 open reading frame (CCR5 Δ32) that encodes a non-functional protein. The homozygous CCR5 Δ32 mutation is present in 1% of individuals of European descent but is rare in persons of African or Asian descent (27). HIV disease progresses more slowly in individuals with one copy of the Δ32 mutation (CCR5+/-) than in wild-type individuals (CCR5+/+) (28). Even more interestingly, individuals who are homozygous for the mutation (CCR5-/-) have been found to be largely resistant to HIV infection (29). Therefore, CCR5 has become a target for various antiviral therapies against HIV.

Success has already been observed with maraviroc, a non-competitive inhibitor of CCR5, which inhibits HIV replication and has been approved for the treatment of HIV-infected patients (30). However, the biggest breakthrough to date is the case of the “Berlin patient”, which involved the permanent elimination of CCR5 and has become the only documented cure of HIV infection. In this case, a man with acute myeloid leukemia who had been diagnosed with HIV more than 10 years earlier received an allogeneic stem cell transplantation using cells from a donor homozygous for the CCR5 Δ32 mutation. As a result, engraftment of CCR5-negative cells occurred and the man remains free of detectable HIV infection despite discontinuing ART (27). Although undoubtedly an exciting step forward in the cure of HIV, this approach has several limitations. There are concerns that eradicating CCR5 will not provide absolute protection against HIV infection as infection with X4-tropic strains can still occur. In the case of the Berlin patient, X4-tropic virus was detected prior to transplantation but HIV-1 could not be detected following treatment, most likely due to a low number of infectious particles (27). In addition, allogeneic stem cell transplantation is associated with significant morbidity, which restricts its widespread use apart from in patients with AIDS-associated malignancies. Furthermore, homozygous CCR5 Δ32 stem cell donors are very rare and must still be HLA-matched with potential recipients; therefore, this would not be a viable option for the majority of HIV-infected patients (31). It is also worth noting that the patient himself was heterozygous for the Δ32 mutation prior to treatment, further highlighting the complexity and somewhat impractical nature of this particular strategy. Therefore, there is an urgent need for alternative approaches in which this HIV-resistance phenotype can be induced artificially.

1.5 DESIGNER NUCLEASE TECHNOLOGY

Designer nucleases have been a topic of great interest over the past few years owing to the ability they provide to specifically and permanently modify the genomes of complex organisms. This technology involves the use of engineered nucleases which consist of sequence-specific DNA-binding domains fused to a DNA cleavage domain (32). These artificial enzymes introduce double-strand breaks (DSBs) at pre-determined loci in the genome, thus activating the non-homologous end joining (NHEJ) or the homologous recombination (HR) DNA repair pathways (33) (Figure 1.3). NHEJ is an error-prone mechanism which can occur throughout the cell cycle and functions by modifying and ligating the broken DNA ends with little to no homology, thus introducing insertions and deletions (indels) at the break site. Indels may result in frameshifts and disruption of the targeted gene (34). In contrast, homologous recombination is mainly restricted to the late S and G2 phases of the cell cycle and involves the use of the sister chromatid or homologous chromosome as a template for DNA repair, resulting in the error-free reconstitution of the original sequence (35). It was shown that inducing a DSB increases the efficiency and frequency of HR; therefore, the co-delivery of a designer nuclease with a homologous donor sequence can be used to induce mutations, insertions or deletions at the target site (36). Furthermore, site-specific integration into ‘safe harbour’ sites in the genome reduces the risk of insertional mutagenesis (37). The designer nucleases currently being used for genome editing and which have received the most attention in recent years include the zinc finger nucleases (ZFNs) and transcription activator-like effector nucleases (TALENs). The most recent development in the realm of designer nuclease technology came in the form of RNA-guided endonucleases (RGENs), specifically the clustered regularly interspaced short palindromic repeats/CRISPR-associated (CRISPR/Cas) system, which has gained widespread use and popularity for various applications.

[Figure 1.3 panels. NHEJ: (a) gene knockout (no donor DNA), (b) gene insertion (+ donor DNA). HR: (c) gene modification (+ donor DNA), (d) gene insertion (+ donor DNA).]

Figure 1.3: Genome editing with designer nucleases

Following nuclease-mediated site-specific cleavage, double-strand breaks in the DNA can be repaired by the error prone non-homologous end-joining pathway (NHEJ) or via homologous recombination (HR). NHEJ often results in the introduction of insertions and deletions (a) which can lead to gene disruption and knockout. If donor DNA is available (b) insertion can also occur. Donor DNA can be used to achieve either specific gene modification (c) or insertion (d) via HR. Adapted from (38).
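The link between NHEJ-induced indels and frameshifts comes down to codon arithmetic: a net change in length that is not a multiple of three shifts the reading frame downstream of the break. The following is a minimal Python sketch of that check. The function name is hypothetical and chosen for illustration; it tests the reading frame only and says nothing about whether an in-frame change is tolerated by the protein. The 32 bp deletion of CCR5 Δ32 described in Section 1.4 is used as one example input.

```python
def shifts_reading_frame(net_indel_bp: int) -> bool:
    """Return True if a net indel of this size changes the reading frame.

    Positive values are insertions, negative values are deletions.
    Codons are 3 bp long, so only multiples of 3 preserve the frame.
    """
    return abs(net_indel_bp) % 3 != 0


if __name__ == "__main__":
    # -32 corresponds to a 32 bp deletion such as CCR5 delta32.
    for size in (-32, -3, 1, 6):
        label = "frameshift" if shifts_reading_frame(size) else "in-frame"
        print(f"net indel {size:+d} bp: {label}")
```

Since two of every three possible indel lengths are not multiples of three, a break repaired by NHEJ within a coding sequence can be expected to disrupt the frame in a substantial share of repair outcomes, although the actual distribution of indel sizes depends on the locus and is not modelled here.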

1.5.1 Zinc finger nucleases

ZFNs are considered the first targetable nucleases and have become an important tool for biological research with vast therapeutic potential. ZFNs were first generated by Kim and colleagues in 1996 and comprised a sequence-specific array of synthetic zinc finger motifs fused to the nuclease domain of the FokI restriction enzyme (39) (Figure 1.4). They are based on zinc finger proteins (ZFPs) which consist of tandem repeats of the commonly occurring C2H2 zinc finger DNA-binding domain (40). Earlier studies showed that each zinc finger recognizes three base pairs and that dimerisation of two ZFN subunits separated by a spacer of 5-7 bp results in the cleavage of target DNA (41-43). Typically, 3-6 zinc finger proteins are used to generate each subunit, so that a ZFN pair targets sequences of 18-36 bp in length. In later developments of the ZFN platform, heterodimeric FokI domains were used to eliminate homodimer-induced cleavage, thereby reducing off-target effects and ZFN-induced genotoxicity (44). However, the use of ZFNs has been somewhat limited owing to restrictions which include poor targeting density due to design constraints and the difficulty of generating highly specific and efficient ZFNs (45, 46).

Figure 1.4: Zinc finger nuclease structure

Each ZFN comprises 3-6 zinc finger proteins fused to the FokI cleavage domain. The dimerization of the two subunits (separated by a 5-7 bp spacer) results in DNA cleavage. Adapted from (32).
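The target lengths quoted above follow directly from the numbers in the text: three base pairs per finger, 3-6 fingers per subunit, two subunits per ZFN pair and a 5-7 bp spacer between the half-sites. A small Python sketch of this arithmetic is given below. The function and key names are hypothetical, and the sketch assumes that both subunits carry the same number of fingers, which the text does not require.

```python
BP_PER_FINGER = 3  # each zinc finger recognizes three base pairs (from the text)


def zfn_pair_footprint(fingers_per_subunit: int, spacer_bp: int) -> dict:
    """Compute the DNA footprint of a ZFN pair.

    Assumes both subunits have the same number of fingers (an
    illustrative simplification, not a requirement of the platform).
    """
    if not 3 <= fingers_per_subunit <= 6:
        raise ValueError("the text describes 3-6 fingers per subunit")
    if not 5 <= spacer_bp <= 7:
        raise ValueError("the text describes a 5-7 bp spacer")
    half_site = fingers_per_subunit * BP_PER_FINGER
    recognized = 2 * half_site  # bases actually read by the two subunits
    return {
        "half_site_bp": half_site,
        "recognized_bp": recognized,
        "site_span_bp": recognized + spacer_bp,  # half-sites plus spacer
    }


if __name__ == "__main__":
    for fingers in range(3, 7):
        print(fingers, zfn_pair_footprint(fingers, spacer_bp=6))
```

With three fingers per subunit the pair reads 18 bp, and with six fingers it reads 36 bp, which reproduces the 18-36 bp range stated in the text.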

1.5.2 Transcription activator-like effector nucleases

The emergence of designer nucleases in the form of ZFNs was followed by the exciting discovery of a new platform of DNA-binding proteins. In 2009, transcription activator-like effector (TALE) proteins from plant pathogenic bacteria of the genus Xanthomonas were described and have since been widely exploited for genome editing applications (47, 48). They typically comprise an N-terminal translocation domain, central repeats which bind to DNA in a sequence-specific manner, and an activation domain as well as a nuclear localization signal in the C-terminus. The TALE DNA-binding domain typically consists of 15.5-19.5 repeats, each of which consists of 33-35 highly conserved amino acid residues. Polymorphisms at positions 12 and 13 of each repeat, termed the repeat variable di-residues (RVDs), determine DNA specificity, with each RVD binding to one base pair (47). This allows for the prediction of the target sites of natural TALEs or the design of synthetic TALEs which bind to specific sequences. TALEs can also be fused to a nuclease domain, giving rise to transcription activator-like effector nucleases (TALENs) (Figure 1.5), which function similarly to ZFNs [reviewed in (49, 50)]. TALENs are generally favoured over ZFNs as they have comparable nuclease activity but a superior toxicity profile and minimal off-target activity (51). In addition, their 1:1 base recognition offers greater flexibility and versatility compared to ZFNs. The limitations associated with TALENs can be attributed to cloning and delivery difficulties due to their repetitive sequence and size as well as their 5’ T nucleotide binding requirement (32). Derivatives of TALENs, called megaTALs, which comprise a TALE DNA-binding domain fused to a sequence-specific homing endonuclease (also known as meganuclease), have also been developed and used for various applications (52).

Figure 1.5: TALEN structure

Each TALE comprises 15.5-19.5 repeats each of which is made up of 33-35 highly conserved amino acids. The TALE DNA binding domain is fused to the FokI nuclease domain forming the TALEN. DNA recognition is through the amino acids at positions 12 and 13 known as the repeat variable diresidues which enable binding to one base pair. Adapted from (32).
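Because each RVD reads one base pair, designing a synthetic TALE array for a given target reduces to a per-base lookup. The Python sketch below illustrates this. The RVD-to-base assignments used (NI for A, HD for C, NN for G, NG for T) are the commonly cited preferences and are not listed in the text above, so they should be treated as an assumption of this example, as should the function name. The sketch also enforces the 5’ T requirement mentioned above by expecting the T as the first character of the input and not assigning a repeat to it.

```python
# Commonly cited RVD preferences; an assumption of this sketch, not taken
# from the text. NN is also reported to tolerate A in addition to G.
BASE_TO_RVD = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}


def design_rvds(site: str) -> list[str]:
    """Return one RVD per base for a target site written 5' to 3'.

    The first character must be the 5' T that precedes the bases read
    by the repeats; it is checked but not assigned an RVD.
    """
    site = site.upper()
    if not site or set(site) - set("ACGT"):
        raise ValueError("site must be a non-empty string of A, C, G, T")
    if site[0] != "T":
        raise ValueError("TALE target sites are expected to start with a 5' T")
    return [BASE_TO_RVD[base] for base in site[1:]]


if __name__ == "__main__":
    # Hypothetical 17-nt site (5' T plus 16 bases); not a sequence from this work.
    print(design_rvds("TGACCATGGTACCAGTC"))
```

The same lookup can be inverted to predict the binding site of a natural TALE from its RVD sequence, which is the prediction step referred to in the text.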

1.5.3 RNA-guided endonucleases

In 2012 Jinek and colleagues presented the most recent contender in the field of genome engineering by harnessing an adaptive immune defense mechanism found in bacteria and archaea. In the type II CRISPR/Cas system, short segments of foreign DNA known as protospacers are incorporated into the CRISPR loci, then transcribed and processed into short CRISPR RNAs (crRNAs). These can then form a complex with trans-activating crRNAs (tracrRNAs), thus facilitating the RNA-guided and sequence-specific cleavage of DNA by the Cas9 endonuclease. It was also found that the two RNAs can be replaced by a chimeric sgRNA (single guide RNA) which comprises the functional components of both the crRNA and tracrRNA (Figure 1.6). In addition, Cas9 target recognition requires a ‘seed’ sequence in the crRNA as well as a protospacer adjacent motif (PAM) at the 3’ end of the crRNA-binding site (53). Compared to ZFNs and TALENs, RGENs possess additional desirable characteristics including simpler design and the possibility of multiplexing (54). The most widely used Cas9, from Streptococcus pyogenes (SpCas9), has an NGG PAM requirement and, while this restricts its use, other Cas9 homologs with different PAM requirements have also been implemented (reviewed in (55)). A major limitation associated with RGENs is their variable and unpredictable specificity, as shown in some cases by their higher activity at off-target compared to on-target sites (56); however, extensive efforts have been directed towards improving the specificity of this platform (reviewed in (55)).

Figure 1.6: CRISPR/Cas9 structure

This system consists of the CRISPR-associated protein 9 (Cas9) and either a CRISPR RNA (crRNA) and trans-activating crRNA (tracrRNA) or a chimeric single guide RNA (sgRNA). The guide sequence in the crRNA or sgRNA is complementary to the target site, which is adjacent to the PAM required for Cas9 DNA recognition. Adapted from (57).
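The NGG PAM requirement of SpCas9 described above can be illustrated by scanning a sequence for candidate target sites, that is, stretches of DNA immediately followed by a PAM at their 3’ end. The Python sketch below does this for the given strand only. The 20-nt protospacer length is a commonly used value and is an assumption of this example, since it is not stated in the text; the function name is likewise hypothetical. The sketch does not score specificity or search the reverse strand.

```python
import re

PROTOSPACER_LEN = 20  # commonly used guide length; an assumption, not from the text


def find_spcas9_sites(seq: str) -> list[tuple[int, str, str]]:
    """Find candidate SpCas9 sites on the given strand.

    Returns (start index, protospacer, PAM) for every protospacer that
    is directly followed by an NGG PAM. A lookahead is used so that
    overlapping candidates are all reported.
    """
    seq = seq.upper()
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % PROTOSPACER_LEN)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical sequence for illustration; not a CCR5 or CXCR4 sequence.
    demo = "ACGTACGTACGTACGTACGTAGGTTTT"
    for start, protospacer, pam in find_spcas9_sites(demo):
        print(start, protospacer, pam)
```

Changing the PAM part of the pattern would adapt the same scan to Cas9 homologs with different PAM requirements, which is why the PAM mainly limits where a site can be placed and not how the guide is designed.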

1.6 HIV GENE THERAPY

As a novel treatment approach, a gene therapy strategy against HIV infection has been proposed in which autologous CCR5-/- stem cells are generated such that some of the limitations associated with allogeneic stem cell transplantation are circumvented. The ideal situation would therefore be one in which a patient’s own stem cells are isolated, CCR5 is disrupted or knocked out and the modified cells are transplanted back into the patient, where engraftment will occur, resulting in the reconstitution of T cell immunity with CCR5-/- cells and the eradication of and/or resistance to HIV infection. A study conducted by Nansen and colleagues (58) revealed that, due to the redundancies in the immune system, the expression of CCR5 is not critical for T cell-mediated antiviral immunity, therefore making it amenable to disruption-based strategies.

The most exciting work with designer nucleases targeting HIV was carried out by Perez and colleagues (59) in 2008. In this study they used ZFNs to target CCR5 in an attempt to artificially recreate the effects of the Δ32 mutation. They found that ZFN-targeted disruption provided stable and heritable protection against HIV-1 in vitro as well as in a mouse model of HIV infection. In addition, they demonstrated a selective advantage of the modified cells in the presence of the virus. Further studies have been carried out targeting CCR5 in CD4+ T cells (60, 61), hematopoietic stem cells (62-64) as well as in human embryonic stem cells (hESCs) and induced pluripotent stem cells (hiPSCs) (65). Following this success, a Phase 1 clinical trial was conducted by Sangamo Biosciences (NCT00842634) (66) in which they investigated autologous T cells genetically modified at the CCR5 gene using ZFNs. Their primary goal was to establish the safety of such an approach while secondary goals included assessing immune reconstitution and HIV resistance. They noted an increase in CD4 cell count one week after infusion and a significantly reduced loss of CCR5-modified CD4 cells compared to the unmodified cells during treatment interruption. HIV RNA was found to be undetectable in one out of four patients and HIV DNA in the blood was reduced in most patients. In addition, the patient exhibiting the longest delay in viral rebound during treatment interruption was later found to be heterozygous for the Δ32 mutation. One adverse reaction was recorded and was attributed to transfusion-related effects. The small size of their cohort (12 patients) prevents definitive conclusions from being drawn but they found this approach to be generally safe under the conditions investigated. Other similar clinical trials are in progress or are completed but so far unpublished (reviewed in (67)). In contrast to the CCR5 disruption strategies, there have also been studies in which ZFNs targeting the HIV LTRs were successfully used to excise proviral DNA from infected and latently infected T cells (64, 68).

Similarly, TALENs have also been used to target CCR5 in CD4 T cells (69) and hiPSCs (70) as well as to target the integrated provirus itself (71, 72). However, unlike ZFNs, TALE design modifications can enable binding to methylated DNA (73), which is important when targeting the integrated HIV provirus. The CRISPR/Cas system has been used in a similar fashion to target the integrated provirus (74, 75) and CCR5 in CD4 T cells and hematopoietic stem cells (76) as well as in hiPSCs (77, 78). Additionally, a novel approach using this platform involves the use of a catalytically inactive Cas9 in combination with a synergistic activation mediator in order to reactivate the latent viral reservoirs. When carried out in combination with ART, viral reactivation would result in the induction of viral cytopathic effects, immune clearance and cell death whereas uninfected cells would be unaffected and protected (79). It is worth noting that in the context of HIV, TALENs and CRISPR/Cas9 have as yet been implemented to a lesser extent compared to ZFNs (reviewed in (80)), but this is likely to change in the coming years owing to the merits of these two platforms in comparison to ZFNs.

The simultaneous targeting of CCR5 and CXCR4 with the ZFN and CRISPR/Cas9 platforms has been demonstrated, highlighting the feasibility of generating cells resistant to both M- and T-tropic viruses (81, 82). Didigu and colleagues also demonstrated that edited cells were capable of engraftment and normal trafficking to the spleen (81). However, due to its role in various crucial processes, mice lacking CXCR4 die in utero (83). Therefore, designer-nuclease-mediated targeting of CXCR4 has as yet only been performed in CD4 T cells (84-86).

Despite the great strides made possible through the emergence and development of designer nuclease technology, there are still serious obstacles hindering the clinical advancement of this approach. Two of the major obstacles are discussed below.

1.7 OFF-TARGET EFFECTS

One of the greatest challenges currently being addressed in the field of genome editing is the activity of these nucleases at unintended genomic sites (known as off-target sites) which share sequence homology with the intended on-target site. As previously mentioned, cleavage at these off-target sites followed by indel formation can result in gene inactivation or mutation. Nuclease-mediated off-target activity can have serious implications including cellular toxicity, impaired cellular functionality and fitness, as well as increasing the oncogenic potential of the target cells (87). In addition, the activity of designer nucleases at multiple sites in the genome can also result in chromosomal rearrangements such as inversions and deletions (88) as well as translocations (89) if DSBs are induced on different chromosomes.

As a first step towards reducing off-target activity, the choice of target sites can have a profound impact on the likelihood of off-target activity. To this end, bioinformatic tools have been developed to aid with target site selection (reviewed in (90)). These tools are based on three principles: 1) choice of target sites/nuclease design, 2) genomic searches for all possible off-targets and 3) determining the level of on- and off-target cleavage rates.

However, due to factors including target site accessibility and tolerance of base mismatches, the computational prediction of CRISPR/Cas9 off-targets has proven to be particularly difficult, further hindering the advancement of this tool for clinical applications.
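The second principle listed above, a genomic search for possible off-targets, can be illustrated with a minimal sketch. It assumes a single strand held in memory as a string and scores candidates purely by the number of mismatched bases; the function names and the default mismatch threshold are hypothetical, and the published tools additionally consider the reverse strand, PAM requirements, insertions and deletions, and site accessibility.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def find_candidate_sites(genome: str, target: str, max_mismatches: int = 3):
    """Slide the target along one strand and collect near-matches.

    Returns a list of (position, site, mismatches) tuples ordered by position.
    The threshold is an illustrative assumption, not a recommended value.
    """
    genome, target = genome.upper(), target.upper()
    k = len(target)
    hits = []
    for i in range(len(genome) - k + 1):
        site = genome[i:i + k]
        mm = hamming(site, target)
        if mm <= max_mismatches:
            hits.append((i, site, mm))
    return hits
```

A site with zero mismatches corresponds to the on-target sequence, whereas sites with one or more mismatches are candidates that would still need experimental confirmation.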

In the context of all three platforms, great effort has been invested towards increasing specificity and thus decreasing or eliminating these deleterious effects. In addition to target-site-based considerations, structure-based approaches have been used including the development of improved ZFN and TALEN heterodimers to prevent cleavage by single monomers (90). Additionally, FokI modifications have been carried out generating ‘nickases’ which stimulate HR without activating the NHEJ pathway (91). Cas9 nickases have also been generated by introducing mutations into one of its two nuclease domains generating a partly inactivated Cas9 which can induce single-strand breaks in the DNA. Furthermore, using paired Cas9 nickases co-delivered with two gRNAs results in two single-strand breaks on opposite DNA strands and reduced indel formation activity (92). The functionality of the CRISPR/Cas9 system can also be enhanced by introducing chemical modifications to the gRNAs. Hendel and colleagues demonstrated improved efficacy of genome editing in primary cells and attributed these effects to the increased intracellular stability achieved via gRNA end-modifications (93). Ryan and colleagues (94) then went on to show that by using chemically modified gRNAs they could decrease the frequency of CRISPR/Cas9-mediated off-target indels without sacrificing on-target activity. They found that the modifications which they introduced destabilized proximal base pairing and proposed that this could affect the dynamics of guide sequence hybridization when comparing on- and off-target sites. In addition, the truncation of gRNAs has been found to have a positive effect on Cas9 specificity, possibly due to increased sensitivity of the RGEN-DNA complex to mismatches (95).

1.8 DELIVERY

Ultimately, the efficiency of genome editing relies on the successful delivery of the designer nucleases to the relevant cells. The three platforms described above are characterized by macromolecules which do not readily enter cells therefore their delivery remains both a challenge and a topic of great interest. The delivery methods currently being used can either be classified as viral or non-viral involving the use of viral vectors such as lentiviruses, adenoviruses and adeno-associated viruses (AAVs) or methods such as electroporation and the use of cationic lipids respectively.

The popularity of viral vectors can be attributed to their properties and the possibilities which they offer. Lentiviruses can transduce non-dividing cells and lead to integration of viral DNA into the genome (55). In addition, they have a large packaging capacity (∼8.5 kb) and have been used to package and deliver the CRISPR/Cas9 system (96). Most recently, integrase deficient lentiviruses (IDLVs) have been used to achieve transient expression of ZFNs (97).

However, this method cannot be used to deliver TALENs as recombination is known to occur due to their repetitive sequences (98). On the other hand, adenoviruses can infect both dividing and non-dividing cells and do not result in viral DNA integration, but they have been shown to be immunogenic (99). In contrast, AAVs possess similar characteristics to adenoviruses but do not elicit an immune response. In addition, several serotypes with different tropism have been characterized enabling the targeting of diverse cell types (55).

However, the main limitation comes from their packaging capacity (∼4.5 kb) which necessitates the separate delivery of Cas9, its gRNAs and donor DNA template (100). In addition, only one TALEN monomer can be packaged whereas both ZFN monomers can be delivered due to their smaller size (32).

In some cases, the use of non-viral delivery methods may be desirable, for example when only transient expression of a therapeutic agent is required. Electroporation is a technique which relies on an electrical pulse which destabilizes the cell membrane to allow exogenous DNA or RNA to enter the cell (101). In the case of the CRISPR/Cas9 system, it has been shown that electroporation of the recombinant Cas9 protein pre-incubated with an in vitro-transcribed gRNA forming a ribonucleoprotein (RNP) results in higher editing efficiency, decreased off-target effects and less toxicity compared to DNA in haematopoietic stem and progenitor cells (HSPCs) and T cells (93, 102). The improved safety has been attributed to the rapid degradation of the RNPs which occurs after delivery thus reducing exposure time with the cells. An alternative approach is the use of cationic lipid-based reagents. This method involves the encapsulation of DNA or RNA in liposomes and has been used extensively (reviewed in (103)). However, both electroporation and nanoparticle-mediated delivery can be extremely toxic and may not be suitable for all cell types (32).

Using the methods described above, delivery can either be carried out in vivo or ex vivo (Figure 1.7). Ex vivo delivery of therapeutic agents involves the isolation of cells of interest from a patient followed by modification of these cells and transplantation back into the patient. In this approach both viral and non-viral methods can be used to modify the cells. In addition, only the target cells are manipulated and greater dose control can be achieved which may reduce off-target effects (104). However, as this method involves the removal of cells from the patient it can be problematic in cells which cannot survive or which lose essential properties outside of the body. In addition, the effectiveness of the treatment relies on the efficiency of engraftment following re-transplantation. In contrast, in vivo delivery describes the approach where the therapeutic agents are delivered directly into the patient, either systemically or via targeted local injection or with agents with a specific tropism. This approach allows for the targeting of multiple cell types including those which may not be amenable to ex vivo manipulation (87). Viral vectors have typically been used for this approach; however, viral packaging capacity and tropism pose significant limitations. In addition, the issues of immunogenicity (105) and controlling the dose and distribution of the therapeutic agent are serious considerations with this method. Recently, studies have been conducted in which lipid nanoparticles were used to deliver Cas9 to the mouse inner ear (106) and into neurons (107). Such studies raise the hopes that we may one day overcome the challenges currently associated with in vivo delivery methods. In the context of HIV, ex vivo delivery methods are currently being explored. Human clinical trials have been or are currently being conducted in which CCR5 has been targeted via adenoviral or mRNA delivery of ZFNs to T cells and mRNA delivery to HSPCs (reviewed in (67)).

All in all, while designer nucleases have revolutionized the fields of biology and medicine, efficient delivery of these reagents to the relevant cells is a crucial step towards realizing their full potential. In addition, there is an urgent need to increase the specificity of these nucleases and to eliminate or alleviate the effects of their off-target activity. The latter concern can be addressed by investigating strategies where the transcriptome and epigenome are targeted. Such strategies have emerged as viable alternatives to gene disruption and are discussed in the following sections.

Figure 1.7: In vivo vs Ex vivo gene editing

Ex vivo therapy involves the isolation of cells from a patient, the modification using a therapeutic agent and infusion back into the patient. In in vivo therapy, the therapeutic agent is delivered directly to the patient either systemically or using local injection or via agents with a specific tropism. Viral delivery of the therapeutic agent is depicted. Adapted from (108).

1.9 EPIGENETIC REGULATION

1.9.1 Gene expression

Gene expression is the name given to the flow of information from DNA to RNA to protein. The cellular processes involved in gene expression include transcription, mRNA processing, translation and protein degradation. These steps can be modulated thus allowing gene expression to be regulated. Additionally, the set of genes expressed within a cell determines its identity and function. Therefore, the gene expression profile of a specific cell includes housekeeping genes, which are expressed in most cells, as well as cell-type-specific genes.

All in all, the regulation of gene expression is extremely important and misregulation in gene expression can lead to conditions which include cancer, autoimmunity and developmental disorders (109). Controlling gene expression at the level of transcription was one of the main aims of this study and is discussed in detail in the following sections.

1.9.2 Transcriptional regulation of gene expression

Transcription regulation in eukaryotes is a complex process which is determined by the interplay between RNA polymerases and the transcription factors and coregulators which control their activity during transcription initiation and elongation. This includes the proteins and enzymes which contribute to the highly dynamic nature of chromatin structure through processes such as DNA methylation, histone modifications and nucleosome localization (110). Transcription of protein-coding genes in eukaryotes is carried out by RNA polymerase II and such genes typically contain two variants of cis-acting transcriptional regulatory elements: 1) a promoter composed of a core promoter and proximal regulatory elements and 2) distal regulatory elements such as enhancers and silencers. Both these elements contain recognition sites for activating and repressive trans-acting DNA binding transcription factors (111). In addition, actively transcribed genes are generally associated with an accessible chromatin state which allows the binding of activating transcription factors. Therefore, modifications to the DNA which affect chromatin accessibility will consequently have an impact on gene expression.

1.9.3 Targeted transcription activation

During transcription initiation, general transcription factors (GTFs) assemble on the core promoter forming a pre-initiation complex (PIC) which directs RNA polymerase II to the transcription start site (TSS). However, the assembly of the PIC on the core promoter only drives basal levels of transcription and transcription levels can be stimulated by the action of transcriptional activators (111). Transcription activators are commonly sequence-specific DNA-binding proteins which bind at specific recognition sites near the core promoter and at enhancers (112). In addition to a DNA-binding domain they also comprise an activation domain such as VP16, a viral activation domain which recruits Pol II transcription machinery (113). Apart from promoting transcription initiation, activators are also required for transcription elongation in the event that RNA polymerase is paused at the proximal promoter as has been observed with some genes (114).

In a therapeutic context, transcriptional activators have been used to achieve targeted activation and may provide the key to developing novel treatment strategies. For example, Onori and colleagues (115) successfully demonstrated the upregulation of the dystrophin-related gene utrophin using a zinc-finger-based activator. Dystrophin is a cytoskeletal protein which is absent in the X-linked muscle degenerative disease Duchenne muscular dystrophy. VP64, a tetramer of VP16, has been successfully used in conjunction with the DBDs derived from ZFNs (116), TALENs (117) and the CRISPR/Cas9 system (118) (Figure 1.8). The p65 subunit of the NF-κB complex can also be used to induce activation of endogenous genes and has also been used with all three platforms (113). To generate such constructs the nuclease domain is replaced with the activation domain or a catalytically inactive Cas9 is used in the case of CRISPR/Cas9. The ‘dead’ Cas9 (dCas9) is generated by introducing silencing mutations in the RuvC1 (D10A) and HNH (H841A) nuclease domains thus creating a protein which can bind but not cleave DNA (119). However, it has been observed that multiple TALE- and CRISPR-based activators must be recruited to a single locus in order to achieve high levels of gene activation (120, 121).

1.9.4 Targeted transcription repression

Transcriptional repression is the name given to the process through which transcription is inhibited leading to the downregulation of gene expression. Transcriptional activation and repression are therefore both necessary to either increase or decrease the expression of a gene as required. Not all genes are actively transcribed at any given time and expression is variable depending on an organism’s stage of development and in response to extracellular signaling molecules. Various classes of repressors are known including those which bind to DNA such as the Mad repressor and those which bind to DNA-binding proteins such as the Methyl-CpG-binding domain protein 3 (MBD3) (122). In general, mammalian transcriptional repressors are often classified as passive or active. Passive repressors do not possess silencing capability but instead effect repression through competition with transcription activators for DNA binding or by forming inactive complexes with activators and preventing their interaction with DNA or coactivators. In contrast, active repressors can themselves induce gene silencing, often through altering chromatin conformation (123).

As is the case with activation, repression can also be induced artificially with far-reaching therapeutic implications. The silencing approach involving RNA interference (RNAi) has been widely used for various applications. RNAi involves mRNA degradation and translation suppression via small interfering RNAs which are homologous to the target mRNA. In the context of cancer, for example, the biotechnology company Gradalis developed a cancer vaccine comprising the immune modulator granulocyte-macrophage colony-stimulating factor (GM-CSF) and a short hairpin RNA (shRNA) targeting furin. The GM-CSF is used to stimulate an immune response against the patient’s tumour antigen whereas furin was chosen as a target because it is involved in the maturation of transforming growth factor beta (TGF-β), an immunosuppressant overexpressed in cancer (124). Using this strategy Gradalis has progressed as far as human clinical trials for various cancers including a Phase III trial for high-risk ovarian cancer which is currently underway (NCT02346747). Apart from cancer, RNAi is being exploited against ocular disease, high cholesterol, as well as infectious diseases such as the Hepatitis B virus (HBV), the Hepatitis C virus (HCV) and Ebola (reviewed in (124)).

Despite the significant progress in recent years, RNAi-based approaches are associated with off-target effects (125) and there is the risk of cytotoxicity due to the oversaturation of the endogenous RNAi pathway (126). Therefore, there is a need for therapeutic strategies enabling gene silencing with minimal off-target effects. The emergence of designer nuclease technology has enabled the targeting of transcriptional repressors to specific DNA sequences and stable gene repression with silencing efficiency comparable to RNAi and therefore offers a viable alternative (119). As previously mentioned, one mechanism by which this can occur is through the direct steric hindrance of the transcription machinery. This was demonstrated by Qi and colleagues (119) upon co-expressing a catalytically inactive/dead Cas9 protein (dCas9) with a guide RNA. They found that a DNA recognition complex was generated which could interfere with the transcription machinery. They named this system CRISPR interference (CRISPRi) and also demonstrated the silencing of targeted genes in Escherichia coli in a specific and reversible manner. In addition, they showed that this technology is amenable to multiplexing and to use in mammalian cells. Another repression mechanism involves the use of a silencing domain such as the Krüppel-associated box (KRAB) repressor domain to induce chromatin condensation (Figure 1.8). Specifically, the KRAB domain initiates a heterochromatin-forming complex which includes the histone methyltransferase (HMT) SET domain bifurcated 1 (SETDB1) and the histone deacetylase (HDAC) nucleosome remodeling and deacetylase (NuRD) complex thus resulting in more potent gene silencing compared to steric hindrance alone (113). By coupling the dCas9 to the KRAB domain, Gilbert and colleagues (119) could achieve highly specific and stable repression of gene expression in human and yeast cells. Mussolino and colleagues (127) have also demonstrated the successful coupling of zinc finger DNA binding domains to the KRAB domain to achieve robust transcriptional repression of gene expression in a mouse model. Similarly, Cong and colleagues (128) investigated fusions of TALE DNA binding domains with various repressor domains, including KRAB, and could demonstrate efficient repression of mammalian genes.

Figure 1.8: Designer nuclease-mediated activation and repression

A DNA-binding domain can be fused to activator and repressor domains enabling the modulation of gene expression. The catalytically inactive dCas9 is depicted here. Adapted from (119, 129).

1.9.5 Epigenome editing

The term ‘epigenetics’ is often translated as ‘outside the field of genetics’ and is used to describe the changes in gene expression which occur without changing the DNA sequence.

Epigenetic mechanisms which have been reported include DNA methylation, histone modifications and non-coding RNAs. In order to understand the significance of epigenetics in the processes regulating gene expression, an understanding of chromatin structure and the mechanisms which control its structure and remodeling is required.

DNA in the eukaryotic nucleus is packaged into highly organized chromatin with each diploid cell containing approximately 2 m of DNA (130). This DNA must be accessed as required, therefore chromatin biology is characterized by the dynamic balance between genome packaging and genome access to allow processes such as transcription and replication to occur (131). The basic unit of chromatin is known as the nucleosome and arises from the wrapping of 145-147 bp of DNA around an octameric complex forming the ‘nucleosome core particle’. This octamer is made up of two units of each of the histone proteins H2A, H2B, H3 and H4. H1 is a linker histone which binds outside of the nucleosome core complex forming a full nucleosome or chromatosome and stabilizes higher-order chromatin structures. Nucleosomes are found approximately every 240 bp and form a characteristic ‘beads on a string’ structure with the DNA (132). Although necessary, nucleosomes generally impede transcription of DNA by preventing access to the transcription machinery. Furthermore, RNA polymerases need to move through the histone during transcription and DNA polymerases must cope with this structure during replication. Therefore, specific factors are required to reposition and restructure the nucleosomes as required (133). In addition to the basic structure of the nucleosome, histones can also carry several post-translational modifications which can influence chromatin compaction and accessibility (Figure 1.9). These modifications include acetylation, methylation, phosphorylation, ubiquitinylation, sumoylation and deamination and are typically found on the N- and C-terminal histone tails which protrude from the nucleosome core (134). Histone modifications affect several interactions including histone-histone and histone-DNA interactions, therefore they can facilitate the aforementioned processes by changing the structure and stability of the nucleosome (133).

Two of the most common modifications are histone acetylation and methylation. Histone acetylation was the first histone modification discovered and studies soon revealed a relationship between actively transcribed genes and hyperacetylation (135) thus suggesting a role for histone acetylation in facilitating transcription. Histone acetylation neutralizes the positive charge of lysine residues and in this way weakens the interactions within the nucleosome thus resulting in a more relaxed chromatin conformation and increasing DNA accessibility to the transcription machinery (136). This modification is carried out by histone acetyltransferases (HATs) and removed by histone deacetylases (HDACs). In contrast, histone methylation has no effect on the overall charge but rather influences the hydrophobic character and size of the modified residues (133). This modification either leads to mono-, di- or trimethylation of lysine (K) residues or mono- or dimethylation of arginine (R) residues. Both activating (e.g. H3K4me) and repressive (e.g. H3K9me) modifications are known. It has been suggested that H3K4me promotes H3/H4 acetylation through the recruitment of HATs, which also indicates that there is crosstalk between these different processes. On the other hand, H3K9me is associated with constitutive heterochromatin formation and acts in positive reinforcement with DNA methyltransferases which induce de novo DNA methylation.

Figure 1.9: The epigenetic mechanisms affecting gene activity

Eukaryotic DNA is packaged as chromatin of which the basic unit is known as the nucleosome. Nucleosomes comprise DNA wound around an octameric histone core. Post-translational modifications of the histone tails result in the compaction or relaxation of chromatin thus affecting chromatin accessibility and either preventing or allowing access to the transcription machinery. Adapted from (137).

1.9.6 DNA methylation

DNA methylation is one of the most well characterized epigenetic mechanisms and refers to the process in which a methyl group is transferred from S-adenosylmethionine to the position 5 carbon in a pyrimidine ring (138). It is involved in important developmental processes including X-chromosome inactivation and genomic imprinting. DNA methylation is carried out by the DNA methyltransferases (DNMTs) and typically occurs on cytosine bases in cytosine-guanine (CpG) dinucleotides. CpGs can be found clustered in ‘CpG islands’ which are found in the promoter regions of approximately 60% of genes (139).
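The clustering of CpGs into islands can be made concrete with a short sketch. It is not part of this study: it assumes the widely used sequence-based heuristic of Gardiner-Garden and Frommer (length of at least 200 bp, GC content above 50% and an observed/expected CpG ratio above 0.6), and the function names are hypothetical.

```python
def cpg_stats(seq: str):
    """Return (GC content, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # CpG dinucleotides on the given strand
    gc_content = (c + g) / n
    # Expected CpG count under independence is (C count * G count) / length
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, obs_exp


def looks_like_cpg_island(seq: str, min_length: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply the assumed length, GC content and obs/exp thresholds."""
    if len(seq) < min_length:
        return False
    gc_content, obs_exp = cpg_stats(seq)
    return gc_content > min_gc and obs_exp > min_obs_exp
```

In practice such a test would be applied in a sliding window along a promoter region rather than to a whole sequence at once.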

The human genome encodes five DNMTs: DNMT1, DNMT2, DNMT3A, DNMT3B and DNMT3L (Figure 1.10). Of these five, DNMT3A and 3B carry out de novo DNA methylation whereas DNMT1 is known to be responsible for the maintenance of methylation due to its preference for hemimethylated DNA (140). In contrast, DNMT2 shows only weak DNMT activity in vitro whereas DNMT3L does not possess DNMT activity but interacts with DNMT1 and DNMT3 and modulates their activity (141). DNMTs generally consist of an N-terminal regulatory domain and a C-terminal catalytic domain. DNMT1 contains additional N-terminal domains including the DNMT1-associated protein 1 (DMAP1) required for molecular interactions and the replication foci targeting sequence (RFTS) which mediates targeting to replication foci and facilitates replication-dependent DNA methylation maintenance. The CXXC domain is required for binding to unmethylated DNA while the function of the bromo-adjacent homology (BAH) domains is still to be determined. The Pro-Trp-Trp-Pro (PWWP) and ATRX-DNMT3-DNMT3L (ADD) domains found in DNMT3 are required for chromatin interactions (140).

Figure 1.10: The DNA methyltransferases

De novo methylation is carried out by DNMT3A and 3B, and DNMT1 is responsible for the maintenance of methylation following DNA replication. DNMT3L does not possess DNMT activity but interacts with and modulates the activity of DNMT3A and 3B. Adapted from (138, 142).

Keshet and colleagues (143) were among the first to show that a high level of DNA methylation results in condensed, inaccessible heterochromatin and consequent gene silencing whereas a low level of DNA methylation is often associated with accessible chromatin and actively transcribed genes. Indeed, aberrant DNA methylation has been found to result in the silencing of tumour suppressor genes leading to various cancers (reviewed in (144)).

Two mechanisms by which DNA methylation inhibits transcription have been proposed: 1) via the blocking of transcription factor binding, 2) by recruiting proteins such as the methyl-CpG binding domain proteins (MBDs) which in turn recruit repressor complexes to methylated promoter regions (139). Therefore, it is clear that DNA methylation has far-reaching implications, both at the level of gene expression and in human disease.

1.9.7 Designer epigenome modifiers

As previously mentioned, DNA methylation plays a major role in human disease. On the one hand, methylation of cytosine has been shown to be a hotspot for mutation and increases the rate of C→T transition mutations. These mutations are responsible for about one third of all disease-causing mutations in the germline (139). On the other hand, tumour-suppressor gene silencing via the methylation of promoter CpG islands is the most common epigenetic mechanism contributing to cancer. This phenomenon was first described with the retinoblastoma gene RB by Greger and colleagues in 1989 (145). In contrast, DNA hypomethylation has been shown to result in the activation of genes important in cancer (146). However, gene silencing arising from DNA methylation was a main focus of this study and is discussed in the following sections.

With respect to DNA methylation, drugs such as the demethylating agent 5-AZA have been used in an attempt to reverse the effects of this modification. These drugs have gained interest in the context of cancer as they block de novo DNA methylation and can lead to the reactivation of tumor suppressor genes silenced due to aberrant DNA methylation.

However, the use of these drugs has been limited by their toxicity and their global demethylation activity due to their lack of specificity. Therefore, there is a need for approaches which will enable the targeted modification of the epigenome with minimal toxicity.

In recent years exciting developments have come in the form of a new class of epigenome editing tools derived from combining the designer nuclease technology with the properties of transcriptional repressors and epigenome modifiers. In contrast to designer nucleases, the use of epigenome modifiers would not result in any changes to the DNA sequence. This is particularly important considering the off-target effects which can occur due to nuclease-mediated cleavage. Furthermore, while the effects of designer nucleases are permanent, it may be possible to reverse the effects of epigenome modifiers through the removal or addition of specific epigenetic marks. All in all, a therapy based on epigenome modification may result in similar phenotypic outcomes but may have a superior toxicity profile compared to designer nucleases.

This alternative strategy of epigenome editing has already been explored in the context of all three platforms. Stolzenburg and colleagues (147) used ZFPs fused to DNMT3A to achieve site-specific methylation and long-term stable repression of the SOX2 oncogene. They also demonstrated that repression was maintained through cell generations and after the suppression of ZF-DNMT3A expression. Similarly, Li and colleagues (148) used a fusion with a TALE DNA-binding domain to demonstrate that they could successfully alter the expression of a gene involved in prostate cancer metastasis. Siddique and colleagues attempted to improve the functionality of this approach by using a DNMT3A-3L fusion. Indeed, they could show higher levels of methylation at the vascular endothelial growth factor A (VEGF-A) promoter compared with DNMT3A alone (149). Furthermore, several studies have demonstrated the repression of gene expression via ZFP- and TALE-mediated histone methylation (113) thus highlighting the versatility of this approach. The CRISPR/Cas9 system has been used extensively to attempt to dissect and understand the complexities related to the chromatin landscape. To this end, it has been used to target various epigenetic modifiers including histone methylases, demethylases and acetyltransferases to specific genomic loci. The dCas9 protein fused to DNMT domains was shown to result in the methylation of CpG islands as well as promoter and enhancer regions in cancer cell lines, ESCs and primary cells (reviewed in (150)). Taken together, epigenome engineering is an exciting emerging technology which may offer the solution to the challenges which have thus far hindered progress with the designer nucleases and gene knockout strategies.

1.10 AIM AND OBJECTIVES OF PHD THESIS

Despite the improved prognosis afforded by antiretroviral therapy, a cure for HIV infection remains elusive due to factors which include viral escape and the inability to target the integrated provirus. The emergence of designer nuclease technology has led to the development of gene therapy-based antiviral therapeutics; however, efficacy and safety concerns have necessitated the development of novel strategies. The aim of this project was to use a novel TALE-based platform to investigate the efficacy and specificity of epigenome editing as a treatment strategy against HIV infection.

Objectives:

• To generate CCR5-specific designer transcription factors and to assess their functionality in an episomal assay and in a reporter cell line
• To generate CCR5 and CXCR4-specific designer epigenome modifiers and to evaluate functionality in a reporter cell line and at an endogenous gene
• To test the functionality of the DEMs in the clinically relevant CD4+ cells
• To assess DEM-induced DNA methylation in reporter cells, at an endogenous gene and in CD4+ cells
• To assess DEM-induced histone modifications
• To determine the specificity profile of the DEMs

44 Materials and Methods

2. MATERIALS AND METHODS

2.1 STANDARD MOLECULAR BIOLOGY METHODS

2.1.1 Restriction digest

All restriction enzymes used for cloning and for the screening of newly-generated plasmids and the corresponding buffers were obtained from New England Biolabs (Ipswich, USA) and were used according to the manufacturer’s instructions unless otherwise specified. The amounts of buffer and enzyme were scaled up accordingly with increasing amounts of DNA.

2.1.2 Ligation

Ligations were carried out using T4 DNA Ligase (New England Biolabs, Ipswich, USA) at a 3:1 insert:vector ratio according to the manufacturer’s instructions unless otherwise specified. Reactions were incubated at room temperature for 2 hours or overnight at 16°C.
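The 3:1 insert:vector ratio is a molar ratio, so the insert mass depends on the relative fragment lengths. The following is a minimal sketch of that conversion; the function name and the 50 ng vector input are illustrative assumptions, while the 3859 bp and 1642 bp fragment sizes are taken from section 2.2.3.

```python
def insert_mass_ng(vector_ng, vector_bp, insert_bp, molar_ratio=3.0):
    """Mass of insert (ng) that gives the requested insert:vector molar ratio.

    For double-stranded DNA the mass per mole scales with length, so the
    molar ratio is converted to a mass ratio via insert_bp / vector_bp.
    """
    if vector_bp <= 0 or insert_bp <= 0:
        raise ValueError("fragment lengths must be positive")
    return vector_ng * (insert_bp / vector_bp) * molar_ratio


if __name__ == "__main__":
    # Hypothetical input: 50 ng of a 3859 bp backbone and a 1642 bp insert.
    mass = insert_mass_ng(vector_ng=50, vector_bp=3859, insert_bp=1642)
    print(f"Insert required: {mass:.1f} ng")
```

The same helper applies to the Gibson Assembly set-up in 2.1.11, which also uses a 3:1 insert:vector ratio.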

2.1.3 Polymerase Chain Reaction (PCR)

All PCR amplifications for cloning were carried out using the proof-reading PhusionHF DNA polymerase (New England Biolabs, Ipswich, USA). Colony PCR for the screening of transformants was performed using the Taq DNA Polymerase (New England Biolabs, Ipswich, USA) (described in 2.1.5). Bisulfite converted DNA was PCR-amplified using the PyroMark PCR kit (Qiagen, Hilden, Germany) (described in 2.6.1). Reaction set-up and cycling conditions were according to the manufacturer’s instructions unless otherwise specified. All primers were obtained from Apara Bioscience (Denzlingen, Germany).


2.1.4 Transformation of chemically competent E.coli cells

Escherichia coli DH5α competent bacteria were thawed on ice and the ligations pre-cooled. For each transformation 5 µl of the ligation reaction were added to 50 µl of bacteria in a safe-lock tube. The tube was flicked gently to mix, then the cells were placed on ice for 15 minutes. Heat shock was performed at 42°C for 90 seconds, then the cells were placed back on ice for 5 minutes. Subsequently, 300 µl of lysogeny broth (LB) (Appendix 6.1) were added to the cells and incubation carried out at 37°C and 500 rpm for 30 minutes. The cells were plated on pre-warmed agar plates (Appendix 6.1) supplemented with the appropriate antibiotic (100 µg/ml ampicillin or 50 µg/ml kanamycin), then allowed to grow overnight at 37°C.

2.1.5 Colony PCR

To evaluate the success of cloning, colony PCR was carried out. A bacterial colony was picked with a sterile 1 µl pipette tip and smeared onto the bottom of a 0.5 ml PCR reaction tube. The tip was then placed in a safe-lock tube containing LB and incubated at 37°C until positive clones had been identified via PCR. To each 0.5 ml PCR reaction tube containing bacteria the following were added: 0.5 μl of each primer (10 µM), 0.25 μl of 10 mM dNTPs, 1.25 μl of 10X ThermoPol reaction buffer, 0.15 μl of Taq DNA Polymerase (all from New England Biolabs, Ipswich, USA) and 9.85 μl of nuclease-free H2O. The PCR was carried out according to the manufacturer’s instructions with the appropriate annealing and extension times. Subsequently, gel electrophoresis was carried out (described in 2.1.7). After positive clones had been identified the tip and LB previously prepared were inoculated into 5 ml of LB containing the appropriate antibiotic.
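The colony PCR recipe can be checked arithmetically: the listed volumes should add up to a 12.5 µl reaction with 1X buffer. The sketch below does this and scales the recipe to a master mix. It assumes two primers per reaction, and the 10% pipetting excess is an illustrative assumption rather than part of the protocol.

```python
# Per-reaction volumes (µl) as listed in section 2.1.5, assuming two primers.
COMPONENTS_UL = {
    "forward primer (10 µM)": 0.5,
    "reverse primer (10 µM)": 0.5,
    "dNTPs (10 mM)": 0.25,
    "10X ThermoPol buffer": 1.25,
    "Taq DNA polymerase": 0.15,
    "nuclease-free water": 9.85,
}


def total_volume_ul(components=COMPONENTS_UL):
    """Total reaction volume in µl."""
    return sum(components.values())


def final_concentration(stock, volume_ul, total_ul):
    """Final concentration after dilution, in the same unit as the stock."""
    return stock * volume_ul / total_ul


def master_mix(n_reactions, excess=0.1, components=COMPONENTS_UL):
    """Scale every component for n reactions plus a fractional excess (assumed)."""
    factor = n_reactions * (1 + excess)
    return {name: round(vol * factor, 2) for name, vol in components.items()}


if __name__ == "__main__":
    total = total_volume_ul()
    print(f"Reaction volume: {total:.2f} µl")
    print(f"Primer: {final_concentration(10, 0.5, total):.2f} µM each")
    print(f"dNTPs: {final_concentration(10, 0.25, total):.2f} mM")
    print(f"Buffer: {final_concentration(10, 1.25, total):.2f}X")
    for name, vol in master_mix(12).items():
        print(f"{name}: {vol} µl")
```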


2.1.6 DNA extraction from bacteria (Mini and Midi preparations)

Bacterial clones were picked from selection plates and inoculated into 5 ml (mini preparation) or 50 ml (midi preparation) of LB medium supplemented with the appropriate antibiotic and grown overnight at 37°C. The overnight culture was harvested by centrifuging at 17000 × g for 1 minute at room temperature or at 6000 × g for 15 minutes at 4°C for the mini and midi preparations respectively. Extractions were carried out using the QIAprep Spin Miniprep Kit (mini preparation) or the Qiagen Plasmid Midi Kit (midi preparation) (Qiagen, Hilden, Germany) according to the manufacturer’s instructions.

2.1.7 Agarose gel electrophoresis

To each sample 6X orange loading dye was added (Appendix 6.1) and DNA electrophoresis was carried out on 1-1.5% agarose gels (Appendix 6.1) supplemented with 1 µg/µl of ethidium bromide (Roth, Karlsruhe, Germany). Electrophoresis of mRNA was performed on 1% agarose RNA formaldehyde gels (Appendix 6.1) and ethidium bromide was added directly to each sample prior to loading (described in 2.3.2). Electrophoresis was allowed to proceed in 1X TAE (tris-acetate-EDTA) (Appendix 6.1) or in a 1X MOPS-based buffer (3-(N-morpholino)propanesulfonic acid) (Appendix 6.1) for DNA and RNA respectively. Electrophoresis was carried out at 100-120 V for 30-45 minutes. To determine DNA and mRNA size a 2-Log DNA or ssRNA ladder (New England Biolabs, Ipswich, USA) was used respectively. Gels were visualised using the Fusion FX device (Vilber, Eberhardzell, Germany).


2.1.8 PCR purification

PCR amplicons for cloning were purified using the QIAquick PCR Purification kit (Qiagen, Hilden, Germany) according to the manufacturer’s instructions. DNA was eluted twice in 30 µl of ultra-pure water (Biochrom, Berlin, Germany).

2.1.9 Gel purification

Gel fragments were excised from the gel and purification carried out with the QIAquick Gel Extraction kit (Qiagen, Hilden, Germany) according to the manufacturer’s instructions. DNA was eluted twice in 30 µl of ultra-pure water.

2.1.10 Oligo cloning

Two complementary primers were designed such that once annealed the overhangs would be compatible with restriction sites in the desired vector. The primers were resuspended to give a concentration of 100 pmol/µl and annealed in the following mixture: 1 µl of each primer, 10 µl of 10X annealing buffer (100 mM Tris/HCl pH 7.5, 500 mM NaCl), 88 µl of ultra-pure water. The mixture was heated to 99°C for 10 minutes then cooled slowly to room temperature and the concentration determined using a Nanodrop 1000 device (PEQLAB, Erlangen, Germany). A ligation reaction was then carried out as previously described.

2.1.11 Gibson Assembly

A gBlock was designed with the sequence of interest and overhangs corresponding to the vector plasmid. The gBlock was diluted to 10 ng/µl according to the manufacturer’s instructions. A 3:1 insert:vector ratio was calculated and the insert combined with the digested vector in a volume made up to 5 µl with water. The mixture was added to 15 µl of the assembly mixture (Appendix 6.1) and incubation carried out at 50°C for 15-60 minutes. Competent bacteria were transformed with 5 µl of the assembly reaction.

2.2 GENERATION OF THE TALE-BASED DESIGNER TRANSCRIPTION FACTORS

2.2.1 Generation of the TALE arrays

Potential CCR5 and CXCR4 target sites were identified by examining the proximal promoter region (-400 to +1 bp relative to the transcription start site (TSS)) and the region centred 350 bp around the TSS respectively. Special consideration was given to DNase I hypersensitivity sites identified using the UCSC Genome Browser database (http://genome.ucsc.edu/) as an indicator of chromatin accessibility. Additionally, the target sites chosen started with a 5’ T nucleotide fulfilling the TALE-binding requirement and were in close proximity to known cis regulatory elements. TALE-arrays were then generated via golden gate assembly as previously described (151) with each array comprising 17.5 repeats and cloned into the optimized TALE scaffold previously described by our lab (51). Flanking NheI and BamHI restriction sites allowed for the transfer of the DNA-binding domain (DBD) into the desired expression plasmids. The sequences of the target sites are given in Table 2.1.
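The 5’ T requirement described above lends itself to a simple sequence scan. The sketch below lists every candidate site of a given length that starts with T on either strand of a region. It is an illustration only: the function names are hypothetical, the 19 nt default simply matches the length of the sequences in Table 2.1, and the DNase I hypersensitivity and cis-element criteria are not modelled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def candidate_sites(region, site_len=19):
    """Yield (strand, start, site) for each site_len-mer beginning with T.

    Sites are written 5'->3' on their own strand; start is the leftmost
    position of the site in plus-strand coordinates (0-based).
    """
    region = region.upper()
    strands = (("Plus", region), ("Minus", reverse_complement(region)))
    for strand, seq in strands:
        for i in range(len(seq) - site_len + 1):
            site = seq[i:i + site_len]
            if site[0] != "T":
                continue
            start = i if strand == "Plus" else len(seq) - i - site_len
            yield strand, start, site


if __name__ == "__main__":
    # Target 1 from Table 2.1, used here as a toy input region.
    for hit in candidate_sites("TGGCTTAGCCCCTGGGTTA"):
        print(hit)
```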

2.2.2 Generation of the TALE-based designer transcription activators and repressors

In order to generate the DTA and DTR constructs, TALE DBDs were cloned into the pRK5_HA_VP16_NLS_TAL4-2 (#1276) or the pRK5_HA_KRAB_NLS_TAL3-2 (#1579) expression plasmids containing either the herpes simplex VP16 activation domain (152) or the Krüppel-associated box (KRAB) repressor domain (153) respectively.


Table 2.1: List of CCR5 and CXCR4 target sites

ID Strand Gene Target sequence Distance from TSS (bp)

1 Plus CCR5 TGGCTTAGCCCCTGGGTTA 293

2 Plus CCR5 TCTGCCTCTGTAGGATTGG 269

3 Minus CCR5 TTGTGATCTCTAAGAAGGC 217

4 Plus CCR5 TCTTGTGGCTCGGGAGTAG 150

5 Minus CCR5 TCATATCAAGCTCTCTTGG 71

6 Plus CCR5 TGACCATATACTTATGTCA 19

R1 Minus CXCR4 TGAAACTGGACTTACACTG -11*

R2 Minus CXCR4 TTGAAACTGGACTTACACT -12*

L1 Plus CXCR4 TCTGGAGAACCAGCGGTTA 2

L2 Plus CXCR4 TGGAGAACCAGCGGTTACC 0

*Negative values indicate sequences downstream of the TSS

2.2.3 Generation of DEM expression plasmids

The backbone of all the DEM constructs was the pPIX-K_TALshuttle(NS)_WT-Fok plasmid (#1593) which contains the first exon (5’ untranslated) and intron of the cytomegalovirus immediate early (CMV IE) gene for enhanced transgene expression (154) and a T7 promoter to facilitate the production of in vitro transcribed mRNA (Figure 2.1).

Digestion with BamHI and ApaI eliminated the FokI endonuclease domain and the 3859 bp fragment was purified via gel extraction and used for further cloning.

In order to generate the DMT, a gBlock was designed which contained the C-terminal region of the human DNA methyltransferase 3A (DNMT3A) linked to the C-terminal region of the murine Dnmt3-like (Dnmt3L) protein as previously described (149). The gBlock contained EcoRV and PmeI overhangs to enable cloning into the pVAX_TAL_shuttle plasmid (#1360). The pVAX_TAL_shuttle plasmid was digested with EcoRV and PmeI then the 2896 bp fragment purified via gel extraction. The gBlock was cloned into the vector via Gibson Assembly giving the pVAX_shuttle_DNMT3a-Lco plasmid then the ligation mix transformed into DH5α cells. Colony PCR of transformants was carried out with primers 13 and 1404. Positive clones were inoculated for a mini DNA preparation then the success of cloning verified through a control digestion with PvuI and NdeI as well as Sanger sequencing with primers 13 and 1897. One clone taken for further steps was digested with BamHI and ApaI then the 1642 bp fragment purified via gel extraction and ligated into the digested #1593 backbone giving the pPIX_shuttle_DNMT3a-Lco construct (#1694). Colony PCR of transformants was carried out with primers 1083 and 1709. Positive clones were inoculated for a mini DNA preparation then the success of cloning verified through a control digestion with PvuI and NdeI. The cloning schematic is given in Figure 2.1 and the primers used are indicated in Table 2.2.

In order to generate the dDMT construct, the previously described pVAX_shuttle_DNMT3a-Lco plasmid was digested with PvuI and XbaI then the 4492 bp fragment gel purified. Oligo cloning was performed with primers 2604 and 2605 in order to introduce an ENV→ANV mutation in the DNMT3A catalytic site as previously described (155) giving the pVAX_shuttle_DNMT3a-Lmut plasmid. Colonies were inoculated for a mini DNA preparation then the success of cloning verified through a control digestion with MluI and XbaI as well as Sanger sequencing with primer 13. Cloning into the #1593 expression plasmid was carried out as previously described giving the pPIX_shuttle_DNMT3a-Lmut construct (#1696). Colonies were inoculated for a mini DNA preparation then the success of cloning verified through a control digestion with NdeI and XbaI. The primers used are indicated in Table 2.2.

In order to generate the DEM construct, the KRAB domain was PCR-amplified from the #1579 plasmid using the primers 2633 and 2634 then the amplicon was gel purified. Both the amplicon and the #1593 plasmid were digested with XbaI and NheI followed by gel purification. A ligation was carried out giving the pPIX_shuttle_TAL_KRAB plasmid. Colony PCR of transformants was carried out with primers 1009 and 2634. Positive clones were inoculated for a mini DNA preparation then the success of cloning verified through a control digestion with HincII. The pPIX_shuttle_TAL_KRAB and pPIX_shuttle_DNMT3a-Lco plasmids were then digested with ApaI and BamHI and the 1642 bp DNMT3A-Dnmt3L fragment ligated into the pPIX_shuttle_TAL_KRAB plasmid giving the pPIX_shuttle_DNMT3a-Lco_KRAB plasmid (#1697). Colonies were inoculated for a mini DNA preparation then the success of cloning verified through a control digestion with XbaI.

The corresponding DNA-binding domain was introduced into all the expression plasmids described via a NheI/BamHI digest as previously mentioned. The cloning schematic is given in Figure 2.1 and the primers used are indicated in Table 2.2.

As the previously described pRK5_HA_KRAB_NLS_TAL3-2 (#1579) DTR expression plasmid does not contain a T7 promoter to enable mRNA production, the corresponding DBDs were cloned into the pPIX_shuttle_TAL_KRAB plasmid described previously, creating DTRs for in vitro transcription.


Table 2.2: Primers used to generate the DEM constructs

ID Purpose Sequence (5’-3’)

DMT

13 Colony PCR, sequencing TCACGGGGATTTCCAAGTCTC

404 Colony PCR CAGGACGCGGCCGCGTGGCGACCGGTGGATCC

1897 Sequencing CTACTCAGACAATGCGATGCA

1083 Colony PCR CGCCTCTAGACCCAAGAAAAAG

1709 Colony PCR GTCCTGAAGCTTCTATGGCAGGGCCTGCCGCC

dDMT

2604 Oligo cloning CGACCCTTCTTCTGGCTGTTCGCGAACGTGGTCGCTATGGGCGTGTCTGACAAGCGGGACATTT

2605 Oligo cloning CTAGAAATGTCCCGCTTGTCAGACACGCCCATAGCGACCACGTTCGCGAACAGCCAGAAGAAGGGTCGAT

13 Sequencing TCACGGGGATTTCCAAGTCTC

DEM

2633 KRAB amplification CAGGACTCTAGAATGGATGCTAAGTCACTGACTGC

2634 KRAB amplification, colony PCR CAGGACGCTAGCAGAATTCCCGTGC

1009 Colony PCR CAGTACATGACCTTATGGGACTTTCCTAC


Figure 2.1: DEM cloning schematic

A gBlock containing the human DNMT3A linked to the murine Dnmt3L was cloned into plasmid #1360 then the gBlock excised by restriction digest and cloned into the plasmid #1593 giving the DMT backbone. The KRAB domain was PCR amplified and cloned together with the gBlock into plasmid #1593 giving the DEM backbone. The corresponding DNA-binding domain was cloned into each respective backbone via a NheI/BamHI restriction digest.

2.2.4 Cloning of the Firefly Luciferase reporter

A 320 bp fragment of the CCR5 upstream promoter was PCR amplified from Jurkat cells using the following primers: 5’-TTTTGCTAGCTTGGCAGTCTGACTACAG-3’ (2268) and 5’-TTTTACCGGTTCCACATGACATAAGTATATGGTCAAG-3’ (2269), which incorporate an NheI (GCTAGC) and an AgeI (ACCGGT) restriction site respectively. The PCR product was purified then both the amplicon and the pGLtk.11.luc plasmid (#307) containing Firefly Luciferase were digested with NheI and AgeI. The digests were purified then a ligation carried out giving the pGLtk.11.luc_CCR5 plasmid (#2094). Colony PCR of transformants was carried out with primers 2268 and 2269. Positive clones were inoculated for a mini DNA preparation then the success of cloning verified through a control digestion with NotI and EcoRI and Sanger sequencing with primer 528. The primers used are indicated in Table 2.3.
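The restriction sites carried by the two cloning primers can be located programmatically. This is a small sketch using the primer sequences given in the text and the standard NheI (GCTAGC) and AgeI (ACCGGT) recognition sequences; the function and dictionary names are illustrative.

```python
# Recognition sequences of the two enzymes used for cloning.
SITES = {"NheI": "GCTAGC", "AgeI": "ACCGGT"}

# Primer sequences (5'->3') as given in section 2.2.4.
PRIMERS = {
    "2268": "TTTTGCTAGCTTGGCAGTCTGACTACAG",
    "2269": "TTTTACCGGTTCCACATGACATAAGTATATGGTCAAG",
}


def find_sites(primer, sites=SITES):
    """Return {enzyme: 0-based position} for each site present in the primer."""
    primer = primer.upper()
    return {name: primer.find(seq) for name, seq in sites.items() if seq in primer}


if __name__ == "__main__":
    for primer_id, seq in PRIMERS.items():
        print(primer_id, find_sites(seq))
```

In both primers the site follows a four-nucleotide 5’ TTTT spacer, which is why the reported position is 4.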

2.2.5 Cloning of the EGFP reporter

As with the Firefly Luciferase reporter, the CCR5 promoter was PCR amplified from Jurkat cells using the primers 2841 and 2842. The PCR product was gel purified then ligated into a third-generation lentiviral transfer plasmid (#890) digested with EcoRV giving the pCCL_CCR5_prox_prom_rev_CMV_eGFP (#1768) plasmid. Colony PCR of transformants was carried out with the primers 2841 and 2842. Positive clones were inoculated for a mini DNA preparation then the success of cloning verified through Sanger sequencing with primer 1405. The primers used are indicated in Table 2.3.

Table 2.3: Primers used to generate the Firefly Luciferase and EGFP reporters

ID Purpose Sequence (5’-3’)

Firefly Luciferase reporter

2268 Amplification of CCR5 promoter TTTTGCTAGCTTGGCAGTCTGACTACAG

2269 Amplification of CCR5 promoter TTTTACCGGTTCCACATGACATAAGTATATGGTCAAG

528 Sequencing ATAGCTTCTGCCAACCGAAC

EGFP reporter

2841 Amplification of CCR5 promoter CTTGGCAGTCTGACTACAGA

2842 Amplification of CCR5 promoter TCCACATGACATAAGTATATGGT

1405 Sequencing GACGTGAAGAATGTGCGAGA


2.3 MRNA PRODUCTION

2.3.1 Plasmid linearization

Plasmid linearization and in vitro transcription were performed under RNase-free conditions and the working area and equipment were cleaned with RNase-ZAP (Sigma, St. Louis, USA) prior to reaction setup. Plasmid linearization was carried out by combining 10 µg of high quality expression plasmid (Midi preparation) with 5 µl 10X CutSmart buffer, 1 µl PspOMI and nuclease-free water (Ambion, Austin, USA) to a volume of 50 µl in a 1.5 ml safe-lock tube. A GFP expression plasmid (#1233) was included as a transfection control. In this case the same master mix was prepared but using the XbaI restriction enzyme. The reaction was incubated at 37°C for 2 hours and 300 ng of the undigested plasmid retained for quality control on a gel. The digested plasmid was purified using the Qiagen Gel Extraction kit according to the manufacturer’s instructions. The DNA concentration was determined using a Nanodrop device. To control the extent of the linearization reaction, 300 ng of the linearized plasmid and its equivalent undigested plasmid were combined with 9 µl of 6X orange loading dye in an approximate volume of 11 μl and loaded on a 1% agarose gel in 1X TAE. The gel was allowed to run for 60 minutes at 100 V then visualized with a Fusion FX device. Once successful digestion had been verified the linearized plasmid was used for the in vitro transcription reaction.

2.3.2 In vitro transcription

The in vitro transcription reaction was carried out using the mMessage mMachine T7 Ultra Kit (Ambion, Austin, USA) according to the manufacturer’s instructions. The transcription reaction was carried out at 37°C for 2 hours for maximum yield. Subsequently, the poly(A) tailing reaction was carried out following the manufacturer’s instructions. A 2.5 µl aliquot taken from the reaction before the addition of the E-PAP tailing enzyme was used for the untailed control. The mRNA was recovered via lithium chloride precipitation as suggested by the manufacturer. The pellet was resuspended in 11-16 µl of RNase-free EB buffer (Qiagen, Hilden, Germany) depending on the size of the pellet. To determine the mRNA concentration 1 µl of purified mRNA was diluted in 9 µl of EB buffer and the concentration determined using a Nanodrop device. Reactions were set up containing 8 µl of 2X loading dye (New England Biolabs, Ipswich, USA), 1 µl of a 50X ethidium bromide dilution (in RNase-free water) and 300 ng of mRNA or 2.5 µl of the untailed control or 2 µl of ssRNA ladder (New England Biolabs, Ipswich, USA) in a final volume of approximately 11 µl. The samples were incubated at 70°C for 10 minutes then on ice for 1 minute then loaded onto a 1% agarose RNA formaldehyde gel and run in 1X MOPS running buffer for 30 minutes at 120 V. The mRNA was visualized with a Fusion FX device. Success of the poly(A) tailing of the mRNA could be visualized as a size shift compared to the untailed control.
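Because the mRNA is measured after a 1:10 dilution (1 µl in 9 µl of EB buffer), the Nanodrop reading has to be scaled back before pipetting 300 ng for the gel. A minimal sketch of that arithmetic follows; the function names and the 150 ng/µl reading are hypothetical.

```python
def stock_concentration(reading_ng_per_ul, dilution_factor=10):
    """Concentration of the undiluted mRNA given the reading of the dilution."""
    return reading_ng_per_ul * dilution_factor


def volume_for_mass(mass_ng, conc_ng_per_ul):
    """Volume (µl) of stock that contains the requested mass."""
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return mass_ng / conc_ng_per_ul


if __name__ == "__main__":
    reading = 150.0  # hypothetical Nanodrop reading of the 1:10 dilution, ng/µl
    stock = stock_concentration(reading)
    print(f"Stock: {stock:.0f} ng/µl")
    print(f"Volume for 300 ng: {volume_for_mass(300, stock):.2f} µl")
```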

2.4 CELL CULTURE METHODS

2.4.1 Culture conditions

HEK293T cells and the HEK293T-based reporter cell line were maintained in Dulbecco’s Modified Eagle Medium (DMEM) (Thermo Fisher Scientific, Massachusetts, USA) supplemented with 10% Fetal Calf Serum (FCS) (GE Healthcare, Little Chalfont, UK), 1% Penicillin/Streptomycin (GE Healthcare, Little Chalfont, UK) and 1% sodium pyruvate (Biochrom, Berlin, Germany). The cells were passaged as follows: the culture medium was removed and the cells washed with Dulbecco’s phosphate buffered saline (DPBS) (PAN-Biotech, Aidenbach, Germany). The DPBS was removed and cells detached by adding 1X trypsin-EDTA (Biochrom, Berlin, Germany) and incubating at 37°C for 5 minutes. The trypsin was neutralised by the addition of fresh culture medium and the required amount of cells transferred to a new culture vessel containing the appropriate amount of medium. Blood samples were obtained from healthy donors after informed consent and human CD4+ T cells were isolated from the peripheral blood mononuclear cells (PBMCs) by Ficoll density gradient centrifugation followed by Miltenyi MACS separation (described in 2.5.1 and 2.5.2). The cells were maintained in X-VIVO 15 medium (Biozym, Hessisch Oldendorf, Germany) supplemented with IL2 (Miltenyi Biotec, Bergisch Gladbach, Germany) (final concentration 20 U/ml). All cells were cultured at 37°C and 5% CO2 in a humidified incubator.

2.4.2 DNA PEI transfection

HEK293T cells were seeded in a 24 well plate (130 000 cells per well in 500 µl of medium) and left to attach for at least 5 hours. The amounts of DNA required for transfection were determined and the reactions assembled. For the DTR experiments reactions were set up as follows:

Table 2.4: DTR transfection scheme

Component 1x effector 2x effector 3x effector

mCherry 50 ng 50 ng 50 ng

Effector 400 ng 800 ng 1200 ng

pUC118 (#77) 800 ng 400 ng -

mCherry was used to monitor transfection efficiency and pUC118 used as a stuffer plasmid. A total of 1250 ng of DNA were used for each transfection. For each reaction the volume was made up to 25 µl with 150 mM NaCl and 25 µl of PEI (Appendix 6.1) were added. Typically each transfection was carried out in duplicate and a master mix of all the components assembled. The DNA-PEI mix was incubated for 10 minutes at room temperature. The mixture was then added to the cells dropwise and a medium change performed after 8-12 hours.
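The role of the stuffer plasmid in Table 2.4 is to hold the total DNA at 1250 ng while the effector dose changes. The sketch below reproduces that bookkeeping; the function name is illustrative.

```python
TOTAL_DNA_NG = 1250  # total DNA per transfection, as stated in the text
MCHERRY_NG = 50      # transfection control


def dtr_mix(effector_ng):
    """Return the DNA amounts (ng) for one transfection at a given effector dose."""
    stuffer_ng = TOTAL_DNA_NG - MCHERRY_NG - effector_ng
    if effector_ng < 0 or stuffer_ng < 0:
        raise ValueError("effector amount must lie between 0 and 1200 ng")
    return {"mCherry": MCHERRY_NG, "effector": effector_ng, "pUC118": stuffer_ng}


if __name__ == "__main__":
    for dose in (400, 800, 1200):  # 1x, 2x and 3x effector
        mix = dtr_mix(dose)
        print(mix, "total:", sum(mix.values()), "ng")
```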

2.4.3 RNA lipofection

For the mRNA lipofection 2 µg of each construct were used and the volume made up to 25 µl with Opti-MEM Reduced Serum Medium (Thermo Fisher Scientific, Massachusetts, USA). To each reaction, 25 µl of a lipofectamine dilution were added (2 µl Lipofectamine 2000 (Thermo Fisher Scientific, Massachusetts, USA) + 23 µl Opti-MEM). The mixture was incubated for 5 minutes at room temperature then added to the cells dropwise.

2.4.4 Dual Luciferase assay

In the activation experiments co-transfections with the Firefly Luciferase reporter (described in 2.2.4) and a Renilla Luciferase plasmid (#550) were carried out. Reactions were assembled as follows:

Table 2.5: Dual Luciferase Assay transfection scheme

Component 1x effector 2x effector 3x effector Control

Luciferase reporter 15 ng 15 ng 15 ng 15 ng

Renilla Luciferase 5 ng 5 ng 5 ng 5 ng

Effector 400 ng 800 ng 1200 ng -

Shuttle plasmid (#1276) 800 ng 400 ng - 1200 ng

pUC118 30 ng 30 ng 30 ng 30 ng


The Dual Luciferase assay was performed 24 hours post transfection using the Dual-Luciferase Reporter Assay System (Promega, Wisconsin, USA) according to the manufacturer’s instructions. For each sample 10 µl of lysate were transferred to each well of a white 96-well MicroWell plate (Thermo Fisher Scientific, Massachusetts, USA). To each well 50 µl of luciferase assay reagent II (LAR) were added and the Firefly Luciferase activity measured on the Tecan GENios Microplate Reader (MTX Lab Systems, Florida, USA). After the reading 50 µl of Stop & Glo reagent were then added and the Renilla Luciferase activity measured. Firefly Luciferase activity was normalised to Renilla Luciferase and activation expressed relative to the control shuttle plasmid.
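The normalisation described above amounts to a ratio of ratios. A minimal sketch is given here; the function name and the luminescence readings are hypothetical and serve only to show the calculation.

```python
def relative_activation(firefly, renilla, firefly_ctrl, renilla_ctrl):
    """Firefly/Renilla ratio of a sample relative to the shuttle-plasmid control."""
    if renilla <= 0 or renilla_ctrl <= 0 or firefly_ctrl <= 0:
        raise ValueError("Renilla and control readings must be positive")
    sample_ratio = firefly / renilla
    control_ratio = firefly_ctrl / renilla_ctrl
    return sample_ratio / control_ratio


if __name__ == "__main__":
    # Hypothetical relative light units for one effector sample and the control.
    fold = relative_activation(firefly=2000, renilla=100,
                               firefly_ctrl=500, renilla_ctrl=50)
    print(f"Activation relative to control: {fold:.1f}-fold")
```

Dividing by the Renilla signal first corrects each well for transfection efficiency, so that the fold change reflects promoter activity rather than DNA uptake.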

2.4.5 Virus production and transduction

All viral transductions were carried out with a VSV-G-pseudotyped self-inactivating lentiviral system. For virus production, 3.5 × 10⁶ HEK293T cells were seeded in a 10-cm cell culture dish with 9 ml of DMEM medium. After 24 hours the cells were transfected with a mixture containing 1.5 μg pMD2.G 0 (#1476) (env), 12.5 μg pC.GP.4xcte (#1477) (Gag/pol), 5 μg pRSV-Rev (#1478) (Rev) (156) and 7.5 μg of the desired transfer plasmid (the EGFP reporter described in 2.2.5). The volume was made up to 500 μl with 150 mM NaCl then 500 μl of PEI added. The mixture was incubated at room temperature for 10 minutes then added to the cells dropwise. The next day a medium change was carried out and 7 ml of DMEM added to the cells. Viral supernatant was collected each day from day 2 to day 5 post transfection, and 7 ml of DMEM added to the cells after each collection. The virus was passed through a 0.22 µm filter (Merck Millipore, Massachusetts, USA) before use and stored at 4°C short term or -20°C long term.


2.4.6 Generation of reporter cells

To generate the reporter cell line, 5 × 10⁴ HEK293T cells were seeded per well in a 24-well plate. The next day the medium was removed and to each well 100 μl of fresh DMEM, 100 μl of DMEM supplemented with 16 μg/ml protamine sulfate and a 200 μl preparation of the EGFP reporter virus (described in 2.4.5) were added. In this case a titration was carried out according to Table 2.6. Spinoculation was carried out at 32-37°C for 1 hour at 200 × g then the cells cultured in a humidified incubator at 37°C in 5% CO2. The medium was changed the next day and flow cytometry carried out after 3 days. The efficiency of transduction was determined by the percentage of EGFP+ cells. The cells which received 1 μl of virus corresponding to an MOI of 0.03 were chosen for expansion as the low EGFP expression was indicative of a low copy number of the integrated reporter construct. Single cell sorting 17 days post transduction was carried out using the MoFlo Astrios Cell Sorter (Beckman Coulter, California, USA) into 96-well plates. The single cells were expanded then flow cytometry carried out. Clones showing ~100% EGFP+ cells were chosen and expanded. One such clone was used for further experiments and thus referred to as the HEK293T-EGFP cell line.

Table 2.6: Viral titration scheme

Virus supernatant 200 μl 100 μl 50 μl 10 μl 1 μl 0.1 μl*

Medium - 100 μl 150 μl 190 μl 199 μl 199.9 μl

*Reaction set up via a 10X dilution of the viral supernatant
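The reasoning in section 2.4.6 that a low MOI favours a low copy number can be illustrated with a Poisson model of integration events. This model is an assumption made here for illustration; the text itself does not state how copy number was estimated.

```python
import math


def fraction_transduced(moi):
    """Expected fraction of cells with at least one integration (Poisson model)."""
    return 1.0 - math.exp(-moi)


def single_copy_fraction(moi):
    """Among transduced cells, the expected fraction carrying exactly one copy."""
    if moi <= 0:
        raise ValueError("MOI must be positive")
    return moi * math.exp(-moi) / fraction_transduced(moi)


if __name__ == "__main__":
    for moi in (0.03, 0.3, 3.0):
        print(f"MOI {moi}: "
              f"{fraction_transduced(moi):.1%} transduced, "
              f"{single_copy_fraction(moi):.1%} of those with a single copy")
```

Under this assumption, nearly all EGFP+ cells at an MOI of 0.03 would carry a single integrated reporter, at the cost of only a few percent of cells being transduced.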

2.4.7 Reactivation

DEM-treated reporter cells were sorted in order to obtain the cells which were ~100% EGFP negative. For the DTA and 5-Aza-2’-deoxycytidine (5-AZA) reactivation 130 000 cells per well were seeded on a 24-well (in 500 µl of medium) or 6-well (in 2 ml of medium) plate respectively. Transfections were carried out in triplicate with 50 ng of mCherry and 1150 ng of the DTA #6 or the shuttle control plasmid (#1276). For the 5-AZA reactivation the cells were either treated daily with 10 µM of 5-AZA freshly prepared in culture medium or left untreated. Flow cytometry was performed on day 3 and 6 post transfection/5-AZA treatment.

2.4.8 Flow cytometry

For flow cytometry analysis, HEK293T cells were detached, centrifuged and combined with 300 μl of FACS buffer (DPBS + 5% FCS + 1 mM EDTA + 0.1% sodium azide). The cells were then applied to the Accuri C6 device (BD Biosciences, Allschwil, Switzerland). CCR5 repression and activation in the reporter cell line were similarly evaluated via flow cytometry. The CD4+ cells were nucleofected with GFP mRNA to monitor transfection efficiency (described in 2.5.4). One day post nucleofection a 50 μl cell aliquot was combined with 300 μl of FACS buffer and the cells analysed on the Accuri C6 device. Untreated cells were used as a control. CCR5 and CXCR4 surface expression were monitored in HEK293T and CD4+ cells by staining with the anti-CCR5 3A9 (BD Biosciences, Allschwil, Switzerland) and anti-CXCR4 12G5 (BioLegend, California, USA) monoclonal antibodies. The corresponding isotype controls were used to ensure the specificity of antibody staining. Briefly, 2 × 10⁵ CD4+ or 1 × 10⁵ HEK293T cells were harvested and washed with pre-warmed DPBS. The cells were then combined with antibody in a total volume of 50 μl of DPBS and incubated in the dark at room temperature for 20 minutes. Following incubation, the cells were washed with DPBS and resuspended in FACS buffer (DPBS + 10% FCS). Analysis was carried out on the Accuri C6 or the FACS Canto II device (BD Biosciences, Allschwil, Switzerland). Data were analysed using the Accuri software (BD Biosciences, Allschwil, Switzerland) or FlowJo (FlowJo LLC, Oregon, USA).


2.5 PRIMARY CELL CULTURE

2.5.1 PBMC extraction from LRS Chamber

Peripheral blood mononuclear cells (PBMCs) were obtained from healthy donors and collected in a leukocyte reduction system (LRS) chamber which separates the white blood cells from blood products such as platelets and red blood cells. The cells were transferred to a 50 ml Falcon tube and the volume noted. The chamber was washed with 30 ml of PBS and the wash added to the Falcon tube containing the cells. The cells were then transferred to a T75 tissue culture flask. For each 6 ml of cell volume 29 ml of PBS were added, taking into account the 30 ml used to wash the chamber. For the density gradient separation, 50 ml Falcon tubes were prepared with 15 ml of Biocoll Separating Solution (Biochrom, Berlin, Germany) and the Biocoll carefully overlaid with 35 ml of cells. Centrifugation was carried out at 400 × g and 20°C for 30 minutes without brakes. All subsequent centrifugation steps were carried out at 20°C. The supernatant was removed and discarded and the cellular layer was transferred to a new 50 ml Falcon tube. The volume was made up to 50 ml with PBS and the cells centrifuged at 300 × g for 10 minutes. The supernatant was removed, the volume made up to 50 ml with PBS and the cells centrifuged at 200 × g for 15 minutes. This step was performed twice. After centrifugation the cells were resuspended in 40 ml of PBS and counted using the NucleoCounter NC250 (ChemoMetec, Allerod, Denmark) device. The cells were either frozen at -80°C in FCS + 10% DMSO or used directly for CD4 isolation.


2.5.2 MACS Isolation of CD4+ cells from PBMCs

After PBMC isolation an aliquot of the cells (typically 7 × 10⁸ PBMCs) was passed through a 30 µm pre-separation filter (Miltenyi Biotec, Bergisch Gladbach, Germany) to obtain a single cell suspension then the cells collected in a 15 ml Falcon tube. T cell isolation was performed using the CD4+ T cell Isolation kit (Miltenyi Biotec, Bergisch Gladbach, Germany). The cells were kept cold throughout the isolation and pre-cooled solutions were used. The cells were centrifuged at 300 × g for 20 minutes then the supernatant removed. The cells were resuspended in 80 µl of cold buffer (MACS BSA stock solution diluted 1:20 with autoMACS Rinsing Solution, pH 7.2, 0.5% BSA, 2 mM EDTA) for every 1 × 10⁷ cells then 20 µl of CD4 MicroBeads added for every 1 × 10⁷ cells. The cells were mixed well then incubated at 4°C for 15 minutes. The cells were washed with 1 ml of buffer for every 1 × 10⁷ cells then centrifuged at 300 × g and 4°C for 10 minutes. The supernatant was removed and the cells resuspended in 500 µl of buffer for every 1 × 10⁸ cells. An LS column was placed in the magnetic field of a MACS separator and a nylon mesh filter placed on top of the column. The column was equilibrated with 3 ml of buffer then the cells applied to the column. The column was washed three times with 3 ml of buffer and the unlabelled cells which passed through the column were collected in a 50 ml Falcon tube. The column was removed from the separator and placed into a 15 ml Falcon tube. To elute the labelled cells, 5 ml of buffer were added to the column then the plunger immediately pushed into the column. The cell number was determined using the NucleoCounter device then the cells frozen at -80°C in FCS + 10% DMSO or used directly.


2.5.3 Thawing and activation of primary human CD4+ T cells

Prior to thawing, CD2, CD3 and CD28 activation beads were prepared using the Human T cell activation kit (Miltenyi Biotec, Bergisch Gladbach, Germany) according to the manufacturer’s instructions. After preparation the bead concentration was 1 × 10⁵ beads/µl.

The primary CD4+ T cells were thawed quickly in a 37°C water bath. Into a 15 ml Falcon tube 10 ml of pre-warmed X-VIVO 15 medium were pipetted then the cells were added and mixed gently. The cells were centrifuged at 300 × g for 5 minutes then the supernatant was discarded and fresh medium added. Approximately 10 ml of medium were used for 30 million cells. The cell number was determined using the NucleoCounter device. The amount of beads required for activation considering a 2:1 (cells:beads) ratio was calculated. The beads were resuspended gently by pipetting then the required amount placed into a 1.5 ml safe-lock tube and washed gently with X-VIVO 15 medium in a ratio of 1:3 (beads:medium).

The beads were centrifuged at 300 × g for 5 minutes then the supernatant discarded and the beads resuspended in the same volume of medium (i.e. 150 µl medium for 150 µl of cells).

Subsequently the beads were added to the cells and the combination mixed gently. For optimal activation, the cell density should be 1.3 × 10⁶ cells/cm² and 2 × 10⁶ cells/ml; the cells were seeded as suggested in Table 2.7 and the volume of culture medium adjusted accordingly. For example, if activating 30 million cells, the optimal surface area is 23 cm² (30 × 10⁶ cells / 1.3 × 10⁶ cells/cm²) and the optimal volume is 15 ml (30 × 10⁶ cells / 2 × 10⁶ cells/ml). In this case a surface area of 28.5 cm² could be selected due to the high volume, but a surface area of 19 cm² would also be suitable. Therefore, in this case an additional 5 ml of medium were added to the cells and the cells transferred to 3 wells of a 6-well plate, each well containing 5 ml. The cells were cultured for 3 days in a humidified 37°C incubator with 5% CO₂.
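The arithmetic behind the activation set-up can be sketched in a few lines of Python. This is only an illustration: the function and constant names are hypothetical, while the numerical values (2:1 cells:beads ratio, 1 × 10⁵ beads/µl, 1.3 × 10⁶ cells/cm², 2 × 10⁶ cells/ml) are those stated in this section.

```python
# Illustrative sketch only; names are hypothetical, values are taken from the text.
CELLS_PER_BEAD = 2        # 2:1 (cells:beads) ratio
BEADS_PER_UL = 1e5        # bead concentration after preparation
CELLS_PER_CM2 = 1.3e6     # optimal density per surface area
CELLS_PER_ML = 2e6        # optimal density per volume


def activation_setup(n_cells):
    """Return bead volume (µl), surface area (cm²) and medium volume (ml)."""
    n_beads = n_cells / CELLS_PER_BEAD
    return {
        "bead_volume_ul": n_beads / BEADS_PER_UL,
        "surface_area_cm2": n_cells / CELLS_PER_CM2,
        "medium_volume_ml": n_cells / CELLS_PER_ML,
    }


setup = activation_setup(30e6)  # the 30 million cell example from the text
print(setup)
```

For 30 million cells this gives 150 µl of beads, roughly 23 cm² and 15 ml, in line with the worked example above; the closest vessel combination is then picked from Table 2.7.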


Table 2.7: Recommended culturing conditions for CD4+ cells

Vessel type   Approx. surface area (cm²)   2×     3×     4×     5×
96w           0.35                         0.7    1.05   1.4    1.75
48w           1.1                          2.2    3.3    4.4    5.5
24w           1.9                          3.8    5.7    7.6    9.5
12w           3.9                          7.8    11.7   15.6   19.5
6w            9.5                          19     28.5   38     47.5
10 cm         58                           116    174    232    290

2.5.4 Nucleofection of primary CD4+ T cells

Nucleofections were performed three days post activation and prior to nucleofection the activation beads were removed. Briefly, the cells were resuspended well with a 1 ml pipette to disrupt cell clumps then transferred to a 15 ml or 50 ml Falcon tube depending on the cell volume. The cells were centrifuged at 300 × g for 5 minutes then the supernatant removed.

The cells were thoroughly resuspended in 3-5 ml of X-VIVO medium (not supplemented with IL2) in order to detach the cells from the beads. The Falcon tube was placed in a DynaMag-15 or DynaMag-50 magnet (Invitrogen, California, USA) for 15 ml or 50 ml Falcon tubes respectively for 3 minutes. The cells were carefully aspirated using a 5 ml pipette and placed into a new Falcon tube. The Falcon tube was once again placed in the magnet for 3 minutes to ensure that all the beads had been removed then the cells transferred to a new Falcon tube.

The cells were placed in a 37°C incubator while preparing for the nucleofection. A 12-well plate containing 2 ml of X-VIVO medium supplemented with IL2 (final concentration 20 U/ml) per well was prepared for cell recovery after nucleofection. The medium was equilibrated in a 37°C incubator for at least 30 minutes. The amount of mRNA needed for each nucleofection was determined and the respective mRNA transferred to a safe-lock tube. Typically 5-10 µg of mRNA in a total volume of 10 µl in EB buffer were used for each nucleofection. The mRNA was kept on ice at all times until required. Nucleofections were carried out using the 4D Nucleofector X device and the P3 Primary Cell 4D Nucleofector X kit (Lonza, Basel, Switzerland) following the instructions for stimulated human T cells. For each nucleofection, 2.5 × 10⁶ cells were centrifuged at 200 × g for 10 minutes at room temperature then resuspended in 100 µl of nucleofection solution containing 18 µl of supplement and 82 µl of Nucleofector solution. The 100 µl cell suspension was transferred to the 1.5 ml safe-lock tube containing the mRNA then the mixture combined by pipetting and transferred to the nucleofection cuvette. The electric pulse was applied to the cells using the program EO-115. Immediately afterwards the provided sterile pipettes were used to transfer about half of the equilibrated medium into the cuvette then the cells were transferred into one well of the 12-well plate containing the rest of the equilibrated medium (total volume 2 ml medium + 100 µl cells). The cells were nucleofected with GFP mRNA each time in order to assess nucleofection efficiency and flow cytometry carried out 24 hours post nucleofection. The cells were maintained in X-VIVO 15 medium supplemented with IL2 and passaged every three days post nucleofection to a density of 0.3 × 10⁶ cells/cm² and 0.5 × 10⁶ cells/ml according to Table 2.7. To maintain the cells in culture long-term the activation was repeated every seven days and beads removed after 3 days of activation.

2.6 EPIGENETIC AND EXPRESSION ANALYSES

2.6.1 Bisulfite sequencing

Genomic DNA was extracted using the QIAamp DNA Blood Mini kit (Qiagen, Hilden, Germany) according to the manufacturer’s instructions and eluted in 35 µl of nuclease-free water. Bisulfite conversion was carried out with 500 ng of gDNA using the EZ DNA Methylation-Gold kit (Zymo Research, Freiburg im Breisgau, Germany) following the manufacturer’s instructions. The extent of CpG methylation was evaluated via PCR of the bisulfite-converted DNA using the PyroMark PCR kit (Qiagen, Hilden, Germany). Each reaction contained 0.5 µl of each primer (10 µM), 6.25 µl of PyroMark PCR mix, 4.75 µl of nuclease-free H₂O and 0.5 µl of bisulfite-converted DNA. The primers used are listed in Table 2.8.

Quality control of the bisulfite conversion was carried out with unconverted DNA and the primers designed for converted DNA. To evaluate the success of the PCR, 1.5 µl of the PCR reaction were loaded on a 1% agarose gel. No band should be observed in the reaction containing unconverted DNA. The remaining PCR reaction was purified using the QIAquick PCR Purification Kit and eluted in 30 µl of nuclease-free water. The PCR amplicon was cloned via the CloneJET PCR Cloning Kit according to the sticky-end cloning protocol given by the manufacturer. Competent bacteria were transformed with 2.5 µl of the ligation mixture. To screen for positive transformants, colony PCR was carried out using the pJET1.2 forward and reverse primers. Positive clones were inoculated for mini DNA preparation then sequencing carried out with the pJET1.2 forward and reverse primers. Methylation analysis was carried out using the software QUMA (http://quma.cdb.riken.jp).


Table 2.8: Primers used for bisulfite sequencing analysis

ID            Purpose                                              Sequence (5’-3’)
3498          Kinetics bisulfite analysis in HEK293T-EGFP cells    ATGGGGGTGTTTTGTTGGTAGTGG
3500                                                               TTGTGGTTGTTGTAGTTGTATTTTAGTTTGTG
3502                                                               GTGGTGTAGATGAATTTTAGGGTTAGTTTGT
3504                                                               TCCCTACGCAAACCCAAAACTACC
3507                                                               CTAAACAACCGCTATTAACCACAACCCAT
+2kb 3459     Spreading bisulfite analysis in HEK293T-EGFP cells   AATATATAGTATGGGTAAGTAGGGAG
+2kb 3550                                                          TTATCCCTCATATCTCCTCCTCC
+1kb 3555                                                          TTTATAGTGAATAGAGTTAGGTAGGG
+1kb 3556                                                          TATATCTACTATTCTTTCCCCTACAC
+500bp 3557                                                        GTTGTATTTTAGTTTGTGTTTTAGG
+500bp 3558                                                        CTAACCCTAAAATTCATCTACA
-500bp 3559                                                        GGTTTTTGTAGTTAGATTGTTAAG
-500bp 3560                                                        AACAAAAACAAAACCCACACTAC
-1kb 3561                                                          GAGTGTTGTAAAGTTTGTAATTTGG
-1kb 3562                                                          CTCCTAAATACTAAAAACTATACTATCC
-2kb 3563                                                          GTTTGTGTTGTTATTTGGATTTTG
-2kb 3553                                                          CTACAACTACCTTATAAATCATTAATC
4101          CXCR4 bisulfite analysis in HEK293T & CD4+ cells     CATTTATAACAAAACAAATTAAAACTAAAC
3795                                                               TTAGTGTTTTTATTGTAGTATTTTTAG
pJET1.2 F     pJET sequencing                                      CGACTCACTATAGGGAGAGCGGC
pJET1.2 R                                                          AAGAACATCGATTTTCCATGGCAG

2.7 IN SILICO PREDICTION OF DEM #6 OFF-TARGET SITES

Potential off-target sites of the DEM #6 were predicted using the online tool TAL Effector Targeter (157) (https://tale-nt.cac.cornell.edu/node/add/single-tale). To identify sites with up to three mismatches compared to the on-target sequence, the COSMID online tool (158) (https://crispr.bme.gatech.edu/) was used with the following sequence as input: TGACCATATACTTATGTCANNN.
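To make the mismatch criterion concrete, the following minimal Python sketch counts mismatches between the query and a candidate site of equal length, treating N in the query as a wildcard. It does not reproduce the search performed by the COSMID tool itself; the function name and both candidate sequences are hypothetical and serve only as illustration.

```python
QUERY = "TGACCATATACTTATGTCANNN"  # input sequence stated in the text


def count_mismatches(query, site):
    """Count positions where site differs from query; 'N' in the query matches any base."""
    if len(query) != len(site):
        raise ValueError("query and site must have the same length")
    return sum(
        1
        for q, s in zip(query.upper(), site.upper())
        if q != "N" and q != s
    )


# Hypothetical candidate sites, invented for illustration only.
perfect_site = "TGACCATATACTTATGTCAGGG"
candidate_site = "TGACCTTATACTTATGACAAAA"

for site in (perfect_site, candidate_site):
    n = count_mismatches(QUERY, site)
    print(site, n, "retained" if n <= 3 else "discarded")
```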


2.7.1 Bisulfite sequencing via Next Generation Sequencing

After genomic DNA extraction, PCR amplification of the regions of interest was carried out and libraries generated using the NEBNext Ultra II DNA Library Prep Kit (New England Biolabs, Ipswich, USA) then quantified using the ddPCR Library Quantification Kit for Illumina TruSeq (Bio-Rad, California, USA). The primers used are indicated in Table 2.9. Sequencing was performed on an Illumina MiSeq platform with a MiSeq Reagent Micro Kit v2, 300 cycles (Illumina, California, USA). Depending on the amplicon size, paired-end reads were either merged using Fast Length Adjustment of SHort reads (FLASH) (159) or processed individually, and were mapped to the corresponding amplicon sequences using the Burrows-Wheeler Aligner (BWA) (160). CpG dinucleotides were identified within the amplicons and mapped reads analysed at these positions to evaluate the extent of bisulfite conversion. Methylation for each position was calculated as the ratio between mapped reads showing unconverted cytosines, indicative of CpG methylation, and the total number of mapped reads.

Methylation analysis at the top 10 predicted off-target sites for DEM #6 was performed in the same way. Primers used are indicated in Table 2.10.
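The per-position methylation ratio described above can be illustrated with a short Python sketch. The function name, the input layout and the read counts are hypothetical; only the ratio itself (unconverted reads divided by total mapped reads at a CpG) comes from the text.

```python
def methylation_levels(cpg_counts):
    """Map each CpG position to unconverted reads / total mapped reads.

    cpg_counts: {position: (unconverted_cytosine_reads, total_mapped_reads)}
    Positions without coverage are reported as None rather than 0.
    """
    levels = {}
    for position, (unconverted, total) in cpg_counts.items():
        levels[position] = unconverted / total if total > 0 else None
    return levels


# Hypothetical read counts at three CpG positions of one amplicon.
example_counts = {101: (30, 120), 118: (0, 95), 140: (0, 0)}
levels = methylation_levels(example_counts)
print(levels)
```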


Table 2.9: Primers used for CCR5 NGS on-target analysis

Purpose              ID             Sequence (5’-3’)
On-target analysis   CCR5_NGS_1F    GTATTTGTGAAAAGTGTTGAGAGTTTGG
                     CCR5_NGS_1R    CTCTATTACCCAAACTAAAATACAATAACC
                     CCR5_NGS_2F    GATGGAATTTTTTTTTGGTGAAGATGTTG
                     CCR5_NGS_2R    CTTACTAATCAAAATAATAATTACTAAAAC
                     CCR5_NGS_3F    GTTAGAGTATTGATGGTGATAATTAG
                     CCR5_NGS_3R    CTTCAAAAAATACCTTTTACTCCACCC
                     CCR5_NGS_4F    TTTAGGGTGGAAGTTGTTTTAGG
                     CCR5_NGS_4R    ACTCACCCCCAAAAAACAATATA
                     CCR5_NGS_5F    TTTGAATTGTATATATGGGATGAA
                     CCR5_NGS_5R    TACTTAAAAAAAACCAAAACAATATAA
                     CCR5_NGS_6F    TTTAGTTTTTGGGTTAGTTTGTTTTTG
                     CCR5_NGS_6R    TATAATCAAATTCAAATTCTTTATAAC
                     CCR5_NGS_7F    TTTAGAAAAAGATGGGAAATTTGTT
                     CCR5_NGS_7R    TCCTAAACTTCACATTAACCCTATATC
                     CCR5_NGS_8F    GTTTATTTGGTTAGAAGAGTTGAG
                     CCR5_NGS_8R    CTTTCACTCACAATCATATTTTATATC
                     CCR5_NGS_9F    TTAGTATTTTAGGAGGTTGAGGTAGGAGG
                     CCR5_NGS_9R    CAAACATAATACAACTCAACCTTT
                     CCR5_NGS_10F   AAAAGAAGGTTTTTATTATATTTGTAG
                     CCR5_NGS_10R   CAAACCAAAAAATTCCTAAAAAATATTC
                     CCR5_NGS_11F   GGTAAGGAGATTATTAATAGTTTTTAGG
                     CCR5_NGS_11R   AAAAACTAAAAATTCTCTCTAACTCC
                     CCR5_NGS_12F   GATAGAGTTTTAAATGTAAATATAATTG
                     CCR5_NGS_12R   CTACTCAAAAAACTAAAACAAAAAAATTAC


Table 2.10: Primers used for CCR5 NGS off-target analysis

Purpose               ID                 Sequence (5’-3’)
Off-target analysis   CCR5_NGS_OFF_1F    TGGGAAGGTAGATGAGGAAATTATTTG
                      CCR5_NGS_OFF_1R    TTCATTCTATTCATTTTCACCTTAATC
                      CCR5_NGS_OFF_2F    GTTGTGTTAGTAGATAAAGTTTTAATATG
                      CCR5_NGS_OFF_2R    CATTCACAAATACCATAAATTAAAAATTC
                      CCR5_NGS_OFF_3F    GTAGAGATAAGTTTTTATTATGTTGG
                      CCR5_NGS_OFF_3R    CTTATATCAATTAAATACTCCTCAAAC
                      CCR5_NGS_OFF_4F    TAGGTAATAAGTTGTTTAGGGTATTTGG
                      CCR5_NGS_OFF_4R    TTCAAATAAACTCTCTACCTTTATAACC
                      CCR5_NGS_OFF_5F    GGTAAGGAAAGATTTGATTATTTTTG
                      CCR5_NGS_OFF_5R    TTTCAATAAATCTCAAACACTTAACAAC
                      CCR5_NGS_OFF_6F    GTTATTTTGAGTAAAGAGAGTAAATTAG
                      CCR5_NGS_OFF_6R    CTACTTATAAATATCCAAATTTTCCAAC
                      CCR5_NGS_OFF_7F    GTGGTTTTATTATGTTGGTTAGGTTG
                      CCR5_NGS_OFF_7R    CCCTAATTCTATCAACCAATATTTAC
                      CCR5_NGS_OFF_8F    GTTGATGTTAATTTTGATTATTTGGTTG
                      CCR5_NGS_OFF_8R    CTAAACAAATCTCCCATATAAAAAATTC
                      CCR5_NGS_OFF_9F    GGTATGTATTTAGAAGAATTTTTGAG
                      CCR5_NGS_OFF_9R    CTTAATTCCACAACTAAATTAAAAACTC
                      CCR5_NGS_OFF_10F   GTTTGTTTTTGAGAATGTTTTATGTGTTG
                      CCR5_NGS_OFF_10R   CTAAAAAATTCAACACACAACATTCAAC

2.7.2 RNA isolation

Cells were harvested on day 7 or day 21 post nucleofection and whole RNA extracted using the RNeasy Mini kit (Qiagen, Hilden, Germany) according to the manufacturer’s instructions. Typically at least 1 × 10⁶ but not more than 5 × 10⁶ cells were used. The RNA was eluted in 30 µl of RNase-free water and the concentration was determined using a NanoDrop device. RNA was either stored at -20°C or used for the reverse transcription reaction.


2.7.3 Reverse transcription

Reverse transcription was carried out using the QuantiTect Reverse Transcription kit (Qiagen, Hilden, Germany) according to the manufacturer’s instructions. The RNA input was 500 ng per reaction and the final cDNA concentration was 25 ng/µl.

2.7.4 Quantitative real-time PCR analysis

Gene expression analyses were carried out using the TaqMan Gene Expression Assays listed in Table 2.11 and the TaqMan Gene Expression Master Mix (Thermo Fisher Scientific, Massachusetts, USA). B2M was used as a housekeeping gene. The Gene Expression Assays and cDNA were thawed on ice then mixed by pipetting and centrifuged briefly. The Master Mix was mixed by gently swirling the bottle. The assays and cDNA were stored on ice while preparing the reactions and the Master Mix was stored at room temperature. Each reaction was set up in a safe-lock tube as follows: 1 µl of 20X TaqMan Gene Expression Assay, 10 µl of 2X TaqMan Gene Expression Master Mix, 50 ng of cDNA (2 µl) and 5 µl of RNase-free water. Typically a master mix was made containing the required amounts of TaqMan Gene Expression Assay, TaqMan Gene Expression Master Mix and RNase-free water and the cDNA was diluted 1:2 in RNase-free water. Therefore each reaction contained 4 µl of cDNA and 16 µl of the master mix and was prepared in triplicate. One well without cDNA was included as a no-template control. The reactions were pipetted onto a 96-well plate and the plate centrifuged briefly to collect all material at the bottom of the wells. The plate was sealed with sealing foil and placed into the StepOnePlus Real-Time PCR System (Applied Biosystems, California, USA). The following parameters were used: 10 minute hold at 95°C then 40 cycles of 95°C for 15 seconds and 60°C for 1 minute. Once the PCR was complete the raw data were analysed using the StepOne Software and gene expression was calculated using the ΔCt method normalized to the housekeeping gene B2M then expressed relative to control samples.
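As an illustration of this calculation, the sketch below normalizes the target Ct to B2M and expresses the result relative to a control sample. It assumes the common 2^(-ΔΔCt) form with 100% amplification efficiency, which is not stated explicitly in the text; the function name and the Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_b2m, ct_target_control, ct_b2m_control):
    """Relative expression by the delta-Ct approach (assumed 2^-ddCt form).

    Each Ct is normalized to the housekeeping gene (B2M), then the treated
    sample is expressed relative to the control sample.
    """
    delta_ct_sample = ct_target - ct_b2m
    delta_ct_control = ct_target_control - ct_b2m_control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2 ** (-delta_delta_ct)


# Hypothetical mean Ct values of triplicate reactions.
expression = relative_expression(
    ct_target=28.0, ct_b2m=20.0, ct_target_control=25.0, ct_b2m_control=20.0
)
print(expression)  # fraction of control expression remaining
```

With these example values the target crosses the threshold three cycles later than in the control after normalization, which corresponds to 12.5% of control expression.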

Table 2.11: Taqman assays used for gene expression analyses

Gene symbol   Gene name                                Assay ID        Exon boundary
B2M           Beta-2-microglobulin                     Hs99999907_m1   2-3
CCR5          Chemokine (C-C motif) receptor 5         Hs00152917_m1   2-3
CXCR4         C-X-C motif chemokine receptor 4         Hs00976734_m1   1-2
CCR1          Chemokine (C-C motif) receptor 1         Hs00174298_m1   1-2
CCR3          Chemokine (C-C motif) receptor 3         Hs00266213_s1   2-2
CCR2          Chemokine (C-C motif) receptor 2         Hs00174150_m1   2-3
CCRL2         Chemokine (C-C motif) receptor-like 2    Hs00243702_s1   2-2

2.8 CHROMATIN-BASED STUDIES AND WHOLE TRANSCRIPTOME ANALYSIS

2.8.1 Chromatin immunoprecipitation

For each sample, one million cells were transferred to a safe-lock tube and centrifuged for 5 minutes at 400 × g and 8°C. The supernatant was removed and the pellet resuspended in 900 µl of culture medium. To each tube 100 µl of 10% PFA (Thermo Fisher Scientific, Massachusetts, USA) freshly diluted in DPBS were added. The tubes were placed in a MACSmix Tube Rotator (Miltenyi Biotec, Bergisch Gladbach, Germany) and allowed to rotate for 10 minutes at room temperature. The formaldehyde was quenched by the addition of 100 µl of 1.5 M glycine and the samples again allowed to rotate for 10 minutes at room temperature. The samples were centrifuged at 600 × g for 10 minutes at 8°C then the supernatant removed. The pellets were washed with 500 µl of ice-cold DPBS then allowed to rotate for 5 minutes at 4°C. The samples were then centrifuged at 600 × g for 10 minutes at 8°C. The supernatant was removed completely then the pellet was snap-frozen in liquid N₂ and stored at -80°C.

Protein G Dynabeads (Thermo Fisher Scientific, Massachusetts, USA) were vortexed to resuspend then blocking was performed by transferring 15 µl per sample into a safe-lock tube containing 1 ml DPBS + 0.5% BSA. The beads were allowed to rotate for 5 minutes at 4°C then the tubes centrifuged briefly and placed in the DynaMag 2 magnet (Thermo Fisher Scientific, Massachusetts, USA). After 1 minute the supernatant was removed carefully and the wash repeated once more. The beads were resuspended in 200 µl of DPBS + 0.5% BSA per IP. Magnetic beads were bound with 3 µg of a ChIP-grade antibody against H3K9me3 (Abcam, Cambridge, UK) and incubation carried out at 4°C overnight. After incubation the tubes were placed on the magnet and the supernatant removed. The beads were then washed by the addition of 1 ml of DPBS + 0.5% BSA and rotation at 4°C for minutes. The wash was repeated once more with 1 ml of RIPA (50 mM Tris-HCl pH 8.0, 150 mM NaCl, 0.1% SDS, 0.1% Na-deoxycholate, 1% Triton X-100, 1 mM EDTA) then the supernatant removed.

Crosslinked cells were lysed directly in 300 µl of sonication buffer (10 mM Tris-HCl pH 8.0, 0.5% SDS, 5 mM EDTA) containing protease inhibitor and incubated for 10 minutes on ice. Sonication was performed on ice for 5 cycles at 10 seconds each (18% sonication amplitude) with 1 minute pauses on ice between cycles using the Bandelin Sonopuls sonicator (Bandelin, Berlin, Germany). Lysates were cleared by centrifugation at 16 000 × g for 5 minutes at 4°C and 4 parts dilution buffer (10 mM Tris-HCl pH 8.0, 1.25% Triton X-100, 0.125% Na-deoxycholate, 187.5 mM NaCl) containing protease inhibitor were added to 1 part sonicated chromatin. Lysates were either retained as the input (typically 10%) or incubated overnight at 4°C with the previously prepared magnetic beads. For each IP typically 400-600 µl of lysate were used. The next day, the beads were washed once with 500 µl of RIPA, once with 500 µl of RIPA 500 (50 mM Tris-HCl pH 8.0, 500 mM NaCl, 0.1% SDS, 0.1% Na-deoxycholate, 1% Triton X-100, 1 mM EDTA), once with 500 µl of LiCl wash (10 mM Tris-HCl pH 8.0, 250 mM LiCl, 0.5% NP-40, 0.5% Na-deoxycholate, 1 mM EDTA) and finally twice with 500 µl of TE (10 mM Tris pH 8.0, 1 mM EDTA). Bound complexes were eluted from the beads in 200 µl of elution buffer (10 mM Tris-HCl pH 8.0, 0.5% SDS, 300 mM NaCl, 5 mM EDTA) for 30 minutes at 65°C with shaking then crosslinks were reversed overnight at 65°C. The volume of the input was made up to 200 µl with elution buffer and crosslinks reversed in the same way. The next day, the tubes were placed in the magnet and the supernatant transferred to a fresh safe-lock tube. RNA was digested in the supernatant by the addition of 1 µl of RNase A (4 mg/ml) and incubation at 37°C for 30 minutes. Protein was digested by the addition of 1 µl of Proteinase K (10 mg/ml) and incubation at 55°C for 1-2 hours. The DNA was purified using the ChIP DNA Clean & Concentrator kit (Zymo Research, Freiburg im Breisgau, Germany) according to the manufacturer’s instructions and eluted in 60 µl of elution buffer (Qiagen, Hilden, Germany). A summary of the workflow is depicted in Figure 2.2.

[Figure 2.2 panel labels: Crosslinking of proteins and DNA; Immunoprecipitation; DNA purification & qRT-PCR]

Figure 2.2: Summary of the chromatin immunoprecipitation workflow

Crosslinked cells are sonicated to fragment the DNA and immunoprecipitation carried out using an antibody corresponding to the modification of interest. Following purification, the DNA can be used for downstream processes such as qRT-PCR.


For the quantitative RT-PCR, the input was diluted 1:2 and 3 µl of the input or IP DNA added to a reaction containing 0.5 µl of each primer (10 µM), 10 µl of the QuantiTect SYBR Green PCR Master Mix (Qiagen, Hilden, Germany) and 7 µl of nuclease-free water. Reactions were performed in duplicate or triplicate with the primers indicated in Table 2.12. The following parameters were used: 10 minute hold at 95°C then 40 cycles of 95°C for 15 seconds and 60°C for 1 minute. The percentage of input was calculated using the ΔCt method with the input as a normalizer then expressed relative to the negative control site actin. The UNTR5 site was used as a positive control. The negative and positive controls correspond to sites showing low or high levels of H3K9me3 respectively, as determined using the UCSC Genome Browser.
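The percent-of-input calculation can be sketched as follows. This is an assumption-laden illustration rather than the exact computation used here: it takes the widely used form in which the input Ct is first adjusted for the input fraction (10%) and its 1:2 dilution, assumes 100% amplification efficiency, and uses hypothetical function names and Ct values.

```python
import math


def percent_input(ct_input, ct_ip, input_fraction=0.10, input_dilution=2.0):
    """Percent of input recovered in the IP (assumed standard delta-Ct form).

    The input Ct is shifted to what 100% undiluted input would give, then
    compared with the IP Ct.
    """
    adjusted_input_ct = ct_input - math.log2(input_dilution / input_fraction)
    return 100.0 * 2 ** (adjusted_input_ct - ct_ip)


def enrichment_over_negative(pct_site, pct_negative_site):
    """Express a site relative to the negative control site (actin in the text)."""
    return pct_site / pct_negative_site


# Hypothetical Ct values.
pct_ccr5 = percent_input(ct_input=25.0, ct_ip=25.0)
pct_actin = percent_input(ct_input=24.0, ct_ip=27.0)
print(pct_ccr5, pct_actin, enrichment_over_negative(pct_ccr5, pct_actin))
```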

Table 2.12: Primers used for ChIP qRT-PCR analysis

ID        Purpose                      Sequence (5’-3’)
Actin_F   Actin qRT-PCR                AGAAGTCGCAGGACCACACT
Actin_R                                CAGCTCCAGGGTAAAAGGTG
UNTR5_F   UNTR5 qRT-PCR                CTGTACCTGGGGTTCATTCATT
UNTR5_R                                CAGTAAGCCGTTCACTCTCACA
3580      CCR5 qRT-PCR amplicon #1     AGTCTGACTACAGAGGCCACTGG
3581                                   AGGCAAATGAGACCCCAAACAGC
3584      CCR5 qRT-PCR amplicon #2     TGTATCTGGCATAGTGTGAGTCCTC
3585                                   AGTTTTAACTATGGGCTCACGGGTG
4118      CXCR4 qRT-PCR amplicon #1    GTAGCAAAGTGACGCCGAG
4119                                   AGCAGGTTGAAACTGGACTT
4120      CXCR4 qRT-PCR amplicon #2    TTGTTAAACTCTGTGCGGCC
4121                                   CACCACTCGATCCCCTCAG


2.8.2 ATAC-seq

Chromatin accessibility analysis was carried out with the Assay for Transposase-Accessible Chromatin and subsequent sequencing (ATAC-seq) through collaboration with the AG Bossen at the CCI (Figure 2.3). T cells nucleofected with DEM #6 or control mRNA from two independent experiments were harvested on day 12 or 13 and ATAC-seq performed as previously described (161) with minor changes. After lysis, cells were centrifuged at 300 × g for 30 minutes then tagmentation (transposase-based fragmentation) performed at 37°C for 1 hour using the Nextera DNA Sample Preparation Kit (Illumina, California, USA). Purification of the tagmented DNA was carried out using the ChIP DNA Clean & Concentrator columns. Amplification of library fragments was performed using the NEBNext Ultra II Q5 Master Mix and custom Nextera PCR primers. The number of cycles was determined by quantitative PCR as previously described (161). Libraries were purified and size-selected with AMPure XP beads (Beckman Coulter, California, USA) then sequenced on a HiSeq 2000 system (Illumina, California, USA) as single reads. Alignments to hg19 were carried out using Bowtie software (http://bowtie-bio.sourceforge.net/index.shtml) with the parameter -m 1. Data were analysed with the HOMER suite of tools (hypergeometric optimisation of motif enrichment) (http://homer.ucsd.edu/homer/). The parameter -tbp 1 was used to generate tag directories thus removing ‘reads’ arising from mitochondrial DNA. The Circos software package (http://circos.ca/software/) was used for data visualization and the data from the replicate experiments combined. To identify differences in regions of open chromatin, regions of open chromatin from Th1 cells (www.encodeproject.org/experiments/ENCSR000EQE/) were annotated with the tag directories. Potential off-targets were determined as sites showing a 3-fold difference between the control and DEM-treated cells in both replicates and a minimum of 4 normalized tags in control cells.
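The selection criteria for potential off-target sites can be written as a simple filter. The sketch below is illustrative only: the function name and tag counts are hypothetical, and it assumes that the 3-fold difference may be in either direction and that the minimum of 4 normalized control tags applies to each replicate, neither of which is spelled out in the text.

```python
def is_potential_off_target(replicates, min_fold=3.0, min_control_tags=4.0):
    """Apply the fold-difference and minimum-tag criteria to one region.

    replicates: list of (control_tags, treated_tags) pairs, one per replicate,
    given as normalized tag counts.
    """
    for control, treated in replicates:
        if control < min_control_tags:
            return False
        # Multiplicative comparison avoids dividing by a zero tag count.
        gained = treated >= control * min_fold
        lost = treated * min_fold <= control
        if not (gained or lost):
            return False
    return True


# Hypothetical normalized tag counts for three regions (two replicates each).
region_a = [(12.0, 3.0), (9.0, 2.5)]   # at least 3-fold lower in both replicates
region_b = [(12.0, 3.0), (9.0, 6.0)]   # second replicate changes too little
region_c = [(3.0, 0.5), (3.5, 0.5)]    # too few tags in control cells

for name, region in (("a", region_a), ("b", region_b), ("c", region_c)):
    print(name, is_potential_off_target(region))
```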


[Figure 2.3 panel labels: Closed chromatin; Open chromatin; Tn5 transposome; DNA purification, PCR & sequencing]

Figure 2.3: ATAC-seq principle and workflow

Living cells are harvested then lysed and Tn5 tagmentation carried out. Regions of open, accessible chromatin are preferentially cleaved and sequencing adapters ligated. The tagmented DNA can then be purified, PCR amplified and sequenced.

2.8.3 RNA-seq

Whole transcriptome analysis was carried out via RNA sequencing (RNA-seq). T cells nucleofected with mRNA encoding the DEM #6 or the inactive control were harvested four days post nucleofection and whole RNA isolated. RNA was obtained from three independent experiments and sequencing performed at the sequencing facility of the Center for Genomics and Transcriptomics (CeGaT, Tübingen, Germany). Data analysis included demultiplexing of the sequencing reads using Illumina CASAVA (2.17) and adaptor trimming with Skewer (162) (version 0.1.116). Trimmed raw reads were aligned to the human reference genome (hg19) using STAR (163) (version 2.5.1). Read counts were normalized using the R package DESeq2 and further analysis carried out with Microsoft Excel.

2.9 STATISTICAL ANALYSIS

All experiments were carried out at least three times unless otherwise specified. The error bars indicate the standard error of the mean (s.e.m.). Statistical significance was calculated using a two-tailed, unpaired Student’s t-test. In experiments requiring normalisation to a control sample a two-tailed, paired Student’s t-test was performed.
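For reference, the s.e.m. and the paired t statistic can be computed with the Python standard library as sketched below. The function names and example values are hypothetical; the two-tailed p-value would additionally require the t distribution with n - 1 degrees of freedom, which is left to a statistics package and not shown here.

```python
import math
import statistics


def sem(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


def paired_t_statistic(treated, control):
    """t statistic of a paired Student's t-test (n - 1 degrees of freedom)."""
    differences = [t - c for t, c in zip(treated, control)]
    return statistics.mean(differences) / sem(differences)


# Hypothetical measurements from three independent experiments.
treated_values = [2.0, 4.0, 6.0]
control_values = [1.0, 2.0, 3.0]
print(statistics.mean(treated_values), sem(treated_values))
print(paired_t_statistic(treated_values, control_values))
```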


3. RESULTS

The emergence and development of tools which allow for the specific epigenetic modification of human genes presents exciting opportunities in the fields of biology and medicine. This emergence was preceded by the discovery and exploitation of designer nuclease technology which enabled the precise and permanent modification of complex genomes. In the context of HIV, various strategies have been developed using the ZFN, TALEN and CRISPR/Cas9 platforms (reviewed in (80)), going as far as human clinical trials (66). However, while gaining widespread use and popularity, gene disruption strategies are still associated with detrimental effects attributed to nuclease-mediated off-target activity (87). The modulation of gene expression is also possible through the use of transcription factors which can directly up- or downregulate the expression of a gene of interest; however, their use has been restricted due to their transient activity. In contrast, epigenome modifications present the opportunity to achieve stable regulation of gene expression while leaving the DNA sequence unchanged. In so doing, targeted epigenome editing may improve the safety profile of gene therapy-based treatment strategies and facilitate advancement to the clinic and application to diverse disease models. This thesis presents a novel TALE-based epigenome editing platform which allows for the specific, potent and long-term repression of gene expression in clinically relevant cells without significant off-target effects.


3.1 FUNCTIONALITY OF THE DESIGNER TRANSCRIPTION ACTIVATORS

3.1.1 CCR5 target sites and Firefly Luciferase-based reporter

Potential CCR5 target sites were identified by examining the proximal promoter region at a distance of -400 to +1 bp relative to the transcription start site (TSS). Special consideration was given to DNase I hypersensitivity sites as an indicator of accessible chromatin and to sequences which fulfilled the 5’ T requirement for TALE DNA-binding domains (described in 2.2.1). Six sites targeting the + or - strand were chosen (Figure 3.1 (top)) and the corresponding TALE arrays generated. To generate the Firefly Luciferase-based reporter a 320 bp region of the endogenous CCR5 proximal promoter was PCR amplified from Jurkat cells and cloned upstream of a Mini-TK promoter driving the expression of the Firefly Luciferase gene, giving the Firefly Luciferase reporter (Figure 3.1 (middle)).

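[Figure 3.1 labels: CCR5 locus with 400 bp promoter region, target sites 1-6, exons 1, 2A, 2B and 3, and +1 (top); Firefly Luciferase reporter: Promoter, Mini-TK, Luciferase (middle); EGFP reporter: LTR, Promoter (320 bp), mCMV, EGFP, LTR (bottom)]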

Figure 3.1: Schematic of the CCR5 gene and the Firefly Luciferase and EGFP reporters

Regions of accessible chromatin within the CCR5 proximal promoter were identified and six target sites (1-6) targeting the + and - strand chosen. A DNase I hypersensitivity site is highlighted in grey. The target sites are indicated in red and the blue boxes represent exons. A 320 bp sequence from the endogenous CCR5 promoter was PCR amplified from Jurkat cells and cloned into a plasmid in which the Firefly Luciferase gene is driven by the Mini-TK promoter (middle). To generate the EGFP reporter the CCR5 promoter sequence was fused to a minimal CMV promoter driving the expression of EGFP (bottom).


3.1.2 Functionality and synergy of the DTAs

In order to validate the six chosen target sites initial experiments were carried out using the Firefly Luciferase reporter and designer transcription activators (DTAs) which would activate the expression of Firefly Luciferase upon binding of the TALE DNA-binding domain to its target within the CCR5 promoter sequence. In cells which do not express CCR5, such as HEK293T cells, the CCR5 promoter in conjunction with the Mini-TK promoter drives very low basal levels of Firefly Luciferase; therefore, this reporter provided an ideal system with which to test the activator constructs.

The DTAs were generated by fusing the corresponding TALE arrays to a VP16 transactivator domain (152) (Figure 3.2A (top)). Co-transfections of the Firefly Luciferase reporter, a Renilla Luciferase expression plasmid and 400 ng of each effector or a control plasmid lacking a CCR5-specific DNA-binding domain were carried out in HEK293T cells and the Dual Luciferase Assay performed after 24 hours. Of the six DTAs, three (DTAs #2, #5 and #6) resulted in significantly increased Firefly Luciferase activity compared to the control, with up to 4.5-fold activation (DTA #6) (Figure 3.2A (bottom)). In order to further characterise the activator constructs, transfections were carried out with combinations of two DTAs. Interestingly, the highest activation was observed each time in combination with the DTA targeting position 6 (combinations #2 + #6 and #5 + #6) (Figure 3.2B). Subsequently, these two combinations were combined with a third DTA in order to evaluate whether the extent of activation would increase. Indeed, the DTAs exhibited synergistic activity, with the combination of DTAs #2 + #5 + #6 resulting in the activation of Firefly Luciferase activity up to 27-fold (Figure 3.2C).


In order to evaluate whether this higher activation was a result of the synergistic activity of the DTAs acting simultaneously at distinct positions within the promoter and not simply because of the presence of three times the amount of activator, experiments were carried out in which HEK293T cells were transfected with triple the amount of each DTA. In this scenario the activation observed was very similar to that observed with one dose (i.e. 400 ng) of each DTA (Figure 3.2A), with activation of Firefly Luciferase activity not surpassing 5-fold (Figure 3.2D), thus indicating that the DTAs could indeed function in a synergistic manner.

To assess whether the activation observed was due to the transactivation domain, transfections were carried out with the corresponding DNA-binding domain fused to a FokI endonuclease domain. It has been shown in the case of CRISPR interference (CRISPRi) that the binding of a catalytically inactive ‘dead’ Cas9 is able to induce gene repression independent of an effector domain through blocking transcription initiation or elongation (164). However, in the case of the DTAs the TALE nuclease subunits could not induce Firefly activation above basal levels (Figure 3.2E), indicating that the activation observed in previous experiments was indeed a result of the VP16 transactivator. Taken together, these data show that all six DTAs were able to recognize their intended target sites and could activate CCR5 gene expression in a synergistic manner as previously shown (117).
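As a brief illustration of how such fold-activation values are typically derived from a Dual Luciferase readout, the sketch below normalizes Firefly to Renilla signal and expresses the result relative to the control. The normalization to Renilla is the usual convention and is assumed here rather than taken from the text; the function name and luminescence values are hypothetical.

```python
def fold_activation(firefly, renilla, firefly_control, renilla_control):
    """Firefly/Renilla ratio of a sample relative to that of the control."""
    sample_ratio = firefly / renilla
    control_ratio = firefly_control / renilla_control
    return sample_ratio / control_ratio


# Hypothetical luminescence readings (arbitrary units).
single_dta = fold_activation(9000.0, 1000.0, 2000.0, 1000.0)
print(single_dta)
```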


[Figure 3.2 panels: (A) VP16 DTA; (B) combinations of two DTAs (#1 + to #6 +); (C) #2 + #6 + and #5 + #6 + with a third DTA; (D) triple amount; (E) TALEN subunit. Y-axis in all panels: activation rel. to control]

Figure 3.2: Evaluation of the functionality of the designer transcription activators (DTAs)

(A) Assessment of DTA function and binding. DTAs were generated by fusing a TALE DNA-binding domain targeting positions #1 to #6 to a VP16 transactivator domain (top). Co-transfections with the Firefly Luciferase reporter, a Renilla Luciferase plasmid and 400 ng of each DTA were carried out in HEK293T cells. The Dual Luciferase Assay was performed after 24 hours. A shuttle plasmid containing the VP16 activator domain but lacking a CCR5-specific DNA-binding domain was used as a negative control. Activation is expressed relative to the control-treated cells (mean ± s.e.m., experiments were performed at least 3 times in duplicate). Statistical significance was calculated using a two-tailed paired Student’s t-test (*p<0.05, **p<0.01). (B) Assessment of the synergistic activity of the DTAs. HEK293T cells were transfected with combinations of two DTAs. The combinations resulting in the highest Firefly Luciferase activation (#2 + #6 and #5 + #6) are indicated in blue. (C) Evaluating the optimal DTA combinations for Firefly Luciferase activation. HEK293T cells were transfected with combinations of three DTAs. A third activator was added to the two best combinations of two DTAs. The best combination (#2 + #6 + #5) is highlighted in blue. (D) Assessing the effect of tripling the amount of effector targeting each position. Transfections were carried out with triple the amount of each DTA. (E) Characterizing DTA activity. Transfections were carried out with the corresponding TALE nuclease subunit for each target site. Activation is expressed relative to the control-treated cells (indicated by a dashed line).


3.2 GENERATION OF A GFP-BASED REPORTER AND REPORTER CELL LINE

As HEK293T cells do not express CCR5, a system was constructed to achieve basal levels of reporter gene expression and to enable assessment of the functionality of the repressors.

To this end, the previously described 320 bp CCR5 promoter fragment was transferred into a plasmid in which a minimal CMV promoter (minCMV) drives the expression of an enhanced green fluorescent protein (EGFP) gene, giving the EGFP reporter (Figure 3.1 (bottom)). This construct was then used as the transfer plasmid for the preparation of virus using the third-generation lentivirus system. The virus was titrated in HEK293T cells to establish the amount that would yield a low percentage of EGFP+ cells, thus indicating a low integrated copy number of the reporter in the HEK293T cells. Three days post transduction, flow cytometry was carried out and the percentage of EGFP+ cells ranged from 68.3% to 2.9%, corresponding to 200 μl and 1 μl of virus respectively (Figure 3.3A). The cell viability as indicated by the SSC/FSC plots was similar for all amounts of virus compared to the untreated cells. The cells transduced with 1 μl of virus, corresponding to an MOI of 0.03, were selected for expansion, and single-cell sorting was performed 17 days post transduction. The single cells were expanded and flow cytometry was then performed to identify the clones suitable for the establishment of a reporter cell line. Two examples of the clones obtained are given in Figure 3.3B. Clone A, with only ~69% EGFP+ cells, would have been discarded, whereas Clone B, with almost 100% EGFP+ cells, would have been retained; one such clone was expanded and is later referred to as the HEK293T-EGFP cell line.
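The link between a low percentage of EGFP+ cells and a low integrated copy number can be made explicit with a simple model. The sketch below assumes that the number of integration events per cell follows a Poisson distribution with a mean equal to the MOI; this model is a common approximation and an assumption here, not a calculation taken from this work, and the function names are hypothetical.

```python
import math


def fraction_transduced(moi: float) -> float:
    """Expected fraction of cells with at least one integration (Poisson model)."""
    return 1.0 - math.exp(-moi)


def single_copy_fraction_among_transduced(moi: float) -> float:
    """Expected fraction of transduced cells that carry exactly one integration."""
    exactly_one = moi * math.exp(-moi)
    return exactly_one / fraction_transduced(moi)


moi = 0.03  # MOI reported for the 1 µl condition
print(f"Expected transduced cells: {100 * fraction_transduced(moi):.2f}%")
print(f"Single-copy cells among transduced: "
      f"{100 * single_copy_fraction_among_transduced(moi):.1f}%")
```

Under this assumption, an MOI of 0.03 predicts close to 3% transduced cells, in line with the 2.9% EGFP+ cells measured for 1 μl of virus, and the large majority of those cells would be expected to carry a single copy of the reporter.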


Figure 3.3: Establishment of an EGFP reporter cell line

(A) Viral transduction titration. The EGFP reporter was used to generate VSV-G pseudotyped lentivirus and a titration was carried out in HEK293T cells. Transduction using 1 µl to 200 µl of virus was carried out using protamine sulfate, with spinoculation performed at 37°C for 1 hour. Flow cytometry was performed three days post transduction. Cell viability is given by the SSC/FSC plots (top) and the percentage of EGFP+ cells by the EGFP/FSC plots (bottom). Untransduced cells (NT) were included as a control. (B) Single-cell sorting to obtain the HEK293T-EGFP reporter cell line. The cells transduced with 1 µl of virus, corresponding to an MOI of 0.03, were expanded and single-cell sorting was performed 17 days post transduction. Two examples of the clones obtained are given. Clones such as Clone A, showing ∼69% EGFP+ cells, would have been discarded, and clones similar to Clone B, showing ∼100% EGFP+ cells, would have been retained to form the HEK293T-EGFP cell line.


3.3 REPRESSION OF CCR5 EXPRESSION IN A REPORTER CELL LINE

Once the reporter cell line had been established, the functionality of the designer transcription repressors (DTRs) could be assessed as the reduction of EGFP signal via flow cytometry. The DTRs were generated by fusing the previously described DNA-binding domains to the KRAB repressor domain (Figure 3.4A (top)). DNA transfections were carried out with either 400 ng of each of the six DTRs or a control plasmid lacking a CCR5-specific DNA-binding domain, and flow cytometry was carried out at specified time points. Seven days post transfection, the reduction of EGFP signal ranged from 10% to 20% for all the DTRs tested (Figure 3.4A (bottom)).

In a similar fashion to the activation experiments, transfections of different combinations of repressors were carried out. Initially, combinations of two repressors were tested. In this case repressors #3, #5 and #6 were selected for further testing based on their performance as single repressors (Figure 3.4A) and also taking into account the results from the activation experiments (Figure 3.2). In addition, the repressor targeting position 2 was included based on the results from the activation experiments, even though it seemed to exhibit lower functionality as a repressor (Figure 3.4A). Interestingly, the constructs targeting positions 5 and 6 again exhibited higher functionality (up to 25% reduction with #5 + #6), but in this case in combination with the DTR targeting position 3 instead of position 2 as observed in the activation experiments (Figure 3.4B). These data indicated that the repressors could function synergistically in a manner similar to the activators.

As a next step, combinations of three repressors were investigated. Combinations of DTRs targeting positions 2, 3, 5 and 6 were tested, and the best combinations from the activation experiments (#2 + #6 + #5, #2 + #6 + #1, #2 + #6 + #4) were also included. Interestingly, adding a third repressor did not increase the extent of repression, unlike what was previously observed with the activators (Figure 3.4C). Taken together, these data showed that with this platform up to two repressors were sufficient to induce repression of EGFP expression.

Constructs targeting position 6 of the CCR5 promoter were used for subsequent experiments owing to their consistently good performance in both the activation and repression experiments. Subsequently, the cells transfected with either the control plasmid or DTR #6 were monitored long-term for up to 31 days. Significant repression of EGFP expression could be observed until day 21, but by day 31 the levels of EGFP+ cells were similar to those of the control-treated cells, consistent with a transiently expressed repressor (Figure 3.4D).


[Figure 3.4, panels A to D: bar charts; y-axis in each panel: % of EGFP-positive cells. (A) single DTRs, with the KRAB DTR schematic on top; (B) combinations of two DTRs; (C) combinations of three DTRs; (D) mock vs DTR #6 at 4, 7, 11, 21 and 31 days post-transfection.]

Figure 3.4: Assessing the functionality of the designer transcription repressors (DTRs)

(A) Evaluating the activity of single DTRs. The DTRs were generated by fusing each TALE DNA-binding domain corresponding to positions 1-6 to a KRAB repressor domain (top). Transfections were performed in the HEK293T-EGFP reporter cell line and flow cytometry was carried out after 7 days. Cells transfected with a control plasmid containing the KRAB repressor domain but lacking a CCR5-specific DNA-binding domain were included as a negative control (mock). The dashed line represents the level of EGFP+ cells in the mock-treated cells (mean ± s.e.m., experiments were performed at least 3 times in duplicate). Statistical significance was calculated using a two-tailed unpaired Student’s t-test (**p<0.01). (B) Evaluating the synergistic activity of the DTRs. The HEK293T-EGFP cells were transfected with combinations of two DTRs targeting positions 2, 3, 5 and 6. Flow cytometry was carried out after 7 days to determine EGFP expression (mean ± s.e.m., experiments were performed at least twice in duplicate). (C) Evaluating the effect of three DTRs on EGFP expression. Transfections were carried out with combinations of 3 repressors targeting positions 2, 3, 5 and 6. The three best combinations from the activation experiments were included (#2 + #6 + #5, #2 + #6 + #1, #2 + #6 + #4). Flow cytometry was carried out after 7 days (mean ± s.e.m., experiments were performed at least twice in duplicate). (D) Assessing the long-term activity of the DTRs. HEK293T-EGFP cells transfected with DTR #6 or the control plasmid were monitored via flow cytometry up to 31 days post transfection (mean ± s.e.m., experiments were performed at least 3 times in duplicate). Statistical significance was calculated using a two-tailed unpaired Student’s t-test (**p<0.01).

3.4 GENERATION OF THE DEM CONSTRUCTS

Having observed the transient activity exhibited by the DTRs, the next step was to develop a platform which would enable the potent and stable repression of gene expression. To this end, DNA methyltransferases and the possibility of harnessing their silencing properties as previously described (165) were considered. Therefore, the designer methyltransferase (DMT), comprising the C-terminal domain of the human DNA methyltransferase 3A (DNMT3A) and the C-terminal domain of the murine DNA methyltransferase 3-like (Dnmt3L) regulatory factor, was generated in order to assess the silencing properties of the DNA methyltransferases alone. Furthermore, the designer epigenome modifier (DEM), which additionally contains an N-terminal KRAB repressor domain, was generated. To serve as a negative control, a construct lacking the KRAB domain and containing the inactivating E752A amino acid substitution in the catalytic site of the DNMT3A domain was generated, giving the inactive designer methyltransferase (dDMT) (Figure 3.5).

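[Figure 3.5: schematics of the three constructs, with the labels Linker, DNMT3A, dDNMT3A, Dnmt3L and KRAB. DMT: DNMT3A, Dnmt3L; dDMT: dDNMT3A, Dnmt3L; DEM: KRAB, DNMT3A, Dnmt3L.]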

Figure 3.5: Structure of the designer epigenome modifiers (DEMs)

The designer methyltransferase (DMT) (top) was generated by linking a TALE DNA-binding domain to a single chain fusion of the C termini of the human DNMT3A and murine Dnmt3L. An inactivating amino acid substitution was introduced into the catalytic domain of the DNMT3A domain to create the dDMT inactive control (middle). To generate the designer epigenome modifier (DEM) (bottom) the DNMT3A and Dnmt3L domains were combined with the KRAB repressor domain.

3.5 MRNA PRODUCTION VIA IN VITRO TRANSCRIPTION

In order to generate mRNA encoding the DEM constructs, the T7 promoter found in the common backbone shared by all the constructs was used. After plasmid linearization, in vitro transcription took place in two steps, which involved 5’ capping and poly(A) tailing of the transcript. After recovery via lithium chloride precipitation, the success of in vitro transcription could be assessed by loading the mRNA on an RNA formaldehyde gel alongside the untailed control mRNA (Figure 3.6A). A construct encoding GFP was included for later use to monitor transfection efficiency. For all constructs, the capped and tailed in vitro-transcribed mRNA appears as a distinct band and runs higher on the gel than the untailed control. Transfection of the GFP mRNA into HEK293T cells using Lipofectamine resulted in high levels of GFP expression (about 94%) two days post transfection (Figure 3.6B).


Figure 3.6: In vitro transcription of DEM mRNA

(A) Gel electrophoresis quality control of in vitro-transcribed mRNA. mRNA encoding GFP, dDMT and DEM is depicted. After in vitro transcription the mRNA was loaded onto an RNA formaldehyde gel alongside the corresponding untailed control mRNA. M: size marker. (B) Flow cytometry of HEK293T cells two days following transfection with mRNA encoding GFP. Untreated cells (top) and transfected cells (bottom) are shown.


3.6 LONG-TERM REPRESSION OF EGFP EXPRESSION IN A REPORTER CELL LINE

The in vitro-transcribed mRNA was delivered to the reporter cells via lipofection, and gene repression was evaluated as the decrease in EGFP+ cells via flow cytometry at specified time points. The DTR, DMT and DEM targeting position 6 were all able to induce significant silencing of EGFP expression after 6 days (Figure 3.7A). Notably, the DEM resulted in EGFP silencing in about 80% of the cells. The dDMT construct exhibited no silencing effect and was thus used as a negative control. Long-term monitoring of the cells revealed that the activity of the DTR was transient, as previously observed, but so was the activity of the DMT, which resulted in EGFP levels similar to the control dDMT construct after about one month. In contrast, the DEM induced potent and stable silencing of EGFP expression, which was maintained until the experiment was terminated after 65 days.

Having observed the difference in activity between the DTR, DMT and DEM constructs, this discrepancy was further investigated. In a previous study (165), the DNMT3A, 3L and KRAB components had been delivered on separate mRNA molecules; therefore, experiments were carried out to evaluate the merits, if any, of the new platform comprising all three components combined on one mRNA molecule. To this end, transfections were carried out in which either the dDMT targeting position 6, the DEM targeting position 6, the DTR targeting position 3, the DMT targeting position 6 or a combination of the DTR #3 and DMT #6 were delivered. As previously observed, the DMT and DTR alone exhibited transient activity (Figure 3.7B). Surprisingly, the two constructs delivered simultaneously resulted in potent and stable silencing of gene expression, albeit to a lesser extent than the DEM construct. Taken together, these data highlight the efficiency and superiority of the combined architecture compared to delivering the three components on distinct mRNA molecules.

[Figure 3.7, panels A and B: bar charts; y-axis: % of EGFP-positive cells; x-axis: days post transfection (2, 6, 13, 19, 23, 31, 65). (A) dDMT #6, DMT #6, DTR #6, DEM #6; (B) dDMT #6, DMT #6, DTR #3, DEM #6, DMT #6 + DTR #3.]

Figure 3.7: Assessing the stability and potency of designer epigenome modifier (DEM)-induced silencing

(A) Assessing the functionality of the DEMs in a reporter cell line. HEK293T-EGFP cells were transfected with in vitro-transcribed mRNA encoding the dDMT, DMT, DTR and DEM targeting position 6. Flow cytometry was carried out to determine the reduction of EGFP expression. (B) Assessing the functionality of the DEM split architecture. Transfections of the HEK293T-EGFP cells were carried out using mRNA encoding the dDMT #6, DEM #6, DMT #6, DTR #3 and a combination of the DMT #6 and DTR #3. The dDMT was used as a negative control (mean ± s.e.m., experiments were performed at least 3 times in duplicate). Statistical significance was calculated using a two-tailed unpaired Student’s t-test (*p<0.05, **p<0.01).

3.6.1 mRNA vs DNA delivery of DEMs resulted in potent silencing of gene expression

Having observed the potency of the DEMs delivered as mRNA, the next step was to determine if the method of delivery also contributed to their efficacy. To this end, DNA and mRNA transfections of the DEM targeting position 6 into the HEK293T-EGFP cells were performed and flow cytometry was carried out after 6 days. Indeed, there was a significant difference in the silencing observed with DNA compared to mRNA: reductions of 30% and 80% respectively were observed (Figure 3.8). The superiority of mRNA delivery was important because in our laboratory we have observed that DNA delivery into CD4+ cells results in high toxicity (data not shown); therefore, mRNA delivery was used for subsequent experiments.

[Figure 3.8: bar chart; y-axis: % of EGFP-positive cells on day 6 post transfection; dDMT #6 and DEM #6 delivered as DNA or mRNA.]

Figure 3.8: Assessing the effect of DNA vs mRNA DEM delivery

HEK293T-EGFP cells were transfected with DEM #6 and dDMT DNA or mRNA. Flow cytometry was carried out 6 days post transfection to determine EGFP silencing (mean ± s.e.m.). Statistical significance was calculated using a two-tailed unpaired Student’s t-test (**p<0.01).

3.7 DNA METHYLATION ANALYSIS IN A REPORTER CELL LINE

In order to demonstrate that EGFP silencing was indeed a result of DEM-induced DNA methylation, bisulfite sequencing analysis was carried out. In this assay genomic DNA is isolated from cells and subjected to bisulfite conversion. In this process unmethylated cytosines, including those in CpG dinucleotides, are converted to uracil while methylated cytosines remain unchanged. Following bisulfite conversion, the desired regions are amplified by PCR, during which the uracil residues (formerly cytosines) are replaced by thymines. The PCR products are then sub-cloned and, after transformation, individual colonies are picked and Sanger sequenced to evaluate the extent of DNA methylation using software such as QUMA (http://quma.cdb.riken.jp). CG to TG changes indicate unmethylated cytosines whereas unchanged CG dinucleotides represent methylated cytosines (Figure 3.9A). Therefore, HEK293T-EGFP cells were transfected with mRNA encoding DEM #6 or the control dDMT, and bisulfite sequencing analysis within a range of 500 bp from the DEM #6 binding site was carried out at specified time points. Two days post transfection about 60% CpG methylation could be observed (Figure 3.9B (top)). The average methylation increased to about 80% six days post transfection and remained stable until day 31, when the experiment was terminated (Figure 3.9B (bottom)). These data confirm that DEM-induced DNA methylation had taken place and highlight the fast kinetics of de novo DNA methylation which can be achieved using this platform. In addition, DNA methylation was maintained over multiple cell divisions, highlighting the stability of the epigenetic modifications effected by the DEMs.
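The read-out of the bisulfite assay described above can be sketched in a few lines. This is a simplified illustration, assuming that each clone sequence has already been aligned to the unconverted reference and has the same length; tools such as QUMA perform the alignment and quality filtering themselves. The function names and the short sequences are hypothetical and are not sequences from this work.

```python
def cpg_positions(reference: str) -> list:
    """Indices of the cytosine in every CG dinucleotide of the reference."""
    ref = reference.upper()
    return [i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"]


def call_methylation(reference: str, read: str) -> list:
    """One call per CpG: True = methylated (C), False = unmethylated (T), None = unclear."""
    if len(reference) != len(read):
        raise ValueError("read must be pre-aligned to the reference (same length)")
    read = read.upper()
    calls = []
    for i in cpg_positions(reference):
        if read[i] == "C":
            calls.append(True)    # cytosine protected from conversion
        elif read[i] == "T":
            calls.append(False)   # converted to uracil, amplified as thymine
        else:
            calls.append(None)    # sequencing error or mismatch
    return calls


def percent_methylation(reference: str, reads: list) -> float:
    """Percentage of methylated CpGs over all informative CpG calls."""
    calls = [c for read in reads for c in call_methylation(reference, read)
             if c is not None]
    if not calls:
        raise ValueError("no informative CpG calls")
    return 100.0 * sum(calls) / len(calls)


# Hypothetical reference with three CpGs and three sequenced clones.
reference = "ATCGGACGTTCGA"
clones = ["ATCGGACGTTCGA", "ATTGGACGTTTGA", "ATTGGATGTTTGA"]
print(f"CpG methylation: {percent_methylation(reference, clones):.1f}%")
```

Averaging over all CpGs and clones in this way gives a single percentage per amplicon, while the per-clone calls correspond to the individual cytosine pattern shown for each sequenced colony.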

It is known that DNA methylation has the capacity to spread, owing to the propensity of the DNA methyltransferases to induce de novo DNA methylation of CpG dinucleotides near pre-existing methylated cytosines (166); therefore, methylation spreading in the reporter cells was evaluated. To this end, different amplicons at increasing distances from the DEM #6 binding site were investigated and bisulfite analysis was carried out one month post transfection. An average of about 80% CpG methylation could be detected up to a distance of 1 kb from the binding site, whereas methylation decreased to about 33% at a distance of 2 kb (Figure 3.9C). Taken together, these data show that DEMs are capable of inducing DNA methylation over a wide range, but at the same time this range may not be so wide as to cause unwanted off-target effects.


As previously mentioned, the classical strategies based on gene disruption result in irreversible changes to the genomic sequence, which can be deleterious when occurring at off-target sites. Apart from not altering the DNA sequence, epigenome modifications may have the added advantage of being reversible. Agents such as 5-AZA block de novo DNA methylation and have previously been used to reverse the effects of the DNA methyltransferases (167). In order to evaluate whether DEM-induced DNA methylation was reversible, FACS of the HEK293T-EGFP cells treated with the DEM #6 was carried out to enrich for the EGFP-negative cells. Reactivation was then carried out in two ways: (1) transfections were carried out with 1150 ng of the DTA #6 or the control plasmid lacking a CCR5-specific DNA-binding domain; (2) the cells were treated daily with 10 μM of the DNA demethylating agent 5-AZA or left untreated. Flow cytometry was carried out on day 3 and day 6 and reactivation was evaluated as an increase in EGFP signal. On day 3 there was a significant increase in EGFP-positive cells with both the DTA transfection (2-fold) and the 5-AZA treatment (12-fold) compared to the mock or untreated cells respectively (Figure 3.9D). However, by day 6 only the 5-AZA-treated cells still exhibited significant EGFP activation, with reactivation increasing to 20-fold. Taken together, these data suggest that transcriptional activation alone was not able to fully overcome the DEM-induced epigenetic modifications and that the inhibition of de novo DNA methylation was critical in restoring EGFP expression.


[Figure 3.9, panels A to D. (A) bisulfite sequencing schematic; (B) methylation of individual CpGs (methylated vs unmethylated) in a 500 bp region around the DEM #6 binding site on day 2, and bar chart of % of CpG methylation for dDMT #6 and DEM #6 at 2, 6 and 31 days post transfection; (C) schematic of the integrated reporter (LTR, Prom., mCMV, EGFP, LTR; 4.8 kb) and bar chart of % of CpG methylation against distance from the binding site (kb); (D) DTA schematic (TALE-DBD, VP16) and bar charts of fold activation for mock vs DTA #6 (days post transfection) and NT vs 5-AZA (days post treatment).]

Figure 3.9: DNA methylation analysis in a reporter cell line

(A) Bisulfite sequencing schematic. Bisulfite conversion of genomic DNA results in the conversion of unmethylated cytosines to uracil residues whereas methylated cytosines remain cytosines. After PCR amplification the uracils are replaced by thymines and, in comparison to a reference sequence, CG and TG dinucleotides indicate methylated and unmethylated cytosines respectively. (B) Assessing DEM-induced methylation in a reporter cell line. Bisulfite sequencing was performed on day 2, 6 and 31 in HEK293T-EGFP cells transfected with either DEM or dDMT mRNA. The methylation of individual cytosine residues in a 500 bp region encompassing the DEM #6 binding site is indicated (top) and the overall methylation is presented in the histogram (bottom) (mean ± s.e.m.). (C) Assessing the spreading of DEM-induced DNA methylation. Bisulfite sequencing was carried out up to a distance of 2 kb up- and downstream from the DEM #6 target site in HEK293T-EGFP cells transfected with the dDMT or DEM #6. Analysis was performed 31 days post transfection. The integrated reporter and the analysed region are depicted (top) and the methylation analysis is summarised in the histogram (bottom) (mean ± s.e.m.). (D) Assessing the reactivation of EGFP expression. Fluorescence-activated cell sorting (FACS) was carried out on cells transfected with mRNA encoding the DEM #6 to enrich for the EGFP-negative cells. Reactivation was performed either via transfection with the designer transcription activator (DTA) (depicted top) targeting position 6 (left) or via treatment with 10 µM of the DNA demethylating agent 5-AZA (right). Flow cytometry was carried out 3 or 6 days following transfection or the beginning of 5-AZA treatment to determine the extent of EGFP reactivation. The DTA-induced reactivation is expressed relative to a control plasmid containing the VP16 activator domain but lacking a CCR5-specific DNA-binding domain. The 5-AZA reactivation is expressed relative to untreated cells. The dashed line represents the reference value in the control-treated cells (mean ± s.e.m.). Statistical significance was calculated using a two-tailed paired Student’s t-test (**p<0.01).

3.8 DEM-INDUCED GENE SILENCING IN HEK293T CELLS

Having demonstrated the functionality of the DEMs in a reporter system and at an integrated reporter gene, the next step was to evaluate activity in a chromosomal context. To this end, four DEM constructs targeting the first exon and intron of the CXCR4 gene were generated. L1 and L2 target the + strand and R1 and R2 target the - strand (Figure 3.10A). CXCR4 encodes the CXCR4 receptor, which T-tropic strains of HIV use for viral entry; therefore, it has also been targeted in efforts to develop treatment strategies against HIV infection. However, unlike CCR5 it is crucial for several processes including organogenesis and embryogenesis (168); therefore, the complete knockout of CXCR4 expression would be feasible only in T cell-based therapies.

HEK293T cells express CXCR4 and presented an ideal system in which to test the DEMs. Therefore, mRNA transfections were carried out and CXCR4 silencing was evaluated by measuring the mRNA and protein levels via quantitative RT-PCR and flow cytometry respectively. Two days post transfection all four constructs significantly decreased mRNA levels, by about 3.4-fold compared to the inactive control construct (Figure 3.10B). Similarly, the surface expression of the receptor was decreased by an average of 1.8-fold. However, by day 20 there was only a significant difference in protein levels compared to the control (Figure 3.10C). Nevertheless, the trend of reduced transcript in the DEM-treated cells persisted. As the DEM #R2 seemed to exhibit the best activity, this construct was selected for subsequent analyses and experiments.
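Relative transcript levels such as those reported above are commonly derived from qRT-PCR threshold cycles. The following is a minimal sketch, assuming the comparative Ct (2^-ΔΔCt) method with B2M as the reference gene and perfect amplification efficiency; the exact quantification procedure is an assumption here, and the function name and Ct values are hypothetical, not measurements from this work.

```python
def relative_expression(ct_target: float, ct_reference: float,
                        ct_target_control: float, ct_reference_control: float) -> float:
    """Target expression in a sample relative to a control, by the 2^-ddCt method.

    Each target Ct is first normalized to the reference gene (here B2M) measured
    in the same sample; the sample is then compared with the control.
    """
    delta_sample = ct_target - ct_reference
    delta_control = ct_target_control - ct_reference_control
    return 2.0 ** -(delta_sample - delta_control)


# Hypothetical Ct values: CXCR4 and B2M in DEM-treated and dDMT-treated cells.
level = relative_expression(ct_target=26.0, ct_reference=18.0,
                            ct_target_control=24.0, ct_reference_control=18.0)
print(f"Relative CXCR4 mRNA level: {level:.2f}")
print(f"Fold decrease: {1 / level:.1f}")
```

A relative level below 1 corresponds to silencing; its reciprocal gives the fold decrease in the form used in the text.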


To demonstrate again that CXCR4 silencing was a result of DEM-induced methylation, bisulfite sequencing analysis was carried out. The methylation 20 days post transfection was assessed in four regions centred 400 bp around the binding site for the DEM #R2. About 1% methylation was observed in the control-treated cells whereas the DEM-treated cells revealed up to 22% CpG methylation (Figure 3.10D), thus confirming that the silencing observed was again a result of DEM activity. In addition, these results demonstrate that DEM-induced modifications are maintained across multiple cell divisions in the context of endogenous genes as well. Taken together, these results demonstrate that the DEMs exhibit functionality in reporter systems as well as at clinically relevant endogenous genes.


[Figure 3.10, panels A to D. (A) schematic of the CXCR4 gene (exons 1 and 2) with the L1, L2, R1 and R2 target sites; (B) relative mRNA levels and (C) relative protein levels for the control (-), L1, L2, R1 and R2 on day 2 and day 20; (D) schematic of amplicons #1 to #4 (400 bp) around the R2 binding site and bar chart of % of CpG methylation per amplicon for dDMT and DEM #R2.]

Figure 3.10: DEM-induced silencing of the endogenous CXCR4 gene

(A) Schematic of the CXCR4 gene. Four DEM constructs targeting the + (L1 & L2) and – (R1 & R2) strand of the first exon and intron of the CXCR4 gene were generated. Target sites are indicated in red. Blue boxes represent exons. (B)(C) Assessing DEM-induced CXCR4 silencing. HEK293T cells were transfected with mRNA encoding the 4 DEM constructs, and qRT-PCR analysis (B) and flow cytometry (C) were carried out 2 and 20 days post transfection to determine CXCR4 mRNA and protein levels respectively. The dDMT L1 was used as a negative control. mRNA levels are expressed relative to B2M and protein levels are expressed relative to untreated cells (mean ± s.e.m., n=6 (B), n=5 (C)). Statistical significance was calculated using a two-tailed paired Student’s t-test (*p<0.05, **p<0.01). (D) CXCR4 bisulfite sequencing analysis. Bisulfite sequencing was performed 20 days following transfection with the DEM R2 or the L1 inactive control in a region of 400 bp encompassing the R2 binding site (mean ± s.e.m., n=4). Statistical significance was calculated using a two-tailed paired Student’s t-test (*p<0.05, **p<0.01).


3.9 DEM-INDUCED GENE SILENCING IN CD4+ CELLS

Once the functionality of the DEMs in reporter systems and in a cell line had been demonstrated successfully, DEM activity in clinically relevant primary human T cells was evaluated. In order to circumvent the high cost and side effects associated with antiretroviral treatment (169, 170), focus has turned towards developing gene therapy-based treatment strategies against HIV infection. A popular approach has been the designer nuclease-mediated disruption of CCR5 to generate immune cells resistant to HIV infection (66). This study sought to evaluate DEM-induced gene silencing in the context of efficacy and safety, thus presenting an alternative to the classical gene knockout approaches.

As a first step towards testing the DEMs, culture conditions to enable long-term culture of the T cells with high viability were developed (Figure 3.11A). These culture conditions involved activating the cells on the day of thawing with beads conjugated with antibodies against CD2, CD3 and CD28. The activation beads were removed three days later, on the day of nucleofection. Following nucleofection the cells were passaged every three days and cultured in medium supplemented with IL2. Harvesting for analysis and reactivation were performed every seven days post thawing, with the final analysis only being done on cells harvested on day 7 and day 21. Activation was carried out for three days each time. Having established the optimal culture conditions, nucleofections were carried out with mRNA encoding the best-performing CCR5- and CXCR4-specific DEMs as well as a GFP reporter to monitor transfection efficiency. After 24 hours flow cytometry was carried out and about 50% cell death was observed as a result of the nucleofection. However, over 90% of the remaining cells were GFP positive, indicating that the nucleofection had been successful (Figure 3.11B).


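[Figure 3.11, panels A and B. (A) timeline over days 0, 3, 7, 10, 14, 17 and 21 post thawing, marked with the events T, A, B, N and H; (B) flow cytometry plots (SSC/FSC and GFP/FSC) for untreated and nucleofected cells; gate values as extracted: untreated 67.5% and 0.3%, nucleofected 35.9% and 93.2%.]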

Figure 3.11: T cell long-term culture conditions and nucleofection

(A) Time line of T cell experiments. On day 0 cells were thawed (T) and activated (A) with beads conjugated with antibodies against CD2, CD3 and CD28 for 3 days. The beads (B) were removed prior to nucleofection and 3 days after each successive reactivation. Nucleofections (N) were carried out with in-vitro transcribed mRNA encoding the DEM and corresponding control dDMT. Cells were harvested (H) for analysis every 7 days and reactivated to continue the culture. (B) Assessing nucleofection efficiency in CD4+ cells. After three days of activation CD4+ cells were nucleofected with GFP mRNA and flow cytometry performed after 24 hours to determine GFP expression. Cell viability is given by the SSC/FSC plot and GFP expression is given by the GFP/FSC plot. Untreated cells were included as a negative control.


Four days post nucleofection, mRNA and protein levels were evaluated via qRT-PCR analysis and flow cytometry respectively. In cells treated with DEM #6, a 1.8-fold decrease in mRNA (Figure 3.12A) and a 1.6-fold decrease in protein levels (Figure 3.12B) were observed, whereas in cells which received DEM R2 no significant difference was observed in mRNA (Figure 3.12C) or protein levels (Figure 3.12D) compared to the inactive control. However, when the cells were evaluated after 21 days in culture, the silencing effect of DEM #6 had been lost (Figure 3.12 A & B). Surprisingly, at this later time point a 1.6-fold reduction in both mRNA and protein levels could be observed in cells which had received DEM R2 (Figure 3.12 C & D). A comparison of the CCR5 and CXCR4 levels on day 7 revealed that basal expression of CXCR4 transcripts is 20-fold higher than that of CCR5 (Figure 3.12E). This may have been the reason why only a moderate effect on CXCR4 was observed at the earlier time point. Furthermore, it is also possible that the kinetics of DEM silencing at these two loci are different, resulting in a delayed effect at the CXCR4 gene.

As done with the DTAs and DTRs, the synergistic effect of the DEMs in CD4+ cells was investigated. Cells received either the DEMs #5 and #6 or the DEMs L2 and R2, targeting CCR5 and CXCR4 respectively. Interestingly, in this context the addition of a second DEM did not have any additional effect on mRNA or protein levels for either CCR5 (Figure 3.12 A & B) or CXCR4 (Figure 3.12 C & D), suggesting that one DEM may be sufficient to induce effective silencing.
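The fold changes quoted above can be related to normalised qRT-PCR readouts with a short calculation. The following is a minimal sketch only: it assumes the common 2^-ddCt approach with B2M as reference gene, and all Ct values and function names are hypothetical rather than taken from this work.

```python
def relative_expression(ct_target, ct_reference, ct_target_ctrl, ct_reference_ctrl):
    """Expression in treated cells relative to control cells (2^-ddCt, assumed method)."""
    d_ct_treated = ct_target - ct_reference            # normalise to reference gene (e.g. B2M)
    d_ct_control = ct_target_ctrl - ct_reference_ctrl  # same normalisation for the control sample
    return 2.0 ** -(d_ct_treated - d_ct_control)


def fold_decrease(relative_level):
    """Convert a relative level (control = 1.0) into an x-fold decrease."""
    return 1.0 / relative_level


# Hypothetical Ct values: the target transcript appears one cycle later in treated cells
rel = relative_expression(ct_target=26.0, ct_reference=18.0,
                          ct_target_ctrl=25.0, ct_reference_ctrl=18.0)
print(rel, fold_decrease(rel))
```

Under this convention, a relative level of about 0.56 corresponds to the 1.8-fold decrease reported for CCR5 transcripts.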

In addition, the functionality of the split platform previously described was investigated. Surprisingly, in the context of the T cells no significant differences in mRNA (Figure 3.12F) or protein levels (Figure 3.12G) could be observed between cells treated with the split DEM components and control-treated cells, in contrast to the reporter experiments (Figure 3.7). Taken together, these data demonstrate that the DEMs are functional in primary human cells, albeit with different kinetics at the CCR5 and CXCR4 genes, and also highlight the importance and merits of delivering the silencing components on one mRNA molecule.

[Figure 3.12, graphical panels. (A, C) Relative mRNA levels and (B, D) relative protein levels on day 7 and day 21 for CCR5 (dDMT #6, DEM #6, DEMs #5 + #6) and CXCR4 (dDMT R1, DEM R2, DEMs L2 + R2). (E) Relative mRNA levels of CCR5 and CXCR4. (F) Relative mRNA levels and (G) relative protein levels on day 7 and day 21 for dDMT #6, DEM #6 and DTR #3 + DMT #6.]


Figure 3.12: Functionality of the DEMs in CD4+ T cells

(A) Assessing the reduction in CCR5 transcript and (B) protein levels following DEM delivery. CD4+ cells were nucleofected with mRNA encoding the DEM or dDMT targeting position 6. (C) Assessing the reduction in CXCR4 transcript and (D) protein levels following DEM delivery. CD4+ cells were nucleofected with mRNA encoding the DEM R2 or dDMT R1. Analysis via qRT-PCR and flow cytometry was carried out on day 7 and day 21. mRNA levels (relative to B2M) and protein levels are expressed relative to the control treated cells (-) (mean ± s.e.m, experiments were performed at least 3 times). Statistical significance was calculated using a two-tailed paired Student's t-test (*p<0.05, **p<0.01). (E) Assessing the basal CCR5 and CXCR4 expression levels in CD4+ cells. Cells nucleofected with the corresponding dDMT were harvested after 7 days in culture and qRT-PCR analysis carried out to determine transcript levels (mean ± s.e.m, n=9). (F)(G) Assessing the functionality of the split DEM architecture in CD4+ cells. CD4+ cells were nucleofected with mRNA encoding either the DEM or dDMT targeting position 6 or a combination of mRNA encoding the DTR #3 and DMT #6. Analysis via qRT-PCR and flow cytometry was carried out on day 7 and day 21. (F) mRNA levels (relative to B2M) and (G) protein levels are expressed relative to the control treated cells (-) (mean ± s.e.m, experiments were performed at least 3 times). Statistical significance was calculated using a two-tailed paired Student's t-test (*p<0.05, **p<0.01).
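The two-tailed paired Student's t-test named in the legend can be illustrated as follows. This is a sketch under stated assumptions, not the analysis pipeline used in this work: the replicate values are hypothetical, the function name is illustrative, and significance is judged against the tabulated critical value for 2 degrees of freedom (4.303 at p = 0.05, two-tailed) instead of computing an exact p-value.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(treated, control):
    """Return (t, degrees of freedom) for paired samples from the same experiments."""
    diffs = [t - c for t, c in zip(treated, control)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)  # s.e.m. of the paired differences
    return mean(diffs) / standard_error, n - 1


T_CRIT_DF2_P05 = 4.303  # two-tailed critical value, alpha = 0.05, df = 2

# Hypothetical relative expression values from three paired experiments
treated = [0.50, 0.60, 0.55]
control = [1.00, 1.05, 0.95]

t_stat, df = paired_t_statistic(treated, control)
significant = abs(t_stat) > T_CRIT_DF2_P05
print(round(t_stat, 3), df, significant)
```

Pairing treated and control samples from the same experiment removes donor-to-donor variation from the comparison, which matters when only three replicates are available.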

3.9.1 Multiplex gene silencing in CD4+ cells

As previously mentioned, both the CCR5 and CXCR4 co-receptors are required for HIV viral entry. Therefore, the possibility of multiplexing and silencing both genes simultaneously was explored. Multiplexing has previously been achieved in the context of HIV through a simultaneous knock-out of the CXCR4 and CCR5 genes in CD4+ cells using the CRISPR/Cas (171) and ZFN (81) platforms. The benefits of multiplexing include the targeting of multiple genes with a single delivery, an important characteristic when handling sensitive cells and/or delivery methods which result in high levels of cell death. To this end, T cells were nucleofected with mRNA encoding both DEM #6 and DEM R2. As previously demonstrated, a significant 2.6-fold reduction in CCR5 transcript levels at day 7 and a 1.8-fold reduction in CXCR4 transcript levels at day 21 were observed (Figure 3.13A). Additionally, a 2-fold reduction in CCR5 and CXCR4 surface expression at the corresponding time points could be achieved (Figure 3.13B). Therefore, it could be demonstrated that DEMs are amenable to multiplexing, thereby increasing the versatility and functionality of this platform.


[Figure 3.13, graphical panels. (A) Relative mRNA levels and (B) relative protein levels of CCR5 and CXCR4 on day 7 and day 21 in cells treated with dDMTs or DEMs.]

Figure 3.13: Assessing simultaneous targeting of the CCR5 and CXCR4 genes using DEMs

CD4+ cells were nucleofected with mRNA encoding the DEMs #6 and R2 or each corresponding control. (A) mRNA levels (relative to B2M) and (B) protein levels are expressed relative to the control treated cells (-) (mean ± s.e.m, experiments were performed at least 3 times). Statistical significance was calculated using a two-tailed paired Student's t-test (*p<0.05, **p<0.01).

3.9.2 DNA methylation analysis in CD4+ cells

To evaluate the extent of CpG methylation, bisulfite sequencing analysis via NGS was carried out for cells which had received the DEM #6 or the inactive control. Combining these two techniques allowed a broader and more accurate analysis than methylation analysis via Sanger sequencing, which can be very laborious. Initial analysis was carried out 11 days post nucleofection at six amplicons (#4-9) in a region of 5 kb centred around the binding site of the DEM #6. Within this region a significant increase in CpG methylation was observed in DEM-treated cells compared to cells which received the inactive control (Figure 3.14A). Amplicons up to a distance of 9 kb up- and downstream from the DEM #6 binding site (amplicons #1-3 and #10-12) were also evaluated and did not show significant differences in methylation (Figure 3.14C). Similarly, bisulfite analysis was performed 18 days post nucleofection in cells which had received either the DEM R2 or the inactive control, in a region of 400 bp surrounding the DEM R2 binding site. In this region up to a 12-fold increase in CpG methylation was observed in DEM-treated cells compared to the control (Figure 3.14B). Taken together, these data demonstrate that DEMs are capable of inducing potent and stable DNA methylation in primary human cells.
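The percentage of CpG methylation reported here can be derived from bisulfite sequencing reads, in which unmethylated cytosines are read as T and methylated cytosines remain C. The sketch below illustrates that calculation under simplifying assumptions: the read counts are hypothetical, the function names are illustrative, and read filtering and conversion-efficiency checks are omitted.

```python
def percent_methylation(c_reads, t_reads):
    """Methylation at one CpG: fraction of reads retaining C after bisulfite conversion."""
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("no reads cover this CpG")
    return 100.0 * c_reads / total


def amplicon_methylation(cpg_counts):
    """Average methylation over all CpGs of an amplicon, given (C reads, T reads) pairs."""
    return sum(percent_methylation(c, t) for c, t in cpg_counts) / len(cpg_counts)


# Hypothetical (C reads, T reads) per CpG for one amplicon
control_counts = [(5, 95), (10, 90)]
treated_counts = [(60, 40), (90, 10)]

control_pct = amplicon_methylation(control_counts)
treated_pct = amplicon_methylation(treated_counts)
fold_increase = treated_pct / control_pct
print(control_pct, treated_pct, fold_increase)
```

A fold increase computed this way depends strongly on the baseline, so a large fold change at a nearly unmethylated locus may still reflect a moderate absolute methylation level.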

[Figure 3.14, graphical panels. (A) Schematic of the CCR5 locus with amplicons #4-#9 in a 5 kb region around the DEM #6 binding site, with a heat map of individual CpGs and a histogram of % CpG methylation for dDMT #6 and DEM #6. (B) Schematic of the CXCR4 locus with amplicons in a 400 bp region around the DEM R2 binding site and % CpG methylation for dDMT R1 and DEM R2. (C) % CpG methylation at CCR5 amplicons #1-#3 and #10-#12, located between -9.0 and +9.0 kb from the DEM #6 binding site, for dDMT #6 and DEM #6.]

Figure 3.14: Methylation analysis at the CCR5 and CXCR4 genes in CD4+ cells

(A) Next-generation bisulfite sequencing and (B) Sanger sequencing were performed to determine the extent of methylation at the CCR5 and CXCR4 genes respectively. Analysis was carried out in cells nucleofected with either the DEM #6, R2 or the corresponding inactive control. The amplicons in a region of 5 kb or 400 bp encompassing the DEM target site are indicated for CCR5 and CXCR4 respectively. A heat map showing the methylation of individual CpGs (top) and a histogram summarising the methylation data (bottom) are shown for CCR5 (mean ± s.e.m, n=3). (C) Assessing methylation spreading at the CCR5 gene in CD4+ cells. Next-generation bisulfite sequencing analysis was carried out in amplicons at a distance of up to 9 kb up- and downstream from the DEM #6 target site. The amplicons investigated are indicated (mean ± s.e.m, n=3). Statistical significance was calculated using a two-tailed unpaired Student's t-test (*p<0.05, **p<0.01).


3.9.3 DEM-mediated histone modifications

In order to further characterise the effect of the DEMs in primary cells, the histone modifications associated with DEM activity were also evaluated via chromatin immunoprecipitation. Chromatin immunoprecipitation is an assay which is typically used to investigate interactions between proteins and DNA. Importantly, it enables the evaluation of genomic sites associated with specific transcription factors and also allows for the monitoring of histones harbouring posttranslational modifications at specific loci (172, 173).

In this study, H3K9me3, the trimethylation of lysine 9 of histone 3, was analysed. H3K9me3 is a posttranslational modification which is associated with the formation of heterochromatin and transcriptional repression (reviewed in (174)). In this context, the assay therefore allowed for the determination of whether and where the H3K9me3 repressive mark was established within the regions of interest. To this end, cells were harvested four or eleven days following nucleofection with DEM #6 or R2 respectively, as well as the corresponding controls. The cells were fixed with formaldehyde and chromatin immunoprecipitation was carried out using an antibody against H3K9me3. Two amplicons were analysed for each locus and a significantly increased signal could be observed at amplicon #2 for CCR5 and at amplicon #1 for CXCR4 (Figure 3.15 A & B). In both cases the amplicon in which a significant difference was detected was also the amplicon closest to the DEM binding site. No significant differences between DEM-treated and control-treated cells were observed at the unrelated untranscribed region 5 (UNTR5). This region was chosen as a positive control as it shows a high H3K9me3 signal in primary human cells as determined using the UCSC Genome Browser. Taken together, these data suggest that in addition to DNA methylation the DEMs are also capable of inducing histone modifications which may contribute to the stability of the silenced state of these two genes.
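ChIP-qPCR signals of this kind are commonly expressed as a percentage of input and then related to a control locus. The following sketch shows one widely used way of doing so; it is an illustration under assumptions, since the exact calculation is not described here, and the Ct values, the 1% input fraction and the function names are hypothetical.

```python
import math


def percent_input(ct_ip, ct_input, input_fraction):
    """Percent-of-input for a ChIP sample, correcting the input Ct for its dilution."""
    # An input representing e.g. 1% of the chromatin is shifted by log2(100) cycles
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_ip)


def relative_to_control_locus(pct_target, pct_control_locus):
    """Express the signal at a target amplicon relative to a control locus (e.g. actin)."""
    return pct_target / pct_control_locus


# Hypothetical Ct values with a 1% input sample
target_pct = percent_input(ct_ip=28.0, ct_input=25.0, input_fraction=0.01)
actin_pct = percent_input(ct_ip=30.0, ct_input=25.0, input_fraction=0.01)
enrichment = relative_to_control_locus(target_pct, actin_pct)
print(target_pct, actin_pct, enrichment)
```

With these hypothetical numbers the target amplicon gives 0.125% of input, four times the signal at the control locus.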


[Figure 3.15, graphical panels. (A) CCR5 and (B) CXCR4 locus schematics showing the DEM binding sites (#6 and R2) and the positions of amplicons #1 and #2, with the H3K9me3 ChIP signal (% of input, relative to actin) at UNTR5 and at amplicons #1 and #2 for dDMT- and DEM-treated cells.]

Figure 3.15: Assessing DEM-induced histone modifications in CD4+ cells

(A) CCR5 and (B) CXCR4 chromatin immunoprecipitation was carried out using an antibody against H3K9me3. Cells were nucleofected with either the DEM #6, R2 or the corresponding inactive control. Analysis was carried out 4 or 11 days post nucleofection for CCR5 and CXCR4 respectively. The percentage of input is expressed relative to the negative control gene actin, and UNTR5 was included as a positive control for H3K9me3 in CD4+ cells. The amplicons investigated are indicated (mean ± s.e.m, n=3). Statistical significance was calculated using a two-tailed paired Student's t-test (*p<0.05).

3.10 SPECIFICITY PROFILE OF THE DEMS

As a first step towards investigating the specificity and thus safety profile of the DEMs, the consequences of possible methylation spreading on nearby genes were evaluated. To this end, the expression levels of four genes within a 200 kb range from the DEM #6 binding site were measured, namely CCR2, CCR1, CCR3 and CCRL2. Cells were nucleofected with mRNA encoding either DEM #6 or the inactive control and qRT-PCR was performed four days post nucleofection. As expected, no significant differences in expression could be observed when comparing the DEM- and control-treated cells (Figure 3.16). As previously observed, methylation spreading from the DEM #6 binding site was limited and did not extend further than 3 kb up- and downstream. Taken together, these data suggest that while the DEMs exhibit potent and stable gene silencing, their window of activity is restricted to regions in direct proximity to the effector target site.


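[Figure 3.16, graphical panel. Relative mRNA levels in dDMT #6- and DEM #6-treated cells for the genes investigated, whose transcription start sites lie at -168.396, -127.724, -16.361 and +37.125 kb from the DEM #6 target site.]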

Figure 3.16: Assessing the effects of DEM-induced silencing at CCR5-neighbouring genes

Quantitative RT-PCR was carried out 4 days post nucleofection in CD4+ cells which received the DEM or inactive control targeting position 6. The genes investigated are indicated as well as the distances of their transcription start sites from the DEM #6 target site. Transcript levels (relative to B2M) are expressed relative to the control treated cells (mean ± s.e.m, n=3). Statistical significance was calculated using a two-tailed paired Student's t-test (*p<0.05).

Next, whole transcriptome analysis was carried out via RNA-seq on DEM #6 and dDMT samples from three experiments. Four days post nucleofection whole RNA was extracted from the cells and the samples were sent to the sequencing facility of the Center for Genomics and Transcriptomics (CeGaT). As expected, CCR5 transcript levels were reduced 1.7-fold in DEM-treated compared to control-treated cells (Figure 3.17A). Furthermore, no differences were detected at an unrelated gene, β2-microglobulin (B2M). The results also showed that 28 genes were consistently up-regulated whereas 56 genes were down-regulated (including CCR5), all beyond a threshold of 1.5-fold (Appendix Table 6.1). To verify whether this could be attributed to TALE off-target binding, potential DEM #6 off-target sites bearing up to three mismatches compared to the on-target site were identified using the online tool COSMID (Appendix Table 6.2). By comparing these off-target sites against the 84 genes identified via RNA-seq, it was observed that the predicted off-target sites did not fall within 10 kb of the TSS of the differentially-expressed genes. These data suggest that the up- and down-regulation of these 84 genes was not a result of the direct binding of the DEM #6.
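The mismatch criterion used for off-target prediction can be made concrete by counting the positions at which a candidate site differs from the DEM #6 on-target sequence. The sketch below is illustrative only and does not reproduce the COSMID search itself: it handles substitutions in equal-length sites only, the function name is hypothetical, and the sequences are taken from the TAL Effector Nucleotide Targeter predictions listed in Table 3.1.

```python
ON_TARGET = "TGACCATATACTTATGTCA"  # DEM #6 target site in CCR5 (Table 3.1, ID 0)


def count_mismatches(site, reference=ON_TARGET):
    """Number of positions at which two equal-length sites differ."""
    if len(site) != len(reference):
        raise ValueError("sites must have the same length")
    return sum(a != b for a, b in zip(site, reference))


# Selected predicted off-target sites from Table 3.1
candidates = {
    "OT1": "TAACCATATACTTATCTCA",
    "OT2": "TGAACATATACTTATGTCA",
    "OT4": "TCACCATATACATATATCA",
}

mismatches = {name: count_mismatches(seq) for name, seq in candidates.items()}
within_threshold = [name for name, n in mismatches.items() if n <= 3]
print(mismatches, within_threshold)
```

The counts obtained (2, 1 and 3) agree with the mismatch column of Table 3.1 for these three sites.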


Next, changes in chromatin accessibility were investigated using the Assay for Transposase-Accessible Chromatin sequencing (ATAC-seq). This assay involves the use of a hyperactive Tn5 transposase which is able to simultaneously cut and insert sequencing adapters at regions of accessible chromatin. Sequencing reads therefore give an indication of chromatin accessibility and can be used to determine transcription factor binding sites and nucleosome positions (161). To this end, DEM- and control-treated cells from two experiments were analysed in collaboration with the AG Bossen at the CCI. Decreased chromatin accessibility was observed in a range of 3.5 kb from the DEM #6 binding site in cells which had received the DEM #6 compared to cells which had received the inactive control (Figure 3.17B & C). Accessibility at the unrelated gene B2M was unchanged.

Furthermore, 324 sites in addition to CCR5 exhibiting lower accessibility in DEM-treated compared to control-treated cells were observed (Appendix Table 6.3). However, these sites did not overlap with the 84 differentially-expressed genes identified via RNA-seq. While 113 of these regions fell within 10 kb of the TSS of known genes (Appendix Table 6.4), none of these genes were significantly de-regulated as evaluated using the RNA-seq data. It is worth noting that out of the 324 regions which showed reduced chromatin accessibility, 3 sites were within 10 kb of predicted DEM #6 off-target sites (Appendix Table 6.5). However, on closer examination the closest genes were either not expressed or not de-regulated.
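The comparisons described above amount to asking which genes have a TSS within 10 kb of a given set of sites and whether any of those genes are also de-regulated. A minimal sketch of that logic is given below; the gene names, coordinates and function name are hypothetical placeholders and do not correspond to entries of the appendix tables.

```python
WINDOW_BP = 10_000  # 10 kb window around each TSS, as used in the text


def genes_near_sites(sites, tss_by_gene, window=WINDOW_BP):
    """Genes whose TSS lies within `window` bp of any site on the same chromosome."""
    hits = set()
    for chrom, pos in sites:
        for gene, (gene_chrom, tss) in tss_by_gene.items():
            if chrom == gene_chrom and abs(pos - tss) <= window:
                hits.add(gene)
    return hits


# Hypothetical annotation and site lists, for illustration only
tss_by_gene = {
    "GENE_A": ("chr1", 100_000),
    "GENE_B": ("chr1", 500_000),
    "GENE_C": ("chr2", 100_000),
}
predicted_sites = [("chr1", 108_500), ("chr2", 250_000)]
deregulated_genes = {"GENE_B", "GENE_C"}

near_sites = genes_near_sites(predicted_sites, tss_by_gene)
overlap = near_sites & deregulated_genes  # genes that are both near a site and de-regulated
print(near_sites, overlap)
```

An empty overlap, as in this toy example, is the pattern that would indicate that expression changes are unlikely to result from direct binding at the predicted sites.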


As a last step the TAL Effector Nucleotide Targeter 2.0 tool (157) was used to computationally predict the top 10 DEM #6 off-target sites (Table 3.1), and next-generation bisulfite sequencing was carried out to evaluate the extent of CpG methylation at these sites. Out of the ten sites, only one showed a significant increase in methylation in DEM-treated compared to control-treated cells (Figure 3.17D). On closer examination it was observed that this site falls within an intergenic region on chromosome 20 and is not accessible in primary human T cells as evaluated by ATAC-seq (Figure 3.17E). Taken together, CCR5 was the only gene consistently detected via high-throughput whole transcriptome and chromatin accessibility analyses in conjunction with in silico prediction (Figure 3.18), thus suggesting that the DEMs have mild off-target effects and may indeed provide safer therapeutic options compared to gene disruption strategies.

Table 3.1: List of potential off-target sites identified with TAL Effector Nucleotide Targeter 2.0

ID  Chromosome  Strand  Gene          Mismatches  Score  Start position  Target sequence      Distance from TSS (bp)  COSMID prediction (1)
0   3           +       CCR5          0           4.98   46,411,596      TGACCATATACTTATGTCA  19
1   7           +       LOC101927668  2           6.24   20,121,800      TAACCATATACTTATCTCA  42,968                  ID #35
2   4           +       Intergenic    1           7.31   165,401,212     TGAACATATACTTATGTCA  n/a                     ID #1
3   18          +       YES1          2           8.04   779,684         TGACCATATACCTATCTCA  32,626                  ID #40
4   17          +       Intergenic    3           8.24   8,563,008       TCACCATATACATATATCA  n/a                     ID #580
5   20          +       Intergenic    3           8.24   12,726,816      TCACCATATACATATATCA  n/a                     ID #582
6   5           +       Intergenic    3           8.24   97,609,784      TCACCATATACATATATCA  n/a                     ID #581
7   12          +       TEAD4         2           8.36   3,076,665       TGAACATATACTTATCTCA  8,186                   ID #36
8   5           +       LOC101927421  3           8.63   124,565,833     TAACCATATATTTATATCA  193,308                 ID #545
9   X           -       Intergenic    3           8.63   112,150,169     TAACCATATATTTATATCA  n/a                     ID #544
10  2           -       NYAP2         3           8.90   226,408,801     TAGCCATATACTTATATCA  143,199                 ID #440

(1) As shown in Appendix Table 6.4


[Figure 3.17, graphical panels. (A) Normalised RNA-seq counts for B2M and CCR5 in dDMT #6- and DEM #6-treated cells. (B) Circos plot of genome-wide chromatin accessibility. (C) ATAC-seq tracks for dDMT #6 and DEM #6 at CCR5 (Chr. 3, DEM #6 target site marked) and at B2M (Chr. 15); scale bar 2 kb. (D) % CpG methylation at the predicted off-target sites for dDMT #6 and DEM #6. (E) ATAC-seq tracks for dDMT #6 and DEM #6 at OT5; scale bar 2 kb.]

Figure 3.17: Assessing the specificity profile of DEMs in CD4+ cells

(A) Validation of CCR5 silencing as evaluated by RNA-seq. Whole RNA was extracted from cells 4 days post nucleofection with either the DEM or inactive control targeting position 6. The histograms show the normalised RNA-seq read counts at the target gene CCR5 and an unrelated gene B2M (mean ± s.e.m, n=3). Statistical significance was calculated using a two-tailed paired Student's t-test (*p<0.05). (B) Circos plot showing whole genome accessibility in primary human T cells. Analysis was carried out 12 or 13 days post transfection with the DEM (grey) or dDMT (black) targeting position 6. The line indicates the DEM #6 target site in CCR5 on chromosome 3. (C) Assessing chromatin accessibility at the CCR5 gene in CD4+ cells following DEM delivery. The accessibility at the CCR5 gene (top) and unrelated B2M gene (bottom) are indicated. The normalised read counts pooled from two independent experiments are shown. The dashed line indicates the DEM #6 binding site. (D) Methylation analysis at predicted off-target sites. Next-generation bisulfite analysis was carried out at the top 10 in silico predicted DEM #6 off-target (OT) sites. For each region the average CpG methylation in an amplicon encompassing the predicted off-target site was analysed. The histogram shows the average methylation measured (mean ± s.e.m, n=3). Statistical significance was calculated using a two-tailed paired Student's t-test (*p<0.05). n/a denotes failure to obtain a PCR product. (E) Accessibility at OT5 measured via ATAC-seq.


Figure 3.18: Summary of the DEM-specificity profile evaluation

Venn diagram showing the overlap between data obtained via RNA-seq, ATAC-seq and in silico analysis. The overlap with computationally predicted off-target sites is restricted to the 10 kb distance from the TSS of the 84 de-regulated genes detected via RNA-seq or from the 325 regions of lower chromatin accessibility identified via ATAC-seq.


4. DISCUSSION

HIV remains an important global health concern and there is an urgent need for treatment strategies which address the inadequacies of the current therapies and the complexity of the virus itself. This study presents a novel platform which combines the targeting ability of programmable DNA-binding domains with the silencing capabilities of the KRAB repressor and DNA methyltransferase domains. In addition to demonstrating the efficacy of this system, the second major aim of this project was to evaluate the safety of this approach. The end goal would be to use this system for the ex vivo modification of human T cells so as to confer resistance against HIV. To this end, this study provides the first proof-of-principle data for such an approach, which may bring about greater understanding of the problem at hand and may in turn contribute to the complete eradication of this virus.

4.1 TRANSCRIPTIONAL REGULATION ACHIEVED WITH THE DESIGNER TRANSCRIPTION FACTORS IS MODEST AND TRANSIENT

The ZFN, TALE and CRISPR/Cas platforms have all been combined with various effector domains to allow for the modulation of gene expression. In this study designer transcription activators were generated by fusing each corresponding DNA-binding domain to the VP16 activator domain. Functionality was determined using a Firefly Luciferase-based reporter, which also allowed the ability of the DTAs to bind to their target sites to be evaluated. Three DTAs, corresponding to positions 2, 5 and 6, resulted in a significant increase in Firefly Luciferase activity of up to 4.5-fold compared to the control. Despite being an episomal assay, this is in agreement with previous studies with both the VP16 and VP64 activators targeting endogenous genes in which activation does not typically go beyond 5-fold (reviewed in (117)). However, due to the different activator architectures used and the different genes targeted it is difficult to ascertain the factors contributing to these results.

Nonetheless, these results may also be in accordance with a study by Stone and colleagues (175) in which they used TALE-based activators and found that their activity was strand and position dependent. Similarly, in this study varying activities were observed at the six CCR5 target sites, of which four are located on the sense strand and two are located on the antisense strand. In addition, they observed robust activation of gene expression when targeting the promoter regions and inhibition attributed to steric hindrance when targeting the sense strand. In the same way, in this study the CCR5 promoter as well as a DNaseI hypersensitivity region were targeted, both of which have been shown to enhance the success of transcriptional activation (176). Interestingly, two of the three aforementioned positions are also the sites closest to the TSS. The DTAs showed synergistic activity as has previously been demonstrated (117), with the combination corresponding to #2 + #5 + #6 resulting in the highest Firefly Luciferase activity in accordance with previous results. A single TALEN subunit did not result in significant changes in Firefly Luciferase activity as has been demonstrated by Liu and colleagues (119) in the case of CRISPR interference, suggesting that the activity of the DTAs can be attributed to the VP16 domain and not to mere binding to the target site. Further investigations into factors such as binding affinity and the locations of endogenous transcription factor binding sites would be necessary in order to establish the significance of these six positions with regard to CCR5 expression.

Next, designer transcription repressors were generated by fusing the DNA-binding domain to a KRAB repressor and their activity tested on an endogenous reporter gene. All six DTRs were functional and resulted in a 10-20% reduction in GFP positive cells. This is in accordance with the study by Cong and colleagues (128) in which they demonstrated endogenous gene silencing using TALE-based KRAB repressors. In contrast to CRISPRi, they also demonstrated that repression via the TALE platform was not a result of steric hindrance as the TALE DNA-binding domain alone did not result in gene silencing.

Interestingly, when tested as combinations, the DTRs targeting positions #5 and #6 again exhibited high functionality (approximately 25% reduction in GFP positive cells) as in the previous experiments with the DTAs. However, in contrast to the synergistic activity observed with the DTAs, combinations of three repressors did not increase the extent of silencing compared to combinations of two repressors. Qi and colleagues (119) demonstrated synergistic repression using the catalytically inactive dCas9 and multiple gRNAs; however, they observed suppressive combinatorial effects when using gRNAs whose target sites overlap. They attributed this effect to competition of the gRNAs for their respective target sites. In general, KRAB fusions seem to be less dependent on synergistic activity and sufficient gene silencing can be induced by a single effector (113). In addition, differences in architecture may also contribute to variations in the silencing mediated by the different platforms. In this case, it is possible that the binding of three TALE DNA-binding domains within the same region resulted in steric hindrance and the observed deleterious effects on gene silencing. However, in the case of the DTRs targeting positions 5 and 6 it is possible that binding competition was eliminated or minimised as the effectors bind different DNA strands.

Long-term monitoring of cells which received the DTR #6 revealed that silencing was abolished after approximately three weeks, suggesting that DTR-mediated silencing was transient. On the one hand, these results are consistent with transient expression of the DTR following plasmid DNA transfection resulting in a gradual loss of silencing. On the other hand, this phenomenon may also be mechanism related. Stolzenburg and colleagues (147) showed that ZFPs fused to the KRAB domain did not sustain stable downregulation of gene expression, in contrast with a ZFP-DNMT3A fusion, despite the stable expression of both via viral vectors. However, as the KRAB domain results in the assembly of a heterochromatin-forming complex, stable silencing is expected. Indeed, silencing induced by the KAP1 complex associated with the KRAB domain has been shown to be mitotically heritable over several cell generations. Ayyanathan and colleagues (177) investigated the mechanism of KRAB-KAP1 repression through the use of a hormone-inducible system and were able to draw the following conclusions: KRAB mediated the strong transcriptional repression of an integrated reporter gene; repression was within a short range and accompanied by chromatin compaction; the repressed gene was physically associated with the KAP1 co-repressor and the heterochromatin protein 1 (HP1); and the induced silenced state of the transgene could be maintained for approximately 40 population doublings in the absence of the hormone. Lastly, they concluded that stable silencing was specific for the KRAB-KAP1-HP1 mechanism as the use of other repressor domain fusions which do not bind HP1 did not result in stable silencing. Therefore, it seems that KRAB-mediated repression is strongly dependent on the presence of other silencing factors at the target locus and may not necessarily result in stable silencing if these criteria are not fulfilled.


4.2 DEMS MEDIATE POTENT, STABLE GENE SILENCING ACCOMPANIED BY DNA METHYLATION

In order to improve on the low efficiency and transient activity exhibited by the DTRs, designer epigenome modifiers were designed and generated with the aim of harnessing the silencing potential of DNA methylation. Transfections of in vitro transcribed mRNA encoding the different constructs were carried out and testing conducted in the reporter cell line. The effects of the following constructs were evaluated: the DTR previously described, the DMT containing the DNMT3A and Dnmt3L domains, the dDMT comprising a catalytically inactive DNMT3A domain and a Dnmt3L domain, as well as the DEM comprising the DNMT3A, Dnmt3L and KRAB domains. Six days following transfection a significant reduction in EGFP+ cells was observed with the DTR, DMT and DEM constructs. Remarkably, an 80% reduction of EGFP+ cells was observed in cells treated with the DEM whereas the effects of the DTR and DMT were quite modest, with up to 30% reduction. The dDMT did not result in significant GFP silencing and was henceforth used as a negative control.

Upon long-term monitoring of the cells it was revealed that the silencing effect of the DTR was transient, in accordance with previous observations. However, the silencing which had been achieved with the DMT was also abolished. In contrast, DEM-mediated EGFP silencing remained stable up to 65 days post transfection. This is in accordance with results obtained by Amabile and colleagues (165) in which transient expression of the KRAB and DNMT3A did not result in stable silencing whereas stable silencing was observed through co-delivery of the two factors. Additionally, they could improve silencing efficiency by co-delivering the KRAB and DNMT3A in combination with the DNMT3L. Interestingly, they could achieve high levels of silencing by co-delivering the three factors on different plasmids; in this study, however, the separation of the DNMT3A/3L from the KRAB domain resulted in reduced silencing, thus highlighting the benefits of this unique all-in-one architecture.


The silencing observed can be explained in the context of the silencing cascade induced in endogenous retroviruses (ERVs) which is established in the pre-implantation embryo and maintained throughout development until adulthood (165). Silencing is initiated by the KRAB-ZFPs which bind to specific retroviral sequences and recruit KAP1 as previously discussed. KAP1 subsequently acts as a scaffold for the assembly of a heterochromatin-forming complex which comprises epigenetic silencers such as histone methyltransferases, e.g. SETDB1, and the NURD histone deacetylase complex, which add the H3K9me3 repressive mark and remove transcription-promoting acetyl groups respectively (178). KAP1 can then recruit the de novo DNA methyltransferases as well as the DNMT3L which, through the methylation of genomic CpG residues, result in the stabilisation of this silenced state and allow for the maintenance of silencing following DNA replication (165). Therefore, the results obtained confirm that all three components, the DNMT3A, DNMT3L and KRAB repressor, are required for stable silencing.

The delivery method is also an important consideration as stable expression via viral vectors may result in unwanted effects. In this study the delivery of the DEMs as in vitro transcribed mRNA resulted in higher levels of silencing compared to plasmid delivery. Delivery of the effectors as mRNA has the added benefits of being less stressful to the cells and enabling delivery into sensitive cells such as primary cells. In addition, transient expression of the effectors may result in reduced off-target effects as mRNA is rapidly degraded following delivery thus resulting in a shorter exposure time with DNA (179).


Furthermore, the observation that short-term exposure to the DEMs resulted in a greater extent of silencing suggests that the persistence of the effectors may have a deleterious effect, possibly by impairing the assembly of the complexes required for silencing.

Bisulfite sequencing analysis confirmed that gene silencing was associated with potent and stable DNA methylation. Approximately 60% CpG methylation was observed two days post transfection, highlighting the fast kinetics of DEM-induced silencing. Methylation increased to 80% and remained stable until the experiment was terminated on day 31. This is in agreement with the results obtained by Bernstein and colleagues (180) in which they used a TALE-DNMT3a/3L fusion targeting the CDKN2A gene in HeLa cells. They observed up to 17% methylation across the CpG island and 66.5% at individual CpGs. Using the triple combination comprising the DNMT3A, DNMT3L and KRAB domains, Amabile and colleagues (165) observed up to 80% methylation at the B2M gene in HEK293 cells. The differences in delivery methods (DNA vs mRNA), in the genes targeted and in the cell types used probably explain the variations observed between these two studies and this work. In addition, minimal spreading of DNA methylation was observed, with drastically decreased methylation at a distance of 2 kb from the DEM binding site. This is contrary to the observation of activity of the DNA methyltransferases at unmethylated CpGs in close proximity to existing methylated CpGs (166). In an attempt to restore gene expression, either the DTA targeting position 6 or the DNA demethylating agent 5-AZA was used. Two days following treatment a significant increase in EGFP expression could be observed with both conditions compared to the respective controls. However, six days following treatment only the cells treated with 5-AZA exhibited significantly increased gene expression compared to control cells. Similarly, Amabile and colleagues (165) were able to reactivate gene expression using 5-AZA treatment. In addition, they investigated the effect of targeted DNA demethylation and the recruitment of artificial transcriptional activators on the silencing mediated by the triple effector combination. They used the dCas9 protein fused to the catalytic domain of a DNA demethylating enzyme (dCas9:TET1), the VP160 transcriptional activator or the catalytic core of the acetyltransferase p300. The transcriptional activator and acetyltransferase both resulted in inefficient short-term reactivation of gene expression. In contrast, transient expression of the dCas9:TET1 resulted in effective, stable and long-term reactivation of the silenced gene which was also associated with DNA demethylation of the targeted CpG island. In accordance with the results in this study, they concluded that silencing mediated by the DNMT3A/DNMT3L/KRAB fusion is resistant to reactivation by artificially-recruited transcriptional activators and is stably maintained by DNA methylation. Therefore, in order to restore gene expression long-term, the methylation of genomic CpGs should be abolished. Taken together, these observations highlight the potent yet reversible nature of the silencing mediated by the DEMs.

4.3 DEMS EXHIBIT FUNCTIONALITY AT CLINICALLY RELEVANT, ENDOGENOUS GENES

The aim of this project was to target the CCR5 and CXCR4 co-receptors via DEM-mediated silencing as a possible treatment strategy against HIV infection. Therefore, having established functionality in an episomal assay and at an integrated reporter gene, the efficacy of this approach at the two aforementioned genes was then assessed. As HEK 293T cells express CXCR4, they provided an ideal initial system to test the functionality of the CXCR4-specific repressor constructs. DEMs targeted to the first exon and intron of the CXCR4 gene were generated and delivered to the cells as in vitro transcribed mRNA. Two days post transfection, a significant reduction was observed in both transcript and protein levels, assessed via qRT-PCR and flow cytometry respectively. However, 20 days post transfection, significant changes could only be detected at the protein level and not at the mRNA level, although a similar trend was maintained. The latter observation can be attributed to the high variation in Ct values observed across the different experiments. Nonetheless, the consistency in silencing observed when assessing the surface expression of CXCR4 at D2 and D20 suggests that the effects of the DEMs were maintained through cell division.
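The sensitivity of a transcript-level comparison to Ct variation can be illustrated with a minimal sketch of relative quantification. It assumes the commonly used 2^-ΔΔCt calculation with a single reference gene; the function name and all Ct values are hypothetical and are not taken from this study.

```python
from statistics import mean, stdev


def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Return the 2^-ddCt fold change of treated versus control cells.

    Each argument is a Ct value from one experiment. The reference gene
    is assumed to be unaffected by the treatment.
    """
    d_ct_treated = ct_target_treated - ct_ref_treated
    d_ct_control = ct_target_control - ct_ref_control
    dd_ct = d_ct_treated - d_ct_control
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values from three independent experiments, ordered as
# (target treated, reference treated, target control, reference control).
experiments = [
    (26.0, 18.0, 25.0, 18.0),
    (27.5, 18.5, 25.5, 18.0),
    (25.2, 18.1, 25.0, 18.2),
]

fold_changes = [relative_expression(*values) for values in experiments]
print([round(f, 2) for f in fold_changes])
print("mean:", round(mean(fold_changes), 2),
      "sd:", round(stdev(fold_changes), 2))
```

Because the fold change depends exponentially on the Ct difference, a shift of a single cycle halves or doubles the estimate, so moderate Ct scatter between experiments can mask a consistent trend.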

Bisulfite sequencing analysis carried out at D20 revealed that silencing was accompanied by an up to 22-fold increase in DNA methylation, in agreement with previous observations. Taken together, these data provide a first glimpse into the functionality of the DEMs in a chromosomal context.
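As an illustration of how such values can be derived, the sketch below computes percent CpG methylation from bisulfite-converted reads and the fold change between two samples. It assumes reads that are already aligned to the reference without gaps and reported on the same strand; the function name and the toy sequences are hypothetical.

```python
def cpg_methylation(reference, reads):
    """Return percent methylation over all CpG cytosines covered by reads.

    reference: unconverted top-strand sequence.
    reads: bisulfite-converted reads with the same length as the
    reference. A retained C at a CpG counts as methylated, a T as
    unmethylated; any other base is ignored.
    """
    cpg_positions = [i for i in range(len(reference) - 1)
                     if reference[i:i + 2] == "CG"]
    methylated = unmethylated = 0
    for read in reads:
        for i in cpg_positions:
            if read[i] == "C":
                methylated += 1
            elif read[i] == "T":
                unmethylated += 1
    total = methylated + unmethylated
    return 100.0 * methylated / total if total else 0.0


# Toy data: three CpGs in the reference, two reads per sample.
reference = "ACGTTCGACGA"
control_reads = ["ATGTTTGATGA", "ATGTTTGACGA"]
treated_reads = ["ACGTTCGATGA", "ACGTTTGACGA"]

control = cpg_methylation(reference, control_reads)
treated = cpg_methylation(reference, treated_reads)
print(round(control, 1), round(treated, 1), round(treated / control, 1))
```

Expressing the result as a fold change makes it strongly dependent on the background methylation of the control sample, which is one reason to report absolute percentages alongside it.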

Given the complexity of stem cell transplantation, this project focused on developing an anti-HIV therapy based on the inactivation of CCR5 and CXCR4 in CD4+ T cells. Nucleofections were carried out with mRNA encoding the DEMs, and silencing was assessed via quantitative RT-PCR and flow cytometry on D7 and D21. In cells which received the DEM #6 targeting the CCR5 promoter, a significant reduction in both transcript and protein levels was observed on D7, whereas no significant effect could be observed in the cells which received the DEM R2 targeting CXCR4. However, when the cells were assessed on D21, DEM-mediated silencing of CCR5 had been lost, whereas at this time point it was finally possible to detect a significant reduction of both CXCR4 mRNA and protein levels. On the one hand, the basal expression of CXCR4 on day 7 was about 20-fold higher than that of CCR5, which could explain the moderate effect observed for the CXCR4 gene at this earlier time point. On the other hand, CCR5 expression has been found to be affected by the activation which is required to keep the cells in culture long-term. T cell activation requires the CD3 T cell receptor as well as a co-stimulatory signal, which can be provided through the interaction of the co-stimulatory molecule CD28 with its ligands CD80 or CD86 (181). Carroll and colleagues (182) found that the activation of CD4+ T cells from donors infected with HIV-1 via CD3/CD28 co-stimulation induced a virus-resistant state. This effect was specific for M-tropic viruses and was accompanied by a loss of detectable CCR5 mRNA expression. In contrast, they found CXCR4 transcripts to be abundant under these conditions. Nonetheless, they had also previously shown that this virus resistance via CD3/CD28 co-stimulation is in part due to the enhanced production of RANTES, MIP-1α and MIP-1β, which block infection by M-tropic strains of HIV (183). As previously mentioned, these chemokines are also the natural ligands of CCR5, which explains why their increased production would result in HIV resistance. In addition, Bleul and colleagues (184) showed that IL-2 up-regulated CCR5 expression. It therefore seems that several mechanisms are at play in the regulation of CCR5 expression, which may result in unstable expression kinetics under culture conditions involving CD3/CD28 activation and IL-2. Taken together, the observations from these studies may explain the loss of CCR5 silencing at the later time point. In future studies it may be worth assessing DEM-mediated silencing at genes which are not directly affected by the cell culture conditions.

Another important factor to consider is the abundance of CpG residues in these two genes and the presence or absence of CpG islands. Notably, the CCR5 gene does not contain a CpG island in its promoter and in fact contains only 4 CpG residues in the 320 bp upstream promoter region incorporated into the two reporter systems. In contrast, a CpG island can be found in the CXCR4 gene promoter. This difference may play a role in the persistence of DEM-mediated DNA methylation. One can speculate that the presence of more CpG residues may result in a more complex chromatin environment and in greater stabilisation of a transcriptionally silenced state. Indeed, a study by Curradi and colleagues (185) showed that while a few methylated cytosines may result in the inhibition of a nearby promoter, a threshold of modified sites is required to organize a stable chromatin structure which is capable of spreading to other unmethylated cytosines in the region. In their Xenopus model, three methylated cytosines reduced transcription by 45% whereas a single methylated cytosine had no effect. In addition, the presence of 6 methylated cytosines enabled spreading of methylation over 100 bp, but this effect was abolished with four methylated cytosines. Furthermore, they could demonstrate that histone deacetylation contributes to gene repression in the situation where the number of methylated cytosines is insufficient to mediate silencing over large distances. In the instance of CCR5, it is thus possible that the scarcity of CpG residues resulted in a weaker extent of silencing which was more easily overcome when compared to CXCR4.

In contrast to the data obtained with the DTAs and DTRs, the DEMs did not exhibit any synergistic activity in the CD4+ cells, and one effector seemed to be sufficient to mediate gene silencing for both CCR5 and CXCR4. The addition of a second DEM did not improve the silencing efficiency and in some cases even proved to be deleterious. On the other hand, simultaneous targeting of CCR5 and CXCR4 could be achieved, demonstrating that DEMs are indeed amenable to multiplexing. Similarly, Amabile and colleagues (165) showed that they could achieve multiplex gene silencing at efficiencies comparable to those obtained when targeting each gene individually. In the same way, in this study similar silencing of CCR5 and CXCR4 was observed in both scenarios. When targeting two (or more) genes, one of the main merits of an approach based on epigenome editing is that it eliminates the risk of chromosomal rearrangements which could occur if the targeted genes were simultaneously cleaved by designer nucleases. Surprisingly, it was also revealed that the split DEM architecture, which had shown modest activity in the reporter cell line, did not result in significant silencing in the CD4 cells. This could have been due to the delivery method used as well as the inability of the silencing effectors to bind to their target simultaneously in this context. This further emphasises the importance of the all-in-one architecture and puts forward an ideal system for further investigations which will result in robust gene silencing in different cell types.

The variations in activity observed at the integrated reporter, in HEK293T cells and in CD4 primary cells may be explained by considering the state and structure of the chromatin in each case. Using similar effectors (KRAB, DNMT3A, DNMT3L), Amabile and colleagues observed varying silencing efficiencies in different cell types and attributed these differences to the local chromatin environment. However, contrary to the findings in this study, they demonstrated that the combination of all three effectors could overcome these constraints. In the past years, several groups have worked to determine the effect of chromatin structure on the targeting efficiency of designer nuclease-based approaches. Kuscu and colleagues (186) conducted a ChIP-seq analysis and found that not only were dCas9 off-target sites enriched in regions of accessible chromatin as indicated by DNaseI hypersensitivity, but binding sites were also more prevalent in accessible regions such as gene promoters. In the same year, Wu and colleagues (187) demonstrated that, in contrast to previous observations, CpG methylation may also negatively impact Cas9 binding. TALENs are known to be sensitive to CpG methylation; however, this limitation can be overcome by using TALE repeats which recognize and bind 5mC (73). Therefore, it is possible that the differences in efficacy observed with the different repressors in different cell types may in part be due to the local chromatin environment in these cells. As previously observed, DEM-induced silencing in CD4 cells was accompanied by CpG methylation at both the CCR5 and CXCR4 genes, highlighting the robust nature of this platform. Notably, significant differences in methylation between DEM- and control-treated cells were only detectable within a 5 kb region encompassing the DEM #6 binding site. Amplicons investigated in a region 3-9 kb up- and downstream from the DEM binding site did not exhibit significant differences in methylation. This was in accordance with data from the reporter assays and assays in HEK293T cells, in which DEM-induced methylation was found to be confined to regions in immediate proximity to the target site. Similarly, Bernstein and colleagues demonstrated a reduction in CpG methylation with increasing distance from a TALE-DNMT target site in primary human fibroblasts (180). However, although DEM-induced methylation of the CCR5 promoter was confirmed, silencing was lost over time. This, in combination with the scarcity of CpG residues in the CCR5 promoter, suggests that DNA methylation may not play a significant role in the regulation of CCR5 expression.

An investigation into possible chromatin modifications introduced via DEM silencing revealed the presence of the H3K9me3 post-translational modification at both the CCR5 and CXCR4 genes. Importantly, the signal observed for this repressive mark was only significant in the regions closest to the binding sites of the DEMs #6 and R2. This is a desirable observation, as it further confirms that the effects mediated by the DEMs are highly localised to the target region. However, despite the establishment of histone modifications at both genes, DEM-induced CCR5 silencing was not stable. As both the ChIP and qRT-PCR analyses were carried out four days post transfection, it is possible that the loss of CCR5 silencing observed over time is also accompanied by the loss of this repressive mark. It would be necessary to conduct ChIP analyses at later time points to further investigate this discrepancy. As this was not an extensive evaluation of the relevant epigenetic modifications, it would also be important to investigate the effects on RNA polymerase II binding and on markers of active transcription such as H3K4me3 and H3K27ac.

4.4 DEMS EXHIBIT A BENIGN SPECIFICITY PROFILE

The purpose of this project was to assess the safety of epigenome editing for therapeutic purposes in light of the widespread and growing use of programmable nucleases. Safety concerns are still associated with designer nucleases; improvements to the different platforms which address these concerns would therefore be advantageous.

As a first step towards evaluating DEM specificity, the expression of the genes neighbouring CCR5 was assessed. The expression levels of the four genes closest to CCR5 within a 200 kb range, namely CCR2, CCR1, CCR3 and CCRL2, were investigated via qRT-PCR. Data from three independent experiments revealed that CCR5 was the only gene for which a significant difference in transcript levels could be detected when comparing DEM- and control-treated cells. CCR2 is not only in very close proximity to CCR5, but the two genes also share 75% sequence identity (188). For this reason, CCR2 is a common off-target site for programmable nucleases targeted to CCR5. Notably, in the study by Perez and colleagues (59), which aimed to generate HIV-resistant T cells by mimicking the Δ32 mutation, CCR2 was also disrupted via ZFN-mediated cleavage. They added that while the activity at the CCR2 gene was measurable, it was unlikely to be deleterious as the knockout is well-tolerated in mice. However, given that epigenome modifiers target the gene promoter whereas designer nucleases target the gene itself, it is not possible to make a direct comparison between the two strategies in this context.


Whole transcriptome analysis of three experiments carried out via RNA-seq revealed that CCR5 mRNA levels were reduced 1.7-fold as expected, whereas the unrelated gene B2M was unaffected. A total of 84 genes were differentially expressed, of which 28 genes were up-regulated and 56 genes (including CCR5) were down-regulated. To exclude effects due to off-target activity of the DEM #6, in silico analysis via the COSMID tool was carried out in order to identify off-target sites harbouring up to three mismatches compared to the on-target site. None of these predicted off-target sites were within 10 kb of the TSS of the differentially regulated transcripts, leading to the conclusion that the observed effects on mRNA expression were not the result of DEM off-target binding. Amabile and colleagues (165) had similar results and attributed the deregulation which they observed to background noise of the analysis or to unknown perturbations related to their treatment. In order to fully understand this phenomenon, it would be worth looking into whether any of the 83 genes other than CCR5 are related to CCR5 in any way and may be differentially expressed in response to its down-regulation. In addition, comparing treated to completely untreated cells would reveal whether cell manipulations such as the nucleofection may also be partly responsible for such observations. Furthermore, it is known that the 3D architecture of chromatin may result in the physical interaction of hypermethylated CpG islands, leading to the recruitment of repressor proteins and the establishment of a transcriptionally silenced state (189). Therefore, higher order chromatin structures may also play a role in the de-regulation of these 83 genes.
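The filtering logic described above can be sketched as follows. This is a simplified stand-in and not the COSMID implementation: it scans one strand of a toy sequence for sites with at most three mismatches and then checks whether any site lies within 10 kb of a transcription start site. All names, sequences and coordinates are hypothetical.

```python
def find_sites(genome, target, max_mismatches=3):
    """Return (position, mismatches) for every window of the genome that
    differs from the target at no more than max_mismatches positions."""
    hits = []
    n = len(target)
    for start in range(len(genome) - n + 1):
        window = genome[start:start + n]
        mismatches = sum(1 for a, b in zip(window, target) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


def sites_near_tss(site_positions, tss_positions, max_distance=10_000):
    """Return the sites lying within max_distance bp of any TSS."""
    return [s for s in site_positions
            if any(abs(s - t) <= max_distance for t in tss_positions)]


# Toy sequence: one perfect site and one site with two substitutions.
target = "GATTACAGATTACA"
genome = "TTTT" + target + "CCCC" + "GATTACAGAATACC" + "GGGG"
hits = find_sites(genome, target)
print(hits)

# Hypothetical genomic coordinates of predicted sites and of the TSSs
# of differentially expressed genes.
predicted_sites = [5_000, 50_000, 120_000]
tss_of_de_genes = [12_000, 200_000]
print(sites_near_tss(predicted_sites, tss_of_de_genes))
```

A real analysis would also scan the reverse strand, handle multiple chromosomes and may allow insertions or deletions, none of which is covered by this sketch.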


As previously discussed, chromatin structure and accessibility are important factors which may have far-reaching implications where gene expression is concerned. Therefore, in this project an assay which specifically evaluates the extent of chromatin accessibility was used. Two independent experiments were evaluated via the Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq), and comparisons were made between DEM- and control-treated cells. In the region immediately surrounding the DEM #6 binding site, reduced accessibility was observed in the cells which had received the DEM #6, whereas the unrelated gene B2M was again unchanged. Apart from CCR5, 324 additional sites showing lower chromatin accessibility were identified. However, there was no correlation with the differentially expressed genes identified via RNA-seq. Considering the highly dynamic nature of chromatin structure, it is possible that these observations could be partly due to the background noise associated with the assay and the constant changes occurring within the cells. Similarly, Ackermann and colleagues (190) conducted a study in which they integrated ATAC-seq and RNA-seq in order to identify cell type-selective gene regulatory regions in pancreatic α- and β-cells. They too observed discrepancies between these data sets and partly attributed this phenomenon to the fact that gene expression depends on multiple regulatory regions which may not necessarily be in immediate proximity to the gene itself. Taken together, ATAC-seq provides valuable insight into the local chromatin environment; however, there are many factors to consider when it comes to whether or not a gene is expressed. In this case, changes in chromatin accessibility did not correspond to differentially expressed genes at this level of analysis; it is therefore also possible that these changes will prove to be insignificant and/or not deleterious. Evaluation of the top 10 in silico predicted off-target sites revealed only one site showing a significant difference in methylation when comparing DEM- and control-treated cells. On closer examination this site was found to fall within an intergenic, inaccessible region as evaluated using the ATAC-seq data.
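A comparison between accessibility changes and expression changes can be sketched as a simple interval lookup. The example assigns each site with reduced accessibility to the genes whose TSS lies within a chosen window and intersects the result with a list of differentially expressed genes. The window size, gene names and coordinates are hypothetical placeholders and not values from this study.

```python
def genes_near_sites(sites, tss_by_gene, window=10_000):
    """Return the set of genes with a TSS within `window` bp of any site.

    sites: list of (chromosome, start, end) intervals.
    tss_by_gene: dict mapping gene name to (chromosome, position).
    """
    nearby = set()
    for gene, (chrom, tss) in tss_by_gene.items():
        for site_chrom, start, end in sites:
            if site_chrom == chrom and start - window <= tss <= end + window:
                nearby.add(gene)
                break
    return nearby


# Hypothetical sites with reduced accessibility and gene annotations.
less_accessible_sites = [("chr1", 100_000, 100_500),
                         ("chr2", 40_000, 40_300)]
tss_by_gene = {
    "GENE_A": ("chr1", 105_000),
    "GENE_B": ("chr1", 900_000),
    "GENE_C": ("chr2", 45_000),
}
differentially_expressed = {"GENE_B", "GENE_C"}

near = genes_near_sites(less_accessible_sites, tss_by_gene)
overlap = near & differentially_expressed
print(sorted(near), sorted(overlap))
```

A fixed window around the TSS ignores distal regulatory regions, so an empty or small overlap obtained this way does not by itself rule out a functional link between a site and a gene.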


Taken together, an extensive investigation into the specificity of the DEMs was carried out, and it was consistently demonstrated that their efficacy is not associated with significant off-target or other harmful effects.

4.5 THE RELEVANCE OF DEM-BASED APPROACHES FOR HIV GENE THERAPY

Compared to the first antiretroviral treatment strategies, current regimens are more effective, less toxic and have a lower pill burden, but are still associated with significant side-effects. In addition, the prospects of an HIV vaccine have been dampened by low efficacy (191). There is therefore an urgent need for new treatment strategies which will additionally address the concerns associated with viral escape and latency. As previously discussed, the emergence of designer nuclease technology has led to significant strides in the development of antiviral therapeutics. However, limitations linked to their efficiency, specificity and delivery are still causes for concern. The future of HIV gene therapy with these platforms will strongly depend on the attention paid to key considerations, some of which are discussed below.

As previously discussed, off-target cleavage by designer nucleases is a major adverse effect associated with this technology. There are various determinants of off-target cleavage which shed light on the modifications necessary to improve the specificity of such approaches. Firstly, there is a direct correlation between the frequency of identical sequences in the genome and the probability of off-target activity. Therefore, the use of software to determine whether a particular nuclease has multiple target sites in the genome is extremely beneficial (192). In this study the TAL Effector Targeter and PROGNOS software were used to aid in target site selection and to predict potential off-target sites, while the COSMID software was used to determine sites bearing up to three mismatches compared to the target site.

Secondly, the amount of nuclease used and the delivery method have also been found to be crucial. Hsu and colleagues (104) demonstrated a reduction in off-target cleavage through the dosage titration of gRNA and Cas9 plasmids. However, it is important to bear in mind that a reduction in input amount may lead to a reduction in on-target activity as well. Concerning delivery, reduced exposure to the nucleases through the use of transient delivery methods and non-integrating viral vectors may reduce the probability of off-target effects (192). Kim and colleagues (179) demonstrated that the delivery of Cas9 and gRNAs as ribonucleoproteins (RNPs) reduced off-target activity. When tested in human cells, RNPs induced indels at target sites almost immediately following delivery and were quickly degraded, thus reducing off-target effects. Importantly, the greater specificity achieved via RNP delivery did not come at the expense of on-target activity. Target site accessibility, which has been discussed in previous sections, is another major determinant of off-target activity which poses its own challenges.

This project presents a novel DEM platform which addresses not only the challenges posed by the virus itself but also the limitations currently associated with designer nuclease technology. Viral escape is a long-standing problem faced by anti-HIV therapeutics and will likely remain a major obstacle for the foreseeable future. Notably, resistance to the entry inhibitor Maraviroc has already been demonstrated. This drug is the only currently approved CCR5 antagonist and received approval from the United States Food and Drug Administration in 2007. In this case, resistance is the result of mutations in the gp120 and gp41 viral glycoproteins, which allow for interaction with drug-bound CCR5, as well as the emergence of previously undetected X4-tropic viruses which are selected under pressure in response to treatment (30). Interestingly, in 2016 it was reported that HIV had acquired mutations which conferred resistance to CRISPR-based approaches (193). Specifically, insertions, deletions and substitutions characteristic of the NHEJ pathway had been discovered around the Cas9/gRNA cleavage sites. In this case the indels introduced were not deleterious but prevented recognition by the same gRNA due to the altered target DNA sequence. The DEM platform described here would mainly function at the level of the host, thus circumventing the problems associated with viral mutations. In addition, as this approach does not rely on DNA cleavage, the phenomenon observed with the CRISPR system would not be a concern, although the feasibility of targeting the HIV provirus using this approach would need to be investigated. In this study the feasibility of multiplexing the DEMs to target CCR5 and CXCR4 in T cell-based therapy was also demonstrated.

Viral latency is another major cause for concern which cannot currently be overcome by drug- or nuclease-based approaches. One way in which viral reactivation has been achieved is through the ‘shock and kill’ method. This involves the use of chemical agents to induce cellular reactivation (80). However, the functionality of such agents has been found to be unsatisfactory and nonspecific. In the case of designer nuclease- or DEM-based approaches, it would theoretically be possible to eliminate the risks associated with the reactivation of latent virus if this occurred at a time when CCR5 and/or CXCR4 had been silenced, thus inhibiting viral infection of the edited cells.


The major advantage of using DEMs in comparison to designer nucleases is that this platform addresses the existing specificity and safety concerns. This study demonstrated that DEMs show a benign specificity profile as evaluated by different methods, including high-throughput, genome-wide analyses. Ultimately, the ability to achieve stable and potent gene silencing following transient expression of the DEMs is highly advantageous and may be key to the advancement of such approaches to the clinic. The so-called ‘hit and run’ action of this platform is in stark contrast to nuclease technology, which causes permanent modifications at both on- and off-target sites. In addition, this study demonstrated that the cytosine modifications mediated by the DEMs are also reversible. While the use of drugs such as 5-AZA is limited due to safety concerns, the ten-eleven translocation (TET) family of proteins in conjunction with programmable DNA-binding domains has enabled the targeted removal of CpG methylation. The TET proteins are enzymes which catalyse the oxidation of 5mC to 5-hydroxymethylcytosine, thus resulting in the demethylation of DNA. Maeder and colleagues (194) fused these proteins to TALE DNA-binding domains and were able to achieve targeted DNA demethylation at endogenous loci. This demethylation was accompanied by substantial increases in gene expression, thus highlighting the importance of this novel tool for biological and medical applications.

Going forward, one major question remains to be answered if DEMs are to be considered a serious contender in the battle against HIV: how protective will this approach be against HIV infection? One way to address this question is to conduct in vivo and HIV challenge experiments to determine the physiological effect of DEM-mediated silencing. However, at the level of providing proof of principle, this study successfully demonstrated that DEMs do indeed present an exciting new platform of therapeutics which addresses both efficacy and safety concerns and which may bring HIV gene therapy one step closer to the clinic.

137 References

5. REFERENCES

1. UNAIDS Fact Sheet [press release]. 2017.
2. Maartens G, Celum C, Lewin SR. HIV infection: epidemiology, pathogenesis, treatment, and prevention. Lancet. 2014;384(9939):258-71.
3. Deeks SG, Overbaugh J, Phillips A, Buchbinder S. HIV infection. Nature reviews Disease primers. 2015;1:15035.
4. Sharp PM, Hahn BH. Origins of HIV and the AIDS pandemic. Cold Spring Harbor perspectives in medicine. 2011;1(1):a006841.
5. Mauclere P, Loussert-Ajaka I, Damond F, Fagot P, Souquieres S, Monny Lobe M, et al. Serological and virological characterization of HIV-1 group O infection in Cameroon. Aids. 1997;11(4):445-53.
6. Plantier JC, Leoz M, Dickerson JE, De Oliveira F, Cordonnier F, Lemee V, et al. A new human immunodeficiency virus derived from gorillas. Nature medicine. 2009;15(8):871-2.
7. Vallari A, Bodelle P, Ngansop C, Makamche F, Ndembi N, Mbanya D, et al. Four new HIV-1 group N isolates from Cameroon: Prevalence continues to be low. AIDS research and human retroviruses. 2010;26(1):109-15.
8. de Silva TI, Cotten M, Rowland-Jones SL. HIV-2: the forgotten AIDS virus. Trends in microbiology. 2008;16(12):588-95.
9. Saleh S, Vranckx L, Gijsbers R, Christ F, Debyser Z. Insight into HIV-2 latency may disclose strategies for a cure for HIV-1 infection. Journal of virus eradication. 2017;3(1):7-14.
10. Lusic M, Siliciano RF. Nuclear landscape of HIV-1 infection and integration. Nature reviews Microbiology. 2017;15(2):69-82.
11. Gu WG. Genome editing-based HIV therapies. Trends in biotechnology. 2015;33(3):172-9.
12. Barre-Sinoussi F, Ross AL, Delfraissy JF. Past, present and future: 30 years of HIV research. Nature reviews Microbiology. 2013;11(12):877-83.
13. Doitsh G, Galloway NL, Geng X, Yang Z, Monroe KM, Zepeda O, et al. Cell death by pyroptosis drives CD4 T-cell depletion in HIV-1 infection. Nature. 2014;505(7484):509-14.
14. Vidya Vijayan KK, Karthigeyan KP, Tripathi SP, Hanna LE. Pathophysiology of CD4+ T-Cell Depletion in HIV-1 and HIV-2 Infections. Frontiers in immunology. 2017;8:580.
15. Rambaut A, Posada D, Crandall KA, Holmes EC. The causes and consequences of HIV evolution. Nature reviews Genetics. 2004;5(1):52-61.
16. Clapham PR, McKnight A. HIV-1 receptors and cell tropism. British medical bulletin. 2001;58:43-59.
17. Berger EA, Doms RW, Fenyo EM, Korber BT, Littman DR, Moore JP, et al. A new classification for HIV-1. Nature. 1998;391(6664):240.


18. Bader J, Daumer M, Schoni-Affolter F, Boni J, Gorgievski-Hrisoho M, Martinetti G, et al. Therapeutic Immune Recovery and Reduction of CXCR4-Tropic HIV-1. Clinical infectious diseases : an official publication of the Infectious Diseases Society of America. 2017;64(3):295-300.
19. Kok YL, Ciuffi A, Metzner KJ. Unravelling HIV-1 Latency, One Cell at a Time. Trends in microbiology. 2017;25(11):932-41.
20. Roberts JD, Bebenek K, Kunkel TA. The accuracy of reverse transcriptase from HIV-1. Science. 1988;242(4882):1171-3.
21. Rawson JM, Landman SR, Reilly CS, Mansky LM. HIV-1 and HIV-2 exhibit similar mutation frequencies and spectra in the absence of G-to-A hypermutation. Retrovirology. 2015;12:60.
22. Bhoj VG, Thibodeaux SR, Levine BL. Novel gene and cellular therapy approaches for treating HIV. Discovery medicine. 2016;21(116):283-92.
23. Allers K, Schneider T. CCR5Delta32 mutation and HIV infection: basis for curative HIV therapy. Current opinion in virology. 2015;14:24-9.
24. Miller RJ, Banisadr G, Bhattacharyya BJ. CXCR4 signaling in the regulation of stem cell migration and development. Journal of neuroimmunology. 2008;198(1-2):31-8.
25. Feng Y, Broder CC, Kennedy PE, Berger EA. HIV-1 entry cofactor: functional cDNA cloning of a seven-transmembrane, G protein-coupled receptor. Science. 1996;272(5263):872-7.
26. Deng H, Liu R, Ellmeier W, Choe S, Unutmaz D, Burkhart M, et al. Identification of a major co-receptor for primary isolates of HIV-1. Nature. 1996;381(6584):661-6.
27. Hutter G, Nowak D, Mossner M, Ganepola S, Mussig A, Allers K, et al. Long-term control of HIV by CCR5 Delta32/Delta32 stem-cell transplantation. The New England journal of medicine. 2009;360(7):692-8.
28. de Roda Husman AM, Koot M, Cornelissen M, Keet IP, Brouwer M, Broersen SM, et al. Association between CCR5 genotype and the clinical course of HIV-1 infection. Annals of internal medicine. 1997;127(10):882-90.
29. Liu R, Paxton WA, Choe S, Ceradini D, Martin SR, Horuk R, et al. Homozygous defect in HIV-1 coreceptor accounts for resistance of some multiply-exposed individuals to HIV-1 infection. Cell. 1996;86(3):367-77.
30. Woollard SM, Kanmogne GD. Maraviroc: a review of its use in HIV infection and beyond. Drug design, development and therapy. 2015;9:5447-68.
31. Kiem HP, Jerome KR, Deeks SG, McCune JM. Hematopoietic-stem-cell-based gene therapy for HIV disease. Cell stem cell. 2012;10(2):137-47.
32. Gaj T, Gersbach CA, Barbas CF, 3rd. ZFN, TALEN, and CRISPR/Cas-based methods for genome engineering. Trends in biotechnology. 2013;31(7):397-405.
33. Kim H, Kim JS. A guide to genome engineering with programmable nucleases. Nature reviews Genetics. 2014;15(5):321-34.
34. Lieber MR. The mechanism of human nonhomologous DNA end joining. The Journal of biological chemistry. 2008;283(1):1-5.


35. Mao Z, Bozzella M, Seluanov A, Gorbunova V. DNA repair by nonhomologous end joining and homologous recombination during cell cycle in human cells. Cell cycle. 2008;7(18):2902-6.
36. Moehle EA, Rock JM, Lee YL, Jouvenot Y, DeKelver RC, Gregory PD, et al. Targeted gene addition into a specified location in the human genome using designed zinc finger nucleases. Proceedings of the National Academy of Sciences of the United States of America. 2007;104(9):3055-60.
37. Lombardo A, Cesana D, Genovese P, Di Stefano B, Provasi E, Colombo DF, et al. Site-specific integration and tailoring of cassette design for sustainable gene transfer. Nature methods. 2011;8(10):861-9.
38. Bortesi L, Fischer R. The CRISPR/Cas9 system for plant genome editing and beyond. Biotechnol Adv. 2015;33(1):41-52.
39. Kim YG, Cha J, Chandrasegaran S. Hybrid restriction enzymes: zinc finger fusions to Fok I cleavage domain. Proceedings of the National Academy of Sciences of the United States of America. 1996;93(3):1156-60.
40. Chandrasegaran S, Carroll D. Origins of Programmable Nucleases for Genome Engineering. Journal of molecular biology. 2016;428(5 Pt B):963-89.
41. Bitinaite J, Wah DA, Aggarwal AK, Schildkraut I. FokI dimerization is required for DNA cleavage. Proceedings of the National Academy of Sciences of the United States of America. 1998;95(18):10570-5.
42. Pavletich NP, Pabo CO. Zinc finger-DNA recognition: crystal structure of a Zif268-DNA complex at 2.1 A. Science. 1991;252(5007):809-17.
43. Smith J, Bibikova M, Whitby FG, Reddy AR, Chandrasegaran S, Carroll D. Requirements for double-strand cleavage by chimeric restriction enzymes with zinc finger DNA-recognition domains. Nucleic acids research. 2000;28(17):3361-9.
44. Miller JC, Holmes MC, Wang J, Guschin DY, Lee YL, Rupniewski I, et al. An improved zinc-finger nuclease architecture for highly specific genome editing. Nature biotechnology. 2007;25(7):778-85.
45. Cornu TI, Thibodeau-Beganny S, Guhl E, Alwin S, Eichtinger M, Joung JK, et al. DNA-binding Specificity Is a Major Determinant of the Activity and Toxicity of Zinc-finger Nucleases. Molecular therapy : the journal of the American Society of Gene Therapy. 2008;16(2):352-8.
46. Ramirez CL, Foley JE, Wright DA, Muller-Lerch F, Rahman SH, Cornu TI, et al. Unexpected failure rates for modular assembly of engineered zinc fingers. Nature methods. 2008;5(5):374-5.
47. Boch J, Bonas U. Xanthomonas AvrBs3 family-type III effectors: discovery and function. Annual review of phytopathology. 2010;48:419-36.
48. Moscou MJ, Bogdanove AJ. A simple cipher governs DNA recognition by TAL effectors. Science. 2009;326(5959):1501.
49. Mussolino C, Cathomen T. RNA guides genome engineering. Nature biotechnology. 2013;31(3):208-9.
50. Joung JK, Sander JD. TALENs: a widely applicable technology for targeted genome editing. Nature reviews Molecular cell biology. 2013;14(1):49-55.

51. Mussolino C, Morbitzer R, Lutge F, Dannemann N, Lahaye T, Cathomen T. A novel TALE nuclease scaffold enables high genome editing activity in combination with low toxicity. Nucleic acids research. 2011;39(21):9283-93.
52. Boissel S, Jarjour J, Astrakhan A, Adey A, Gouble A, Duchateau P, et al. megaTALs: a rare-cleaving nuclease architecture for therapeutic genome engineering. Nucleic acids research. 2014;42(4):2591-601.
53. Jinek M, Chylinski K, Fonfara I, Hauer M, Doudna JA, Charpentier E. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science. 2012;337(6096):816-21.
54. Cong L, Ran FA, Cox D, Lin S, Barretto R, Habib N, et al. Multiplex genome engineering using CRISPR/Cas systems. Science. 2013;339(6121):819-23.
55. Komor AC, Badran AH, Liu DR. CRISPR-Based Technologies for the Manipulation of Eukaryotic Genomes. Cell. 2017;168(1-2):20-36.
56. Fu Y, Foden JA, Khayter C, Maeder ML, Reyon D, Joung JK, et al. High-frequency off-target mutagenesis induced by CRISPR-Cas nucleases in human cells. Nature biotechnology. 2013;31(9):822-6.
57. Sander JD, Joung JK. CRISPR-Cas systems for editing, regulating and targeting genomes. Nature biotechnology. 2014;32(4):347-55.
58. Nansen A, Christensen JP, Andreasen SO, Bartholdy C, Christensen JE, Thomsen AR. The role of CC chemokine receptor 5 in antiviral immunity. Blood. 2002;99(4):1237-45.
59. Perez EE, Wang J, Miller JC, Jouvenot Y, Kim KA, Liu O, et al. Establishment of HIV-1 resistance in CD4+ T cells by genome editing using zinc-finger nucleases. Nature biotechnology. 2008;26(7):808-16.
60. Maier DA, Brennan AL, Jiang S, Binder-Scholl GK, Lee G, Plesa G, et al. Efficient clinical scale gene modification via zinc finger nuclease-targeted disruption of the HIV co-receptor CCR5. Human gene therapy. 2013;24(3):245-58.
61. Yi G, Choi JG, Bharaj P, Abraham S, Dang Y, Kafri T, et al. CCR5 Gene Editing of Resting CD4(+) T Cells by Transient ZFN Expression From HIV Envelope Pseudotyped Nonintegrating Lentivirus Confers HIV-1 Resistance in Humanized Mice. Molecular therapy Nucleic acids. 2014;3:e198.
62. DiGiusto DL, Cannon PM, Holmes MC, Li L, Rao A, Wang J, et al. Preclinical development and qualification of ZFN-mediated CCR5 disruption in human hematopoietic stem/progenitor cells. Molecular therapy Methods & clinical development. 2016;3:16067.
63. Holt N, Wang J, Kim K, Friedman G, Wang X, Taupin V, et al. Human hematopoietic stem/progenitor cells modified by zinc-finger nucleases targeted to CCR5 control HIV-1 in vivo. Nature biotechnology. 2010;28(8):839-47.
64. Qu X, Wang P, Ding D, Li L, Wang H, Ma L, et al. Zinc-finger-nucleases mediate specific and efficient excision of HIV-1 proviral DNA from infected and latently infected human T cells. Nucleic acids research. 2013;41(16):7771-82.
65. Yao Y, Nashun B, Zhou T, Qin L, Qin L, Zhao S, et al. Generation of CD34+ cells from CCR5-disrupted human embryonic and induced pluripotent stem cells. Human gene therapy. 2012;23(2):238-42.

66. Tebas P, Stein D, Tang WW, Frank I, Wang SQ, Lee G, et al. Gene editing of CCR5 in autologous CD4 T cells of persons infected with HIV. The New England journal of medicine. 2014;370(10):901-10.
67. Yin H, Kauffman KJ, Anderson DG. Delivery technologies for genome editing. Nature reviews Drug discovery. 2017;16(6):387-99.
68. Qu X, Wang P, Ding D, Wang X, Zhang G, Zhou X, et al. Zinc finger nuclease: a new approach for excising HIV-1 proviral DNA from infected human T cells. Molecular biology reports. 2014;41(9):5819-27.
69. Mock U, Machowicz R, Hauber I, Horn S, Abramowski P, Berdien B, et al. mRNA transfection of a novel TAL effector nuclease (TALEN) facilitates efficient knockout of HIV co-receptor CCR5. Nucleic acids research. 2015;43(11):5560-71.
70. Ru R, Yao Y, Yu S, Yin B, Xu W, Zhao S, et al. Targeted genome engineering in human induced pluripotent stem cells by penetrating TALENs. Cell regeneration. 2013;2(1):5.
71. Ebina H, Kanemura Y, Misawa N, Sakuma T, Kobayashi T, Yamamoto T, et al. A high excision potential of TALENs for integrated DNA of HIV-based lentiviral vector. PloS one. 2015;10(3):e0120047.
72. Strong CL, Guerra HP, Mathew KR, Roy N, Simpson LR, Schiller MR. Damaging the Integrated HIV Proviral DNA with TALENs. PloS one. 2015;10(5):e0125652.
73. Valton J, Dupuy A, Daboussi F, Thomas S, Marechal A, Macmaster R, et al. Overcoming transcription activator-like effector (TALE) DNA binding domain sensitivity to cytosine methylation. The Journal of biological chemistry. 2012;287(46):38427-32.
74. Ebina H, Misawa N, Kanemura Y, Koyanagi Y. Harnessing the CRISPR/Cas9 system to disrupt latent HIV-1 provirus. Scientific reports. 2013;3:2510.
75. Hu W, Kaminski R, Yang F, Zhang Y, Cosentino L, Li F, et al. RNA-directed gene editing specifically eradicates latent and prevents new HIV-1 infection. Proceedings of the National Academy of Sciences of the United States of America. 2014;111(31):11461-6.
76. Mandal PK, Ferreira LM, Collins R, Meissner TB, Boutwell CL, Friesen M, et al. Efficient ablation of genes in human hematopoietic stem and effector cells using CRISPR/Cas9. Cell stem cell. 2014;15(5):643-52.
77. Kang H, Minder P, Park MA, Mesquitta WT, Torbett BE, Slukvin II. CCR5 Disruption in Induced Pluripotent Stem Cells Using CRISPR/Cas9 Provides Selective Resistance of Immune Cells to CCR5-tropic HIV-1 Virus. Molecular therapy Nucleic acids. 2015;4:e268.
78. Yang L, Guell M, Byrne S, Yang JL, De Los Angeles A, Mali P, et al. Optimization of scarless human stem cell genome editing. Nucleic acids research. 2013;41(19):9049-61.
79. Dahabieh MS, Battivelli E, Verdin E. Understanding HIV latency: the road to an HIV cure. Annu Rev Med. 2015;66:407-21.
80. Khalili K, White MK, Jacobson JM. Novel AIDS therapies based on gene editing. Cellular and molecular life sciences : CMLS. 2017;74(13):2439-50.
81. Didigu CA, Wilen CB, Wang J, Duong J, Secreto AJ, Danet-Desnoyers GA, et al. Simultaneous zinc-finger nuclease editing of the HIV coreceptors ccr5 and cxcr4 protects CD4+ T cells from HIV-1 infection. Blood. 2014;123(1):61-9.

82. Liu Z, Chen S, Jin X, Wang Q, Yang K, Li C, et al. Genome editing of the HIV co-receptors CCR5 and CXCR4 by CRISPR-Cas9 protects CD4+ T cells from HIV-1 infection. Cell & bioscience. 2017;7:47.
83. Tachibana K, Hirota S, Iizasa H, Yoshida H, Kawabata K, Kataoka Y, et al. The chemokine receptor CXCR4 is essential for vascularization of the gastrointestinal tract. Nature. 1998;393(6685):591-4.
84. Hou P, Chen S, Wang S, Yu X, Chen Y, Jiang M, et al. Genome editing of CXCR4 by CRISPR/cas9 confers cells resistant to HIV-1 infection. Scientific reports. 2015;5:15577.
85. Wilen CB, Wang J, Tilton JC, Miller JC, Kim KA, Rebar EJ, et al. Engineering HIV-resistant human CD4+ T cells with CXCR4-specific zinc-finger nucleases. PLoS pathogens. 2011;7(4):e1002020.
86. Yuan J, Wang J, Crain K, Fearns C, Kim KA, Hua KL, et al. Zinc-finger nuclease editing of human cxcr4 promotes HIV-1 CD4(+) T cell resistance and enrichment. Molecular therapy : the journal of the American Society of Gene Therapy. 2012;20(4):849-59.
87. Cox DB, Platt RJ, Zhang F. Therapeutic genome editing: prospects and challenges. Nature medicine. 2015;21(2):121-31.
88. Ma S, Wang X, Liu Y, Gao J, Zhang S, Shi R, et al. Multiplex genomic structure variation mediated by TALEN and ssODN. BMC genomics. 2014;15:41.
89. Brunet E, Simsek D, Tomishima M, DeKelver R, Choi VM, Gregory P, et al. Chromosomal translocations induced at specified loci in human stem cells. Proceedings of the National Academy of Sciences of the United States of America. 2009;106(26):10620-5.
90. Lee CM, Cradick TJ, Fine EJ, Bao G. Nuclease Target Site Selection for Maximizing On-target Activity and Minimizing Off-target Effects in Genome Editing. Molecular therapy : the journal of the American Society of Gene Therapy. 2016;24(3):475-87.
91. McConnell Smith A, Takeuchi R, Pellenz S, Davis L, Maizels N, Monnat RJ, Jr., et al. Generation of a nicking enzyme that stimulates site-specific gene conversion from the I-AniI LAGLIDADG homing endonuclease. Proceedings of the National Academy of Sciences of the United States of America. 2009;106(13):5099-104.
92. Ran FA, Hsu PD, Lin CY, Gootenberg JS, Konermann S, Trevino AE, et al. Double nicking by RNA-guided CRISPR Cas9 for enhanced genome editing specificity. Cell. 2013;154(6):1380-9.
93. Hendel A, Bak RO, Clark JT, Kennedy AB, Ryan DE, Roy S, et al. Chemically modified guide RNAs enhance CRISPR-Cas genome editing in human primary cells. Nature biotechnology. 2015;33(9):985-9.
94. Ryan DE, Taussig D, Steinfeld I, Phadnis SM, Lunstad BD, Singh M, et al. Improving CRISPR-Cas specificity with chemical modifications in single-guide RNAs. Nucleic acids research. 2018;46(2):792-803.
95. Fu Y, Sander JD, Reyon D, Cascio VM, Joung JK. Improving CRISPR-Cas nuclease specificity using truncated guide RNAs. Nature biotechnology. 2014;32:279.
96. Sanchez-Rivera FJ, Papagiannakopoulos T, Romero R, Tammela T, Bauer MR, Bhutkar A, et al. Rapid modelling of cooperating genetic events in cancer through somatic genome editing. Nature. 2014;516(7531):428-31.

97. Lombardo A, Genovese P, Beausejour CM, Colleoni S, Lee YL, Kim KA, et al. Gene editing in human stem cells using zinc finger nucleases and integrase-defective lentiviral vector delivery. Nature biotechnology. 2007;25(11):1298-306.
98. Holkers M, Maggio I, Liu J, Janssen JM, Miselli F, Mussolino C, et al. Differential integrity of TALE nuclease genes following adenoviral and lentiviral vector gene transfer into human cells. Nucleic acids research. 2013;41(5):e63.
99. Wang AY, Peng PD, Ehrhardt A, Storm TA, Kay MA. Comparison of adenoviral and adeno-associated viral vectors for pancreatic gene delivery in vivo. Human gene therapy. 2004;15(4):405-13.
100. Yang Y, Wang L, Bell P, McMenamin D, He Z, White J, et al. A dual AAV system enables the Cas9-mediated correction of a metabolic liver disease in newborn mice. Nature biotechnology. 2016;34(3):334-8.
101. Rols MP. Mechanism by which electroporation mediates DNA migration and entry into cells and targeted tissues. Methods in molecular biology. 2008;423:19-33.
102. Schumann K, Lin S, Boyer E, Simeonov DR, Subramaniam M, Gate RE, et al. Generation of knock-in primary human T cells using Cas9 ribonucleoproteins. Proceedings of the National Academy of Sciences of the United States of America. 2015;112(33):10437-42.
103. Shim G, Kim M-G, Park JY, Oh Y-K. Application of cationic liposomes for delivery of nucleic acids. Asian Journal of Pharmaceutical Sciences. 2013;8(2):72-80.
104. Hsu PD, Scott DA, Weinstein JA, Ran FA, Konermann S, Agarwala V, et al. DNA targeting specificity of RNA-guided Cas9 nucleases. Nature biotechnology. 2013;31(9):827-32.
105. Bessis N, GarciaCozar FJ, Boissier MC. Immune responses to gene therapy vectors: influence on vector function and effector mechanisms. Gene therapy. 2004;11 Suppl 1:S10-7.
106. Zuris JA, Thompson DB, Shu Y, Guilinger JP, Bessen JL, Hu JH, et al. Cationic lipid-mediated delivery of proteins enables efficient protein-based genome editing in vitro and in vivo. Nature biotechnology. 2015;33(1):73-80.
107. Wang M, Zuris JA, Meng F, Rees H, Sun S, Deng P, et al. Efficient delivery of genome-editing proteins using bioreducible lipid nanoparticles. Proceedings of the National Academy of Sciences of the United States of America. 2016;113(11):2868-73.
108. Müller M. MT, Mussolino C. Designer Effectors for Editing and Regulating Complex Genomes. In: N B-P, editor. Safety and Efficacy of Gene-Based Therapeutics for Inherited Disorders: Springer, Cham; 2017.
109. Lee TI, Young RA. Transcriptional regulation and its misregulation in disease. Cell. 2013;152(6):1237-51.
110. Conaway JW. Introduction to theme "Chromatin, epigenetics, and transcription". Annual review of biochemistry. 2012;81:61-4.
111. Maston GA, Evans SK, Green MR. Transcriptional regulatory elements in the human genome. Annual review of genomics and human genetics. 2006;7:29-59.
112. Vernimmen D, Bickmore WA. The Hierarchy of Transcriptional Activation: From Enhancer to Promoter. Trends in genetics : TIG. 2015;31(12):696-708.

113. Thakore PI, Black JB, Hilton IB, Gersbach CA. Editing the epigenome: technologies for programmable transcription and epigenetic modulation. Nature methods. 2016;13(2):127-37.
114. Ma J. Transcriptional activators and activation mechanisms. Protein Cell. 2011;2(11):879-88.
115. Onori A, Pisani C, Strimpakos G, Monaco L, Mattei E, Passananti C, et al. UtroUp is a novel six zinc finger artificial transcription factor that recognises 18 base pairs of the utrophin promoter and efficiently drives utrophin upregulation. BMC Mol Biol. 2013;14:3.
116. Passananti C, Corbi N, Onori A, Di Certo MG, Mattei E. Transgenic mice expressing an artificial zinc finger regulator targeting an endogenous gene. Methods in molecular biology. 2010;649:183-206.
117. Maeder ML, Linder SJ, Reyon D, Angstman JF, Fu Y, Sander JD, et al. Robust, synergistic regulation of human gene expression using TALE activators. Nature methods. 2013;10(3):243-5.
118. Maeder ML, Linder SJ, Cascio VM, Fu Y, Ho QH, Joung JK. CRISPR RNA-guided activation of endogenous human genes. Nature methods. 2013;10(10):977-9.
119. Gilbert LA, Larson MH, Morsut L, Liu Z, Brar GA, Torres SE, et al. CRISPR-mediated modular RNA-guided regulation of transcription in eukaryotes. Cell. 2013;154(2):442-51.
120. Cheng AW, Wang H, Yang H, Shi L, Katz Y, Theunissen TW, et al. Multiplexed activation of endogenous genes by CRISPR-on, an RNA-guided transcriptional activator system. Cell research. 2013;23(10):1163-71.
121. Perez-Pinera P, Ousterout DG, Brunger JM, Farin AM, Glass KA, Guilak F, et al. Synergistic and tunable human gene activation by combinations of synthetic transcription factors. Nature methods. 2013;10(3):239-42.
122. Gaston K, Jayaraman PS. Transcriptional repression in eukaryotes: repressors and repression mechanisms. Cellular and molecular life sciences : CMLS. 2003;60(4):721-41.
123. Thiel G, Lietz M, Hohl M. How mammalian transcriptional repressors work. Eur J Biochem. 2004;271(14):2855-62.
124. Bobbin ML, Rossi JJ. RNA Interference (RNAi)-Based Therapeutics: Delivering on the Promise? Annu Rev Pharmacol Toxicol. 2016;56:103-22.
125. Jackson AL, Burchard J, Schelter J, Chau BN, Cleary M, Lim L, et al. Widespread siRNA "off-target" transcript silencing mediated by seed region sequence complementarity. RNA. 2006;12(7):1179-87.
126. Grimm D, Streetz KL, Jopling CL, Storm TA, Pandey K, Davis CR, et al. Fatality in mice due to oversaturation of cellular microRNA/short hairpin RNA pathways. Nature. 2006;441(7092):537-41.
127. Mussolino C, Sanges D, Marrocco E, Bonetti C, Di Vicino U, Marigo V, et al. Zinc-finger-based transcriptional repression of rhodopsin in a model of dominant retinitis pigmentosa. EMBO molecular medicine. 2011;3(3):118-28.
128. Cong L, Zhou R, Kuo YC, Cunniff M, Zhang F. Comprehensive interrogation of natural TALE DNA-binding modules and transcriptional repressor domains. Nature communications. 2012;3:968.

129. Nelson CE, Robinson-Hamm JN, Gersbach CA. Genome engineering: a new approach to gene therapy for neuromuscular disorders. Nat Rev Neurol. 2017;13(11):647-61.
130. Bloom K, Joglekar A. Towards building a chromosome segregation machine. Nature. 2010;463(7280):446-56.
131. Clapier CR, Cairns BR. The biology of chromatin remodeling complexes. Annual review of biochemistry. 2009;78:273-304.
132. Lawrence M, Daujat S, Schneider R. Lateral Thinking: How Histone Modifications Regulate Gene Expression. Trends in genetics : TIG. 2016;32(1):42-56.
133. Epigenetic Gene Expression and Regulation: Academic Press; 2015.
134. Zhang T, Cooper S, Brockdorff N. The interplay of histone modifications - writers that read. EMBO reports. 2015;16(11):1467-81.
135. Allfrey VG, Faulkner R, Mirsky AE. Acetylation and Methylation of Histones and Their Possible Role in the Regulation of RNA Synthesis. Proceedings of the National Academy of Sciences of the United States of America. 1964;51:786-94.
136. Zentner GE, Henikoff S. Regulation of nucleosome dynamics by histone modifications. Nature structural & molecular biology. 2013;20(3):259-66.
137. Marx V. Reading the second genomic code. Nature. 2012;491:143.
138. Fernandez AF, Huidobro C, Fraga MF. De novo DNA methyltransferases: oncogenes, tumor suppressors, or both? Trends in genetics : TIG. 2012;28(10):474-9.
139. Li E, Zhang Y. DNA methylation in mammals. Cold Spring Harbor perspectives in biology. 2014;6(5):a019133.
140. Lyko F. The DNA methyltransferase family: a versatile toolkit for epigenetic regulation. Nature reviews Genetics. 2017.
141. Klose RJ, Bird AP. Genomic DNA methylation: the mark and its mediators. Trends in biochemical sciences. 2006;31(2):89-97.
142. Yang L, Rau R, Goodell MA. DNMT3A in haematological malignancies. Nature reviews Cancer. 2015;15(3):152-65.
143. Keshet I, Lieman-Hurwitz J, Cedar H. DNA methylation affects the formation of active chromatin. Cell. 1986;44(4):535-43.
144. Feinberg AP, Tycko B. The history of cancer epigenetics. Nature reviews Cancer. 2004;4(2):143-53.
145. Greger V, Passarge E, Hopping W, Messmer E, Horsthemke B. Epigenetic changes may contribute to the formation and spontaneous regression of retinoblastoma. Hum Genet. 1989;83(2):155-8.
146. Ehrlich M. DNA hypomethylation in cancer cells. Epigenomics. 2009;1(2):239-59.
147. Stolzenburg S, Beltran AS, Swift-Scanlan T, Rivenbark AG, Rashwan R, Blancafort P. Stable oncogenic silencing in vivo by programmable and targeted de novo DNA methylation in breast cancer. Oncogene. 2015;34(43):5427-35.
148. Li K, Pang J, Cheng H, Liu WP, Di JM, Xiao HJ, et al. Manipulation of prostate cancer metastasis by locus-specific modification of the CRMP4 promoter region using chimeric TALE DNA methyltransferase and demethylase. Oncotarget. 2015;6(12):10030-44.
149. Siddique AN, Nunna S, Rajavelu A, Zhang Y, Jurkowska RZ, Reinhardt R, et al. Targeted methylation and gene silencing of VEGF-A in human cells by using a designed Dnmt3a-Dnmt3L single-chain fusion protein with increased DNA methylation activity. Journal of molecular biology. 2013;425(3):479-91.
150. Pulecio J, Verma N, Mejia-Ramirez E, Huangfu D, Raya A. CRISPR/Cas9-Based Engineering of the Epigenome. Cell stem cell. 2017;21(4):431-47.
151. Morbitzer R, Elsaesser J, Hausner J, Lahaye T. Assembly of custom TALE-type DNA binding domains by modular cloning. Nucleic acids research. 2011;39(13):5790-9.
152. Seipel K, Georgiev O, Schaffner W. Different activation domains stimulate transcription from remote ('enhancer') and proximal ('promoter') positions. EMBO J. 1992;11(13):4961-8.
153. Margolin JF, Friedman JR, Meyer WK, Vissing H, Thiesen HJ, Rauscher FJ, 3rd. Kruppel-associated boxes are potent transcriptional repression domains. Proceedings of the National Academy of Sciences of the United States of America. 1994;91(10):4509-13.
154. Simari RD, Yang ZY, Ling X, Stephan D, Perkins ND, Nabel GJ, et al. Requirements for enhanced transgene expression by untranslated sequences from the human cytomegalovirus immediate-early gene. Molecular medicine. 1998;4(11):700-6.
155. Reither S, Li F, Gowher H, Jeltsch A. Catalytic mechanism of DNA-(cytosine-C5)-methyltransferases revisited: covalent intermediate formation is not essential for methyl group transfer by the murine Dnmt3a enzyme. Journal of molecular biology. 2003;329(4):675-84.
156. Kotlarz D, Zietara N, Uzel G, Weidemann T, Braun CJ, Diestelhorst J, et al. Loss-of-function mutations in the IL-21 receptor gene cause a primary immunodeficiency syndrome. The Journal of experimental medicine. 2013;210(3):433-43.
157. Doyle EL, Booher NJ, Standage DS, Voytas DF, Brendel VP, Vandyk JK, et al. TAL Effector-Nucleotide Targeter (TALE-NT) 2.0: tools for TAL effector design and target prediction. Nucleic acids research. 2012;40(Web Server issue):W117-22.
158. Cradick TJ, Qiu P, Lee CM, Fine EJ, Bao G. COSMID: A Web-based Tool for Identifying and Validating CRISPR/Cas Off-target Sites. Molecular therapy Nucleic acids. 2014;3:e214.
159. Magoc T, Salzberg SL. FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics. 2011;27(21):2957-63.
160. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25(14):1754-60.
161. Buenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nature methods. 2013;10(12):1213-8.
162. Jiang H, Lei R, Ding SW, Zhu S. Skewer: a fast and accurate adapter trimmer for next-generation sequencing paired-end reads. BMC bioinformatics. 2014;15:182.
163. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29(1):15-21.

164. Qi LS, Larson MH, Gilbert LA, Doudna JA, Weissman JS, Arkin AP, et al. Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression. Cell. 2013;152(5):1173-83.
165. Amabile A, Migliara A, Capasso P, Biffi M, Cittaro D, Naldini L, et al. Inheritable Silencing of Endogenous Genes by Hit-and-Run Targeted Epigenetic Editing. Cell. 2016;167(1):219-32 e14.
166. Turker MS. Methylation of mouse adenine phosphoribosyltransferase gene is altered upon cellular differentiation and loss of phenotypic expression. Somat Cell Mol Genet. 1990;16(4):331-40.
167. Grant SG, Chapman VM. Mechanisms of X-chromosome regulation. Annu Rev Genet. 1988;22:199-233.
168. Murdoch C. CXCR4: chemokine receptor extraordinaire. Immunol Rev. 2000;177:175-84.
169. Barnhart M, Shelton JD. ARVs: the next generation. Going boldly together to new frontiers of HIV treatment. Glob Health Sci Pract. 2015;3(1):1-11.
170. Sued O, Figueroa MI, Cahn P. Clinical challenges in HIV/AIDS: Hints for advancing prevention and patient management strategies. Adv Drug Deliv Rev. 2016;103:5-19.
171. Yu S, Yao Y, Xiao H, Li J, Liu Q, Yang Y, et al. Simultaneous Knockout of CXCR4 and CCR5 Genes in CD4+ T Cells via CRISPR/Cas9 Confers Resistance to Both X4- and R5-Tropic Human Immunodeficiency Virus Type 1 Infection. Human gene therapy. 2018;29(1):51-67.
172. Gilmour DS, Lis JT. Detecting protein-DNA interactions in vivo: distribution of RNA polymerase on specific bacterial genes. Proceedings of the National Academy of Sciences of the United States of America. 1984;81(14):4275-9.
173. Solomon MJ, Larsen PL, Varshavsky A. Mapping protein-DNA interactions in vivo with formaldehyde: evidence that histone H4 is retained on a highly transcribed gene. Cell. 1988;53(6):937-47.
174. Becker JS, Nicetto D, Zaret KS. H3K9me3-Dependent Heterochromatin: Barrier to Cell Fate Changes. Trends in genetics : TIG. 2016;32(1):29-41.
175. Uhde-Stone C, Cheung E, Lu B. TALE activators regulate gene expression in a position- and strand-dependent manner in mammalian cells. Biochem Biophys Res Commun. 2014;443(4):1189-94.
176. Liu PQ, Rebar EJ, Zhang L, Liu Q, Jamieson AC, Liang Y, et al. Regulation of an endogenous locus using a panel of designed zinc finger proteins targeted to accessible chromatin regions. Activation of vascular endothelial growth factor A. The Journal of biological chemistry. 2001;276(14):11323-34.
177. Ayyanathan K, Lechner MS, Bell P, Maul GG, Schultz DC, Yamada Y, et al. Regulated recruitment of HP1 to a euchromatic gene induces mitotically heritable, epigenetic gene silencing: a mammalian cell culture model of gene variegation. Genes Dev. 2003;17(15):1855-69.
178. Wolf G, Greenberg D, Macfarlan TS. Spotting the enemy within: Targeted silencing of foreign DNA in mammalian genomes by the Kruppel-associated box zinc finger protein family. Mob DNA. 2015;6:17.

179. Kim S, Kim D, Cho SW, Kim J, Kim JS. Highly efficient RNA-guided genome editing in human cells via delivery of purified Cas9 ribonucleoproteins. Genome Res. 2014;24(6):1012-9.
180. Bernstein DL, Le Lay JE, Ruano EG, Kaestner KH. TALE-mediated epigenetic suppression of CDKN2A increases replication in human fibroblasts. J Clin Invest. 2015;125(5):1998-2006.
181. Malissen B, Gregoire C, Malissen M, Roncagalli R. Integrative biology of T cell activation. Nat Immunol. 2014;15(9):790-7.
182. Carroll RG, Riley JL, Levine BL, Feng Y, Kaushal S, Ritchey DW, et al. Differential regulation of HIV-1 fusion cofactor expression by CD28 costimulation of CD4+ T cells. Science. 1997;276(5310):273-6.
183. Riley JL, Carroll RG, Levine BL, Bernstein W, St Louis DC, Weislow OS, et al. Intrinsic resistance to T cell infection with HIV type 1 induced by CD28 costimulation. J Immunol. 1997;158(11):5545-53.
184. Bleul CC, Wu L, Hoxie JA, Springer TA, Mackay CR. The HIV coreceptors CXCR4 and CCR5 are differentially expressed and regulated on human T lymphocytes. Proceedings of the National Academy of Sciences of the United States of America. 1997;94(5):1925-30.
185. Curradi M, Izzo A, Badaracco G, Landsberger N. Molecular mechanisms of gene silencing mediated by DNA methylation. Mol Cell Biol. 2002;22(9):3157-73.
186. Kuscu C, Arslan S, Singh R, Thorpe J, Adli M. Genome-wide analysis reveals characteristics of off-target sites bound by the Cas9 endonuclease. Nature biotechnology. 2014;32(7):677-83.
187. Wu X, Scott DA, Kriz AJ, Chiu AC, Hsu PD, Dadon DB, et al. Genome-wide binding of the CRISPR endonuclease Cas9 in mammalian cells. Nature biotechnology. 2014;32(7):670-6.
188. Barmania F, Pepper MS. C-C chemokine receptor type five (CCR5): An emerging target for the control of HIV infection. Appl Transl Genom. 2013;2:3-16.
189. Epigenetic Therapy of Cancer: Springer; 2013.
190. Ackermann AM, Wang Z, Schug J, Naji A, Kaestner KH. Integration of ATAC-seq and RNA-seq identifies human alpha cell and beta cell signature genes. Mol Metab. 2016;5(3):233-44.
191. Haynes BF, Burton DR. Developing an HIV vaccine. Science. 2017;355(6330):1129-30.
192. Yee JK. Off-target effects of engineered nucleases. FEBS J. 2016;283(17):3239-48.
193. Wang G, Zhao N, Berkhout B, Das AT. CRISPR-Cas9 Can Inhibit HIV-1 Replication but NHEJ Repair Facilitates Virus Escape. Molecular therapy : the journal of the American Society of Gene Therapy. 2016;24(3):522-6.
194. Maeder ML, Angstman JF, Richardson ME, Linder SJ, Cascio VM, Tsai SQ, et al. Targeted DNA demethylation and activation of endogenous genes using programmable TALE-TET1 fusion proteins. Nature biotechnology. 2013;31(12):1137-42.

149 Appendix

6. APPENDIX

6.1 LIST OF ABBREVIATIONS

Δ32 delta 32

5-AZA 5-Aza-2’-deoxycytidine

AAV adeno-associated virus

AIDS acquired immune deficiency syndrome

ART antiretroviral therapy

ATAC-seq Assay for Transposase-Accessible Chromatin using sequencing

bp base pair

Cas9 CRISPR-associated protein 9

CD4 cluster of differentiation 4

CCR1 C-C motif chemokine receptor 1

CCR2 C-C motif chemokine receptor 2

CCR3 C-C motif chemokine receptor 3

CCR5 C-C motif chemokine receptor 5

CCRL2 C-C motif chemokine receptor-like 2

ChIP chromatin immunoprecipitation

COSMID CRISPR Off-target Sites with Mismatches, Insertions, and Deletions

CpG 5’-C-phosphate-G-3’

CRISPR clustered regularly interspaced short palindromic repeats

crRNA CRISPR RNA

CXCR4 C-X-C motif chemokine receptor 4

dCas9 ‘dead’ Cas9

dDMT ‘dead’ designer methyl transferase

DEM designer epigenome modifier

DEPC diethyl pyrocarbonate

DMEM Dulbecco’s Modified Eagle Medium

DMT designer methyl transferase

DNA deoxyribonucleic acid

DNMT DNA methyl transferase

dNTP deoxynucleotide triphosphate

DPBS Dulbecco’s phosphate buffered saline

DSB double strand break

DTA designer transcription activator

DTR designer transcription repressor

EGFP enhanced green fluorescent protein

EPAP E. coli poly(A) polymerase

FACS fluorescence-activated cell sorting

FCS fetal calf serum

GFP green fluorescent protein

gp glycoprotein

gRNA guide RNA

HAT histone acetyltransferase

HDAC histone deacetylase

HEK human embryonic kidney

hESC human embryonic stem cell

HF high fidelity

hiPSC human induced pluripotent stem cell

HIV human immunodeficiency virus

HLA human leukocyte antigen

HMT histone methyltransferase

HR homologous recombination

IL interleukin

indel insertion/deletion

KRAB Krüppel-associated box

LB lysogeny broth

LTR long terminal repeat

MIP macrophage inflammatory protein

MOI multiplicity of infection

MOPS 3-(N-morpholino)propanesulfonic acid

mRNA messenger RNA

NGS next-generation sequencing

NHEJ non-homologous end joining

NuRD nucleosome remodelling and deacetylase

PAM protospacer adjacent motif

PBMC peripheral blood mononuclear cell

PCR polymerase chain reaction

PEI polyethylenimine

PIC pre-initiation complex

Pol II RNA polymerase II

PROGNOS Predicted Report Of Genome-wide Nuclease Off-target Sites

OT off-target

RANTES regulated upon activation, normal T cell expressed and secreted

RGEN RNA-guided endonuclease

RNA ribonucleic acid

RNAi RNA interference

RNA-seq RNA sequencing

RNP ribonucleoprotein

qRT-PCR quantitative real-time PCR

RVD repeat variable diresidue

SETDB1 SET domain bifurcated 1

ssRNA single-stranded RNA

TAE Tris-acetate-EDTA

TALE transcription activator-like effector

Taq Thermus aquaticus

TALEN transcription activator-like effector nuclease

TET ten-eleven translocation

tracrRNA trans-activating crRNA

TSS transcription start site

WHO World Health Organisation

ZFN zinc finger nuclease

ZFP zinc finger protein

6.2 LIST OF FIGURES

FIGURE 1.1: HIV PREVALENCE IN 2016
FIGURE 1.2: HIV LIFE CYCLE
FIGURE 1.3: GENOME EDITING WITH DESIGNER NUCLEASES
FIGURE 1.4: ZINC FINGER NUCLEASE STRUCTURE
FIGURE 1.5: TALEN STRUCTURE
FIGURE 1.6: CRISPR/CAS9 STRUCTURE
FIGURE 1.7: IN VIVO VS EX VIVO GENE EDITING
FIGURE 1.8: DESIGNER NUCLEASE-MEDIATED ACTIVATION AND REPRESSION
FIGURE 1.9: THE EPIGENETIC MECHANISMS AFFECTING GENE ACTIVITY
FIGURE 1.10: THE DNA METHYLTRANSFERASES
FIGURE 2.1: DEM CLONING SCHEMATIC
FIGURE 2.2: SUMMARY OF THE CHROMATIN IMMUNOPRECIPITATION WORKFLOW
FIGURE 2.3: ATAC-SEQ PRINCIPLE AND WORKFLOW
FIGURE 3.1: SCHEMATIC OF THE CCR5 GENE AND THE FIREFLY LUCIFERASE AND EGFP REPORTERS
FIGURE 3.2: EVALUATION OF THE FUNCTIONALITY OF THE DESIGNER TRANSCRIPTION ACTIVATORS (DTAS)
FIGURE 3.3: ESTABLISHMENT OF AN EGFP REPORTER CELL LINE
FIGURE 3.4: ASSESSING THE FUNCTIONALITY OF THE DESIGNER TRANSCRIPTION REPRESSORS (DTRS)
FIGURE 3.5: STRUCTURE OF THE DESIGNER EPIGENOME MODIFIERS (DEMS)
FIGURE 3.6: IN VITRO TRANSCRIPTION OF DEM MRNA
FIGURE 3.7: ASSESSING THE STABILITY AND POTENCY OF DESIGNER EPIGENOME MODIFIER (DEM)-INDUCED SILENCING
FIGURE 3.8: ASSESSING THE EFFECT OF DNA VS MRNA DEM DELIVERY
FIGURE 3.9: DNA METHYLATION ANALYSIS IN A REPORTER CELL LINE
FIGURE 3.10: DEM-INDUCED SILENCING OF THE ENDOGENOUS CXCR4 GENE
FIGURE 3.11: T CELL LONG-TERM CULTURE CONDITIONS AND NUCLEOFECTION
FIGURE 3.12: FUNCTIONALITY OF THE DEMS IN CD4+ T CELLS
FIGURE 3.13: ASSESSING SIMULTANEOUS TARGETING OF THE CCR5 AND CXCR4 GENES USING DEMS
FIGURE 3.14: METHYLATION ANALYSIS AT THE CCR5 AND CXCR4 GENES IN CD4+ CELLS
FIGURE 3.15: ASSESSING DEM-INDUCED HISTONE MODIFICATIONS IN CD4+ CELLS
FIGURE 3.16: ASSESSING THE EFFECTS OF DEM-INDUCED SILENCING AT CCR5-NEIGHBOURING GENES
FIGURE 3.17: ASSESSING THE SPECIFICITY PROFILE OF DEMS IN CD4+ CELLS
FIGURE 3.18: SUMMARY OF THE DEM-SPECIFICITY PROFILE EVALUATION

6.3 LIST OF TABLES

TABLE 2.1: LIST OF CCR5 AND CXCR4 TARGET SITES ...... 49
TABLE 2.2: PRIMERS USED TO GENERATE THE DEM CONSTRUCTS ...... 52
TABLE 2.3: PRIMERS USED TO GENERATE THE FIREFLY LUCIFERASE AND EGFP REPORTERS ...... 54
TABLE 2.4: DTR TRANSFECTION SCHEME ...... 57
TABLE 2.5: DUAL LUCIFERASE ASSAY TRANSFECTION SCHEME ...... 58
TABLE 2.6: VIRAL TITRATION SCHEME ...... 60
TABLE 2.7: RECOMMENDED CULTURING CONDITIONS FOR CD4+ CELLS ...... 65
TABLE 2.8: PRIMERS USED FOR BISULFITE SEQUENCING ANALYSIS ...... 68
TABLE 2.9: PRIMERS USED FOR CCR5 NGS ON-TARGET ANALYSIS ...... 70
TABLE 2.10: PRIMERS USED FOR CCR5 NGS OFF-TARGET ANALYSIS ...... 71
TABLE 2.11: TAQMAN ASSAYS USED FOR GENE EXPRESSION ANALYSES ...... 73
TABLE 2.12: PRIMERS USED FOR CHIP QRT-PCR ANALYSIS ...... 76
TABLE 3.1: LIST OF POTENTIAL OFF-TARGET SITES IDENTIFIED WITH TAL EFFECTOR NUCLEOTIDE TARGETER 2.0 ...... 113
TABLE 6.1: RESULTS OF RNA-SEQ ANALYSIS. (84 GENES DE-REGULATED) ...... 160
TABLE 6.2: POTENTIAL DEM #6 OFF-TARGET BINDING SITE (WITH UP TO THREE MISMATCHES) ...... 162
TABLE 6.3: RESULTS OF ATAC-SEQ ANALYSIS. (325 SITES WITH REDUCED CHROMATIN ACCESSIBILITY) ...... 188
TABLE 6.4: TRANSCRIPTION START SITES (TSS) CLOSEST TO THE DIFFERENTIALLY ACCESSIBLE REGIONS IDENTIFIED VIA ATAC-SEQ ...... 195
TABLE 6.5: OVERLAP BETWEEN ATAC-SEQ AND POTENTIAL OFF-TARGET SITES ...... 197

6.4 SUPPLEMENTARY MATERIALS AND METHODS

Lysogeny broth (LB)

Ten grams of bactotryptone, 5 g of yeast extract and 10 g of NaCl were dissolved in 1 litre of deionised water. The mixture was autoclaved then cooled to room temperature.

Agar plates

Ten grams of bactotryptone, 5 g of yeast extract, 10 g of NaCl and 10 g of bacteriological agar were dissolved in 1 litre of deionised water. The solution was autoclaved, then ampicillin or kanamycin was added to a final concentration of 100 µg/ml or 50 µg/ml, respectively. The agar was poured into petri dishes and allowed to solidify at room temperature.

6x loading dye

The following were combined in a 50 ml Falcon tube: 100 mg Orange G (Roth, Karlsruhe, Germany), 10 ml glycerol and 40 ml dH2O. After thorough mixing, appropriate aliquots were made.

Tris-acetate-EDTA (TAE) buffer

To make 1 litre of 0.5 M EDTA, 186.1 g of disodium EDTA were added to 800 ml of deionised water and the solution was stirred vigorously on a magnetic stirrer. The pH was adjusted to 8 with NaOH pellets, then the volume was made up to 1 litre with deionised water.

To make 1 litre of 50× TAE buffer, Tris base (242 g) was added to 57.1 ml of glacial acetic acid, then 100 ml of 0.5 M EDTA (pH 8) was added and the volume was made up to 1 litre with deionised water. One litre of 1× TAE buffer was made by adding 20 ml of 50× TAE buffer to 980 ml of deionised water.
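The working dilution of TAE follows the usual C1 × V1 = C2 × V2 relation. The sketch below is only illustrative: the helper name `dilute_stock` is hypothetical and not part of the original protocol. It shows how the stock and water volumes could be computed for other batch sizes.

```python
def dilute_stock(stock_x: float, final_x: float, final_volume_ml: float) -> tuple:
    """Return (stock_ml, water_ml) needed to dilute a concentrated stock.

    Based on C1 * V1 = C2 * V2, with concentrations given as fold values
    (for example 50 for a 50x stock and 1 for a 1x working solution).
    """
    if stock_x <= 0 or final_x <= 0 or final_volume_ml <= 0:
        raise ValueError("all inputs must be positive")
    if final_x > stock_x:
        raise ValueError("final concentration cannot exceed the stock concentration")
    stock_ml = final_x * final_volume_ml / stock_x
    water_ml = final_volume_ml - stock_ml
    return stock_ml, water_ml


if __name__ == "__main__":
    # One litre of 1x TAE from a 50x stock, as in the recipe.
    stock_ml, water_ml = dilute_stock(stock_x=50, final_x=1, final_volume_ml=1000)
    print(f"{stock_ml:g} ml of 50x TAE + {water_ml:g} ml of deionised water")
```

For one litre of 1× buffer this reproduces the 20 ml of stock and 980 ml of water given in the recipe.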

Agarose gel

To make a 1% gel, 0.5 g of agarose was added to 50 ml of 1× TAE (tris-acetate-EDTA) buffer and heated in a microwave until the agarose had completely dissolved. The agarose was left to cool, then ethidium bromide (10 mg/ml) was added. The mixture was poured into a gel cast and the gel was allowed to solidify at room temperature. The amounts of agarose and buffer were adjusted appropriately for different gel sizes and percentages.
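Adjusting the recipe for other gel sizes and percentages is a weight-per-volume calculation, where 1% corresponds to 1 g per 100 ml of buffer. A minimal sketch follows; the function name `agarose_mass_g` is an assumption made for illustration, and the 2% gel in the usage lines is a made-up example rather than a condition taken from the thesis.

```python
def agarose_mass_g(percent_w_v: float, buffer_volume_ml: float) -> float:
    """Grams of agarose for a gel of the given percentage (w/v).

    A 1% (w/v) gel contains 1 g of agarose per 100 ml of buffer.
    """
    if percent_w_v <= 0 or buffer_volume_ml <= 0:
        raise ValueError("percentage and volume must be positive")
    return percent_w_v * buffer_volume_ml / 100.0


if __name__ == "__main__":
    # The 1% gel in 50 ml of buffer described in the recipe.
    print(agarose_mass_g(1.0, 50.0))
    # Hypothetical larger, denser gel: 2% in 100 ml of buffer.
    print(agarose_mass_g(2.0, 100.0))
```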

RNA gel

To make 1 litre of DEPC water, 1 ml of diethyl pyrocarbonate (Sigma, St. Louis, USA) was added to 1 litre of Millipore water. The solution was mixed using a magnetic stirrer for at least 1 hour then autoclaved.

To make a 1% gel, 1 g of agarose was added to 75 ml of DEPC water and heated in a microwave, then the mixture was cooled to 60°C. Once cooled, 18 ml of formaldehyde solution (Sigma, St. Louis, USA) was added, followed by 10 ml of NorthernMax 10X MOPS Running Buffer (Ambion, Austin, USA). The mixture was immediately poured into a gel cast and left to solidify for 1 hour.

MOPS buffer

1X MOPS buffer was prepared using NorthernMax 10X MOPS Running Buffer (Ambion, Austin, USA) and Millipore water.

1.33X Gibson Master Mix (assembly mixture)

To make 6 ml of 5X isothermal buffer, the following were combined: 3 ml of 1 M Tris-HCl pH 7.5, 150 µl of 2 M MgCl2, 60 µl of 100 mM dGTP, 60 µl of 100 mM dATP, 60 µl of 100 mM dTTP, 60 µl of 100 mM dCTP, 300 µl of 1 M DTT, 1.5 g PEG-8000, 300 µl of 100 mM NAD and water to 6 ml.

To make the 1.33X master mix, the following were combined: 50 µl of Taq ligase (40 U/µl), 100 µl of 5X isothermal buffer, 2 µl of T5 exonuclease (1 U/µl), 6.25 µl of Phusion polymerase (2 U/µl) and 216.75 µl of nuclease-free water.
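The "1.33X" in the name of the mix can be checked from the listed volumes. The sketch below is only a bookkeeping aid with hypothetical names (`VOLUMES_UL`, `final_fold`). It sums the component volumes and computes the final fold concentration of the isothermal buffer.

```python
# Component volumes in microlitres, copied from the recipe.
VOLUMES_UL = {
    "Taq ligase (40 U/ul)": 50.0,
    "5X isothermal buffer": 100.0,
    "T5 exonuclease (1 U/ul)": 2.0,
    "Phusion polymerase (2 U/ul)": 6.25,
    "nuclease-free water": 216.75,
}


def total_volume_ul(volumes: dict) -> float:
    """Total volume of the mix in microlitres."""
    return sum(volumes.values())


def final_fold(stock_fold: float, component_ul: float, total_ul: float) -> float:
    """Fold concentration of a component after dilution into the total volume."""
    return stock_fold * component_ul / total_ul


if __name__ == "__main__":
    total = total_volume_ul(VOLUMES_UL)
    buffer_fold = final_fold(5.0, VOLUMES_UL["5X isothermal buffer"], total)
    print(f"total volume: {total:g} ul")
    print(f"isothermal buffer: {buffer_fold:.2f}X")
```

With the listed volumes the total is 375 µl, so the 5X buffer ends up at about 1.33X, which matches the name of the mix.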

Polyethylenimine (PEI)

To make 1 litre, 0.1 g of PEI and 8.77 g of NaCl (150 mM final concentration) were combined with 950 ml of deionised H2O and the solution was mixed until the PEI had completely dissolved. The pH was adjusted to 5.5 with HCl, then the volume was made up to 1 litre with deionised water and the solution was sterilised by filtration.
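The NaCl mass follows from mass = molarity × volume × molar mass. As a hedged illustration, the helper `solute_mass_g` is hypothetical, and the molar mass of NaCl (58.44 g/mol) is a standard value that the text does not state.

```python
NACL_MOLAR_MASS = 58.44  # g/mol; standard value, assumed here (not given in the text)


def solute_mass_g(molarity_mol_per_l: float, volume_l: float,
                  molar_mass_g_per_mol: float) -> float:
    """Mass of solute (in grams) needed for a solution of the given molarity and volume."""
    if molarity_mol_per_l < 0 or volume_l <= 0 or molar_mass_g_per_mol <= 0:
        raise ValueError("inputs must be positive (molarity may be zero)")
    return molarity_mol_per_l * volume_l * molar_mass_g_per_mol


if __name__ == "__main__":
    # 150 mM NaCl in 1 litre, as in the PEI recipe.
    mass = solute_mass_g(0.150, 1.0, NACL_MOLAR_MASS)
    print(f"NaCl required: {mass:.2f} g")
```

For 150 mM in 1 litre this gives about 8.77 g, in agreement with the amount stated in the recipe.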

6.5 SUPPLEMENTARY TABLES

Table 6.1: Results of RNA-seq analysis. (84 genes de-regulated)

Table 6.2: Potential DEM #6 off-target binding site (with up to three mismatches)

Columns: Potential off-target site¹; Number of mismatches; Genomic location² (Chromosome, Start, End, Strand)

TGACCATATACTTATGTCATGT 0 Chr3 46.411.595 46.411.616 +

TGAACATATACTTATGTCATCT 1 Chr4 165.401.211 165.401.232 +

TGACCATATCCTTATGGCACAA 2 Chr1 72.084.753 72.084.774 -

TGACCAAATACTTAGGTCACAA 2 Chr1 81.197.612 81.197.633 +

TGACCATATGTTTATGTCAGCC 2 Chr1 115.995.413 115.995.434 -

TGACCATGTACTTATGTAACCC 2 Chr10 47.982.402 47.982.423 +

TGACCATGTACTTATGTAACCC 2 Chr10 51.927.697 51.927.718 +

TGACCATGTACTTATGTAACCC 2 Chr10 52.532.086 52.532.107 -

TGGCCATATACTTAAGTCAACC 2 Chr10 65.030.745 65.030.766 +

TGACCATATTCTTATCTCAGAT 2 Chr10 97.447.765 97.447.786 -

TGAGCATATACTGATGTCAGAC 2 Chr10 109.301.891 109.301.912 -

TGACTATATACTTATGTCTGGA 2 Chr11 27.732.515 27.732.536 +

TGACCATCTATTTATGTCAGTA 2 Chr11 127.541.818 127.541.839 -

TGAACATATACTTATCTCAAGA 2 Chr12 3.076.664 3.076.685 +

TGACCATATCCTTATTTCACTT 2 Chr12 12.967.242 12.967.263 -

TGATCATATACTTCTGTCACAT 2 Chr12 52.015.830 52.015.851 +

TGAGAATATACTTATGTCAAGG 2 Chr12 66.048.068 66.048.089 +

AGACAATATACTTATGTCATTT 2 Chr12 126.201.878 126.201.899 -

TGACCAAATTCTTATGTCAGAT 2 Chr13 36.353.811 36.353.832 +

TGAGCATCTACTTATGTCATCG 2 Chr13 44.968.765 44.968.786 +

TGACCTTATACTTCTGTCAAAA 2 Chr13 48.720.381 48.720.402 -

TTACCATATACTTTTGTCAATC 2 Chr16 61.077.413 61.077.434 -

TGACCCTCTACTTATGTCATGT 2 Chr17 14.261.495 14.261.516 +

TGACCATATACCTATCTCAAGA 2 Chr18 779.683 779.704 +

TTACAATATACTTATGTCATTT 2 Chr18 19.371.552 19.371.573 +

TTACCATATAGTTATGTCACTA 2 Chr18 76.945.488 76.945.509 -

TGACCATATACGTATGTAACAG 2 Chr2 47.956.567 47.956.588 +

TGACAATATACTTATGACAATA 2 Chr2 224.060.353 224.060.374 -

TGACCATATAGTTATGTAAGCT 2 Chr20 12.544.854 12.544.875 -

TGTCCATATACATATGTCATAC 2 Chr3 128.677.399 128.677.420 -

TGTCCATATATTTATGTCATCC 2 Chr3 178.293.606 178.293.627 +

TGACCGTATACATATGTCAAAA 2 Chr3 186.686.208 186.686.229 -

TCACTATATACTTATGTCATTA 2 Chr4 34.747.592 34.747.613 -

TGATCATATCCTTATGTCATAT 2 Chr4 106.456.883 106.456.904 -

TGACCATAAACTTATTTCACTC 2 Chr4 119.610.622 119.610.643 -

TGACCATACACTGATGTCAAAC 2 Chr5 13.083.831 13.083.852 -

TGACCATATTCTTATGTAACCC 2 Chr5 35.132.581 35.132.602 +

TGACCATTTGCTTATGTCAAGC 2 Chr5 160.357.986 160.358.007 +

TGACCATACATTTATGTCATAA 2 Chr6 89.528.614 89.528.635 +

TGACTATATATTTATGTCAAAC 2 Chr6 107.362.597 107.362.618 +

TAACCATATACTTATCTCATCT 2 Chr7 20.121.799 20.121.820 +

TGACCAAATATTTATGTCATAT 2 Chr7 84.533.770 84.533.791 +

TGACCATATTCTTATGTCTTCC 2 Chr7 134.102.729 134.102.750 +

TTACCATATACATATGTCAAGA 2 Chr8 31.056.581 31.056.602 +

TGATCATATAGTTATGTCATAC 2 Chr8 33.669.235 33.669.256 +

TGACCATCTACTTGTGTCACTT 2 Chr8 52.364.466 52.364.487 -

TGATTATATACTTATGTCAACT 2 Chr8 111.862.001 111.862.022 -

TGACCACATACTGATGTCATTA 2 Chr8 116.567.819 116.567.840 -

TGAGCATATACTTATGTAAGCT 2 Chr9 29.004.177 29.004.198 -

TGACCATATACATGTGTCATTT 2 ChrX 17.017.334 17.017.355 +

TGATCATATACTTTTGTCAAAT 2 ChrX 98.322.767 98.322.788 -

TGTCCATATACATATGTCATGA 2 ChrX 99.230.790 99.230.811 +

TGACCATATACTTTTGTGATAG 2 ChrX 119.842.107 119.842.128 -

TGACCATATAATTATGCCACTG 2 ChrY 17.815.850 17.815.871 -

TGACCATATAATCATCTCAACA 3 Chr1 7.336.191 7.336.212 +

TGACCATATACGTTAGTCAAAT 3 Chr1 20.667.497 20.667.518 -

TAAACATATAATTATGTCAATT 3 Chr1 21.253.449 21.253.470 -

TAACCAAATACTTATTTCAACT 3 Chr1 21.485.031 21.485.052 +

AGACCATGTACCTATGTCAAGG 3 Chr1 28.077.336 28.077.357 -

TGAACATATACTTTAGTCACTA 3 Chr1 34.059.505 34.059.526 -

TGACCATCTACCCATGTCAGTG 3 Chr1 39.360.822 39.360.843 -

TGAGTATATACATATGTCAAGA 3 Chr1 41.609.289 41.609.310 -

TGACAATAAACTTATTTCACAG 3 Chr1 44.281.368 44.281.389 +

TGACTATATTCTTTTGTCAAAA 3 Chr1 45.662.453 45.662.474 -

TGACCATATATTTAAGTCCACC 3 Chr1 48.999.303 48.999.324 +

TGAACATATATATATGTCAGAG 3 Chr1 52.535.151 52.535.172 +

TGACCCTAGACTTATGACATGC 3 Chr1 56.012.090 56.012.111 -

TCACCATCTACTTCTGTCAATA 3 Chr1 56.391.597 56.391.618 -

TTACCATCTACTTATGGCAAGT 3 Chr1 60.615.651 60.615.672 -

TGGTCATATACTAATGTCATTG 3 Chr1 67.240.748 67.240.769 -

TGACATTATAGTTATGTCAGTT 3 Chr1 76.902.563 76.902.584 -

TTACCATATATTTATTTCAGTA 3 Chr1 77.062.961 77.062.982 +

TATACATATACTTATGTCATGG 3 Chr1 83.446.626 83.446.647 +

TTACCATTTACTTATGTAAATG 3 Chr1 85.377.443 85.377.464 -

TGACTATAGACTTTTGTCATCC 3 Chr1 91.553.355 91.553.376 +

TGACCATCTAGTTATGACATAA 3 Chr1 91.674.194 91.674.215 -

CTACCTTATACTTATGTCAATT 3 Chr1 93.386.994 93.387.015 -

TCACCAGATACTTATGTTAGCC 3 Chr1 96.979.631 96.979.652 +

TGACCATAGGCTTAGGTCACAG 3 Chr1 101.022.324 101.022.345 -

TGACCTTAGAATTATGTCATAT 3 Chr1 104.119.150 104.119.171 -

TGACAATATACTAATGTGACAA 3 Chr1 107.150.509 107.150.530 +

TGGGTATATACTTATGTCAAAA 3 Chr1 111.075.150 111.075.171 -

TGACCATATTTTTATTTCAGGT 3 Chr1 115.280.400 115.280.421 +

TGAGCATTTAATTATGTCAGGG 3 Chr1 154.815.150 154.815.171 -

TGATAATATATTTATGTCACAG 3 Chr1 158.813.376 158.813.397 -

TGAACATCTGCTTATGTCAGAG 3 Chr1 160.204.051 160.204.072 -

TGACAATATACTTATTTCTTCC 3 Chr1 162.580.671 162.580.692 -

TAACCTTATACTTATTTCAGCT 3 Chr1 167.361.657 167.361.678 -

TGACCAGATCTTTATGTCATAA 3 Chr1 168.035.637 168.035.658 -

TGACCATATAGGTTTGTCAGAA 3 Chr1 168.182.991 168.183.012 +

TGACCAAACACTGATGTCACCA 3 Chr1 181.060.661 181.060.682 -

TGACCACAGGCTTATGTCACAA 3 Chr1 191.224.557 191.224.578 -

TTACCTTATACTGATGTCATTT 3 Chr1 191.792.805 191.792.826 -

TGAGCATATACTTAGGTAAAAT 3 Chr1 191.825.521 191.825.542 +

TGACTATATATTTTTGTCAGAC 3 Chr1 192.003.333 192.003.354 +

TGAATATATAATTATGTCAAAA 3 Chr1 192.303.073 192.303.094 +

TGAATATTTACTTATGTCATCT 3 Chr1 193.523.139 193.523.160 +

TGACCAAATAATTATTTCATAA 3 Chr1 198.650.361 198.650.382 -

TGAGCATATACATATATCAGTA 3 Chr1 198.666.693 198.666.714 -

TAACAATATACTTATGTAAATA 3 Chr1 198.692.037 198.692.058 +

TGACCATTTACTTATTTCTTCT 3 Chr1 198.696.435 198.696.456 +

TGACCACATATTTATGCCACTA 3 Chr1 207.194.595 207.194.616 -

TGTCCCTATACTTATGTCCATG 3 Chr1 208.662.671 208.662.692 +

TGGCCAAATACTGATGTCACGT 3 Chr1 210.360.271 210.360.292 +

TGTCCATATACTTCTGACATTC 3 Chr1 211.766.031 211.766.052 +

TTACCATATATTTATGCCATAT 3 Chr1 215.410.701 215.410.722 +

TGAATATATAATTATGTCATCT 3 Chr1 215.561.213 215.561.234 -

TGATCATATATTTATTTCAGCT 3 Chr1 217.770.267 217.770.288 +

TGTCCATCTACTTATTTCATTT 3 Chr1 221.753.946 221.753.967 -

TGAGAATATACTTATGCCATAT 3 Chr1 222.732.321 222.732.342 -

TGACATTATACTTAAGTCACCA 3 Chr1 224.284.198 224.284.219 +

TGTCCATGTACTGATGTCATGC 3 Chr1 228.480.526 228.480.547 +

TGACCAAATATTTATGGCATCA 3 Chr1 229.098.926 229.098.947 +

GGACCATATACTTAAGGCACTC 3 Chr1 229.696.825 229.696.846 -

TGGCCATACATTTATGTCAAAA 3 Chr1 231.790.022 231.790.043 +

TGCCCATATGCTCATGTCATTC 3 Chr1 232.024.989 232.025.010 -

TGCCCATATAATTATATCACTA 3 Chr1 233.353.112 233.353.133 +

TGAAAATATAGTTATGTCATGT 3 Chr1 236.169.241 236.169.262 +

AGACCATATAATGATGTCAGAA 3 Chr1 239.492.680 239.492.701 -

CTGCCATATACTTATGTCACAG 3 Chr1 243.270.377 243.270.398 -

TAAACATATACATATGTCACAT 3 Chr10 1.807.732 1.807.753 +

TAATCATATACTTATGACATGC 3 Chr10 3.542.159 3.542.180 +

TACCCATATACTTGTGTCATCA 3 Chr10 9.569.397 9.569.418 +

TGAGCTTATAGTTATGTCAACA 3 Chr10 9.707.644 9.707.665 -

TGAGCATATATTTAAGTCAGCC 3 Chr10 15.466.911 15.466.932 -

TAACCATATACTAATGGCATGG 3 Chr10 20.232.449 20.232.470 +

TACCCATATACTTAAGTCAAAG 3 Chr10 27.467.551 27.467.572 -

TGACCATCTAGTTTTGTCAAAT 3 Chr10 46.657.393 46.657.414 -

TGACCATCTAGTTTTGTCAAAT 3 Chr10 48.844.294 48.844.315 -

TGACCATCTAGTTTTGTCAAAT 3 Chr10 49.364.859 49.364.880 -

TGACCAAAGACTTATGTGAGTG 3 Chr10 55.543.335 55.543.356 +

TGCCCATATACTTAAGTCTGTT 3 Chr10 55.664.921 55.664.942 +

TGTTCATATATTTATGTCATTA 3 Chr10 56.369.676 56.369.697 +

TAACCAAATGCTTATGTCAAAT 3 Chr10 59.042.691 59.042.712 -

TGACCATTAAATTATGTCATGG 3 Chr10 59.872.587 59.872.608 +

TGACCATAGGCTTAGGTCACAA 3 Chr10 61.916.512 61.916.533 +

GGACCATATGTTTATGTCACCT 3 Chr10 67.952.684 67.952.705 +

TGACCATGTACTTATTTGACAT 3 Chr10 69.492.853 69.492.874 -

TGAGCATATTCTCATGTCATTA 3 Chr10 77.475.306 77.475.327 +

TGACCATACATTTCTGTCAAAT 3 Chr10 77.879.299 77.879.320 +

TGGCCCTATAGTTATGTCACAG 3 Chr10 79.021.168 79.021.189 -

TGACCATAGACTTATGGCTTCA 3 Chr10 87.033.428 87.033.449 -

TTACCATATACATTTGTCAGAG 3 Chr10 87.498.689 87.498.710 +

TCACCATCTACTTATGTGAATA 3 Chr10 89.607.484 89.607.505 -

TGACCATATACTTATATTCCAG 3 Chr10 94.034.264 94.034.285 -

TGACCTAATACTTAAGTCAGAA 3 Chr10 95.268.025 95.268.046 -

TGACCCAATAATTATGTCAGTG 3 Chr10 98.716.236 98.716.257 -

TGACCATGTATTTAGGTCACAA 3 Chr10 115.330.884 115.330.905 +

TGCACATATACTTATGTAACCT 3 Chr10 117.075.546 117.075.567 -

TGCCCATATACTTCTTTCAAAC 3 Chr10 118.100.532 118.100.553 -

TGACCATATAATTAAGTAAATA 3 Chr10 119.824.463 119.824.484 -

TGACCTTGTACTCATGTCATTT 3 Chr10 126.153.431 126.153.452 -

TGAACATATAATTATGTAAAAC 3 Chr10 126.384.319 126.384.340 -

TGACTATATACTTATATCTAAT 3 Chr10 129.443.426 129.443.447 +

GGACCTTATTCTTATGTCACTG 3 Chr10 130.232.611 130.232.632 +

TGACCATATACTCAGGACAGGA 3 Chr11 2.300.179 2.300.200 +

TGATCATATATTTTTGTCATTC 3 Chr11 5.835.137 5.835.158 -

TGAGCATATACTTCTCTCAAAG 3 Chr11 9.598.697 9.598.718 -

TGAGCATATATTTATGCCAAGC 3 Chr11 15.640.238 15.640.259 -

TGACCTTATGCTTATGTGAAAA 3 Chr11 16.463.904 16.463.925 +

TGACAATATAAATATGTCATTA 3 Chr11 24.807.066 24.807.087 -

TGAGCATATAATTATGCCAAAA 3 Chr11 27.084.858 27.084.879 -

TTACCATAAACTCATGTCACTC 3 Chr11 28.044.720 28.044.741 +

GGATCATATACTTATTTCAATG 3 Chr11 37.411.140 37.411.161 +

TGTCCATATACTTTTGTCTGGA 3 Chr11 38.148.732 38.148.753 -

TGACCATAGCCTTAGGTCACAA 3 Chr11 40.786.463 40.786.484 +

TGAACATATACATATGTAAACC 3 Chr11 41.836.895 41.836.916 +

TGTCCATATACTTATGGTAAGT 3 Chr11 49.191.042 49.191.063 +

TGCTCAGATACTTATGTCAAAG 3 Chr11 56.365.252 56.365.273 -

TGCCCAGATACTTAAGTCATAA 3 Chr11 71.048.976 71.048.997 -

TGACCACAAACTGATGTCAGAT 3 Chr11 80.656.743 80.656.764 -

TGACCAAATACTTAGATCACAA 3 Chr11 84.656.265 84.656.286 -

TGACCATTTATTTAAGTCACCA 3 Chr11 85.035.173 85.035.194 +

TTACCATATACATTTGTCAAAA 3 Chr11 87.538.927 87.538.948 -

TGATCATATATTTATTTCAGGG 3 Chr11 89.153.809 89.153.830 -

TGTCCATATACTTATGGTAAGT 3 Chr11 89.409.020 89.409.041 -

TGACCACTTACTTGTGTCACAG 3 Chr11 92.217.204 92.217.225 -

TGACCATCTACTCAGGTCAATA 3 Chr11 93.297.803 93.297.824 -

TTTCCATATACTTATCTCAAAG 3 Chr11 95.495.658 95.495.679 -

TGGCCATATACTCATGTAACTG 3 Chr11 98.698.553 98.698.574 -

GCACCACATACTTATGTCAATC 3 Chr11 103.029.958 103.029.979 -

TGAACATATATTTATTTCAAAG 3 Chr11 104.211.239 104.211.260 +

TGACCAGAGACTAATGTCATAA 3 Chr11 105.139.011 105.139.032 +

TCACCATATACCTATGTGAAAG 3 Chr11 107.545.139 107.545.160 -

TGAACATATAATTATCTCATAA 3 Chr11 108.223.850 108.223.871 +

TGACCATAAAATTCTGTCAGCT 3 Chr11 110.173.089 110.173.110 -

TGAGTATATACATATGTCAAAA 3 Chr11 112.508.566 112.508.587 -

TGACCATATACATCTGGCAGGA 3 Chr11 117.422.065 117.422.086 -

TGGCCATATGATTATGTCATCA 3 Chr11 118.755.129 118.755.150 +

TGTACATACACTTATGTCATTA 3 Chr11 121.293.139 121.293.160 +

TGAGCATTTACTTATTTCACCA 3 Chr11 121.427.689 121.427.710 -

TAACCATAAAATTATGTCAGGA 3 Chr11 123.758.679 123.758.700 -

TGAACATATACTTATAACAGGA 3 Chr11 130.252.106 130.252.127 -

TGACCATATATGTATGTCTTTG 3 Chr11 133.397.408 133.397.429 -

TGACCATATTCTTATGGCCTTC 3 Chr11 133.443.702 133.443.723 -

TGAACATTTACTTATGTAATTG 3 Chr12 14.727.382 14.727.403 +

TGACAACATACTTATGTTACTT 3 Chr12 15.389.726 15.389.747 +

TGACCATATACTTTTCACAGAA 3 Chr12 16.871.213 16.871.234 -

TGACCAAATACTTAGATCACAC 3 Chr12 17.870.726 17.870.747 -

TGACAATATACTTTTATCAGAG 3 Chr12 20.467.956 20.467.977 +

TGACCAAATACTTCTGCCACTT 3 Chr12 20.628.015 20.628.036 +

TGTCCATATTCTTATTTCATTT 3 Chr12 27.553.833 27.553.854 -

AGATCATTTACTTATGTCATTT 3 Chr12 28.747.966 28.747.987 -

TCACCATATAATTATGGCAATT 3 Chr12 28.853.737 28.853.758 -

TGACTATTTTCTTATGTCATTA 3 Chr12 29.525.924 29.525.945 -

TTACCATATACTTACATCAGTT 3 Chr12 31.457.573 31.457.594 -

TGACAATATTCTTATGTTATAT 3 Chr12 33.106.790 33.106.811 +

TGACCATATATACATGTCACTT 3 Chr12 34.257.318 34.257.339 -

TGATCATATATATATGTCACTT 3 Chr12 38.869.609 38.869.630 -

TGACCATTTCCTTATGACACCT 3 Chr12 41.090.913 41.090.934 +

TGAGAATATAGTTATGTCATTT 3 Chr12 43.831.434 43.831.455 +

TGAGCATAGACTTATGTAAATA 3 Chr12 45.512.101 45.512.122 +

TGACCATAAACTGAGGTCATCT 3 Chr12 45.621.183 45.621.204 -

TGTCCATGTAATTATGTCACTG 3 Chr12 45.840.517 45.840.538 +

TGACTATATACTGATGTAAAAG 3 Chr12 46.871.563 46.871.584 -

TCATCATATACTTAAGTCACAT 3 Chr12 52.673.458 52.673.479 -

TGAGAATATATTTATGTCACCT 3 Chr12 55.100.598 55.100.619 +

TGGCCATAAAATTATGTCATAG 3 Chr12 61.400.229 61.400.250 -

TTACCATATATTTATGTAAAGC 3 Chr12 61.979.644 61.979.665 +

TGATCATTTACTTGTGTCAGCC 3 Chr12 62.071.418 62.071.439 -

TGTTCATATACTTATGCCACTG 3 Chr12 62.865.584 62.865.605 -

GGACCATATACTTATCTCTCAC 3 Chr12 65.442.739 65.442.760 -

TGACAATATATTTATGTAAATA 3 Chr12 67.654.981 67.655.002 -

TGACCATATATTATTGTCACCA 3 Chr12 68.091.772 68.091.793 +

TGAACATAAACTTATCTCATCA 3 Chr12 68.423.665 68.423.686 -

TGACTAGATACTTATGTCCTAA 3 Chr12 70.396.021 70.396.042 -

TGACCAGATACTACTGTCATGC 3 Chr12 70.756.121 70.756.142 +

TGATAATATACTTCTGTCAAAA 3 Chr12 73.823.422 73.823.443 -

TCACCATATACTGATTTCAGGC 3 Chr12 74.116.656 74.116.677 +

TGACCATAGTCTTATGTTAGAG 3 Chr12 75.382.381 75.382.402 +

TGACAATATGCTTATGTTAAAA 3 Chr12 79.378.535 79.378.556 +

TGACTATATCCTTATTTCATAC 3 Chr12 79.463.478 79.463.499 -

TGAACATATAATTATCTCAAAA 3 Chr12 91.032.557 91.032.578 +

TGAGCATATGCTTCTGTCATGT 3 Chr12 98.695.219 98.695.240 -

TGAGCATTTACTTATGTCCCAA 3 Chr12 99.465.020 99.465.041 -

TGACTAAATACTTATATCAGTA 3 Chr12 99.755.687 99.755.708 +

TGAGTATATACTTATGCCAGAA 3 Chr12 100.142.329 100.142.350 +

TGAGTATATACTTATGTCGAAA 3 Chr12 101.429.243 101.429.264 +

TGACCAAATGCTTATGTCCCCC 3 Chr12 106.113.246 106.113.267 -

AGACCATATAATTATGTCCTGT 3 Chr12 107.522.138 107.522.159 -

TGAGCATATATTTATTTCATGT 3 Chr12 115.625.133 115.625.154 -

TCACCAAATACTTATTTCATGC 3 Chr12 117.782.083 117.782.104 -

TGACCATATATATTTGTCAAAA 3 Chr12 122.780.824 122.780.845 +

AGAGGATATACTTATGTCAAAT 3 Chr12 127.834.251 127.834.272 +

TTACCATATAGTTTTGTCAACC 3 Chr12 127.898.445 127.898.466 -

TGCCCATATACTCATGTTAACT 3 Chr12 128.843.381 128.843.402 +

TGACCATGTATTTATTTCATAA 3 Chr13 26.195.995 26.196.016 +

TAACCACATACTCATGTCAGGC 3 Chr13 28.169.560 28.169.581 -

CGACCATATAATTATTTCAATA 3 Chr13 31.657.142 31.657.163 +

TTACCATAAACTTCTGTCAGTG 3 Chr13 33.242.609 33.242.630 -

TGACCATATAATTATTTCTTGT 3 Chr13 38.649.278 38.649.299 +

TTACCATCTACTTATGGCAAAG 3 Chr13 39.073.822 39.073.843 -

TCAGCAAATACTTATGTCAAAC 3 Chr13 45.380.581 45.380.602 -

TAACAATATACTTATTTCATAG 3 Chr13 49.468.476 49.468.497 +

TGACCATCTACTAGTGTCACTC 3 Chr13 50.417.728 50.417.749 -

TAAGCATATACTTATGTGACCC 3 Chr13 50.543.703 50.543.724 +

TGAACATGTACTTATGTGATTC 3 Chr13 54.447.189 54.447.210 +

TCACTATATACTCATGTCACCA 3 Chr13 70.255.280 70.255.301 -

TGACCGTGTATTTATGTCAAAT 3 Chr13 71.679.052 71.679.073 -

TGACCAGATAATCATGTCAGCT 3 Chr13 81.771.708 81.771.729 -

TGACCATACACTTTAGTCAGCT 3 Chr13 84.944.481 84.944.502 +

TGATCATATATTTATCTCATAA 3 Chr13 86.911.506 86.911.527 -

TGACCTTGTACTTCTGTCAATC 3 Chr13 86.997.764 86.997.785 +

TGAACATATACATTTGTCAAAC 3 Chr13 87.654.343 87.654.364 +

TGACCATATATTTTTGTAAATT 3 Chr13 88.321.169 88.321.190 -

TGACCAAAAACTTAGGTCACAG 3 Chr13 91.600.513 91.600.534 +

TTACCATATAATTATCTCAAAT 3 Chr13 92.836.289 92.836.310 +

TAACCATATACACATGTCAAAG 3 Chr13 97.742.353 97.742.374 +

TGACTATGTACTTAGGTCAAAC 3 Chr13 99.872.750 99.872.771 +

TGAGAAAATACTTATGTCATAA 3 Chr13 100.119.153 100.119.174 -

TGACCACATAATTTTGTCAGTT 3 Chr13 101.744.542 101.744.563 +

TCACCAAATACTTGTGTCATGC 3 Chr13 105.390.158 105.390.179 +

TGACCATTTACATATTTCAATG 3 Chr13 107.550.942 107.550.963 +

ATACAATATACTTATGTCAATA 3 Chr13 114.984.778 114.984.799 -

TCACCATATATTAATGTCACCC 3 Chr14 23.503.021 23.503.042 +

AAACCAAATACTTATGTCAAGG 3 Chr14 24.334.024 24.334.045 -

TGTCCATATACTAACGTCAGAG 3 Chr14 24.424.535 24.424.556 +

TGTCCATATACTAACGTCAGAG 3 Chr14 24.459.682 24.459.703 +

TGACCATATACAAAAGTCAACT 3 Chr14 25.121.385 25.121.406 +

TCTCCATATACTTATGGCAATA 3 Chr14 25.398.608 25.398.629 +

TGACCATATTCTTTTTTCAACT 3 Chr14 25.451.425 25.451.446 +

TGATCATTTTCTTATGTCATTA 3 Chr14 33.351.413 33.351.434 +

TGGCCATATACTGAAGTCATTT 3 Chr14 36.029.889 36.029.910 +

TTACAATTTACTTATGTCACCT 3 Chr14 36.788.238 36.788.259 -

TGATCATTTACTTATGTAAGCA 3 Chr14 39.138.272 39.138.293 -

TGACTATATACTTGGGTCAGTT 3 Chr14 41.513.642 41.513.663 -

TGAGCTTATACTTATGTCTTCA 3 Chr14 42.607.281 42.607.302 +

TGACAATGTACTTATGACAAAG 3 Chr14 43.920.043 43.920.064 -

TGACCATATAGTTTGGTCAAAT 3 Chr14 45.064.552 45.064.573 +

TGACCAAAGACTTATATCAAAA 3 Chr14 46.051.611 46.051.632 -

TGACCATTTTCTTATGTTATAA 3 Chr14 46.152.474 46.152.495 -

TTACCATATTCTTATGTCTATA 3 Chr14 47.157.400 47.157.421 +

TGAAAATATACTTTTGTCAATG 3 Chr14 49.433.617 49.433.638 -

TCACCATATACTAAAGTCAACT 3 Chr14 49.700.677 49.700.698 -

TGCCCAGATACATATGTCAGAA 3 Chr14 51.963.825 51.963.846 -

TCACCATATACTTATTTTATCT 3 Chr14 59.918.260 59.918.281 +

AAACCATTTACTTATGTCAAAT 3 Chr14 67.783.634 67.783.655 -

TTTCTATATACTTATGTCAGAA 3 Chr14 68.784.637 68.784.658 +

TGACTATATACTTTTGTTAAAA 3 Chr14 71.304.603 71.304.624 +

TGGCCATACACTTTTGTCATTA 3 Chr14 73.661.576 73.661.597 +

TGACCATGGCCTTATGTCACAG 3 Chr14 76.168.584 76.168.605 -

TGAACATATAATTATCTCAGAG 3 Chr14 78.361.692 78.361.713 +

TGATCCTATTCTTATGTCAAAT 3 Chr14 80.226.275 80.226.296 -

TCACTATATACTTATATCATGT 3 Chr14 82.338.084 82.338.105 +

TGACTATATAATTATTTCATGA 3 Chr14 83.778.630 83.778.651 -

TAACAATATACTTTTGTCATAT 3 Chr14 84.438.979 84.439.000 +

TCACCATATTCTCATGTCATAG 3 Chr14 84.594.042 84.594.063 +

TGGCCATATAATTAAGTCATAA 3 Chr14 84.937.620 84.937.641 -

TGACCATGTACTGATGTTAGAC 3 Chr14 84.954.630 84.954.651 -

TGACCATTTACTTAATTCATTT 3 Chr14 86.284.877 86.284.898 -

TTACCATATACTGATGTAAAAT 3 Chr14 86.434.581 86.434.602 -

TGATTATTTACTTATGTCAATG 3 Chr14 96.372.217 96.372.238 +

GGACCATATACATTTGTCAAAA 3 Chr14 99.617.203 99.617.224 -

TGAACATATACAGATGTCAAAA 3 Chr14 100.298.285 100.298.306 -

TGAGGATGTACTTATGTCACCC 3 Chr15 27.208.628 27.208.649 -

GGACCATATAATTGTGTCAAGT 3 Chr15 37.365.268 37.365.289 -

TGACTATATACATTTGTCAAAA 3 Chr15 37.559.865 37.559.886 +

TGAGCATTTTCTTATGTCATTA 3 Chr15 42.257.468 42.257.489 -

TGGCCATATACTAATGTTACAC 3 Chr15 48.359.552 48.359.573 -

TGACCATATAATTTTGCCATCT 3 Chr15 52.622.865 52.622.886 -

TGGCCATATTTTTATGTCATAT 3 Chr15 55.864.035 55.864.056 +

TGACCAAATACTTAGATCACAA 3 Chr15 56.593.016 56.593.037 +

TAATCATATACTTATGTAATTG 3 Chr15 56.747.770 56.747.791 -

TGACTGGATACTTATGTCACCC 3 Chr15 61.176.605 61.176.626 +

TCCCAATATACTTATGTCATAT 3 Chr15 64.580.494 64.580.515 -

GGACCATATATTTATTTCATGA 3 Chr15 68.457.967 68.457.988 -

AAACCATATACTAATGTCATTT 3 Chr15 73.215.139 73.215.160 -

TGACCACTTACTTGTGTCAGGA 3 Chr15 77.722.277 77.722.298 +

TGATCATTTACTTATTTCAGTA 3 Chr15 78.515.160 78.515.181 -

TGAACATTTTCTTATGTCATTA 3 Chr15 78.874.197 78.874.218 +

TCACCTTATACTCATGTCATTA 3 Chr15 87.989.251 87.989.272 +

TGACCAAATACTGATCTCATTT 3 Chr15 89.016.598 89.016.619 -

TCACCATATACTGATGTCGAAA 3 Chr15 91.891.890 91.891.911 -

TGACCATAAAGTTATTTCACTG 3 Chr16 21.575.237 21.575.258 +

TCTCCATATATTTATGTCACAT 3 Chr16 24.331.442 24.331.463 -

TGACCATATAATTATGCCTTTT 3 Chr16 27.933.684 27.933.705 +

TGACCATAGACTTGTGTTAGGA 3 Chr16 47.671.527 47.671.548 +

GGACCATATACTTCTGTCTTTC 3 Chr16 52.136.571 52.136.592 +

TGAACATATACTTATAACATGA 3 Chr16 55.155.256 55.155.277 -

TGACCATATACCTATTTTACAG 3 Chr16 57.641.971 57.641.992 -

TGACCACATCCTTATTTCATGT 3 Chr16 59.721.649 59.721.670 +

TGAGCATTTACTTATGTTAAAT 3 Chr16 60.437.632 60.437.653 -

TGAATATATACTTATGTTATAT 3 Chr16 64.564.041 64.564.062 +

TGACAATGTAATTATGTCAGAG 3 Chr16 65.713.477 65.713.498 +

TGACCATATATTAATGTGATTC 3 Chr16 76.014.777 76.014.798 +

TGATCATATAGTTATTTCAATT 3 Chr16 76.339.098 76.339.119 +

CTACCATATACCTATGTCAAAA 3 Chr16 82.738.680 82.738.701 +

TGACCATCTACTTTTTTCACAC 3 Chr17 2.224.496 2.224.517 -

TGACCATATAATTTTATCATGT 3 Chr17 2.389.487 2.389.508 +

GGACCATAGGCTTATGTCACAA 3 Chr17 7.374.637 7.374.658 +

TCACCATATACATATATCATTG 3 Chr17 8.563.007 8.563.028 +

TGACCATATCCCAATGTCACAG 3 Chr17 16.415.774 16.415.795 -

TGACCATACACATATGTAATAA 3 Chr17 16.606.510 16.606.531 -

TGACCATACACATATGTAATAA 3 Chr17 18.515.536 18.515.557 +

TGACCATACACATATGTAATAA 3 Chr17 18.737.070 18.737.091 -

TTACAATATACTGATGTCATTT 3 Chr17 18.801.155 18.801.176 +

TCAACATATACTTATTTCACTG 3 Chr17 19.145.013 19.145.034 -

TGACCATACACATATGTAATAA 3 Chr17 20.237.838 20.237.859 -

TGACCATACACATATGTAATAA 3 Chr17 20.786.753 20.786.774 +

TTAGCAAATACTTATGTCATGC 3 Chr17 34.478.917 34.478.938 +

TGACCATGAACTTATGTGAGAT 3 Chr17 36.051.289 36.051.310 +

TGACTATAAACATATGTCAGAA 3 Chr17 45.404.235 45.404.256 +

TGACTCTATACATATGTCAAAA 3 Chr17 55.331.529 55.331.550 +

AGATCATGTACTTATGTCAAAA 3 Chr17 63.321.149 63.321.170 +

TAAGCATATATTTATGTCAAGA 3 Chr17 67.111.201 67.111.222 -

TGACCACATACATATGTGAGGA 3 Chr18 961.125 961.146 +

TGACCATTTACTTAGCTCACGA 3 Chr18 4.333.121 4.333.142 +

TGACCTTATATATATGTCACAT 3 Chr18 8.803.375 8.803.396 -

TGACCATCTACTCATGTCCAAT 3 Chr18 9.348.222 9.348.243 -

TGACCATATACATATTTTAAAA 3 Chr18 10.549.324 10.549.345 +

TGACCATCTTCCTATGTCATTA 3 Chr18 12.489.728 12.489.749 -

TGCCCGTTTACTTATGTCACCC 3 Chr18 20.722.690 20.722.711 -

TGACCCTATACTGCTGTCACTG 3 Chr18 23.787.568 23.787.589 -

TGAGCATAGACTTATTTCAGGT 3 Chr18 25.357.557 25.357.578 -

TGAACACATACTTATGTAAGAA 3 Chr18 25.729.592 25.729.613 +

TGGGGATATACTTATGTCAAAA 3 Chr18 28.843.113 28.843.134 -

TGAACATATGCATATGTCAGGA 3 Chr18 30.689.925 30.689.946 -

TGACCATCCACTCATGTCAATA 3 Chr18 31.661.574 31.661.595 -

TGACCATATATTCATCTCAATA 3 Chr18 34.563.963 34.563.984 +

AGACCATATCCTTATATCACTT 3 Chr18 38.432.965 38.432.986 -

TGACCATATGGTTATGCCATGA 3 Chr18 40.011.706 40.011.727 +

TGCCCATATATTTATGTGAATT 3 Chr18 41.619.665 41.619.686 -

TGAGCATCTTCTTATGTCATTA 3 Chr18 44.289.594 44.289.615 -

ATACCATATAATTATGTCATTC 3 Chr18 45.499.378 45.499.399 +

TGAGCATATACTTATGTTTATG 3 Chr18 49.116.859 49.116.880 +

TGACCATTTACTTTTGTGATAT 3 Chr18 50.293.804 50.293.825 -

TGATCATGTAATTATGTCAGGT 3 Chr18 50.629.550 50.629.571 -

TGACATTTTACTTATGTCATCT 3 Chr18 56.598.366 56.598.387 -

TCACCATACACTTATGTCTTTG 3 Chr18 57.220.040 57.220.061 +

TGAACAAATACTTTTGTCACTT 3 Chr18 58.593.593 58.593.614 +

TGGCCATATGCTTATATCATTT 3 Chr18 58.866.216 58.866.237 +

TGAACAAATATTTATGTCAGGC 3 Chr18 59.930.898 59.930.919 +

TAACCATATACATATGTAAATA 3 Chr18 61.771.175 61.771.196 -

TGACCATATATTTATGGTATTG 3 Chr18 63.294.229 63.294.250 -

TGACCATAGACTTAAGTTATGA 3 Chr18 64.978.293 64.978.314 +

TGACCATATATTTTAGTCAATA 3 Chr18 65.093.472 65.093.493 -

TGAGCATATACATATGTCCAGA 3 Chr18 70.981.277 70.981.298 +

TAACAATATACTGATGTCAGTA 3 Chr18 73.529.736 73.529.757 -

TGACCCTATCCTTATTTCAATA 3 Chr19 7.014.507 7.014.528 +

AGAGCAGATACTTATGTCATTT 3 Chr19 15.970.512 15.970.533 +

TGAAAATATACTTATATCAAAA 3 Chr19 30.908.963 30.908.984 -

TGAACATATACTTATCTGATAC 3 Chr19 35.257.578 35.257.599 -

TTACCATATACTTATATTAACA 3 Chr19 43.318.697 43.318.718 +

TTACCATATACTTATATTAACA 3 Chr19 43.487.308 43.487.329 +

GGACTATAAACTTATGTCACCA 3 Chr19 47.636.007 47.636.028 -

TGAACAAATATTTATGTCACTT 3 Chr19 48.484.215 48.484.236 +

AGACCATATACATATGTCTGCA 3 Chr19 52.092.478 52.092.499 -

TGACAAAATACTTATGTAAATA 3 Chr19 57.510.440 57.510.461 -

TCACCATGTACTTATGTAAATG 3 Chr2 883.760 883.781 -

TTACAATATACGTATGTCACTT 3 Chr2 3.313.061 3.313.082 -

TGCTCATATACTTATTTCATTT 3 Chr2 4.477.788 4.477.809 +

TGAAGATATACGTATGTCAGAA 3 Chr2 6.792.058 6.792.079 -

TGAACATATACTTATTTCCCAA 3 Chr2 7.792.511 7.792.532 +

CGACCATTTACTTATATCAGTA 3 Chr2 9.208.162 9.208.183 +

TAAAAATATACTTATGTCATCT 3 Chr2 9.368.400 9.368.421 +

TGGTCATATACTTATATCAAAT 3 Chr2 19.154.862 19.154.883 +

TGACCATATTCTTAATTCAGCA 3 Chr2 20.299.132 20.299.153 -

TAACCATATACTGATGTCTTCA 3 Chr2 25.004.690 25.004.711 -

TGACCATATACTTGTATCTTTC 3 Chr2 27.607.069 27.607.090 -

TGACCAGATATTTATGTCTGCC 3 Chr2 31.386.411 31.386.432 -

TGACCATATAGTTATGTACCCT 3 Chr2 32.091.629 32.091.650 -

TAACAATATACTTATTTCATAA 3 Chr2 32.281.951 32.281.972 +

TCACCATATACATATTTCATAG 3 Chr2 33.410.462 33.410.483 -

TGACCATATATTTATCACAGAT 3 Chr2 35.455.307 35.455.328 -

TGGCCATTTACTTATGTTACAA 3 Chr2 35.871.402 35.871.423 +

CAAACATATACTTATGTCAAGT 3 Chr2 46.471.467 46.471.488 -

TGACCATTTATTTATATCATTT 3 Chr2 49.419.621 49.419.642 -

TGATCATCTACTTAGGTCATAA 3 Chr2 50.248.302 50.248.323 +

TGACCATAGAATTATGTAATTG 3 Chr2 51.379.704 51.379.725 -

TGATCATCTACTTATGTCTTAA 3 Chr2 53.147.122 53.147.143 -

TGACCATAAACTTTTCTCATAC 3 Chr2 58.170.499 58.170.520 -

TCACCAGAGACTTATGTCAGCA 3 Chr2 60.612.461 60.612.482 +

TGACTGTATACGTATGTCAAAG 3 Chr2 73.639.269 73.639.290 -

AGACCAAATACTTATATCACAA 3 Chr2 74.384.097 74.384.118 -

TGACCAGATTCTTGTGTCACAG 3 Chr2 75.182.561 75.182.582 -

TGACCCTATACTTAATTCAAGT 3 Chr2 77.150.961 77.150.982 -

TGACAGTATACTTTTGTCAAAA 3 Chr2 83.726.298 83.726.319 +

TGAACCTATACTTATCTCAAAA 3 Chr2 106.444.521 106.444.542 +

TGACCAAAAAGTTATGTCAATT 3 Chr2 114.308.481 114.308.502 -

TTACCATATAGGTATGTCACAA 3 Chr2 115.972.664 115.972.685 -

TCACCATATAACTATGTCAGTT 3 Chr2 117.498.784 117.498.805 -

TAAACATAAACTTATGTCAAAG 3 Chr2 120.736.541 120.736.562 +

TGAACATTTACTTATGGCACGC 3 Chr2 121.474.082 121.474.103 +

TTACCAAATACTTATGTGACAT 3 Chr2 123.991.634 123.991.655 +

TGTCCATCTAGTTATGTCATCT 3 Chr2 124.820.866 124.820.887 +

TGACCATATAATTATTTGAATG 3 Chr2 126.228.948 126.228.969 +

TTACCATATACATATGTAAATT 3 Chr2 127.017.289 127.017.310 -

AGACCATATATTTATGTCTCCC 3 Chr2 130.075.727 130.075.748 +

TCACCATATACAAATGTCAACT 3 Chr2 130.244.195 130.244.216 +

TGAGCATATACAGATGTCAAAA 3 Chr2 130.541.985 130.542.006 +

TGACCAAATACGTATTTCATAA 3 Chr2 135.289.615 135.289.636 +

TCACAATATGCTTATGTCATGA 3 Chr2 139.516.451 139.516.472 +

TGACCATTTACTTGGGTCACTC 3 Chr2 141.494.839 141.494.860 +

TGACCATAAACTTTTGGCACTT 3 Chr2 144.209.305 144.209.326 +

TGACAAAATACTTAAGTCAAAA 3 Chr2 146.398.705 146.398.726 -

TGACCATATACTGAAGGCAAGC 3 Chr2 147.503.466 147.503.487 +

TGAACATATATTAATGTCATGC 3 Chr2 148.965.381 148.965.402 -

TGAACACATACTTATCTCAAAA 3 Chr2 149.495.744 149.495.765 -

TGACCTTAAAATTATGTCAAAA 3 Chr2 161.589.949 161.589.970 -

TGTTCATATACTTATATCAATG 3 Chr2 165.144.019 165.144.040 -

TGAACATTTTCTTATGTCATAC 3 Chr2 166.489.507 166.489.528 +

TGACCACATACCTATGACATAT 3 Chr2 167.336.167 167.336.188 +

TAATCATATATTTATGTCAACA 3 Chr2 170.229.681 170.229.702 +

TAACCTTATACTTAAGTCATTA 3 Chr2 171.242.409 171.242.430 -

TGATCTTATACATATGTCATTT 3 Chr2 174.521.068 174.521.089 +

TCATCATACACTTATGTCATAG 3 Chr2 175.009.315 175.009.336 +

TGACCATATATTTGTGTAATAA 3 Chr2 175.016.091 175.016.112 -

TGAACATATACTTAGGACACGT 3 Chr2 178.029.121 178.029.142 -

TGACCATATATTTCTGTAAATA 3 Chr2 184.834.297 184.834.318 -

TGACTACATACTAATGTCACGG 3 Chr2 186.951.427 186.951.448 +

TGCCCATATACTTGTGCCAAGC 3 Chr2 191.519.281 191.519.302 -

TGACTACATACTTATTTCATTT 3 Chr2 192.896.977 192.896.998 +

TGACCAAAAACTTAGGTCACAG 3 Chr2 193.500.908 193.500.929 +

TGACCATAGACTGATTTCAATG 3 Chr2 193.536.048 193.536.069 -

TGACCAGATACTCAAGTCAATA 3 Chr2 194.749.280 194.749.301 -

TGAAAATATACTTATGTCTTAT 3 Chr2 199.075.316 199.075.337 +

TGACCAAATACATATGTCCAAG 3 Chr2 206.723.429 206.723.450 -

TGGGCATATACATATGTCAAAA 3 Chr2 207.252.876 207.252.897 +

TGAGCAAATACATATGTCACTA 3 Chr2 210.394.453 210.394.474 +

TGACCATAGACTTATGGTAGGT 3 Chr2 212.032.378 212.032.399 -

TGAGCATAGACTAATGTCATAT 3 Chr2 215.247.369 215.247.390 -

TGAATATATACTTATGTCCAAG 3 Chr2 216.980.643 216.980.664 -

TAGCCATATACTTATATCAGGA 3 Chr2 226.408.798 226.408.819 -

TGACCAATTACTTATGTCTACC 3 Chr2 230.748.084 230.748.105 +

TGTCAATCTACTTATGTCAATT 3 Chr2 231.268.442 231.268.463 +

TGATCATATATTTATATCAGTA 3 Chr2 234.786.759 234.786.780 +

TTACCATATATTTATGTAAATT 3 Chr2 235.062.877 235.062.898 +

TGAGCATTTTCTTATGTCATTA 3 Chr2 235.765.170 235.765.191 -

TGAACATATTCTTATGCCAAGC 3 Chr2 239.214.193 239.214.214 +

TGACCCTATACTAATATCAACA 3 Chr2 242.215.413 242.215.434 +

TAAACATATATTTATGTCAAAA 3 Chr20 1.615.704 1.615.725 +

TGTCCATATAATTGTGTCATTT 3 Chr20 5.071.384 5.071.405 -

TGAATATATACTTATTTCAAAA 3 Chr20 6.913.844 6.913.865 -

TCACCATATACATATATCACTT 3 Chr20 12.726.815 12.726.836 +

TGACCATATATTTACGTAACAG 3 Chr20 17.310.429 17.310.450 -

TGACCATATAGTTAGCTCAATG 3 Chr20 24.091.772 24.091.793 +

TGCCCAGATACTTAGGTCATAA 3 Chr20 25.169.048 25.169.069 +

TTACCATCTACTTATGTCTGAA 3 Chr20 33.569.209 33.569.230 +

TGAAAATAAACTTATGTCAATG 3 Chr20 34.300.687 34.300.708 +

TGACCATATACTTAAAACAACA 3 Chr20 34.720.081 34.720.102 +

TGACCATATACATACCTCATAT 3 Chr20 34.723.158 34.723.179 -

TTTCCATATACTTGTGTCATCT 3 Chr20 43.789.496 43.789.517 -

TGACATTATACTTATTTCAGGC 3 Chr20 53.093.626 53.093.647 -

TGATTATATACTAATGTCACCA 3 Chr20 55.308.829 55.308.850 -

TGACCATTTACTAATTTCATAA 3 Chr20 57.237.907 57.237.928 +

TAACCATATAATTATTTCAATA 3 Chr20 58.729.697 58.729.718 -

TGAGCATATAATTAGGTCACCG 3 Chr20 62.138.667 62.138.688 +

TGCCCATATGCTTCTGTCAGAT 3 Chr21 15.835.269 15.835.290 +

AGACCTTATATTTATGTCAAAG 3 Chr21 16.846.848 16.846.869 -

TGATCATATATTTATATCACAT 3 Chr21 17.391.197 17.391.218 -

TGACATTATACTTCTGTCAAGA 3 Chr21 18.383.304 18.383.325 +

TCACCATATGCTTATCTCAAGA 3 Chr21 20.279.602 20.279.623 +

TGACAAAATACTTATGACAATT 3 Chr21 20.770.048 20.770.069 +

TGAGCATAAACTTATTTCAGAG 3 Chr21 23.579.671 23.579.692 +

TGAAAATATAATTATGTCAATG 3 Chr21 24.024.618 24.024.639 +

TGACCAAATAGTTATGACATTT 3 Chr21 25.446.857 25.446.878 +

TGTCCAAATACTTAAGTCAGCT 3 Chr21 27.320.348 27.320.369 +

TGACCATATACATATATCTACT 3 Chr21 27.963.063 27.963.084 +

TGACCATATACTTTGCTCAGGA 3 Chr21 29.328.185 29.328.206 +

TGAGAATATAGTTATGTCAATG 3 Chr21 29.456.359 29.456.380 +

TGACCCTAAACATATGTCATTA 3 Chr21 29.951.526 29.951.547 +

TGCCCATATGCTTGTGTCATTA 3 Chr21 30.699.619 30.699.640 +

AGACCATATACATATATCAAAT 3 Chr21 32.115.621 32.115.642 -

TGACCATATTCTTCTGCCAATG 3 Chr21 37.112.051 37.112.072 -

GGACCATATGCTTATGTAAAAA 3 Chr21 41.932.878 41.932.899 +

TGACCAGATTCTTTTGTCAATA 3 Chr22 19.449.043 19.449.064 +

TGAAAATATACTTATTTCAGTC 3 Chr22 22.541.594 22.541.615 +

TGAGCATTTACTTAAGTCACAG 3 Chr22 23.759.596 23.759.617 +

GGACCATATGCTTTTGTCAGTA 3 Chr22 30.033.549 30.033.570 +

TGAGCATATATTTATTTCACCT 3 Chr22 32.403.852 32.403.873 -

GGAGCATTTACTTATGTCAGAA 3 Chr22 34.007.959 34.007.980 -

TCACCATATATTTATCTCAGTA 3 Chr22 45.533.834 45.533.855 -

TGACCATGTCCTTGTGTCATCT 3 Chr22 45.653.393 45.653.414 -

TGACCATCTACTTAAGGCAGAT 3 Chr22 48.202.629 48.202.650 +

TGACCAAATACTTAGATCACAA 3 Chr3 895.866 895.887 +

TGACCATATGTTCATGTCAGAA 3 Chr3 1.951.040 1.951.061 +

TGACCCTACACTTATCTCATTT 3 Chr3 5.319.308 5.319.329 -

TAATAATATACTTATGTCATTA 3 Chr3 5.767.446 5.767.467 -

TGACCATATTCTTTTATCACTT 3 Chr3 8.641.243 8.641.264 +

TGACCTTATATTTATGGCAAAG 3 Chr3 14.832.621 14.832.642 +

TGACTATAAATTTATGTCATGT 3 Chr3 17.592.372 17.592.393 +

TGACCATTTACTTACGTTATCA 3 Chr3 21.118.105 21.118.126 +

TGAACATATACATATGCCATTA 3 Chr3 28.849.137 28.849.158 -

TGACAATAAAATTATGTCACTC 3 Chr3 30.087.604 30.087.625 +

TGACCAGATATTTATGCCAGTG 3 Chr3 35.031.911 35.031.932 -

TAACTATATACTTATTTCAATA 3 Chr3 35.044.105 35.044.126 -

AGACCATATACTTTTGTAAGAA 3 Chr3 43.865.220 43.865.241 -

TGACTATATAATTATCTCAAAA 3 Chr3 48.408.586 48.408.607 +

TGTCAATATAATTATGTCATGG 3 Chr3 48.960.001 48.960.022 +

TGAACATCTACTTTTGTCAGGC 3 Chr3 61.082.373 61.082.394 -

CTACCATATACTTATTTCAAGG 3 Chr3 62.303.971 62.303.992 +

TTACCATACAGTTATGTCAGAA 3 Chr3 63.372.969 63.372.990 +

TAACAAGATACTTATGTCAATG 3 Chr3 64.759.689 64.759.710 +

TGAGCATATACTTTTTTCATTC 3 Chr3 67.142.143 67.142.164 -

TGACCAAAGACTTAGGTCACAA 3 Chr3 70.492.141 70.492.162 +

TGACCATAAACCTATGTCTGTA 3 Chr3 74.688.456 74.688.477 -

TGACCATACATTTATGCCACAG 3 Chr3 76.568.355 76.568.376 -

TGACCATGTAATTAAGTCATTT 3 Chr3 77.163.727 77.163.748 +

AGACAATATATTTATGTCACAA 3 Chr3 78.484.484 78.484.505 -

TGAGCCTATACGTATGTCAAAT 3 Chr3 78.808.298 78.808.319 +

TGCCCATATACTTTTATCATCC 3 Chr3 83.890.521 83.890.542 +

TGACACTATACTAATGTCATAA 3 Chr3 85.999.981 86.000.002 +

TGACCAGATACTAGTGTCATGC 3 Chr3 86.523.348 86.523.369 +

TGACCTTTTAATTATGTCAATT 3 Chr3 86.800.816 86.800.837 +

TTACCATATACATATATCAAAG 3 Chr3 88.021.399 88.021.420 +

TGACCATATTCTTTTGTGAGGT 3 Chr3 97.526.921 97.526.942 -

TTACAATAAACTTATGTCATAT 3 Chr3 98.939.331 98.939.352 +

TGACCATTTACCTATGTCCTGA 3 Chr3 103.221.239 103.221.260 -

TGACCAAATATGTATGTCACAC 3 Chr3 103.273.058 103.273.079 -

TGACCATGTACTCATCTCAGGT 3 Chr3 104.546.629 104.546.650 +

TGAGTATATACATATGTCAACA 3 Chr3 108.430.572 108.430.593 +

TGGCCATAGACATATGTCATTT 3 Chr3 110.863.090 110.863.111 -

TGACTATATGCTTATGTCCCAA 3 Chr3 111.292.005 111.292.026 -

TGACCATATATTCATGTTAATT 3 Chr3 114.246.579 114.246.600 +

TGACCAAATACTTACATCAAAA 3 Chr3 117.793.576 117.793.597 -

TGACCATATGCATTTGTCAAAA 3 Chr3 121.167.645 121.167.666 +

GGACCATATACATCTGTCACAT 3 Chr3 133.251.848 133.251.869 -

TGACCATATACATTTGTTAAAA 3 Chr3 135.179.559 135.179.580 -

TGACCAAAGACTTAGGTCACAA 3 Chr3 135.374.392 135.374.413 +

TGACCCTATACTTAAGCCACAC 3 Chr3 138.714.693 138.714.714 -

TGACCATATACTTTTATTAATG 3 Chr3 147.252.282 147.252.303 -

TGATCATATATTTATTTCAAAA 3 Chr3 148.497.553 148.497.574 -

TGACCATATACTTATTGGAAAC 3 Chr3 149.181.209 149.181.230 +

TGACTATATACTCATGACATAC 3 Chr3 155.361.196 155.361.217 +

TGACCAGCTCCTTATGTCAATT 3 Chr3 161.410.653 161.410.674 -

TGACCATATAAATATGTAAGAA 3 Chr3 163.805.471 163.805.492 +

TGGCCATATACTTCTGCCAAGG 3 Chr3 164.355.515 164.355.536 +

TGAACATATATCTATGTCATAA 3 Chr3 168.406.768 168.406.789 -

TGACCATATATTTGTGACATGT 3 Chr3 170.572.836 170.572.857 +

TTGTCATATACTTATGTCAAAA 3 Chr3 172.485.969 172.485.990 -

TCACCTAATACTTATGTCAGTC 3 Chr3 178.127.961 178.127.982 +

TCACCATATATTTATGTAACAC 3 Chr3 178.289.145 178.289.166 -

AGAACATATATTTATGTCATAT 3 Chr3 179.861.994 179.862.015 -

TGACAATATGTTTATGTCAGTA 3 Chr3 180.174.086 180.174.107 -

TGACCATAAAATTATGACAGAA 3 Chr3 181.260.784 181.260.805 -

TGACCAAATACTTTTGTGAGCT 3 Chr3 189.492.210 189.492.231 +

AGACCATTTATTTATGTCATCC 3 Chr3 189.802.074 189.802.095 -

TTACCATTTACTCATGTCATCT 3 Chr3 194.202.932 194.202.953 +

TGACCATATATTTAGGCCAAGC 3 Chr3 194.655.515 194.655.536 -

TGAATATATACTAATGTCACCT 3 Chr4 6.487.426 6.487.447 +

TGACCATATACTTGTGATAACT 3 Chr4 11.951.521 11.951.542 -

TAACCATATACTTATTTAATTC 3 Chr4 12.756.934 12.756.955 -

TGACCATTTCCTTAAGTCAGTT 3 Chr4 13.473.702 13.473.723 -

TCAACATATTCTTATGTCATCT 3 Chr4 14.364.303 14.364.324 -

TGAGGATTTACTTATGTCATTT 3 Chr4 19.767.010 19.767.031 +

TGACCATATACTTTGGTCTTCA 3 Chr4 29.421.339 29.421.360 +

TGACCATATACTTACATAATTG 3 Chr4 31.991.867 31.991.888 +

TGACAATATATTTATGTGAATC 3 Chr4 33.907.017 33.907.038 -

TGAATATATACTTATATCAGTG 3 Chr4 35.083.054 35.083.075 +

TTACCATATACTCATCTCATTT 3 Chr4 35.511.318 35.511.339 +

TGAACATATACGTATATCAACC 3 Chr4 37.089.308 37.089.329 +

TGACCATATGCATTTGTCAAAA 3 Chr4 37.732.160 37.732.181 +

TGAACATAGACTGATGTCACTG 3 Chr4 41.198.995 41.199.016 +

TGGCCATTTACTTATGTGACCA 3 Chr4 41.833.861 41.833.882 +

TGACAATATAATTATGTAATCT 3 Chr4 41.927.712 41.927.733 +

TGACCATCTACTTATTTTAACT 3 Chr4 42.353.720 42.353.741 -

TGACCATATGATTATGCCACTG 3 Chr4 46.664.446 46.664.467 -

TGACGAGACACTTATGTCAAGA 3 Chr4 47.006.385 47.006.406 +

GGACCAGAGACTTATGTCAATA 3 Chr4 58.853.602 58.853.623 +

TGACCATATCTTTATATCACAC 3 Chr4 66.314.662 66.314.683 +

TGACCAAAAACTTATTTCATAA 3 Chr4 67.289.769 67.289.790 +

TTAACATTTACTTATGTCATAT 3 Chr4 67.403.788 67.403.809 -

TGAACATATACATCTGTCAAAA 3 Chr4 69.752.736 69.752.757 -

TGAACATATAATCATGTCACTA 3 Chr4 69.848.962 69.848.983 -

TGACCATTTATTTAGGTCAATA 3 Chr4 73.924.844 73.924.865 +

TGACCATATACATTTGTAAGTG 3 Chr4 74.123.294 74.123.315 -

TGCACATATATTTATGTCATAC 3 Chr4 74.576.274 74.576.295 -

TGACAGAATACTTATGTCACTC 3 Chr4 75.513.889 75.513.910 -

TGACCAAAGACTTAAGTCATAA 3 Chr4 75.607.698 75.607.719 -

CCTCCATATACTTATGTCAAAG 3 Chr4 75.805.474 75.805.495 -

TGACAATATAGTTATTTCATAT 3 Chr4 77.274.684 77.274.705 -

TGAACAGATACATATGTCAACT 3 Chr4 81.138.845 81.138.866 +

TGAACATATTCTTTTGTCAAAA 3 Chr4 81.506.479 81.506.500 -

AGATCATATATTTATGTCACTG 3 Chr4 81.864.434 81.864.455 -

TATCCATAGACTTATGTCAGAG 3 Chr4 84.515.630 84.515.651 +

TCACTATATACTTATGTAATTA 3 Chr4 88.081.769 88.081.790 -

AAACCATATACTTATCTCAATA 3 Chr4 94.782.693 94.782.714 +

TGATCACATACTAATGTCACAG 3 Chr4 99.798.274 99.798.295 +

TTAGCAAATACTTATGTCATTT 3 Chr4 111.501.062 111.501.083 +

TGACTGTATAATTATGTCACTT 3 Chr4 115.197.705 115.197.726 +

TGTCCATTTACTTTTGTCATTA 3 Chr4 116.283.068 116.283.089 -

CTGCCATATACTTATGTCACAG 3 Chr4 119.494.628 119.494.649 +

TGACCAGAAACTTAGGTCACAA 3 Chr4 121.099.895 121.099.916 +

TGACAATATAATCATGTCATAT 3 Chr4 126.676.815 126.676.836 -

TGAACACATACTAATGTCAACA 3 Chr4 126.842.589 126.842.610 -

TGACATTATACATATGTCAAAA 3 Chr4 127.032.797 127.032.818 -

TGCCAATATACTTATTTCAGGT 3 Chr4 127.475.193 127.475.214 -

TGACCATCTACTTATGTGCCAA 3 Chr4 128.488.373 128.488.394 +

TGAGCATATACATTTGTCAGTA 3 Chr4 131.861.601 131.861.622 -

TTACCATGTACTTATGTTACCA 3 Chr4 132.252.453 132.252.474 -

TGAACATAAACTTATGTAAGTA 3 Chr4 134.980.669 134.980.690 -

TGACTATATACTTAAGTGACTA 3 Chr4 135.613.221 135.613.242 -

TGAGCATATTCTTATCTCAAAA 3 Chr4 136.139.291 136.139.312 -

TGAGAATATAATTATGTCATCT 3 Chr4 142.187.652 142.187.673 +

TGAATATATACTTATTTCAAAA 3 Chr4 147.702.283 147.702.304 -

TGTTCATATACTTATGCCATAC 3 Chr4 151.323.487 151.323.508 -

AGACCATATATTTATCTCATGA 3 Chr4 151.629.572 151.629.593 -

TGACGATATTCTAATGTCACCA 3 Chr4 154.052.074 154.052.095 +

TCACCTTCTACTTATGTCACTT 3 Chr4 158.233.654 158.233.675 -

TGCCCATATACATATTTCATCT 3 Chr4 158.794.698 158.794.719 -

TGAGCATTTACTTATGTTACAA 3 Chr4 160.883.036 160.883.057 +

TCACCATCCACTTATGTCACCA 3 Chr4 163.343.150 163.343.171 +

TGACTATATTCTTATGTCTGTT 3 Chr4 166.005.671 166.005.692 +

TAACGATATACTTATTTCATAA 3 Chr4 167.302.604 167.302.625 -

TGACCAAATAATAATGTCAAGA 3 Chr4 175.015.485 175.015.506 +

TGAGCAGATACTTATGACACTC 3 Chr4 176.566.925 176.566.946 +

TGACCATTTCCTTAAGTCATTG 3 Chr4 176.974.173 176.974.194 -

TGTTCATATACTTATATCATAT 3 Chr4 180.768.055 180.768.076 -

AGACCCTATACTTATTTCACTT 3 Chr4 184.049.122 184.049.143 -

TGACCATATATTTCTGGCATAT 3 Chr4 189.281.720 189.281.741 -

TGACTATATACTTATTTCTCTT 3 Chr5 364.508 364.529 -

TGACCATACAGTTATGTCCAAA 3 Chr5 5.473.714 5.473.735 +

TGACTATATATTTTTGTCAAAC 3 Chr5 6.099.808 6.099.829 +

TGAGCATATACTTAGGGCAATG 3 Chr5 7.930.152 7.930.173 -

TGTCCATATACATATGCCAAAG 3 Chr5 11.412.752 11.412.773 +

TGACCATATTCTTTTTTCACAT 3 Chr5 14.955.643 14.955.664 -

TCACCATATACTCATGTGAATT 3 Chr5 17.698.659 17.698.680 -

TGACTATATAATTAGGTCATAT 3 Chr5 18.263.724 18.263.745 -

TGACCATATGGTTCTGTCAATG 3 Chr5 25.184.463 25.184.484 +

TAACCAAATACTAATGTCATCA 3 Chr5 26.840.733 26.840.754 -

TGATCATAGACTTAAGTCATTT 3 Chr5 29.251.784 29.251.805 +

TGTCTATATACATATGTCATCT 3 Chr5 29.812.812 29.812.833 -

TTACCATATACATTTGTCACCT 3 Chr5 35.196.382 35.196.403 -

TGACTCTATACTTCTGTCACAC 3 Chr5 37.023.231 37.023.252 -

TGACTATATTCTTATTTCAACT 3 Chr5 37.970.577 37.970.598 +

TGAGAATATACTTATGACAACA 3 Chr5 39.030.472 39.030.493 -

TGAACATATTCTTATGTGACTC 3 Chr5 41.102.712 41.102.733 -

TGACCATATACCTATGAAAGGT 3 Chr5 50.579.474 50.579.495 +

TGCCCATGTACTTATGTGATTT 3 Chr5 52.187.597 52.187.618 +

TGAGCATATAATTATGACACAC 3 Chr5 53.440.457 53.440.478 -

TGACCATATGCATTTGTCAAAA 3 Chr5 53.890.162 53.890.183 +

TGAACATATGCTTATGTTAAGG 3 Chr5 57.437.380 57.437.401 +

TGAACATATACTGAAGTCAGAG 3 Chr5 58.716.221 58.716.242 +

TTACCATATACTTTTGTGAGTT 3 Chr5 66.334.669 66.334.690 +

TCACCTTATACTTATGTCTAGT 3 Chr5 71.521.040 71.521.061 -

TGACTATATACTTATGAAAAAT 3 Chr5 75.416.061 75.416.082 +

TAACCATATACATATGTGAAGC 3 Chr5 77.726.875 77.726.896 +

TGACAATATATTTCTGTCATTT 3 Chr5 79.544.682 79.544.703 -

TGACCATTTACTTATGAGAAAG 3 Chr5 81.092.776 81.092.797 +

TAACCAGATACTTATGTCTGGC 3 Chr5 81.527.865 81.527.886 -

TGAACACATTCTTATGTCATTA 3 Chr5 83.771.567 83.771.588 -

TGAACAAATATTTATGTCATAT 3 Chr5 84.022.927 84.022.948 -

TGACTATATTGTTATGTCATAT 3 Chr5 85.432.708 85.432.729 +

TCACCATATACATATATCAAAT 3 Chr5 97.609.783 97.609.804 +

TGACCATTTACTTATTTCCTTT 3 Chr5 97.671.947 97.671.968 +

TGACCAAATATATATGTCAAAG 3 Chr5 99.943.443 99.943.464 -

TGACCATATACATAGGTTACAC 3 Chr5 100.816.668 100.816.689 +

TGAACATACACTTATCTCATGT 3 Chr5 102.923.181 102.923.202 -

TGATCATATATTAATGTCAAAG 3 Chr5 103.755.120 103.755.141 -

TGAACATATACTGTTGTCACAA 3 Chr5 107.406.792 107.406.813 -

TGACCGTATACTTATTTGACTC 3 Chr5 109.004.997 109.005.018 +

TGACCATATACTAGTGCCAAAA 3 Chr5 116.969.704 116.969.725 +

TGACCATATATTTATTACATAC 3 Chr5 117.009.732 117.009.753 +

AGACCATGTATTTATGTCAGTA 3 Chr5 121.245.481 121.245.502 -

TGATCATATCCTTAAGTCAGTG 3 Chr5 123.861.021 123.861.042 -

TAACCATATATTTATATCAATA 3 Chr5 124.565.832 124.565.853 +

TGAAAAGATACTTATGTCAAGA 3 Chr5 125.664.804 125.664.825 +

TGACTATATACTTATGTAGTTT 3 Chr5 129.794.138 129.794.159 -

TGACCATATATATATATCACAG 3 Chr5 135.091.431 135.091.452 -

TGCCCATATCCTCATGTCAAGT 3 Chr5 142.266.331 142.266.352 -

TAACCATCTAGTTATGTCAGAA 3 Chr5 143.559.295 143.559.316 +

TGCACATGTACTTATGTCAGTA 3 Chr5 144.074.984 144.075.005 -

TGACTATATACTTGTATCAGTG 3 Chr5 144.714.119 144.714.140 +

TGAACATATACTGTTGTCAGAA 3 Chr5 146.657.934 146.657.955 +

TGACCATATAATTATCACAACA 3 Chr5 147.482.715 147.482.736 +

AGAGCATATACTTATGACAGCA 3 Chr5 149.405.914 149.405.935 -

TGACTATATACATTTGTCAAAA 3 Chr5 149.479.371 149.479.392 +

TGACAATATAGTGATGTCACCT 3 Chr5 150.878.662 150.878.683 -

TGACCATATATTTGTTTCAACA 3 Chr5 151.811.078 151.811.099 +

TCACCATATAGTTATGTAAATG 3 Chr5 153.114.070 153.114.091 +

TGACTATATACATATCTCACAA 3 Chr5 154.297.589 154.297.610 +

TGACCATATAGTTATTTCTTCA 3 Chr5 154.746.188 154.746.209 +

TGACCATTTACATTTGTCAGCA 3 Chr5 159.903.019 159.903.040 -

TGTCCATATACTTATTTCTATG 3 Chr5 161.995.095 161.995.116 +

TAACCATATACTTAAGTAAGAG 3 Chr5 163.944.821 163.944.842 +

TGACCATAAACTTACGTGAAAA 3 Chr5 164.602.641 164.602.662 -

TGACTATATACTAAAGTCAATA 3 Chr6 2.047.365 2.047.386 +

TGCCTATATACTTATGTTAAAG 3 Chr6 10.012.482 10.012.503 +

AGACCATATTATTATGTCAACA 3 Chr6 12.892.772 12.892.793 -

TTACCATATTCTTATGTGAAAA 3 Chr6 15.615.960 15.615.981 -

TGACCTTTTACTTAAGTCAATA 3 Chr6 15.776.175 15.776.196 -

TGAGCATTTACTTTTGTCAGGC 3 Chr6 27.371.633 27.371.654 -

TGAAAATATACTTATTTCAGTG 3 Chr6 54.681.462 54.681.483 -

TGTCCATTTACTTATGTCTTTG 3 Chr6 54.885.562 54.885.583 +

TGTGCATATACTTATGTCCTGA 3 Chr6 63.514.199 63.514.220 -

TGACCATCTACTTCTGTGAGCC 3 Chr6 65.313.559 65.313.580 -

TGAAGATATACTTATGTAATCT 3 Chr6 68.617.482 68.617.503 -

TGCACATATACTTATTTCATTA 3 Chr6 69.257.299 69.257.320 -

TGGCTATATACATATGTCAAAA 3 Chr6 71.606.141 71.606.162 -

TAACCATATAATTATGTAAATA 3 Chr6 73.304.041 73.304.062 -

TCACCAAATATTTATGTCACCC 3 Chr6 74.416.731 74.416.752 -

TTTCCATTTACTTATGTCATCT 3 Chr6 82.209.494 82.209.515 -

TGTCCATATACCTGTGTCAACT 3 Chr6 82.448.966 82.448.987 +

TGACAATATAGTTATTTCATTC 3 Chr6 83.942.509 83.942.530 -

TGACCAGATACATGTGTCATGG 3 Chr6 96.240.306 96.240.327 +

TGACTATTGACTTATGTCATTT 3 Chr6 97.887.698 97.887.719 +

TGAGTATATACTTATGACACTA 3 Chr6 102.667.152 102.667.173 +

TGTACATATACTTATCTCATGG 3 Chr6 102.872.619 102.872.640 -

AAACCATATAATTATGTCAAAA 3 Chr6 104.401.209 104.401.230 +

TGACCATTTTCCTATGTCATTT 3 Chr6 106.349.191 106.349.212 +

TGACAATATACTGATGTAATTA 3 Chr6 110.548.413 110.548.434 +

TGACCATTTACTTACGTAACTA 3 Chr6 117.548.665 117.548.686 -

TGACCCTATATTTTTGTCATTT 3 Chr6 122.667.893 122.667.914 +

TGACCATCTACTGATGTTAGGG 3 Chr6 127.608.420 127.608.441 -

TGGACATATATTTATGTCAGTA 3 Chr6 130.955.591 130.955.612 -

TGTCCATATACTTGTGTCTGAT 3 Chr6 133.038.880 133.038.901 +

TGACCATTTACATGTGTCAGGA 3 Chr6 133.968.583 133.968.604 +

TGATCATATAGCTATGTCATTG 3 Chr6 141.172.496 141.172.517 +

TGTACATATACTTATGTTAAAC 3 Chr6 144.136.438 144.136.459 -

TGACAATAGACTTATTTCACTA 3 Chr6 145.699.361 145.699.382 -

TGACCATATTCTTAAGGCAGCC 3 Chr6 149.348.026 149.348.047 -

TGACCATATTCTCAGGTCACTG 3 Chr6 153.010.889 153.010.910 -

TGACCTTATTCTTATGTCTTTA 3 Chr6 155.335.406 155.335.427 +

TGACCACATTCTTATTTCAAAC 3 Chr6 157.748.240 157.748.261 +

AGACAATAAACTTATGTCATTG 3 Chr6 161.205.089 161.205.110 +

TGACCATATAATCATGTCTGAG 3 Chr6 162.814.956 162.814.977 -

TGATCAGATACATATGTCATGT 3 Chr6 164.294.618 164.294.639 +

TGACCAAAAATTTATGTCAAAA 3 Chr6 165.304.577 165.304.598 +

TGAACCTATACTTATTTCAAAA 3 Chr7 2.840.187 2.840.208 +

TGATCATTAACTTATGTCACAG 3 Chr7 3.610.282 3.610.303 +

TGATCATATAGTCATGTCACTG 3 Chr7 7.868.656 7.868.677 +

TGACCATAAACTTTTATCACTA 3 Chr7 8.887.235 8.887.256 -

TGACCACATAGTTATGTTAACA 3 Chr7 11.318.764 11.318.785 -

TGACCATATACTTTGGTCCAAG 3 Chr7 12.667.190 12.667.211 -

TAACTATATACTTATGGCAATA 3 Chr7 15.004.910 15.004.931 -

TGACTATATACTTTTGTAATTA 3 Chr7 20.236.118 20.236.139 +

TGACCATTTACTTCTGTAAAAG 3 Chr7 20.661.976 20.661.997 +

TGATCATATACTAATGTGAAAT 3 Chr7 22.272.888 22.272.909 +

TGACAATATATTTATGACATCC 3 Chr7 24.594.817 24.594.838 -

TGATCCTATATTTATGTCATTT 3 Chr7 25.249.325 25.249.346 +

TGAACATATACTCATATCATTA 3 Chr7 30.641.086 30.641.107 +

TGAGCATATCCTTATTTCAGGG 3 Chr7 32.718.741 32.718.762 -

TGATTATATACTTATATCAATA 3 Chr7 35.328.320 35.328.341 -

TGACTATGTGCTTATGTCAATA 3 Chr7 37.507.140 37.507.161 -

TGACAATATACTTATTTGACAA 3 Chr7 37.693.930 37.693.951 -

TTACAAGATACTTATGTCATCA 3 Chr7 38.364.397 38.364.418 +

TGACCATTTACTTGTGTTAGAA 3 Chr7 46.151.164 46.151.185 +

TCTCCATATACTTATGTGAAGA 3 Chr7 53.765.939 53.765.960 +

TGACCAAATACTTATGTGGCAA 3 Chr7 66.596.002 66.596.023 -

TGAGCATATATTTATGCCAGGT 3 Chr7 78.996.977 78.996.998 -

TGAACATATTCTTATCTCAATA 3 Chr7 84.293.992 84.294.013 -

TGACCAAATACTTAGATCACAA 3 Chr7 85.578.615 85.578.636 -

TGACCATATAATTTTCTCATGT 3 Chr7 85.751.877 85.751.898 -

TGAGCATTTCCTTATGTCAGCA 3 Chr7 87.030.016 87.030.037 -

TGACTATATACTTATTTTATAC 3 Chr7 92.560.550 92.560.571 -

TGACCATATTTTCATGTCAGTA 3 Chr7 93.186.026 93.186.047 +

TGACCATTTATTTGTGTCATTC 3 Chr7 93.267.689 93.267.710 +

AGATCATATACTTATGTCCAAG 3 Chr7 94.031.182 94.031.203 +

TGACCATACTATTATGTCACCT 3 Chr7 96.802.825 96.802.846 +

TGCCCATCTACTTATTTCATGA 3 Chr7 98.353.642 98.353.663 -

TGACCATAATCTTATTTCATTG 3 Chr7 98.895.130 98.895.151 -

AGATCATATACATATGTCAATG 3 Chr7 102.488.397 102.488.418 -

TGAACATTTACTTATGTAAAAG 3 Chr7 120.140.375 120.140.396 -

TGAATACATACTTATGTCAGTG 3 Chr7 121.025.864 121.025.885 +

TGACCAAACACTTATATCACAG 3 Chr7 121.096.053 121.096.074 -

TGAAGATATAATTATGTCAGGA 3 Chr7 126.965.486 126.965.507 +

TGACCAAATACTCATTTCACCA 3 Chr7 128.737.458 128.737.479 +

TGACACTTTACTTATGTCATTT 3 Chr7 137.670.812 137.670.833 +

TGACAATATGCATATGTCAGGA 3 Chr7 144.991.849 144.991.870 +

TGACCATGTGCTTCTGTCACTA 3 Chr7 145.872.377 145.872.398 -

TTACCCTATACTTATTTCATTA 3 Chr8 5.879.893 5.879.914 -

TGGCCATATATTTATTTCATTC 3 Chr8 13.113.666 13.113.687 -

TGACCATATATTTATGTATAAA 3 Chr8 13.872.827 13.872.848 +

CGATCATATACTTATGTTAAAG 3 Chr8 14.293.012 14.293.033 +

TGAATGTATACTTATGTCAAAA 3 Chr8 14.707.243 14.707.264 +

TGACTATATTCTTAAGTCATTC 3 Chr8 14.859.685 14.859.706 -

TGACCCTACAATTATGTCAAAA 3 Chr8 27.027.281 27.027.302 +

TGACCATATTCTTAGATCACAC 3 Chr8 27.677.127 27.677.148 -

TGGCCATATACTTATCCCATAA 3 Chr8 29.237.665 29.237.686 +

TGACCATCAAATTATGTCAGGG 3 Chr8 32.863.989 32.864.010 -

TGACCACATACTTATCTTAGAA 3 Chr8 35.857.271 35.857.292 -

TGCCAATATACTAATGTCACAA 3 Chr8 49.470.515 49.470.536 +

TGACCATGTAATTATGTAATGA 3 Chr8 52.035.899 52.035.920 +

TGAGCACATACTTATGGCACTG 3 Chr8 58.918.015 58.918.036 +

TGAACATATACTTATTCCACCT 3 Chr8 61.101.422 61.101.443 +

TGGCCATATACTTAGGTCTTCC 3 Chr8 62.835.301 62.835.322 -

TGACCATTTACTTCTGTTATCA 3 Chr8 65.632.177 65.632.198 +

TGACCATTTCATTATGTCATAG 3 Chr8 70.256.861 70.256.882 -

TGCACATATACTTATGTAACCT 3 Chr8 72.447.051 72.447.072 +

TGACCATAAACTTAAGTGAATT 3 Chr8 74.200.263 74.200.284 -

AAACCATATACATATGTCAAAT 3 Chr8 83.931.409 83.931.430 +

TGACCATTTATTTATGTTAGGT 3 Chr8 85.695.767 85.695.788 +

TGCCCATATACCAATGTCAAAA 3 Chr8 91.660.614 91.660.635 -

TGAGCATATACTTAGGTCTTCA 3 Chr8 92.321.566 92.321.587 +

TGACCATGTATGTATGTCAACA 3 Chr8 94.821.383 94.821.404 +

TGACCATATACTTATTACTTAA 3 Chr8 95.218.239 95.218.260 -

TGACCAGATACTCATATCACAT 3 Chr8 97.411.454 97.411.475 -

TTACCATATAGTTATGACAATA 3 Chr8 99.792.332 99.792.353 -

TGACCATATTCATTTGTCATTC 3 Chr8 105.377.854 105.377.875 +

TGAGCATATACTGATGGCAAAT 3 Chr8 109.415.091 109.415.112 +

TGATCATATACTTATGTGTTAA 3 Chr8 112.765.444 112.765.465 -

TTACAATAAACTTATGTCAGAA 3 Chr8 113.986.592 113.986.613 +

TTACCATACAGTTATGTCATAT 3 Chr8 113.991.573 113.991.594 +

TGACCAGAACCTTATGTCACAG 3 Chr8 115.348.448 115.348.469 +

TGACCATATATTTATGGAATTC 3 Chr8 116.524.964 116.524.985 -

TGACCATAGACTTATGCCTCCT 3 Chr8 122.062.297 122.062.318 +

TGCCCAGATACTTAGGTCATAA 3 Chr8 127.006.713 127.006.734 +

TGACAATATTATTATGTCATTA 3 Chr8 132.688.705 132.688.726 -

TGACCACATAAATATGTCACTG 3 Chr8 135.565.745 135.565.766 -

TGGCCATATATATATGTCATAC 3 Chr8 137.784.938 137.784.959 -

TGACTGTATACATATGTCAAAA 3 Chr8 138.708.203 138.708.224 -

TAAACATATACTTATGTGATGA 3 Chr8 141.186.795 141.186.816 -

TTACCAAATACTTATGTTACTT 3 Chr8 141.726.114 141.726.135 +

TGACTATATACATTTGTCAAAA 3 Chr8 142.515.032 142.515.053 +

TGACCAAAAAGTTATGTCAATT 3 Chr9 64.430 64.451 +

TGAACATATTCTTTTGTCAGAG 3 Chr9 9.274.651 9.274.672 +

TGACCAGATAGTTAAGTCAGTT 3 Chr9 9.497.307 9.497.328 -

TGAACATTTCCTTATGTCATTA 3 Chr9 14.772.410 14.772.431 -

TGACCAGCTACTTATGTTAACT 3 Chr9 18.255.476 18.255.497 +

TGAGCACATACTTATATCAGAA 3 Chr9 22.419.433 22.419.454 +

TTACTATATACTTATATCACCT 3 Chr9 23.466.256 23.466.277 -

AGACCATATAATTATGGCAGCA 3 Chr9 25.218.110 25.218.131 +

TGAGCACCTACTTATGTCAAGC 3 Chr9 27.155.618 27.155.639 -

TGAACATACACATATGTCAACA 3 Chr9 29.595.962 29.595.983 +

TGACCATATATTTATTCCAAGC 3 Chr9 29.985.415 29.985.436 +

TGAACATAAACTTATGTCTTCT 3 Chr9 30.330.780 30.330.801 -

TGGCCATATAGTTATCTCACCA 3 Chr9 32.932.005 32.932.026 +

GGACTATATACTTAGGTCATAC 3 Chr9 36.987.292 36.987.313 -

TGACCAAAAAGTTATGTCAATT 3 Chr9 68.800.749 68.800.770 +

TGACCAAAAAGTTATGTCAATT 3 Chr9 69.138.843 69.138.864 +

TGACCAAAAAGTTATGTCAATT 3 Chr9 70.981.022 70.981.043 -

TGAACATATTTTTATGTCATTT 3 Chr9 72.993.837 72.993.858 +

TGAACAAATACTTGTGTCATCC 3 Chr9 75.243.233 75.243.254 +

TGACCATATTCTTCTCTCACAC 3 Chr9 75.888.940 75.888.961 -

TGAAAATATACTTATTTCACCT 3 Chr9 76.273.772 76.273.793 -

TAAACATATATTTATGTCATTT 3 Chr9 79.121.664 79.121.685 +

TGACCTTATACTTATGTTCATC 3 Chr9 81.485.633 81.485.654 -

TGAACATATATTTGTGTCAGTG 3 Chr9 83.369.230 83.369.251 +

TGACCAAATACATCTGTCAATT 3 Chr9 85.564.460 85.564.481 +

TGAACATATAATTATGTAAATC 3 Chr9 87.266.518 87.266.539 +

TGTCCATTTACTTATGTCTTAA 3 Chr9 87.736.628 87.736.649 -

TAATCATACACTTATGTCACCA 3 Chr9 94.631.244 94.631.265 +

TGAGCATTTTCTTATGTCAAAA 3 Chr9 99.059.333 99.059.354 -

TCAACATATATTTATGTCAGAA 3 Chr9 106.740.279 106.740.300 -

TGACTTAATACTTATGTCAAAA 3 Chr9 108.076.669 108.076.690 -

TGTCCATATACATATGTTAATT 3 Chr9 109.164.812 109.164.833 -

TGTCCATTTACTTATTTCATTC 3 Chr9 119.173.167 119.173.188 +

TAACCATTTACTTATGTAAAGA 3 Chr9 120.627.387 120.627.408 +

TGTCTATATGCTTATGTCAGAG 3 Chr9 121.281.070 121.281.091 -

TGTCCATATATATATGTCACAT 3 Chr9 121.290.091 121.290.112 +

TGGCCATATTCTTATTTCACAT 3 Chr9 127.144.901 127.144.922 -

TGACCATAGACTGATCTCATAA 3 Chr9 131.747.737 131.747.758 +

GGACCACATACTTATGTCCCCT 3 ChrX 2.474.057 2.474.078 -

TGACCATTTTCTCATGTCATTT 3 ChrX 2.534.376 2.534.397 +

TGATTATATAATTATGTCATAG 3 ChrX 4.145.693 4.145.714 +

TGACCATATGATTATGCCACTG 3 ChrX 7.379.263 7.379.284 -

TTAGCATTTACTTATGTCATGT 3 ChrX 8.798.015 8.798.036 +

TGAACATATAATTATGCCACTC 3 ChrX 10.670.088 10.670.109 +

TGACTATATACATTTGTCAAAA 3 ChrX 19.137.241 19.137.262 +

TGCCTATATAGTTATGTCAAAA 3 ChrX 22.615.995 22.616.016 -

TGACTATATTCTTATATCATAT 3 ChrX 23.693.298 23.693.319 -

TGACTATATACATTTGTCAAAA 3 ChrX 25.448.117 25.448.138 +

TGACCACATACTTATCTCCACG 3 ChrX 26.849.956 26.849.977 +

TGACCAAAGACTTAGGTCACAA 3 ChrX 27.201.493 27.201.514 +

TGACCAGACTCTTATGTCAAAC 3 ChrX 33.353.975 33.353.996 +

TAAACATATACTCATGTCATCT 3 ChrX 39.132.279 39.132.300 +

TGACCATATACAAATCTCATGA 3 ChrX 43.130.215 43.130.236 +

TGAGCATATACATAAGTCAAAC 3 ChrX 48.707.368 48.707.389 -

TGACCAAATGCTGATGTCAGAT 3 ChrX 49.092.996 49.093.017 -

TCTCCATATACTTATATCAGCG 3 ChrX 50.296.252 50.296.273 -

TGACCATATACGTATGTATGTG 3 ChrX 51.437.713 51.437.734 -

TGACCATATACTGATCTTAGCT 3 ChrX 52.056.013 52.056.034 +

TGAATATATAATTATGTCACAG 3 ChrX 55.802.099 55.802.120 -

TGACCATCTTCTTATGGCATTG 3 ChrX 62.887.882 62.887.903 -

TGAGCATTTACTTTTGTCATAA 3 ChrX 65.303.668 65.303.689 -

TGAGCATATTCTCATGTCATCA 3 ChrX 71.564.346 71.564.367 +

AGACCATATAATCATGTCAACA 3 ChrX 72.812.287 72.812.308 -

TGAACATATACGTGTGTCATTA 3 ChrX 76.770.761 76.770.782 -

TGAGCATTTATTTATGTCAGGA 3 ChrX 82.785.433 82.785.454 +

TGACCAAATACTTAGATCACAA 3 ChrX 94.006.715 94.006.736 -

TGACCAGGTACTTATATCATCA 3 ChrX 94.730.687 94.730.708 +

TTACCATATGCTTGTGTCATAT 3 ChrX 96.659.631 96.659.652 +

TGGCCATATTTTTATGTCAGTA 3 ChrX 99.958.783 99.958.804 -

TGACCACACAGTTATGTCAGTA 3 ChrX 103.225.319 103.225.340 -

TGACCACACAGTTATGTCAGTA 3 ChrX 103.322.666 103.322.687 +

TGGCCATATAGTTATATCAAAA 3 ChrX 103.676.999 103.677.020 -

TTACCATATACTAATGACACAT 3 ChrX 104.762.731 104.762.752 +

TGACCTTACACTTAAGTCATCT 3 ChrX 109.509.236 109.509.257 -

TAACCATATATTTATATCACAA 3 ChrX 112.150.166 112.150.187 -

TAAGCATATACTTATGTTATAT 3 ChrX 115.742.508 115.742.529 +

TGGCAATATAGTTATGTCAAAA 3 ChrX 124.241.297 124.241.318 -

TGACCATATATTTGTGGCACTA 3 ChrX 127.152.237 127.152.258 -

TGACCATATACTTTTCTCCTTT 3 ChrX 130.939.784 130.939.805 -

TAACATTATACTTATGTCATGG 3 ChrX 132.187.189 132.187.210 -

TGAGTATATACTTATGGCAATA 3 ChrX 133.330.610 133.330.631 -

TGAGCATTTACTCATGTCATTA 3 ChrX 133.735.185 133.735.206 +

TGATCATATATTTATGTAAAAA 3 ChrX 137.931.509 137.931.530 -

TCACCATACACTTTTGTCATTC 3 ChrX 140.775.052 140.775.073 -

TTTCCATTTACTTATGTCACCT 3 ChrX 143.934.995 143.935.016 -

TGACTATATACATCTGTCAAAT 3 ChrX 154.032.027 154.032.048 +

GGACCACATACTTATGTCCCCT 3 ChrY 2.424.057 2.424.078 -

TGACCATTTTCTCATGTCATTT 3 ChrY 2.484.376 2.484.397 +

TGAGCATAAACTTATGCCATTG 3 ChrY 16.175.659 16.175.680 +

TCACCATATACATCTGTCATCT 3 ChrY 23.771.126 23.771.147 -

1 As compared to the query TGACCATATACTTATGTCANNN
2 Referenced to the human genome GRCh37/hg19 assembly
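The mismatch counts listed in the table can be checked against the query given in footnote 1. The following is a minimal sketch, assuming that the `N` positions of the query match any base, that the listed sequences are already oriented like the query, and that coordinates use `.` as a thousands separator. The function names and the field names (`sequence`, `mismatches`, `chromosome`, `start`, `end`, `strand`) are illustrative assumptions and are not taken from the thesis.

```python
# Query sequence from footnote 1; N is treated as a wildcard (assumption).
QUERY = "TGACCATATACTTATGTCANNN"


def count_mismatches(site: str, query: str = QUERY) -> int:
    """Count positions where `site` differs from `query`, skipping N in the query."""
    if len(site) != len(query):
        raise ValueError("site and query must have the same length")
    return sum(
        1
        for s, q in zip(site.upper(), query.upper())
        if q != "N" and s != q
    )


def parse_row(row: str) -> dict:
    """Split one whitespace-separated table row into named fields.

    Field names are hypothetical labels chosen for this sketch.
    """
    seq, mismatches, chrom, start, end, strand = row.split()
    return {
        "sequence": seq,
        "mismatches": int(mismatches),
        "chromosome": chrom,
        # Coordinates are printed as e.g. 23.771.126; drop the separators.
        "start": int(start.replace(".", "")),
        "end": int(end.replace(".", "")),
        "strand": strand,
    }


row = parse_row("TCACCATATACATCTGTCATCT 3 ChrY 23.771.126 23.771.147 -")
# The recomputed mismatch count agrees with the value listed in the row.
assert count_mismatches(row["sequence"]) == row["mismatches"]
# Start and end are inclusive, so the site spans the full query length.
assert row["end"] - row["start"] + 1 == len(QUERY)
```

Applying `parse_row` and `count_mismatches` to every row gives a quick consistency check of the listed mismatch counts and of the 22 bp site length.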

Table 6.3: Results of ATAC-seq analysis (325 sites with reduced chromatin accessibility)

Columns: Gene ID, Gene name, Chromosome, Gene start (bp), Gene end (bp), Average dDMT #6, Average DEM #6. The two average columns are ATAC-seq normalized counts. Rows that begin with the chromosome have no gene ID or gene name.

ENSG00000157933 SKI chr1 2190561 2190710 4,47 0
ENSG00000078900 TP73 chr1 3636761 3636910 5,03 0
ENSG00000049249 TNFRSF9 chr1 7991021 7991170 5,03 0,76
ENSG00000116649 SRM chr1 11121001 11121150 4,47 0
chr1 12217501 12217650 7,82 2,28
ENSG00000142634 EFHD2 chr1 15679061 15679210 5,03 0
ENSG00000158748 HTR6 chr1 19986421 19986570 9,49 2,28
ENSG00000157978 LDLRAP1 chr1 25883281 25883430 5,59 0,76
ENSG00000180198 RCC1 chr1 28844981 28845130 5,03 0,76
ENSG00000264773 MIR4420 chr1 31221041 31221190 6,14 0,76
ENSG00000182866 LCK chr1 32722181 32722330 6,14 0,76
chr1 33427981 33428130 5,59 0,76
ENSG00000116514 RNF19B chr1 33429441 33429590 13,4 3,8
ENSG00000117419 ERI3 chr1 44782181 44782330 4,47 0,76
ENSG00000162624 LHX8 chr1 75669481 75669630 5,59 0,76
ENSG00000264406 MIR548AP chr1 84397421 84397570 6,14 0,76
ENSG00000171517 LPAR3 chr1 85401961 85402110 6,7 1,52
ENSG00000153898 MCOLN2 chr1 85435781 85435930 8,94 1,52
ENSG00000227290 RP4-544H6.2 chr1 87901261 87901410 6,14 0
ENSG00000137947 GTF2B chr1 89356221 89356370 5,59 0,76
ENSG00000172031 EPHX4 chr1 92482341 92482490 4,47 0
ENSG00000154511 FAM69A chr1 93416761 93416910 5,59 0,76
ENSG00000134215 VAV3 chr1 108136441 108136590 6,7 0,76
ENSG00000143079 CTTNBP2NL chr1 112933541 112933690 16,76 3,04
ENSG00000173218 VANGL1 chr1 116185401 116185550 6,7 1,52
ENSG00000163564 PYHIN1 chr1 158900441 158900590 4,47 0
ENSG00000162739 SLAMF6 chr1 160492961 160493110 5,59 0
ENSG00000066294 CD84 chr1 160548141 160548290 4,47 0
ENSG00000117560 FASLG chr1 172714221 172714370 7,26 1,52
ENSG00000173627 APOBEC4 chr1 183654341 183654490 4,47 0,76
ENSG00000127074 RGS13 chr1 192671421 192671570 5,03 0
ENSG00000221680 MIR1278 chr1 193111021 193111170 7,82 1,52
ENSG00000198625 MDM4 chr1 204547981 204548130 5,59 0,76
ENSG00000154305 MIA3 chr1 222807861 222808010 5,03 0,76
ENSG00000228382 ITPKB-IT1 chr1 226844401 226844550 6,14 0
chr1 244964601 244964750 5,03 0,76
ENSG00000203667 COX20 chr1 244997121 244997270 5,03 0
ENSG00000067082 KLF6 chr10 3846681 3846830 5,03 0,76
ENSG00000236990 RP11-433J20.1 chr10 4075641 4075790 5,59 0,76
ENSG00000233117 LINC00702 chr10 4277041 4277190 5,03 0,76
ENSG00000065665 SEC61A2 chr10 12154881 12155030 5,59 0,76
ENSG00000107863 ARHGAP21 chr10 24931941 24932090 4,47 0
ENSG00000150347 ARID5B chr10 63594981 63595130 4,47 0,76
ENSG00000148572 NRBF2 chr10 64926681 64926830 5,03 0
ENSG00000099282 TSPAN15 chr10 71209641 71209790 5,03 0
ENSG00000108175 ZMIZ1 chr10 80828881 80829030 10,05 2,28

ENSG00000108219 TSPAN14 chr10 82223321 82223470 6,14 0,76
ENSG00000026103 FAS chr10 90747181 90747330 8,38 2,28
ENSG00000185745 IFIT1 chr10 91151761 91151910 7,26 1,52
ENSG00000188649 CC2D2B chr10 97723441 97723590 7,26 0
ENSG00000171314 PGAM1 chr10 99187741 99187890 7,82 1,52
ENSG00000148843 PDCD11 chr10 105176741 105176890 5,03 0,76
ENSG00000148834 GSTO1 chr10 106015501 106015650 5,59 0
ENSG00000232139 LINC00867 chr10 120219241 120219390 11,17 1,52
ENSG00000174885 NLRP6 chr11 268821 268970 6,14 0,76
ENSG00000167325 RRM1 chr11 4117161 4117310 5,59 0
ENSG00000070081 NUCB2 chr11 17297801 17297950 15,64 3,8
ENSG00000110786 PTPN5 chr11 18824301 18824450 5,03 0
ENSG00000085117 CD82 chr11 44545121 44545270 6,7 1,52
ENSG00000110448 CD5 chr11 60810741 60810890 5,03 0
ENSG00000229859 PGA3 chr11 60955161 60955310 5,03 0
ENSG00000174684 B3GNT1 chr11 66115021 66115170 6,7 0,76
ENSG00000174547 MRPL11 chr11 66197561 66197710 5,59 0,76
ENSG00000149273 RPS3 chr11 75094381 75094530 6,7 1,52
ENSG00000266192 MIR1260B chr11 95932881 95933030 7,82 1,52
ENSG00000023445 BIRC3 chr11 102150481 102150630 8,94 0,76
ENSG00000137673 MMP7 chr11 102431541 102431690 7,82 1,52
ENSG00000182985 CADM1 chr11 115088881 115089030 8,38 0,76
ENSG00000198331 HYLS1 chr11 125741401 125741550 5,03 0
ENSG00000184574 LPAR5 chr12 6744601 6744750 5,03 0,76
ENSG00000188393 CLEC2A chr12 10078481 10078630 4,47 0
ENSG00000245648 RP11-277P12.20 chr12 10517621 10517770 6,7 0,76
ENSG00000171681 ATF7IP chr12 14607101 14607250 5,03 0,76
ENSG00000256287 RP11-664H17.1 chr12 19945081 19945230 7,82 1,52
ENSG00000060982 BCAT1 chr12 24948761 24948910 5,59 0,76
ENSG00000050405 LIMA1 chr12 50616401 50616550 4,47 0,76
ENSG00000182379 NXPH4 chr12 57608161 57608310 7,82 1,52
ENSG00000173401 GLIPR1L1 chr12 75740081 75740230 7,26 1,52
ENSG00000139318 DUSP6 chr12 89727781 89727930 9,49 2,28
ENSG00000257594 GALNT4 chr12 89871481 89871630 5,03 0
ENSG00000271614 LINC00936 chr12 90469621 90469770 5,59 0,76
ENSG00000136040 PLXNC1 chr12 94638801 94638950 7,26 0,76
ENSG00000227825 SLC9A7P1 chr12 98818701 98818850 11,17 2,28
ENSG00000136052 SLC41A2 chr12 105158601 105158750 5,59 0,76
ENSG00000135111 TBX3 chr12 115121981 115122130 9,49 2,28
ENSG00000176871 WSB2 chr12 118493001 118493150 5,03 0,76
ENSG00000182500 ORAI1 chr12 122075221 122075370 4,47 0,76
ENSG00000212694 AC084018.1 chr12 122237461 122237610 7,82 0,76
ENSG00000151846 PABPC3 chr13 25611481 25611630 5,59 0
ENSG00000102781 KATNAL1 chr13 30720301 30720450 8,94 1,52
chr13 40795821 40795970 5,03 0
ENSG00000150907 FOXO1 chr13 41227841 41227990 5,59 0,76
chr13 44833921 44834070 4,47 0,76
ENSG00000233610 LINC00462 chr13 49169901 49170050 6,14 0,76
ENSG00000236051 MYCBP2-AS1 chr13 77758741 77758890 6,7 1,52
ENSG00000136367 ZFHX2 chr14 24021101 24021250 4,47 0,76
ENSG00000092140 G2E3 chr14 31028881 31029030 6,7 1,52
ENSG00000140043 PTGR2 chr14 74292361 74292510 9,49 1,52

ENSG00000259687 RP11-293M10.5 chr14 75800981 75801130 5,03 0,76
ENSG00000156127 BATF chr14 76008321 76008470 8,38 1,52
ENSG00000100599 RIN3 chr14 93052301 93052450 5,03 0
ENSG00000100600 LGMN chr14 93192581 93192730 6,14 0,76
ENSG00000175699 LINC00521 chr14 94474721 94474870 4,47 0,76
ENSG00000176438 SYNE3 chr14 95946561 95946710 8,38 1,52
ENSG00000196405 EVL chr14 100529001 100529150 7,26 1,52
ENSG00000183706 OR4N4 chr15 22386901 22387050 5,03 0,76
ENSG00000104131 EIF3J chr15 44838201 44838350 6,7 1,52
ENSG00000081014 AP4E1 chr15 51172881 51173030 5,03 0,76
ENSG00000069956 MAPK6 chr15 52298301 52298450 4,47 0,76
ENSG00000259482 RP11-219B17.2 chr15 60952861 60953010 4,47 0
chr15 63684921 63685070 4,47 0,76
ENSG00000166233 ARIH1 chr15 72767821 72767970 7,82 0
ENSG00000103653 CSK chr15 75077641 75077790 4,47 0,76
ENSG00000183476 SH2D7 chr15 78396221 78396370 7,26 1,52
ENSG00000170776 AKAP13 chr15 86140741 86140890 5,59 0
ENSG00000259416 RP11-158M2.5 chr15 86267341 86267490 7,26 0,76
ENSG00000259527 LINC00052 chr15 88140721 88140870 5,59 0,76
ENSG00000259494 MRPL46 chr15 89010261 89010410 4,47 0
ENSG00000172183 ISG20 chr15 89183321 89183470 13,4 2,28
ENSG00000140577 CRTC3 chr15 91107461 91107610 4,47 0,76
ENSG00000185418 TARSL2 chr15 102264901 102265050 7,26 1,52
ENSG00000006007 GDE1 chr16 19528881 19529030 6,14 0,76
ENSG00000102879 CORO1A chr16 30196881 30197030 8,94 2,28
ENSG00000261239 ANKRD26P1 chr16 46426941 46427090 7,26 1,52
ENSG00000237172 B3GNT9 chr16 67184241 67184390 4,47 0,76
ENSG00000260873 SNTB2 chr16 69221381 69221530 6,14 0
ENSG00000140836 ZFHX3 chr16 72821961 72822110 5,59 0,76
ENSG00000140950 TLDC1 chr16 84540101 84540250 11,17 2,28
ENSG00000223510 CDRT15 chr17 14104561 14104710 13,4 3,8
ENSG00000168961 LGALS9 chr17 25943501 25943650 4,47 0,76
ENSG00000108702 CCL1 chr17 32688661 32688810 5,03 0,76
ENSG00000230055 CISD3 chr17 36884081 36884230 7,26 0,76
chr17 45378581 45378730 6,7 0
ENSG00000167182 SP2 chr17 45967401 45967550 5,59 0,76
ENSG00000121073 SLC35B1 chr17 47792681 47792830 6,14 0,76
ENSG00000166292 TMEM100 chr17 53654641 53654790 5,03 0
ENSG00000186407 CD300E chr17 72642541 72642690 5,59 0
ENSG00000264511 MIR3678 chr17 73402121 73402270 10,05 1,52
chr17 75839761 75839910 8,94 0,76
ENSG00000167280 ENGASE chr17 77083761 77083910 5,59 0
ENSG00000266189 MIR3186 chr17 79448441 79448590 7,82 1,52
ENSG00000141526 SLC16A3 chr17 80186801 80186950 11,17 2,28
ENSG00000178927 C17orf62 chr17 80407241 80407390 5,03 0
ENSG00000176890 TYMS chr18 650441 650590 8,94 1,52
ENSG00000101577 LPIN2 chr18 3062661 3062810 5,03 0
ENSG00000177426 TGIF1 chr18 3453981 3454130 6,14 0
ENSG00000266835 RP11-838N2.4 chr18 3466961 3467110 4,47 0
ENSG00000266053 RP11-143J12.2 chr18 9120901 9121050 4,47 0
ENSG00000134769 DTNA chr18 32303221 32303370 5,03 0,76
ENSG00000141664 ZCCHC2 chr18 60119661 60119810 6,14 0,76
chr18 74534721 74534870 11,17 0,76

ENSG00000172270 BSG chr19 571041 571190 5,03 0
ENSG00000090661 CERS4 chr19 8259481 8259630 5,03 0
ENSG00000213339 QTRT1 chr19 10810541 10810690 5,59 0
ENSG00000160888 IER2 chr19 13388861 13389010 7,26 0,76
ENSG00000132003 ZSWIM4 chr19 13898821 13898970 5,59 0,76
ENSG00000105643 ARRDC2 chr19 18133061 18133210 7,82 1,52
ENSG00000105655 ISYNA1 chr19 18561841 18561990 5,03 0,76
ENSG00000167491 GATAD2A chr19 19517841 19517990 4,47 0
ENSG00000188508 KRTDAP chr19 35986321 35986470 5,59 0
ENSG00000160396 HIPK4 chr19 40904781 40904930 5,03 0
ENSG00000007129 CEACAM21 chr19 42061341 42061490 9,49 1,52
ENSG00000105419 MEIS3 chr19 47940001 47940150 8,94 0,76
chr19 50487001 50487150 4,47 0
chr2 7885061 7885210 9,49 1,52
ENSG00000134330 IAH1 chr2 9614121 9614270 6,14 0
ENSG00000265172 MIR4262 chr2 11969461 11969610 7,26 1,52
chr2 15146981 15147130 8,38 1,52
ENSG00000143889 HNRNPLL chr2 38806161 38806310 5,59 0
ENSG00000143924 EML4 chr2 42350481 42350630 6,14 0
ENSG00000138081 FBXO11 chr2 48131901 48132050 5,03 0
ENSG00000196843 ARID5A chr2 97203841 97203990 4,47 0,76
ENSG00000183513 COA5 chr2 99167961 99168110 5,03 0,76
chr2 106413761 106413910 7,82 1,52
ENSG00000071051 NCK2 chr2 106543841 106543990 7,82 0,76
ENSG00000115008 IL1A chr2 113551841 113551990 7,82 2,28
ENSG00000125629 INSIG2 chr2 118979701 118979850 6,14 0
ENSG00000121966 CXCR4 chr2 137000441 137000590 17,87 2,28
ENSG00000115935 WIPF1 chr2 175468141 175468290 7,82 1,52
ENSG00000170144 HNRNPA3 chr2 178058081 178058230 5,03 0,76
ENSG00000138381 ASNSD1 chr2 190527281 190527430 4,47 0
ENSG00000151690 MFSD6 chr2 191272181 191272330 7,26 0,76
ENSG00000115415 STAT1 chr2 191878361 191878510 4,47 0,76
ENSG00000230173 AC006196.1 chr2 196033221 196033370 4,47 0
ENSG00000081320 STK17B chr2 197031681 197031830 7,26 1,52
ENSG00000173166 RAPH1 chr2 204399341 204399490 6,14 0,76
ENSG00000197713 RPE chr2 210825861 210826010 13,4 3,04
ENSG00000135912 TTLL4 chr2 219572861 219573010 5,03 0
ENSG00000188389 PDCD1 chr2 242805721 242805870 8,38 0,76
ENSG00000230176 RP4-779E11.3 chr20 4152941 4153090 5,59 0,76
ENSG00000149596 JPH2 chr20 42709501 42709650 9,49 1,52
ENSG00000229005 HNF4A-AS1 chr20 43017281 43017430 9,49 0
chr20 43996901 43997050 7,26 1,52
ENSG00000020256 ZFP64 chr20 51019621 51019770 5,03 0
ENSG00000124209 RAB22A chr20 56885301 56885450 6,14 0
ENSG00000124203 ZNF831 chr20 57740301 57740450 6,14 0
ENSG00000101189 MRGBP chr20 61426601 61426750 6,14 0,76
ENSG00000101191 DIDO1 chr20 61531701 61531850 8,94 1,52
chr21 32531661 32531810 4,47 0,76
ENSG00000206102 KRTAP19-8 chr21 32552961 32553110 5,59 0,76
ENSG00000205670 SMIM11 chr21 35756501 35756650 8,94 2,28
ENSG00000186716 BCR chr22 23562081 23562230 5,03 0,76
ENSG00000100285 NEFH chr22 29877001 29877150 7,26 0,76
ENSG00000100191 SLC5A4 chr22 32662261 32662410 6,14 0,76

chr22 36682581 36682730 4,47 0
ENSG00000128383 APOBEC3A chr22 39350101 39350250 9,49 2,28
ENSG00000100298 APOBEC3H chr22 39495561 39495710 8,94 0,76
ENSG00000134077 THUMPD3 chr3 9396361 9396510 5,03 0,76
ENSG00000163701 IL17RE chr3 9943701 9943850 4,47 0
ENSG00000134070 IRAK2 chr3 10233901 10234050 10,61 1,52
ENSG00000163508 EOMES chr3 27669621 27669770 5,59 0
ENSG00000091317 CMTM6 chr3 32511441 32511590 5,03 0
ENSG00000188167 TMPPE chr3 33138141 33138290 5,03 0,76
ENSG00000136059 VILL chr3 38037201 38037350 4,47 0,76
ENSG00000168329 CX3CR1 chr3 39329921 39330070 10,05 2,28
ENSG00000173769 TOPAZ1 chr3 44249721 44249870 4,47 0,76
ENSG00000160791 CCR5 chr3 46410201 46410350 4,47 0
ENSG00000178038 ALS2CL chr3 46736801 46736950 4,47 0
ENSG00000163947 ARHGEF3 chr3 56787361 56787510 7,82 1,52
ENSG00000144746 ARL6IP5 chr3 69144921 69145070 11,73 2,28
ENSG00000114861 FOXP1 chr3 71119081 71119230 7,82 1,52
ENSG00000163602 RYBP chr3 72494101 72494250 4,47 0,76
ENSG00000178700 DHFRL1 chr3 93968481 93968630 7,26 0,76
ENSG00000170044 ZPLD1 chr3 102134521 102134670 4,47 0
ENSG00000114423 CBLB chr3 106253241 106253390 5,03 0
ENSG00000163607 GTPBP8 chr3 112721681 112721830 8,94 2,28
ENSG00000181847 TIGIT chr3 114010281 114010430 4,47 0,76
ENSG00000163762 TM4SF18 chr3 149070341 149070490 6,7 1,52
ENSG00000058056 USP13 chr3 179468621 179468770 6,7 0,76
ENSG00000073803 MAP3K13 chr3 185072041 185072190 12,29 2,28
ENSG00000213139 CRYGS chr3 186258421 186258570 10,05 1,52
ENSG00000163923 RPL39L chr3 186822841 186822990 4,47 0
ENSG00000159674 SPON2 chr4 1181101 1181250 5,03 0
ENSG00000168818 STX18 chr4 4526241 4526390 5,03 0
ENSG00000163453 IGFBP7 chr4 58293041 58293190 7,82 0,76
chr4 101940081 101940230 8,94 0
ENSG00000109320 NFKB1 chr4 103424541 103424690 5,59 0,76
ENSG00000138802 SEC24B chr4 110327601 110327750 4,47 0
ENSG00000180801 ARSJ chr4 114992261 114992410 6,14 0,76
ENSG00000145386 CCNA2 chr4 122744001 122744150 8,38 2,28
ENSG00000164136 IL15 chr4 143048181 143048330 5,59 0
ENSG00000170390 DCLK2 chr4 150185441 150185590 5,03 0
ENSG00000205208 C4orf46 chr4 159546161 159546310 6,7 0,76
ENSG00000073578 SDHA chr5 219621 219770 5,59 0,76
ENSG00000145491 ROPN1L chr5 10452921 10453070 5,03 0,76
ENSG00000082074 FYB chr5 39176901 39177050 7,26 0,76
chr5 42994661 42994810 8,94 2,28
ENSG00000152684 PELO chr5 52072141 52072290 9,49 2,28
ENSG00000113448 PDE4D chr5 58534561 58534710 5,03 0,76
ENSG00000113532 ST8SIA4 chr5 100237421 100237570 6,14 0,76
ENSG00000164400 CSF2 chr5 131438001 131438150 8,38 1,52
ENSG00000120708 TGFBI chr5 135372741 135372890 4,47 0
ENSG00000216009 MIR874 chr5 136965321 136965470 5,03 0,76
ENSG00000266478 MIR5197 chr5 142985981 142986130 14,52 1,52
ENSG00000145850 TIMD4 chr5 156347361 156347510 7,26 0,76
ENSG00000266890 MIR4634 chr5 174220821 174220970 8,94 0,76
ENSG00000175309 PHYKPL chr5 177676261 177676410 4,47 0

ENSG00000112137 PHACTR1 chr6 12894201 12894350 8,38 1,52
ENSG00000145990 GFOD1 chr6 13432241 13432390 8,38 2,28
ENSG00000137198 GMPR chr6 16436901 16437050 5,03 0,76
ENSG00000124788 ATXN1 chr6 16712641 16712790 11,17 2,28
chr6 16849701 16849850 4,47 0,76
ENSG00000111913 FAM65B chr6 25050721 25050870 6,7 1,52
ENSG00000112343 TRIM38 chr6 25946941 25947090 4,47 0,76
chr6 27652421 27652570 7,26 0,76
ENSG00000156711 MAPK13 chr6 36099521 36099670 13,96 1,52
ENSG00000112742 TTK chr6 80713521 80713670 7,26 0,76
chr6 88561561 88561710 4,47 0
ENSG00000112246 SIM1 chr6 100705881 100706030 5,59 0,76
ENSG00000146285 SCML4 chr6 107990461 107990610 4,47 0,76
ENSG00000123505 AMD1 chr6 111105561 111105710 5,59 0
ENSG00000249853 HS3ST5 chr6 114429641 114429790 6,14 0
ENSG00000196376 SLC35F1 chr6 118297341 118297490 6,14 0,76
ENSG00000111879 FAM184A chr6 119466441 119466590 12,29 3,04
ENSG00000146376 ARHGAP18 chr6 129993541 129993690 5,03 0,76
ENSG00000227660 RP11-162J8.2 chr6 149354041 149354190 15,08 3,04
ENSG00000009765 IYD chr6 150585061 150585210 5,03 0
ENSG00000091831 ESR1 chr6 152023041 152023190 6,14 0,76
ENSG00000182095 TNRC18 chr7 5436221 5436370 7,82 0,76
ENSG00000226097 AC099342.1 chr7 20320661 20320810 5,03 0,76
chr7 28966741 28966890 10,61 2,28
ENSG00000106100 NOD1 chr7 30519081 30519230 6,7 0,76
ENSG00000197085 NPSR1-AS1 chr7 34758461 34758610 9,49 0,76
ENSG00000136250 AOAH chr7 36776061 36776310 5,03 0,76
ENSG00000106327 TFR2 chr7 100231341 100231490 5,03 0,76
ENSG00000207691 MIR183 chr7 129484001 129484150 4,47 0
ENSG00000225932 CTAGE4 chr7 143875581 143875730 7,82 1,52
ENSG00000105993 DNAJB6 chr7 157069441 157069590 6,14 0
ENSG00000147324 MFHAS1 chr8 8777501 8777650 7,26 1,52
ENSG00000061337 LZTS1 chr8 20112141 20112290 4,47 0
ENSG00000120889 TNFRSF10B chr8 22932241 22932390 7,82 1,52
ENSG00000189233 NUGGC chr8 27939641 27939790 4,47 0,76
ENSG00000012232 EXTL3 chr8 28479601 28479750 5,59 0
ENSG00000120875 DUSP4 chr8 29329901 29330050 6,7 0,76
ENSG00000172728 FUT10 chr8 33192341 33192490 10,05 2,28
ENSG00000270673 YTHDF3-AS1 chr8 64078301 64078450 8,94 0,76
ENSG00000254081 CTD-3025N20.2 chr8 66499421 66499570 7,26 1,52
ENSG00000147576 ADHFE1 chr8 67358441 67358590 10,05 1,52
ENSG00000178460 MCMDC2 chr8 67782961 67783110 16,76 3,8
ENSG00000147601 TERF1 chr8 73856341 73856490 7,26 0,76
ENSG00000104450 SPAG1 chr8 101170021 101170170 9,49 1,52
ENSG00000147677 EIF3H chr8 117560601 117560750 6,14 0,76
ENSG00000155792 DEPTOR chr8 120886741 120886890 9,49 2,28
ENSG00000147689 FAM83A chr8 124170061 124170210 4,47 0,76
ENSG00000156831 NSMCE2 chr8 126137921 126138070 6,7 1,52
ENSG00000182851 GPIHBP1 chr8 144272821 144272970 5,59 0
ENSG00000107099 DOCK8 chr9 248701 248850 4,47 0
chr9 1048121 1048270 4,47 0
ENSG00000147872 PLIN2 chr9 19124021 19124170 4,47 0
ENSG00000107201 DDX58 chr9 32525441 32525590 4,47 0,76

ENSG00000130222 GADD45G chr9 92221341 92221490 5,03 0,76
chr9 100149521 100149670 5,03 0
ENSG00000136925 TSTD2 chr9 100396561 100396710 4,47 0
ENSG00000119523 ALG2 chr9 101955821 101955970 5,03 0
ENSG00000023318 ERP44 chr9 102847141 102847290 7,26 1,52
ENSG00000136802 LRRC8A chr9 131662401 131662550 5,03 0,76
ENSG00000188483 IER5L chr9 131936421 131936570 5,59 0,76
chr9 131988241 131988390 4,47 0,76
ENSG00000107263 RAPGEF1 chr9 134543141 134543290 8,94 0,76
chr9 134603621 134603770 4,47 0
ENSG00000123453 SARDH chr9 136567721 136567870 7,82 0
ENSG00000130741 EIF2S3 chrX 24100041 24100190 5,59 0,76
ENSG00000183337 BCOR chrX 40033381 40033530 7,82 0
ENSG00000147119 CHST7 chrX 46434501 46434650 4,47 0
ENSG00000017621 MAGIX chrX 49022801 49022950 11,17 2,28

Table 6.4: Transcription start sites (TSS) closest to the differentially accessible regions identified via ATAC-seq


Table 6.5: Overlap between ATAC-seq and potential off-target sites

¹ As indicated in Appendix Table 6.4


6.6 CURRICULUM VITAE

TAFADZWA MLAMBO
[email protected]
Britzingerstraße 78, 79114 Freiburg, Germany • +49 177 775 60493

EDUCATION
Doctor of Philosophy, Expected 2018
University of Freiburg, Germany
Master of Science in Medicine, 2013
University of the Witwatersrand, South Africa
Bachelor of Science with Honours, 2010
Rhodes University, South Africa

RESEARCH EXPERIENCE
Institute for Cell and Gene Therapy, University Medical Centre Freiburg
Doctoral research, 2013-present
Advisors: Prof. Dr. Toni Cathomen & Dr. Claudio Mussolino
‘Precise and sustained gene silencing in CD4+ cells using designer epigenome modifiers as a therapeutic approach to treat HIV infection’
• Developed novel TALE-based epigenome modifiers targeted to the HIV co-receptors as a safe treatment strategy against HIV

University of the Witwatersrand Medical School
Master’s research, 2011-2013
Advisors: Prof. Patrick Arbuthnot & Dr. Betty Mowa
‘Expression of anti-HBV primary micro-RNA shuttles using an inducible promoter system’
• Degree obtained with distinction
• Designed and generated a liver-specific and drug-inducible promoter system for the spatiotemporal expression of primary microRNA mimics against Hepatitis B virus

Rhodes University, Department of Biochemistry, Microbiology & Biotechnology
Bachelor’s research, 2007-2010
Advisor: Dr. Brendan Wilhelmi
‘A study of the pharmacokinetics and anti-cancer activity of Plocornulide A from the red alga Plocamium cornutum’
• Honours degree obtained with an upper second class pass; Bachelor’s degree obtained with distinction in Microbiology
• Investigated the pharmacokinetics of the halogenated monoterpene Plocornulide A and demonstrated efficacy against breast cancer and leukemia cell lines

COURSES AND CERTIFICATIONS
Soft skill courses: Project Management for Research; Powerful Scientific Presentation; Good Scientific Practice; Patent, Trademark, Design: Intellectual Property Rights; Introduction to GxP
Practical courses: Advanced Imaging Techniques in Microscopy; Basic Biostatistics

HONOURS AND AWARDS
Young Investigator Award, 2016. Awarded at the meeting of the German Society for Gene Therapy (DGGT), Germany
Deutscher Akademischer Austauschdienst (DAAD) Scholarship, 2014
Accepted into the Spemann Graduate School of Biology and Medicine (SGBM) PhD program, 2014
Poliomyelitis Research Foundation Scholarship, 2012 & 2013
WITS Postgraduate Merit Scholarship & Merit Award, 2012 & 2013
Selected as one of Rhodes University’s Top 100 students, 2010
Rhodes University Honours Degree Scholarship, 2010
Selected as one of Rhodes University’s Top 100 students, 2009
Dean of Students’ Leadership Award, 2009

PUBLICATIONS

Mlambo T, Nitsch S, Hildenbeutel M, Romito M, Müller M, Bossen C, Diederichs S, Cornu T, Cathomen T, Mussolino C (2018). Designer epigenome modifiers enable robust and sustained gene silencing in clinically relevant human cells. NAR. Resubmitted following revision.

Rahman SH, Kuehle J, Reimann C, Mlambo T, Alzubi J, Maeder ML, et al. (2015) Rescue of DNA-PK Signaling and T-Cell Differentiation by Targeted Genome Editing in a prkdc Deficient iPSC Disease Model. PLoS Genet 11(5): e1005239. doi:10.1371/journal.pgen.1005239

Mussolino C, Mlambo T, Cathomen T (2015). Proven and novel strategies for efficient editing of the human genome. Current Opinion in Pharmacology 24:105–112.


6.7 ACKNOWLEDGMENTS

I would like to extend my deepest gratitude to Dr. Claudio Mussolino and Prof. Dr. Toni Cathomen, not only for giving me this opportunity, but also for their guidance and support throughout my PhD.

I am grateful to the Deutscher Akademischer Austauschdienst for funding and for the experiences offered to me through my scholarship.

I am grateful to the SGBM team for their assistance as well as to my thesis committee (Prof. Dr. Toni Cathomen, Dr. Claudio Mussolino, Prof. Dr. Peter Stäheli and Dr. Giorgos Pyrowolakis) for their dedication and support.

To all the MusCoCats, thank you for all the helpful discussions, support and friendship. I would especially like to thank Christl, Mari, Sandy, Saskia, Nicola and Melina for their help and support.

To my family and friends all over the world who have stood with me during this journey: thank you for your endless love and support.

To Etienne: Mama and I could never fully express our gratitude. Thank you.

Finally, a huge thank you to the two people who continue to be my biggest inspirations and who have believed in me from the very beginning: Mama and Daddy ake, I love you both; your sacrifices have not been in vain.


6.8 DECLARATION

I herewith declare that I have prepared the present work without any unallowed help from third parties and without the use of any aids beyond those given. All data and concepts taken either directly or indirectly from other sources are so indicated along with a notation of the source. In particular I have not made use of any paid assistance from exchange or consulting services (doctoral degree advisors or other persons). No one has received remuneration from me either directly or indirectly for work which is related to the content of the present dissertation.

The work has not been submitted in this country or abroad to any other examination board in this or similar form.

The provisions of the doctoral degree examination procedure of the Faculty of Biology of the University of Freiburg are known to me. In particular I am aware that before the awarding of the final doctoral degree I am not entitled to use the title of Dr.

Date and signature