Chromosome conformation capture analysis of a putative enhancer region on chromosome 1

Master’s Thesis

Submitted for the degree of Master of Science (MSc.)

to the Faculty of Natural Sciences of the Paris-Lodron-University Salzburg

by Johanna Bergbaur

Supervisor: Univ.-Prof. Dr. Angela Risch

Biosciences

Salzburg, 10th of April 2019

Title

Chromosome conformation capture analysis of a putative enhancer region on chromosome 1

Analyse zur Erfassung der Chromosomenkonformation einer potenziellen Enhancer-Region auf Chromosom 1

Supervision: Univ.-Prof. Dr. Angela Risch, Dr. Florian Wolf

Abstract

Among all cancer types, lung cancer is currently the tumor type with the highest death rate, and cigarette smoke has been found to induce specific alterations in DNA methylation patterns that might contribute to the onset of this malignant disease. Thus, apart from mutations, there are indications that epigenetic dysregulation also plays an important role in lung tumorigenesis. In previous studies by M. Nair (2017) and S. Kollmann (2017), smoking-induced differentially methylated regions (sDMRs) were identified by analysing data from methylated DNA immunoprecipitation followed by sequencing (MeDIP-seq) of whole blood samples from monozygotic twins with discordant smoking behavior. Functional analyses of a candidate sDMR on chromosome 1 revealed a putative lung-specific enhancer function of this region, which was consequently termed smoking-induced differentially methylated enhancer region (seDMR).

The aim of the present study was to gain further insight into the likely consequences of differential methylation at this putative enhancer region. It was therefore of interest to identify the targets of this seDMR. To this end, chromosome conformation capture (3C), a method that allows the detection of DNA loops and was established in this laboratory by S. Häsler Gunnarsdóttir (2018), was applied to the lung cancer cell line A549 and to the non-lung cell lines HEK293T and HepG2. A few optimizations were introduced, namely ethanol precipitation followed by a column-based purification kit for 3C library purification, and the preparation of a fresh control library. Cells were fixed with 1 % formaldehyde and the libraries were digested with HindIII. The relative interaction frequencies of the seDMR with its centromeric target sites were determined via quantitative real-time PCR (qRT-PCR).

The 3C results revealed that both an intergenic region between the protein-coding gene IRF2BP2 and the long non-coding RNA (lncRNA) LINC00184 and an intragenic region, specifically within intron 4 of TARBP1, are interaction partners of the seDMR in all three tested cell lines. In silico analyses based on chromatin immunoprecipitation followed by sequencing (ChIP-seq) data revealed a clustering of binding sites for the chromatin-structuring transcription factors (TFs) Yin Yang 1 (YY1) and CCCTC-binding factor (CTCF) close to the detected interaction sites of the seDMR as well as on the seDMR itself.

The 3C findings appear to contradict previous findings obtained by luciferase assays, since lung specificity of the seDMR’s enhancer function could not be verified. However, since only relative interaction frequencies of the seDMR were determined, further analyses are required to allow a quantitative comparison across experiments on different cell lines. Furthermore, the expected enhancer-typical interaction pattern, the formation of an enhancer-promoter loop, was not found. Hence, further analysis is required to determine the function of the unexpected interactions with a gene body and with an intergenic site remote from any promoter. This requires adapting the 3C approach to include an additional control region and a different restriction enzyme, which would then allow a more tailored assessment of absolute interaction frequencies of the seDMR with its target sites.

In summary, applying the 3C method to three different cell lines revealed that the seDMR interacts with intron 4 of TARBP1 as well as with an intergenic region between IRF2BP2 and LINC00184. Further functional analyses might provide additional insight into the putative consequences of these enhancer interactions for the onset or progression of (lung) cancer.

Zusammenfassung

Among all cancer types, lung cancer has the highest death rate, and cigarette smoke has been shown to induce changes in DNA methylation patterns that might contribute to the development of this malignant disease. In addition, there are indications that, besides mutations, misdirected epigenetic regulation plays an important role in the development of lung cancer. By analysing MeDIP-seq data generated from blood samples of monozygotic twins with discordant smoking behavior, M. Nair (2017) and S. Kollmann (2017) discovered smoking-induced differentially methylated regions (sDMRs). A functional analysis of a selected sDMR on chromosome 1 indicated a potential lung-specific enhancer function of this region, which is referred to hereafter as smoking-induced differentially methylated enhancer region (seDMR).

The aim of this study was to learn more about the effects of differential methylation at this potential enhancer region. Consequently, it was of the highest interest to identify the target genes of this seDMR. For this purpose, chromosome conformation capture (3C), a method that enables the detection of DNA loops and was established in this laboratory by S. Häsler Gunnarsdóttir, was applied with a few optimizations to the lung cancer cell line A549 as well as to the non-lung cell lines HEK293T and HepG2. The optimizations comprise the use of ethanol precipitation followed by a column-based kit for purification of the 3C library, and the use of a freshly prepared control library. Fixation was performed with 1 % formaldehyde, and HindIII was used for library digestion. Quantitative real-time PCR was performed to measure the relative interaction frequency of the seDMR with the target sites on the centromeric side.

In all three cell lines, the 3C results showed an interaction of the seDMR both with an intergenic site between the protein-coding gene IRF2BP2 and the long non-coding RNA (lncRNA) LINC00184 and with an intragenic site, more precisely a region within intron 4 of TARBP1. In silico analyses (based on data from chromatin immunoprecipitation followed by sequencing, ChIP-seq) showed an accumulation of binding sites for the chromatin-structuring transcription factors (TFs) Yin Yang 1 (YY1) and CCCTC-binding factor (CTCF) close to the measured interaction sites of the seDMR as well as on the seDMR itself.

The 3C results contradict earlier findings from luciferase assays, since the lung specificity of the seDMR could not be confirmed. However, only relative interactions of the seDMR have been determined so far, and further analyses are needed to also enable a quantitative comparison of experiments on different cell lines. Furthermore, the interaction pattern typical of an enhancer, namely the enhancer-promoter loop, could not be detected. Further investigations are therefore needed to determine the function of these unexpected interactions with a gene body and with an intergenic site far from any promoter. Implementing an additional control region and a different restriction enzyme requires a corresponding adaptation of the 3C procedure. This would enable a more precise measurement of the absolute interaction frequency of the seDMR with its target sites.

In summary, applying the 3C method to three different cell lines showed that the seDMR interacts both with intron 4 of the TARBP1 gene and with an intergenic region between IRF2BP2 and LINC00184. Further functional analyses could provide deeper insight into possible consequences of these enhancer interactions for the development or progression of (lung) cancer.

Table of Contents

Title
Abstract
Zusammenfassung
Table of Contents
List of abbreviations
List of figures
List of tables
1 Introduction
  1.1 Epigenetics
    1.1.1 Overview
    1.1.2 DNA methylation
    1.1.3 Histone modifications
    1.1.4 Non-coding RNAs
  1.2 Enhancers
  1.3 Previous work
    1.3.1 Exposure to environmental carcinogens has been shown to alter methylation patterns
    1.3.2 sDMRs were identified in smoking discordant twins
    1.3.3 3C for functional analysis of the seDMR
  1.4 Chromosome conformation capture (3C)
    1.4.1 Principle
    1.4.2 Controls
    1.4.3 Other methods based on 3C
  1.5 Hypothesis and aims
2 Material and Methods
  2.1 Equipment/Consumables
  2.2 Chemicals/Enzymes
  2.3 Kits
  2.4 Buffers/Solutions
  2.5 Cell lines and media
  2.6 Cell culture
    2.6.1 Cell line handling
  2.7 Bioinformatical Analysis
    2.7.1 Pre-experimental bioinformatical analysis
      2.7.1.1 Search for enhancer predictions of seDMR in silico
      2.7.1.2 Comparison of enhancer predictions of seDMR between cell lines in silico
    2.7.2 Post-experimental bioinformatical analysis
  2.8 mRNA expression analysis
  2.9 Chromosome conformation capture
    2.9.1 Preparation of control library for 3C method
      2.9.1.1 BAC isolation from host strain and insert verification
      2.9.1.2 Digestion of BAC control library
      2.9.1.3 Ligation of BAC control library
      2.9.1.4 Preparation of BAC control library dilutions
    2.9.2 3C approach
      2.9.2.1 Single cell preparation and formaldehyde crosslinking
      2.9.2.2 Cell lysis
      2.9.2.3 Restriction enzyme digestion
      2.9.2.4 Re-Ligation/De-crosslinking
      2.9.2.5 DNA purification
      2.9.2.6 Determination of digestion efficiency
      2.9.2.7 Purity assessment
      2.9.2.8 Real-time PCR for interaction quantification
3 Results
  3.1 seDMR in silico
    3.1.1 seDMR is predicted as an enhancer region in silico
    3.1.2 HEK293T and HepG2 are promising cell lines for a comparison of the interaction pattern of the seDMR via 3C
  3.2 mRNA expression of seDMR adjacent genes is not affected upon HMA treatment
  3.3 Optimization of 3C experiment in A549
  3.4 Replication of previous 3C experiment on seDMR in A549
  3.5 3C experiment in two non-lung cell lines
    3.5.1 3C of HEK293T and HepG2
  3.6 Comparison of interaction frequency patterns
  3.7 Examination of the interaction partners of the seDMR detected by 3C
4 Discussion
  4.1 SeDMR’s enhancer prediction in silico
  4.2 Selection of non-lung cell lines for 3C
  4.3 Hypomethylation of the seDMR does not affect mRNA expression of adjacent genes
  4.4 Optimization of 3C protocol now allows application in a variety of cell lines
  4.5 Interaction patterns of seDMR assessed by 3C in different cell lines
    4.5.1 3C in A549 – a replication of 3C after protocol optimization
    4.5.2 Comparison of the interaction pattern between lung and non-lung cell lines
  4.6 Evaluation of the detected interaction partners of the seDMR
    4.6.1 Reliability of the generated data
    4.6.2 In silico analysis of the detected interaction sites of seDMR
    4.6.3 TARBP1 – the putative target gene of the seDMR
  4.7 Outlook
5 References
6 Appendix
  6.1 Bioinformatical Analysis
  6.2 mRNA expression analysis
  6.3 Supplementary data obtained by 3C application
  6.4 SOP for 3C
7 Acknowledgements

List of abbreviations

%           Percent
°C          Degree Celsius
µl          Microliter
µM          Micromolar
3C          Chromosome conformation capture
4C          Circular chromosome conformation capture
5C          Chromosome conformation capture carbon copy
5caC        5-Carboxyl-cytosine
5fC         5-Formyl-cytosine
5hmC        5-Hydroxymethyl-cytosine
5mC         5-Methyl-cytosine
AML         Acute myeloid leukemia
ARID4B      AT-rich interaction domain 4B
ATP         Adenosine triphosphate
AZA         5-Aza-cytidine
BAC         Bacterial artificial chromosome
bp          Base pairs
cDNA        Complementary deoxyribonucleic acid
ChIP        Chromatin immunoprecipitation
ChIP-seq    Chromatin immunoprecipitation followed by sequencing
cm²         Square centimeter
COA6        Cytochrome C oxidase assembly factor 6
CpG         Cytosine linked via phosphate backbone to guanine
Ct          Threshold cycle
CTCF        CCCTC-binding factor
DAC         5-Aza-2′-deoxycytidine (decitabine)
DMSO        Dimethylsulfoxide
DNA         Deoxyribonucleic acid
DNMT        DNA methyltransferase
DTT         Dithiothreitol
e.g.        "exempli gratia", Latin for "for example"
eDMR        Differentially methylated enhancer region
EDTA        Ethylenediaminetetraacetic acid
EGTA        Glycol ether diamine tetraacetic acid
eRNA        Enhancer RNA
et al.      "et alii", Latin for "and others"
etc.        "et cetera", Latin for "and other similar things"
FBS         Fetal bovine serum
g           Gravitational acceleration (unit of relative centrifugal force)
g           Gram
gDNA        Genomic DNA
GGPS1       Geranylgeranyl diphosphate synthase 1
h           Hour
HAT         Histone acetyltransferase
HDAC        Histone deacetylase
HDM         Histone demethylase
HMA         Hypomethylating agent
HMT         Histone methyltransferase
IRF2BP2     Interferon regulatory factor 2 binding protein 2
kb          Kilobases (1,000 bases)
KCl         Potassium chloride
l           Liter
LB-Medium   Lennox broth medium
lncRNA      Long non-coding RNA
LUAD        Lung adenocarcinoma
LUSC        Lung squamous cell carcinoma
M           Molar
Mb          Megabases (1,000,000 bases)
MBD         Methyl-CpG binding domain
MDS         Myelodysplastic syndrome
MeDIP-seq   Methylated DNA immunoprecipitation followed by sequencing
min         Minute
miRNA       Micro RNA
ml          Milliliter
mM          Millimolar
mRNA        Messenger RNA
ng          Nanogram
NGS         Next-generation sequencing
nm          Nanometer
no.         Number
NSCLC       Non-small cell lung cancer
nt          Nucleotide
o/n         Overnight
OD          Optical density
PBS         Phosphate buffered saline
PCR         Polymerase chain reaction
PIC         Pre-initiation complex
PK          Proteinase K
qRT-PCR     Quantitative real-time PCR
RBM34       RNA binding motif protein 34
RISC        RNA-induced silencing complex
RLC         RISC-loading complex
RNA         Ribonucleic acid
ROI         Region of interest
rpm         Revolutions per minute
rRNA        Ribosomal RNA
RT          Room temperature
RT-PCR      Real-time polymerase chain reaction
SAM         S-adenosyl-L-methionine
SCLC        Small cell lung cancer
sDMR        Smoking-induced differentially methylated region
SDS         Sodium dodecyl sulfate
sec         Seconds
seDMR       Smoking-induced differentially methylated enhancer region
seq         Sequencing
SINE        Short interspersed nuclear elements
siRNA       Small interfering RNA
TARBP1      TAR (HIV-1) RNA binding protein 1
TBCE        Tubulin folding cofactor E
TE-buffer   Tris-EDTA buffer
TET         Ten eleven translocation protein
TF          Transcription factor
Tm          Melting temperature
TOMM20      Translocase of outer mitochondrial membrane 20
Tris        2-Amino-2-(hydroxymethyl)-1,3-propanediol
tRNA        Transfer RNA
U           Units
V           Volt
vs.         "versus", Latin for "against"
YY1         Yin Yang 1

List of figures

Figure 1| Overview of the region on chromosome 1 containing the seDMR
Figure 2| Principle of 3C method
Figure 3| Overview of genomic region covered by control library
Figure 4| Schematic representation of test primer and internal primer location on the DNA
Figure 5| Overview of chromatin state annotations at the seDMR obtained from Broad ChromHMM track
Figure 6| Alignment of enhancer regions predicted by FANTOM5 with the region around the seDMR
Figure 7| Results of mRNA expression analysis in A549 and H1299
Figure 8| Amplification curves from 3C library dilutions obtained from LightCycler®96 software
Figure 9| Overview of slope values obtained from BAC control library standard curves
Figure 10| Overview of intercept values obtained from BAC control library standard curves
Figure 11| Results from 3C experiment on seDMR in A549
Figure 12| Results from 3C experiment of seDMR in HEK293T
Figure 13| Results from 3C experiment of seDMR in HepG2
Figure 14| Comparison of the seDMR’s interaction pattern between three different cell lines
Figure 15| Chromatin state annotations for fragment 7 from Broad ChromHMM track
Figure 16| Schematic illustration of TF-binding sites of YY1 and CTCF on fragment 7
Figure 17| Chromatin state annotations from Broad ChromHMM track for fragment 17 and its genomic position
Figure 18| Schematic illustration of TF-binding sites on fragment 17 for YY1 and CTCF
Figure 19| Schematic illustration of the distribution of TF-binding sites of YY1 and CTCF within the anchor fragment
Figure 20| Schematic illustration of TF-binding sites of YY1 and CTCF at ROI

List of tables

Table 1| Equipment used for the master thesis
Table 2| Consumables used for the master thesis
Table 3| Chemicals and enzymes used for the master thesis
Table 4| List of kits used in the master thesis
Table 5| List of cell lines used for the master thesis
Table 6| List of media used for the master thesis
Table 7| BAC clones used for control library preparation
Table 8| cDNA samples generated by S. Rauscher (2017) used for mRNA expression analysis
Table 9| List of primers used for mRNA expression analysis
Table 10| qRT-PCR program for mRNA expression analysis
Table 11| PCR program for BAC insert testing
Table 12| qRT-PCR program for 3C
Table 13| List of test primers (green) and internal primers (orange) used for assessment of digestion efficiency
Table 14| List of primer pairs used for assessment of interaction frequency of seDMR to putative target site(s)
Table 15| Comparison of the abundance of selected histone modifications between NHLF and HepG2 at the seDMR
Table 16| Enhancer expression data derived from CAGE data of the FANTOM5 database
Table 17| qRT-PCR data from purity assessment of 3C libraries

Introduction

1.1 Epigenetics

1.1.1 Overview

In recent decades, great effort has been made to explain the observation that phenotypic changes occur during cellular development without detectable alterations in the cell’s genotype [1-3]. Hence, a new research field arose, and Conrad Waddington, an embryologist and geneticist, was the first to introduce the term epigenetics in the early 1940s [4, 5]. The prefix “epi” stands for “over” and in this case refers to a regulatory layer sitting over genetics. Thus, Waddington hypothesized that besides the genotype there must be an additional network of processes that generates the high plasticity of the phenotype, e.g. for cell type differentiation [4, 6].

Decades of research have shed light on some of the processes causing these previously inexplicable phenotypic alterations [1, 2, 7]. It is now known that DNA methylation, histone modifications and non-coding RNAs are the main tools of the epigenetic machinery, by which heritable but reversible changes in gene regulation are induced without altering the cell’s DNA sequence [8, 9]. This machinery thus provides genomic plasticity and allows cells to adapt individually to developmental or environmental changes, despite all of them sharing the same DNA sequence. Therefore, it can be postulated that the epigenome is much more variable than the genome [9, 10]. Certain environmental cues as well as genomic mutations can cause aberrant epigenetic regulation and consequently facilitate disease initiation and progression [7, 11, 12]. Then again, these harmful alterations in the epigenome might serve as useful biomarkers for early clinical diagnostics, and because epigenetic regulation is reversible, great effort is being made to exploit this feature for therapy [1, 13].

1.1.2 DNA methylation

DNA methylation is one of the best studied epigenetic modifications. It is a chemical modification found almost exclusively on cytosines adjacent to a guanine, i.e. in a CpG context. The modification takes place at the 5th carbon of the cytosine’s pyrimidine ring, to which a methyl group is covalently linked; the modified base is therefore named 5-methyl-cytosine (5mC) [10, 14]. In eukaryotes there are two main classes of DNA methyltransferases (DNMTs), both of which use S-adenosyl-methionine (SAM) as methyl donor to introduce this modification. DNMT1 is a maintenance DNMT which, as the name indicates, maintains already existing methylation patterns in the course of cell division by methylating the hemi-methylated DNA strands resulting from replication. Maintenance DNMTs are ubiquitously expressed and provide heritability of this epigenetic modification during mitosis and meiosis [15, 16]. DNMT3A and DNMT3B are generally called de novo methyltransferases since they establish new methylation marks, and they therefore play a crucial role in early development and in adaptation to changed environmental cues. However, Jones et al. point out evidence that DNMT3A and DNMT3B are also involved in maintaining methylation patterns [16, 17]. The ten-eleven translocation (TET) protein family constitutes a counterpart to the DNMTs: it actively demethylates 5mC by stepwise oxidation to 5-hydroxymethyl-cytosine (5hmC), then to 5-formyl-cytosine (5fC) and finally to 5-carboxyl-cytosine (5caC) [18, 19].

As mentioned above, DNA methylation generally takes place in a CpG context. CpGs are distributed ubiquitously, but not randomly, throughout the genome [20-22]. Over the course of evolution, a large number of methylated cytosines have been converted to thymines by deamination; CpGs are therefore generally underrepresented in the genome. Nevertheless, a majority of ~ 80 % of the 28 million CpGs in total are methylated and can be found at repetitive elements like short interspersed nuclear elements (SINEs) and satellite DNA repeats, which are less affected by this evolutionary depletion [14, 23-27]. However, particularly dense clusters of CpG sites spanning several hundred bases, termed CpG islands, are mainly unmethylated and also seem not to be affected by this evolutionary effect. Interestingly, approximately 15 % of all CpG islands are co-localized with around 70 % of known gene promoters and embody a major hub for DNA methylation, thus representing an attractive site for gene regulation [24, 28, 29].
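The underrepresentation of CpGs described above is commonly quantified as the ratio of observed to expected CpG dinucleotides. The following minimal Python sketch computes this ratio and checks a sequence against frequently used CpG island criteria (length of at least 200 bp, GC content of at least 50 %, observed/expected ratio of at least 0.6). These thresholds and all function names are illustrative assumptions and are not taken from the analyses of this thesis.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_observed_expected(seq: str) -> float:
    """Return the observed/expected CpG ratio of a DNA sequence.

    expected = (number of C * number of G) / sequence length
    """
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    observed = seq.count("CG")  # "CG" cannot overlap with itself
    expected = c_count * g_count / len(seq)
    return observed / expected


def looks_like_cpg_island(seq: str,
                          min_length: int = 200,
                          min_gc: float = 0.5,
                          min_ratio: float = 0.6) -> bool:
    """Apply assumed CpG island thresholds (illustrative defaults)."""
    return (len(seq) >= min_length
            and gc_content(seq) >= min_gc
            and cpg_observed_expected(seq) >= min_ratio)
```

A ratio well below 1 indicates CpG depletion, as expected for the bulk of the genome, whereas CpG islands retain a ratio close to the value expected from their base composition.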

Promoter DNA methylation has a silencing effect on gene transcription, which in general can be explained by two main mechanisms. First, the covalently bound methyl groups have a shielding effect on the DNA and physically hinder binding of TFs and the basal transcriptional machinery. Second, DNA methylation is recognized by certain methyl-CpG binding domain (MBD) proteins (like MBD1 to 4), which attract chromatin-modifying enzymes that lower the accessibility of DNA and thereby inhibit transcription [1, 23, 30].

Given the facts mentioned above, it is evident that DNA methylation has a wide variety of functions. In general, it can be postulated that this epigenetic modification regulates an individual’s development by directing patterns and therefore ensures genomic stability. In particular, methylation of endogenous retroviruses is an important means of providing this stability [31, 32].

One example that emphasizes the importance of DNA methylation in vivo is the inactivation of the X chromosome, which is observed in females during early development. Besides other epigenetic modifications, DNA methylation is involved in equalizing the expression of X-linked genes by methylation of promoter CpGs on one of the two X chromosomes. This epigenetic modification in particular plays a crucial role in maintaining the inactivated state over a female’s lifetime [33].


Furthermore, genomic imprinting is a phenomenon that is also regulated by DNA methylation. At imprinted loci, either the maternal or the paternal allele of a gene is transcribed, in contrast to “usual” loci where both alleles are expressed equally. This allele-specific expression pattern can be explained by differential methylation of the respective alleles. Loss of imprinting at certain genes was found to be tightly connected to tumorigenesis, which once more emphasizes the importance of a strictly regulated and functional epigenetic machinery [21, 31]. Besides providing an important regulatory tool during very early development, DNA methylation also plays a crucial role throughout human life by protecting the genome’s integrity [34]. It is therefore important to mention that lifestyle, meaning diet, physical activity as well as drinking and smoking behaviour, modulates the epigenetic machinery and, in some cases, might have adverse consequences for it. In particular, cigarette smoke was shown to alter DNA methylation patterns in ways that might play a role in inducing malignant transformation of cells [34-36].

Given these facts, it is evident that DNA methylation, besides somatic mutations, is an important contributor to cancer development and progression, since aberrant methylation, especially at promoter CpGs, can induce the activation of oncogenes or, conversely, the inactivation of tumour suppressor genes [23, 37].

Hence, great effort has been made to develop therapies that reverse DNA modifications caused by a dysregulated epigenetic machinery. Hypomethylating agents (HMAs) like azacytidine (AZA) and decitabine (DAC) are epigenetic drugs already approved for treating myelodysplastic syndrome (MDS) or acute myeloid leukaemia (AML) [38]. These drugs inhibit DNMT1 and thereby prevent maintenance of DNA methylation, which results in reduced methylation levels. The mechanism behind this hypomethylating function is that AZA and DAC are structural analogues of the pyrimidine nucleoside cytidine. Consequently, upon administration of the drug and after phosphorylation by cellular kinases, these analogues are incorporated instead of cytosine during DNA replication [39]. While DAC is incorporated directly into the newly synthesized DNA strand, AZA is first reduced to its deoxyribonucleoside form, which corresponds to DAC, before incorporation. Since DAC carries a nitrogen at the 5th position of the pyrimidine ring, exactly the place where DNMTs would normally covalently link the methyl group, methylation cannot occur and the DNMT is additionally stalled at this site [38, 40, 41]. Thus, HMAs are applied to passively reduce an aberrantly high methylation status in MDS and AML patients with the intention of restoring a normal methylation level. However, it must be pointed out that this kind of treatment is not targeted, so global hypomethylation must be accepted [38]. In view of this, it is not surprising that great effort is now being made to develop new strategies and methods for targeted demethylation in an attempt to overcome this putative adverse effect [42].
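The passive nature of this demethylation can be illustrated with a simple dilution model. The following Python sketch is a toy calculation under stated assumptions, not a description of the pharmacology: it assumes that parental strands keep their methyl marks, that a newly synthesized strand is methylated only with a given maintenance efficiency, and that neither de novo methylation nor active demethylation occurs. The function name and the parameter `maintenance_efficiency` are hypothetical.

```python
def methylated_strand_fraction(rounds: int,
                               maintenance_efficiency: float = 0.0) -> float:
    """Fraction of DNA strands still methylated at a CpG after replication.

    Toy model: per replication round, old strands keep their mark and each
    new strand opposite a methylated template is methylated with probability
    `maintenance_efficiency` (1.0 = fully active DNMT1, 0.0 = fully inhibited).
    """
    if rounds < 0:
        raise ValueError("rounds must be non-negative")
    if not 0.0 <= maintenance_efficiency <= 1.0:
        raise ValueError("maintenance_efficiency must be between 0 and 1")
    # Strand-level recursion: f_next = (f + efficiency * f) / 2
    retention_per_round = (1.0 + maintenance_efficiency) / 2.0
    return retention_per_round ** rounds
```

Under this model, complete inhibition of maintenance methylation halves the methylated fraction with every cell division, so that 12.5 % of the strands remain methylated after three rounds, whereas fully efficient maintenance keeps the fraction constant.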


1.1.3 Histone modifications

The ability of the epigenetic machinery to induce post-translational histone modifications represents another layer of regulation, which gives the genotype more flexibility and adaptability. This type of epigenetic modification frequently takes place at distinct amino acid residues in the N-terminal tails of histone proteins, which are flexible and accessible because they protrude from the histone core. The four core histones are H3, H4, H2A and H2B, which form H2A-H2B and H3-H4 heterodimers and together build a histone octamer. This octamer, together with the DNA wrapped around it, is called a nucleosome and constitutes the basic unit of chromatin [43]. In chromatin, DNA is wrapped in 1.65 turns (= 147 bp) around each histone octamer, and the resulting nucleosomes are further coiled into chromatin fibres. By using this packaging technique, DNA is present in a condensed state [9, 43, 44]. In general, one differentiates between two main states of DNA condensation. Euchromatin describes a moderate condensation of chromatin that gives DNA-binding factors such as transcription factors (TFs) and the transcription machinery access to the DNA sequence, thus permitting active gene transcription. In contrast, a tightly condensed state of chromatin, termed heterochromatin, represents a silenced state in which DNA is not accessible to the transcription machinery or other factors [43-45]. Consequently, depending on cell type, differentiation status, environmental cues etc., a permanent flux between these two chromatin states can be observed, which allows dynamic gene expression. The shift from a condensed chromatin state to the looser one is mediated by the epigenetic machinery, more precisely by post-translational histone modifications and chromatin remodelling complexes [9, 44, 45].

Histone acetylation, methylation, phosphorylation, ubiquitination and sumoylation are well-known post-translational modifications occurring mainly on the N-terminal tails of histones. Depending on the type, locus and combination of these modifications, different functional changes are achieved; the most studied ones are described below [9].

Histone acetylation is mediated by histone acetyltransferases (HATs), predominantly at lysine residues, and is a powerful tool for decondensing chromatin, since the covalently linked acetyl group neutralizes the positive charge of the amino acid. Thus, this type of modification generally induces a shift towards a more open chromatin state and correlates with transcriptional activation at the affected region. As epigenetic modifications are known to be reversible, histone deacetylases (HDACs) act as antagonists of HATs and therefore have a repressive, chromatin-compacting function. Among many others, acetylation of lysine 27 of histone 3 (H3K27ac) is a prominent modification leading to an open chromatin state and favouring transcription [9, 45]. Additionally, there is strong evidence, mainly gained from genome-wide chromatin immunoprecipitation followed by sequencing (ChIP-seq) analyses, that this histone modification co-localizes with enhancer regions, and it is therefore used as a marker for this regulatory element [46, 47].

In contrast to histone acetylation, methylation of histones can have either of two effects, activation or repression, depending on the residue on which the modification takes place and on the degree of methylation (mono-, di- or trimethylation). Histone methylation occurs most frequently on lysine and arginine residues in H3 and H4 and is mediated by histone methyltransferases (HMTs), whereas demethylation is performed by histone demethylases (HDMs) [48]. For example, trimethylation of H3K27 is a prominent repressive mark, while monomethylation of H3K4 correlates with active transcription [9, 45].

In general, the ratio between the activity of enzymes that add a certain histone modification, termed writers, and those that remove it, called erasers, determines whether gene expression is repressed or activated [9, 45, 49]. Thus, histone modifications are an important tool of the epigenetic machinery, providing cells with the plasticity they require for flexible transcriptional regulation. Because certain modifications are known to co-localize with specific regulatory elements, it has become popular to use these marks for their identification [9].

1.1.4 Non-coding RNAs

Almost two decades ago, non-protein-coding DNA was still branded as junk DNA, but nowadays it is known that this fraction of DNA, making up about 99 % of the human genome, is anything but junk. Over several years of research, accompanied by vast progress in analytical techniques and instruments, it became clear that large parts of the non-protein-coding DNA sequence are transcribed into various kinds of RNAs carrying a broad range of functions [50-52].

Besides ribosomal RNA (rRNA) and transfer RNA (tRNA), both infrastructural RNAs involved in messenger RNA (mRNA) translation, microRNA (miRNA), small interfering RNA (siRNA), long noncoding RNA (lncRNA) and enhancer RNA (eRNA) are, among others, known as regulatory RNAs, which are described in more detail in the following.

MiRNAs as well as siRNAs are representatives of small RNAs with a length of 18 to 25 nt and contribute to post-transcriptional regulation of gene expression [53]. While siRNAs can be found throughout the eukaryotes, miRNAs are less prevalent. SiRNAs are processed from a long dsRNA precursor molecule, which is shortened by Dicer, a member of the RNase III family. The shortened RNA becomes single-stranded upon incorporation into an RNA-induced silencing complex (RISC), where one strand remains attached to the protein complex and functions as a guide RNA targeting complementary mRNAs. When a siRNA interacts with its target mRNA, the mRNA is cleaved or completely degraded, which represses its subsequent translation. Additionally, siRNAs were found to guide epigenetic regulators such as HMTs or DNMTs, thus causing transcriptional repression [54].

Similar to siRNAs, miRNAs undergo several steps of processing, in this case starting with a coding or non-coding RNA transcribed from so-called miRNA genes. This transcript, termed pri-miRNA, contains a hairpin loop instead of a long dsRNA. With the help of Drosha, an RNase, and Pasha, an RNA-binding protein, the hairpin is excised, resulting in a pre-miRNA, which is transported into the cytoplasm where Dicer further processes the RNA. This molecule is then incorporated by the RISC-loading complex (RLC), causing the release of a passenger strand, while the counterpart of this strand, similarly to siRNAs, remains in the complex and forms, together with the RLC, the mature RISC with its ability to capture target mRNAs. High complementarity of the miRNA to the mRNA causes cleavage and thus destruction of the mRNA, which prevents its translation into a functional protein. In contrast, low complementarity leads to post-transcriptional repression of the respective mRNA. Among others, epigenetic modifiers such as HDACs were also found to be regulated by miRNAs. Conversely, some miRNAs are themselves known to be regulated epigenetically, but the exact pathways enabling this regulation by the epigenetic machinery are largely unknown [53, 54].

LncRNAs are defined as non-protein-coding transcripts with a minimal length of 200 nt that are usually transcribed by RNA polymerase II. Since this highly diverse group of RNAs comprises the majority of genomic transcription products, great interest arose in elucidating its function. It is now known that only a very small, but constantly growing, number of these transcribed RNA molecules represent crucial players in gene regulation [51, 55-58]. LncRNAs are known to be deregulated in cancers. Certain species of these RNA molecules localize to the nucleus and can induce epigenetic changes favouring cancer development by guiding epigenetic regulators to their target sites, thereby altering gene transcription, chromatin folding and nuclear organization. DNA methylation can also be regulated by these long RNA molecules via recruitment of DNMTs or TETs to specific sites. This kind of epigenetic regulation by lncRNAs is known to be aberrant in diverse cancer entities [59]. In general, lncRNAs can act in cis, meaning on the same chromosome in order to regulate local gene transcription, as well as in trans, meaning on any other chromosome. Cis-acting lncRNAs show quite low abundance with only a few copies per cell, so this type might often be missed. It has been shown that lncRNAs often cluster with mammalian imprinted genes, suggesting that these molecules contribute to imprinting pathways. For instance, the mere transcription of a lncRNA from the antisense strand of DNA, overlapping the promoter region of a gene, might interfere with transcription of that gene and reduce the availability of the corresponding gene product. In this case, gene regulation by the lncRNA is sequence-independent [51, 58-61].

eRNAs represent a subgroup of lncRNAs contributing to transcriptional activation of genes. The finding that many enhancers can recruit RNA polymerase II and subsequently transcribe part of their own sequence into RNA gave hints towards a new group of RNAs, now termed eRNAs. In contrast to other lncRNAs, which usually show H3K4me3 marks on their promoter, eRNAs can be transcribed without this epigenetic mark. Additionally, while lncRNAs are in general transcribed unidirectionally, eRNAs frequently show a bidirectional transcription pattern and are usually not processed or polyadenylated. Thus, these RNA molecules have only a very limited lifetime, since they are rapidly degraded by the exosome. However, transcription of an eRNA was found to correlate with the activity of its enhancer, and a knockdown of certain eRNAs was shown to affect expression of adjacent genes. Nevertheless, the whole process of transcriptional activation by this type of RNA is still not completely understood, but eRNAs might support enhancer looping, described in 1.2. It is also possible that eRNAs are merely by-products of the recruitment of RNA polymerase II by enhancers [62-65].

1.2 Enhancers

Enhancers are defined as distal regulatory DNA sequences, spanning a few hundred base pairs, which have the ability to increase the expression of their target gene(s). Thus, enhancers provide another instrument for the regulation of gene expression and contribute to the enormous complexity of the genome. There is evidence for the existence of several hundred thousand enhancers spread throughout the whole mammalian genome [66, 67]. These regulatory sequences can be found in intergenic regions as well as within introns of genes. Interestingly, enhancers do not necessarily regulate their adjacent gene but can modulate the expression of genes located up to 1 Mb away, irrespective of the enhancer’s orientation. Nevertheless, it was found that only small subsets of this vast number of enhancers are actually active in a certain cell type, or that they are restricted to a distinct stage of development. Thus, enhancers govern gene expression in a cell-type-specific and signal-dependent manner and therefore contribute to a highly flexible and versatile transcriptional program. Enhancers are also known to be able to regulate the expression of multiple genes simultaneously. Similar to promoters, these regulatory sequences contain binding hubs for diverse TFs contributing to enhancer activation as well as to enhanced target gene activation [66, 68-70].

Enhancer activation means that an enhancer has to undergo several steps before turning from an inactive enhancer into a primed, poised and finally active enhancer. First, the local chromatin must be adjusted in a way that makes the enhancer sequence accessible for diverse TFs. Subsequent binding of sequence-specific and signal-dependent TFs induces nucleosome remodelling, causing open and accessible chromatin and resulting in a change from an inactive towards a primed enhancer. Additionally, hypomethylation of CpGs covering the enhancer sequence can be observed during the activation process. Activation of primed enhancers occurs by additional binding of co-activator proteins and/or by signal-dependent activation. In the absence of these activation signals, a primed enhancer turns into a poised enhancer, which is marked with repressive histone marks such as H3K27me3 [68]. In the course of development, accompanied by cellular differentiation, a highly dynamic transition between these different enhancer states can be observed, indicating that the activity of distinct enhancer subsets is restricted temporally and spatially [66, 70].

In general, histone modifications contribute to enhancer establishment and activation. Since certain types of this epigenetic modification are known to correlate with a distinct enhancer state, these marks can be utilized for enhancer identification [69, 71]. Modern high-throughput techniques such as ChIP-seq revealed that H3K4me1 and H3K27ac are the most frequent histone modifications labelling active enhancers, while enhancers carrying the repressive mark H3K27me3 are characterized as poised enhancers. However, there are also marks such as H3K4me2 or H3K4me3 that were found on enhancers as well as on promoter-annotated regions; H3K4me3 is, among other modifications, actually utilized for promoter identification [65, 70-72].

In the course of the identification of enhancer elements throughout the genome by novel high-throughput techniques, it was found that in certain regions enhancers occur in clusters, showing dense occupancy by master TFs and the mediator coactivator complex. These enhancer regions were subsequently subclassified as super-enhancers and differ from normal enhancers, among other aspects, in size, in the composition of TF recognition sites and in the level of H3K27 acetylation. Further research revealed that these large enhancer domains predominantly target genes associated with embryonic stem cell biology or key cell-type-specific genes. These genes, in general, show higher expression rates than those associated with conventional enhancers. Thus, super-enhancers provide quite a powerful regulatory tool for maintaining expression programs shaping distinct cell identities [73, 74]. In a large number of cancer types, including lung cancer, super-enhancers have been identified as oncogenic. The transformation from a normal super-enhancer towards an oncogenic one is usually preceded by genetic mutations within the super-enhancer. Focal amplification of a super-enhancer was observed 3’ to the MYC oncogene and was found to contribute to overexpression of the corresponding gene in lung adenocarcinoma cell lines [75, 76]. Additionally, Yuan et al. introduced a super-enhancer associated with the oncogene RAI14 as a new potential biomarker in lung adenocarcinoma [77].

As mentioned above, enhancers have the ability to increase the expression of their target gene(s), which are often localized several kilobases away. For this function, the spatial organization of the genome as well as the condensation state of the chromatin play a crucial role. An enhancer-promoter loop is formed by these two regulatory elements, supported by several factors, resulting in a spatial organization of the DNA that brings the enhancer sequence into close physical proximity (less than 200 nm) to the promoter region of the respective gene [66, 78]. However, active loop formation of the chromatin would require a vast amount of energy and thus might not be practicable. Hence, it might be more likely that during constant movement of the chromatin a loop, arising from a random encounter of enhancer and promoter, is transiently stabilized by bound proteins, for instance YY1, CTCF or cohesin, that might form dimers or simply attract each other [66, 79, 80]. Besides the loop-formation process, the concrete pathway that leads to enhanced transcription is also not fully understood, but it is thought that formation of the loop might cause an enrichment of TFs and that the presence of the enhancer might favour formation of the pre-initiation complex (PIC), which is needed for gene transcription. Additionally, there are hints that enhancers might recruit further TFs, co-activators and RNA polymerase II, thus promoting transcription [66]. The notion that enhancers recruit or attract RNA polymerase II is supported by the finding that some of these distal regulatory elements are actively transcribed by this enzyme while enhancing target gene expression. As described in 1.1.4, the resulting eRNAs might contribute to the regulation of target gene expression [62, 64].

Given that enhancers dynamically shape temporal as well as spatial gene expression, it is obvious that dysregulation of these regulatory elements can have crucial consequences. Besides somatic mutations, which might cause aberrant enhancer function or loss of function, a deficient epigenetic regulation machinery might also induce dysregulation of enhancers [12, 74]. Among others, Bell et al. investigated epigenomic changes that occur in the course of cancer development [13]. They revealed that enhancer regions are the most frequent differentially methylated regions in tumour tissue compared to normal tissue. Additionally, a trend towards hypomethylation of these enhancers was observed in tumours (e.g. in lung squamous cell carcinoma, LUSC, or breast carcinoma, BRCA), in contrast to promoter regions, which are usually hypermethylated in cancer tissue. Thus, Bell et al. hypothesize that aberrant methylation patterns, also affecting enhancers, might promote cancer progression by providing plasticity to cancer cells [13]. Besides differentially methylated enhancer regions (eDMRs), dysregulation of the epigenetic machinery, affecting writers and erasers of histone marks, is frequently observed during tumorigenesis and might represent another pathway leading to aberrant enhancer activation. In general, tumour tissues show a trend towards activation of enhancers, especially of those that target growth-associated genes or those that shape cellular identity [74].

1.3 Previous work

Worldwide, lung cancer tops the list of cancer types causing death and exhibited, alongside breast cancer, the highest number of newly diagnosed cases in 2018 [81]. This cancer type is histologically divided into two main subgroups, namely non-small-cell lung cancer (NSCLC), including lung squamous cell carcinoma (LUSC) and lung adenocarcinoma (LUAD), and small-cell lung cancer (SCLC). The high mortality of lung cancer is attributed to frequently late diagnosis at already advanced cancer stages, resulting in poor five-year survival rates of only about 15 % after common therapies such as chemotherapy or immunotherapy [82].

In the last decades, great efforts have been made to develop new diagnostic methods that might allow earlier lung cancer detection in order to reduce the lethality of this disease. Extensive studies revealed that in the course of cancer progression not only genetic mutations accumulate but also continuous aberrations of the epigenome are acquired [82]. The Illumina Infinium HumanMethylation450K Bead Chip is a cost-efficient technology for high-throughput methylome analyses and enables the genome-wide detection of epigenetic alterations potentially acquired through environmental cues such as lifestyle, including smoking behaviour.

1.3.1 Exposure to environmental carcinogens has been shown to alter methylation patterns

Among others, environmental carcinogens were found to play a crucial role in inducing altered epigenetic patterns. One of these prevalent acquired epigenetic alterations observed in diverse cancer entities affects DNA methylation, causing a variety of differentially methylated regions (DMRs) spread throughout the genome. Tobacco smoke represents a prominent environmental carcinogen. Given that tobacco smoke was found to induce alterations in the methylome, preferentially towards hypomethylation, it was of further interest whether a direct correlation with increased lung cancer risk can be observed [83, 84]. Fasanelli et al. performed a genome-wide methylation analysis by applying the Illumina Infinium HumanMethylation450K Bead Chip to peripheral blood DNA isolated from lung cancer cases and controls [83]. The analysis demonstrated that smoking-induced hypomethylation at certain CpG sites is associated with an increased risk for lung cancer [83]. Similar effects were also observed in adipose tissue, where smoking-induced DNA methylation was associated with metabolic disease risk traits [85]. These findings indicate that exposure to environmental carcinogens such as tobacco smoke crucially affects DNA methylation patterns in a variety of tissues.

1.3.2 sDMRs were identified in smoking discordant twins

In order to find novel DNA biomarkers for lung cancer amongst smokers, it was of further interest whether cigarette smokers and lung cancer patients share DMRs that cannot be found in non-smokers and healthy people; such regions were termed sDMRs (smoking-induced differentially methylated regions). Therefore, M. Nair (2017) and S. Kollmann (2017) analysed MeDIP-seq data from 43 monozygotic twins with discordant smoking behaviour (data were previously generated by TwinsUK, King’s College London, Bell Labs) [86, 87]. In contrast to the Infinium HumanMethylation450K Bead Chip, MeDIP-seq is a technique that is not restricted to a selection of CpGs but enables a genome-wide methylome analysis that in principle covers every CpG [88]. The top 100 sDMRs (ranked by p-value) obtained from MeDIP-seq data analysis were further analysed by M. Nair (2017) using the online tool GREAT in order to predict genes potentially affected by these sDMRs, and ingenuity pathway analysis was performed to identify their putative relation to lung cancer [87]. The same was done by S. Kollmann (2017), but in this case an enlarged list of sDMRs with a p-value below 0.001 was considered. In addition, he aimed to identify sDMRs covering enhancer-annotated regions. Thus, he overlapped the enlarged sDMR list with a lung enhancer list provided by M. Laplana (DKFZ), resulting in 28 candidate sDMRs with potential enhancer function. In order to further restrict this candidate list, GREAT analysis was performed, followed by a gene expression analysis of the sDMR-associated genes using the TCGA database. The sDMR whose associated genes showed the most significant change in expression between lung cancer tissue and normal tissue was chosen [86]. This sDMR spans 500 bp in an intergenic region on chromosome 1q42.3 (Chr1:234857751-234858250, Human GRCh37/hg19) next to a lncRNA (LINC01132) on its telomeric side. The nearest genes are IRF2BP2, which is located ~110 kb away on the seDMR’s centromeric side, and TOMM20, a protein-coding gene that can be found ~430 kb away on the telomeric side (see Figure 1).
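The overlap of the sDMR list with an enhancer annotation, as described above, can be illustrated with a minimal Python sketch. It assumes 1-based, inclusive coordinates; the function names and the example enhancer interval are hypothetical and are not taken from the original analysis pipeline.

```python
from typing import List, Tuple

# (chromosome, start, end) in 1-based, inclusive coordinates (assumption)
Region = Tuple[str, int, int]


def region_length(region: Region) -> int:
    """Length in bp of a 1-based, inclusive interval."""
    _, start, end = region
    return end - start + 1


def overlaps(a: Region, b: Region) -> bool:
    """True if both regions share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def filter_enhancer_overlaps(sdmrs: List[Region], enhancers: List[Region]) -> List[Region]:
    """Keep only sDMRs that overlap at least one annotated enhancer."""
    return [s for s in sdmrs if any(overlaps(s, e) for e in enhancers)]


# Candidate sDMR coordinates as given in the text (GRCh37/hg19)
SEDMR: Region = ("chr1", 234857751, 234858250)
# Hypothetical enhancer annotation, for illustration only
EXAMPLE_ENHANCERS: List[Region] = [("chr1", 234857000, 234859000)]
```

With these coordinates, `region_length(SEDMR)` gives the 500 bp stated above, and only sDMRs located on the same chromosome and sharing bases with an annotated enhancer pass the filter.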

In order to test the candidate sDMR’s potential to serve as a lung cancer biomarker amongst smokers, methylation analysis using a pyrosequencing method was performed at the corresponding region using paired blood samples from a lung cancer case-control study from Heidelberg [86, 89]. Interestingly, one of the three analysed CpGs (out of 9 CpGs located within the sDMR) showed a significant change towards hypomethylation in lung cancer cases vs. controls [86]. Consequently, amongst smokers this candidate sDMR might have the potential to serve as a biomarker for lung cancer. Nevertheless, functional analysis of the corresponding region also had to be performed in vitro in order to obtain further knowledge of its mechanism and its putative regulatory role in lung cancer. These in vitro analyses included mRNA expression analysis of genes adjacent to the seDMR as well as luciferase reporter assays. In the latter, a strong, methylation-dependent enhancer activity of the seDMR was confirmed for the lung cancer cell line A549. Remarkably, this activity was much lower in a non-lung cell line such as HEK293T, suggesting that the seDMR might act as a lung-specific enhancer [86, 90]. Using several databases and online tools (e.g. UCSC genome browser, 4C browser, etc.), the enhancer identity of the candidate sDMR could be further verified in silico by S. Häsler Gunnarsdóttir (2018), and the region was therefore termed seDMR (smoking-induced differentially methylated enhancer region) in the following [90].

1.3.3 3C for functional analysis of the seDMR

There is growing evidence that the seDMR previously identified by S. Kollmann (2017) might play a role in the development and/or progression of lung cancer amongst smokers and thus might serve as a potential biomarker. However, it was indispensable to gain further knowledge of the function of this putative enhancer region. Consequently, great effort was made by S. Häsler Gunnarsdóttir (2018) to establish the 3C (chromosome conformation capture) method in this laboratory [90], since this method would enable the identification of the interaction partners, i.e. the target genes, of the seDMR. In addition, applying this method to different lung cancer and non-lung cell lines might further support a putative lung specificity of the seDMR. A first test of the established 3C protocol revealed that further optimization was still required in order to investigate the function of the seDMR.

Figure 1| Overview of the region on chromosome 1 containing the seDMR. The seDMR is labelled in red, while adjacent genes are indicated by grey boxes. A lncRNA is shown in beige. The grey and beige arrows depict the direction of expression. The approximate distances from the seDMR (red vertical line) to the promoter regions of the surrounding genes are indicated.

1.4 Chromosome conformation capture (3C)

In the last decades, huge technical progress has facilitated the detection of regulatory elements throughout the whole human genome. However, identifying the mechanisms by which these gene regulators function, as well as identifying their target(s), is still challenging [91]. Chromosome conformation capture (3C), developed by Dekker et al. in yeast [92], is one method to assess interactions between a regulator and its particular targets (see Figure 2). As the name indicates, this approach allows the capture of chromosomal conformation, which is subject to constant flux in diverse cell lines and shapes cell-type-specific transcription programs. Thus, 3C is a useful method for the detection of enhancer-promoter loops that lead to enhancer-directed enhancement of target gene expression [91-93].

1.4.1 Principle

In order to capture chromosome conformation, the chromatin of intact cell nuclei is first cross-linked by formaldehyde treatment, which fixes DNA-protein and/or protein-protein interactions. Thus, genomic regions in close physical proximity to each other become stably linked. Duration and dose of the formaldehyde treatment determine the extent of interactions that can be assessed by the 3C method, and overly stringent fixation might lead to a loss of PCR signal during interaction assessment, described later [92, 94, 95].

Next, a restriction enzyme is used to digest the fixed DNA into fragments of approximately equal length [92]. It is important to consider that the choice of restriction enzyme determines the resolution gained for interaction assessment, which is specified by the frequency with which the recognition motif of the respective enzyme occurs in the genomic sequence. Thus, for low resolution requirements 6-base cutters (such as EcoRI and HindIII) are sufficient, while for higher demands regarding the resolution of interaction loci, 4-base cutters (such as DpnII and NlaIII) are recommended. Additionally, restriction enzymes generating cohesive ends on the digested fragments are preferred for this method, since these show higher efficiency in re-ligation than blunt ends [95, 96].
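The relation between the length of the recognition motif and the achievable resolution can be made concrete with a short Python sketch. It assumes a random sequence with equal base frequencies, in which a motif of length n is expected once every 4^n bp; real genomes deviate from this. The function names and the toy sequence are hypothetical and serve illustration only.

```python
# Recognition sequences and top-strand cut offsets of two enzymes named in the text
ENZYMES = {
    "HindIII": ("AAGCTT", 1),  # A^AGCTT, 6-base cutter
    "DpnII": ("GATC", 0),      # ^GATC, 4-base cutter
}


def expected_fragment_length(site: str) -> int:
    """Expected spacing (bp) of a motif in random DNA with equal base frequencies."""
    return 4 ** len(site)


def digest(sequence: str, enzyme: str) -> list:
    """Return the fragments of a linear sequence after complete digestion."""
    site, offset = ENZYMES[enzyme]
    sequence = sequence.upper()
    cuts = []
    pos = sequence.find(site)
    while pos != -1:
        cuts.append(pos + offset)           # position of the cut on the top strand
        pos = sequence.find(site, pos + 1)  # continue searching behind this hit
    bounds = [0] + cuts + [len(sequence)]
    # Skip empty fragments, e.g. when a cut falls at the very start of the sequence
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]
```

Under this simplifying assumption, a 6-base cutter yields fragments of about 4096 bp on average, whereas a 4-base cutter yields fragments of about 256 bp, which is why 4-base cutters give a finer map of interaction loci.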

After digestion, the resulting DNA fragments are ligated, followed by reversal of the fixation. In order to favour intramolecular ligation, i.e. ligation between cross-linked fragments, low DNA concentrations are used for the ligation reaction. This results in DNA fragments consisting of two contiguous genomic regions which might previously have been located several thousand base pairs apart [92, 94].

The resulting 3C library can then, after purification, be used to assess the interaction frequency between two distant loci that were in close physical proximity due to chromatin looping during formaldehyde fixation. For this purpose, several primers spread throughout the whole region of interest (ROI) have to be designed, each hybridizing uniquely within about 50 to 100 bp of a restriction site, so that primer pairs amplify across ligation junctions. Since all primer pairs will be applied within one PCR run, the designed primers should exhibit almost equal melting temperatures (Tm) [94]. For measurement of the interaction frequency it is of interest how frequently two distinct genomic regions were cross-linked and ligated with each other. Thus, when analysing the interaction frequency of a certain region, such as the seDMR, with its putative interaction sites, primers are paired in such a way that one primer always represents the so-called anchor, in this case hybridizing on the seDMR next to a restriction site, while the second primer binds next to one of the restriction sites distributed across the ROI. Unusually, both primers of one primer pair have the same orientation, so both are either forward or reverse primers, which ensures that only head-to-head ligation products are amplified and that only one out of several possible ligation products is detected [95]. Using quantitative real-time PCR (qRT-PCR), the interaction frequency can be assessed in the 3C library, since only those regions which were in close physical proximity during fixation are now adjacent to each other [94, 95].
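The pairing scheme described above, with one fixed anchor primer, test primers of the same orientation and similar melting temperatures, can be sketched in Python as follows. The Tm estimate uses the simple rule of thumb 2 °C per A/T and 4 °C per G/C, which is only a rough approximation for short oligonucleotides and is not necessarily the method used for primer design in this work. All names, sequences and the tolerance of 2 °C are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Primer:
    name: str
    sequence: str
    strand: str  # "+" (forward) or "-" (reverse)


def rule_of_thumb_tm(sequence: str) -> int:
    """Rough Tm estimate in degrees C: 2 * (A + T) + 4 * (G + C)."""
    s = sequence.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


def pair_with_anchor(anchor: Primer, candidates: List[Primer],
                     max_tm_diff: int = 2) -> List[Tuple[str, str]]:
    """Pair the anchor with every candidate of equal orientation and similar Tm."""
    anchor_tm = rule_of_thumb_tm(anchor.sequence)
    return [
        (anchor.name, c.name)
        for c in candidates
        if c.strand == anchor.strand
        and abs(rule_of_thumb_tm(c.sequence) - anchor_tm) <= max_tm_diff
    ]


# Hypothetical primers, for illustration only
ANCHOR = Primer("anchor", "ATGCATGCATGCATGCATGC", "+")
TEST_PRIMERS = [
    Primer("T1", "ATGCATGCATGCATGCATGG", "+"),  # same orientation, similar Tm
    Primer("T2", "ATGCATGCATGCATGCATGG", "-"),  # opposite orientation
    Primer("T3", "ATATATATATATATATATAT", "+"),  # Tm too low
]
```

In this toy set, only T1 would be paired with the anchor, because T2 has the opposite orientation and T3 deviates too strongly in its estimated Tm.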

Figure 2| Principle of 3C method. A stepwise description of the 3C approach is depicted. Genomic fragments in green or light green represent promoters or gene bodies, respectively. Orange indicates an enhancer region. The scissors symbolize restriction enzyme recognition sites. In 4. the arrows indicate specific primers used for detection of ligation products via RT-PCR.

1.4.2 Controls

In order to avoid misinterpretation or biased analysis of interaction frequency data obtained by 3C, several controls have to be considered and applied throughout the 3C approach. Dekker et al., the inventors of this method, therefore recommend the implementation of the following controls [97].

Since different primer pairs are utilized during interaction frequency assessment via qRT-PCR in order to obtain interaction data for several distinct genomic loci, differences in their amplification efficiency have to be normalized. Therefore, a control library has to be generated that contains all possible ligation products of the ROI in equimolar amounts. For preparation of this control library, BAC clones whose insert covers the ROI are commonly used. If the ROI spans a greater distance, it is also possible to use more than one BAC clone, mixed in equimolar ratios. When multiple BAC clones are used, the inserts should show only minimal overlaps or gaps [96, 97]. The BAC mixture is then treated exactly like the 3C library, but in order to obtain random ligation products the fixation step is skipped for the control library. With the help of this control, differences in primer efficiency can be taken into account during data analysis, which facilitates unbiased data interpretation [96].
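The correction for primer efficiency can be illustrated with a minimal Python sketch. It assumes that both libraries are quantified via Ct values and that each PCR cycle multiplies the product by a constant factor (2.0 for perfect doubling); quantification against a standard curve is an equally common alternative. The function name and the default efficiency are assumptions for illustration.

```python
def relative_interaction_frequency(ct_3c: float, ct_control: float,
                                   efficiency: float = 2.0) -> float:
    """Signal of one primer pair in the 3C library relative to the control library.

    The amount of template is proportional to efficiency ** (-Ct), so the ratio
    3C / control equals efficiency ** (ct_control - ct_3c).
    """
    return efficiency ** (ct_control - ct_3c)
```

For example, a primer pair that crosses the threshold three cycles later in the 3C library than in the control library (Ct 28 vs. 25) yields a relative interaction frequency of 0.125 under the assumption of perfect doubling.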

In addition to the correction for differences in primer efficiencies, non-functional, random interactions that might also be measured should be considered for data interpretation. Thus, it is recommended to determine the interaction frequency not only for those regions where interactions are expected, but also for other loci where no interaction is assumed. Functional interactions then appear as a local peak in comparison to non-functional ones, since random interactions occur with much lower frequency [97]. Nevertheless, it has to be considered that there will be a high background signal for genomic regions close to the tested anchor region, while the random interaction frequency in general decreases with increasing distance between two loci [97].

The third control that should be implemented during 3C data analysis is data normalization. This control serves as an instrument for comparing samples generated in different experimental replicates, since the measured interaction frequencies are determined in arbitrary units [97]. To enable a comparison between these replicates, a so-called loading control is implemented during qRT-PCR by the use of internal primers, which amplify a certain genomic region of the 3C library that does not span any restriction site and therefore contains no ligation junction. The amount of PCR product obtained from this amplification should be more or less equal for equal amounts of 3C library template applied to the reaction. Thus, this control normalizes potential differences in 3C library template quantities [96, 97]. In addition to this normalization, it is also possible to normalize for differential sample preparation conditions as well as to compare results obtained with different cell types. For this purpose, interaction patterns of a control region, for instance at a housekeeping gene, should be assessed in the same way as for the ROI. It is important to choose a region that is known to show no interaction or at least a constant interaction pattern. The resulting data from the control region can then be used for normalization of interaction frequency data generated in different cell lines [96, 97]. However, especially for the comparison of interaction frequency data between different cell types, other methods are also used [96-99].
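The loading-control normalization described above can be sketched in Python as follows. The sketch assumes that each replicate provides one signal per primer pair plus one signal for the internal loading-control primers, all in the same arbitrary units; the function names and the simple averaging across replicates are illustrative assumptions, not the evaluation scheme of this work.

```python
from statistics import mean
from typing import Dict, List


def normalize_to_loading_control(signals: Dict[str, float],
                                 loading_control: float) -> Dict[str, float]:
    """Divide each primer-pair signal of one replicate by its loading-control signal."""
    if loading_control <= 0:
        raise ValueError("loading-control signal must be positive")
    return {pair: value / loading_control for pair, value in signals.items()}


def average_replicates(replicates: List[Dict[str, float]]) -> Dict[str, float]:
    """Mean normalized signal per primer pair (all replicates must share the same pairs)."""
    pairs = replicates[0].keys()
    return {pair: mean(rep[pair] for rep in replicates) for pair in pairs}
```

Two replicates that differ only in the amount of template, for instance one with a fourfold higher loading-control signal and fourfold higher primer-pair signals, give identical values after this normalization and can therefore be averaged directly.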

Besides the controls mentioned above, some additional control steps are described, or explained in more detail, in the Materials and Methods section.

1.4.3 Other methods based on 3C

Since 2002, the year in which Dekker et al. published the first protocol for the 3C approach [92], this technology has undergone continuous development. Great effort was made to advance the 3C method towards a high-throughput method that can be applied in a genome-wide fashion [100].

The first advanced version of 3C is the 4C (chromosome conformation capture-on-chip or circular chromosome conformation capture) technology, which was published in 2006 by Simonis et al. [101]. While the conventional 3C method allows the detection of interaction frequencies of one certain locus with a selection of other loci, the 4C technique enables a genome-wide assessment of the interaction frequency for one certain locus, termed the anchor region. Thus, in contrast to 3C, no previous knowledge about potential interaction partners of the locus of interest is required for this advanced method [100, 101]. This feature, which allows a one versus all analysis, is gained by an extension of the original 3C protocol, namely the addition of a second digestion step after ligation, followed by another ligation step which leads to the formation of small DNA circles. These serve as a template for inverse PCR using primers hybridizing on the locus of interest, i.e. the anchor region [101, 102]. The interaction frequencies are analysed by microarrays or, more frequently today, by next-generation sequencing (NGS) methods, since these are less cost-intensive and provide a higher resolution [100, 102].

The next step in the advancement of the 3C method was the development of the 5C (chromosome conformation capture carbon copy) method [100, 103]. This approach enables simultaneous detection of interactions between several selected sequences and thus represents a many versus many method [102]. For this purpose, oligos are designed to hybridize partially on restriction sites located in the regions of interest. These oligos can then bind to the 3C library and, in case of an interaction between two selected genomic loci, are consequently located next to each other on the respective ligation fragment. Subsequent ligation of these adjacent oligos yields a single oligonucleotide that represents a carbon copy of the ligation junction. PCR amplification of this 5C library using universal primers complementary to parts of the oligos enables a high-throughput analysis of interaction frequencies, again by microarray or, nowadays more frequently, by NGS [100, 103].

Hi-C is a further advancement of 3C. In contrast to all other 3C-derived methods, this technology allows the generation of whole-genome interaction maps. Thus, it represents an all versus all approach, which was introduced in 2009 with the emergence of NGS methods [100]. For the Hi-C technology a restriction enzyme generating a 5´ overhang is used after formaldehyde fixation. These cohesive ends are subsequently filled in with nucleotides carrying a biotin label. After ligation of the digested fragments, this biotin label serves for purification and enrichment of ligation fragments due to the high binding affinity of biotin to streptavidin. The enriched ligation fragments, representing interaction partners, are sheared and subsequently sequenced using a paired-end sequencing technique that finally allows the generation of genome-wide interaction maps [100, 104].

Additionally, the 3C approach can be combined with methods such as ChIP; this combination is called ChIA-PET and represents a many versus many approach [105]. Together, the three derivatives of the 3C method facilitate the assessment of genome-wide interaction patterns, since they provide powerful tools for studying the complex genomic architecture and allow high-throughput analysis at high resolution [100].

1.5 Hypothesis and aims

S. Kollmann (2017) identified and subsequently characterized a seDMR as a distal enhancer with lung specificity. S. Häsler Gunnarsdóttir (2018) first established 3C in the Risch lab and began investigating DNA sites that interact with the seDMR.

It is hypothesized that this seDMR affects the expression of one or more genes in its neighbourhood and therefore interacts with its target gene(s) by the formation of an enhancer-promoter loop. Furthermore, it is hypothesized that this chromosomal looping formation at the seDMR, enabling gene expression enhancement, may exclusively be observed in lung cell lines.

This thesis aims to contribute to the investigation of the mechanism of action of the seDMR in order to evaluate its potential as a lung cancer biomarker.

With the overall aim of verifying the postulated hypotheses the following aims were pursued:

a) to reproduce previous mRNA expression measurements of seDMR adjacent genes
b) to optimize the 3C method, previously established by S. Häsler Gunnarsdóttir in this laboratory in the NSCLC cell line A549
c) to apply the 3C method in different cell lines
d) to compare seDMR interaction patterns between the lung cell line A549 and two non-lung cell lines


Material and Methods

2 Material and Methods

2.1 Equipment/Consumables

Table 1| Equipment used for the master thesis

Device | Model | Manufacturer
-20 °C freezer | Premium | Liebherr, Kirchdorf, Germany
-80 °C freezer | C760-86 Innova, New Brunswick | Eppendorf, Hamburg, Germany
Cell culture incubator | Forma Series II (water based) | Thermo Fisher Scientific, Waltham, USA
Cell counter | Countess II | Thermo Fisher Scientific, Waltham, USA
Centrifuge | Accuspin Micro 17, Rotor: 75003524 | Thermo Fisher Scientific, Waltham, USA
Centrifuge | Multifuge 3 L-R, Rotor: 75006445 | Heraeus, Hanau, Germany
Centrifuge | 4-16KS, Rotor: 11160 | Sigma, St. Louis, USA
Centrifuge | Sprout Mini Centrifuge | Thermo Fisher Scientific, Waltham, USA
Centrifuge | Sorvall RC-6, Rotor: SLC 4000 | Thermo Fisher Scientific, Waltham, USA
Electrophoresis chamber | 40-1214 | PEQLAB, Erlangen, Germany
Fluorometer | Qubit 3 | Thermo Fisher Scientific, Waltham, USA
Heating block | ACCU BLOCK Digital Dry Bath D1200 | Labnet, Ried, Austria
Incubator | HERAEUS Function Line | Thermo Fisher Scientific, Waltham, USA
Incubator | Unitron | Infors HAT, Basel, Switzerland
Laminar flow hood | HERA safe KS12 | Thermo Fisher Scientific, Waltham, USA
Real-time thermal cyclers | Applied Biosystems 7500 Real-Time PCR System | Thermo Fisher Scientific, Waltham, USA
Real-time thermal cyclers | LightCycler® 96 | Roche, Basel, Switzerland
Micropipettes | 2.5 µl, 10 µl, 20 µl, 100 µl, 200 µl, 1000 µl Research plus | Eppendorf, Hamburg, Germany
Microscope | CKX53 | Olympus, Tokyo, Japan
Multistepper pipette | Multipette stream | Eppendorf, Hamburg, Germany
Power supply | PowerPac Basic | Bio-Rad, Hercules, USA
Refrigerator | Profiline | Liebherr, Kirchdorf, Germany
Scale | EK3000i | AND, San Jose, USA
Scale | GR-200 | AND, San Jose, USA
Scale | EMB 500-1 | Kern, Balingen, Germany
Shaker | Rocker 3D digital | IKA, Staufen im Breisgau, Germany
Spectrophotometer | NanoDrop2000 UV-Vis Spectrophotometer | Thermo Fisher Scientific, Waltham, USA
Thermal cycler (PCR) | Mastercycler nexus X2 | Eppendorf, Hamburg, Germany
Thermoshaker | ThermoMixer C | Eppendorf, Hamburg, Germany
Thermostatic cabinet | Aqualytic | Liebherr, Kirchdorf, Germany
Transilluminator | 3UV Benchtop Transilluminator | UVP, Upland, CA, USA
Water bath | Memmert WNB14 | Lactan/Roth

Table 2| Consumables used for the master thesis

Item | Specification | Manufacturer
Cell culture flasks | 250 ml / 75 cm², 50 ml / 25 cm² | Greiner Bio-One, Kremsmünster, Austria
Centrifugation tube | 800 ml | Koki Holdings, Tokyo, Japan
Cryo vial | CryoPure Tube 1.8 ml, white | Sarstedt, Nümbrecht, Germany
Petri dish | 100 x 20 mm, vented | Greiner Bio-One, Kremsmünster, Austria
Reaction tubes | 0.5 ml, 1.5 ml, 2 ml, 5 ml, 15 ml, 50 ml | Greiner Bio-One, Kremsmünster, Austria
RPT Filter Tip (TipOne) | 10/20 µl, 100 µl, 200 µl, 1000 µl | StarLab, Hamburg, Germany
RT-PCR plates | LightCycler® 480 Multiwell plate 96, white | Roche, Mannheim, Germany
RT-PCR plates | 0.2 ml, 96-well, semi-skirted | StarLab, Hamburg, Germany
Semperguard® latex powderfree glove | S | SATRA Technology Centre, Northamptonshire, UK
Serological pipettes | 1 ml, 5 ml, 10 ml, 25 ml, filter, sterile | StarLab, Hamburg, Germany
Tips for Multipipette | 0.1 ml, 0.5 ml, 1 ml combitip, sterile | Eppendorf, Hamburg, Germany


2.2 Chemicals/Enzymes

Table 3| Chemicals and enzymes used for the master thesis

Item | Product ID | Manufacturer
37 % Formaldehyde | A0877 | PanReac AppliChem, Darmstadt, Germany
Agarose, research grade | Lot:071295 | Serva, Heidelberg, Germany
ATP | A2383 | Sigma-Aldrich, St. Louis, USA
Boric acid | B6768-500G | Sigma-Aldrich, St. Louis, USA
Chloramphenicol | A1806 | PanReac AppliChem, Darmstadt, Germany
CutSmart Buffer | B7204S | New England BioLabs (NEB), Ipswich, UK
ddH2O | T143.3 | Carl Roth, Karlsruhe, Germany
di-Sodiumhydrogenphosphate dodecahydrate | 1.06579.1000 | Merck, Darmstadt, Germany
dNTP Mix (HotStarTaq DNA Polymerase kit) | 203205 | Qiagen, Hilden, Germany
DTT | 10197777001 | Carl Roth, Karlsruhe, Germany
EDTA | A1103 | PanReac AppliChem, Darmstadt, Germany
EGTA | A08778 | PanReac AppliChem, Darmstadt, Germany
Ethanol absolut p.a. | K48572883649 | Merck, Darmstadt, Germany
FBS Superior | Lot:1245S0615 | Biochrom, Berlin, Germany
GeneRuler 100 bp | Lot:00263961 | Thermo Fisher Scientific, Waltham, USA
Glycine | A3707 | PanReac AppliChem, Darmstadt, Germany
HotStarTaq DNA Polymerase | 203205 | Qiagen, Hilden, Germany
Isopropanol | CP41.3 | Carl Roth, Karlsruhe, Germany
KH2PO4 | 3904.1 | Carl Roth, Karlsruhe, Germany
LB-Medium | X964.4 | Carl Roth, Karlsruhe, Germany
Magnesiumchloride 6-hydrate | 131396 | PanReac AppliChem, Darmstadt, Germany
Midori Green Advance | M&B-2082311 | Nippon Genetics Europe GmbH, Dueren, Germany
Orange G | A1404,0025 | PanReac AppliChem, Darmstadt, Germany
PCR buffer (HotStarTaq DNA Polymerase kit) | 203205 | Qiagen, Hilden, Germany
Potassiumchloride | 39768 | Serva, Heidelberg, Germany
Protease Inhibitor Cocktail (cOmplete, EDTA free) | 5056489001 | Roche, Basel, Switzerland
Proteinase K | A3830 | PanReac AppliChem, Darmstadt, Germany
Restriction Enzyme HindIII-HF | | New England BioLabs (NEB), Ipswich, UK
Ribonuclease A (RNase A) | 533659 | Thermo Scientific
Sodiumacetat trihydrate | HN05.1 | Carl Roth, Karlsruhe, Germany
Sodiumchloride | P029.1 | Carl Roth, Karlsruhe, Germany
Sodiumdodecylsulfate (SDS) | 183.1 | Carl Roth, Karlsruhe, Germany
Sodiumhydroxide | S8045 | Sigma-Aldrich, St. Louis, USA
Sodiumtetraborate | 221732 | Sigma-Aldrich, St. Louis, USA
T4 DNA Ligase | M0202S | New England BioLabs (NEB), Ipswich, UK
TB Green™ | RR420L | TaKaRa, Kusatsu, Japan
TRIS | 37190 | Serva
Triton X-100 | A1388 | PanReac AppliChem, Darmstadt, Germany
Trypsin-EDTA Solution, 1x | T3924 | Sigma-Aldrich, St. Louis, USA

2.3 Kits

Table 4| List of Kits used in the Master Thesis

Kit | Product ID | Manufacturer
NucleoBond® Xtra BAC | 740436.10 | MACHEREY-NAGEL, Düren, Germany
NucleoSpin® gDNA Clean-up | 740230.50 | MACHEREY-NAGEL, Düren, Germany
Qubit™ dsDNA BR Assay Kit | Q32853 | Thermo Fisher Scientific, Waltham, USA
Qubit™ dsDNA HS Assay Kit | Q32854 | Thermo Fisher Scientific, Waltham, USA

2.4 Buffers/Solutions

Cell culture:

PBS 137 mM NaCl, 2.7 mM KCl, 8 mM Na2HPO4 and 1.5 mM KH2PO4

Gel electrophoresis:

Borate buffer (10x) 50 mM sodium tetraborate; pH adjusted to 8.0 with boric acid

Loading dye (6x) 60 % glycerol and a spatula tip of Orange G

Chromosome conformation capture:

Ligation buffer (10x) 660 mM Tris-HCl (pH 7.5), 50 mM MgCl2, 50 mM DTT and 10 mM ATP

Ligase buffer 40 mM Tris-HCl (pH 7.8), 10 mM MgCl2, 10 mM DTT and 5 mM ATP

Lysis buffer 10 mM Tris-HCl (pH 7.5), 10 mM NaCl, 5 mM MgCl2, 0.1 mM EGTA and 1x complete protease inhibitor

PK buffer 5 mM EDTA (pH 8.0), 10 mM Tris-HCl (pH 8.0) and 0.5 % SDS

TE buffer 10 mM Tris-HCl, 0.1 mM EDTA

3C buffer 50 mM Tris-HCl (pH 8.0), 5 mM NaCl, 10 mM MgCl2 and 1 mM DTT


2.5 Cell lines and media

Table 5| List of cell lines used for Master Thesis

Cell line | Source | Tissue and organism | Disease | Characteristics | Clinical data
A549 | AG Risch | Lung (human) | NSCLC | Epithelial cells | male, 58 years, caucasian
HEK293T | AG Risch | Embryonic kidney (human) | none | Epithelial cells | Fetus
HepG2 | AG Risch | Liver (human) | HCC | Epithelial cells | male, 15 years, caucasian

Table 6| List of media used for the Master Thesis

Medium | Specification | Cell line | Manufacturer
DMEM + 10 % FBS and 2 mM L-Glutamine | LOT:RNBG5531 | HEK293T | Thermo Fisher Scientific, Waltham, USA
Ham's F12K + 10 % FBS | LOT:MC03748P | A549 | Thermo Fisher Scientific, Waltham, USA
RPMI 1640 + 10 % FBS | LOT:1989163 | HepG2 | Thermo Fisher Scientific, Waltham, USA

Table 7| BAC clones used for control library preparation

BAC clone | Host strain | Vector | Genomic region of insert | Length of insert [bp] | Stock solution
RPCI-11 1100M9 | E. coli (DH10B) | pBACe3.6 | Chr1:234625165-234806907 | 181,743 | LB media + 8 % glycerol and 12.5 µg/ml chloramphenicol
CITB 2338M19 | E. coli (GeneHogs®) | pBeloBAC11 | Chr1:234456558-234624035 | 167,478 | LB media + 8 % glycerol and 12.5 µg/ml chloramphenicol
RPCI-11 278O12 | E. coli (DH10B) | pBACe3.6 | Chr1:234809303-234991829 | 182,527 | LB media + 8 % glycerol and 12.5 µg/ml chloramphenicol
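As a quick consistency check of Table 7, the insert lengths can be recomputed from the listed genomic coordinates. The following sketch assumes that the coordinates are inclusive, 1-based positions (an assumption, since the table does not state the convention); the function names are purely illustrative.

```python
# Coordinates as listed in Table 7 (chromosome 1, start and end of each insert).
BAC_INSERTS = {
    "RPCI-11 1100M9": (234625165, 234806907),
    "CITB 2338M19": (234456558, 234624035),
    "RPCI-11 278O12": (234809303, 234991829),
}


def insert_length(start, end):
    """Length in bp of an interval, assuming inclusive 1-based coordinates."""
    return end - start + 1


def gaps_between(inserts):
    """Return (left BAC, right BAC, gap in bp) for neighbouring inserts.

    Inserts are sorted by start position; a negative gap would mean an overlap.
    """
    ordered = sorted(inserts.items(), key=lambda item: item[1][0])
    gaps = []
    for (left, (_, left_end)), (right, (right_start, _)) in zip(ordered, ordered[1:]):
        gaps.append((left, right, right_start - left_end - 1))
    return gaps


lengths = {name: insert_length(*coords) for name, coords in BAC_INSERTS.items()}
for name, length in lengths.items():
    print(f"{name}: {length} bp")
for left, right, gap in gaps_between(BAC_INSERTS):
    print(f"gap between {left} and {right}: {gap} bp")
```

Under this assumption the recomputed lengths agree with the values in Table 7, and the three inserts are separated by small gaps rather than overlapping.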

2.6 Cell culture

2.6.1 Cell line handling

The cell lines, stored at -154 °C in liquid nitrogen (vapor phase) in cryo-vials, were quickly thawed in a 37 °C water bath and added to 10 ml of pre-warmed growth medium for the respective cell line. The cells were pelleted by centrifugation at 300 g for 5 min (4-16KS, Sigma). After aspirating the supernatant, the cells were resuspended in 5 ml or 15 ml of the respective growth medium and seeded in a 25 cm² or 75 cm² cell culture flask, respectively. All cell lines were incubated at 37 °C in a 5 % CO2 atmosphere in a water-based incubator (Forma Series II, Thermo Fisher Scientific).

Medium exchange was conducted every 2-4 days. The medium was aspirated, and cells were washed with 10 ml pre-warmed PBS. Then PBS was removed and 15 ml of prewarmed growth-medium of the respective cell line was added.

Passaging of cell lines was conducted after the cells had reached at least 80 % confluency. First, the cells were washed twice with pre-warmed PBS. Then the cells were detached using a mixture of 1 ml trypsin-EDTA and 4 ml PBS and incubated at 37 °C for 5 to 10 min (Forma Series II, Thermo Fisher Scientific). The reaction was stopped by adding 10 ml of the respective growth medium. The splitting ratio into a new 75 cm² cell culture flask was adjusted according to the desired growth rate of the cell line, depending on experimental timing. In general, the splitting ratios ranged from 1:3 to 1:15 for A549, from 1:5 to 1:15 for HEK293T and from 1:3 to 1:5 for HepG2.

For cryopreservation the cells were washed twice with pre-warmed PBS and detached using a trypsin-EDTA-PBS mixture as described above. After centrifugation of the cells at 300 g for 5 min (4-16KS, Sigma), the cell pellet was resuspended in pre-warmed growth medium containing 10 % DMSO. The cell suspension was aliquoted into cryo-vials, transferred into a Mr. Frosty box filled with isopropanol and stored at -80 °C. After 48 h the frozen cell aliquots were transferred into liquid nitrogen (vapor phase) for long-term storage.

Cell counting was performed using the Countess II. Cells were detached and supplemented with growth medium as described above. Then a 20 µl aliquot of the cell suspension was mixed 1:1 with trypan blue, and 10 µl of this mixture were applied to the Countess Cell Counting Chamber Slide. Finally, the cell number was determined by the Countess II software.

2.7 Bioinformatical Analysis

2.7.1 Pre-experimental bioinformatical analysis

2.7.1.1 Search for enhancer predictions of seDMR in silico

In the past two years great effort was put into the identification and characterization of the seDMR. To augment previous findings, further bioinformatic analyses were performed. MeDIP-seq data from whole blood of monozygotic twins as well as luciferase assays suggest that the seDMR of interest might act as an enhancer in lung-cancer cell lines [86, 90]. To further confirm this putative function of the seDMR, the Broad ChromHMM track at the UCSC browser (Human GRCh37/hg19) [106] was used, which assigns genomic regions to certain chromatin states or regulatory elements, including enhancers, based on ChIP-seq data of 7 distinct histone modifications measured in 9 different cell types [71, 107]. Additionally, the FANTOM database, more precisely the Transcribed Enhancer Atlas (FANTOM5), was used to look up enhancer predictions for the seDMR of interest with a focus on lung tissue data. This database defines enhancer regions based on bidirectional cap analysis of gene expression (CAGE) patterns in tissue and primary cell samples that are not associated with promoter regions [63, 108].

2.7.1.2 Comparison of enhancer predictions of seDMR between cell lines in silico

Furthermore, an aim of this master’s thesis was to compare the interaction patterns of the seDMR in A549, a lung-cancer cell line, with those in non-lung cell lines. Both databases mentioned above, together with the Broad Histone track at the UCSC browser, were used to search for suitable cell lines that might differ in their enhancer predictions at the seDMR.

2.7.2 Post-experimental bioinformatical analysis

In order to evaluate the results of the 3C experiments in the three cell lines, the UCSC browser was utilized to further investigate the regions at which interactions with the seDMR were measured. Furthermore, the Gene Transcription Regulation Database (GTRD, v18.06), a database showing TF binding sites and their localization in the human genome (Human GRCh38/hg38) based on ChIP-seq data collected from diverse public repositories such as ENCODE, GEO and SRA [109], was used to search for binding sites of TFs that are associated with chromatin looping.

2.8 mRNA expression analysis

Cell treatment, RNA isolation and cDNA synthesis had previously been performed by S. Rauscher (2017) in three experimental approaches [110]. The cell treatment conditions and the amount of RNA used for cDNA synthesis in the different experiments are listed in Table 8. RNA was isolated using the High Pure RNA Isolation kit. For reverse transcription an Oligo(dT)23 primer mix and 64 U of SuperScript III Reverse Transcriptase were used.


Table 8| cDNA samples generated by S. Rauscher (2017) used for mRNA expression analysis

a)
Cell line | Experiment no. | Treatment | RNA reverse transcribed
A549 | 1 | 0.5 µM DAC, 2.5 µM DAC, 5.0 µM DAC, DMSO | 250 ng
A549 | 2 | 0.1 µM DAC, 0.5 µM DAC, 2.5 µM DAC, DMSO | 500 ng
A549 | 3 | 0.1 µM DAC, 0.5 µM DAC, 1.0 µM DAC, DMSO | 500 ng
A549 | 1 | 0.5 µM AZA, 2.5 µM AZA, 5.0 µM AZA, untreated | 250 ng
A549 | 3 | 0.1 µM AZA, 0.5 µM AZA, 1.0 µM AZA, untreated | 500 ng

b)
Cell line | Experiment no. | Treatment | RNA reverse transcribed
H1299 | 1 | 0.5 µM DAC, 2.5 µM DAC, 5.0 µM DAC, DMSO | 250 ng
H1299 | 3 | 0.1 µM DAC, 0.5 µM DAC, 1.0 µM DAC, DMSO | 2000 ng
H1299 | 1 | 0.5 µM AZA, 2.5 µM AZA, 5.0 µM AZA, DMSO | 250 ng
H1299 | 3 | 0.1 µM AZA, 0.5 µM AZA, 1.0 µM AZA, DMSO | 2000 ng

Quantitative real-time PCR (qRT-PCR) was performed with a real-time thermal cycler (Applied Biosystems 7500 Real-Time PCR System, Thermo Fisher Scientific) in a 96-well plate (StarLab) using 2 µl of cDNA sample, previously diluted to compensate for differences in the amount of RNA used for reverse transcription, 10 µM primer mix (FW and RV primer mixed in equimolar amounts; for the primer list see Table 9), 1:250 ROX and 1x TB Green™ in a total reaction volume of 20 µl. The qRT-PCR program is shown in Table 10.

For calculation of the fold change in expression two normalization steps were performed. First, the Ct value of each sample was normalized to the expression of ARP, a housekeeping gene, to allow comparison of expression between cell lines, resulting in a ∆Ct for each sample. Second, the ∆Ct was normalized to the untreated/DMSO control sample (yielding the ∆∆Ct) in order to obtain a fold change in expression that is only affected by the HMA treatment. Finally, the ∆∆Ct was used to calculate the fold change in mRNA expression by Formula 1.


Formula 1

$$\text{fold change in expression} = 2^{-\Delta\Delta C_t}$$
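The two normalization steps and Formula 1 can be sketched as follows. This is a minimal illustration rather than the analysis script used for the thesis; the function names and the Ct values in the usage example are hypothetical.

```python
def delta_ct(ct_target, ct_reference):
    """First normalization: Ct of the gene of interest minus Ct of the
    housekeeping gene (ARP in this thesis) within the same sample."""
    return ct_target - ct_reference


def fold_change(ct_target_treated, ct_ref_treated, ct_target_control, ct_ref_control):
    """Second normalization and Formula 1.

    The delta Ct of the treated sample is normalized to the delta Ct of the
    untreated/DMSO control, and the fold change is 2 ** (-delta delta Ct).
    """
    d_ct_treated = delta_ct(ct_target_treated, ct_ref_treated)
    d_ct_control = delta_ct(ct_target_control, ct_ref_control)
    dd_ct = d_ct_treated - d_ct_control
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values, chosen only to illustrate the arithmetic.
example = fold_change(
    ct_target_treated=24.0, ct_ref_treated=18.0,
    ct_target_control=26.0, ct_ref_control=18.0,
)
print(example)
```

In this hypothetical case the ∆Ct drops from 8 in the control to 6 in the treated sample, so the ∆∆Ct is -2 and the fold change is 4.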

Table 9| List of primers used for mRNA expression analysis

Gene name | Primer specification | Sequence 5’-3’
COA6 | FW | CTTAGAGGATGCTTCTCAATGC
COA6 | RV | TTCTGAAGGCTCAAATTGTCC
TARBP1 | FW | CTGTTTTGGTGGTCTGAGAGG
TARBP1 | RV | TCTGACACCGCATATTCAAAC
IRF2BP2 | FW | GAGAGCAGGACTGGGTCAAC
IRF2BP2 | RV | AGAGGGCTTCCTTTTCCTTG
LINC01132 | FW | TCACCTTCAGATGGACCACA
LINC01132 | RV | TACACACATGGGTGGCAGAG
TOMM20 | FW | ATGGTGGGTCGGAACAGC
TOMM20 | RV | GAAAGCCCAGCTCTCTCCTT
RBM34 | FW | ACCACTTTCGCAAGAACCTG
RBM34 | RV | TCCTTTTCTGCCCTTGTTTC
ARID4B | FW | GCTGAACCTGGTGTTTTAGAGG
ARID4B | RV | TGATCTTGGCTTCACAAAAGG
GGPS1 | FW | GGAGAAGACTCAAGAAACAGTCC
GGPS1 | RV | AAATGCCTGTGAAAGTTTGG
TBCE | FW | CGTTTTGCTGGTGTTGTCC
TBCE | RV | TCCTCTCTCGGGATTGTCC
ARP | FW | TCTACAACCCTGAAGTGCTTGAT
ARP | RV | CAATCTGCAGACAGACACTGG

Table 10|qRT-PCR program for mRNA expression analysis

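Step | Temperature | Time | Cycles
Pre-incubation | 95 °C | 30 sec | 1x
3 step amplification | 95 °C | 5 sec | 40x
3 step amplification | 60 °C | 30 sec | 40x
3 step amplification | 72 °C | 30 sec | 40x
Melt curve stage | 95 °C | 5 sec | 1x
Melt curve stage | 65 °C | 60 sec | 1x
Melt curve stage | 95 °C | 60 sec | 1x
Cooling | 4 °C | ∞ |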


2.9 Chromosome conformation capture

2.9.1 Preparation of control library for 3C method

Suitable BACs were selected using the BAC End Pairs track at the UCSC genome browser (Human GRCh37/hg19) [106, 111] based on the following criteria:

• together the selected BACs must cover the complete region of interest
• if possible, avoidance of gaps or overlaps within the selected BACs
• roughly equal insert length

Subsequently, the selected BACs (see Table 7) were ordered from Thermo Fisher Scientific.

2.9.1.1 BAC isolation from host strain and insert verification

2.9.1.1.1 Isolation of BAC

For preparing the control library each BAC clone was streaked out separately on an LB-agar plate containing 12.5 µg/ml chloramphenicol, which was then incubated overnight (o/n) at 37 °C (HERAEUS Function Line, Thermo Fisher Scientific). Then, 5 ml of pre-warmed LB medium, also supplemented with 12.5 µg/ml chloramphenicol, were inoculated with a single colony of the respective BAC clone. After approximately 8 h of incubation at 37 °C while shaking at 300 rpm (ThermoMixer C, Eppendorf), 400 µl of this starter culture were transferred into a 1 l baffled Erlenmeyer flask containing 250 ml LB medium with the same concentration of chloramphenicol. The E. coli cultures were grown o/n at 37 °C and 210 rpm for a maximum of 16 h (Unitron, Infors HAT). Then, the OD600 of each culture was measured with a spectrophotometer in order to determine, by Formula 2, the recommended culture volume for the subsequent vector isolation with the NucleoBond® Xtra BAC kit.

Formula 2

$$V_{culture}\,[\text{ml}] = \frac{1500}{OD_{600}}$$

Vector isolation was then performed according to the manufacturer’s protocol, continuing with the harvesting of the bacterial cells in step 3, with the following specifications:

- Step 3: The calculated volume of E. coli cell culture was centrifuged for 20 min at 4000 g and 4 °C (Sorvall RC-6, Thermo Fisher Scientific).


- Step 8: Since the filter had to be refilled several times due to high sample volume, the remaining sample was stored on ice until filling it on the column. Additionally, the lysate flow-through was loaded on the filter a second time, as recommended in the manufacturer’s protocol.

- Step 12: Elution buffer was heated to 70 °C (ThermoMixer C, Eppendorf) and was applied stepwise in portions of 2-3 ml.

- Step 13: For DNA precipitation the samples were centrifuged for 1.5 h at 4,565 g and 4 °C (Multifuge 3 L-R, Heraeus).

- Step 14: To wash the DNA pellet, 6 ml of 70 % EtOH were added and the samples were centrifuged for 15 min at 4,565 g and 4 °C (Multifuge 3 L-R, Heraeus).

- Step 15: The purified BAC vector was finally dissolved in 500 µl of TE buffer. After o/n incubation at 4 °C while shaking at 300 rpm (ThermoMixer C, Eppendorf), the DNA concentration was determined with the Qubit™ 3 using the Qubit™ dsDNA BR Assay kit.
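As a worked example of Formula 2, the culture volume to be harvested in step 3 can be computed from the measured OD600. The sketch below only restates the formula; the function name and the OD600 value in the usage example are hypothetical.

```python
def culture_volume_ml(od600):
    """Culture volume according to Formula 2: V [ml] = 1500 / OD600."""
    if od600 <= 0:
        raise ValueError("OD600 must be a positive number")
    return 1500.0 / od600


# Hypothetical measurement: an overnight culture with an OD600 of 6.0.
volume = culture_volume_ml(6.0)
print(f"{volume:.0f} ml of culture to harvest")
```

With this hypothetical OD600 of 6.0 the formula gives 250 ml, i.e. the complete culture volume described above; a denser culture would require proportionally less.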

2.9.1.1.2 Verification of the BAC insert

After the isolation of a BAC clone from its host strain it is important to verify that the isolated BAC still contains the correct insert in full length. Hence, a test PCR with subsequent agarose-gel electrophoresis was performed. To prove the presence of the insert, several test primers that bind within the insert of the respective BAC were used. As negative controls, primers were chosen that cannot bind to the insert of the tested BAC but bind to the insert of another BAC used for the 3C experiment. As positive controls, one primer was designed that binds the gDNA of the E. coli host strain and another one that binds the BAC vector backbone at the antibiotic-resistance site (CMr). As a control for the PCR, all primers were also tested on gDNA from H1299 cells, provided by F. Wolff. The test PCR was performed in a total reaction volume of 20 µl consisting of 1x PCR buffer, 1 mM MgCl2, 0.4 µM primer mix (forward and reverse primer mixed in equimolar amounts), 200 µM dNTP mix, 1 U HotStarTaq DNA Polymerase and 4 ng of BAC DNA. The PCR program is shown in Table 11.

After PCR, 7 µl of the respective amplification product were diluted in 5 µl dH2O and 3 µl loading dye (6x) were added. Agarose-gel electrophoresis was then performed with the diluted PCR product loaded on a 1.5 % agarose gel prepared in borate buffer (1x) and supplemented with Midori Green Advance. The electrophoresis was run for 45 min at 130 V. A transilluminator was used to visualise the PCR products on the gel.

Table 11| PCR program for BAC insert testing

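Step | Temperature | Time | Cycles
Pre-incubation | 95 °C | 15 min | 1x
3 step amplification | 94 °C | 30 sec | 30x
3 step amplification | 60 °C | 30 sec | 30x
3 step amplification | 72 °C | 30 sec | 30x
Elongation | 72 °C | 5 min | 1x
Cooling | 4 °C | ∞ |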

2.9.1.2 Digestion of BAC control library

Before digestion the three isolated BACs had to be mixed in equimolar amounts; therefore, the DNA concentration of each BAC sample was measured using the Qubit™ 3 and the Qubit™ dsDNA BR Assay kit. Then, the three samples were mixed, taking into account the different insert lengths of the BACs. For the shortest BAC clone 5 µg were used, while for the other two correspondingly less was added, and the resulting DNA concentration of the BAC library was measured again as described above. Subsequently, digestion of the BAC mix was performed in a total reaction volume of 500 µl containing 5 µg of BAC mix, 500 U HindIII-HF and 1x CutSmart buffer. After incubation for 2 h at 37 °C (ThermoMixer C, Eppendorf) the reaction mix containing the digested BAC library was transferred into a 15 ml tube. Then, an EtOH precipitation was conducted by adding, in this order, 1 volume of dH2O, 0.1 volume of sodium acetate (2 M, pH 5.6), 0.01 M MgCl2 and 2 volumes of absolute EtOH stored at RT. After vigorous mixing, the sample was incubated for approximately 1 h at -80 °C. Next, the digested BAC library was centrifuged for 1 h at 2200 g and 4 °C (Multifuge 3 L-R, Heraeus). The supernatant was subsequently discarded, and the pellet was washed by addition of 1 ml of EtOH (70 %) followed by centrifugation for 15 min at 2200 g and 4 °C (Multifuge 3 L-R, Heraeus). After discarding the supernatant, the pellet was air-dried for a few minutes at room temperature (RT) and dissolved in 500 µl of 1x ligase buffer.

2.9.1.3 Ligation of BAC control library

In order to randomly ligate the digested fragments of the BAC library, 17 µl of T4 DNA ligase were added to the digested sample, which was then incubated o/n at 16 °C in a thermostatic cabinet. Next, an EtOH precipitation was performed in the same way as described in step 2.9.1.2, but the pellet resulting from centrifugation after the addition of EtOH (70 %) was dissolved in 150 µl of Tris-HCl (10 mM, pH 8.0). Then the digested and re-ligated BAC library was purified using a column-based gDNA clean-up kit (Macherey-Nagel) according to the manufacturer’s protocol, resulting in the final BAC control library, whose DNA concentration was measured with the Qubit™ 3 using the Qubit™ dsDNA HS Assay kit.

2.9.1.4 Preparation of BAC control library dilutions

A standard curve of the control library must be generated in order to correct for differences in primer efficiencies and to account for background interactions when measuring the interaction frequencies of the seDMR in step 2.9.2.8. Therefore, the control library has to be serially diluted beforehand.
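How a standard curve from a dilution series relates to primer efficiency can be sketched as follows. This is a generic qPCR standard-curve calculation (linear fit of Ct against log10 of the relative template amount, with efficiency derived from the slope), not necessarily the exact evaluation used in this thesis. All function names and the Ct values in the usage example are hypothetical.

```python
import math


def fit_standard_curve(relative_amounts, ct_values):
    """Least-squares fit of Ct = slope * log10(amount) + intercept."""
    xs = [math.log10(a) for a in relative_amounts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amplification_efficiency(slope):
    """Efficiency per cycle; 1.0 corresponds to perfect doubling."""
    return 10 ** (-1.0 / slope) - 1.0


def amount_from_ct(ct, slope, intercept):
    """Relative template amount read off the standard curve for a given Ct."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical Ct values for the 1:1, 1:10 and 1:100 control library dilutions.
slope, intercept = fit_standard_curve([1.0, 0.1, 0.01], [20.0, 23.32, 26.64])
efficiency = amplification_efficiency(slope)
print(f"slope {slope:.2f}, efficiency {efficiency:.1%}")
```

Because each primer pair gets its own curve, Ct values measured in a 3C sample can be converted to amounts relative to the control library, which makes primer pairs with different efficiencies comparable.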

2.9.1.4.1 Testing optimal concentration of BAC control library

To determine the optimal range of control library dilutions for qRT-PCR, three different serial dilutions of the control library were prepared and tested successively by qRT-PCR. Each dilution series consisted of three dilutions, namely 1:1, 1:10 and 1:100. For qRT-PCR, 2 µl of the respective library dilution, 1 µM primer mix (anchor FW + test primer FW or internal primer FW + RV) and 1x TB Green™ were used in a total reaction volume of 20 µl. The qRT-PCR program is shown in Table 12.

Table 12|qRT-PCR program for 3C

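Step | Temperature | Time | Cycles
Pre-incubation | 95 °C | 30 sec | 1x
3 step amplification | 95 °C | 5 sec | 40x
3 step amplification | 60 °C | 20 sec | 40x
3 step amplification | 72 °C | 30 sec | 40x
Melt curve stage | 95 °C | 5 sec | 1x
Melt curve stage | 65 °C | 30 sec | 1x
Melt curve stage | 95 °C | 1 sec | 1x
Cooling | 37 °C | 30 sec |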

2.9.1.4.2 Dilution of BAC control library

After the optimal stock concentration of the control library for serial dilution had been determined, the library was diluted again in the same way as described above, but in larger volumes, so that the same control library dilutions could be used to measure the interaction frequencies of all prepared 3C samples. Thus, 2 ml of each tenfold dilution were prepared, with the starting dilution containing 2.002 ng/µl of the control library. The dilutions were stored in aliquots of 110 µl at -20 °C until further use.
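The concentrations of such a tenfold dilution series can be sketched as follows. The sketch assumes three dilution levels (1:1, 1:10 and 1:100, as in the test series above) and that each level is made up to 2 ml before material is taken for the next level; both points are assumptions, and the function name is illustrative.

```python
def tenfold_dilution_plan(start_conc_ng_per_ul, n_levels, prepared_volume_ul=2000.0):
    """List concentration and pipetting volumes for a tenfold dilution series.

    For every level after the first, one tenth of the prepared volume is
    taken from the previous level and filled up with diluent.
    """
    plan = []
    for level in range(n_levels):
        entry = {
            "dilution": f"1:{10 ** level}",
            "conc_ng_per_ul": start_conc_ng_per_ul / 10 ** level,
            "transfer_ul": None,
            "diluent_ul": None,
        }
        if level > 0:
            entry["transfer_ul"] = prepared_volume_ul / 10
            entry["diluent_ul"] = prepared_volume_ul - prepared_volume_ul / 10
        plan.append(entry)
    return plan


plan = tenfold_dilution_plan(2.002, n_levels=3)
for entry in plan:
    print(entry)
```

Starting from 2.002 ng/µl, the 1:10 and 1:100 levels contain about 0.2002 ng/µl and 0.02002 ng/µl, each prepared from 200 µl of the previous level and 1800 µl of diluent under the stated assumptions.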


Figure 3|Overview of genomic region covered by control library. A) Localization of genes and their direction of transcription (grey boxes and arrows) and lncRNAs (beige) on centromeric site of seDMR. B) Position of selected BACs covering the ROI representing the control library. C) Amplification sites of test primers (green), internal primers (orange) and anchor primer (red).


2.9.2 3C approach

In order to identify putative looping events between the seDMR and potential target genes on the centromeric side of this enhancer region, the chromosome conformation capture method was used. The 3C approach was performed according to a modified protocol established by S. Häsler Gunnarsdóttir (2018) [90], which is based on protocols from Hagège et al. and Ea et al. [96, 112].

2.9.2.1 Single cell preparation and formaldehyde crosslinking

For the 3C approach 5 × 10⁶ cells were transferred into a 15 ml tube and supplemented with 9.73 ml of pre-warmed 10 % FBS/PBS. Then 1 % formaldehyde was added to the cell suspension to crosslink genomic regions in physical proximity. After incubation for 10 min at RT with continuous manual tumbling, the formaldehyde-fixed samples were put on ice and the crosslinking reaction was stopped by addition of 125 mM ice-cold glycine. The cells were pelleted by centrifugation for 8 min at 320 g and 4 °C (4-16KS, Sigma). To wash the cells, 1 ml of 3C buffer was added and another centrifugation step for 5 min at 400 g and 4 °C (4-16KS, Sigma) was performed. The supernatant was then discarded.
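The 9.73 ml of FBS/PBS is consistent with a simple C1·V1 = C2·V2 dilution of the 37 % formaldehyde stock listed in Table 3 to a final concentration of 1 % in 10 ml. The sketch below illustrates this calculation; the final volume of 10 ml and the interpretation of 1 % as the final concentration are assumptions, and the function name is illustrative.

```python
def stock_volume_ml(stock_percent, final_percent, final_volume_ml):
    """Volume of stock solution needed, from C1 * V1 = C2 * V2."""
    return final_percent * final_volume_ml / stock_percent


FINAL_VOLUME_ML = 10.0  # assumed total volume of the crosslinking reaction

formaldehyde_ml = stock_volume_ml(
    stock_percent=37.0, final_percent=1.0, final_volume_ml=FINAL_VOLUME_ML
)
buffer_ml = FINAL_VOLUME_ML - formaldehyde_ml

print(f"37 % formaldehyde: {formaldehyde_ml:.2f} ml")
print(f"10 % FBS/PBS:      {buffer_ml:.2f} ml")
```

Under these assumptions about 0.27 ml of the 37 % stock is required, which leaves the 9.73 ml of 10 % FBS/PBS stated above.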

2.9.2.2 Cell lysis

After formaldehyde fixation, cell lysis was induced by resuspending the pellet in 5 ml of cooled lysis buffer followed by incubation on ice for 10 min. To promote optimal cell lysis the suspension was mixed by pipetting every 2-3 min. Afterwards the lysate was centrifuged for 5 min at 400 g and 4 °C (Multifuge 3 L-R, Heraeus). The supernatant was discarded, and the pelleted nuclei were put on dry ice for 5 min and then stored at -80 °C until further use. In the following steps the experiments were performed in duplicates; thus, pelleted nuclei were always processed in pairs.

2.9.2.3 Restriction enzyme digestion

The pelleted nuclei obtained after cell lysis were thawed and resuspended in 1 ml of 1x CutSmart buffer to remove residual lysis buffer. Then the suspension was centrifuged for 5 min at 400 g and 4 °C (Multifuge 3 L-R, Heraeus) and the supernatant was discarded. The pellet was resuspended in 1.5x CutSmart buffer and the suspension was transferred into a 1.5 ml tube. This sample was then heated to 37 °C (ThermoMixer C, Eppendorf) and 7.5 µl of 20 % SDS were added (final concentration 0.3 %) in order to facilitate digestion by gently perforating the nuclear membrane. This step was followed by incubation for 1 h at 37 °C while gently shaking at 900 rpm (ThermoMixer C, Eppendorf), with resuspension by pipetting up and down every 10 min. Then, Triton X-100 was diluted to 12 % in 1x ligase buffer, and 84.5 µl of this dilution were added to the sample to obtain a final Triton X-100 concentration of 3 %, which quenches the SDS and thereby prevents it from inhibiting the restriction enzyme. Again, the solution was incubated for 1 h at 37 °C while shaking at 900 rpm (ThermoMixer C, Eppendorf) and frequently resuspended. After this step a 20 µl aliquot, labelled “undigested control”, was taken for the digestion efficiency control (see step 2.9.2.6). 150 U of the six-cutter restriction enzyme HindIII-HF were added to the remaining sample to initiate the stepwise digestion of the genomic DNA. During incubation for 1.5 h at 37 °C while shaking at 900 rpm (ThermoMixer C, Eppendorf) the sample was resuspended every 10-15 min by pipetting up and down. Then another 150 U of HindIII-HF were added, and the mixture was incubated in the same way as before. Finally, in the last step of digestion, 150 U of HindIII-HF were added to the sample, which was then incubated o/n at 37 °C while shaking at 900 rpm (ThermoMixer C, Eppendorf). Before continuing with the next step of the 3C method, 20 µl of this sample were transferred into a new tube as the “digested control” aliquot, needed for the assessment of digestion efficiency in step 2.9.2.6.

2.9.2.4 Re-Ligation/De-crosslinking

After o/n incubation, 1.6 % SDS was added to the remaining digested nuclei, which were then incubated for 30 min at 37 °C while shaking at 900 rpm (ThermoMixer C, Eppendorf). Afterwards the suspension was transferred into a 50 ml Falcon tube, and 6.125 ml of 1.15x ligation buffer as well as 1 % Triton X-100 were added. This sample was then incubated for 1 h at 37 °C and 300 rpm (ThermoMixer C, Eppendorf). In order to re-ligate the digested fragments, 6,800 U (cohesive end units) of T4 DNA ligase were added to the digested nuclei, which were then incubated for 4 h at 16 °C in a thermostatic cabinet.

For de-crosslinking of the nuclei, the re-ligated sample was equilibrated for a few minutes at RT. Then 30 µg of proteinase K were added followed by incubation at 65 °C o/n (HERAEUS Function Line, Thermo Fisher Scientific).

2.9.2.5 DNA purification

After equilibration of the re-ligated and de-crosslinked nuclei at RT, residual RNA was removed by adding 300 µg of RNase A and incubating for 45 min at 37 °C (ThermoMixer C, Eppendorf). Then an EtOH precipitation was performed by adding 1 volume of dH2O, 0.1 volume of sodium acetate (2 M, pH 5.6) and 0.01 M MgCl2. Lastly, 30 ml of absolute EtOH, stored at RT, were added, followed by vigorous shaking and incubation at -80 °C for approximately 1 h. Afterwards the sample was centrifuged at 2,200 g and 4 °C for at least 1 h (Multifuge 3 L-R, Heraeus). The supernatant was discarded, and the genomic DNA pellet was washed with 10 ml of 70 % EtOH and centrifuged again for 15 min at 4 °C (Multifuge 3 L-R, Heraeus). After removal of the supernatant, the pellet was dissolved in 150 µl of 10 mM Tris-HCl pH 8.0. The resulting gDNA sample was further purified using a column-based gDNA clean-up kit (Macherey-Nagel) according to the manufacturer’s protocol. The gDNA was first eluted with 50 µl of the elution buffer provided with the kit. To increase the final yield, the eluate was then loaded onto the column again and centrifuged once more, as recommended in the manufacturer’s protocol. After this purification step, the samples represent the final 3C libraries and were stored at -20 °C until they were used to measure putative interactions between the seDMR and its putative target site(s).

2.9.2.6 Determination of digestion efficiency

To ensure unbiased detection of interactions between the seDMR and its target site(s), the digestion efficiency at the investigated HindIII restriction sites must be assessed carefully directly after digestion [96]. For this purpose, the two aliquots saved during step 2.9.2.3 were used.

2.9.2.6.1 Purification of digestion control aliquots

The digestion control aliquots collected before and after digestion with HindIII-HF had to be purified from the enzymes added in previous steps as well as from residual RNA and protein. For this purpose, 500 µl of 1x proteinase K (PK) buffer and 20 µg of PK were added to the digested and the undigested control aliquot, followed by incubation for 2 h at 65 °C (ACCU BLOCK Digital Dry Bath D1200, Labnet) to degrade any remaining proteins. After equilibration of the control samples at 37 °C (ThermoMixer C, Eppendorf), 1 µg of RNase A was added and the samples were incubated at 37 °C for 45 min (ThermoMixer C, Eppendorf). For further purification, a column-based gDNA purification kit was used according to the manufacturer’s protocol. For this, 150 µl of the gDNA of each digestion control aliquot were used, and the DNA was finally eluted in 50 µl of the elution buffer contained in the kit.

2.9.2.6.2 Testing of digestion efficiency

Using the purified digestion control aliquots, qRT-PCR was performed to assess digestion efficiency. Two types of primer pairs were used (Figure 4): test primers, which were designed to amplify across a restriction site, and internal primers, which generate an amplicon that does not contain a restriction site. All primers used for the assessment of digestion efficiency (Table 13) had previously been designed by S. Häsler Gunnarsdóttir (2018) [90]. DNA amplification was performed in a total volume of 20 µl using 2 µl of digested or undigested DNA, 1x TB Green™ and 1 µM of the respective primer mix (FW and RV primers had previously been mixed in equimolar amounts). For practical reasons, each primer mix was first pipetted four times (in duplicates for digested and undigested DNA) into a 96-well plate (Roche), followed by the master mix containing 1x TB Green™ and the respective DNA. qRT-PCR was performed on a real-time thermal cycler (LightCycler 96, Roche) under the conditions listed in Table 12.

To calculate the digestion efficiency of HindIII-HF for the samples, Formula 3 was used. Here, CtR represents the mean of the Ct values (from duplicates) obtained by qRT-PCR with the respective restriction primer pair for the digested or undigested DNA sample, while CtI stands for the mean of the Ct values measured for the digested or undigested sample amplified with the internal primers, which bind far away from any restriction site.

Formula 3

$$\%\,\text{digested} = 100 - \frac{100}{2^{\left[(Ct_R - Ct_I)_{\text{digested}} - (Ct_R - Ct_I)_{\text{undigested}}\right]}}$$

According to Hagège et al., a digestion efficiency above 60%, ideally above 80%, should be reached for each restriction site covered by the restriction primer pairs used [96].
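The calculation in Formula 3 and the thresholds cited from [96] can be illustrated with a minimal Python sketch. The function names and the Ct values are hypothetical and chosen only for illustration; they are not measured data from this study, and the original evaluation was not necessarily carried out in code.

```python
def percent_digested(ct_r_dig, ct_i_dig, ct_r_undig, ct_i_undig):
    """Formula 3: 100 - 100 / 2^((CtR - CtI)_digested - (CtR - CtI)_undigested)."""
    delta = (ct_r_dig - ct_i_dig) - (ct_r_undig - ct_i_undig)
    return 100.0 - 100.0 / (2.0 ** delta)


def rate_efficiency(percent):
    """Apply the thresholds cited from [96]: above 60% acceptable, above 80% ideal."""
    if percent > 80.0:
        return "ideal"
    if percent > 60.0:
        return "acceptable"
    return "insufficient"


# Hypothetical mean Ct values from duplicates (assumption, not measured data).
efficiency = percent_digested(ct_r_dig=28.0, ct_i_dig=22.0,
                              ct_r_undig=25.0, ct_i_undig=22.0)
print(efficiency, rate_efficiency(efficiency))  # 87.5 ideal
```

In this example the restriction primer pair amplifies three cycles later in the digested than in the undigested sample (relative to the internal primers), which corresponds to one eighth of the template remaining uncut.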

Figure 4| Schematic representation of test primer and internal primer location on the DNA. The test primers FW and RV (green) are designed to amplify a genomic region containing a HindIII restriction site, depicted on the left side. The internal primers FW and RV (orange) are both located beyond a restriction site, so the amplicon of the internal primer pair does not contain a HindIII restriction site, as illustrated on the right side of the figure.

Table 13| List of test primers (green) and internal primers (orange) used for assessment of digestion efficiency

| Primer | Orientation | Sequence 5’-3’ |
|---|---|---|
| Anchor | FW | CTGCACTACTGGACTGAGGG |
| Anchor | RV | CGTGTGTCTTGCCTGGTTC |
| 1 | FW | GGAACTGCCCTTGAATCTTGG |
| 1 | RV | TGAGCATGTTTCCTTCATTGCA |
| 2 | FW | TGAGGGACCAGGACAAATGC |
| 2 | RV | GAGAAGTTAGCACGCAATCTGG |
| 3 | FW | CTGGCGGTGATGGTTGCTA |
| 3 | RV | TGCAGCTTCATCCCATCCTG |
| 5 | FW | AGATAGCAACATAAGCAGTGGGA |
| 5 | RV | TTTCTTATCTGACTGCTCTGGGT |
| 6 | FW | AGGTGAAGGCAAGAAGCTCA |
| 6 | RV | TGCTTTGCCTCCACACTTCA |
| 7 | FW | AGCATCTCAGGGCACAGACT |
| 7 | RV | TGCTGCCATATGTTTTTCCA |
| 8 | FW | ACTGCTGAGTTCGTGGCTTT |
| 8 | RV | GAAGAAACCCGCAATGATGT |
| 9 | FW | AAGGAGCAGAGCCACTGAAA |
| 9 | RV | CCCAAAGACTTCCTTCACCA |
| 11 | FW | AAACCTGGGCTGTTAGCAACT |
| 11 | RV | CTCCTCTAGGGCCAGTCCTT |
| 12 | FW | AAGCCATCACACCCAGTCA |
| 12 | RV | ATCACTTGGCTCAGGGATTG |
| 13 | FW | GTGAAGCGGCTGTCCTATCC |
| 13 | RV | CTTACACCACGCAGCTAGGG |
| 14 | FW | ACCATTGCAGTGACTAAGGTGT |
| 14 | RV | CTGAGCTGGGCAACTATGGT |
| 15 | FW | CCCTAGATTCTAACATTGGCAGC |
| 15 | RV | AGTTTCTTCCAGGCGGCTAA |
| 16 | FW | TTATGATTCAGCGATTTCACCCC |
| 16 | RV | ACGTTGTGACTGGCTTCCTT |
| 17 | FW | ACAGGCAGTTCTACCGTTGG |
| 17 | RV | CCCTTCCTTGCTCCCATTGT |
| 18 | FW | ACAGGCAAATGTCTCACTCCT |
| 18 | RV | AAGGCCTATTCACTGCAGCT |
| 19 | FW | GAAAAGAGAAAGACAGTGGGGC |
| 19 | RV | TGCCTCTGGTTTCTCGTGAG |
| 20 | FW | ATGCCTGGCCCTTGATTTTG |
| 20 | RV | ACTTCCTCTGAATCACTCTGCA |
| 21 | FW | TCTCTCTTGCTCACCTCCGA |
| 21 | RV | GGCCAGCCCCACTTGATTT |
| 22 | FW | ACTGAGGGACGACTATACTGTGT |
| 22 | RV | TGCCCATACCATACTGTGTTGA |
| 23 | FW | AAAGGTGGTGGGACTGGTTG |
| 23 | RV | GGCCTCTGTTCCTTGACCTC |
| 24 | FW | TTCAGGGGTAAGGTGGGGAA |
| 24 | RV | AGTGCTCTTCGAGGTCAAAAGT |
| Int. 1 | FW | TGAACCAGGTGAGAACTTCTAGC |
| Int. 1 | RV | TGAGACCTGAGCTAGACACCA |
| Int. 2 | FW | ATTCCACACACACAACCCCTT |
| Int. 2 | RV | TGCAGGCAAGGATGGGAAATA |

2.9.2.7 Purity assessment

The purity of the samples was assessed by generating a standard curve via qRT-PCR from diluted aliquots of the purified 3C libraries. For this purpose, a two-fold serial dilution of each library was prepared, ranging from 12.5 ng/µl to 0.2 ng/µl. qRT-PCR was performed in duplicates for each dilution in a total reaction volume of 20 µl containing 1 µM of internal primer mix (FW and RV primers mixed in equimolar amounts), 1x TB Green™ and 2 µl of the respective diluted 3C library, under the conditions listed in Table 12 and using a real-time thermal cycler (LightCycler 96, Roche). Then the q-Value was calculated using Formula 4. The q-Values were used to estimate the deviation of the measured from the expected dilution factor.

Formula 4

$$q\text{Value} = 10^{\frac{Ct - \text{intercept}}{\text{slope}}}$$

Here, Ct is the mean of the Ct values (from duplicates) measured by qRT-PCR for the respective 3C library dilution, and intercept and slope are taken from the standard curve of these dilutions as calculated by the LightCycler 96 software (Roche). To fulfil the purity criteria, the deviation of the measured from the expected dilution factor should not exceed 30% [96].
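A minimal Python sketch of this purity check is given below. It assumes that the measured dilution factor is the ratio of the q-Values of two consecutive dilutions and that the deviation is expressed relative to the expected factor of 2; this interpretation, the function names and all numeric values are assumptions made for illustration and are not taken from the measured data.

```python
def q_value(ct, intercept, slope):
    """Formula 4: qValue = 10^((Ct - intercept) / slope)."""
    return 10 ** ((ct - intercept) / slope)


def dilution_deviations(ct_means, intercept, slope, expected_factor=2.0):
    """Percent deviation of the measured from the expected dilution factor.

    ct_means must be ordered from the most concentrated to the most
    diluted sample; one deviation is returned per consecutive pair.
    """
    q_values = [q_value(ct, intercept, slope) for ct in ct_means]
    deviations = []
    for higher, lower in zip(q_values, q_values[1:]):
        measured_factor = higher / lower
        deviations.append(abs(measured_factor - expected_factor)
                          / expected_factor * 100.0)
    return deviations


# Hypothetical standard curve and mean Ct values (assumptions).
deviations = dilution_deviations([24.0, 25.0, 26.0, 27.0],
                                 intercept=30.0, slope=-3.32)
passes_purity = all(d <= 30.0 for d in deviations)
print(passes_purity)  # True
```

With a slope close to -3.32, a shift of one cycle per dilution step corresponds to a measured factor close to 2, so this hypothetical library would pass the 30% criterion.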

2.9.2.8 Real-time PCR for interaction quantification

After preparation of the BAC control library and processing of the cell samples into a 3C library, the interaction frequency of the seDMR with putative target site(s) can be measured by qRT-PCR. For this purpose, the DNA concentration of the 3C library was quantified with the Qubit™ 3 and the Qubit™ dsDNA HS Assay kit. Then the 3C library was diluted to a final concentration of 12.5 ng/µl, and one aliquot of each of the three dilutions of the BAC control library, produced in step 2.9.1.4.2, was gently thawed on ice. qRT-PCR was performed in a real-time thermal cycler (LightCycler 96, Roche) in a total reaction volume of 20 µl containing 1 µM of a primer mix, 1x TB Green™ and 2 µl of DNA (3C library dilution or BAC control library dilution). The primer mix contained either the FW primer of the anchor and the FW primer of the respective test primer pair (in the following termed A/X, where A stands for the anchor primer and X for the respective test primer) or the internal primers FW and RV. Accordingly, 4 separate master mixes were prepared, each containing one DNA template rather than the primers, and were pipetted in duplicates into a 96-well plate (Roche). The qRT-PCR program for measuring the interaction frequency of the seDMR with its putative target site(s) is given in Table 12, and the raw data were processed with the LightCycler 96 software.

Using the mean of the Ct values (from duplicates) of the 3C samples measured for each primer pair (Table 14) and the standard curve obtained by qRT-PCR of the three BAC control library dilutions with the same primer pair, the q-Value for each primer pair can be calculated with Formula 4. To include a loading control, the q-Values of the A/X primer pairs were normalized against the q-Value of the internal primer pair (see Formula 5), resulting in the relative interaction frequency of the seDMR with sites on the centromeric side of the ROI.

Additionally, normalization against ERCC3, a housekeeping gene with a constant chromatin looping state, was tested in order to compare relative interaction frequencies between different cell lines. For this purpose, the primer pair A/ERCC3 was used for qRT-PCR as described above, but dilutions of the 3C library instead of the BAC library were used to create the standard curve. The q-Value was calculated as described above for 3C samples, but for normalization the q-Value of the A/ERCC3 primer pair was used instead of that of the internal primer pair. In addition to this way of normalizing against ERCC3, normalization via a ΔCt value was also tested: the ΔCt value (obtained by subtracting the mean Ct value of each A/X primer pair from the mean Ct value of the A/ERCC3 primer pair) was used instead of the Ct value in the calculation of the q-Value (Formula 4).

Formula 5

$$\text{relative interaction frequency} = \frac{q\text{Value}_{A/X}}{q\text{Value}_{Int}}$$

Table 14| List of primer pairs used for assessment of interaction frequency of seDMR to putative target site(s)

| Primer | Orientation | Sequence 5’-3’ |
|---|---|---|
| A/1 | FW | CTGCACTACTGGACTGAGGG |
| A/1 | FW | GGAACTGCCCTTGAATCTTGG |
| A/2 | FW | CTGCACTACTGGACTGAGGG |
| A/2 | FW | TGAGGGACCAGGACAAATGC |
| A/3 | FW | CTGCACTACTGGACTGAGGG |
| A/3 | FW | CTGGCGGTGATGGTTGCTA |
| A/4 | FW | CTGCACTACTGGACTGAGGG |
| A/4 | FW | GAGTTCCCAACCCTGCTCTAA |
| A/5 | FW | CTGCACTACTGGACTGAGGG |
| A/5 | FW | AGATAGCAACATAAGCAGTGGGA |
| A/6 | FW | CTGCACTACTGGACTGAGGG |
| A/6 | FW | AGGTGAAGGCAAGAAGCTCA |
| A/7 | FW | CTGCACTACTGGACTGAGGG |
| A/7 | FW | AGCATCTCAGGGCACAGACT |
| A/8 | FW | CTGCACTACTGGACTGAGGG |
| A/8 | FW | ACTGCTGAGTTCGTGGCTTT |
| A/9 | FW | CTGCACTACTGGACTGAGGG |
| A/9 | FW | AAGGAGCAGAGCCACTGAAA |
| A/10 | FW | CTGCACTACTGGACTGAGGG |
| A/10 | FW | GCTTTTCCTTGCCCACGATG |
| A/11 | FW | CTGCACTACTGGACTGAGGG |
| A/11 | FW | AAACCTGGGCTGTTAGCAACT |
| A/12 | FW | CTGCACTACTGGACTGAGGG |
| A/12 | FW | AAGCCATCACACCCAGTCA |
| A/13 | FW | CTGCACTACTGGACTGAGGG |
| A/13 | FW | GTGAAGCGGCTGTCCTATCC |
| A/14 | FW | CTGCACTACTGGACTGAGGG |
| A/14 | FW | ACCATTGCAGTGACTAAGGTGT |
| A/15 | FW | CTGCACTACTGGACTGAGGG |
| A/15 | FW | CCCTAGATTCTAACATTGGCAGC |
| A/16 | FW | CTGCACTACTGGACTGAGGG |
| A/16 | FW | TTATGATTCAGCGATTTCACCCC |
| A/17 | FW | CTGCACTACTGGACTGAGGG |
| A/17 | FW | ACAGGCAGTTCTACCGTTGG |
| A/18 | FW | CTGCACTACTGGACTGAGGG |
| A/18 | FW | ACAGGCAAATGTCTCACTCCT |
| A/19 | FW | CTGCACTACTGGACTGAGGG |
| A/19 | FW | GAAAAGAGAAAGACAGTGGGGC |
| A/20 | FW | CTGCACTACTGGACTGAGGG |
| A/20 | FW | ATGCCTGGCCCTTGATTTTG |
| A/21 | FW | CTGCACTACTGGACTGAGGG |
| A/21 | FW | TCTCTCTTGCTCACCTCCGA |
| A/22 | FW | CTGCACTACTGGACTGAGGG |
| A/22 | FW | ACTGAGGGACGACTATACTGTGT |
| A/23 | FW | CTGCACTACTGGACTGAGGG |
| A/23 | FW | AAAGGTGGTGGGACTGGTTG |
| A/24 | FW | CTGCACTACTGGACTGAGGG |
| A/24 | FW | TTCAGGGGTAAGGTGGGGAA |
| Int. 1 | FW | TGAACCAGGTGAGAACTTCTAGC |
| Int. 1 | RV | TGAGACCTGAGCTAGACACCA |
| A/ERCC3 | FW | CTGCACTACTGGACTGAGGG |
| A/ERCC3 | FW | ATGCCCACTGTATCCTCCCT |
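The normalization described in this section (q-Value of an A/X primer pair divided by the q-Value of the internal primer pair) can be sketched in Python as follows. The function names, the standard curve parameters and the Ct values are hypothetical and serve only as an illustration; the actual standard curves were generated by the LightCycler 96 software.

```python
def q_value(ct, intercept, slope):
    """qValue = 10^((Ct - intercept) / slope), using the BAC standard curve."""
    return 10 ** ((ct - intercept) / slope)


def relative_interaction_frequency(ct_ax, curve_ax, ct_int, curve_int):
    """Normalize the q-Value of an A/X primer pair to the internal primer pair.

    curve_ax and curve_int are (intercept, slope) tuples taken from the
    BAC control library standard curve of the respective primer pair.
    """
    q_ax = q_value(ct_ax, *curve_ax)
    q_int = q_value(ct_int, *curve_int)
    return q_ax / q_int


# Hypothetical mean Ct values and standard curves (assumptions).
frequency = relative_interaction_frequency(
    ct_ax=30.0, curve_ax=(30.0, -3.3),    # q-Value of A/X = 10^0 = 1
    ct_int=24.5, curve_int=(28.0, -3.5),  # q-Value of Int = 10^1 = 10
)
print(round(frequency, 3))  # 0.1
```

For the ERCC3-based normalization that was also tested, the same function could be called with the Ct value and standard curve of the A/ERCC3 primer pair in place of those of the internal primer pair.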


3 Results

3.1 seDMR in silico

3.1.1 seDMR is predicted as an enhancer region in silico

Previous findings from S. Kollmann (2017), who identified the seDMR via an enhancer-targeted search strategy, suggest that the seDMR acts as a distal enhancer. This was further supported by luciferase assays performed by S. Kollmann and S. Häsler Gunnarsdóttir, which showed enhancer activity of this region in the lung-cancer cell lines A549 and H1299 [86, 90]. To further confirm the enhancer function of the seDMR, enhancer predictions for this region were analysed in silico using the UCSC Genome Browser [106, 111] and the Transcribed Enhancer Atlas of the FANTOM5 database [63, 108].

In silico analysis using the Broad ChromHMM track of the UCSC Genome Browser illustrates the distinct chromatin state annotations at the seDMR and its surroundings. Figure 5 shows that the seDMR overlaps with a genomic region annotated as a strong enhancer in 7 out of 9 considered cell lines, including normal human lung fibroblasts (NHLF). However, the region labelled as a strong enhancer has an average length of 7 kb and is thus many times larger than the seDMR itself, which spans only 500 bp. In contrast to the other cell lines, GM12878, a lymphoblastoid cell line, and H1-hESC, a human embryonic stem cell line, are annotated as a weak or poised enhancer at the seDMR. Adjacent to the region annotated as a strong enhancer, which includes the seDMR, relatively short regions with weak or poised enhancer annotation as well as weakly transcribed genomic segments can be found (see Figure 5).

The enhancer predictions for the seDMR from FANTOM5 data are shown in Figure 6. A search for enhancers predicted by the FANTOM5 database in a slightly extended region around the seDMR returned one hit. Like the seDMR, this region is located on chromosome 1q42.3. With a position of 234857710 to 234858181 (Human GRCh37/hg19), however, the predicted enhancer is shifted 41 bp towards the centromeric side of the seDMR. Spanning 471 bp, it is also 29 bp shorter than the seDMR. Nevertheless, the predicted enhancer exhibits a sequence overlap of 86 % with the seDMR.

Besides the predicted enhancer that largely overlaps with the seDMR, two further predicted enhancers were found on the telomeric side of the seDMR, at distances of 700 bp and 4000 bp from the seDMR, respectively.
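The coordinate arithmetic behind the reported 86 % overlap can be reproduced with a short Python sketch. The seDMR coordinates used here are not stated explicitly in this section; they are derived under the assumptions that the seDMR spans exactly 500 bp, that it starts 41 bp downstream of the predicted enhancer start (the centromeric side lying at lower coordinates on the q arm), and that the overlap is expressed relative to the seDMR length. Intervals are treated as half-open.

```python
def overlap_bp(a_start, a_end, b_start, b_end):
    """Length of the overlap of two half-open intervals [start, end)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


# Predicted FANTOM5 enhancer (GRCh37/hg19), as given in the text.
enh_start, enh_end = 234857710, 234858181

# Assumed seDMR coordinates: 500 bp long, shifted by 41 bp (see lead-in).
sedmr_start = enh_start + 41
sedmr_end = sedmr_start + 500

enhancer_length = enh_end - enh_start
shared = overlap_bp(enh_start, enh_end, sedmr_start, sedmr_end)
percent_of_sedmr = 100 * shared / (sedmr_end - sedmr_start)

print(enhancer_length, shared, percent_of_sedmr)  # 471 430 86.0
```

Under these assumptions, 430 of the 500 bp of the seDMR are covered by the predicted enhancer, which agrees with the 471 bp length, the 29 bp length difference and the 86 % overlap stated above.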


Figure 5| Overview of chromatin state annotations at the seDMR obtained from Broad ChromHMM track. In A) an overview of the genomic region around the seDMR can be seen for 9 selected cell lines. Certain segments of this region are differentially coloured reflecting differential chromatin state annotations defined by the Broad ChromHMM track on the UCSC Genome Browser. In B) the key of the colour code used in A) from the Broad ChromHMM track is depicted.

Figure 6| Alignment of enhancer regions predicted by FANTOM5 with the region around the seDMR. In A) a zoomed view of the region of the seDMR is displayed, while in B) a larger region around the seDMR is shown. The red boxes label the region of the seDMR on the genomic sequence, while the green boxes depict the locations of the enhancer regions predicted by the FANTOM5 database or by the Transcribed Enhancer Atlas, respectively. The scissors mark the restriction sites of the anchor fragment.


3.1.2 HEK293T and HepG2 are promising cell lines for a comparison of the interaction pattern of the seDMR via 3C

One aim of this master thesis was to apply the 3C method, not just to a lung-cancer cell line, but also in non-lung cell lines in order to test for lung specificity. To identify suitable cell lines, which might exhibit a differential interaction pattern of the seDMR, diverse in silico analyses were performed.

First, histone modification patterns were analysed, since certain types of these marks are known to colocalize with regulatory elements such as enhancers or promoters. Thus, a cell line that shows an altered histone modification pattern at the seDMR, which correlates with a different functionality, might also show a differential interaction pattern of the seDMR. Using the Broad Histone track of the UCSC Genome Browser [106], ChIP-seq data for the histone modifications H3K4me1 and H3K27ac, marks that are used for enhancer identification, as well as data for the promoter mark H3K4me3 and the repressive mark H3K27me3 were analysed in silico for the lung cell line NHLF and 18 other cell lines, including the liver-derived cell line HepG2. A snapshot of the ChIP-seq data for these four histone marks in all analysed cell lines can be found in the appendix (see Supplementary figure 1). To define differential histone modification patterns, the cell lines were compared with NHLF, since A549 ChIP-seq data are only available in combination with dexamethasone treatment, which does not allow an unbiased interpretation. The comparison of the ChIP-seq data sets in Table 15 illustrates that the seDMR exhibits differences in H3K4me1 and H3K4me3 abundance in HepG2 cells compared to NHLF. Both histone marks show a higher occupancy at the seDMR in HepG2 than in NHLF. H3K4me1 usually colocalizes with enhancers, while H3K4me3 is a characteristic histone mark that usually labels promoters [47]. No clear difference between NHLF and HepG2 can be observed at the seDMR for H3K27me3, which is frequently found at poised enhancers, or for H3K27ac, the most prominent histone mark characterizing enhancers [47].

Table 15| Comparison of the abundance of selected histone modifications between NHLF and HepG2 at the seDMR

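| Cell line | H3K4me1 | H3K4me3 | H3K27me3 | H3K27ac |
|---|---|---|---|---|
| NHLF | + | + | + | +++ |
| HepG2 | +++ | ++ | + | +++ |

Signal strength of the respective histone modification: + low signal; ++ high signal; +++ very high signal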

Since the histone modification patterns in the Broad Histone track of the UCSC Genome Browser did not differ strongly between NHLF and the other cell lines, the enhancer expression data provided by the FANTOM5 project [63, 108] were also used to select suitable cell lines for the 3C experiment. These enhancer expression data correspond to CAGE data detected from bidirectionally transcribed eRNAs in 432 primary cells, 135 tissue samples and 241 cell lines. For simplification, the CAGE data derived from this large variety of cell types were grouped by tissue or organ [63].

Table 16 depicts the enhancer expression data for the predicted enhancer identified in section 3.1.1. Of the total CAGE tags detected for this predicted enhancer region, the highest percentages derive from samples originating from male genitals, namely penis and internal male genitalia with 12.96 % and 11.70 %, respectively. The lung is found in fourth place, contributing 9.48 % of the total CAGE tags measured for the predicted enhancer. Additionally, statistical analysis within the FANTOM5 project evaluated this enhancer as significantly overrepresented in lung, besides some other tissues and organs. Interestingly, the fractions of detected CAGE tags contributed by kidney- and liver-derived samples are much lower (1.37 % and 0.96 % of all measured tags, respectively). Furthermore, no statistically significant overrepresentation of this predicted enhancer region was determined in these two organ groups.

Table 16| Enhancer expression data derived from CAGE data of the FANTOM5 database

1 Minimal percentage of contribution of a certain organ to the CAGE tags detected in total

Thus, HepG2, a liver derived cell line, might be a suitable cell line for comparing the interaction pattern of seDMR to its target gene(s), since slight differences in the histone modification pattern could be observed. In addition, the enhancer transcription data from the FANTOM5 project suggest that the enhancer region including the seDMR is less frequently transcribed in liver than in lung.


Second, HEK293T cells represent an attractive cell line that might exhibit a differential interaction pattern of the seDMR compared to A549, because previously performed luciferase assays of the seDMR showed that the enhancer activity is significantly lower in this kidney-derived cell line than in the lung-cancer cell line A549. Additionally, as for HepG2, the enhancer region covering the seDMR is less transcribed in kidney-derived samples, which suggests that this region might exhibit less enhancer activity in kidney cell lines.

3.2 mRNA expression of seDMR adjacent genes is not affected upon HMA treatment

In the course of a study performed by S. Kollmann (2017), the seDMR was found to be hypomethylated at a certain CpG in whole blood samples from smokers as well as from lung cancer patients [86]. Subsequently, mRNA expression analysis of genes neighbouring the seDMR was conducted to identify putative target genes of the seDMR. The assumption was that target genes might be identifiable by an increased mRNA expression upon treatment with AZA or DAC. The rationale is that global hypomethylation, which includes hypomethylation of the seDMR, might activate the enhancer and thereby affect the expression of the seDMR’s target gene(s). However, those analyses did not show significant changes in mRNA expression of seDMR-adjacent genes in the tested lung-cancer cell lines A549 and H1299 [86, 90]. To confirm these previous findings, an extended mRNA expression analysis was performed in A549 and H1299 cells in this study. Cell treatments (72 h) and subsequent RNA isolation were performed by S. Rauscher (2017), who conducted two to three separate experiments using different concentrations of AZA or DAC for cell treatment (see Table 8) [110]. mRNA expression analysis was performed for the genes COA6, TARBP1, IRF2BP2, LINC01132 (lncRNA), TOMM20, RBM34, ARID4B, GGPS1 and TBCE (see Figure 1).

The results from mRNA expression analysis of each individual experiment conducted in the cell lines A549 and H1299 can be seen in the appendix (see 6.2). In general, the experiments using different sets of HMA concentrations for A549 or H1299 treatment, respectively, revealed no consistent trend regarding drug concentration and its effect on mRNA expression. Additionally, no uniform correlation regarding increase or decrease, respectively, of mRNA expression of a certain gene was determined.

Since for the experiments different sets of HMA concentrations were chosen, it is hardly possible to compare these values. However, all experiments shared one treatment concentration, namely a treatment with 0.5 µM of the corresponding HMA. Thus, the results from mRNA expression analysis from A549 and H1299 cell samples, treated with this HMA concentration were used to test reproducibility of previous findings of S. Kollmann (2017) and S. Häsler Gunnarsdóttir (2018) [86, 90].

A comparison of the mRNA expression of seDMR-surrounding genes after treatment of the lung-cancer cell lines A549 and H1299 with 0.5 µM AZA or DAC is depicted in Figure 7. In A549, slightly higher fold changes in mRNA expression were obtained with DAC than with AZA for all genes except TBCE. The opposite was detected in H1299, where only COA6 showed a higher fold change in expression upon DAC treatment than upon AZA treatment. Despite these observations, the general expression levels did not change significantly after treatment with either of the two HMAs compared to the untreated or DMSO-treated samples in both cell lines. The fold changes in expression for TARBP1, IRF2BP2, LINC01132 and TBCE show a rather high standard deviation, especially in A549 and for samples treated with DAC. The highest increase in mRNA expression can be observed in A549 for COA6, with a 1.7-fold and 1.9-fold increase in samples treated with AZA and DAC, respectively. However, this effect was not observed in H1299.


Figure 7| Results of mRNA expression analysis in A549 and H1299 The fold change in mRNA expression of genes surrounding the seDMR measured in the lung-cancer cell line A549 (upper graph) and H1299 (lower graph) is indicated. The cell cultures were treated for 72 h with 0.5 µM AZA (light grey) or 0.5 µM DAC (dark grey). The depicted values of the fold change in expression correspond to the mean of two experimental replicates. Standard deviation is depicted in black, vertical lines.


3.3 Optimization of 3C experiment in A549

Great effort was made by S. Häsler Gunnarsdóttir in establishing the 3C method in this laboratory in order to investigate the interaction pattern of seDMR in the lung-cancer cell line A549. However, the first application of this method on this cell line indicated that some further optimization of the 3C approach is still required [90].

One subject of optimization of the 3C method concerns 3C library purification. The suggested purification via the column-based Blood DNA Mini Kit (Qiagen) was inconvenient, as the column capacity (200 µl) is much too low to purify about 7 ml of 3C library sample. Stepwise loading of the sample onto the column turned out to be too laborious and drastically reduced the final yield of 3C library DNA.

However, 3C library purity is indispensable for a proper assessment of interaction frequencies by 3C [96]. Consequently, different purification methods, namely phenol-chloroform (Phe/Chl) precipitation, ethanol (EtOH) precipitation, column-based kit purification and combinations thereof, were tested in order to obtain 3C libraries pure enough to pass the purity assessment. Figure 8 shows a snapshot of the amplification curves of 3C library dilutions obtained in the purity assessment experiment by qRT-PCR with the internal primer pair 1. The two-fold dilutions of a 3C library purified either by phenol-chloroform and ethanol precipitation alone or by these methods followed by a column-based kit purification step yielded strongly varying fluorescence signals (see Figure 8 A and B). Consequently, a wide spread of the respective amplification curves along the Y-axis can be observed. Such variation indicates that different PCR efficiencies were achieved for the respective dilutions during qRT-PCR, caused by an impure DNA template. Accordingly, for some dilutions of the 3C library the measured dilution factors deviated strongly from the expected ones (see Table 17 A and B). According to Hagège et al., a maximal deviation of 30 % between expected and observed dilution factor can be tolerated [96]. Thus, phenol-chloroform and EtOH precipitation, alone as well as in combination with kit purification, turned out to be unsuitable for obtaining pure 3C libraries. In contrast, ethanol precipitation followed by an additional purification step using the column-based NucleoSpin® gDNA clean-up kit achieved the required 3C library purity. After this combination of purification steps, less variability in the fluorescence signal of the dilutions’ amplification curves can be observed (see Figure 8 C and D). Applying these purification methods to 3C library dilution series adjusted to the concentrations used for interaction frequency assessment resulted in almost no variation in fluorescence signal (Figure 8 D). Additionally, the deviation between expected and observed dilution factor was lowered markedly by the purification employing ethanol precipitation and column-based kit purification, as depicted in Table 17 D.


Figure 8| Amplification curves from 3C library dilutions obtained from LightCycler® 96 software. A snapshot of the amplification curves shows the change of the fluorescence signal over the qRT-PCR cycles using the internal primer pair 1. The fluorescence was measured for two-fold dilutions of several 3C libraries, which had previously been purified by A) phenol-chloroform precipitation followed by ethanol precipitation, B) phenol-chloroform and ethanol precipitation followed by column-based purification via the NucleoSpin® gDNA clean-up kit, C) ethanol precipitation followed by NucleoSpin® gDNA clean-up kit purification and D) ethanol precipitation followed by NucleoSpin® gDNA clean-up kit purification as in C), but using lower concentrations of the 3C library (see Table 17 D). The grey shaded triangles indicate the degree of dilution that corresponds to the amplification curves below. The wider and brighter the triangle, the higher the dilution of the 3C sample corresponding to the adjacent amplification curve.


Table 17| qRT-PCR data from purity assessment of 3C libraries

A) A549 library 1, purified by phenol-chloroform and EtOH precipitation
B) A549 library 1, purified by phenol-chloroform and EtOH precipitation followed by column-based kit purification
C) A549 library 2, purified by ethanol precipitation followed by column-based kit purification
D) A549 library 3, purified by ethanol precipitation followed by column-based kit purification, using lower concentrations (from 25 to 0.39 ng instead of 100 to 1.56 ng) of 3C library DNA

Besides 3C library purification, the generation of the BAC control library was another issue that required optimization. The first application of the 3C method, performed by S. Häsler Gunnarsdóttir (2018), revealed problems with the generation of a standard curve using the BAC control library [90]. More precisely, the assessment of the interaction frequency of the seDMR failed completely for the test primer pairs A/22, A/23 and A/24, and for many other primer pairs the amplification of the BAC control library showed unusually high values for slope and intercept, suggesting that the control library might have been too impure after phenol-chloroform and ethanol precipitation [90]. Hence, it was decided to generate a completely new BAC control library and to purify it in the same way as the 3C libraries, namely by ethanol precipitation followed by the column-based NucleoSpin® gDNA clean-up kit.

Figure 9 depicts the slope values (mean, n=7) obtained from the BAC control library standard curves generated during the 3C measurements for each A/X primer combination as well as for the internal primer pair. All primer pairs exhibit a mean slope value within the optimal range between -3.1 and -3.6 [113], or at least close to it. Furthermore, the mean intercept values, also obtained from the BAC control library standard curves, are below a Ct value of 30 and can thus be classified as reliable (see Figure 10).
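The slope range can be related to PCR amplification efficiency via the commonly used relation E = 10^(-1/slope) - 1, under which the range of -3.1 to -3.6 corresponds to roughly 110 % to 90 % efficiency. The following Python sketch illustrates this relation; the function names are hypothetical, and the relation itself is a general qPCR convention rather than a calculation reported in this study.

```python
def pcr_efficiency(slope):
    """Amplification efficiency from a standard curve slope: E = 10^(-1/slope) - 1."""
    return 10 ** (-1.0 / slope) - 1.0


def slope_in_optimal_range(slope, lower=-3.6, upper=-3.1):
    """Check whether a slope lies within the range cited from [113]."""
    return lower <= slope <= upper


# Efficiencies at the borders of the optimal range and at a hypothetical mean slope.
for slope in (-3.1, -3.32, -3.6):
    print(slope, round(pcr_efficiency(slope) * 100, 1), slope_in_optimal_range(slope))
```

A slope of about -3.32 corresponds to a doubling of the template in every cycle, i.e. an efficiency close to 100 %.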

Besides the two optimizations mentioned above, the BAC clone RPCI-11 278O12 was used instead of CITB 2604E23, which had been chosen in the protocol generated by S. Häsler Gunnarsdóttir (2018), since the presence of the insert could not be verified for the latter after isolation. Additionally, the NucleoBond® Xtra BAC kit was preferred for BAC clone isolation, because this column-based isolation kit gave higher yields than the Large Construct kit (Qiagen), which was used for the first 3C approach by S. Häsler Gunnarsdóttir (2018) [90].

With the help of these few optimizations, described above, the 3C method was further improved in this laboratory and can be applied also to other cell lines beside A549 for assessing the interaction pattern of a region of interest.

Figure 9| Overview of slope values obtained from BAC control library standard curves. The mean slope value is indicated for each single primer pair. The values were obtained from the standard curve generated by performing a qRT-PCR on dilutions of the BAC control library using the respective primer pairs. For averaging, the slope values obtained from seven experimental replicates were considered. Standard deviation is indicated by black vertical lines. The optimal range for the slope value is labelled in light green and bordered by a dotted line.

Figure 10| Overview of intercept values obtained from BAC control library standard curves. The intercept obtained from the BAC control library standard curve during the 3C measurement is indicated. The Ct values depicted constitute an average of seven measurements performed for each single test primer pair or internal primer pair, respectively. The standard deviation is depicted by black vertical lines.

3.4 Replication of previous 3C experiment on seDMR in A549

A variety of in silico and in vitro investigations of the seDMR suggest that this region acts as a distal enhancer with lung specificity [86, 90]. Nevertheless, the target gene(s) affected by this enhancer region remain to be determined. Although results from a first 3C approach performed in this laboratory showed an interaction of the seDMR with the promoter region of the protein-coding gene TARBP1, these results must be considered with caution, since neither the 3C library nor the BAC control library fulfilled all criteria for an unbiased 3C analysis [90].

Thus, after optimization of the 3C method previously established in this laboratory by S. Häsler Gunnarsdóttir (2018), the method was applied again to the lung-cancer cell line A549 in order to identify the interaction partners of the seDMR. An interaction of the seDMR with a given genomic region can be determined by detecting the ligation products obtained during 3C library preparation using qRT-PCR. After consideration of the various controls described in 1.4.2, each amplification product thus corresponds to an interaction event of the seDMR with its target site(s) [92, 96, 97].
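How a Ct value could be converted into a relative interaction frequency is sketched below. The sketch assumes the widely used 3C-qPCR scheme in which each Ct value is converted into a quantity via the BAC control library standard curve of the respective primer pair and then divided by the quantity obtained for the internal primer pair. The exact calculation and the additional controls used in this study are described in the methods and are not reproduced here. All names and numbers are hypothetical.

```python
def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve Ct = slope * log10(quantity) + intercept."""
    return 10 ** ((ct - intercept) / slope)


def relative_interaction_frequency(ct_test, curve_test, ct_internal, curve_internal):
    """Quantity of the test ligation product divided by the internal control.

    curve_test and curve_internal are (slope, intercept) tuples taken from
    the BAC control library standard curves of the respective primer pairs.
    """
    q_test = quantity_from_ct(ct_test, *curve_test)
    q_internal = quantity_from_ct(ct_internal, *curve_internal)
    return q_test / q_internal


# Placeholder standard curves and Ct values (assumptions, not thesis data).
curve_a7 = (-3.3, 25.0)        # hypothetical curve for a test primer pair
curve_internal = (-3.3, 22.0)  # hypothetical curve for the internal primer pair

rif = relative_interaction_frequency(
    ct_test=28.3, curve_test=curve_a7,
    ct_internal=22.0, curve_internal=curve_internal,
)
print(f"relative interaction frequency: {rif:.3f}")
```

Using a separate standard curve per primer pair corrects for differences in primer efficiency, while the division by the internal primer pair corrects for differences in the amount of template between libraries.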

Figure 11 shows the results of 3C of the seDMR in A549 from three experimental replicates. The corresponding intermediate results for digestion efficiency as well as the raw qRT-PCR data obtained during the 3C assessment in this cell line are given in Supplementary table 1 and Supplementary table 2.

The highest relative interaction frequency was obtained for the unidirectional primer pair A/7 (see Figure 11). Fragment 7, which contains the binding site of test primer 7, is located in an intergenic region approximately 12 kb from the nearest promoter region, namely the promoter of IRF2BP2. This gene lies on the centromeric side of this test primer (see Figure 3). On the other side of test primer 7, a lncRNA (LINC00184) is situated about 7.5 kb away. The second peak of increased relative interaction frequency was observed for fragment 17 (see Figure 11), which corresponds to an intragenic region; more precisely, this fragment covers a part of intron 4 of the protein-coding gene TARBP1. Test primer 17 hybridizes only 200 bp from exon 4 of this gene (see Figure 3). For all other tested fragments, including fragments 8, 16 and 24, which are located close to the promoter regions of IRF2BP2, TARBP1 and COA6, respectively, no outstanding interaction with the seDMR was observed. For fragments 9, 12, 5 and 4, a slightly but not significantly increased relative interaction frequency was detected (listed from highest to lowest detected value). It is striking that the three experimental replicates all share the same interaction pattern for the tested fragments; thus, the reproducibility of the 3C approach was verified using the optimized 3C protocol.

Comparing the newly obtained data with those from the first 3C application reveals a discrepancy between the identified interaction partners of the seDMR. While the 3C of the seDMR in A549 performed by S. Häsler Gunnarsdóttir revealed fragments 16 and 20 as those with the highest interaction abundance with the seDMR [90], this study detected fragments 7 and 17 as the target sites of the distal enhancer showing the highest peaks of relative interaction frequency. Furthermore, the range of the measured relative interaction frequencies differs tremendously. While four- or even five-digit values were measured for relative interaction frequencies in the very first 3C application by S. Häsler Gunnarsdóttir (2018) [90], the highest value obtained in this study was 219.5 for the relative interaction frequency of fragment 7 with the anchor, i.e. the seDMR.

Figure 11| Results from 3C experiment on seDMR in A549. The relative interaction frequency of the seDMR in A549, normalized to internal primer 1 and obtained for different test primer pairs by qRT-PCR, is indicated. Each purple-shaded line represents the result of a single experimental replicate. Below, a schematic illustration of the genomic region is depicted, corresponding to the approximate location of the test primers on genomic DNA. Protein-coding genes (grey boxes) with their orientation of transcription (grey arrows) and ncRNAs (beige boxes) with their location on genomic DNA are indicated.

3.5 3C experiment in two non-lung cell lines

After optimization of the 3C approach in this laboratory, it was of further interest whether this version of the protocol can be applied to cell lines other than A549 without problems. Additionally, this study aimed to compare the interaction pattern obtained in non-lung cells with that of this lung-cancer cell line in order to confirm the previously suggested lung specificity of the seDMR. Hence, the optimized 3C protocol was used to perform 3C experiments in a kidney cell line, HEK293T, as well as in a liver-derived cell line, HepG2. The selection of these two cell lines was based on previous findings obtained by in silico analysis of the seDMR in a variety of cell lines, described in 3.1.2.

3.5.1 3C of HEK293T and HepG2

The interaction pattern obtained from three 3C experiments on the seDMR in HEK293T is depicted in Figure 12. The corresponding data on the digestion efficiency of the tested fragments as well as the raw data obtained during 3C are given in Supplementary table 1 and Supplementary table 3. Figure 12 shows that fragment 7, representing an intergenic region distant from any promoter region, as well as fragment 17, located in intron 4 of TARBP1 in close proximity to exon 4, revealed an enhanced relative interaction frequency with the seDMR. In comparison to these two peaks, the other tested fragments show only low interaction abundance with the seDMR. Thus, the slightly increased interaction frequencies observed for fragments 9, 5, 12, 4 and 23 (listed from highest to lowest measured interaction frequency) are not significant and are negligible for the identification of interaction partners (see Supplementary table 3). Overall, the data obtained from the three experimental replicates share the same trend in the interaction pattern of the tested fragments; thus, the results can be considered trustworthy.

The application of 3C in HepG2 yielded almost the same findings regarding the detected interaction partners of the seDMR, as indicated in Figure 13. Again, outstanding peaks representing an increased interaction abundance with the seDMR were observed for fragments 7 and 17. Lower, not significantly outstanding increases in relative interaction frequency were also detected for fragments 23, 5, 9, 12 and 19 (listed from highest to lowest detected interaction frequency; see Supplementary table 4). Due to time restrictions and the fact that HepG2 grew very slowly in culture, only one 3C experiment was performed in this liver-derived cell line. However, the obtained data are considered reliable, since the results obtained from the other cell lines showed a more or less equal interaction pattern. Interestingly, the digestion conducted in the course of 3C library preparation worked best in this cell line, as indicated in Supplementary table 1.

Thus, it can be postulated that, besides A549, reliable and reproducible results were also obtained with the 3C method in two other cell lines, namely HEK293T and HepG2. Consequently, it can be confirmed that the optimized protocol is also practicable in other cell lines.

Figure 12| Results from 3C experiment of seDMR in HEK293T. The green shaded lines indicate the detected relative interaction frequency (normalized to internal primer 1) of the seDMR to the tested fragments 1 to 24 in HEK293T. Each line represents a single experimental replicate. Below the corresponding genomic region to the tested fragments is schematically depicted. Protein-coding genes are indicated by grey boxes and their orientation of transcription is depicted by grey arrows. NcRNAs are indicated by beige boxes.

Figure 13| Results from 3C experiment of seDMR in HepG2. The relative interaction frequency of the seDMR for the tested fragments 1 to 24 is indicated by a blue line. The data was obtained from one single 3C experiment in HepG2. Below a schematic depiction of the genomic region is indicated correlating approximately to the position of the tested fragments. Genes (grey boxes), ncRNAs (beige boxes) and the direction of gene transcription (grey arrows) are illustrated.

3.6 Comparison of interaction frequency patterns

Enhancers are known to shape cell-type-specific gene transcription programs, and only a selection of these regulatory elements is actually active in a given cell type [68]. Based on previous findings, the seDMR was thought to act as a lung-specific enhancer [86, 90]. Hence, it was hypothesized that an enhancer-promoter loop between the seDMR and its target site(s) can only be found in lung-cancer cell lines. In order to test this hypothesis, 3C was performed in the lung-cancer cell line A549 and in two non-lung cell lines, HEK293T and HepG2.

Previously performed luciferase assays on the seDMR in A549 and HEK293T cells revealed that this region shows a much higher enhancer activity in the lung-cancer cell line than in the kidney-derived cell line [86]. Hence, it was expected that 3C experiments conducted in these two cell lines might also reveal differences in the interaction pattern of the seDMR. Similar findings were expected for the liver-derived cell line HepG2, based on the in silico analysis of histone modification patterns at the seDMR and the enhancer prediction for this region, further described in 2.7.1 and 3.1.2. However, since these analyses revealed no clear differences in the abundance of selected histone marks around the seDMR, it was hard to estimate whether a differential interaction pattern might be observed for this cell line.

An overlay of the interaction patterns of the seDMR detected via 3C in A549, HEK293T and HepG2 for the ROI is depicted in Figure 14. All three cell lines share almost the same general interaction pattern for the 24 tested fragments. The highest peaks, representing an interaction with the anchor fragment including the seDMR, were found for fragments 7 and 17. Much lower relative interaction frequencies were measured for all other fragments, such as those located close to the promoter regions of COA6, TARBP1 and IRF2BP2.

In consideration of this finding, there is no evidence for a lung specific interaction pattern, since the two non-lung cell lines share the same pattern of seDMR interactions as the lung-cancer cell line A549.

Figure 14| Comparison of the seDMR's interaction pattern between three different cell lines. The mean relative interaction frequency (normalized to internal primer 1) of the seDMR to 24 tested fragments, assessed by 3C in A549 (purple line), HEK293T (green line) and HepG2 (blue line) is indicated. The mean was calculated using the data obtained from qRT-PCR in three experimental replicates for A549 and HEK293T, while for HepG2 only one experiment was performed, thus no mean was calculated. The standard deviation is shown by a purple or green vertical line, respectively, for the cell lines A549 or HEK293T.
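The per-fragment averaging underlying such a comparison can be sketched as follows. The sketch assumes that the standard deviation is the sample standard deviation across experimental replicates, which is not stated explicitly in the text, and it returns no standard deviation when only a single experiment is available, as for HepG2. The replicate values are placeholders chosen for illustration only and are not measured data.

```python
from statistics import mean, stdev


def summarize_replicates(replicates):
    """Return {fragment: (mean, sd)} for a list of replicate measurements.

    Each replicate is a dict mapping a fragment number to its relative
    interaction frequency. The sd is None if only one replicate exists.
    """
    summary = {}
    for fragment in sorted(replicates[0]):
        values = [replicate[fragment] for replicate in replicates]
        sd = stdev(values) if len(values) > 1 else None
        summary[fragment] = (mean(values), sd)
    return summary


def top_fragments(summary, n=2):
    """Fragments ranked by mean relative interaction frequency, highest first."""
    return sorted(summary, key=lambda f: summary[f][0], reverse=True)[:n]


# Placeholder values for three hypothetical replicates (not thesis data).
replicates = [
    {7: 200.0, 8: 10.0, 17: 100.0},
    {7: 220.0, 8: 12.0, 17: 90.0},
    {7: 210.0, 8: 14.0, 17: 110.0},
]

summary = summarize_replicates(replicates)
single = summarize_replicates([replicates[0]])  # single-experiment case
print(summary)
print(top_fragments(summary))
```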

3.7 Examination of the interaction partners of the seDMR detected by 3C

In the course of the 3C experiments, conducted in three different cell lines, a consistent finding was made regarding the interaction partners of the seDMR. The distinct loci showing an interaction with the seDMR were examined in more detail by in silico analysis using the Broad ChromHMM track at the UCSC Genome Browser as well as the GTRD database.

As previously mentioned, the highest interaction frequency with the seDMR was observed at fragment 7. This fragment, bordered by HindIII restriction sites, is located between the protein-coding gene IRF2BP2 and the ncRNA LINC00184, more precisely on chromosome 1 at position 234755802 to 234757364 (Human GRCh37/hg19). Thus, fragment 7 is located several kilobases from the promoter region of IRF2BP2. In order to analyse the prevalent chromatin state at this region, the corresponding annotations provided by the Broad ChromHMM track were investigated. Since this track provides no data for the tested cell line A549, another lung cell line, namely NHLF, was used instead to compare the annotations. Figure 15 shows that fragment 7 is categorized as a strong enhancer in HepG2, while in NHLF an additional annotation for weak enhancer activity is present at the corresponding genomic region. Unfortunately, no chromatin state data is available for kidney cell lines in this track.

Besides chromatin state annotations, it was of further interest whether certain transcription factors associated with enhancer looping can be observed at fragment 7 or in its immediate surroundings. Hence, the GTRD database was used to search for binding sites of the TFs YY1 and CTCF, both known as structural factors [80], in enhancer regions. Figure 16 indicates the binding sites found for these TFs in the investigated region. While no YY1-binding sites were found on fragment 7, three were found directly next to it on the telomeric side. CTCF was found to bind once on fragment 7, and another CTCF-binding site was seen on each of the directly adjacent fragments.

The second highest relative interaction frequency with the seDMR was measured for fragment 17 in all 3C experiments conducted in the cell lines A549, HepG2 and HEK293T. In silico analysis revealed that the genomic region of this fragment is annotated as weakly transcribed in the lung cell line NHLF. Similarly, in HepG2 it carries the chromatin state of transcriptional transition/elongation according to the Broad ChromHMM track, as seen in Figure 17. Furthermore, the genomic location of fragment 17 is illustrated there. This fragment is located in intron 4, directly adjacent to the coding region of exon 4 of the gene TARBP1, at the genomic position 234601882 to 234603092 on chromosome 1 (Human GRCh37/hg19). As for fragment 7, the GTRD database was also used for this region to analyse the abundance of YY1- and CTCF-binding sites. While no binding of the structuring TFs YY1 and CTCF was found on fragment 17 itself, regions adjacent to the detected interaction site at fragment 17 harbour binding sites for these TFs, as depicted in Figure 18.
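The genomic coordinates given for fragments 7 and 17 allow a quick check of fragment sizes and of the distance between the two interaction sites. The following sketch assumes that the stated positions are 1-based and inclusive, as displayed in the UCSC Genome Browser. The class and function names are hypothetical helpers.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # assumed 1-based, inclusive
    end: int    # assumed 1-based, inclusive

    def length(self):
        """Number of bases covered by the interval."""
        return self.end - self.start + 1


def gap_between(a, b):
    """Number of bases lying strictly between two intervals (0 if they touch or overlap)."""
    if a.chrom != b.chrom:
        raise ValueError("intervals are on different chromosomes")
    first, second = sorted((a, b), key=lambda interval: interval.start)
    return max(0, second.start - first.end - 1)


# Coordinates as stated in the text (GRCh37/hg19).
fragment_7 = Interval("chr1", 234755802, 234757364)
fragment_17 = Interval("chr1", 234601882, 234603092)

print(fragment_7.length())                   # 1563 bp
print(fragment_17.length())                  # 1211 bp
print(gap_between(fragment_7, fragment_17))  # 152709 bp
```

Under these assumptions, the two HindIII fragments are about 1.6 kb and 1.2 kb long and lie roughly 153 kb apart.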

Figure 15| Chromatin state annotations for fragment 7 from Broad ChromHMM track. The chromatin state annotations for fragment 7 are depicted for HepG2 (upper line) and NHLF (lower line). The annotations are derived from the Broad ChromHMM track at the UCSC Genome Browser. The genomic region of fragment 7 is marked by a red box. (Red line = active promoter; orange line = strong enhancer; yellow line = weak enhancer; green line = weakly transcribed).

Figure 16| Schematic illustration of TF-binding sites of YY1 and CTCF on fragment 7. Pink vertical lines indicate the location of binding sites of YY1 in the surrounding of fragment 7 (orange). CTCF-binding sites are depicted by turquoise vertical lines.

Figure 17| Chromatin state annotations from Broad ChromHMM track for fragment 17 and its genomic position. The chromatin state annotations for fragment 17 (marked by a red box) are indicated in A). The upper line represents the annotations in HepG2, the lower line those from NHLF. (Light green = weak transcription; dark green = transcriptional transition/elongation) In B) the genomic region is depicted. The grey arrow represents a part of the protein-coding gene TARBP1 with its orientation of transcription. Black boxes indicate the exons located on the depicted region.

Figure 18| Schematic illustration of TF-binding sites on fragment 17 for YY1 and CTCF. The genomic region containing fragment 17 (green) on TARBP1 (grey) is indicated. The exons of this gene are depicted by black boxes. In blue vertical lines the binding sites for CTCF are shown, while those for YY1 are illustrated by pink vertical lines.

In addition to the identified target sites of the seDMR, the anchor fragment containing the seDMR was also analysed in silico regarding the abundance of TF-binding sites for YY1 and CTCF. Figure 19 depicts the binding sites found at the anchor fragment, including the sequence of the seDMR. A large number of CTCF- and YY1-binding sites are present at this genomic region, partially covering the lncRNA LINC01132, which is also located on this fragment.

Figure 19| Schematic illustration of the distribution of TF-binding sites of YY1 and CTCF within the anchor fragment. In A) the TF-binding sites of YY1 (pink vertical lines) and CTCF (turquoise vertical lines) located on the anchor fragment (brown box) are depicted. The red box labels the distinct location of the seDMR. In B) the corresponding genomic region is indicated. The red box represents the seDMR while the beige box illustrates the localization of the lncRNA LINC01132.

In order to get an overview of the general binding frequency of the structuring TFs YY1 and CTCF at the ROI, the UCSC Genome Browser, specifically the Uniform TFBS track for the lung-cancer cell line A549, was used, since no ChIP-seq data for the corresponding TFs was available in this track for the other two cell lines (HEK293T and HepG2). Figure 20 shows a schematic illustration depicting the approximate abundance of YY1 and CTCF binding at the ROI. Several binding sites for both structuring TFs are enriched near the detected interaction sites at fragments 7 and 17, which furthermore represent sites close to the promoter regions of the protein-coding genes IRF2BP2 and TARBP1.

Figure 20| Schematic illustration of TF-binding sites of YY1 and CTCF at ROI. In the upper part of this figure the ROI, which was analysed by 3C is depicted. The grey and beige bars represent coding genes or lncRNAs, respectively. The red bar illustrates the location of the seDMR and the vertical red lines depict the sites where an interaction with the seDMR was detected via 3C in the three tested cell lines A549, HEK293T and HepG2. In the lower part of the figure the turquoise or pink bars, respectively, represent sites where some binding sites for CTCF or YY1, respectively, were found in the cell line A549 across the ROI using the UCSC genome browser, especially the Uniform TFBS track.

Discussion

4 Discussion

Increasing interest in epigenetic gene regulatory effects has led to the identification of an association between altered epigenetic regulation and cancer development. For a long time, however, the main focus was on aberrant promoter methylation, since a direct correlation between promoter hypermethylation and inactivation of tumour-suppressor genes was found [114]. More recently, enhancers have also increasingly come into the focus of cancer research. It was observed that, besides altered promoter methylation, eDMRs are also frequently found in diverse cancer entities and might serve as useful biomarkers for early cancer diagnosis [13, 115].

Analysing MeDIP-Seq data from whole blood samples of monozygotic twins discordant for smoking, S. Kollmann (2017) identified such an eDMR, later termed seDMR. He further characterized this enhancer region as hypomethylated in lung cancer patients as well as in smokers [86]. This finding emphasizes that epigenetic alterations can be acquired by an unhealthy lifestyle and might pave the way towards the onset of a malignant disease like lung-cancer. Nevertheless, the identification of this seDMR, especially the finding that this region is hypomethylated in smokers and lung-cancer patients, was made, as mentioned above, by utilizing data obtained from whole blood samples. Thus, in the next step it was aimed to further investigate this region in lung or lung tumour models, respectively.

Therefore, in the course of the last two years great effort was made to identify the seDMR’s target site(s) in order to further understand the mechanism by which this enhancer region putatively might contribute to disease, with a focus on lung cancer. A cornerstone for this was laid in the last year with the establishment of the 3C method by S. Häsler Gunnarsdóttir [90].

This study aimed to expose the interaction partners of the seDMR in order to gain further insights into the functional mechanism of this putative enhancer region. Additionally, it was of interest whether this putative enhancer function, determined by the detection of enhancer-promoter loop formation via 3C, is solely observed in lung-cancer cells or if also other non-lung cell lines share equal interaction patterns of the seDMR. To investigate this, in silico analysis, mRNA expression analysis and 3C experiments were performed.

In the following the main findings are listed:

• In silico analyses showed that the seDMR is predicted as an enhancer.
• mRNA expression analysis of two in vitro HMA-treated lung-cancer cell lines revealed that hypomethylation of the seDMR does not affect the expression of genes adjacent to this region.
• The optimized version of the 3C protocol allowed the analysis of a variety of cell lines.
• The seDMR interacts with a region in intron 4 of TARBP1 and an intergenic region between IRF2BP2 and LINC00184 in A549, HEK293T and HepG2.
• The interaction pattern of the seDMR may not be lung specific.

4.1 SeDMR’s enhancer prediction in silico

Previous in silico and in vitro investigations revealed that the seDMR exhibits enhancer activity. Especially, findings gained from luciferase assays suggested a lung specific enhancer activity of the seDMR [86, 90]. To further confirm the enhancer entity as well as lung specificity of the seDMR several analyses were performed in silico. Knowing that certain histone modifications correlate with a distinct regulatory function of the DNA on the regions where these modifications can be observed, the Broad ChromHMM track at the UCSC Genome Browser [71, 106] was used to analyse the function of the seDMR. This track annotates genomic regions corresponding to their regulatory functions based on ChIP-seq data obtained for a variety of histone marks, TF binding sites and many other factors correlating with gene regulation detected in nine selected cell lines. Using this data certain patterns regarding the abundance of these chromatin marks spread over the whole human genome can be seen. These patterns, or more precisely, certain combinations of them, are then analysed by applying a Hidden Markov Model, which enables the distinction of certain chromatin states annotating, among others, promoters and enhancers and their state of activity [71].

Using the Broad ChromHMM track to analyse the region around the seDMR regarding the predicted chromatin state it was found that in 7 out of 9 of the selected cell lines the seDMR is annotated as strong enhancer (see Figure 5). One of these cell lines, interestingly, constitutes the lung cell line NHLF, representing a lung cell line derived from normal, non-cancerous, lung tissue. Thus, it was postulated that in lung this region, including the seDMR, might act as a strong, active enhancer. However, it was striking to find the strong enhancer annotation also in a variety of non-lung cell lines, discordant to the hypothesis which assumed that the seDMR acts as a lung specific enhancer.

Besides the investigation of the chromatin state annotations of the seDMR, it was of interest how this region is categorized in the FANTOM5 database. This database, including the Transcribed Enhancer Atlas [63, 108], lists enhancers identified by the detection of bidirectional CAGE tags obtained for regions distant from gene promoters. Thus, the region around the seDMR was selected for an enhancer search using this database. Interestingly, one hit was obtained by this strategy, which largely coincides with the genomic region of the seDMR (see Figure 6). An alignment of the predicted enhancer region to the seDMR revealed an overlap of 86 %, attributing an enhancer identity to the seDMR. In addition, two further predicted enhancers were found close to the seDMR on its telomeric side. It has to be pointed out, however, that a huge variety of samples were used for enhancer identification by the FANTOM5 database, ranging from tissue or organ samples to primary cells and established cell lines. For enhancer prediction, all CAGE tags detected in these samples were pooled and taken into consideration to identify novel enhancer sites. Nevertheless, the percentage contribution of the respective samples, grouped by organ or tissue, to the total number of measured CAGE tags can be read out. By doing so, it was observed that only a minor percentage (9.84 %) of the total number of CAGE tags came from lung (see Table 16). This finding emphasizes that this region is not an enhancer exclusively in lung cells but also in a variety of other cell lines. Since eRNAs, originating from enhancer transcription, correlate with enhancer activity, it is also likely that enhancer activity can be observed in all cell lines in which CAGE tags were obtained for the seDMR [63, 64]. Thus, this finding further contradicts the original hypothesis that the seDMR acts as an enhancer exclusively in lung cells; instead, it appears to be active in a variety of tissues.
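The overlap between the predicted enhancer and the seDMR can be expressed as the fraction of the seDMR sequence that is covered by the predicted region. A minimal sketch of this calculation is given below. Since the coordinates of the two regions are not stated in this section, the intervals used are purely hypothetical and were chosen only so that the example yields 86 %.

```python
def overlap_fraction(query, target):
    """Fraction of `query` covered by `target`.

    Both arguments are (start, end) tuples on the same chromosome,
    assumed to be 1-based and inclusive.
    """
    q_start, q_end = query
    t_start, t_end = target
    overlap = min(q_end, t_end) - max(q_start, t_start) + 1
    if overlap <= 0:
        return 0.0
    return overlap / (q_end - q_start + 1)


# Hypothetical coordinates (assumptions for illustration only).
sedmr = (101, 200)               # stands in for the seDMR
predicted_enhancer = (115, 300)  # stands in for the FANTOM5 hit

fraction = overlap_fraction(sedmr, predicted_enhancer)
print(f"{fraction:.0%} of the query region is covered")
```

Note that the measure is not symmetric: the fraction of the predicted enhancer covered by the seDMR would generally differ from the fraction of the seDMR covered by the predicted enhancer.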

4.2 Selection of non-lung cell lines for 3C

Despite, or precisely because of, these findings suggesting that the seDMR might not be a lung-specific enhancer, cell lines had to be selected for comparing the interaction pattern of the seDMR between lung and non-lung cell lines by 3C. It was hypothesized that lung specificity of the seDMR as an enhancer can be determined by the exclusive detection of interactions of this enhancer region with its target gene(s) in a lung-derived cell line. In the case of lung specificity, the non-lung cell lines are expected to show no interaction pattern, or at least an altered one, in a 3C approach compared to the tested lung cell line A549.

Based on the knowledge that enhancer activity is closely related to the histone modifications present at a putative enhancer region, in silico analysis of the seDMR was employed in order to find cell lines that might not show enhancer activity at this region. The search for the perfect cell line, exhibiting no prominent enhancer-correlated histone marks, turned out to be almost impossible. Firstly, the data available for A549, the cell line used for the subsequent 3C comparison of the seDMR, could not be considered for comparing the prevalent histone marks at the seDMR, since these data might be biased by a dexamethasone treatment performed before ChIP-seq analysis. Additionally, a majority of the analysed cell lines for which ChIP-seq data was available in the Broad Histone track at the UCSC Genome Browser showed a high abundance of enhancer-related histone marks colocalizing with the seDMR. In particular, the histone modifications H3K4me1 and H3K27ac, which are associated with enhancer activity, were frequently found at the seDMR (see Supplementary figure 1). Those cells showing less coverage by these enhancer-labelling histone marks (e.g. CD14+ monocytes, K562 and HeLa-S3) were not suitable, as they were either primary cells or derived from hormone-sensitive tissues. Due to time restrictions, the availability of the respective cell lines was also considered in the selection of cell lines used for 3C comparison with A549.

Based on the observations regarding the histone marks analysed in silico, the liver derived cell line HepG2 revealed a higher abundance of H3K4 mono- and trimethylation at the seDMR, compared to the lung cell line NHLF (see Table 15). While mono-methylation of H3K4 is an indicator for enhancer activity of the respective DNA locus, trimethylation of the same lysine residue of H3 is associated with promoter function. These deviations from the NHLF cell line were not as significant as expected or desired for selection of a cell line showing opposing histone modifications at the seDMR compared to the lung cell line. However, on the basis of these little alterations HepG2 was shortlisted for the comparison of the interaction pattern with the lung cell line A549 by 3C.

In order to further strengthen the suggestion that HepG2 might show a differential interaction pattern in 3C compared to a lung cell line, the FANTOM5 database was considered. In silico analysis revealed that a genomic region overlapping 86 % of the seDMR sequence was predicted to have an enhancer function (see Figure 6) [63, 108]. Thus, it was of interest to what extent the CAGE tags detected in liver-derived cell samples contributed to the total number of CAGE tags obtained for the predicted enhancer region. While lung cell samples contributed 9.48 % of the total number of detected CAGE tags, liver-derived cell samples contributed only 0.96 % (see Table 16), suggesting that the respective enhancer might not be as active in this organ as in lung and thus might also show a differential interaction pattern in 3C of the seDMR.

Similar to this finding, also kidney derived cell samples showed only 1.37 % contribution to the total CAGE tag number obtained for the enhancer region predicted by FANTOM5 (see Table 16). Together with previous results from luciferase assays, which showed less enhancer function for the seDMR region in HEK293T cells compared to A549, this pointed to a lower enhancer activity of the seDMR in the kidney derived cell line HEK293T.

Based on these in silico investigations, the liver-cancer cell line HepG2 and the embryonic kidney cell line HEK293T were selected for validation of the lung specificity of the seDMR by 3C.

4.3 Hypomethylation of the seDMR does not affect mRNA expression of adjacent genes

The main aim of this thesis was to identify the target gene(s) affected by the seDMR. The putative function of the seDMR, namely further increasing the expression of its target gene(s), was assumed to be mediated by the formation of a transient enhancer-promoter loop. Before investigating these putative looping events using 3C, mRNA expression analysis was performed on genes surrounding the seDMR in order to get a first hint regarding the seDMR’s target gene(s) and also to assess the effect of the methylation level of the seDMR on its target genes. It was assumed that an increase in mRNA expression, caused by enhancer activity evoked by hypomethylation of the seDMR, might be measured for one or more genes surrounding the seDMR, which would thus represent putative targets of this enhancer.

Previously, the seDMR was found to be hypomethylated in smokers as well as in lung cancer patients at a specific CpG (chr1: 234857892-234857893) that is not covered by the Illumina Infinium HumanMethylation450K Bead Chip [86]. Additionally, a correlation between the seDMR’s methylation level and its enhancer activity was shown in several luciferase assays [86, 90] in the lung-cancer cell lines A549 and H1299. Thus, it was of further interest whether mRNA expression of putative target genes adjacent to the seDMR is altered upon 72 h HMA treatment in these lung-associated cell lines.

Consistent with previous studies [86, 90], extended testing of mRNA expression, using both DAC and AZA at varying concentrations, revealed no significant changes or trends towards increased or decreased expression of any of the tested genes adjacent to the seDMR (see 8.2, Figure 7).

The lack of a detectable effect of the seDMR on transcription of putative target genes might be explained by the fact that HMAs induce global hypomethylation. The cytosine analogues AZA and DAC both reduce the overall methylation level of DNA by destabilizing or inhibiting, respectively, DNMT1 during the S phase of the cell cycle [41, 116]. Global hypomethylation may lead to such a variety of methylation changes that their effects ultimately compensate each other. This may explain why no increase or decrease in mRNA expression of the seDMR’s putative target genes could be detected in this analytical setup.

To overcome the obstacle that global hypomethylation might mask the effect of the seDMR on adjacent gene transcription, targeted hypomethylation of the nine CpG sites covering the seDMR could be applied instead, enabling the investigation of the distinct and exclusive effect of this enhancer region on gene transcription.

4.4 Optimization of 3C protocol now allows application in a variety of cell lines

Over the past year, great effort was made by S. Häsler Gunnarsdóttir (2018) to establish the 3C method in this laboratory [90]. However, due to time constraints several steps of the 3C protocol still required optimization, since a first 3C approach performed in A549 did not yield entirely trustworthy results. One outstanding issue affecting the reliability of the 3C results concerned the BAC control library. Dilutions of this library are used to generate a standard curve by qRT-PCR for the anchor primer paired with each test primer (A/X). Using the slope and intercept values obtained from these standard curves, putative differences in primer efficiencies can be compensated. However, the first 3C approach on A549 cells was unsuccessful for the primer combinations A/22, A/23 and A/24, since amplification via qRT-PCR did not work. Additionally, the slope values obtained in the first 3C experiment performed by S. Häsler Gunnarsdóttir (2018) were far outside the optimal range for most of the primer pairs [90].
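The compensation of primer efficiencies described above can be sketched as follows. This is only a minimal illustration, assuming the standard-curve conversion commonly used for 3C-qPCR, in which a Ct value is converted to a relative quantity as 10^((Ct - intercept)/slope) and then divided by the corresponding value of the internal loading-control primer pair. The function names and all numbers are hypothetical and are not taken from the data of this thesis.

```python
def ct_to_quantity(ct, slope, intercept):
    """Convert a Ct value to a relative template quantity using the slope
    and intercept of the control-library standard curve (Ct plotted
    against log10 of the template amount)."""
    return 10 ** ((ct - intercept) / slope)


def relative_interaction_frequency(ct_test, curve_test, ct_internal, curve_internal):
    """Relative interaction frequency of one anchor/test primer pair (A/X),
    normalised to the internal loading-control primer pair.
    Each curve is a (slope, intercept) tuple."""
    test_quantity = ct_to_quantity(ct_test, *curve_test)
    loading_quantity = ct_to_quantity(ct_internal, *curve_internal)
    return test_quantity / loading_quantity


# Hypothetical values for illustration only (not measured data)
curve_a7 = (-3.3, 28.0)        # slope, intercept for primer pair A/7
curve_internal = (-3.4, 14.0)  # slope, intercept for the internal primer pair
rif = relative_interaction_frequency(31.3, curve_a7, 17.4, curve_internal)
print(round(rif, 3))
```

Because each primer pair is converted with its own slope and intercept, a pair that amplifies less efficiently is not automatically reported as a weaker interaction.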

To optimize the 3C protocol previously established in this laboratory, the BAC control library was freshly generated. To do this, the BAC clone CTIB 2604E23, used in the first 3C approach, was replaced by the BAC clone RPCI-11 278O12, since the presence of the insert could not be verified for the former after isolation. According to Hagège et al., gaps between or overlaps of the BAC inserts should be minimized [96], but due to the limited availability of certain BAC clones it was not possible to completely avoid gaps when selecting three BAC clones covering the ROI (see Figure 3). Hence, two gaps of approximately 1100 and 2400 bp, respectively, between the inserts of the three BAC clones had to be accepted. It must be pointed out, however, that none of the tested restriction sites lies within these gaps. Moreover, since the ROI investigated in the 3C experiment spans about 350 kb, the gaps represent only about 1 % of the analysed genomic sequence. Thus, the gaps between the BAC clone inserts can be neglected, and it can be assumed that an unbiased assessment of the interaction frequency at the tested fragments was possible despite these gaps. For BAC isolation the column-based NucleoBond® Xtra BAC kit was used instead of the Large Construct kit (Qiagen), since the yield achieved with the latter was unsatisfactory. Applying the newly generated BAC control library to the 3C experiment in seven experimental replicates showed that the slope values of all tested primer pairs lay within the optimal range or at least quite close to it (see Figure 9). Also, the amount of BAC control library used as template for qRT-PCR was chosen appropriately, since the Ct values obtained for the intercept of the control library standard curve, generated during the interaction frequency assessment, were below 30 cycles for the test primer pairs (A/X) and higher than 10 cycles for the internal primer pair, which represents a loading control (see Figure 10). Thus, the protocol for BAC control library preparation is now optimized and can be applied without difficulty in 3C experiments on varying cell lines.
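The share of the ROI that is not covered by the BAC inserts can be checked with a short calculation. The sketch below uses only the approximate gap sizes and ROI length stated in the text above; the variable names are chosen for illustration.

```python
gap_sizes_bp = [1100, 2400]  # approximate gap sizes stated in the text
roi_size_bp = 350_000        # approximate length of the ROI stated in the text

# Fraction of the ROI not covered by any of the three BAC inserts
gap_fraction_percent = 100 * sum(gap_sizes_bp) / roi_size_bp
print(f"{gap_fraction_percent:.1f} % of the ROI is not covered")
```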

Another item of the previously established 3C protocol requiring optimization was the purification of the 3C libraries, since the method recommended by S. Häsler Gunnarsdóttir (2018), using the Blood DNA Mini kit (Qiagen), turned out to be too laborious and negatively affected the final yield of the 3C library [90]. To overcome this obstacle, a variety of purification methods and their combinations were tested. A comparison of the amplification curves generated via qRT-PCR in the course of sample purity assessment showed that ethanol precipitation followed by column-based DNA purification using the NucleoSpin® gDNA clean-up kit is the most suitable procedure for obtaining pure 3C libraries. With these purification techniques, less variation in the fluorescence signal was observed, meaning that almost equal PCR efficiencies were obtained during qRT-PCR for the different dilutions of the 3C library (see Figure 8). Hence, the purity control required according to Hagège et al. was also passed, since the deviation between the expected and the observed dilution factor lay far below the maximum allowed value of 30 % [96].
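The purity criterion mentioned above can be illustrated with a small sketch. It assumes that the observed dilution factor is derived from the Ct shift between two dilutions of the same 3C library using the slope of the standard curve, and that the deviation is expressed relative to the expected dilution factor. This is one plausible way of formulating the check, not necessarily the exact calculation used in this study, and all Ct values are hypothetical.

```python
MAX_DEVIATION_PERCENT = 30.0  # threshold stated in the text [96]


def observed_dilution_factor(ct_undiluted, ct_diluted, slope):
    """Dilution factor implied by the Ct shift between two dilutions of the
    same library; the intercept of the standard curve cancels out."""
    return 10 ** ((ct_undiluted - ct_diluted) / slope)


def dilution_deviation_percent(observed, expected):
    """Deviation of the observed from the expected dilution factor, in percent."""
    return abs(observed - expected) / expected * 100


# Hypothetical example: a two-fold dilution shifts the Ct from 24.0 to 25.1
observed = observed_dilution_factor(ct_undiluted=24.0, ct_diluted=25.1, slope=-3.32)
deviation = dilution_deviation_percent(observed, expected=2.0)
passes_purity_control = deviation < MAX_DEVIATION_PERCENT
print(f"observed factor {observed:.2f}, deviation {deviation:.1f} %, pass: {passes_purity_control}")
```

A library containing PCR inhibitors would typically show a Ct shift that does not match the dilution, which increases the deviation.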

With these small changes to the protocol, the establishment of the 3C method in this laboratory was further advanced. However, further optimization is still required. One item concerns the final yield of the 3C library after purification, since for some of the 3C libraries the interaction frequency assessment could not be conducted at all, or only for a selection of test primer pairs, due to a lack of library DNA template.

4.5 Interaction patterns of seDMR assessed by 3C in different cell lines

The main aim of this thesis was the identification of the seDMR’s target gene(s) as part of the functional analysis of this regulatory region. Additionally, the assumed lung specificity of this region was to be tested. The 3C approach was chosen for these tasks, since it allows the assessment of chromosome conformation in diverse cell lines, in particular the identification of putative target genes in close physical proximity to the seDMR, and a comparison of the seDMR’s interaction patterns between the lung-cancer cell line A549 and the non-lung cell lines HepG2 and HEK293T.

4.5.1 3C in A549 – a replication of 3C after protocol optimization

A previously performed 3C experiment on the seDMR in A549 suggested that TARBP1 might be a target gene of the tested enhancer region. However, the generated data were not fully trustworthy, as discussed above in 4.4. Thus, in this study the experiment was repeated in A549 using the optimized version of the 3C protocol originally established by S. Häsler Gunnarsdóttir (2018) [90].

While the first 3C experiment revealed five fragments interacting with the seDMR, namely fragments 20, 16, 5, 7 and 9 (listed from highest to lowest detected relative interaction frequency) [90], in this study two fragments, namely fragments 7 and 17, exhibited a markedly increased relative interaction frequency with the seDMR (see Figure 11), while for fragments 4, 5 and 9 only slightly increased interaction frequencies were detected. In A549 the highest interaction value was obtained for fragment 7 by applying 3C in three experimental replicates. Interestingly, in the earlier experiment performed by S. Häsler Gunnarsdóttir the same fragment showed the fourth highest relative interaction frequency. The corresponding test primer 7 hybridizes to an intergenic region in the vicinity of the protein-coding gene IRF2BP2. Nevertheless, fragment 7 is located several kilobases away from any gene promoter or any known regulatory site. Fragment 17 represents the second genomic site at which an increased relative interaction frequency with the seDMR was obtained by 3C in this study. Like fragments 20 and 16, fragment 17 is located within TARBP1. However, while fragment 16, located directly adjacent to fragment 17, includes the promoter region of this gene, the test primers 17 and 20 hybridize to intron 4 and exon 17, respectively, of TARBP1. Thus, for these two fragments the function served by an interaction with the seDMR is unclear. Fragment 16, identified as interacting with the seDMR in the previous experiment, represents the most obvious interaction partner of the seDMR due to its location at the promoter region of TARBP1, but the interaction detected with the original, unoptimized version of the 3C protocol was not reproduced in this study. In the original study, poor slope values caused by an impure BAC control library were obtained for many primer pairs, including the one containing test primer 16 [90]. Thus, it is likely that the relative interaction frequency obtained for fragment 16 was biased. No notable interaction with the seDMR was observed for this fragment in this study, even though slope values within the optimal range or at least quite close to it were achieved. Consequently, based on the 3C data obtained for A549 in three experimental replicates, no typical enhancer-promoter looping of the seDMR was observed. Even though the results were somewhat unexpected regarding the enhancer’s target sites, one has to keep in mind that the findings were reproduced in three experimental replicates. Additionally, the detected interaction sites of the seDMR were located at least close to the promoter sites of the protein-coding genes IRF2BP2 and TARBP1. Consequently, further testing, especially at the putative target sites of the seDMR, has to be performed in order to verify the expected enhancer-promoter looping.

4.5.2 Comparison of the interaction pattern between lung and non-lung cell lines

A comparison of the results obtained from the 3C experiments conducted in the three tested cell lines reveals that all of them share an almost identical interaction pattern of the seDMR across the ROI. In each cell line, whether derived from lung, liver or kidney, a significantly increased relative interaction frequency of the seDMR was detected for fragments 7 and 17 (see Figure 14). Thus, it can be postulated that the general interaction pattern of the seDMR is not specific to lung. Nevertheless, it must be considered for data interpretation that only relative interaction frequencies were obtained after loading control normalization using the internal primer pair. Consequently, the obtained values do not correlate directly with the actual number of interaction events. Only the exact number of interaction events for each tested fragment would allow a comparison of the real abundance of seDMR interactions with the corresponding fragments between different cell lines or differentially treated cells. This is also why no standard deviations are depicted for the averaged relative interaction values in Figure 14, since otherwise the data might be misinterpreted. By implementing an additional normalization using a control region that is known to show no or at least constant interaction frequencies in all cell types, a direct comparison of the actual number of interactions, correlating with enhancer activity, could be obtained for each tested fragment. Considering this, it can be suggested that even though the general interaction pattern was shown not to be lung-specific, the interactions with fragments 7 and 17 might be more frequent in the lung-cancer cell line than in the non-lung cell lines. Thus, a higher activity of the seDMR might be observed in the lung cell line than in the other two cell lines. To confirm this assumption, normalization against a constantly interacting region, for instance at the housekeeping gene ERCC3, should be performed. To do so, a completely new 3C experiment would have to be designed targeting the region surrounding the ERCC3 gene, including the generation of another BAC control library for the corresponding region.
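The additional normalization proposed above can be sketched as follows. The example assumes that, for each cell line, a relative interaction frequency has also been measured at a control region (for instance around ERCC3) and that every fragment value is simply divided by it. The function name and all numbers are hypothetical and serve only to show how identical relative patterns could translate into different normalised values.

```python
def normalise_to_control_region(frequencies, control_frequency):
    """Divide the relative interaction frequency of each fragment by the
    frequency measured at a control region in the same cell line."""
    return {fragment: value / control_frequency
            for fragment, value in frequencies.items()}


# Hypothetical relative interaction frequencies (identical pattern in both cell lines)
a549 = {"fragment 7": 3.0, "fragment 17": 1.5}
hek293t = {"fragment 7": 3.0, "fragment 17": 1.5}

# Hypothetical control-region frequencies measured in each cell line
a549_normalised = normalise_to_control_region(a549, control_frequency=0.5)
hek293t_normalised = normalise_to_control_region(hek293t, control_frequency=1.0)

for fragment in a549:
    print(fragment, a549_normalised[fragment], hek293t_normalised[fragment])
```

In this constructed case the pattern is the same in both cell lines, but after normalization the values for the first cell line are twice as high, which is the kind of difference the current loading-control normalization cannot reveal.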

4.6 Evaluation of the detected interaction partners of the seDMR

4.6.1 Reliability of the generated data

A previously performed 3C test approach, aiming to find the target gene(s) of the seDMR, did not yield trustworthy results. The main reason why the obtained data cannot be rated as reliable is that unusually high slope values were obtained in qRT-PCR for the control library’s standard curves [90]. Since these values are used to correct for differential primer efficiencies in the course of interaction assessment by 3C [96, 97], this is assumed to have resulted in falsely elevated relative interaction frequencies for certain fragments [90]. This assumption prompted further optimization of the 3C protocol previously established in this laboratory, including the generation of a completely new BAC control library.

In this study, the highest relative interaction frequency of the seDMR was detected in all three tested cell lines with the primer pair A/7. Interestingly, fragment 7 had already attracted attention in the first test approach of the 3C experiment in A549 conducted by S. Häsler Gunnarsdóttir (2018). In that approach, however, fragment 7 showed only the fourth highest interaction frequency with the seDMR. Nevertheless, an analysis of the slope values obtained in that experiment for the fragments with higher interaction frequencies shows that fragment 7 is the first one exhibiting an increased relative interaction frequency in combination with a trustworthy slope value. Based on this finding, and on the fact that the same result was obtained in several independently conducted experiments in this study, an actual interaction of the seDMR with fragment 7 is quite likely.

The same can be assumed for the interaction of the seDMR with fragment 17, for which unfortunately no data are available from the first 3C test approach. Nevertheless, the slope values obtained for this fragment in the seven conducted experiments all lay around or even within the optimal range between -3.1 and -3.6 [113].
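The meaning of this slope range can be made explicit with the standard relation between the slope of a qPCR standard curve and the amplification efficiency, E = 10^(-1/slope) - 1. The following sketch applies this general relation to the limits of the range quoted above; the function names are chosen for illustration.

```python
OPTIMAL_SLOPE_RANGE = (-3.6, -3.1)  # range quoted in the text [113]


def pcr_efficiency_percent(slope):
    """Amplification efficiency in percent derived from the slope of a
    standard curve (Ct plotted against log10 of the template amount)."""
    return (10 ** (-1 / slope) - 1) * 100


def slope_in_optimal_range(slope):
    """True if the slope lies within the optimal range quoted in the text."""
    low, high = OPTIMAL_SLOPE_RANGE
    return low <= slope <= high


for slope in (-3.1, -3.32, -3.6):
    print(slope, round(pcr_efficiency_percent(slope), 1), slope_in_optimal_range(slope))
```

A slope of about -3.32 corresponds to a doubling of the product in every cycle, and the limits of the range correspond to efficiencies of roughly 110 % and 90 %.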

Considering the slope values obtained for each tested fragment, one can assume that reliable 3C data were generated in all 3C experiments conducted in this study in three different cell lines. Nevertheless, it has to be pointed out that in the course of digestion efficiency assessment the recommended minimum digestion efficiency of 60 % was not achieved for some fragments [96]. Unfortunately, one of these is fragment 7, for which a digestion efficiency below the recommended percentage was obtained for 3C libraries of A549 and HEK293T samples (see Supplementary table 1). In contrast, in HepG2, the cell line with the overall best digestion efficiencies, fragment 7 also reached the recommended efficiency. Since both fragment 7 (poor digestion efficiency) and fragment 17 (good digestion efficiency) showed an increased interaction frequency, it might be suggested that the digestion efficiency only slightly influences the final outcome regarding the relative interaction frequency. This assumption is also supported by the fact that in HepG2, where consistently good digestion efficiencies were observed, the same results concerning the seDMR’s target sites were obtained.
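For orientation, a digestion efficiency of this kind can be derived from Ct values. The sketch below assumes the Ct-based formula commonly used with 3C-qPCR protocols, in which a primer pair spanning the restriction site (R) is compared with a control primer pair that does not span a site (C) in a digested and an undigested sample. Whether exactly this calculation was applied here is an assumption, and all Ct values are hypothetical.

```python
MIN_DIGESTION_EFFICIENCY = 60.0  # recommended minimum stated in the text [96]


def digestion_efficiency_percent(ct_r_dig, ct_c_dig, ct_r_und, ct_c_und):
    """Percentage of restriction sites cut, from Ct values of a site-spanning
    primer pair (R) and a control primer pair (C) measured in the digested
    (dig) and the undigested (und) sample."""
    delta = (ct_r_dig - ct_c_dig) - (ct_r_und - ct_c_und)
    return 100 - 100 / 2 ** delta


# Hypothetical Ct values: digestion delays the site-spanning amplicon by 4 cycles
efficiency = digestion_efficiency_percent(ct_r_dig=27.0, ct_c_dig=22.0,
                                          ct_r_und=23.0, ct_c_und=22.0)
print(f"{efficiency:.2f} %, sufficient: {efficiency >= MIN_DIGESTION_EFFICIENCY}")
```

With this formula, a delay of only one cycle would correspond to 50 % digestion and would therefore fall below the recommended minimum.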

Finally, the anchor fragment not only covers the seDMR but also harbours a major part of LINC01132 (see Figure 19). Thus, it cannot be completely excluded that the interaction was evoked by the genomic region of the corresponding lncRNA instead of the seDMR. To test this, a different experimental design of the 3C approach would be required.

4.6.2 In silico analysis of the detected interaction sites of seDMR

In silico analysis revealed that the interaction of the seDMR with fragment 7 does not correspond to a typical interaction with a promoter site, but to an interaction with an intergenic region located several kilobases away from the promoter region of the gene IRF2BP2 on the centromeric side and the ncRNA LINC00184 on the telomeric side. Fragment 7 is in fact annotated as a strong enhancer. Thus, it was of further interest which function might be served by this type of enhancer looping.

As already mentioned in 1.2, enhancer looping might not be an active process; instead, it is more likely that in the course of constant rearrangement of the chromatin the accidental encounter of the enhancer and its target site is transiently fixed [80]. This short fixation may result from the dimerization of certain TFs, such as YY1 and CTCF, which are bound at the respective sites and interact with each other [66, 80]. Interestingly, in silico analysis using the GTRD database revealed that fragment 7 as well as the anchor fragment, which contains the seDMR, indeed exhibit binding motifs for the TFs YY1 and CTCF, among many other TFs (see Figure 16 and Figure 19). These two TFs are well known to be involved in chromatin structuring by supporting, through dimerization, the formation of the enhancer loop towards its target site [80, 117]. The fact that both fragment 7 and the anchor fragment exhibit binding sites for these two TFs makes it likely that the detected interaction was stabilized by these factors. However, given that the interaction at fragment 7 does not correspond to the expected loop formation towards a gene promoter, and that no alternative TSS (transcription start site) could be found at this site, it remains unclear which function is served by this type of interaction with an intergenic site.

Regarding fragment 17, which showed the second highest relative interaction frequency with the seDMR in all 3C experiments, a closer look at the genomic region of this fragment revealed that it is located in intron 4 of the protein-coding gene TARBP1. Interestingly, however, this restriction fragment, which harbours the binding site of test primer 17, lies only a few base pairs away from the gene’s fourth exon. It is therefore not surprising that the corresponding genomic region is annotated as a transcribed region by the Broad ChromHMM track. As for fragment 7, this interaction does not represent the expected enhancer-promoter interaction but an enhancer-gene body interaction. Thus, one can assume that looping of the seDMR to the gene body of TARBP1 might in some way support or promote transcriptional elongation. To test this, additional functional analysis is required. On the other hand, it is also important to mention that fragment 17 is directly adjacent to fragment 16, which harbours the promoter region of TARBP1. Thus, it cannot be completely excluded that the interaction with fragment 17 also reflects an enhancer-promoter interaction.

4.6.3 TARBP1 – the putative target gene of the seDMR

The trans-activation response RNA-binding protein (TARBP1) gene, more precisely its intron 4 or exon 4, was found in this study to represent a target site of the seDMR. The gene’s product, a cellular dsRNA binding protein, regulates the expression of the human immunodeficiency virus type 1 (HIV-1) promoter as well as those from other viruses like the simian virus 40 (SV40) [118, 119]. Thereby this protein acts as an iRNA and in addition represents a co-factor for DICER, hence contributing to miRNA formation. Interestingly, in the context of lung cancer an increased expression level of the TARBP1 gene was recently found to serve as a potential biomarker enabling earlier diagnosis of this malignant disease [120]. This makes it tempting to speculate that the seDMR, which was found to be hypomethylated in lung cancer patients and smokers, might contribute to this change towards an increased gene expression of TARBP1.


4.7 Outlook

In order to confirm and expand the knowledge gained from this study so far, the following further investigations should be carried out.

One subject that requires further work is the mRNA expression analysis of the genes surrounding the seDMR, since no notable changes were observed even though an interaction of this enhancer region with TARBP1 was detected. Thus, this experiment should be repeated with a changed experimental design. To analyse specifically the effect of seDMR hypomethylation on gene transcription, it might be useful to induce hypomethylation of the seDMR with a targeted method instead of globally acting HMAs. In this way, the exclusive effect of the seDMR on gene expression could be assessed without a putative compensating effect caused by global methylation changes. Furthermore, given the finding that TARBP1 represents an interaction partner of the seDMR, new primers hybridizing downstream of intron 4 of the gene, the detected site of seDMR interaction, should be designed in order to examine the effect of the seDMR interaction on gene expression. If these changes in the experimental setup lead to the detection of a change in mRNA expression of the respective gene, this may provide a hint towards a so far unknown alternative TSS as well as evidence that the interaction of the seDMR with the fourth intron of TARBP1 leads to increased gene transcription.

Next, the main subject for further development is obviously the 3C approach. First, it should be pointed out that the method’s protocol requires further optimization. During 3C library preparation for the final interaction assessment, the problem arose that too little library DNA remained after purification via ethanol precipitation and kit purification. Thus, it is recommended to test other purification methods in order to increase the final yield of 3C library DNA.

For interaction assessment by 3C, Naumova et al. recommended the use of unidirectional primer pairs in order to exclude amplification of head-to-head ligation products [95]. To confirm the observed interaction of the seDMR with fragments 7 and 17, it would be interesting to test whether the use of reverse primers instead of forward primers for the interaction assessment leads to the same results. In doing so, one has to consider that the tested fragments are not identical to those tested with the other primer orientation. However, they at least represent the restriction fragments immediately to the right of the previously tested ones. Another strategy for confirming the 3C results would be to use other techniques. For instance, 4C, a one-versus-all approach, would provide a genome-wide interaction map of the seDMR and its targets. Alternatively, the 3C data could be verified using a completely different method. For instance, the detection of proteins bound to the site of interaction might support a correct interpretation of the obtained results. Furthermore, as already mentioned, to further confirm the interaction partners of the seDMR determined by 3C it might be necessary to design another 3C experiment using a different anchor primer than the one used in this study. In this way, the possibility that the measured interactions were evoked by the lncRNA LINC01132 instead of the seDMR could be excluded. In addition, a novel design of the 3C experiment, for instance using a four-base cutter restriction enzyme instead of the six-base cutter HindIII, would provide a higher resolution, since smaller restriction fragments would be available for testing. Consequently, fragments around the detected interaction sites as well as at the promoter sites of the tested genes could be analysed more precisely regarding their interaction frequency with the seDMR. In addition, the use of another restriction enzyme would allow the generation of an anchor fragment containing exclusively one predicted enhancer region.

A further major question of this thesis was the confirmation of the lung specificity of the seDMR. Since the interaction patterns of the seDMR were shown to be almost identical in the lung cell line A549 and in the non-lung cell lines HEK293T and HepG2, one can assume that this enhancer function is not specific to lung. Nevertheless, applying the 3C method to another lung-cancer cell line, for instance H1299, or to the non-tumour lung cell line BEAS-2B might be useful to further evaluate lung specificity. Additionally, there is an urgent need for an additional normalization that enables a direct comparison between different cell lines regarding the actual number of interaction events for a given fragment. Therefore, another 3C experiment has to be designed for a region preferably surrounding a housekeeping gene like ERCC3. When choosing a suitable region, one has to consider that the 3C has to be applied to a genomic region showing no or at least a constant interaction pattern with the tested anchor fragment. This would enable a quantitative comparison of the interaction frequencies with the seDMR and might thus provide further insights into the seDMR’s activity in different cell lines and a better understanding of the specificity of this enhancer region.


References

5 References

1. Allis, C.D. and T. Jenuwein, The molecular hallmarks of epigenetic control. Nat Rev Genet, 2016. 17(8): p. 487-500.
2. Goldberg, A.D., C.D. Allis, and E. Bernstein, Epigenetics: a landscape takes shape. Cell, 2007. 128(4): p. 635-8.
3. Felsenfeld, G., A brief history of epigenetics. Cold Spring Harb Perspect Biol, 2014. 6(1).
4. Waddington, C.H., The epigenotype. 1942. Int J Epidemiol, 2012. 41(1): p. 10-3.
5. Slack, J.M., Conrad Hal Waddington: the last Renaissance biologist? Nat Rev Genet, 2002. 3(11): p. 889-95.
6. Tronick, E. and R.G. Hunter, Waddington, Dynamic Systems, and Epigenetics. Front Behav Neurosci, 2016. 10: p. 107.
7. Flavahan, W.A., E. Gaskell, and B.E. Bernstein, Epigenetic plasticity and the hallmarks of cancer. Science, 2017. 357(6348).
8. Holliday, R., The inheritance of epigenetic defects. Science, 1987. 238(4824): p. 163-70.
9. Sharma, S., T.K. Kelly, and P.A. Jones, Epigenetics in cancer. Carcinogenesis, 2010. 31(1): p. 27-36.
10. Suzuki, M.M. and A. Bird, DNA methylation landscapes: provocative insights from epigenomics. Nat Rev Genet, 2008. 9(6): p. 465-76.
11. Jones, P.A. and S.B. Baylin, The fundamental role of epigenetic events in cancer. Nat Rev Genet, 2002. 3(6): p. 415-28.
12. You, J.S. and P.A. Jones, Cancer genetics and epigenetics: two sides of the same coin? Cancer Cell, 2012. 22(1): p. 9-20.
13. Bell, R.E., T. Golan, D. Sheinboim, H. Malcov, D. Amar, A. Salamon, T. Liron, S. Gelfman, Y. Gabet, R. Shamir, and C. Levy, Enhancer methylation dynamics contribute to cancer plasticity and patient mortality. Genome Res, 2016. 26(5): p. 601-11.
14. Bird, A., DNA methylation patterns and epigenetic memory. Genes Dev, 2002. 16(1): p. 6-21.
15. Bestor, T.H., The DNA methyltransferases of mammals. Hum Mol Genet, 2000. 9(16): p. 2395-402.
16. Jones, P.A. and G. Liang, Rethinking how DNA methylation patterns are maintained. Nat Rev Genet, 2009. 10(11): p. 805-11.
17. Jones, P.A., Functions of DNA methylation: islands, start sites, gene bodies and beyond. Nat Rev Genet, 2012. 13(7): p. 484-92.
18. Ito, S., L. Shen, Q. Dai, S.C. Wu, L.B. Collins, J.A. Swenberg, C. He, and Y. Zhang, Tet proteins can convert 5-methylcytosine to 5-formylcytosine and 5-carboxylcytosine. Science, 2011. 333(6047): p. 1300-3.
19. Hill, P.W., R. Amouroux, and P. Hajkova, DNA demethylation, Tet proteins and 5-hydroxymethylcytosine in epigenetic reprogramming: an emerging complex story. Genomics, 2014. 104(5): p. 324-33.


20. Varriale, A. and G. Bernardi, Distribution of DNA methylation, CpGs, and CpG islands in human isochores. Genomics, 2010. 95(1): p. 25-8.
21. Lim, D.H.K. and E.R. Maher, DNA methylation: a form of epigenetic control of gene expression. The Obstetrician & Gynaecologist, 2010. 12(1): p. 37-42.
22. Bird, A.P., CpG-rich islands and the function of DNA methylation. Nature, 1986. 321(6067): p. 209-13.
23. Long, M.D., D.J. Smiraglia, and M.J. Campbell, The Genomic Impact of DNA CpG Methylation on Gene Expression; Relationships in Prostate Cancer. Biomolecules, 2017. 7(1).
24. Lovkvist, C., I.B. Dodd, K. Sneppen, and J.O. Haerter, DNA methylation in human epigenomes depends on local topology of CpG sites. Nucleic Acids Res, 2016. 44(11): p. 5123-32.
25. Ehrlich, M., M.A. Gama-Sosa, L.H. Huang, R.M. Midgett, K.C. Kuo, R.A. McCune, and C. Gehrke, Amount and distribution of 5-methylcytosine in human DNA from different types of tissues of cells. Nucleic Acids Res, 1982. 10(8): p. 2709-21.
26. Antequera, F., Structure, function and evolution of CpG island promoters. Cell Mol Life Sci, 2003. 60(8): p. 1647-58.
27. Williams, K., J. Christensen, and K. Helin, DNA methylation: TET proteins-guardians of CpG islands? EMBO Rep, 2011. 13(1): p. 28-35.
28. Bird, A., M. Taggart, M. Frommer, O.J. Miller, and D. Macleod, A fraction of the mouse genome that is derived from islands of nonmethylated, CpG-rich DNA. Cell, 1985. 40(1): p. 91-9.
29. Saxonov, S., P. Berg, and D.L. Brutlag, A genome-wide analysis of CpG dinucleotides in the human genome distinguishes two distinct classes of promoters. Proc Natl Acad Sci U S A, 2006. 103(5): p. 1412-7.
30. Bogdanovic, O. and G.J. Veenstra, DNA methylation and methyl-CpG binding proteins: developmental requirements and function. Chromosoma, 2009. 118(5): p. 549-65.
31. Plass, C. and P.D. Soloway, DNA methylation, imprinting and cancer. Eur J Hum Genet, 2002. 10(1): p. 6-16.
32. Cotton, A.M., E.M. Price, M.J. Jones, B.P. Balaton, M.S. Kobor, and C.J. Brown, Landscape of DNA methylation on the X chromosome reflects CpG density, functional chromatin state and X-chromosome inactivation. Hum Mol Genet, 2015. 24(6): p. 1528-39.
33. Sharp, A.J., E. Stathaki, E. Migliavacca, M. Brahmachary, S.B. Montgomery, Y. Dupre, and S.E. Antonarakis, DNA methylation profiles of human active and inactive X chromosomes. Genome Res, 2011. 21(10): p. 1592-600.
34. Lee, K.W. and Z. Pausova, Cigarette smoking and DNA methylation. Front Genet, 2013. 4: p. 132.
35. Lim, U. and M.-A. Song, Dietary and Lifestyle Factors of DNA Methylation, in Cancer Epigenetics: Methods and Protocols, R.G. Dumitrescu and M. Verma, Editors. 2012, Humana Press: Totowa, NJ. p. 359-376.

36. Breitling, L.P., R. Yang, B. Korn, B. Burwinkel, and H. Brenner, Tobacco-smoking-related differential DNA methylation: 27K discovery and replication. Am J Hum Genet, 2011. 88(4): p. 450-7. 37. Sieber, O.M., S.R. Tomlinson, and I.P. Tomlinson, Tissue, cell and stage specificity of (epi)mutations in cancers. Nat Rev Cancer, 2005. 5(8): p. 649-55. 38. Bohl, S.R., L. Bullinger, and F.G. Rucker, Epigenetic therapy: azacytidine and decitabine in acute myeloid leukemia. Expert Rev Hematol, 2018. 11(5): p. 361-371. 39. Pleyer, L. and R. Greil, Digging deep into "dirty" drugs - modulation of the methylation machinery. Drug Metab Rev, 2015. 47(2): p. 252-79. 40. Navada, S.C., J. Steinmann, M. Lubbert, and L.R. Silverman, Clinical development of demethylating agents in hematology. J Clin Invest, 2014. 124(1): p. 40-6. 41. Hagemann, S., O. Heil, F. Lyko, and B. Brueckner, Azacytidine and decitabine induce gene-specific and non-random DNA demethylation in human cancer cell lines. PLoS One, 2011. 6(3): p. e17388. 42. Lei, Y., X. Zhang, J. Su, M. Jeong, M.C. Gundry, Y.-H. Huang, Y. Zhou, W. Li, and M.A. Goodell, Targeted DNA methylation in vivo using an engineered dCas9-MQ1 fusion protein. Nature Communications, 2017. 8: p. 16026. 43. Luger, K., A.W. Mader, R.K. Richmond, D.F. Sargent, and T.J. Richmond, Crystal structure of the nucleosome core particle at 2.8 A resolution. Nature, 1997. 389(6648): p. 251-60. 44. Lawrence, M., S. Daujat, and R. Schneider, Lateral Thinking: How Histone Modifications Regulate Gene Expression. Trends Genet, 2016. 32(1): p. 42-56. 45. Kouzarides, T., Chromatin modifications and their function. Cell, 2007. 128(4): p. 693-705. 46. Creyghton, M.P., A.W. Cheng, G.G. Welstead, T. Kooistra, B.W. Carey, E.J. Steine, J. Hanna, M.A. Lodato, G.M. Frampton, P.A. Sharp, L.A. Boyer, R.A. Young, and R. Jaenisch, Histone H3K27ac separates active from poised enhancers and predicts developmental state. Proc Natl Acad Sci U S A, 2010. 107(50): p. 21931-6. 47.
Calo, E. and J. Wysocka, Modification of enhancer chromatin: what, how, and why? Mol Cell, 2013. 49(5): p. 825-37. 48. Greer, E.L. and Y. Shi, Histone methylation: a dynamic mark in health, disease and inheritance. Nature reviews. Genetics, 2012. 13(5): p. 343-357. 49. Quina, A.S., M. Buschbeck, and L. Di Croce, Chromatin structure and epigenetics. Biochem Pharmacol, 2006. 72(11): p. 1563-9. 50. Palazzo, A.F. and E.S. Lee, Non-coding RNA: what is functional and what is junk? Front Genet, 2015. 6: p. 2. 51. Kopp, F. and J.T. Mendell, Functional Classification and Experimental Dissection of Long Noncoding RNAs. Cell, 2018. 172(3): p. 393-407. 52. Djebali, S., C.A. Davis, A. Merkel, A. Dobin, T. Lassmann, A. Mortazavi, A. Tanzer, J. Lagarde, W. Lin, F. Schlesinger, C. Xue, G.K. Marinov, J. Khatun, B.A. Williams, C. Zaleski, J. Rozowsky, M. Roder, F. Kokocinski, R.F. Abdelhamid, T. Alioto, I. Antoshechkin, M.T. Baer, N.S. Bar, P. Batut, K. Bell, I. Bell, S. Chakrabortty, X.

Chen, J. Chrast, J. Curado, T. Derrien, J. Drenkow, E. Dumais, J. Dumais, R. Duttagupta, E. Falconnet, M. Fastuca, K. Fejes-Toth, P. Ferreira, S. Foissac, M.J. Fullwood, H. Gao, D. Gonzalez, A. Gordon, H. Gunawardena, C. Howald, S. Jha, R. Johnson, P. Kapranov, B. King, C. Kingswood, O.J. Luo, E. Park, K. Persaud, J.B. Preall, P. Ribeca, B. Risk, D. Robyr, M. Sammeth, L. Schaffer, L.H. See, A. Shahab, J. Skancke, A.M. Suzuki, H. Takahashi, H. Tilgner, D. Trout, N. Walters, H. Wang, J. Wrobel, Y. Yu, X. Ruan, Y. Hayashizaki, J. Harrow, M. Gerstein, T. Hubbard, A. Reymond, S.E. Antonarakis, G. Hannon, M.C. Giddings, Y. Ruan, B. Wold, P. Carninci, R. Guigo, and T.R. Gingeras, Landscape of transcription in human cells. Nature, 2012. 489(7414): p. 101-8. 53. Sato, F., S. Tsuchiya, S.J. Meltzer, and K. Shimizu, MicroRNAs and epigenetics. The FEBS Journal, 2011. 278(10): p. 1598-1609. 54. Nakahara, K. and R.W. Carthew, Expanding roles for miRNAs and siRNAs in cell regulation. Current Opinion in Cell Biology, 2004. 16(2): p. 127-133. 55. Louro, R., A.S. Smirnova, and S. Verjovski-Almeida, Long intronic noncoding RNA transcription: expression noise or expression choice? Genomics, 2009. 93(4): p. 291-8. 56. DiStefano, J.K., The Emerging Role of Long Noncoding RNAs in Human Disease, in Disease Gene Identification: Methods and Protocols, J.K. DiStefano, Editor. 2018, Springer New York: New York, NY. p. 91-110. 57. Sanchez Calle, A., Y. Kawamura, Y. Yamamoto, F. Takeshita, and T. Ochiya, Emerging roles of long non-coding RNA in cancer. Cancer science, 2018. 109(7): p. 2093-2100. 58. Geisler, S. and J. Coller, RNA in unexpected places: long non-coding RNA functions in diverse cellular contexts. Nature reviews. Molecular cell biology, 2013. 14(11): p. 699-712. 59. Morlando, M. and A. Fatica, Alteration of Epigenetic Regulation by Long Noncoding RNAs in Cancer. International journal of molecular sciences, 2018. 19(2): p. 570. 60. Hanly, D.J., M. Esteller, and M. 
Berdasco, Interplay between long non-coding RNAs and epigenetic machinery: emerging targets in cancer? Philosophical transactions of the Royal Society of London. Series B, Biological sciences, 2018. 373(1748): p. 20170074. 61. Latos, P.A., F.M. Pauler, M.V. Koerner, H.B. Şenergin, Q.J. Hudson, R.R. Stocsits, W. Allhoff, S.H. Stricker, R.M. Klement, K.E. Warczok, K. Aumayr, P. Pasierbek, and D.P. Barlow, Airn Transcriptional Overlap, But Not Its lncRNA Products, Induces Imprinted Igf2r Silencing. Science, 2012. 338(6113): p. 1469. 62. Kim, T.-K., M. Hemberg, and J.M. Gray, Enhancer RNAs: a class of long noncoding RNAs synthesized at enhancers. Cold Spring Harbor perspectives in biology. 7(1): p. a018622-a018622. 63. Andersson, R., C. Gebhard, I. Miguel-Escalada, I. Hoof, J. Bornholdt, M. Boyd, Y. Chen, X. Zhao, C. Schmidl, T. Suzuki, E. Ntini, E. Arner, E. Valen, K. Li, L. Schwarzfischer, D. Glatz, J. Raithel, B. Lilje, N. Rapin, F.O. Bagger, M. Jørgensen,

P.R. Andersen, N. Bertin, O. Rackham, A.M. Burroughs, J.K. Baillie, Y. Ishizu, Y. Shimizu, E. Furuhata, S. Maeda, Y. Negishi, C.J. Mungall, T.F. Meehan, T. Lassmann, M. Itoh, H. Kawaji, N. Kondo, J. Kawai, A. Lennartsson, C.O. Daub, P. Heutink, D.A. Hume, T.H. Jensen, H. Suzuki, Y. Hayashizaki, F. Müller, F.C. The, A.R.R. Forrest, P. Carninci, M. Rehli, and A. Sandelin, An atlas of active enhancers across human cell types and tissues. Nature, 2014. 507: p. 455. 64. Mikhaylichenko, O., V. Bondarenko, D. Harnett, I.E. Schor, M. Males, R.R. Viales, and E.E.M. Furlong, The degree of enhancer or promoter activity is reflected by the levels and directionality of eRNA transcription. Genes & development, 2018. 32(1): p. 42-57. 65. Ko, J.Y., S. Oh, and K.H. Yoo, Functional Enhancers As Master Regulators of Tissue-Specific Gene Regulation and Cancer Development. Molecules and cells, 2017. 40(3): p. 169-177. 66. Pennacchio, L.A., W. Bickmore, A. Dean, M.A. Nobrega, and G. Bejerano, Enhancers: five essential questions. Nature reviews. Genetics, 2013. 14(4): p. 288-295. 67. The, E.P.C., I. Dunham, A. Kundaje, S.F. Aldred, P.J. Collins, C.A. Davis, F. Doyle, C.B. Epstein, S. Frietze, J. Harrow, R. Kaul, J. Khatun, B.R. Lajoie, S.G. Landt, B.-K. Lee, F. Pauli, K.R. Rosenbloom, P. Sabo, A. Safi, A. Sanyal, N. Shoresh, J.M. Simon, L. Song, N.D. Trinklein, R.C. Altshuler, E. Birney, J.B. Brown, C. Cheng, S. Djebali, X. Dong, I. Dunham, J. Ernst, T.S. Furey, M. Gerstein, B. Giardine, M. Greven, R.C. Hardison, R.S. Harris, J. Herrero, M.M. Hoffman, S. Iyer, M. Kellis, J. Khatun, P. Kheradpour, A. Kundaje, T. Lassmann, Q. Li, X. Lin, G.K. Marinov, A. Merkel, A. Mortazavi, S.C.J. Parker, T.E. Reddy, J. Rozowsky, F. Schlesinger, R.E. Thurman, J. Wang, L.D. Ward, T.W. Whitfield, S.P. Wilder, W. Wu, H.S. Xi, K.Y. Yip, J. Zhuang, B.E. Bernstein, E. Birney, I. Dunham, E.D. Green, C. Gunter, M. Snyder, M.J. Pazin, R.F. Lowdon, L.A.L. Dillon, L.B. Adams, C.J. Kelly, J. Zhang, J.R. Wexler, E.D. 
Green, P.J. Good, E.A. Feingold, B.E. Bernstein, E. Birney, G.E. Crawford, J. Dekker, L. Elnitski, P.J. Farnham, M. Gerstein, M.C. Giddings, T.R. Gingeras, E.D. Green, R. Guigó, R.C. Hardison, T.J. Hubbard, M. Kellis, W.J. Kent, J.D. Lieb, E.H. Margulies, R.M. Myers, M. Snyder, J.A. Stamatoyannopoulos, S.A. Tenenbaum, Z. Weng, K.P. White, B. Wold, J. Khatun, Y. Yu, J. Wrobel, B.A. Risk, H.P. Gunawardena, H.C. Kuiper, C.W. Maier, L. Xie, X. Chen, M.C. Giddings, B.E. Bernstein, C.B. Epstein, N. Shoresh, J. Ernst, P. Kheradpour, T.S. Mikkelsen, S. Gillespie, A. Goren, O. Ram, X. Zhang, L. Wang, R. Issner, M.J. Coyne, T. Durham, M. Ku, T. Truong, L.D. Ward, R.C. Altshuler, M.L. Eaton, M. Kellis, S. Djebali, C.A. Davis, A. Merkel, A. Dobin, T. Lassmann, A. Mortazavi, A. Tanzer, J. Lagarde, W. Lin, F. Schlesinger, C. Xue, G.K. Marinov, J. Khatun, B.A. Williams, C. Zaleski, J. Rozowsky, M. Röder, F. Kokocinski, R.F. Abdelhamid, T. Alioto, I. Antoshechkin, M.T. Baer, P. Batut, I. Bell, K. Bell, S. Chakrabortty, X. Chen, J. Chrast, J. Curado, T. Derrien, J. Drenkow, E. Dumais, J. Dumais, R. Duttagupta, M. Fastuca, K. Fejes-Toth, P. Ferreira, S. Foissac, M.J. Fullwood, H. Gao, D. Gonzalez, A. Gordon, H.P. Gunawardena, C. Howald, S. Jha, R. Johnson, P. Kapranov, B. King, C. Kingswood, G. Li, O.J. Luo, E. Park, J.B. Preall, K. Presaud, P. Ribeca, B.A. Risk,

D. Robyr, X. Ruan, M. Sammeth, K.S. Sandhu, L. Schaeffer, L.-H. See, A. Shahab, J. Skancke, A.M. Suzuki, H. Takahashi, H. Tilgner, D. Trout, N. Walters, H. Wang, J. Wrobel, Y. Yu, Y. Hayashizaki, J. Harrow, M. Gerstein, T.J. Hubbard, A. Reymond, S.E. Antonarakis, G.J. Hannon, M.C. Giddings, Y. Ruan, B. Wold, P. Carninci, R. Guigó, T.R. Gingeras, K.R. Rosenbloom, C.A. Sloan, K. Learned, V.S. Malladi, M.C. Wong, G.P. Barber, M.S. Cline, T.R. Dreszer, S.G. Heitner, D. Karolchik, W.J. Kent, V.M. Kirkup, L.R. Meyer, J.C. Long, M. Maddren, B.J. Raney, T.S. Furey, L. Song, L.L. Grasfeder, P.G. Giresi, B.-K. Lee, A. Battenhouse, N.C. Sheffield, J.M. Simon, K.A. Showers, A. Safi, D. London, A.A. Bhinge, C. Shestak, M.R. Schaner, S. Ki Kim, Z.Z. Zhang, P.A. Mieczkowski, J.O. Mieczkowska, Z. Liu, R.M. McDaniell, Y. Ni, N.U. Rashid, M.J. Kim, S. Adar, Z. Zhang, T. Wang, D. Winter, D. Keefe, E. Birney, V.R. Iyer, J.D. Lieb, G.E. Crawford, G. Li, K.S. Sandhu, M. Zheng, P. Wang, O.J. Luo, A. Shahab, M.J. Fullwood, X. Ruan, Y. Ruan, R.M. Myers, F. Pauli, B.A. Williams, J. Gertz, G.K. Marinov, T.E. Reddy, J. Vielmetter, E. Partridge, D. Trout, K.E. Varley, C. Gasper, A. Bansal, S. Pepke, P. Jain, H. Amrhein, K.M. Bowling, M. Anaya, M.K. Cross, B. King, M.A. Muratet, I. Antoshechkin, K.M. Newberry, K. McCue, A.S. Nesmith, K.I. Fisher-Aylor, B. Pusey, G. DeSalvo, S.L. Parker, S. Balasubramanian, N.S. Davis, S.K. Meadows, T. Eggleston, C. Gunter, J.S. Newberry, S.E. Levy, D.M. Absher, A. Mortazavi, W.H. Wong, B. Wold, M.J. Blow, A. Visel, L.A. Pennachio, L. Elnitski, E.H. Margulies, S.C.J. Parker, H.M. Petrykowska, A. Abyzov, B. Aken, D. Barrell, G. Barson, A. Berry, A. Bignell, V. Boychenko, G. Bussotti, J. Chrast, C. Davidson, T. Derrien, G. Despacio-Reyes, M. Diekhans, I. Ezkurdia, A. Frankish, J. Gilbert, J.M. Gonzalez, E. Griffiths, R. Harte, D.A. Hendrix, C. Howald, T. Hunt, I. Jungreis, M. Kay, E. Khurana, F. Kokocinski, J. Leng, M.F. Lin, J. Loveland, Z. Lu, D. Manthravadi, M. 
Mariotti, J. Mudge, G. Mukherjee, C. Notredame, B. Pei, J.M. Rodriguez, G. Saunders, A. Sboner, S. Searle, C. Sisu, C. Snow, C. Steward, A. Tanzer, E. Tapanari, M.L. Tress, M.J. van Baren, N. Walters, S. Washietl, L. Wilming, A. Zadissa, Z. Zhang, M. Brent, D. Haussler, M. Kellis, A. Valencia, M. Gerstein, A. Reymond, R. Guigó, J. Harrow, T.J. Hubbard, S.G. Landt, S. Frietze, A. Abyzov, N. Addleman, R.P. Alexander, R.K. Auerbach, S. Balasubramanian, K. Bettinger, N. Bhardwaj, A.P. Boyle, A.R. Cao, P. Cayting, A. Charos, Y. Cheng, C. Cheng, C. Eastman, G. Euskirchen, J.D. Fleming, F. Grubert, L. Habegger, M. Hariharan, A. Harmanci, S. Iyengar, V.X. Jin, K.J. Karczewski, M. Kasowski, P. Lacroute, H. Lam, N. Lamarre-Vincent, J. Leng, J. Lian, M. Lindahl-Allen, R. Min, B. Miotto, H. Monahan, Z. Moqtaderi, X.J. Mu, H. O’Geen, Z. Ouyang, D. Patacsil, B. Pei, D. Raha, L. Ramirez, B. Reed, J. Rozowsky, A. Sboner, M. Shi, C. Sisu, T. Slifer, H. Witt, L. Wu, X. Xu, K.-K. Yan, X. Yang, K.Y. Yip, Z. Zhang, K. Struhl, S.M. Weissman, M. Gerstein, P.J. Farnham, M. Snyder, S.A. Tenenbaum, L.O. Penalva, F. Doyle, S. Karmakar, S.G. Landt, R.R. Bhanvadia, A. Choudhury, M. Domanus, L. Ma, J. Moran, D. Patacsil, T. Slifer, A. Victorsen, X. Yang, M. Snyder, K.P. White, T. Auer, L. Centanin, M. Eichenlaub, F. Gruhl, S. Heermann, B. Hoeckendorf, D. Inoue, T. Kellner, S. Kirchmaier, C. Mueller, R. Reinhardt, L. Schertel, S. Schneider, R. Sinn, B. Wittbrodt, J. Wittbrodt, Z. Weng, T.W.

Whitfield, J. Wang, P.J. Collins, S.F. Aldred, N.D. Trinklein, E.C. Partridge, R.M. Myers, J. Dekker, G. Jain, B.R. Lajoie, A. Sanyal, G. Balasundaram, D.L. Bates, R. Byron, T.K. Canfield, M.J. Diegel, D. Dunn, A.K. Ebersol, T. Frum, K. Garg, E. Gist, R.S. Hansen, L. Boatman, E. Haugen, R. Humbert, G. Jain, A.K. Johnson, E.M. Johnson, T.V. Kutyavin, B.R. Lajoie, K. Lee, D. Lotakis, M.T. Maurano, S.J. Neph, F.V. Neri, E.D. Nguyen, H. Qu, A.P. Reynolds, V. Roach, E. Rynes, P. Sabo, M.E. Sanchez, R.S. Sandstrom, A. Sanyal, A.O. Shafer, A.B. Stergachis, S. Thomas, R.E. Thurman, B. Vernot, J. Vierstra, S. Vong, H. Wang, M.A. Weaver, Y. Yan, M. Zhang, J.M. Akey, M. Bender, M.O. Dorschner, M. Groudine, M.J. MacCoss, P. Navas, G. Stamatoyannopoulos, R. Kaul, J. Dekker, J.A. Stamatoyannopoulos, I. Dunham, K. Beal, A. Brazma, P. Flicek, J. Herrero, N. Johnson, D. Keefe, M. Lukk, N.M. Luscombe, D. Sobral, J.M. Vaquerizas, S.P. Wilder, S. Batzoglou, A. Sidow, N. Hussami, S. Kyriazopoulou-Panagiotopoulou, M.W. Libbrecht, M.A. Schaub, A. Kundaje, R.C. Hardison, W. Miller, B. Giardine, R.S. Harris, W. Wu, P.J. Bickel, B. Banfai, N.P. Boley, J.B. Brown, H. Huang, Q. Li, J.J. Li, W.S. Noble, J.A. Bilmes, O.J. Buske, M.M. Hoffman, A.D. Sahu, P.V. Kharchenko, P.J. Park, D. Baker, J. Taylor, Z. Weng, S. Iyer, X. Dong, M. Greven, X. Lin, J. Wang, H.S. Xi, J. Zhuang, M. Gerstein, R.P. Alexander, S. Balasubramanian, C. Cheng, A. Harmanci, L. Lochovsky, R. Min, X.J. Mu, J. Rozowsky, K.-K. Yan, K.Y. Yip and E. Birney, An integrated encyclopedia of DNA elements in the human genome. Nature, 2012. 489: p. 57. 68. Heinz, S., C.E. Romanoski, C. Benner, and C.K. Glass, The selection and function of cell type-specific enhancers. Nature Reviews Molecular Cell Biology, 2015. 16: p. 144. 69. Heintzman, N.D., G.C. Hon, R.D. Hawkins, P. Kheradpour, A. Stark, L.F. Harp, Z. Ye, L.K. Lee, R.K. Stuart, C.W. Ching, K.A. Ching, J.E. Antosiewicz-Bourget, H. Liu, X. Zhang, R.D. Green, V.V. Lobanenkov, R. 
Stewart, J.A. Thomson, G.E. Crawford, M. Kellis, and B. Ren, Histone modifications at human enhancers reflect global cell-type-specific gene expression. Nature, 2009. 459(7243): p. 108-112. 70. Hu, Z. and W.-W. Tee, Enhancers and chromatin structures: regulatory hubs in gene expression and diseases. Bioscience Reports, 2017. 37(2). 71. Ernst, J., P. Kheradpour, T.S. Mikkelsen, N. Shoresh, L.D. Ward, C.B. Epstein, X. Zhang, L. Wang, R. Issner, M. Coyne, M. Ku, T. Durham, M. Kellis, and B.E. Bernstein, Mapping and analysis of chromatin state dynamics in nine human cell types. Nature, 2011. 473: p. 43. 72. Pradeepa, M.M., Causal role of histone acetylations in enhancer function. Transcription, 2016. 8(1): p. 40-47. 73. Whyte, Warren A., David A. Orlando, D. Hnisz, Brian J. Abraham, Charles Y. Lin, Michael H. Kagey, Peter B. Rahl, Tong I. Lee, and Richard A. Young, Master Transcription Factors and Mediator Establish Super-Enhancers at Key Cell Identity Genes. Cell, 2013. 153(2): p. 307-319. 74. Sur, I. and J. Taipale, The role of enhancers in cancer. Nature Reviews Cancer, 2016. 16: p. 483.

75. Shin, H.Y., Targeting Super-Enhancers for Disease Treatment and Diagnosis. Molecules and cells, 2018. 41(6): p. 506-514. 76. Sengupta, S. and R.E. George, Super-Enhancer-Driven Transcriptional Dependencies in Cancer. Trends in Cancer, 2017. 3(4): p. 269-281. 77. Yuan, C., H. Hu, M. Kuang, Z. Chen, X. Tao, S. Fang, Y. Sun, Y. Zhang, and H. Chen, Super enhancer associated RAI14 is a new potential biomarker in lung adenocarcinoma. Oncotarget, 2017. 8(62): p. 105251-105261. 78. Krivega, I. and A. Dean, Enhancer and promoter interactions—long distance calls. Current Opinion in Genetics & Development, 2012. 22(2): p. 79-85. 79. Williamson, I., R. Eskeland, L.A. Lettice, A.E. Hill, S. Boyle, G.R. Grimes, R.E. Hill, and W.A. Bickmore, Anterior-posterior differences in HoxD chromatin topology in limb development. Development, 2012. 139(17): p. 3157. 80. Weintraub, A.S., C.H. Li, A.V. Zamudio, A.A. Sigova, N.M. Hannett, D.S. Day, B.J. Abraham, M.A. Cohen, B. Nabet, D.L. Buckley, Y.E. Guo, D. Hnisz, R. Jaenisch, J.E. Bradner, N.S. Gray, and R.A. Young, YY1 Is a Structural Regulator of Enhancer-Promoter Loops. Cell, 2017. 171(7): p. 1573-1588 e28. 81. Ferlay, J.E., M.; Colombet, M.; Mery, L.; Pineros, M.; Znaor, A.; Soerjomataram, I.; Bray, F. Global Cancer Observatory: Cancer Today. 2018 [cited 15.11.2018]; Available from: https://gco.iarc.fr/today. 82. Duruisseaux, M. and M. Esteller, Lung cancer epigenetics: From knowledge to applications. Seminars in Cancer Biology, 2018. 51: p. 116-128. 83. Fasanelli, F., L. Baglietto, E. Ponzi, F. Guida, G. Campanella, M. Johansson, K. Grankvist, M. Johansson, M.B. Assumma, A. Naccarati, M. Chadeau-Hyam, U. Ala, C. Faltus, R. Kaaks, A. Risch, B. De Stavola, A. Hodge, G.G. Giles, M.C. Southey, C.L. Relton, P.C. Haycock, E. Lund, S. Polidoro, T.M. Sandanger, G. Severi, and P. Vineis, Hypomethylation of smoking-related genes is associated with future lung cancer in four prospective cohorts. Nature communications, 2015. 6: p. 
10192-10192. 84. Shenker, N.S., R. Brown, J.M. Flanagan, C. Sacerdote, F. Ricceri, S. Polidoro, K. van Veldhoven, P. Vineis, M.G. Belvisi, and M.A. Birrell, Epigenome-wide association study in the European Prospective Investigation into Cancer and Nutrition (EPIC-Turin) identifies novel genetic loci associated with smoking. Human Molecular Genetics, 2012. 22(5): p. 843-851. 85. Tsai, P.C., C.A. Glastonbury, M.N. Eliot, S. Bollepalli, I. Yet, J.E. Castillo-Fernandez, E. Carnero-Montoro, T. Hardiman, T.C. Martin, A. Vickers, M. Mangino, K. Ward, K.H. Pietilainen, P. Deloukas, T.D. Spector, A. Vinuela, E.B. Loucks, M. Ollikainen, K.T. Kelsey, K.S. Small, and J.T. Bell, Smoking induces coordinated DNA methylation and gene expression changes in adipose tissue with consequences for metabolic health. Clin Epigenetics, 2018. 10(1): p. 126. 86. Kollmann, S., Smoking-induced Differentially Methylated Regions as potential Lung Cancer Risk Markers and their Functional Analysis, in Faculty of Natural Sciences. 2017, Paris-Lodron-University Salzburg. 87. Nair, M., Epigenomic profiling of lung cancer: Molecular characterization of lung tumor types and lung cancer risk. 2017, Medical Faculty of Heidelberg

88. Taiwo, O., G.A. Wilson, T. Morris, S. Seisenberger, W. Reik, D. Pearce, S. Beck, and L.M. Butcher, Methylome analysis using MeDIP-seq with low DNA concentrations. Nature Protocols, 2012. 7: p. 617. 89. Dally, H., L. Edler, B. Jager, P. Schmezer, B. Spiegelhalder, H. Dienemann, P. Drings, V. Schulz, K. Kayser, H. Bartsch, and A. Risch, The CYP3A4*1B allele increases risk for small cell lung cancer: effect of gender and smoking dose. Pharmacogenetics, 2003. 13(10): p. 607-18. 90. Häsler Gunnarsdóttir, S., The Functional Analysis of a Smoking-induced Differentially Methylated Region on Chromosome 1, in Faculty of Natural Sciences. 2018, Paris-Lodron-University Salzburg. 91. Davies, J.O., J.M. Telenius, S.J. McGowan, N.A. Roberts, S. Taylor, D.R. Higgs, and J.R. Hughes, Multiplexed analysis of chromosome conformation at vastly improved sensitivity. Nat Methods, 2016. 13(1): p. 74-80. 92. Dekker, J., K. Rippe, M. Dekker, and N. Kleckner, Capturing chromosome conformation. Science, 2002. 295(5558): p. 1306-11. 93. Stamatoyannopoulos, J., Connecting the regulatory genome. Nat Genet, 2016. 48(5): p. 479-80. 94. Splinter, E., F. Grosveld, and W. de Laat, 3C technology: analyzing the spatial organization of genomic loci in vivo. Methods Enzymol, 2004. 375: p. 493-507. 95. Naumova, N., E.M. Smith, Y. Zhan, and J. Dekker, Analysis of long-range chromatin interactions using Chromosome Conformation Capture. Methods, 2012. 58(3): p. 192-203. 96. Hagege, H., P. Klous, C. Braem, E. Splinter, J. Dekker, G. Cathala, W. de Laat, and T. Forne, Quantitative analysis of chromosome conformation capture assays (3C-qPCR). Nat Protoc, 2007. 2(7): p. 1722-33. 97. Dekker, J., The three 'C' s of chromosome conformation capture: controls, controls, controls. Nat Methods, 2006. 3(1): p. 17-21. 98. Tsai, Y.C., N.E. Cooke, and S.A. Liebhaber, Long-range looping of a locus control region drives tissue-specific chromatin packing within a multigene cluster. Nucleic Acids Res, 2016. 44(10): p. 
4651-64. 99. Deng, W., J. Lee, H. Wang, J. Miller, A. Reik, P.D. Gregory, A. Dean, and G.A. Blobel, Controlling long-range genomic interactions at a native locus by targeted tethering of a looping factor. Cell, 2012. 149(6): p. 1233-44. 100. Denker, A. and W. de Laat, The second decade of 3C technologies: detailed insights into nuclear organization. Genes Dev, 2016. 30(12): p. 1357-82. 101. Simonis, M., P. Klous, E. Splinter, Y. Moshkin, R. Willemsen, E. de Wit, B. van Steensel, and W. de Laat, Nuclear organization of active and inactive chromatin domains uncovered by chromosome conformation capture-on-chip (4C). Nat Genet, 2006. 38(11): p. 1348-54. 102. de Wit, E. and W. de Laat, A decade of 3C technologies: insights into nuclear organization. Genes Dev, 2012. 26(1): p. 11-24. 103. Dostie, J., T.A. Richmond, R.A. Arnaout, R.R. Selzer, W.L. Lee, T.A. Honan, E.D. Rubio, A. Krumm, J. Lamb, C. Nusbaum, R.D. Green, and J. Dekker, Chromosome

Conformation Capture Carbon Copy (5C): a massively parallel solution for mapping interactions between genomic elements. Genome research, 2006. 16(10): p. 1299-1309. 104. Lieberman-Aiden, E., N.L. van Berkum, L. Williams, M. Imakaev, T. Ragoczy, A. Telling, I. Amit, B.R. Lajoie, P.J. Sabo, M.O. Dorschner, R. Sandstrom, B. Bernstein, M.A. Bender, M. Groudine, A. Gnirke, J. Stamatoyannopoulos, L.A. Mirny, E.S. Lander, and J. Dekker, Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome. Science, 2009. 326(5950): p. 289. 105. Fullwood, M.J., M.H. Liu, Y.F. Pan, J. Liu, H. Xu, Y.B. Mohamed, Y.L. Orlov, S. Velkov, A. Ho, P.H. Mei, E.G.Y. Chew, P.Y.H. Huang, W.-J. Welboren, Y. Han, H.S. Ooi, P.N. Ariyaratne, V.B. Vega, Y. Luo, P.Y. Tan, P.Y. Choy, K.D.S.A. Wansa, B. Zhao, K.S. Lim, S.C. Leow, J.S. Yow, R. Joseph, H. Li, K.V. Desai, J.S. Thomsen, Y.K. Lee, R.K.M. Karuturi, T. Herve, G. Bourque, H.G. Stunnenberg, X. Ruan, V. Cacheux-Rataboul, W.-K. Sung, E.T. Liu, C.-L. Wei, E. Cheung, and Y. Ruan, An oestrogen-receptor-alpha-bound human chromatin interactome. Nature, 2009. 462(7269): p. 58-64. 106. Kent, W.J., C.W. Sugnet, T.S. Furey, K.M. Roskin, T.H. Pringle, A.M. Zahler, and D. Haussler, The human genome browser at UCSC. Genome Res, 2002. 12(6): p. 996-1006. 107. Ernst, J. and M. Kellis, Discovery and characterization of chromatin states for systematic annotation of the human genome. Nature Biotechnology, 2010. 28: p. 817. 108. The, F.C., R.P. the, Clst, A.R.R. Forrest, H. Kawaji, M. Rehli, J. Kenneth Baillie, M.J.L. de Hoon, V. Haberle, T. Lassmann, I.V. Kulakovskiy, M. Lizio, M. Itoh, R. Andersson, C.J. Mungall, T.F. Meehan, S. Schmeier, N. Bertin, M. Jørgensen, E. Dimont, E. Arner, C. Schmidl, U. Schaefer, Y.A. Medvedeva, C. Plessy, M. Vitezic, J. Severin, C.A. Semple, Y. Ishizu, R.S. Young, M. Francescatto, I. Alam, D. Albanese, G.M. Altschuler, T. Arakawa, J.A.C. Archer, P. Arner, M. Babina, S. Rennie, P.J. Balwierz, A.G. 
Beckhouse, S. Pradhan-Bhatt, J.A. Blake, A. Blumenthal, B. Bodega, A. Bonetti, J. Briggs, F. Brombacher, A. Maxwell Burroughs, A. Califano, C.V. Cannistraci, D. Carbajo, Y. Chen, M. Chierici, Y. Ciani, H.C. Clevers, E. Dalla, C.A. Davis, M. Detmar, A.D. Diehl, T. Dohi, F. Drabløs, A.S.B. Edge, M. Edinger, K. Ekwall, M. Endoh, H. Enomoto, M. Fagiolini, L. Fairbairn, H. Fang, M.C. Farach-Carson, G.J. Faulkner, A.V. Favorov, M.E. Fisher, M.C. Frith, R. Fujita, S. Fukuda, C. Furlanello, M. Furuno, J.-i. Furusawa, T.B. Geijtenbeek, A.P. Gibson, T. Gingeras, D. Goldowitz, J. Gough, S. Guhl, R. Guler, S. Gustincich, T.J. Ha, M. Hamaguchi, M. Hara, M. Harbers, J. Harshbarger, A. Hasegawa, Y. Hasegawa, T. Hashimoto, M. Herlyn, K.J. Hitchens, S.J. Ho Sui, O.M. Hofmann, I. Hoof, F. Hori, L. Huminiecki, K. Iida, T. Ikawa, B.R. Jankovic, H. Jia, A. Joshi, G. Jurman, B. Kaczkowski, C. Kai, K. Kaida, A. Kaiho, K. Kajiyama, M. Kanamori-Katayama, A.S. Kasianov, T. Kasukawa, S. Katayama, S. Kato, S. Kawaguchi, H. Kawamoto, Y.I. Kawamura, T. Kawashima, J.S. Kempfle, T.J. Kenna, J. Kere, L.M. Khachigian, T. Kitamura, S. Peter Klinken, A.J. Knox, M.

Kojima, S. Kojima, N. Kondo, H. Koseki, S. Koyasu, S. Krampitz, A. Kubosaki, A.T. Kwon, J.F.J. Laros, W. Lee, A. Lennartsson, K. Li, B. Lilje, L. Lipovich, A. Mackay-sim, R.-i. Manabe, J.C. Mar, B. Marchand, A. Mathelier, N. Mejhert, A. Meynert, Y. Mizuno, D.A. de Lima Morais, H. Morikawa, M. Morimoto, K. Moro, E. Motakis, H. Motohashi, C.L. Mummery, M. Murata, S. Nagao-Sato, Y. Nakachi, F. Nakahara, T. Nakamura, Y. Nakamura, K. Nakazato, E. van Nimwegen, N. Ninomiya, H. Nishiyori, S. Noma, T. Nozaki, S. Ogishima, N. Ohkura, H. Ohmiya, H. Ohno, M. Ohshima, M. Okada-Hatakeyama, Y. Okazaki, V. Orlando, D.A. Ovchinnikov, A. Pain, R. Passier, M. Patrikakis, H. Persson, S. Piazza, J.G.D. Prendergast, O.J.L. Rackham, J.A. Ramilowski, M. Rashid, T. Ravasi, P. Rizzu, M. Roncador, S. Roy, M.B. Rye, E. Saijyo, A. Sajantila, A. Saka, S. Sakaguchi, M. Sakai, H. Sato, H. Satoh, S. Savvi, A. Saxena, C. Schneider, E.A. Schultes, G.G. Schulze-Tanzil, A. Schwegmann, T. Sengstag, G. Sheng, H. Shimoji, Y. Shimoni, J.W. Shin, C. Simon, D. Sugiyama, T. Sugiyama, M. Suzuki, N. Suzuki, R.K. Swoboda, P.A.C. ’t Hoen, M. Tagami, N. Takahashi, J. Takai, H. Tanaka, H. Tatsukawa, Z. Tatum, M. Thompson, H. Toyoda, T. Toyoda, E. Valen, M. van de Wetering, L.M. van den Berg, R. Verardo, D. Vijayan, I.E. Vorontsov, W.W. Wasserman, S. Watanabe, C.A. Wells, L.N. Winteringham, E. Wolvetang, E.J. Wood, Y. Yamaguchi, M. Yamamoto, M. Yoneda, Y. Yonekura, S. Yoshida, S.E. Zabierowski, P.G. Zhang, X. Zhao, S. Zucchelli, K.M. Summers, H. Suzuki, C.O. Daub, J. Kawai, P. Heutink, W. Hide, T.C. Freeman, B. Lenhard, V.B. Bajic, M.S. Taylor, V.J. Makeev, A. Sandelin, D.A. Hume, P. Carninci and Y. Hayashizaki, A promoter-level mammalian expression atlas. Nature, 2014. 507: p. 462. 109. Yevshin, I., R. Sharipov, T. Valeev, A. Kel, and F. Kolpakov, GTRD: a database of transcription factor binding sites identified by ChIP-seq experiments. Nucleic Acids Research, 2017. 45(D1): p. D61-D67. 110.
Rauscher, S., Effects of hypomethylating agents on the methylation on epigenetic regulator gene, in Faculty of Natural Science. 2017, Paris-Lodron-University Salzburg. 111. Raney, B.J., T.R. Dreszer, G.P. Barber, H. Clawson, P.A. Fujita, T. Wang, N. Nguyen, B. Paten, A.S. Zweig, D. Karolchik, and W.J. Kent, Track data hubs enable visualization of user-defined genome-wide annotations on the UCSC Genome Browser. Bioinformatics, 2014. 30(7): p. 1003-5. 112. Ea, V., F. Court, and T. Forne, Quantitative Analysis of Intra-chromosomal Contacts: The 3C-qPCR Method. Methods Mol Biol, 2017. 1589: p. 75-88. 113. qPCR slope to efficiency. [cited 05.12.2018]; Available from: http://www.labtools.us/qpcr-slope-to-efficiency/. 114. Esteller, M., Epigenetics in cancer. N Engl J Med, 2008. 358(11): p. 1148-59. 115. Aran, D. and A. Hellman, Unmasking risk loci: DNA methylation illuminates the biology of cancer predisposition: analyzing DNA methylation of transcriptional enhancers reveals missed regulatory links between cancer risk loci and genes. Bioessays, 2014. 36(2): p. 184-90. 116. Wolff, F., M. Leisch, R. Greil, A. Risch, and L. Pleyer, The double-edged sword of (re)expression of genes by hypomethylating agents: from viral mimicry to

exploitation as priming agents for targeted immune checkpoint modulation. Cell Commun Signal, 2017. 15(1): p. 13. 117. Ren, G., W. Jin, K. Cui, J. Rodrigez, G. Hu, Z. Zhang, D.R. Larson, and K. Zhao, CTCF-Mediated Enhancer-Promoter Interaction Is a Critical Regulator of Cell-to-Cell Variation of Gene Expression. Mol Cell, 2017. 67(6): p. 1049-1058 e6. 118. Gatignol, A., A. Buckler-White, B. Berkhout, and K.T. Jeang, Characterization of a human TAR RNA-binding protein that activates the HIV-1 LTR. Science, 1991. 251(5001): p. 1597-600. 119. Daniels, S.M. and A. Gatignol, The multiple functions of TRBP, at the hub of cell responses to viruses, stress, and cancer. Microbiology and molecular biology reviews : MMBR, 2012. 76(3): p. 652-666. 120. Ye, J., J. Wang, N. Zhang, Y. Liu, L. Tan, and L. Xu, Expression of TARBP1 protein in human non-small-cell lung cancer and its prognostic significance. Oncology letters, 2018. 15(5): p. 7182-7190.

6 Appendix

6.1 Bioinformatical Analysis

Supplementary figure 1| ChIP-seq data from the Broad Histone track of the UCSC Genome Browser. Snapshots of the ChIP-seq data for the histone modifications H3K4me1 (A), H3K27ac (B), H3K4me3 (C) and H3K27me3 (D), measured in 19 different cell lines. The seDMR is marked by a black box at the top of each panel and by the red-highlighted background.

6.2 mRNA expression analysis

Supplementary figure 2| Results of mRNA expression analysis of individual experiments in A549 cells treated with AZA. In general, the fold change in mRNA expression of the genes adjacent to the seDMR varies only slightly between samples treated with different AZA concentrations. However, in experiment 1 (upper graph) the highest AZA concentration caused the largest change in expression of the COA6 gene. Similarly, in experiment 3 (lower graph) the highest AZA concentration produced the largest fold change for the same gene.

Supplementary figure 3| Results of mRNA expression analysis of individual experiments in A549 cells treated with DAC. The results from experiment 1 (upper graph) show a slight increase in mRNA expression of COA6 with increasing DAC concentration. For the other genes, however, no significant change in expression between the different DAC concentrations can be observed. TBCE displays the strongest decrease in mRNA expression, to ~40 %, followed by the ncRNA LINC01132 with a decrease to ~50 %. In experiment 3 (lower graph) the highest fold change was always measured in the samples treated with the middle DAC concentration. The highest fold change within this experiment was a 2.2-fold increase in mRNA expression for COA6 and LINC01132. The other genes show an almost unchanged mRNA expression pattern.

Supplementary figure 4| Results of mRNA expression analysis of individual experiments in H1299 cells treated with AZA. In experiment 1 (upper graph) only COA6 shows increased mRNA expression in the samples treated with the middle or the highest AZA concentration. All other genes show a decrease in expression, especially at the two higher AZA concentrations. In general, all genes except COA6 show their highest expression in the samples treated with the lowest AZA concentration. In experiment 3 (lower graph) almost no change in mRNA expression of the genes can be observed. Only TBCE shows an increase in mRNA expression in the sample treated with the highest AZA concentration.


Supplementary figure 5| Results of mRNA expression analysis of individual experiments in A549 cells treated with DAC. In experiment 1 (upper graph) only COA6 indicates an increase in mRNA expression upon DAC treatment with the two highest concentrations. All the other genes adjacent to the seDMR display no significant change or a slight decrease in mRNA expression. The same can be observed for experiment 2 (middle graph), except that here TARBP1 shows the highest increase in mRNA expression for the sample treated with the lowest DAC concentration. In experiment 3 (lower graph) all genes show an almost unchanged mRNA expression pattern.


6.3 Supplementary data obtained by 3C application

Supplementary table 1| Assessed percentage of digestion efficiency for tested restriction sites in three cell lines normalized to internal primer 1 or 2.

green & checkmark: digestion efficiency is ≥ 60 %
yellow & exclamation mark: digestion efficiency is between 50 and 59.9 %
red & X: digestion efficiency is < 50 %


Supplementary table 2| Raw data of 3C experiment in A549.

Primer pair  Mean slope (n=3)  Mean intercept (n=3)  Mean Ct value¹ (n=3)  Standard deviation  Relative interaction frequency
A/1          -3.30             22.12                 29.08                 0.60                1.67
A/2          -3.82             21.72                 27.25                 0.64                7.68
A/3          -3.28             21.89                 28.70                 0.74                1.87
A/4          -3.48             21.58                 26.16                 0.47                10.21
A/5          -3.45             21.98                 26.43                 0.68                11.51
A/6          -3.82             21.13                 26.53                 0.60                8.21
A/7          -3.87             22.52                 23.77                 0.48                116.89
A/8          -3.31             22.89                 29.57                 0.56                2.03
A/9          -3.70             22.22                 26.58                 0.54                13.55
A/10         -3.44             21.94                 29.45                 0.66                1.47
A/11         -3.74             22.96                 28.94                 0.49                5.24
A/12         -3.55             20.38                 24.95                 0.69                11.43
A/13         -3.73             20.87                 26.25                 0.33                7.45
A/14         -3.62             23.04                 29.81                 0.88                3.30
A/15         -3.47             21.54                 29.00                 0.48                1.51
A/16         -2.72             22.56                 28.26                 0.19                1.58
A/17         -3.35             27.50                 30.09                 0.75                37.16
A/18         -3.46             21.57                 28.32                 0.53                2.30
A/19         -3.50             22.84                 27.90                 0.57                7.59
A/20         -3.24             21.72                 29.36                 0.69                0.98
A/21         -3.94             20.39                 27.56                 0.43                2.94
A/22         -3.79             20.78                 30.26                 0.50                0.63
A/23         -3.37             21.21                 26.09                 0.71                7.89
A/24         -3.45             22.77                 29.39                 0.79                2.73
Int1         -3.53             12.63                 20.71                 0.29

¹ Mean Ct value obtained in three experimental replicates. For each experiment the Ct value was previously averaged from duplicates used for qRT-PCR measurement.


Supplementary table 3| Raw data of 3C experiment in HEK293T.

Primer pair  Mean slope (n=3)  Mean intercept (n=3)  Mean Ct value¹ (n=3)  Standard deviation  Relative interaction frequency
A/1          -3.65             21.71                 29.14                 0.22                1.67
A/2          -3.15             22.47                 27.62                 0.52                7.68
A/3          -3.79             21.56                 28.39                 0.96                1.87
A/4          -3.28             22.07                 26.96                 0.32                10.21
A/5          -3.61             21.83                 25.99                 1.10                11.51
A/6          -3.73             21.32                 27.94                 1.87                8.21
A/7          -3.50             22.61                 25.26                 2.00                116.89
A/8          -3.56             22.27                 30.00                 0.25                2.03
A/9          -3.23             22.94                 28.24                 1.74                13.55
A/10         -3.31             22.38                 28.44                 2.02                1.47
A/11         -3.47             22.95                 29.41                 0.31                5.24
A/12         -3.40             21.11                 27.43                 3.24                11.43
A/13         -3.38             21.93                 28.76                 1.46                7.45
A/14         -3.67             22.13                 29.86                 0.15                3.30
A/15         -3.27             21.71                 30.47                 1.89                1.51
A/16         -3.10             22.66                 29.44                 1.37                1.58
A/17         -3.54             25.47                 30.50                 0.36                37.16
A/18         -3.05             22.15                 29.64                 1.03                2.30
A/19         -3.22             24.70                 29.14                 1.39                7.59
A/20         -3.65             21.36                 30.70                 1.76                0.98
A/21         -3.46             21.54                 28.69                 0.29                2.94
A/22         -3.61             21.14                 31.24                 0.68                0.63
A/23         -3.76             20.83                 24.27                 3.57                7.89
A/24         -3.51             22.22                 29.51                 0.69                2.73
Int1         -3.61             15.44                 20.12                 0.10

¹ Mean Ct value obtained in three experimental replicates. For each experiment the Ct value was previously averaged from duplicates used for qRT-PCR measurement.


Supplementary table 4| Raw data of 3C experiment in HepG2.

Primer pair  Slope             Intercept             Mean Ct value¹ (n=2)  Standard deviation  Relative interaction frequency
A/1          -3.72             21.74                 29.24                 0.01                2.19
A/2          -3.44             22.06                 27.66                 0.24                5.30
A/3          -4.42             20.73                 28.91                 0.16                3.19
A/4          -3.89             21.04                 26.91                 0.07                7.02
A/5          -3.62             21.87                 26.67                 0.04                10.70
A/6          -3.56             21.44                 26.61                 0.11                8.00
A/7          -4.13             22.32                 24.44                 0.08                69.28
A/8          -3.57             22.42                 29.62                 0.01                2.16
A/9          -3.61             22.26                 27.10                 0.03                10.29
A/10         -3.21             22.01                 29.50                 0.30                1.04
A/11         -3.52             23.06                 29.45                 0.09                3.45
A/12         -3.47             20.43                 25.56                 0.09                7.50
A/13         -3.29             21.24                 26.38                 0.10                6.19
A/14         -3.53             23.01                 30.07                 0.51                2.26
A/15         -3.60             21.49                 29.01                 0.20                1.83
A/16         -2.91             22.15                 28.37                 0.34                1.65
A/17         -3.17             27.65                 30.26                 0.01                33.90
A/18         -3.63             21.26                 28.64                 0.36                2.08
A/19         -3.27             23.13                 28.02                 0.05                7.24
A/20         -3.66             21.16                 29.55                 0.09                1.14
A/21         -3.33             20.91                 28.54                 0.20                1.15
A/22         -3.75             20.78                 30.57                 0.00                0.55
A/23         -3.60             21.00                 25.66                 0.07                11.45
A/24         -3.59             22.53                 28.49                 0.06                4.96
Int1         -3.63             12.44                 20.98                 0.04

¹ Mean Ct value from duplicates.


6.4 SOP for 3C

SOP – Chromosome conformation capture – Version 2

Adapted from S. Häsler Gunnarsdóttir [90]

DAY 1:

Single cell preparation from adherent cultured cells (30 min – 1 h)

1. Remove medium and wash cells twice with 1x PBS
2. Add 2 ml of 0.25 % Trypsin-EDTA and 2 ml PBS and incubate for 5 min at 37 °C
3. Re-suspend cells in 10 ml medium
4. Transfer cells to a 50 ml tube and centrifuge for 8 min at 300 g
5. Discard supernatant and re-suspend the pellet in 1 ml 10 % FBS/PBS
6. Mechanically separate the cells by pipetting
7. Count the cells with the Countess II
8. Transfer 5 × 10⁶ cells into a 15 ml tube
9. Fill up to 9.73 ml with 10 % FBS/PBS

Formaldehyde cross-linking (20-25 min)

10. Add 270 µl of 37 % formaldehyde (final concentration 1 %) and incubate for 10 min at RT while tumbling
11. Transfer the reaction tubes to ice and add 1.425 ml of 1 M ice-cold glycine to stop the cross-linking reaction
12. Spin for 8 min at 320 g at 4 °C and carefully remove all supernatant
13. Re-suspend pellet in 1 ml 3C buffer
14. Spin for 5 min at 400 g at 4 °C and remove supernatant
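
The final concentrations in steps 10 and 11 follow from simple dilution arithmetic. The short Python sketch below checks them, assuming that the sample volume before formaldehyde addition is exactly 9.73 ml and that volumes are additive. The function name `final_concentration` is illustrative and not part of the SOP.

```python
def final_concentration(stock_conc, added_volume_ml, start_volume_ml):
    """Concentration after adding a stock solution to a sample.

    Assumes additive volumes; the result has the same unit as stock_conc.
    """
    total_volume_ml = start_volume_ml + added_volume_ml
    return stock_conc * added_volume_ml / total_volume_ml


# Step 10: 270 µl of 37 % formaldehyde added to 9.73 ml cell suspension
formaldehyde_percent = final_concentration(37.0, 0.270, 9.73)

# Step 11: 1.425 ml of 1 M glycine added to the resulting 10 ml
glycine_molar = final_concentration(1.0, 1.425, 10.0)

print(f"formaldehyde: {formaldehyde_percent:.2f} %")
print(f"glycine: {glycine_molar:.3f} M")
```

Under these assumptions the formaldehyde concentration comes out at about 1 %, as stated in step 10, and the glycine concentration at roughly 0.125 M.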

Cell lysis (20-25 min)

15. Re-suspend (pipette gently up and down) the pellet in 5 ml cold lysis buffer (containing 1x protease inhibitor) and incubate for 10 min on ice while mixing cells every 2-3 min (by pipetting up and down)
16. Centrifuge for 5 min at 400 g at 4 °C and remove all supernatant

Pause point (pelleted nuclei can be frozen on dry ice and stored at -80 °C for several months)

DAY 2:

17. Re-suspend the pellet in 1 ml 1x restriction enzyme buffer (CutSmart 10x)
18. Centrifuge the cell suspension for 5 min at 400 g at 4 °C (wash step to remove lysis buffer) and discard the supernatant

Digestion (18-20 h (O/N))

19. Take up the nuclei in 0.5 ml 1.2x restriction enzyme buffer (CutSmart) and transfer to a 1.5 ml safe-lock tube
20. Place the tube at 37 °C for two minutes and add 7.5 µl of 20 % SDS (final 0.3 %)
21. Incubate for 1 hour at 37 °C while shaking at 900 rpm and re-suspend every 10-15 min
22. Add 84.5 µl of 12 % Triton diluted in 1x ligase buffer (final 2 %)
23. Incubate for 1 h at 37 °C while shaking at 900 rpm and re-suspend every 10-15 min
    ➔ Take a 20 µl aliquot of the sample, label it as undigested genomic DNA control (U) and store it at -20 °C
24. Add 150 U (7.5 µl) of HindIII-HF to the remaining sample and incubate for 1.5 hours at 900 rpm at 37 °C and re-suspend every 10-15 min
    Add 150 U of HindIII-HF to the sample and incubate for 1.5 hours at 900 rpm at 37 °C and re-suspend every 10-15 min
    Add the last 150 U of HindIII-HF to the sample and incubate O/N at 900 rpm at 37 °C
    ➔ Take a 20 µl aliquot of the sample and label it as digested genomic control (D).

DAY 3:

Ligation (8-9 h (O/N))

25. Add 40 µl of 20 % SDS (final 1.6 %) to the remaining sample
26. Incubate for 30 min at 37 °C while shaking at 900 rpm
27. Transfer the digested nuclei to a 50 ml falcon tube
28. Add 6.125 ml of 1.15x ligation buffer
29. Add 375 µl of 20 % Triton X-100 (final 1 %)
30. Incubate for 1 hour at 37 °C while shaking gently (300 rpm)
31. Add 17 µl T4 DNA ligase and incubate for 4 h at 16 °C
32. Equilibrate for 30 min at RT
33. Add 30 µl of 10 mg/ml PK (final 300 µg)
34. Incubate at 65 °C overnight to de-crosslink the sample

DNA purification (5-6 h)

35. Equilibrate the samples for a few min at 37 °C
36. Add 30 µl of 10 mg/ml RNase (final 300 µg)
37. Incubate for 30 min at 37 °C
38. Perform EtOH precipitation
    - Add 7 ml dH2O (1 volume)
    - Add 1.5 ml Na-Acetate pH 5.6 (2 M) (0.1 volumes)
    - Add 152 µl MgCl2 (final 0.01 M)
    - Add 35 ml EtOH absolute (2 volumes)
    - Incubate for 1 h at -80 °C
    - Centrifuge for 90 min at 2200 g at 4 °C and discard supernatant
    - Add 10 ml 70 % EtOH, centrifuge for 20 min at 2200 g and 4 °C and discard supernatant
    - Dissolve pellet in 150 µl Tris-HCl pH 8 (10 mM) and incubate o/n on a shaker at 300 rpm and 4 °C
39. Use NucleoSpin DNA clean-up kit for further purification according to manufacturer’s protocol


Purification of digestion control aliquots

1. Add 500 µl of 1x PK buffer and 2 µl of 10 mg/ml PK (final 20 µg) to the control aliquots saved in steps 23 and 24
2. Incubate for 2 hours at 65 °C
3. Equilibrate for a few min at 37 °C
4. Add 1 µl of 1 mg/ml RNase A and incubate for 45 min at 37 °C
5. Use 150 µl of the respective aliquot for purification with the NucleoSpin DNA clean-up kit
6. Measure DNA concentration of each digestion control aliquot with Qubit 3

Digestion efficiency assessment

Perform a qRT-PCR with the undigested control sample (step 23) and the digested control sample (step 24).

Reaction mix                                  1x [µl]
DNA [0.5-5 ng/µl] (undigested or digested)    2
Water                                         6
2x Mix (TB-green)                             10
Test primer or internal primer                2
Total volume                                  20

! Due to the high number of test primers, DNA is added to the reaction mix instead of primers!

Compare the values for every test primer (Fw+Rv) and calculate the percentage of digestion with the following formula:

$$\%\,\text{digested} = 100 - \frac{100}{2^{(Ct_R - Ct_I)_{\text{digested}} - (Ct_R - Ct_I)_{\text{undigested}}}}$$

$Ct_R$ = test primer (across restriction site), $Ct_I$ = internal primer (no restriction site)

The efficiency of the restriction enzyme digestion should be above 60-70 %, but ideally > 80 % should be digested. Samples with lower digestion efficiencies should be discarded.
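
As a minimal sketch, the digestion efficiency can be computed in Python as shown below. It assumes the formula reads % digested = 100 − 100 / 2^((CtR − CtI)digested − (CtR − CtI)undigested). The function names and the Ct values are hypothetical, and the rating thresholds follow the legend of Supplementary table 1.

```python
def digestion_efficiency(ct_r_dig, ct_i_dig, ct_r_undig, ct_i_undig):
    """Percentage of digested restriction sites from qRT-PCR Ct values.

    ct_r_*: Ct of the test primer pair spanning the restriction site
    ct_i_*: Ct of the internal primer pair (no restriction site)
    """
    delta_dig = ct_r_dig - ct_i_dig
    delta_undig = ct_r_undig - ct_i_undig
    return 100.0 - 100.0 / 2.0 ** (delta_dig - delta_undig)


def rate_efficiency(percent):
    """Rating following the legend of Supplementary table 1."""
    if percent >= 60.0:
        return "green"
    if percent >= 50.0:
        return "yellow"
    return "red"


# Hypothetical Ct values, not measured data
efficiency = digestion_efficiency(28.0, 22.0, 25.0, 22.0)
print(f"{efficiency:.1f} % ({rate_efficiency(efficiency)})")
```

In this example, the test primer signal is delayed by three cycles after digestion relative to the internal primer, which corresponds to 87.5 % digested sites.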

Purity assessment (2-4 h)

1. Perform a serial dilution of your 3C sample (e.g. 1:2, 1:4, 1:8…) starting with 12.5 ng/µl.
2. Set up a 20 µl reaction mix for qRT-PCR, as described below:

Stock component                            1x (µl)
H2O                                        6
Internal Primer 1 - Mix (Fw+Rv) (10 µM)    2
TB green (TAKARA) (2x)                     10
Diluted 3C sample from step 36             2

3. Run the qPCR using the conditions tabulated below.

Preincubation           95 °C   30 sec   1x
3 step amplification    95 °C   5 sec    40x
                        60 °C   30 sec
                        72 °C   30 sec
Melt curve stage        95 °C   5 sec    1x
                        65 °C   60 sec
                        95 °C   60 sec
Cooling                 4 °C    ∞

First, the quantification value (qValue) for each sample dilution has to be calculated by the following formula:

$$qValue = 10^{\frac{Ct - \text{intercept}}{\text{slope}}}$$

Then, the measured dilution factor can be calculated:

$$\text{Dilution factor} = \frac{qValue_{1:1}}{qValue_{1:X}}$$

Check that the measured dilution factor does not differ by more than 30 % from the expected dilution factor; otherwise this sample should be discarded.
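
The purity check can be sketched in Python as follows. The slope, intercept and Ct values are hypothetical and are assumed to come from a standard curve for internal primer 1. The 30 % tolerance is taken from the text above; the function names are illustrative.

```python
def q_value(ct, intercept, slope):
    """Quantification value: 10 ** ((Ct - intercept) / slope)."""
    return 10.0 ** ((ct - intercept) / slope)


def measured_dilution_factor(q_undiluted, q_diluted):
    """Ratio of the qValue of the 1:1 sample to that of the 1:X dilution."""
    return q_undiluted / q_diluted


def dilution_is_acceptable(measured, expected, tolerance=0.30):
    """True if the measured factor deviates by at most 30 % from the expected one."""
    return abs(measured - expected) / expected <= tolerance


# Hypothetical standard-curve parameters and Ct values (not measured data)
SLOPE, INTERCEPT = -3.32, 20.0
q_1_1 = q_value(25.0, INTERCEPT, SLOPE)  # undiluted sample
q_1_2 = q_value(26.0, INTERCEPT, SLOPE)  # 1:2 dilution
factor = measured_dilution_factor(q_1_1, q_1_2)
print(round(factor, 2), dilution_is_acceptable(factor, expected=2.0))
```

With a slope close to -3.32, one additional cycle corresponds to a roughly twofold dilution, so this hypothetical 1:2 dilution passes the check.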

Detection of interaction frequency (4-5h)

1. Measure DNA concentration of the 3C sample gained after step 39 with Qubit 3
2. Prepare a dilution of the 3C sample with a concentration of 12.5 ng/µl and verify this dilution by another quantification with Qubit 3 (deviation should not exceed 10 %)
3. Perform qRT-PCR with the diluted 3C sample and the BAC library (3 dilutions) using primer A (Fw)/X (Fw) (X stands for every single test primer) and one internal primer pair (Fw+Rv). Prepare one master mix for all measurements containing the DNA (3C sample or respective BAC dilution) instead of primer.
4. Use the same qRT-PCR conditions as for purity assessment.

For calculation of the relative interaction frequency, first the quantification value (qValue) for each primer pair (A/X) and (Int) has to be calculated by the following formula:

$$qValue = 10^{\frac{Ct_{\text{3C sample}} - \text{intercept}_{\text{control library}}}{\text{slope}_{\text{control library}}}}$$

Then, the qValue of each primer pair (A/X) has to be normalized to the qValue (Int) in order to obtain the relative interaction frequency:

$$\text{relative interaction frequency} = \frac{qValue_{A/X}}{qValue_{Int}}$$
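
The two formulas above can be sketched in Python as shown below. The sketch uses the mean slope, intercept and Ct values listed for primer pair A/22 and Int1 in Supplementary table 2 (A549). The function names are illustrative. Because averaged values are used as input, the result need not match the tabulated frequency exactly.

```python
def q_value(ct_sample, intercept_control, slope_control):
    """qValue of a 3C sample relative to the control-library standard curve."""
    return 10.0 ** ((ct_sample - intercept_control) / slope_control)


def relative_interaction_frequency(q_test, q_internal):
    """qValue of primer pair A/X normalized to the internal primer pair."""
    return q_test / q_internal


# Mean values for A549 as listed in Supplementary table 2
q_a22 = q_value(ct_sample=30.26, intercept_control=20.78, slope_control=-3.79)
q_int = q_value(ct_sample=20.71, intercept_control=12.63, slope_control=-3.53)

rif_a22 = relative_interaction_frequency(q_a22, q_int)
print(f"relative interaction frequency A/22: {rif_a22:.2f}")
```

With these averaged inputs the sketch yields roughly 0.61, close to the 0.63 listed for A/22. A small difference is plausible if the tabulated frequency was calculated per replicate before averaging, which is an assumption here.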

Preparation of BAC control library

The following procedure is based on the NucleoBond Xtra BAC protocol.

BAC culturing and harvesting (3 days)

1. Streak out BAC (glycerol stock) on an LB-agarose plate containing 12.5 µg/ml chloramphenicol by using a pipette tip (ATTENTION: do not thaw glycerol stock)
2. Incubate plate o/n at 37 °C
3. Pick a single colony with a pipette tip and inoculate 5 ml of prewarmed LB-medium containing 12.5 µg/ml chloramphenicol
4. Incubate the starter culture for ~ 8 h while shaking at 300 rpm (ThermoMixer C, Eppendorf) at 37 °C
5. Add 400 µl of the starter culture to each of two Erlenmeyer flasks (1 L) containing 250 ml LB-medium with 12.5 µg/ml chloramphenicol
6. Incubate the flasks o/n (12 to maximal 16 h) at 37 °C and 210 rpm (Unitron, Infors HT)
7. Dilute an aliquot of each flask 1:10 (e.g. 900 µl LB-medium + 100 µl BAC culture)

8. Measure OD600 and calculate the volume needed for centrifugation in the next step with the following formula:

$$V_{\text{BAC culture}}\,[\text{ml}] = \frac{1500}{OD_{600} \times 10}$$

(×10 due to the dilution in step 7)

9. Transfer the calculated volume of BAC culture into a centrifugation vial.
10. Centrifuge for 20 min at 4000 g at 4 °C and immediately discard the supernatant after centrifugation.

• (BAC pellet can be stored at -20 °C until further use)

BAC Isolation (6 h)

11. Perform BAC isolation according to manufacturer’s protocol starting with step 4, with the following changes/specifications:
    Step 4: Transfer resuspended BAC pellet into a 250 ml flask
    Step 8: Invert flask containing the precipitate several times until white flakes are spread homogeneously and put the flask back on ice between re-loading steps. Load the flow-through a second time.
    Step 12: Elute the BAC stepwise by adding 5x 3 ml of elution buffer, preheated to 70 °C
    Step 13: Centrifuge for 1 h at 4565 g at 4 °C, discard supernatant
    Step 14: Centrifuge for 20 min at 4565 g at RT
    Step 15: Dissolve pellet in 500 µl TE-Buffer o/n on ThermoMixer C (Eppendorf) at 4 °C and 300 rpm.
12. Verify insert via PCR using test primers of the according insert region

BAC digestion (6 h)

1. Determine DNA concentration of each isolated BAC via Qubit
2. Mix the used BACs in equimolar amounts (use insert length for calculation)
   ➔ use ≥ 5 µg of the BAC having the longest insert
3. Digest the BAC-Mix by using 5 µg BAC-Mix DNA, 500 U of HindIII and 1x CutSmart Buffer in a total volume of 500 µl
4. Incubate for 2 h at 37 °C
5. Transfer the reaction mix into a 15 ml tube
6. Perform EtOH-precipitation:
   - add 1 volume dH2O
   - add 0.1 volume sodium acetate (2 M)
   - add 0.01 M of MgCl2
   - then add 2 volumes of EtOH (abs., RT)
   - mix gently and incubate at -80 °C for ~ 1 h
   - centrifuge at 2200 g at 4 °C for at least 1 h, then discard supernatant
   - add 1 ml EtOH (70 %)
   - centrifuge at 2200 g at 4 °C for at least 15 min, then discard supernatant
   - air-dry pellet for a few minutes at RT
7. Dissolve pellet in 500 µl 1x ligase buffer
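
The equimolar mixing in step 2 of the BAC digestion can be sketched as follows. The sketch assumes, as the SOP suggests, that the mass of each BAC is scaled by its insert length relative to the longest insert, and that 5 µg of the BAC with the longest insert is used. The BAC names, insert lengths and concentrations are hypothetical.

```python
def equimolar_masses(insert_lengths_bp, mass_longest_ug=5.0):
    """Mass of each BAC (µg) so that all BACs are present in equal molar amounts.

    Mass scales linearly with insert length; the longest insert gets mass_longest_ug.
    """
    longest = max(insert_lengths_bp.values())
    return {name: mass_longest_ug * length / longest
            for name, length in insert_lengths_bp.items()}


def volumes_ul(masses_ug, concentrations_ng_per_ul):
    """Volume of each BAC preparation (µl) needed for the calculated mass."""
    return {name: masses_ug[name] * 1000.0 / concentrations_ng_per_ul[name]
            for name in masses_ug}


# Hypothetical BACs: insert lengths in bp and Qubit concentrations in ng/µl
inserts = {"BAC_1": 200_000, "BAC_2": 150_000, "BAC_3": 100_000}
concentrations = {"BAC_1": 250.0, "BAC_2": 300.0, "BAC_3": 200.0}

masses = equimolar_masses(inserts)
volumes = volumes_ul(masses, concentrations)
for name in inserts:
    print(f"{name}: {masses[name]:.2f} µg = {volumes[name]:.1f} µl")
```

Scaling by insert length alone ignores the vector backbone; if the backbones differ in size, the total BAC length would be the more accurate basis.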

BAC ligation + purification (o/n)

1. Add 17 µl of T4 DNA ligase to the digested BAC-Mix
2. Incubate o/n at 16 °C
3. Perform an EtOH precipitation (see BAC digestion step 6)
4. Dissolve the pellet in 150 µl Tris-HCl (10 mM, pH 8.0)
5. Use NucleoSpin gDNA clean-up kit (Macherey-Nagel) for further purification according to manufacturer’s protocol

BAC control library dilution

1. Determine the DNA concentration of the BAC control library via Qubit
2. Prepare 1:1, 1:10 and 1:100 dilutions of the control library
   - test by qRT-PCR whether the Ct values of the dilutions range between 10 and 30 cycles
   - calculate the volumes needed for all 3C measurements
   - prepare the dilutions and aliquots containing the volume needed for one 3C experiment; store the aliquots at -20 °C until usage
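
The SOP does not spell out how the slope and intercept of the control library are obtained. A common approach, assumed here, is a linear fit of Ct against log10 of the relative library amount for the three dilutions. The Python sketch below uses hypothetical Ct values and also checks the 10 to 30 cycle range mentioned above.

```python
import math


def fit_standard_curve(relative_amounts, ct_values):
    """Least-squares fit of Ct = slope * log10(amount) + intercept."""
    xs = [math.log10(amount) for amount in relative_amounts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical Ct values for the 1:1, 1:10 and 1:100 dilutions
amounts = [1.0, 0.1, 0.01]
cts = [20.0, 23.5, 27.0]

slope, intercept = fit_standard_curve(amounts, cts)
ct_in_range = all(10.0 <= ct <= 30.0 for ct in cts)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, Ct in range: {ct_in_range}")
```

With this parameterization the intercept is the Ct expected for the undiluted library, so a qValue of 1 corresponds to the 1:1 dilution.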


7 Acknowledgements

The greatest gratitude is addressed to Prof. Dr. Angela Risch. Thanks for giving me the possibility to perform my master thesis in your group and guiding me through it. I really appreciate the many scientific discussions and suggestions from which I have learned a lot.

Furthermore, a big thank you goes to my supervisor Dr. Florian Wolff. Thank you for your patience and the innovative ideas, which really pushed me forward in my master thesis.

Additionally, I would like to thank the entire team of this working group. A heartfelt thank you for the warm welcome in your team and the fun lunch breaks. I must express my profound gratitude to Tanja for the practical advice based on her extensive experience; to Esher for the emotional support, the constructive criticism and also for the deep discussions beyond scientific issues; to Anna for supporting me in every situation, emotionally and practically, and for the many hours we spent together with a lot of laughter.

Thank you all for the very exciting time!

Furthermore, my gratitude is addressed to Sebastian for laying the cornerstone for this research project and to Sissý for her great support throughout my thesis, and especially for the well-established SOP of the 3C approach.

Last but not least, I have to express my greatest gratitude to my family for supporting me throughout my studies. A special thank you goes to Markus for standing behind me and for all the support in difficult times. Thanks for motivating me anew every day, although it was not always easy.
