UCSF UC San Francisco Electronic Theses and Dissertations

Title Discovery and molecular mechanism of cis-regulatory element mutations in cancer

Permalink https://escholarship.org/uc/item/25s3r6fs

Author Bell, Robert Joseph Allen

Publication Date 2015

Peer reviewed|Thesis/dissertation

eScholarship.org Powered by the California Digital Library University of California

Copyright 2015

by

Robert Joseph Allen Bell


DEDICATION AND ACKNOWLEDGMENTS

The text of chapters three and four of this dissertation is a reprint of the material as it appears in Bell et al., The transcription factor GABP selectively binds and activates the mutant TERT promoter in cancer, Science 348, 1036–1039 (2015).

First I would like to thank my mentors Joseph Costello and Jun Song. I know that many graduate students end up struggling with their PI, and for me to end up with two fully supportive, encouraging, and patient mentors has me feeling truly blessed. You have helped me grow in my professional and personal life, given me assistance when I needed it, and allowed me the freedom to forge my own path when I’m being particularly stubborn. Thank you.

I would also like to give a special thanks to all the postdocs who have mentored me over the past six years: Shaun Fouse and Ravi Nagarajan in the lab, and Tomas Rube, Aaron Diaz, and Courtney Onodera on the computer. You are the ones who took the time to teach me new techniques and put up with my constant questions. None of this work would have been possible without your encouragement and assistance.

Finally I want to thank my family for their endless love and support. I would like to thank my sister Amy Bell for serving as an example of strength and independence throughout my life. And to my parents, I am confident that I wouldn’t have made it out of high school if not for you. Thank you for prioritizing education in my life, for always being there to cheer me on, and for teaching me to stay young at heart.

Discovery and molecular mechanism of cis-regulatory element mutations in cancer

Robert J.A. Bell

ABSTRACT

Cis-regulatory element (cRE) mutations underlie hereditary cancers and may contribute significantly to sporadic cancer, but cancer genetics research to date has primarily focused on the coding exome. Non-coding mutations can contribute to tumorigenesis by altering transcription factor (TF) binding and the expression level of their target gene(s). Determining which cRE mutations are functional and act as drivers of a malignant phenotype depends in part on the cell type-specific chromatin state and the presence of relevant transcription factors. Only a fraction of cREs in the genome are active in a given cell type, and many show little conservation between species, making their identification difficult using comparative genomics alone. Recent advances in mapping histone modifications using ChIP-seq allow for the rapid identification of active cREs in any accessible tissue type. We generated these “chromatin state” maps from primary glioblastoma (GBM) and adult normal brain tissues, and intersected them with publicly available GBM whole genome and transcriptome sequencing data to identify novel mutations in cREs. We identified the TERT promoter as the most commonly mutated cRE in GBM patients. TERT is critical for modulating the activity of telomerase, the enzyme responsible for telomere maintenance, and TERT promoter mutations are among the most common genetic alterations observed across multiple cancer types. We identify the functional consequence of these mutations in GBM to be recruitment of the multimeric GABP transcription factor specifically to the mutant promoter. Allelic recruitment of GABP is consistently observed across four cancer types, highlighting a shared mechanism underlying TERT reactivation. This study provides an experimental and computational framework to identify driver cRE mutations that is widely applicable to most other tumor types.


TABLE OF CONTENTS

CHAPTER 1: INTRODUCTION

1.1 GENE REGULATION IN TUMORIGENESIS……………………………………………2

1.2 CIS-REGULATORY ELEMENTS: PROMOTERS AND ENHANCERS……………..3

1.3 CIS-REGULATORY ELEMENT ACTIVITY IN CANCER………………………………4

1.4 GENETIC VARIATION RESPONSIBLE FOR CRE ACTIVITY CHANGES…………5

1.5 CHALLENGES STUDYING CRE MUTATIONS………………………………………..6

1.6 AIMS OF THIS STUDY…………………………………………………………………….9

CHAPTER 2: A FRAMEWORK FOR DETECTING FUNCTIONAL NON-CODING REGULATORY ELEMENT MUTATIONS IN CANCER

2.1 UTILIZING EPIGENOMICS TO DEFINE CIS-REGULATORY ELEMENTS GENOME WIDE……………………………………………………………………………….11

2.1.i Rationale for histone profiling…………………………………………………………..11

2.1.ii Chromatin immunoprecipitation-sequencing (ChIP-seq)……………………………12

2.1.iii ChIP-seq data analysis………………………………………………………………...13

2.1.iv Applications of ChIP-seq to cancer epigenomics…………………………………...14

2.2 IDENTIFYING SOMATIC MUTATIONS GENOME WIDE …………………………...16

2.2.i Sequencing alignment, filtering, and recalibration……………………………………17

2.2.ii Detecting single nucleotide mutations in WGS tumor-normal pairs……………….19

2.3 ANNOTATING CRE MUTATIONS FOR THEIR EFFECTS ON TF BINDING AND GENE EXPRESSION…………………………………………………………………………20

2.3.i Annotating cRE mutations for conservation and open-chromatin status…………..21

2.3.ii Predicting the effects of cRE mutations on TF binding affinity……………………..21

2.3.iii Identifying recurrently mutated cRE mutations across patients……………………22

2.3.iv Linking cREs to their target genes……………………………………………………23

2.4 APPLICATIONS OF THE FRAMEWORK……………………………………………..24

2.4.i Identifying active promoters and enhancers in normal brain and GBM……………24

2.4.ii Identifying functional splicing mutations in neuroblastoma…………………………26

2.4.iii Annotating functional single-nucleotide changes in enhancer assays……………28

2.4.iv Identifying TERT promoter mutations in GBM………………………………………30

CHAPTER 3: THE TRANSCRIPTION FACTOR GABP SELECTIVELY BINDS AND ACTIVATES THE MUTANT TERT PROMOTER IN CANCER

3.1 ABSTRACT………………………………………………………………………………..41

3.2 MAIN TEXT………………………………………………………………………………..41

CHAPTER 4: MATERIALS AND METHODS

4.1 SAMPLE ACQUISITION…………………………………………………………………88

4.2 TERT GENOTYPING……………………………………………………………………..88

4.3 RNA EXTRACTIONS AND QRT-PCR FOR HUMAN PRIMARY GLIOBLASTOMAS AND NORMAL BRAIN………………………………………………………………………..88

4.4 CHROMATIN IMMUNOPRECIPITATION (CHIP) …………………………………….89

4.5 DEFINING PROMOTER AND ENHANCER STATES………………………………..91

4.6 CALLING SOMATIC MUTATIONS IN PRIMARY GBM……………………………..92

4.7 COMPUTING DIFFERENTIAL TF BINDING AFFINITY…….……………………….92

4.8 CELL CULTURE………………………………………………………………………….93

4.9 CELL PROLIFERATION ASSAY……………………………………………………….93

4.10 FLOW CYTOMETRY……………………………………………………………………94

4.11 LUCIFERASE ASSAYS AND SITE-DIRECTED MUTAGENESIS………………...94

4.12 SIRNA KNOCKDOWN AND RT-QPCR………………………………………………95

4.13 RNA-SEQ GENE EXPRESSION ANALYSIS………………………………………..96

4.14 ALLELE-SPECIFIC ANALYSIS OF CHIP DNA AND RNA-SEQ READS………….96

4.15 BUFFERS AND DNA CONSTRUCTS FOR SINGLE-MOLECULE PROTEIN BINDING ASSAY……………………………………………………………………………...97

4.16 FLUORESCENT LABELING OF GABPB AND ETV4……………………………...98

4.17 SINGLE MOLECULE FLUORESCENCE DATA ACQUISITION…………………..98

4.18 PEAK MOTIF ENRICHMENT ANALYSIS……………………………………………98

4.19 SPACING DEPENDENCE OF CHIP ENRICHMENT………………………………101

4.20 ALLELE-SPECIFIC BINDING FROM ENCODE DATA…………………………...102

CHAPTER 5: DISCUSSION

5.1 POTENTIAL FOR FRAMEWORK EXPANSION……………………………………104

5.2 AREAS OF IMPROVEMENT…………………………………………………………..105

5.3 DETECTING OTHER CIS-REGULATORY ELEMENT DRIVER MUTATIONS….107

5.4 FUTURE DIRECTIONS FOR STUDYING TERT ACTIVATION BY GABP………109

5.5 CLINICAL IMPLICATIONS…………………………………………………………….112

5.5.i History of telomerase inhibition……………………………………………………….112

5.5.ii Strategies for targeting GABP………………………………………………………..115

5.5.iii Caveats to targeting GABP…………………………………………………………..117

5.5.iv Potential indications for clinical trials………………………………………………..118

BIBLIOGRAPHY………………………………………………………………………….120

LIST OF TABLES

Table 2.1 GBM and normal brain ChIP-seq read quality control statistics………….36

Table 2.2 Recurrent genes with enriched splicing motif mutations…………………..38

Table 2.3 TFs predicted to have differential binding at the TERT promoter mutation sites……………………………………………………………………………………………..39

Table 3.1 Summary of the TERT mutation status for all samples and cell lines used………………………………………………………………………………….79

Table 3.2 Cloning and mutagenesis primers…………………………………………..80

Table 3.3 siRNAs used for knockdown studies………………………………………..81

Table 3.4 qPCR primers used for SYBR green gene expression assays…………..82

Table 3.5 ENCODE Gabpa ChIP-seq peak annotations……………………………..83

Table 3.6 SRA ID numbers for raw sequencing files………………………………….86


LIST OF FIGURES

Figure 2.1 cRE calls identified by epigenomic profiling………………………………..32

Figure 2.2 Mutations called from whole-genome sequencing of 40 GBM patients…33

Figure 2.3 Nucleotide changes in the SORL1 enhancer predicted to affect TF-binding often have a larger effect size in reporter activity………………………………………….34

Figure 2.4 Recurrently mutated cREs in 40 GBM patients…………………………….35

Figure 3.1 TERT promoter mutant GBMs display elevated TERT expression compared to TERT wild-type GBM or normal brain (NB)…………………………………52

Figure 3.2 The de novo ETS motif is critical for mutant TERT promoter activity in GBM……………………………………………………………………………………………..53

Figure 3.3 ETS family gene expression in GBM………………………………………..54

Figure 3.4 TERT expression in response to ETS siRNA knockdown…………….…..56

Figure 3.5 GABPA selectively regulates and binds the mutant TERT promoter across multiple cancer types………………………………………………………………………….58

Figure 3.6 GABPA selectively regulates the mutant TERT promoter in U251 cells...60

Figure 3.7 Cell cycle profile and proliferation changes with GABPA knockdown in GBM cell lines………………………………………………………………………………….61

Figure 3.8 Enrichment of the mutant (CCGGAA) and wild-type (CCGGAG) hexamer sequences within genome-wide………………………………………………….…………..63

Figure 3.9 Other ETS factor candidates do not bind the mutant TERT promoter…...65

Figure 3.10 GABP but not ETV4 selectively binds to mutant TERT promoter DNA….67

Figure 3.11 Read coverage around the TERT promoter mutations in ENCODE GABPA ChIP-Seq, Pol II ChIP-Seq, and Digital Genomic Footprinting (DGF) data from HepG2 and SK-N-SH cells…………………………………………………………………………….69

Figure 3.12 GBMs harboring TERT promoter mutations display allele-specific H3K4me3 and gene expression……………………………………………………………..70

Figure 3.13 Relationship between GABP motif pair distances and binding strength and TERT promoter activity……………………………………………………………………….71

Figure 3.14 Map of CCGGAA motifs around ENCODE GABPA ChIP-seq peaks in HepG2 cells…………………………………………………………………………………….73

Figure 3.15 G228A and G250A cooperate with the native ETS sites ETS-195 and ETS-200 and fall within spacing for GABP heterotetramer recruitment…………………74

Figure 3.16 Fourier spectral analysis of the GABPA motif spacing distribution for separation between 15 and 100 bps from Figure 3.15A…………………………………..76

Figure 3.17 GABPB isoform expression in GBM…………………………………………77


CHAPTER 1.

INTRODUCTION

1.1 GENE REGULATION IN TUMORIGENESIS

The human body is composed of approximately 200 distinct cell types with specifically defined characteristics and functions. Each cell contains the same genome, with the same number of genes, and yet a wide variety of functional and morphological diversity is observed across cell types. Organismal and cellular complexity arises not from total gene number, but from how those genes are coordinately regulated in a spatial and temporal manner(1). Only a subset of genes is expressed in a given cell type, and it is the timing, combination, and expression levels of these genes that give rise to the cellular diversity observed in the human body.

Deregulation of this finely tuned, complex orchestration of gene expression can give rise to many diseases, including cancer. It is the improper expression of genes that allows evolving cancer cells to divide limitlessly, avoid cell death mechanisms, and invade throughout the body. Understanding how cells regulate proper gene expression programs and how this process is hijacked in cancer is fundamental to uncovering novel therapeutic strategies to fight this devastating disease.

The proper expression of protein coding genes is regulated by multiple molecular mechanisms, but all are fundamentally controlled at the level of mRNA transcription. The amount of mRNA produced from a given gene is tightly controlled by a series of non-coding DNA elements called ‘cis-regulatory elements’ (cREs)(2, 3). cREs are portions of DNA containing specific DNA sequences that act as recognition and binding sites for trans-regulatory elements called transcription factors (TFs)(4, 5). Upon binding to DNA, TFs can recruit co-activators that facilitate the recruitment of RNA polymerase II to the gene promoter.

In order for TFs to bind a given cRE, however, the DNA needs to be accessible within the nucleus. If the TF binding site is wrapped tightly around a series of nucleosomes (inactive state), then the TF will not be able to bind that sequence. In contrast, if the DNA is left unwound (active state), the TF can bind the cRE’s recognition sequence and facilitate transcription of the proper target gene(2). Thus, cRE activity can be directly regulated by chromatin structure, another fundamental mechanism of gene regulation in normal and disease cell states. For any given gene to turn on, the correct cREs need to be in accessible chromatin, and the matching set of TFs needs to be present in the nucleus. Only then will RNA Pol II be recruited to the gene transcription start site (TSS) and initiate mRNA transcription.

1.2 CIS-REGULATORY ELEMENTS: PROMOTERS AND ENHANCERS

There are many classes of cREs throughout the genome, but the two primary ones responsible for gene activation are promoters and enhancers(6-8). Promoters lie directly upstream of a gene’s TSS, and are unidirectional by definition. This means that an active promoter will always initiate transcription in the same direction along the DNA.

The vast majority of genes, however, cannot initiate transcription based on promoter activity alone. These genes require the activity of distal cREs called enhancers.

Enhancers by definition are distal from their target TSS, and are multidirectional. They can exist upstream, downstream, or within the gene body of their target, and can even exist on separate chromosomes. The TFs bound to an active enhancer are brought into proximity with the TFs bound at the gene promoter through chromatin-looping(9). Only the combination of TFs bound to both enhancers and promoters is sufficient to recruit Pol II and initiate transcription. Whereas promoters have a one-to-one relationship with their target TSS, multiple enhancers can regulate each TSS. Furthermore, a single enhancer can interact with various target genes, depending on the chromatin environment and milieu of TFs in the nucleus. Many enhancers have cell type-specific activity and are responsible for fine-tuning the complex gene expression patterns found in each cell type(3, 4).

1.3 CIS-REGULATORY ELEMENT ACTIVITY IN CANCER

Improper regulation of cRE activity can act as a driver of tumorigenesis. For example, a recent genome-wide comparison of Histone 3 Lysine 4 monomethylation (H3K4me1) between colorectal cancer (CRC) and tissue-matched normal colonic crypt cell lines revealed a set of recurrently gained and lost enhancers(10). These variant regulatory elements were predictive of in vivo CRC gene expression, implicating a substantial role for cREs in promoting tumorigenesis. In glioblastoma (GBM), the most common and aggressive form of adult brain tumor, it has also been shown that specific cREs can be recurrently silenced or activated throughout tumorigenesis(11, 12).

cRE activity changes have also been shown to play a prominent role in predicting response to therapy in GBM. The standard of care, surgical resection followed by the alkylating agent Temozolomide and radiation treatment, results in a median survival of only 12-15 months(13). Approximately 20-30% of GBMs display promoter methylation and silencing of MGMT, a gene critical to the proper DNA repair of alkylating lesions(14, 15). When treated with the standard of care, patients harboring this silenced promoter are more likely to survive significantly longer(16).

1.4 GENETIC VARIATION RESPONSIBLE FOR CRE ACTIVITY CHANGES

Mutations in cREs can contribute to many diseases including cancer. For example, individual SNPs in both promoters and enhancers can contribute to X-linked deafness and preaxial polydactyly(1, 17-21). Interestingly, of 1,200 of the most significant SNPs from GWAS studies, approximately 40% are found within non-coding regions(3, 22). Furthermore, non-coding GWAS SNPs are more likely to fall within enhancers that are active in the disease-relevant tissue type(23). Most known functional cRE mutations disrupt TF binding and subsequent target gene expression; however, mutations can also increase TF binding, or even create de novo TF binding sites (TFBS)(24). In hereditary cancer, cRE mutations have been shown to underlie cancer susceptibility(25, 26). For example, a mutation in the promoter of DAPK1 tracks with a hereditary form of Chronic Lymphocytic Leukemia(27). Further analyses show that the promoter mutation increases HOXB7 binding and subsequently decreases DAPK1 expression. In hereditary non-polyposis colon cancer, promoter mutations in both the MLH1 and MSH2 genes strongly associate with cancer progression(28, 29).


Recent genome-wide studies have identified a series of recurrently mutated cREs across multiple cancer types. A pan-cancer bioinformatics study identified recurrent somatic mutations within cREs of PLEKHS1, WDR74, and SDHD(30). Additionally, it has been shown that many non-coding somatic mutations drive allele-specific gene expression changes associated with colorectal carcinoma(31). None of these studies have tested the functional effects of these cRE mutations on tumorigenesis directly.

Perhaps the best example of a driver cRE mutation is the recently discovered recurrent mutation of the TERT promoter. These mutations were initially discovered to occur in 71% of sporadic melanomas, exceeding the recurrence frequency of BRAF and NRAS exonic mutations(32, 33). Currently, TERT promoter mutations are found in 21% of medulloblastomas(34), 47% of hepatocellular carcinomas (HCC)(35), 66% of urothelial carcinomas of the bladder(36), 71% of melanomas(32, 33), and 83% of primary glioblastomas (GBM)(37), making them the most recurrent single-nucleotide mutations observed in these cancer types. Germline mutations have also been discovered in the TERT promoter in a family with hereditary melanoma(32). This finding further underscores the fundamental role these mutations play in driving tumorigenesis.

1.5 CHALLENGES STUDYING CRE MUTATIONS

Despite their potential to contribute to cancer, there is a significant lack of studies devoted to discovering and analyzing cRE mutations. One reason they are understudied is that they have been difficult to define genome wide. Evolutionary conservation has commonly been used to identify functional non-coding regions(1, 38, 39). While conservation does enrich for cREs, one- to two-thirds of functional TFBS are not well conserved and thus missed by this method(40-42). Conservation alone offers limited insight into the biological function of a cRE; one cannot determine if the cRE acts as a promoter or enhancer, or what cell type(s) it is active in. Another method to define cREs has been to select a fixed distance upstream of all known transcription start sites (TSS). A whole genome sequencing study of lung carcinoma used such an approach to analyze the mutational spectrum in proximal promoters, and found that both promoter and exon mutations occur at significantly lower frequency than the genome-wide mutation rate(43). This finding implies that mutations in promoters are under purifying selection to a similar extent as those in exons, and therefore that mutations found in promoters are more likely to have adverse effects on gene expression than background mutations. However, the study was restricted to arbitrarily defined proximal promoters, as the authors had no genome-wide cRE annotations to couple with their sequencing data.

Discriminating driver from passenger mutations is also inherently more complex in cREs compared to exons(40). Recurrence frequency of mutations is one useful parameter for selecting candidate driver mutations in exons or cREs. In contrast to exon mutations, which may alter protein structure, a driver cRE mutation will most likely affect TF binding, a highly cell type-specific event(19). TFBS motifs are often degenerate; mutations can have a variable effect on binding affinity depending on the position of the nucleotide in the motif(44, 45). There is a lack of available tools to predict which cRE mutations will affect TF binding and gene expression. To accurately predict such an effect, chromatin status and candidate TF expression from the matching cancer tissue are essential. Recently, tools have been developed to predict functional germline variants in cREs(46, 47); however, no analogous method has been developed to predict functional somatic mutations in cREs.
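The position dependence of a mutation's effect on a degenerate motif can be illustrated with a position weight matrix. The sketch below is a minimal illustration under stated assumptions, not the method used in this study: the matrix `TOY_PFM`, the uniform background, and the function names are all hypothetical, and only one strand and one alignment are scored. The two hexamers compared first are the wild-type (CCGGAG) and mutant (CCGGAA) sequences named in the list of figures.

```python
import math

# Hypothetical position frequency matrix for a 6-bp ETS-like site.
# The numbers are invented for illustration, not taken from a motif database.
TOY_PFM = [
    {"A": 0.10, "C": 0.70, "G": 0.10, "T": 0.10},
    {"A": 0.10, "C": 0.70, "G": 0.10, "T": 0.10},
    {"A": 0.01, "C": 0.01, "G": 0.97, "T": 0.01},  # near-invariant core
    {"A": 0.01, "C": 0.01, "G": 0.97, "T": 0.01},  # near-invariant core
    {"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01},  # near-invariant core
    {"A": 0.70, "C": 0.05, "G": 0.05, "T": 0.20},
]
BACKGROUND = 0.25  # assumption: uniform base composition


def log_odds_score(site, pfm=TOY_PFM):
    """Sum of per-position log2(frequency / background) for one site."""
    if len(site) != len(pfm):
        raise ValueError("site length must match motif width")
    return sum(math.log2(column[base] / BACKGROUND)
               for column, base in zip(pfm, site.upper()))


def score_change(ref_site, alt_site, pfm=TOY_PFM):
    """Positive values mean the mutant allele matches the motif better."""
    return log_odds_score(alt_site, pfm) - log_odds_score(ref_site, pfm)


gain = score_change("CCGGAG", "CCGGAA")        # last position, G>A
core_loss = score_change("CCGGAA", "CCAGAA")   # core position, G>A
flank_loss = score_change("CCGGAA", "ACGGAA")  # flanking position, C>A
```

With this toy matrix the same G>A change raises the score at the last position and lowers it sharply in the core, which is the behaviour the paragraph describes. A real analysis would also need the chromatin status and TF expression of the matching tissue.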

The advent of next generation sequencing coupled with the influential findings that cREs are marked by specific histone modification signatures provides a novel opportunity to define cell type-specific cREs on a genome-wide scale(4, 7, 23, 48). Active promoters are marked by the presence of H3K4me3, and enhancers have a signature defined by the presence of H3K4me1 and absence of H3K4me3(2, 48, 49). Histone 3 Lysine 27 acetylation (H3K27ac) further distinguishes cREs which are active in a given cell type from those in a poised state(48, 50). Data analysis tools have been developed to localize these and other cREs based on epigenomic profiles(23, 51, 52). Seventy to eighty percent of cREs defined by such methods can be experimentally validated, a dramatic improvement in specificity relative to the evolutionary conservation approach alone(23, 53, 54). The Reference Epigenome Mapping Center (REMC) has profiled these marks in seven different regions of human normal brain(55, 56). The Costello lab has generated the first H3K4me3 map of GBM, but H3K4me1 and H3K27ac profiles have not yet been created. Harnessing chromatin mapping to annotate cREs allows us to capture both conserved and non-conserved cREs, as well as inform us about which cREs are active and relevant to study in our cell type of interest.

Next-generation sequencing technologies also allow for DNA sequencing of the entire genome in a time-efficient and cost-effective manner. By sequencing and comparing the entire genome of a patient’s tumor and normal DNA, investigators can now quickly identify all somatic mutations, copy number changes, and chromosomal rearrangements that have accrued specifically during the tumor’s development. Such technology allows for the identification of both coding and non-coding somatic mutations, which when combined with epigenomic profiling of cREs can be used to identify cRE mutations specific to a given tumor type.
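The combination of tumor-normal comparison with cRE annotation can be sketched as two small steps: subtract the matched normal calls from the tumor calls, then keep the somatic variants that fall inside an annotated cRE. This is a deliberately simplified sketch. Real somatic callers work statistically on read evidence rather than by set difference, and the tuple layout `(chrom, pos, ref, alt)`, the half-open interval convention, and all function names are assumptions made for illustration.

```python
from bisect import bisect_right


def somatic_variants(tumor_calls, normal_calls):
    """Keep variants called in the tumor but absent from the matched normal."""
    return sorted(set(tumor_calls) - set(normal_calls))


def index_cres(cres):
    """Group non-overlapping (chrom, start, end, label) intervals by chromosome."""
    index = {}
    for chrom, start, end, label in sorted(cres):
        starts, intervals = index.setdefault(chrom, ([], []))
        starts.append(start)
        intervals.append((start, end, label))
    return index


def variants_in_cres(variants, cre_index):
    """Return (variant, cRE label) pairs; intervals are half-open [start, end)."""
    hits = []
    for variant in variants:
        chrom, pos = variant[0], variant[1]
        starts, intervals = cre_index.get(chrom, ([], []))
        # Rightmost interval whose start is <= pos, if any.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < intervals[i][1]:
            hits.append((variant, intervals[i][2]))
    return hits
```

Sorting the intervals once and using binary search keeps each lookup logarithmic in the number of cREs on a chromosome, which matters when millions of variants are tested against tens of thousands of elements.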

1.6 AIMS OF THIS STUDY

Somatic mutations in cREs play a critical role in tumorigenesis, but they have been severely understudied in the field of cancer genetics. The goal of this study is to provide a framework to allow future investigators to identify tissue type-specific cRE mutations, computationally annotate and prioritize them for their ability to alter TF binding, and experimentally validate their functional effects on transcription and tumorigenesis.

We have corroborated this framework by applying it to GBM patient data obtained from The Cancer Genome Atlas Project(14, 57). Doing so identified TERT promoter mutations as the most common cRE mutations in GBM. Finally, we validate the functional effect of TERT promoter mutations in GBM, and identify the mechanism of action connecting the mutations to altered TERT gene expression.
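Ranking cREs by how many patients carry a mutation in them is the simplest form of the recurrence analysis mentioned above. The following sketch counts each patient at most once per cRE; the function name, the input layout, and the `min_patients` cutoff are hypothetical choices for illustration and do not reproduce the statistics used in this study.

```python
from collections import defaultdict


def recurrent_cres(patient_hits, min_patients=2):
    """Rank cREs by the number of distinct patients carrying a mutation.

    patient_hits maps a patient identifier to the labels of the cREs
    mutated in that patient's tumor.
    """
    patients_per_cre = defaultdict(set)
    for patient, cre_labels in patient_hits.items():
        for label in cre_labels:
            patients_per_cre[label].add(patient)  # a set counts each patient once
    ranked = [(label, len(patients))
              for label, patients in patients_per_cre.items()
              if len(patients) >= min_patients]
    # Most recurrent first; ties broken alphabetically for reproducibility.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))
```

Counting patients rather than mutations prevents a single hypermutated tumor from dominating the ranking. A fuller analysis would also correct for cRE length and local background mutation rate.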


CHAPTER 2.

A FRAMEWORK FOR DETECTING FUNCTIONAL NON-CODING REGULATORY ELEMENT MUTATIONS IN CANCER


2.1 UTILIZING EPIGENOMICS TO DEFINE CIS-REGULATORY ELEMENTS GENOME WIDE

2.1.i Rationale for histone profiling

The majority of cREs are cell type specific, and their activity is dependent on the proper combination of chromatin accessibility and transcription factor (TF) availability. There are an estimated 10^5 to 10^6 enhancers in the human genome, but on average only about 30,000 are thought to be active in a given cell type(4). As distal cREs are not defined by their distance to known TSSs, and are difficult to characterize by their DNA sequence content alone, new techniques are required to accurately and robustly define them across the genome.

The understanding that cREs are marked by specific histone signatures provides researchers with the novel opportunity to define cell type-specific cREs on a genome-wide scale(23). Active promoters and enhancers can be accurately and robustly annotated by a signature of just three histone modifications (H3K4me1, H3K4me3, and H3K27ac)(2, 49). By performing chromatin immunoprecipitation followed by deep sequencing (ChIP-seq) against each of these histone modifications, investigators can integrate the resulting datasets to obtain an accurate and cell type-specific profile of both active and poised cREs genome wide. Below is a description of the experimental and computational techniques used in this project to define active promoters and enhancers in a given cancer type.
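The three-mark signature can be read as a small decision rule, following the definitions given in section 1.5: H3K4me3 marks promoters, H3K4me1 without H3K4me3 marks enhancers, and H3K27ac separates active from poised elements. The sketch below encodes that rule for a single genomic region whose marks have already been called present or absent. The function name and the label strings are hypothetical, and a real annotation would work on peak calls across the genome rather than on one region at a time.

```python
def classify_cre(h3k4me3, h3k4me1, h3k27ac):
    """Assign a cRE label from the presence or absence of three histone marks."""
    if h3k4me3:
        # H3K4me3 marks promoters; H3K27ac separates active from poised.
        return "active promoter" if h3k27ac else "poised promoter"
    if h3k4me1:
        # H3K4me1 without H3K4me3 marks enhancers.
        return "active enhancer" if h3k27ac else "poised enhancer"
    # H3K27ac alone, or no marks, does not match either signature.
    return "no cRE signature"
```

Checking H3K4me3 first enforces the "absence of H3K4me3" condition for enhancers, so a region carrying both methylation marks is labelled a promoter.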

2.1.ii Chromatin immunoprecipitation-sequencing (ChIP-seq)

Alterations in histone modification patterns and transcription factor binding impact gene expression and have been implicated in tumorigenesis, cancer cell stemness, metastasis, and drug resistance(58-61). ChIP-seq has become the gold standard to study these modifications genome-wide. It provides higher resolution, improved signal-to-noise ratios, and when using indexed libraries, it is less expensive than coupling ChIP with microarrays (ChIP-chip)(62). Fresh or fresh frozen tissue or cells are either kept native (N-ChIP)(63) or formaldehyde cross-linked to preserve weaker DNA-protein interactions (X-ChIP)(64), followed by cell lysis. N-ChIP is primarily used for histone modifications, where the DNA-histone interactions are inherently strong(62). Antibody specificity and immunoprecipitation are more efficient with N-ChIP as epitopes can be disrupted by formaldehyde(63), but N-ChIP cannot be applied to proteins with lower DNA binding affinities such as transcription factors. Cross-linking ameliorates this problem, and minimizes stochastic nucleosome movement that can occur during N-ChIP(63); however, it also may fix transient non-functional interactions and react at lysines, which may create biases. Native or cross-linked chromatin is then fragmented by sonication or micrococcal nuclease (MNase) digestion. Both methods impart bias in downstream sequencing(65). MNase creates higher resolution, primarily mononucleosome (~146 bp) fragments, but is less efficient at cutting between G and C bases, creating greater fragmentation bias(66, 67). In contrast, sonication provides decreased resolution (200-600 bp) but is more uniform(62). Fragmented chromatin is immunoprecipitated with an antibody that specifically recognizes the epitope of interest.

The success of ChIP reactions is dependent on antibody quality. Polyclonal antibodies are advantageous for X-ChIP experiments, as they reduce the chance of crosslinking destroying antibody interactions(64), but may have increased cross-reactivity. Relative enrichment of ChIP DNA is assayed via qPCR. Enrichment varies greatly with the protein of interest, antibody quality, and positive and negative control regions of the genome that are used. To minimize the number of reads contributing to background noise, it is common to require greater enrichment in ChIP-seq (5-50 fold) when compared to single locus ChIP-PCR(65). Purified ChIP DNA sequencing libraries are constructed by end repair, ‘A’ base addition, adapter ligation, PCR amplification and size selection. Additional bias may occur during library construction and PCR amplification, as both GC-rich and GC-poor regions are underrepresented(62, 65). The total number of sequence reads required depends on the quality of ChIP enrichment, the expected number of peaks and peak size, but sequencing multiple indexed ChIP libraries in a single lane is now common practice.
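One common way to express qPCR-based ChIP enrichment is percent of input at a positive control region divided by percent of input at a negative control region. The text does not specify which calculation was used here, so the sketch below is an assumption: it uses the percent-input convention, assumes perfect amplification efficiency (a doubling per cycle), and assumes the input sample represents 1% of the chromatin used in the IP. All function and parameter names are hypothetical.

```python
def percent_input(ct_ip, ct_input, input_fraction=0.01):
    """Percent of input recovered in the IP, assuming 100% PCR efficiency.

    input_fraction is the fraction of chromatin saved as input (assumed 1%).
    """
    return 100.0 * input_fraction * 2.0 ** (ct_input - ct_ip)


def fold_enrichment(ct_ip_pos, ct_input_pos, ct_ip_neg, ct_input_neg,
                    input_fraction=0.01):
    """Enrichment at a positive control region relative to a negative one."""
    positive = percent_input(ct_ip_pos, ct_input_pos, input_fraction)
    negative = percent_input(ct_ip_neg, ct_input_neg, input_fraction)
    return positive / negative


# Toy Ct values: the IP amplifies one cycle earlier than input at the positive
# region and four cycles later at the negative region.
example = fold_enrichment(24.0, 25.0, 29.0, 25.0)
```

With these toy values the positive region recovers 2% of input, the negative region 0.0625%, and the fold enrichment is 32, which falls inside the 5-50 fold range quoted above. Measured primer efficiencies below 100% would lower these estimates.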

2.1.iii ChIP-seq data analysis

Transforming the millions of sequencing reads generated by ChIP-seq into biologically interpretable data is a computationally demanding, multi-step process for which a variety of tools have been developed. While the tools are addressing the same problem, each tool is different and can impact the final result. The first and most resource-intensive step is aligning the sequence reads to the genome. Most sequencing platforms come with alignment pipelines, but many third party aligners are commonly used, such as MAQ(68), Bowtie(69), BWA(70), and SOAP(71, 72). These packages differ by alignment algorithm, as well as how multi-aligning reads and gapped vs. ungapped alignments are handled, resulting in differences in sensitivity and specificity. For most cancer samples a gapped aligner is preferred to allow for the variety of genetic aberrations accumulated in the tumor.

Aligned reads are then analyzed to find enriched areas or ‘peaks’ in the genome, for which a number of ‘peak calling’ algorithms have been created(62, 73). Though the exact method varies between programs, most shift tags based on chromatin fragment size to accumulate tags near the true binding site and increase peak resolution(73). Regions of statistical enrichment of IP tags relative to a background control are calculated. The most commonly used control is input DNA isolated from the same chromatin batch as the ChIP(62). This reduces false positives introduced from fragmentation and mappability biases, and controls for genetic differences such as copy number alterations that could affect read density. Finally, peaks are filtered based on the presence of duplicate reads introduced from PCR, and uneven distributions of sense and antisense tag accumulation(73). Most current peak callers identify focal enrichments such as transcription factor binding sites, but some have been developed for broader marks like histone modifications associated with heterochromatin(74-76). Many groups are actively developing ways to reduce noise and increase true positives.
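The steps described above (duplicate removal, tag shifting, and a test of IP enrichment against an input control) can be sketched in a few functions. This is a toy illustration and not any published peak caller: fixed-width bins, a Poisson test against the scaled input count, the `+1` pseudocount, and every name and default value are assumptions chosen to keep the example short. Strand-balance filtering is omitted.

```python
import math
from collections import Counter


def shift_reads(reads, fragment_size):
    """Move each read's 5' end half a fragment toward the fragment centre."""
    half = fragment_size // 2
    return [pos + half if strand == "+" else pos - half for pos, strand in reads]


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summed directly from the tail."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total, i = 0.0, k
    while term > 0.0 and (i <= lam or term > total * 1e-16):
        total += term
        i += 1
        term *= lam / i
    return min(total, 1.0)


def call_peaks(ip_reads, input_reads, fragment_size=200, bin_size=200, alpha=1e-5):
    """Return (start, end, p_value) for bins enriched in IP over input.

    Reads are (five_prime_position, strand) tuples on a single chromosome.
    """
    ip = shift_reads(set(ip_reads), fragment_size)        # set() drops PCR duplicates
    ctrl = shift_reads(set(input_reads), fragment_size)
    if not ip or not ctrl:
        return []
    scale = len(ip) / len(ctrl)                           # library-size normalisation
    ip_bins = Counter(p // bin_size for p in ip)
    ctrl_bins = Counter(p // bin_size for p in ctrl)
    peaks = []
    for b in sorted(ip_bins):
        expected = (ctrl_bins[b] + 1) * scale             # +1 pseudocount (assumption)
        p_value = poisson_upper_tail(ip_bins[b], expected)
        if p_value < alpha:
            peaks.append((b * bin_size, (b + 1) * bin_size, p_value))
    return peaks
```

Because the expected count in each bin comes from the input library at the same location, a bin that is read-rich in both samples, for example within an amplified region, is not called as a peak.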

2.1.iv Applications of ChIP-seq to cancer epigenomics

The network of transcription regulatory factor interactions and their effects on gene expression in cancer are under investigation. ChIP-seq was initially used to profile T-cells, and since then a main focus has been on embryonic stem cells and cell lines(77-79). Current cancer epigenome research is focused on investigating how the epigenome remodels during tumor progression. Multidimensional epigenomic profiles of tumors also provide a novel means of sub-type classification, identifying prognostic markers, and insight into tumor cell of origin. ChIP-seq will also help the annotation and functional characterization of non-genic susceptibility loci, as has been recently performed in (80) and in GWAS studies(23). New techniques are being developed to perform ChIP-seq on a small number of cells, creating an opportunity to better analyze intratumoral heterogeneity of epigenomic patterns(81, 82). Finally, chromosome conformation capture (3C) technology(83) and its high-throughput derivatives (4C(84), 5C(85), Hi-C(86), ChIP-Loop(87, 88), ChIA-PET(89)) detect distal DNA-DNA interactions (e.g. promoter-enhancer), but can also be used to identify complex genomic rearrangements in cancers(90). Coupling ChIP with 3C technologies followed by sequencing will likely be a powerful way to study how both epigenetic patterns and associated structural interactions change during the process of tumorigenesis.

Recently, distinct chromatin states or ‘signatures’ comprised of combinatorial histone marks have been linked to specific functional genomic elements by integrating multiple ChIP-seq datasets across human cell lines(4, 23, 91). Analysis tools have been developed to isolate these recurring patterns of multiple histone modifications, such as those observed in cREs. One particular tool, ChromHMM, has been utilized by both the ENCODE and Roadmap Epigenome projects to segment the entire genome into a series of discrete chromatin states across a range of healthy and diseased cell types(23, 56). Like most other ChIP-seq analysis tools, chromHMM begins by binarizing the genome into ‘peaks’ of enrichment for each given histone modification (or transcription factor) individually. A Hidden Markov Model (HMM) is then applied to segment the genome into a pre-defined number of chromatin states. Each state is defined by a commonly occurring pattern of histone mark peaks that is observed throughout the genome(51). Once states are assigned, biologists can look at which marks are present and absent in each state, and use existing knowledge of these patterns to assign each state a function such as ‘active promoter’ or ‘poised enhancer’.

The strength of using chromHMM is that it is an unsupervised approach to defining cREs. The HMM algorithm will identify the most commonly occurring patterns of the provided histone modification data. This means it will pick up known histone mark combinations, or chromatin signatures, but it can also detect common patterns of histones and TF-binding that do not yet have an established function. Future experiments can be performed to derive the function of these novel chromatin signatures.
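The binarization step and the notion of a commonly occurring mark pattern can be sketched as follows. This is deliberately not an HMM: it only thresholds a toy per-bin signal for each mark and tallies the resulting presence/absence combinations, which is the kind of input that the HMM then groups into states. The mark list, the fixed `threshold`, and the toy signal values are assumptions made for illustration.

```python
from collections import Counter

MARKS = ("H3K4me3", "H3K4me1", "H3K27ac")

def binarize(signal, threshold):
    """Convert a per-bin signal track into 0/1 calls (1 = mark present)."""
    return [1 if value >= threshold else 0 for value in signal]

def mark_patterns(binary_tracks):
    """Return, for each genomic bin, the tuple of 0/1 calls ordered as MARKS."""
    n_bins = len(binary_tracks[MARKS[0]])
    return [tuple(binary_tracks[m][i] for m in MARKS) for i in range(n_bins)]

# Toy signal over four genomic bins (hypothetical values).
binary_tracks = {
    "H3K4me3": binarize([9, 0, 1, 8], threshold=5),
    "H3K4me1": binarize([1, 7, 6, 0], threshold=5),
    "H3K27ac": binarize([8, 6, 0, 9], threshold=5),
}
patterns = mark_patterns(binary_tracks)
pattern_counts = Counter(patterns)   # how often each combination occurs
```

In this toy data the H3K4me3 plus H3K27ac combination is the most frequent pattern, which would conventionally be read as an active promoter signature.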

2.2 IDENTIFYING SOMATIC MUTATIONS GENOME WIDE

Despite the growing body of evidence supporting the role of cRE mutations in cancer, the majority of cancer genetics research has focused on the ~2% of the genome which codes for protein. However, the advent of next generation sequencing allows for the sequencing of patients’ entire genomes at increasingly lower cost. We will likely reach the cost of $1,000 per whole-genome sequence (WGS) by the end of the decade, driving the era of precision medicine forward. Patient-specific WGS data will allow cancer geneticists to identify somatic mutations in exons as well as in functional non-coding genomic space, and begin to address the extent to which cRE mutations can drive sporadic cancers. To study the acquired genetic mutations of an individual tumor, WGS must be performed on both the tumor DNA and the patient’s ‘normal’ DNA. By analyzing the tumor and normal DNA as a patient-matched pair, investigators can effectively subtract out any normal (germline) genetic variation present in the patient, in order to properly identify the cancer-specific mutations. Below, I describe a computational pipeline to process and analyze matched tumor-normal WGS datasets to identify somatic mutations genome wide.
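The idea of subtracting germline variation from a tumor call set can be sketched with a simple set comparison. This is a conceptual illustration only: dedicated somatic callers evaluate tumor and normal reads jointly with a statistical model and do not simply subtract two independent call lists. The `Variant` record and the example coordinates are hypothetical.

```python
from typing import NamedTuple

class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str

def split_germline_somatic(tumor_calls, normal_calls):
    """Partition tumor variants by whether they are also seen in the matched normal."""
    normal = set(normal_calls)
    germline = [v for v in tumor_calls if v in normal]
    somatic = [v for v in tumor_calls if v not in normal]
    return germline, somatic

# Hypothetical calls from one patient-matched pair.
tumor = [Variant("chr1", 1000, "A", "G"), Variant("chr5", 2000, "C", "T")]
normal = [Variant("chr1", 1000, "A", "G")]
germline, somatic = split_germline_somatic(tumor, normal)
```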

2.2.i Sequencing alignment, filtering, and recalibration

The majority of human WGS data is being generated from Illumina’s sequencing technology. Each sequencing run from an Illumina machine generates millions of individual DNA sequences or ‘reads’ ranging from 50-150bp in length. The first step in analyzing these data is to align each of the reads to the human reference genome. There are multiple proprietary and open-source sequence aligners available, but Bowtie and BWA have become the most commonly used(70, 92). In this process, reads of low quality and reads that do not match any known sequence in the human reference genome are removed.

Once aligned, there are a series of post-alignment processing steps that are applied to improve data quality. Tools such as Picard Suite and the Genome Analysis Toolkit (GATK) can be utilized to remove duplicate reads, or reads that have the exact same sequence(93). These duplicate reads are problematic for downstream analysis, as it is unclear if they accurately represent an individual sequence, or if they are a byproduct of PCR amplification (required for sequencing library construction). Duplicate reads can be a particular challenge when analyzing enrichment-based NGS datasets such as ChIP-seq or DHS-seq.

Once duplicate reads have been removed, reads aligning to multiple locations in the genome, or ‘multi-mappers’, can be filtered with the same tool sets. A large fraction of the genome is made up of repeat sequences derived from retrotransposon duplications throughout evolution(94). A sequence read matching these repeat sections is difficult to accurately align and can cause false positive mutation calls due to mis-alignment. The current accepted strategy is to remove all multi-mappers prior to mutation calling, in order to increase specificity.
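The two filtering steps above can be sketched on a simplified read record. As assumptions for this illustration, a duplicate is taken to be any read sharing chromosome, start position, and strand with an earlier read, and a multi-mapper is any read whose `n_alignments` field is greater than one. The `Read` type is hypothetical and does not correspond to the record format of any specific tool.

```python
from typing import NamedTuple

class Read(NamedTuple):
    name: str
    chrom: str
    pos: int
    strand: str
    n_alignments: int   # number of genomic locations the read aligns to

def remove_duplicates(reads):
    """Keep the first read observed at each (chrom, pos, strand)."""
    seen = set()
    kept = []
    for read in reads:
        key = (read.chrom, read.pos, read.strand)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept

def remove_multi_mappers(reads):
    """Keep only reads that align to a single genomic location."""
    return [read for read in reads if read.n_alignments == 1]

reads = [
    Read("r1", "chr1", 100, "+", 1),
    Read("r2", "chr1", 100, "+", 1),   # duplicate of r1
    Read("r3", "chr1", 100, "-", 1),   # opposite strand, kept
    Read("r4", "chr2", 50, "+", 3),    # multi-mapper
]
filtered = remove_multi_mappers(remove_duplicates(reads))
```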

Finally, most somatic mutation detection tools use the base quality score (or Phred score) assigned to each nucleotide of each read, in order to assess confidence in each mutation call. It is commonly accepted to re-calibrate all base quality scores prior to mutation calling, as the Phred scores originally generated with the raw data can often be inflated or suffer from batch effects(93).
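For reference, a Phred score Q encodes the estimated probability p that a base call is wrong as Q = -10 log10(p). The short sketch below converts between the two and decodes a FASTQ quality string, assuming the Phred+33 ASCII encoding used by Sanger-format and recent Illumina FASTQ files. The function names are my own and the example quality string is arbitrary.

```python
import math

def phred_to_error_prob(q: float) -> float:
    """Probability that a base call with Phred score q is incorrect."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p: float) -> float:
    """Phred score corresponding to an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)

def decode_quality_string(qualities: str, offset: int = 33):
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in qualities]

scores = decode_quality_string("I5!")
```

A score of 30 therefore corresponds to a 1 in 1,000 chance of an incorrect base call, which is why inflated scores translate directly into overconfident mutation calls.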

2.2.ii Detecting single nucleotide mutations in WGS tumor-normal pairs

While many somatic mutation callers exist, Mutect is one of the most commonly used as it was designed at the Broad Institute and thus works seamlessly with other Broad NGS analysis tools such as GATK(95). As minimal input, Mutect requires one aligned tumor and one patient-matched normal sequencing file (see chapter 2.2.i), as well as a reference human genome sequence. However, Mutect can also take dbSNP and COSMIC files in order to annotate each called variant as having previously been identified as a SNP or logged in the COSMIC database(95).

Mutect scans each base in the genome where there is sufficient read coverage in both the tumor and normal sequencing file. At each of these positions it calculates A) the probability that the observed base is different from that of the reference genome and B) the probability that the observed base in the tumor is different from that observed in the normal. At the end of the analysis two output files are generated. One is a wiggle file that contains information about which bases in the genome Mutect had sufficient power to detect a mutation. This file is essential if mutation frequency analysis is the downstream goal, as one needs an accurate measure of ‘callable space’ in order to estimate mutation frequencies. The second file is a text file containing all the variant calls. Each row is a different variant call where Mutect determined the base call was statistically different from the reference genome sequence. A separate column indicates if the variant was determined to be somatic or already present in the germline. Thus, this one output file contains both normal genetic variants associated with a given patient, as well as the tumor-acquired somatic mutations. Mutect can be applied to Exome-seq datasets as well as WGS datasets. If WGS files are used as input, the resulting mutation calls can be intersected with genome-wide cRE definitions generated from ChIP-seq analysis in order to focus on the specific subset of cRE mutations.
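Two of the downstream calculations mentioned above, intersecting mutation calls with cRE definitions and normalizing mutation counts by callable space, can be sketched as follows. The sketch assumes cREs are supplied as non-overlapping, half-open (chrom, start, end) intervals and that the number of callable bases has already been summed from the coverage output. All names and coordinates are hypothetical.

```python
from bisect import bisect_right

def build_index(intervals):
    """Group sorted interval starts and ends by chromosome for binary search."""
    index = {}
    for chrom, start, end in sorted(intervals):
        starts, ends = index.setdefault(chrom, ([], []))
        starts.append(start)
        ends.append(end)
    return index

def overlaps(index, chrom, pos):
    """True if pos falls inside any interval on chrom (start inclusive, end exclusive)."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos) - 1   # last interval starting at or before pos
    return i >= 0 and pos < ends[i]

def mutations_per_mb(n_mutations, callable_bases):
    """Mutation frequency per megabase of callable sequence."""
    return n_mutations * 1_000_000 / callable_bases

cre_index = build_index([("chr5", 100, 200), ("chr5", 500, 600), ("chr1", 10, 20)])
mutations = [("chr5", 150), ("chr5", 300), ("chr1", 19)]
cre_mutations = [m for m in mutations if overlaps(cre_index, *m)]
```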

2.3 ANNOTATING CRE MUTATIONS FOR THEIR EFFECTS ON TF BINDING AND GENE EXPRESSION

While many strategies have been developed to filter functional from non-functional mutations in exonic regions, there is a lack of tools for predicting functional mutations in non-coding regions of the genome. Unlike exon mutations, the primary effect of functional cRE mutations is to alter (strengthen or weaken) TF binding affinity, and subsequently de-regulate target gene expression(19, 24, 26, 96, 97). The activity of a cRE not only depends on its sequence content, but on its chromatin status and the availability of specific TFs in a given cell type. This information must be taken into account when predicting if a cRE mutation is likely to have a functional outcome. It is thus critical that each dataset used for annotation and functional prediction of cRE mutations is generated from a cell or tissue type that closely matches that used to generate mutation calls and cRE definitions. By combining existing tools with in-house generated scripts, I have constructed a cRE mutation functional prediction computational pipeline that annotates each input mutation with 1) evolutionary conservation, and tissue-matched 2) open chromatin status, 3) TF binding prediction, 4) target gene expression, and 5) recurrence across multiple patient samples.

2.3.i Annotating cRE mutations for conservation and open-chromatin status

Epigenomically defined cREs, usually ranging from 200 bp to 3 kb in length, can be further subdivided into areas of open- and closed-chromatin with assays such as DNase-seq and ATAC-seq(98, 99). Mutations that fall within these subsets of exposed chromatin are more likely to overlap an area of TF-binding. Furthermore, although some cREs are not globally conserved, a nucleotide within a cRE that shows greater inter-species conservation is more likely than others to play a key role in TF-binding(41). Evolutionary conservation and open-chromatin status can both be easily obtained by running a list of mutation calls through the single nucleotide variant annotation package ANNOVAR(100). ANNOVAR can also be used to annotate mutation calls with any existing ENCODE datasets, including TF ChIP-seq and histone modification data.

2.3.ii Predicting the effects of cRE mutations on TF binding affinity

To predict whether a given mutation increases or decreases TF-binding, one needs to first determine whether the mutation position falls within a predicted binding sequence motif for a TF based on the neighboring DNA sequence, and then assess whether the mutation alters the predicted binding strength of the given TF. The challenge is that TF binding sites are inherently degenerate, meaning that somatic mutations can have a variable effect on binding affinity depending on the position of the nucleotide in the motif. This degeneracy is encoded in the form of a probability matrix called a Position Specific Scoring Matrix (PSSM). The PSSM of a specific TF is a 4xN matrix where each row is one of the four DNA nucleotides and each column is a position in the binding motif sequence of length N. The value of each matrix entry gives the probability at which a specific nucleotide is observed at a specific position within a binding site for the given TF (see methods section 4.7). PSSMs have been generated for the majority of human TFs and are obtainable through databases such as TRANSFAC and JASPAR(101, 102).

I have generated in-house scripts to scan each mutation for its predicted effects on TF binding affinity using PSSMs. For each mutation, the neighboring DNA sequence is scanned to predict which PSSMs likely match both the mutated and wild-type sequence. For each PSSM scan, the wild-type PSSM score is subtracted from the mutant PSSM score, generating a list of TFs that have differential binding predictions for each mutation. Although a TF motif may come up as being largely affected by a mutation, it should not be considered for functional annotation purposes if that TF is not expressed in the cell type of interest. A strength of the computational pipeline is that one can restrict the TF search to only those that are expressed in the cell type of interest.
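The mutant-minus-wild-type comparison can be illustrated with a minimal sketch. It is not the in-house script: as assumptions, it scores a sequence window as a log-odds sum against a uniform background with a small pseudocount, scans only the forward strand, and takes the best-scoring window overlapping the mutated position. The toy four-base motif and the example sequence are hypothetical, and the scoring used in this work (methods section 4.7) may differ.

```python
import math

BASES = "ACGT"

def make_pssm(consensus, hit=0.97):
    """Toy 4xN probability matrix favoring one consensus base per position."""
    miss = (1 - hit) / 3
    return {b: [hit if b == c else miss for c in consensus] for b in BASES}

def log_odds(pssm, window, background=0.25, pseudo=1e-3):
    """Sum of per-position log2 odds of the window under the PSSM vs. background."""
    return sum(math.log2((pssm[b][i] + pseudo) / background)
               for i, b in enumerate(window))

def best_score(pssm, seq, mut_index):
    """Best log-odds score over all motif-length windows covering mut_index.

    Assumes seq is at least as long as the motif.
    """
    n = len(pssm["A"])
    first = max(0, mut_index - n + 1)
    last = min(mut_index, len(seq) - n)
    return max(log_odds(pssm, seq[s:s + n]) for s in range(first, last + 1))

def score_change(pssm, wt_seq, mut_index, alt):
    """Mutant score minus wild-type score for a single-base substitution."""
    mut_seq = wt_seq[:mut_index] + alt + wt_seq[mut_index + 1:]
    return best_score(pssm, mut_seq, mut_index) - best_score(pssm, wt_seq, mut_index)

ets_like = make_pssm("GGAA")                          # hypothetical ETS-like core
delta = score_change(ets_like, "TTGGAGTT", 5, "A")    # G>A creates GGAA
```

A positive `delta` indicates a predicted gain of binding and a negative value a predicted loss; restricting the set of PSSMs to TFs expressed in the tissue of interest is applied before this step.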

2.3.iii Identifying recurrently mutated cRE mutations across patients

Another method to enrich for functional mutations, coding or non-coding, is to determine which mutations commonly occur across a patient cohort more often than would be expected by chance. This method enriches for mutations that are positively selected for throughout the tumor’s evolution, due to the mutation’s beneficial phenotype on the cancer cell. Many driver oncogenes and tumor suppressor genes (TSGs) have been identified with this method such as , BRAFV600E, and EGFR(14, 103).

Here, we evaluate recurrent cRE mutations in two ways. The first is to evaluate cRE mutations on an individual nucleotide level. If the exact same genomic position is mutated across many tumor patients, it is strong evidence that the mutation is essential for tumorigenesis. Examples of coding mutations identified this way are BRAFV600E in melanoma, and IDH1 in low-grade gliomas(104, 105). However, some genes can be mutated at various positions throughout the gene body and still result in the same phenotype. This is particularly true for TSGs and examples of this type of recurrent mutation can be observed in p53, PTEN, and EGFR (14, 57). Thus, we not only evaluate recurrent cRE mutations on a single-nucleotide level, but investigate them on a cRE level as well. As cREs vary greatly in length, it is critical to normalize each cRE by its total length when evaluating which cREs are recurrently mutated across patients.
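Both levels of recurrence can be sketched in a few lines. The sketch assumes mutations are given as (patient, chrom, pos) tuples and cREs as named half-open intervals, and it reports cRE-level recurrence both as a raw patient count and as patients per kilobase. The per-kilobase normalization, the function names, and the toy data are illustrative assumptions and not the exact statistic used in this work.

```python
from collections import defaultdict

def recurrent_positions(mutations):
    """Map (chrom, pos) to the number of distinct patients mutated there, if > 1."""
    patients = defaultdict(set)
    for patient, chrom, pos in mutations:
        patients[(chrom, pos)].add(patient)
    return {site: len(p) for site, p in patients.items() if len(p) > 1}

def cre_recurrence(mutations, cres):
    """Map cRE name to (patients mutated, patients mutated per kb of cRE)."""
    hit = defaultdict(set)
    for patient, chrom, pos in mutations:
        for name, (c, start, end) in cres.items():
            if c == chrom and start <= pos < end:
                hit[name].add(patient)
    return {name: (len(hit[name]), 1000 * len(hit[name]) / (end - start))
            for name, (c, start, end) in cres.items()}

# Hypothetical data: two patients share one position; a third hits the same cRE.
mutations = [("P1", "chr5", 1250), ("P2", "chr5", 1250), ("P3", "chr5", 1900),
             ("P1", "chr7", 40)]
cres = {"cRE_A": ("chr5", 1000, 2000), "cRE_B": ("chr7", 0, 4000)}
by_position = recurrent_positions(mutations)
by_cre = cre_recurrence(mutations, cres)
```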

2.3.iv Linking cREs to their target genes

The final step in the annotation pipeline is to link cREs to their most likely target gene (or genes). This is simple for promoters as by definition, promoters lie directly proximal to the gene TSS they regulate. Enhancers however can theoretically regulate a gene from any distance in the genome, posing a more difficult challenge in assigning each enhancer to its proper target. It has been observed that the majority of enhancer-target gene pairs are within one megabase of each other(106). Furthermore, many enhancers regulate the TSS nearest to them, regardless of whether or not the enhancer is positioned upstream or downstream of the TSS. We thus utilized the nearest TSS approach in order to properly assign all promoters and most enhancers to their correct target genes. To prioritize cREs likely driving expression changes in cancer, one can integrate tissue-matched gene expression data (RNA-seq) or tumor-normal differential expression data. cREs are prioritized if their linked target gene is expressed in tumor, or differentially expressed between tumor and normal.
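The nearest-TSS assignment can be sketched with a binary search over sorted TSS positions. As assumptions for this illustration, distance is measured from the cRE midpoint, assignments farther than one megabase are rejected, and TSSs are supplied per chromosome as sorted (position, gene) pairs. The gene names and coordinates are hypothetical.

```python
from bisect import bisect_left

def nearest_tss(tss_by_chrom, chrom, start, end, max_dist=1_000_000):
    """Return the gene whose TSS is closest to the cRE midpoint, or None."""
    sites = tss_by_chrom.get(chrom, [])      # sorted list of (position, gene)
    if not sites:
        return None
    mid = (start + end) // 2
    i = bisect_left(sites, (mid, ""))        # first TSS at or after the midpoint
    candidates = sites[max(0, i - 1):i + 1]  # flanking TSSs on either side
    pos, gene = min(candidates, key=lambda site: abs(site[0] - mid))
    return gene if abs(pos - mid) <= max_dist else None

tss_by_chrom = {"chr1": [(1_000, "GENE_A"), (500_000, "GENE_B")]}
promoter_target = nearest_tss(tss_by_chrom, "chr1", 900, 1_100)
enhancer_target = nearest_tss(tss_by_chrom, "chr1", 400_000, 400_200)
orphan_target = nearest_tss(tss_by_chrom, "chr1", 3_000_000, 3_000_200)
```

The resulting cRE-to-gene links can then be filtered by whether the linked gene is expressed, or differentially expressed, in the matching tissue.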

2.4 APPLICATIONS OF THE FRAMEWORK

Although the computational framework was initially designed to identify and prioritize functional cRE mutations in cancer, subcomponents of the framework can be utilized to aid in addressing various other cancer genomics and epigenomics questions. Below are four independent examples of applications of the framework, highlighting its strength and flexibility. Each one of these applications helped advance a project to publication.

2.4.i Identifying active promoters and enhancers in normal brain and GBM

Gliomas display a set of epigenetic alterations that play a central role in tumorigenesis and response to therapy(15). For example, approximately 20-30% of GBM patients display promoter methylation and silencing of MGMT, a gene critical for the proper DNA repair of alkylating lesions(16). When treated with the standard of care, patients harboring this epimutation are more likely to survive significantly longer. Additionally, demethylation and transcriptional activation of the putative oncogene MAGEA1 is associated with a severely hyperproliferative subset of GBM(107, 108). However, no one has systematically investigated promoter and enhancer activity changes between GBM and normal brain genome wide.

To achieve this goal, we performed histone ChIP-seq on four primary GBM tissue samples and applied the cRE identification portion of the framework (chapter 2.1). The same histone ChIP-seq datasets had been generated for adult normal brain (NB) tissues by the Roadmap Epigenome project (56). These datasets were downloaded and analyzed in parallel with the GBM ChIP-seq data resulting in promoter and enhancer state definitions specific to NB, GBM, or active in both (Figure 2.1, Table 2.1). By integrating the cRE definitions with DNA methylation data generated by the Costello lab, we were able to identify areas of recurrent hypomethylation falling within promoters. Strikingly, many of these recurrently hypomethylated sites occur within previously unannotated alternative promoters that were identified de-novo by the ChIP-seq pipeline. This work identified TERT, GLI3, and TP73 as all containing gene-body alternative promoters which undergo recurrent hypomethylation in primary GBM. The full body of this work can be read in the journal Genome Research:

Nagarajan, R.P., Zhang, B., Bell, R.J.A., Johnson, B.E., Olshen, A.B., Sundaram, V., Li, D., Graham, A.E., Diaz, A., Fouse, S.D., Smirnov, I., Song, J., Paris, P.L., Wang, T., and Costello, J.F., Recurrent epimutations activate gene body promoters in primary glioblastoma, Genome Research 24, 761–774 (2014).

We also applied our GBM cRE definitions to a separate study investigating DNA methylation changes associated with progression from primary low grade glioma to high grade secondary GBM. The lead investigator of this project discovered that recurrences of high grade (WHO grade III and IV) display a common signature of DNA hypomethylation when compared to recurrences of low grade (WHO grade I and II). When we integrated our cRE definitions for GBM, we found that many of these high-grade recurrence-associated hypomethylation sites are found within GBM active enhancers. This implies that the transition from a low grade to a high-grade glioma relies on convergent changes in specific enhancer activity driven by epimutations. The full body of this work can be read in the journal Cancer Cell:

Mazor, T., Pankov, A., Johnson, B.E., Hong, C., Hamilton, E.G., Bell, R.J.A., Smirnov, I.V., Reis, G.F., Phillips, J.J., Barnes, M.J., Idbaih, A., Alentorn, A., Kloezeman, J.J., Lamfers, M.L.M., Bollen, A.W., Taylor, B.S., Molinaro, A.M., Olshen, A.B., Chang, S.M., Song, J.S., Costello, J.F. et al., DNA methylation and somatic mutations converge on cell cycle and define similar evolutionary histories in brain tumors, Cancer Cell. In press (2015).

2.4.ii Identifying functional splicing mutations in neuroblastoma

Aside from gene deregulation at a transcriptional level, transcript isoform control via alternative splicing mechanisms can also play a role in driving cancer(109). Genetic mutations within cis-motifs that regulate splicing can have a tremendous effect on isoform usage and can even lead to de-novo transcript generation(109). Although originally designed to identify functional cRE mutations, the mutation-calling portion of the computational framework can be used to identify any type of single-nucleotide mutation ranging from exonic to non-coding. We utilized the mutation-calling pipeline to identify GBM mutations residing in splice-site motifs of genes.

Using an unbiased in vivo analysis of a mouse model of neuroblastoma, the lead investigator of this project identified splicing quantitative trait loci (sQTLs). These sQTLs were used to discover novel genes affecting splicing control, as well as determine novel cis-acting splice site DNA motifs found within the introns of genes. The genotype of these novel intronic splicing motifs was found to associate with specific isoform usage of their nearby genes. Furthermore, these motifs were found in the introns of both mouse and human genes, supporting the theory that they are functionally conserved across species to regulate splicing events.

Given their functional conservation, we wanted to test whether any of these de-novo splicing motifs were subject to recurrent somatic mutation in cancer. To address this, we downloaded WGS datasets of tumor and normal DNA from 40 primary GBM patients generated by The Cancer Genome Atlas project (TCGA)(57). These datasets were applied to the mutation detection portion of our computational framework (chapter 2.2) in order to determine germline variations and somatic mutations genome-wide (Figure 2.2).

Once identified, we intersected the germline and somatic variant calls with known splicing motif locations and identified three candidate genes, LOC100505811, TRAM2-AS1, and NPIPA1, with enrichment of somatic mutations compared to germline variants in intronic splicing motifs (Table 2.2). Furthermore, by integrating RNA-seq data generated from the same sample cohort we determined that the recurrent splicing mutations in TRAM2-AS1 and NPIPA1 were associated with significant differential exon usage. This work highlights the flexibility with which the computational pipeline can be applied to identify recurrent non-coding mutations outside of promoters and enhancers. The full body of this work can be read in the journal Cancer Discovery:

Chen, J., Hackett, C.S., Zhang, S., Song, Y.K., Bell, R.J.A., Molinaro, A., Quigley, D.A., Balmain, A., Song, J.S., Costello, J.F., Gustafson, W.C., Van-Dyke, T., Kwok, P., Khan, J., and Weiss, W.A., The genetics of splicing in neuroblastoma, Cancer Discovery 5, 380–395 (2015).

2.4.iii Annotating functional single-nucleotide changes in enhancer assays

The functional annotation portion of this computational framework is not restricted to cancer mutations, but can be expanded to germline variants or experimental variation used to dissect cRE function. For example, to determine which nucleotides in a cRE are functionally relevant, one would historically clone the cRE into a luciferase reporter plasmid. The importance of individual nucleotides can then be queried by mutating these positions one at a time and testing their effects on reporter gene expression. This method is time consuming and costly as each mutant must be generated and tested individually.

Next-generation methods have now been developed which use high-throughput sequencing to assay the functional effects of thousands of single nucleotide changes en masse. This massively parallel reporter assay (MPRA) has changed the way in which investigators can dissect the regulatory grammar of cREs(45, 110). For a given experiment, the output is a list of single nucleotide variants within a cRE, each associated with an effect size on reporter activity. While the majority of nucleotide changes result in little or no change, some result in significant increases or decreases of cRE activity. To determine if these functional nucleotide changes correspond to predicted changes in TF binding ability we applied the annotation portion of the pipeline to the MPRA data.

The lead investigator of this project applied the MPRA to three liver enhancers located within the genes SORL1, TRAF3IP2, and PPARG respectively. All tested single-nucleotide variants were annotated with our annotation pipeline (chapter 2.3) to determine which are predicted to have an effect on TF binding. Since the MPRA was performed in a liver cell experimental system, we restricted our TF-PSSM annotations to those TFs expressed in liver. We observed a striking overlap between predicted liver-associated TFBS and clusters of nucleotide positions with significant effect sizes upon mutation. For example, in the SORL1 enhancer, nucleotide changes computationally predicted to affect TFBS affinity were more likely to have a significant effect size (Figure 2.3, Fisher’s exact test P-value < 2.2e-16). This work demonstrates that predicted changes in TF binding often relate to observed changes in cRE activity, and highlights the importance of restricting the TFBS analysis to tissue-expressed TFs. This pipeline can also be used to enrich for functional cRE mutations occurring naturally in cancer patients. The full body of this work can be read in the journal PLoS Genetics:

Birnbaum, R.Y., Patwardhan, R.P., Kim, M.J., Findlay, G., Martin, B., Zhao, J., Bell, R.J.A., Smith, R.P., Ku, A.A., Shendure, J., Ahituv, N., Systematic Dissection of Coding Exons at Single Nucleotide Resolution Supports an Additional Role in Cell-Specific Transcriptional Regulation, PLoS Genet 10, e1004592 (2014).

2.4.iv Identifying TERT promoter mutations in GBM

By intersecting the genome-wide somatic mutation calls generated from TCGA patients (chapter 2.4.ii) with the GBM- and normal brain- (NB) defined promoters and enhancers (chapter 2.4.i), we identified all promoter and enhancer mutations in a cohort of 40 GBMs. All identified cRE mutations were annotated for conservation, ENCODE DHS, and predicted effects on TF binding. The mutations were then analyzed on a single nucleotide and cRE level to identify recurrently mutated cREs across patients (Figure 2.4). The most common cRE mutation was detected in the promoter of the TERT gene, which was epigenetically activated in ¼ of GBM ChIP-seq samples. The most likely TF to have binding affected by these TERT promoter mutations was SPI-1 (one of the ETS TFs), consistent with other publications on TERT promoter mutations (Table 2.3)(32, 33).

It is now known that TERT promoter mutations are the most common genetic mutations in many cancers including adult glioblastoma (83%) and hepatocellular carcinoma (47%). Cells harboring these mutations reactivate TERT gene expression and telomerase activity, thus maintaining telomere length and becoming effectively immortalized. Our ability to detect this mutation with our analysis framework helps to validate our approach. As TERT promoter mutations are the most common mutation in multiple cancers, and the top hit from our genome-wide survey in GBM, we decided to pursue its molecular mechanism of activation and corroborate the computational predictions of TF binding alterations. The full body of this work can be read in Chapter 3.

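FIGURE 2.1

[Figure image not reproduced. Panel A: Venn diagram of GBM vs. Normal Brain, region counts as extracted: 54152, 39027, 5956. Panel B: Venn diagram of GBM vs. Normal Brain, region counts as extracted: 101430, 65143, 13121. Panel C: heatmap of H3K4me1, H3K27Ac, and H3K4me3 across chromatin states labeled Active promoter, Heterochromatin, Poised enhancer, Active enhancer, and Low H3K27Ac.]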

Figure 2.1: cRE calls identified by epigenomic profiling. (A, B) Venn diagrams of promoters (A) and enhancers (B) that are unique to GBM, unique to normal brain, or common between the two. (C) The five chromatin states generated by chromHMM when analyzing H3K4me3, H3K4me1, and H3K27Ac from GBM and normal brain. Darker blue represents a higher probability of observing a given histone modification at a given state. States one and four were chosen to define active promoters and enhancers respectively.

FIGURE 2.2

[Figure image not reproduced. Dot plot with y-axis "Mutations" (ticks at 50, 200, 1000, 5000) and x-axis categories "Genome-wide" and "Exons".]

Figure 2.2: Mutations called from whole-genome sequencing of 40 GBM patients. The total number of somatic mutations called by Mutect genome-wide (red) and within exons (blue). Each dot represents an individual patient.

FIGURE 2.3

[Figure image not reproduced. Density plot with y-axis "Density" (0.0 to 1.2) and x-axis "Log2 Effect Size" (−1.5 to 1.5).]

Figure 2.3: Nucleotide changes in the SORL1 enhancer predicted to affect TF-binding often have a larger effect size in reporter activity. A density plot of the effect size of each nucleotide change is shown. Nucleotides computationally predicted to alter liver TF-binding are shown in red, and the rest are shown in black.

FIGURE 2.4

[Figure image not reproduced. Scatter plot with y-axis "Patients Mutated" (2 to 12) and x-axis "log2 cRE length (bp)" (8 to 16); the top point is labeled TERT.]

Figure 2.4: Recurrently mutated cREs in 40 GBM patients. Each cRE is plotted as a function of its length and the number of patients in which it is independently mutated. Darker points represent overlapping data points. The TERT promoter is the most recurrently mutated cRE.

TABLE 2.1

Samples Generated in the Costello Lab

data type  patient  sample type  total_reads  pass filter reads  aligned reads  duplication rate
ChIP-seq   GBM01    H3K4me3      54,948,899   47,335,427         28,357,473     0.401
ChIP-seq   GBM01    H3K4me1      69,763,200   61,766,081         54,764,288     0.113
ChIP-seq   GBM01    H3K27ac      54,932,406   46,007,279         39,185,972     0.148
ChIP-seq   GBM01    Input        68,292,233   55,286,557         48,595,209     0.121
ChIP-seq   GBM02    H3K4me3      52,140,050   34,684,499         24,625,383     0.290
ChIP-seq   GBM02    H3K4me1      95,979,196   80,753,283         74,806,291     0.074
ChIP-seq   GBM02    H3K27ac      87,286,516   59,692,832         53,612,453     0.102
ChIP-seq   GBM02    Input        58,980,778   49,218,006         33,163,322     0.326
ChIP-seq   GBM03    H3K4me3      92,323,976   71,232,629         36,767,363     0.484
ChIP-seq   GBM03    H3K4me1      92,987,428   81,926,356         76,983,287     0.060
ChIP-seq   GBM03    H3K27ac      79,579,916   63,513,796         57,855,398     0.089
ChIP-seq   GBM03    Input        59,823,120   48,949,192         47,381,335     0.032
ChIP-seq   GBM04    H3K4me3      52,609,302   43,095,282         29,792,539     0.309
ChIP-seq   GBM04    H3K4me1      92,265,478   75,900,287         70,665,228     0.069
ChIP-seq   GBM04    H3K27ac      55,631,920   39,930,644         36,287,616     0.091
ChIP-seq   GBM04    Input        62,087,390   51,614,061         48,702,088     0.056

Samples downloaded from REMC

data type  patient                 sample type  downloaded reads
ChIP-seq   Anterior_Caudate        H3K4me3      35,616,706
ChIP-seq   Anterior_Caudate        H3K4me1      39,992,528
ChIP-seq   Anterior_Caudate        H3K27ac      26,285,811
ChIP-seq   Anterior_Caudate        Input        32,718,096
ChIP-seq   Cingulate_Gyrus         H3K4me3      34,477,121
ChIP-seq   Cingulate_Gyrus         H3K4me1      34,535,603
ChIP-seq   Cingulate_Gyrus         H3K27ac      34,537,642
ChIP-seq   Cingulate_Gyrus         Input        35,971,424
ChIP-seq   Mid_Frontal_Lobe        H3K4me3      37,296,152
ChIP-seq   Mid_Frontal_Lobe        H3K4me1      37,339,589
ChIP-seq   Mid_Frontal_Lobe        H3K27ac      34,362,604
ChIP-seq   Mid_Frontal_Lobe        Input        36,503,506
ChIP-seq   Hippocampus_Middle      H3K4me3      35,625,648
ChIP-seq   Hippocampus_Middle      H3K4me1      34,257,403
ChIP-seq   Hippocampus_Middle      H3K27ac      22,634,126
ChIP-seq   Hippocampus_Middle      Input        34,233,676
ChIP-seq   Inferior_Temporal_Lobe  H3K4me3      31,124,677
ChIP-seq   Inferior_Temporal_Lobe  H3K4me1      31,418,935
ChIP-seq   Inferior_Temporal_Lobe  H3K27ac      27,823,926
ChIP-seq   Inferior_Temporal_Lobe  Input        27,992,918

Table 2.1 GBM and Normal brain ChIP-seq read quality control statistics

TABLE 2.2

Gene name     GBM splicing motif GVs  GBM splicing motif SMs
DYNLRB1       1-chr20:33104296 C/G    2-chr20:33121706 C>T
NPIPA1        1-chr16:15040359 G/A    4-chr16:15040359 G>A; 1-chr16:15031765 G>C
LOC100505811  0                       3 chr5:117618623 C>T
TRAM2-AS1     0                       3 chr6:52445058 T>G
AIMP1         0                       2-chr4:107260426 G>T
PRDM12        0                       2-chr9:133545135 G>A
TIGD7         0                       2-chr16:3353982 A>G
FBXO45        0                       2-chr3:196310237 C>T
SLC4A1        0                       2-chr17:42342896 C>A
ARHGAP29      1-chr1:94677742 G/C     2-chr1:94678897 G>A
LOC100128006  1-chr17:12681207 G/A    1-chr17:12679656 T>G; 1-chr17:12673734 C>T
LOC440896     1-chr9:69180408 G/A     1-chr9:69179798 T>C; 1-chr9:69180519 C>G
STXBP5        1-chr6:147648825 G/C    2-chr6:147693778 T>C
TNNI2         1-chr11:1860346 T/G     1-chr11:1860346 T>G; 1-chr11:1862029 G>A

Table 2.2 Recurrent genes with enriched splicing motif mutations

Table is modified from: J. Chen et al., The genetics of splicing in neuroblastoma, Cancer Discovery 5, 380–395 (2015).

TABLE 2.3

Mutation  Mark               wt_Score  mut_Score  score change  Relative Entropy
G228A     SPI-1              0.3458    1.1602      0.8144       0.8644
G228A     E74A               0.5950    1.2971      0.7021       1.0211
G228A     M00469.AP-2alpha   0.9320    0.3289     -0.6031       0.7069
G228A     AP2alpha           0.9320    0.3289     -0.6031       0.7069
G228A     M00932.Sp-1        0.2748    0.7863      0.5115       0.6897
G228A     NRF-2              0.5182    1.0097      0.4915       0.9667
G228A     Elk-1              0.4210    0.8850      0.4640       0.6382
G228A     M00492.STAT1       0.4416    0.8781      0.4365       0.7090
G228A     LeitMotif.ETS      0.6276    1.0573      0.4296       1.0343
G228A     M00496.STAT1       0.1777    0.5999      0.4221       0.4443
G228A     M00499.STAT5A      0.2108    0.6250      0.4142       0.5269
G228A     M00189.AP-2        0.7518    0.3410     -0.4108       0.7037
G228A     M00771.ETS         0.3289    0.7153      0.3863       0.6164
G228A     M00777.STAT        0.2762    0.6542      0.3780       0.5306
G228A     M00500.STAT6       0.2416    0.6164      0.3748       0.6040
G228A     M00446.Spz1        0.6033    0.2354     -0.3679       0.4638
G228A     M00025.Elk-1       0.2187    0.5606      0.3419       0.5223
G228A     M00497.STAT3       0.4083    0.7486      0.3404       0.4636
G228A     M00450.Zic3        0.6044    0.3020     -0.3024       0.5404
G228A     M00915.AP-2        0.7053    0.4035     -0.3018       0.6592
G228A     M00470.AP-2gamma   0.9420    0.6490     -0.2930       0.7691
G228A     M00449.Zic2        0.5464    0.2948     -0.2517       0.4956
G228A     M00484.Ncx         0.3021    0.5426      0.2405       0.3308
G228A     M00986.Churchill   0.8808    0.7693     -0.1115       0.8483
G228A     M00933.Sp-1        0.9910    0.8828     -0.1082       0.8626
G228A     M00196.Sp1         0.8062    0.7126     -0.0935       0.7413
G228A     M00395.HOXA3       0.3223    0.2353     -0.0870       0.2689
G228A     M00794.TTF-1       0.4048    0.4781      0.0733       0.4499
G250A     SPI-1              0.3458    1.1602      0.8144       0.8644
G250A     M00986.Churchill   1.0487    0.3393     -0.7093       0.8483
G250A     E74A               0.5950    1.2971      0.7021       1.0211
G250A     LeitMotif.ETS      0.4576    1.0573      0.5996       1.0343
G250A     NRF-2              0.5182    1.0097      0.4915       0.9667
G250A     M00492.STAT1       0.4390    0.8781      0.4391       0.7090
G250A     M00496.STAT1       0.1777    0.5999      0.4221       0.4443
G250A     M00499.STAT5A      0.2108    0.6250      0.4142       0.5269
G250A     M00500.STAT6       0.2489    0.6164      0.3675       0.6040
G250A     M00025.Elk-1       0.2265    0.5776      0.3511       0.5223
G250A     M00497.STAT3       0.4338    0.7486      0.3149       0.4636
G250A     Elk-1              0.5918    0.8992      0.3074       0.6382
G250A     M00378.Pax-4       0.4915    0.1862     -0.3053       0.4654
G250A     M00484.Ncx         0.2419    0.5426      0.3007       0.3308
G250A     M00016.E74A        0.3542    0.6434      0.2891       0.5615
G250A     M00449.Zic2        0.7110    0.4592     -0.2518       0.4956
G250A     M00448.Zic1        0.6298    0.4550     -0.1748       0.5159
G250A     M00395.HOXA3       0.2762    0.1132     -0.1630       0.2689

Table 2.3 TFs predicted to have differential binding at the TERT promoter mutation sites. The relative entropy is the average binding score for a given true binding sequence.
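The score change column in Table 2.3 appears to be the mutant score minus the wild-type score. The sketch below assumes that definition, recomputes the change for a few rows copied from the table, and ranks them by magnitude. The function names are illustrative and are not taken from the original analysis.

```python
# Illustrative subset of Table 2.3 rows: (mutation, motif, wt_score, mut_score).
ROWS = [
    ("G228A", "SPI-1", 0.3458, 1.1602),
    ("G228A", "M00469.AP-2alpha", 0.9320, 0.3289),
    ("G228A", "NRF-2", 0.5182, 1.0097),
    ("G250A", "M00378.Pax-4", 0.4915, 0.1862),
]


def score_change(wt_score, mut_score):
    """Assumed definition: mutant score minus wild-type score."""
    return round(mut_score - wt_score, 4)


def rank_by_effect(rows):
    """Sort rows by the magnitude of the predicted change, largest first."""
    scored = [(mut, motif, score_change(wt, mt)) for mut, motif, wt, mt in rows]
    return sorted(scored, key=lambda row: abs(row[2]), reverse=True)


for mutation, motif, change in rank_by_effect(ROWS):
    direction = "gained" if change > 0 else "lost"
    print(f"{mutation}  {motif:<18} {change:+.4f}  ({direction})")
```

A positive change marks a motif predicted to gain binding at the mutant sequence, and a negative change marks a predicted loss.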


CHAPTER 3.

THE TRANSCRIPTION FACTOR GABP SELECTIVELY

BINDS AND ACTIVATES THE MUTANT TERT

PROMOTER IN CANCER

3.1 ABSTRACT

Reactivation of telomerase reverse transcriptase (TERT) expression enables cells to overcome replicative senescence and escape apoptosis, fundamental steps in the initiation of human cancer. Multiple cancer types, including up to 83% of glioblastomas (GBM), harbor highly recurrent TERT promoter mutations of unknown function but specific to two nucleotide positions. We identify the functional consequence of these mutations in GBM to be recruitment of the multimeric GABP transcription factor specifically to the mutant promoter. Allelic recruitment of GABP is consistently observed across four cancer types, highlighting a shared mechanism underlying TERT reactivation. Tandem flanking native ETS motifs critically cooperate with these mutations to activate TERT, likely by facilitating GABP heterotetramer binding. GABP thus directly links TERT promoter mutations to aberrant expression in multiple cancers.

3.2 MAIN TEXT

The human telomerase is an enzyme critical for maintaining telomere length and chromosomal stability in stem cells(111, 112). The transcriptional regulation of the telomerase reverse transcriptase (TERT) gene, encoding the catalytic subunit of telomerase, is a rate-limiting step in modulating telomerase activity(113). Although normally silenced in somatic cells, TERT is aberrantly expressed in 90% of aggressive cancers, highlighting this event as a hallmark of tumorigenesis(114-116). Reactivating telomerase helps cells with a finite lifespan to achieve limitless proliferative potential and bypass cellular senescence induced by DNA replication-associated telomere shortening. Understanding the mechanisms of aberrant TERT expression thus represents a crucial outstanding problem in cancer research.

Recently discovered non-coding mutations in the TERT promoter are among the most common genetic alterations observed across multiple cancer types, revealing a potentially causal biological mechanism driving increased telomerase activity in tumors(32, 33, 37). Specifically, one of two positions, G228A or G250A, is mutated in 20% of medulloblastomas(34), 44% of hepatocellular carcinomas (HCC)(35), 66% of urothelial carcinomas of the bladder(36), 71% of melanomas(32, 33), and 83% of primary glioblastomas (GBM)(37), making them the most recurrent single-nucleotide mutations observed in these cancer types. Both the G228A and G250A mutations are associated with increased TERT expression (Figure 3.1) and telomerase activity(117), and have prognostic power in bladder cancer and GBM(118-120). Both G>A transitions generate an identical 11bp sequence which is hypothesized to generate a de novo binding site for an ETS transcription factor(32). Despite these compelling findings and the central importance of TERT in human cancer, the precise function of the mutations has remained elusive since their initial discovery in melanoma patients.

To determine whether the de novo ETS motif is necessary for mutant TERT activation, we performed site-directed mutagenesis of the core TERT promoter. The G228C, G250C, and G250T mutations did not increase promoter activity, highlighting the requirement for the G>A transition for TERT activation (Figure 3.2A). Furthermore, removing the ETS motif while retaining the G228A mutation (A227T, G228A) resulted in a complete reduction of promoter activity to wild-type levels. Interestingly, the G228T mutation also partially increased promoter activity; this induction is consistent with the site being the second adenine position in an ETS motif, a position that is often degenerate for A/T(121). Mutating the second adenine position to thymine in the context of G250A (G250A, A251T) resulted in a similar intermediate level of promoter activity.
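The mutagenesis logic above can be sketched as a simple sequence check. The 9-mers are the construct labels printed in Figure 3.2A; assigning the two wild-type labels to the 228 and 250 sites, the index positions, and the three-way classification are assumptions made for illustration, not the assay itself.

```python
# 9-mers as labelled in Figure 3.2A, written on the strand where the
# mutations read as G>A. Index 0 is the first C of "CCGG".
# Assumption: which wild-type label belongs to which site is inferred
# from the single-base differences between constructs.
WT_228 = "CCGGAGGGG"  # A227 at index 4, G228 at index 5
WT_250 = "CCGGGAGGG"  # G250 at index 4, A251 at index 5


def substitute(seq, edits):
    """Return seq with the {index: base} substitutions applied."""
    bases = list(seq)
    for index, base in edits.items():
        bases[index] = base
    return "".join(bases)


def classify(seq):
    """Hypothetical three-way call based on the first six bases."""
    hexamer = seq[:6]
    if hexamer == "CCGGAA":
        return "ETS motif"
    if hexamer == "CCGGAT":  # second adenine replaced by thymine
        return "degenerate ETS motif"
    return "no ETS motif"


CONSTRUCTS = {
    "G228A": substitute(WT_228, {5: "A"}),
    "G228T": substitute(WT_228, {5: "T"}),
    "G228C": substitute(WT_228, {5: "C"}),
    "A227T/G228A": substitute(WT_228, {4: "T", 5: "A"}),
    "G250A": substitute(WT_250, {4: "A"}),
    "G250T": substitute(WT_250, {4: "T"}),
    "G250C": substitute(WT_250, {4: "C"}),
    "A251T/G250A": substitute(WT_250, {4: "A", 5: "T"}),
}

for name, seq in CONSTRUCTS.items():
    print(f"{name:<12} {seq}  {classify(seq)}")
```

Under these assumptions the constructs that carry CCGGAA are the two that fully activate the promoter, and the two CCGGAT constructs are the ones reported with intermediate activity.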

An siRNA screen of 13 ETS factors expressed in GBM revealed 5 ETS factors (ELF1, ETS1, ETV3, ETV4, GABPA) whose knockdown reduced TERT expression in at least one of two GBM cell lines harboring TERT promoter mutations (Figure 3.2B, Figure 3.3, and Figure 3.4). Only three factors (ETS1, ETV3, and GABPA) consistently reduced TERT expression in both lines. Of note, GABPA knockdown reduced TERT expression by as much as 50% within the first 24 hours, and sustained the largest effect on TERT expression amongst the ETS candidates throughout 72 hours (Figure 3.4). In contrast, knockdown of ETS1 and ELF1 resulted in a more modest reduction of TERT mRNA, and only reached statistical significance at 72 hours, suggesting their regulation of TERT is through indirect mechanisms. ETV3 is a transcriptional repressor in the ETS family and was thus not considered a candidate direct regulator of mutant TERT(122-124). Thus, the de novo ETS motif is critical for mutant TERT promoter activity in GBM, and one or more candidate ETS factors may regulate TERT expression directly through the G228A and G250A mutations.

We next investigated whether regulation of TERT by ETS1, ETV3, ETV4, or GABPA depends upon the TERT promoter mutation status by testing the effect of siRNA knockdowns on activity of TERT promoter-driven luciferase reporters. Only GABPA knockdown significantly reduced mutant promoter activity without affecting wild-type promoter activity (Figure 3.5, Figure 3.6). While ETV4 knockdown reduced mutant promoter activity, it also significantly reduced the activity of the wild-type promoter, indicating the potential of ETV4 to bind and regulate the wild-type TERT promoter sequence in this assay. Knockdown of ETS1 and ETV3 did not significantly reduce promoter activity (Figure 3.5, Figure 3.6). GABPA was thus the only ETS factor that reproducibly affected TERT expression in a mutation-specific manner. Furthermore, knockdown of GABPA did not significantly affect cell cycle or proliferation rate within this timeframe (Figure 3.7).

To determine the in vivo binding specificity to the mutant TERT promoter sequence (‘CCGGAA’) relative to the wild-type sequence (‘CCGGAG’) amongst the candidate ETS factors, we analyzed publicly available ChIP-seq data for GABPA, ELF1, ETS1, and ETV4(125, 126). While all factors display significant enrichment of the sequence found in the mutant TERT promoter relative to the wild-type sequence, we found that GABPA peaks contained significantly greater enrichment of the mutant motif compared to ETS1 or ETV4 peaks (P-value = 5.1 × 10^-8 for ETS1 and 1.8 × 10^-8 for ETV4, Wilcoxon rank-sum test) (Figure 3.5, Figure 3.8). This genome-wide analysis supports the binding specificity for the motif created by the TERT promoter mutations, and suggests that GABPA binding may be more sensitive to these promoter mutations. Furthermore, this enrichment is not observed in DNase I hypersensitivity peaks in the same cells, demonstrating that the motif enrichment does not represent sequence biases in areas of open chromatin (Figure 3.8). Among the eight ENCODE cell lines with GABPA ChIP-seq, only HepG2 hepatocellular carcinoma cells and SK-N-SH neuroblastoma cells, both of which harbor heterozygous G228A mutations, displayed significant GABPA binding at the TERT promoter (Figure 3.5). In contrast, none of the TERT mutant cell lines showed ELF1 binding at the TERT promoter (Figure 3.9). Likewise, ChIP of ETS1 and ETV4 did not show binding at the mutant TERT promoter in vivo (Figure 3.9). An in vitro single-molecule protein binding assay further confirmed that ETV4 does not stably bind the mutated sequence (Figure 3.10). These results are consistent with the fact that only GABPA knockdown shows an immediate reduction in TERT expression (Figure 3.4) and implicate GABPA as the only ETS factor among the candidates to directly bind the mutant TERT promoter. All of the cell lines that did not show GABPA binding (K562, GM12878, A549, HeLa, MCF-7, HL-60) were derived from cancers in which TERT promoter mutations are absent or uncommon(37). Strikingly, 100% of the GABPA ChIP-seq reads covering the mutated site within the TERT promoter contain G228A, suggesting that GABPA selectively binds the mutant allele in vivo and that it cannot recognize and bind the wild-type sequence (Figure 3.5). Recruitment of GABP to the G250A mutant sequence was confirmed in vitro using a single-molecule protein binding assay. In contrast, no binding event of GABP was detected for the wild-type TERT sequence (Figure 3.10). Mutant allele-specific DNase I hypersensitivity and Pol II recruitment were also observed in these lines (Figure 3.11).
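The hexamer enrichment comparison can be illustrated with a small counting routine. This is a minimal sketch that assumes enrichment is the per-base-pair hexamer frequency in peak sequences divided by the frequency in flanking sequences, counted on both strands. The toy sequences, the pseudocount, and all function names are hypothetical and do not reproduce the original pipeline.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_hexamer(seq, hexamer):
    """Count occurrences of a hexamer on either strand (overlaps allowed)."""
    targets = {hexamer, reverse_complement(hexamer)}
    return sum(seq[i:i + 6] in targets for i in range(len(seq) - 5))


def per_bp_frequency(seqs, hexamer):
    """Hexamer hits per base pair across a list of sequences."""
    total_bp = sum(len(s) for s in seqs)
    hits = sum(count_hexamer(s, hexamer) for s in seqs)
    return hits / total_bp if total_bp else 0.0


def enrichment(peaks, flanks, hexamer, pseudocount=1e-6):
    """Ratio of peak frequency to flank frequency (pseudocount avoids 0/0)."""
    peak_freq = per_bp_frequency(peaks, hexamer) + pseudocount
    flank_freq = per_bp_frequency(flanks, hexamer) + pseudocount
    return peak_freq / flank_freq


# Toy sequences, for illustration only.
PEAKS = ["ACCGGAAGTGCTTCCGGT", "GGCCGGAAGT"]
FLANKS = ["ACGTACGTACCGGAGTTA", "TTTTACGTAA"]

for label, hexamer in (("mutant", "CCGGAA"), ("wild-type", "CCGGAG")):
    print(label, hexamer, enrichment(PEAKS, FLANKS, hexamer))
```

In a real analysis the enrichment values per peak would then be compared between factors with a rank-based test such as the Wilcoxon rank-sum test named in the text.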

To confirm that GABPA is specifically recruited to the mutant allele, we performed GABPA ChIP in HepG2, SK-N-SH, two GBM lines, and three melanoma lines (Table 3.1). All cell lines harboring either the G228A or G250A mutation showed significant GABPA binding in the TERT core promoter (P-value = 0.016, Wilcoxon Rank Sum Test, Figure 3.5). In contrast, the TERT wild-type melanoma line SK-MEL-28 showed no GABPA binding at the TERT promoter compared to the other lines (P-value = 0.007, Weisberg t-test for outliers). Consistent with our findings of specificity for the mutant allele in the ENCODE ChIP-seq data, the GABPA-immunoprecipitated DNA from the heterozygous mutant cell lines HepG2, SK-N-SH, and GBM1 all showed significant bias towards the mutant allele compared to input control DNA (P-value = 1.264 × 10^-5, Fisher’s Exact Test, Figure 3.5E). Furthermore, we confirmed that both heterozygous mutations in the TERT promoter resulted in allelic deposition of H3K4me3 and allele-specific expression (Figure 3.12). Nucleosome positioning analysis from micrococcal nuclease-digested H3K4me3 ChIP-seq(11) data revealed that both mutation positions lie within a nucleosome free region, with the upstream nucleosomes containing the H3K4me3 modifications (Figure 3.12). These data demonstrate that GABPA is selectively recruited to the mutant TERT allele across multiple cancer types and results in allele-specific activation of TERT.
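The allelic bias comparison between immunoprecipitated and input DNA is a 2x2 contingency problem, which is what Fisher's exact test addresses. The sketch below implements a two-sided version from the hypergeometric distribution. The clone counts are hypothetical and are not the values behind Figure 3.5E.

```python
from math import comb


def hypergeom_pmf(a, row1, row2, col1):
    """Probability of `a` in the top-left cell given fixed margins."""
    return comb(row1, a) * comb(row2, col1 - a) / comb(row1 + row2, col1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the probabilities of all tables that are no more likely
    than the observed one, with the margins held fixed.
    """
    row1, row2, col1 = a + b, c + d, a + c
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = hypergeom_pmf(a, row1, row2, col1)
    p_value = 0.0
    for k in range(lowest, highest + 1):
        p_k = hypergeom_pmf(k, row1, row2, col1)
        if p_k <= p_observed * (1 + 1e-9):  # tolerance for float ties
            p_value += p_k
    return min(p_value, 1.0)


# Hypothetical clone counts: rows are IP and input DNA,
# columns are mutant and wild-type alleles.
ip_mutant, ip_wildtype = 14, 1
input_mutant, input_wildtype = 7, 8
print(fisher_exact_two_sided(ip_mutant, ip_wildtype,
                             input_mutant, input_wildtype))
```

A small p-value here would indicate that the mutant allele fraction differs between the IP and input DNA by more than sampling noise would explain.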

GABPA is unique among the large ETS transcription factor family as it is the only obligate multimeric factor(127, 128). GABPA dimerizes with GABPB, and the resulting heterodimer (GABP) forms a fully functional transcription factor that can both bind DNA and activate transcription(129). GABPA has a single transcript isoform that is widely expressed across tissue types, while GABPB is encoded by either the GABPB1 or GABPB2 gene and GABPB1 contains multiple isoforms(130, 131). A subset of GABPB isoforms contain leucine zipper-like domains which allow two GABP heterodimers to form a heterotetramer complex capable of binding two GABPA motifs (core consensus ‘CCGGAA’) in proximity to each other, and further stimulating transcription(132). Consistent with this fact, genome-wide analysis of ENCODE GABPA ChIP-seq data showed that peaks containing two GABPA motifs have significantly higher binding enrichment scores compared to peaks with just one or zero motifs (P-value = 1.6 × 10^-157, Wilcoxon rank-sum test, Figure 3.13, Figure 3.14). Analysis of GABPA motif spacing within peaks containing 2 motifs revealed that strong peaks are more likely to have a separation distance shorter than 50bp compared to weak peaks (Figure 3.15A, Figure 3.13). Moreover, this increase in likelihood occurred at discrete spacing that aligned well with the 10.5bp periodicity of relaxed B-DNA, highlighting the importance of having two GABPA binding sites in phase and separated by full helical turns of double stranded DNA. This periodicity was unique to GABPA and is not observed in ELF1 or ETS1 ChIP-seq data (Figure 3.13). The Fourier spectrum of the enrichment also spiked around the helical frequency in strong GABPA peaks, but not in weak peaks or the genomic background (Figure 3.16). This analysis suggested that two proximal motifs in helical phase act synergistically to recruit a GABP heterotetramer complex.
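The Fourier argument can be made concrete with a histogram of motif-to-motif spacings. The sketch below evaluates the spectral power of such a histogram at a single period, assuming a direct discrete Fourier sum on mean-centred counts. The toy histogram and the function name are hypothetical, and the original spectrum may have been computed differently.

```python
import cmath
import math

HELICAL_PERIOD_BP = 10.5  # periodicity of relaxed B-DNA cited in the text


def power_at_period(spacing_counts, period_bp):
    """Spectral power of a spacing histogram at one period.

    spacing_counts[d] is the number of motif pairs separated by d bp.
    The mean is subtracted so a flat histogram gives little power.
    """
    n = len(spacing_counts)
    mean = sum(spacing_counts) / n
    total = sum(
        (count - mean) * cmath.exp(-2j * math.pi * d / period_bp)
        for d, count in enumerate(spacing_counts)
    )
    return abs(total) ** 2 / n


# Toy histogram: motif pairs pile up near whole helical turns.
toy_counts = [0] * 50
for spacing in (10, 21, 32, 42):
    toy_counts[spacing] = 5

print(power_at_period(toy_counts, HELICAL_PERIOD_BP))
print(power_at_period(toy_counts, 7.0))
```

With the toy data the power at 10.5 bp is far larger than at an unrelated period such as 7 bp, which is the kind of spike the text describes for strong GABPA peaks.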

Investigation of the DNA sequence flanking the mutation sites revealed three native ETS binding motifs (ETS-195, ETS-200, and ETS-294) (Figure 3.15B). To determine whether these flanking ETS motifs are required for mutant TERT activation, we performed site-directed mutagenesis of the flanking ETS sites with or without the G228A or G250A mutation. Mutating ETS-195 or ETS-200 alone reduced promoter activity from the relatively low level of the wild-type promoter, and also significantly reduced activity in the context of G228A or G250A. In contrast, mutating ETS-294 had no effect on promoter activity in the context of G250A despite being closer than ETS-195 or ETS-200 (Figure 3.15C). These data demonstrate that both ETS-195 and ETS-200 are required for aberrant activity of the mutant TERT promoter. The GABPB1 isoforms required for GABP heterotetramer formation are the predominant isoforms expressed in GBM, melanoma, hepatocellular carcinoma and bladder urothelial carcinoma, all tumor types prone to TERT promoter mutations (Figure 3.17).

To test whether ETS motif spacing is essential for mutant TERT promoter activation, we performed a series of deletions in 2bp increments between the native ETS site and the G250A mutation, effectively bringing G250A out of phase and back into phase with the native ETS motifs. While the wild-type reporter construct displays only noise level fluctuations in activity, we observed clear periodic behavior in the G250A reporter, suggesting the recruitment of a GABP heterotetramer (Figure 3.15D, Figure 3.13). However, G250A promoter activity peaked after deleting six base pairs, which brought the G250A site in phase with the ETS-200 site by exactly four helical turns. Mutating ETS-195, although reducing the TERT activation level (Figure 3.15C), did not change the periodic pattern, implying a preferential interaction of GABP with ETS-200 instead of ETS-195 (Figure 3.13). Repeating the experiment with a mutated ETS-200, however, led to a shift in the 10.5bp periodicity, which was now consistent with pairing between G250A and ETS-195 (Figure 3.15D). These results strongly suggest that GABP may be able to bind and switch between both native ETS motifs in the context of G250A, consistent with the fact that both native ETS motifs are essential for robust TERT activation (Figure 3.15C).
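The phase argument behind the deletion series can be worked through numerically. The sketch below assumes a 48 bp spacing between G250A and ETS-200 before any deletion. That figure is not stated in the text; it is inferred from the result that a six base pair deletion gives exactly four helical turns (4 × 10.5 = 42 bp). The function names are illustrative.

```python
HELICAL_PERIOD_BP = 10.5  # bp per turn of relaxed B-DNA

# Assumption: inferred from "six base pairs deleted -> four helical turns".
ASSUMED_SPACING_BP = 48


def helical_turns(spacing_bp):
    """Number of helical turns spanned by a given spacing."""
    return spacing_bp / HELICAL_PERIOD_BP


def phase_offset(spacing_bp):
    """Distance, in turns, from the nearest whole number of turns."""
    turns = helical_turns(spacing_bp)
    return abs(turns - round(turns))


DELETIONS_BP = range(0, 12, 2)  # 2 bp increments, as in the experiment

for deleted in DELETIONS_BP:
    spacing = ASSUMED_SPACING_BP - deleted
    print(f"delete {deleted:>2} bp: spacing {spacing} bp, "
          f"{helical_turns(spacing):.2f} turns, "
          f"offset {phase_offset(spacing):.2f}")

best_deletion = min(DELETIONS_BP,
                    key=lambda d: phase_offset(ASSUMED_SPACING_BP - d))
print("best in-phase deletion:", best_deletion, "bp")
```

Under the assumed starting spacing, the offset falls to zero at a six base pair deletion and rises again on either side, which mirrors the periodic reporter activity described above.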

The critical role of two adjacent ETS motifs in aberrant TERT activation was further strengthened by our analysis of an oligodendroglioma tumor containing a unique, heterozygous 41bp tandem duplication within the core TERT promoter. While this sample was wild-type at G228A and G250A, we found that the junction of the duplication event generated a de novo ETS motif that is 41bp away from the native downstream ETS-195 motif (Figure 3.15B). The promoter sequence containing this duplication induces elevated promoter activity similar to the G228A and G250A mutant sequences, despite its wild-type status at these positions (Figure 3.15C). Mutagenesis of either the native ETS-195 site or the de novo junction ETS site significantly reduced promoter activity, once again demonstrating that this duplication satisfies the prerequisite for GABP heterotetramer recruitment (Figure 3.15C).

We have thus identified GABP as the critical ETS transcription factor activating TERT expression in the context of highly recurrent promoter mutations. Although many ETS transcription factors can bind similar DNA sequence motifs, GABP is unique in that it can bind neighboring ETS motifs as a heterotetrameric complex. We show that strong GABPA ChIP-seq peaks contain a periodicity of approximately 10.5bp between neighboring ETS motifs, consistent with the binding of a GABP complex at two locations separated by full helical turns of DNA. This genome-wide pattern is reproduced in the context of TERT promoter mutations, where both G228A and G250A are separated from two tandem proximal native ETS motifs by 2.9/2.4 (ETS-195/ETS-200) and 5.0/4.6 (ETS-195/ETS-200) helical turns respectively. We propose that TERT promoter mutations cooperate with both of these native ETS sites to recruit GABP. Further work is necessary to elucidate which other transcription factors are interacting with GABP at the mutant TERT promoter in order to drive aberrant transcription. Additionally, both TERT promoter mutations fall within a GC-rich repeat sequence with potential to form a G-quadruplex, a DNA secondary structure that can regulate gene expression(133, 134). A potential impact of TERT promoter mutations on this predicted secondary structure and on the complex relationship between secondary structure and GABP recruitment may also play a role in deregulating TERT expression. The cancer-specific interaction of GABP with the TERT core promoter mutations highlights a common mechanism utilized by many cancers to overcome replicative senescence.

FIGURE 3.1

[Figure 3.1 image: TERT expression relative to normal brain (NB) for TERT promoter mutant (Mut) GBM, wild-type (WT) GBM, and NB.]

Figure 3.1: TERT promoter mutant GBMs display elevated TERT expression compared to TERT wild-type GBM or normal brain (NB). TERT RT-qPCR performed from primary GBM tissue or normal brain tissue. *P <0.05, Welch two sample t-test.

FIGURE 3.2

[Figure 3.2 image. (A) Luciferase activity of TERT promoter reporter constructs at positions 228 and 250: WT (CCGGAGGGG, CCGGGAGGG), G228A (CCGGAAGGG), G228T (CCGGATGGG), G228C (CCGGACGGG), A227T/G228A (CCGGTAGGG), G250A (CCGGAAGGG), G250T (CCGGTAGGG), G250C (CCGGCAGGG), A251T/G250A (CCGGATGGG). (B) TERT expression after siRNA knockdown (siScr, ELF1, ELK1, ETS1, ETV1, ETV3, ETV4, ETV5, GABPA) in GBM1 (G228A) and U251 (G228A) cells.]

Figure 3.2: The de novo ETS motif is critical for mutant TERT promoter activity in GBM. (A) TERT promoter-luciferase reporter assays for wild-type, G228A, G250A, or targeted mutation sequences. * P <0.05, Student’s t-test compared to wild-type (WT). (B) TERT expression relative to siScramble (siScr) 72 hours post ETS factor siRNA knockdown. * P <0.05, Student’s t-test compared to siScr. The results are an average of at least 3 independent experiments. Values are mean ± sd.

FIGURE 3.3

[Figure 3.3 image. (A) GBM expression (RSEM) boxplots and (B, C) RPKM bar plots for the ETS family genes FEV, ELK1, ELF2, ERG, SPIB, ELK4, ERF, ETV1, SPIC, ETV5, ELF5, ETV7, ELF4, ELF3, EHF, GABPA, ETV3, ETV2, FLI1, ETS1, ELK3, ETV6, ETS2, SPI1, SPDEF, ETV4, and ELF1.]

Figure 3.3: ETS family gene expression in GBM. (A) Level 3 RNA-seq data was obtained for 169 GBM samples from the TCGA data portal. Boxplots are shown for the expression of each ETS family member. (B, C) RPKM values of each ETS factor from GBM1 and U251 RNA-seq respectively.

FIGURE 3.4

[Figure 3.4 image. Panels A to F: TERT, GABPA, and ETS factor expression relative to siScramble after ETS siRNA knockdown in U251 and GBM1 (G228A) cells at 24, 48, and 72 hours.]

Figure 3.4: TERT expression in response to ETS siRNA knockdown. (A) A preliminary siRNA screen of 13 expressed ETS factors in GBM. TERT expression was measured by RT-qPCR 48 hours post siRNA transfection into U251 cells. 6 ETS factors were removed from further follow-up. ETV1 was retained as an internal negative control. Values are mean ± range relative to siScramble of n=2 replicates. (B) ETS transcription factor expression relative to siScramble (siScr) in response to targeted siRNA knockdown. Expression was measured by RT-qPCR at 48 hr and 72 hr post-transfection. Values are mean ± range from 2 replicate experiments. (C) TERT expression relative to siScramble 48 hours post siRNA knockdown of 8 ETS factors. * P <0.05, Student’s t-test compared to siScr. The results are an average of at least 3 independent experiments. Values are mean ± sd. (D) TERT and GABPA expression relative to siScramble in response to knockdown of GABPA from two independent pools of siRNAs. TERT and GABPA expression was measured by RT-qPCR at 72 hr post transfection with either siGABPA-1 or siGABPA-2 in GBM1 cultured cells. Values are mean ± range from 2 replicate experiments. (E, F) TERT expression relative to siScramble at 24, 48, and 72 hours post siRNA knockdown of the top 4 ETS factor candidates in GBM1 and U251 respectively. Values are mean ± range from 2 replicate experiments.

FIGURE 3.5

[Figure 3.5 image. (A) Luciferase activity of WT, G228A, and G250A reporter constructs after siScr, ETS1, ETV3, ETV4, or GABPA knockdown. (B) Motif enrichment of mutant (MUT) and wild-type (WT) hexamers against peak rank (strong to weak) in A549, GM12878, HeLa-S3, HepG2, HL-60, K562, MCF-7, and SK-N-SH. (C) GABPA ChIP-seq read coverage at the TERT promoter (chr5, 1295150 to 1295300) with primers TERT+47, TERT-450, TERT-639, and TERT-5329, and with ETS-195, G228A, G250A, and ETS-294 marked. (D) % input for GABPA and IgG ChIP at TERT+47 and TERT-5329 in HepG2, SK-N-SH, GBM1, U251, A375, SK-MEL-1239, and SK-MEL-28. (E) Mutant clones against wild-type clones for IP and input DNA from HepG2, SK-N-SH, and GBM1.]

Figure 3.5: GABPA selectively regulates and binds the mutant TERT promoter across multiple cancer types. (A) Wild-type, G228A, or G250A luciferase activity 72 hours post ETS siRNA knockdown in GBM1 cultured cells, scaled to WT-siScramble (siScr). The results are an average of at least 3 independent experiments. Values are mean ± sd * P <0.05, Student’s t-test compared to siScr. (B) Enrichment of mutant (CCGGAA) or wild-type (CCGGAG) hexamer sequences in ENCODE GABPA ChIP-seq peaks relative to flanking regions. (C) ENCODE GABPA ChIP-seq data at the proximal TERT promoter and around distal qPCR primers. Native ETS motifs and mutation positions are annotated by orange and black tick marks respectively. Inset shows allelic read coverage at G228A in HepG2 cells. (D) GABPA ChIP-qPCR for the TERT promoter and a nearby control locus in seven cancer cell lines. Values represent mean % input based on triplicate qPCR measurements. N=1 for each cell line. (E) Allelic variant frequency in GABPA (IP) or input control DNA.

FIGURE 3.6

[Figure 3.6 image: luciferase activity of WT, G228A, and G250A reporter constructs after siScr, ETS1, ETV3, ETV4, or GABPA knockdown.]

Figure 3.6: GABPA selectively regulates the mutant TERT promoter in U251 cells. Wild-type, G228A, or G250A luciferase activity 72 hours post ETS siRNA knockdown in U251 GBM cells, scaled to wild-type siScramble (siScr). The results are an average of at least 3 independent experiments. Values are mean ± sd * P <0.05, Student’s t-test compared to siScr.

FIGURE 3.7

[Figure 3.7 image. (A, B) Flow cytometry histograms (% of max against FL2-A) for GBM1 and U251 at 48 and 72 hours, siScr versus siGABPA. (C) Viable cells (millions) against time (24, 48, 72 hrs) for U251 and GBM1, siScr versus siGABPA.]

Figure 3.7: Cell cycle profile and proliferation changes with GABPA knockdown in GBM cell lines. (A, B) Flow cytometry of propidium iodide-stained genomic DNA of GBM1 (A) and U251 (B) cells fixed and analyzed at 48 h and 72 h post-transfection of GABPA siRNA and scramble control. (C) Cell growth curves of GBM1 and U251 generated by counting cells collected at 0, 24, 48 and 72 h post-transfection of GABPA siRNA and scramble control siRNA. Cell counts are presented as the mean value of triplicate wells that were individually transfected, and the error bars reflect s.d.

FIGURE 3.8

[Figure 3.8 image. Motif enrichment against peak rank (strong to weak) for mutant (MUT) and wild-type (WT) hexamers: (A) GABPA, p = 5.3 × 10^-48; (B) ELF1, p = 4.9 × 10^-66; (C) ETS1, p = 5.8 × 10^-13; (D) ETV4, p = 1.6 × 10^-4. (E) ETS1, p = 1.6 × 10^-17, motif label T[GC]TAGT. (F) ETV4, p = 4.5 × 10^-4, motif label T[GA]ANTCA. (G) Motif frequency (1/bp) by enrichment quintile for GABPA ChIP, ELF1 ChIP, and DNase HS.]

Figure 3.8: Enrichment of the mutant (CCGGAA) and wild-type (CCGGAG) hexamer sequences within genome-wide (A) GABPA, (B) ELF1, (C) ETS1, and (D) ETV4 ChIP-seq peaks relative to flanking regions. P-values based on Wilcoxon rank sum test. (E) Enrichment of the ETS1 consensus motif and indicated mutated motif in ETS1 ChIP-seq peaks relative to flanking regions. (F) Enrichment of the ETV4 consensus motif and indicated mutated motif in ETV4 ChIP-seq peaks relative to flanking regions. (G) Frequency (per base pair) of the wild-type and mutant hexamers in HepG2 GABPA and ELF1 ChIP-seq peaks and DNase hypersensitive sites, partitioned by quintile of enrichment score.
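The per-base-pair hexamer frequency described in panel (G) can be illustrated with a minimal sketch. It assumes that a motif is counted on either strand and that overlapping matches are allowed; the caption does not state these details, and the function names and toy sequence are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def count_motif(seq, motif):
    """Count occurrences of motif on either strand (overlaps allowed)."""
    seq = seq.upper()
    targets = {motif.upper(), reverse_complement(motif)}
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] in targets)


def motif_frequency(seq, motif):
    """Motif occurrences per base pair of sequence."""
    return count_motif(seq, motif) / len(seq) if seq else 0.0


# Hypothetical toy sequence, not taken from the thesis data.
toy = "ACCGGAAGTTTCTCCGGTA"
mut_count = count_motif(toy, "CCGGAA")  # mutant hexamer, forward strand hit
wt_count = count_motif(toy, "CCGGAG")   # wild-type hexamer, reverse strand hit
```

Applying the same count to each peak and to its flanking regions would give the two quantities whose ratio is an enrichment score.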


Figure 3.9: Other ETS factor candidates do not bind the mutant TERT promoter. (A) ETS1 ChIP-qPCR for the TERT promoter and a positive and negative control locus in five cancer cell lines. (B) ETV4 ChIP-qPCR for the TERT promoter and a positive and negative control locus in four cancer cell lines. Values represent mean % input based on triplicate qPCR measurements. N=1 for each cell line. (C) ENCODE ELF1 ChIP-seq data at the proximal TERT promoter and around a distal positive control locus. Native ETS motifs and mutation positions are annotated by orange and black tick marks respectively.


Figure 3.10: GABP but not ETV4 selectively binds to mutant TERT promoter DNA. (A, B) Schematic of Cy3-labeled GABP complex binding to G250A mutant and wild-type TERT promoter DNA sequences, which are also labeled with Cy3. The DNA was immobilized on a single-molecule imaging surface (PEG-NeutrAvidin-biotin), and premixed GABPA and Cy3-GABPB was applied to the imaging chamber. (C, D) Representative single-molecule traces that show binding of GABP to G250A mutant, but not to wild-type, TERT promoter DNA. The green signal (~500) arises from Cy3 on DNA in all experiments and the additional green intensity increase (~1200) is seen when Cy3-labeled GABP binds. Binding is not detected when Cy3-labeled GABPB is added in the absence of GABPA, suggesting that the signal increase is due to GABP complex binding. (E) The percentage of single-molecule traces showing GABP binding to the G250A or wild-type DNA. The average dwell time (δt) for GABP binding to G250A mutant DNA is 5.04 ± 2.78 seconds (n=255) and no such binding was detected on the wild-type DNA. The results are an average of at least 3 independent experiments. Values are mean ± sd. (F, G) Schematic of Cy3-labeled ETV4 binding to G250A mutant or ETV4 cognate sequence, which are also labeled with Cy3. (H, I) Representative single-molecule traces show no binding of ETV4 to the G250A mutant TERT promoter. Binding of ETV4 is seen on ETV4 cognate DNA. (J) The percentage of single-molecule traces showing ETV4 binding to the ETV4 cognate, G250A, or wild-type DNA sequence. The average dwell time (δt) for ETV4 binding to the ETV4 positive control sequence is 5.96 ± 3.08 seconds (n=250) and no such binding was detected on the G250A or wild-type sequence.


Figure 3.11: Read coverage around the TERT promoter mutations in ENCODE GABPA ChIP-Seq, Pol II ChIP-Seq, and Digital Genomic Footprinting (DGF) data from HepG2 and SK-N-SH cells. The read coverage at the G228A mutation is color coded by the observed nucleotides to highlight allele-specific binding.


Figure 3.12: GBMs harboring TERT promoter mutations display allele-specific H3K4me3 and gene expression. (A) H3K4me3 ChIP-PCR of the TERT promoter in five TERT promoter mutant and two wild-type primary GBMs. Primer set enrichments for the regions TERT-450 and TERT-639 indicated in Fig. 2 were compared between the mutant and wild-type groups (Wilcoxon Rank Sum test, P-value = 0.0057). Values represent mean enrichment compared to GAPDH positive control primers based on triplicate qPCR measurements. N=1 for each cell line. (B) Allelic variant frequency in H3K4me3 (IP) or input control DNA. (C) Nucleosome-positioning prediction from H3K4me3 ChIP-seq in GBM1. (D) Allelic TERT expression in GBM. RNA-seq and Exome-seq read counts are given for three informative SNPs in the TERT gene.


Figure 3.13: Relationship between GABP motif pair distances and binding strength and TERT promoter activity. (A) Distribution of binding enrichment scores within ChIP-seq peaks containing zero, one, or two CCGGAA motifs. (B) Distribution of motif separations in weak and strong ChIP-seq peaks. The dashed black curve shows the theoretical null distribution and green curve shows the genome-wide distribution outside of peaks. Vertical dotted lines indicate 10.5bp periodicity. (C) Dependence of ChIP enrichment on motif separation using sliding window of width 3bp. The shaded area indicates 95% confidence interval of the median based on bootstrap resampling. (D) Site-directed mutagenesis deleting between 2 to 16 base pairs at the G228A site. Deletions were tested for promoter activity in a Wild-type, G250A, G250A+A197T, and G250A+G201T background. The results are an average of at least 3 independent experiments. Values are mean ± sd.


Figure 3.14: Map of CCGGAA motifs around ENCODE GABPA ChIP-seq peaks in

HepG2 cells. The peaks (outlined by dashed orange lines) were sorted by length.


Figure 3.15: G228A and G250A cooperate with the native ETS sites ETS-195 and ETS-200 and fall within spacing for GABP heterotetramer recruitment. (A) Distribution of motif separation in weak and strong GABP peaks. Vertical dotted lines denote periodicity of 10.5bp. Horizontal dashed line indicates the theoretical null distribution. (B) Native and de novo putative ETS-binding sites in the core TERT promoter. (C) Site-directed mutagenesis of the GABP heterotetramer motifs in the wild-type, G228A, G250A, or insertion TERT reporter constructs. Mutation of the ETS-195, ETS-294, or junction motif is indicated by ‘+’. The results are an average of at least 3 independent experiments. Values are mean ± sd. * P < 0.05, Student’s t-test. (D) Site-directed mutagenesis deleting between 2 to 16 base pairs at the G228A site. Deletions were tested for promoter activity in a G250A or G250A+G201T background. The sinusoidal fits were obtained by using the model a·sin(2π(x - b)/10.5) + cx + d, where x is the deletion size in bp and a, b, c, and d are fitted parameters. The results are an average of at least 3 independent experiments. Values are mean ± sd.
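The sinusoidal model in panel (D) can be fitted by ordinary linear least squares, because a·sin(ω(x - b)) expands to A·sin(ωx) + B·cos(ωx) with A = a·cos(ωb) and B = -a·sin(ωb). The sketch below illustrates this on synthetic points. It is not the fitting procedure used in the thesis, which is not described; all function names and parameter values are hypothetical.

```python
import math

PERIOD = 10.5  # bp, the fixed period in the caption's model
OMEGA = 2 * math.pi / PERIOD


def solve_linear(matrix, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * solution[k] for k in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def model(x, a, b, c, d):
    """The caption's model: a*sin(2*pi*(x - b)/10.5) + c*x + d."""
    return a * math.sin(OMEGA * (x - b)) + c * x + d


def fit_sinusoid(xs, ys):
    """Least-squares estimate of (a, b, c, d) via the normal equations."""
    design = [[math.sin(OMEGA * x), math.cos(OMEGA * x), x, 1.0] for x in xs]
    normal = [[sum(row[i] * row[j] for row in design) for j in range(4)]
              for i in range(4)]
    rhs = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(4)]
    big_a, big_b, c, d = solve_linear(normal, rhs)
    a = math.hypot(big_a, big_b)           # amplitude, taken as non-negative
    b = math.atan2(-big_b, big_a) / OMEGA  # phase shift, defined modulo 10.5 bp
    return a, b, c, d


# Synthetic check with made-up parameters (not thesis data).
xs = list(range(0, 18, 2))
ys = [model(x, 3.0, 2.0, -0.5, 10.0) for x in xs]
a, b, c, d = fit_sinusoid(xs, ys)
```

Fixing the period at 10.5 bp is what makes the problem linear; if the period were also a free parameter, a nonlinear optimizer would be needed.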


Figure 3.16: Fourier spectral analysis of the GABPA motif spacing distribution for separations between 15 and 100 bp from Figure 3.15A. Fourier analysis reveals significant periodicity around 10bp intervals.
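As an illustration of this kind of spectral analysis, the following sketch evaluates the Fourier power of a spacing histogram at a few chosen frequencies. The histogram is synthetic (a constant plus a 10.5 bp cosine), and the normalization and frequency grid are assumptions, since the caption does not specify them.

```python
import math


def power_at(counts, spacings, freq):
    """Squared magnitude of the mean-subtracted Fourier component at freq (1/bp)."""
    mean = sum(counts) / len(counts)
    re = sum((c - mean) * math.cos(2 * math.pi * freq * s)
             for c, s in zip(counts, spacings))
    im = sum((c - mean) * math.sin(2 * math.pi * freq * s)
             for c, s in zip(counts, spacings))
    return (re * re + im * im) / len(counts)


# Synthetic spacing histogram over 15-100 bp with a 10.5 bp periodic component.
spacings = list(range(15, 101))
counts = [10 + 3 * math.cos(2 * math.pi * s / 10.5) for s in spacings]

# Frequencies to probe, loosely following the figure's axis labels.
freqs = [1 / 20, 1 / 10.5, 1 / 5, 1 / 4, 1 / 3, 1 / 2]
spectrum = {f: power_at(counts, spacings, f) for f in freqs}
peak_freq = max(spectrum, key=spectrum.get)
```

With real data, the same function would be applied to the observed counts of motif pairs at each separation, for strong peaks, weak peaks, and the genome-wide background separately.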


Figure 3.17: GABPB isoform expression in GBM. Level 3 isoform-specific RNA-seq data were obtained for (A) GBM, (B) melanoma, (C) HCC, and (D) bladder urothelial carcinoma samples from the TCGA data portal. Box plots are shown for the expression of each UCSC gene ID of GABPA, GABPB1, and GABPB2. Isoforms that allow for heterotetramer formation (purple) are expressed at higher levels than those that do not allow heterotetramer formation (cyan).

TABLE 3.1

Sample         Genotype (Allele1/Allele2)
GBM1           wt / G228A
GBM1-culture   wt / G228A
GBM2           wt / G250A
GBM3           wt / G250A
GBM4           wt / G228A
GBM5           wt / G250A
GBM6           wt / wt
GBM7           wt / wt
GBM8           wt / G250A
GBM9           wt / G250A
GBM10          wt / G228A
GBM11          wt / wt
NB1            wt / wt
NB2            wt / wt
U87            wt / G228A
U251           G228A / G228A
HepG2          wt / G228A
SK-N-SH        wt / G228A
A375           wt / G250A
SK-MEL-28      wt / wt
SK-MEL-1239    G228A / G228A

Table 3.1 Summary of the TERT mutation status for all samples and cell lines used

TABLE 3.2

Cloning Primers
Name          Forward Sequence (5'->3')            Reverse Sequence (5'->3')
TERT_WT       GCTCGCTAGCCTCGAgtcctgccccttcacctt    CGCCGAGGCCAGATCCAGCGCTGCCTGAAACTC

Mutagenesis Primers
Name          Forward Sequence (5'->3')     Reverse Sequence (5'->3')
G228A         gggagggcccggaaggggctgg        ccagccccttccgggccctccc
G228T         gggagggcccggatggggctgg        ccagccccatccgggccctccc
G228C         ccagccccgtccgggccctccc        gggagggcccggacggggctgg
G250A         cggggacccggaaggggtcggga       tcccgaccccttccgggtccccg
G250T         tcccgacccctaccgggtccccg       cggggacccggtaggggtcggga
G250C         ggggacccggcaggggtcggg         cccgacccctgccgggtcccc
A227T/G228A   ggcccagcccctaccgggccctcc      ggagggcccggtaggggctgggcc
A251T/G250A   cgtcccgaccccatccgggtccccgg    ccggggacccggatggggtcgggacg
A197T         tccccttccttaccgcggccccg       cggggccgcggtaaggaagggga
A296T         gccccttcacctaccagctccgcct     aggcggagctggtaggtgaaggggc
G201T         gcccctccccttactttccgcggcc     ggccgcggaaagtaaggggaggggc
Junction      ctccccttccttacccagccccctc     gagggggctgggtaaggaaggggag
2bp_del       tgggagggcccagggggctggg        cccagccccctgggccctccca
4bp_del       gggagggcccggggctgggc          gcccagccccgggccctccc
6bp_del       gggagggcccggctgggccg          cggcccagccgggccctccc
8bp_del       cccggcccaggggccctccc          gggagggcccctgggccggg
10bp_del      gggagggcccgggccgggga          tccccggcccgggccctccc
12bp_del      gggagggcccgccggggacc          ggtccccggcgggccctccc
14bp_del      gggagggccccggggacccg          cgggtccccggggccctccc
16bp_del      gggagggcccgggacccggg          cccgggtcccgggccctccc

Table 3.2 Cloning and mutagenesis primers
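The mutagenesis primer pairs in Table 3.2 appear to be exact reverse complements of one another. The short sketch below checks that property for three pairs copied from the table; the helper name and the choice of pairs are illustrative only.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string, preserving letter case."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(table)[::-1]


# Three primer pairs copied from Table 3.2 as (forward, reverse).
PRIMERS = {
    "G228A": ("gggagggcccggaaggggctgg", "ccagccccttccgggccctccc"),
    "G250A": ("cggggacccggaaggggtcggga", "tcccgaccccttccgggtccccg"),
    "A197T": ("tccccttccttaccgcggccccg", "cggggccgcggtaaggaagggga"),
}

# Names of any pairs where the reverse primer is not the forward's reverse complement.
mismatches = [name for name, (fwd, rev) in PRIMERS.items()
              if reverse_complement(fwd) != rev]
```

An empty `mismatches` list indicates the listed pairs are consistent; extending `PRIMERS` with the remaining rows gives a quick transcription check for the whole table.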

TABLE 3.3

Name      Catalog Number     Amount   Full Name
ETS1      M-003887-00-0005   5 nmol   SMARTpool: siGENOME ETS1 siRNA
ETS2      M-003888-00-0005   5 nmol   SMARTpool: siGENOME ETS2 siRNA
ETV1      M-003801-03-0005   5 nmol   SMARTpool: siGENOME ETV1 siRNA
ETV3      M-010509-00-0005   5 nmol   SMARTpool: siGENOME ETV3 siRNA
ETV4      M-004207-01-0005   5 nmol   SMARTpool: siGENOME ETV4 siRNA
ETV5      M-008894-00-0005   5 nmol   SMARTpool: siGENOME ETV5 siRNA
ELK1      M-003885-01-0005   5 nmol   SMARTpool: siGENOME ELK1 siRNA
ELK3      M-010320-00-0005   5 nmol   SMARTpool: siGENOME ELK3 siRNA
ELK4      M-010315-01-0005   5 nmol   SMARTpool: siGENOME ELK4 siRNA
ELF1      M-012669-00-0005   5 nmol   SMARTpool: siGENOME ELF1 siRNA
ELF2      M-012754-00-0005   5 nmol   SMARTpool: siGENOME ELF2 siRNA
ELF4      M-020177-01-0005   5 nmol   SMARTpool: siGENOME ELF4 siRNA
GABPA-1   M-011662-01-0005   5 nmol   SMARTpool: siGENOME GABPA siRNA
GABPA-2   L-011662-00-0005   5 nmol   SMARTpool: ON-TARGETplus GABPA siRNA
siScr     D-001206-13-05     5 nmol   siGENOME Non-Targeting siRNA Pool #1

Table 3.3 siRNAs used for knockdown studies

TABLE 3.4

Name     Forward Sequence (5'->3')   Reverse Sequence (5'->3')   Tm
TERT-1   CTCCTTCCGCCAGGTGTC          GAAGGCCAGCACGTTCTTC         60
GAPDH    ATGGGGAAGGTGAAGGTCG         GGGGTCATTGATGGCAACAAT       60
GUSB     CTCATTTGGAATTTTGCCGATT      CCGAGTGAAGATCCCCTTTTTA      60
TERT-2   TCACGGAGACCACGTTTCAAA       TTCAAGTGCTGTCTGATTCCAAT     60
ETS1     TACACAGGCAGTGGACCAATC       CCCCGCTGTCTTGTGGATG         60
ETS2     CCCCTGTGGCTAACAGTTACA       AGGTAGCTTTTAAGGCTTGACTC     60
ELF1     TGTCCAACAGAACGACCTAGT       GGCAGGAAAAATAGCTGGATCAC     60
ELF2     AAACTGTAGTGGAGGTGTCAACT     CATGGCTATCTGGTGATGTTGG      60
ELF4     CCTGATCTTTGAGTTCGCAAGC      AGTCCCGAGTACAGATGCAGT       60
ETV1     GGCCCCAGGCAGTTTTATGAT       GATCCTCGCCGTTGGTATGT        60
ETV3     GGTGGAGGGTATCAGTTTCCT       TGATGAATGGGTAGTTGGGCAT      60
ETV4     CAGTGCCTTTACTCCAGTGCC       CTCAGGAAATTCCGTTGCTCT       60
ETV5     CAGTCAACTTCAAGAGGCTTGG      TGCTCATGGCTACAAGACGAC       60
ELK1     TCCCTGCTTCCTACGCATACA       GCTGCCACTGGATGGAAACT        60
ELK3     ATCTGCTGGACCTCGAACGA        TTCTGCCCGATCACCTTCTTG       60
ELK4     ACTCAGCCGAGCCCTCAG          GGTGGCTTTTTGGAAGGTG         60
GABPA    AAGAACGCCTTGGGATACCCT       GTGAGGTCTATATCGGTCATGCT     60

Table 3.4 qPCR primers used for SYBR green gene expression assays

TABLE 3.5

Cell line   GABPA
A549        wgEncodeHaibTfbsA549GabpV0422111Etoh02PkRep1.broadPeak.gz
            wgEncodeHaibTfbsA549GabpV0422111Etoh02PkRep2.broadPeak.gz
GM12878     wgEncodeHaibTfbsGm12878GabpPcr2xPkRep1.broadPeak.gz
            wgEncodeHaibTfbsGm12878GabpPcr2xPkRep2.broadPeak.gz
Hek293      (none)
HeLa-S3     wgEncodeHaibTfbsHelas3GabpPcr1xPkRep1.broadPeak.gz
            wgEncodeHaibTfbsHelas3GabpPcr1xPkRep2.broadPeak.gz
HepG2       wgEncodeHaibTfbsHepg2GabpPcr2xPkRep1.broadPeak.gz
            wgEncodeHaibTfbsHepg2GabpPcr2xPkRep2.broadPeak.gz
HL-60       wgEncodeHaibTfbsHl60GabpV0422111PkRep1.broadPeak.gz
            wgEncodeHaibTfbsHl60GabpV0422111PkRep2.broadPeak.gz
HCT-116     (none)
K562        wgEncodeHaibTfbsK562GabpV0416101PkRep1.broadPeak.gz
            wgEncodeHaibTfbsK562GabpV0416101PkRep2.broadPeak.gz
MCF-7       wgEncodeHaibTfbsMcf7GabpV0422111PkRep1.broadPeak.gz
            wgEncodeHaibTfbsMcf7GabpV0422111PkRep2.broadPeak.gz
SK-N-SH     wgEncodeHaibTfbsSknshGabpV0422111PkRep1.broadPeak.gz
            wgEncodeHaibTfbsSknshGabpV0422111PkRep2.broadPeak.gz

Cell line   ELF1
A549        wgEncodeHaibTfbsA549Elf1V0422111Etoh02PkRep1.broadPeak.gz
            wgEncodeHaibTfbsA549Elf1V0422111Etoh02PkRep2.broadPeak.gz
GM12878     wgEncodeHaibTfbsGm12878Elf1sc631V0416101PkRep1.broadPeak.gz
            wgEncodeHaibTfbsGm12878Elf1sc631V0416101PkRep2.broadPeak.gz
Hek293      (none)
HeLa-S3     (none)
HepG2       wgEncodeHaibTfbsHepg2Elf1sc631V0416101PkRep1.broadPeak.gz
            wgEncodeHaibTfbsHepg2Elf1sc631V0416101PkRep2.broadPeak.gz
HL-60       (none)
HCT-116     wgEncodeHaibTfbsHct116Elf1V0422111PkRep1.broadPeak.gz
            wgEncodeHaibTfbsHct116Elf1V0422111PkRep2.broadPeak.gz
K562        wgEncodeHaibTfbsK562Elf1sc631V0416102PkRep1.broadPeak.gz
            wgEncodeHaibTfbsK562Elf1sc631V0416102PkRep2.broadPeak.gz
MCF-7       wgEncodeHaibTfbsMcf7Elf1V0422111PkRep1.broadPeak.gz
            wgEncodeHaibTfbsMcf7Elf1V0422111PkRep2.broadPeak.gz
SK-N-SH     wgEncodeHaibTfbsSknshElf1V0422111PkRep1.broadPeak.gz
            wgEncodeHaibTfbsSknshElf1V0422111PkRep2.broadPeak.gz

Cell line   Elk1
A549        (none)
GM12878     wgEncodeAwgTfbsSydhGm12878Elk112771IggmusUniPk.narrowPeak.gz
Hek293      (none)
HeLa-S3     wgEncodeAwgTfbsSydhHelas3Elk112771IggrabUniPk.narrowPeak.gz
HepG2       (none)
HL-60       (none)
HCT-116     (none)
K562        wgEncodeAwgTfbsSydhK562Elk112771IggrabUniPk.narrowPeak.gz
MCF-7       (none)
SK-N-SH     (none)

Cell line   Elk4
A549        (none)
GM12878     (none)
Hek293      wgEncodeAwgTfbsSydhHelas3Elk4UcdUniPk.narrowPeak.gz
HeLa-S3     wgEncodeAwgTfbsSydhHelas3Elk4UcdUniPk.narrowPeak.gz
HepG2       (none)
HL-60       (none)
HCT-116     (none)
K562        (none)
MCF-7       (none)
SK-N-SH     (none)

Cell line   Ets1
A549        wgEncodeHaibTfbsA549Ets1V0422111Etoh02PkRep1.broadPeak.gz
            wgEncodeHaibTfbsA549Ets1V0422111Etoh02PkRep2.broadPeak.gz
GM12878     wgEncodeHaibTfbsGm12878Ets1Pcr1xPkRep1V2.broadPeak.gz
            wgEncodeHaibTfbsGm12878Ets1Pcr1xPkRep2V2.broadPeak.gz
Hek293      (none)
HeLa-S3     (none)
HepG2       (none)
HL-60       (none)
HCT-116     (none)
K562        wgEncodeHaibTfbsK562Ets1V0416101PkRep1.broadPeak.gz
            wgEncodeHaibTfbsK562Ets1V0416101PkRep2.broadPeak.gz
MCF-7       (none)
SK-N-SH     (none)

Table 3.5 ENCODE Gabpa ChIP-seq peak annotations. broadPeak files were downloaded from http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeHaibTfbs/ and narrowPeak files were downloaded from http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeAwgTfbsUniform/

TABLE 3.6

ENCODE Data:
Cell line   Method/ChIP   SRA ID
A549        GABPA         SRR577875, SRR577876
GM12878     GABPA         SRR351549, SRR351550
HeLa-S3     GABPA         SRR351735, SRR351736
HepG2       GABPA         SRR351519, SRR351520
HL-60       GABPA         SRR577853, SRR577854
K562        GABPA         SRR351867, SRR351868
MCF-7       GABPA         SRR577938, SRR577939
SK-N-SH     GABPA         SRR577688, SRR577689
HepG2       Pol2          SRR351572, SRR351573
SK-N-SH     Pol2          SRR577844, SRR577845
HepG2       DGF           SRR089667, SRR089668, SRR089669
SK-N-SH     DGF           SRR089676

Other published ETS-factor ChIP-seq data:
Cell line   ChIP    SRA ID      PMID
293T        Input   SRR330108   22028470
293T        ETV3    SRR330110   22028470
PC3         Input   SRR314992   22012618
PC3         ETV4    SRR314993   22012618
RWPE1       Input   SRR314995   22012618
RWPE1       ETS1    SRR314998   22012618
RWPE1       GABPA   SRR314999   22012618
RWPE1       ETV1    SRR315001   22012618

Table 3.6 SRA ID numbers for raw sequencing files


CHAPTER 4.

MATERIALS AND METHODS

4.1 SAMPLE ACQUISITION

All primary tumor samples were snap frozen in liquid nitrogen and stored at -80 °C until use. Patient-matched normal samples were peripheral blood mononuclear cells or muscle tissue. Samples GBM1 through GBM11 were obtained from the Neurosurgery Tissue Bank at the University of California San Francisco (UCSF). Sample use was approved by the Committee on Human Research at UCSF and research was approved by the institutional review board at UCSF. All patients provided informed written consent. Snap frozen normal human post-mortem brain tissue from two males (55 and 56 years of age respectively) was obtained from the National Disease Research Interchange (NDRI) and frontal cerebral cortex gray matter was macrodissected.

4.2 TERT GENOTYPING

Samples were genotyped for TERT promoter mutation status using the Roche GC-Rich Kit with the TERT_GENOTYPE primers (forward: 5’-GTAAAACGACGGCCAGACGTGGCGGAGGGACTG-3’; reverse: 5’-CAGGAAACAGCTATGACAGGGCTTCCCACGTGCG-3’; Tm=60C). Samples were submitted for Sanger sequencing with the M13 sequencing primers (forward: 5’-GTAAAACGACGGCCAG-3’; reverse: 5’-CAGGAAACAGCTATGAC-3’; Tm=60C). A list of genotypes for all samples is given in Table 3.1.

4.3 RNA EXTRACTION AND QRT-PCR FOR HUMAN PRIMARY GLIOBLASTOMAS AND NORMAL BRAIN

Total RNA was isolated from frozen tumor tissue or non-diseased postmortem human brain using TRIzol (Life Technologies, Grand Island, NY). Further cleanup and on-column DNase digestion were performed with the RNeasy kit (Qiagen, Valencia, CA). cDNA synthesis was performed with a mix of random hexamer and oligo dT using Moloney Murine Leukemia Virus Reverse Transcriptase (Life Technologies). Quantitative RT-PCR using iQ SYBR Green Supermix (Bio-Rad, Hercules, CA) was performed with TERT-1 and GAPDH primers (primer sequences given in Table 3.4). Melting curves were manually inspected to confirm PCR specificity. Relative expression levels were calculated using the delta delta C(T) method.
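The delta delta C(T) calculation can be sketched as follows. The sketch assumes roughly 100% amplification efficiency for both primer sets (so each cycle is a two-fold change) and uses hypothetical Ct values with normal brain as the calibrator; neither the values nor the function names come from the thesis.

```python
def mean(values):
    """Arithmetic mean of replicate Ct values."""
    return sum(values) / len(values)


def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Fold change by the 2^(-ddCt) method, assuming ~100% primer efficiency."""
    delta_sample = ct_target_sample - ct_ref_sample
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator
    return 2 ** -(delta_sample - delta_calibrator)


# Hypothetical triplicate Ct values (target = TERT-1, reference = GAPDH).
tumor_tert = mean([28.1, 28.0, 27.9])
tumor_gapdh = mean([18.0, 18.1, 17.9])
normal_tert = mean([33.0, 33.1, 32.9])
normal_gapdh = mean([18.0, 18.0, 18.0])

# Tumor dCt is 10 and normal dCt is 15, so ddCt = -5 and the fold change is 2^5.
fold = relative_expression(tumor_tert, tumor_gapdh, normal_tert, normal_gapdh)
```

If primer efficiencies differ noticeably from 100%, an efficiency-corrected formula would be more appropriate than the plain power of two.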

4.4 CHROMATIN IMMUNOPRECIPITATION (CHIP)

Chromatin isolated from primary GBM tissue was digested to mononucleosomes with micrococcal nuclease. Histones marked with H3K4me3 (Cell Signaling #9751), H3K4me1 (Diagenode #pAb-037-050), and H3K27ac (Active Motif #39133) were immunoprecipitated using Sepharose beads coated in protein A/G, and the DNA was then purified. An IgG negative control was performed in each experiment and enrichment was verified by quantitative PCR. Illumina library construction was performed as per manufacturer’s instructions. Single-end or paired-end sequencing (75 bp) was performed on the Illumina HiSeq. As a control, input DNA from each chromatin preparation was also sequenced. The resulting sequences were quality filtered and mapped back to the human genome using the Burrows-Wheeler Aligner (Li and Durbin 2010). The sequencing libraries were aligned as single-end samples to ensure equal mapping bias across the samples. ChIP enrichment was further verified using CHANCE (Diaz et al. 2012). Individual peak calling was performed using MACS at a 1% false discovery rate and a P-value < 1x10^-10 (Zhang et al. 2008).

Enrichment at the TERT promoter was determined by qPCR with the following primer sets: TERT-450 (forward: 5’-CTGATCCGGAGACCCAGG-3’; reverse: 5’-GTCCTCCCCTTCACGTCC-3’; Tm=60C), TERT-639 (forward: 5’-ACTTGGGCTCCTTGACACAG-3’; reverse: 5’-TTTGGGGTGGTTTGCTCATG-3’; Tm=60C), TERT-5329 (forward: 5’-TTAGGGCGAAAAATCCCTCT-3’; reverse: 5’-CCAAAGGCGTAAAACAGGAA-3’; Tm=60C). H3K4me3 enrichment between TERT mutant and wild-type samples was compared using a Wilcoxon Rank Sum test with the TERT-450 and TERT-639 primer sets (P-value < 0.05).

ChIP for GABP, ETS1, and ETV4 was performed using the Active Motif High Sensitivity kit. GBM1, U251, HepG2, SK-N-SH, A375, MEL-SK-28, and MEL-SK-1239 cells were grown to 80% confluency in 15 cm plates and fixed with 4% formaldehyde. Chromatin was sonicated to a size range of 200-1500 bp with the Diagenode Bioruptor. 14-20 µg of chromatin was used per GABPA (Santa Cruz Biotechnology: sc-22810), ETV4 (Aviva Systems Biology: ARP32263_P050), ETS1 (Santa Cruz Biotechnology: sc-350) and IgG control (Cell Signaling: 2729) immunoprecipitation for each cell type. Enrichment at the TERT promoter was determined by qPCR with the SsoAdvanced Universal SYBR Green Supermix (Bio-Rad). The following primer sets were used: TERT+47 (forward: 5’-GCCGGGGCCAGGGCTTCCCA-3’; reverse: 5’-CCGCGCTTCCCACGTGGCGG-3’; Tm=74C), TERT-5329 (above), STAT6 (forward: 5’-CACTGAGACACGCCAAGAAA-3’; reverse: 5’-AATGGGCTCTGTTTGTGAGC-3’; Tm=60C), PPP1R12B (forward: 5’-CAGCTTTAGGGCAAAGGTGA-3’; reverse: 5’-GTGGCCGTAAGGGAACTACA-3’; Tm=60C), ZNF333 (forward: 5’-TGAAGACACATCTGCGAACC-3’; reverse: 5’-TCGCGCACTCATACAGTTTC-3’; Tm=60C). 1M of Resolution Solution (Roche GC-Rich kit) was added to each standard SsoAdvanced qPCR reaction. PCR was carried out on the Applied Biosystems 7900HT Fast Real-Time System. Three replicate PCR reactions were carried out for each sample.

4.5 DEFINING PROMOTER AND ENHANCER STATES

Pre-aligned ChIP-Seq data for H3K4me3, H3K4me1, and H3K27ac histone modifications was downloaded for adult Inferior Temporal Lobe, Hippocampus Middle, Mid Frontal Lobe, Cingulate Gyrus, and Anterior Caudate from the Human Epigenome Atlas (http://www.genboree.org/epigenomeatlas/multiGridViewerPublic.rhtml).

Genome-wide active promoter and enhancer states were generated from the aligned primary GBM and adult normal brain ChIP-seq data using ChromHMM v1.03 (Ernst et al., 2011). The default parameters were used to binarize the bed files (chromHMM.jar binarizeBed), and the following parameters were used to learn the HMM Model: -xmx3g chromHMM.jar LearnModel 5 hg19. The hidden state showing co-occurrence of high H3K4me3 and H3K27ac marks was assigned as an ‘Active Promoter.’ Similarly, the state showing co-occurrence of high H3K4me1 and H3K27ac but no H3K4me3 was assigned as an ‘Active Enhancer.’

4.6 CALLING SOMATIC MUTATIONS IN PRIMARY GBM

Primary GBM and patient-matched normal DNA WGS data for 42 patients were downloaded from the TCGA project’s Cancer Genomics Hub (www.cghub.ucsc.edu; dbGAP Study Accession number: phs000178; refs. 39, 40).

Single-nucleotide variants (SNVs) were detected with MuTect, a Bayesian framework for the detection of somatic mutations (53). Somatic and germline SNVs were filtered according to MuTect defaults [germline variants were kept at a LOD(N) threshold of 2.3]. SNVs for TCGA GBM samples aligned to hg18 were converted to hg19 coordinates using the UCSC liftover tool (54). For technical reasons, one GBM sample had SNV calls only from chromosomes 1–7, and another GBM sample had SNV calls only from chromosomes 1–12. These samples were removed from subsequent analysis.

4.7 COMPUTING DIFFERENTIAL TF BINDING AFFINITY

For each position with an observed nucleotide substitution, a 51bp segment of DNA centered at the position was used for a Position Specific Scoring Matrix (PSSM) scan.

PSSMs for a subset of liver-expressed TFs (Table 3.5) were obtained from the publicly available JASPAR and TRANSFAC databases. For each PSSM of length $L$, a ‘mark score’ was calculated for all subsequences of length $L$ within the 51bp DNA segment that overlapped the central position. A mark score for the reference subsequence ($S_r$) and the mutant subsequence ($S_m$) were calculated as:

$\mathrm{Score}_r =$

$\mathrm{Score}_m =$

where $B$ is the background distribution of nucleotides in the genome. TF motifs were only considered if either $\mathrm{Score}_r$ or $\mathrm{Score}_m$ was greater than the relative entropy score of the TF. Finally, the TF with the largest absolute score change between $\mathrm{Score}_r$ and $\mathrm{Score}_m$ is listed in Table 3.5. The relative entropy is defined as:

$\mathrm{RE} =$

where $m_{i,j}$ is the entry in row $i$ and column $j$ of the PSSM.
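For orientation, a conventional log-odds form for such a scan is $\sum_{i=1}^{L} \log\left(m_{i,S[i]} / B(S[i])\right)$ for a subsequence $S$, with relative entropy $\sum_{i}\sum_{j} m_{i,j} \log\left(m_{i,j} / B(j)\right)$. The sketch below implements that conventional form only. It assumes rows of the PSSM index motif positions, columns index nucleotides in the order A, C, G, T, and all matrix entries are non-zero probabilities (for example, after adding pseudocounts). The function names are hypothetical, and the exact expressions used in the original analysis may differ.

```python
import math

NUCLEOTIDES = "ACGT"  # assumed column order of the PSSM

def log_odds_score(subseq, pssm, background):
    """Sum over positions of log(PSSM probability / background probability)."""
    return sum(
        math.log(pssm[i][NUCLEOTIDES.index(base)] / background[base])
        for i, base in enumerate(subseq)
    )

def relative_entropy(pssm, background):
    """Expected log-odds score of a site drawn from the PSSM itself."""
    return sum(
        p * math.log(p / background[NUCLEOTIDES[j]])
        for row in pssm
        for j, p in enumerate(row)
        if p > 0
    )

def largest_score_change(ref_segment, alt_base, pssm, background):
    """Scan all windows of length L that overlap the central (mutated) base.

    Returns (window_start, score_ref, score_mut) for the window with the
    largest absolute score change among windows where either score exceeds
    the relative entropy, or None if no window passes that threshold.
    """
    length = len(pssm)
    center = len(ref_segment) // 2
    mut_segment = ref_segment[:center] + alt_base + ref_segment[center + 1:]
    threshold = relative_entropy(pssm, background)
    best = None
    for start in range(center - length + 1, center + 1):
        if start < 0 or start + length > len(ref_segment):
            continue
        s_ref = log_odds_score(ref_segment[start:start + length], pssm, background)
        s_mut = log_odds_score(mut_segment[start:start + length], pssm, background)
        if max(s_ref, s_mut) <= threshold:
            continue
        if best is None or abs(s_mut - s_ref) > abs(best[2] - best[1]):
            best = (start, s_ref, s_mut)
    return best
```

In practice the segment would be the 51bp window described above and the scan would be repeated for every PSSM, keeping the TF with the largest absolute change.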

4.8 CELL CULTURE

GBM1, U251, and HepG2 cells were cultured in DMEM/Ham’s F-12 1:1 media, 10% FBS, 1% Penicillin/Streptomycin. GBM1 cultured cells were kept under 20 passages following the primary tumor tissue dissociation. A375, MEL-SK-28, MEL-SK-1239, and SK-N-SH were cultured in RPMI-1640, 10% FBS, 1% Penicillin/Streptomycin, and 1% Sodium Pyruvate. All cells were maintained at 37 degrees Celsius, 5% CO2.

4.9 CELL PROLIFERATION ASSAY

GBM1 and U251 cells were transfected with siRNA for GABPA and a scramble control from Dharmacon using the standard Dharmafect 1 protocol. Briefly, cells were seeded at a density of 30,000 cells/mL in a 6-well plate. 24 hours post-seeding, cells were transfected with 50 nM of siRNA and 2 µL of Dharmafect 1 reagent. At t=0, 24, 48 and 72 h post-transfection, cells were trypsinized, collected and counted on a hemocytometer with trypan-blue exclusion.

4.10 FLOW CYTOMETRY

Following transfection of GBM1 and U251 cells with the GABPA siRNA or the scramble siRNA control from Dharmacon, cells were trypsinized and collected at timepoints 0, 24, 48 and 72 h post-transfection. Cells were then fixed with 70% ethanol and stained with a propidium iodide/Triton X-100 staining solution containing DNase-free RNase A. Cells were acquired and analyzed using the Becton Dickinson FACSCalibur flow cytometer. At least 10,000 events were collected for U251 cells and at least 35,000 events were collected for GBM1 cells. The cell cycle subpopulations were determined using FlowJo, utilizing the Dean-Jett-Fox algorithm.

4.11 LUCIFERASE ASSAYS AND SITE-DIRECTED MUTAGENESIS

The TERT core promoter sequence was cloned into the pGL4.10 Promega dual luciferase vector using the Cold Fusion Cloning kit. Site-directed mutagenesis of the TERT-pGL4.10 construct was performed using the QuikChange Lightning Site-Directed Mutagenesis kit (Agilent Technologies). Cloning and mutagenesis primers are listed in Table 3.2. Transfection of reporter plasmid was carried out with the XtremeGene-HP (Roche) transfection reagent. Briefly, GBM1 and U251 cells were seeded at a density of 30,000 cells/mL in a 96-well plate. 24 hours post-seeding, cells were transfected with 90 ng vector, 9 ng pGL4.74 (renilla control), and 0.3 µL of XtremeGene-HP reagent. 48 hours post-transfection, firefly luciferase activity was measured using the Dual-Luciferase Reporter assay system (Promega) and normalized against renilla luciferase activity. All experiments were performed with six replicate wells, and each experiment was repeated at least three times. Each plate was normalized by the positive control pGL4.13 construct. Replicate experiments were scaled to the average wild-type-TERT-pGL4 construct.

For each deletion, 18 measurements were organized into a 3x6 table, corresponding to 3 biological replicates and 6 technical replicates. We median polished the table to obtain the overall effect of a given deletion. After removing batch effects by subtracting the row effects, standard deviation around the overall median polished effect was estimated, discarding the top two and bottom two outliers. The sinusoidal fitting was performed using the model $a\sin(2\pi(x - b)/10.5) + cx + d$ in Mathematica.
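Median polish can be sketched as below. This is a generic version of Tukey's procedure on a small table, with rows taken to be biological replicates and columns technical replicates; the function name and the fixed iteration count are assumptions, not details from the original analysis.

```python
from statistics import median

def median_polish(table, n_iter=10):
    """Decompose table[i][j] into overall + row effect + column effect + residual."""
    n_rows, n_cols = len(table), len(table[0])
    resid = [list(row) for row in table]
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(n_iter):
        # Sweep row medians out of the residuals into the row effects.
        for i in range(n_rows):
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]
        # Sweep column medians out of the residuals into the column effects.
        for j in range(n_cols):
            m = median([resid[i][j] for i in range(n_rows)])
            col_eff[j] += m
            for i in range(n_rows):
                resid[i][j] -= m
        m = median(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]
    return overall, row_eff, col_eff, resid
```

The returned overall term plays the role of the deletion effect, and subtracting the row effects from the raw values corresponds to the batch correction described above.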

4.12 SIRNA KNOCKDOWN AND RT-QPCR

siGenome and On-Target Plus siRNA pools were obtained from Dharmacon (siRNAs listed in Table 3.3). siRNAs were transfected into GBM1 and U251 cells using the standard Dharmafect 1 protocol. Briefly, cells were seeded at a density of 30,000 cells/mL in a 96-well plate. 24 hours post-seeding, cells were transfected with 50 nM of siRNA and 0.3 µL of Dharmafect 1 reagent. At 24, 48, and 72 hours post-transfection, cells were lysed, cDNA was generated, and qPCR was performed to measure gene expression with the Power SYBR Green Cells-to-Ct kit (Ambion). All experiments were performed in duplicate and each experiment was replicated at least three times. Primers used: GUSB, TERT2, ETS1, ETS2, ELF1, ELF2, ELF4, ETV1, ETV3, ETV4, ETV5, ELK1, ELK3, ELK4, GABPA (primers are listed in Table 3.4). Each sample was measured in triplicate on the Applied Biosystems 7900HT Fast Real-Time System. Melting curves were manually inspected to confirm PCR specificity. Relative expression levels were calculated using the deltaCT method against GUSB. Each plate was internally normalized to no-siRNA control wells. Replicate experiments were scaled to the average scramble-siRNA control values.

4.13 RNA-SEQ GENE EXPRESSION ANALYSIS

Level 3 RNA-seq data was obtained for GBM, SKCM, LIHC, and BLCA samples from the TCGA data portal(57). Normalized RSEM values were used to generate boxplots of gene expression for each ETS factor. Processed read counts per gene in U251 cells were obtained from Gene Expression Omnibus GSE53220(135). RNA-seq from GBM1 and GBM9 cultured cells was generated in-house and analyzed with TopHat and Cufflinks(136, 137). Exome-seq from GBM1 and GBM9 was analyzed as previously described(138).

4.14 ALLELE-SPECIFIC ANALYSIS OF CHIP DNA AND RNA-SEQ READS

ChIP-DNA from GABPA or H3K4me3 immunoprecipitation and matched input control DNA was used to amplify the TERT promoter as described above. PCR products were extracted from an agarose gel using the QIAquick Gel Extraction Kit (Qiagen). Purified PCR products were subsequently cloned into the pCR2.1/TOPO vector (Invitrogen). At least 12 individual bacterial colonies were amplified with PCR using vector-specific primers and sequenced using an ABI 3700 automated DNA sequencer. Allelic bias between ChIP DNA and matched input control DNA was calculated by Fisher’s exact test. Combined P-values were generated by the Stouffer method.

To measure expression in GBM1 and GBM9, we identified informative SNPs within the TERT gene body by analyzing existing exome-seq data. RNA-seq read pileups were calculated at these positions in the primary tumor as well as matched primary culture cells. The proportion of each allele present in RNA-seq reads was compared to that of matched exome-seq reads by Fisher’s exact test. Combined P-values were generated by the Stouffer method.
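The two statistical steps named in this section can be sketched with the standard library alone. The sketch assumes a 2x2 table of allele counts (mutant and wild-type reads or clones in the test sample versus the matched control) and an unweighted Stouffer combination. The function names are hypothetical, and applying Stouffer's method directly to two-sided P-values is a simplification.

```python
from math import comb
from statistics import NormalDist

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    p = sum(prob(x) for x in range(lowest, highest + 1)
            if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, p)

def stouffer_combined_p(p_values, eps=1e-12):
    """Combine P-values by summing their z-scores (unweighted Stouffer method)."""
    norm = NormalDist()
    clipped = [min(max(p, eps), 1 - eps) for p in p_values]
    z = sum(norm.inv_cdf(1 - p) for p in clipped) / len(clipped) ** 0.5
    return 1 - norm.cdf(z)
```

For example, `fisher_exact_two_sided(8, 2, 1, 5)` compares 8 mutant and 2 wild-type counts in one sample against 1 mutant and 5 wild-type counts in another.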

4.15 BUFFERS AND DNA CONSTRUCTS FOR SINGLE-MOLECULE PROTEIN BINDING ASSAY

DNA and GABPA/GABPB imaging buffer consisted of 50 mM KCl in 25 mM Tris (pH 7.5). The ETV4 assay utilized the previous buffer conditions supplemented with 5 mM MgCl2. For single molecule imaging, 0.8 mg/ml glucose oxidase, 0.625% glucose, 3 mM 6-hydroxy-2,5,7,8-tetramethylchromane-2-carboxylic acid (Trolox), and 0.03 mg/ml catalase were added to the buffers.

Oligonucleotides required to make the partial DNA duplex substrates were purchased from IDT with either Cy3 or Cy5 dyes as internal dye modification. Additionally, the Cy5 strand contained a 3' biotin modification. DNA constructs were prepared by mixing the Cy3 sequence with the Cy5/biotin sequence at a molar ratio of 1:1.5 in 20 mM Tris-HCl pH 7.5, 5 mM MgCl2 and incubating at 95 °C for 2 min, then slowly cooling to room temperature for 2 hr.

4.16 FLUORESCENT LABELING OF GABPB AND ETV4

GABPB and ETV4 (Abnova) were reacted with Alexa647 (NHS-ester) in a 1:20 protein to dye ratio for 1 hour in 100 mM sodium bicarbonate buffer, pH 8.5. Excess dye was removed by passing the reaction twice through a Micro Bio-Spin 6 column (Bio-Rad). Alexa647-labeled proteins were used immediately after labeling.

4.17 SINGLE MOLECULE FLUORESCENCE DATA ACQUISITION

Single molecule fluorescence experiments were carried out on quartz slides (Finkenbeiner). To minimize surface interactions with the protein, quartz slides and coverslips were coated with polyethylene glycol(139). Duplex DNA molecules were immobilized on the PEG-passivated surface via biotin-neutravidin interaction. Excess Cy3 molecules were washed away with reaction buffer. Imaging was initiated before the protein mix was applied in order to capture the moment of protein binding to DNA. GABPA and Alexa647-GABPB were premixed and incubated for 5 minutes before flowing into the imaging chamber. GABPA/GABPB and ETV4 concentrations were held at 30 nM for all experiments. All experiments and measurements were carried out at room temperature (23 ± 1 °C).

4.18 PEAK MOTIF ENRICHMENT ANALYSIS

To determine the enrichment of motifs in ETS factor ChIP-seq peaks, we first downloaded all available ENCODE peak annotation files for 10 cell lines (A549, GM12878, HEK293, HeLa-S3, HepG2, HL-60, HCT-116, K562, MCF-7 and SK-N-SH) and five ETS factors (ELF1, ELK1, ELK4, ETS1 and GABPA, see Table 3.5) (52). To complement this, we also downloaded published ChIP-seq data for ETV1, ETV3, ETV4 and GABPA(126, 140) (see Table 3.6) and called peaks using MACS2(141), generating both broadPeak and narrowPeak files.

The consensus binding motifs of these ETS factors were determined using MEME (version 4.10.0) (142). To identify reliable peaks, we first selected the top 500 peaks in each data set. For the ENCODE sets with replicate peak calls we trimmed this list further by taking the base pair-wise intersection across the replicates. For the non-ENCODE data sets we used the narrowPeak files. MEME was then instructed to identify up to 6 motifs with lengths 6-15bp in each peak set. Motifs corresponding to simple repeats were removed manually.

To determine the relative enrichment of the motifs CCGGAG (WT) and CCGGAA (MUT) in ChIP-seq peaks, we then counted the number $n_m(i)$ of occurrences of motif type $m$ in each peak $i$. The local background occurrence $b_m(i)$ was estimated by the motif count in peaks shifted one peak length to the left and right. To clarify the individual importance of the two motifs, we discarded all peaks containing the WT motif while calculating the MUT enrichment and vice versa. The peaks were then sorted by enrichment score (column 7 in the ENCODE files) and assigned a rank $r \in [0,1]$, highest enrichment being zero. These ordered counts were smoothed using a Gaussian smoothing kernel of width 500 peaks, giving $\bar{n}_m(r)$ and $\bar{b}_m(r)$. The motif enrichment $2\bar{n}_m(r)/\bar{b}_m(r)$ was finally plotted as a function of $r$. This analysis was performed for GABPA, ELF1, ETS1 and ETV4 using the broadPeak files. We also repeated the analysis for ETS1, comparing the consensus motif TGTAGT to the mutated motif TCTAGT, and for ETV4, comparing the consensus motif TGANTCA to the mutated motif TATAGT.
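The counting and smoothing steps can be sketched as follows. This is a simplified illustration: it counts exact matches on one strand only, treats the kernel width as the Gaussian standard deviation in units of peaks, and uses hypothetical function names. The factor of 2 reflects that the background count comes from two flanking windows (left and right), each the length of the peak.

```python
import math

def count_motif(sequence, motif):
    """Count occurrences of motif in sequence, allowing overlaps."""
    k = len(motif)
    return sum(1 for i in range(len(sequence) - k + 1) if sequence[i:i + k] == motif)

def gaussian_smooth(values, sigma):
    """Kernel-weighted average of rank-ordered values (O(n^2), fine for a sketch)."""
    smoothed = []
    for i in range(len(values)):
        weights = [math.exp(-0.5 * ((i - j) / sigma) ** 2) for j in range(len(values))]
        smoothed.append(sum(w * v for w, v in zip(weights, values)) / sum(weights))
    return smoothed

def enrichment_profile(peak_counts, flank_counts, sigma):
    """Return 2 * smoothed peak count / smoothed two-flank count along the ranking.

    Both lists must already be ordered by peak enrichment score (strongest first).
    """
    n_bar = gaussian_smooth(peak_counts, sigma)
    b_bar = gaussian_smooth(flank_counts, sigma)
    return [2.0 * n / b if b > 0 else float("nan") for n, b in zip(n_bar, b_bar)]
```

A value near 1 indicates that the motif is no more frequent inside peaks than in the flanking sequence at that rank.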

To check if the high enrichment of the MUT motif in GABPA ChIP-seq peaks was due to preferential GABPA binding to open chromatin, we calculated the frequency (per base pair) of the WT and MUT motifs in HepG2 GABPA and ELF1 ChIP-seq peaks as well as in DNase I hypersensitive sites (see Table 3.5). As above, we discarded all peaks containing the WT motif when calculating the MUT frequency and vice versa. The motif frequency was calculated separately for each quintile of the peak enrichment score.

The number of CCGGAA motifs necessary for strong ETS factor binding was determined by first classifying the peaks by the number of motifs occurring within 100bp of the peak center and then plotting the distribution of the Log10-transformed enrichment scores in each class of peaks using a Gaussian smoothing kernel of width 0.05. The ENCODE enrichment scores appeared to be clipped at 3100. Peaks with the palindromic sequence TTCCGGAA were discarded to avoid potential over-counting. For peak annotation we used consensus peaks identified by taking the intersection between replicates and averaging the enrichment scores. The distribution of enrichment scores was then calculated for each of GABPA, ELF1 and ETS1 after pooling the consensus peaks across the cell lines A549, GM12878, K562, HepG2 and SK-N-SH (the last two were not available for ETS1).

4.19 SPACING DEPENDENCE OF CHIP ENRICHMENT

We determined how the distribution of motif separations differs between weak and strong ChIP peaks by first identifying peaks with exactly two motifs within 1kb of the peak center and then, using the peak enrichment score, classifying these peaks as either strong (top quartile) or weak (lower three quartiles). Peaks with overlapping motif combinations were discarded. The distribution of distance between motif centers was plotted for each class of peaks using a Gaussian smoothing kernel of width 1.25bp. To calculate the null distribution of spacing, we tiled the genome with 2kb non-overlapping windows and repeated the analysis, discarding windows overlapping ChIP-seq peaks or genome assembly gaps. We also displayed the theoretical probability density function $2(d-s)/d^2$ for the spacing $s$ between two points sampled with uniform probability in an interval of length $d$ = 2kb. This analysis was performed both for the GABPA consensus ChIP-seq peaks in HepG2 and for each of GABPA, ELF1 and ETS1 after pooling the consensus peaks across cell lines as above.
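The theoretical density for the spacing between two uniformly sampled points follows from a short calculation, sketched here with $X$ and $Y$ as illustrative names for the two positions in an interval of length $d$ and $s$ for their separation. The pairs $(X, Y)$ with $|X - Y| > s$ fill two right triangles of side $d - s$ inside the $d \times d$ square:

```latex
\begin{align}
P(|X - Y| > s) &= \frac{(d - s)^2}{d^2}, \qquad 0 \le s \le d, \\
f(s) &= -\frac{\mathrm{d}}{\mathrm{d}s}\, P(|X - Y| > s) = \frac{2(d - s)}{d^2}.
\end{align}
```

The density is largest at $s = 0$ and falls linearly to zero at $s = d$, so short spacings are expected to be the most common even when motif positions are independent.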

To calculate how the enrichment score depends on the spacing between the motifs, we first identified all peaks with exactly two motifs within 200bp of the peak center. After sorting these peaks by motif spacing, we calculated the median enrichment score for each spacing using a sliding window (in spacing) of width 3bp. A 5% confidence interval of the median was estimated using bootstrap resampling with 10,000 repetitions. This enrichment profile was calculated for each of GABPA, ELF1 and ETS1 after pooling the consensus peaks across cell lines as above.
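A percentile bootstrap for the median can be sketched as below. The interval level is left as a parameter because the default used here (0.95) is an assumption rather than a value taken from the text; the function name and the fixed seed are likewise illustrative.

```python
import random
from statistics import median

def bootstrap_median_ci(values, level=0.95, n_boot=10000, seed=0):
    """Return (median, lower, upper) using a percentile bootstrap of the median."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample with replacement and record the median of each resample.
    medians = sorted(median(rng.choices(values, k=n)) for _ in range(n_boot))
    alpha = (1.0 - level) / 2.0
    lower = medians[int(alpha * n_boot)]
    upper = medians[min(n_boot - 1, int((1.0 - alpha) * n_boot))]
    return median(values), lower, upper
```

In the analysis described above, `values` would hold the enrichment scores of all peaks whose motif spacing falls inside one 3bp sliding window.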

4.20 ALLELE-SPECIFIC BINDING FROM ENCODE DATA

We downloaded raw ENCODE ChIP-seq and digital genomic footprinting (DGF) reads (see Table 3.6) and aligned them to the human reference genome (hg19) using bowtie2(92). For each base pair in the reference, we tabulated the occurrence of each nucleotide among the covering reads and then classified the mutation status of the nucleotides at positions 228 and 250. These categorized read-coverage counts were then displayed as bar charts.
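The per-position tabulation can be sketched as below. The sketch assumes reads are already reduced to ungapped alignments given as (0-based start, sequence) pairs; a real pileup would also need to handle indels, clipping and strand, which are omitted here. The function name is hypothetical.

```python
from collections import Counter

def base_counts_at(reads, position):
    """Count the nucleotides observed at one reference position.

    reads: iterable of (start, sequence) pairs for ungapped alignments,
    with start given as a 0-based reference coordinate.
    """
    counts = Counter()
    for start, sequence in reads:
        offset = position - start
        if 0 <= offset < len(sequence):  # the read covers the position
            counts[sequence[offset]] += 1
    return counts
```

Comparing the count of the reference base with the count of the mutant base at each hotspot position gives the kind of allele-specific read tally described above.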


CHAPTER 5.

DISCUSSION


5.1. POTENTIAL FOR FRAMEWORK EXPANSION

Although the primary purpose of the constructed analysis framework was to identify functional single nucleotide mutations in cancer relevant promoters and enhancers, a notable strength of the framework is its inherent capacity to be expanded and applied to a variety of scientific questions related to cRE activity. This can be observed in the wide variety of questions to which we have successfully applied the framework (chapter 2.4) beyond our primary research question.

All the work described here was restricted to analyzing promoter and enhancer cREs, but there are many other functional classes of cREs in the human genome. By integrating more histone modifications as well as transcription factor binding site data, the same analysis tools can be used to define other discrete cREs such as insulator and silencer elements(23, 51). As an alternative approach, methods such as DNase-seq and ATAC-seq can profile all areas of focal open chromatin within a single experiment(98, 99). ATAC-seq is additionally powerful as it can capture nucleosome positioning data as well, providing a higher resolution as to which base pairs are amenable to TF binding within each cRE. The caveat to these techniques is that they currently lack a method to distinguish the various cRE subclasses from each other.

Likely, a combination of strategies will prove the most useful in the future.

The proposed framework can also be expanded to evaluate genomic alterations beyond single nucleotide variants (SNVs). Whole-genome sequencing (WGS) data can be analyzed to identify multiple classes of genomic alterations such as SNVs, indels, copy number alterations (CNAs), and chromosomal rearrangements. Recently, recurrent somatic indels were identified in an enhancer that alters binding of the Myb transcription factor in T-cell acute lymphoblastic leukemia(143). Indels are more likely than SNVs to alter cRE activity, because adding or removing multiple base pairs within a TF-binding site is more likely to affect binding than a single substitution. As the accuracy of indel callers improves, it will be interesting to evaluate the effect of indels on cRE activity in a similar genome-wide manner.

5.2 AREAS OF IMPROVEMENT

By applying this framework, we were able to make many novel discoveries about the role of promoter and enhancer activity in GBM, as well as the role of cRE mutations in cancer. However, there are places in which the current framework could be improved to obtain further insight. As mentioned previously, tissue-specific cRE definitions can be improved and expanded to a larger set of functional classes by incorporating more histone modification and TF ChIP-seq data. Integrating high-resolution open-chromatin data such as ATAC-seq will allow for further prioritization of nucleotides within each cRE.

Additionally, our epigenomics approach to defining cREs does not inherently link each cRE to a target gene. By using the ‘nearest-gene’ approach we successfully capture the majority of promoter-gene pairs and many enhancer-gene pairs. However, the confidence in correctly assigning a target gene decreases as a function of distance to the transcription start site (TSS). One method to increase the accuracy of distal cRE-gene assignments is to correlate cRE activity and nearby gene expression across a large cohort of samples. Although this method was not applied in this project due to small sample size, it has been successfully used by both ENCODE and the Roadmap Epigenome project(52, 56). Perhaps the best approach to linking cREs to their proper target is to perform chromatin conformation capture assays in conjunction with epigenomic profiling (see chapter 2.1.iv)(144). This method allows for identification of both cREs and the TSSs they physically interact with on a genome-wide scale.

Lastly, although we were able to successfully detect TERT promoter mutations as the most recurrently mutated cRE in GBM, there are many ways in which the search for recurrence can be improved. Our current method tallies the number of patients in which a given cRE is mutated, either on a SNV level or cRE level. This enriches for true driver mutations, but also enriches for cREs prone to false positive mutation calls. Indeed, aside from the TERT promoter, the 10 next most commonly mutated cREs were all found to reside in difficult-to-map regions based on their DNA sequence content. We did follow up with the CNTN4 promoter, which was found to be mutated in 4/40 of the TCGA GBM samples. Interestingly, an alternative promoter of CNTN4 was mutated in 3/40 additional samples. In contrast to TERT, however, the CNTN4 promoter mutations were distributed across 4.8 kb and were not clustered around similar transcription factor binding sites.

To validate whether CNTN4 is a target of recurrent promoter mutation, we PCR amplified and Sanger sequenced the CNTN4 promoter in an independent set of 27 GBMs obtained from the UCSF tissue core. While we did detect mutations in two cases, they were at new positions in the promoter, suggesting the mutations were random passenger events. We conclude that CNTN4 is another false positive detected due to the above-average size of the CNTN4 promoter. This work highlights the importance of evaluating statistical significance when detecting recurrent cRE mutations. As mentioned previously, the expectation of observing a random mutation increases with cRE length. Furthermore, the background somatic mutation rate is not consistent across the genome, but has been shown to vary based on many covariates such as replication timing and chromatin status(145). It is critical to account for these covariates and use a proper local estimate for background mutation frequency when evaluating significant recurrence. This approach has been recently used to look at recurrent cRE mutations across a wide variety of TCGA cancer samples(146).
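The dependence of chance recurrence on cRE length can be illustrated with a simple binomial model. This is a minimal sketch, not the method of the cited studies: it assumes a single per-base-pair mutation probability per patient (ideally a local estimate, as argued above) and independence between patients, and the function name and any rate passed to it are hypothetical.

```python
from math import comb

def recurrence_p_value(n_mutated, n_patients, length_bp, rate_per_bp):
    """P(at least n_mutated of n_patients carry >= 1 mutation in the element by chance)."""
    # Probability that one patient has at least one mutation in an element of this length.
    p_hit = 1.0 - (1.0 - rate_per_bp) ** length_bp
    # Upper tail of the binomial distribution over patients.
    return sum(
        comb(n_patients, k) * p_hit ** k * (1.0 - p_hit) ** (n_patients - k)
        for k in range(n_mutated, n_patients + 1)
    )
```

Under this model, the same count of mutated patients is less surprising for a long element than for a short one, which is the effect that length-unaware tallies miss.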

5.3 DETECTING OTHER CIS-REGULATORY ELEMENT DRIVER MUTATIONS

Our genome-wide study of GBM promoter and enhancer mutations identified TERT promoter mutations as the only recurrent driver cRE mutations in GBM. All other hits of similar recurrence frequency contained hallmarks of false-positive calls such as variable base quality, low-quality variant reads in normal, or residing in poor mapping regions of the genome. However, there have now been a series of systematic studies designed to identify cRE driver mutations in various cancers.

Fredriksson et al. investigated somatic cRE mutations across 15 cancer types and found TERT promoter mutations to be unique in their frequency of recurrence and strong association to gene expression changes(147). In contrast, Weinhold et al. found recurrent cRE mutations upstream of PLEKHS1, WDR74 and SDHD. SDHD promoter mutations were associated with altered target gene expression(30). Most recently, Melton et al. took a more rigorous approach to evaluating recurrence by adjusting for sample- and genomic locus-specific mutation rates(146). In addition to TERT, the authors identified eight new recurrently mutated cREs potentially regulating cancer-associated genes. The authors conclude that there is evidence for genome-wide positive selection for several cRE mutations in addition to those found in the TERT promoter. It is important to note, however, that none of the other mutations detected in this study had a similar effect on cRE activity as TERT when tested in reporter assays.

Furthermore, TERT was the top hit identified in all three studies, highlighting these promoter mutations as potentially unique driver cRE events in cancer. This unique characteristic for promoter mutation may be due to the fact that telomerase function is tightly linked to the transcription rate of the TERT gene. A focused investigation of similarly dosage-sensitive genes would be a potential strategy to enrich for functional cRE driver mutation susceptible loci.

The original seminal work by TCGA on GBM demonstrated there is a large convergence of specific pathways disrupted by various genetic mechanisms within a cancer type. For example, in GBM, 88% of tumors have been found to have RAS/PI3K signaling altered either by exonic mutation, chromosomal rearrangement, or copy number alteration of some component of the pathway(14). It is likely that functional cRE mutations in these genes bring genetic alteration of the pathways closer to 100% of tumor patients. As sample size increases, tissue-matched datasets continue to grow, and algorithms for detecting statistical recurrence in cREs are improved, we will continue to gain a more accurate understanding of the role cRE mutations play in cancer as a whole.

5.4 FUTURE DIRECTIONS FOR STUDYING TERT ACTIVATION BY GABP

Although we have determined the transcription factor responsible for binding and activating the mutant TERT promoter in GBM and three other cancer types, many questions remain. Several concern the specificity of GABP at the mutant TERT promoter. For example, while our work has shown that GABP is the only ETS TF capable of selectively regulating the mutant promoter, it remains unknown if GABP is the exclusive ETS factor regulating the mutant TERT promoter in all cancer types. To address this, a similar siRNA screening approach should be applied to each cancer type individually. It is also unclear if other ETS factors are capable of mutant TERT promoter binding and activation in the absence of GABP. Such a compensatory mechanism could complicate therapeutic strategies designed to disrupt telomerase by inhibiting GABP function. Performing pair-wise combination knockdowns of GABP with each other ETS factor will be a critical experiment for determining whether such compensatory mechanisms exist.

Finally, although we show GABP is incapable of binding to the wild type TERT promoter directly, it is still a possibility that GABP can regulate wild type TERT expression through secondary mechanisms. Repeating GABP knockdown experiments in TERT wild type cell lines that naturally express TERT will be crucial to determine if GABP inhibition will truly regulate TERT expression in a mutation dependent manner.


It is also unclear to what extent the heterotetramer form of GABP is necessary for mutant TERT activation. Although it has weaker DNA binding affinity, the GABP heterodimer is still capable of activating transcription(148, 149). Our analysis of the ETS-195 and ETS-200 native binding sites suggests they are cooperating with the mutation sites to recruit GABP, but this remains to be proven in vivo. The possibility remains that ETS-195 and ETS-200 play a critical role in binding of an as-yet unidentified co-factor at the TERT promoter. All of these questions can be addressed by editing the genome of cancer cells with CRISPR-Cas9 technology. For example, it is possible to delete specific transcript isoforms of GABPB so that a heterodimer can still form, but a heterotetramer cannot.

Determining which other transcription factors or transactivating cofactors are binding mutant TERT and cooperating with GABP to activate transcription may lead to the identification of novel targets for disrupting mutant TERT activity. It will be critical to compare the protein landscape bound to the wild type promoter to that observed at the mutant promoter in order to identify any mutation-dependent events. For example, preliminary analysis of ENCODE ChIP-seq in HepG2 and SK-N-SH cells shows binding of the MAX transcription factor directly downstream of GABP in the TERT promoter. However, this is also observed in embryonic stem cells, implying MAX is necessary for both mutant and wild type TERT regulation. As a complementary analysis, it would be interesting to systematically determine which nucleotides are functionally relevant in both a wild type and mutant context. This could be accomplished by utilizing the MPRA technique employed in chapter 2.4.iii.

It remains unclear how GABP function is regulated by upstream signaling pathways within the context of TERT promoter mutant cancer cells. GABP is primarily regulated by the control of its nuclear localization. Both MAPK and Hippo signaling pathways have been demonstrated to modulate GABP activity through post-translational modification and nuclear localization(150, 151). Indeed, EGFR amplification and BRAF V600E mutation, both MAPK activating events, have been shown to significantly co-occur with TERT promoter mutations in GBM and melanoma, respectively(32, 37). Inhibiting these pathways may prove to be a viable precision medicine strategy for mutant TERT promoter inhibition. It is likely that the specific pathway driving GABP activity depends on the driver mutations within each individual patient.

The TERT promoter contains a specific G-rich sequence predicted to form a non-canonical secondary structure called a G-quadruplex (GQ)(133, 134). GQ motifs exist in many promoter sequences, and the formation of a stable GQ is thought to repress transcription by blocking RNA Pol II recruitment(152). The TERT promoter sequence has been experimentally proven to form GQs in vitro, although this has yet to be confirmed in vivo. Interestingly, the canonical G228A and G250A hot spot positions lie directly within this GQ sequence motif. Recent work has shown that the promoter mutations can alter the GQ stability of the TERT promoter in vitro, highlighting another potential mechanism by which the mutations can regulate expression(153). It is currently unknown whether GQ stability directly affects the ability of GABP to bind to the TERT promoter.

Finally, it is still unknown whether inhibiting GABP by either experimental or pharmacologic methods will result in attenuated cancer cell growth and increased apoptosis. If this proves to be the case, it will be critical to determine whether the toxicity phenotypes depend on TERT regulation or are due to other normal functions of GABP. Inhibiting telomerase directly can slow tumor proliferation in many cancer cell lines, giving strength to the hypothesis that targeting GABP in TERT mutant cells will result in a similar phenotype(154). It will be necessary to determine if any GABP phenotype is TERT mutation dependent in order to evaluate whether GABP inhibitors can act as a cancer cell-specific strategy for telomerase disruption.

5.5 CLINICAL IMPLICATIONS

Over 90% of aggressive human cancers upregulate telomerase, highlighting this event as a fundamental step in tumorigenesis. If GABP inhibition at the TERT promoter selectively kills tumor cells harboring the promoter mutations, GABP may provide a precision medicine target to inhibit telomerase in tumor cells without inhibiting it in healthy normal cells. Below is a brief summary of past attempts to inhibit telomerase in the clinic, as well as a description of possible strategies for targeting GABP function in TERT mutant cancers.

5.5.i History of telomerase inhibition

Many approaches have been taken over the last few decades to block telomerase activity in cancer patients, but none have made it beyond clinical trials. One early strategy was to block the ability of telomerase to bind to telomeres by stabilizing G-quadruplex structures that occur within the telomeres themselves. Although some GQ stabilizers have been shown to have an anti-proliferative effect, there are many GQs throughout the genome and acquiring a potent telomere-specific GQ stabilizer has proved difficult(155-158). Another popular strategy was to design a telomerase-targeted immunotherapy. The most successful version of this, GV-1001, designed by Korea-based developer KAEL-GemVax, is an injectable MHC Class II peptide derived from the active site of telomerase(156, 159). GV-1001 was well tolerated in patients and made it to phase II trials in HCC and phase III trials in pancreatic cancer(160, 161). Unfortunately, no significant change in overall survival was detected in either of these trials.

Perhaps the most successful method to inhibit telomerase in cancer has been blocking the RNA component of telomerase, TERC, by hybridizing TERC with a synthetic oligonucleotide. This is the approach Geron has taken with their GRN163L telomerase inhibitor. GRN163L, or imetelstat, has been tested in primary GBM neurospheres and shown to decrease proliferation and clonogenicity(162). Furthermore, it synergizes with radiation and Temozolomide (TMZ) treatment to decrease cell viability to a greater extent than radiation+TMZ alone. These and other exciting preclinical results led Geron to initiate a series of clinical trials in various indications, including phase II clinical trials in non-small cell lung cancer (NSCLC) and breast cancer(163).

Unfortunately, these trials were all halted due to severe toxicities observed in the imetelstat treatment arms. Approximately 50% of treated NSCLC patients displayed grade 3 or higher neutropenia or thrombocytopenia. The breast cancer trial was halted after observing significantly worse progression free survival and increased incidence of patient death in the imetelstat treated arm. Concerns about hematotoxicity led to dose reductions in the standard of care treatment, which is the hypothesized cause of the increased death rate. Imetelstat was also tested in children with recurrent central nervous system tumors but was likewise determined to be too toxic(164).

In all these cases, myelosuppression was the primary dose-limiting toxicity for imetelstat. This is consistent with the telomerase inhibitor having a potent effect on the healthy hematopoietic progenitor population within each patient. While this effect is undesired when treating solid tumors, it is often considered a benefit when attempting to treat hematological cancers that result in an excess of blood cells. This is likely why Geron has restricted all its remaining clinical trials to hematologic cancers, and is finally observing success. Geron currently has an ongoing phase II trial testing the efficacy of imetelstat in treating myelofibrosis. Furthermore, a recent pre-clinical research study has shown that imetelstat impairs progression and delays relapse in a mouse xenograft model of human AML(165). The authors propose that the mechanism of action is through elimination of the leukemic stem cell fraction that is dependent on telomerase.

Of interest, the authors observe that the phenotype of imetelstat treatment is blocked by p53 mutation. It may thus be important to stratify patients by p53 mutation status when considering whom to include in any future telomerase inhibition clinical trials.


5.5.ii Strategies for targeting GABP

There are multiple approaches that can be taken to create a potent GABP inhibitor, but as each approach has its own strengths and weaknesses, it is likely best to pursue several in parallel to increase the chances of success. One popular method is to perform an unbiased screen against a large library of individual compounds. Such a screen requires an assay that can test a given drug's efficacy and that is also amenable to high-throughput scaling. Both cell-based and in-vitro protein binding-based assays can be developed to test for GABP inhibitors. These initial assays should help establish proof of mechanism.

A complementary approach would be to design a small molecule or small peptide based on existing knowledge of the target. The crystal structure of the GABP heterodimer binding to DNA has been solved and could be used to design compounds both for DNA-protein disruption, as well as GABPA-GABPB heterodimer disruption(148). Finally, as genome-editing technologies continue to be refined, it may become possible to target the mutant TERT-GABP interaction with CRISPR-Cas9 technology. A guide RNA specific to the mutant TERT promoter sequence but not the wild-type could allow for blocking of GABP binding by CRISPRi, or even directed mutation repair within the cancer cells.

There are also multiple different mechanisms that could be targeted to disrupt the mutant TERT-GABP interaction, thus increasing the chances of finding a functional inhibitor. The DNA-protein interface of transcription factors has been historically challenging to target with small molecules. However, the GQ folding ability of the TERT promoter may allow for manipulation of GABP binding through GQ stabilizing drugs. Many compounds within the class of porphyrin molecules have been shown to stabilize GQ formation, and this class of molecules should be included in any drug discovery endeavors. Aside from the DNA-protein interaction site, GABP requires multiple protein-protein interactions to function as a transcription factor. The interaction between GABPA-GABPB and the interaction between GABPB-GABPB are both required for heterotetramer formation(148). GABP could also rely on binding to additional co-factors at the TERT promoter. Disrupting any of these protein-protein interaction sites may prove a viable strategy for GABP-TERT inhibition.

Lastly, GABP is primarily regulated at the post-translational level by controlling the amount of GABP present in the nucleus versus the cytosol. Various upstream signaling pathways such as MAPK- and Hippo-signaling can regulate this nuclear localization. For example, LATS1, a kinase in the Hippo pathway, phosphorylates GABP and inhibits its ability to translocate into the nucleus(150). It will be critical to examine known modulators of these signaling pathways and assess their ability to affect TERT expression through GABP regulation. It may also be possible to design small molecules or peptides, such as intrabodies or transbodies, capable of binding GABP in the cytosol and preventing its nuclear localization(166, 167). Recent techniques have also been developed to target intracellular proteins for degradation by conjugating a phthalimide group to a targeting ligand(168).


5.5.iii Caveats to targeting GABP

While GABP has the potential to provide a cancer cell-specific target for inhibiting telomerase, there are many caveats and difficulties that need to be addressed before a compound can be brought to the clinic. As mentioned above, transcription factors have been historically difficult to inhibit with small molecules. If a compound is discovered in initial drug screens, drug delivery may pose a new challenge depending on the cancer type chosen for initial testing. It also remains unclear if blocking GABP will inhibit telomerase enough to kill cancer cells.

Beyond the need for pre-clinical evidence of efficacy for a GABP inhibitor, the inherent toxicities of inhibiting GABP in an adult body have not been characterized. The farther away a given modality of inhibition is from the TERT-GABP interaction, such as upstream pathways, the greater the chance of observing off-target toxicities associated with treatment. Furthermore, normal GABP function is essential for proper development, and its loss is embryonic lethal in mouse(169, 170). Because the adverse effects of systemic GABP inhibition in an adult are unknown, extensive pre-clinical testing will be required.

If GABP proves to be targetable, and its inhibition proves to be tolerable in adults, it is still unclear if successful inhibition will affect overall survival in patients. As the expected phenotypic effect relies on telomere shortening to reach cell crisis, GABP inhibition may require a long duration of treatment before an effect is observed. In cancer types with exceptionally poor prognosis, this may be too long a period to wait for efficacy.

Furthermore, mechanisms of resistance to GABP inhibition need to be carefully studied. It is possible that an increase in genomic instability induced by telomere crisis could lead to a form of hypermutation. This could result in accelerated evolution of resistant tumor cell clones that rely on alternative methods of telomere maintenance such as ATRX mutation or TERT genomic amplification. A similar effect has been observed in low grade glioma patients receiving TMZ as therapy(138). While TMZ has an overall survival benefit in GBM, a subset of low grade gliomas are subject to hypermutation upon TMZ treatment, which facilitates progression to a more aggressive grade of glioma.

5.5.iv Potential indications for clinical trials

As the majority of work on the TERT-GABP interaction has been performed in GBM, it is likely one of the best indications to pursue for clinical trials. GBM is the most common and aggressive form of adult brain tumor. Tumors are histologically diagnosed by their four defining characteristics. They are highly 1) proliferative and 2) invasive, and demonstrate massive amounts of 3) angiogenesis and 4) necrosis. The standard of care, surgical resection followed by adjuvant TMZ and radiation treatment, still results in an average survival of only 12-15 months, highlighting the extent of unmet medical need(13). If a GABP inhibitor given alone or as adjuvant therapy were able to extend life by even 2-3 months, it would be considered a major advancement in the field.

However, GBM may pose additional difficulties for initial drug discovery, as successful compounds need to be able to cross the blood-brain barrier. There are many other cancer types with high unmet medical need which harbor TERT promoter mutations. For example, the median survival of hepatocellular carcinoma (HCC) patients ranges from 1.6 months to 2.4 years depending on the stage and treatment modality(171). The median survival for HCC patients not eligible for liver resection or transplantation ranges from 1.6-9.4 months. Additionally, the five-year survival rate for the most malignant form of bladder cancer is 50%(172). Both of these cancer types should be much easier settings for drug delivery and may be ideal initial indications to test GABP inhibitor efficacy. As the TERT-GABP interaction is a common mechanism across multiple cancer types, it is likely wise to design a basket trial for initial phase I clinical trials. Potential indications include glioblastoma, oligodendroglioma, malignant melanoma, hepatocellular carcinoma, urothelial carcinoma of the bladder, neuroblastoma, medulloblastoma, and myxoid liposarcoma that have been positively genotyped for TERT mutation status. By designing the trial to include multiple TERT mutant cancers, the results of the phase I study will help inform which indications will be optimal for a phase II trial.

This study establishes a framework for future investigators to systematically identify functional cRE mutations, and dissect their mechanism of action. Applying this framework to GBM uncovered TERT promoter mutations as the most common point mutation in all brain tumors. We identified GABP as the transcription factor responsible for binding and activating the mutant TERT allele. Future work on understanding the necessity of the mutant TERT-GABP interaction in established tumors will help elucidate the potential of GABP as a therapeutic target.


BIBLIOGRAPHY


1. M. Bulger, M. Groudine, Functional and mechanistic diversity of distal transcription enhancers, Cell 144, 327–339 (2011).

2. N. D. Heintzman, B. Ren, Finding distal regulatory elements in the human genome, Curr Opin Genet Dev 19, 541–549 (2009).

3. A. Visel, E. M. Rubin, L. A. Pennacchio, Genomic views of distant-acting enhancers, Nature 461, 199–205 (2009).

4. N. D. Heintzman et al., Histone modifications at human enhancers reflect global cell-type-specific gene expression, Nature 459, 108–112 (2009).

5. K. L. MacQuarrie, A. P. Fong, R. H. Morse, S. J. Tapscott, Genome-wide transcription factor binding: beyond direct target regulation, Trends Genet 27, 141–148 (2011).

6. L. Flintoft, Gene regulation: Enhancing the hunt for enhancers, Nat Rev Genet, 1–1 (2013).

7. C.-T. Ong, V. G. Corces, Enhancer function: new insights into the regulation of tissue-specific gene expression, Nature Publishing Group, 1–11 (2011).

8. B. Lenhard, A. Sandelin, P. Carninci, Regulatory elements: Metazoan promoters: emerging characteristics and insights into transcriptional regulation, Nature Publishing Group, 1–13 (2012).

9. X. Zhang, R. Cowper-Sal lari, S. D. Bailey, J. H. Moore, M. Lupien, Integrative functional genomics identifies an enhancer looping to the SOX9 gene disrupted by the 17q24.3 prostate cancer risk locus, Genome Res 22, 1437–1446 (2012).

10. B. Akhtar-Zaidi et al., Epigenomic enhancer profiling defines a signature of colon cancer, 336, 736–739 (2012).

11. R. P. Nagarajan et al., Recurrent epimutations activate gene body promoters in primary glioblastoma, Genome Res 24, 761–774 (2014).

12. X. Wu et al., CpG island hypermethylation in human astrocytomas, Cancer Res 70, 2718–2727 (2010).

13. J. Sul, H. A. Fine, Malignant gliomas: new translational therapies, Mt Sinai J Med 77, 655–666 (2010).

14. Cancer Genome Atlas Research Network, Comprehensive genomic characterization defines human glioblastoma genes and core pathways, Nature 455, 1061–1068 (2008).

15. R. P. Nagarajan, J. F. Costello, Epigenetic mechanisms in glioblastoma multiforme, Semin Cancer Biol 19, 188–197 (2009).

16. M. E. Hegi et al., MGMT gene silencing and benefit from temozolomide in glioblastoma, N Engl J Med 352, 997–1003 (2005).

17. S. A. Maas, J. F. Fallon, Single base pair change in the long-range Sonic hedgehog limb-specific enhancer is a genetic basis for preaxial polydactyly, Dev Dyn 232, 345–348 (2005).

18. J. E. VanderMeer, N. Ahituv, cis-regulatory mutations are a genetic cause of human limb malformations, Dev Dyn 240, 920–930 (2011).

19. D. J. Epstein, Cis-regulatory mutations in human disease, Brief Funct Genomic Proteomic 8, 310–316 (2009).

20. Y. J. de Kok et al., Identification of a hot spot for microdeletions in patients with X-linked deafness type 3 (DFN3) 900 kb proximal to the DFN3 gene POU3F4, Hum Mol Genet 5, 1229–1235 (1996).

21. L. A. Lettice et al., Disruption of a long-range cis-acting regulator for Shh causes preaxial polydactyly, Proc Natl Acad Sci USA 99, 7548–7553 (2002).

22. T. A. Manolio, F. S. Collins, The HapMap and genome-wide association studies in diagnosis and therapy, Annu Rev Med 60, 443–456 (2009).

23. J. Ernst et al., Mapping and analysis of chromatin state dynamics in nine human cell types, Nature, 1–9 (2011).

24. M. De Gobbi et al., A regulatory SNP causes a human genetic disease by creating a new transcriptional promoter, Science 312, 1215–1217 (2006).

25. I. S. Lossos, R. Levy, Higher-grade transformation of follicle center lymphoma is associated with somatic mutation of the 5' noncoding regulatory region of the BCL-6 gene, Blood 96, 635–639 (2000).

26. G. L. Bond et al., A single nucleotide polymorphism in the MDM2 promoter attenuates the p53 tumor suppressor pathway and accelerates tumor formation in humans, Cell 119, 591–602 (2004).

27. A. Raval et al., Downregulation of death-associated protein kinase 1 (DAPK1) in chronic lymphocytic leukemia, Cell 129, 879–890 (2007).

28. R. C. Green et al., Germline hMLH1 promoter mutation in a Newfoundland HNPCC kindred, Clin Genet 64, 220–227 (2003).

29. K.-H. Shin, J.-H. Shin, J.-H. Kim, J.-G. Park, Mutational analysis of promoters of mismatch repair genes hMSH2 and hMLH1 in hereditary nonpolyposis colorectal cancer and early onset colorectal cancer patients: identification of three novel germ-line mutations in promoter of the hMSH2 gene, Cancer Res 62, 38–42 (2002).

30. N. Weinhold, A. Jacobsen, N. Schultz, C. Sander, W. Lee, Genome-wide analysis of noncoding regulatory mutations in cancer, Nat Genet 46, 1160–1165 (2014).

31. H. Ongen et al., Putative cis-regulatory drivers in colorectal cancer, Nature, 1–4 (2014).

32. S. Horn et al., TERT Promoter Mutations in Familial and Sporadic Melanoma, Science (2013), doi:10.1126/science.1230062.

33. F. W. Huang et al., Highly Recurrent TERT Promoter Mutations in Human Melanoma, Science (2013), doi:10.1126/science.1229259.

34. M. Remke et al., TERT promoter mutations are highly recurrent in SHH subgroup medulloblastoma, Acta Neuropathol 126, 917–929 (2013).

35. A. Quaas et al., Frequency of TERT promoter mutations in primary tumors of the liver, Virchows Arch (2014), doi:10.1007/s00428-014-1658-7.

36. N. Papadopoulos et al., TERT Promoter Mutations Occur Early in Urothelial Neoplasia and are Biomarkers of Early Disease and Disease Recurrence in Urine, Cancer Res, 1–18 (2013).

37. P. J. Killela et al., TERT promoter mutations occur frequently in gliomas and a subset of tumors derived from cells with low rates of self-renewal, Proceedings of the National Academy of Sciences 110, 6021–6026 (2013).

38. L. A. Pennacchio et al., In vivo enhancer analysis of human conserved non-coding sequences, Nature 444, 499–502 (2006).

39. N. Ahituv, S. Prabhakar, F. Poulin, E. M. Rubin, O. Couronne, Mapping cis-regulatory domains in the human genome using multi-species conservation of synteny, Hum Mol Genet 14, 3057–3063 (2005).

40. J. P. Noonan, A. S. McCallion, Genomics of long-range regulatory elements, Annu Rev Genomics Hum Genet 11, 1–23 (2010).

41. T. Vavouri, G. Elgar, Prediction of cis-regulatory elements using binding site matrices--the successes, the failures and the reasons for both, Curr Opin Genet Dev 15, 395–402 (2005).

42. M. J. Blow et al., ChIP-Seq identification of weakly conserved heart enhancers, Nat Genet 42, 806–810 (2010).

43. W. Lee et al., The mutation spectrum revealed by paired genome sequences from a lung cancer patient, Nature 465, 473–477 (2010).

44. G. A. Maston, S. K. Evans, M. R. Green, Transcriptional regulatory elements in the human genome, Annu Rev Genomics Hum Genet 7, 29–59 (2006).

45. R. P. Patwardhan et al., Massively parallel functional dissection of mammalian enhancers in vivo, Nat Biotechnol 30, 265–270 (2012).

46. M. A. Schaub, A. P. Boyle, A. Kundaje, S. Batzoglou, M. Snyder, Linking disease associations with regulatory information in the human genome, Genome Res 22, 1748–1759 (2012).

47. A. P. Boyle et al., Annotation of functional variation in personal genomes using RegulomeDB, Genome Res 22, 1790–1797 (2012).

48. A. Rada-Iglesias et al., A unique chromatin signature uncovers early developmental enhancers in humans, Nature (2010), doi:10.1038/nature09692.

49. R. P. Zinzen, C. Girardot, J. Gagneur, M. Braun, E. E. M. Furlong, Combinatorial binding predicts spatio-temporal cis-regulatory activity, Nature 462, 65–70 (2009).

50. A. Rada-Iglesias et al., Epigenomic Annotation of Enhancers Predicts Transcriptional Regulators of Human Neural Crest, Stem Cell 11, 633–648 (2012).

51. J. Ernst, M. Kellis, ChromHMM: automating chromatin-state discovery and characterization, Nature Publishing Group 9, 215–216 (2012).

52. ENCODE Project Consortium, An integrated encyclopedia of DNA elements in the human genome, Nature 489, 57–74 (2012).

53. R. D. Hawkins et al., Dynamic chromatin states in human ES cells reveal potential regulatory sequences and genes involved in pluripotency, Cell Res 21, 1393–1409 (2011).

54. Y. Shen et al., A map of the cis-regulatory sequences in the mouse genome, Nature 488, 116–120 (2012).

55. B. E. Bernstein et al., The NIH Roadmap Epigenomics Mapping Consortium, Nature Publishing Group 28, 1045–1048 (2010).

56. A. Kundaje et al., Integrative analysis of 111 reference human epigenomes, Nature 518, 317–330 (2015).

57. C. W. Brennan et al., The Somatic Genomic Landscape of Glioblastoma, Cell 155, 462–477 (2013).

58. X.-S. Ke et al., Genome-wide profiling of histone h3 lysine 4 and lysine 27 trimethylation reveals an epigenetic signature in prostate carcinogenesis, PLoS ONE 4, e4687 (2009).

59. M. F. Fraga et al., Loss of acetylation at Lys16 and trimethylation at Lys20 of histone H4 is a common hallmark of human cancer, Nat Genet 37, 391–400 (2005).

60. S. Varambally et al., The polycomb group protein EZH2 is involved in progression of prostate cancer, Nature 419, 624–629 (2002).

61. S. V. Sharma et al., A chromatin-mediated reversible drug-tolerant state in cancer cell subpopulations, Cell 141, 69–80 (2010).

62. P. J. Park, ChIP-seq: advantages and challenges of a maturing technology, Nat Rev Genet 10, 669–680 (2009).

63. L. P. O'Neill, B. M. Turner, B. Turner, ChIP with Native Chromatin: Advantages and Problems Relative to Methods Using Cross-Linked Material, Methods 31, 76–82 (2003).

64. V. Orlando, Mapping chromosomal proteins in vivo by formaldehyde-crosslinked-chromatin immunoprecipitation, Trends Biochem Sci 25, 99–104 (2000).

65. A. Barski, K. Zhao, Genomic location analysis by ChIP-Seq, J Cell Biochem 107, 11–18 (2009).

66. C. Dingwall, G. P. Lomonossoff, R. A. Laskey, High sequence specificity of micrococcal nuclease, Nucleic Acids Res 9, 2659–2673 (1981).

67. W. Hörz, W. Altenburger, Sequence specific cleavage of DNA by micrococcal nuclease, Nucleic Acids Res 9, 2643–2658 (1981).

68. H. Li, J. Ruan, R. Durbin, Mapping short DNA sequencing reads and calling variants using mapping quality scores, Genome Res 18, 1851–1858 (2008).

69. B. Langmead, C. Trapnell, M. Pop, S. L. Salzberg, Ultrafast and memory-efficient alignment of short DNA sequences to the human genome, Genome Biol 10, R25 (2009).

70. H. Li, R. Durbin, Fast and accurate long-read alignment with Burrows-Wheeler transform, Bioinformatics 26, 589–595 (2010).

71. R. Li, Y. Li, K. Kristiansen, J. Wang, SOAP: short oligonucleotide alignment program, Bioinformatics 24, 713–714 (2008).

72. R. Li et al., SOAP2: an improved ultrafast tool for short read alignment, Bioinformatics 25, 1966–1967 (2009).

73. S. Pepke, B. Wold, A. Mortazavi, Computation for ChIP-seq and RNA-seq studies, Nat Methods 6, S22–32 (2009).

74. H. Xu et al., A signal-noise model for significance analysis of ChIP-seq with negative control, Bioinformatics 26, 1199–1204 (2010).

75. H. Xu, C.-L. Wei, F. Lin, W.-K. Sung, An HMM approach to genome-wide identification of differential histone modification sites from ChIP-seq data, Bioinformatics 24, 2344–2349 (2008).

76. C. Zang et al., A clustering approach for identification of enriched domains from histone modification ChIP-Seq data, Bioinformatics 25, 1952–1958 (2009).

77. A. Barski et al., High-resolution profiling of histone methylations in the human genome, Cell 129, 823–837 (2007).

78. D. S. Johnson, A. Mortazavi, R. M. Myers, B. Wold, Genome-wide mapping of in vivo protein-DNA interactions, Science 316, 1497–1502 (2007).

79. T. S. Mikkelsen et al., Genome-wide maps of chromatin state in pluripotent and lineage-committed cells, Nature 448, 553–560 (2007).

80. N. F. Wasserman, I. Aneas, M. A. Nobrega, An 8q24 gene desert variant associated with prostate cancer risk confers differential in vivo activity to a enhancer, Genome Res 20, 1191–1197 (2010).

81. M. Adli, B. E. Bernstein, Whole-genome chromatin profiling from limited numbers of cells using nano-ChIP-seq, Nat Protoc 6, 1656–1668 (2011).

82. A. Goren et al., Chromatin profiling by directly sequencing small quantities of immunoprecipitated DNA, Nat Methods 7, 47–49 (2010).

83. J. Dekker, K. Rippe, M. Dekker, N. Kleckner, Capturing chromosome conformation, Science 295, 1306–1311 (2002).

84. Z. Zhao et al., Circular chromosome conformation capture (4C) uncovers extensive networks of epigenetically regulated intra- and interchromosomal interactions, Nat Genet 38, 1341–1347 (2006).

85. J. Dostie et al., Chromosome Conformation Capture Carbon Copy (5C): a massively parallel solution for mapping interactions between genomic elements, Genome Res 16, 1299–1309 (2006).

86. E. Lieberman-Aiden et al., Comprehensive mapping of long-range interactions reveals folding principles of the human genome, Science 326, 289–293 (2009).

87. S. Cai, C. C. Lee, T. Kohwi-Shigematsu, SATB1 packages densely looped, transcriptionally active chromatin for coordinated expression of cytokine genes, Nat Genet 38, 1278–1288 (2006).

88. M. Simonis, J. Kooren, W. de Laat, An evaluation of 3C-based methods to capture DNA interactions, Nat Methods 4, 895–901 (2007).

89. M. J. Fullwood, C.-L. Wei, E. T. Liu, Y. Ruan, Next-generation DNA sequencing of paired-end tags (PET) for transcriptome and genome analyses, Genome Res 19, 521–532 (2009).

90. M. Simonis et al., High-resolution identification of balanced and complex chromosomal rearrangements by 4C technology, Nat Methods 6, 837–842 (2009).

91. N. D. Heintzman et al., Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome, Nat Genet 39, 311–318 (2007).

92. Ben Langmead, S. L. Salzberg, Fast gapped-read alignment with Bowtie 2, Nat Methods 9, 357–359 (2012).

93. A. McKenna et al., The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data, Genome Res 20, 1297–1303 (2010).

94. J. C. Chow et al., LINE-1 Activity in Facultative Heterochromatin Formation during X Chromosome Inactivation, Cell 141, 956–969 (2010).

95. K. Cibulskis et al., Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples, Nat Biotechnol (2013), doi:10.1038/nbt.2514.

96. J. Ha Choi et al., Identification and characterization of novel polymorphisms in the basal promoter of the human transporter, MATE1, Pharmacogenet Genomics 19, 770–780 (2009).

97. J. Ma et al., Somatic mutation and functional polymorphism of a novel regulatory element in the HGF gene promoter causes its aberrant expression in human breast cancer, J Clin Invest 119, 478–491 (2009).

98. L. Song, G. E. Crawford, DNase-seq: A High-Resolution Technique for Mapping Active Gene Regulatory Elements across the Genome from Mammalian Cells, Cold Spring Harbor Protocols 2010, 5384 (2010).

99. J. D. Buenrostro, B. Wu, H. Y. Chang, W. J. Greenleaf, ATAC-seq: A Method for Assaying Chromatin Accessibility Genome-Wide, Curr Protoc Mol Biol 109, 21–29 (2015).

100. K. Wang, M. Li, H. Hakonarson, ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data, Nucleic Acids Res 38, e164–e164 (2010).

101. J. C. Bryne et al., JASPAR, the open access database of transcription factor-binding profiles: new content and tools in the 2008 update, Nucleic Acids Res 36, D102–6 (2008).

102. V. Matys et al., TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes, Nucleic Acids Res 34, D108–10 (2006).

103. St Jude Children's Research Hospital Washington University Pediatric Cancer Genome Project et al., Whole-genome sequencing identifies genetic alterations in pediatric low-grade gliomas, Nat Genet 45, 602–612 (2013).

104. J. Bauer et al., BRAF mutations in cutaneous melanoma are independently associated with age, anatomic site of the primary tumor, and the degree of solar elastosis at the primary tumor site, Pigment Cell Melanoma Res 24, 345–351 (2011).

105. D. W. Parsons et al., An integrated genomic analysis of human glioblastoma multiforme, Science 321, 1807–1812 (2008).

106. C. S. Chan, J. S. Song, CCCTC-binding factor confines the distal action of estrogen, Cancer Res 68, 9041–9049 (2008).

107. M. Monte et al., MAGE-A tumor antigens target p53 transactivation function through histone deacetylase recruitment and confer resistance to chemotherapeutic agents, Proceedings of the National Academy of Sciences 103, 11160–11165 (2006).

108. B. Cadieux, T.-T. Ching, S. R. Vandenberg, J. F. Costello, Genome-wide hypomethylation in human glioblastomas associated with specific copy number alteration, methylenetetrahydrofolate reductase allele status, and increased proliferation, Cancer Res 66, 8469–8476 (2006).

109. J. Chen, W. A. Weiss, Alternative splicing in cancer: implications for biology and therapy, Oncogene 34, 1–14 (2015).

110. A. Melnikov et al., Systematic dissection and optimization of inducible enhancers in human cells using a massively parallel reporter assay, Nat Biotechnol 30, 271–277 (2012).

111. T. M. Bryan, T. R. Cech, Telomerase and the maintenance of chromosome ends, Curr Opin Cell Biol 11, 318–324 (1999).

112. C. W. Greider, E. H. Blackburn, Identification of a specific telomere terminal transferase activity in tetrahymena extracts, Cell 43, 405–413 (1985).

113. S. L. S. Weinrich et al., Reconstitution of human telomerase with the template RNA component hTR and the catalytic protein subunit hTRT, Nat Genet 17, 498–502 (1997).

114. P. C.-B. Phd et al., Methylation of the TERT promoter and risk stratification of childhood brain tumours: an integrative genomic and molecular study, Lancet Oncology 14, 534–542 (2013).

115. N. W. Kim et al., Specific Association of Human Telomerase Activity with Immortal Cells and Cancer, Science 266, 2011–2015 (1994).

116. J. W. Shay, S. Bacchetti, A survey of telomerase activity in human cancer, Eur. J. Cancer 33, 787–791 (1997).

117. J. A. O. Vinagre et al., Frequency of TERT promoter mutations in human cancers, Nature Communications 4, 1–6 (2013).

118. P. S. Rachakonda et al., TERT promoter mutations in bladder cancer affect patient survival and disease recurrence through modification by a common polymorphism, PNAS 110, 17426–17431 (2013).

119. M. Simon et al., TERT promoter mutations: a novel independent prognostic factor in primary glioblastomas, Neuro-Oncology 0, 1–8 (2014).

120. S. Spiegl-Kreinecker et al., Prognostic quality of activating TERT promoter mutations in glioblastoma: interaction with the rs2853669 polymorphism and patient age at diagnosis, Neuro-Oncology (2015), doi:10.1093/neuonc/nov010.

121. G.-H. Wei et al., Genome-wide analysis of ETS-family DNA-binding in vitro and in vivo, The EMBO Journal 29, 2147–2160 (2010).

122. G. W. Klappacher et al., An induced Ets repressor complex regulates growth arrest during terminal macrophage differentiation, Cell 109, 169–180 (2002).

123. K. C. El Kasmi et al., Cutting edge: A transcriptional repressor and corepressor induced by the STAT3-regulated anti-inflammatory signaling pathway, J Immunol 179, 7215–7219 (2007).

124. D. Sawka-Verhelle et al., PE-1/METS, an antiproliferative Ets repressor factor, is induced by CREB-1/CREM-1 during macrophage differentiation, J Biol Chem 279, 17772–17784 (2004).

125. E. Birney et al., Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project, Nature 447, 799–816 (2007).

126. P. C. Hollenhorst et al., Oncogenic ETS proteins mimic activated RAS/MAPK signaling in prostate cells, Genes Dev 25, 2147–2157 (2011).

127. C. C. Thompson, T. A. Brown, S. L. McKnight, Convergence of Ets- and notch-related structural motifs in a heteromeric DNA binding complex, Science 253, 762–768 (1991).

128. T. Oikawa, T. Yamada, Molecular biology of the Ets family of transcription factors, Gene 303, 11–34 (2003).

129. K. LaMarco, C. C. Thompson, B. P. Byers, E. M. Walton, S. L. McKnight, Identification of Ets- and notch-related subunits in GA binding protein, Science 253, 789–792 (1991).

130. S. Gugneja, J. V. Virbasius, R. C. Scarpulla, Four structurally distinct, non-DNA-binding subunits of human nuclear respiratory factor 2 share a conserved transcriptional activation domain, Mol Cell Biol 15, 102–111 (1995).

131. F. C. De La Brousse, E. H. Birkenmeier, D. S. King, L. B. Rowe, S. L. McKnight, Molecular and genetic characterization of GABP beta, Genes Dev 8, 1853–1865 (1994).

132. J. Sawada, M. Goto, C. Sawa, H. Watanabe, H. Handa, Transcriptional activation through the tetrameric complex formation of E4TF1 subunits, EMBO J 13, 1396–1402 (1994).

133. S. L. Palumbo, S. W. Ebbinghaus, L. H. Hurley, Formation of a unique end-to-end stacked pair of G-quadruplexes in the hTERT core promoter with implications for inhibition of telomerase by G-quadruplex-interactive ligands, J Am Chem Soc 131, 10878–10891 (2009).

134. K. W. Lim et al., Coexistence of two distinct G-quadruplex conformations in the hTERT promoter, J Am Chem Soc 132, 12331–12342 (2010).

135. S. Tamim et al., Genomic analyses reveal broad impact of miR-137 on genes associated with malignant transformation and neuronal differentiation in glioblastoma cells, PLoS ONE 9, e85591–e85591 (2014).

136. C. Trapnell, L. Pachter, S. L. Salzberg, TopHat: discovering splice junctions with RNA-Seq, Bioinformatics 25, 1105–1111 (2009).

137. C. Trapnell et al., Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation, Nat Biotechnol 28, 511–515 (2010).

138. B. E. Johnson et al., Mutational analysis reveals the origin and therapy-driven evolution of recurrent glioma, Science 343, 189–193 (2014).

139. R. Roy, S. Hohng, T. Ha, A practical guide to single-molecule FRET, Nat Methods 5, 507–516 (2008).

140. S. M. Carlson et al., Large-scale discovery of ERK2 substrates identifies ERK-mediated transcriptional regulation by ETV3, Sci Signal 4, rs11–rs11 (2011).

141. Y. Zhang et al., Model-based analysis of ChIP-Seq (MACS), Genome Biol 9, R137 (2008).

142. T. L. Bailey et al., MEME SUITE: tools for motif discovery and searching, Nucleic Acids Res 37, W202–W208 (2009).

143. M. R. Mansour et al., Oncogene regulation. An oncogenic super-enhancer formed through somatic mutation of a noncoding intergenic element, Science 346, 1373–1377 (2014).

144. M. J. Fullwood, Y. Han, C.-L. Wei, X. Ruan, Y. Ruan, Chromatin interaction analysis using paired-end tag sequencing, Curr Protoc Mol Biol Chapter 21, Unit 21.15.1–25 (2010).

145. B. Schuster-Böckler, B. Lehner, Chromatin organization is a major influence on regional mutation rates in human cancer cells, Nature (2012), doi:10.1038/nature11273.

146. C. Melton, J. A. Reuter, D. V. Spacek, M. Snyder, Recurrent somatic mutations in regulatory regions of human cancer genomes, Nat Genet 47, 710–716 (2015).

147. N. J. Fredriksson, L. Ny, J. A. Nilsson, E. Larsson, Systematic analysis of noncoding somatic mutations and gene expression alterations across 14 tumor types, Nat Genet 46, 1258–1263 (2014).

148. A. H. Batchelor, The Structure of GABPα/β: An ETS Domain-Ankyrin Repeat Heterodimer Bound to DNA, Science 279, 1037–1041 (1998).

149. A. G. Rosmarin, K. K. Resendes, Z. Yang, J. N. McMillan, S. L. Fleming, GA-binding protein transcription factor: a review of GABP as an integrator of intracellular signaling and protein-protein interactions, Blood Cells Mol Dis 32, 143–154 (2004).

150. H. Wu et al., The Ets Transcription Factor GABP Is a Component of the Hippo Pathway Essential for Growth and Antioxidant Defense, CellReports 3, 1663–1677 (2013).

151. E. Flory, A. Hoffmeyer, U. Smola, U. R. Rapp, J. T. Bruder, Raf-1 kinase targets GA-binding protein in transcriptional regulation of the human immunodeficiency virus type 1 promoter, J Virol 70, 2260–2268 (1996).

152. M. L. Bochman, K. Paeschke, V. A. Zakian, DNA secondary structures: stability and function of G-quadruplex structures, Nat Rev Genet 13, 770–780 (2012).

153. J. B. Chaires et al., An improved model for the hTERT promoter quadruplex, PLoS ONE 9, e115580–e115580 (2014).

154. W. C. Hahn et al., Inhibition of telomerase limits the growth of human cancer cells, Nat Med 5, 1164–1170 (1999).

155. D. Gomez, Telomerase downregulation induced by the G-quadruplex ligand 12459 in A549 cells is mediated by hTERT RNA alternative splicing, Nucleic Acids Res 32, 371–379 (2004).

156. M. Ruden, N. Puri, Novel anticancer therapeutics targeting telomerase, Cancer Treat. Rev. 39, 444–456 (2013).

157. M. Read et al., Structure-based design of selective and potent G quadruplex-mediated telomerase inhibitors, Proc Natl Acad Sci USA 98, 4844–4849 (2001).

158. A. Siddiqui-Jain, C. L. Grand, D. J. Bearss, L. H. Hurley, Direct evidence for a G-quadruplex in a promoter region and its targeting with a small molecule to repress c-MYC transcription, Proc Natl Acad Sci USA 99, 11593–11598 (2002).

159. J. Vinagre et al., Telomerase promoter mutations in cancer: an emerging molecular biomarker? Virchows Arch 465, 119–133 (2014).

160. G. Middleton et al., Gemcitabine and capecitabine with or without telomerase peptide vaccine GV1001 in patients with locally advanced or metastatic pancreatic cancer (TeloVac): an open-label, randomised, phase 3 trial, Lancet Oncology 15, 829–840 (2014).

161. T. F. Greten et al., A phase II open label trial evaluating safety and efficacy of a telomerase peptide vaccination in patients with advanced hepatocellular carcinoma, BMC Cancer 10, 209 (2010).

162. C. O. Marian et al., The telomerase antagonist, imetelstat, efficiently targets glioblastoma tumor-initiating cells leading to decreased proliferation and tumor growth, Clin Cancer Res 16, 154–163 (2010).

163. S. C. P. Williams, No end in sight for telomerase-targeted cancer drugs, Nat Med 19, 6–6 (2013).

164. R. Salloum et al., TR-11 * A MOLECULAR BIOLOGY AND PHASE II STUDY OF IMETELSTAT (GRN163L) IN CHILDREN WITH RECURRENT OR REFRACTORY CENTRAL NERVOUS SYSTEM (CNS) MALIGNANCIES: A PEDIATRIC BRAIN TUMOR CONSORTIUM STUDY, Neuro-Oncology 17, iii39–iii39 (2015).

165. C. Bruedigam et al., Telomerase Inhibition Effectively Targets Mouse and Human AML Stem Cells and Delays Relapse following Chemotherapy, Cell Stem Cell 15, 775–790 (2014).

166. C. Sakuma et al., Anti-WASP intrabodies inhibit inflammatory responses induced by Toll-like receptors 3, 7, and 9, in macrophages, Biochemical and Biophysical Research Communications 458, 28–33 (2015).

167. Y. Wang et al., Transbody against hepatitis B virus core protein inhibits hepatitis B virus replication in vitro, International Immunopharmacology 25, 363–369 (2015).

168. G. E. Winter et al., DRUG DEVELOPMENT. Phthalimide conjugation as a strategy for in vivo target protein degradation, Science 348, 1376–1381 (2015).

169. Z.-F. Yang et al., GABP transcription factor is required for development of chronic myelogenous leukemia via its control of PRKD2, Proc Natl Acad Sci USA 110, 2312–2317 (2013).

170. S. Ristevski et al., The ETS Transcription Factor GABPα Is Essential for Early Embryogenesis, Mol Cell Biol 24, 5844–5849 (2004).

171. K. Okuda et al., Natural history of hepatocellular carcinoma and prognosis in relation to treatment. Study of 850 patients, Cancer 56, 918–928 (1985).

172. M. Frantzi et al., Developing proteomic biomarkers for bladder cancer: towards clinical application, Nat Rev Urol 12, 317–330 (2015).
