p53 and Sp1 Associated RNAs Act as Non-coding Transcriptional Regulators at Homologous Loci

Rachel Hughes

A thesis in fulfilment of the requirements for the degree of

Master of Philosophy

School of Biotechnology and Biomolecular Sciences

Faculty of Science

April 2016

THE UNIVERSITY OF NEW SOUTH WALES

Thesis/Dissertation Sheet

Surname or Family name: Hughes

First name: Rachel

Other name/s: Genevieve

Abbreviation for degree as given in the University calendar: MPhil

School: Biotechnology and Biomolecular Sciences

Faculty: Science

Title: p53 and Sp1 Associated RNAs Act as Non-Coding Transcriptional Regulators at Homologous Loci

Abstract 350 words maximum:

RNA functionality has been proven to extend far beyond the outdated protein coding divide, as transcripts not bound for translation are instead found to act as endogenous messengers and moderators, utilising inherent sequence homology to interact with DNA or protein targets. A RIP-Seq of six major transcription factors including p53 and Sp1 uncovered multiple bound RNAs, some of which were interestingly protein coding. Of these, the HIST1H1D and SF3B5 mRNAs were knocked down in order to investigate the ability of their affiliated transcription factors to localise to the target genes. Both RNAs demonstrated an inherent ability to modulate transcription factor localisation to homology containing loci in cis by acting as protein guides in the case of SF3B5, and as decoys in HIST1H1D. Expression of other linker histone H1 proteins was also observed to be under HIST1H1D RNA regulation, suggesting a network involving the p53 tumour suppressive cascade which induces senescence in damaged cells. Together, the data confirms an innate complexity of RNA that is only beginning to be unveiled, with major ties to tumour preventative pathways and therefore therapeutic possibilities.

Declaration relating to disposition of project thesis/dissertation

I hereby grant to the University of New South Wales or its agents the right to archive and to make available my thesis or dissertation in whole or in part in the University libraries in all forms of media, now or here after known, subject to the provisions of the Copyright Act 1968. I retain all property rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.

I also authorise University Microfilms to use the 350 word abstract of my thesis in Dissertation Abstract International (this is applicable to doctoral theses only).

The University recognises that there may be exceptional circumstances requiring restrictions on copying or conditions on use. Requests for restriction for a period of up to 2 years must be made in writing. Requests for a longer period of restriction may be considered in exceptional circumstances and require the approval of the Dean of Graduate Research.

FOR OFFICE USE ONLY Date of completion of requirements for Award:


COPYRIGHT STATEMENT

‘I hereby grant the University of New South Wales or its agents the right to archive and to make available my thesis or dissertation in whole or part in the University libraries in all forms of media, now or here after known, subject to the provisions of the Copyright Act 1968. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation. I also authorise University Microfilms to use the 350 word abstract of my thesis in Dissertation Abstract International (this is applicable to doctoral theses only). I have either used no substantial portions of copyright material in my thesis or I have obtained permission to use copyright material; where permission has not been granted I have applied/will apply for a partial restriction of the digital copy of my thesis or dissertation.'

Signed ……………………………………………......

Date ……………………………………………......

AUTHENTICITY STATEMENT

‘I certify that the Library deposit digital copy is a direct equivalent of the final officially approved version of my thesis. No emendation of content has occurred and if there are any minor variations in formatting, they are the result of the conversion to digital format.’

Signed ……………………………………………......

Date ……………………………………………......

Originality Statement

‘I hereby declare that this submission is my own work and to the best of my knowledge it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at UNSW or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by others, with whom I have worked at UNSW or elsewhere, is explicitly acknowledged in this thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project’s design and conception or in style, presentation and linguistic expression is acknowledged.’

Signed

Date

Table of Contents

Abstract

Acknowledgements

Abbreviations

List of Figures

List of Tables

Chapter 1: Introduction

1.1 RNA

1.1.1 Protein Coding and Infrastructural RNA

1.1.2 Non-coding RNA

1.1.3 RNAs in Disease

1.2 The Search for Transcription Factor Associated RNAs

1.2.1 RIP-Seq

1.3 RNAs of Interest

1.3.1 HIST1H1D

1.3.2 SF3B5

1.4 Associated Transcription Factors

1.4.1 p53

1.4.2 Sp1

1.5 Aims

Chapter 2: Materials and Methods

2.1 Cell lines

2.2 Reagents

2.3 Buffers

2.4 Chemicals and Solutions

2.5 Media

2.6 Oligodeoxynucleotides

2.7 Primers

2.8 Antibodies

2.9 Biotinylated Antisense ODNs

2.10 Deep Sequencing

2.11 Cell Culture

2.12 Transfection

2.12.1 Seeding

2.12.2 Transfection Complexes

2.13 Knockdown Efficiency

2.13.1 RNA Isolation

2.13.2 Removing DNA Contaminants

2.13.3 Reverse Transcription

2.13.4 Quantitative Real Time PCR

2.14 Chromatin Immunoprecipitation for Transcription Factor Localisation

2.14.1 Cross-linking

2.14.2 Lysis and Sonication

2.14.3 Binding the Antibody

2.14.4 Immunoprecipitation

2.14.5 DNA Extraction

2.14.6 Quantitative PCR

2.14.7 Standard Curve

2.15 UCSC Genome Browser

2.16 Biotinylated asODN-RNA Pulldown

2.16.1 Immunoprecipitation

2.16.2 Validation of the Immunoprecipitated RNA

Chapter 3: Results

3.1 Deep Sequencing Alignment

3.2 Determining the Efficacy of asODN Mediated RNA Knockdown

3.3 Enrichment or Loss of Transcription Factor Localisation After RNA Knockdown

3.4 Enrichment or Loss at Homologous Loci

3.4.1 Determining Possible Homologous Targets

3.4.2 Enrichment or Loss of Transcription Factor Localisation at Homologous Loci

3.5 Expression at Homologous Loci

3.6 Biotinylated asODN-RNA Pulldown

Chapter 4: Discussion

4.1 RNA-Directed Transcriptional Activation and Repression

4.2 Trans-regulation at Homologous Loci

4.3 mRNA Acting as ncRNA

4.4 HIST1H1D in Cell Senescence

4.5 Looking Ahead

Chapter 5: Conclusion

References

Appendix

A. UCSC Genome Browser

B. Sequenced Transcripts From RIP-Seq Data

Abstract

RNA functionality has been proven to extend far beyond the outdated protein coding divide, as transcripts not bound for translation are instead found to act as endogenous messengers and moderators, utilising inherent sequence homology to interact with DNA or protein targets. A RIP-Seq of six major transcription factors including p53 and Sp1 uncovered multiple bound RNAs, some of which were interestingly protein coding. Of these, the HIST1H1D and SF3B5 mRNAs were knocked down in order to investigate the ability of their affiliated transcription factors to localise to the target genes. Both RNAs demonstrated an inherent ability to modulate transcription factor localisation to homology containing loci in cis by acting as protein guides in the case of SF3B5, and as decoys in HIST1H1D. Expression of other linker histone H1 proteins was also observed to be under HIST1H1D RNA regulation, suggesting a network involving the p53 tumour suppressive cascade which induces senescence in damaged cells. Together, the data confirms an innate complexity of RNA that is only beginning to be unveiled, with major ties to tumour preventative pathways and therefore therapeutic possibilities.


Acknowledgements

Preliminary immunoprecipitation and deep sequencing data was provided with permission by Kevin Morris, John Burdach and Merlin Crossley.

Many thanks to Kevin Morris for the insightful guidance over the course of this project, as well as Rosie, Chris, Galina, Caio, Nick and Albert of the Morris lab who have been kind enough to answer questions along the way. Thanks also to Louise Lutze-Mann and Matthew Clemson for the additional support.

This project is dedicated to Bill, Julie, Derek, Lauren and Zac, whose continued encouragement has made the achievement possible.


Abbreviations

asODN - antisense oligodeoxynucleotide

BLAT - BLAST-like alignment tool

cDNA - complementary deoxyribonucleic acid

ChIP - chromatin immunoprecipitation

Ct - cycle threshold

DNA - deoxyribonucleic acid

F1 - forward primer 1

F2 - forward primer 2

HIST1H1D - protein member D of the histone cluster 1 family

H1 - histone cluster 1

lncRNA - long non-coding ribonucleic acid

mRNA - messenger ribonucleic acid

ncRNA - non-coding ribonucleic acid

ODN - oligodeoxynucleotide

PCR - polymerase chain reaction

qPCR - quantitative polymerase chain reaction

qRT-PCR - quantitative reverse transcription polymerase chain reaction

RIP-Seq - ribonucleic acid immunoprecipitation with high-throughput sequencing


R1 - reverse primer 1

R2 - reverse primer 2

RNA - ribonucleic acid

SF3B5 - subunit 5 of the splicing factor 3b

UCSC - University of California Santa Cruz


List of Figures

Figure 3.1 p53 deep sequencing alignment mapped data

Figure 3.2 Sp1 deep sequencing alignment mapped data

Figure 3.3 KLF3 deep sequencing alignment mapped data

Figure 3.4 Transcript knockdown in asODN transfected HEK293 cells

Figure 3.5 Transcription factor localisation at the HIST1H1D and SF3B5 loci after asODN-mediated knockdown

Figure 3.6 UCSC Genome Browser BLAT query of HIST1H1D sequence

Figure 3.7 UCSC Genome Browser BLAT query of SF3B5 sequence

Figure 3.8 Transcription factor enrichment at homologous loci after HIST1H1D asODN-mediated knockdown

Figure 3.9 Relative expression at homologous loci after HIST1H1D asODN-mediated knockdown

Figure 4.1 SF3B5 RNA as a protein guide and HIST1H1D as a target decoy


List of Tables

Table 2.1 Reagents by supplier

Table 2.2 Buffer composition

Table 2.3 Common chemicals by supplier

Table 2.4 Antisense ODN sequences for RNA knockdown with control

Table 2.5 Primer pairs and sequences

Table 2.6 Biotin labelled asODNs and sequences targeting their corresponding homologous RNA

Table 2.7 Transfection complex mixtures by culture vessel size

Table 3.1 Deep sequencing hits enriched 1.1X over IgG control per transcription factor

Chapter 1: Introduction

It has become increasingly apparent over recent years that RNA molecules are not only the blueprints for protein assembly, but also have crucial regulatory roles with regard to their own transcription and translation. The completion of the Human Genome Project and its ongoing continuation as the Encyclopedia of DNA Elements (ENCODE) project have provided invaluable data for investigating the double stranded code that is responsible for our existence. In 2003, it became clear for the first time that human DNA contains roughly 20,000 - 25,000 protein coding genes 1. Later sequencing studies have shown that number to be nearly the same as in less advanced organisms such as frogs 2 or chickens 3. Francis Crick’s central dogma of molecular biology, that DNA encodes RNA which is translated into protein, obviously needed to be retired 4. Since then, a new era of study has emerged, one in which non-coding RNA, transcribed from what was formerly known as “junk DNA”, and even transcripts which could potentially code for proteins, were instead being demonstrated to have taken on vital roles within regulation 5. Perhaps these RNAs held some responsibility for the evident complexities among species. Perhaps too, these RNAs could be the elusive key players in transcriptional regulation and the diseases resulting from its imbalance.


1.1 RNA

1.1.1 Protein Coding and Infrastructural RNA

RNA is the well known, short-lived molecule generally made of a single nucleotide chain, synthesised complementary to its template DNA during the process of transcription. Whereas DNA is the ultimate holder of all genetic information, it is the RNA that is the real workhorse, carrying out and regulating its message. However, not all RNA is the same; there are multiple classifications based on varying structures and functions, but there is one underlying divide - its protein coding ability.

Messenger RNA (mRNA) is the passive coding transcript synthesised from a gene, which is classically thought of as the blueprint for a protein’s production. These transcripts are immediately processed in the nucleus before exportation to the cytoplasm for translation. mRNAs are spliced to remove introns and a 7-methylguanylate 5’ cap is added. All of these transcripts, with the exception of histone mRNAs, also undergo cleavage at the 3’ end and polyadenylation 6. The non-coding infrastructural RNAs such as transfer RNAs (tRNAs) and ribosomal RNAs (rRNAs) have also been long understood components of transcription and translation, but it is the other more cryptic yet ever growing classes of non-coding RNAs that are shaping up to be the truly interesting elements within the genome.

According to the human genome sequencing data, only 1.5% of our DNA codes for proteins 7, but approximately two thirds of it is transcribed 8. Originally, these pervasively synthesised molecules and their DNA counterparts were thought to be a genetic silent majority, a waste of biological energy, or in other words, just junk 7. However, with the continuing advances in high throughput sequencing technologies and bioinformatics, new classes of non-coding regions are being characterised and demonstrated to have quite important roles. For example, GENCODE and genome wide association studies have shown that 95% of the genome lies within 8 kb of a DNA-protein interaction and most of it participates in some type of biochemical RNA or chromatin reaction 9. It has also been reported that although the number of protein coding elements is similar between eukaryotic species, the number of non-coding intronic and intergenic transcripts increases with organismal complexity 10. Expression of non-coding transcripts can also be highly coordinated, just like that of their protein coding counterparts 11. Transcription, including that of non-coding elements, is highly tissue specific 12. This is even found to be true during development 13 as well as in cancer and other diseases 14. From this, and a growing body of other evidence, it is intuitive that there must be much more transcriptome functionality than initially meets the eye.

1.1.2 Non-coding RNA

RNA molecules which do not code for a functional protein or are not intended for protein production can be considered non-coding and, although the list of specific types is continually increasing, they can be generally categorised according to length as either short, mid-sized, or long non-coding RNA (ncRNA) 15. Short ncRNAs are approximately 17-30 nucleotides in length and include micro RNA (miRNA), piwi-interacting RNA (piRNA) and transcription initiation RNA (tiRNA) 16. Many of these considerably short nucleotide strands have been proven to be key players in RNA interference and gene silencing pathways at both the transcriptional and translational level 16. Mid-sized ncRNAs are described as 20-200 nucleotides in length and include small nucleolar RNA (snoRNA), transcriptional start site associated RNA (TSSa-RNA), and promoter upstream transcripts (PROMPTs) 15,17. The latter two exist near active protein coding gene transcription, suggesting a noteworthy role in gene regulation and possibly its response to environmental factors 18,19. Lastly, long ncRNA (lncRNA) transcripts are composed of 200 nucleotides or more 15. Much like mRNA, they are processed with the addition of a 5’ cap and polyadenylation, and can undergo transcript splicing 20. This category is the most functionally diverse, yet enigmatic of them all.
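The length-based categories above can be expressed as a short Python sketch. It is illustrative only and rests on stated assumptions: the function name is hypothetical, the boundaries are the approximate ranges quoted in the text, and because the quoted short (17-30 nt) and mid-sized (20-200 nt) ranges overlap, the function returns every category a given length falls into rather than forcing a single label. A transcript of exactly 200 nucleotides is assigned to the long class, following the "200 nucleotides or more" wording.

```python
def ncrna_length_classes(length_nt: int) -> list[str]:
    """Return every length-based ncRNA category that a transcript length fits.

    Boundaries follow the approximate ranges quoted in the text; they are
    conventions, not sharp biological cut-offs (assumption for illustration).
    """
    classes = []
    if 17 <= length_nt <= 30:
        classes.append("short")      # e.g. miRNA, piRNA, tiRNA
    if 20 <= length_nt < 200:
        classes.append("mid-sized")  # e.g. snoRNA, TSSa-RNA, PROMPTs
    if length_nt >= 200:
        classes.append("long")       # lncRNA
    return classes


if __name__ == "__main__":
    for n in (22, 150, 200):
        print(n, ncrna_length_classes(n))
```

A 22 nucleotide transcript falls into both the short and mid-sized ranges, which reflects the overlap in the published definitions and shows that length alone does not uniquely classify an ncRNA.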

LncRNAs have recently been demonstrated to possess crucial functions in gene expression, acting at both the transcriptional and translational levels. These long nucleotide strands can be sense or antisense, and may stem from bidirectional, intronic, or intergenic transcription 21. A rapidly growing body of experimental evidence shows the diversity of lncRNA interactions, including their uses as epigenetic markers, transcriptional signals, protein guides, complex scaffolds, RNA sponges, and regulatory protein decoys 22.

One of the first well characterised functional lncRNAs was X-inactive specific transcript (XIST) and its impressive role in genetic imprinting. Since females possess two X sex chromosomes, whereas males receive only one, it is critical for gene transcription from the sex chromosomes of females to be normalised, or dosage compensated, so that twice the amount of X-linked alleles are not expressed during development. This is achieved by random inactivation of one female X chromosome via DNA methylation across its various CpG locations after fertilisation, and this genetic marker is passed on to all subsequent daughter cells 23. To initiate the process, XIST is transcribed solely from the soon to be inactive chromosome along with its overlapping antisense transcript Tsix, but after the molecular machinery has made the necessary epigenetic changes on the active X, Tsix transcription is down regulated and XIST transcription is derepressed in cis. The chromatin modifier PRC2 is recruited by the resulting abundance of XIST transcripts surrounding the inactive chromatin where, with the aid of the bivalent transcription factor YY1, all three molecules are bound and lasting histone 3 lysine 27 trimethylation (H3K27me3) repressive markers are placed 23,24. In this way, a non-coding transcript is demonstrated to signal the start of a biological process as well as engage integral proteins for its completion. Comparably, other lncRNAs such as H19, Air and Kcnq1ot1 have also been shown to regulate genetic imprinting in cis 22,25.

More localised epigenetic gene silencing can be carried out by lncRNAs as they guide protein complexes to targeted loci. Non-coding antisense strands containing homology to a primary RNA being synthesised can target the gene’s promoter approximately 1-2 nucleosomes upstream by recruiting an epigenetic silencing complex of proteins including Argonaute-1 (Ago1), Histone Deacetylase-1 (HDAC1) and DNA Methyltransferase-3a (DNMT3a) to the location, facilitating the addition of repressive trimethylation at H3K27 and modifying the chromatin packing 26. An example of this is the non-coding p15 antisense RNA which was found to induce an epigenetic silent state of the tumour suppressor p15 in leukaemia 27. Short term gene silencing can occur by an almost identical mechanism. When the antisense strand binds to a homologous primary RNA, a recruited Argonaute-2 (Ago2) spatially blocks RNA Polymerase II from its intended binding site 26.

On the opposite side of the spectrum, lncRNAs can also be involved in enhancing expression using active chromatin marks such as H3K27 acetylation and histone 3 lysine 4 trimethylation (H3K4me3). A lncRNA named HOTTIP was demonstrated to bind to a protein in the MLL/TRX complex that is responsible for the H3K4me3 mark and, due to its location with respect to chromosomal looping, cause stable chromatin activation for the transcription of HOXA genes 28.

Post-transcriptional regulation of gene expression involving sequence complementarity between ncRNA and mRNA base pairs can result in mRNA protection, accelerated mRNA degradation, or prevention of mRNA translation. The mRNA encoding BACE1, a β-secretase crucial for β-amyloid production and subsequently involved in Alzheimer’s disease, is complementary to its expressed antisense transcript, BACE1-AS, and the pairing is shown to protect the coding transcript from RNase degradation 29,30. On the other hand, partial homologous pairing can result in blocking of translational machinery, as in the case of PU.1 mRNA and its antisense transcript, or in an increased rate of decay, as exemplified by Staufen recognised double stranded mRNAs 30.


LncRNAs can also regulate gene expression by directly influencing mRNA processing needed before translation. Non-coding MALAT1 was shown to be responsible for the gathering of splicing factors at transcriptionally active loci. In neuronal cells, MALAT1 repression results in a decrease of mRNA expression due to the disruption of proper splicing ability 29.

Another interesting example of the involvement of lncRNAs in gene regulation involves some basic deception. Due to their similarity in sequence or structure to the targeted loci, some ncRNAs can deflect regulatory proteins away from DNA interaction. In a struggle to avoid senescence, the lncRNA PANDA sequesters the transcription factor NF-YA from apoptosis inducing genes, thus inhibiting their transcription and prolonging the cell’s life 29,31.

Perhaps more than a simple side effect of their own transcription, ncRNA transcription can result in transcriptional activation or repression of neighbouring genes. One study reported that just over half of the lncRNAs under investigation affected the expression of neighbouring genes, whether positively or negatively 29. One of the reasons for this is nucleosome repositioning, which is needed by transcriptional machinery to gain access to the target loci, but as a result makes access more or less favourable to other nearby genes 32. Whether it’s nucleosome shifting, or the subsequent derepression of regulatory factor binding, lncRNA transcription in an area near coding genes can activate neighbouring genes much like an enhancer. This is perfectly summarised by the activation of the entire V, D and J gene family due to antisense lncRNA transcription within intergenic regions of the loci. Transcription in this way provides a recombination mechanism needed for functional B and T lymphocyte elements in the immune system, demonstrating yet another critical role for lncRNAs within the genome 32. With this category, it is important to point out that there is a difference between regulation via a product of lncRNA transcription, as in earlier examples, and regulation as a direct result of the transcriptional process itself, such as an instance in which an antisense lncRNA interacts with the promoter of a nearby gene as it is being transcribed.

RNA mediated transcriptional activation or repression is also achieved by directly regulating RNA polymerase II and transcriptional machinery. In response to heat stress, transcription of the non-coding heat-shock RNA-1 (HSR1) is induced, which recruits heat-shock-factor protein-1 (HSF1), a transcriptional activator, to form a trimer and initiate transcription 33. HSR1 not only enlists the translation factor EF1A in the process, it is also thought to associate with the promoters of target loci, indicating an active role in transcriptional activation 33.

From the previous examples, it is clear that the abundant RNA variations of the transcriptome and their expansive list of biological functions are critical coordinators of genetic expression. Understandably, their mis-regulation often leads to deleterious consequences.

1.1.3 RNAs in Disease

Genome wide association studies (GWAS) have determined that over 90% of single nucleotide polymorphisms (SNPs) that are associated with disease in humans do not lie within protein coding genes, but instead within non-coding regions of the genome 34. These single changes in the nucleotide code could affect regulatory regions of protein coding genes such as promoters and binding sites, or may induce changes in ncRNA transcripts affecting their structure, base pairing ability and, as a result, their function. As demonstrated earlier, ncRNAs are vital to the healthy control of gene expression, so such an alteration could have catastrophic consequences. A recurring theme among cancers is the repression of tumour suppressor genes, mainly through epigenetic silencing of their promoters 35. Some ncRNAs can even be thought of as having tumour suppressive qualities, such as MALAT1 regarding the down regulation of the tumour related enzyme MMP2 in glioma cells 36, or oncogenic as in the case of FAL1 which represses p21 transcription in cancer 37. Misregulated ncRNAs in disease, especially cancer, are a promising new array of targets for therapeutics development as well as potential molecules for early detection screening. Therefore, uncovering novel endogenous RNA transcriptional regulators is imperative not only for our understanding of the transcriptome, but also for developing genetic therapies for when it goes awry.

1.2 The Search for Transcription Factor Associated RNAs

Transcription factors are proteins that bind to specific DNA sequences to either recruit or block RNA polymerase, making them essential for gene regulation. How these regulators are themselves regulated is a significant area of interest in genomics research. With the increasing evidence of ncRNA functionality across gene regulation, and examples of their ability to act as protein recruiters, scaffolds and guides, the possibility of uncovering the involvement of these transcripts in directing transcription factors is intriguing.


1.2.1 RIP-Seq

In short, to discover novel transcription factor bound transcripts, RNA immunoprecipitation followed by reverse transcription and deep sequencing can be performed. Any bound transcripts co-immunoprecipitated with the protein are sequenced, aligned and subtracted against a control to normalise the data. The resulting differences in enriched sequence peaks mark potentially bound transcripts.
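The normalisation step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in this project: read counts per transcript are taken as already aligned and tallied, library sizes are scaled to counts per million, a pseudocount of 1 avoids division by zero, and enrichment is expressed as a fold change over the IgG control rather than a literal subtraction. The function names, transcript names and counts are hypothetical; the 1.1-fold cut-off mirrors the threshold named in the title of Table 3.1.

```python
def fold_enrichment(ip_counts, igg_counts, pseudocount=1.0):
    """Per-transcript fold enrichment of the IP library over the IgG control.

    Counts are scaled to counts per million so that libraries of different
    sequencing depth are comparable (an assumption of this sketch).
    """
    ip_total = sum(ip_counts.values())
    igg_total = sum(igg_counts.values())
    if ip_total == 0 or igg_total == 0:
        raise ValueError("both libraries must contain at least one read")

    enrichment = {}
    for transcript in set(ip_counts) | set(igg_counts):
        ip_cpm = (ip_counts.get(transcript, 0) + pseudocount) / ip_total * 1e6
        igg_cpm = (igg_counts.get(transcript, 0) + pseudocount) / igg_total * 1e6
        enrichment[transcript] = ip_cpm / igg_cpm
    return enrichment


def enriched_transcripts(ip_counts, igg_counts, threshold=1.1):
    """Names of transcripts whose fold enrichment exceeds the threshold."""
    scores = fold_enrichment(ip_counts, igg_counts)
    return sorted(name for name, score in scores.items() if score > threshold)


if __name__ == "__main__":
    # Hypothetical read counts, for illustration only.
    ip = {"transcript_A": 120, "transcript_B": 40, "transcript_C": 40}
    igg = {"transcript_A": 30, "transcript_B": 80, "transcript_C": 90}
    print(enriched_transcripts(ip, igg))
```

In this toy example only transcript_A passes the cut-off, since it is the only transcript with more scaled reads in the immunoprecipitated library than in the control.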

1.3 RNAs of Interest

1.3.1 HIST1H1D

The HIST1H1D gene encodes a protein member of the linker histone H1 family 38. Histones concisely package the extensive amount of DNA contained in the nucleus. Octamers of histone proteins are encircled twice by about 146 bp of DNA in each of these repeating nucleosome units 38. Linker histones bind the DNA between nucleosomes and are responsible for compacting this secondary structure into chromatin 38. The H1 family also contains HIST1H1B, HIST1H1C, and HIST1H1E in the large cluster of histone genes located almost exclusively on chromosome 6. An interestingly high amount of conservation and sequence homology between members of the HIST1H1 family, including some surrounding sequences outside the genes, indicates that they are most likely homologues 39,40. Genes in the H1 family have characteristic roles and are differentially expressed, yet it has also been demonstrated in mice that the variants can compensate for a loss in functionality of one member during development 41-43.


The HIST1H1D gene is intronless and its mRNA lacks a polyadenylated tail, a feature of histone RNAs. Although there seems to be no overlapping non-coding RNA expression at that locus, an expressed antisense pseudogene HIST1H2APS3 is located approximately 1 kb upstream 44, 38. The translated HIST1H1D protein is globular in structure, lysine rich, features a highly charged C-terminal tail and undergoes a heavy amount of modifications including methylation, acetylation, phosphorylation and ubiquitination 42,45. These modifications are crucial as the linker histone H1 is involved in chromatin arrangement, transitioning from its most condensed heterochromatin state to a more open, transcriptionally active, euchromatin state 45. Recall that DNA access is needed for gene activation, hence regulation of HIST1H1D and its histone family members is key to gene expression. These varying post translational modifications act in coordination, which together creates a “histone code,” marking areas of gene transcription or silence 46. For example, in histone H1 null mouse models, overall methylation remained largely unchanged, yet H1 involved genes experienced a loss of methylation mainly at the promoter region, indicating that these linker histones are able to epigenetically regulate specific gene expression through the formation of methylation patterns 47. Understandably, aberrant histone modifications may alter gene expression, thus contributing to a number of diseases including cancer 46.

Aside from the structural functions of the linker histone’s globular domain, the C-terminus also possesses major functions including roles in regulating cell death. Linker histones HIST1H1B, HIST1H1C and HIST1H1D have been associated with apoptotic transduction cascades via the interaction of their C-terminus with the proapoptotic protein Bax and its endogenous repressor Bcl-xL 48. Interestingly, p53 is a direct transcription-independent activator of Bax 49. Following DNA damage, the C-terminal domain alone can also act as an apoptotic agent by experiencing a decrease in phosphorylation, leaving DNA open to nuclease cleavage 48.

Histone proteins including HIST1H1D have surprisingly been found to translocate from the nucleus into the intracellular space under conditions of severe cellular stress, inflammation, infection, tissue injury, toxicity, clot formation, autoimmune disease or cancer. The released histones often continue to translocate to the cell surface and across into the extracellular space, where they may trigger response pathways. This is therapeutically useful as extranuclear histone protein levels could be considered endogenous threat signals and potentially serve as biomarkers 46.

Since HIST1H1D is highly conserved due to its functional secondary structure, single nucleotide polymorphisms (SNPs) and mutations have a considerable effect on the expression of H1 regulated genes. For example, one particular SNP in HIST1H1D has been shown to impact human height 50,51. On the other hand, more deleterious mutations in this gene family can affect DNA methyltransferase binding and therefore cause a misregulation of epigenetic states, as in the case of follicular lymphoma 52.

1.3.2 SF3B5

The SF3B5 gene codes for subunit 5 of the splicing factor 3B. Splicing factors, as part of the spliceosome, are involved in gene expression via nuclear mRNA splicing, in which introns are removed and coding exons are joined 53. The SF3B splicing factor is a stable subunit of the 17S U2 small nuclear ribonucleoprotein (snRNP), a main subunit of the U2-dependent spliceosome 53,54. SF3B has been shown to be particularly critical for this snRNP formation 54. Splicing factors in the complex are involved in the initiation and stabilisation of pre-mRNA intron branch site binding 54.

SF3B5 is located on chromosome 6, with its closest pseudogene, MRPL42P3, approximately 50 kb upstream and no apparent overlapping or expressed ncRNA 55.

Since the majority of human protein coding genes contain introns, regulation of splicing machinery is necessary for post-transcriptional production of mature mRNA and gene expression 56. On the other hand, aberrant regulation can lead to disease and cancer. Mutations in snRNP involved genes and irregular cellular levels of splicing factors contribute to incorrect processing of pre-mRNA resulting in deformed, inactive, or erroneous amounts of protein products. Some of these proteins can contribute to tumorigenesis 56,57 while increased levels of splicing factors such as SFRS1 can become oncogenic 58. Alternative splice variants that have been well characterised with disease can therefore be employed as biomarkers, such as in breast and pancreatic cancer 59.

1.4 Associated Transcription Factors

The transcription factors p53 and Sp1 were indicated to partner with HIST1H1D and SF3B5 respectively.


1.4.1 p53

P53 is a well characterised tumour suppressor that can directly activate hundreds of RNA Polymerase II (RNAPII) transcribed genes and also activates or represses a multitude of other genes via indirect pathways 60-62. P53 exists at the hub of an elaborate stress-induced signalling network, yet its end results all involve the regulation of mediated cell death 61. In response to a variety of cytotoxic stimuli, p53 tetramers bind to appropriate gene promoters and recruit histone modifiers, chromatin remodellers or transcriptional complexes to the targeted loci 62.

It is the structure of p53 that allows for its multifunctional ability and multitude of sequence recognition. The protein contains two DNA binding domains: a highly conserved domain at its core for recognition of p53 response elements, and an unstructured domain at the highly basic, lysine rich C-terminus which is able to recognise both DNA and RNA 61-64. The amino terminus contains a loosely structured transactivation domain, a common feature of transcription factors, consisting of two sub domains allowing for increased general interactions 61,63. With the structural ability to recognise and interact with hundreds of sequences, both specifically and loosely, as well as its multiple roles as a transcriptional regulator, the p53 modulated network is dauntingly extensive but tremendously valuable to investigate.

P53 has been nicknamed ‘the guardian of the genome’ for a good reason 65,66.

The tumour suppressor’s vast signal transduction cascades lead to DNA damage repair, cell cycle arrest or, most importantly, programmed death in cells that would otherwise exhibit malicious aberrant growth 65-67. When the apoptosis pathway is triggered, p53 directly activates several key genes to carry out the orders, such as Puma, Noxa, Apaf-1, and the previously mentioned Bax 65,67. Expression of apoptotic suppressors such as Bcl-2 and Bcl-xL is reciprocally repressed 68. It is when these complex, yet balanced pathways are disrupted, that tumour growth is almost inevitable. Defective p53 is common amongst cancers and nearly 90% of the mutations occur within the core DNA binding domain, impairing the protein’s function 63. Downstream players in p53’s regulatory network become disproportional as exemplified by the decreased expression of Bax in p53 deficient tumour cells. Failure in the p53/Bax pathway has been shown to be crucial for colon tumour cells to escape apoptotic signals, often correlating with enhanced expression of Bcl-2 and Bcl-xL. Because of this, Bax levels have been an indicator of patient survival 68. Maintenance of p53 function is undoubtedly of utmost importance in tumour suppression.

The main regulator of p53 is the oncogenic MDM2 protein whose balanced expression keeps the two opposing signals in check. MDM2 bound p53 results in inactivation and degradation, yet free p53 activates MDM2 expression ultimately creating a regulatory feedback loop 69. Signals in response to cellular stress lead to the phosphorylation of multiple sites on p53, loosening its affinity for MDM2 and releasing it for responding use. Although increased cellular levels of p53 will in turn increase MDM2 expression, the phosphorylation renders this effect superfluous until the kinase induced signals dissipate following damage repair 69. Understandably, this p53/MDM2 regulatory relationship is often disproportionate in cancer showing increased expression of the repressor MDM2 and subsequent suppression of p53 64.


1.4.2 Sp1

Specificity protein 1 (Sp1) is a ubiquitously expressed transcription factor featuring a triple zinc finger domain that binds to GC rich promoters of many genes involved in cell growth, DNA damage, apoptosis, immune response and chromatin remodelling 70,71. The transcription factor belongs to the SP/Krüppel-like factor (KLF) family, whose members all share a conserved DNA binding domain yet otherwise are not significantly homologous. Sp1 also contains two transactivating glutamine rich domains that interact directly with TATA-binding protein (TBP) and TBP-associated factor 4 (TAF4) 70.

Between these two domains and the zinc fingers lies a highly charged domain that may aid in DNA binding and transactivation. At the C-terminus, Sp1 features a multimerisation domain, which is unique to the transcription factor 72. An inhibitory domain is located at the N-terminus 73. Apart from the zinc finger domain, however, the protein is generally unstructured, providing an ability to interact with various proteins and form super activation complexes 72.

Sp1 is regulated by numerous post translational modifications as well as protein interactions. Because of its loose, malleable structure, heavy post translational modifications such as phosphorylation, sumoylation, acetylation and glycosylation modify its form to differentially interact with diverse partners or sequences 73. These post translational modifications can also affect Sp1’s stability. Direct protein interactions further stimulate or repress Sp1 transactivation. MDM2 is one of the protein regulators that binds to the transcription factor, suppressing its function, comparable to its interaction with p53 74. In one pathway, the tumour suppressor protein pRB has been demonstrated to bind to the MDM2/Sp1 complex, releasing Sp1 and thus restoring its DNA-binding ability. Increased expression of the tumour suppressor has also been shown to further stimulate Sp1 transactivation, including formation of its superactivation complex 74. Indirectly, relative levels of the homology containing Sp3 are found to correlate with Sp1-mediated transcription, suggesting another level of co-regulation and competitive equilibrium 73. Undoubtedly, a complex combination of regulatory mechanisms is needed to conduct such an important transcriptional mediator, which is achieved through post translational modifications, promoter location, and relative levels of other interactive proteins 75.

With over 12,000 Sp1 binding sites in the human genome involving genes which function in cell proliferation, apoptosis and DNA repair, it is understandable that misregulation of the transcription factor is detrimental 73,75. Sp1 is found to be up-regulated in most cancer types and its increasing level is negatively correlated with patient survival 72,73. Knockdown experiments reducing Sp1 to a normal level subsequently decreased tumour growth and metastasis 73. Sp1 mediates promoters involved in growth regulation pathways such as vascular endothelial growth factor (VEGF), p21, and transforming growth factor-β (TGF-β), as well as apoptosis suppression such as Survivin, Bcl-2 and Bax, all of which can contribute to tumorigenesis when mismanaged 75. Metastasis involved genes such as matrix metalloproteinase-2 are also upregulated by Sp1 in some cancers 76. Because of Sp1’s unique ability to multimerize via multiple nearby binding sites, it has also been proposed that intrachromatin bending could result and contribute to the chromosomal rearrangements observed in oncogenesis 73. Even apart from cancer, Sp1 has been demonstrated to regulate the expression of splicing factor genes, which in turn affects the alternative splicing of downstream proteins 77. Since Sp1 is constitutively expressed, deciphering its extensive network of pathways and partners involved in its cellular regulation, both in normal and cancer cells, is a subject of continuing research.

Note: Effective knockdown of KLF3 bound RNY1 was not achieved during this study and its investigation was therefore not continued; its relevant introductory background is not provided.

1.5 Aims

Immunoprecipitation of the p53 and Sp1 transcription factors, which exist at the hub of extensive regulatory networks, followed by investigation of the attached RNA, may lead to the discovery of novel roles for the non-coding transcripts potentially directing transcriptional activation or repression at their homologous loci. This study seeks to uncover the RNAs of interest’s involvement in transcription factor localisation and targeting in cis after the respective bound transcripts are knocked down, hypothesising that they may be acting as protein guides, decoys or scaffolds.

Localisation and expression at other homologous loci is also investigated for possible trans-regulatory involvement. Since these transcription factors are so highly involved in cancer, this could benefit our understanding of tumourigenesis as well as lead to the identification of novel gene therapy targets.

18 Chapter 2 Materials and Methods

Materials

2.1 Cell Lines

Aliquots of the HEK293 human cell line were supplied by the American Type Culture Collection (ATCC) and kept in cryogenic storage until use.

2.2 Reagents

Commercial reagents were purchased from suppliers as listed in Table 2.1 below.

Supplier Reagent Catalog Number

Sigma-Aldrich 10 X Phosphate Buffered Saline (PBS) P5493

Dulbecco’s Modified Eagle’s Medium D6046

Phenol:Chloroform:Isoamyl Alcohol 25:24:1 P3803

Random Hexamers R7647

Proteinase K P4850

M-MLV Reverse Transcriptase M1427

M-MLV Reverse Transcriptase Buffer 10X B8559

Ambion Trizol 15596018

Glycoblue AM9515

Turbo DNase AM2238

TURBO DNase Buffer 10X 4022G

Thermo Fisher Fetal Bovine Serum (FBS) 26140087

Penicillin-Streptomycin-Glutamine (100X) 10378-016


Dynabeads® Protein G 10003D

Invitrogen Oligofectamine 12252-011

Opti-MEM - Reduced Serum Medium 31985-062

SYBR Safe Gel Stain in DMSO (10,000X) S33102

New England Biolabs Deoxynucleotide Mix (DNTP Mix) N0447L

Gel Loading Dye Blue (6X) B7021S

Taq Polymerase 2X Master Mix M0270L

Kapa Biosystems KAPA SYBR Fast qPCR Master Mix 2X KK4601

Table 2.1 Reagents by supplier.

2.3 Buffers

The buffers for ChIP sample preparation and biotinylated RNA pulldown are listed in Table 2.2 in order of use. All were made with MilliQ ultra-pure water and autoclaved to sterilise.

Buffer Composition

Cell Lysis Buffer 5 mM PIPES, 85 mM KCl, 0.5% NP-40

Nuclei Lysis Buffer 10 mM Tris-HCl (pH 7.4), 10 mM NaCl, 3 mM MgCl2, 0.5% NP-40

2 X Wash Buffer 10 mM Tris (pH 8.0), 1 mM EDTA, 0.5 M NaCl

2 X B&W Buffer 10 mM Tris-HCl (pH 7.5), 1 mM EDTA, 2 M NaCl

2 X Elution Buffer 10 mM Tris-HCl (pH 6.0), 1 mM EDTA, 2 M NaCl

Table 2.2 Buffer composition.


2.4 Chemicals and Solutions

The following common chemicals were used and supplied as listed in Table 2.3.

Supplier Chemical Catalog Number

Sigma EDTA EDS-500G

PIPES P1851-100G

NP-40 74385-1L

Paraformaldehyde 158127-500G

Ajax Finechem Glycine 1083-500G

NaCl 465-500G

Sodium Acetate 679-500G

KCl 383-500G

Tris 2311-500G

MgCl2 296-500G

Chem-Supply Ethanol 100% EA0432.5L

Isopropanol PA013-2.5L

Table 2.3 Common chemicals by supplier.

2.5 Media

For cell culture maintenance, Dulbecco’s Modified Eagle’s Medium (DMEM) supplemented with 10% Fetal Bovine Serum and 1% Penicillin-Streptomycin-Glutamine (100X) was used. During transfection, Gibco® Opti-MEM reduced serum media was used.


2.6 Oligodeoxynucleotides

Antisense oligodeoxynucleotides (asODNs) were designed by Kevin V. Morris and synthesised by Integrated DNA Technologies, Inc. Two asODNs were synthesised per RNA being investigated, to be pooled in transfection. Sequences for these asODNs are shown in Table 2.4 below.

Antisense ODN Sequence

asHIST1HID_1 A*C*G*T*C*C*C*G*C*C*A*G*T*T*T*C*A*C*T

asHIST1HID_2 C*G*C*C*T*G*C*C*T*T*C*T*T*C*G*C*C*T*T*T

asSF3B5_1 G*T*T*C*A*C*C*A*G*C*C*A*C*T*C*C*C*A*C*T

asSF3B5_2 C*C*T*C*T*C*C*A*C*C*A*G*C*A*C*A*G*T*T*C

HY1_as1 C*T*C*C*T*T*G*T*T*C*T*A*C*T*C*T*T*T*C*C*C

HY1_as2 T*T*C*C*A*C*T*C*T*C*C*T*G*G*C*A*T*C*C*C*T

miRN367 A*C*T*G*A*C*C*T*T*T*G*G*A*T*G*G*T*G*C*T*T*C*A*A

Table 2.4 Antisense ODN sequences for RNA knockdown with control. * indicates a phosphorothioate-modified backbone.
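Because each asODN is the reverse complement of the region it targets, the target sequence can be recovered from the entries in Table 2.4. The following is a minimal sketch of that conversion, assuming the `*` characters mark only the backbone modification and carry no sequence information. The function names are illustrative and are not part of the original design workflow.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def strip_backbone_marks(oligo):
    """Remove the '*' phosphorothioate markers, leaving only the bases."""
    return oligo.replace("*", "").upper()


def target_sequence(oligo):
    """Return the sense-strand sequence (DNA alphabet) that the asODN pairs with."""
    bases = strip_backbone_marks(oligo)
    # Reverse the oligo, then complement each base.
    return "".join(COMPLEMENT[base] for base in reversed(bases))


as_oligo = "A*C*G*T*C*C*C*G*C*C*A*G*T*T*T*C*A*C*T"  # first asODN in Table 2.4
target = target_sequence(as_oligo)
```

Applying `target_sequence` twice returns the original base sequence, which gives a quick consistency check on any transcription of the table.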

2.7 Primers

Primers were designed using PrimerQuest (http://sg.idtdna.com/primerquest/home/index) and manufactured by Integrated DNA Technologies, Inc. Two pairs of primers were used for each RNA being investigated (with the exception of HIST1H1E).

Primer binding locations are shown in the Appendix.

Primer pair Forward and reverse sequence

HIST1H1D_1 Forward - CTCCACTTGCTCCTACCATTC

Reverse - GCCTTGGTGATAAGCTCAGATA

HIST1H1D_2 Forward - CTTCTGGCTCCTTCAAACTCA

Reverse - CCACCTTCTTGGGCTTCTTgg

SF3B5_1 Forward - GTGGCTTCACTCTCCTGTAAA

Reverse - GCTGGCTATGGATGGTGTAG

SF3B5_2 Forward - CCGGTTTGCCAATCCTCTTA

Reverse - GTTCTAAGGGTCCACATGAAGG

RNY1_1 Forward - GATGCCAGGAGAGTGGAAAC

Reverse - CAGTCAGTTACAGATCGAACTCC

RNY1_2 Forward - CAAGAAACGAGGGATGCCAgg

Reverse - AACTCCTTGTTCTACTCTTTCCC

HIST1H1B_1 Forward - GGGCTTTGTTGCGGTTTtca

Reverse - TCCGAAGAAGGCGAAGAAGcc

HIST1H1B_2 Forward - TTGGCAGGACTCTTGGTTGcc

Reverse - AGCCCAAAGCCAAGAAGGca

HIST1H1C_1 Forward - GCAGATAGTAAAGAAATTATCCAGCTC

Reverse - AAGGTTGCGAAGCCCAAgaaa

HIST1H1E_1 Forward - AGAACAACAGCCGCATCAagc

Reverse - CGGCTTCTTCGCCTTCTTtgg

HIST1H1E_2 Forward - GCTCATTACTAAAGCTGTTGCC

Reverse - CGCCTTCTTGGGCTTCTTcgc

Table 2.5 Primer pairs and sequences.


2.8 Antibodies

Anti-p53 antibody [PAb 1801] (ab28) and Anti-Sp1 (ab13370) antibodies used in immunoprecipitation were purchased from Abcam.

2.9 Biotinylated Antisense ODNs

asODN Biotinylated sequence

asHIST1H1D_Biotin1 5’Biotin-ACGTCCCGCCAGTTTCACT

asHIST1H1D_Biotin2 5’Biotin-CGCCTGCCTTCTTCGCCTTT

asSF3B5_Biotin1 5’Biotin-GTTCACCAGCCACTCCCACT

asSF3B5_Biotin2 5’Biotin-CCTCTCCACCAGCACAGTTC

asRNY1_Biotin1 5’Biotin-CTCCTTGTTCTACTCTTTCCC

asRNY1_Biotin2 5’Biotin-TTCCACTCTCCTGGCATCCCT

miRNA367_Biotin 5’Biotin-ACTGACCTTTGGATGGTGCTTCAA

Table 2.6 Biotin labelled asODNs and their sequences, each targeting its corresponding homologous RNA. Biotin is attached to the synthesised nucleotide sequence at the 5’ terminus as indicated.


Methods

2.10 Deep Sequencing

Deep sequencing analysis was performed at the Ramaciotti Centre for Genomics at UNSW in order to identify potential RNA partners bound to transcription factor proteins in HEK293 cells. Sequence reads were assembled using Cufflinks (http://cole-trapnell-lab.github.io/cufflinks/) after quality control with FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and trimming with Trimmomatic (http://www.usadellab.org/cms/?page=trimmomatic). Sequence hits were aligned to the human genome using TopHat2 (https://ccb.jhu.edu/software/tophat/index.shtml) 78. Tag counts from the transcription factor immunoprecipitations were manually calculated as a fraction of those from the IgG control. Transcripts with the greatest fold enrichments were narrowed down for further investigation.
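The enrichment calculation described above was done manually; it can be sketched in a few lines as follows. This is an illustration only, assuming per-transcript tag counts are available for both the transcription factor immunoprecipitation and the IgG control. The function names, the pseudocount and the example counts are hypothetical and are not taken from this study; the default threshold of 1.1 follows the cut-off quoted with Table 3.1.

```python
def fold_enrichment(tf_counts, igg_counts, pseudocount=1.0):
    """Return {transcript: TF tag count / IgG tag count}.

    The pseudocount is an assumption added here to avoid division by zero
    when a transcript has no tags in the IgG control.
    """
    enrichment = {}
    for transcript, tf in tf_counts.items():
        igg = igg_counts.get(transcript, 0)
        enrichment[transcript] = (tf + pseudocount) / (igg + pseudocount)
    return enrichment


def rank_by_enrichment(enrichment, threshold=1.1):
    """Keep transcripts enriched above the threshold, highest first."""
    hits = [(name, value) for name, value in enrichment.items() if value > threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Hypothetical tag counts, for illustration only.
tf = {"transcript_A": 99, "transcript_B": 9, "transcript_C": 4}
igg = {"transcript_A": 9, "transcript_B": 9, "transcript_C": 0}
ranked = rank_by_enrichment(fold_enrichment(tf, igg))
```

Ranking by the ratio rather than by the raw difference keeps highly expressed background transcripts from dominating the list, although either measure could be substituted in `fold_enrichment`.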

2.11 Cell Culture

Aliquots of HEK293 cells were obtained from UNSW liquid nitrogen storage and quickly defrosted at 37ºC before being added to pre-warmed DMEM media containing 10% FBS and 1% PSG in 150 cm2 tissue culture treated flasks.


HEK293 cell cultures were incubated at 37ºC with a humidified atmosphere of 5% CO2. When cell density reached an estimated 80% or greater, the culture was passaged.

To passage the HEK293 cell culture, the spent media was first removed. Cells were gently washed once with 5 ml of 1X PBS, and then 5ml trypsin was added. The cells were incubated for approximately 2-3 minutes until fully detached from the flask surface.

To quench the trypsin reaction, 15 ml of media was added. The cell-containing mixture was pipetted into a 50 ml Falcon tube and centrifuged for 5 minutes at 80.5 rcf at room temperature. The supernatant was poured off and the cell pellet was uniformly resuspended in 10 ml of media with gentle pipetting. Approximately 2 x 10^6 cells were then seeded per 150 cm2 flask and the volume made up to 20 ml with fresh media.

2.12 Transfection

2.12.1 Seeding

HEK293 cells were subcultured into the appropriate culture treated vessels 24 hours before transfection. For the mRNA expression assay, one 12-well plate per cell line was used (one well for each transcription factor plus the miRN367 control in triplicate) then seeded with approximately 0.1 x 10^6 cells. For the ChIP, a total of eighteen 100 mm x 20 mm tissue culture treated dishes per cell line (one per transcription factor and its control in triplicate) were seeded with approximately 2 x 10^6 cells.

When ready to transfect, the current DMEM media was aspirated from the vessels. PBS was added to gently wash and then also aspirated. Opti-MEM reduced serum media was added to the culture, 400 µL for the 12 well plates, and 7 ml for the 100 mm x 20 mm dishes.

2.12.2 Transfection Complexes

Note: The following amounts and descriptions are provided on a per-well basis.

Antisense ODNs (10 µM stock) were pooled for each transcription factor bound RNA (for example: asHIST1H1D_as1 + asHIST1H1D_as2). Two separate complexes were prepared, one containing the antisense ODNs, and the other containing oligofectamine, the contents of which are provided in Table 2.7.

Culture vessel          Antisense ODN & Opti-MEM volume   Oligofectamine & Opti-MEM volume   Combined volume

12 well plate           10 µL in 80 µL                    2 µL in 8 µL                       100 µL

100 mm x 20 mm dish     30 µL in 400 µL                   10 µL in 60 µL                     500 µL

Table 2.7 Transfection complex mixtures by culture vessel size.
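When several wells or dishes receive the same asODN pool, the per-vessel volumes in Table 2.7 can be scaled into a single preparation. The sketch below shows one way to do this. The dictionary keys, the function names and the optional overage for pipetting loss are assumptions introduced for illustration; only the per-vessel volumes come from the table.

```python
# Per-vessel volumes in µL, copied from Table 2.7.
TABLE_2_7 = {
    "12 well plate": {
        "as_odn": 10, "odn_optimem": 80, "oligofectamine": 2, "lipid_optimem": 8,
    },
    "100 mm x 20 mm dish": {
        "as_odn": 30, "odn_optimem": 400, "oligofectamine": 10, "lipid_optimem": 60,
    },
}


def combined_volume(vessel):
    """Total transfection complex volume (µL) for one well or dish."""
    return sum(TABLE_2_7[vessel].values())


def scaled_volumes(vessel, n_vessels, overage=0.0):
    """Scale each component to n_vessels; overage is an assumed fractional excess."""
    factor = n_vessels * (1 + overage)
    return {name: round(volume * factor, 1) for name, volume in TABLE_2_7[vessel].items()}


triplicate = scaled_volumes("12 well plate", 3)
```

The component volumes sum to the combined volumes stated in the table (100 µL and 500 µL), which `combined_volume` confirms.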

The complexes were incubated for 5-10 minutes at room temperature then gently mixed together and incubated for another 10-15 minutes, also at room temperature. The final transfection complex was then carefully added onto the cells by droplets and swirled to mix. The cultures were incubated at 37ºC, 5% CO2 for 4 hours. Cell starvation was ended when media containing a higher concentration of serum was added to the culture so that the total volume was 800 µL in the 12 wells of each plate, and 10 mL in each of the 100 mm x 20 mm dishes, with 10% of the volume FBS. Cultures were incubated at 37ºC, 5% CO2 for 48-72 hours until ready to assay for RNA knockdown and localisation effects.

2.13 Knockdown Efficacy

2.13.1 RNA Isolation

After 72 hours of incubation post-transfection, the growth media was removed from the 12-well cell culture plates. Each well received 500 µL of Trizol, causing the adherent cells to lift. The contents were transferred to new 1.5 mL tubes after 2-3 minutes of incubation at room temperature. Chloroform was added (100 µL) and the tubes were shaken hard until the mixture became a uniformly cloudy pink colour. Samples were left to incubate for 10 minutes then centrifuged at 4ºC, max speed (18,213 rcf), for 20 minutes in a tabletop microcentrifuge. The RNA-containing aqueous layer was carefully transferred to new tubes and 1 µL of glycoblue added. To precipitate the RNA, 750 µL isopropanol was added, and the sample was briefly vortexed and placed in the -80ºC freezer overnight. Samples were then centrifuged at 4ºC, max speed (18,213 rcf), for 20 minutes to pellet the RNA, made visible by the glycoblue. The supernatant was carefully pipetted out and discarded. One mL of 70% ethanol was added and the sample vortexed briefly before being centrifuged again at 4ºC, max speed (18,213 rcf), for 20 minutes. As much of the supernatant as possible was carefully removed by pipetting and was discarded. The pelleted RNA was left to air dry in a fume hood for approximately 20-30 minutes.


2.13.2 Removing DNA Contaminants

The air-dried pellet was dissolved in 20 µL of 1X DNase buffer before 1 µL of TURBO DNase was added in order to remove any genomic DNA. The sample was then incubated at 37ºC for 30 minutes. To inactivate the DNase, 5 µL of 150 mM EDTA was added and incubated at 75ºC for 10 minutes.

2.13.3 Reverse Transcription

In a PCR strip, 1 µL of 10 mM dNTP mix, 1 µL of Random Hexamers, and 8 µL of the DNase treated RNA template were combined, briefly spun down to remove liquid from the sides and then incubated at 70ºC for 10 minutes. Afterwards, 2 µL of 10X M-MLV Reverse Transcriptase Buffer, 1 µL of M-MLV Reverse Transcriptase and 7 µL of water were added for a final volume of 20 µL. The reaction was incubated at room temperature for 10 minutes, 37ºC for 50 minutes, then finally 94ºC for 10 minutes in order to inactivate the reverse transcriptase.

2.13.4 Quantitative Real Time PCR

The resulting cDNA was amplified by quantitative reverse transcription PCR (qRT-PCR) using primer pairs for the corresponding locus of interest as well as RPL10 expression. For each sample, a reaction mixture containing 10 µL of Kapa Sybr Fast (2X), 7 µL of water, 1 µL of the forward primer, 1 µL of the reverse primer, and 1 µL of the cDNA template was loaded into a 384 well plate in triplicate. The plate was briefly spun down to bring all components to the bottom of the well.


qRT-PCR was performed on an Applied Biosystems Viia7 Real Time PCR machine. Stage 1 was set to 95ºC for a hold time of 2 minutes. Stage 2 amplification was set to 40 cycles of 95ºC for 3 seconds followed by 60ºC for 30 seconds. A melt curve finished the program. The amplification data was analysed by the Applied Biosystems software, which provided the threshold cycle (Ct) for each sample.
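The text does not state how the Ct values were converted into relative expression. One widely used approach is the comparative Ct (2^-ΔΔCt) calculation, normalising the target to RPL10 and comparing against the miRN367 control. The sketch below illustrates that calculation under the assumptions that this method applies and that amplification efficiency is close to 100%. The function name and the Ct values are hypothetical.

```python
from statistics import mean


def relative_expression(target_ct_treated, ref_ct_treated,
                        target_ct_control, ref_ct_control):
    """Relative expression (treated vs control) by the 2^-ddCt approach.

    Each argument is a list of technical-replicate Ct values. The reference
    gene would be RPL10 and the control the miRN367 asODN sample.
    """
    d_ct_treated = mean(target_ct_treated) - mean(ref_ct_treated)
    d_ct_control = mean(target_ct_control) - mean(ref_ct_control)
    dd_ct = d_ct_treated - d_ct_control
    return 2 ** (-dd_ct)


# Hypothetical Ct values, for illustration only.
expression = relative_expression(
    [26.0, 26.0, 26.0], [18.0, 18.0, 18.0],   # knockdown sample: target, RPL10
    [24.0, 24.0, 24.0], [18.0, 18.0, 18.0],   # control sample: target, RPL10
)
```

In this example the target appears two cycles later in the knockdown sample at equal RPL10 levels, corresponding to a relative expression of 0.25.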

2.14 Chromatin Immunoprecipitation for Transcription Factor Localisation

2.14.1 Cross-linking

To fix protein-DNA interactions 48 hours after transfection, 37% formaldehyde was added to the 100 mm x 20 mm plates to a final concentration of 1% (270 µL). The plates were incubated at room temperature for 10 minutes with gentle rocking. The cross-linking reaction was terminated by adding 500 µL of 2.5 M glycine and incubating at room temperature for 5 minutes with gentle rocking. The media was pipetted off and discarded. Cells were washed with 3 mL cold PBS, added slowly to the plate, then swirled gently. As the adherent cells began to show signs of lifting, a cell scraper was used to remove the culture from the bottom surface and the contents of the plates were transferred to 15 mL Falcon tubes. The cell suspension was then centrifuged at 700 rcf for 5 minutes and the supernatant poured off. The cells were washed once more by resuspension in 3 mL of PBS, followed by centrifugation and discarding the supernatant as in the previous step.
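The 270 µL figure follows from a simple C1V1 = C2V2 dilution applied to the 10 mL culture volume given in section 2.12.2. The sketch below reproduces that arithmetic and also estimates the resulting glycine concentration. The function names are illustrative, and the glycine estimate assumes that nothing was removed from the dish between the two additions.

```python
def stock_volume_ul(stock_conc, final_conc, culture_volume_ul):
    """Volume of stock to add, using C1*V1 = C2*V2 and ignoring the small
    increase in total volume caused by the addition itself."""
    return final_conc * culture_volume_ul / stock_conc


def final_concentration(stock_conc, added_volume_ul, existing_volume_ul):
    """Concentration after adding stock, accounting for the added volume."""
    return stock_conc * added_volume_ul / (existing_volume_ul + added_volume_ul)


culture_ul = 10_000  # 10 mL per 100 mm x 20 mm dish (section 2.12.2)

# 37% formaldehyde stock diluted to 1% final: about 270 µL.
formaldehyde_ul = stock_volume_ul(37.0, 1.0, culture_ul)

# 500 µL of 2.5 M glycine added to the culture plus formaldehyde (in mol/L).
glycine_molar = final_concentration(2.5, 500, culture_ul + 270)
```

Under these assumptions the quenching step brings glycine to roughly 0.12 M.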


2.14.2 Lysis and Sonication

The cell pellet was completely resuspended in 600 µL of cold ChIP lysis buffer and transferred to a new 1.5 mL tube before being left on ice to incubate for 20 minutes.

The cellular lysate was centrifuged at 2,655 rcf at 4ºC for 5 minutes. The supernatant was discarded and the pellet was resuspended in 600 µL of cold nuclei lysis buffer, then incubated on ice for 10 minutes. The nuclear fraction was then sonicated in order to shear the DNA into lengths of around 300-800 base pairs. The sonicator was programmed to 6 rounds of alternating 30 second pulse times and 30 second off times at an amplitude of 16 in ice cold water. After sonication, the samples were centrifuged at 2,655 rcf at 4ºC for 5 minutes and the supernatants were transferred to new 1.5 mL tubes.

2.14.3 Binding the Antibody

Each of the 600 µL samples was equally divided into 3 new 1.5 mL tubes: one to receive the antibody, one to receive no antibody, and one to forgo immunoprecipitation, considered the ‘input’. To the first tubes, 1 µL (1 µg) of the appropriate antibody was added. These were placed on a rotating platform along with the second tubes containing no antibody, meant to receive just the ‘beads alone’, and both were incubated overnight at 4ºC. The ‘input’ tubes were left untouched at 4ºC until needed later for qRT-PCR.

2.14.4 Immunoprecipitation

In order to precipitate the transcription factor from the nuclear lysate, 30 µL of magnetic Protein G Dynabeads were added to the samples and incubated at room temperature for 30 minutes on a rotating platform. The beads were pulled to the side of the tube after 1-2 minutes on a magnetic rack and the supernatant was carefully aspirated.

To thoroughly wash, each of the following steps was done twice. First, the beads were resuspended in 1 mL of PBS, tubes placed back on the magnetic rack for 1-2 minutes and supernatant aspirated. The second wash required the resuspension of the beads in 500 µL of 2X B&W buffer. Third, the beads were resuspended in 2X Wash buffer.

Elution of the DNA-protein complex from the Dynabeads was done by resuspension in 300 µL of 2X Elution buffer and incubation at 70ºC for 10 minutes. The beads were separated on the magnetic rack and the supernatants were transferred to new tubes.

At this point, the ‘input’ samples were removed from the 4ºC and made up to 300 µL with 2X Elution buffer. To remove protein from the samples, 1 µL of Proteinase K was added. Lastly, the cross-link was reversed by adding 12 µL of 5 M NaCl and incubating overnight at 65ºC.


2.14.5 DNA Extraction

Isolation of the DNA was achieved by adding 300 µL of phenol:chloroform:isoamyl alcohol to the samples and shaking vigorously, followed by centrifugation at 14,000 rpm for 5 minutes. Approximately 100 µL of the aqueous layer was delicately pipetted into a new tube, avoiding phenol contamination, and 1 µL of Glycoblue was also added.

To precipitate the DNA, 500 µL of 100% ethanol and 10% Sodium Acetate were mixed into each tube and left in -80ºC overnight. The samples were then centrifuged at 14,000 rpm at 4ºC for 20 minutes. As much of the supernatant was removed as possible and the pellet was air dried in a fume hood for 20-30 minutes. The DNA pellet was rehydrated in 20 µL of water.
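Centrifugation speeds in this chapter are given in rcf for some steps and in rpm for others. The two are related through the rotor radius, which is not stated in the text. The sketch below shows the conversion from first principles. The radius used is a placeholder assumption, not the value for the centrifuge used in this study, so the result is illustrative only.

```python
import math

G = 9.80665  # standard gravity, m/s^2


def rcf_from_rpm(rpm, rotor_radius_cm):
    """Relative centrifugal force (multiples of g) at a given speed and radius."""
    omega = 2 * math.pi * rpm / 60            # angular velocity, rad/s
    return omega ** 2 * (rotor_radius_cm / 100) / G


def rpm_from_rcf(rcf, rotor_radius_cm):
    """Inverse of rcf_from_rpm for the same rotor radius."""
    omega = math.sqrt(rcf * G / (rotor_radius_cm / 100))
    return omega * 60 / (2 * math.pi)


ASSUMED_RADIUS_CM = 8.0  # hypothetical rotor radius, not given in the text
rcf_at_14000_rpm = rcf_from_rpm(14_000, ASSUMED_RADIUS_CM)
```

Reporting rcf alongside rpm, or stating the rotor used, would make such steps reproducible on a different centrifuge.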

2.14.6 Quantitative PCR

All quantitative PCR (qPCR) experiments were conducted in a 20 µL reaction volume consisting of 10 µL of Kapa Sybr Fast (2X), 7 µL of water, 1 µL of the forward primer, 1 µL of the reverse primer, and 1 µL of the immunoprecipitated DNA. Samples were loaded into a 384 well plate in triplicate and a standard curve relating to each primer pair was also plated in triplicate. The amplification was carried out under the same conditions described in section 2.13.4.


2.14.7 Standard Curve

A standard curve for qPCR Ct value comparison and absolute sample quantification was created using known quantities of HEK293 DNA. Genomic DNA was plated in triplicate in quantities of 1 ng, 0.1 ng, 0.01 ng, 0.001 ng and 0.0001 ng for each primer pair used. Apart from the template, the components of the reaction mixture remained the same as those in the experimental qPCR. Resulting Ct values were graphed as a line plot, the slope and intercept of which were used to calculate the quantities of DNA in the ChIP samples.
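The standard curve calculation can be sketched as a least-squares fit of Ct against the logarithm of the input quantity, followed by inversion of the fitted line. The log10 scaling of the x-axis is an assumption based on common practice, since the text specifies only a line plot. The function names and the Ct values are hypothetical; the dilution series is the one listed above.

```python
import math


def fit_standard_curve(quantities_ng, ct_values):
    """Least-squares line Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities_ng]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the DNA quantity (ng)."""
    return 10 ** ((ct - intercept) / slope)


# Dilution series from the text; the Ct values are hypothetical.
standards_ng = [1, 0.1, 0.01, 0.001, 0.0001]
standard_ct = [20.0, 23.3, 26.6, 29.9, 33.2]
slope, intercept = fit_standard_curve(standards_ng, standard_ct)
estimate_ng = quantity_from_ct(26.6, slope, intercept)
```

With these illustrative values the fitted slope is -3.3 cycles per tenfold dilution, and a ChIP sample with a Ct of 26.6 would be estimated at 0.01 ng.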

2.15 UCSC Genome Browser

Online tools from the UCSC Genome Bioinformatics web page were used to identify transcript details including orientation, coding ability, sequence homology and primer binding locations. Sequences of interest were queried in the UCSC human BLAT genome search tool (http://genome.ucsc.edu/cgi-bin/hgBlat). Controls were used to alter tracks displaying mRNA, ncRNA and expressed sequence data.

2.16 Biotinylated asODN-RNA Immunoprecipitation

2.16.1 Immunoprecipitation

Approximately 10 x 10^6 HEK293 cells were centrifuged at 700 rcf for 5 minutes at room temperature. The pellet was uniformly resuspended in 700 µL of ice cold ChIP cell lysis buffer and evenly aliquoted into seven 1.5 mL tubes: one for each of the two asODNs for each of the three RNAs, plus one for the miRN367 control. The aliquots were then incubated on ice for 5 minutes before centrifugation at 5,000 rpm at 4ºC for 5 minutes. The supernatant was discarded and the pellet resuspended in 750 µL of ice cold nuclei lysis buffer. After 10 minutes of incubation on ice, the tubes were centrifuged for 5 minutes at 5,000 rpm at 4ºC. The supernatants containing the nuclear fractions were transferred to new tubes and 1 µL (100 nM) of the appropriate biotin labelled asODN was added. The samples were incubated for 30 minutes at room temperature on a rotating platform. Streptavidin magnetic beads were pre-washed and resuspended in an equal volume of nuclei lysis buffer. Each sample received 60 µL of the Streptavidin beads and was incubated for 15 minutes at room temperature on a rotating platform in order to capture the bound biotinylated asODN-RNA-protein complexes. Afterward, the magnetic beads underwent four wash stages (twice with 2X PBS + protease inhibitors and twice with 2X wash buffer), each time being incubated in 500 µL of the buffer for 5 minutes on a rotating platform before separation on magnetic racks and discarding of the supernatant. The proteins were eluted from the bound complexes by the addition of 100 µL 2X elution buffer and incubation at 65ºC for 10 minutes.

The eluates (50 µL of each) were taken to the Bioanalytical Mass Spectrometry Facility at the Mark Wainwright Analytical Centre at UNSW for protein composition analysis. Prior to trypsin digest, the samples were passed through a 10 kDa filter unit. The peptide results were matched against the NCBI database.


2.16.2 Validation of the Immunoprecipitated RNA

To verify that the correct RNA complexes were targeted by the biotin labelled asODNs, a portion of each of the previous immunoprecipitation eluates was reverse transcribed and then underwent PCR followed by gel electrophoresis.

Reverse transcription was carried out in two steps as described in section 2.13.3.

The samples then underwent PCR to amplify the cDNA. In new PCR tubes, 12.5 µL of 2 X Taq polymerase master mix, 6.5 µL of water, 4 µL of cDNA and 1 µL each of forward and reverse primers for each corresponding sample were combined for a total reaction volume of 25 µL. The tubes were placed in the thermocycler and heated to 95ºC for 2 minutes followed by 50 cycles of a denaturation step at 95ºC for 30 seconds, an annealing step at 54ºC for 30 seconds, and an extension step at 72ºC for 1 minute. At the end of the program, the samples were brought back down to a temperature of 4ºC.

Blue DNA loading dye (6X) was added to 5 µL of each sample before being loaded into lanes in a 1% agarose gel stained with Sybr Safe in DMSO (10,000 X) in TBE buffer. A 100 bp ladder was also loaded as a size reference. The gel was charged at 110 mV for approximately 40 minutes, until the dye had run halfway across. The gel was illuminated under UV light for cDNA band detection.

36 Chapter 3 Results

Results

3.1 Deep Sequencing Alignment

Immunoprecipitations of the transcription factors Gtf3a, Sp1, p53, myc, Klf3 and Klf1 along with an IgG control were performed in HEK293 cells. After deep sequencing and alignment quantification using Cufflinks, transcript enrichment from the IgG eluates was subtracted from that of the transcription factors of interest. Numerous transcript hits were found enriched over the IgG control (Table 3.1). Those with the greatest difference from the control were of interest for further investigation.

Transcription Factor RIP-Seq hits over 125 bp RIP-Seq hits over 1 kb

p53 37 28

Sp1 25 53

Klf1 27 29

Klf3 136 32

myc 6 92

Gtf3a 12 35

Table 3.1 Deep sequencing hits enriched 1.1X over IgG control per transcription factor.

The numerous deep sequencing enrichment peaks for the IgG control vs transcription factor immunoprecipitations were aligned with RefSeq genes (Figures 3.1-3.3). These significant enrichments were compared to known expressed sequence data from the UCSC genome browser for identification and further characterisation.


Based on this, the following transcription factor bound RNAs were chosen for study: Sp1 bound SF3B5, p53 bound HIST1H1D, and KLF3 bound RNY1.

Figure 3.1 p53 deep sequencing alignment mapped data (IgG vs p53 sequence peaks at HIST1H1D). A) The location of the HIST1H1D gene along chromosome six. B) The IgG immunoprecipitation results demonstrate the lack of expressed sequence hits at that locus, as indicated by the arrow. C) The p53 immunoprecipitation results reveal a bound sequence, visible by the enrichment of deep sequencing alignment hits. D) Aligned RefSeq expression data identifies the transcript as the HIST1H1D RNA.


Figure 3.2 Sp1 deep sequencing alignment mapped data (IgG vs Sp1 sequence peaks at SF3B5). A) The location of the SF3B5 gene along chromosome six. B) The IgG immunoprecipitation results demonstrate the non-existence of expressed sequence hits at that locus, as indicated by the arrow. C) The Sp1 immunoprecipitation results reveal a bound sequence, visible by the enrichment of deep sequencing alignment hits. D) Aligned RefSeq expression data identifies the transcript as the SF3B5 RNA.


Figure 3.3 KLF3 deep sequencing alignment mapped data (IgG vs KLF3 sequence peaks at RNY1). A) The location of the RNY1 gene along chromosome six. B) The IgG immunoprecipitation results demonstrate the absence of expressed sequence hits at that locus, as indicated by the arrow. C) The KLF3 immunoprecipitation results reveal a bound sequence, visible by the enrichment of deep sequencing alignment hits. D) Aligned RefSeq expression data identifies the transcript as the RNY1 RNA.


3.2 Determining the Efficacy of asODN Mediated RNA Knockdown

Before an investigation could be done into the effects of transcription factor associated RNA knockdown on their localisation to homologous loci, it was necessary to first validate that a significant reduction of the targeted RNAs would occur as a result of asODN transfection. The small antisense nucleotide strands should ideally bind to their complementary RNA sequence, thus sterically preventing protein interactions and targeting the RNA for degradation by RNase H.

The two antisense ODNs for each of the transcription factor associated RNAs were pooled and then transfected into the cell cultures. A control transfection with miRN367 was also carried out. After 72 hours, the cells were lysed and RNA abundance assessed. HIST1H1D, SF3B5 and RNY1 transcript expression in their respective asODN transfected samples was compared against the miRN367 transfected samples. RNA abundance of a constitutively expressed housekeeping gene, RPL10, was also compared between samples to provide a standard for relative expression levels and to demonstrate that the results of the transfections were due to specific targeting. After DNase treatment and cDNA conversion, variation in transcript abundance between the treatment and control samples was quantified with qRT-PCR using the software provided with the ViiA 7 Real-Time PCR machine.


Analysis of the qRT-PCR data was done using the comparative Ct method and relative quantification. First, the difference in Ct values (ΔCt) between the experimental and housekeeping genes was calculated for all samples.

ΔCt = Ct(sample) - Ct(RPL10)

Next, the difference in ΔCt between the treatment and control samples (ΔΔCt) was calculated.

ΔΔCt = ΔCt(treatment) - ΔCt(control)

The resulting fold-change in expression of the targeted RNA between the treatment and control samples was measured as 2^(-ΔΔCt).

Expression = 2^(-ΔΔCt) x 1000

The average expression from biological and technical replicates of each sample set was calculated and then graphed as a fraction of the control.

Error bars were added representing the standard error of the control and two tailed t-tests were performed as an indicator of significant data.
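The comparative Ct calculation described above can be sketched in a few lines of Python. This is a minimal illustration and not the analysis run in the ViiA 7 software: the function names and the Ct values in the usage lines are hypothetical, and the scale factor of 1000 simply mirrors the expression formula given in the text.

```python
def delta_ct(ct_target, ct_rpl10):
    """Delta Ct: target Ct minus the RPL10 housekeeping Ct for one sample."""
    return ct_target - ct_rpl10


def delta_delta_ct(dct_treatment, dct_control):
    """Delta-delta Ct: treatment delta Ct minus control delta Ct."""
    return dct_treatment - dct_control


def expression(ddct, scale=1000):
    """Scaled expression, 2^(-ddCt) x scale, as in the formula in the text."""
    return 2 ** (-ddct) * scale


def fraction_of_control(ddct):
    """Treatment expression as a fraction of the control (whose ddCt is 0)."""
    return expression(ddct) / expression(0)


# Hypothetical Ct values, chosen only to illustrate the arithmetic.
dct_treatment = delta_ct(30.0, 18.0)  # asODN transfected sample
dct_control = delta_ct(24.0, 18.0)    # miRN367 control sample
ddct = delta_delta_ct(dct_treatment, dct_control)
relative = fraction_of_control(ddct)
knockdown_percent = (1 - relative) * 100
print(f"fraction of control = {relative}, knockdown = {knockdown_percent}%")
```

Because the same scale factor is applied to treatment and control, it cancels when expression is graphed as a fraction of the control; the percentage knockdown is then one minus that fraction.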


[Figure 3.4 bar chart: Transcript knockdown in asODN transfected HEK293 cells. Y-axis: Relative Expression (Gene/RPL10). X-axis: Targeted RNAs by primer pair (HIST1H1D, SF3B5 and RNY1, each with F1/R1 and F2/R2). Series: Treatment and Control. Treatment data labels, with every control at 1.00: HIST1H1D 0.012 and 0.003; SF3B5 0.008 and 0.04; RNY1 0.563 and 0.842.]

Figure 3.4 Transcript knockdown in asODN transfected HEK293 cells measured by A) forward and reverse primer pair 1 and B) forward and reverse primer pair 2. Both show a significant decrease in HIST1H1D and SF3B5 transcripts but a weaker effect in RNY1.

Both HIST1H1D and SF3B5 transcripts showed a near complete knockdown after targeting with pooled asODNs in HEK293 cells. Transfection with the asODNs resulted in a 98.8% knockdown of HIST1H1D as detected by the F1/R1 primer pair and a 99.7% knockdown by the F2/R2 primers. SF3B5 showed a 99.2% and 96% reduction as indicated by the F1/R1 and F2/R2 primers respectively.

However, the targeted RNA knockdown was not as successful for RNY1. Transcript expression was not even halved for RNY1 in HEK293 cells, with reductions of 44% and 13% according to the two primer pairs. As presumed, expression of RPL10 was not affected by the transfections and remained the same across all samples.


Significant RNY1 knockdown was not achieved and since a successful knockdown of the transcripts was needed for the next step of assessing the effects on transcription factor localisation at its corresponding homologous loci, it was therefore decided to focus the rest of the investigation solely on HIST1H1D and SF3B5 transcripts.

3.3 Enrichment or Loss of Transcription Factor Localisation After RNA Knockdown

In order to test the hypothesis that these RNAs are involved in transcription factor binding at homologous loci, asODN mediated knockdown followed by ChIP was performed. Pooled asODNs targeting HIST1H1D and SF3B5 were transfected into HEK293 cells. A control transfection using miRN367 was also performed. After 48 hours, the cells were cross-linked as they had reached confluency. Any interactions between DNA, RNA and protein were fixed from that moment.

After nuclei lysis, sonication was used to shear the chromosomal DNA into fragments of approximately 200 bp so that those bound to the transcription factors of interest could be easily separated. Antibodies specific to p53 and Sp1 were used to immunoprecipitate the protein-DNA complexes. As a control for background, samples containing no antibodies were subjected to the same immunoprecipitation process. As a positive control, aliquots of the original nuclear fraction that had not undergone immunoprecipitation were also quantified. Bound DNA was then measured for the HIST1H1D and SF3B5 loci via qPCR using both primer pairs for each gene.

Analysis of the qPCR data was done using absolute quantification in reference to the standard curve of each primer pair. The Ct values of the standard curve were plotted according to dilution concentration. With its slope and intercept calculated, the standard curve was used to deduce the nanogram concentration of the corresponding samples.

DNA concentration = Ct(sample) x slope + intercept

To normalise against background, the DNA concentrations from the no antibody (beads alone) samples were subtracted from those of the antibody and also the input samples, with the difference signifying the DNA enrichment.
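The quantification and background subtraction steps above can be illustrated with a short Python sketch. It applies the linear relation exactly as stated in the text (concentration = Ct x slope + intercept); the slope, intercept and Ct values below are hypothetical placeholders, and the function names are assumptions made for illustration only. Many qPCR workflows instead fit Ct against the logarithm of concentration, which this sketch does not attempt.

```python
def concentration_from_ct(ct, slope, intercept):
    """DNA concentration (ng) from a Ct value, using the relation in the text."""
    return ct * slope + intercept


def enrichment(antibody_ng, beads_ng):
    """Background-corrected enrichment: antibody sample minus beads-alone sample."""
    return antibody_ng - beads_ng


def fraction_of_control(treatment_enrichment, control_enrichment):
    """Enrichment in the knockdown sample relative to the miRN367 control."""
    return treatment_enrichment / control_enrichment


# Hypothetical standard curve parameters; the slope is negative because a
# higher Ct corresponds to less starting DNA.
SLOPE, INTERCEPT = -0.5, 16.0

treated_ng = concentration_from_ct(26.0, SLOPE, INTERCEPT)  # knockdown, antibody
control_ng = concentration_from_ct(28.0, SLOPE, INTERCEPT)  # control, antibody
beads_ng = concentration_from_ct(30.0, SLOPE, INTERCEPT)    # beads alone

fold = fraction_of_control(
    enrichment(treated_ng, beads_ng),
    enrichment(control_ng, beads_ng),
)
print(f"enrichment as a fraction of control = {fold}")
```

A value above 1 would indicate a gain of transcription factor localisation after knockdown, and a value below 1 a loss.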


[Figure 3.5 bar chart: Transcription factor localisation at HIST1H1D and SF3B5 loci after knockdown. Y-axis: Enrichment as a Fraction of Control. X-axis: Cis-regulated Loci by Primer Pair (HIST1H1D F1/R1, HIST1H1D F2/R2, SF3B5 F1/R1, SF3B5 F2/R2). Series: Treatment and Control. Treatment data labels, with every control at 1.00: 2.617, 2.205, 0.884 and 0.334.]

Figure 3.5 Transcription factor localisation at the HIST1H1D and SF3B5 loci after asODN-mediated knockdown. The HIST1H1D locus displays an increase in p53 localisation after its RNA knockdown, while that of SF3B5 witnesses a decrease in Sp1 localisation.

Differential yet contrasting localisation enrichment was observed between the controls and treatments for both transcription factors after transfection of the asODNs. Localisation of p53 to the HIST1H1D locus was heightened relative to its control by over 2.5 fold as measured by the first primer pair (two tailed t-test, p < 0.03). The second primer pair indicated a 2.2 fold localisation enrichment over the control. The opposite effect was observed in SF3B5 asODN treated cells, with a loss of Sp1 localisation. The DNA concentration in the treated samples was 88% of that of the control when measured by the first set of primers, and just 33% when measured by the second (p < 0.05).


Localisation of p53 to the HIST1H1D locus was enriched after the asODN knockdown of its associated RNA; however, loss of localisation was experienced by Sp1 at the SF3B5 locus.

3.4 Enrichment or Loss at Homologous Loci

3.4.1 Determining Possible Homologous Targets

The UCSC Genome Browser was used to uncover other possible loci whose transcriptional activation may also have been affected by the knockdown due to inherent sequence homology to the transcription factor associated RNA. The HIST1H1D and SF3B5 sequences were copied into the human BLAT genome search and the queries matched according to the Dec. 2013 (GRCh38/hg38) assembly (79). Sequences for both are located in Appendix II.

BLAT Search Results

QUERY    SCORE  START  END  QSIZE  IDENTITY  CHRO  STRAND  START      END        SPAN
YourSeq  737    1      737  786    100.0%    6     -       144094881  144095617  737
YourSeq  26     641    670  786    89.3%     14    +       100654691  100654719  29
YourSeq  25     372    397  786    100.0%    17    -       32530046   32530079   34
YourSeq  24     444    467  786    100.0%    8     +       64196465   64196488   24
YourSeq  22     547    569  786    100.0%    18    -       5709092    5709116    25
YourSeq  21     576    596  786    100.0%    16    +       81590088   81590108   21
YourSeq  20     654    673  786    100.0%    5     +       63517980   63517999   20

Figure 3.6 UCSC Genome Browser BLAT query of HIST1H1D sequence. Besides the top hit of its own locus, the HIST1H1D transcript shares a large span of homology with 3 other loci.

BLAT Search Results

QUERY    SCORE  START  END  QSIZE  IDENTITY  CHRO  STRAND  START      END        SPAN
YourSeq  777    1      777  777    100.0%    6     -       26234212   26234988   777
YourSeq  324    182    775  777    80.4%     6     -       26055741   26056305   565
YourSeq  193    155    777  777    85.7%     6     -       27866792   27867424   633
YourSeq  145    368    777  777    83.8%     6     +       26156700   26157115   416
YourSeq  22     754    777  777    95.9%     6     +       27865775   27865798   24
YourSeq  21     757    777  777    100.0%    1     -       149840687  149840707  21
YourSeq  21     757    777  777    100.0%    1     -       149842218  149842238  21
YourSeq  21     757    777  777    100.0%    1     -       149886183  149886203  21
YourSeq  21     757    777  777    100.0%    6     +       27810516   27810536   21
YourSeq  21     757    777  777    100.0%    1     +       228458538  228458558  21
YourSeq  21     757    777  777    100.0%    1     +       149851574  149851594  21
YourSeq  21     757    777  777    100.0%    1     +       149853105  149853125  21
YourSeq  20     26     45   777    100.0%    10    -       106609003  106609022  20

Figure 3.7 UCSC Genome Browser BLAT query of SF3B5 sequence. The SF3B5 RNA shares homology with various genes over significantly shorter spans, thus decreasing the likelihood that it regulates these loci via transcription factor interaction.

The UCSC BLAT results of the HIST1H1D and SF3B5 RNAs showed significant match homology (80% similarity to the query or greater) for multiple loci in the human genome. The highest score hit with 100% identity was the gene of interest and the matches that followed varied in alignment span along with a decreasing score. A total of 12 loci matches with a homologous span of 20+ nucleotides were found for HIST1H1D and 6 for SF3B5.

Viewing the matches in the UCSC genome browser identified the homologous loci and, understandably, the majority of matches for the HIST1H1D sequence were of functionally related proteins. All but one of the matches were of the histone family, including some from histone clusters 2 and 3, which were located on multiple chromosomes. Interestingly, multiple long span matches were found for the HIST1H1D RNA, which showed significant sequence homology with 3 other loci across a span of 400 bases or greater, the majority of its transcript length. The long span matches for HIST1H1D were identified as linker histones HIST1H1C, HIST1H1B, and HIST1H1E with the top 3 alignment spans of 565, 633 and 416 respectively. All three loci are protein coding and located on chromosome 6, however HIST1H1E is in the opposite strand orientation.

The SF3B5 sequence query in BLAT returned no long span matches. All of the homologous hits were of 34 nucleotides or less and after identification in the UCSC genome browser, were found to be mainly intronic. None of these loci were of functionally related genes.

Due to their long span high sequence homology as well as functional similarity, the HIST1H1B, HIST1H1C and HIST1H1E loci were further investigated for potential transcriptional effects as a result of HIST1H1D knockdown.
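The selection of long span homologous loci can be sketched as a simple filter over BLAT hits. The rows below are transcribed from the BLAT output whose long spans (565, 633 and 416) match those the text attributes to HIST1H1D; treating that output as the HIST1H1D query is an assumption. The thresholds of 400 bases and 80% identity come from the text, while the `BlatHit` type and the function name are hypothetical and only serve the illustration.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    identity: float  # percent identity
    chrom: str
    strand: str
    start: int
    end: int
    span: int


# Rows transcribed from the BLAT output (score, identity, chromosome, strand,
# genomic start, genomic end, span).
HITS = [
    BlatHit(777, 100.0, "6", "-", 26234212, 26234988, 777),
    BlatHit(324, 80.4, "6", "-", 26055741, 26056305, 565),
    BlatHit(193, 85.7, "6", "-", 27866792, 27867424, 633),
    BlatHit(145, 83.8, "6", "+", 26156700, 26157115, 416),
    BlatHit(22, 95.9, "6", "+", 27865775, 27865798, 24),
    BlatHit(21, 100.0, "1", "-", 149840687, 149840707, 21),
    BlatHit(21, 100.0, "1", "-", 149842218, 149842238, 21),
    BlatHit(21, 100.0, "1", "-", 149886183, 149886203, 21),
    BlatHit(21, 100.0, "6", "+", 27810516, 27810536, 21),
    BlatHit(21, 100.0, "1", "+", 228458538, 228458558, 21),
    BlatHit(21, 100.0, "1", "+", 149851574, 149851594, 21),
    BlatHit(21, 100.0, "1", "+", 149853105, 149853125, 21),
    BlatHit(20, 100.0, "10", "-", 106609003, 106609022, 20),
]


def long_span_hits(hits, min_span=400, min_identity=80.0):
    """Return hits other than the query's own locus that pass both thresholds."""
    own_locus = max(hits, key=lambda h: h.score)  # top-scoring hit is the query itself
    return [
        h for h in hits
        if h is not own_locus and h.span >= min_span and h.identity >= min_identity
    ]


candidates = long_span_hits(HITS)
for hit in candidates:
    print(f"chr{hit.chrom}:{hit.start}-{hit.end} span={hit.span} identity={hit.identity}%")
```

Excluding the top-scoring hit removes the query's own locus, leaving only the three long span matches on chromosome 6 as candidates for further study.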

3.4.2 Enrichment or Loss of Transcription Factor Localisation at Homologous Loci

To explore a potential regulatory network involving HIST1H1D RNA, the enrichment or loss of transcription factor localisation to its homologous HIST1H1B, HIST1H1C and HIST1H1E loci was measured by qPCR of the ChIP eluates after its asODN mediated knockdown. The ChIP was carried out in the same manner as section 3.3, but primer pairs for HIST1H1B-E were used to measure amplification of these loci in the qPCR step. Enrichment differential was calculated as a fraction of the control as done previously.


[Figure 3.8 bar chart: Enrichment at Homologous Loci After HIST1H1D asODN Transfection. Y-axis: Enrichment as a Fraction of Control. X-axis: HIST1H1 Homologous Loci by Primer Pair (H1B F1/R1, H1B F2/R2, H1C F1/R1, H1E F1/R1, H1E F2/R2). Series: Treatment and Control. Treatment data labels, with every control at 1.00: 7.490, 4.062, 2.864, 2.875 and 1.205.]

Figure 3.8 Transcription factor enrichment at homologous loci after HIST1H1D asODN-mediated knockdown. p53 binding at the HIST1H1B, HIST1H1C and HIST1H1E loci was increased after HIST1H1D RNA knockdown.

After knockdown of the HIST1H1D transcript, not only was an increase in p53 localisation to its own locus observed, but also an increase to those of the homologous HIST1H1B, HIST1H1C and HIST1H1E. The greatest localisation enrichment was at the HIST1H1B locus, with an over 7 fold increase over the control as measured by primer pair 1 and a 4 fold increase by primer pair 2. Transcription factor binding was found to increase by over two fold in the other HIST1H1 loci.


3.5 Expression at Homologous Loci

Once evidence had demonstrated that p53 bound HIST1H1D affects the transcription factor’s localisation to homology containing loci, the next step was to investigate the transcriptional outcome. RNA expression for the HIST1H1B, HIST1H1C and HIST1H1E loci was measured by qRT-PCR after asODN-mediated HIST1H1D knockdown. The same samples from the knockdown efficacy step (section 3.2) were reverse transcribed, converting the mRNA contents into cDNA. Analysis by qRT-PCR was carried out under the previous conditions. Primer pairs for the HIST1H1B, HIST1H1C and HIST1H1E loci detected the RNA expression levels, which were compared to those of the housekeeping gene RPL10. Expression for the loci of interest was calculated as done in section 3.2 and represented as a fraction of the control.


[Figure 3.9 bar chart: Expression at Homologous Loci After HIST1H1D asODN Knockdown. Y-axis: Expression as a Fraction of Control. X-axis: HIST1H1 Homologous Loci by Primer Pair (H1B F1/R1, H1B F2/R2, H1C F1/R1, H1E F1/R1, H1E F2/R2). Series: Treatment and Control. Treatment data labels, with every control at 1.00: 2.582, 1.653, 1.028, 0.665 and 0.492.]

Figure 3.9 Expression at homologous loci after HIST1H1D asODN-mediated knockdown. Increased transcription factor localisation led to contrasting expression results between the HIST1H1B, HIST1H1C and HIST1H1E loci.

Transcription at the homologous HIST1H1B, HIST1H1C and HIST1H1E loci was altered by HIST1H1D RNA knockdown and the subsequent increase in p53 localisation. HIST1H1C RNA abundance rose to over 250% of its control, while one HIST1H1E primer pair reported over 160% of its control. Interestingly, expression of HIST1H1B RNA decreased by 51% and 35% according to its two primer pairs, in contrast to the other loci.


3.6 Biotinylated asODN-RNA Pulldown

Whereas the ChIP experiments had immunoprecipitated the transcription factor in order to investigate the bound DNA or RNA, it was also of interest to perform the reverse and pull down the RNA to validate the bound protein. Biotin labelled asODNs designed to bind to the RNAs of interest were added to the un-crosslinked cell lysates, then magnetically separated and washed. Mass spectrometry of the eluates was performed to verify the formation of these RNA-transcription factor complexes as well as to identify other potential protein partners. The resulting peptide summaries did not include hits for p53, Sp1 or RNY1 but instead contained common human proteins, mainly keratin.

To validate that the correct RNAs were pulled down by the biotinylated asODNs, aliquots of the eluates were reverse transcribed and then amplified by PCR before being run on an agarose gel along with a 100 bp ladder. The gel was visualised under UV light for the presence and size comparison of bands indicating correctly targeted sequences.

Only a few faint bands were visible and were nearest the smallest step on the ladder, approximately 100 bp. Overall, the quality and clarity of the cDNA in the agarose gel was low.

54 Chapter 4 Discussion

Discussion

RNAs found bound to transcription factors are not necessarily coincidental, and may in fact possess roles which regulate the transcription of their own loci or others. In this study, the p53, Sp1 and KLF3 transcription factors were immunoprecipitated and deep sequenced in order to uncover significant bound RNAs. The HIST1H1D, SF3B5 and RNY1 transcripts were knocked down in order to evaluate the localisation of their respective bound transcription factors, as the RNAs may be acting as guides, decoys, or otherwise to affect loci targeting. Other homology containing loci were also investigated to discover whether the knockdown had a trans-regulatory function.

4.1 RNA Directed Transcriptional Activation and Repression

The role of RNA is far from singular. The varying forms of these DNA complements are involved in an ever increasing list of critical biological processes aside from the protein synthesis stream of the central dogma. The list of RNA categories is ever growing, with research continually elucidating their specific functions. More than just the familiar players in translation, RNAs have also been found to act as transcriptional mediators through their ability to act as decoys, scaffolding, guides and signals in transduction cascades. Direct involvement of non-coding transcripts in both gene silencing and activation pathways has been demonstrated, along with their misregulation being linked to tumourigenesis. This, taken along with ENCODE data which reports non-coding transcript expression as not only overwhelming, but also tissue specific and conditional, suggested that there are still many aspects of the vast human transcriptome that are uncharacterised. Evidently, there was much more to the single stranded nucleotide transcripts than had been previously considered.

A body of evidence has recently been building which demonstrates the ability of ncRNA to mediate transcription in both the short and long term. Morris et al. described a mechanism in which antisense ncRNA molecules recruited and guided chromatin remodelling protein complexes to complementary loci, thus causing silent state epigenetic changes or activation through derepression of target loci (25, 36, 80). RNAs had also been demonstrated to act as decoys, drawing regulatory proteins away from the target loci like a sponge (81). Assigning such major regulatory roles to these RNAs, which were previously thought to have been merely transcriptional noise, marked a new chapter in furthering our understanding of the complex web of interactions that essentially turns genes ‘on’ or ‘off’.

Since transcription factors are the DNA binding proteins involved in the initiation or repression of transcription, it was logical to investigate whether RNAs could potentially play a role in their targeting as well. In order to identify any transcription factor bound RNAs, a RIP-Seq was carried out which immunoprecipitated the proteins of interest. Originally, this was done on six transcription factors: p53, Sp1, KLF3, KLF1, myc and Gtf3a, all of which are widely expressed and can transcriptionally activate or repress a range of genes including those involved in cell cycle progression, differentiation and apoptosis (65, 82). After the deep sequencing hits were aligned to the human genome and compared against the control, the greatest transcript enrichment differentials were identified with the UCSC genome browser. Although the original hypothesis was based on the assumption of ncRNAs binding to transcription factors in order to modulate gene activation or repression, some of the most striking transcript differentials were actually identified as protein coding. Out of the significant RIP-Seq hits, it was HIST1H1D, SF3B5 and RNY1 that were chosen for this investigation and all were interestingly protein coding. With research constantly finding new functions for various RNA subtypes, it was possible too that these protein encoding transcripts could possess a role other than as simple protein blueprints. Surely, not all mRNA would be destined for translation, and some may also act like the previously characterised non-coding transcripts, able to modulate transcription utilising their inherent complementarity and secondary structure.

Whether they act as protein recruiters, guides, decoys or otherwise, the current mechanisms for ncRNA-mediated transcriptional regulation all depend on the use of their sequence complementarity to interact with the necessary partners. In order to assess the modulating ability of the RNAs in this study, it was hypothesised that the most complementary loci, their own, have the highest probability of being subject to this control.

One way to determine whether such an RNA-transcription factor complex is involved in regulating its own locus is to remove the RNA from the transcriptional landscape and observe any subsequent changes in expression. In this study, the knockdown was achieved using short asODNs uniquely complementary to each RNA, which bind to the transcript, rendering it ineffective or targeting it for degradation.


Before any effects on transcription factor localisation could be assessed, it was necessary to first validate the efficacy of the asODNs in knocking down their targets. To do this, two pooled asODNs per RNA as well as the control miRN367 were transfected into HEK293 cell cultures. After 72 hours, the cells were lysed and RNA contents isolated. The resulting differential RNA expression was measured by qRT-PCR and compared to that of a constitutively expressed housekeeping gene, RPL10, serving as a benchmark.

As expected, the expression of RPL10 was consistent between samples, while the RNAs targeted by the asODNs had substantially declined. Because the HIST1H1D and SF3B5 knockdown was specific and near complete, it could be assumed that for the remaining experiments, any resulting changes in localisation would not only be more clearly pronounced, but also a direct consequence of the RNA’s absence.

Verification of targeted and complete knockdown of the investigated RNAs made it possible to further evaluate the hypothesis that the HIST1H1D and SF3B5 transcripts may be involved in modulating the action of their transcription factors when bound. Once again, the asODNs were pooled and transfected along with separate miRN367 controls. When the cells had reached confluency, cross linking was done to fix the interactions, much like a collective snapshot of the goings on within each cell at that moment. Antibodies were used to separate the transcription factors from the nuclear fraction after cell lysis and sonication, which had sheared the DNA into smaller fragments of approximately 200 bp. Because of the crosslink, any interacting partners would remain bound to p53 and Sp1, including the DNA loci they were regulating. A transcription factor bound to DNA is a direct indicator of either transcriptional activation or repression; therefore, pulling down p53 or Sp1 and quantifying the targeted DNA sequences makes it possible to measure localisation differentials, in other words changes in transcriptional regulation at that locus. The resulting qPCR amplification data was used to compare the deduced amount of targeted DNA within the knockdown samples against that of the controls.

Transcription factor localisation to both of the target loci was affected by the knockdown of the associated RNA, but surprisingly with opposite results. While knocking down SF3B5 RNA led to a loss of Sp1 localisation at that locus, the HIST1H1D knockdown actually enriched p53 at its locus. The proposed mechanism, in which these regulatory RNAs recruit protein partners (in this case transcription factors) and guide the complex to homology containing loci, logically suggests that knocking down the RNA would lead to a loss of localisation enrichment. This was true for the Sp1 bound SF3B5, giving way for the consideration of this possible mechanism. On the other hand, HIST1H1D experienced a gain of p53 at its locus, so a different mechanism must have been responsible.

Past studies have demonstrated the ability of some lncRNAs to regulate expression of other loci by endogenous competition for a protein’s interaction. With their sequence similarity inherently mimicking the intended locus, these RNAs bind and consequently sequester regulatory proteins away from the target. As mentioned in a prior example, the lncRNA PANDA acts as a decoy, keeping the transcription factor NF-YA from its target gene promoters (29-30, 83). Because localisation of p53 to the HIST1H1D locus was found to increase after HIST1H1D RNA knockdown, it is possible that the HIST1H1D RNA was acting as a decoy, sequestering the transcription factor from its target.


4.2 Trans-regulation at Homologous Loci

Both SF3B5 and HIST1H1D RNAs were found to be involved in regulating expression through transcription factor binding in cis but it was also necessary to investigate whether these RNAs may target other loci in trans. Examination of both RNAs in the UCSC genome browser revealed plausible loci which may be subject to their regulation based on close homology. The SF3B5 sequence query returned only small, mostly intronic matches of unrelated genes so it was assumed that these were a matter of chance and not mediated by the RNA. Search of HIST1H1D returned three loci matches, all with high homology, long span length and related function. It was very conceivable that these genes - HIST1H1B, HIST1H1C and HIST1H1E - could be transcriptionally mediated by the HIST1H1D sequestration of p53.

To test the effects of HIST1H1D RNA knockdown on the localisation of p53 to the homologous HIST1H1B, HIST1H1C and HIST1H1E loci, another ChIP was performed on HEK293 cells 48 hours after transfection with HIST1H1D asODNs or the miRN367 control. The transcription factor bound DNA was isolated and quantified by qPCR for each of the HIST1H1 loci, then compared against the miRN367 control. The results revealed an increase in p53 localisation at all three of these homologous genes. A loss of HIST1H1D RNA led to p53 enrichment at the HIST1H1B-E loci, meaning the RNA was involved in regulating its own locus in cis as well as those of its related loci in trans across two chromosomes.


The increased localisation of p53 to the HIST1H1B-E loci which was observed after HIST1H1D RNA knockdown was not enough to conclude the transcriptional outcomes for those genes. Differentials in expression at the homologous loci also needed to be determined. The same eluates from the first knockdown assay were used since the nuclear mRNA landscape in both treatment and control cells had been captured. Whereas earlier the HIST1H1D mRNA was measured to validate its complete knockdown after asODN transfection, this second qRT-PCR measured the activation or repression of HIST1H1B, HIST1H1C and HIST1H1E mRNA expression.

Transcription of the HIST1H1 genes did not uniformly correlate with p53 localisation. Although the transcription factor was able to bind to the target loci more readily after HIST1H1D RNA knockdown, mRNA expression from those genes was conflicting. While HIST1H1B witnessed a seven fold increase in p53 binding, its expression was surprisingly knocked down to less than half of its control (p<0.005). Although these genes share extensive homology, this could not have been due to a side-effect of asODN mis-targeting as the asODNs were designed to be uniquely specific to HIST1H1D. The other two loci underwent an increase in expression, with the HIST1H1C transcript enriched two fold and HIST1H1E over 1.5 fold.

Contrary to some reports describing the transcription factor as solely an activator, here p53 localisation is shown to both repress and activate these homologous linker histone genes after its sequestering HIST1H1D RNA is knocked down. This demonstrates an endogenous mechanism of transcriptional regulation by mRNA effectively acting as a non-coding antisense transcript to titrate regulatory proteins away from target loci in cis and trans.

4.3 mRNA Acting as ncRNA

The post-Human Genome Project revelations, which proved that ncRNA is biologically functional, broadened the entire field of transcriptomics and demonstrated that our understanding of classical RNA biology may need updating. Just as we found that not all ncRNA is transcriptional noise, it is becoming apparent that not all mRNA serves to build protein.

This case for functional duality in mRNA has only recently been established and is exemplified by transcripts such as MYCNOS. In humans, the neuronal MYC (MYCN) gene is found to be transcriptionally regulated by MYCN opposite strand (MYCNOS), a protein-coding RNA generated by the MYCN locus (84). When translated, MYCNOS inhibits GSK3beta, which stabilises the MYCN protein, an indicator of poor prognosis in neuroblastoma. However, when acting as a non-coding functional regulator, the antisense RNA recruits various proteins such as G3BP1 to the MYCN promoter and may act as a scaffold for protein localisation which suppresses transcription at the target locus (84). Other protein coding RNAs have also been found to function as non-coding regulators, such as VegT in frogs, which utilises its secondary structure to relay signals for cytokeratin filaments within the oocyte (84).


Both HIST1H1D and SF3B5 genes encode protein, however this investigation has confirmed the transcripts also act as endogenous antisense RNA regulators at homology containing loci. HIST1H1D RNA may either be translated into the H1D linker protein, or modulate transcription as a ncRNA decoy of the HIST1H1B-E loci. Similarly, SF3B5 translates to the splicing factor 3b subunit 5, or alternatively may be utilised as a non-coding guide aiding Sp1 localisation. Although examples of this RNA coding vs non-coding dichotomy are emerging, the conditional mechanisms balancing the functional fate are still unclear.

It seems from the data presented that SF3B5 RNA acts as an Sp1 guide and is involved in a singular feedback relationship with its protein-coding locus, as no other loci were discovered to be under its control. On the other hand, HIST1H1D RNA regulates p53 binding by acting as a target decoy and also supports transcriptional regulation at multiple homology-containing genes, suggesting the presence of a response cascade.

Both of these mechanisms are illustrated in Figure 4.1.


[Figure 4.1: schematic panels 1a and 1b (Sp1, SF3B5, Target) and panels 2a and 2b (p53, HIST1H1D, Target)]

Figure 4.1 SF3B5 RNA as a protein guide and HIST1H1D as a target decoy. 1a) SF3B5 RNA binds to Sp1. 1b) SF3B5 RNA leads the transcription factor Sp1 to the target locus utilising homologous sequence complementarity. 2a) HIST1H1D RNA contains homology to the target loci. 2b) p53 binds instead to HIST1H1D RNA.
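Both the guide and the decoy model rest on sequence homology between the RNA and its target locus. As a minimal illustration only, the Python sketch below counts the k-mers a transcript shares with a locus on either strand. The function names, the k-mer length and the toy sequences are assumptions made for this example; they are not the sequences or the alignment procedure used in this study.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def kmers(seq: str, k: int) -> set:
    """Return the set of all length-k substrings of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(transcript: str, locus: str, k: int = 8) -> dict:
    """Find k-mers of the transcript that also occur in the locus.

    'sense' holds k-mers identical to the locus strand as given;
    'antisense' holds k-mers matching its reverse complement.
    """
    transcript_kmers = kmers(transcript, k)
    return {
        "sense": transcript_kmers & kmers(locus, k),
        "antisense": transcript_kmers & kmers(reverse_complement(locus), k),
    }


# Toy sequences: hypothetical and far shorter than any real transcript.
toy_transcript = "AAGGCGAAGAAGGCAGGC"
toy_locus = "TTTAAGGCGAAGAAGTTT"
hits = shared_kmers(toy_transcript, toy_locus, k=8)
print(len(hits["sense"]), len(hits["antisense"]))
```

Exact k-mer matching is a deliberately crude stand-in for alignment: it tolerates no mismatches or gaps, so a dedicated alignment tool would be the appropriate choice for real loci.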


4.4 HIST1H1D in Cell Senescence

In addition to its primary function of encoding chromosome-packaging linker proteins, the histone H1 gene family plays a role in tumour suppression through cell senescence and apoptosis, which plausibly ties back to the observed regulatory activities of HIST1H1D RNA.

Cellular senescence is a tumour suppressive response mechanism that prevents over-proliferative and stress-damaged cells from becoming immortalised by inducing cell-cycle arrest (85). The tumour suppressive qualities of cell senescence rely on chromosomal maintenance of growth arrest signals, in which proliferative genes are compacted into inactive heterochromatin states (86). It was recently reported that the histone H1 family is transcriptionally repressed in senescence-induced cells and that linker histones are actually lost from critical areas of chromatin, leaving a unique condensation signature called senescence-associated heterochromatic foci (SAHF) (86). Loss of de novo H1 proteins in senescent cells is a direct result of repressed mRNA synthesis at those loci (86). Funayama et al. found that HIST1H1D mRNA was primarily repressed in RasG12V-transfected senescent WI-38 cells as compared to their proliferative control. Furthermore, the study found HIST1H1D mRNA was undetectable in the senescence-induced WI-38 cells while the amount of its protein remained similar to its control, suggesting that repression of H1 protein synthesis may also be coupled with post-translational modifications (86).


It is reasonable to postulate that this bifunctional HIST1H1D RNA may be the messenger involved in a linker histone mediated cellular senescence cascade orchestrated by the tumour suppressive transcription factor p53. In this investigation, the HIST1H1D mRNA was found to also act as an ncRNA which sequestered p53 from its own locus as well as from its related H1 family loci. In 'normal' proliferating cells, including the miRN367 control, the balance of HIST1H1D mRNA transcription kept the senescence-inducing p53 from localising at the H1 genes. However, once the RNA was lost via asODN knockdown, p53 was engaged at the H1 loci, possibly initiating its tumour suppressive cascade and subsequent SAHF formation.

The involvement of HIST1H1D RNA in transcriptionally regulating the H1 family of genes within other p53-controlled networks has yet to be defined but, as with all p53 pathways, the possibilities are therapeutically promising.

4.5 Looking Ahead

In light of these data, further experimentation is still needed to verify the mechanisms and players involved, as well as to investigate viable therapeutic applications. Throughout the study, asODN-mediated knockdown was used to observe transcriptional ramifications, but it would also be useful to overexpress the RNAs to assess the opposing effects. This may uncover the mechanism by which the bimodal RNAs are balanced between non-coding and coding functions. Secondly, mass spectrometry of the ChIP eluates may elucidate protein partners involved in formation of the RNA-transcription factor complex and its localisation to the target. This may be especially useful with regard to the SF3B5 guiding of Sp1 to its locus. Little is known about this mechanism, as SF3B5 may recruit more than one regulatory protein for transcriptional activation and possibly act as a scaffold at the gene's promoter. It would also be of interest to repeat the biotin-labelled asODN pulldown of the investigated RNAs, which would further validate the RNA-transcription factor interaction as well as identify any other bound proteins, including Bax and BCL-xL, which interact with the H1 and p53 proteins in the apoptosis pathway. Unlike the previous attempt, it may be useful to crosslink the reaction before the immunoprecipitation and to use a larger starting culture, since the RNAs are not abundant and a considerable portion of the sample may be lost during processing.

Therapeutically, it would be intriguing to investigate whether the delivery of HIST1H1D asODNs could be used to induce cell senescence in immortalised or otherwise toxically damaged cells in order to suppress tumour growth. In this study, knockdown of HIST1H1D alone was enough to clearly affect p53 localisation and H1 gene expression, suggesting it could be used to specifically target a tumourigenesis pathway by obstructing uninhibited cellular growth signals through SAHF formation. After HIST1H1D asODN transfection, cell viability would need to be measured at progressive time points to gauge the induction of cellular senescence. A different cell line may also be used in this case, as HEK293 cells are easy to transfect and test for preliminary effects but do not accurately represent all conditions in vivo.
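Because an asODN is by definition complementary to its target RNA, candidate oligonucleotide sequences can be read directly off a transcript sequence such as those listed in Appendix B. The Python sketch below is only an illustration under stated assumptions: the function name, the window coordinates and the default 20-nt length are hypothetical, it does not reproduce the asODNs designed for this study, and it performs no screening for off-target binding or secondary structure.

```python
def antisense_oligo(transcript: str, start: int, length: int = 20) -> str:
    """Return the DNA oligo complementary to transcript[start:start+length].

    The transcript is given 5'->3' as cDNA (T in place of U). The oligo is
    also returned 5'->3', i.e. as the reverse complement of the window.
    """
    window = transcript.upper()[start:start + length]
    if len(window) < length:
        raise ValueError("window extends past the end of the transcript")
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(window))


# Hypothetical toy transcript, not a sequence from this study.
toy_transcript = "AAGGCCTTACGATCGGATCCAATTGG"
print(antisense_oligo(toy_transcript, start=3))
```

Sliding the `start` argument along a transcript yields a series of candidate oligonucleotides, each of which would still need experimental validation.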

Chapter 5 Conclusion

This study aimed to discover novel transcription factor associated RNAs, with the notion that these were functional non-coding sequences utilised as guides, decoys or scaffolds to modulate regulatory protein localisation and ultimately expression at homology-containing loci. Primary deep sequencing analysis of ChIP eluates uncovered multiple transcripts, but HIST1H1D, SF3B5 and RNY1 were chosen for investigation due to their staggering enrichment differentials over the IgG control. Interestingly, all three were identified to be protein coding. These RNAs were knocked down in order to assess the effects on transcription factor localisation. Although the RNY1 (HY1) knockdown was not successful, the HIST1H1D and SF3B5 loci experienced contrasting localisation results, suggesting each RNA played a distinct functional role: SF3B5 acted as a guide, while HIST1H1D acted as a decoy. The homologous H1 gene family was also assessed, which elucidated a network regulated by the HIST1H1D RNA. The H1 loci all experienced an increase in p53 localisation with varying expressional outcomes. In the case of HIST1H1D, this has clear ties to tumour suppression with therapeutic possibilities.

HIST1H1D and SF3B5 are just two more examples of the surprisingly multifunctional RNA molecule in the increasingly complex and flexible interplay between genome regulation and cellular signalling. Further studies are needed to assess the roles of other transcripts, protein coding or otherwise, which are bound to regulatory proteins, as it becomes increasingly evident that these unsung heroes are far more than simply transcriptional noise.


References

1. Stein, L. D. (2004). Human genome: End of the beginning. Nature, 431(7011), 915-916. doi:10.1038/431915a

2. Hellsten, U., Harland, R. M., Gilchrist, M. J., Hendrix, D., Jurka, J., Kapitonov, V., . . . Rokhsar, D. S. (2010). The genome of the Western clawed frog Xenopus tropicalis. Science, 328(5978), 633-636. doi:10.1126/science.1183670

3. Hillier, L. W., Miller, W., Birney, E., Warren, W., Hardison, R., Ponting, C. P. (2004). Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature, 432, 695-716. doi:10.1038/nature03154 (International Chicken Genome Sequencing Consortium, 2004)

4. Koonin, E. V. (2012). Does the central dogma still stand? Biology Direct, 7, 27. doi:10.1186/1745-6150-7-27

5. Fu, X.-D. (2014). Non-coding RNA: a new frontier in regulatory biology. National Science Review, 1(2), 190-204. doi:10.1093/nsr/nwu008

6. Lodish, H., Berk, A., Zipursky, S. L., et al. (2000). Molecular Cell Biology (4th ed.). New York: W. H. Freeman. Section 11.2, Processing of Eukaryotic mRNA.

7. Lander, E. S., Linton, L. M., Birren, B., et al.; International Human Genome Sequencing Consortium (2001). Initial sequencing and analysis of the human genome. Nature, 409(6822), 860-921. doi:10.1038/35057062

8. Djebali, S., Davis, C. A., Merkel, A., Dobin, A., Lassmann, T., Mortazavi, A., . . . Gingeras, T. R. (2012). Landscape of transcription in human cells. Nature, 489(7414), 101-108. doi:10.1038/nature11233


9. ENCODE Project Consortium (2012). An integrated encyclopedia of DNA elements in the human genome. Nature, 489(7414), 57-74. doi:10.1038/nature11247

10. Taft, R. J., Pheasant, M., & Mattick, J. S. (2007). The relationship between non-protein-coding DNA and eukaryotic complexity. BioEssays, 29, 288-299. doi:10.1002/bies.20544

11. Cawley, S., Bekiranov, S., Ng, H. H., et al. (2004). Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of non-coding RNAs. Cell, 116(4), 499-509.

12. Nakaya, H. I., Amaral, P. P., Louro, R., et al. (2007). Genome mapping and expression analyses of human intronic non-coding RNAs reveal tissue-specific patterns and enrichment in genes related to regulation of transcription. Genome Biol, 8(3), R43. doi:10.1186/gb-2007-8-3-r43

13. Fatica, A., & Bozzoni, I. (2014). Long non-coding RNAs: new players in cell differentiation and development. Nat Rev Genet, 15(1), 7-21. doi:10.1038/nrg3606

14. Brannon, M. J., et al. (2015). Identification of long non-coding RNAs dysregulated in the midbrain of cocaine users. Journal of Neurochemistry, 135, 50-59. doi:10.1111/jnc.13255

15. Deng, G., & Sui, G. (2013). Non-coding RNA in oncogenesis: A new era of identifying key players. Int J Mol Sci, 14, 18319-18349. doi:10.3390/ijms140918319

16. Weinberg, M. S., & Wood, M. J. (2009). Short non-coding RNA biology and neurodegenerative disorders: novel disease targets and therapeutics. Hum Mol Genet, 18(R1), R27-39. doi:10.1093/hmg/ddp070

17. Esteller, M. (2011). Non-coding RNAs in human disease. Nat Rev Genet, 12(12), 861-874. doi:10.1038/nrg3074


18. Zaramela, L. S., Vencio, R. Z., ten-Caten, F., Baliga, N. S., & Koide, T. (2014). Transcription start site associated RNAs (TSSaRNAs) are ubiquitous in all domains of life. PLoS One, 9(9), e107680. doi:10.1371/journal.pone.0107680

19. Preker, P., Almvig, K., Christensen, M. S., Valen, E., Mapendano, C. K., Sandelin, A., & Jensen, T. H. (2011). PROMoter uPstream Transcripts share characteristics with mRNAs and are produced upstream of all three major types of mammalian promoters. Nucleic Acids Res, 39(16), 7179-7193. doi:10.1093/nar/gkr370

20. Ulitsky, I., & Bartel, D. P. (2013). lincRNAs: genomics, evolution, and mechanisms. Cell, 154(1), 26-46. doi:10.1016/j.cell.2013.06.020

21. Ponting, C. P., Oliver, P. L., & Reik, W. (2009). Evolution and functions of long non-coding RNAs. Cell, 136(4), 629-641. doi:10.1016/j.cell.2009.02.006

22. Wang, K. C., & Chang, H. Y. (2011). Molecular mechanisms of long non-coding RNAs. Mol Cell, 43(6), 904-914. doi:10.1016/j.molcel.2011.08.018

23. Autuoro, J. M., Pirnie, S. P., & Carmichael, G. G. (2014). Long non-coding RNAs in imprinting and X chromosome inactivation. Biomolecules, 4(1), 76-100. doi:10.3390/biom4010076

24. Froberg, J. E., Yang, L., & Lee, J. T. (2013). Guided by RNAs: X-inactivation as a model for lncRNA function. J Mol Biol, 425(19), 3698-3706. doi:10.1016/j.jmb.2013.06.031

25. Koerner, M. V., Pauler, F. M., Huang, R., & Barlow, D. P. (2009). The function of non-coding RNAs in genomic imprinting. Development, 136(11), 1771-1783. doi:10.1242/dev.030403


26. Morris, K. V. (2009). RNA-directed transcriptional gene silencing and activation in human cells. Oligonucleotides, 19(4), 299-306. doi:10.1089/oli.2009.0212

27. Yu, W., Gius, D., Onyango, P., Muldoon-Jacobs, K., Karp, J., Feinberg, A. P., & Cui, H. (2008). Epigenetic silencing of tumour suppressor gene p15 by its antisense RNA. Nature, 451(7175), 202-206. doi:10.1038/nature06468

28. Rinn, J. L., & Chang, H. Y. (2012). Genome regulation by long non-coding RNAs. Annu Rev Biochem, 81, 145-166. doi:10.1146/annurev-biochem-051410-092902

29. Yoon, J. H., Abdelmohsen, K., & Gorospe, M. (2013). Posttranscriptional gene regulation by long non-coding RNA. J Mol Biol, 425(19), 3723-3730. doi:10.1016/j.jmb.2012.11.024

30. Gomes, A. Q., Nolasco, S., & Soares, H. (2013). Non-coding RNAs: multi-tasking molecules in the cell. Int J Mol Sci, 14(8), 16010-16039. doi:10.3390/ijms140816010

31. Puvvula, P. K., Desetty, R. D., Pineau, P., Marchio, A., Moon, A., Dejean, A., & Bischof, O. (2014). Long non-coding RNA PANDA and scaffold-attachment-factor SAFA control senescence entry and exit. Nat Commun, 5, 5323. doi:10.1038/ncomms6323

32. Kornienko, A. E., Guenzl, P. M., Barlow, D. P., & Pauler, F. M. (2013). Gene regulation by the act of long non-coding RNA transcription. BMC Biology, 11, 59. doi:10.1186/1741-7007-11-59

33. Goodrich, J. A., & Kugel, J. F. (2006). Non-coding-RNA regulators of RNA polymerase II transcription. Nat Rev Mol Cell Biol, 7(8), 612-616. doi:10.1038/nrm1946

34. Hrdlickova, B., de Almeida, R. C., Borek, Z., & Withoff, S. (2014). Genetic variation in the non-coding genome: Involvement of micro-RNAs and long non-coding RNAs in disease. Biochim Biophys Acta, 1842(10), 1910-1922. doi:10.1016/j.bbadis.2014.03.011


35. Morris, K. V. (2009). Long antisense non-coding RNAs function to direct epigenetic complexes that regulate transcription in human cells. Epigenetics, 4(5), 296-301.

36. Han, Y., Wu, Z., Wu, T., Huang, Y., Cheng, Z., Li, X., . . . Du, Z. (2016). Tumor-suppressive function of long non-coding RNA MALAT1 in glioma cells by downregulation of MMP2 and inactivation of ERK/MAPK signaling. Cell Death Dis, 7, e2123. doi:10.1038/cddis.2015.407

37. Hu, X., Feng, Y., Zhang, D., Zhao, S. D., Hu, Z., Greshock, J., . . . Zhang, L. (2014). A functional genomic approach identifies FAL1 as an oncogenic long non-coding RNA that associates with BMI1 and represses p21 expression in cancer. Cancer Cell, 26(3), 344-357. doi:10.1016/j.ccr.2014.07.009

38. Kersey, P. J., Allen, J. E., Armean, I., Boddu, S., Bolt, B. J., Carvalho-Silva, D., . . . Staines, D. M. (2016). Ensembl Genomes 2016: more genomes, more complexity. Nucleic Acids Res, 44(D1), D574-580. doi:10.1093/nar/gkv1209

39. Dong, Y., Sirotkin, A. M., Yang, Y. S., Brown, D. T., Sittman, D. B., & Skoultchi, A. I. (1994). Isolation and characterization of two replication-dependent mouse H1 histone genes. Nucleic Acids Res, 22(8), 1421-1428.

40. Ohe, Y., Hayashi, H., & Iwai, K. (1989). Human spleen histone H1. Isolation and amino acid sequences of three minor variants, H1a, H1c, and H1d. J Biochem, 106(5), 844-857.

41. Fan, Y., Sirotkin, A., Russell, R. G., Ayala, J., & Skoultchi, A. I. (2001). Individual somatic H1 subtypes are dispensable for mouse development even in mice lacking the H1(0) replacement subtype. Mol Cell Biol, 21(23), 7933-7943. doi:10.1128/MCB.21.23.7933-7943.2001


42. Brown, D. T., Alexander, B. T., & Sittman, D. B. (1996). Differential effect of H1 variant overexpression on cell cycle progression and gene expression. Nucleic Acids Res, 24(3), 486-493.

43. Terme, J. M., Sese, B., Millan-Arino, L., Mayor, R., Izpisua Belmonte, J. C., Barrero, M. J., & Jordan, A. (2011). Histone H1 variants are differentially expressed and incorporated into chromatin during differentiation and reprogramming to pluripotency. J Biol Chem, 286(41), 35347-35357. doi:10.1074/jbc.M111.281923

44. Kent, W. J., Sugnet, C. W., Furey, T. S., Roskin, K. M., Pringle, T. H., Zahler, A. M., & Haussler, D. (2002). The human genome browser at UCSC. Genome Res, 12(6), 996-1006.

45. Izzo, A., & Schneider, R. (2016). The role of linker histone H1 modifications in the regulation of gene expression and chromatin dynamics. Biochim Biophys Acta, 1859(3), 486-495. doi:10.1016/j.bbagrm.2015.09.003

46. Chen, R., Kang, R., Fan, X. G., & Tang, D. (2014). Release and activity of histone in diseases. Cell Death Dis, 5, e1370. doi:10.1038/cddis.2014.337

47. Fan, Y., Nikitina, T., Zhao, J., Fleury, T. J., Bhattacharyya, R., Bouhassira, E. E., . . . Skoultchi, A. I. (2005). Histone H1 depletion in mammals alters global chromatin structure but causes specific changes in gene regulation. Cell, 123(7), 1199-1212. doi:10.1016/j.cell.2005.10.028

48. Garg, M., Ramdas, N., Vijayalakshmi, M., Shivashankar, G. V., & Sarin, A. (2014). The C-terminal domain (CTD) in linker histones antagonizes anti-apoptotic proteins to modulate apoptotic outcomes at the mitochondrion. Cell Death Dis, 5, e1058. doi:10.1038/cddis.2014.20


49. Chipuk, J. E., Kuwana, T., Bouchier-Hayes, L., Droin, N. M., Newmeyer, D. D., Schuler, M., & Green, D. R. (2004). Direct activation of Bax by p53 mediates mitochondrial membrane permeabilization and apoptosis. Science, 303(5660), 1010-1014. doi:10.1126/science.1092734

50. Lettre, G., Jackson, A. U., Gieger, C., Schumacher, F. R., Berndt, S. I., Sanna, S., . . . Hirschhorn, J. N. (2008). Identification of ten loci associated with height highlights new biological pathways in human growth. Nat Genet, 40(5), 584-591. doi:10.1038/ng.125

51. Soranzo, N., Rivadeneira, F., Chinappen-Horsley, U., Malkina, I., Richards, J. B., Hammond, N., . . . Deloukas, P. (2009). Meta-analysis of genome-wide scans for human adult stature identifies novel Loci and associations with measures of skeletal frame size. PLoS Genet, 5(4), e1000445. doi:10.1371/journal.pgen.1000445

52. Li, H., Kaminski, M. S., Li, Y., Yildiz, M., Ouillette, P., Jones, S., . . . Malek, S. N. (2014). Mutations in linker histone genes HIST1H1 B, C, D, and E; OCT2 (POU2F2); IRF8; and ARID1A underlying the pathogenesis of follicular lymphoma. Blood, 123(10), 1487-1498. doi:10.1182/blood-2013-05-500264

53. Will, C. L., Urlaub, H., Achsel, T., Gentzel, M., Wilm, M., & Luhrmann, R. (2002). Characterization of novel SF3b and 17S U2 snRNP proteins, including a human Prp5p homologue and an SF3b DEAD-box protein. EMBO J, 21(18), 4978-4988.

54. Lamond, A. I. (1995). Pre-mRNA Processing. Springer-Verlag Berlin Heidelberg, 1, 53. doi:10.1007/978-3-662-22325-3

55. Rosenbloom, K. R., Armstrong, J., Barber, G. P., Casper, J., Clawson, H., Diekhans, M., Dreszer, T. R., . . . Kent, W. J. (2015). The UCSC Genome Browser database: 2015 update. Nucleic Acids Res, 43(Database issue), D670-81. PMID: 25428374; PMC: PMC4383971

56. Zhang, J., & Manley, J. L. (2013). Misregulation of pre-mRNA alternative splicing in cancer. Cancer Discov, 3(11), 1228-1237. doi:10.1158/2159-8290.CD-13-0253

57. Alsafadi, S., Houy, A., Battistella, A., Popova, T., Wassef, M., Henry, E., . . . Stern, M. H. (2016). Cancer-associated SF3B1 mutations affect alternative splicing by promoting alternative branchpoint usage. Nat Commun, 7, 10615. doi:10.1038/ncomms10615

58. Karni, R., de Stanchina, E., Lowe, S. W., Sinha, R., Mu, D., & Krainer, A. R. (2007). The gene encoding the splicing factor SF2/ASF is a proto-oncogene. Nat Struct Mol Biol, 14(3), 185-193. doi:10.1038/nsmb1209

59. Omenn, G. S., Yocum, A. K., & Menon, R. (2010). Alternative splice variants, a new class of protein cancer biomarker candidates: findings in pancreatic cancer and breast cancer with systems biology implications. Dis Markers, 28(4), 241-251. doi:10.3233/DMA-2010-0702

60. Vogelstein, B., Sur, S., & Prives, C. (2010). p53: The Most Frequently Altered Gene in Human Cancers. Nature Education, 3(9), 6.

61. Beckerman, R., & Prives, C. (2010). Transcriptional regulation by p53. Cold Spring Harb Perspect Biol, 2(8), a000935. doi:10.1101/cshperspect.a000935

62. Fischer, M., Steiner, L., & Engeland, K. (2014). The transcription factor p53: not a repressor, solely an activator. Cell Cycle, 13(19), 3037-3058. doi:10.4161/15384101.2014.949083


63. Kitayner, M., Rozenberg, H., Kessler, N., Rabinovich, D., Shaulov, L., Haran, T. E., & Shakked, Z. (2006). Structural basis of DNA recognition by p53 tetramers. Mol Cell, 22(6), 741-753. doi:10.1016/j.molcel.2006.05.015

64. Joerger, A. C., & Fersht, A. R. (2010). The tumor suppressor p53: from structures to drug discovery. Cold Spring Harb Perspect Biol, 2(6), a000919. doi:10.1101/cshperspect.a000919

65. Lane, D. P. (1992). Cancer. p53, guardian of the genome. Nature, 358(6381), 15-16. doi:10.1038/358015a0

66. Fridman, J. S., & Lowe, S. W. (2003). Control of apoptosis by p53. Oncogene, 22(56), 9030-9040. doi:10.1038/sj.onc.1207116

67. Yoshida, K., & Miki, Y. (2010). The cell death machinery governed by the p53 tumor suppressor in response to DNA damage. Cancer Sci, 101(4), 831-835. doi:10.1111/j.1349-7006.2010.01488.x

68. Violette, S., Poulain, L., Dussaulx, E., Pepin, D., Faussat, A. M., Chambaz, J., . . . Lesuffleur, T. (2002). Resistance of colon cancer cells to long-term 5-fluorouracil exposure is correlated to the relative level of Bcl-2 and Bcl-X(L) in addition to Bax and p53 status. Int J Cancer, 98(4), 498-504.

69. George, P. (2011). p53: How critical is its role in cancer? Int J Curr Pharm Res, 3(2), 19-25. ISSN 0975-7066

70. Wierstra, I. (2008). Sp1: emerging roles--beyond constitutive activation of TATA-less housekeeping genes. Biochem Biophys Res Commun, 372(1), 1-13. doi:10.1016/j.bbrc.2008.03.074


71. Black, A. R., Black, J. D., & Azizkhan-Clifford, J. (2001). Sp1 and kruppel-like factor family of transcription factors in cell growth regulation and cancer. J Cell Physiol, 188(2), 143-160. doi:10.1002/jcp.1111

72. Beishline, K., & Azizkhan-Clifford, J. (2015). Sp1 and the 'hallmarks of cancer'. FEBS J, 282(2), 224-258. doi:10.1111/febs.13148

73. Li, L., & Davie, J. R. (2010). The role of Sp1 and Sp3 in normal and cancer cell biology. Ann Anat, 192(5), 275-283. doi:10.1016/j.aanat.2010.07.010

74. Johnson-Pais, T., Degnin, C., & Thayer, M. J. (2001). pRB induces Sp1 activity by relieving inhibition mediated by MDM2. Proc Natl Acad Sci U S A, 98(5), 2211-2216. doi:10.1073/pnas.051415898

75. Safe, S., & Abdelrahim, M. (2005). Sp transcription factor family and its role in cancer. Eur J Cancer, 41(16), 2438-2448. doi:10.1016/j.ejca.2005.08.006

76. Wang, L., Wei, D., Huang, S., Peng, Z., Le, X., Wu, T. T., . . . Xie, K. (2003). Transcription factor Sp1 expression is a significant predictor of survival in human gastric cancer. Clin Cancer Res, 9(17), 6371-6380.

77. Alberstein, M., Amit, M., Vaknin, K., O'Donnell, A., Farhy, C., Lerenthal, Y., . . . Ast, G. (2007). Regulation of transcription of the RNA splicing factor hSlu7 by Elk-1 and Sp1 affects alternative splicing. RNA, 13(11), 1988-1999. doi:10.1261/rna.492907

78. Trapnell, C., Roberts, A., Goff, L., Pertea, G., Kim, D., Kelley, D. R., . . . Pachter, L. (2012). Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc, 7(3), 562-578. doi:10.1038/nprot.2012.016

79. Kent, W. J. (2002). BLAT - the BLAST-like alignment tool. Genome Res, 12(4), 656-64.


80. Weinberg, M. S., & Morris, K. V. (2013). Long non-coding RNA targeting and transcriptional de-repression. Nucleic Acid Ther, 23(1), 9-14. doi:10.1089/nat.2012.0412

81. Morris, K. V., & Mattick, J. S. (2014). The rise of regulatory RNA. Nat Rev Genet, 15(6), 423-437. doi:10.1038/nrg3722

82. Pearson, R., Fleetwood, J., Eaton, S., Crossley, M., & Bao, S. (2008). Kruppel-like transcription factors: a functional family. Int J Biochem Cell Biol, 40(10), 1996-2001. doi:10.1016/j.biocel.2007.07.018

83. Yang, Y., Wen, L., & Zhu, H. (2015). Unveiling the hidden function of long non-coding RNA by identifying its major partner-protein. Cell Biosci, 5, 59. doi:10.1186/s13578-015-0050-x

84. Kloc, M., Dallaire, P., Reunov, A., & Major, F. (2011). Structural messenger RNA contains cytokeratin polymerization and depolymerization signals. Cell Tissue Res, 346(2), 209-222. doi:10.1007/s00441-011-1255-x

85. Campisi, J., & d'Adda di Fagagna, F. (2007). Cellular senescence: when bad things happen to good cells. Nat Rev Mol Cell Biol, 8(9), 729-740. doi:10.1038/nrm2233

86. Funayama, R., Saito, M., Tanobe, H., & Ishikawa, F. (2006). Loss of linker histone H1 in cellular senescence. J Cell Biol, 175(6), 869-880. doi:10.1083/jcb.200604005


Appendix A. UCSC Genome Browser

The following images show the RNA transcripts in this study as viewed in the UCSC Genome Browser. Primer pair and biotinylated asODN binding sites are indicated in black above the GENCODE transcript in blue. Identified pseudogene expression, ncRNA expression, H3K27Ac (active transcription) marks and sequence conservation are also shown further down the alignment.


1. HIST1H1D as viewed in the UCSC Genome Browser (hg38, chr6, approximately 26,234,300-26,234,900), noting primer pair (HIST1HID_F1/HIST1HID_R1, HIST1HID_F2/HIST1HID_R2) and asODN (asHIST1HID_as1, asHIST1HID_as2) binding locations above the GENCODE Gene Annotation.

2. SF3B5 as viewed in the UCSC Genome Browser (hg38, chr6, approximately 144,095,000-144,095,600), noting primer pair (SF3B5_F1/SF3B5_R1, SF3B5_F2/SF3B5_R2) and asODN (asSF3B5_1, asSF3B5_2) binding locations above the GENCODE Gene Annotation.


Appendix B. Sequenced Transcripts From RIP-Seq Data

1. HIST1H1D

CATGCTGTTCTGACAGTTTGAGATTACTTATTGTCTTTTCTGGGAAGACAAAAACATGTCGGAGACTGCTCC

ACTTGCTCCTACCATTCCTGCACCCGCAGAAAAAACACCTGTGAAGAAAAAGGCGAAGAAGGCAGGCGCA

ACTGCTGGGAAACGCAAAGCATCCGGACCCCCAGTATCTGAGCTTATCACCAAGGCAGTGGCAGCTTCTA

AGGAGCGCAGCGGCGTTTCTCTGGCCGCGCTTAAGAAAGCGCTTGCGGCTGCTGGCTACGATGTAGAAA

AAAACAACAGCCGTATCAAGCTTGGCCTCAAGAGCTTGGTGAGCAAAGGTACTCTGGTGCAGACCAAAGG

TACCGGTGCTTCTGGCTCCTTCAAACTCAACAAGAAAGCGGCTTCCGGGGAAGGCAAACCCAAGGCCAA

AAAGGCTGGCGCAGCCAAGCCTAGGAAGCCTGCTGGGGCAGCCAAGAAGCCCAAGAAGGTGGCTGGCG

CCGCTACCCCGAAGAAAAGCATCAAAAAGACTCCTAAGAAGGTAAAGAAGCCAGCAACCGCTGCTGGGAC

CAAGAAAGTGGCCAAGAGTGCGAAAAAGGTGAAAACACCTCAGCCAAAAAAAGCTGCCAAGAGTCCAGC

TAAGGCCAAAGCCCCTAAGCCCAAGGCGGCCAAGCCTAAGTCGGGGAAGCCGAAGGTTACAAAGGCAAA

GAAGGCAGCTCCGAAGAAAAAGTGAAACTGGCGGGACGTTCCCCTTTGAAAATTTTAAACGGCTCTTTTCA

GAGCCACCCA


2. SF3B5

TCTTCTGCGACGGCGCGGACCTGGAGCTTCCGCGCGGTGGCTTCACTCTCCTGTAAAACGCTAGAGCGG

CGAGTTGTTACCTGCGTCCTCTGACCTGAGAGCGAAGGGGAAAGCGGCGAGATGACTGACCGCTACACC

ATCCATAGCCAGCTGGAGCACCTGCAGTCCAAGTACATCGGCACGGGCCACGCCGACACCACCAAGTGG

GAGTGGCTGGTGAACCAACACCGCGACTCGTACTGCTCCTACATGGGCCACTTCGACCTTCTCAACTACT

TCGCCATTGCGGAGAATGAGAGCAAAGCGCGAGTCCGCTTCAACTTGATGGAAAAGATGCTTCAGCCTTG

TGGACCGCCAGCCGACAAGCCCGAGGAGAACTGAGACTCTGCCTTACCACCGCAGTGCGGGGCACCTC

TCCCAGCGTTTCTCCGGTTTGCCAATCCTCTTAAGTATTCCTGTCTCCAAAGGACCGGCTCTCCATGGCTC

CTGCGCCTCGTGCTTTCCGCGTACAGAAGTGCTTGCCCGGGGAGTCCCGCCTGACCTGCCTTCATGTGG

ACCCTTAGAACAGCACTGGGAGACCAGCAGGACTCCTGAGAACTGTGCTGGTGGAGAGGTCCTAGAGCC

GGCGAGCGTTTGAGAAGAGGGCATGGCGCTGGAGTGAGATGGGATTTGGCGTCTCGTTTTTGGCTAATTG

ATTGTCATTGGCTTTTTCCATAAAGTTTAGAAATCGTTCAGTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAAAAA

3. RNY1

ATGAAAACAAGAAACGAGGGATGCCAGGAGAGTGGAAACTCTCGTAAAAGACTAGTCAAGTGCAGTAGTG

AGAAGGGGGGAAAGAGTAGAACAAGGAGTTCGATCTGTAACTGACTGTGAACAATCAATTGAGATAACTCA

CTACCTTCGGACCAGCCAATAAGTCCTCCTAC
