California State University, Northridge

The SF3B3 Promoter is a Type III Promoter Driven by a TATA-box, and AP-2aA and Elk-1 are Necessary for Maximal Levels of Expression

A Thesis Submitted In Partial Fulfilment of the Requirements

For the Degree of Master of Science in Biology

By

Alexander Underwood

May 2020

The thesis of Alexander Underwood is approved

______

Dr. Rheem Medh Date

______

Dr. Mariano Loza Coll Date

______

Dr. Cindy Malone Chair Date

California State University, Northridge


Table of Contents

Signature Page

Abstract

Introduction

Materials and Methods

Results

Discussion

References

Appendix: Supplemental Figures


Abstract

The SF3B3 Promoter is a Type III Promoter Driven by a TATA-box, and AP-2aA and Elk-1 are Necessary for Maximal Levels of Expression

By Alexander Underwood

Master of Science in Biology

Splicing Factor 3b Subunit 3 (SF3B3) is a crucial splicing factor with a role in the regulation of several essential genes. While the role of SF3B3 in alternative splicing has been well established, the mechanisms that regulate SF3B3 itself have not been identified. Using 5’ RACE, we determined a single transcription start site (TSS). Through analysis of transcriptional activity and sequence conservation among species, we identified the SF3B3 promoter region.

Using a series of deletion constructs and transient transfections, we localized the SF3B3 proximal promoter to within 189 bp upstream of the TSS. Within the proximal promoter region, we identified a TATA-box core promoter element and binding sites for two proximal transcription factors, AP-2aA and Elk-1. Removing 2-3 bp from the binding sites of AP-2aA, Elk-1, and the TATA-box revealed that the AP-2aA and Elk-1 binding sites are necessary for maximal levels of expression and that the TATA-box binding site is required for transcription initiation.

We showed that the SF3B3 promoter is an atypical Type III promoter containing both a TATA-box and a large CpG island. Furthermore, the TATA-box itself is atypical, lying outside the usual distance range from the TSS. These data match those reported for a similar promoter sequence and may suggest a pattern for the highly unusual TATA-driven Type III promoters.


Introduction

1.1 Genetic Regulation – Introduction

Genetic regulation is a complex process that allows for adaptive responses to stimuli and, perhaps most importantly, for cell differentiation. The expression of the proper genes, at the proper time, and in the proper location, is paramount to the healthy function of cells (Aguet et al. 2017). Neurotrophins illustrate this point: in a healthy individual they are expressed in brain cells but not in kidney cells. Neurotrophins induce the differentiation of neurons from progenitor cells, and if they are not expressed appropriately during embryonic development, congenital disabilities can occur (Hong et al. 2008). If neurotrophins are not expressed appropriately later in life, deficient repair of neural pathways could lead to neurodegenerative disease. Additionally, if neurotrophins are mutated, or are produced in the wrong amounts (too much or too little) at any time, neural pathways can function improperly (Gao et al. 2008).

Genes are regulated through non-coding regions of DNA known as enhancers, silencers, and promoters. These regions are recognized by transcription factors that either recruit or inhibit RNA polymerase II.

The core promoter typically lies within ~35 bp of the TSS (Smale et al. 2003), while proximal promoter elements are located up to a few hundred base pairs upstream of the TSS (Zhu et al. 2015). Because of the complexity of the information stored in our DNA, transcription requires many different elements to work correctly. In a eukaryotic organism, each cell type must differ from the others and must function correctly within its tissue.

Every cell contains the entire genome but does not make use of every gene. For instance, goblet cells need to secrete mucin but should not produce milk. In reality, errors in regulation usually lead to conditions far more severe than a cell secreting milk.

1.2 Genetic Regulation – Dysregulation


Dysregulation occurs when genes are expressed inappropriately, and inappropriate gene expression often causes disease. In recent years, malfunctioning splicing factors and other regulatory elements have been shown to play a vital role in the development of many types of cancer (Grosso et al. 2008; Gökmen-Polar et al. 2014). For example, genes related to DNA damage and repair, such as SF3B3, were expressed in mantle cell lymphoma but not in small lymphocytic lymphoma (Henson et al., 2010). Cancer, however, is not the only disease linked to gene regulation. In bipolar disorder, differences in DNA methylation of the Brain-Derived Neurotrophic Factor (BDNF) exon 1 promoter have been observed in patients with bipolar disorder 1 and bipolar disorder 2 (D’Addario et al. 2012; Zuccato et al. 2009). As these examples illustrate, dysregulation can lead to serious health problems.

These examples, however, represent only a small part of the range of diseases linked to genetic dysregulation.

1.3 Genetic Regulation – The Core Promoter

The best-understood function of the core promoter is directing transcription initiation. The core promoter region contains short sequences that basal transcription factors recognize and bind (Danino et al. 2015). The most studied and best-recognized core promoter element, the TATA box, was long used to classify promoters as either TATA-containing or TATA-less. However, it is now known that most promoters do not contain a TATA box and instead use other sequences to recruit RNA Pol II. One classification system separates promoters into focused, dispersed, or mixed promoters. Focused promoters are defined as having a single TSS and are associated with spatiotemporally regulated, tissue-specific genes.

Dispersed promoters have multiple TSSs and are associated with housekeeping genes. Dispersed promoters make up the majority of vertebrate promoters, approximately 70% (Danino et al. 2015). Another classification system also separates promoters into three types: Type I, Type II, and Type III. Type I promoters contain a TATA-box and lack a CpG island. Type II promoters are CpG islands with dispersed transcription start sites (TSSs). Type III promoters contain combinations of Initiator and DPE motifs (Danino et al. 2015).

1.4 Genetic Regulation – Proximal Promoter Elements

Proximal promoter elements are short sequences, usually located between -50 and -500 bp relative to the TSS, that are required for the appropriate level of expression of each gene. These sequences recruit specific transcription factors that either increase or decrease expression, and they generally function independently of orientation (Maston et al. 2006). The primary function of proximal promoter elements is to facilitate assembly of the pre-initiation complex at the core promoter, which leads to transcription initiation (Maston et al. 2006). Chromatin folding can limit access to these elements. Recent studies have also shown that tissue-specific chromatin folding relies on proximal promoter elements (Sanyal et al. 2017).

1.5 Genetic Regulation – Transcription Factors

Transcription factors are a group of proteins that regulate levels of transcription. When bound at proximal promoter elements, they can increase transcription upwards of 1000-fold relative to basal levels (Maas et al. 1991). Transcription factors function by binding to a corresponding sequence and increasing or decreasing the level of RNA produced by the gene they regulate. Activators increase the level of transcription, while repressors decrease it (Whalen et al. 2016).

Transcription factors are characterized by the presence of a DNA-binding domain. They bind to specific recognition sites in the target DNA and work in complexes with other regulatory proteins to either recruit or block RNA polymerase. Transcription factors can also alter chromatin condensation by recruiting enzymes that acetylate or deacetylate histone proteins. This controls the accessibility of the sequence and thereby expression levels. They can also recruit other proteins that regulate transcription, such as coactivators and corepressors. In the core promoter region, transcription factors recruit RNA polymerase II to initiate transcription (Danino et al. 2015). Especially in eukaryotes, transcription factors often form homodimers or heterodimers before binding DNA (Maston et al. 2006).

1.6 Genetic Regulation – Distal Regulatory Elements

While proximal promoter elements are location-dependent and lie near the TSS, distal regulatory elements can be located more than 1000 bp from the TSS. Distal regulatory elements include enhancers, silencers, insulators, and locus control regions. They do not depend on proximity to the TSS because they act on the promoter through DNA looping. Enhancers are DNA sequences that increase expression and are bound by transcription factors called activators. The activators bound to an enhancer form an enhanceosome. Looping of the DNA then brings the enhanceosome close to the proximal promoter elements, where it recruits basal transcription factors and increases transcription initiation (Engel et al. 2016). Silencers are DNA sequences that are bound by repressors and decrease expression levels (Maston et al. 2006). Insulators block the action of enhancers. They range from a few hundred to a couple of thousand base pairs in length and contain clustered recognition sites for DNA-binding proteins (Scott et al. 1999). Locus control regions can open condensed chromatin, allowing transcription factors to bind and the gene to be expressed. A locus control region can act independently of location and over longer timescales than other distal regulatory elements. Unlike silencers, locus control regions can completely turn off expression. Most notably, they control the expression of clusters of genes (Repele et al. 2015).

1.7 Splicing Factor 3b Subunit 3 (SF3B3)

SF3B3 encodes a splicing factor, a protein involved in the regulation of many other genes. Through alternative splicing, spliceosomes can give rise to many different protein isoforms with various functions (Liu and Cheng, 2013). Alternative splicing occurs when the spliceosome uses splice sites other than the canonical splice site (Liu and Cheng, 2013). Splice-site recognition and splicing are regulated by two classes of splicing regulators, trans-acting and cis-acting (Liu and Cheng, 2013). Trans-acting splicing factors can have enhancing and/or inhibiting effects on alternative splicing (Liu and Cheng, 2013; Silipo et al., 2015).

SF3B3 encodes a spliceosomal protein subunit that plays a role in assembly of the 17S U2 snRNP, which is necessary for splicing (Das et al. 1999). Errors in splicing can disrupt the genes that these splicing factors regulate. Dysregulation of alternative splicing has also been linked to the acquisition of common cancer traits (Liu and Cheng, 2013). SF3B3 is highly expressed in ER-negative breast cancer cells and in ER-positive breast cancer cells that have developed resistance to endocrine therapies. In ER-positive breast cancer, higher SF3B3 levels correlated with shorter overall survival (Gökmen-Polar et al., 2014).


Aberrant alternative splicing has been connected to the aggressiveness of breast cancer (Silipo et al., 2015). Mutations in coding sequences can lead to cancer-related alternative splicing. Changes in the concentration of splicing factors have also been correlated with cancer-related alternative splicing (Silipo et al., 2015). Understanding the regulation of SF3B3 may therefore provide insight into the role SF3B3 plays in different cancer types.

In this study, we cloned the human SF3B3 promoter region into a luciferase reporter plasmid and functionally characterized it. Characterizing the SF3B3 promoter may reveal the mechanism that drives SF3B3 expression in healthy cells. It could also help explain why SF3B3 is expressed in mantle cell lymphoma (MCL) but absent in small lymphocytic lymphoma (SLL).


Materials and Methods

2.1 Bacterial strains and plasmids.

Derivatives of Escherichia coli DH5α were obtained from Genesee and Zymo Research. The plasmids used were Lucigen’s pGC_Blue and Promega’s pGL3_Basic.

2.2 Polymerase Chain Reaction (PCR) amplification of the SF3B3 promoter region.

PCR primers were designed with the Primer3Plus program (sense: CCA AAA CGC TAA AAT CAC ACC; antisense: ATC TCC ACG AGC CCT ATT TTG). The primers were phosphorylated at the 5’ end with T4 polynucleotide kinase (T4 PNK) following Lucigen’s pGC_Blue Cloning Kit protocol. The primer kinase reaction was performed in a 10 μL total volume containing 200 pmol of each primer. Amplification was performed in a 50 μL reaction containing 1 μL of the T4 PNK reaction, 2X GoTaq Green Master Mix, and 50 ng of human genomic DNA, following Promega’s GoTaq Green Master Mix PCR protocol. The thermal cycler was programmed for touchdown PCR (Table 1). The PCR product was analyzed by gel electrophoresis at 80 V on a 1% agarose gel in 1X TAE buffer with the GeneRuler 1 kb DNA ladder, stained with GelRed, and examined under UV light. Bands were excised with a scalpel under long-wave UV light, placed in a 1.5 mL Eppendorf tube, and stored at -20 °C.

Table 1. Touchdown Polymerase Chain Reaction

Cycles   Temperature (°C)   Time
1        95                 5 minutes
2        95                 30 seconds
         57                 40 seconds
         72                 90 seconds
2        95                 30 seconds
         54                 40 seconds
         72                 90 seconds
2        95                 30 seconds
         51                 40 seconds
         72                 90 seconds
2        95                 30 seconds
         49                 40 seconds
         72                 90 seconds
27       95                 30 seconds
         47                 40 seconds
         72                 90 seconds
1        72                 5 minutes
         4                  ∞ (Hold)

2.3 Extraction and purification of PCR products.

The SF3B3 promoter fragment was gel-extracted following QIAGEN’s Gel Extraction Kit protocol, except that the isopropanol addition step and the recommended addition of Buffer QG were omitted. To elute the DNA from the QIAquick spin column, 30 μL of Buffer EB was added to the QIAquick membrane and incubated for 1 minute. The concentration and purity of the gel extraction products were measured with a NanoDrop 2000c UV-Vis spectrophotometer.
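The NanoDrop readout reduces to simple arithmetic. The sketch below is a minimal Python illustration built on three assumptions: absorbance values are normalized to a 1 cm path length, and double-stranded DNA is converted at the standard 50 ng/μL per A260 unit. The purity cutoffs in the hypothetical `looks_pure` helper are common rules of thumb rather than values stated in this thesis, and the readings are hypothetical.

```python
# Minimal sketch: dsDNA concentration and purity ratios from UV absorbance.
# Assumptions (not from the thesis): absorbance normalized to a 1 cm path,
# 50 ng/uL per A260 unit for dsDNA, and rule-of-thumb purity cutoffs.

DSDNA_NG_PER_UL_PER_A260 = 50.0


def dsdna_concentration(a260: float, dilution_factor: float = 1.0) -> float:
    """Return dsDNA concentration in ng/uL from an A260 reading."""
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


def purity_ratios(a230: float, a260: float, a280: float) -> dict:
    """Return the A260/A280 (protein) and A260/A230 (salt/organics) ratios."""
    return {"260/280": a260 / a280, "260/230": a260 / a230}


def looks_pure(ratios: dict) -> bool:
    """Hypothetical rule-of-thumb check: 260/280 near 1.8, 260/230 >= 1.8."""
    return 1.7 <= ratios["260/280"] <= 2.0 and ratios["260/230"] >= 1.8


# Hypothetical readings, not measurements from this study
ratios = purity_ratios(a230=0.20, a260=0.40, a280=0.22)
print(f"{dsdna_concentration(0.40):.1f} ng/uL",
      {k: round(v, 2) for k, v in ratios.items()}, looks_pure(ratios))
```

A low A260/A230 ratio after gel extraction is often a sign of residual guanidine salt from Buffer QG, so both ratios are worth checking, not only A260/A280.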

2.4 T4 DNA rapid ligation.

Ligation reactions were performed following a modified version of the Thermo Scientific Rapid DNA Ligation protocol. Each reaction contained 10-100 ng of vector DNA, insert DNA at a 3:1 molar excess over the vector, 4 μL of 5X Rapid Ligation Buffer, and 1 μL of T4 DNA ligase (5 U/μL), brought to 20 μL with nuclease-free water. Mixtures were incubated at room temperature (~22 °C) for 1 hour and used immediately for transformations.
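The 3:1 molar excess can be converted into an insert mass using the vector mass and the two fragment lengths. Equal masses of DNA contain moles in inverse proportion to length, which is what the formula below exploits. In this minimal Python sketch, the 1,284 bp insert length is the length of the -938 to +346 region described in the Results (TSS-relative numbering has no position 0). The 3,000 bp vector length is a hypothetical placeholder, not the size of pGC_Blue.

```python
def insert_mass_ng(vector_ng: float, vector_bp: int, insert_bp: int,
                   molar_ratio: float = 3.0) -> float:
    """Insert mass (ng) giving `molar_ratio` insert molecules per vector molecule.

    ng insert = ng vector * (insert bp / vector bp) * molar ratio
    """
    if vector_ng <= 0 or vector_bp <= 0 or insert_bp <= 0 or molar_ratio <= 0:
        raise ValueError("masses, lengths, and ratio must be positive")
    return vector_ng * (insert_bp / vector_bp) * molar_ratio


# Insert: -938 to +346 (1,284 bp). Vector: hypothetical 3,000 bp placeholder.
for vector_ng in (10, 50, 100):  # spans the 10-100 ng vector range used here
    insert_ng = insert_mass_ng(vector_ng, vector_bp=3000, insert_bp=1284)
    print(vector_ng, "ng vector ->", round(insert_ng, 1), "ng insert")
```

Substituting the actual vector length gives the insert mass to add for any vector amount in the 10-100 ng range.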

2.5 Transformation of pGC_Blue plasmid vector with the SF3B3 promoter into E. coli.

Transformation was performed with 5 μL of the SF3B3/pGC_Blue ligation product and GC5 Mix & Plate chemically competent E. coli cells, following the 96-well format with outgrowth from Genesee’s GC5 Mix & Plate Competent Cells kit. 100 μL of the transformation mixture was plated on LB agar containing kanamycin (50 μg/mL), Lucigen’s X-gal (50 μg/mL), and Lucigen’s IPTG (1 mM) and incubated overnight at 37 °C. White colonies were selected by blue-white screening. The selected colonies were used to inoculate liquid LB containing kanamycin (50 μg/mL), which was incubated overnight at 37 °C.

2.6 Boiling Lysis Plasmid Extraction

A master mix of boiling buffer (50 mM EDTA, 10 mM Tris, 8% sucrose, 0.5% Triton X-100) was prepared. Samples were grown in 5 mL plastic culture tubes for 14 h at 37 °C, and 3 mL of each culture was centrifuged in a microcentrifuge tube at 12,000 × g for 60 s. Each pellet received 350 μL of master mix and 25 μL of 10 mg/mL lysozyme and was resuspended by dragging the tube across a rack until frothy. Tubes were placed in boiling water for 37 s, cooled to room temperature, and centrifuged at 12,000 × g for 5 min. The gelatinous cell debris was removed with a 10 μL pipette tip. 40 μL of 3 M NaOAc was added to each tube and mixed by brief vortexing. 425 μL of isopropanol was then added and mixed by brief vortexing. The tubes were centrifuged at 12,000 × g for 5 min, the supernatant was decanted, and the tubes were inverted to air-dry for 12-15 min. Plasmids were resuspended in 100 μL of water containing 10 μg/mL RNase.


2.7 Restriction digests of plasmid DNA.

Restriction digests were performed in a 20 μL reaction volume containing 5 μL of the boiling lysis plasmid product, 2X FastDigest Green Buffer, and 1 μL of Thermo Fisher’s FastDigest EcoRI, and were incubated for 15 minutes at 37 °C. The digestion products were analyzed by gel electrophoresis at 80 V on a 1% agarose gel in 1X TAE buffer with the GeneRuler 1 kb Plus DNA ladder, stained with GelRed, and examined under UV light.

2.8 Thermo Scientific GeneJET Plasmid Miniprep

Cells from 4 mL of overnight E. coli culture were pelleted at 12,000 × g. The supernatant was removed, and the pellet was resuspended in 250 μL of Resuspension Solution with RNase by pipetting up and down. 250 μL of Lysis Solution was added to each tube and mixed thoroughly by inverting 6 times. 350 μL of Neutralization Solution was added and mixed immediately and thoroughly by inverting 6 times. Samples were centrifuged at 12,000 × g for 5 min. The supernatant was transferred to a GeneJET spin column by pipetting, avoiding cell debris, and centrifuged at 12,000 × g for 5 min. Columns were washed twice, each time by adding 500 μL of Wash Solution and centrifuging for 30 s at 12,000 × g. Columns were then centrifuged at 12,000 × g to remove residual Wash Solution. Columns were transferred to new microcentrifuge tubes, and 50 μL of Elution Buffer was added and allowed to sit for 2 min at room temperature. Columns were centrifuged at 12,000 × g for 2 min, and the products were stored at -20 °C.

2.9 Transformation of pGL3 Basic plasmid vector with plasmid DNA into E. coli.

Ligation products (2.4) in the luciferase reporter vector pGL3_Basic were transformed into chemically competent E. coli. LB-Amp plates and SOC media were pre-warmed to 37 °C. Competent cells were thawed on ice, and 4 μL of the ligation reaction was added and incubated on ice for 10 min. 215 μL of 37 °C SOC was added to the reaction, and an outgrowth was performed at 37 °C for 1 hour. Cells were spread on LB-Amp plates using plating beads and incubated at 37 °C for 16-20 hours.

2.10 Plasmid sample confirmation by sequencing

Plasmid sequences were confirmed by Laragen. Samples were submitted at 20 ng/μL with the built-in RV3 and GLP2 primers. The returned sequences were analyzed by pairwise alignment in SnapGene, and the chromatograms were inspected for potential errors.
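SnapGene's alignment algorithm is not described here. As a rough illustration of what a pairwise check of a sequencing read against the expected construct involves, the sketch below implements a textbook Needleman-Wunsch global alignment in Python and lists the columns where read and reference disagree. The scoring values (match +1, mismatch -1, gap -2) are arbitrary assumptions, and the sequences are hypothetical. A real read usually covers only part of the plasmid, so a local or semi-global alignment would be more appropriate in practice.

```python
def global_align(ref, read, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns (aligned_ref, aligned_read)."""
    n, m = len(ref), len(read)

    def pair_score(i, j):
        return match if ref[i - 1] == read[j - 1] else mismatch

    # Fill the dynamic-programming score matrix
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair_score(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment
    out_ref, out_read = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            out_ref.append(ref[i - 1])
            out_read.append(read[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_ref.append(ref[i - 1])
            out_read.append("-")
            i -= 1
        else:
            out_ref.append("-")
            out_read.append(read[j - 1])
            j -= 1
    return "".join(reversed(out_ref)), "".join(reversed(out_read))


def differences(aligned_ref, aligned_read):
    """(column, reference base, read base) for every column where they differ."""
    return [(k, r, q) for k, (r, q) in enumerate(zip(aligned_ref, aligned_read)) if r != q]


reference = "ACGTACGT"  # hypothetical expected construct segment
read = "ACGACGT"        # hypothetical read missing one base
aligned_ref, aligned_read = global_align(reference, read)
print(aligned_ref)
print(aligned_read)
print(differences(aligned_ref, aligned_read))
```

Each reported difference is a position where the chromatogram should be inspected before the construct is accepted.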

2.11 Transcription Factor Sequence Initial Identification

Transcription factor binding sites in the SF3B3 putative promoter region were identified from the consensus of sequence similarities reported by Match, Alibaba 2.1, and PROMO. For Match, the matrix groups were set to vertebrates, the cut-off selection for the matrix group was set to minimize the sum of both error rates, and the predefined profile was set to best_selection.prf (http://gene-regulation.com/cgi-bin/pub/programs/match/bin/match.cgi). Alibaba2.1 was run with default parameters (http://gene-regulation.com/pub/programs/alibaba2/index.html). PROMO was set to check only human sites, with the maximum matrix dissimilarity set to 5 (http://alggen.lsi.upc.es/cgi-bin/promo_v3/promo/promoinit.cgi?dirDB=TF_8.3). Binding sites predicted by at least 2 of the 3 programs were designated as expected transcription factor binding sites.
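The "at least 2 of the 3 programs" rule amounts to a vote across tools. For simplicity, the Python sketch below assumes that each tool's hits have already been reduced to (factor, TSS-relative start) pairs with harmonized factor names. In practice, Match, Alibaba 2.1, and PROMO report different matrix names and slightly offset coordinates, so real output would first need name mapping and a position tolerance. The `consensus_sites` helper is hypothetical, and the predictions shown are not the sites reported in Figure 1.

```python
def consensus_sites(predictions, min_tools=2):
    """Return {site: supporting tools} for sites reported by >= min_tools programs.

    `predictions` maps a tool name to a set of (factor, start) tuples.
    """
    support = {}
    for tool, sites in predictions.items():
        for site in sites:
            support.setdefault(site, set()).add(tool)
    return {site: tools for site, tools in support.items() if len(tools) >= min_tools}


# Hypothetical predictions (factor, TSS-relative start); not this study's results
predictions = {
    "Match": {("Elk-1", -60), ("AP-2", -80), ("Sp1", -150)},
    "Alibaba2.1": {("Elk-1", -60), ("Sp1", -150)},
    "PROMO": {("Elk-1", -60), ("AP-2", -80), ("GATA-1", -300)},
}

for (factor, start), tools in sorted(consensus_sites(predictions).items(),
                                     key=lambda item: item[0][1]):
    print(factor, start, sorted(tools))
```

The set of supporting tools maps directly onto the box colors described in the Figure 1 legend. For example, a site supported by all three programs corresponds to a red box.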

2.12 Optimized Touchdown PCR

All subsequent PCR reactions followed the general layout below. Primers were designed using New England Biolabs’ free NEBaseChanger software (https://nebasechanger.neb.com/) and checked with Premier Biosoft International’s Beacon Designer Free Edition (http://www.premierbiosoft.com/qOligo/Oligo.jsp?PID=1). Primers were ordered from Integrated DNA Technologies (IDT). The binding temperature (BT) for each primer pair was set to the average of the primer melting temperatures provided by IDT. To phosphorylate the 5’ ends of the primers and remove 3’ phosphoryl groups, a T4 polynucleotide kinase reaction was performed. This reaction used 2 μL of each primer, 1 μL of 10X primer kinase buffer, and 1 μL of T4 polynucleotide kinase, brought to 10 μL with nuclease-free water. PNK reactions were incubated at 37 °C for 10 min. PCR reactions contained 5 μL of 10X PfuTurbo buffer, 50 ng of template DNA, 1 μL of dNTPs, 1 μL of the PNK reaction, and 1 μL of PfuTurbo polymerase, brought to 50 μL with nuclease-free water. The reactions were run using the following touchdown program.

Touchdown Polymerase Chain Reaction

Cycles   Temperature (°C)   Time
1        95                 5 minutes
2        95                 30 seconds
         BT+8               40 seconds
         72                 1 min/1000 bp
2        95                 30 seconds
         BT+6               40 seconds
         72                 90 seconds
2        95                 30 seconds
         BT+4               40 seconds
         72                 1 min/1000 bp
2        95                 30 seconds
         BT+2               40 seconds
         72                 1 min/1000 bp
27       95                 30 seconds
         BT                 40 seconds
         72                 1 min/1000 bp
1        72                 5 minutes
         4                  ∞ (Hold)
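The touchdown schedule can be generated from the two primer melting temperatures. The Python sketch below builds the cycling blocks under three stated assumptions. First, BT is the mean of the two IDT-reported Tm values. Second, every extension uses 1 min per 1000 bp, rounded up to whole seconds; the program above lists a fixed 90 seconds for the BT+6 block, which this sketch does not reproduce. Third, the primer Tm values and amplicon length in the example are hypothetical. Passing offsets of (10, 7, 4, 2) with a BT of 47 °C reproduces the annealing temperatures of Table 1 (57, 54, 51, 49, and 47 °C).

```python
import math


def touchdown_program(tm_fwd, tm_rev, amplicon_bp, offsets=(8, 6, 4, 2),
                      cycles_per_offset=2, final_cycles=27):
    """Return (BT, blocks); each block is (cycles, [(temp_C, seconds), ...]).

    BT is assumed to be the mean primer Tm. Annealing starts at BT + offset
    and steps down to BT for the final block of cycles.
    """
    bt = (tm_fwd + tm_rev) / 2
    extend_s = math.ceil(60 * amplicon_bp / 1000)  # 1 min per 1000 bp
    blocks = [(1, [(95, 300)])]  # initial denaturation, 5 min
    for offset in offsets:
        blocks.append((cycles_per_offset,
                       [(95, 30), (bt + offset, 40), (72, extend_s)]))
    blocks.append((final_cycles, [(95, 30), (bt, 40), (72, extend_s)]))
    blocks.append((1, [(72, 300)]))  # final extension, 5 min; then hold at 4 °C
    return bt, blocks


# Hypothetical primer Tm values and amplicon length
bt, blocks = touchdown_program(tm_fwd=60.0, tm_rev=58.0, amplicon_bp=1284)
print("BT =", bt)
for cycles, steps in blocks:
    print(cycles, "x", steps)
```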

2.13 PCR by exclusion

The SF3B3 -35, -189, -391, and -728 deletion constructs were made using PCR by exclusion, with primers designed so that the region to be removed was excluded from the amplicon. The reverse primer was placed at the beginning of the promoter region, in a stretch that included part of the pGC_Blue cloning vector. Forward primers were placed at the end of the region to be removed. A touchdown PCR was then performed (2.12).

2.14 Designing deletion constructs

Deletion constructs were designed to test for transcription factor activity. Initial deletion sites were chosen based on the bioinformatic analysis of the promoter region: SF3B3 -728, -629, -391, -257, and -35. The SF3B3 -35, -391, and -728 deletion constructs were created by PCR by exclusion (2.13). The SF3B3 -257 and -629 constructs were created by site-directed mutagenesis (2.15). Mutagenesis constructs were designed after the initial deletion analysis of the promoter. They removed 2-3 bp from the consensus binding motifs of transcription factor binding sites according to the JASPAR binding motif database (http://jaspar.genereg.net/).

Products were confirmed by gel electrophoresis and purified by gel extraction (2.3). Products were ligated (2.4) and transformed into E. coli (2.9). Plasmids were purified by miniprep plasmid extraction (2.8) and sent for sequencing (2.10).
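TSS-relative numbering has no position 0: -1 is immediately followed by +1. Converting construct endpoints such as -728 or -35 into string positions is therefore an easy place to make off-by-one errors. The Python sketch below assumes the cloned sequence is stored as a string running from -938 to +346, so it is 1,284 bp long. It shows how a 5’ deletion construct, or a 2-3 bp motif deletion, could be derived from that string. The helper names are hypothetical, and the placeholder sequence is not the SF3B3 promoter.

```python
def index_of(position: int, upstream_len: int) -> int:
    """Convert a TSS-relative coordinate (no position 0) to a 0-based index.

    Assumes the string starts at -upstream_len: -upstream_len -> 0,
    -1 -> upstream_len - 1, and +1 -> upstream_len.
    """
    if position == 0:
        raise ValueError("TSS-relative numbering has no position 0")
    return position + upstream_len if position < 0 else position + upstream_len - 1


def deletion_construct(promoter: str, upstream_len: int, five_prime_end: int) -> str:
    """Sequence of a 5' deletion construct from `five_prime_end` to the 3' end."""
    return promoter[index_of(five_prime_end, upstream_len):]


def delete_bases(promoter: str, upstream_len: int, start: int, n_bases: int) -> str:
    """Remove `n_bases` starting at TSS-relative `start` (a motif disruption)."""
    i = index_of(start, upstream_len)
    return promoter[:i] + promoter[i + n_bases:]


UPSTREAM = 938              # sequence assumed to start at -938
placeholder = "ACGT" * 321  # 1,284 bp stand-in for the -938 to +346 region
for end in (-728, -629, -391, -257, -189, -35):
    print(end, len(deletion_construct(placeholder, UPSTREAM, end)), "bp")
```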

2.15 QuikChange site-directed mutagenesis kit

Primers were designed to alter a single nucleotide and matched the target sequence except at the nucleotide to be altered. For deletion constructs, a single nucleotide was altered to create a new restriction enzyme site, which was then used to remove the desired fragment. The fragment was cut out at the new site (2.7), and the remaining pGC_Blue plasmid was re-ligated (2.4).

Products were confirmed by gel electrophoresis and purified by gel extraction (2.3). Products were ligated (2.4) and transformed into E. coli (2.9). Plasmids were purified by miniprep plasmid extraction (2.8) and sent for sequencing (2.10).

2.16 Transient transfections of promoter constructs into human embryonic kidney cells (HEK293T).

HEK293T cells were counted and seeded into 12-well plates at a density of 2 × 10⁵ cells per well in 1 mL of Dulbecco’s modified Eagle medium (DMEM) with 10% fetal bovine serum (FBS). Each construct was transfected into three separate wells. After seeding, the 12-well plates were incubated at 37 °C with 5% CO2 for 28-30 h. The plasmid constructs, including controls, were then transfected. All plasmids were purified using Thermo Scientific’s GeneJET Plasmid Miniprep Kit (2.8). The media was removed, cells were washed with 1 mL of 1X PBS, and 800 μL of media was added to each well. Each transfection was prepared as a 2.0 mL master mix. The pGL3 construct (50 ng/μL) was brought to 16 μL with elution buffer, and 9.6 μL of Enhancer was added. The mixture was vortexed for 1 s and incubated for 5 min. 4 μL of Effectene reagent was added, and the tubes were vortexed for 10 s and incubated at room temperature for 10 min. 1600 μL of pre-warmed media was added. 375 μL of the mixture was added to each of the three wells, and plates were incubated at 37 °C with 5% CO2 for 48-52 h.

2.17 Harvesting proteins from transiently transfected HEK293T cells

The media was removed, and cells were washed with 1 mL of 1X PBS. 250 μL of 1X passive lysis buffer (PLB) was added. For each well, the cells were detached from the surface by pipetting liquid up and down. The lysate was transferred to a microcentrifuge tube and vortexed.

The tubes were incubated in a dry-ice ethanol bath for 5 min followed by 5 min in a 37 °C water bath; this was performed a total of three times, vortexing between cycles. Samples not immediately assayed were stored at -80 °C.

2.18 Dual-luciferase assay

pGL3 constructs were assayed using Promega’s reagents from the Dual-Luciferase

Reporter Assay System and the Monolight 2010 Luminometer. The luminometer was programmed to calculate the normalization ratio of firefly luciferase activity.
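The normalization step is a ratio followed by a comparison to a reference construct. The text does not specify the reference, so the Python sketch below rests on three assumptions. First, each well was co-transfected with a Renilla luciferase control plasmid. Second, the Dual-Luciferase normalization ratio is firefly activity divided by Renilla activity. Third, construct activity is reported relative to the promoterless pGL3_Basic vector. The luminometer readings are hypothetical.

```python
from statistics import mean, stdev


def firefly_renilla_ratios(firefly, renilla):
    """Per-well ratio of firefly to Renilla luciferase activity."""
    if len(firefly) != len(renilla):
        raise ValueError("firefly and Renilla readings must be paired by well")
    return [f / r for f, r in zip(firefly, renilla)]


def fold_over_control(construct_ratios, control_ratios):
    """Express each replicate ratio relative to the mean control ratio."""
    baseline = mean(control_ratios)
    return [r / baseline for r in construct_ratios]


# Hypothetical triplicate readings (relative light units)
basic = firefly_renilla_ratios([210, 190, 200], [1000, 1000, 1000])
construct = firefly_renilla_ratios([5200, 4800, 6000], [1000, 900, 1100])
fold = fold_over_control(construct, basic)
print(f"fold over pGL3_Basic: {mean(fold):.1f} +/- {stdev(fold):.1f} (mean +/- SD, n=3)")
```

Normalizing each well to its own Renilla signal before averaging corrects for well-to-well differences in transfection efficiency.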

2.19 Bioinformatics analysis of the SF3B3 promoter

To determine the presence or absence of a CpG island, the GenBank-defined SF3B3 putative promoter was analyzed using the DataBase of CpG islands and Analytical Tool (DBCAT). This algorithm counted cytosine-guanine (CpG) dinucleotides and compared them using the observed/expected value, which was set to the highest option, 65%. To locate potential TATA-box binding sites and the corresponding BRE sites, a search was performed using the Juven-Gershon lab’s free online software (http://lifefaculty.biu.ac.il/gershon-tamar/index.php/element-description/element) with all parameters set to the default search.


2.20 Multiple sequence alignment (MSA)

Based on the phylogenetic tree and the first round of transfections, an MSA was performed on the SF3B3 -100 to -35 region to determine the evolutionary conservation of the Elk-1, AP-2, and TATA-box binding regions. Sequences were aligned with the free bioinformatics program Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/). Thirteen species were used: Homo sapiens, Pan troglodytes, Macaca mulatta, Papio anubis, Bos taurus, Canis lupus familiaris, Felis catus, Rattus norvegicus, Mus musculus, Peromyscus leucopus, Cricetulus griseus, and Danio rerio.
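Conservation of a binding region across an alignment can be summarized column by column. The Python sketch below assumes the Clustal Omega output has already been read into equal-length, gapped strings. It scores each column by the fraction of all sequences sharing the most common base, so gaps count against conservation. It then marks fully identical columns with "*", mirroring the asterisk line Clustal prints under nucleotide alignments. The helper names are hypothetical, and the short aligned sequences are illustrative, not the SF3B3 alignment.

```python
from collections import Counter


def column_identity(aligned):
    """Fraction of sequences sharing the most common base in each column."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    scores = []
    for column in zip(*aligned):
        bases = Counter(b for b in column if b != "-")
        top = max(bases.values()) if bases else 0
        scores.append(top / len(aligned))  # gaps lower the score
    return scores


def conservation_line(aligned):
    """'*' under columns identical in every sequence, ' ' elsewhere."""
    return "".join("*" if s == 1.0 else " " for s in column_identity(aligned))


# Hypothetical aligned sequences (one row per species)
alignment = ["TATATATT", "TATATATT", "TATAAATT", "TA-ATATT"]
for row in alignment:
    print(row)
print(conservation_line(alignment))
```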

2.21 Phylogenetic Tree Analysis

The sequence from the SF3B3 -245 to the +1 was selected based on initial transfection data. A phylogenetic tree of the 13 species used for the MSA (2.20) was inferred by using the

Maximum Likelihood method and Tamura-Nei model (Tamura et al. 1993). The bootstrap consensus tree inferred from 500 replicates is taken to represent the evolutionary history of the taxa analyzed (Felsenstein et al. 1985). Branches corresponding to partitions reproduced in less than 50% bootstrap replicates are collapsed. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (500 replicates) are shown next to the branches (Felsenstein et al. 1985). Initial trees for the heuristic search were obtained automatically by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using the Maximum Composite Likelihood (MCL) approach, and selecting the topology with superior log likelihood value. This analysis involved 13 SF3B3 promoters from 13 different species. There were a total of 322 positions in the final dataset. Evolutionary analyses were conducted in MEGA X (Kumar et al. 2018).

2.22 5’ RACE


5’ RACE was performed to determine the site(s) of transcription initiation. Three gene-specific primers were designed: GSP1 (GTATTCCAAAATAACAATTC), GSP2 (AGTCTTTGGTGCCACCTGTC), and GSP3 (CTGCAGGCTTTCTTGGACTC). The GSP1 primer was used to synthesize first-strand cDNA from total human RNA, and the cDNA was purified on a SNAP column. The cDNA was then dC-tailed with terminal deoxynucleotidyl transferase (TdT). The tail provides a 3’ binding site for the anchor primer that is paired with GSP2 in the PCR. PCR with the GSP2 primer was performed to obtain the final cDNA product.

PCR of dC-Tailed cDNA

Cycles   Temperature (°C)   Time
1        95                 5 minutes
35       95                 30 seconds
         55                 60 seconds
         72                 120 seconds
1        72                 7 minutes
         4                  ∞ (Hold)

A nested amplification was performed to amplify the product to a usable concentration. The GSP3 primer was designed for troubleshooting but was not used.

Nested Amplification

Cycles   Temperature (°C)   Time
1        95                 5 minutes
35       95                 30 seconds
         55                 60 seconds
         72                 120 seconds
1        72                 7 minutes
         4                  ∞ (Hold)

Amplified 5’ RACE products were transformed into NEB 10-beta competent cells (2.9). Plasmids from overnight cultures were purified with the Thermo Scientific GeneJET Plasmid Miniprep Kit (2.8) and cycle sequenced.
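Mapping a sequenced 5’ RACE clone back to the promoter comes down to finding where the transcript sequence begins after the homopolymeric tail. The Python sketch below makes two assumptions. First, the read is in the sense (mRNA) orientation, so the dC tail added to the cDNA appears as a run of G. Second, the genomic string starts at -upstream_len. Reads in the other orientation would first need to be reverse-complemented. If the transcript itself begins with G, those bases are absorbed into the tail and the mapped start shifts downstream, so the result should be checked against the chromatogram. The function name, thresholds, and sequences are hypothetical.

```python
import re


def tss_from_race_read(read, genomic, upstream_len, min_g=8, probe_len=20):
    """TSS-relative position (no position 0) of the first base after the G tail.

    Assumes a sense-orientation read and a genomic string starting at
    -upstream_len.
    """
    tail = re.search("G{%d,}" % min_g, read)
    if tail is None:
        raise ValueError("no homopolymeric G tail found in read")
    probe = read[tail.end():tail.end() + probe_len]
    if len(probe) < probe_len:
        raise ValueError("too little transcript sequence after the tail")
    idx = genomic.find(probe)
    if idx == -1:
        raise ValueError("transcript start not found in genomic sequence")
    offset = idx - upstream_len  # offset 0 corresponds to +1
    return offset if offset < 0 else offset + 1


# Hypothetical genomic segment starting at -10, and a hypothetical clone read
genomic = "TTTTATATAA" + "CATCGATCGTTAGCCATGACTTAGC"
read = "ACTAGT" + "G" * 10 + "CATCGATCGTTAGCCATGACTTAGC"
print(tss_from_race_read(read, genomic, upstream_len=10))
```

Applying the same mapping to each replicate clone shows whether the replicates agree on a single start site.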


Results

3.1 Initial identification and bioinformatic analysis of the splicing factor 3b subunit 3 (SF3B3) promoter

In this experiment, the promoter of the splicing factor 3b subunit 3 (SF3B3) gene was identified and classified to determine its functional role in regulating SF3B3 protein levels. The SF3B3 putative promoter region was initially identified through the NCBI public database (https://www.ncbi.nlm.nih.gov/gene/23450). Proximal promoter elements are generally found -50 to -500 bp from the TSS (Maston et al. 2006). However, we selected a broader region to account for the possibility of unusual activity. Using the GenBank-defined TSS as a reference, the -938 to +346 region was selected for amplification and analysis.

The SF3B3 putative promoter region (-938 to +346) was first analyzed with several bioinformatics tools to identify potential regulatory elements by scanning for consensus transcription factor binding sequences. First, consensus core promoter elements were identified using the Juven-Gershon lab’s analytical software, which found TATA-box binding sequences and the corresponding B Recognition Elements (BREs). Two overlapping TATA-box binding sites were identified at -49 relative to the 5’ RACE-determined TSS: TATATATT (PWM 0.1429, consensus match 7/8) and TATATATA (PWM 0.00357, consensus match 6/8). The BRE upstream sequence GCGCGCC (PWM 1.00000) was found at -9 relative to the TSS, and the BRE downstream sequence GTATGTT (PWM 0.7126) was found at +32 relative to the TSS (Figure 1). Functional TATA-boxes are generally found approximately 30 bp upstream of the TSS in focused promoters (Liu et al. 2013). The potential TATA-box binding site in the SF3B3 putative promoter fell outside this typical range, but not outside previously observed ranges (Patikoglou et al. 1999; Dolfini et al. 2009; Zimmerman et al. 2015).

CpG islands are common features of mammalian promoters and are relevant to promoter classification, so the SF3B3 putative promoter was analyzed to determine the presence or absence of a CpG island. CpG islands are regions of the genome that contain a higher-than-expected frequency of cytosine-guanine (CpG) dinucleotides. The SF3B3 putative promoter from -938 to +346 was analyzed using the DataBase of CpG islands and Analytical Tool (DBCAT), with the observed/expected (o/e) threshold set to the highest option, 65%. DBCAT identified a large CpG island 1,284 bp in length with a GC content of 64%. DNA regions longer than 500 bp are generally considered CpG islands if they have a GC content of at least 55% and an o/e value of at least 60% (Takei et al. 2001). The SF3B3 putative promoter meets all of these requirements and can be considered a CpG island. The CpG dinucleotides are highlighted in Figure 1.
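The GC-content and o/e criteria above can be made concrete with a short calculation. The following is a minimal Python sketch. It assumes the commonly used definition o/e = (CpG count × length) / (C count × G count) and applies the thresholds quoted above to a whole candidate region. The function names and the toy sequence are hypothetical, and DBCAT's own sliding-window implementation may differ.

```python
def cpg_stats(seq: str) -> dict:
    """Return length, GC fraction, and CpG observed/expected ratio of a sequence.

    Assumes the common definition o/e = (CpG count * length) / (C count * G count);
    DBCAT's exact implementation may differ.
    """
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    gc = (c + g) / n if n else 0.0
    oe = (cpg * n) / (c * g) if c and g else 0.0
    return {"length": n, "gc": gc, "cpg_oe": oe}


def is_cpg_island(seq: str, min_len: int = 500, min_gc: float = 0.55,
                  min_oe: float = 0.60) -> bool:
    """Apply the length, GC, and o/e thresholds to a whole candidate region."""
    st = cpg_stats(seq)
    return st["length"] > min_len and st["gc"] >= min_gc and st["cpg_oe"] >= min_oe


if __name__ == "__main__":
    toy = "CG" * 300  # toy 600 bp sequence, not SF3B3 data
    print(cpg_stats(toy), is_cpg_island(toy))
```

Applied to the 1,284 bp -938 to +346 sequence, the same function would report the region's overall GC fraction and o/e ratio.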

A 5’ RACE was performed to confirm or refute the GenBank-assigned transcription start site(s) (TSS) and/or to identify additional TSSs for the SF3B3 gene.

The 5’ RACE sequences were analyzed by pairwise sequence alignment against the -938 to +346 bp SF3B3 putative promoter. All three replicates identified a single TSS matching the GenBank-defined TSS, located 49 bp downstream of the TATA-box binding site (Figure 1). While TATA-box binding sites are almost always found approximately 30 bp upstream of the TSS, TATA-box binding sites have been found 40-120 bp upstream of 5’ RACE-determined TSSs (Liu et al. 2013). The SF3B3 promoter had only a single detectable TSS and a TATA-box binding site which, although farther than usual from the TSS, was found within range of the associated BRE binding sites. These data would suggest a Type I promoter; however, the presence of a large CpG island (1,284 bp) indicates a Type III promoter (Danino et al. 2015).
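Promoter coordinates have no position 0: +1 is the TSS and -1 is the base immediately upstream. This makes off-by-one errors easy when converting between positions in the cloned 1,284 bp sequence and TSS-relative coordinates. The following is a minimal Python sketch, assuming the cloned sequence is stored as a string that begins at -938. The function names are hypothetical.

```python
REGION_START = -938  # 5' end of the cloned region (from the text)
REGION_END = 346     # 3' end of the cloned region


def index_to_coord(i: int, start: int = REGION_START) -> int:
    """0-based index in the cloned sequence -> TSS-relative coordinate.

    Promoter numbering skips 0: the base before +1 (the TSS) is -1.
    """
    c = start + i
    return c if c < 0 else c + 1


def coord_to_index(c: int, start: int = REGION_START) -> int:
    """TSS-relative coordinate -> 0-based index in the cloned sequence."""
    if c == 0:
        raise ValueError("promoter coordinates have no position 0")
    return c - start if c < 0 else c - start - 1


def bp_upstream_of_tss(site: int) -> int:
    """Distance in bp from an upstream site (e.g. the TATA-box) to the TSS."""
    return coord_to_index(1) - coord_to_index(site)


if __name__ == "__main__":
    print(coord_to_index(REGION_END) + 1)  # region length: 1284
    print(bp_upstream_of_tss(-49))         # TATA-box to TSS: 49
```

Skipping 0 in the conversion keeps the computed region length (1,284 bp) consistent with the CpG island length reported by DBCAT for the same region.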

Figure 1. The SF3B3 promoter sequence with the 5’ RACE-determined transcription start site, transcription factor binding sites, and deletion constructs. The GenBank-assigned and 5’ RACE-determined transcription start site is indicated by a red arrow and a “+1”. A bioinformatically determined TATA box is shown in bold red text. BRE upstream and BRE downstream sites are underlined. CpG sites are indicated by blue text. Colored boxes indicate consensus sequences of transcription factor binding sites (TFBSs), with the transcription factor identity labeled above each box. Blue boxes indicate TFBSs common to the results of Alibaba2 and Match. Green boxes indicate TFBSs common to the results of Alibaba2 and PROMO. Orange boxes indicate TFBSs common to the results of Match and PROMO. Red boxes indicate TFBSs common to the results of all three bioinformatics programs. The 5’ ends of the deletion constructs are indicated by a bold ^ and the corresponding numbers.

3.2 The -189 to the -35 Region of the SF3B3 Promoter is Responsible for Maximal Promoter Activity

To determine whether the putative promoter is a functional promoter sequence, a series of deletion constructs was designed. The SF3B3 -938 construct contains the -938 to +346 region of the SF3B3 promoter. The SF3B3 -938 construct was transiently transfected into human embryonic kidney cells (HEK293T), which express the SF3B3 gene (Gao et al. 2012). The HEK293T cells were harvested, and the activity of the SF3B3 -938 construct was compared with that of a promoterless control construct, pGL3_Basic. The assay revealed that the SF3B3 -938 construct had 34-fold more activity than the promoterless control, demonstrating that the isolated sequence is a functional promoter.
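Fold activity values such as the 34-fold increase above are ratios of normalized luciferase signals. The following is a minimal Python sketch, assuming that each well's firefly reading is divided by a co-transfected Renilla reading (the usual dual-luciferase normalization) and that the result is expressed relative to the mean of the promoterless pGL3_Basic wells. The readings are invented for illustration and are not data from this thesis. The function names are hypothetical.

```python
from statistics import mean
from typing import List, Tuple

Readings = Tuple[List[float], List[float]]  # (firefly, Renilla) per replicate well


def normalized(readings: Readings) -> List[float]:
    """Firefly / Renilla ratio for each replicate (corrects for transfection efficiency)."""
    firefly, renilla = readings
    return [f / r for f, r in zip(firefly, renilla)]


def fold_per_replicate(construct: Readings, control: Readings) -> List[float]:
    """Each replicate's ratio divided by the mean ratio of the promoterless control."""
    baseline = mean(normalized(control))
    return [x / baseline for x in normalized(construct)]


def fold_activity(construct: Readings, control: Readings) -> float:
    """Mean fold activity of a construct relative to the promoterless control."""
    return mean(fold_per_replicate(construct, control))


if __name__ == "__main__":
    # Hypothetical luminometer readings, three wells each (not thesis data).
    pgl3_basic = ([100.0, 110.0, 90.0], [1000.0, 1000.0, 1000.0])
    sf3b3_938 = ([3300.0, 3400.0, 3500.0], [1000.0, 1000.0, 1000.0])
    print(round(fold_activity(sf3b3_938, pgl3_basic), 1))
```

The per-replicate fold values are kept separate from their mean so that they can be passed directly to a significance test.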

To determine which regions are essential for SF3B3 promoter activity, a series of deletion constructs was designed based on the locations of potential transcription factor binding sites (TFBSs). Potential TFBSs were identified using the TFBS consensus sequence similarity algorithms Alibaba 2.1, MATCH, and PROMO (Figure 1) (Saunders et al. 2015). Sixteen potential TFBSs were identified: two Oct-01 sites, three C-ets-1 sites, one HNF-1c site, one C/EBPa site, three Elk-1 sites, one NF-kB site, three AP-2aA sites, and one E2F site. One Elk-1 site and one C-ets-1 site were identified by all three algorithms: Alibaba 2.1, MATCH, and PROMO

(Figure 1). Based on these data, six deletion constructs were designed by removing the regions upstream of -728, -649, -391, -257, -189, and -35 relative to the 5’ RACE-determined TSS (Figure 1). Each construct is designated by the furthest upstream bp it contains; for example, the SF3B3 -728 construct consists of the -728 to +346 region of the SF3B3 promoter. The SF3B3 -728 construct removed a cluster of 7 TFBSs in the -938 to -728 bp region. The SF3B3 -391 construct removed three TFBSs located between -600 and -500 bp. The SF3B3 -257 construct removed a smaller region containing overlapping C-ets-1 and Elk-1 sites. The SF3B3 -35 construct removed an Elk-1 site, an AP-2aA site, and the TATA-box binding site. The SF3B3 -649 and SF3B3 -189 constructs did not remove any unique potential TFBSs and were selected for use as controls. The specific sequences of each deletion are detailed in Figure 1.
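The logic of the deletion series is that a construct starting at position s keeps every site at or downstream of s. The following is a minimal Python sketch that tracks this for the three sites whose positions are stated numerically in the text. It assumes that each site is represented by a single reported coordinate, so a site that straddles a deletion endpoint is treated only by that coordinate. The other predicted TFBSs are omitted because their coordinates are not listed here, and the function names are hypothetical.

```python
# 5' ends of the constructs and the three TFBS positions stated in the text.
# Other predicted TFBSs are omitted because their coordinates are not given here.
CONSTRUCT_STARTS = [-938, -728, -649, -391, -257, -189, -35]
SITES = {"Elk-1": -82, "TATA-box": -49, "AP-2aA": -37}


def retained(start: int, sites: dict = SITES) -> list:
    """Sites kept by a construct spanning start..+346 (site position >= start)."""
    return sorted(name for name, pos in sites.items() if pos >= start)


def lost_between(prev_start: int, next_start: int, sites: dict = SITES) -> list:
    """Sites removed when the 5' end moves from prev_start to next_start."""
    return sorted(name for name, pos in sites.items() if prev_start <= pos < next_start)


if __name__ == "__main__":
    for prev, nxt in zip(CONSTRUCT_STARTS, CONSTRUCT_STARTS[1:]):
        lost = lost_between(prev, nxt) or "none of the listed sites"
        print(f"{prev} -> {nxt}: lost {lost}")
```

Only the -189 to -35 step removes any of the three listed sites, which is why that step is the informative comparison in the deletion series.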

All deletion constructs, along with the original SF3B3 -938 construct and the promoterless control construct, were transiently transfected into HEK293T cells and assessed by dual-luciferase assay (Figure 2). Five of the six deletion constructs (SF3B3 -728, -649, -391, -257, and -189) showed promoter activity comparable to that of the SF3B3 -938 construct. The SF3B3 -35 construct was the only construct with significantly lower activity than the SF3B3 -938 construct; in fact, it was not significantly higher than the promoterless control (Figure 2). The significant drop in expression from the SF3B3 -189 construct to the SF3B3 -35 construct suggested that this intervening region was responsible for maximal promoter activity. Three consensus TFBSs were previously noted in the SF3B3 -189 to -35 bp region: an AP-2aA site (-37), an Elk-1 site (-82), and the TATA-box (-49) (Figure 1). These TFBSs were analyzed further to determine whether they were responsible for the activity of the SF3B3 -189 to -35 region.
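The comparisons above rely on a two-tailed Student's t-test at p < 0.05. The following is a minimal, dependency-free Python sketch of the pooled-variance (equal-variance) test. It compares |t| against standard two-tailed critical values for small degrees of freedom rather than computing an exact p-value. The replicate values are hypothetical. In practice, `scipy.stats.ttest_ind(a, b)`, whose default `equal_var=True` gives the same Student's test, returns the p-value directly.

```python
from math import sqrt
from statistics import mean, variance

# Two-tailed critical values of Student's t at alpha = 0.05, indexed by degrees of freedom.
T_CRIT_05 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def student_t(a, b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, df


def differs_at_05(a, b) -> bool:
    """True if |t| exceeds the two-tailed critical value, i.e. p < 0.05."""
    t, df = student_t(a, b)
    return abs(t) > T_CRIT_05[df]


if __name__ == "__main__":
    # Hypothetical fold-activity replicates (not thesis data).
    control = [1.0, 1.1, 0.9]
    full_length = [30.0, 34.0, 38.0]
    print(differs_at_05(full_length, control))  # True
```

The lookup table limits this sketch to experiments with at most 12 replicates in total. Beyond that, a library implementation is the better choice.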


Figure 2. Transcriptional activity of the GenBank-assigned SF3B3 promoter region. The SF3B3 -938 through -189 constructs show maximal luciferase activity of the SF3B3 promoter region. Transient transfections of human embryonic kidney (HEK293T) cells were performed with the promoterless control, SF3B3 -938, and the SF3B3 -728, -649, -391, -257, -189, and -35 deletion constructs; the extent of each construct is represented by the bar on the left-hand side of the graph. The yellow box labeled Luc represents the luciferase gene. All activities are shown as fold activity relative to the promoterless control. A two-tailed Student's t-test was used to determine whether fold values were significantly different (p < 0.05). All constructs were significantly different from the promoterless control except for the SF3B3 -35 construct. Only the SF3B3 -35 construct was significantly different from the SF3B3 -938 construct, and the SF3B3 -35 construct was significantly different from all other constructs.

3.3 The TATA-box, Elk-1, and AP-2aA are conserved

As shown in Figure 2, the -189 to -35 region, which contains three TFBSs (the AP-2aA (-37), Elk-1 (-82), and TATA-box (-49) sites), is necessary for maximal promoter activity. To determine whether these sites were evolutionarily conserved, a multiple sequence alignment (MSA) of the SF3B3 -150 to -35 bp promoter region was performed using ClustalX, free MSA software available from the European Bioinformatics Institute (EMBL-EBI). This alignment revealed that the SF3B3 -150 to -35 bp region was highly conserved, with 70.39% conservation on average among the species, ranging from 100% to 56.04% (Figure 3). The flanking areas of the promoter region ranged from 61.65% to 66.86% conservation on average (data not shown). More importantly, the AP-2aA (-37), Elk-1 (-82), and TATA-box (-49) TFBSs showed high conservation among related species (Figure 3). The AP-2aA site showed the highest conservation, at 85.61%, and all species had what appeared to be a functional AP-2aA binding site. The Elk-1 site showed the lowest conservation of the three, at 59.25% on average. The five species that did not appear to have a functional Elk-1 binding site were the zebrafish, Norway rat, domestic cat, dog, and house mouse. The TATA-box binding site, TATA(A/T)A(A/T)(A/G), was present within this region in all species except the zebrafish and Chinese hamster and was 80.55% conserved on average. Sequences matching the TFBSs according to the JASPAR binding motif database (http://jaspar.genereg.net/) are highlighted in Figure 3, and the bases removed for the deletions are shown in bold in the human sequence.
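The thesis does not state exactly how the conservation percentages were computed. One plausible definition is the mean percent identity of each species to the human sequence over a window of aligned columns. The following is a minimal Python sketch of that assumed definition, using a toy alignment rather than the ClustalX output. It also uses one of several reasonable gap conventions. The function names are hypothetical.

```python
def percent_identity(ref: str, other: str) -> float:
    """Percent of aligned columns where both sequences share the same base.

    Columns that are gaps in both sequences are skipped; a gap opposite a
    base counts as a mismatch (one of several reasonable conventions).
    """
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have equal length")
    cols = [(a, b) for a, b in zip(ref.upper(), other.upper())
            if not (a == "-" and b == "-")]
    if not cols:
        return 0.0
    same = sum(1 for a, b in cols if a == b and a != "-")
    return 100.0 * same / len(cols)


def mean_identity(alignment: dict, ref_name: str, start: int = 0, end=None) -> float:
    """Average identity of every other sequence to ref_name over columns start:end."""
    ref = alignment[ref_name][start:end]
    others = [s[start:end] for name, s in alignment.items() if name != ref_name]
    return sum(percent_identity(ref, s) for s in others) / len(others)


if __name__ == "__main__":
    # Toy alignment of a TATA-like window; not the thesis alignment.
    toy = {"human": "TATATATA", "chimp": "TATATATA",
           "mouse": "TATAAATA", "zebrafish": "TA--TATA"}
    print(mean_identity(toy, "human"))  # 87.5
```

Restricting `start:end` to the columns of a single TFBS gives a per-site value comparable in spirit to the AP-2aA, Elk-1, and TATA-box percentages above. Whether it matches them exactly depends on how the original values were computed.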

Figure 3. Multiple sequence alignment of the TATA box and nearby region. The AP-2, Elk-1, and TATA box sites are all highly conserved in primates. A multiple sequence alignment of the region near the three binding sites was performed. The AP-2aA site is highlighted in green, and the region removed is shown in bold. The Elk-1 site is highlighted in orange, and the base pairs removed are shown in bold. The TATA-box binding site is highlighted in blue, and the region removed is shown in bold. A sequence matching an Elk-1 site is underlined in the Norway rat within the AP-2 box.

3.4 Phylogenetic analysis of the -254 to the -1 bp region of the SF3B3 promoter

While non-coding regions are generally subject to less selective pressure than coding regions, promoter regions tend to be conserved across broad phylogenetic groups (Doniger and Fay, 2007) (Grade et al. 2017). To better illustrate the evolutionary conservation shown in the multiple sequence alignment (MSA), a phylogenetic tree of 13 species was constructed from the MSA. A bootstrap consensus tree was generated using the Maximum Likelihood method and the Tamura-Nei model (Tamura et al. 1993). This tree was constructed using 13 species: Homo sapiens, Pan troglodytes, Macaca mulatta, Papio anubis, Bos taurus, Canis lupus familiaris, Felis catus, Rattus norvegicus, Mus musculus, Peromyscus leucopus, Cricetulus griseus, and Danio rerio as the outgroup (Figure 4). This tree was compared with a tree constructed using the protein-coding sequence of SF3B3 (Supplemental Figure 1); the two trees differed by one node in the placement of the Artiodactyla relative to the primates. The primates were grouped into one clade, with the human and chimpanzee in one subgroup and the rhesus monkey and olive baboon in the other. All bootstrap confidence values between species were above 80%. The lowest was 86%, between the Chinese hamster and the white-footed mouse, and the highest was 99%, between the human and the chimpanzee. The Artiodactyla and the Carnivora were placed into one group, with zebu cattle and cattle in one subgroup and the domestic cat and dog in the other. The Rodentia were placed into their own group, with the Norway rat and house mouse as one subgroup and the white-footed mouse and Chinese hamster as the other. The zebrafish was used as an outgroup to root the tree. With the exception of the minor change in the Artiodactyla, all of the species used to make the tree were organized into the expected groups based on a tree made using the SF3B3 protein-coding region (Supplemental Figure 1).
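The maximum-likelihood tree search in MEGA X is well beyond a short example. However, the Tamura-Nei (TN93) substitution model that it used can be illustrated through the model's pairwise distance formula (Tamura et al. 1993). The following is a minimal Python sketch that computes the TN93 distance between two aligned sequences. It assumes that base frequencies are pooled over both sequences and that columns containing gaps or ambiguous bases are skipped. It is an illustration of the model, not the procedure MEGA X used to build Figure 4, and the function name is hypothetical.

```python
from math import log

PURINES = {"A", "G"}


def tn93_distance(s1: str, s2: str) -> float:
    """Tamura-Nei (1993) distance between two aligned DNA sequences."""
    pairs = [(a, b) for a, b in zip(s1.upper(), s2.upper())
             if a in "ACGT" and b in "ACGT"]  # skip gaps/ambiguous columns
    n = len(pairs)
    if n == 0:
        raise ValueError("no comparable sites")
    counts = {base: 0 for base in "ACGT"}  # base frequencies pooled over both sequences
    for a, b in pairs:
        counts[a] += 1
        counts[b] += 1
    pA, pC, pG, pT = (counts[base] / (2 * n) for base in "ACGT")
    if min(pA, pC, pG, pT) == 0:
        raise ValueError("TN93 needs all four bases present")
    pR, pY = pA + pG, pC + pT
    p1 = sum(1 for a, b in pairs if {a, b} == {"A", "G"}) / n  # A<->G transitions
    p2 = sum(1 for a, b in pairs if {a, b} == {"C", "T"}) / n  # C<->T transitions
    q = sum(1 for a, b in pairs if (a in PURINES) != (b in PURINES)) / n  # transversions
    w1 = 1 - pR * p1 / (2 * pA * pG) - q / (2 * pR)
    w2 = 1 - pY * p2 / (2 * pC * pT) - q / (2 * pY)
    w3 = 1 - q / (2 * pR * pY)
    if min(w1, w2, w3) <= 0:
        raise ValueError("sequences too divergent for TN93")
    return (-(2 * pA * pG / pR) * log(w1)
            - (2 * pC * pT / pY) * log(w2)
            - 2 * (pR * pY - pA * pG * pY / pR - pC * pT * pR / pY) * log(w3))


if __name__ == "__main__":
    human_like = "ACGT" * 5                  # toy sequences, not SF3B3 data
    other = "GCATATGC" + "ACGT" * 3
    print(tn93_distance(human_like, other))
```

When base frequencies are equal and A↔G and C↔T transitions occur in equal proportions, the formula reduces to the Kimura two-parameter distance, which provides a quick sanity check.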


Figure 4. Phylogenetic tree constructed with the SF3B3 promoter. A bootstrap consensus phylogenetic tree of the -1 to -245 region of the SF3B3 promoter was produced using Molecular Evolutionary Genetics Analysis (MEGA X) software, with a maximum likelihood bootstrap analysis of 500 replicates. Bootstrap support values are shown at branch points as percentages (0-100). The tree was rooted using zebrafish as the outgroup. A second tree was constructed using the protein-coding sequence of SF3B3 and was used to determine the expected groupings. With the exception of the Artiodactyla, which was grouped as its own branch sister to the primates, all species were organized into the expected groups.

3.5 The SF3B3 promoter is driven by a TATA-box, an Elk-1 site, and an AP-2aA site

Our analysis of the promoter region showed that the SF3B3 -189 to -35 bp region was necessary for maximal promoter activity and that the SF3B3 -189 construct was sufficient for it (Figure 2). However, the initial tests did not determine which specific sequences within this region were responsible for this activity. The potential TFBS locations (Figure 1) and the MSA (Figure 3) identified an AP-2aA (-37) site, an Elk-1 (-82) site, and the TATA-box (-49) as targets for a second set of transfections. A new set of deletions was designed to specifically disrupt these TFBSs without altering the rest of the SF3B3 -938 bp construct, to determine whether these sites were responsible for SF3B3 promoter activity. As in the first set of deletions, primers were designed to flank the sequences to be removed, PCR by exclusion was performed, and the mini-deletion constructs were confirmed by sequencing. Each new deletion removed 2-3 bp of the targeted TFBS motif but was otherwise identical to the SF3B3 -938 bp promoter construct. For the first deletion, the CTT of the AP-2aA (-37) site, which has the sequence CTTCAGGCGCG, was deleted. For the second deletion, the GA of the Elk-1 (-82) binding site CGGAAGCA was deleted to disrupt the Elk-1 core binding sequence. Finally, for the TATA-box (-49), which has the sequence TATATATAT, three bases (TAT) were deleted to disrupt the core binding sequence.
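The consensus notation used in this thesis (for example TATA(A/T)A(A/T)(A/G), with N for any base) maps directly onto regular expressions. This makes it easy to check whether a mini-deletion removes a motif match. The following is a minimal Python sketch of that check. Because the Elk-1 site is listed here without its flanking bases, the example scans for the invariant core CGGAAG of the stated Elk-1 consensus rather than the full 8-mer. The function names are hypothetical, and this simple pattern match is not the position-weight-matrix scoring used by JASPAR or the other tools.

```python
import re


def consensus_to_regex(consensus: str) -> str:
    """Turn notation like 'TATA(A/T)A(A/T)(A/G)' into a regular expression.

    '(X/Y)' becomes the character class [XY]; N matches any base.
    """
    pattern = re.sub(r"\(([ACGT](?:/[ACGT])+)\)",
                     lambda m: "[" + m.group(1).replace("/", "") + "]",
                     consensus.upper())
    return pattern.replace("N", "[ACGT]")


def find_sites(consensus: str, seq: str) -> list:
    """Return (0-based start, matched text) for every, possibly overlapping, match."""
    rx = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [(m.start(), m.group(1)) for m in rx.finditer(seq.upper())]


if __name__ == "__main__":
    print(find_sites("TATA(A/T)A(A/T)(A/G)", "TATATATAT"))  # TATA-box site
    print(find_sites("CGGAAG", "CGGAAGCA"))  # wild-type Elk-1 site
    print(find_sites("CGGAAG", "CGAGCA"))    # same site with GA deleted
```

The lookahead wrapper allows overlapping matches to be reported, which matters for repetitive sequences such as TATATATAT.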

These deletion constructs for AP-2aA (-37), Elk-1 (-82), Elk-1/AP-2aA, and the TATA-box (-49) were transfected into HEK293T cells, and relative transcriptional activity was again analyzed by dual-luciferase assay (2.18). The SF3B3 -938, -189, AP-2aA (-37), Elk-1 (-82), Elk-1/AP-2aA, TATA-box (-49), and SF3B3 -35 constructs were compared with the promoterless control (Figure 5). As expected, the SF3B3 -189 and SF3B3 -938 constructs both showed maximal levels of transcription, and the SF3B3 -35 construct was not significantly different from the promoterless control. Compared with the unmutated SF3B3 -938 bp construct, removal of the AP-2aA (-37) binding site caused a significant decrease, as did removal of the Elk-1 (-82) binding site. Surprisingly, removal of both the AP-2aA (-37) and Elk-1 (-82) sites did not result in a significant drop compared with removal of either site alone. Removal of the TATA-box (-49) resulted in minimal levels of transcription (Figure 5), as would be expected for the removal of a functional TATA-box. These data indicate that the AP-2aA and Elk-1 sites are both necessary for maximal levels of SF3B3 transcription and that the TATA-box is necessary for transcription initiation (Gregory et al. 1991).


Figure 5. Transcriptional activity of the SF3B3 promoter region. Two to three bp were removed from the core binding motifs of AP-2aA (-37), NNN(G/C/T)C(C/T)(T/C)NN(G/A)G(G/C)NN; Elk-1, (C/G)CGGAAG(T/C); and the TATA-box, TATA(A/T)A(A/T)(A/G). The AP-2aA (-37) and Elk-1 (-82) constructs both had significantly lower activity than the SF3B3 -938 construct, and the TATA-box (-49) construct was not significantly different from the promoterless control. Transient transfections were performed using the AP-2aA (-37), Elk-1 (-82), Elk-1/AP-2aA, TATA-box (-49), SF3B3 -35, promoterless control, and SF3B3 -938 constructs. The colored boxes represent the transcription factor binding sites, and the black bars indicate that a transcription factor binding site was deleted. All activities are shown as fold activity relative to the promoterless control. A two-tailed Student's t-test was used to determine significance (p < 0.05). The AP-2aA (-37), Elk-1 (-82), and Elk-1/AP-2aA constructs were significantly different from the promoterless control and significantly lower than the SF3B3 -938 construct. The TATA-box (-49) removal construct and the SF3B3 -35 construct were both not significantly different from the promoterless control.


Discussion

In this study, the promoter of the splicing factor 3b subunit 3 gene (SF3B3) was identified and classified to determine its role in the regulation of SF3B3. Analysis of the SF3B3 -938 to +346 region revealed a TATA-box with the corresponding B recognition element (BRE) binding sites, as well as a large CpG island. A 5’ RACE showed a single transcription start site 49 bp downstream of the TATA-box binding site, matching the GenBank-defined TSS. Transient transfection of SF3B3 promoter constructs showed that the SF3B3 -189 to -35 region is responsible for all detectable transcriptional activity in the region. A multiple sequence alignment (MSA) of the SF3B3 -100 to -35 region revealed that the three transcription factor binding sites (TFBSs) in the region, an AP-2aA (-37) site, an Elk-1 (-82) site, and the TATA-box (-49) site, were all conserved. The second round of transient transfections showed that the AP-2aA (-37) and Elk-1 (-82) binding sites both increase levels of transcription and that the TATA-box binding site is necessary for transcription. As expected, these data suggest a Type III promoter with a single TSS, driven by a TATA-box and regulated by Elk-1 and AP-2aA (Danino et al. 2015).

Splicing factor 3b subunit 3 (SF3B3) is a splicing factor gene; splicing factors regulate the processing of transcripts from a large number of other genes. More specifically, SF3B3 encodes a spliceosomal protein subunit that plays a role in the assembly of the 17S U2 snRNP, which is necessary for splicing (Das et al. 1999). The growing recognition of splicing factors in recent years has led to a greater understanding of their functions and of their role in the development of diseases, especially cancer (Grosso et al. 2008) (Gökmen et al. 2014). Splicing factors, including SF3B3, have even been considered potential therapeutic targets (Polar et al. 2014). This growing importance has brought a greater understanding of their function and regulation (Gonclaves and Jordan, 2014); however, very little work has been done on the pre-transcriptional regulation of splicing factors (Das et al. 2012).

The SF3B3 promoter region was initially identified using the NCBI GenBank public database. The -938 to +346 region, relative to the GenBank-assigned start site, was analyzed to classify the promoter. A large CpG island (1,284 bp) was found in the region, along with a TATA-box binding site. In the traditional method of promoter classification, promoters are classified either as focused promoters, defined as having a single TSS and lacking CpG islands, or as dispersed promoters, which have multiple TSSs and contain CpG islands (Haberle and Lenhard, 2016). Promoters can also be classified into three Types: Type I promoters contain a TATA box and lack CpG islands; Type II promoters contain CpG islands with dispersed transcription start sites (TSSs); and Type III promoters have features of both Type I and Type II promoters (Danino et al. 2012).

The SF3B3 promoter contains a TATA-box binding site and a single transcription start site, which would suggest a Type I promoter. However, it also contains a large CpG island (>1,000 bp), which would categorize it as a Type III promoter. CpG islands are more commonly found with dispersed modes of transcription, such as those of Type II promoters. However, in mammals, Type III promoters are found with large CpG islands (>500 bp) (Danino et al. 2015), consistent with the large CpG island (1,284 bp) found at the SF3B3 promoter. The presence of both a TATA-box binding site and a CpG island classifies the SF3B3 promoter as a Type III promoter.
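The classification argument above is essentially a small decision rule. The following is a minimal Python sketch of that rule as summarized in this thesis. It is a simplified reading for illustration, not a published classification algorithm, and the class, field names, and 500 bp default threshold (taken from the text) are assumptions.

```python
from dataclasses import dataclass


@dataclass
class PromoterFeatures:
    has_tata: bool
    cpg_island_bp: int  # length of the overlapping CpG island, 0 if none
    n_tss: int          # number of distinct TSSs detected


def classify_promoter(f: PromoterFeatures, large_island_bp: int = 500) -> str:
    """Simplified Type I/II/III rules as summarized in the text."""
    has_cpg = f.cpg_island_bp > 0
    if f.has_tata and not has_cpg:
        return "Type I"          # TATA box, no CpG island
    if f.has_tata and f.cpg_island_bp > large_island_bp:
        return "Type III"        # features of both Type I and Type II
    if has_cpg and f.n_tss > 1:
        return "Type II"         # CpG island with dispersed TSSs
    return "unclassified"


if __name__ == "__main__":
    sf3b3 = PromoterFeatures(has_tata=True, cpg_island_bp=1284, n_tss=1)
    print(classify_promoter(sf3b3))  # Type III
```

The rule checks the large-island condition before dispersed TSSs, mirroring the argument that a TATA-box combined with a >500 bp CpG island places SF3B3 in Type III even though it has only one TSS.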

After the initial bioinformatics analysis, we performed a 5’ RACE to determine the actual TSS. As expected, we identified a single TSS matching the location of the GenBank-defined TSS. A TATA-box binding site was found in the SF3B3 promoter region 49 bp upstream of the 5’ RACE-determined TSS. Surprisingly, this is outside the usual range of ~30 bp (Patikoglou et al. 1999). However, TATA-box binding sites have been found 40-120 bp from the TSS (Liu et al. 2013) (Dolfini et al. 2009). Although the TATA-box binding site was outside the normal range relative to the TSS, it was found near the associated BRE sites that accompany TATA-box binding sites. While Type III promoters do contain core promoter motifs, TATA-boxes are usually indicative of Type I promoters and are rarely found associated with CpG islands (Danino et al. 2015). A functional TATA-box would therefore make the SF3B3 promoter highly unusual, and our tests indicate that this does appear to be the case.

Transcription factors are proteins that regulate transcriptional activity by binding to TFBSs within the proximal promoter region (Sanyal et al. 2017). Potential TFBSs were identified using the bioinformatics tools Alibaba 2.1, MATCH, and PROMO, which identify TFBSs through sequence similarity (Saunders et al. 2015). The first series of deletions was designed to determine which of these potential TFBSs may contribute to promoter activity. Seven constructs were created, with 5’ ends at 938, 728, 649, 391, 257, 189, and 35 bp upstream of the 5’ RACE-determined TSS. Each construct is designated by the furthest upstream bp it contains, and each extends downstream to +346. These constructs were transiently transfected into HEK293T cells, and relative luciferase expression was measured by dual-luciferase assay to compare promoter activity. Of the seven constructs tested (SF3B3 -938, -728, -649, -391, -257, -189, and -35), all except the SF3B3 -35 construct showed maximal levels of transcriptional activity. The only significant drop was in the SF3B3 -35 construct, and, surprisingly, its activity was not significantly different from that of a promoterless control. These data suggest that, within the tested region, the SF3B3 -189 to -35 region is solely responsible for maximal promoter activity in SF3B3. Our initial bioinformatics analysis identified three TFBSs in the SF3B3 -189 to -35 region: an AP-2aA (-37) site, an Elk-1 (-82) site, and the TATA-box (-49) site. We targeted these sites for further analysis, as they seemed likely to be responsible for the activity in the region.

While non-coding regions are generally subject to less selective pressure than coding regions, promoter regions tend to show some conservation over broad phylogenetic groups (Doniger and Fay, 2007). More specifically, mutations within functional TFBSs are selected against, leading to a high degree of conservation within TFBSs but not in intervening sequences (Grade et al. 2017). A multiple sequence alignment (MSA) of the region containing the TFBSs believed to contribute to transcription was performed to aid in the design of the mutated TFBS constructs. An MSA of the SF3B3 -100 to -35 region showed that all three sites were conserved. The AP-2aA site had the highest conservation (85.61%), and the core motif of an AP-2aA binding site appeared to be present in all 13 species, including the outgroup. The Elk-1 site had the lowest conservation (59.25%), and 8 of the 13 species had what appeared to be a functional Elk-1 site. The TATA-box binding site was 80.55% conserved, and 11 of the 13 species had a core TATA-box binding site motif. All three of these TFBSs were found in the only region of the promoter observed to contribute to transcriptional activity; this, coupled with the conservation of these sites, suggests that they are functionally relevant (Grade et al. 2017).

Based on the bioinformatics and genetic analyses, TFBS mutations were designed to test the functionality of the AP-2aA (-37), Elk-1 (-82), and TATA-box (-49) sites. The TFBS mutations were created by removing 2-3 bp of the targeted TFBS motif within the otherwise wild-type SF3B3 -938 to +346 region. Compared with the SF3B3 -938 construct, mutation of the AP-2aA (-37) binding site resulted in a significant drop in transcription, as did mutation of the Elk-1 (-82) binding site. Surprisingly, mutation of both the AP-2aA (-37) and Elk-1 (-82) sites did not result in a significant drop compared with mutation of either site individually. Mutation of the TATA-box (-49) resulted in no detectable transcriptional activity, as would be expected for the mutation of a functional TATA-box. It is unusual to find a TATA-box binding site outside the typical range for humans (~30 bp) (Patikoglou et al. 1999). However, as stated earlier, it is not unheard of (Liu et al. 2013) (Dolfini et al. 2009). These data indicate that the AP-2aA and Elk-1 sites are both necessary for maximal levels of SF3B3 transcription and that the TATA-box is necessary for transcription initiation (Gregory et al. 1991).

As shown by the TFBS mutagenesis, both the AP-2aA and Elk-1 sites increase levels of transcription. Activating Enhancer Binding Protein 2 Alpha (AP-2aA) is a transcriptional activator and repressor (Eckert et al. 2005) that has been linked to genes involved in development (Eversheim et al. 2000). AP-2aA has the consensus sequence ANNN(G/C/T)C(C/T)(T/C)NN(G/A)G(G/C)NN. ETS Transcription Factor ELK1 (Elk-1) is a well-studied ETS family transcription factor (Yang et al. 1998) that increases levels of transcription, binds to the sequence (C/G)CGGAAG(T/C) (Hung et al. 2010), and often works with other transcription factors (Yang et al. 1998). In one study, MAP kinase activation of Elk-1 resulted in activation of the p300 co-activator in HEK293T cells (Yang et al. 1998) (Li et al. 2003). In a second study of enhancer activity in a number of human hormones, Elk-1 and AP-2, the TFs found in this study to regulate SF3B3, were shown to be associated with one another (Jin et al. 2004). SF3B3 promoter activity was demonstrated in human embryonic kidney cells (HEK293T), which express SF3B3 (Gao et al. 2012). The transfection data indicate that SF3B3 expression in these cells depends on the AP-2aA and Elk-1 binding sites. However, because SF3B3 is expressed in most tissue types, different modes of transcription may occur in other tissues (Polar et al. 2014).

The TATA-box is a core promoter element that serves as the site of preinitiation complex formation, without which transcription does not take place (Star and Hawley, 1991). The TATA-box is highly conserved in eukaryotes and archaea; however, in humans, only about 30% of promoters contain a TATA-box (Deng et al. 2005). In the initial analysis of the promoter region, a TATA-box binding site was identified at -49 bp from the TSS, which was experimentally determined from 3 replicates of a 5’ RACE. This position is outside the typical range (~30 bp), but not outside observed ranges (Patikoglou et al. 1999) (Dolfini et al. 2009). The SF3B3 TATA-box binding site was highly conserved even among species that are not closely related. Not surprisingly, removal of the TATA-box binding site by site-specific mutagenesis resulted in no detectable transcriptional activity, as would be expected for a core promoter element such as the TATA-box. The TSS was determined from several rounds of 5’ RACE, all of which showed a single TSS in the same location, matching the GenBank-defined TSS. Although slightly unusual, these data support a TATA-box binding site driving transcription initiation in the SF3B3 promoter region.

The atypical distance between the TSS and the TATA-box has been observed several times before, although it is rare in humans (Patikoglou et al. 1999) (Dolfini et al. 2009).

The SF3B3 promoter sequence is unusual in several ways: it contains both a CpG island and a TATA-box, which until recently were thought to be mutually exclusive (Danino et al. 2015). In the newer classification system, TATA-boxes are still considered indicative of Type I promoters. However, Type III promoters are less common and, for this reason, less well studied. A study of the Hugl-2 promoter revealed a surprising number of similarities to the SF3B3 promoter (Zimmerman et al. 2007): both contain CpG islands, an Elk-1 site, and a TATA-box binding site. Even more interestingly, the Hugl-2 TATA-box binding site was also outside the normal range, at -48 bp from the TSS (Zimmerman et al. 2015), nearly the exact distance seen in the SF3B3 promoter (-49 bp). The only difference between the two is that the Hugl-2 promoter contains an Sp-1 site and no AP-2aA site. Hugl-2 is responsible for, among other things, cell differentiation and cell migration (Cao et al. 2015). The unusual similarity between the SF3B3 and Hugl-2 promoters may indicate a pattern in poorly studied Type III promoters that contain TATA-boxes.

In this study, the putative SF3B3 promoter was identified, isolated, and analyzed using multiple sequence alignment, and promoter activity was measured using a series of transient transfections.

The data gathered from both phylogenetic and genetic analyses revealed that the SF3B3 promoter region is a Type III promoter containing a large CpG island, with a single TSS initiated by an evolutionarily conserved TATA-box binding site, and that the AP-2aA and Elk-1 binding sites are necessary for maximal promoter activity (Danino et al. 2012) (Gregory et al. 1991). Splicing factors and other regulators have become the subject of greater study as their importance has become clearer (Grosso et al. 2008) (Gökmen et al. 2014). However, the pre-transcriptional regulation of these regulators has not been well documented (Das et al. 2012). Furthermore, Type III promoters are not well studied, and the similarities between Hugl-2 and SF3B3 suggest possible patterns in TATA-box placement in this promoter type (Danino et al. 2015) (Zimmerman et al. 2015). Further study of this promoter and the promoters of other splicing factors will improve our understanding of the typical modes of transcription that regulate them.


References

A, G. (2016). TATA element recognition by the TATA box-binding protein has been conserved throughout evolution. 3217–3230.

Aguet, F., Ardlie, K. G., Cummings, B. B., Gelfand, E. T., Getz, G., Hadley, K., … Montgomery, S. B. (2017). Genetic effects on gene expression across human tissues. Nature, 550(7675), 204–213. https://doi.org/10.1038/nature24277

Akerman, M., Krainer, A. R., Das, S., & Anczuko, O. (2011). Oncogenic Splicing Factor SRSF1 Is a Critical Transcriptional Target of MYC. https://doi.org/10.1016/j.celrep.2011.12.001

Allen, S. J., Watson, J. J., Shoemark, D. K., Barua, N. U., & Patel, N. K. (2013). GDNF, NGF and BDNF as therapeutic options for neurodegeneration. Pharmacology & Therapeutics, 138(2), 155–175. https://doi.org/10.1016/j.pharmthera.2013.01.004

Bikard, D., Euler, C., Jiang, W., Nussenzweig, P. M., Goldberg, G. W., Duportet, X., … Biotechnol, N. (2014). Development of sequence-specific antimicrobials based on programmable CRISPR-Cas nucleases. Nat Biotechnol, 32(11), 1146–1150. https://doi.org/10.1038/nbt.3043

Bird, A. (2002). Epigenetic Memory. Genes and Development, 16, 16–21. https://doi.org/10.1101/gad.947102.6

Brown, R. C., Lockwood, A. H., & Sonawane, B. R. (2005). Neurodegenerative diseases: an overview of environmental risk factors. Environmental Health Perspectives, 113(9), 1250–1256. https://doi.org/10.1289/EHP.7567

Brown, R. D., O’neill, S. P. T., & Gillespie, T. J. (1994). An evaluation of the solar radiant environment in the shade of deciduous trees. Arboricultural Journal, 18(2), 193–204. https://doi.org/10.1080/03071375.1994.9747015

Cao, F., Miao, Y., Xu, K., & Liu, P. (2015). Lethal (2) Giant Larvae: An Indispensable Regulator of Cell Polarity and Cancer Development. 11(2). https://doi.org/10.7150/ijbs.11243

Chem, C. B. I. O. (2013). Comparison of Splicing Factor 3b Inhibitors in Human. 43210, 49–52. https://doi.org/10.1002/cbic.201200558

Co, W. (n.d.). Laboratory 1. 1–9.

Corti, S., Faravelli, I., Cardano, M., & Conti, L. (2015). Human pluripotent stem cells as tools for neurodegenerative and neurodevelopmental disease modeling and drug discovery. Expert Opinion on Drug Discovery, 10(6), 615–629. https://doi.org/10.1517/17460441.2015.1037737

D’Addario, C., Dell’Osso, B., Palazzo, M. C., Benatti, B., Lietti, L., Cattaneo, E., … Altamura, A. C. (2012). Selective DNA methylation of BDNF promoter in bipolar disorder: Differences among patients with BDI and BDII. Neuropsychopharmacology, 37(7), 1647–1655. https://doi.org/10.1038/npp.2012.10

Danino, Y. M., Even, D., Ideses, D., & Juven-Gershon, T. (2015). The core promoter: At the heart of gene expression. Biochimica et Biophysica Acta - Gene Regulatory Mechanisms, 1849(8), 1116–1131. https://doi.org/10.1016/j.bbagrm.2015.04.003

Danino, Y. M., Even, D., Ideses, D., & Juven-gershon, T. (2015). NU. BBA - Gene Regulatory

Mechanisms. https://doi.org/10.1016/j.bbagrm.2015.04.003

Das, B. K., Xia, L., Palandjian, L., Gozani, O., Chyung, Y., & Reed, R. (2015). Characterization

of a Protein Complex Containing Spliceosomal Proteins SAPs 49, 130, 145, and 155.

Molecular and Cellular Biology, 19(10), 6796–6802.

https://doi.org/10.1128/mcb.19.10.6796

Deaton, A. M., & Bird, A. (2011). CpG islands and the regulation of transcription. Genes & Development, 25(10), 1010–1022. https://doi.org/10.1101/gad.2037511

Deng, W., & Roberts, S. G. E. (2005). A core promoter element downstream of the TATA box that is recognized by TFIIB. Genes and Development, 19(20), 2418–2423. https://doi.org/10.1101/gad.342405

Dolfini, D., Zambelli, F., Pavesi, G., & Mantovani, R. (2009). A perspective of promoter architecture from the CCAAT box. Cell Cycle, 8(24). https://doi.org/10.4161/cc.8.24.10240

Doniger, S. W., & Fay, J. C. (2007). Frequent Gain and Loss of Functional Transcription Factor Binding Sites. PLoS Computational Biology, 3(5). https://doi.org/10.1371/journal.pcbi.0030099


Eckert, D., Buhl, S., Weber, S., Jäger, R., & Schorle, H. (2005). The AP-2 family of transcription factors. Genome Biology, 6(13), 246. https://doi.org/10.1186/gb-2005-6-13-246

Egan, M. F., Kojima, M., Callicott, J. H., Goldberg, T. E., Kolachana, B. S., Bertolino, A., … Weinberger, D. R. (2003). The BDNF val66met polymorphism affects activity-dependent secretion of BDNF and human memory and hippocampal function. Cell, 112(2), 257–269. https://doi.org/10.1016/S0092-8674(03)00035-7

Engel, K. L., Mackiewicz, M., Hardigan, A. A., & Myers, R. M. (2017). Decoding transcriptional enhancers: Evolving from annotation to functional interpretation. Seminars in Cell & Developmental Biology, 57, 40–50. https://doi.org/10.1016/j.semcdb.2016.05.014

Felsenstein, J. (1985). Confidence Limits on Phylogenies: an Approach Using the Bootstrap. Evolution, 39(4), 783–791. https://doi.org/10.1111/j.1558-5646.1985.tb00420.x

Gao, H.-M., & Hong, J.-S. (2008). Why neurodegenerative diseases are progressive: uncontrolled inflammation drives disease progression. Trends in Immunology, 29(8), 357–365. https://doi.org/10.1016/j.it.2008.05.002

Gökmen-Polar, Y., Neelamraju, Y., Goswami, C. P., Gu, X., Nallamothu, G., Janga, S. C., & Badve, S. (2015). Expression levels of SF3B3 correlate with prognosis and endocrine resistance in estrogen receptor-positive breast cancer. Modern Pathology, 28(5), 677–685. https://doi.org/10.1038/modpathol.2014.146

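Goodrich, J. A., & Tjian, R. (2011). Unexpected roles for core promoter recognition factors in cell-type-specific transcription and gene regulation. Nature Reviews Genetics, 11(8), 549–558. https://doi.org/10.1038/nrg2847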


Gris, D. (2013). 185(2), 974–981. https://doi.org/10.1038/mp.2011.182

Grosso, A. R., Martins, S., & Carmo-Fonseca, M. (2008). The emerging role of splicing factors in cancer. EMBO Reports, 9(11). https://doi.org/10.1038/embor.2008.189

Hilger-Eversheim, K., Moser, M., Schorle, H., & Buettner, R. (2000). Regulatory roles of AP-2 transcription factors in vertebrate development, apoptosis and cell-cycle control. Gene, 260, 1–12.

Hong, E. J., McCord, A. E., & Greenberg, M. E. (2008). A Biological Function for the Neuronal Activity-Dependent Component of Bdnf Transcription in the Development of Cortical Inhibition. Neuron, 60(4), 610–624. https://doi.org/10.1016/j.neuron.2008.09.024

Hoyer, J. S., Breton, J. L. P. G., Hassert, M. A., Holcomb, E. E., Fowler, H., Bauer, K. M., … Carrington, J. C. (2019). Functional dissection of the ARGONAUTE7 promoter. Plant Direct, 1–14. https://doi.org/10.1002/pld3.102

Hung, C., Liu, X., Kwon, M., Kang, Y., Chung, S. W., & Perrella, M. A. (2010). Regulation of heme oxygenase-1 gene by peptidoglycan involves the interaction of Elk-1 and C/EBPα to increase expression. American Journal of Physiology - Lung Cellular and Molecular Physiology, 1(6). https://doi.org/10.1152/ajplung.00382.2009

Imagawa, M., Chiu, R., & Karin, M. (1988). Transcription Factor AP-2 Mediates Induction by Two Different Signal-Transduction Pathways: Protein Kinase C and cAMP. Cell, 51, 251–260.

Jin, Y., Norquay, L. D., Yang, X., Gregoire, S., & Cattini, P. A. (2015). Binding of AP-2 and ETS-Domain Family Members Is Associated with Enhancer Activity in the Hypersensitive Site III Region of the Human Growth Hormone/Chorionic Somatomammotropin Locus. Molecular Endocrinology, 18, 574–587. https://doi.org/10.1210/me.2003-0405

Joyce, N., Annett, G., Wirthlin, L., Olson, S., Bauer, G., & Nolta, J. A. (2010). Mesenchymal stem cells for the treatment of neurodegenerative disease. Regenerative Medicine, 5(6), 933–946. https://doi.org/10.2217/rme.10.72

Juo, Z. S., Chiu, T. K., Leiberman, P. M., Baikalov, I., Berk, A. J., & Dickerson, R. E. (1996). How proteins recognize the TATA box. Journal of Molecular Biology, 261(2), 239–254. https://doi.org/10.1006/jmbi.1996.0456

Juven-Gershon, T., & Kadonaga, J. T. (2010). Regulation of gene expression via the core promoter and the basal transcriptional machinery. Developmental Biology, 339(2), 225–229. https://doi.org/10.1016/j.ydbio.2009.08.009

Kampmann, M. (2017). A CRISPR Approach to Neurodegenerative Diseases. Trends in Molecular Medicine, 23(6), 483–485. https://doi.org/10.1016/j.molmed.2017.04.003

Keller, S., Sarchiapone, M., Zarrilli, F., Videtic, A., & Ferraro, A. (2010). Increased BDNF Promoter Methylation in the Wernicke Area of Suicide Subjects. Archives of General Psychiatry, 67(3), 258–267.

Kirmiz, N., Cross, B., Finseth, F., Coffroth, M. A., Larson, E., Gerrish, G., & Moeller, H. (2019). Spring 2019. 5125.

Kumar, S., Stecher, G., Li, M., Knyaz, C., & Tamura, K. (2018). MEGA X: Molecular evolutionary genetics analysis across computing platforms. Molecular Biology and Evolution, 35(6), 1547–1549. https://doi.org/10.1093/molbev/msy096


Li, J., Liu, Z., Zhang, R., Ma, S., Lin, T., Li, Y., … Wang, Y. (2019). Sp1 contributes to overexpression of stanniocalcin 2 through regulation of promoter activity in colon adenocarcinoma. World Journal of Gastroenterology, 25(22), 2776–2787. https://doi.org/10.3748/wjg.v25.i22.2776

Liu, X., Bushnell, D. A., & Kornberg, R. D. (2012). RNA polymerase II transcription: Structure and mechanism. Biochimica et Biophysica Acta - Gene Regulatory Mechanisms. https://doi.org/10.1016/j.bbagrm.2012.09.003

Lunn, J. S., Sakowski, S. A., Hur, J., & Feldman, E. L. (2011). Stem cell technology for neurodegenerative diseases. Annals of Neurology, 70(3), 353–361. https://doi.org/10.1002/ana.22487

Marais, R., Wynne, J., & Treisman, R. (1993). The SRF Accessory Protein Elk-1 Contains a Growth Factor-Regulated Transcriptional Activation Domain. Cell, 73, 391–393.

Maston, G. A., Evans, S. K., & Green, M. R. (2006). Transcriptional Regulatory Elements in the Human Genome. Annual Review of Genomics and Human Genetics, 7(1), 29–59. https://doi.org/10.1146/annurev.genom.7.080505.115623

McGeer, P. L., & McGeer, E. G. (1995). The inflammatory response system of brain: implications for therapy of Alzheimer and other neurodegenerative diseases. Brain Research Reviews, 21(2), 195–218. https://doi.org/10.1016/0165-0173(95)00011-9

Mitra, M., Agarwal, P., Kundu, A., Banerjee, V., & Roy, S. (2019). Investigation of the effect of UV-B light on Arabidopsis MYB4 (AtMYB4) transcription factor stability and detection of a putative MYB4-binding motif in the promoter proximal region of AtMYB4 (Vol. 4).


Moi, P., Chan, K., Asunis, I., Cao, A., & Kan, Y. W. (1994). Isolation of NF-E2-related factor 2 (Nrf2), a NF-E2-like basic leucine zipper transcriptional activator that binds to the tandem NF-E2/AP1 repeat of the β-globin locus control region. Proceedings of the National Academy of Sciences, 91(21), 9926–9930.


Niebler, S., & Bosserhoff, A. K. (2013). The transcription factor activating enhancer-binding protein epsilon (AP-2ε) regulates the core promoter of type II collagen (COL2A1). The FEBS Journal, 280, 1397–1408. https://doi.org/10.1111/febs.12130

Patty Van Cappellen, Elise L. Rice, Lahnna I. Catalino, and B. L. F. (2017). Upward Spiral, Positive Affect, Health Behaviors. Psychology & Health, 4(5), 547–566. https://doi.org/10.1002/wrna.1178.Alternative

Pellicer, J., Fay, M. F., & Leitch, I. J. (2010). The largest eukaryotic genome of them all? Botanical Journal of the Linnean Society, 164(1), 10–15. https://doi.org/10.1111/j.1095-8339.2010.01072.x

Pruunsild, P., Kazantseva, A., Aid, T., Palm, K., & Timmusk, T. (2007). Dissecting the human BDNF locus: Bidirectional transcription, complex splicing, and multiple promoters. Genomics, 90(3), 397–406. https://doi.org/10.1016/j.ygeno.2007.05.004


Repele, A., Krueger, S., Bhattacharyya, T., Id, M. Y. T., & Id, M. (2019). The regulatory control of Cebpa enhancers and silencers in the myeloid and red-blood cell lineages. 1–24.

Rose, A. B., Elfersi, T., Parra, G., & Korf, I. (2008). Promoter-Proximal Introns in Arabidopsis thaliana Are Enriched in Dispersed Signals that Elevate Gene Expression. The Plant Cell, 20(3), 543–551. https://doi.org/10.1105/tpc.107.057190

Sakata, K., Woo, N. H., Martinowich, K., Greene, J. S., Schloesser, R. J., Shen, L., & Lu, B. (2009). Critical role of promoter IV-driven BDNF transcription in GABAergic transmission and synaptic plasticity in the prefrontal cortex. Proceedings of the National Academy of Sciences, 106(14).

Sanyal, S., Molnarova, L., & Gregan, J. (2017). Promoter interactions direct chromatin folding in embryonic stem cells. Nature Structural and Molecular Biology, 24(6), 494–495. https://doi.org/10.1038/nsmb.3421

Saunders, J., Wisidagama, D. R., Morford, T., & Malone, C. S. (2016). Maximal Expression of the Evolutionarily Conserved Slit2 Gene Promoter Requires Sp1. Cellular and Molecular Neurobiology, 36(6), 955–964. https://doi.org/10.1007/s10571-015-0281-8

Scott, K. C., Taubman, A. D., & Geyer, P. K. (1999). Enhancer Blocking by the Drosophila gypsy Insulator Depends Upon Insulator Anatomy and Enhancer Strength.

Shaktah, L. A. (2019). Lawrence A. Shaktah. (September), 1–3.

Shore, P., Bisset, L., Lakey, J., Waltho, J. P., Virden, R., & Sharrocks, A. D. (1995). Characterization of the Elk-1 ETS DNA-binding domain. Journal of Biological Chemistry, 270(11), 5805–5811. https://doi.org/10.1074/jbc.270.11.5805


Sims, R., Vandergon, V. O., & Malone, C. S. (2012). The mouse B cell-specific mb-1 gene encodes an immunoreceptor tyrosine-based activation motif (ITAM) protein that may be evolutionarily conserved in diverse species by purifying selection. Molecular Biology Reports, 39(3), 3185–3196. https://doi.org/10.1007/s11033-011-1085-7

Starr, D. B., & Hawley, D. K. (1991). TFIID binds in the minor groove of the TATA box. Cell, 67, 1231–1240.

Takai, D., & Jones, P. A. (2002). Comprehensive analysis of CpG islands in human chromosomes 21 and 22. Proceedings of the National Academy of Sciences, 99(6), 3740–3745. https://doi.org/10.1073/pnas.052410099

Tamura, K., & Nei, M. (1993). Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Molecular Biology and Evolution, 10(3), 512–526. https://doi.org/10.1093/oxfordjournals.molbev.a040023

Taneri, B., Snyder, B., Novoradovsky, A., & Gaasterland, T. (2004). Alternative splicing of mouse transcription factors affects their DNA-binding domain architecture and is tissue specific. 5(10).

Tian, J., & Karin, M. (1999). Stimulation of Elk1 transcriptional activity by mitogen-activated protein kinases is negatively regulated by protein phosphatase 2B (calcineurin). Journal of Biological Chemistry, 274(21), 15173–15180. https://doi.org/10.1074/jbc.274.21.15173

Tora, L., & Timmers, H. T. M. (2010). The TATA box regulates TATA-binding protein (TBP) dynamics in vivo. Trends in Biochemical Sciences, 35(6), 309–314. https://doi.org/10.1016/j.tibs.2010.01.007


Venters, B. J., & Pugh, B. F. (n.d.). Genomic organization of human transcription initiation complexes. Nature. https://doi.org/10.1038/nature12535

Whalen, S., Truty, R. M., & Pollard, K. S. (2016). Enhancer–promoter interactions are encoded by complex genomic signatures on looping chromatin. Nature Genetics, 1–10. https://doi.org/10.1038/ng.3539

Wisidagama, D. R., Waters, L. M., Sims, R., Morford, T., & Malone, C. S. (2016). Functional identification and evolutionary conservation of the yme1L1 mitochondrial integrity gene promoter. Gene Reports, 4, 272–279. https://doi.org/10.1016/j.genrep.2016.07.012

Xia, H., Dufour, C. R., & Giguère, V. (2019). ERRα as a Bridge Between Transcription and Function: Role in Liver Metabolism and Disease. Frontiers in Endocrinology, 10, 1–13. https://doi.org/10.3389/fendo.2019.00206

Yang, C., Bolotin, E., Jiang, T., Sladek, F. M., & Martinez, E. (2007). Prevalence of the initiator over the TATA box in human and yeast genes and identification of DNA motifs enriched in human TATA-less core promoters. Gene, 389, 52–65. https://doi.org/10.1016/j.gene.2006.09.029

Yang, G., Sau, C., Lai, W., Cichon, J., & Li, W. (2015). Sleep promotes branch-specific formation of dendritic spines after learning. Science, 344(6188), 1173–1178. https://doi.org/10.1126/science.1249098

Yang, S., Yates, P. R., Whitmarsh, A. J., Davis, R. J., & Sharrocks, A. D. (1998). The Elk-1 ETS-Domain Transcription Factor Contains a Mitogen-Activated Protein Kinase Targeting Motif. Molecular and Cellular Biology, 18(2), 710–720.


Yella, V. R., & Bansal, M. (2017). DNA structural features of eukaryotic TATA-containing and TATA-less promoters. FEBS Open Bio, 7(3), 324–334. https://doi.org/10.1002/2211-5463.12166

Zenzie-Gregory, B., O’Shea-Greenfield, A., & Smale, S. T. (1992). Similar Mechanisms for Transcription Initiation Mediated Through a TATA Box or an Initiator Element. Journal of Biological Chemistry, 267(4).

Zhang, Y., Fan, M., & Zhang, X. (2014). Cellular microRNAs up-regulate transcription via interaction with promoter TATA-box motifs. RNA. https://doi.org/10.1261/rna.045633.114

Zhu, Z., Wang, H., Wang, Y., Guan, S., Wang, F., Tang, J., & Zhang, R. (2015). Characterization of the cis elements in the proximal promoter regions of the anthocyanin pathway genes reveals a common regulatory logic that governs pathway regulation. Journal of Experimental Botany, 66(13), 3775–3789. https://doi.org/10.1093/jxb/erv173

Zimmermann, T., Kashyap, A., Hartmann, U., Otto, G., Galle, P. R., Strand, S., & Strand, D. (2008). Cloning and characterization of the promoter of Hugl-2, the human homologue of Drosophila lethal giant larvae (lgl) polarity gene. Biochemical and Biophysical Research Communications, 366, 1067–1073. https://doi.org/10.1016/j.bbrc.2007.12.084

Zuccato, C., & Cattaneo, E. (2009). Brain-derived neurotrophic factor in neurodegenerative diseases. Nature Reviews Neurology, 5(6), 311–322. https://doi.org/10.1038/nrneurol.2009.54

Zuccato, C., & Cattaneo, E. (2014). Huntington’s Disease. In Handbook of experimental pharmacology (Vol. 220, pp. 357–409). https://doi.org/10.1007/978-3-642-45106-5_14


Appendix: Supplemental Figures

Supplemental Figure 1. Figure 4. Phylogenetic tree constructed from the SF3B3 gene coding region. A bootstrap consensus phylogenetic tree of the SF3B3 protein-coding region was produced in Molecular Evolutionary Genetics Analysis (MEGA X) by maximum likelihood, with 500 bootstrap replicates. Bootstrap support values (0–100) are shown at branch points and give the percentage of replicates in which the associated clade was recovered. The tree was rooted using zebrafish as the outgroup.
