UvA-DARE (Digital Academic Repository)

Regulation of HIV-1 splicing

Müller, N.

Publication date 2016 Document Version Final published version

Citation for published version (APA): Müller, N. (2016). Regulation of HIV-1 splicing.

UvA-DARE is a service provided by the library of the University of Amsterdam (https://dare.uva.nl)

Download date: 01 Oct 2021

Chapter 6

A search for cellular 5’ splice sites that are regulated by RNA structure

Nancy Mueller (a), Alizée Borsche (b), Stephan Theiss (c), Heiner Schaal (b), Ben Berkhout (a) and Atze T. Das (a,*)

a Laboratory of Experimental Virology, Department of Medical Microbiology, Center for Infection and Immunity Amsterdam (CINIMA), Academic Medical Center, University of Amsterdam, The Netherlands

b Institute for Virology, Heinrich-Heine-University Duesseldorf, Duesseldorf, Germany

c Institute of Clinical Neuroscience and Medical Psychology, Heinrich-Heine-University Duesseldorf, Duesseldorf, Germany

Manuscript in preparation

Abstract

Splicing of RNAs produced upon transcription of cellular genes results in the removal of the intronic sequences that are positioned between 5’ and 3’ splice sites. Previous studies demonstrated that the frequency of splicing at these splice sites can be influenced by local RNA structure, possibly by controlling the accessibility for splice factors. To identify cellular 5’ splice sites (5’ss) in the human transcriptome that are controlled by RNA structure, we screened a large set of 5’ss sequences from the ENSEMBL database for the capacity to fold a stable RNA structure. This resulted in the identification of multiple 5’ss-containing sequences that can fold into a hairpin structure in the primary transcript. Selected 5’ss hairpin sequences and destabilized mutants were inserted into reporter transcripts for testing of the 5’ss activity. We thus demonstrate that the 5’ss#8 site in the primary FAM73B transcript is presented in a stable hairpin structure that can inhibit splicing. Despite this inhibitory effect of the RNA structure, splicing at the 5’ss#8 was found to be complete when tested in FAM73B exon-intron containing reporter transcripts. In this more natural context, splicing was reduced by mutations that further stabilize the hairpin. These results indicate that other factors, like surrounding exon or intron sequences or the 3’ss strength, may influence the splicing frequency. In addition, we identified an alternative 5’ss in the hairpin sequence that can be activated by inactivation of 5’ss#8.

Introduction

Splicing is a co-transcriptional process in which intronic sequences are removed from the primary RNA transcript. The intron sequences are identified by a splice donor site (5’ splice site; 5’ss) at the 5’ side and a splice acceptor site (3’ss) at the 3’ side. The presence of alternative 5’ss and 3’ss allows the production of different mRNAs, which contributes to the proteome complexity (1). Several mutations in the human genome have been identified that affect splice site usage and cause diseases, including cancer and neurological disorders (2-12).

The spliceosome complex that is formed on the primary RNA transcript consists of several small nuclear ribonucleoproteins (snRNPs). Initially, the sequence complementarity between the 5’ terminal nucleotides (nts) of U1 snRNA and the 5’ss mediates U1 snRNP binding (Fig. 1A). The U2 snRNP auxiliary factor (U2AF) interacts with the 3’ss (13, 14), followed by binding of U2 snRNP to the branch point sequence (14, 15). Subsequently, the U4/U6-U5 tri-snRNP particle will bind to complete the process of spliceosome assembly (14).

The splicing efficiency at the 5’ss is controlled by different factors. Firstly, the sequence of the 5’ss influences how efficiently the U1 snRNA can anneal. The hydrogen bond score (HBS; (16)) reflects the base pairing potential between U1 snRNA and 11 nts of the 5’ss (3 exonic and 8 intronic nts). The HBS value can vary from 1.8 (only the critical G+1U positions pair with U1 snRNA) to 23.8 (complete complementarity). Secondly, the splicing process can be stimulated or inhibited by splicing regulatory proteins, like SR proteins and hnRNPs, that bind to splicing regulatory elements (SREs) in close proximity to the 5’ss (17). Thirdly, the splicing frequency can be influenced by local RNA structure at the 5’ss, which can restrict access of the splicing factors, including U1 snRNP and regulatory proteins (18-30).

The primary HIV-1 transcript contains several 5’ss and 3’ss sites and differential use of these sites allows the production of a large variety of RNAs. These RNAs are used either as genomic RNA and packaged into new virus particles or as mRNA for the translation of all viral proteins. The balanced production of these differentially spliced RNAs is essential for efficient virus replication, which makes HIV-1 splicing an interesting model to study splicing regulation. The major 5’ss that is present in the 5’ untranslated leader of the viral transcripts is used for the production of all spliced mRNAs. Studies that investigated the regulation of splicing at this 5’ss indicated that all aforementioned factors (5’ss/U1 snRNA complementarity, SREs and local RNA structure) are involved in splicing control (31-35). The splice donor (SD) region surrounding the major 5’ss can fold a stable RNA hairpin structure that partially occludes the U1 snRNA binding site. Stabilization of this SD hairpin was found to reduce splicing, whereas destabilization of this structure increases splicing (32, 34). These results demonstrated that splicing at the major 5’ss is modulated by the stability of the SD hairpin, most likely by controlling the accessibility for splicing factors. In addition, it was shown that SR proteins bound in the SD region influence splicing and that suboptimal sequence complementarity between 5’ss and U1 snRNA prevents complete splicing (31, 35).

In this study, we set out to identify 5’ss sites in the human transcriptome that are controlled by local RNA structure in a similar way as the HIV-1 major 5’ss. A large set (47,279) of 5’ss-containing sequences was extracted from the ENSEMBL human genome database and analyzed for the capacity to fold a stable RNA structure. This resulted in the identification of multiple cellular 5’ss sites that adopt a stem-loop structure in the primary transcript. For several of these sites, we tested whether the splicing frequency is controlled by RNA structure.

Results

Selection of candidate 5’ss hairpin structures

A large set (47,279) of human 5’ss sequences was extracted from the ENSEMBL human genome database (http://www.ensembl.org/index.html). To identify 5’ss sequences that are potentially masked in a local RNA structure, the secondary structure of the 5’ss region, including the 11-nt U1 snRNA binding site and 10 upstream and downstream nts (Fig. 1A), was predicted with the Mfold RNA structure analysis software. The thermodynamic stability (ΔG in kcal/mol) of the “Top 100” most stable RNA structures varied from -24.7 to -15.2 kcal/mol (Supplementary Table 1). The hydrogen bond score (HBS; (16)) that reflects the base pairing potential with U1 snRNA varied from 4.8 to 19.6, with a high value reflecting significant U1 snRNA-5’ss sequence complementarity.
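
The windows described above (the 11-nt U1 snRNA binding site plus 10 flanking nts, and later 30 flanking nts) can be cut from a transcript with a few lines of Python. This is a minimal sketch, not the pipeline used in the study: the function name, the 0-based `boundary` convention and the example transcript are assumptions made for illustration only.

```python
def five_ss_window(transcript: str, boundary: int, flank: int) -> str:
    """Return the 11-nt 5'ss (3 exonic + 8 intronic nts) plus `flank` nts per side.

    `boundary` is the 0-based index of the first intronic nt (the G of the GU).
    """
    start = boundary - 3 - flank
    end = boundary + 8 + flank
    if start < 0 or end > len(transcript):
        raise ValueError("window extends beyond the transcript")
    return transcript[start:end]


# Hypothetical transcript: 40 exonic nts, a consensus 5'ss, 40 intronic nts.
EXAMPLE = "A" * 40 + "CAG" + "GUAAGUAU" + "C" * 40
EXAMPLE_BOUNDARY = 43  # index of the intronic G

short_window = five_ss_window(EXAMPLE, EXAMPLE_BOUNDARY, flank=10)  # 31 nts
long_window = five_ss_window(EXAMPLE, EXAMPLE_BOUNDARY, flank=30)   # 71 nts
print(len(short_window), len(long_window))
```

Each window would then be submitted to a folding program such as Mfold; that step is not reproduced here.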

Several 5’ss were selected for subsequent splicing analysis based on the following criteria. First, the 5’ss region should fold a hairpin that includes the complete 11-nt U1 snRNA binding site. Second, the same structure should be predicted when a larger RNA segment, including 30 nts upstream and downstream of the 11-nt 5’ss sequence, is analyzed with Mfold (Fig. 1A). Third, to select stable hairpin structures, the maximal loop size was arbitrarily set at 6 nts and the stem region should maximally contain a single destabilizing bulge. Fourth, the 5’ss region should not contain any AUG triplets that may interfere with proper translation of the luciferase reporter used for measuring the splicing efficiency (see below). We thus selected five 5’ss sequences that can fold a stable hairpin structure (HS1 to HS5 in Fig. 1B and C). The HBS value of these sites varied from 10.7 to 17.4 (Table 1). The selected 5’ss sequences are derived from different genes and all are constitutively used during processing of the primary transcript.
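
Part of these selection criteria can be expressed as a simple filter on a predicted structure in dot-bracket notation. The sketch below is an illustration under stated assumptions: it checks only the AUG rule, the maximal loop size and whether the 11-nt site lies inside the hairpin. It does not count bulges and does not compare the 31-nt and 71-nt folds. The function name and the example hairpin are hypothetical and are not taken from the study.

```python
import re


def passes_basic_criteria(seq: str, structure: str, site_start: int,
                          max_loop: int = 6) -> bool:
    """Check a folded 5'ss region against three of the four criteria.

    `structure` is dot-bracket notation for `seq`; `site_start` is the
    0-based index of the first nt of the 11-nt U1 snRNA binding site.
    """
    if len(seq) != len(structure):
        raise ValueError("sequence and structure differ in length")
    rna = seq.upper().replace("T", "U")
    if "AUG" in rna:                       # no AUG triplets allowed
        return False
    loops = re.findall(r"\((\.+)\)", structure)
    if len(loops) != 1 or len(loops[0]) > max_loop:
        return False                       # need one hairpin with a small loop
    first_pair = structure.index("(")
    last_pair = structure.rindex(")")
    # The complete 11-nt site must lie within the hairpin.
    return first_pair <= site_start and site_start + 11 <= last_pair + 1


# Hypothetical 20-nt hairpin: 7-bp stem, 6-nt loop, 5'ss-like site at index 2.
SEQ = "GGCAGGUAAGUCUACCUGCC"
STRUCT = "(((((((......)))))))"
print(passes_basic_criteria(SEQ, STRUCT, site_start=2))
```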

Figure 1. Identification of cellular 5’ss sequences that can fold a stable stem-loop structure. (A) To identify cellular 5’ss sequences that can fold a stable RNA structure, 5’ss sequences were extracted from the ENSEMBL database. The secondary structure of the 5’ss region, including the 11-nt U1 snRNA binding site and 10 upstream and downstream nts, was predicted with the Mfold RNA structure analysis software. In addition, the structure was predicted for a larger region including 30 upstream and downstream nts. (B-C) The primary sequence (B) and secondary structure (C) of the five selected 5’ss hairpin sequences (HS1-HS5) and their destabilized variants (HS1d-HS5d) are shown (mutations boxed in black). The 11-nt sequence to which the U1 snRNA anneals is underlined in B and encircled in grey in C (▲, cleavage site). The stability of the secondary structure predicted with Mfold is indicated (ΔG in kcal/mol).

Destabilization of 5’ss hairpins can increase the splicing efficiency

To measure the splicing activity of the selected 5’ss sites, we made pGL3-based promoter-reporter constructs in which the 5’ss hairpin sequences were inserted in the 5’ untranslated leader of the luciferase-encoding transcript (32) (Fig. 2A). Splicing from the introduced 5’ss to a 3’ss in the luciferase sequences will yield a transcript that does not encode luciferase (Fig. 2A). To analyze the potentially suppressive effect of the 5’ss hairpin structure on splicing efficiency, the structure was destabilized by mutation of the stem sequence (HS1d-HS5d in Fig. 1B). The mutations were introduced either on the left- or right-hand side of the stem, depending on which side allowed the largest number of substitutions without affecting the 11-nt U1 snRNA annealing site. Mutations were designed to disrupt base pairs (bps), while non-paired positions (mismatches, bulges or internal loops) were maintained. We avoided the introduction of AUG translation start codons. Mfold was used to analyze the secondary structure of the mutated 5’ss regions and to calculate the remaining thermodynamic stability (Fig. 1C). In HS1d and HS5d most of the nts were single stranded, but a small hairpin with low stability was predicted to fold that included some of the 5’ss nts. HS2d, HS3d and HS4d fold alternative hairpins that are less stable than the respective wild-type (wt) hairpins. In HS2d and HS3d most of the 5’ss nts were base paired, whereas in HS4d nearly all 5’ss nts were presented in the single stranded loop.

To measure the splicing activity of the wt and mutant 5’ss sites, the promoter-reporter constructs were transfected into 293T cells. Two days after transfection, the intracellular luciferase activity was analyzed to measure the unspliced RNA level. Opening of the HS1 hairpin resulted in 2-fold reduced luciferase production (compare HS1 and HS1d), suggesting that the wt structure suppresses 5’ss splicing. Decreased luciferase expression, although less pronounced, was also observed for the HS2d mutant when compared with the wt HS2 construct. The luciferase level of the other destabilized hairpins (HS3d, HS4d and HS5d) did not significantly differ from that of the wt counterparts, indicating that these hairpin structures do not suppress splicing efficiency (Fig. 2B). We therefore focused the subsequent analyses on the suppressive HS1 hairpin structure.

Table 1. Selected 5’ss hairpin structures.

Splicing at 5’ss#8 in the FAM73B context

The 5’ss in HS1 corresponds to the SD site#8 (5’ss#8) in the 35-kb primary transcript of the FAM73B (family with sequence similarity 73, member B) gene that is located on chromosome 9 (HGNC ID:23621). The gene encodes 16 exons of which 15 are protein-coding. Complete splicing results in a 3,637 nt transcript encoding 593 amino acids. FAM73B is a conserved membrane protein, but its function is as yet unknown. Knockout of this gene in mice led to several symptoms, including decreased body weight, hypoalbuminemia, abnormal tooth morphology, decreased bone mineral content and strength, increased susceptibility to bacterial infection and altered body composition (36, 37).

Figure 2. Destabilization of the 5’ss hairpin structure can increase splicing efficiency. (A) In the pGL3-based promoter-reporter constructs, transcription of the luciferase-encoding transcript is driven by the SV40 early promoter. The wt and mutant 5’ss hairpin sequences were introduced in the 5’ untranslated leader RNA region. Splicing from the introduced 5’ss to a cryptic 3’ss in the luciferase sequences yields a transcript that does not encode luciferase. (B) 293T cells were transfected with the promoter-reporter constructs and the intracellular luciferase production was measured after 48 h. The mean and SD of 5 independent measurements are shown. Statistical analysis was used to compare wt and mutant values (*, p < 0.05).

The 5’ss#8 region in the FAM73B transcript is predicted to fold exactly the same hairpin as in the HS1-luc RNA (data not shown). To analyze the effect of this hairpin structure on splicing at the 5’ss#8 site in a more natural context, we constructed the FAM73B-luciferase reporter plasmid, in which a 740-bp exon#8-intron#8-exon#9 fragment, thus including the 5’ss#8 and 3’ss#8 sites, was inserted in the 5’ untranslated leader of an optimized luciferase gene (Fig. 3A). We destabilized the 5’ss#8 hairpin by mutation of the left side of the stem (as in the HS1d-luc mutant; FAML in Fig. 3B and 4A) or by mutation of the right side of the stem (FAMR variant). Combining the left and right mutations in FAMLR restores base pairing in a hairpin of similar stability as the original wt structure.

Because the optimized luciferase gene in the FAM-luc constructs does not encode a 3’ss, splicing will occur from the 5’ss#8 to the 3’ss#8, which results in removal of the intron#8 sequences. This intron contains 15 AUG trinucleotides that may function as translation initiation sites for the scanning ribosomes in the unspliced RNA product. Translation starting at these upstream sites will interfere with translation of the downstream luciferase cistron. However, the upstream AUGs will be removed when the RNA is spliced, thus allowing luciferase expression (Fig. 3A). Consequently, an increased luciferase level will reflect increased splicing.
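
The reasoning about upstream AUGs can be made concrete with a short scan of a leader sequence. The sketch below is illustrative only: it lists each AUG and whether it is in frame with a reporter start codon assumed to begin directly after the leader. It does not look for stop codons, and the function name and example sequences are hypothetical rather than taken from the FAM73B constructs.

```python
def upstream_augs(leader: str) -> list[tuple[int, bool]]:
    """List (position, in_frame) for every AUG in a 5' leader.

    `in_frame` is True when the distance from the AUG to the end of the
    leader (where the reporter start codon is assumed to begin) is a
    multiple of three. Accepts DNA or RNA letters.
    """
    rna = leader.upper().replace("T", "U")
    hits = []
    pos = rna.find("AUG")
    while pos != -1:
        hits.append((pos, (len(rna) - pos) % 3 == 0))
        pos = rna.find("AUG", pos + 1)
    return hits


# Hypothetical 13-nt leader with one in-frame and one out-of-frame AUG.
print(upstream_augs("GAUGCCAAAUGCC"))
```

Applied to the unspliced and spliced leaders of a reporter, such a scan would show the upstream AUGs disappearing once the intron is removed.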

Figure 3. The FAM73B exon-intron reporter constructs. (A) In the FAM73B-luciferase reporter construct, the 740-bp exon#8-intron#8-exon#9 fragment of the FAM73B gene was inserted in the 5’ untranslated leader region of the transcript encoding an optimized firefly luciferase gene. The intron#8 sequences encode 15 AUG trinucleotides. In the unspliced RNA, these AUGs may function as premature translation initiation sites, which will prevent luciferase production, because the upstream sites are either out of frame with the downstream luciferase open reading frame or translation starting at the AUGs will terminate at stop codons before the ribosomes reach the luciferase gene. Splicing of this transcript from the 5’ss#8 to the 3’ss#8 site results in removal of the upstream AUGs, thus allowing luciferase expression. The primers used for RT-PCR analysis are indicated with arrows. This RT-PCR will result in a product of 953 bp for the unspliced RNA and 373 bp for the spliced RNA (383 bp for the alternatively spliced RNA). (B) Wt and mutant 5’ss#8 hairpin sequences. The putative SR protein binding sites in the wt sequence, as predicted with ESEfinder, are shown. The 11-nt sequence to which the U1 snRNA anneals is underlined (▲, cleavage site). Nt mutations are boxed in black. The alternative 5’ss downstream of 5’ss#8 is boxed in the wt sequence.

The FAM-luc constructs were transfected into 293T cells and luciferase production was measured after 48 h (Fig. 5A). The FAML and FAMR mutants showed a reduced luciferase level when compared to the wt FAM73B construct. These results demonstrate that the mutations introduced on the left and right side of the stem reduced splicing, whereas both were expected to open the hairpin and therefore increase splicing. The double-mutant FAMLR with a restored hairpin structure also showed reduced splicing, even to a further extent than the FAML mutant. We also analyzed the produced luciferase transcripts in an RT-PCR assay (reverse transcription followed by PCR) that allowed the simultaneous detection of unspliced and spliced RNAs (Fig. 3A). For the wt FAM73B construct, only the short cDNA product was detected, which indicates efficient splicing (Fig. 5B). Sequencing of this cDNA product confirmed that the fragment corresponds to the RNA that is spliced at 5’ss#8 and 3’ss#8 (data not shown). The FAMR mutation resulted in a low level of unspliced RNA, which is in agreement with reduced splicing. For the FAML and FAMLR mutants, a further decreased spliced RNA level and concomitant increased unspliced RNA level was observed. The results of this RT-PCR analysis are in good agreement with the luciferase results as a reduced spliced RNA level corresponds with reduced luciferase expression. Thus, whereas destabilization of the hairpin was expected to increase splicing, the combined results demonstrate that the introduced mutations reduced splicing efficiency. This may be due to modification of underlying SREs in the left and/or right side of the stem that promote splicing. In agreement with this explanation, combining the mutations in the left and right side of the stem in the FAMLR double mutant further reduced the splicing frequency.
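
The RT-PCR product sizes quoted in the Figure 3 and Figure 5 legends (953, 373 and 383 bp) imply the lengths that are removed by each splicing event. The following sketch does that arithmetic and assigns a gel band to the nearest expected product. It assumes that the products differ only by the removed sequence; the function names and the 3-bp tolerance are illustrative choices, not part of the published protocol.

```python
UNSPLICED_BP = 953      # unspliced RNA product
SPLICED_BP = 373        # spliced at 5'ss#8 and 3'ss#8
ALT_SPLICED_BP = 383    # spliced at the alternative 5'ss

EXPECTED = {
    "unspliced": UNSPLICED_BP,
    "spliced": SPLICED_BP,
    "alternatively spliced": ALT_SPLICED_BP,
}


def removed_length(unspliced_bp: int, spliced_bp: int) -> int:
    """Number of nts removed by splicing, from the two product sizes."""
    return unspliced_bp - spliced_bp


def classify_product(size_bp: int, tolerance: int = 3) -> str:
    """Name the expected product whose size lies within `tolerance` bp."""
    for name, expected in EXPECTED.items():
        if abs(size_bp - expected) <= tolerance:
            return name
    return "unknown"


print(removed_length(UNSPLICED_BP, SPLICED_BP))      # 580
print(removed_length(UNSPLICED_BP, ALT_SPLICED_BP))  # 570
print(ALT_SPLICED_BP - SPLICED_BP)                   # 10
```

Under this assumption, splicing at 5’ss#8 removes 580 nts and the alternative product retains 10 additional nts, a difference that is small relative to the resolution of a standard agarose gel.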

Analysis of the sequence surrounding the 5’ss#8 site with ESEfinder, an algorithm based on SELEX-identified RNA binding sites for splicing regulatory proteins (38, 39), revealed several putative SR protein binding sites that may either stimulate or repress splice site usage (Fig. 3B; Table 2). The introduced 5’ss hairpin mutations affected several of these putative SRE sites. How these mutated SRE sites influence the splicing frequency is difficult to predict. Most of the putative binding sites overlap and it seems unlikely that all regulatory proteins can bind simultaneously (Fig. 3B). Furthermore, the extended mutations affected several predicted binding sites (Table 2). For example, the FAML mutation not only removed two putative binding sites for SR proteins that enhance splicing, but also a putative site for a repressive protein. Similarly, the FAMR mutation not only removed two putative enhancer sites, but also a repressor site.

Figure 4. Secondary structure of the wt and mutant 5’ss#8 region. The structure and thermodynamic stability (ΔG in kcal/mol; as predicted with Mfold) are shown for (A) the wt FAM73B 5’ss#8 hairpin and the destabilized FAML, FAMR and FAMLR mutants, (B) the destabilized mutants FAMd1, FAMd2 and FAMd3 and the stabilized mutant FAMstab, and (C) the 5’ss#8-inactivated mutant (FAMGA) and its destabilized variant (FAMLGA). The mutations are boxed in black. The 11-nt 5’ss#8 sequence is encircled in grey (▲, 5’ss#8 cleavage site; ◆, alternative 5’ss cleavage site).

Destabilization of the FAM73B 5’ss#8 hairpin does not further increase splicing

We next designed FAM-luciferase constructs in which the 5’ss hairpin was destabilized by mutations that do not affect the predicted SR protein binding sites (FAMd1, FAMd2 and FAMd3 in Fig. 3B and 4B and Table 2). Upon transfection of these constructs into 293T cells, the destabilized FAMd2 and FAMd3 mutants produced a similar level of luciferase as the wt construct, whereas FAMd1 showed a 50% reduced luciferase level. These results indicate that the mutations did not affect splicing (FAMd2 and FAMd3) or reduced splicing (FAMd1), but again no increase in splicing was observed.

Table 2. Putative SR protein binding sites in wt and mutant FAM73B 5’ss#8 hairpin sequences.

* SR protein binding sites as predicted with ESEfinder.
a SRSF1 site as predicted using the original SRSF1 (IgM) matrix (38).
b SRSF1 site as predicted using the improved SRSF1 (IgM-BRCA1) matrix (39).

It could be that the destabilizing FAMd2 and FAMd3 mutations do not increase splicing because the introduced mutations do not sufficiently destabilize the hairpin. In particular, the mutations do not (FAMd3) or not significantly (FAMd2) alter base pairing in the upper stem region that shields the U1 snRNA binding site. Alternatively, 5’ss#8 splicing in the FAM-reporter may already be very efficient and thus not allow further improvement. RT-PCR analysis of the RNA products did indeed demonstrate the presence of the spliced RNA and absence of the unspliced transcript for the wt construct, which indicates efficient splicing at 5’ss#8 (Fig. 5B). Also for the FAMd2 and FAMd3 variants, only the spliced and no unspliced RNA product was observed.

The FAMd1 mutation that reduced luciferase production may have affected a binding site for a splicing enhancing factor that is not identified by the ESEfinder algorithm. However, RT-PCR analysis of the spliced and unspliced RNA products did not detect unspliced RNA for the FAMd1 construct, which suggests that splicing is not decreased. Alternative explanations, e.g. a reduced RNA half-life, are possible but this was not investigated further.

Splicing at the 5’ss#8 can be inhibited by stabilization of the RNA structure

To test whether 5’ss#8 splicing is sensitive to stable RNA structure, we designed another variant in which the wt hairpin was stabilized by three substitutions in the lower stem (FAMstab; Fig. 3B and 4B). These mutations allowed the formation of an additional base pair (bp) and caused the replacement of two G-U bps by more stable A-U bps. Consequently, the ΔG of the hairpin decreased from -25.5 kcal/mol to -29.5 kcal/mol, which reflects an increased thermodynamic stability. These mutations were carefully chosen not to affect putative SR protein binding sites. Upon transfection into 293T cells, the FAMstab mutant showed reduced luciferase production compared to wt (Fig. 5A). RT-PCR analysis of the RNA products confirmed that this variant produced less spliced RNA and a detectable level of unspliced RNA (Fig. 5B). These results indicate that the splicing efficiency can be reduced by hairpin stabilization, suggesting that splicing at the 5’ss#8 is sensitive to stable RNA structure.
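
To give a feeling for what a change from -25.5 to -29.5 kcal/mol means, the sketch below converts the difference into a fold change of the folded/unfolded equilibrium. This is a rough illustration under assumptions that are not part of the study: a simple two-state model, a temperature of 37 °C, and predicted ΔG values taken at face value. The function name is hypothetical.

```python
import math

R_KCAL_PER_MOL_K = 1.987e-3  # gas constant in kcal/(mol*K)


def stabilization(dg_wt: float, dg_mut: float, temp_c: float = 37.0):
    """Return (ddG, fold change in folded/unfolded equilibrium constant).

    Two-state assumption: K = exp(-dG / RT), so the ratio K_mut / K_wt
    equals exp(-ddG / RT) with ddG = dG_mut - dG_wt.
    """
    ddg = dg_mut - dg_wt
    rt = R_KCAL_PER_MOL_K * (temp_c + 273.15)
    return ddg, math.exp(-ddg / rt)


ddg, fold = stabilization(-25.5, -29.5)
print(f"ddG = {ddg:.1f} kcal/mol, equilibrium shifted about {fold:.0f}-fold")
```

Because both hairpins are already predicted to be very stable, this number is best read as an indication of the direction and size of the change rather than as a measured quantity.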

Figure 5. Splicing at wt and mutant 5’ss#8 RNA structures. (A) 293T cells were transfected with the FAM-luc reporter constructs and luciferase production was measured after 48 h. The mean and standard error of 3 independent experiments are shown. Statistical analysis was used to compare wt and mutant values (*, p < 0.05). (B) The RNA was isolated 48 h after transfection. The unspliced and spliced RNAs were detected by RT-PCR (953 and 373 bp products, respectively). Splicing at the alternative 5’ss will result in a 383-bp product (m, the DNA size marker).

An alternative 5’ss sequence is present downstream of 5’ss#8

We noticed that the sequence 5’GGGGUGGGGAC3’ immediately downstream of the 5’ss#8 site (boxed in Fig. 3B) has significant nt complementarity with the 5’ end of U1 snRNA (3’GUCCAUUCAUA5’) and could form an alternative 5’ss. Whereas the 5’ss#8 has an HBS value of 10.7, this alternative sequence has only a slightly lower HBS value of 9.3. However, sequencing of the RT-PCR products of the wt and all mutant FAM-luc reporter constructs demonstrated that only 5’ss#8 was used for splicing (results not shown). To test whether the alternative sequence can function as 5’ss in splicing, we constructed the FAMGA-luc variant in which the 5’ss#8 site was inactivated by mutation of the critical G+1U dinucleotide (Fig. 3B and Fig. 4C). In addition, we constructed the FAMLGA mutant in which the FAMGA mutation was combined with the stem-destabilizing mutations of FAML to test whether the RNA hairpin structure prevents the use of the alternative site.
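
The complementarity between a candidate 5’ss and the 5’ end of U1 snRNA can be tallied position by position. The sketch below counts Watson-Crick and G-U wobble pairs for an 11-nt site aligned against the U1 sequence quoted above. It is a plain pair count for illustration and is not the HBS algorithm, which weights positions and hydrogen bonds in ways not reproduced here; the function name is an assumption.

```python
# 5' end of U1 snRNA written 3'->5', as in the text, so that position i
# of the 5'ss (written 5'->3') faces position i of this string.
U1_5P_END_3TO5 = "GUCCAUUCAUA"

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def u1_pairing(site: str) -> tuple[int, int]:
    """Return (Watson-Crick pairs, G-U wobble pairs) for an 11-nt 5'ss."""
    rna = site.upper().replace("T", "U")
    if len(rna) != len(U1_5P_END_3TO5):
        raise ValueError("a 5'ss must be given as 11 nts (3 exonic + 8 intronic)")
    pairs = list(zip(rna, U1_5P_END_3TO5))
    wc = sum(pair in WATSON_CRICK for pair in pairs)
    wobble = sum(pair in WOBBLE for pair in pairs)
    return wc, wobble


print(u1_pairing("CAGGUAAGUAU"))  # fully complementary site: (11, 0)
print(u1_pairing("GGGGUGGGGAC"))  # alternative site from the text: (5, 3)
```

For the alternative site, the five Watson-Crick pairs include the G+1U dinucleotide, which is consistent with its potential to act as a 5’ss.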

Upon transfection of the constructs into 293T cells, the FAMGA mutant produced luciferase, although significantly less than the wt FAM73B construct (Fig. 5A). This result indicates that the transcripts with an inactivated 5’ss#8 can be spliced at an alternative, less-efficient splice site. RT-PCR analysis revealed the production of a slightly larger spliced RNA product (383 bp instead of 373 bp; Fig. 5B). Sequencing of this RT-PCR product demonstrated that this RNA resulted from splicing at the identified alternative 5’ss (data not shown). In addition, the RT-PCR analysis showed that the FAMGA mutant produced a substantial level of unspliced RNA, which confirmed the reduced level of splicing for this variant. This result demonstrates that the alternative site is functional, but only when the 5’ss#8 site is inactivated. The destabilized double-mutant FAMLGA produced less luciferase (Fig. 5A) and a reduced spliced RNA level (Fig. 5B) compared to FAMGA, in contrast to what we expected. It is possible that the 10-nt substitution in the upstream sequence affects positive splicing elements. Consistent with this idea, this mutation similarly inhibited splicing at the 5’ss#8 in the FAML construct (Fig. 5A and 5B).

Discussion

Because several previous studies indicated that splicing of primary transcripts can be influenced by the secondary RNA structure present at the splice signals (19-30), we here set out to identify 5’ss sites that are controlled by local RNA structure. Screening of a large set of human RNA transcripts for the folding of a stable hairpin structure that includes the 5’ss nts resulted in the identification of multiple candidate 5’ss hairpins, five of which were subsequently tested in reporter-based splicing assays. We demonstrated that the 5’ss#8 in the primary transcript of the FAM73B gene is presented in a stable hairpin structure that can influence splicing.

The 5’ss#8 hairpin structure inhibits splicing when tested in a hairpin-luciferase construct (HS1). However, when tested in a more natural context, i.e. in FAM73B exon-intron containing reporter transcripts, splicing at the 5’ss was found to be complete, despite the hairpin structure. Further stabilization of this hairpin reduced splicing, which demonstrates that the activity of 5’ss#8 can be influenced by RNA structure. Possibly, the exon and intron sequences surrounding the 5’ss#8 hairpin in the FAM73B-luc construct encode strong splicing enhancers that overcome the inhibitory effect of the wt hairpin structure. The absence of these elements in the HS1-luc reporter may make the 5’ss more sensitive to the inhibitory RNA structure. As splicing efficiency is also known to depend on the strength of the 3’ss (13, 40-42), the different effect of RNA structure on splicing between the FAM73B-luc and HS1-luc constructs can also be due to the difference in the 3’ss involved in the splicing event, that is a cryptic 3’ss in the luciferase gene in the HS1 construct and the natural 3’ss#8 in the FAM73B construct.

When tested in the exon#8-intron#8-exon#9 context, we observed very efficient splicing at 5’ss#8 and no RNA structure effect. Nevertheless, the 5’ss#8 hairpin structure may affect splicing in the complete FAM73B primary transcript. For example, sequence elements further up- or downstream of the exon#8-intron#8-exon#9 region may reduce splicing at 5’ss#8 to a level that is sensitive to inhibitory RNA structure effects. Furthermore, the 5’ss#9, which was lacking in our reporter constructs but is present in the complete RNA, may similarly influence splicing at 5’ss#8, as previous studies demonstrated that U1 snRNA binding to a downstream 5’ss can influence splicing of an upstream intron, independent of whether this downstream 5’ss is used or not (43).

We identified an alternative 5’ss in the FAM73B 5’ss#8 hairpin sequence that can be activated by inactivation of the 5’ss#8 (FAMGA mutation). The HBS value of this alternative 5’ss was only slightly lower than that of 5’ss#8. Inhibition of weak 5’ss by nearby positioned strong 5’ss sequences has previously been described (44, 45). Remarkably, the alternative 5’ss that we identified is located in the stem domain of the 5’ss#8 hairpin and almost all U1 snRNA-binding nts are base paired, which is likely to contribute to its inefficient use. Unfortunately, we could not confirm this inhibitory effect as the mutations introduced in the FAMLGA construct, designed to open the stem structure, did in fact reduce splicing at the alternative 5’ss, which is possibly due to mutation of underlying SREs.

In our current analysis, we tested several 5’ss that are known to be constitutively used during splicing of the primary transcript. It will be of interest to include 5’ss that are known to be involved in alternative splicing in future studies, as such sites may require a tighter regulation of their activity.

Acknowledgements

This research was sponsored by the Netherlands Organisation for Scientific Research (Chemical Sciences Division; NWO-CW; TOP grant).

Material and Methods

Generation and analysis of 5’ss sequence library

Complete genomic sequences of human chromosomes 6, 7, 9, 10, 13, 14, 20, 22 and X together with annotation for 5’ss were kindly extracted from the ENSEMBL database and provided by Dr. Jennifer Ashurst and Dr. Eduardo Eyras (Human Sequence Analysis Group, The Sanger Institute) in 2006. This dataset contained sequences of 6,395 genes and 10,355 RNA splice variants. In total, 47,279 canonical 5’ss were identified. Among those, there were 43,464 constitutively used 5’ss. The RNA secondary structure of the 5’ss region was analyzed using the Mfold algorithm (version 3.4) offered by the MBCMR Mfold server (http://mfold.burnet.edu.au) using standard settings (46). The HBS of the 5’ss was predicted with the HBond Score Web-Interface (http://www.uni-duesseldorf.de/rna/html/hbond_score.php) (16, 47-50).

DNA constructs

Promoter-reporter constructs based on the plasmid pGL3 (Promega) with a wt or mutant hairpin sequence (HS1 to HS5; HS1d to HS5d) were created by mutagenesis PCR with one of the HS oligonucleotides (Table 3) and TA113 (34) as primers and pGL3 as template. The resulting PCR products were digested with HindIII and Bsp119I and ligated into the corresponding sites of the pGL3 vector. For the construction of the FAM73B-luciferase reporter construct, the NcoI-XbaI fragment from the pGL4.28 [luc2CP/minP/Hygro] vector (Promega) containing an optimized firefly luciferase gene (lacking cryptic splice sites and transcription factor binding sites) was used to replace the corresponding fragment in pGL3. The FAM73B exon#8-intron#8-exon#9 gene fragment was obtained by PCR with primers HS1exon8 and HS1exon9rev (Table 3), using cellular DNA isolated from human SupT1 cells with the QIAamp DNA mini Kit (QIAGEN) as template. The PCR product was digested with AvrII and HindIII and ligated into the corresponding sites of the new luciferase vector, which resulted in the FAM73B-luc plasmid. The 5’ss#8 hairpin sequence in this plasmid was mutated by mutagenesis PCR with one of the FAM oligonucleotides and RVprimer3-pGL3 (Table 3) as primers and FAM73B-luc as template. The resulting PCR fragments were digested with XhoI and XcmI and ligated into the corresponding sites of FAM73B-luc. The pRL-CMV plasmid contains the Renilla luciferase reporter gene under the control of the cytomegalovirus immediate early (CMV-IE) promoter (Promega).

Cell cultures

293T cells were cultured at 37°C and 5% CO2 in DMEM medium containing 10% fetal bovine serum (FBS), nonessential amino acids, 40 units/ml penicillin and 40 µg/ml streptomycin (51). Cells were transfected by calcium phosphate precipitation as previously described (51).

Luciferase assays

293T cells (~1.5 × 10⁵ cells per 2 cm²) were transfected with 20 ng of the pGL3 or FAM-luc constructs, 0.5 ng pRL-CMV (as internal control) and 475 ng pBluescript (as carrier DNA). The cells were cultured for 48 h and subsequently lysed in passive lysis buffer (Promega). Firefly and Renilla luciferase activities were determined with the dual-luciferase assay (Promega) and the firefly luciferase activity was normalized to the Renilla luciferase activity. Data were corrected for between-session variation with the factor correction software (52).
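The normalization step described above amounts to dividing each firefly reading by the Renilla reading from the same well. The following is a minimal Python sketch of that ratio only; the function name and the example readings are hypothetical and not taken from the study, and the subsequent factor correction step is not reproduced here.

```python
def normalize_firefly(firefly, renilla):
    """Return firefly/Renilla ratios for paired wells.

    Both arguments are sequences of raw luminescence readings,
    one entry per transfected well (hypothetical input layout).
    """
    if len(firefly) != len(renilla):
        raise ValueError("firefly and renilla must have the same length")
    ratios = []
    for f, r in zip(firefly, renilla):
        if r <= 0:
            # A non-positive internal control reading cannot be used as a divisor
            raise ValueError("Renilla reading must be positive")
        ratios.append(f / r)
    return ratios


# Hypothetical readings (relative light units) for three wells
example = normalize_firefly([1200.0, 900.0, 300.0], [400.0, 300.0, 150.0])
```

Dividing by the co-transfected Renilla signal compensates for well-to-well differences in transfection efficiency and cell number, which is why pRL-CMV is included as internal control.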

Table 3. Oligonucleotides used in this study.

Splice site analysis

293T cells (~1.5 × 10⁵ cells per 2 cm²) were transfected with 1 µg of the FAM-luc constructs. After 48 h, cells were washed with phosphate-buffered saline (PBS), total cellular RNA was isolated with the RNeasy kit (QIAGEN) and contaminating DNA was removed with RNase-free DNase during isolation (QIAGEN). The RNA was used as template for cDNA synthesis with oligo dT and random hexamers as primers (ThermoScript RT-PCR system; Invitrogen). The cDNA product was amplified by PCR with primers pGL5-3seq1 and pGL4-3seq1 (Table 3). The PCR products were analyzed by agarose gel electrophoresis and the spliced products were cloned into a pCRII-TOPO vector (Invitrogen) for sequence analysis.

Statistical analyses

Data were subjected to independent-samples t-test analysis (IBM SPSS version 21) when indicated. The obtained p-values were considered significant when p<0.05.
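For readers who want to see what an independent-samples t-test computes, the following Python sketch gives the pooled-variance (Student) t statistic and its degrees of freedom. This is an illustration under assumptions: SPSS reports both an equal-variance and an unequal-variance (Welch) result, and the text does not state which one was used, so the pooled form is chosen here only as an example. The function name and the sample values are hypothetical. The p-value itself requires the t distribution and is not computed in this sketch.

```python
import math
from statistics import mean, variance


def student_t(a, b):
    """Pooled-variance t statistic and degrees of freedom for two independent samples."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each sample needs at least two observations")
    df = na + nb - 2
    # Pooled variance: sample variances weighted by their degrees of freedom
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    if se == 0:
        raise ValueError("zero variance in both samples; t is undefined")
    return (mean(a) - mean(b)) / se, df


# Hypothetical normalized luciferase values for two constructs
t_value, dof = student_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

The resulting t value and degrees of freedom would then be compared against the t distribution to obtain the two-sided p-value that is tested against the 0.05 threshold.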

References

1. Pan Q, Shai O, Lee LJ, Frey BJ, Blencowe BJ. 2008. Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nature genetics 40:1413-1415.

2. Kalsotra A, Cooper TA. 2011. Functional consequences of developmentally regulated alternative splicing. Nature reviews. Genetics 12:715-729.

3. Modrek B, Lee C. 2002. A genomic view of alternative splicing. Nature genetics 30:13-19.

4. Stoilov P, Meshorer E, Gencheva M, Glick D, Soreq H, Stamm S. 2002. Defects in pre-mRNA processing as causes of and predisposition to diseases. DNA and cell biology 21:803-818.

5. Hastings ML, Krainer AR. 2001. Pre-mRNA splicing in the new millennium. Current opinion in cell biology 13:302-309.

6. Nissim-Rafinia M, Kerem B. 2002. Splicing regulation as a potential genetic modifier. Trends in genetics : TIG 18:123-127.

7. Faustino NA, Cooper TA. 2003. Pre-mRNA splicing and human disease. Genes & development 17:419-437.

8. Padgett RA. 2012. New connections between splicing and human disease. Trends in genetics : TIG 28:147-154.

9. Douglas AG, Wood MJ. 2011. RNA splicing: disease and therapy. Briefings in functional genomics 10:151-164.

10. Wang GS, Cooper TA. 2007. Splicing in disease: disruption of the splicing code and the decoding machinery. Nature reviews. Genetics 8:749-761.

11. Tazi J, Bakkour N, Stamm S. 2009. Alternative splicing and disease. Biochimica et biophysica acta 1792:14-26.

12. Garcia-Blanco MA, Baraniak AP, Lasda EL. 2004. Alternative splicing in disease and therapy. Nature biotechnology 22:535-546.

13. Moore MJ. 2000. Intron recognition comes of AGe. Nature structural biology 7:14-16.

14. Black DL. 2003. Mechanisms of alternative pre-messenger RNA splicing. Annual review of biochemistry 72:291-336.

15. Gozani O, Potashkin J, Reed R. 1998. A potential role for U2AF-SAP 155 interactions in recruiting U2 snRNP to the branch site. Molecular and cellular biology 18:4752-4760.

16. Freund M, Asang C, Kammler S, Konermann C, Krummheuer J, Hipp M, Meyer I, Gierling W, Theiss S, Preuss T, Schindler D, Kjems J, Schaal H. 2003. A novel approach to describe a U1 snRNA binding site. Nucleic acids research 31:6963-6975.

17. Wang Z, Burge CB. 2008. Splicing regulation: from a parts list of regulatory elements to an integrated splicing code. Rna 14:802-813.

18. Zychlinski D, Erkelenz S, Melhorn V, Baum C, Schaal H, Bohne J. 2009. Limited complementarity between U1 snRNA and a retroviral 5' splice site permits its attenuation via RNA secondary structure. Nucleic acids research 37:7429-7440.

19. Buratti E, Baralle FE. 2004. Influence of RNA secondary structure on the pre-mRNA splicing process. Molecular and cellular biology 24:10505-10514.

20. Eperon LP, Graham IR, Griffiths AD, Eperon IC. 1988. Effects of RNA secondary structure on alternative splicing of pre-mRNA: is folding limited to a region behind the transcribing RNA polymerase? Cell 54:393-401.

21. Goguel V, Wang Y, Rosbash M. 1993. Short artificial hairpins sequester splicing signals and inhibit yeast pre-mRNA splicing. Molecular and cellular biology 13:6841-6848.

22. Grover A, Houlden H, Baker M, Adamson J, Lewis J, Prihar G, Pickering-Brown S, Duff K, Hutton M. 1999. 5' splice site mutations in tau associated with the inherited dementia FTDP-17 affect a stem-loop structure that regulates alternative splicing of exon 10. The Journal of biological chemistry 274:15134-15143.

23. Hutton M, Lendon CL, Rizzu P, Baker M, Froelich S, Houlden H, Pickering-Brown S, Chakraverty S, Isaacs A, Grover A, Hackett J, Adamson J, Lincoln S, Dickson D, Davies P, Petersen RC, Stevens M, de Graaff E, Wauters E, van Baren J, Hillebrand M, Joosse M, Kwon JM, Nowotny P, Che LK, Norton J, Morris JC, Reed LA, Trojanowski J, Basun H, Lannfelt L, Neystat M, Fahn S, Dark F, Tannenberg T, Dodd PR, Hayward N, Kwok JB, Schofield PR, Andreadis A, Snowden J, Craufurd D, Neary D, Owen F, Oostra BA, Hardy J, Goate A, van Swieten J, Mann D, Lynch T, Heutink P. 1998. Association of missense and 5'-splice-site mutations in tau with the inherited dementia FTDP-17. Nature 393:702-705.

24. Liu HX, Goodall GJ, Kole R, Filipowicz W. 1995. Effects of secondary structure on pre-mRNA splicing: hairpins sequestering the 5' but not the 3' splice site inhibit intron processing in Nicotiana plumbaginifolia. The EMBO journal 14:377-388.

25. McManus CJ, Graveley BR. 2011. RNA structure and the mechanisms of alternative splicing. Current opinion in genetics & development 21:373-379.

26. Warf MB, Berglund JA. 2010. Role of RNA structure in regulating pre-mRNA splicing. Trends in biochemical sciences 35:169-178.

27. Shepard PJ, Hertel KJ. 2008. Conserved RNA secondary structures promote alternative splicing. Rna 14:1463-1469.

28. Singh NN, Singh RN, Androphy EJ. 2007. Modulating role of RNA structure in alternative splicing of a critical exon in the spinal muscular atrophy genes. Nucleic acids research 35:371-389.

29. Solnick D. 1985. Alternative splicing caused by RNA secondary structure. Cell 43:667-676.

30. Spillantini MG, Murrell JR, Goedert M, Farlow MR, Klug A, Ghetti B. 1998. Mutation in the tau gene in familial multiple system tauopathy with presenile dementia. Proceedings of the National Academy of Sciences of the United States of America 95:7737-7741.

31. Mueller N, Berkhout B, Das AT. 2015. HIV-1 splicing is controlled by local RNA structure and binding of splicing regulatory proteins at the major 5' splice site. The Journal of general virology.

32. Mueller N, van Bel N, Berkhout B, Das AT. 2014. HIV-1 splicing at the major splice donor site is restricted by RNA structure. Virology 468-470C:609-620.

33. Asang C, Erkelenz S, Schaal H. 2012. The HIV-1 major splice donor D1 is activated by splicing enhancer elements within the leader region and the p17-inhibitory sequence. Virology 432:133-145.

34. Abbink TE, Berkhout B. 2008. RNA structure modulates splicing efficiency at the human immunodeficiency virus type 1 major splice donor. Journal of virology 82:3090-3098.

35. Mueller N, Klaver B, Berkhout B, Das AT. 2015. HIV-1 splicing at the major splice donor site is controlled by highly conserved RNA sequence and structural elements. The Journal of general virology, in press.

36. Skarnes WC, Rosen B, West AP, Koutsourakis M, Bushell W, Iyer V, Mujica AO, Thomas M, Harrow J, Cox T, Jackson D, Severin J, Biggs P, Fu J, Nefedov M, de Jong PJ, Stewart AF, Bradley A. 2011. A conditional knockout resource for the genome-wide study of mouse gene function. Nature 474:337-342.

37. Bassett JH, Gogakos A, White JK, Evans H, Jacques RM, van der Spek AH, Sanger Mouse Genetics P, Ramirez-Solis R, Ryder E, Sunter D, Boyde A, Campbell MJ, Croucher PI, Williams GR. 2012. Rapid-throughput skeletal phenotyping of 100 knockout mice identifies 9 new genes that determine bone strength. PLoS genetics 8:e1002858.

38. Cartegni L, Wang J, Zhu Z, Zhang MQ, Krainer AR. 2003. ESEfinder: A web resource to identify exonic splicing enhancers. Nucleic acids research 31:3568-3571.

39. Smith PJ, Zhang C, Wang J, Chew SL, Zhang MQ, Krainer AR. 2006. An increased specificity score matrix for the prediction of SF2/ASF-specific exonic splicing enhancers. Human molecular genetics 15:2490-2508.

40. Kammler S, Otte M, Hauber I, Kjems J, Hauber J, Schaal H. 2006. The strength of the HIV-1 3' splice sites affects Rev function. Retrovirology 3:89.

41. Hertel KJ. 2008. Combinatorial control of exon recognition. The Journal of biological chemistry 283:1211-1215.

42. Shepard PJ, Choi EA, Busch A, Hertel KJ. 2011. Efficient internal exon recognition depends on near equal contributions from the 3' and 5' splice sites. Nucleic acids research 39:8928-8937.

43. Hwang DY, Cohen JB. 1996. Base pairing at the 5' splice site with U1 small nuclear RNA promotes splicing of the upstream intron but may be dispensable for splicing of the downstream intron. Molecular and cellular biology 16:3012-3022.

44. Roca X, Sachidanandam R, Krainer AR. 2005. Determinants of the inherent strength of human 5' splice sites. Rna 11:683-698.

45. Balestra D, Barbon E, Scalet D, Cavallari N, Perrone D, Zanibellato S, Bernardi F, Pinotti M. 2015. Regulation of a strong F9 cryptic 5'ss by intrinsic elements and by combination of tailored U1snRNAs with antisense oligonucleotides. Human molecular genetics.

46. Zuker M. 2003. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic acids research 31:3406-3415.

47. Asang C, Hauber I, Schaal H. 2008. Insights into the selective activation of alternatively used splice acceptors by the human immunodeficiency virus type-1 bidirectional splicing enhancer. Nucleic acids research 36:1450-1463.

48. Freund M, Hicks MJ, Konermann C, Otte M, Hertel KJ, Schaal H. 2005. Extended base pair complementarity between U1 snRNA and the 5' splice site does not inhibit splicing in higher eukaryotes, but rather increases 5' splice site recognition. Nucleic acids research 33:5112-5119.

49. Caputi M, Freund M, Kammler S, Asang C, Schaal H. 2004. A bidirectional SF2/ASF- and SRp40-dependent splicing enhancer regulates human immunodeficiency virus type 1 rev, env, vpu, and nef gene expression. Journal of virology 78:6517-6526.

50. Kammler S, Leurs C, Freund M, Krummheuer J, Seidel K, Tange TO, Lund MK, Kjems J, Scheid A, Schaal H. 2001. The sequence complementarity between HIV-1 5' splice site SD4 and U1 snRNA determines the steady-state level of an unstable env pre-mRNA. Rna 7:421-434.

51. Das AT, Zhou X, Vink M, Klaver B, Verhoef K, Marzio G, Berkhout B. 2004. Viral evolution as a tool to improve the tetracycline-regulated gene expression system. The Journal of biological chemistry 279:18776-18782.

52. Ruijter JM, Thygesen HH, Schoneveld OJ, Das AT, Berkhout B, Lamers WH. 2006. Factor correction as a tool to eliminate between-session variation in replicate experiments: application to molecular biology and retrovirology. Retrovirology 3:2.

Supplementary data

Supplementary Table 1. A large set (47,279) of human 5’ss was extracted from the ENSEMBL human genome database (http://www.ensembl.org/index.html). The thermodynamic stability of the secondary RNA structure of the 5’ss region, including the 11-nt U1 snRNA binding sequence and 10 upstream and 10 downstream nts, was predicted with the Mfold RNA structure analysis software. The stability (ΔG in kcal/mol) of the “Top 100” most-stable RNA structures is shown. The hydrogen bond score (HBS; (16)) that reflects the base pairing potential between the U1 snRNA and the 11 nts of the 5’ss is shown for each sequence.
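The 5’ss region that was folded spans 31 nts: the 11-nt U1 snRNA binding sequence plus 10 nts on either side. The following Python sketch shows how such a window could be cut from a transcript sequence before submitting it to a folding program. It assumes that the 11-nt binding sequence consists of the last 3 exonic and the first 8 intronic nts, which is not stated in the caption; the function name, parameter names and the example sequence are hypothetical.

```python
def five_ss_window(seq, intron_start, exon_nt=3, intron_nt=8, flank=10):
    """Return the 5'ss region used for structure prediction.

    seq          : transcript (or genomic) sequence as a string
    intron_start : 0-based index of the first intronic nt (the G of GT/GU)
    exon_nt      : exonic nts in the U1 binding sequence (assumed: 3)
    intron_nt    : intronic nts in the U1 binding sequence (assumed: 8)
    flank        : extra nts added upstream and downstream (10 in the caption)
    """
    start = intron_start - exon_nt - flank
    end = intron_start + intron_nt + flank
    if start < 0 or end > len(seq):
        raise ValueError("window extends beyond the sequence")
    return seq[start:end]


# Hypothetical sequence: 13 exonic nts, then an intron starting with GT
demo_seq = "A" * 13 + "GT" + "C" * 16
window = five_ss_window(demo_seq, intron_start=13)
```

With the default settings the returned window is 10 + 11 + 10 = 31 nts long, and the exon-intron junction lies after position 13 of the window.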
