Splicing Factor SF3B1: Putative Chronic Lymphocytic Leukemia Driver

Senior Thesis

Presented to

The Faculty of the School of Arts and Sciences
Brandeis University

Undergraduate Program in Biology
Dr. Robin Reed, Advisor
Dr. Rachel Woodruff, Advisor

In partial fulfillment of the requirements for the degree of Bachelor of Science

By Jingyuan Huang
April 2018

Copyright by Jingyuan Huang

Committee members (if applicable):

Name: ______Signature: ______

Name: ______Signature: ______

Name: ______Signature: ______

TABLE OF CONTENTS

Cover page
Table of contents
1.1 Abstract
1.2 Contributions and acknowledgements

2 Introduction
2.1 Chronic Lymphocytic Leukemia: Cause, Diagnosis and Treatments
2.2 Putative CLL driver gene, splicing factor SF3B1: Structure, Function
2.3 SF3B1 mutations and cancers

3 Results
3.1 Establishing SF3B1K700E ES cell line
3.2 SF3B1 mutation inhibits the proliferation rate of cells
3.3 SF3B1 mutation affects RNA splicing
3.4 U2 snRNP is labile in SF3B1K700E cell line
3.5 The expression levels of proteins involved in splicing are not affected by SF3B1 mutation
3.6 Hematopoietic genes are upregulated in SF3B1 mutant cells
3.7 Notch activation is a putative mechanism for upregulation of hematopoietic genes in SF3B1K700E mutant ES cells

4 Discussion

5 Material and Methods
5.1 Generation of SF3B1K700E mutation in human ES cells and HEK 293 cells
5.2 RNA sequencing analysis
5.3 Cell culture
5.4 Reverse transcription PCR
5.5 Real-time PCR
5.6 Western blot analysis
5.7 Nuclear extract
5.8 17S/12S U2 snRNP Analysis

6 Supplementary figures

7 References

Table Legend
Table 1. Primers used for reverse transcription PCR and real-time PCR
Table 2. Primary antibodies used for Western blotting
Table 3. Secondary antibodies used for Western blotting
Table 4. Solutions for ES nuclear extract experiment

Figure Legend
Figure 2.1: Genetic lesions of CLL.
Figure 2.2: The process of pre-mRNA splicing.
Figure 2.3: SF3B1 is an important component of U2 snRNP.
Figure 2.4: Different types of mis-splicing.
Figure 3.1: Sanger sequencing result
Figure 3.2: Growth curve.
Figure 3.3: SF3B1K700E would cause mis-splicing events.
Figure 3.4: 17S/12S U2 snRNP analysis.
Figure 3.5: The expression levels of spliceosome components showed little difference between WT and MT.
Figure 3.6: The fold change and description of up-regulated genes.
Figure 3.7: RNA-seq results showed that hematopoietic genes were upregulated in SF3B1K700E cells.
Figure 3.8: The expression of paired box protein 5 (PAX5).

1.1 Abstract

Background: The protein SF3B1 is an essential component of the SF3B complex, which is a constituent of the general splicing factor U2 snRNP. Recent studies found that mutation of SF3B1 is associated with numerous hematological cancers, such as chronic lymphocytic leukemia (CLL), but the mechanism of its pathogenesis remains unclear.

Methods: The core facility at Harvard Medical School (HMS) used CRISPR to construct human embryonic stem (ES) cell lines bearing the SF3B1K700E point mutation, which is the most common SF3B1 mutation in hematological cancers. My lab had DNA sequencing performed to identify the cell line carrying the K700E mutation, and RNA sequencing (RNA-Seq) was performed by Dr. Tan. I prepared nuclear extracts of K700E mutant (MT) and wild-type (WT) cells to analyze 17S/12S U2 snRNP, and I used Western blots to determine whether MT SF3B1 has any effect on the expression of U2 snRNP and spliceosome components. Reverse transcription PCR (RT-PCR) and real-time quantitative PCR (qPCR) were conducted to validate that MT SF3B1 affects gene expression.

Results: RNA-Seq showed extensive mis-splicing in SF3B1K700E ES cells compared to WT ES cells. In addition, ~1000 genes were upregulated in the SF3B1K700E cells. In my project, I, together with a postdoc and my supervisor Dr. Reed, used a combination of literature searches and GeneCards to identify the functions of all of the upregulated genes as well as their disease associations. Remarkably, we found that numerous master regulators of hematopoiesis were upregulated. For example, the TAL1 gene, which is required for hematopoiesis, was 140-fold upregulated, and PAX5, which is required for the B cell lineage, was about 20-fold upregulated. In addition, we found that most of these transcription factors are associated with hematological cancers, which is the type of cancer associated with the SF3B1 mutation. Moreover, we found that 22% of the upregulated genes are involved in downstream events in hematopoiesis. In additional studies, we validated the upregulation of the blood transcription factors by RT-PCR and qPCR. We also found that the K700E mutation caused lability of the U2 snRNP, which may explain why mis-splicing occurs. I then carried out Western blots to determine whether the K700E mutation affected the expression level of U2 snRNP components and other splicing factors in the spliceosome, but did not see any apparent effect. Further studies are needed to determine how the K700E mutation causes U2 snRNP lability.

Conclusions: Our study reveals that SF3B1K700E causes extensive mis-splicing in ES cells, similar to what has been reported in blood cancers carrying this mutation. Moreover, our data are the first to show upregulation of important hematopoietic transcription factors and downstream blood genes, which may explain the blood cell type specificity of a general splicing factor in cancer.

1.2 Contributions and Acknowledgements

This project is a collaboration between Dr. Shanye Yin, Dr. Aikchoon Tan and me. Dr. Yin supervised me during the two semesters, and I thank him for his patience and willingness to teach. The SF3B1K700E cell line was constructed previously by the core facility at HMS. RNA-Seq data were provided by Dr. Tan. Figures 3.1, 3.3, 3.4 (A), 3.7, 3.9 and 3.10 were prepared by Dr. Yin, and I prepared the other figures. I want to thank the PI of the lab, Dr. Robin Reed, for her help and inspiration. I also want to thank everyone in the lab: Jaya Gangopadhyay, Yong Yu, Binkai Chi and Alex Iocolano. It has been an honor to work with such kind, hardworking and talented people.

2 Introduction

2.1 Chronic Lymphocytic Leukemia: Cause, Diagnosis and Treatments

Chronic lymphocytic leukemia (CLL) is the most common type of leukemia among adults. It is a cancer of lymphocytes, a specific type of white blood cell. CLL generally progresses slowly compared to other types of leukemia. In CLL patients, aberrant and nonfunctional lymphocytes are produced. Because the abnormal lymphocytes are generated continuously and live longer than normal ones, they gradually accumulate in the bone marrow (Kipps 2017). As a result, the bone marrow cannot produce blood cells normally, and swelling occurs in the lymphatic system, for example in the lymph nodes.

The diagnosis of CLL differs from case to case. CLL is conspicuously heterogeneous: some patients can live normally without any treatment, while others have a more aggressive course that is fatal within a short time (Gaidano et al. 2012). For the former, diagnosis largely depends on a routine blood test, whereas symptomatic patients present with fatigue, more frequent infections, and swelling of the lymph nodes. This heterogeneity reflects the existence of different CLL subgroups, each with distinct biological mechanisms. Knowing more about the molecular pathogenesis of the disease is thus important for the treatment of CLL.

The treatment options for CLL patients include chemotherapy and immunotherapy. For instance, anti-CD20 medicines are among the efficient immunotherapies for CLL (Bose 2017). CD20 is a protein on the surface of CLL cells (Datta 2009). Some anti-CD20 drugs are antibodies; they target CLL cells by binding to CD20 and then attack the cells by forming a membrane attack complex (Datta 2009). Chemotherapy drugs with proven efficacy against CLL include fludarabine, cyclophosphamide, and bendamustine (Hallek 2017). However, some patients have cancer that is refractory to fludarabine, one of the most efficient of these drugs. Fludarabine is converted into F-ara-ATP, which inhibits DNA synthesis (Ricci 2009).

Splicing factor 3B subunit 1 (SF3B1) mutations are enriched in patients who are fludarabine-refractory (Rossi et al. 2011), which implicates SF3B1 in CLL chemorefractoriness. SF3B1 mutation has also been linked to the pathogenesis of CLL: it is associated with more aggressive disease progression and a lower overall survival rate (Quesada et al. 2017). With next-generation sequencing, the CLL genome has been explored further and additional genetic lesions have been identified (Figure 2.1). As shown in Figure 2.1, genetic lesions of SF3B1 occur at relatively high frequency in different phases of CLL, suggesting that SF3B1 is one of the potential driver genes for CLL (Gaidano et al. 2012).

Figure 2.1 (Gaidano et al. 2012): Genetic lesions of CLL. The frequency of genetic lesions in different phases of CLL is provided in parentheses. The genetic lesions shown have relatively high frequency compared with other genetic lesions. As CLL progresses, some CLL cases will become chemorefractory, and some will become an aggressive form, Richter syndrome.

2.2 Putative CLL driver gene, splicing factor SF3B1: Structure, Function

In eukaryotes, transcription produces precursor messenger RNAs (pre-mRNAs). Before an mRNA can be used as a template to synthesize protein, the pre-mRNA needs to be modified by a set of processes (Wahl et al. 2009), one of which is RNA splicing. Spliceosome assembly, which involves five small nuclear RNAs (U1, U2, U4, U5, U6), is a key step of pre-mRNA splicing (Reed 1996). During RNA splicing, non-coding introns are removed and the exons are joined together to form the mature mRNA (Figure 2.2).

Figure 2.2 (Wan et al. 2013): The process of pre-mRNA splicing. U2 binds to the branch point sequence (BPS) in the intron, then an adenosine in the BPS attacks the phosphodiester bond at the 5’ splice site (ss) (Wahl et al. 2009). The phosphodiester bond at the 3’ splice site is then attacked by the 3’-hydroxyl of the free 5’ exon (Wahl et al. 2009). The two exons are ligated and the lariat intron is removed from the mRNA. After further processing, the mature mRNA will be generated.

SF3B1, together with SF3B2, 3, 4, 5, 6, and 7, forms the SF3B complex (Wahl et al. 2009). The SF3B complex is an essential component of the U2 small nuclear ribonucleoprotein (snRNP) (Wahl et al. 2009). Once named spliceosome-associated protein 155 (SAP 155), SF3B1 was found to play an important role in recruiting U2 snRNP and in binding U2 snRNP to the BPS (Gozani et al. 1998). The U2 snRNP, together with other snRNPs and a large number of splicing factors, assembles into the dynamic spliceosome (Wan et al. 2013). As shown in Figure 2.3, SF3B1 binds on both the 5’ and 3’ sides of the branch point sequence and is located in the catalytic center of the spliceosome (Wahl et al. 2009).

Figure 2.3 (Wan et al. 2013): SF3B1 is an important component of U2 snRNP. PY tract stands for polypyrimidine tract; R is a purine; Y is a pyrimidine. U2AF65 and U2AF35 are splicing factors. In the figure, U2 RNA is hybridized to the BPS.

SF3B1 interacts with the intronic BPS and with other spliceosome components, such as U2 auxiliary factor (U2AF), to recruit the U2 snRNP (Gozani et al. 1998). These interactions contribute to the recognition of the 3’ splice site and to stabilization of the spliceosome (Wan et al. 2013). By interacting with the BPS, SF3B1 helps bulge out the conserved adenosine, so that the adenosine’s 2’-OH can attack the phosphodiester bond at the 5’ splice site more easily (Wan et al. 2013).

2.3 SF3B1 mutations and cancers

SF3B1 contains two main regions. One is the N-terminal region, which is hydrophilic and functions as the binding site of U2AF (Alsafadi et al. 2016). The other is a highly conserved C-terminal domain, in which mutations are most frequently detected (Alsafadi et al. 2016). This domain consists of 22 Huntingtin, Elongation factor 3, PR65/A, TOR (HEAT) repeats (Wan et al. 2013). A tentative homology model of the HEAT repeat domain suggests that the structure of this domain changes during the assembly of U2 snRNP (Quesada et al. 2012).

SF3B1 mutations that are involved in blood cancers are mostly missense mutations that affect single amino acids (Alsafadi et al. 2016). There are mutational hotspots located within the HEAT repeat domain, in the fifth, sixth and seventh HEAT repeats; the affected amino acid positions are R625, K666, and K700, respectively (Alsafadi et al. 2016). Among the hotspot mutations, K700E is the most frequently detected in blood malignancies (Alsafadi et al. 2016). The hotspot mutations in the HEAT domain lie on the inner surface of the SF3B1 structure, where SF3B1 interacts with the BPS (Alsafadi et al. 2016). It has been suggested that these hotspot mutations could induce an abnormal conformational change in U2 snRNP (Alsafadi et al. 2016). This conformational change would then affect the physical characteristics of SF3B1 and thus influence the interaction between SF3B1 and other components of the U2 snRNP complex (Wan et al. 2013).

SF3B1 is essential for the normal splicing process. Studies of cancers carrying SF3B1 mutations have shown that these mutations can alter RNA splicing (Wang et al. 2016). Co-immunoprecipitation experiments revealed that mutated SF3B1 is still able to interact with other components of U2 snRNP and thus still takes part in pre-mRNA splicing (Wang et al. 2016). Wang et al. introduced the K700E hotspot mutation into the hematopoietic cell line K562 and used bulk RNA sequencing to examine how this point mutation relates to RNA splicing (Wang et al. 2016). K562 cells transfected with SF3B1-K700E showed an over 10-fold increase in the expression of certain splice variants compared to K562 cells with wild-type SF3B1 (Wang et al. 2016). Wang et al. also confirmed the association of mis-splicing with SF3B1 mutation by analysis of single CLL cells. The genes identified as significantly mis-spliced in the bulk RNA-seq analysis were again found to be alternatively spliced in single CLL cells (Wang et al. 2016).

Altered splicing events also take place in cells that carry SF3B1 point mutations other than K700E. The patterns of significantly altered splicing in cells with other SF3B1 mutations are similar to those in cells with the K700E mutation (Wang et al. 2016). Conversely, splicing events that are unaffected by the K700E mutation in CLL cells are also unaltered by other SF3B1 mutations (Wang et al. 2016). This indicates that the RNA mis-splicing events shared among cells with different SF3B1 mutations are not general defects in CLL but specific events that take place only with SF3B1 mutation (Wang et al. 2016).

There are many types of mis-splicing (Figure 2.4). Among them, the usage of an alternative 3’ss is significantly more common in CLL cells (Alsafadi et al. 2016). Recent studies show that a high proportion (32%) of introns have more than one potential branch point, and misregulated usage of alternative branch point sequences is implicated in the pathology of diseases (Alsafadi et al. 2016). U2 snRNP that contains wild-type SF3B1 recognizes the canonical BPS, which is U2AF dependent (Alsafadi et al. 2016). In contrast, because of a conformational change in the U2 snRNP complex, U2 snRNP with mutated SF3B1 is less strict in its requirement for the U2AF-dependent BPS (Alsafadi et al. 2016). The U2 snRNP complex would therefore bind instead to an upstream alternative BPS that has a high potential for base-pairing with the complex. This is how SF3B1 hotspot mutations are thought to cause usage of an alternative 3’ss.

Figure 2.4: Different types of mis-splicing. The grey parts are the exons that should bind together to form the mature mRNA; the blue parts are what would be included in the mature mRNA if mis-splicing occurs.

In CLL, SF3B1 mutations are subclonal events that always take place together with mutations in other genes (Wang et al. 2016). Single cells with SF3B1 mutations from CLL cases were studied, and SF3B1 mutation was shown to be significantly related to changes in the expression of genes involved in multiple important cellular pathways (Wang et al. 2016). The cellular processes affected by those pathways include CLL cell proliferation and survival, apoptosis (programmed cell death), the DNA damage response and cell cycle, and Notch signaling (Wang et al. 2016). Wang et al. then compared cell lines overexpressing mutated SF3B1 with cell lines overexpressing wild-type SF3B1. Consistent with the single-cell results, the DNA damage response was altered in cell lines overexpressing mutant SF3B1, and this effect was not limited to the K700E point mutation (Wang et al. 2016). This suggests that the altered gene expression associated with SF3B1 mutations could contribute to an altered DNA damage response.

One of the important pathways affected by SF3B1 mutation-associated splice variants is the Notch signaling pathway. SF3B1 mutations that cause in-frame mis-splicing are partially associated with altered splicing of DVL2, a gene that can negatively regulate Notch signaling (Wang et al. 2016). The altered splicing of DVL2 would induce an increase in Notch signaling, a pathway that is implicated in CLL (Wang et al. 2016). Notch signaling plays a role in the regulation of cell differentiation and proliferation (Gianfelici 2012). An increase in Notch signaling would contribute to apoptosis resistance, and CLL cells would therefore show an increased survival rate (Gianfelici 2012).

Previous studies have shown that SF3B1 mutations are related to more aggressive CLL, and thus SF3B1 has the potential to be one of the driver genes of CLL. Further study of SF3B1’s function and of how its mutations affect cells would contribute to a clearer understanding of CLL. Using the SF3B1K700E mutant ES cell model, we asked the following questions: Does the SF3B1K700E mutation affect the protein expression of SF3B1? Does the SF3B1K700E mutation affect U2 snRNP integrity? Does SF3B1K700E affect global gene expression in human ES cells?

First, we used Western blots to examine the expression levels of SF3B1 and other components of U2 snRNP. We then used nuclear extracts to test how the SF3B1 mutation affects the stability of U2 snRNP. To see how splicing is altered by the SF3B1 mutation, RNA sequencing was used to compare the total RNA of normal cells and cells with the SF3B1 mutation. To determine whether the affected genes are involved in important cellular pathways that are implicated in CLL, we searched for the functions of those genes. We used techniques such as RT-PCR and Western blotting to get a more direct view of how the expression of genes related to hematopoiesis or B cell development is affected.

3 Results

3.1 Establishing SF3B1K700E ES cell line

Figure 3.1: Sanger sequencing of the target mutation. In the mutant cells, the K700E mutation was heterozygous. The base A was replaced by the base G in the mutant allele. In the Sanger sequencing result, the bases A and G showed a 1:1 ratio.

To further analyze the effects of the K700E mutation, which has been shown to affect splicing (Wang et al. 2016), we independently made our own mutant embryonic stem cell line. In collaboration with Shanye Yin, the core facility at HMS used CRISPR to introduce the K700E mutation into human ES cells, and the cell line carrying the point mutation was then analyzed by RNA-seq. After electroporation, human ES cells were sorted into 96-well plates with a single cell per well. A total of 2 plates were sorted, and 31 single cells survived the sort and gave rise to colonies. The sequencing results showed that, out of the 31 colonies, one contained the correct point mutation, eight contained indels, and the rest were wild type. The one colony that contained the K700E point mutation was then cultured for further research in this project.
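The screening outcome above can be summarized with a few lines of arithmetic. The sketch below uses only the colony counts stated in the text (31 colonies, 1 correct knock-in, 8 with indels); the variable and function names are illustrative and are not part of the original workflow.

```python
# Colony counts reported in the text for the CRISPR screen.
total_colonies = 31
correct_knock_in = 1   # colonies carrying the K700E point mutation
indel_colonies = 8     # colonies carrying indels

# The remaining colonies were wild type.
wild_type = total_colonies - correct_knock_in - indel_colonies


def percent(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


summary = {
    "knock_in_pct": percent(correct_knock_in, total_colonies),
    "indel_pct": percent(indel_colonies, total_colonies),
    "wild_type_pct": percent(wild_type, total_colonies),
}
print(wild_type, summary)
```

On these numbers, roughly 3% of the surviving colonies carried the intended point mutation, which is why a single colony was carried forward.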

3.2 SF3B1 mutation inhibits the proliferation rate of cells

Figure 3.2: Growth curve. P values were calculated using a t test (* p < 0.05, ** p < 0.01). The grey and orange curves had a higher starting concentration of cells (5 × 10^4 cells/ml), and the other two were started with 1.67 × 10^4 cells per ml.

To determine how the K700E point mutation affects the proliferation rate of cells, we carried out growth rate analysis (Figure 3.2). The growth curves revealed that mutant HEK293 cells had lower proliferation rates than wildtype cells. After three days, the growth curves show an interesting pattern: the proliferation rate of mutant cells increased, whereas the wildtype cells grew more slowly. One possible explanation for this phenomenon is that the size of the plates limited continued growth of the wildtype cells. Previous research showed that SF3B1 knockdown induces cell cycle arrest and negatively affects the growth of different myeloid cell lines (Dolatshad et al. 2015). From this experiment we conclude that the growth of HEK293 cells was also negatively affected by the SF3B1 mutation.

This result is interesting because CLL cells generally have a higher proliferation rate than normal cells, yet cells bearing the K700E mutation, which we assume is a cancer-causing mutation, have a lower proliferation rate than wild-type cells. In my opinion, this is because the K700E mutant cells have not yet developed into CLL cells. Without normal mRNA splicing, the mutant cells grew slowly compared with wild-type cells.
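One common way to put a number on differences like those in Figure 3.2 is the population doubling time under an assumption of exponential growth. The sketch below is illustrative only: the starting density is taken from the Figure 3.2 legend, but the final density and the 72-hour interval are hypothetical values, and the thesis does not state that doubling times were calculated.

```python
import math


def doubling_time(n0, nt, hours):
    """Doubling time (hours) assuming exponential growth from n0 to nt."""
    if n0 <= 0 or nt <= n0 or hours <= 0:
        raise ValueError("need 0 < n0 < nt and hours > 0")
    return hours * math.log(2) / math.log(nt / n0)


# Starting density from the Figure 3.2 legend (cells/ml).
start_density = 5e4
# Hypothetical density after 72 h, chosen only to illustrate the formula.
end_density = 4e5

print(doubling_time(start_density, end_density, 72))
```

An eight-fold increase over 72 hours corresponds to three doublings, so the hypothetical example gives a doubling time of about 24 hours. The calculation only applies to the exponential phase, so it would not describe the later time points where the plate size may have limited growth.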

3.3 SF3B1 mutation affects RNA splicing

SF3B1 mutation is already known to alter the splicing of genes involved in multiple cellular pathways (Wang et al. 2016). To confirm that this also holds in human ES cells, RNA-seq was performed in this project by Dr. Tan. The result (Figure 3.3) shows that splicing of mRNA was affected in cells bearing the K700E mutation. The most frequent type of dysregulation was usage of an alternative 3’ss. For example, over 500 alternative 3’ss were preferred and used at a higher ratio in the SF3B1K700E cell line than in the wildtype cell line. This result is consistent with the findings of Wang et al. The protein SF3B1 plays a part in recognizing the 3’ss, as described above, but the mechanism by which the mutation changes 3’ss selection was unclear.

Figure 3.3 (Shanye Yin): SF3B1K700E would cause mis-splicing events. (A) RNA-seq validated the target point mutation. The K700E mutation replaces the nucleotide base A with G and changes the 700th amino acid from lysine (K) to glutamic acid (E). (B) The splicing of the SF3B1 gene was altered by the K700E mutation. (C) RNA-Seq results show that splicing differs between SF3B1WT and SF3B1K700E human ES cells. A total of 1394 significant mis-splicing events were identified.

3.4 U2 snRNP is labile in SF3B1K700E cell line

As shown above, the K700E mutation affects splicing and causes usage of alternative 3’ss. Possible mechanisms involve structural changes in mutant SF3B1 and changes in its interactome. We hypothesized that the mechanism involves destabilization of U2 snRNP. To examine whether SF3B1K700E has an effect on the integrity of U2 snRNP, I performed a native gel assay with and without heparin to compare the relative abundance of 12S and 17S U2 snRNP. The 17S structure is the functional form of U2 snRNP, whereas the 12S particle is the core (Behrens et al. 1993). Heparin is known to disrupt weak interactions. We observed a higher ratio of 12S to 17S in the MT than in the WT, even in the absence of heparin, suggesting that 17S U2 snRNP was disrupted in the mutant but not in the wildtype. In the presence of heparin, 17S U2 snRNP was further disrupted to the 12S snRNP in the mutant. I conclude that the K700E mutation disrupts the stability of U2 snRNP, and this might account for the splicing differences that have been observed.

Figure 3.4: (A) Schematic of the structure of 17S and 12S U2 snRNP. The green part is the core of U2 snRNP and the red part contains SF3a and SF3b. U2 snRNP contains both protein and RNA. (B) The result of 17S/12S U2 snRNP analysis shows that SF3B1K700E affects the stability of 17S/12S U2 snRNP. The levels of the 17S and 12S U2 snRNP complexes are visualized in the gel.

3.5 The expression levels of proteins involved in splicing are not affected by SF3B1 mutation

One possible reason for the lower stability of U2 snRNP in MT cells is that proteins in the U2 snRNP complex are not expressed at normal levels. Western blot experiments were performed to determine whether SF3B1K700E affects the expression levels of components of the spliceosome. In Figure 3.5, SF3A1, SF3A2, SF3A3, SF3B1, SFPQ and B’’ are proteins in U2 snRNP, and U1C is a component of U1 snRNP. ARS2 is a protein in the spliceosome. We used tubulin as the loading control. The data show little difference in the levels of the proteins examined. This result suggests that protein expression levels are not affected, or are only slightly affected, by the K700E mutation.

Figure 3.5: The protein expression levels of spliceosome components showed little difference between WT and MT.

3.6 Hematopoietic genes are upregulated in SF3B1 mutant cells

In the RNA-seq results, we found that some genes were upregulated more than 20-fold. I looked up the functions of over 200 upregulated genes using online sources such as NCBI and GeneCards. I then selected genes that are involved in hematopoiesis or are related to blood cancers and certain pathways. We found that many of the significantly upregulated genes have functions closely related to hematopoiesis, such as TAL1, or to B cell development, such as PAX5. I did the same for the downregulated genes; interestingly, I did not find many of them to be related to hematopoiesis. This upregulation might affect cellular pathways related to blood cell and B cell development, and might contribute to the development of CLL, a blood malignancy.
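The selection step described above, picking out genes upregulated more than about 20-fold, can be sketched as a simple filter. This is a minimal illustration and not the analysis pipeline used in the project: the TAL1 and PAX5 fold changes are the values quoted in this thesis, while GENE_A and GENE_B are hypothetical placeholders and the function name is invented for the example.

```python
import math

# Fold changes (mutant / wild type). TAL1 and PAX5 are the values quoted in
# the thesis; GENE_A and GENE_B are hypothetical placeholders.
fold_changes = {
    "TAL1": 140.0,
    "PAX5": 21.0,
    "GENE_A": 3.5,   # hypothetical, mildly upregulated
    "GENE_B": 0.4,   # hypothetical, downregulated
}


def strongly_upregulated(fc, threshold=20.0):
    """Return gene names with fold change >= threshold, highest first."""
    hits = {gene: value for gene, value in fc.items() if value >= threshold}
    return sorted(hits, key=hits.get, reverse=True)


for gene in strongly_upregulated(fold_changes):
    # log2 fold change is the scale RNA-seq tools usually report.
    print(gene, fold_changes[gene], round(math.log2(fold_changes[gene]), 2))
```

The functional annotation itself (hematopoiesis, B cell development, disease association) was done by manual lookup, so it is not represented in this sketch.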

Figure 3.6: The fold change and description of up-regulated genes. The fold change data are based on RNA-seq. Gene descriptions are from online sources such as GeneCards and NCBI.

To confirm these findings, we performed qPCR and RT-PCR on selected genes. As shown in Figures 3.7 and 3.8, RNA-Seq showed that PAX5 RNA levels were higher in SF3B1K700E ES cells than in wildtype cells. RT-PCR experiments were conducted to validate this upregulation of the PAX5 gene. These results confirmed that PAX5 was expressed at a significantly higher level in the mutant cells than in wildtype cells. This means that the K700E mutation changed the expression of those genes and may further affect hematopoiesis and B cell development.
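For readers unfamiliar with how qPCR readouts become fold changes, the sketch below shows the widely used 2^-ΔΔCt calculation with a reference gene. It is an assumption that this method applies here, since the text does not say how the qPCR data were quantified; GAPDH is used as the reference only because Figure 3.8 names it as the loading control, and all Ct values are hypothetical.

```python
def relative_expression(ct_target_mt, ct_ref_mt, ct_target_wt, ct_ref_wt):
    """Fold change of a target gene in MT vs WT by the 2^-ddCt method.

    Each dCt normalizes the target gene to the reference gene in one sample;
    ddCt is the difference between the MT and WT dCt values.
    """
    delta_ct_mt = ct_target_mt - ct_ref_mt
    delta_ct_wt = ct_target_wt - ct_ref_wt
    delta_delta_ct = delta_ct_mt - delta_ct_wt
    return 2 ** (-delta_delta_ct)


# Hypothetical Ct values: target gene vs a GAPDH reference in each cell line.
fold = relative_expression(
    ct_target_mt=26.0, ct_ref_mt=18.0,
    ct_target_wt=30.0, ct_ref_wt=18.0,
)
print(fold)
```

With these made-up values the target crosses threshold four cycles earlier in the mutant after normalization, which corresponds to a 16-fold increase. The method assumes near-100% amplification efficiency for both primer pairs.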

Figure 3.7 (Shanye Yin): qPCR results showed that hematopoietic genes were upregulated in SF3B1K700E cells.

Figure 3.8: The expression of paired box protein 5 (PAX5), a master regulator of B-cell development, was upregulated in the SF3B1K700E cell line. The left panel is the RNA-seq result and the right panel is the RT-PCR result. The two results are consistent. GAPDH is a highly conserved gene and was used as the loading control.

Figure 3.9: GSEA analysis revealed selective upregulation of blood genes in SF3B1K700E mutant ES cells. GSEA positive categories are enriched for upregulated genes, and GSEA negative categories are enriched for downregulated genes in SF3B1 mutant cells.

3.7 Notch activation is a putative mechanism for upregulation of hematopoietic genes in SF3B1K700E mutant ES cells

An interesting question is why SF3B1 mutation induces upregulation of blood genes in SF3B1 MT cells. Notably, we found that the upregulated genes are significantly enriched in Notch signaling targets, suggesting higher activity of this signaling pathway in mutant cells. We examined the RNA-seq data and found that a cryptic 3’ss is used in DVL2. DVL2 is an inhibitor of Notch signaling. We found that mis-splicing of the gene results in production of a short isoform of DVL2, which is named DVL2-S. This was validated by RT-PCR and Western blot (Figure 3.10). Shanye, the postdoc I worked with, found that DVL2-S activates Notch signaling and upregulates downstream target genes of Notch signaling, including multiple blood transcription factors. Notch signaling is one of the most important signaling pathways activated in multiple cancers (Yuan et al. 2015). This provides a putative mechanism for upregulation of hematopoietic genes in SF3B1K700E mutant ES cells.

Figure 3.10: SF3B1K700E caused mis-splicing of DVL2. (A) RNA-seq data showed that a cryptic 3’ss is used in DVL2. (B) RT-PCR validated that DVL2-S was produced in mutant cells. (C) Western blot validated that DVL2-S was produced in mutant cells.

4 Discussion

SF3B1 mutation is thought to be directly related to CLL because SF3B1 is one of the most frequently mutated genes in CLL patients (Gaidano et al. 2012). However, how the mutation is related to the disease remains unclear, and the changes that SF3B1 mutation brings to cells still need to be studied.

To understand how SF3B1K700E plays a part in the pathogenesis of CLL, we studied the impact of the SF3B1K700E mutation on cell growth in HEK293 cells and determined the effects of SF3B1K700E on global gene expression in human embryonic stem cells. First, we used RNA-Seq to confirm the targeted K700E mutation in the MT cell line and to analyze aberrant splicing events and changes in gene expression in ES-SF3B1K700E cells compared to wildtype ES cells. Second, we showed that the stability of U2 snRNP is decreased in ES-SF3B1K700E cells. Finally, we investigated the functions of the upregulated genes and found that many of them are related to hematopoiesis.

We have shown that aberrant splicing events take place in SF3B1K700E cells compared to WT cells. In addition, master regulatory genes of hematopoiesis were upregulated. For example, TAL1 was 140-fold upregulated and PAX5 was 21-fold upregulated in the RNA-seq experiment. This upregulation was validated by RT-PCR and qPCR. Full-length TAL1 (TAL1fl) has a short transcript variant named TAL1s that can be generated by alternative splicing (Jin et al. 2017). The SF3B1K700E mutant has a higher ratio of TAL1s/TAL1fl, which may lead to the dysregulation of erythropoiesis (Jin et al. 2017). PAX5 is an oncogene that has an important role in B-cell development (Pierre et al. 2011). Abnormal expression of PAX5 has been shown to be associated with B-cell cancers including CLL (Pierre et al. 2011). The significant upregulation of TAL1, PAX5 and other genes suggests one way in which SF3B1K700E may be associated with CLL.

The K700E mutation caused U2 snRNP lability. However, the K700E mutation did not appear to affect the expression level of U2 components and other splicing factors in the spliceosome. A recent study by Kesarwani et al. investigated how SF3B1 mutation is associated with the usage of otherwise inaccessible cryptic 3’ss hidden within the secondary structure of RNA (Kesarwani et al. 2017). SF3B1K700E expression overcomes the constraint that those cryptic 3’ss are sequestered within RNA secondary structures and makes them more accessible (Kesarwani et al. 2017). The K700E mutation may therefore affect splicing through its effects on the structure of the spliceosome. To better understand the mechanism, further analysis of how the structure of the spliceosome is changed in SF3B1K700E cells should be carried out.

In sum, our study reveals that SF3B1K700E affects the structure of SF3B1, decreases the stability of U2 snRNP, and affects the usage of canonical 3’ss, and thus has an impact on splicing. SF3B1K700E also causes upregulation of important hematopoietic transcription factors, which are normally not expressed in wild-type human ES cells.

5 Material and Methods

5.1 Generation of SF3B1 K700E mutation in human ES cells and HEK 293 cells

The point mutation in SF3B1 that causes the K700E amino-acid substitution was introduced into human ES cells by CRISPR/Cas9-induced homology-directed repair.

The CRISPR guide sequence we used was TGGATGAGCAGCAGAAAGTT. The Ultramer sequence used to introduce the point mutation (AAA to GAA) is shown below; the protospacer adjacent motif (PAM) is the CGG immediately 3' of the guide target sequence:

TGTAACTTAGGTAATGTTGGGGCATAGTTAAAACCTGTGTTTGGTTTTGTAGGT

CTTGTGGATGAGCAGCAGGAAGTTCGGACCATCAGTGCTTTGGCCATTGCTGCCTTG

GCTGAAGCAGCAACTCCTTATGGTATCGAATCTTTTGAT

Cas9 and guide RNA were delivered as an in vitro-assembled RNA-protein complex.
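The relationship between the guide and the repair template can be checked with a short script. The following is a minimal sketch, not part of the original workflow: it slides the guide along the Ultramer, finds the closest-matching window, and reports where the two differ and which three bases follow the target site. The function names (`hamming`, `best_window`) are hypothetical helpers, and the codon position within the guide is inferred from the location of the single mismatch.

```python
# Guide and Ultramer sequences as given in the text above.
GUIDE = "TGGATGAGCAGCAGAAAGTT"
ULTRAMER = (
    "TGTAACTTAGGTAATGTTGGGGCATAGTTAAAACCTGTGTTTGGTTTTGTAGGT"
    "CTTGTGGATGAGCAGCAGGAAGTTCGGACCATCAGTGCTTTGGCCATTGCTGCCTTG"
    "GCTGAAGCAGCAACTCCTTATGGTATCGAATCTTTTGAT"
)


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_window(guide, template):
    """Return (start, mismatches) of the template window closest to the guide."""
    n = len(guide)
    starts = range(len(template) - n + 1)
    start = min(starts, key=lambda i: hamming(guide, template[i:i + n]))
    return start, hamming(guide, template[start:start + n])


start, mismatches = best_window(GUIDE, ULTRAMER)
window = ULTRAMER[start:start + len(GUIDE)]

# 0-based positions within the guide where the repair template differs.
diff_positions = [i for i, (g, u) in enumerate(zip(GUIDE, window)) if g != u]

# Three bases immediately 3' of the target site on the same strand.
pam = ULTRAMER[start + len(GUIDE):start + len(GUIDE) + 3]

print(f"closest window in Ultramer: {window}")
print(f"mismatches: {mismatches} at guide position(s) {[p + 1 for p in diff_positions]}")
print(f"guide codon {GUIDE[14:17]} -> Ultramer codon {window[14:17]}")
print(f"bases 3' of target site: {pam}")
```

Under these assumptions the only difference between the guide and the Ultramer is the A-to-G change that converts AAA (Lys) to GAA (Glu), and the target site is followed by an NGG motif.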

5.2 RNA sequencing analysis

RNA sequencing and analysis were performed by Shanye Yin.

5.3 Cell culture

Cells were grown in Petri dishes in DMEM/F12 (1:1) culture medium with 10% Foetal Bovine Serum (FBS) and 1% Penicillin-Streptomycin. The dishes were kept in a 37°C CO2 incubator. The cells were checked microscopically daily, and the culture medium in each dish was changed daily. When approximately 80% of the surface of the dish was covered by a cell monolayer, the cells were split. The medium was removed from the dish using a glass pipette connected to a waste pot, and the dish was carefully washed with sterile PBS. After the PBS was removed, trypsin-EDTA was added until the cells were covered, and the dish was incubated at 37°C for about 5 minutes; the cells were checked every one or two minutes to avoid over-trypsinization. Culture medium was added to inactivate the trypsin, the cell suspension was pipetted equally into new dishes, and more culture medium was added to reach the required volume. Excess cells were cryopreserved for future use.

5.4 Reverse transcription PCR

Total RNA was isolated using TRIzol reagent (Invitrogen). Gene-specific reverse transcription of mRNA was performed using M-MLV reverse transcriptase (Invitrogen). The sense and anti-sense primers for each gene are listed in Table 1. In the first step, 2 pmole of gene-specific anti-sense primer, 1 µg total RNA, 1 µl dNTP mix (10 mM) and sterile water were combined for a 20-µl reverse transcription reaction. After heating at 65°C for 5 minutes and quick chilling on ice, 4 µl 5X first-strand buffer, 2 µl DTT (0.1 M) and 1 µl RNaseOUT recombinant ribonuclease inhibitor were added to the tube. The mixture was incubated at 37°C for 2 minutes before 1 µl M-MLV RT was added. After incubation for 50 minutes at 37°C, the reaction was stopped by heating at 70°C for 15 minutes.

cDNA of selected genes was amplified by PCR in a 10 µl reaction with 1 µl template, 1 µl each of forward primer (10 pmole) and reverse primer (10 pmole), 1 µl 10X PCR Buffer (200 mM Tris-HCl, 500 mM KCl), 0.1 µl Platinum Taq HiFi (Invitrogen), 0.1 µl dNTPs (10 mM) and 5.8 µl sterile water. The PCR program was 94°C for 5 min, followed by 35 cycles of 94°C for 30 s, 58°C for 30 s and 72°C for 30 s, and a final step at 72°C for 10 min. PCR products were mixed with DNA loading dye and then analyzed on a 1% agarose gel containing ethidium bromide (EB).
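As a worked check of the reaction setup, the sketch below sums the per-reaction volumes listed above, scales them to a master mix, and adds up the programmed hold times. It is illustrative only: the 10% overage, the choice to leave the template out of the master mix, and the helper names `master_mix` and `program_minutes` are assumptions rather than details from this thesis.

```python
# Per-reaction volumes (µl) for the 10 µl PCR described above.
PER_REACTION_UL = {
    "template cDNA": 1.0,
    "forward primer": 1.0,
    "reverse primer": 1.0,
    "10X PCR buffer": 1.0,
    "Platinum Taq HiFi": 0.1,
    "dNTPs (10 mM)": 0.1,
    "sterile water": 5.8,
}


def master_mix(per_reaction, n_reactions, overage=0.10, exclude=("template cDNA",)):
    """Scale per-reaction volumes to n reactions plus a pipetting overage.

    Components named in `exclude` are added to each tube individually.
    """
    factor = n_reactions * (1.0 + overage)
    return {
        name: round(volume * factor, 2)
        for name, volume in per_reaction.items()
        if name not in exclude
    }


def program_minutes(initial_s, cycles, step_seconds, final_s):
    """Total programmed hold time in minutes (ramping time is not included)."""
    return (initial_s + cycles * sum(step_seconds) + final_s) / 60


total_ul = sum(PER_REACTION_UL.values())
mix = master_mix(PER_REACTION_UL, n_reactions=8)
run_min = program_minutes(5 * 60, 35, (30, 30, 30), 10 * 60)

print(f"per-reaction total: {total_ul:.1f} µl")
for name, volume in mix.items():
    print(f"{name}: {volume} µl")
print(f"programmed hold time: {run_min} min")
```

The listed components add up to the stated 10 µl, and the program holds for 67.5 minutes in total before ramping time is considered.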

Table 1. Primers used for reverse transcription PCR and real-time PCR

Gene    Forward                       Reverse
GAPDH   5’-TGCACCACCAACTGCTTAGC-3’    5’-GGCATGGACTGTGGTCATGAG-3’
PAX5    5’-AGCAGAATGTCATCCGAGGTA-3’   5’-AGGCTTGCCCAGAATCCAC-3’
TAL1    5’-GCCGGATGCCTTCCCTATGTT-3’   5’-GGCGGAGGATCTCATTCTTGC-3’
EOMES   5’-AAGGCATGGGAGGGTATTAT-3’    5’-AAACACCACCAAGTCCATCT-3’
EBF2    5’-GAAGCAAAGGACGGGTTATCT-3’   5’-ATCGTCTCCTCCAAAGCAATC-3’
MECOM   5’-TTTTGTGAGGGCAAGAACCA-3’    5’-GGAAGGAAACAGACCAGGGA-3’
HHEX    5’-CTCCAACGACCAGACCATCG-3’    5’-CCTGTCTCTCGCTGAGCTGC-3’

5.5 Real-time PCR

Real-time PCR was performed as described (Wang & Seed 2003). cDNA was prepared by reverse transcription as described in Section 5.4. The PCR program, set on the QuantStudio 7 Flex real-time PCR machine, was 50°C for 2 min and 95°C for 10 min, followed by 40 cycles of 95°C for 15 s, 60°C for 30 s and 72°C for 30 s, and a final step at 72°C for 10 min. Each 25 µl reaction mixture was prepared in an optical tube with 12.5 µl SYBR green mix (2x), 0.2 µl cDNA, 1 µl primer pair mix (5 pmole/µl each primer) and 11.3 µl sterile water. The tubes were placed in the machine, and the dissociation curve was analyzed with QuantStudio Software.
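The text above does not state how relative expression was calculated from the real-time PCR data. One common approach is the comparative Ct (2^-ΔΔCt) method, sketched below under explicit assumptions: GAPDH (listed in Table 1) serves as the reference gene, amplification efficiency is close to 100%, and all Ct values shown are invented for illustration and are not measurements from this study.

```python
def fold_change_ddct(ct_target_mut, ct_ref_mut, ct_target_wt, ct_ref_wt):
    """Relative expression (mutant vs wild type) by the comparative Ct method.

    Assumes roughly 100% amplification efficiency for both primer pairs.
    """
    delta_ct_mut = ct_target_mut - ct_ref_mut   # normalize to reference gene
    delta_ct_wt = ct_target_wt - ct_ref_wt
    delta_delta_ct = delta_ct_mut - delta_ct_wt  # mutant relative to wild type
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values, for illustration only.
example_fold = fold_change_ddct(
    ct_target_mut=24.0,  # target gene, mutant cells
    ct_ref_mut=18.0,     # reference gene (assumed GAPDH), mutant cells
    ct_target_wt=29.0,   # target gene, wild-type cells
    ct_ref_wt=18.0,      # reference gene (assumed GAPDH), wild-type cells
)
print(f"fold change (mutant / wild type): {example_fold}")
```

With these made-up inputs the target crosses threshold five cycles earlier in the mutant after normalization, which corresponds to a 32-fold increase.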

5.6 Western blot analysis

Protein was isolated with TRIzol reagent (Invitrogen). 20 µl of protein gel loading dye (PGB) was added to 10 µl of each protein sample. After the samples were boiled for 10 minutes and spun, they were analyzed on Bolt 4-12% Bis-Tris Plus gels (Invitrogen). Proteins were transferred to nitrocellulose membranes at 200 mA for 1 h in transfer buffer (500 mM Glycine, 50 mM Tris-HCl, 0.01% SDS, 20% methanol). After washing with TBS-T (10 mM Tris-HCl, 100 mM NaCl, 0.1% Tween, pH 7.4), the nitrocellulose membranes were blocked in blocking buffer (Odyssey) for 1 h. Membranes were exposed to primary antibody in PBS (Odyssey) with 0.1% Tween for 1 h at room temperature or overnight at 4°C. Membranes were washed with TBS-T three times for five minutes each and then incubated with the appropriate secondary antibodies for 1 h at room temperature. Information on the primary and secondary antibodies can be found in Table 2 and Table 3. After the membranes were washed three times for five minutes each, protein levels were analyzed by visualizing signals with the Odyssey infrared imaging system.

Table 2. Primary antibodies used for Western blotting

Target    Species   Source
SF3A      Rabbit    Customer
SF3B1     Rabbit    Customer
SFPQ      Rabbit    Customer
B’’       Mouse     Customer
U1C       Rat       Sigma
ARS2      Rabbit    Customer
Tubulin   Mouse     Abcam

Table 3. Secondary antibodies used for Western blotting

Target   Wavelength (nm)   Species   Source
Mouse    780               Goat      LI-COR
Rabbit   680               Goat      LI-COR
Rat      780               Goat      LI-COR

5.7 Nuclear extract

Small-scale nuclear extract experiments were conducted as described (Folco et al. 2012). The buffers used are listed in Table 4; they should be freshly made before use. We harvested three 150 mm plates of ES cells into Eppendorf tubes when the cells were 80% confluent. The cells were washed with 13 ml 1x PBS and spun in a 4°C microfuge for 5 min at 1000 rpm. The Packed Cell Volume (PCV) was estimated after the PBS was aspirated; three plates of cells should yield about 500 µl of PCV. Hypotonic buffer was added to a total volume of 5x PCV, and the cells were resuspended by gently shaking the tube. The cells were spun again, the supernatant was aspirated, and the cells were resuspended in Hypotonic buffer to a total of 3x PCV. The resuspended cells were left on ice for 10 minutes and then dounced 5 times in a mini-dounce (VWR, 1 ml glass mini-dounce, tight pestle). After douncing, the cells were transferred to pre-chilled Eppendorf tubes and spun in a 4°C microfuge for 5 minutes at 4000 rpm. The Packed Nuclear Volume (PNV) should be about 400 µl. 1/2x PNV of Low Salt buffer was slowly added to the tube with gentle mixing using a pipette tip. 1/2x PNV of High Salt buffer was then added in the cold room, and the sample was quickly mixed by inversion. The tube was left rotating in the cold room for 30 minutes before being spun in a 4°C centrifuge for 15 minutes at 14000 rpm. The supernatant (the high-salt NE) was transferred to Eppendorf tubes and could be stored at -80°C. 200 µl of the high-salt NE was transferred into pre-chilled mini-centricons (Amicon) and spun in a 4°C centrifuge for 1 hour at 14000 rpm. The NE was dialyzed to low salt in Dialysis buffer for 1-2 hours and then either used directly without spinning or stored at -80°C.
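Because every buffer addition in this protocol is expressed as a multiple of the PCV or PNV, the working volumes can be tabulated once the pellets have been estimated. The following is a minimal sketch; the function name `nuclear_extract_volumes` is hypothetical, the first hypotonic wash is read as a total of 5x PCV, and "a total of N x PCV" is interpreted as pellet plus buffer.

```python
def nuclear_extract_volumes(pcv_ul, pnv_ul):
    """Buffer volumes (µl) derived from packed cell and packed nuclear volumes.

    Assumes 'total of N x PCV' means pellet volume plus added buffer.
    """
    return {
        "hypotonic wash, total": 5 * pcv_ul,
        "hypotonic wash, buffer to add": 5 * pcv_ul - pcv_ul,
        "hypotonic swelling, total": 3 * pcv_ul,
        "hypotonic swelling, buffer to add": 3 * pcv_ul - pcv_ul,
        "Low Salt buffer": 0.5 * pnv_ul,
        "High Salt buffer": 0.5 * pnv_ul,
    }


# Typical values quoted in the text: about 500 µl PCV and 400 µl PNV.
volumes = nuclear_extract_volumes(pcv_ul=500, pnv_ul=400)
for step, volume in volumes.items():
    print(f"{step}: {volume:g} µl")
```

For the typical pellet sizes quoted above this gives 2000 µl and 1000 µl of Hypotonic buffer for the two resuspensions and 200 µl each of Low Salt and High Salt buffer.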

Table 4. Solutions for ES Nuclear extract experiment

Buffer      Final                 Stock               Dispense

Hypotonic   10 mM HEPES, pH 7.9   1 M HEPES, pH 7.9   5 ml
            1.5 mM MgCl2          1 M MgCl2           750 µl
            10 mM KCl             3 M KCl             1.67 ml
            0.2 mM PMSF           200 mM PMSF         500 µl
            0.5 mM DTT            2 M DTT             125 µl
            Total volume                              500 ml

Low Salt    20 mM HEPES, pH 7.9   1 M HEPES, pH 7.9   2 ml
            1.5 mM MgCl2          1 M MgCl2           150 µl
            20 mM KCl             3 M KCl             667 µl
            0.2 mM EDTA           0.5 M EDTA          40 µl
            25% Glycerol          Glycerol            25 ml
            0.2 mM PMSF           200 mM PMSF         100 µl
            0.5 mM DTT            2 M DTT             25 µl
            Total volume                              100 ml

High Salt   20 mM HEPES, pH 7.9   1 M HEPES, pH 7.9   2 ml
            1.5 mM MgCl2          1 M MgCl2           150 µl
            1.4 M KCl             3 M KCl             48 ml
            0.2 mM EDTA           0.5 M EDTA          40 µl
            25% Glycerol          Glycerol            25 ml
            0.2 mM PMSF           200 mM PMSF         100 µl
            0.5 mM DTT            2 M DTT             25 µl
            Total volume                              100 ml

Dialysis    20 mM HEPES, pH 7.9   1 M HEPES, pH 7.9   10 ml
            100 mM KCl            3 M KCl             16.625 ml
            0.2 mM EDTA           0.5 M EDTA          200 µl
            20% Glycerol          Glycerol            100 ml
            0.2 mM PMSF           200 mM PMSF         500 µl
            0.5 mM DTT            2 M DTT             125 µl
            Total volume                              500 ml
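The dispense volumes in Table 4 follow from the dilution relation C1·V1 = C2·V2. As an illustration, the sketch below recomputes the Hypotonic buffer entries for a 500 ml batch; the helper name `stock_volume_ml` and the data layout are assumptions made for this example.

```python
def stock_volume_ml(final_mM, stock_mM, total_ml):
    """Volume of stock (ml) needed so that C_stock * V_stock = C_final * V_total."""
    return final_mM / stock_mM * total_ml


# Hypotonic buffer: (component, final concentration in mM, stock concentration in mM)
HYPOTONIC = [
    ("HEPES, pH 7.9", 10.0, 1000.0),
    ("MgCl2", 1.5, 1000.0),
    ("KCl", 10.0, 3000.0),
    ("PMSF", 0.2, 200.0),
    ("DTT", 0.5, 2000.0),
]

TOTAL_ML = 500.0
hypotonic_ml = {
    name: stock_volume_ml(final, stock, TOTAL_ML)
    for name, final, stock in HYPOTONIC
}

for name, volume in hypotonic_ml.items():
    print(f"{name}: {volume:.3f} ml")
```

For example, 10 mM HEPES from a 1 M stock in 500 ml requires 5 ml, and 1.5 mM MgCl2 from a 1 M stock requires 0.75 ml (750 µl).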

5.8 17S/12S U2 snRNP analysis

This experiment was performed as described (Folco et al. 2011). The 2’-O-methyl oligonucleotide complementary to the branch point-binding region (BBR) (5′-mCmAmGmAmUmAmCmUmAmAmCmAmCmUmUmGmA-3′) was 32P-labeled using [γ-32P]ATP (6000 Ci/mmol) and T4 polynucleotide kinase. Nuclear extract was depleted at room temperature for 20 min. 18 µl of nuclear extract was incubated with or without 1 µl ATP (12.5 mM), 1 µl MgCl2 (80 mM) and 1 µl CrPh (0.5 M) at 30°C for 5 min. After the incubation, 2 µl of labeled 2’-O-methyl oligo (BBR oligo, 2.5 µM) was added to the reaction, which was then incubated for another 5 min at 30°C. Samples were purified using G-50 columns (GE Healthcare Life Sciences). 10 µl of purified sample was mixed with 5.5 µl of 6x DNA Blue, and 4 µl of the mixture was loaded on a 1.2% low-melting agarose gel. The gel was dried and then analyzed with a PhosphorImager cassette on a Personal Molecular Imager™ System (Bio-Rad).

For 17S/12S U2 snRNP assembly experiments using Heparin, 0.1-2µl of Heparin (10µM) was used. For experiments without Heparin, we used 1µl of sterile, distilled water.
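The final concentrations in the assembly reaction can be estimated from the stock concentrations and volumes given above. This is a rough sketch for the condition that includes ATP, MgCl2 and CrPh; it assumes a final volume of 23 µl after the oligo is added, ignores any heparin or water addition, and does not account for components already present in the nuclear extract. The names `final_conc` and `FINAL` are hypothetical.

```python
def final_conc(stock_conc, added_ul, total_ul):
    """Concentration after dilution, in the same unit as stock_conc."""
    return stock_conc * added_ul / total_ul


# Volumes (µl) from the text: 18 NE + 1 ATP + 1 MgCl2 + 1 CrPh + 2 BBR oligo.
TOTAL_UL = 18 + 1 + 1 + 1 + 2

FINAL = {
    "ATP (mM)": final_conc(12.5, 1, TOTAL_UL),
    "MgCl2 (mM)": final_conc(80.0, 1, TOTAL_UL),
    "CrPh (mM)": final_conc(500.0, 1, TOTAL_UL),
    "BBR oligo (µM)": final_conc(2.5, 2, TOTAL_UL),
}

for name, value in FINAL.items():
    print(f"{name}: {value:.3f}")
```

These values describe only what is added to the extract, so they should be read as lower bounds for components that the nuclear extract itself may contribute.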

6 Supplementary figures

Figure 6.1: RNA sequencing result. The figure shows the number of reads for each colony. Blue and red indicate in-frame and out-of-frame insertions and deletions, respectively. Green indicates the targeted K700E mutation.

Figure 6.2: Most of the upregulated transcription factors are associated with blood cancers, such as acute lymphocytic leukemia.

7 References

1. Alsafadi, S., Houy, A., Battistella, A., Popova, T., Wassef, M., Henry, E., Tirode, F., Constantinou, A., Piperno-Neumann, S., Roman-Roman, S., Dutertre, M., & Stern, M. (2016). Cancer-associated SF3B1 mutations affect alternative splicing by promoting alternative branchpoint usage. European Journal of Cancer, 61, S94-S95. doi:10.1016/S0959-8049(16)61332-1.
2. Behrens, S. E., Tyc, K., Kastner, B., Reichelt, J., & Lührmann, R. (1993). Small nuclear ribonucleoprotein (RNP) U2 contains numerous additional proteins and has a bipartite RNP structure under splicing conditions. Molecular and Cellular Biology, 13(1), 307–319.
3. Bose, P., & Gandhi, V. (2017). Recent therapeutic advances in chronic lymphocytic leukemia. F1000Research, 6, 1924. doi:10.12688/f1000research.11618.1.
4. Datta, S. K. (2009). Anti-CD20 antibody is an efficient therapeutic tool for the selective removal of autoreactive T cells. Clinical Practice. Rheumatology, 5(2), 80–82. doi:10.1038/ncprheum0983.
5. Dolatshad, H., Pellagatti, A., Fernandez-Mercado, M., Yip, B. H., Malcovati, L., Attwood, M., … Boultwood, J. (2015). Disruption of SF3B1 results in deregulated expression and splicing of key genes and pathways in myelodysplastic syndrome hematopoietic stem and progenitor cells. Leukemia, 29(5), 1092–1103. doi:10.1038/leu.2014.331.
6. Folco, E. G., Coil, K. E., & Reed, R. (2011). The anti-tumor drug E7107 reveals an essential role for SF3b in remodeling U2 snRNP to expose the branch point-binding region. Genes & Development, 25(5), 440–444. doi:10.1101/gad.2009411.
7. Folco, E. G., Lei, H., Hsu, J. L., & Reed, R. (2012). Small-scale Nuclear Extracts for Functional Assays of Gene-expression Machineries. J. Vis. Exp., (64). doi:10.3791/4140.
8. Gaidano, G., Foa, R., & Dalla-Favera, R. (2012). Molecular pathogenesis of chronic lymphocytic leukemia. Journal of Clinical Investigation, 122(10), 3432+. doi:10.1172/JCI64101.
9. Gianfelici, V. (2012). Activation of the NOTCH1 pathway in chronic lymphocytic leukemia. Haematologica, 97(3), 328–330. doi:10.3324/haematol.2012.061721.
10. Gozani, O., Potashkin, J., & Reed, R. (1998). A Potential Role for U2AF-SAP 155 Interactions in Recruiting U2 snRNP to the Branch Site. Molecular and Cellular Biology, 18(8), 4752–4760.
11. Hallek, M. (2017). Chronic lymphocytic leukemia: 2017 update on diagnosis, risk stratification, and treatment. Am J Hematol, 92, 946–965. doi:10.1002/ajh.24826.
12. Jin, S., Su, H., Tran, N.-T., Song, J., Lu, S. S., Li, Y., et al. (2017). Splicing factor SF3B1K700E mutant dysregulates erythroid differentiation via aberrant alternative splicing of transcription factor TAL1. PLoS ONE, 12(5). doi:10.1371/journal.pone.0175523.
13. Kesarwani, A. K., Ramirez, O., Gupta, A. K., Yang, X., Murthy, T., Minella, A. C., & Pillai, M. M. (2017). Cancer associated SF3B1 mutants recognize otherwise inaccessible cryptic 3’ splice sites within RNA secondary structures. Oncogene, 36(8), 1123–1133. doi:10.1038/onc.2016.279.
14. Kipps, T. J., Stevenson, F. K., Wu, C. J., Croce, C. M., Packham, G., Wierda, W. G., Rai, K. (2017). Chronic lymphocytic leukaemia. Nature Reviews. Disease Primers, 3, 16096. doi:10.1038/nrdp.2016.96.
15. Pierre O'Brien, Pier Morin Jr, Rodney J. Ouellette and Gilles A. Robichaud. (2011). The Pax-5 Gene: A Pluripotent Regulator of B-cell Differentiation and Cancer Disease. Cancer Res, 71(24), 7345-7350. doi:10.1158/0008-5472.
16. Quesada, V., Conde, L., Villamor, N., et al. (2011). Exome sequencing identifies recurrent mutations of the splicing factor SF3B1 gene in chronic lymphocytic leukemia. Nat Genet., 44(1), 47–52. doi:10.1038/ng.1032.
17. Reed, R. (1996). Initial splice-site recognition and pairing during pre-mRNA splicing. Current Opinion in Genetics & Development, 6(2), 215-220. doi:10.1016/S0959-437X(96)80053-0.
18. Ricci, F., Tedeschi, A., Morra, E., & Montillo, M. (2009). Fludarabine in the treatment of chronic lymphocytic leukemia: a review. Therapeutics and Clinical Risk Management, 5, 187–207. doi:10.2147/TCRM.S3688.
19. Rossi, D., Bruscaggin, A., Spina, V., et al. (2011). Mutations of the SF3B1 splicing factor in chronic lymphocytic leukemia: association with progression and fludarabine-refractoriness. Blood, 118(26), 6904-6908. doi:10.1182/blood-2011-08-373159.
20. Wahl, M., L. Will, C. & Lührmann, R. (2009). The Spliceosome: Design Principles of a Dynamic RNP Machine. Cell, 136(4), 701-18. doi:10.1016/j.cell.2009.02.009.
21. Wan, Y., & Wu, C. J. (2013). SF3B1 mutations in chronic lymphocytic leukemia. Blood, 121(23), 4627–4634. doi:10.1182/blood-2013-02-427641.
22. Wang, L., Brooks, A. N., Fan, J., Wan, Y., Gambe, R., Li, S., Hergert, S., Yin, S., Freeman, S. S., Levin, J. Z., Fan, L., Seiler, M., Buonamici, S., Smith, P. G., Chau, K. F., Cibulskis, C. L., Zhang, W., Rassenti, L. Z., Ghia, E., Wu, C. J., et al. (2016). Transcriptomic Characterization of SF3B1 Mutation Reveals Its Pleiotropic Effects in Chronic Lymphocytic Leukemia. Cancer Cell, 30. doi:10.1016/j.ccell.2016.10.005.
23. Wang, X., & Seed, B. (2003). A PCR primer bank for quantitative gene expression analysis. Nucleic Acids Research, 31(24), e154. doi:10.1093/nar/gng154.
24. Yuan, X., Wu, H., Xu, H., Xiong, H., Chu, Q., Yu, S., Wu, G. S., & Wu, K. (2015). Notch signaling: An emerging therapeutic target for cancer treatment. Cancer Letters, 369. doi:10.1016/j.canlet.2015.07.048.