Functional Characterisation of Mutations in the SRRM2 Gene Associated with Risk of Non-Syndromic Familial Non-Medullary Thyroid Cancer

This thesis is presented for the degree of Bachelor of Science Honours School of Veterinary and Life Sciences Murdoch University 2017

Nathan Michael Main BSc Molecular Biology and Biomedical Science BForensics Forensic Biology and Toxicology

Supervisors: A/Professor Scott Wilson, Dr. Bryan Ward, Clin/Prof John Walsh and A/Professor Wayne Greene


Declaration

I declare this thesis is my own account of my research and contains as its main content, work which has not been previously submitted for a degree at any tertiary institution.

Nathan Main

23/10/2017


Abstract

Non-syndromic familial non-medullary thyroid cancer (FNMTC) is the most common malignancy of the endocrine system and is defined by the presence of non-medullary thyroid cancer (NMTC) in 2 or more first-degree relatives in the absence of predisposing environmental factors and a recognized thyroid cancer syndrome. There is a clear genetic basis to FNMTC; however, details of the genes involved are limited. A number of susceptibility genes have been proposed, but these remain controversial.

SRRM2 is a splicing factor that promotes exonic enhancer-dependent splicing by mediating critical interactions between the spliceosomal U2 snRNP and additional splicing factors bound directly to pre-mRNA. SRRM2 is also associated with cell cycle control, where it has been shown to associate with the histone H4 subtype transcriptional regulator HiNF-P, and it is hypothesized that the SRRM2/HiNF-P complex may regulate histone production via a p220NPAT-dependent pathway.

SRRM2 was previously identified as a putative susceptibility locus for FNMTC, where a missense variant (S346F) was found to co-segregate with FNMTC in a well-documented FNMTC family. RNA-seq of leukocytes revealed altered splicing patterns for 1,642 exons in the FNMTC patients. The altered splicing pattern for 7 exons was experimentally verified using semi-quantitative PCR. In addition to the S346F mutation, 6 SRRM2 variants have been identified in the Western Australian FNMTC cohort: one variant co-segregated with FNMTC and was predicted by in silico methods to be pathogenic (R1805W), and the others are of unknown significance.


The first aim of this study was to generate model cell systems of SRRM2 mutations previously associated with FNMTC predisposition. The second aim was to investigate whether these mutant cell systems show differences in the splicing patterns of specific pre-mRNAs through quantitative real-time PCR. The final aim was to employ flow cytometry techniques to assess the proliferative characteristics of the mutant cell systems and determine whether these mutations prevent proper regulation of the cell cycle.

The original project plan was to generate the model cell systems by performing a transient over-expression analysis in HEK293 and Nthy-ori 3-1 cell lines transfected with vector constructs containing the wild type or mutant SRRM2 sequences. However, during the construction of the mutant plasmids, the transient over-expression analysis was abandoned due to challenges associated with the size of the SRRM2 ORF. The CRISPR/Cas9 system was used instead to generate model cell systems of SRRM2 mutations in the HAP1 cell line. CRISPR/Cas9 was successfully implemented to generate a mutant HAP1 cell line harboring the R1805W mutation; however, the cell line was heterozygous, carrying both the mutant and the wild type sequence.

RT-qPCR was performed to characterise the splicing pattern of the wild type and mutant cell lines to determine whether there was a difference in splicing for 7 exons previously shown to be differentially spliced in FNMTC patients heterozygous for the S346F mutation. No significant differences were identified between the wild type and mutant cell lines, suggesting that the R1805W mutation does not alter the splicing pattern of the 7 exons investigated.


Cell cycle analysis was performed to quantify the percentages of cells in the G0/G1 and G2/M phases of the cell cycle using propidium iodide staining and quantification of G0/G1 and G2/M peak fluorescence intensity. The experiments were completed successfully in this regard; however, no significant differences were observed between the wild type and mutant cell lines. The proliferation kinetics of the two cell lines were characterised with a proliferation assay using carboxyfluorescein succinimidyl ester staining and the dye dilution method; however, no significant differences were observed in the proliferation metrics quantified.

Whilst preliminary, these findings suggest that the R1805W mutation does not predispose to FNMTC through the altered splicing of the 7 exons investigated, or by altering proper regulation of the cell cycle. Further investigations are required to properly interrogate the association of this mutation with FNMTC, as well as the other mutation


Table of Contents

Declaration
Abstract
Acknowledgments
Chapter 1 – Literature Review
1.1.1 Structure and Function
1.1.2 Follicular Cells
1.1.2.1 Control of Thyroid Function by the Hypothalamus-Pituitary-Thyroid Axis
1.1.3 Parafollicular C Cells
1.2.1 Thyroid Cancer
1.2.1.1 PTC
1.2.1.2 FTC
1.2.1.3 ATC and PDTC
1.2.2 Detection, Diagnosis and Treatment
1.2.3 Disease Risk Associations
1.3.1 Sporadic NMTC
1.3.2 Familial non-medullary thyroid cancer
1.4.1 Clinicopathological Features and Prognosis of FNMTC
1.4.2 FNMTC Screening
1.4.3 FNMTC Genetics
1.4.3.1 SRGAP1 (SLIT-ROBO Rho GTPase Activating 1)
1.4.3.2 NKX2.1 (NK2 Homeobox 1)
1.4.3.3 FOXE1 (Forkhead Box E1)
1.4.3.4 HABP2 (Hyaluronan Binding Protein)
1.4.3.5 Susceptibility loci
1.4.3.7 SRRM2 (Serine/Arginine-rich repetitive matrix 2)
1.5.1 Alternative Splicing
1.5.2 SR
1.5.3 Nuclear Speckle Domains
1.5.4 SRRM2 (Serine/Arginine Repetitive Matrix Protein 2)
1.5.4.1 The SRRM1/SRRM2 Splicing co-activator
1.5.4.2 Exon-enhancer activity of the SRRM1/SRRM2 splicing co-activator
1.5.4.3 Cell Cycle Control
Chapter 2 – Materials and Methods
2.1.2 Instruments
2.1.3 PCR and DNA Reagents
2.1.4 DNA Modification Enzymes
2.1.5 Plasmid Constructs
2.1.6 Cell Lines
2.1.7 Bacterial Strains
2.1.8 Tissue Culture
2.1.9 Antibodies
2.1.10 Polyacrylamide Gel Electrophoresis
2.1.11 Commercial Kits
2.1.12 General Reagents
2.2.1 Bacterial Culture Techniques
2.2.1.1 Preparation of E. coli XL2 Blue Competent Cells
2.2.1.2 Preparation of E. coli XL2 Blue Competent Cells by the Hanahan Method
2.2.1.3 XL2 Blue Competent Cell Transformation
2.2.1.4 XL10-Gold Ultracompetent Competent Cells Transformation
2.2.1.5 Competent Cell Inoculation
2.2.2 Plasmid Extractions
2.2.2.1 Extraction of Plasmid DNA using the Wizard SV Plus Miniprep Kit
2.2.2.2 Extraction of Plasmid DNA using the Pure Yield Plasmid Miniprep Kit
2.2.3 Tissue Culture Techniques
2.2.3.1 Maintenance of HAP1 Cells
2.2.3.2 Passage of human cells
2.2.3.3 Cell Counting
2.2.4 DNA Techniques
2.2.4.1 DNA Quantitation
2.2.4.2 Polymerase Chain Reactions
2.2.4.3 Agarose Gel Electrophoresis
2.2.4.4 Extraction of PCR Products from Agarose Gels
2.2.4.5 Restriction Enzyme Digests
2.2.4.6 DNA Ligations
2.2.4.7 DNA Sequencing
2.2.4.8 Genomic DNA Extractions
2.2.5 RNA Techniques
2.2.5.1 RNA Extraction
2.2.5.2 Agarose Gel Electrophoresis for Evaluation of RNA Integrity
2.2.6 Protein Techniques
2.2.6.1 Protein Extraction from HAP1 Cells
2.2.6.2 Protein quantitation using the BCA™ assay
2.2.6.3 Polyacrylamide Gel Electrophoresis
2.2.6.4 Electroblotting
2.2.6.5 Immunodetection of Proteins
Chapter 3 - Construction of Mammalian Expression Vectors Containing the Wild type and Mutant Human SRRM2 ORF for Generation of Model Cell Systems of SRRM2 Mutations
3.2.1 Construction of Mammalian Expression Vectors Containing the Wild Type Human SRRM2 ORF
3.2.1.1 SRRM2 cDNA preparation
3.2.1.2 pcDNA3-EGFP and pcDNA3.1 Vector Preparation
3.2.1.3 Ligation of SRRM2 ORF into pcDNA3-EGFP and pcDNA3.1 vectors
3.2.1.4 Colony PCR for the Identification of Colonies Containing the pcDNA3-EGFP-SRRM2 and pcDNA3.1-SRRM2 Vectors
3.2.1.5 Sequencing of the SRRM2 ORF in the pcDNA3.1-SRRM2 vector
3.2.2 Attempted Construction of Mammalian Expression Vectors Containing the Mutant R1805W SRRM2 cDNA
3.2.2.1 Sub-cloning a 3.1kb SRRM2 fragment into the pDrive Vector
3.2.2.2 Site-Directed Mutagenesis to Generate the R1805W SRRM2 cDNA
3.3.1 Construction of Mammalian Expression Vectors Containing the Wild Type Human SRRM2 ORF
3.3.2 Attempted Construction of Mammalian Expression Vectors Containing the R1805W SRRM2 ORF
3.4.1 Construction of Mammalian Expression Vectors Containing the Wild Type Human SRRM2 ORF
3.4.2 Attempted Construction of Mammalian Expression Vectors Containing the R1805W SRRM2 ORF
3.4.3 Conclusion
Chapter 4 – Establishing the Protocol for CRISPR/Cas9 Genome Engineering in the HAP1 Cell Line and Generation of Mutant HAP1 Cell Lines with SRRM2 Mutations Associated with FNMTC
4.2.1 Construction of sgRNA-expressing pSpCas9(BB)-2A-GFP vectors and ssODN template design for CRISPR/Cas9 genome engineering
4.2.1.1 Guide-RNA Selection and Oligonucleotide Design
4.2.1.2 ssODN Template Design
4.2.1.3 Phosphorylation and Annealing of oligonucleotides
4.2.1.4 Cloning of Annealed and Phosphorylated Oligonucleotides Into pSpCas9 Vectors
4.2.2 Transfection of pSpCas9 Vectors into HAP1 Cells and Enrichment of Transfected Cells with FACS
4.2.2.1 Co-transfection of pSpCas9 Vectors and ssODNs into HAP1 cells
4.2.2.2 Fluorescent-activated cell sorting
4.2.3 Validation of genome engineering in the polyclonal HAP1 populations
4.2.3.1 PCR amplification of gDNA extracted from HAP1 cells
4.2.3.2 Verification of genome engineering
4.2.4 Validation of Successful Genome Engineering in Monoclonal HAP1 Populations
4.2.4.1 Single-cell dilution of HAP1 cells transfected with a pSpCas9 vector containing a R1805W gRNA sequence
4.2.5 Western blot analysis of SRRM2 protein expression in the wild type and mutant HAP1 cell lines
4.3.1 Construction of sgRNA-expressing pSpCas9 vectors
4.3.2 Fluorescent-activated cell sorting
4.3.3 Generation of a R1805W Mutant HAP1 cell line
4.3.3.1 Verification of Successful Genome Engineering in the Polyclonal HAP1 Populations
4.3.3.2 Validation of Genome Engineering in Monoclonal Populations
4.3.4 Validation of genome engineering in the S346F polyclonal population
4.3.5 Western Blot for SRRM2 Expression in Wild Type and Mutant Cell Lines
4.3.5.1 Quantitation of total protein in wild type and mutant whole lysates
4.3.5.2 Evaluation of SRRM2 protein expression in wild type and mutant HAP1 cell lines
4.4.1 CRISPR/Cas9 Genome Engineering Overview
4.4.2 Identification of Diploid HAP1 Cell Lines
4.4.3 The Presence of Frameshift Mutations in HAP1 Cell Lines following CRISPR/Cas9 Genome Engineering
4.4.4 Failure to Generate a S346F Mutant Cell Line
4.4.5 Conclusion
Chapter 5 – Quantitative Real-Time PCR for the Characterisation of Alternative Splicing Events in the Wild Type and Mutant R1805W HAP1 Cell Lines
5.2.1 Reverse-Transcription of RNA into cDNA
5.2.1.1 Genomic DNA Elimination
5.2.1.2 Reverse Transcription of cDNA
5.2.2 Primer Design for Amplification of Alternatively-Spliced Transcripts
5.2.3 Quantitative PCR Analysis of exon-included and exon-skipped transcripts
5.2.3.1 Preparation of primer mixtures
5.2.3.2 Preparation of PCR mastermix
5.2.3.3 Preparation of 96-well plates for qPCR
5.2.3.4 qPCR reaction and melt curve analysis
5.2.3.5 qPCR Data Analysis and Statistical Analysis
5.3.1 RNA Extraction
5.3.2 Agarose Bleach Gel Electrophoresis for Evaluation of RNA integrity
5.3.3 qPCR analysis of alternative splicing events on the QuantStudio 6
Chapter 6 - Flow Cytometric Characterisation of the Wild type and R1805W Mutant HAP1 Cell Line’s Growth Kinetics
6.2.1 Cell Cycle Analysis
6.2.1.1 Preparation of HAP1 Cells for Cell Cycle Analysis
6.2.1.2 Cell cycle analysis of Wild type and Mutant HAP1 cell lines
6.2.1.3 Cell Cycle Analysis Statistics
6.2.2 Cell Sorting of Large and small Populations from Wild type and Mutant HAP1 Cell Lines
6.2.2.1 Cell Preparation
6.2.2.2 Small and Large Cell Visual Characterisation
6.2.3 Cell Proliferation Assay
6.2.3.1 Cell Preparation for the Cell Proliferation Assay
6.2.3.2 Cell Proliferation Assay of Wild Type and Mutant Cell Lines
6.2.3.3 Cell Proliferation Statistical Analysis
6.3.1 Preliminary Cell Cycle Analysis Experiment
6.3.1.1 Gating strategy for cell cycle analysis
6.3.1.2 Identification of two distinct populations within the wild type and mutant HAP1 cell lines
6.3.2 Characterisation of large and small populations within HAP1 cell cultures
6.3.2.1 Gating strategy for separating large and small HAP1 populations
6.3.2.2 Visual Inspection of Small and Large HAP1 Populations
6.3.3 Analysis of wild type and mutant cell cycle using flow cytometry
6.3.3.1 Gating strategy for cell cycle analysis of wild type and mutant singlets
6.3.3.2 Cell Cycle Analysis Quantitation
6.3.3.3 Statistical analysis of percentages of cells in G0/G1 and G2/M phases of the cell cycle
6.3.4 Analysis of Wild type and Mutant Proliferation Using Flow Cytometry
6.3.4.1 Gating strategy for cell proliferation assay
6.3.4.1.1 Selection of Small Singlet Population for Cell Proliferation Analysis
6.3.4.1.2 Gating Strategy to Compensate for PI and CFSE Spectral Overlap
6.3.4.2 Mathematical Modelling of Wild type and Mutant HAP1 Proliferation
6.3.4.3 Statistical Analysis of CFSE Distributions
6.4.1 Preliminary Cell Cycle Analysis and the Identification of two Distinct Populations Within the Wild type and Mutant HAP1 cell lines
6.4.2 Analysis of Wild type and Mutant Cell Cycle Using Flow Cytometry
6.4.3 Analysis of Wild type and Mutant Proliferation Using Flow Cytometry
6.4.3 Future Experiments
6.4.4 Conclusion
Chapter 7 - General Discussion and Conclusion
References
Appendix I: Buffers and Solutions
Appendix II: Primer Sequences

Acknowledgments

A/Prof Scott Wilson, thank you for allowing me to undertake this project and having faith that I would have (some) success with establishing a challenging technology like CRISPR/Cas9. Thank you for all the wisdom and support you have provided this year, especially when times were tough.

Dr. Bryan Ward, thank you for the seemingly endless knowledge you have provided regarding the intricacies of molecular biology, for the words of support during the truly traumatic SRRM2 pcDNA ligations (You only need 1!), and for all the lessons you have taught me that I will never forget.

Clin/Prof John Walsh, thank you for allowing me to undertake a project on thyroid cancer genetics at the Department and for the exposure this project provided into the clinical aspects of medical science.

A/Prof Wayne Greene, thank you for helping me select this project, the advice you gave before the year commenced, and the help you have provided throughout.

To everyone at the lab: Alexia Weeks, thank you for all the help you have provided around the lab, especially considering how busy you are.

Purdey Campbell, thanks for always taking time out of your busy schedule to help me with the administration aspects of the project and for reminding me that things don’t always go as planned.

Dr. Ben Mulin, thank you for the information you provided regarding RT-qPCR, if I hadn’t discussed it with you prior I would have been in some serious trouble!

Dr. Andrea Holme, a huge (!) thank you for all the help you provided regarding the cytometry aspects of the project. Thank you for the training on all the cytometers you have at the lab and for the wisdom you have provided regarding medical science in general. Can’t thank you enough!

Mum and Dad, I wouldn’t be here if it weren’t for the constant support you have provided this year (and all other years). Thank you for the quality genome.

To my friends: Michael & Tori, thanks for being available to de-stress and the genuine (?) interest you have shown in my project and its progress.

Ash & Alex, thanks for the support (ha) and for feeding me after the late nights at the lab.


Chapter 1 – Literature Review

1.1 Thyroid Gland

1.1.1 Structure and Function

The thyroid gland is an endocrine gland located in the anterior neck region and consists of two large lateral lobes, located on either side of the larynx and upper trachea, connected by a thin band of thyroid tissue, the isthmus [1]. Thin fibrous septa separate the thyroid into lobules comprising thyroid follicles. Thyroid follicles are the functional units of the gland and are roughly spherical cyst-like compartments containing a gel-like mass termed colloid and encapsulated by follicular epithelium [2].

The principal role of the follicle is the production of the thyroid hormones triiodothyronine (T3) and thyroxine (T4), which have diverse roles in development, the regulation of basal metabolism, and homeostasis. The follicular epithelium is the parenchyma of the thyroid gland and consists of two cell types: follicular cells and parafollicular C cells [3].

Figure 1.1 (A) Anatomy of the Thyroid Gland [3]. (B) Histology of the Normal Adult Thyroid Gland. (1) Colloid of a thyroid follicle; (2) Follicular cells. Single layer forming a follicle; (3) Parafollicular C cells; (4) Connective tissue septum [4].


1.1.2 Follicular Cells

Follicular cells are the principal cells of the follicular epithelium, with the apical surfaces of the cells in contact with the colloid and the basal surfaces resting on a basal lamina [2]. Follicular cells are responsible for the synthesis of thyroglobulin and thyroperoxidase as well as the uptake of iodide from blood, each of which is stored within the colloid in the thyroid follicle [5]. Thyroperoxidase is responsible for the oxidation of iodide ions to iodine atoms that are subsequently coupled to tyrosine residues of thyroglobulin within the colloid [1]. Thyroglobulin is the inactive storage form of T3 and T4, whose conversion to the thyroid hormones and secretion into circulation is controlled by the hypothalamus-pituitary-thyroid axis [6].

1.1.2.1 Control of Thyroid Function by the Hypothalamus-Pituitary-Thyroid Axis

The cascade that controls thyroid function starts in the hypothalamus with the secretion of thyrotropin-releasing hormone (TRH) through the hypothalamic-pituitary circulation [6]. TRH stimulates thyrotroph cells within the anterior pituitary to release thyroid stimulating hormone (TSH), which acts directly on thyroid follicular cells, stimulating the uptake of iodide through the sodium-iodide symporter (NIS), the synthesis of thyroperoxidase and the subsequent iodination of thyroglobulin [1]. Iodinated thyroglobulin is endocytosed from the colloid into follicular cells, where it is converted to the thyroid hormones T3 and T4 before secretion into systemic circulation [2]. The hypothalamic-pituitary-thyroid axis is controlled by a negative feedback loop, with high levels of T3 and T4 inhibiting the release of TRH and the production of T3 and T4. This negative feedback loop also occurs when TSH levels are high, preventing the release of TRH (Figure 1.2) [6].


Figure 1.2 Control of Thyroid Function by the Hypothalamus-Pituitary-Thyroid Axis [6].

1.1.3 Parafollicular C Cells

Thyroid follicles also contain neuroendocrine parafollicular C cells within the basal lamina of the follicular epithelium. These cells are responsible for the synthesis and secretion of calcitonin, a hormone that participates in calcium and phosphate homeostasis by inhibiting the breakdown of bone by osteoclasts and the reabsorption of calcium and phosphate by renal tubular cells [3].

1.2 Cancer

Cancer is a significant source of morbidity and mortality worldwide and was responsible for 8.8 million deaths in 2015 [7, 8]. Cancer can arise from many different sites within the human body and is a heterogeneous group of diseases whose constituents exhibit complex and distinct natural histories, making the study of pathogenesis, the development of screening and diagnostic tests, and the development of effective treatments a complex and ongoing task [9]. Worldwide, one in six deaths is attributable to cancer and the number of new cases is expected to increase by 70% over the next 20 years [7]. The economic impact of cancer is significant, with an estimated total annual economic cost of US $1.16 trillion in 2010 [10].

With the projected rise in new cancer cases over the next 20 years, the economic burden caused by cancer is expected to increase significantly [10]. Thyroid cancer is the most common cancer of the endocrine system, accounting for 95% and 1-3% of endocrine cancers [11] and all cancers [12], respectively. The incidence of thyroid cancer is increasing worldwide, with one of the greatest rates of increase of all human cancers [13].

Mutations causing or predisposing to cancer can occur in either somatic or germline tissues. Somatic mutations occur during an individual’s lifetime, are not transmissible to progeny and are observed on all genomic scales, ranging from single-nucleotide point mutations to large chromosomal rearrangements or aneuploidy [14]. Some mutations are relatively benign and do not cause or predispose to cancer. However, cancer can arise from a single somatic mutation if that mutation is particularly damaging to the cell, or it can arise through the gradual accumulation of somatic mutations in a single cell over time [6]. Germline mutations occur in germinal tissues and can be passed to progeny if the mutation occurs in a cell that participates in fertilization. Germline mutations predisposing to cancer are less frequently observed than somatic mutations; however, they make a significant contribution to total cancer incidence [15].


Figure 1.3 Somatic Versus Germline Mutations [6].

1.2.1 Thyroid Cancer

Thyroid cancer is divided into two types depending on the cell of origin: (1) non-medullary thyroid cancer (NMTC), which originates from follicular cells and accounts for 95% of all thyroid cancers, and (2) medullary thyroid cancer (MTC), which originates from parafollicular C cells and accounts for the remaining 5%. NMTCs are further divided into four clinical subtypes based on their histological architecture: papillary (PTC), follicular (FTC), anaplastic (ATC) and poorly differentiated thyroid cancer (PDTC) [11].

Table 1.1 Thyroid Cancer Subtypes*

Characteristics | Papillary | Follicular | Poorly differentiated | Anaplastic | Medullary
Cell type | Follicular | Follicular | Follicular | Follicular | Parafollicular C cell
Prevalence (%) | ~80 | ~15 | <2 | 1-2 | 3-5
10-year survival (%) | 95-98 | 90-95 | ~50 | <10 | 60-80

*Adapted from Nikiforov and Nikiforova, 2011 [16]


1.2.1.1 PTC

Approximately 80% of NMTCs are PTCs, characterised by distinctive nuclear alterations including pseudoinclusions, grooves and chromatin clearing. PTCs are typically slow growing, tend to metastasise to lymph nodes and can be solitary or multifocal [13]. While PTC has an excellent prognosis (10-year survival > 90%), morbidity from PTC is high due to local metastasis and complications arising from therapeutic interventions [17].

Figure 1.4 Papillary Thyroid Cancer [6]. (A) The macroscopic appearance of papillary thyroid cancer with clearly visible papillary structures. (B) Histological appearance of papillary thyroid cancer with characteristic papillae.

1.2.1.2 FTC

FTC accounts for approximately 15% of NMTCs and appears as solitary nodules that may be well-demarcated or highly infiltrative into surrounding tissue. Unlike PTCs, FTCs are rarely multifocal and rarely metastasise to regional lymph nodes; rather, FTCs are characterised by invasion of blood vessels and metastases to blood and bone tissue. FTC is more frequently observed in areas with iodine deficiency, where it constitutes 25-40% of all thyroid cancers [6]. Like PTC, FTC has an excellent prognosis (10-year survival > 90%) [16].


Figure 1.5 Follicular Thyroid Cancer [6]. (A) Macroscopic appearance of follicular thyroid cancer showing solitary tumours on each lobe. (B) Histological appearance of FTC.

1.2.1.3 ATC and PDTC

ATCs, or undifferentiated thyroid cancers, account for 2-3% of all NMTCs and are characterised by their aggressive features, with a 10-year survival rate of <10% [6]. These aggressive tumours can arise de novo or by de-differentiation of a well-differentiated PTC or FTC (Figure 1.8), with the latter involving the accumulation of additional genetic mutations [18]. These tumours are more frequently observed in older patients (mean age ~65 years), and approximately half of ATC patients have a history of well-differentiated thyroid cancers or exhibit well-differentiated thyroid cancers concurrent with ATC [6].


Figure 1.6 Anaplastic Thyroid Cancer [19]. (A) Macroscopic appearance of ATC showing characteristic areas of necrosis (black). (B) Histology of ATC showing cells with variable morphology.

PDTCs constitute less than 2% of all NMTCs and are histologically described as an intermediate between a well-differentiated and an undifferentiated thyroid cancer [6]. Like ATC, PDTC is more frequently observed in older patients (mean age ~71 years), is highly aggressive (10-year survival ~50%) and can arise de novo or by de-differentiation (Figure 1.8) [18].

Figure 1.7 Poorly Differentiated Thyroid Cancer [19]. (A) Macroscopic appearance showing invasive growth pattern and multiple foci of necrosis (black). (B) Histology of PDTC showing cells of variable morphology.


Figure 1.8 ATC and PDTC Formation De Novo and by De-Differentiation [16].

1.2.2 Detection, Diagnosis and Treatment

Thyroid cancer patients typically present with a thyroid nodule that is detected due to symptoms associated with compression of surrounding tissues or discovered during a routine physical examination [20]. The majority of thyroid nodules are benign; however, approximately 5% are malignant and these can be challenging to distinguish from benign nodules [21]. Thyroid symptoms including thyrotoxicosis or hypothyroidism are associated with thyroid cancer; however, the majority of thyroid cancer patients are euthyroid at the time of diagnosis [22].

Following the identification of a thyroid nodule, an initial blood examination is conducted to determine if the patient is euthyroid, typically by measuring TSH and free T4 levels, as thyroid nodules can be benign and associated with thyroid disorders as well as malignancy [22]. High resolution neck ultrasonography with fine-needle aspiration (FNA), followed by cytological examination of the cells collected, is the most important and definitive determinant of whether the nodule is benign or malignant [21]. However, in approximately 25% of cases, FNA with cytological examination is unable to distinguish whether the nodule is benign or malignant [16].

With a diagnosis of a benign nodule, the patient will be closely monitored to observe any changes that may be indicative of neoplastic development. Following diagnosis of malignancy, surgery is the primary form of treatment, with thyroidectomy or lobectomy with or without regional neck and lymph node dissection [22]. In cases where FNA and cytological examinations are inconclusive, thyroidectomy or lobectomy may be performed; however, it has been demonstrated that 60-90% of inconclusive nodules are found to be benign following surgical removal [16]. Molecular markers for thyroid cancer would be an invaluable tool in aiding the diagnosis of malignancy in this context, preventing unnecessary surgery and morbidity for the patient [23].

1.2.3 Disease Risk Associations

The dramatic increase in the incidence of thyroid cancer can only partially be explained by improvements in cancer diagnosis, suggesting that environmental factors must play a role in the disease aetiology [24]. Several risk factors have been identified for thyroid cancer, including ionising radiation [16], iodine deficiency and excess [25], and previous history of thyroid disease, including benign nodules and thyroid autoimmune disease [20]. A significant gender bias exists with NMTC, suggesting that hormonal environment may play a role in predisposing to NMTC [13].


In addition to environmental risk factors, individuals with two or more first-degree relatives with NMTC have an 8-10 fold increased risk of developing the disease, suggesting that the disease has a hereditary component arising from germline mutations [13].

1.3 Non-Medullary Thyroid Cancer

1.3.1 Sporadic NMTC

The majority of NMTCs are sporadic, caused by the gradual accumulation of genetic and epigenetic alterations, including point mutations and chromosomal translocations in growth factor receptor signaling pathways [16]. The former involves single-nucleotide changes in genes, whereas the latter is a large-scale genetic alteration involving the breakage and fusion of two different chromosomes or the inversion of the same chromosome [6]. In normal cells, these pathways are transiently activated by binding of ligands to growth factor receptors or are inactive in thyroid follicular cells. In sporadic NMTC, gain-of-function mutations in these pathways lead to constitutively active signaling pathways and, subsequently, excessive cellular proliferation and cell survival [16].


Figure 1.9 The Receptor Tyrosine Kinase Signaling Pathway [16]. Effectors of the MAPK and PI3K-AKT signaling pathways are frequently mutated in sporadic NMTC.

The majority of mutations that initiate sporadic NMTC occur in effectors of the MAPK and PI3K-AKT pathways (Figure 1.9), both of which function in signal transduction from cell membrane tyrosine kinases to the nucleus [26], where they regulate processes including cellular proliferation, differentiation and survival [16]. The specific effector within these pathways that becomes mutated and leads to NMTC influences the histological subtype of NMTC that arises (Table 1.2) [6].


Table 1.2 Types of NMTC and their Mutational Profiles*

Subtype | Main histopathologic variants | Common mutations and their prevalence (%)
PTC | Classic papillary type; microcarcinoma; follicular variant; tall-cell variant | BRAF 40-45; RAS 10-20; RET/PTC 10-20; TRK <5
FTC | Conventional type; oncocytic (Hurthle cell) type | RAS 40-50; PAX8/PPARγ 30-35; PIK3CA <10; PTEN <10
PDTC | - | RAS 20-40; TP53 20-30; BRAF 10-20; CTNNB1 10-20; PIK3CA 5-10; AKT1 5-10
ATC | - | TP53 50-80; CTNNB1 5-60; RAS 20-40; BRAF 20-40; PIK3CA 10-20; PTEN 5-15; AKT1 5-10

*Adapted from Nikiforov & Nikiforova, 2011 [16].

1.3.2 Familial non-medullary thyroid cancer

While the majority of NMTC cases are sporadic (>90%), a notable proportion (3-9%) are familial NMTC, defined by the presence of NMTC in 2 or more first-degree relatives in the absence of predisposing environmental factors [27].

Familial NMTC has been reported with increased frequency in familial syndromes [20]. These syndromes have well-defined germline driver mutations (Table 1.3), are highly penetrant and actionable, and genetic testing for syndromic familial NMTC is available when a clinician recognizes the disease phenotype [28].


Table 1.3 Syndromic Familial Non-Medullary Thyroid Cancer*

Syndrome | Germline gene mutation | Type of thyroid cancer
Cowden syndrome | PTEN mutation, SDHB-D mutation, KLLN promoter methylation, PIK3CA mutation, AKT1 mutation, SEC23B mutation | PTC, FTC
Familial adenomatous polyposis | APC mutation | PTC
Gardner’s syndrome | APC mutation | PTC
Carney complex | PRKAR1A mutation | PTC, FTC, follicular adenoma
Werner’s syndrome | WRN mutation | FTC, PTC, ATC
DICER1 syndrome | DICER1 mutation | PTC, MNG

ATC, anaplastic thyroid cancer; FTC, follicular thyroid cancer; MNG, multinodular goiter; PTC, papillary thyroid cancer.
*Adapted from Yang and Ngeow, 2016 [28]

1.4 Non-Syndromic Familial Non-Medullary Thyroid Cancer

Non-syndromic familial non-medullary thyroid cancer (FNMTC) constitutes 95% of all familial NMTCs and is defined by the presence of NMTC in 2 or more first-degree relatives in the absence of predisposing environmental factors and a recognized thyroid cancer syndrome [11]. Histologically, FNMTC is indistinguishable from sporadic NMTC and, unlike syndromic familial NMTC or sporadic NMTC, the germline mutations involved in FNMTC are poorly understood [28]. Studies indicate that it may be an autosomal dominant condition with incomplete penetrance and variable expressivity [24], or it may be a polygenic condition caused by the presence of multiple low-penetrance alleles [29].

1.4.1 Clinicopathological Features and Prognosis of FNMTC

Certain clinicopathological parameters are associated with a poor prognosis in patients with NMTC. These parameters are collectively termed disease aggressiveness and include age of onset, tumour size, multifocality, extrathyroidal invasion, metastases, and disease recurrence [20]. Whether FNMTC is more aggressive than sporadic NMTC is highly controversial, with published evidence both for and against [23].

Many case-control studies have been conducted to resolve this question, but a consensus has not yet been reached (Table 1.4).


Table 1.4 Summary of Studies Investigating the Clinicopathological Features of FNMTC*

Study | Younger age | Increased size | Multifocality | Extrathyroidal invasion | Lymph node involvement | Distant metastasis | Recurrence | Shorter disease-free survival
[30] + - + - - - + +
[31] - - + - - N/A + +
[32] + - + + + N/A + N/A
[33] - + + - - - + +
[34] + N/A + + N/A N/A - -
[35] - - + - - N/A + +
[36] + - + + + + - N/A
[37] - + + + + N/A - N/A
[38] + - + N/A + N/A + +
[39] + N/A + + N/A N/A +
[40] - - + + + + N/A N/A
[41] - - - N/A N/A N/A - N/A
[42] - - + - - N/A - -
[43] ------
[44] + - - N/A - N/A N/A -

+ Positive association found; - No association found
*Adapted and expanded from Nixon et al., 2016 [23]


Whether FNMTC is more aggressive than sporadic NMTC remains unknown and this has important implications for treatment. Thyroidectomy or lobectomy with or without regional neck and lymph node dissection is the common practice for most NMTC patients, depending on the severity of the disease [42]. However, some authors advocate more aggressive treatment in cases where FNMTC is suspected, i.e. prophylactic thyroidectomy or thyroidectomy with regional neck dissection where lobectomy would normally be performed with sporadic disease with similar clinical features [45, 46]. Other authors do not recommend more aggressive treatment and rather advocate similar treatment to sporadic NMTC [47].

This difference in treatment protocols raises the question of whether patients are receiving appropriate treatment, with more aggressive treatment possibly causing unnecessary morbidity or less aggressive treatment being insufficient [23].

The uncertainty over whether FNMTC is more aggressive than sporadic NMTC, as well as the contradictory treatment protocols, likely stems from FNMTC having a clinical definition rather than a genetic one, compounded by the existence of multiple clinical definitions of FNMTC [32]. The most stringent definition of FNMTC requires three first-degree relatives diagnosed with NMTC; however, some authors define it as two or more affected first-degree relatives [13]. It is estimated that in families with two affected first-degree relatives, the probability that the disease is actually sporadic is 45-69%, compared with 5% for families with three affected individuals [48].

Therefore, any studies using the definition of two or more first-degree relatives may be comparing sporadic NMTC to a heterogeneous group of sporadic NMTC and FNMTC. The data obtained may not be indicative of true FNMTC; rather, sporadic cases are likely diluting any meaningful data [36].


Until molecular markers are available to accurately diagnose FNMTC, future studies investigating the clinicopathological features and prognosis of FNMTC as well as the underlying pathogenesis, should focus on families with three or more affected first-degree relatives rather than two, so that sporadic cases are not included in their analyses [36].

Additionally, when considering evidence relating to the features of FNMTC, it is important to distinguish between studies that analysed cohorts that could contain sporadic cases versus those where it is unlikely [20]. In the event that accurate molecular markers are identified, retrospective analyses of the data so far obtained should be conducted to clarify the clinicopathological features and prognosis of FNMTC.

1.4.2 FNMTC Screening

All first-degree relatives of affected individuals should be screened for the presence of thyroid nodules, even if asymptomatic [20]. Some authors also advocate screening of second-degree relatives, as recent studies have demonstrated that they have a similar risk of developing the disease [49]. However, genetic testing for FNMTC is not available due to an incomplete understanding of the underlying genetics, a lack of validation of putative susceptibility genes in large cohorts, and a lack of functional studies to confirm the underlying pathology of the genes identified [20].

As there is no genetic testing available for index patients or “healthy” relatives who may be susceptible to FNMTC, early clinical diagnosis of thyroid neoplasia is necessary [32]. A careful assessment of patient history followed by FNA and cytological assessment is used to diagnose NMTC [22], but an individual cannot be diagnosed with FNMTC until at least one other family member has been diagnosed with NMTC [20]. Molecular markers would be of great value in this regard, allowing for earlier diagnosis of at-risk individuals, earlier clinical intervention and improved outcomes for patients [28]. However, before molecular markers can be utilized to diagnose and screen at-risk individuals, susceptibility genes must be identified and the underlying pathogenesis confirmed [20].

1.4.3 FNMTC Genetics

While it appears clear that there is a hereditary basis underlying FNMTC, no specific gene mutation has so far been discovered that consistently accounts for the disease [28]. To date, 5 susceptibility genes have been identified that may predispose individuals to FNMTC. These include: SRGAP1 (SLIT-ROBO Rho GTPase Activating Protein 1) [50], FOXE1 (Forkhead Box E1) [27], NKX2.1 (NK2 Homeobox 1) [51], HABP2 (Hyaluronan Binding Protein 2) [52] and SRRM2 (Serine/Arginine Repetitive Matrix 2) [17]. Additionally, susceptibility chromosomal loci have been identified; however, the candidate genes at these loci remain unknown [28].

1.4.3.1 SRGAP1 (SLIT-ROBO Rho GTPase Activating Protein 1)

The SRGAP1 (SLIT-ROBO Rho GTPase Activating Protein 1) gene is located on chromosome 12q14.2 and encodes the SRGAP1 protein. A key function of SRGAP1 is to inhibit the G-protein CDC42 in a Slit-Robo-dependent manner in neuronal cells [53]. Aberrant activation of CDC42 has been associated with tumorigenesis [54, 55] and functional studies have demonstrated that activation of CDC42/p21-activated kinase signaling has a role in PTC invasion by promoting epithelial-to-mesenchymal transition [56], a key event in metastatic development in epithelial tumours [57].


He et al. (2013) conducted a genome-wide linkage analysis using single-nucleotide polymorphism (SNP) genotyping of 38 FNMTC families from Ohio and identified the 12q14 locus with possible linkage to FNMTC [50]. Four germline variants (Q149H, A275T, R617C and H875R) in SRGAP1 were found to segregate with FNMTC in one family each. The Q149H and A275T variants were not detected in sporadic NMTC cases from Ohio and Poland or in the healthy controls. R617C was detected in <1% of sporadic NMTC cases and controls, and H875R was detected in >10% of sporadic NMTC cases and controls [50].

Functional studies of the 4 variants suggested that the Q149H and R617C variants could be loss-of-function changes resulting in aberrant activation of CDC42 [50].

Taken together, these findings suggest that two variants (Q149H and R617C) may predispose to FNMTC by causing aberrant activation of CDC42, and that the SRGAP1 gene may be a low-penetrance susceptibility gene [50]. Further studies in additional FNMTC cohorts are required to determine the association of SRGAP1 with FNMTC as well as the importance of the A275T and H875R variants [50].

1.4.3.2 NKX2.1 (NK2 Homeobox 1)

The NKX2.1 gene maps to 14q13 and encodes for the thyroid transcription factor-1 (TTF-1) protein [58]. TTF-1 is essential for the maintenance of normal thyroid architecture and function [58] and is known to activate the transcription of thyroperoxidase [59], thyroglobulin [60] and thyrotropin [61] genes. NKX2.1 mutations have been associated with a range of diseases, including congenital hypothyroidism and benign hereditary chorea [51]. In malignant thyroid carcinoma, NKX2.1 expression is downregulated relative to normal thyroid tissue and its downregulation is utilized as a marker for thyroid differentiation [62].

Ngan et al. (2009) conducted a targeted DNA sequencing study of the NKX2.1 gene in Chinese families with a history of NMTC with or without multi-nodular goiter (MNG) and healthy controls. A germline mutation (A339V) was identified in four of the 20 MNG/NMTC patients from two families. In vitro functional studies in rat thyroid cells (PCCL3 cells) demonstrated that overexpression of A339V TTF-1 was associated with increased thyrotropin-independent cellular proliferation and decreased transcription of thyroid-specific genes [51].

Of the two families with NMTC, MNG, and the A339V mutation, only one family had 2 members with a history of NMTC and therefore met the clinical definition of FNMTC [51]. In this context, the association of the A339V variant with FNMTC remains in question. Additionally, the association could not be replicated in an Italian cohort of 63 patients with FNMTC from 38 kindreds [63].

NKX2.1 may be a susceptibility gene for MNG and the A339V mutation may predispose to MNG rather than FNMTC, with the presence of MNG predisposing individuals to NMTC [51]. Additional studies in larger cohorts of both sporadic and familial NMTC, as well as cohorts susceptible to MNG, are required to elucidate the association of NKX2.1 to MNG and NMTC.

1.4.3.3 FOXE1 (Forkhead Box E1)

The FOXE1 gene (Forkhead Box E1) is localized to chromosome 9q22.33 and encodes the FOXE1 transcription factor (also known as thyroid transcription factor 2, TTF-2), consisting of a forkhead/winged helix DNA-binding domain and a polyalanine (polyAla) tract [64]. The number of alanine residues in the polyAla tract is variable and ranges from 11-22, with the 14-alanine allele (FOXE1 14Ala) being the most commonly occurring [65]. FOXE1 is critical for the formation of the thyroid [66] and its differentiation [64], as well as facilitating the response of the thyroid to hormones by altering chromatin structure, leading to the synthesis of T3 and T4 [67]. A number of mutations and variants of the FOXE1 gene have been associated with both sporadic NMTC and FNMTC.

The rs1867277 SNP occurs in the 5’UTR of the FOXE1 gene and was originally associated with sporadic NMTC in Spanish and Italian cohorts [68]. Functional studies suggested that this SNP upregulates the expression of FOXE1 by increasing the recruitment of leucine zipper upstream stimulatory factors 1 and 2 to the FOXE1 promoter [68]. A strong association of rs1867277 with FNMTC was later found in the Portuguese population [69]; however, a subsequent study found only weak associations between rs1867277 and FNMTC [27].

The association of FOXE1 polyAla tract variants with FNMTC is somewhat controversial.

FOXE1 16Ala was originally associated with sporadic NMTC in a Spanish and Tunisian cohort [65]. It was again associated with sporadic NMTC and shown in an in vitro functional study to induce a stronger transactivating signal of the thyroglobulin promoter relative to the wild-type FOXE1 14Ala [70]. Tomaz et al. (2012) demonstrated that FOXE1 polyAla tract variants with more than 14 alanine residues (15-19 Ala) were associated with FNMTC in a Portuguese population [69]. However, Bonora et al. (2014) observed no association between FOXE1 16Ala and FNMTC [27].

A germline variant (c.743C>G, p.A248G) in FOXE1 has also been associated with FNMTC in a Portuguese population [70]. This variant segregated with FNMTC in one family, and was detected in an apparently sporadic case of NMTC. In vitro functional studies demonstrated that the variant promoted cellular proliferation and migration relative to the wild-type FOXE1 sequence [70]. However, the association of this variant with FNMTC has not yet been demonstrated in additional FNMTC cases.
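The variant notation above pairs a coding-sequence change (c.743C>G) with its protein consequence (p.A248G). As a minimal sketch of how the two relate, assuming coding positions are numbered from the first base of the start codon and using only the standard genetic code, the following maps a coding position to its codon and lists the amino acids that a given base substitution can produce. The function names are illustrative and are not taken from the cited study.

```python
from itertools import product

# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_position(cds_pos):
    """Return (codon number, position within codon), both 1-based."""
    return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3 + 1


def substitution_outcomes(ref_aa, pos_in_codon, ref_base, alt_base):
    """Amino acids obtained when each codon for ref_aa that carries ref_base
    at pos_in_codon has that base replaced by alt_base."""
    outcomes = set()
    for codon, aa in CODON_TABLE.items():
        if aa == ref_aa and codon[pos_in_codon - 1] == ref_base:
            mutant = codon[:pos_in_codon - 1] + alt_base + codon[pos_in_codon:]
            outcomes.add(CODON_TABLE[mutant])
    return outcomes


codon, pos = codon_position(743)
print(codon, pos)                                  # codon number, base within codon
print(substitution_outcomes("A", pos, "C", "G"))   # possible replacement residues
```

Under these assumptions, nucleotide 743 is the second base of codon 248, and a C>G change at the second base of any alanine codon (GCN) gives a glycine codon (GGN), which agrees with p.A248G.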

The inconsistent association of FOXE1 variants to FNMTC may be explained by FOXE1 being a low-penetrance susceptibility gene for FNMTC [28]. Further validation is required before FOXE1 screening can be implemented for FNMTC patients.

1.4.3.4 HABP2 (Hyaluronan Binding Protein 2)

In a recent study, Gara et al. (2015) performed next-generation exome sequencing of DNA from a family with 7 individuals affected with FNMTC and follicular adenoma [52]. A germline variant (G534E) in exon 13 of HABP2 (Hyaluronan Binding Protein 2) was identified that segregated with all 7 affected family members in this kindred. Functional studies suggested a possible role for HABP2 as a tumour-suppressor gene, with the G534E variant as a dominant-negative tumour suppressor. The variant was also identified in 19/423 (4.7%) of NMTC patients from multiple ethnicities and 0.7% of individuals of unknown disease status [52].

However, the association between this variant and FNMTC is somewhat controversial. The G534E variant was not detected in 12 FNMTC kindreds from China [71] or in 37 FNMTC kindreds from Australia [11]. Subsequent editorial correspondence noted that the frequency of the G534E variant was much higher than the 0.7% reported, and as high as 3.29-5.7% in the European population [72-74]. The frequency of the G534E variant in two independent cohorts was determined to be 7.6% and 9.3%, indicating that it is a common polymorphism and not a rare variant [11]; therefore, its association with FNMTC remains in question.

1.4.3.5 Susceptibility loci

A number of loci have been identified that may predispose to FNMTC; however, the specific gene(s) at these loci remain unknown (Table 1.5).

Table 1.5 FNMTC Susceptibility Loci*

Chromosomal loci | Type of thyroid cancer | Reference
TCO (19q13.2) | PTC (Hurthle cell), MNG | [75] [76] [77, 78]
fPTC/PRN (1q21) | PTC | [79] [80]
FTEN (8p23.1-p22) | PTC, MNG | [81]
NMTC1 (2q21) | PTC | [82] [77]
MNG1 (14q32) | PTC, MNG | [83]
6q22 | PTC | [80]
8q24 | PTC | [84]

PTC, papillary thyroid cancer; MNG, multinodular goiter.
*Adapted from Yang and Ngeow, 2016 [28].

1.4.3.7 SRRM2 (Serine/Arginine-rich repetitive matrix 2)

Tomsic et al. (2016) performed genotyping, haplotype analysis, whole exome sequencing and genetic linkage analysis in an FNMTC family with six affected first- or second-degree relatives.

A germline variant (c.1037C>T) in SRRM2 (Serine/Arginine-rich repetitive matrix 2) was identified in all six affected family members and in seven of 1170 sporadic NMTC cases, but in none of 1404 controls. The missense variant occurs in exon 11 of SRRM2 and is predicted to change the amino acid sequence of SRRM2 from serine to phenylalanine (S346F). The affected residue is highly conserved among mammals, and the change was predicted to be damaging by SIFT and probably damaging by PolyPhen-2 [17]. High throughput RNA sequencing (RNA-seq) was performed on polyadenylated transcripts from blood samples extracted from three FNMTC patients heterozygous for the SRRM2 mutation and three wild-type controls. Significant differences in the alternative splicing pattern were identified in 1,642 exons, of which only 7 were verified experimentally via quantitative real-time PCR (qRT-PCR) [17].
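To give a sense of scale for the carrier counts reported above (seven of 1170 sporadic cases versus none of 1404 controls), the following is a minimal sketch of a one-sided Fisher's exact test on the corresponding 2x2 table. It is an illustration only: the statistical method used in the cited study is not described here, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null of no association,
    where X is the top-left count with all table margins held fixed.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    denom = comb(total, col1)
    upper = min(row1, col1)
    return sum(comb(row1, k) * comb(total - row1, col1 - k)
               for k in range(a, upper + 1)) / denom


# Rows: sporadic NMTC cases, controls; columns: carriers, non-carriers.
p_value = fisher_exact_greater(7, 1170 - 7, 0, 1404 - 0)
print(f"one-sided p = {p_value:.4f}")
```

With these counts the one-sided p-value is approximately 0.004, meaning that finding all seven carriers among the cases would be unlikely if carrier status were unrelated to disease. This says nothing about effect size or penetrance, which require the additional studies discussed in this section.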

These results suggest that the S346F variant alters the normal splicing function of SRRM2, and that this variant may be a loss-of-function mutation predisposing to FNMTC by altering the normal splicing pattern of unidentified downstream effectors [17]. SRRM2 may be a high-penetrance susceptibility gene for FNMTC [17]; however, this study is the first to associate SRRM2 variants with FNMTC and additional studies are required to determine the significance of SRRM2 variants in FNMTC.

1.5 RNA Splicing

Prior to translation into proteins, primary transcripts (pre-mRNA) from protein-coding genes are processed into mature mRNA by RNA splicing [85]. This tightly regulated process is carried out by the large macromolecular machines known as the major and minor spliceosomes and involves the removal of introns from pre-mRNA and the splicing together of exons. The vast majority of human pre-mRNA splicing is carried out by the major spliceosome, which consists of five small nuclear ribonucleoprotein (snRNP) complexes (U1, U2, U4, U5 and U6 snRNPs) and multiple additional splicing factors [86].

The excision of introns from pre-mRNA is facilitated by the recognition of short sequence motif splice sites that occur at exon/intron boundaries. Splice sites can be categorized as either constitutive or alternative, depending on whether they are always (constitutive) or only sometimes (alternative) incorporated into the final mRNA. The splicing process involves the recognition and interaction of these splice sites with the spliceosome and frequently occurs via additional auxiliary splicing factors. Following the assembly of the spliceosome components and the pre-mRNA substrate, splicing occurs via two sequential transesterification reactions [86].

1.5.1 Alternative Splicing

Alternative splicing is the regulated process whereby the introns of a primary transcript (pre-mRNA) from a protein-coding gene are removed and the exons are spliced together in different configurations, producing structurally distinct mRNA variants [87]. This process is required for almost all human mRNAs and allows a single gene to produce multiple functionally distinct protein isoforms, thereby expanding the cellular proteome. Alternative splicing patterns can differ between cell and tissue types as well as in response to environmental stimuli, and this process therefore allows for the greater cellular complexity required by higher eukaryotic organisms [85].
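The combinatorial effect described above can be illustrated with a minimal sketch that enumerates the mRNA variants obtainable from one transcript when some exons may be included or skipped. It models exon skipping only, which is just one mode of alternative splicing; the exon names and the function are hypothetical and serve purely as an illustration.

```python
from itertools import combinations


def enumerate_isoforms(exons, cassette):
    """List exon combinations when each cassette exon is included or skipped.

    exons: exon names in their order along the pre-mRNA.
    cassette: names of exons that may be skipped; all others are constitutive.
    """
    optional = [e for e in exons if e in cassette]
    isoforms = []
    for n_skipped in range(len(optional) + 1):
        for skipped in combinations(optional, n_skipped):
            # Exon order is preserved; only the skipped exons are dropped.
            isoforms.append([e for e in exons if e not in skipped])
    return isoforms


isoforms = enumerate_isoforms(["E1", "E2", "E3", "E4"], {"E2", "E3"})
for isoform in isoforms:
    print("-".join(isoform))
```

In this toy model a transcript with n independently skippable exons yields 2^n exon combinations, so two cassette exons give four variants. Real transcripts are more constrained, since splicing choices are regulated and not all combinations occur.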

Figure 1.10 Alternative Splicing Producing Three Distinct Protein Isoforms [85]

Aberrant alternative splicing can have diverse and dramatic effects at the RNA and protein level that are mostly context-dependent. The introduction of a premature stop codon in mRNA may render a protein non-functional, or produce a protein with an opposing function [88]. Dysregulation of alternative splicing has been associated with almost every aspect of cancer development and progression including metastasis, metabolism, angiogenesis, apoptosis, invasion and cell cycle control [89, 90]. Genome-wide analyses of cancer transcriptomes have demonstrated that cancers, in addition to differential splicing of specific genes, exhibit global splicing changes relative to normal tissue [86, 91]. Recurrent mutations in genes encoding splicing factors and components of the spliceosome are observed in several cancers at high frequencies, providing a direct genetic link between alternative splicing dysfunction and cancer, and demonstrating that splicing factors can act as proto-oncogenes [92, 93].

1.5.2 SR Proteins

The regulation of RNA splicing frequently occurs by trans-acting splicing factors such as members of the SR protein family. These splicing factors are highly conserved in metazoans and contain one or two N-terminal RNA recognition motifs (RRMs) and one or more serine/arginine-rich (RS) domains [94]. In general, RRMs function in RNA recognition and RS domains participate in diverse protein-RNA and protein-protein interactions [95]. SR proteins have diverse functions in RNA splicing and gene expression regulation including: (1) interacting with splicing regulatory elements (SREs), such as exonic splicing enhancers (ESEs) and intronic splicing enhancers (ISEs), within pre-mRNA to promote the inclusion of exons within the final mRNA [96], (2) acting as shuttle proteins moving pre-mRNA and mRNA between the cytoplasm and nucleus [97] and (3) regulating mRNA decay and translation [98].

1.5.3 Nuclear Speckle Domains

The distribution of SR proteins within the nucleus of eukaryotic cells is not uniform; rather, SR proteins are concentrated in 10-50 interchromatin granules called speckle domains (Figure 1.11) [99]. Speckle domains are also enriched with other splicing factors, spliceosomal snRNPs, kinases and phosphatases [100]. The exact role of these dynamic structures is not understood; however, they may function as storage and assembly compartments that supply splicing factors to sites of active transcription, or alternatively they may act as hubs in which certain RNA splicing processes occur [101].


Figure 1.11 Structured illumination microscopy image of a HeLa cell nucleoplasm. Nuclear speckle domains enriched with splicing factors are shown in green [99].

1.5.4 SRRM2 (Serine/Arginine Repetitive Matrix Protein 2)

The SRRM2 gene is localized to 16p13.3 and encodes 3 protein isoforms, the most abundant of which is the full-length ~300 kDa SRRM2 protein (also known as SRm300), which is highly concentrated in nuclear speckle domains [101].


Figure 1.14 Immunofluorescence Image of a TIG-1 Cell Nucleoplasm. SRRM2 (SRm300), shown in red, is highly concentrated in nuclear speckle domains [101].

SRRM2 interacts with a related protein, SRRM1 (also known as SRm160), forming the SRRM1/SRRM2 splicing co-activator complex (SRRM1/SRRM2), a core component of the spliceosome [94]. SRRM2 is an SR protein with structural features unique within the SR protein family, including an RNA-binding domain (RBD) that lacks the well-defined RNA-binding motifs observed in other SR proteins and is more closely analogous to an arginine-rich motif (ARM) that participates in protein-RNA interactions [102]. Both SRRM1 and SRRM2 also have an unusually high content of serine, arginine and proline residues, with SRRM2 also having two very long polyserine tracts (Figure 1.12) [103].


Figure 1.12 SRRM2 protein sequence*. The N-terminal RNA-recognition motif (RRM) is underlined. Two large RS motif clusters are marked with bold lettering and two polyserine stretches are boxed. *Adapted from Sawada et al., 2000 [102].

1.5.4.1 The SRRM1/SRRM2 Splicing co-activator

The exact role of SRRM1/SRRM2 in RNA splicing is unknown. SRRM1/SRRM2 was shown to associate with pre-mRNA by a U1 snRNP-dependent pathway that is further stabilized by U2 snRNP and other SR family proteins [94]. It was shown to promote splicing activity in HeLa nuclear extracts in vitro but, unlike other SR family proteins, not in splicing factor-deficient cytoplasmic S100 extracts [94]. Specific depletion of SRRM1/SRRM2 inactivates the splicing of a subset of pre-mRNAs in nuclear extracts, and addition of excess SRRM1/SRRM2 to nuclear extracts, or to S100 cytoplasmic extracts supplemented with SR family proteins, stimulates splicing of a subset of pre-mRNAs. This demonstrates that SRRM1/SRRM2 is a co-activator of pre-mRNA splicing and that its function is dependent on co-operation with additional splicing factors [94].

1.5.4.2 Exon-enhancer activity of the SRRM1/SRRM2 splicing co-activator

SR family proteins have been shown to promote the selection of alternative splice sites by functioning in splice-site recognition by binding to SREs within pre-mRNA [95]. The function of SRRM1/SRRM2 in exon inclusion was demonstrated with heterologous pre-mRNA from the dsx gene of Drosophila with a GAA-repeat purine-rich ESE from mammalian alternatively spliced pre-mRNA [94]. SRRM1/SRRM2 was found to promote ESE-dependent splicing in this context by mediating critical interactions between additional SR proteins, the U1 snRNP bound to the 5’ splice site, and U2 snRNP bound to the pre-mRNA branch site (Figure 1.13) [94].


Figure 1.13 The Proposed Model of the SRRM1/SRRM2 Splicing Coactivator in ESE Function [8]. SRRM1/SRRM2 promotes exon inclusion by facilitating the interaction between spliceosomal snRNPs (U1 and U2) and the GAA-repeat ESE in pre-mRNA via auxiliary splicing factors (SR proteins and Tra2) [94].

However, it was later shown that immunodepletion of SRRM2 in nuclear extracts did not inactivate splicing of the GAA-repeat ESE dsx pre-mRNA and that addition of SRRM1 alone to SRRM1/SRRM2-depleted reactions activated splicing [103]. This may indicate that SRRM1 is the more important component of SRRM1/SRRM2; however, the authors could not rule out that minor levels of SRRM2 were still present in the nuclear extract and that the remaining SRRM2 may have facilitated splicing [8].

The fact that SRRM2 is stably associated with SRRM1 and splicing complexes during both transesterification reactions on different pre-mRNAs suggests that it is involved in splicing, contrary to the previous study that found no difference in RNA splicing when SRRM2 was immunodepleted [8]. Consistent with this line of reasoning, the potential importance of SRRM2 in splicing was demonstrated incidentally when, after purification of the spliceosomal C complex, SRRM2 was the only SR-related protein to remain at the catalytic core of the spliceosome, suggesting that it has a critical function in RNA splicing [104, 105].

It must be considered that other splicing events may be more dependent on SRRM2 than splicing of the GAA-repeat ESE pre-mRNA, and this should be a focus of future research [103]. Additional SRE sequences that promote or repress splicing in cooperation with SR proteins may require SRRM2; however, this has not yet been confirmed experimentally. Future studies should focus on the identification of possible SRE sequences and determine whether SRRM2 functions in enhancing or silencing the inclusion of the exons associated with these sequences.

1.5.4.3 Cell Cycle Control

DNA replication occurs during S phase of the cell cycle, and this process involves the upregulation of histone genes that are required to produce sufficient levels of histone proteins for packaging nascent DNA into chromatin [106]. Histone nuclear factors (HiNFs) are transcription factors that participate in the regulation of histone protein production from the H4/n gene.

Histone pre-mRNAs do not contain introns and are therefore not subject to traditional RNA splicing [107]. Rather, they undergo endonucleolytic cleavage that results in a non-polyadenylated transcript. Studies indicate that 3’ polyadenylation of histone pre-mRNA occurs via additional splicing factors and this process is required for histone pre-mRNA maturation into mRNA [106].

HiNF-P is a histone H4 subtype-specific transcriptional regulator that interacts with the conserved cell cycle control element in the promoter of histone H4 genes to initiate transcription via a p220NPAT-dependent pathway [107]. SRRM2 was shown to interact with HiNF-P with yeast two-hybrid, co-immunoprecipitation, and co-immunofluorescence studies [106]. It is hypothesized that the HiNF-P/SRRM2 complex may play a role in the posttranscriptional regulation of histone production, with the formation of the complex necessary to produce mature, polyadenylated histone mRNA [106, 108].

p220NPAT is an essential component of the cyclin E/Cdk2 signaling pathway that regulates the entry of cells into S-phase [108], and multiple other cell cycle events, including DNA replication, centrosome duplication, and activation of the E2F transcription factors that regulate the cell cycle [106]. SRRM2 and p220NPAT have also been shown to associate via HiNF-P [106] and to co-localize in Cajal bodies, suggesting that SRRM2 may function with p220NPAT to regulate cell cycle events [106].

These findings provide a link between SRRM2 and the cell cycle; however, the functional consequences of SRRM2 interactions with these cell cycle regulators remain unknown.

1.7 Introduction to Thesis

SRRM2 was identified as a potential susceptibility gene for FNMTC, where a missense mutation (S346F) was identified in all six affected members of a well-documented FNMTC family [17]. High throughput RNA-seq suggests that this mutation predisposes to FNMTC by altering alternative splicing [17]. Unpublished work conducted at the Department of Endocrinology and Diabetes (Sir Charles Gairdner Hospital) has also identified SRRM2 as a potential susceptibility gene for FNMTC, where exome sequencing of a Western Australian FNMTC cohort identified seven mutations/variants with possible associations to the disease.

One missense mutation (R1805W) co-segregated with FNMTC and is predicted in silico to be pathogenic, whereas the others are of unknown significance. Interestingly, the previously identified S346F mutation [17] and the recently identified R1805W mutation both lie at the 5’ end of one of the highly conserved RS domains [109], with S346F occurring in the 5’ RS domain and R1805W occurring in the 3’ RS domain. Both missense mutations also alter either a serine (S346F) or arginine (R1805W) residue of an RS domain.

Figure 1.14 Mutations in SRRM2 Associated with FNMTC Occur Within the Highly- Conserved RS Domains (RSD).

In this context, the hypothesis of this thesis is:

The S346F and R1805W mutations in the SRRM2 gene play a role in the pathogenesis of FNMTC through:

1. Altered splicing of nascent pre-mRNA for downstream effectors in various signal transduction pathways; mutant SRRM2 results in altered transcript and protein levels of those effectors and decreased tumour suppression activity.

2. Altered regulation of the cell cycle, whereby mutant SRRM2 prevents proper regulation of the cell cycle, resulting in greater proliferative and tumorigenic characteristics of the cell.

Therefore, the specific aims of this thesis are:

1. To generate mutant model cell systems harboring the SRRM2 mutations so the functional consequences of these mutations can be interrogated relative to a wild-type model cell system.

2. To investigate whether these mutant cell systems display differences in the splicing pattern of specific pre-mRNAs through quantitative real-time PCR.

3. To employ flow cytometry to characterise the proliferative characteristics of these mutant cell systems and determine whether these mutations prevent proper regulation of the cell cycle.
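Aim 2 depends on comparing transcript levels between mutant and wild-type cells by quantitative real-time PCR. One widely used way to express such a comparison is the 2^-ΔΔCt relative quantification method. The sketch below assumes that method, a single reference gene and roughly 100% amplification efficiency; none of these choices is stated at this point in the thesis, and the Ct values are invented for illustration.

```python
def relative_expression(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Fold change of a target transcript in a test sample relative to a
    control sample, each normalised to a reference gene (2^-ddCt method)."""
    delta_test = ct_target_test - ct_ref_test    # dCt in the test sample
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl    # dCt in the control sample
    return 2 ** -(delta_test - delta_ctrl)       # 2^-(ddCt)


# Hypothetical Ct values: a splice isoform in mutant vs wild-type cells.
fold_change = relative_expression(
    ct_target_test=24.0, ct_ref_test=18.0,
    ct_target_ctrl=26.0, ct_ref_ctrl=18.0,
)
print(fold_change)
```

Because a lower Ct corresponds to more starting template, a target that crosses threshold two cycles earlier in the test sample, with the reference gene unchanged, corresponds to a four-fold higher relative level under these assumptions.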

Chapter 2 – Materials and Methods

2.1 Materials


2.1.1 General Equipment

Item | Supplier
Aerosol Barrier Pipette Tips (P20, P200, P1000) | Interpath Services, Australia
Cell Scraper | Greiner Bio-One, Australia
CELLSTAR Cell Culture Flasks (25 cm2, 75 cm2, 96-well, 24-well, 12-well, 6-well) | Greiner Bio-One, Australia
Centrifuge Tubes, 15 mL | Greiner Bio-One, Australia
Centrifuge Tubes, 50 mL | Greiner Bio-One, Australia
Eppendorf Tubes, 1.5 mL | Axygen, USA
Glass Pipettes | Sigma-Aldrich, USA
Gloves, Non-sterile, Latex | Ansell, USA
Kimwipes, Kimtech | Sigma-Aldrich, USA
Minisart Filter | Sigma-Aldrich, USA
Nalgene Rapid-Flow Sterile Disposable Filter Units | Thermo Fisher Scientific, Australia
Needle, Precision Glide 0.5 x 25 mm | Becton Dickinson, USA
Parafilm | Pechiney Plastic Packaging, USA
Petri Dish, 90 mm | Techno Plas, Australia
PIPETMAN Neo Multichannel (P200) | Sigma-Aldrich, USA
PIPETMAN Neo Single Channel Pipette (P2, P20, P200, P1000) | Sigma-Aldrich, Australia
Pipette Aid Eppendorf Easy3 | Eppendorf, Germany
Pipette Tips (P2, P20, P200, P1000) | Bio-Rad, Australia
Pipettes, Plastic 10 mL | Sarstedt Inc
PYREX Reusable Media Storage Bottles (500 mL, 1 L) | Pyrex, Australia
Transfer Pipettes | Greiner Bio-One, Australia
Tuberculin Syringe, 1 mL | Becton Dickinson, USA

2.1.2 Instruments

Item | Supplier
Autoclave | Atherton, Australia
BD FACSCanto II, Flow Cytometer | BD Biosciences, USA
BD FACSMelody, Flow Cytometer | BD Biosciences, USA
BD Influx™ Cell Sorter, Flow Cytometer | BD Biosciences, USA
BD LSRFortessa, Flow Cytometer | BD Biosciences, USA
Centrifuge, Beckman Avanti J-301 | Beckman Coulter, USA
Centrifuge, MicroCL 21R Microcentrifuge | Thermo Fisher Scientific, Australia
ChemiDoc™ XRS+ | Bio-Rad, Australia
Desiccator | Nalgene, Australia
DNA Electrophoresis Mini-Sub DNA Cell Tank | Bio-Rad, Australia
DNA Electrophoresis Wide Mini-Sub DNA | Bio-Rad, Australia
Haemocytometer | Hirschman techcolor, Germany
Heracell 150 CO2 Incubator | Thermo Fisher Scientific, Australia
Hoshizaki Ice Maker (Model #FM-150KE) | Hoshizaki, Japan
Microcentrifuge IEC MicroMax | IEC, USA
Microscope (Model #IMT-2) | Olympus, Japan
MJ Research PTC-220 Thermal Cycler | Bio-Rad, Australia
NanoDrop 2000 Spectrophotometer | Thermo Fisher Scientific, Australia
pH Indicator, LabChem-pH | TPS Pty. Ltd., Brisbane, Australia
Power Pack Supply (Model #1000/500) | Bio-Rad, Australia
QuantStudio 6 Flex Real-Time PCR System | Thermo Fisher Scientific, Australia
Rocker Unit RR9D | Chiltern Scientific, USA
Roller Mixer | Rater Instrument, Australia
SDS-PAGE Electrophoresis and Transfer Apparatus, Mini-Protean® Tetra Cell | Bio-Rad, Australia
Shaking Water Bath, SW22 | Julabo GmbH, Germany
ST19 Vortex | Sentra, USA
T100 Thermal Cycler | Bio-Rad, Australia
Tissue Culture Hood (Model #CF435) | Gelman Sciences, Australia
Ultraviolet Transilluminator, UVT 100 | International Biotechnologies
Ultrospec 1 | Amersham Biosciences, UK
Water Bath 37ºC | Techne, UK
Weight Scales | Sartorius, USA
Zoe Fluorescent Cell Imager | Bio-Rad, Australia

2.1.3 PCR and DNA Reagents

Item | Supplier
1 Kb Plus DNA Ladder | Invitrogen, Australia
100 mM dNTPs | Promega, USA
GelPilot DNA Loading Dye, 5x | QIAGEN Pty. Ltd., Australia
Oligonucleotide Primers | Sigma-Aldrich, USA
Single-Stranded Oligonucleotides (ssODN) (150 bases) | Integrated DNA Technologies, USA

2.1.4 DNA Modification Enzymes

Item | Supplier
ApoI | New England BioLabs, USA
BbsI-HF | New England BioLabs, USA
BsmBI | New England BioLabs, USA
BsrFaI-HF | New England BioLabs, USA
EcoRV-HF | New England BioLabs, USA
1X CutSmart Buffer | New England BioLabs, USA
1X NEBuffer 3.1 | New England BioLabs, USA
NotI-HF | New England BioLabs, USA
T4 DNA Ligase | Promega, USA
T4 DNA Ligase Buffer | Promega, USA
T4 Polynucleotide Kinase | New England BioLabs, USA

2.1.5 Plasmid Constructs

Item | Supplier
pcDNA3-EGFP | Provided by Karen Kroeger, WAIMR
pcDNA3.1 | Invitrogen, The Netherlands
pSpCas9(BB)-2A-GFP (PX458) | Gift from Feng Zhang (Addgene plasmid # 48138)

2.1.6 Cell Lines

Item | Supplier
HAP1 | Horizon Discovery, United Kingdom

2.1.7 Bacterial Strains

Item | Supplier
Escherichia coli (E. coli) XL2 Blue | Department of Endocrinology and Diabetes, Sir Charles Gairdner Hospital, Perth, Australia
Escherichia coli (E. coli) XL10-Gold Ultracompetent Cells | Agilent Technologies, USA


2.1.8 Tissue Culture

Item | Supplier
CellTrace™ CFSE | Thermo Fisher Scientific, Australia
Fetal Bovine Serum (FBS) from Gibco | Thermo Fisher Scientific, Australia
FuGENE 6 | Promega, USA
Iscove's Modified Dulbecco's Medium, Powder (IMDM) | Thermo Fisher Scientific, Australia
Lipofectamine™ 3000 | Invitrogen, The Netherlands
Opti-MEM | Thermo Fisher Scientific, Australia
Penicillin/Streptomycin | Sigma-Aldrich, Australia
Propidium Iodide | Sigma-Aldrich, Australia
RNase A | Thermo Fisher Scientific, Australia
Trypan Blue | Sigma Chemicals, Australia
Trypsin 0.05% EDTA | Life Technologies, USA
Turbofectin 8.0 | OriGene, USA

2.1.9 Antibodies

Item | Supplier
Goat Anti-Mouse, HRP Conjugate | Sigma-Aldrich
Goat Anti-Rabbit IgG, HRP Conjugate | Promega, USA
Mouse Anti-Tubulin | Sigma-Aldrich, USA
Rabbit Anti-SRRM2 Polyclonal Antibody, IgG (PA5-59559) | Thermo Fisher Scientific, Australia

2.1.10 Polyacrylamide Gel Electrophoresis

Item | Supplier
Ammonium Persulphate | Univar, USA
Bovine Serum Albumin | Sigma-Aldrich, USA
N, N'-Methylene-bis-Acrylamide | Bio-Rad, Australia
N,N,N,N’-Tetramethylethylenediamine (TEMED) | Sigma-Aldrich, USA
Ponceau | George T. Gurr, Ltd, UK
Precision Plus Protein™ Dual Colour Standards | Bio-Rad, Australia
Protran Premium (0.45 µm) Nitrocellulose Membrane | Amersham Biosciences, UK
Skim Milk Powder | Coles Supermarkets, Australia
Western Lightning™ Chemiluminescence Reagent | Perkin Elmer Life Sciences, Boston, USA

2.1.11 Commercial Kits

Item | Supplier
GoTaq DNA Polymerase Kit | Promega, USA
Long-Range PCR Kit | QIAGEN Pty. Ltd., Australia
Pierce BCA™ (Bicinchoninic Acid) Protein Assay Kit | Thermo Fisher Scientific, Australia
PureYield™ Plasmid Miniprep Kit | Promega, USA
QIAamp DNA Mini Kit | QIAGEN Pty. Ltd., Australia
QIAGEN PCR Cloning Kit | QIAGEN Pty. Ltd., Australia
QIAGEN QIAEX II Gel Extraction Kit | QIAGEN Pty. Ltd., Australia
QuantiTect Reverse Transcription Kit | QIAGEN Pty. Ltd., Australia
QuantiTect SYBR Green PCR Kit | QIAGEN Pty. Ltd., Australia
Quikchange II XL Site-Directed Mutagenesis Kit | Agilent Technologies, USA
RNeasy Mini Kit | QIAGEN Pty. Ltd., Australia
Wizard® Plus SV Miniprep DNA Purification Kit | Promega, USA

2.1.12 General Reagents

Item | Supplier
Absolute Ethanol | Biolab, Australia
Agarose | Promega, USA
Ampicillin Sulphate | Sigma-Aldrich, Australia
Bacto™ Agar | Becton Dickinson, USA
Bacto™ Peptone | Becton Dickinson, USA
Bacto™ Tryptone | Becton Dickinson, USA
Bacto™ Yeast Extract | Becton Dickinson, USA
Bleach (6 % Sodium Hypochlorite) | Clorox, USA
Dimethyl Sulphoxide (DMSO) | BDH Chemicals, Australia
Dithiothreitol (DTT) | Sigma-Aldrich
Double-Distilled Water (ddH2O) | In House
Ethidium Bromide | Sigma-Aldrich, Australia
Ethylenediaminetetra-acetic acid (EDTA) | Sigma-Aldrich, Australia
Glacial Acetic Acid | Sigma-Aldrich, Australia
Glycerol | Ajax Finechem, Australia
Glycine | ICN Biomedicals, USA
Kanamycin Sulphate | Sigma-Aldrich, Australia
Liquid Nitrogen | Air Liquide, Perth, WA
Magnesium Chloride (MgCl2) | Sigma-Aldrich
Methanol | Ajax Finechem, Australia
Na2H2PO4 | Ajax Chemicals, Australia
pH Buffers | TBS Pty. Ltd., Brisbane, Australia
Potassium Chloride (KCl) | Merck, Pty. Ltd, Vic
Potassium Phosphate (KH2PO4) | BDH Chemicals, Australia
Sodium Bicarbonate (NaHCO3) | Ajax Chemicals, Australia
Sodium Chloride (NaCl) | Sigma-Aldrich, Australia
Sodium Dodecyl Sulphate (SDS) | Bio-Rad, Australia
Sodium Hydroxide | Ajax Finechem, Australia
Triton-X 100 | Roche, USA
Trizma Base | Sigma-Aldrich, USA
Tween® 20 | Sigma-Aldrich, USA
β-Mercaptoethanol | Sigma-Aldrich, Australia
Calcium Chloride (CaCl2.2H2O) | Univar, USA
Magnesium chloride (MgCl2) | Sigma-Aldrich, Australia
Manganese Chloride (MnCl2) | Sigma-Aldrich, Australia
Hexaamminecobalt chloride | Sigma-Aldrich, Australia
Potassium acetate | Sigma-Aldrich, Australia

2.2 General Methods

This section contains general methods that were used in the subsequent chapters. Each of the subsequent chapters contains its own methods section describing the specific methods employed.

Primer sequences can be found in Appendix II.


2.2.1 Bacterial Culture Techniques

2.2.1.1 Preparation of E. coli XL2 Blue Competent Cells

A scraping of frozen E. coli XL2 glycerol stock was inoculated into 15 mL of 2 x YT medium containing 50 µg/mL of tetracycline and grown overnight at 37ºC with shaking at 220 rpm.

The following day, 5 mL of the overnight culture was inoculated into 500 mL of 2 x YT medium and grown for 4 hours with shaking at 220 rpm at 37ºC. After the first 3 hours of incubation, the optical density of the culture at a wavelength of 600 nm (OD600) was monitored every 15 minutes using an Ultrospec 10 according to the manufacturer’s instructions (Amersham Biosciences). When the OD600 reached 0.6, cells were pelleted by centrifugation at 20,000 G for 10 minutes at 4ºC. The pellet was then suspended in 50 mL of ice-cold 100 mM MgCl2, mixed by swirling and incubated on ice for 20 minutes. The cells were then pelleted as in the above step, the supernatant removed, and the cells suspended in 50 mL of ice-cold 100 mM MgCl2 and incubated on ice for 20 minutes. The cells were again centrifuged as described above, the supernatant was removed, and the cells were suspended in 50 mL of ice-cold 100 mM CaCl2, swirled and incubated on ice for 20 minutes. The cells were centrifuged as described above and the supernatant was removed before the cells were suspended in ice-cold 14 % glycerol/100 mM CaCl2. 200 µL aliquots of the cell suspension were then dispensed into 1.5 mL Eppendorf tubes (subsequently referred to as 1.5 mL tubes) and flash-frozen in liquid nitrogen before being stored at -80ºC.

2.2.1.2 Preparation of E. coli XL2 Blue Competent Cells by the Hanahan Method

A scraping of frozen E. coli XL2 glycerol stock was inoculated into 15 mL of SOB medium containing 50 µg/mL of tetracycline and grown overnight at 37ºC with shaking at 220 rpm.

The following day, 1 mL of the overnight culture was inoculated into 100 mL of 2 x YT medium and grown for 3 hours with shaking at 220 rpm. The OD600 of the culture was then monitored every 15 minutes using an Ultrospec 10. When the OD600 reached 0.45, cells were incubated on wet ice for 20 minutes and then pelleted by centrifugation for 10 minutes at 20,000 G at 4ºC. The supernatant was removed and the cells were suspended in 24 mL of ice-cold FSB buffer and incubated on ice for 15 minutes. The cells were then pelleted by centrifugation as described above, the supernatant was removed and the cells were suspended in 8 mL of ice-cold FSB buffer. 8 µL of dimethyl sulfoxide (DMSO) was added to the cells, mixed and incubated for 5 minutes on wet ice. The DMSO addition and incubation were then repeated before the cells were aliquoted (100 µL each) into 1.5 mL tubes, flash-frozen in liquid nitrogen and stored at -80ºC.

These cells are subsequently referred to as Hanahan XL2 Blue Cells.

2.2.1.3 XL2 Blue Competent Cell Transformation

1 µL of diluted plasmid DNA was added to 200 µL of XL2 Blue cells and incubated on ice for 30 minutes. The cells were then heat-pulsed at 42ºC for 90 seconds and incubated on ice for 2 minutes. 800 µL of 2 x YT medium was added to the cells and they were incubated at 37ºC for 1 hour on a shaker at 220 rpm. Approximately 100 µL of the transformed cells were plated on 2 x YT agar containing an antibiotic and incubated overnight at 37ºC. Dilutions were plated where necessary.

2.2.1.4 XL10-Gold Ultracompetent Cell Transformation

XL10-Gold Ultracompetent Cells (Agilent Technologies) were thawed on ice and 45 µL was added to a 15 mL falcon tube. 2 µL of β-mercaptoethanol (β-ME) (Agilent Technologies) was added to the cells and mixed by pipetting every 2 minutes for 10 minutes. 2 µL of plasmid DNA was added to the cells, mixed by pipetting, and the cells were incubated on ice for 30 minutes. Cells were then heat-pulsed at 42ºC for 30 seconds and subsequently incubated on ice for 2 minutes. 500 µL of pre-heated (42ºC) 2 x YT medium was added and the cells were incubated at 37ºC with shaking at 220 rpm for 1 hour. 250 µL of the cell mixture was then plated on 2 x YT agar containing ampicillin and grown overnight at 37ºC.

2.2.1.5 Competent Cell Inoculation

Single colonies grown on agar plates with antibiotics were selected and inoculated into 15 mL 2 x YT broth containing 1.67 mg/mL ampicillin or kanamycin and grown at 37ºC with shaking at 220 rpm. This method was used for all competent cells.

2.2.2 Plasmid Extractions

2.2.2.1 Extraction of Plasmid DNA using the Wizard Plus SV Miniprep Kit

Plasmids transformed into competent cells were extracted using the Wizard Plus SV Miniprep kit according to the manufacturer’s instructions (Promega). 2 mL of overnight culture was pelleted by centrifugation (17,000 G, 5 minutes), the supernatant removed and the pellet suspended in 600 µL of Cell Resuspension Solution. 250 µL of Cell Lysis Solution was added and the tube mixed 3 times by inversion before incubation for 2 minutes at room temperature. 10 µL of Alkaline Protease Solution was added, mixed by inversion 3 times and incubated for 5 minutes at room temperature. 350 µL of Cell Neutralization Solution was added, mixed by inversion 3 times and the mixture centrifuged at 17,000 G for 10 minutes at room temperature. The cleared lysate was then transferred to a spin column placed in a 2 mL collection tube and centrifuged at 17,000 G for 1 minute at room temperature. The flow through was removed from the collection tube, the spin column was placed back into the empty collection tube, and 750 µL of Column Wash Solution (previously diluted with 95 % ethanol) was added and centrifuged at 17,000 G for 1 minute at room temperature. The flow through was discarded, the spin column placed back into the collection tube and 250 µL of Column Wash Solution was added to the spin column and then centrifuged at 17,000 G for 2 minutes at room temperature. The spin column was transferred to a sterile 1.5 mL tube and 100 µL of nuclease-free water (NFW) was added, and the plasmid DNA was eluted by centrifugation at 17,000 G for 1 minute.

2.2.2.2 Extraction of Plasmid DNA using the PureYield Plasmid Miniprep Kit

Plasmids transformed into competent cells were extracted using the PureYield Plasmid Miniprep kit according to the manufacturer’s instructions (Promega) to produce endotoxin-free plasmids for transfection into human cell lines. 2 mL of overnight bacterial culture was centrifuged at 12,000 G for 3 minutes, the supernatant removed and the pellet suspended in 600 µL of double-distilled water (ddH2O). 100 µL of Cell Lysis Buffer was added, mixed by inversion 6 times and incubated for 2 minutes at room temperature. 350 µL of ice-cold Neutralization Solution was added and mixed by inversion 6 times. The solution was then centrifuged at 17,000 G for 3 minutes at room temperature and the supernatant transferred to a PureYield Minicolumn in a 2 mL collection tube before centrifugation at 17,000 G for 15 seconds. The flow through was discarded and 200 µL of Endotoxin Removal Wash was added before centrifugation at 17,000 G for 15 seconds. 400 µL of Column Wash (previously diluted with 95 % ethanol) was added to the column and centrifuged at 17,000 G for 30 seconds. The minicolumn was then transferred to a 1.5 mL tube, 30 µL of elution buffer was added to the minicolumn, and after 5 minutes of incubation at room temperature the minicolumn was centrifuged at 17,000 G to elute the plasmids.


2.2.3 Tissue Culture Techniques

2.2.3.1 Maintenance of HAP1 Cells

HAP1 cells were cultured in Iscove's Modified Dulbecco's Medium (IMDM) supplemented with 10 % fetal bovine serum (FBS) and 1 % Penicillin/Streptomycin (subsequently referred to as HAP1 medium) and were maintained at 37ºC with 5 % CO2. The medium was replaced every third day and cells were passaged at a 1:10 or 1:15 dilution into 25 cm2 or 75 cm2 flasks once they reached ~70 % confluency.

2.2.3.2 Passage of human cells

Culture medium was aspirated and cells were washed twice with 1 x phosphate buffered saline (PBS). The PBS was removed by aspiration and cells were detached from flasks using 1-3 mL of 1 x trypsin (in 1 x PBS) with incubation at 37ºC and 5 % CO2 for 5-10 minutes. Once the cells began to detach, trypsin was inactivated by the addition of 6 mL of HAP1 medium and the cells were transferred to a 15 mL centrifuge tube and centrifuged at 400 G for 2 minutes at room temperature. The medium was removed by aspiration and the cells were suspended in the required buffer.

2.2.3.3 Cell Counting

The number of viable cells was determined by the trypan blue cell viability assay. 80 µL of 0.4 % trypan blue in 1 x PBS was added to 200 µL of cell suspension and mixed. 20 µL of this suspension was added to a haemocytometer grid and the quantity of viable cells was determined as follows, where cells positive for trypan blue were excluded:

Number of viable cells/mL = 5 x 10^4 x (average number of cells counted in two of the haemocytometer grids)
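The viable cell calculation can be sketched in Python as follows. This is a minimal illustration rather than part of the original protocol: the function name `viable_cells_per_ml` and the example counts are hypothetical, and the 5 x 10^4 factor is taken directly from the formula given in the text.

```python
from statistics import mean


def viable_cells_per_ml(grid_counts, factor=5e4):
    """Estimate viable cells/mL from trypan-blue-negative cell counts.

    grid_counts: viable (unstained) cells counted in each haemocytometer grid.
    factor: multiplier from the formula in the text (5 x 10^4).
    """
    if not grid_counts:
        raise ValueError("at least one grid count is required")
    if any(count < 0 for count in grid_counts):
        raise ValueError("cell counts cannot be negative")
    return factor * mean(grid_counts)


if __name__ == "__main__":
    # Hypothetical counts from two grids, for illustration only.
    example_counts = [42, 38]
    print(f"{viable_cells_per_ml(example_counts):.0f} viable cells/mL")
```

With the hypothetical counts of 42 and 38 the average is 40, giving 2 x 10^6 viable cells/mL.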

2.2.4 DNA Techniques

2.2.4.1 DNA Quantitation

DNA was quantified using a NanoDrop 2000 spectrophotometer (ThermoFisher) with either ddH2O or elution buffer as a blank. 1-2 µL of blanking solution was added to the bottom pedestal and the absorbance read prior to 1-2 µL of DNA solution being added. All quantifications were completed in duplicate and the A260/A280 and A260/A230 absorbance values recorded for quality control purposes.

2.2.4.2 Polymerase Chain Reactions

The polymerase chain reaction (PCR) was used to amplify DNA fragments from plasmids or genomic DNA (gDNA) using Taq DNA polymerase (Qiagen) unless otherwise stated. Reactions with Taq DNA polymerase were conducted in a final volume of 50 µL containing 1 x PCR Buffer, 2.5 mM MgCl2, 10 µM of each forward and reverse primer, 200 µM of each dNTP, 2.5 units of Taq DNA polymerase and a variable amount of DNA template, with the volume made up to 50 µL with ddH2O. Reactions were then placed in a Bio-Rad T100 Thermal Cycler and a reaction-specific thermal cycling program was implemented.

Following PCR amplification, PCR products were used immediately or stored at -20ºC until required.
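The final concentrations listed above can be converted into pipetting volumes with the relationship C1 x V1 = C2 x V2. The sketch below is illustrative only: the final concentrations and the 50 µL volume come from the text, whereas every stock concentration (10 x buffer, 25 mM MgCl2, 100 µM primers, 10 mM of each dNTP, 5 units/µL Taq) is an assumption made for the example and is not stated in this thesis. The function and variable names are likewise hypothetical.

```python
def stock_volume_ul(stock_conc, final_conc, total_volume_ul):
    """Solve C1 * V1 = C2 * V2 for V1, the stock volume in µL.

    stock_conc and final_conc must be expressed in the same units.
    """
    if stock_conc <= 0:
        raise ValueError("stock concentration must be positive")
    if final_conc > stock_conc:
        raise ValueError("final concentration cannot exceed stock concentration")
    return final_conc * total_volume_ul / stock_conc


TOTAL_VOLUME_UL = 50.0  # final reaction volume, from the text

# component: (ASSUMED stock concentration, final concentration from the text)
COMPONENTS = {
    "PCR buffer (x)": (10.0, 1.0),
    "MgCl2 (mM)": (25.0, 2.5),
    "forward primer (uM)": (100.0, 10.0),
    "reverse primer (uM)": (100.0, 10.0),
    "dNTP mix (uM each)": (10_000.0, 200.0),
}

TAQ_UNITS = 2.5         # units per reaction, from the text
TAQ_UNITS_PER_UL = 5.0  # ASSUMED enzyme stock concentration


def reaction_volumes():
    """Return the volume (µL) of each component for one 50 µL reaction."""
    volumes = {
        name: stock_volume_ul(stock, final, TOTAL_VOLUME_UL)
        for name, (stock, final) in COMPONENTS.items()
    }
    volumes["Taq DNA polymerase"] = TAQ_UNITS / TAQ_UNITS_PER_UL
    # Whatever volume remains is shared between DNA template and ddH2O.
    volumes["template + ddH2O"] = TOTAL_VOLUME_UL - sum(volumes.values())
    return volumes


if __name__ == "__main__":
    for name, volume in reaction_volumes().items():
        print(f"{name}: {volume:.1f} uL")
```

Under these assumed stocks, the fixed components occupy 21.5 µL, leaving 28.5 µL for template and ddH2O; different stock concentrations would change these volumes accordingly.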


2.2.4.3 Agarose Gel Electrophoresis

Agarose gel electrophoresis was employed to check and/or purify DNA fragments after enzymatic reactions. 1 % agarose gels were prepared from 50 mL of 1 x TAE buffer and 0.5 g agarose with 0.4 µg/mL ethidium bromide. DNA samples were suspended in 1 x loading buffer, loaded on a 1 % agarose gel and electrophoresed at 100 V for 50-90 minutes.

DNA fragments were visualized using a ChemiDoc XRS+ (Bio-Rad) or UV transilluminator (International Biotechnologies) as required.

2.2.4.4 Extraction of PCR Products from Agarose Gels

PCR products were purified from 1 % agarose gels following electrophoresis at 100 V for 1 hour using the QIAEX II Gel Extraction Kit (QIAGEN) according to the manufacturer’s instructions. The agarose gel was placed on a UV transilluminator and the desired bands were excised from the gel using a sterile scalpel blade and placed in a 1.5 mL tube.

The agarose band was weighed and, for PCR products under 4 kb, 3 volumes of Buffer QX1 were added to 1 volume of agarose gel. For PCR products >4 kb, 3 volumes of Buffer QX1 and 2 volumes of ddH2O were added to 1 volume of agarose gel. QIAEX II was suspended by vortexing for 30 seconds, 10 µL was added to the samples, and the samples were incubated for 10 minutes at 56ºC or until the agarose band dissolved, with mixing by inversion every 2-3 minutes. The samples were then centrifuged at 17,000 G for 30 seconds and the supernatant removed by pipetting. The pellet was then washed with 500 µL of Buffer QX1 and gently suspended by light vortexing, or by inversion for the larger PCR products, and the sample centrifuged for 30 seconds at 17,000 G. The supernatant was removed and the pellet was washed twice with 500 µL of Buffer PE, with pelleting by centrifugation at 17,000 G for 30 seconds between the washing steps, and the supernatant removed with a pipette. The pellet was then air-dried for 10-15 minutes or until the pellet turned white. The PCR products were eluted in 20 µL of nuclease-free water (NFW) and the pellet suspended by gentle vortexing, or by flicking the tube for large PCR products. The sample was then incubated at 56ºC for 15 minutes and centrifuged at 17,000 G, and the supernatant containing the eluted DNA was collected in a clean 1.5 mL tube and used immediately or stored at -20ºC until required.

2.2.4.5 Restriction Enzyme Digests

Restriction enzyme digests were performed in reactions containing 1 x Reaction Buffer, a variable quantity of DNA and 10-20 units of restriction enzyme, with the volume made up to 50 µL with ddH2O. For digestion of plasmids, 1 µg of DNA was used, and for digestion of PCR products, 12.5 µL of the PCR product was used. When digestions were conducted with 2 enzymes, a reaction buffer compatible with both enzymes was used.

2.2.4.6 DNA Ligations

All ligations were performed using the T4 DNA Ligase kit (Promega), except for the pDrive-SRRM2 ligation, which used reagents from the QIAGEN PCR Cloning Kit. For ligations of PCR products into plasmids, 50 ng of vector and a 5 x molar excess of PCR product to plasmid were used in the ligation reaction. The formula to calculate the quantity of PCR product required and the ligation reaction components are shown below:

PCR product (ng) = (50 ng x PCR product size (bp) x 5) / Plasmid size (bp)
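As a worked illustration of this formula, the short Python sketch below computes the mass of PCR product needed for a given insert and vector size. The function name `insert_mass_ng` and the example sizes (a 1,000 bp insert and a 5,000 bp vector) are hypothetical and chosen only to make the arithmetic easy to follow; the 50 ng vector mass and 5 x molar excess are the values stated in the text.

```python
def insert_mass_ng(insert_bp, vector_bp, vector_ng=50.0, molar_excess=5.0):
    """Mass of PCR product (ng) giving the requested molar excess over vector.

    Implements: insert (ng) = vector (ng) x insert size (bp) x excess / vector size (bp)
    """
    if insert_bp <= 0 or vector_bp <= 0:
        raise ValueError("fragment sizes must be positive")
    return vector_ng * insert_bp * molar_excess / vector_bp


if __name__ == "__main__":
    # Hypothetical sizes for illustration only.
    mass = insert_mass_ng(insert_bp=1_000, vector_bp=5_000)
    print(f"{mass:.1f} ng of PCR product for 50 ng of vector")
```

For these example sizes the insert is one fifth the length of the vector, so a 5 x molar excess corresponds to an equal mass (50 ng) of PCR product.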

Table 2.1 Reaction Components for Ligations of PCR Products into Vectors

Component | Volume for a 10 µL Reaction | Volume for a 20 µL Reaction
Vector (50 ng) | Variable | Variable
PCR Product | Variable | Variable
Ligase 10X Buffer | 1 µL | 2 µL
T4 DNA Ligase (1 unit/µL) | 1 unit | 1 unit
ddH2O | To a final volume of 10 µL | To a final volume of 20 µL

Ligations into pDrive were performed in a final volume of 10 µL with the reaction components shown in Table 2.2.

Table 2.2 Reaction Components for Ligation of PCR Products into pDrive

Component | Volume/Reaction
PCR Product (195 ng) | 3 µL
pDRIVE Cloning Vector (50 ng) | 1 µL
Ligation Master Mix, 2x | 5 µL
ddH2O | 1 µL

For ligations of annealed oligonucleotides into digested vectors, 50 ng of vector and 0.5 µM of oligonucleotides were used in the ligation reaction (Table 2.3).

Table 2.3 Reaction Components for Ligation of Annealed Oligonucleotides into Vectors

Component | Volume/Reaction
Digested pSpCas9 (50 ng/µL) | 1 µL
Phosphorylated and Annealed Oligonucleotide Complex (1:200 dilution) | 1 µL
Ligase 10 X Buffer | 1 µL
T4 DNA Ligase (1 unit/µL) | 1 unit
NFW | 5 µL

All ligation reactions were incubated at 16ºC overnight in a T100 Thermal Cycler (Bio-Rad) and transformed into competent bacteria. Control ligations that did not contain the DNA to be inserted were included for all ligations, with the volume of insert DNA replaced with ddH2O.


2.2.4.7 DNA Sequencing

Sanger sequencing was conducted at the Australian Genome Research Facility (AGRF) and was performed to verify the correct DNA in plasmids and PCR products. For sequencing of plasmids, 800-1600 ng of plasmid DNA was added to 9.8 pmol of primer and the volume made to 12 µL with nuclease-free water. For sequencing of PCR products, 20 µL of unpurified PCR product was sent to AGRF with 3.2 pmol of sequencing primer provided in a separate 1.5 mL tube.

2.2.4.8 Genomic DNA Extractions

Genomic DNA (gDNA) was extracted using the QIAamp DNA Mini Kit according to the manufacturer’s instructions for gDNA extraction from cultured cells. 20 µL of Proteinase K and 200 µL of Buffer AL were added to the cell suspension and mixed by vortexing for 30 seconds prior to incubation for 10 minutes at 56ºC. 200 µL of absolute ethanol was then added and the mixture transferred to a spin column placed in a 2 mL collection tube and centrifuged at 15,000 G for 1 minute. The spin column was transferred to a new collection tube and 500 µL of Buffer AW1 was added to the spin column before centrifugation at 15,000 G for 1 minute. The spin column was transferred to a new collection tube and 500 µL of Buffer AW2 was added and centrifuged at 15,000 G for 3 minutes. The spin column was then placed in a 1.5 mL tube, 200 µL of Buffer AE was added, and the column was centrifuged at 15,000 G for 1 minute to elute the gDNA. gDNA was quantified using a NanoDrop 2000 spectrophotometer.


2.2.5 RNA Techniques

2.2.5.1 RNA Extraction

RNA was extracted from cultured HAP1 cells once they reached 70 % confluency. Cells were trypsinised, 6 mL of HAP1 medium was added, and the cells were transferred into 15 mL tubes and pelleted by centrifugation. The medium was removed and the pellet washed twice with 1 x PBS, with pelleting by centrifugation between the washing steps. The 1 x PBS was removed and the pellets were suspended in 2 mL of 1 x PBS and stored on ice.

RNA extraction was performed using the RNeasy Mini Kit (Qiagen) according to the manufacturer’s instructions. 21 µL of 14.3 M β-ME was added to 2.08 mL of Buffer RLT and mixed by pipetting. 350 µL of this solution was added to the cell suspensions and mixed by pipetting to lyse the cells and inactivate RNases. The cell lysates were homogenized by vigorous vortexing for 30 seconds and 350 µL of absolute ethanol was added and mixed by pipetting. 700 µL of the homogenized lysates was transferred to an RNeasy spin column placed in a 2 mL collection tube and centrifuged at 17,000 G for 30 seconds. The flow through was discarded, 700 µL of Buffer RW1 (previously diluted with absolute ethanol) was added, and the samples were centrifuged at 17,000 G for 15 seconds. The flow through was discarded and 500 µL of Buffer RPE was added to the spin columns prior to centrifugation at 17,000 G for 15 seconds. The flow through was discarded and an additional 500 µL of Buffer RPE was added to the spin columns prior to centrifugation at 17,000 G for 2 minutes. The RNeasy spin columns were placed in new 2 mL collection tubes and centrifuged at 17,000 G for 1 minute. The RNeasy spin columns were then placed in 1.5 mL collection tubes, 50 µL of RNase-free water was added directly to the RNeasy spin column membrane, and the column was incubated for 5 minutes at room temperature. Following incubation, the spin column was centrifuged at 17,000 G for 1 minute to collect the purified RNA. The extracted RNA was quantified on a NanoDrop 2000 spectrophotometer in duplicate and stored at -80ºC.

2.2.5.2 Agarose Gel Electrophoresis for Evaluation of RNA Integrity

Extracted RNA samples were analysed for integrity using a denaturing and RNase-inhibiting bleach gel as previously described [110]. 0.5 g of agarose was added to 50 mL of 1 x TAE and 600 µL of Clorox bleach (6 % sodium hypochlorite), and the mixture was incubated for 5 minutes at room temperature with swirling each minute to mix. The mixture was then heated in a microwave for 60 seconds and cast into an agarose gel.

3 µg of RNA was mixed with 1 x loading buffer and electrophoresed on the 1 % agarose bleach gel for 1 hour at 90 V.

2.2.6 Protein Techniques

2.2.6.1 Protein Extraction from HAP1 Cells

Whole cell protein extracts were prepared from HAP1 cells growing in 25 cm2 flasks when cells reached ~70 % confluency. The HAP1 medium was aspirated, and the cells were washed twice with ice-cold 1 x PBS and kept on ice during the extraction of whole cell lysates. Prior to cell lysis, a protease inhibitor was dissolved in 10 mL of ice-cold Lysis Buffer and the Lysis Buffer kept on ice until the protease inhibitor had completely dissolved. 150 µL of Lysis Buffer was then added directly to the cell monolayer, cell scrapers were used to collect the cell lysates in the corner of the flask and the cell lysates were transferred to 1.5 mL tubes. A 25-gauge needle was then used to draw the cell lysates up and down 10 times to further lyse any remaining cells. The lysates were then cleared by centrifugation at 17,000 G for 30 minutes at 4ºC. The supernatant containing the protein was then collected in a 1.5 mL tube and quantified using the Bicinchoninic Acid (BCA™) assay.

2.2.6.2 Protein Quantitation Using the BCA™ Assay

Protein quantitation of total proteins in cleared lysates was performed using the BCA assay according to the manufacturer’s instructions (Thermo Fisher Scientific). The BCA assay is a colorimetric detection assay that measures the chelation of two molecules of bicinchoninic acid with a cuprous ion (Cu+), forming a purple product that strongly absorbs light at a wavelength of 562 nm. Cuprous ions are formed by the biuret reaction, whereby reduction of cupric ions (Cu2+) by peptide bonds in protein occurs in an alkaline medium. The amount of reduction that occurs scales near linearly with protein concentration, resulting in increasing quantities of purple product production with increasing protein concentration. The amount of protein present in a sample can therefore be determined by measuring the absorbance of light of a protein sample using a spectrophotometer with an incident wavelength of 562 nm and comparing the absorbance to a standard curve of absorbance values generated from known quantities of protein.

A standard curve was first generated using protein standards of 1,500 ng/µL, 1,000 ng/µL, 750 ng/µL, 500 ng/µL, 250 ng/µL, 125 ng/µL, 25 ng/µL and 0 ng/µL, prepared from a 2,000 ng/µL stock solution of bovine serum albumin (BSA) by serial dilution with cell Lysis Buffer to control for background absorbance of the Lysis Buffer in the protein lysates. The serial dilution scheme is summarized in Table 2.4 and was prepared in 1.5 mL tubes, with the volume of Lysis Buffer added to each tube first and then the BSA, starting at standard B and ending with standard I.


Table 2.4 Dilution Scheme for Preparation of BSA Standards

Standard | Final BSA concentration (ng/µL) | Volume of lysis buffer (µL) | Volume (µL) and source of BSA
Stock | 2,000 | 0 | N/A
B | 1,500 | 7.5 | 22.5 of stock
C | 1,000 | 15 | 15 of stock
D | 750 | 15 | 15 of B
E | 500 | 15 | 15 of C
F | 250 | 15 | 15 of E
G | 125 | 15 | 15 of F
H | 25 | 24 | 6 of G
I | 0 | 15 | 0 = blank
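The dilution scheme in Table 2.4 can be checked arithmetically: each standard's concentration equals the source concentration multiplied by the BSA volume and divided by the total volume. The Python sketch below recomputes the final concentrations from the volumes in the table. The names `SCHEME` and `standard_concentrations` are hypothetical, and the blank (standard I) is modelled as 0 µL of stock, which is an assumption about how to encode "0 = blank".

```python
STOCK_NG_PER_UL = 2000.0  # BSA stock concentration, from the text

# (standard, lysis buffer µL, BSA µL, source of BSA), transcribed from Table 2.4
SCHEME = [
    ("B", 7.5, 22.5, "Stock"),
    ("C", 15.0, 15.0, "Stock"),
    ("D", 15.0, 15.0, "B"),
    ("E", 15.0, 15.0, "C"),
    ("F", 15.0, 15.0, "E"),
    ("G", 15.0, 15.0, "F"),
    ("H", 24.0, 6.0, "G"),
    ("I", 15.0, 0.0, "Stock"),  # blank: no BSA added (encoding is an assumption)
]


def standard_concentrations(scheme, stock_conc):
    """Return {standard: final concentration in ng/µL} for a serial dilution."""
    conc = {"Stock": stock_conc}
    for name, buffer_ul, bsa_ul, source in scheme:
        total_ul = buffer_ul + bsa_ul
        conc[name] = conc[source] * bsa_ul / total_ul
    return conc


if __name__ == "__main__":
    for name, value in standard_concentrations(SCHEME, STOCK_NG_PER_UL).items():
        print(f"{name}: {value:g} ng/uL")
```

The recomputed values agree with the final concentrations listed in Table 2.4 (1,500 down to 0 ng/µL).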

HAP1 lysates were prepared in triplicate for each dilution in 1.5 mL tubes by adding 2 µL of lysate to 8 µL of Lysis Buffer (1:5 dilution), 40 µL of Lysis Buffer (1:20 dilution), and 100 µL of Lysis Buffer (1:50 dilution). From each of the HAP1 lysate dilutions and triplicates, as well as the protein standards, 10 µL was added to a fresh 1.5 mL tube with 200 µL of BCA reagent. The BCA reagent was prepared by adding 3,150 µL of BCA reagent A to 63 µL of BCA reagent B (50:1) in a 15 mL tube. The proteins were then quantified using a NanoDrop 2000 spectrophotometer using the inbuilt BCA program with Lysis Buffer used as a blank. The total proteins in the HAP1 lysates were then quantified from the BSA standard curve.


2.2.6.3 Polyacrylamide Gel Electrophoresis

Polyacrylamide gel electrophoresis was used to separate protein samples on sodium dodecyl sulfate polyacrylamide (SDS-PAGE) gels. The Mini-Protean® Tetra cell electrophoresis gels and tanks were set up according to the manufacturer’s instructions (Bio-Rad). A 7.5 % separating gel was prepared and poured into the gel cassette with a layer of ddH2O added to prevent evaporation. After 1 hour of incubation at room temperature, the ddH2O was removed by blotting, a stacking gel solution was added immediately to the gel cassette and a comb was inserted. The stacking gel was left to set for 1 hour and was topped up with additional stacking solution as required. Once the gel was set, the comb was removed and the electrophoresis apparatus was set up according to the manufacturer’s instructions and placed in an electrophoresis tank containing 1 x Transfer Buffer. While the gel was setting, 75 µg of protein was added to an equivalent volume of 2 x SDS-PAGE loading buffer, heated in a boiling water bath for 5 minutes and centrifuged to collect droplets before the entire volume was loaded into a well of the SDS-PAGE gel. In addition to the 2 biological replicates for the wild type and mutant cell lines, a protein standard (Bio-Rad Precision Plus Protein™ Dual Colour Standard) was electrophoresed on the SDS-PAGE gel to allow for protein size determination. 6 µL of the protein standard was loaded directly into one of the wells. Proteins were first electrophoresed through the stacking gel at 60 V and then through the separating gel at 180 V until the 25 kDa protein standard reached the end of the gel. This was conducted to allow for sufficient resolution of the two large SRRM2 protein isoforms (300 kDa and 250 kDa) while allowing for the smaller SRRM2 protein isoform (34 kDa) to be present on the gel.

2.2.6.4 Electroblotting

The proteins separated by polyacrylamide gel electrophoresis were transferred from the gel to a Protran premium (0.45 µm) nitrocellulose membrane (Amersham) using an electro-blot apparatus (Bio-Rad). The gel and membrane were placed against one another, and layers of Whatman® chromatography paper and fiber pads were placed on either side and held together by a cassette (Bio-Rad). The cassette was then placed inside the tank filled with ice-cold 1 x Transfer Buffer and electrophoresed at 30 V overnight with stirring at 4ºC. The following day, the nitrocellulose membrane was stained with Ponceau solution for 5 minutes to confirm the transfer of protein. Following confirmation of protein transfer, the membrane was washed with sterile ddH2O prior to immunodetection of proteins.

2.2.6.5 Immunodetection of Proteins

The membrane was placed in a 50 mL falcon tube and first washed in 5 mL of Blocking Buffer for 15 minutes at room temperature on a rotor. The Blocking Buffer was then removed and the membrane incubated with the primary antibody in 5 mL of Blocking Buffer on a rotor. Following incubation, the membrane was washed 3 times in 10 mL of TBS-T for 10 minutes each on a rotor at room temperature before incubation with the secondary antibody in 5 mL of Blocking Buffer on a rotor. The membrane was then washed 3 times in 10 mL of 1 x TBS-T for 15 minutes each. After the final wash, the membrane was rinsed with 1 x TBS. The immunostained proteins were incubated for 1 minute with Enhanced Chemiluminescence Reagent (ECL), the membrane was dried by gentle shaking, and the proteins were visualized in a ChemiDoc XRS+. The ECL reagent was prepared by combining equal volumes of enhanced luminol and oxidizing reagents according to the manufacturer’s instructions (Perkin-Elmer).

Between visualization of different proteins, the membrane was washed 3 times in 10 mL of 1 x TBS-T for 15 minutes each and rinsed with 1 x TBS.

For SRRM2, the primary rabbit anti-SRRM2 antibody (1:200 dilution) was incubated in BSA Blocking Buffer in 1 x TBS overnight at 4ºC. The secondary goat anti-rabbit HRP antibody (1:5000 dilution) was incubated in 3% Skim Milk Blocking Buffer for 30 minutes at room temperature.

For α-tubulin, the primary mouse anti-tubulin antibody (1:5000 dilution) was incubated in Skim Milk Blocking Buffer for 30 minutes. The secondary goat anti-mouse HRP antibody (1:5000 dilution) was incubated for 30 minutes in 3% Skim Milk Blocking Buffer at room temperature.

Chapter 3 - Construction of Mammalian Expression Vectors Containing the Wild type or Mutant Human SRRM2 ORF for Generation of Model Cell Systems of SRRM2 Mutations

3.1 Introduction

In vitro site-directed mutagenesis is the process whereby a DNA sequence is altered by the addition, deletion or substitution of one or more nucleotides [111]. This process can be achieved by multiple mechanisms, one of which is the widely used oligonucleotide-based site-directed mutagenesis method [112]. This method involves the design of oligonucleotides containing the desired mutation and the use of these oligonucleotides as primers for DNA synthesis from a DNA template, such as a plasmid, and can therefore generate plasmids with the desired gene mutations [111]. For site-directed mutagenesis of a plasmid template, two oligonucleotide primers, one complementary to the forward strand and the other to the reverse strand, are designed to contain the desired mutation(s) in the middle of the primer sequence, flanked by regions homologous to the target sequence [111]. The single-stranded primers hybridize to the target sequence through Watson-Crick base pairing between the regions of the primer homologous to the target sequence, and DNA synthesis generates a complementary strand containing the desired mutation(s), resulting in a mixture of mutant and template vectors. As almost all DNA isolated from E. coli is methylated or hemimethylated, the template plasmid can be eliminated from the reaction by digestion with endonucleases that specifically digest methylated/hemimethylated DNA, such as Dpn I (target sequence: 5’-Gm6ATC-3’), leaving the mutant plasmid, which is not methylated/hemimethylated [113].

Figure 3.1. Site-directed Mutagenesis Overview Using the QuikChange II XL Site-Directed Mutagenesis Kit (Agilent Technologies)*. A) Mutagenic primers (blue and pink) containing the desired mutation (marked by X’s) bind the template plasmid (green and yellow), and thermal cycling is performed to denature the template DNA and synthesize the complementary strands with the addition of polymerase. B) DNA synthesis results in mutant (blue and pink) vector production; both wild type and mutant plasmids can then function as a template for synthesis of additional mutant plasmids. C) The wild type plasmid is eliminated by Dpn I digestion, leaving only the mutant plasmid. *Modified from the QuikChange II XL Site-Directed Mutagenesis Kit protocol.

If the DNA sequence modified by site-directed mutagenesis is a cloned gene in an expression vector and alters the amino acid sequence of the gene, expression of this gene in a model cell system can result in the production of a protein with an altered function, such as reduced activity, loss of function or gain of function. The ability to make precise modifications to gene sequences has enabled scientists to explore the functional consequences of gene mutations, such as mutations associated with disease, by creating model cell systems with the wild type and mutant genes cloned into separate expression vectors and transfected into model cell systems. This allows for the comparison of wild type and mutant model cell systems and the identification of disease-causing processes.

The hypothesis of this chapter is that SRRM2 wild type and mutant (R1805W) expression vectors can be generated using cloning techniques and site-directed mutagenesis to create model cell systems to study the association of R1805W with FNMTC.

The aims of this chapter are to generate wild type and mutant R1805W SRRM2 mammalian expression vectors through cloning techniques and site-directed mutagenesis.

3.2 Method

3.2.1 Construction of Mammalian Expression Vectors Containing the Wild Type Human SRRM2 ORF

3.2.1.1 SRRM2 cDNA synthesis

The human SRRM2 cDNA was obtained from Kazusa in the bacterial pF1K-SRRM2 vector (Promega). The pF1K-SRRM2 vector was transformed into XL2 Blue Competent cells and inoculated into 2 x YT broth containing kanamycin. pF1K-SRRM2 was then extracted using a Wizard Plus SV Miniprep kit and quantified on a NanoDrop 2000 spectrophotometer.

Primers were designed to achieve the following:

1. Amplify the SRRM2 cDNA from the pF1K-SRRM2 vector.

2. Produce a C-terminally FLAG-tagged SRRM2 construct by adding the FLAG DNA sequence to the 5’ end of the reverse primer.

3. Add EcoRV and NotI restriction sequences at the 5’ and 3’ ends of the SRRM2 ORF, respectively.

4. Remove the stop codon from the SRRM2 ORF.

5. Add a stop codon to the 5’ end of the FLAG sequence.
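The five design goals above can be illustrated with a short sketch of how such a primer pair might be assembled. The actual SRRM2_F and SRRM2_R sequences are not reproduced here; the ORF end sequences below are hypothetical placeholders, the FLAG coding sequence is one of several possible codings for DYKDDDDK, and the choice of TGA as the stop codon is an assumption. Because the reverse primer is the reverse complement of the sense strand, a stop codon placed after FLAG on the sense strand lies on the 5’ side of the FLAG sequence within the primer.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

ECORV_SITE = "GATATC"
NOTI_SITE = "GCGGCCGC"
FLAG_CODING = "GACTACAAGGACGACGATGACAAG"  # assumed coding for DYKDDDDK
STOP_CODON = "TGA"                        # assumed choice of stop codon


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def design_primers(orf_start, orf_end_no_stop):
    """Build a forward/reverse primer pair for a C-terminal FLAG fusion.

    orf_start: first bases of the ORF (sense strand, begins with ATG).
    orf_end_no_stop: last in-frame codons of the ORF, stop codon removed.
    """
    if len(orf_end_no_stop) % 3 != 0:
        raise ValueError("ORF end must be a whole number of codons")
    forward = ECORV_SITE + orf_start
    # Sense-strand layout at the 3' end: ORF end, FLAG, stop codon, NotI.
    sense_3prime = orf_end_no_stop + FLAG_CODING + STOP_CODON + NOTI_SITE
    reverse = reverse_complement(sense_3prime)
    return forward, reverse


# Hypothetical placeholder ORF ends, NOT the real SRRM2 sequence.
forward, reverse = design_primers("ATGTACAACGGGATCGGG", "GAGGACTCCAGCTCCGAG")
print("Forward:", forward)
print("Reverse:", reverse)
```

In practice a few extra bases are often added 5’ of each restriction site to aid digestion close to the end of a PCR product; these are omitted here to keep the sketch short.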

The Qiagen LongRange PCR Kit was used to amplify the SRRM2 ORF from the pF1K-SRRM2 vector using the SRRM2_F and SRRM2_R primers in a final volume of 50 µL (Table 3.1) using a two-step thermal cycling program (Table 3.2).

Table 3.1 Reaction Components for the Amplification of SRRM2 cDNA

Component                    Volume (1 reaction)   Final concentration
LongRange PCR Buffer, 10x    5 µL                  1x; 2.5 mM Mg2+
dNTP mix (10 mM each)        2.5 µL                500 µM of each dNTP
SRRM2_F (10 µM)              2 µL                  0.4 µM
SRRM2_R (10 µM)              2 µL                  0.4 µM
NFW                          37.75 µL              N/A
LongRange PCR Enzyme Mix     0.4 µL                2 units
pF1K-SRRM2 vector            0.35 µL (20 ng)       N/A
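The final concentrations listed in Table 3.1 follow from a simple dilution calculation (stock concentration multiplied by the volume added, divided by the total reaction volume). The short sketch below checks this arithmetic using the volumes from the table; the function and variable names are illustrative only.

```python
def final_concentration(stock_conc, volume_added_ul, total_volume_ul):
    """Concentration after dilution, in the same units as stock_conc."""
    return stock_conc * volume_added_ul / total_volume_ul


# Volumes (µL) taken from Table 3.1.
volumes_ul = {
    "LongRange PCR Buffer, 10x": 5.0,
    "dNTP mix (10 mM each)": 2.5,
    "SRRM2_F (10 µM)": 2.0,
    "SRRM2_R (10 µM)": 2.0,
    "NFW": 37.75,
    "LongRange PCR Enzyme Mix": 0.4,
    "pF1K-SRRM2 vector": 0.35,
}

total_ul = sum(volumes_ul.values())
primer_um = final_concentration(10.0, 2.0, total_ul)     # 10 µM stock
dntp_um = final_concentration(10_000.0, 2.5, total_ul)   # 10 mM stock, in µM
buffer_x = final_concentration(10.0, 5.0, total_ul)      # 10x stock

print(f"Total volume: {total_ul:.2f} µL")
print(f"Each primer: {primer_um:.2f} µM")
print(f"Each dNTP: {dntp_um:.0f} µM")
print(f"Buffer: {buffer_x:.1f}x")
```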

Table 3.2 Thermal Cycling Conditions for the Amplification of SRRM2 ORF

Temperature (ºC)   Time          Cycles
94                 3 minutes     1
94                 15 seconds
40                 30 seconds    10
68                 9 minutes
94                 15 seconds
64                 30 seconds    25
68                 9 minutes
68                 12 minutes    1

3.2.1.1 SRRM2 cDNA preparation

The PCR amplified SRRM2 ORF was electrophoresed on a 1% agarose gel at 100 V for 1 hour and extracted using the QIAEX gel Extraction Kit and quantified on a NanoDrop 2000 spectrophotometer. 1 µg of purified SRRM2 ORF was digested with EcoRV-HF and NotI-HF overnight and the restriction endonucleases inactivated by incubation at 60ºC for 20 minutes.

3.2.1.2 pcDNA3-EGFP and pcDNA3.1 Vector Preparation

The pcDNA3-EGFP and pcDNA3.1 vectors were transformed into XL2 Blue cells, and the XL2 Blue cells were plated onto 2 x YT agar containing ampicillin and incubated overnight at 37ºC. Single colonies were identified, inoculated into 2 x YT medium with ampicillin and grown overnight at 37ºC with shaking at 220 rpm prior to extraction using the Wizard Plus SV Miniprep kit. The purified vectors were digested with EcoRV-HF and NotI-HF overnight at 37ºC, electrophoresed on a 1% agarose gel and extracted using the QIAEX Gel Extraction kit.

3.2.1.3 Ligation of SRRM2 ORF into pcDNA3-EGFP and pcDNA3.1 vectors

The digested SRRM2 ORF was ligated into the pcDNA3-EGFP and pcDNA3.1 vectors in a total reaction volume of 20 µL. For ligation into pcDNA3-EGFP and pcDNA3.1, 460 ng and 380 ng of SRRM2 ORF were used in the ligation reactions, respectively. Following incubation, 2 µL of each ligation reaction was transformed into Hanahan XL2 Blue Competent Cells, plated on 2 x YT agar containing ampicillin and incubated at 37ºC overnight.
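Insert masses such as those above are usually chosen from a molar insert:vector ratio. The sketch below converts the stated insert masses to picomoles and shows how a vector mass could be derived for a given ratio. The average mass of 650 g/mol per base pair is a common approximation, the 3:1 ratio is a hypothetical value chosen purely for illustration (the ratio and vector masses actually used are not stated), and the fragment sizes (8.3 kb, 6.4 kb and 5.4 kb) are those reported for the restriction digest in this chapter.

```python
AVG_BP_MASS = 650.0  # assumed average mass of one base pair (g/mol)


def dsdna_pmol(mass_ng, length_bp):
    """Convert a mass of double-stranded DNA (ng) to picomoles of molecules."""
    return mass_ng * 1000.0 / (length_bp * AVG_BP_MASS)


def vector_ng_for_ratio(insert_ng, insert_bp, vector_bp, insert_to_vector):
    """Vector mass (ng) giving the requested insert:vector molar ratio."""
    return insert_ng * vector_bp / (insert_bp * insert_to_vector)


INSERT_BP = 8300        # SRRM2 ORF
PCDNA3_EGFP_BP = 6400
PCDNA31_BP = 5400
RATIO = 3.0             # hypothetical insert:vector molar ratio

egfp_insert_pmol = dsdna_pmol(460, INSERT_BP)
pcdna31_insert_pmol = dsdna_pmol(380, INSERT_BP)
egfp_vector_ng = vector_ng_for_ratio(460, INSERT_BP, PCDNA3_EGFP_BP, RATIO)
pcdna31_vector_ng = vector_ng_for_ratio(380, INSERT_BP, PCDNA31_BP, RATIO)

print(f"460 ng insert = {egfp_insert_pmol:.3f} pmol")
print(f"380 ng insert = {pcdna31_insert_pmol:.3f} pmol")
print(f"pcDNA3-EGFP needed at {RATIO:.0f}:1 = {egfp_vector_ng:.0f} ng")
print(f"pcDNA3.1 needed at {RATIO:.0f}:1 = {pcdna31_vector_ng:.0f} ng")
```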

3.2.1.4 Colony PCR for the Identification of Colonies Containing the pcDNA3-EGFP-SRRM2 and pcDNA3.1-SRRM2 Vectors

Colony PCR was performed using the SRRM2_F and SRRM2_IP12 primers to identify colonies containing the pcDNA3-EGFP-SRRM2 and pcDNA3.1-SRRM2 vectors. For each of the vectors to be constructed, a master mix of colony PCR reaction components was prepared, aliquoted into PCR tubes and kept on ice. A pipette tip was then used to add a small amount of an E. coli colony to the PCR reaction, which was mixed by pipetting up and down.

For pcDNA3.1-SRRM2 screening, 27 colonies were inoculated into PCR tubes, and for pcDNA3-EGFP-SRRM2 screening, 18 colonies were inoculated into PCR tubes. For both colony PCR experiments, a no template control was included with no colony added to the PCR mix, and a control reaction containing 20 ng of purified pcDNA3.1 or pcDNA3-EGFP was also included. Table 3.3 shows the thermal cycling conditions.

Table 3.3 PCR Cycling Conditions for Colony PCR

Temperature (ºC)   Duration      Cycles
95                 5 minutes     1
95                 30 seconds
54                 30 seconds    35
72                 1 minute
72                 7 minutes     1

PCR products were electrophoresed on a 1% agarose gel for 1 hour at 100 V and colonies containing the desired constructs were identified. The next day, single colonies positive for the desired constructs were inoculated into 2 x YT medium with ampicillin, and plasmids were extracted using the Wizard Plus SV Plasmid Miniprep kit the following day. Purified plasmids containing the SRRM2 ORF were identified by a restriction enzyme digest using EcoRV-HF and NotI-HF.

3.2.1.5 Sequencing of the SRRM2 ORF in the pcDNA3.1-SRRM2 vector

The pcDNA3.1-SRRM2 vector was sequenced to confirm that no errors altering the SRRM2 amino acid sequence had occurred during PCR amplification. This was achieved with the SRRM2_IP1 – SRRM2_IP12 primers.

3.2.2 Attempted Construction of Mammalian Expression Vectors Containing the Mutant R1805W SRRM2 cDNA

3.2.2.1 Sub-cloning a 3.1kb SRRM2 fragment into the pDrive Vector

Primers were designed to amplify a 3.1kb region of the SRRM2 ORF containing the R1805W codon from the pcDNA3.1-SRRM2 vector. PCR amplification was conducted with the R1805W_CF and R1805_CR primers in a final reaction volume of 50 µL with 50 ng of pcDNA3.1-SRRM2 template. The thermal cycling conditions are shown in Table 3.4.

Table 3.4 PCR Cycling Conditions for PCR Amplification of a 3.1kb SRRM2 Fragment

Temperature (ºC)   Duration      Cycles
94                 3 minutes     1
94                 30 seconds
52                 30 seconds    35
72                 1 minute
72                 7 minutes     1

The PCR product was electrophoresed on a 1 % agarose gel for 1 hour prior to purification using a QIAEX Gel Extraction kit. The purified PCR product was subsequently ligated into the pDrive vector using the QIAGEN PCR Cloning Kit with a 5 x molar excess of PCR product to pDrive vector and a total reaction volume of 10 µL. A 3.1kb fragment was selected as it was the smallest fragment that could be subcloned with unique restriction sites at each end for later cloning back into the pcDNA3.1-SRRM2 and pcDNA3-EGFP-SRRM2 vectors following mutagenesis. Following incubation, 2 µL of the ligation reaction was transformed into XL2 Blue Cells, plated onto 2 x YT agar containing ampicillin and incubated overnight at 37ºC. The following day, single colonies were inoculated into 2 x YT medium containing ampicillin, and plasmids were extracted using the Wizard Plus SV Miniprep kit after 24 hours of incubation at 37ºC with shaking. Purified plasmids containing the PCR product were identified by a restriction enzyme digestion using BsmBI.

3.2.2.2 Site-Directed Mutagenesis to Generate the R1805W SRRM2 cDNA

Mutagenic primers were designed to introduce a single nucleotide substitution (C→T) into the SRRM2 ORF, changing the encoded amino acid from arginine to tryptophan. This was achieved by designing primers with the C→T nucleotide substitution in the middle of the primer and the rest of the sequence homologous to the SRRM2 gene.
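The primer design principle described above, a single substitution centred between two homologous flanks, can be sketched as follows. The template below is a hypothetical placeholder containing a CGG codon, not the SRRM2 sequence, and the flank length of 10 bases is chosen only to keep the example short; the actual R1805W_F and R1805W_R primers are not reproduced here.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def mutagenic_primers(template, mut_index, new_base, flank=10):
    """Return (forward, reverse) primers with one substitution in the centre.

    template: sense-strand sequence around the target site.
    mut_index: 0-based index of the base to substitute.
    flank: number of unchanged bases kept on each side of the substitution.
    """
    if mut_index - flank < 0 or mut_index + flank >= len(template):
        raise ValueError("template too short for the requested flank length")
    mutated = template[:mut_index] + new_base + template[mut_index + 1:]
    forward = mutated[mut_index - flank:mut_index + flank + 1]
    reverse = reverse_complement(forward)
    return forward, reverse


# Hypothetical placeholder template with a CGG (arginine) codon at index 12.
template = "AAGCTTGCATGC" + "CGG" + "TCTAGAGGATCC"
forward, reverse = mutagenic_primers(template, mut_index=12, new_base="T")
print("Forward:", forward)  # the substituted T sits at the centre (index 10)
print("Reverse:", reverse)
```

The two primers are exact reverse complements of one another, so each anneals to one strand of the plasmid with the mismatch at its centre.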

Site-directed mutagenesis was performed on the pDrive-SRRM2 vector using the QuikChange II XL Site-Directed Mutagenesis Kit (Agilent Technologies) in a final reaction volume of 50 µL with the R1805W_F and R1805W_R primers. The reaction components and thermal cycling conditions are shown in Tables 3.5 and 3.6, respectively.

Table 3.5 Reaction Components for Site-Directed Mutagenesis

Component               Volume/Reaction
10X reaction buffer     5 µL
DNA template (50 ng)    1.38 µL
Primers (10 µM)         1.11 µL each
dNTP                    1 µL
QuikSolution            3 µL
ddH2O                   37.4 µL
PfuUltra (2.5 U/µL)     1 µL

Table 3.6 Thermal Cycling Conditions for Site-Directed Mutagenesis

Temperature (ºC)   Duration      Cycles
95                 1 minute      1
95                 50 seconds
60                 50 seconds    18
68                 4 minutes
68                 7 minutes     1

Following the mutagenesis reaction, template DNA was digested by the addition of 1 µL of Dpn I to the reaction followed by 2 hours of incubation at 37ºC. The mutagenesis reaction was then transformed into XL10 Gold Ultracompetent cells according to the manufacturer’s instructions (Agilent Technologies). The following day, single colonies were inoculated into 2 x YT medium containing ampicillin and the plasmids extracted using a Wizard Plus SV Miniprep kit. Extracted plasmids were screened for the desired mutation by Sanger sequencing.

3.3 Results

3.3.1 Construction of Mammalian Expression Vectors Containing the Wild Type Human SRRM2 ORF

A flow diagram of the cloning techniques used in this chapter is shown in figure 3.2.

Figure 3.3 Agarose Gel Electrophoresis Image of pcDNA3.1-SRRM2 Screening with Colony PCR. Lane 1, 1kb+ DNA ladder. Lane 2, colony transformed with pcDNA3.1 (control). Lanes 3-29, Colonies 1-27, transformed with pcDNA3.1-SRRM2 ligation reaction. Lane 30, NTC. Colonies containing the pcDNA3.1-SRRM2 construct were identified by the presence of a band at 729bp. 19 of the 27 colonies were positive for the pcDNA3.1-SRRM2 construct (Lanes 3, 5-20, 22 and 24). 8 of the 27 colonies were negative for the pcDNA3.1-SRRM2 construct (Lanes 4, 21, 23, 25-29). No amplification was observed in the pcDNA3.1 control (Lane 2) or the NTC (Lane 30).

Figure 3.4 Agarose Gel Electrophoresis Image of pcDNA3-EGFP-SRRM2 Screening with Colony PCR. Lane 1, 1kb+ DNA ladder. Lane 2, colony transformed with pcDNA3-EGFP (control). Lanes 3-21, Colonies 1-18, transformed with the pcDNA3-EGFP-SRRM2 ligation reaction. Lane 22, NTC. Colonies containing the pcDNA3-EGFP-SRRM2 construct were identified by the presence of a band at 729bp. 17 of the 19 colonies were positive for the pcDNA3-EGFP-SRRM2 construct (Lanes 3, 5-18, 20 and 21). 2 of the 19 colonies were negative for the pcDNA3-EGFP-SRRM2 construct (Lanes 4 and 19). Non-specific amplification was observed in lanes from all colonies (Lanes 3-21) and the pcDNA3-EGFP control (Lane 2). No amplification of the 729bp fragment was observed in the pcDNA3-EGFP control (Lane 2) or the NTC (Lane 22).

Figure 3.2 Flow Diagram of the Construction of the Wild Type pcDNA3-EGFP-SRRM2 and pcDNA3.1-SRRM2 Vectors and the Attempted Mutagenesis. 1) PCR was performed to amplify the 8.3 kb SRRM2 ORF from the pF1K-SRRM2 vector. 2 and 3) The SRRM2 ORF and the pcDNA3/3.1 vectors were digested with NotI-HF and EcoRV-HF. 4) The SRRM2 ORF was ligated into pcDNA3/3.1. 5) A 3.1kb SRRM2 fragment was PCR amplified and 6) ligated into the pDrive vector. 7) Site-directed mutagenesis was performed on the pDrive-SRRM2 vector.

Following identification of colonies potentially containing the desired constructs (Figures 3.3 and 3.4), colonies were inoculated and grown overnight in 2 x YT with ampicillin and the plasmids extracted using a Wizard Plus SV Miniprep Kit. A restriction enzyme digest with NotI-HF and EcoRV-HF was used to confirm the presence of SRRM2 in the plasmids (Figure 3.5). The purified plasmids were then sent for Sanger sequencing to ensure that no errors altering the SRRM2 amino acid sequence had occurred during PCR amplification.

Figure 3.5 Agarose Gel Electrophoresis Image of pcDNA3-EGFP-SRRM2 and pcDNA3.1-SRRM2 Constructs Digested with EcoRV-HF and NotI-HF. Lane 1, 1kb+ DNA ladder. Lane 2, digested pcDNA3-EGFP-SRRM2. Lane 3, digested pcDNA3-EGFP (control). Lane 4, digested pcDNA3.1-SRRM2. Lane 5, digested pcDNA3.1 (control). The presence of SRRM2 cDNA in pcDNA3-EGFP was demonstrated by bands at 8,300bp (SRRM2) and 6,400bp (pcDNA3-EGFP) (Lane 2), and in pcDNA3.1 by bands at 8,300bp (SRRM2) and 5,400bp (pcDNA3.1) (Lane 4).

3.3.2 Attempted Construction of Mammalian Expression Vectors Containing the R1805W SRRM2 ORF

A 3.1kb fragment of the SRRM2 cDNA was PCR amplified from the pcDNA3.1-SRRM2 vector and subcloned into the pDrive vector to produce the pDrive-SRRM2 construct for site-directed mutagenesis using the QuikChange II XL Site-Directed Mutagenesis Kit. Successful cloning of the SRRM2 fragment into pDrive was confirmed with a restriction enzyme digest using BsmBI (Figure 3.6).

Figure 3.6 Agarose Gel Electrophoresis Image of BsmBI-digested pDrive-SRRM2 Colonies. Lane 1, 1kb+ DNA ladder. Lanes 2-5, colonies 1-4. The presence of the 3.1kb fragment in Lanes 4 and 5 indicates successful ligation of the SRRM2 fragment into the pDrive vector.

Following cloning of the SRRM2 fragment into pDrive, site-directed mutagenesis was performed and the purified vector was Sanger sequenced with the R1805W_SPF primer to confirm the presence of the desired mutation.

Figure 3.7 Sequencing Chromatogram of the pDrive-SRRM2 Vector Following Site-Directed Mutagenesis. In red, the R1805W mutation in the pDrive-SRRM2 vector following successful site-directed mutagenesis. The mutation changes the codon from CGG (Arginine) to TGG (Tryptophan).

Figure 3.8 Sequencing Chromatogram of the pDrive-SRRM2 Vector Following Site-Directed Mutagenesis. In red, an additional, undesired mutation was introduced into the SRRM2 fragment upstream from the R1805W mutation. The mutation changes the codon from TCT (Serine) to CCT (Proline).

The site-directed mutagenesis reaction was successful in introducing the R1805W mutation into the pDrive-SRRM2 vector (Figure 3.7), but an additional mutation was observed upstream, changing the amino acid sequence from serine (TCT) to proline (CCT) (Figure 3.8).
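The amino acid consequences of the two codon changes observed in the sequencing data can be checked against the standard genetic code. The sketch below builds the standard codon table and classifies each substitution; the function names are illustrative only.

```python
BASES = "TCAG"
# Standard genetic code, with codons ordered T, C, A, G at each position.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate_codon(codon):
    """Return the one-letter amino acid for a DNA codon ('*' marks a stop)."""
    return CODON_TABLE[codon.upper()]


def classify_substitution(ref_codon, alt_codon):
    """Classify a codon change as silent, nonsense or missense."""
    ref_aa = translate_codon(ref_codon)
    alt_aa = translate_codon(alt_codon)
    if ref_aa == alt_aa:
        return "silent"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


for ref, alt in [("CGG", "TGG"), ("TCT", "CCT")]:
    print(ref, translate_codon(ref), "->", alt, translate_codon(alt),
          classify_substitution(ref, alt))
```

Both changes are missense substitutions (R to W and S to P), which is why the additional upstream mutation could not be disregarded.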

3.4 Discussion

3.4.1 Construction of Mammalian Expression Vectors Containing the Wild Type Human SRRM2 ORF

Initial attempts at constructing pcDNA3-EGFP and pcDNA3.1 expression vectors containing the wild type SRRM2 cDNA were unsuccessful, and extensive troubleshooting revealed that the issue preventing successful construction of the pcDNA3-EGFP-SRRM2 and pcDNA3.1-SRRM2 vectors was the competency of the XL2 Blue cells used for transformation. To overcome this problem, XL2 Blue Cells were made competent by the Hanahan method, a method that yields highly competent cells, and the vectors were successfully transformed and extracted.

Colony PCR was used to screen for colonies containing the desired constructs when the issue preventing successful construction of the vectors was not known. It was chosen as it is a high-throughput method for rapidly screening many colonies simultaneously. As the majority of the colonies were positive for the desired constructs after transformation into Hanahan XL2 Blue cells, it appears that the pcDNA3-EGFP-SRRM2 (14.7kb) and pcDNA3.1-SRRM2 (13.7kb) vectors were too large to be successfully transformed into the XL2 Blue cells prepared by the standard method. This also demonstrates the high level of competency of the Hanahan XL2 Blue cells. Given that the majority of Hanahan XL2 Blue cell colonies were positive for the pcDNA3-EGFP-SRRM2 and pcDNA3.1-SRRM2 constructs, it was not necessary to screen colonies using colony PCR; simply inoculating 3-4 colonies in 2 x YT and extracting the plasmids with a Wizard Plus SV Miniprep kit followed by a restriction enzyme digest would have been sufficient. However, this was not known at the time. An alternative to this method would have been to use electroporation, which has been shown to be successful in transforming large plasmids [114].

3.4.2 Attempted Construction of Mammalian Expression Vectors Containing the R1805W SRRM2 ORF

Following unsuccessful attempts at performing the site-directed mutagenesis reaction on the pcDNA3.1-SRRM2 vector using the QuikChange II XL Site-Directed Mutagenesis Kit, a kit specifically designed for large plasmid templates, a 3.1kb SRRM2 fragment was subcloned into the pDrive vector to produce a smaller template for the mutagenesis reaction (13.7kb for pcDNA3.1-SRRM2 vs. 6.9kb for pDrive-SRRM2). Sequencing revealed that the mutagenesis reaction was successful in generating the R1805W mutation; however, an additional mutation was observed upstream, changing the amino acid sequence from serine (TCT) to proline (CCT). The plan had been to then digest the pDrive-SRRM2 vector with BsmBI, gel purify the 3.1kb SRRM2 fragment and insert it into BsmBI-digested pcDNA3-EGFP-SRRM2 and pcDNA3.1-SRRM2 vectors, creating the mutant R1805W expression vectors.

The presence of the additional mutation prevented the construction of the R1805W mutant SRRM2 expression vectors for later use in generating a mutant model cell system to study the association of R1805W with FNMTC. The additional mutation may have altered the normal function of the SRRM2 protein, so any effect observed in the mutant cell system could have been caused by the additional mutation rather than R1805W, and the system would therefore not have been an appropriate model. A 3.1kb fragment was subcloned into the pDrive vector for the mutagenesis reaction as it was the smallest possible SRRM2 fragment that contained the R1805W codon and unique restriction sites (BsmBI) for later cloning back into pcDNA3-EGFP-SRRM2 and pcDNA3.1-SRRM2. The additional mutation could have been introduced during the mutagenesis reaction or, more likely, was generated during PCR amplification of the 3.1kb fragment from the pcDNA3.1-SRRM2 vector. This is more likely as the PCR amplification was performed with Taq polymerase, which has a much lower fidelity than PfuUltra, which was used in the mutagenesis reaction.

An attempt was made to perform the subcloning with the Qiagen LongRange PCR Kit, previously used to successfully amplify the 8.3kb SRRM2 cDNA without any errors; however, while the amplification was successful, cloning the fragment into the pDrive vector was unsuccessful. Cloning into pDrive is achieved via binding of uracil overhangs on the 5’ and 3’ ends of the pDrive vector to adenine (A) overhangs on the 3’ end of the PCR product [115]. Taq and many other polymerases generate 3’A overhangs during PCR amplification; however, high-fidelity polymerases, such as those in the Qiagen LongRange PCR kit, have a proofreading 3’→5’ exonuclease activity, by which they remove incorrectly incorporated bases from the PCR product, including 3’A overhangs [115]. This is the mechanism by which these polymerases achieve high-fidelity amplification, and it prevented the subcloning of the amplified fragment into pDrive.

3’A overhangs could have been added to the SRRM2 fragment generated by the Qiagen LongRange PCR kit by incubating the fragment with Taq polymerase in a standard PCR reaction without primers after amplification and purification of the PCR product; however, due to time constraints this was not carried out. Additionally, the mutation could potentially have been introduced by PfuUltra rather than Taq. This could have been tested by sequencing the 3.1kb PCR product prior to the mutagenesis reaction, but this was not conducted.

3.4.3 Conclusion

The aim of this chapter was to generate wild type and mutant R1805W SRRM2 mammalian expression vectors for later use in generating model cell systems to study the association of SRRM2 mutations with FNMTC. The generation of the wild type pcDNA3-EGFP-SRRM2 and pcDNA3.1-SRRM2 expression vectors was successful; however, the generation of the R1805W mutant expression vector was abandoned at the discovery of the additional mutation in favor of generating model cell systems of SRRM2 mutations using CRISPR/Cas9 genome engineering in the HAP1 cell line.

Chapter 4 - Establishing the Protocol for CRISPR/Cas9 Genome Engineering in the HAP1 Cell Line and Generation of Mutant HAP1 Cell Lines with SRRM2 Mutations Associated with FNMTC

4.1 Introduction

The discovery of programmable sequence-specific endonucleases has enabled scientists to make precise modifications to the genome of many organisms and holds immense promise in both basic science and clinical medicine [116]. Several genome-editing tools have been discovered in recent years, including zinc-finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs) and the CRISPR/Cas9 system [117]. These technologies function by inducing double-stranded breaks (DSBs) in DNA at specific genomic loci by an endonuclease protein domain. However, the CRISPR/Cas9 system is distinct from TALENs and ZFNs insofar as the endonuclease protein domain is tethered to a small guide RNA (sgRNA) that determines the cleavage site by Watson-Crick base pairing between a portion of the sgRNA and the genomic locus, whereas the endonuclease protein domain of TALENs and ZFNs is fused to DNA-binding proteins that determine the cleavage specificity [116].

The CRISPR/Cas9 system consists of two components: the endonuclease domain (Cas9) that produces DSBs in DNA and the sgRNA that determines the target genomic locus [117]. The sgRNA consists of two domains: the guide-RNA (gRNA) domain, which determines the cleavage specificity by binding to a homologous sequence in the genome, and the tracrRNA domain, which exhibits a strong hairpin structure and binds Cas9 [118].

Figure 4.1 The Structure of the sgRNA [119]*. The guide-RNA sequence is target specific and localizes Cas9 to the genomic locus to be cleaved via Watson-Crick base pairing. The gRNA sequence is altered to change the specificity of the cleavage site. The tracrRNA binds the Cas9 protein and is kept constant regardless of the target cleavage site [116]. *Modified from Ran et al. 2013.

In addition to an sgRNA containing a gRNA sequence specific to the genomic locus of interest, Cas9 requires a protospacer adjacent motif (PAM) sequence to be present at the genomic locus for a DSB to be introduced into the genome [120]. A PAM sequence is a 2-6bp DNA sequence located in the genome immediately following the gRNA binding sequence [121]. For the Streptococcus pyogenes CRISPR/Cas9 system, the PAM requirement is 5’-NGG-3’, where N is any nucleotide and G is guanine [121]. For Cas9 to create a DSB, the gRNA sequence must bind immediately upstream from the NGG sequence, and therefore the possible cleavage sites within the genome, and gRNA sequences, are dependent on the availability of these PAM sequences [116]. When the gRNA sequence binds immediately upstream of the PAM sequence, Cas9 induces a DSB 3bp upstream from the PAM sequence (3bp upstream from the 3’ end of the DNA sequence that the gRNA binds) [122].
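The relationship between the PAM, the 20bp gRNA binding sequence and the cut site can be sketched as a simple scan for NGG motifs. This is a minimal illustration and not a replacement for a dedicated design tool: it scans only the strand supplied (the opposite strand can be scanned by passing its reverse complement) and does no off-target scoring. The example sequence is the R1805W_gRNA1 target and PAM listed later in this chapter, followed by three hypothetical trailing bases.

```python
def find_ngg_targets(seq, guide_len=20):
    """Return (protospacer, pam, cut_index) for each NGG PAM on this strand.

    cut_index is a 0-based position: the predicted break lies between
    seq[cut_index - 1] and seq[cut_index], i.e. 3 bp upstream of the PAM.
    """
    seq = seq.upper()
    targets = []
    for pam_start in range(guide_len, len(seq) - 2):
        pam = seq[pam_start:pam_start + 3]
        if pam[1:] == "GG":
            protospacer = seq[pam_start - guide_len:pam_start]
            targets.append((protospacer, pam, pam_start - 3))
    return targets


# R1805W_gRNA1 target + its PAM, followed by hypothetical trailing bases.
example = "CAACCTCTCGGCGAAGACAG" + "CGG" + "TTT"
targets = find_ngg_targets(example)
for protospacer, pam, cut in targets:
    print(protospacer, pam, "cut between positions", cut, "and", cut + 1)
```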

Figure 4.2 Schematic of the CRISPR/Cas9 System Creating a DSB in Genomic DNA. The gRNA sequence binds to the complementary DNA sequence immediately upstream of the PAM sequence through Watson-Crick base pairing. A DSB is made 3bp upstream of the PAM sequence [123].

Once a DSB in the genome is created, the cell repairs the break via one of two pathways: the non-homologous end joining (NHEJ) pathway or the homology-directed repair (HDR) pathway, with the pathway that is manipulated for genome editing purposes being dependent on the desired outcome (Figure 4.3) [124]. The NHEJ pathway is error-prone and functions by re-ligating the two cleaved ends back together, typically resulting in insertion/deletion (indel) mutations, which can be exploited to create gene knock-outs by frameshift mutations or the introduction of a premature stop codon if the genomic locus occurs within the coding region of an exon [116]. When an exogenous repair template is provided, in the form of a double-stranded DNA template or single-stranded oligonucleotide template (ssODN), the cell can repair the DSB by the HDR pathway [116]. HDR occurs at much lower frequencies than NHEJ; however, the process can be successfully manipulated to create large or small scale knock-in (KI) changes by incorporating the desired changes into the exogenous repair template flanked by regions homologous to the target sequence [122].

Figure 4.3 Following CRISPR/Cas9 cleavage of dsDNA, the cell repairs the break by the non-homologous end joining (NHEJ) pathway or the homology-directed repair (HDR) pathway. NHEJ is error prone and typically results in the introduction of indel mutations. The HDR pathway functions by repairing the ds-break based on a repair template and can be manipulated to create KI mutations if an exogenous template is provided containing the desired mutation flanked by regions homologous to the target sequence [116].

The hypothesis of this chapter is that the CRISPR/Cas9 genome engineering tool can be used to generate mutant cell lines containing a single missense mutation corresponding to genomic variants associated with FNMTC.

The aim of this chapter is to establish the protocols and workflow for the use of the CRISPR/Cas9 genome engineering tool to generate the R1805W and S346F single-nucleotide missense mutations in the genome of human HAP1 cells, generating two mutant cell lines for functional characterisation of the mutations and their association with FNMTC.

4.2 Method

4.2.1 Construction of sgRNA-expressing pSpCas9(BB)-2A-GFP vectors and ssODN template design for CRISPR/Cas9 genome engineering

The pSpCas9(BB)-2A-GFP (pSpCas9) vector contains the ORF for the Cas9 endonuclease and a sgRNA scaffold containing the DNA equivalent of the tracrRNA and two BbsI cloning sites upstream for the insertion of a 20bp DNA sequence that becomes the DNA sequence encoding the gRNA sequence of the sgRNA. When pSpCas9 is transfected into cells, a complete sgRNA is produced in vitro by transcription from the U6 promoter (figure 4.4), along with the Cas9 nuclease.

Figure 4.4 Partial Plasmid Map of the pSpCas9(BB)-2A-GFP Vector. The sgRNA scaffold contains two BbsI cloning sites for the insertion of a DNA sequence equivalent to the gRNA sequence. Once the sequence is inserted and the vector transfected into cells, a complete sgRNA is produced by transcription from the U6 promoter, along with Cas9. pSpCas9 also contains the ORF for EGFP, to allow for identification of transfected cells. Image taken from SnapGene.


4.2.1.1 Guide-RNA Selection and Oligonucleotide Design

gRNAs targeting genomic loci close to the regions to be modified were identified using the online design tool (crispor.tefor.edu) and were selected based on the following criteria:

1. Specificity score: the inverse likelihood of producing off-target effects.

2. Proximity to the genomic locus to be edited.

3. Ability to create a silent mutation in the PAM sequence of the ssODN template without altering the amino acid sequence encoded by the SRRM2 gene.

Table 4.1 gRNAs Selected for the R1805W and S346F Mutations

| gRNA | Sequence (5’→3’) | PAM | Strand | Specificity score | Bp |
|---|---|---|---|---|---|
| R1805W_gRNA1 | CAACCTCTCGGCGAAGACAG | CGG | Forward | 86% | 10 |
| R1805W_gRNA2 | AGCGGAGCCGGTCAAGGTCG | GGG | Forward | 89% | 8 |
| R1805W_gRNA3 | GCGGAGCCGGTCAAGGTCGC | CGG | Forward | 90% | 7 |
| S346F_gRNA1 | TGCTGCTCCTTTCCGGAGAG | GGG | Reverse | 76% | 22 |
| S346F_gRNA2 | CTAAGAGTGGAAGAGAGAGC | AGG | Reverse | 68% | 21 |

Bp = Base pairs between the cut site and the desired mutation to be introduced into the SRRM2 gene

For the R1805W mutation, 3 gRNAs were selected for the introduction of the missense mutation. For the S346F mutation, 2 gRNAs were selected due to limited availability of gRNAs that met the selection criteria.
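How candidate gRNAs arise from a target sequence can be sketched as a scan for NGG PAMs with the 20 bases immediately 5’ of each PAM taken as the protospacer. This is a minimal illustration only: it scans a single strand, does no specificity scoring, and is not the algorithm used by the online design tool. The function name is hypothetical, and the fragment is a short stretch of the wild type sequence shown in figure 4.5.

```python
def find_forward_guides(seq: str, guide_len: int = 20):
    """Yield (protospacer, PAM) pairs for every NGG PAM on the given strand."""
    seq = seq.upper()
    # A PAM starting at index i needs guide_len bases upstream and 3 bases at i.
    for i in range(guide_len, len(seq) - 2):
        pam = seq[i:i + 3]
        if pam[1:] == "GG":
            yield seq[i - guide_len:i], pam


# Short fragment of the wild type sequence around the R1805W codon (figure 4.5).
fragment = "CAGTCAACCTCTCGGCGAAGACAGCGGAGCCGGTCAAGGTC"
candidates = list(find_forward_guides(fragment))
```

On this fragment the scan recovers the R1805W_gRNA1 protospacer CAACCTCTCGGCGAAGACAG with its CGG PAM, among other candidates that would then need to be ranked by the criteria above.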

For each gRNA selected, two partially complementary oligonucleotides were designed so that they could be phosphorylated, annealed together and inserted into pSpCas9 vectors to create sgRNA-expressing vectors. The forward oligonucleotide was designed to contain the DNA equivalent of the gRNA sequence and the reverse oligonucleotide its reverse complement. CACCG and AAAC sequences were added to the 5’ end of the forward and reverse oligonucleotides, respectively, and an additional C base was added to the 3’ end of the reverse oligonucleotide to facilitate the cloning of these oligonucleotides into the BbsI restriction sites of the pSpCas9 vector. Table 4.2 shows the oligonucleotides designed for each gRNA sequence; underlined are the additional bases that were added for cloning in pSpCas9.

Table 4.2 Partially Complementary Oligonucleotide Design

| gRNA | Forward oligonucleotide (5’→3’) | Reverse oligonucleotide (5’→3’) |
|---|---|---|
| R1805W_gRNA1 | CACCGCAACCTCTCGGCGAAGACAG | AAACCTGTCTTCGCCGAGAGGTTGC |
| R1805W_gRNA2 | CACCGAGCGGAGCCGGTCAAGGTCG | AAACCGACCTTGACCGGCTCCGCTC |
| R1805W_gRNA3 | CACCGGCGGAGCCGGTCAAGGTCGC | AAACGCGACCTTGACCGGCTCCGCC |
| S346F_gRNA1 | CACCGTGCTGCTCCTTTCCGGAGAG | AAACCTCTCCGGAAAGGAGCAGCAC |
| S346F_gRNA2 | CACCGCTAAGAGTGGAAGAGAGAGC | AAACGCTCTCTCTTCCACTCTTAGC |
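The oligonucleotide design rule described above (CACCG added 5’ of the gRNA sequence; AAAC added 5’ and a C added 3’ of its reverse complement) can be written as a short sketch. The function names are hypothetical; the rule itself is taken from the text and reproduces the entries in table 4.2.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_cloning_oligos(guide: str) -> tuple[str, str]:
    """Return (forward, reverse) oligonucleotides for BbsI cloning into pSpCas9."""
    guide = guide.upper()
    forward = "CACCG" + guide                           # 5' CACCG + gRNA sequence
    reverse = "AAAC" + reverse_complement(guide) + "C"  # 5' AAAC + revcomp + 3' C
    return forward, reverse


forward, reverse = design_cloning_oligos("CAACCTCTCGGCGAAGACAG")  # R1805W_gRNA1
```

For R1805W_gRNA1 this gives CACCGCAACCTCTCGGCGAAGACAG and AAACCTGTCTTCGCCGAGAGGTTGC, matching the first row of table 4.2.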

4.2.1.2 ssODN Template Design

For each gRNA selected, a corresponding 150 base ssODN template was designed with the following alterations from the SRRM2 sequence:

1. The ssODNs contain the desired mutation to be introduced into the SRRM2 gene of HAP1 cells.

2. A silent mutation was added in the PAM sequence to prevent Cas9 from targeting the locus following successful genome engineering.


These ssODNs were designed with the mutation to be introduced into the genome centred in the ssODN and flanked by 75bp arms homologous to the SRRM2 gene, except for the silent PAM mutation (figures 4.5 and 4.6).

For the S346F ssODNs, an additional nucleotide substitution was added within the SRRM2 intron to introduce an ApoI restriction site for later screening of successful genome engineering events. For the R1805W mutation, no additional mutations were introduced to generate a restriction site, as the R1805W mutation removes a restriction site for BsrFaI.
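Whether a single-nucleotide change within a codon is silent or missense can be checked against the standard genetic code, as in the sketch below. It assumes each triplet is read in the SRRM2 reading frame, which the text states for the R1805W and S346F codons but not explicitly for the PAM triplets, so the PAM examples are illustrative. The function name is hypothetical and amino acids are given as one-letter codes.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitution(wild_type: str, mutant: str) -> str:
    """Classify a codon change as silent, nonsense or missense."""
    before = CODON_TABLE[wild_type.upper()]
    after = CODON_TABLE[mutant.upper()]
    if before == after:
        return f"silent ({before})"
    if after == "*":
        return f"nonsense ({before} -> stop)"
    return f"missense ({before} -> {after})"


r1805w = classify_substitution("CGG", "TGG")  # desired R1805W change
s346f = classify_substitution("TCT", "TTT")   # desired S346F change
pam_1 = classify_substitution("CGG", "CGA")   # PAM change in R1805W_ssODN1
```

Under the in-frame assumption, CGG to TGG is Arg to Trp, TCT to TTT is Ser to Phe, and CGG to CGA leaves arginine unchanged.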


Table 4.3 ssODN Template Designs for the R1805W Mutation

| gRNA | Corresponding ssODN | Desired Mutation | PAM Mutation | Additional Modifications |
|---|---|---|---|---|
| R1805W_gRNA1 | R1805W_ssODN1 | CGG → TGG (Arg → Trp) | CGG → CGA | None |
| R1805W_gRNA2 | R1805W_ssODN2&3 | CGG → TGG (Arg → Trp) | GGG → GTG | None |
| R1805W_gRNA3 | R1805W_ssODN2&3 | CGG → TGG (Arg → Trp) | CGG → CGT | None |

1. Wild type Sequence 5’-GAACAACCCGACGTCGAGATAGGTCTGGATCTTCTCAGTCAACCTCTCGGCGAAGACAGCGGAGCCGGTCAAGGTC GCGGGTTACTCGGCGGCGGAGGGGAGGCTCGGTTATCACTCAAGGTCACCTGCCCGGCAGGAAAGTTCCCGGA-3’

2. R1805W_ssODN1 5’-GAACAACCCGACGTCGAGATAGGTCTGGATCTTCTCAGTCAACCTCTCGGCGAAGACAGCGAAGCTGGTCAAGGTCG CGGGTTACTCGGCGGCGGAGGGGAGGCTCTGGTTATCACTCAAGGTCACCTGCCCGGCAGGAAAGTTCCCGGA-3’

3. R1805W _ssODN2&3 5’-GAACAACCCGACGTCGAGATAGGTCTGGATCTTCTCAGTCAACCTCTCGGCGAAGACAGCGGAGCTGGTCAAGGTCG CGTGTTACTCGGCGGCGGAGGGGAGGCTCTGGTTATCACTCAAGGTCACCTGCCCGGCAGGAAAGTTCCCGGA-3’

Figure 4.5 ssODN Template Sequences for the Introduction of the R1805W Mutation into the SRRM2 Gene of HAP1 Cells. 1) Wild type sequence. 2) R1805W_ssODN1 for R1805W_gRNA1. 3) R1805W_ssODN2&3 for R1805W_gRNA2 and R1805W_gRNA3. Shown in bold and in blue is the arginine codon in the wild type sequence (1) or the tryptophan codon to be introduced into the HAP1 genome (2 and 3), with the C→T nucleotide substitution (underlined) to generate the tryptophan codon (2 and 3). In red are the PAM sequences, unaltered for the wild type sequence (1), with a G→A substitution for R1805W_ssODN1 (2), and a G→T substitution for R1805W_ssODN2&3 (3). Nucleotide substitutions in PAM sequences of ssODNs are underlined. Note: R1805W_gRNA2 and R1805W_gRNA3 have overlapping PAMs and a single nucleotide substitution was sufficient to remove both PAM sequences; therefore, the same template was used for both.


Table 4.4 ssODN Template Designs for the S346F Mutation

| gRNA | Corresponding ssODN | Desired Mutation | PAM Mutation | Additional Modifications |
|---|---|---|---|---|
| S346F_gRNA1 | S346F_ssODN1 | TCT → TTT (Ser → Phe) | AGG → AGA | ApoI restriction site: GAATCC → GAATTC |
| S346F_gRNA2 | S346F_ssODN2 | TCT → TTT (Ser → Phe) | GGG → GGT | ApoI restriction site: GAATCC → GAATTC |

1. Wild type Sequence: 5’-TTGGGATCTTAGGGGTGATGTGAAGTTTTGGCGTTTATGAATCCCTATCCCTGCTCTCTCTTCCACTCTTAGAAATCTG CAACTCGACCTAGCCCCTCTCCGGAAAGGAGCAGCACAGGCCCAGAACCACCTGCTCCCACTCCGCTCCTT-3’

2. S346F_ssODN1: 5’-TTGGGATCTTAGGGGTGATGTGAAGTTTTGGCGTTTATGAATTCCTATCTCTGCTCTCTCTTCCACTCTTAGAAATTTG CAACTCGACCTAGCCCCTCTCCGGAAAGGAGCAGCACAGGCCCAGAACCACCTGCTCCCACTCCGCTCCTT-3’

3. S346F_ssODN2: 5’-TTGGGATCTTAGGGGTGATGTGAAGTTTTGGCGTTTATGAATTCCTATCCCTGCTCTCTCTTCCACTCTTAGAAATTTG CAACTCGACCTAGTCCCTCTCCGGAAAGGAGCAGCACAGGCCCAGAACCACCTGCTCCCACTCCGCTCCTT-3’

Figure 4.6 ssODN Template Sequences for the Introduction of the S346F Mutation into the SRRM2 Gene of HAP1 Cells. 1) Wild type sequence. 2) S346F_ssODN1 for S346F_gRNA1. 3) S346F_ssODN2 for S346F_gRNA2. Shown in bold and in purple is the ApoI restriction site that is absent in the wild type sequence (1), and present in S346F_ssODN1 (2) and S346F_ssODN2 (3) by a C→T nucleotide substitution (underlined in 2 and 3). Shown in bold and in blue is the serine codon in the wild type sequence (1) or the phenylalanine codon to be introduced into the HAP1 genome (2 and 3), with the C→T nucleotide substitution (underlined) that generates the phenylalanine codon (2 and 3). In red are the PAM sequences, unaltered for the wild type sequence (1), and with C→T nucleotide substitutions in S346F_ssODN1 (2) and S346F_ssODN2 (3), for their two different PAM sequences. Note: S346F gRNAs target the reverse strand and the PAM mutations shown on these templates are the forward strand sequence.


4.2.1.3 Phosphorylation and Annealing of oligonucleotides

Partially complementary oligonucleotides were phosphorylated and annealed using the reaction components and cycling conditions shown in tables 4.5 and 4.6, respectively.

Table 4.5 Reaction components for phosphorylation and annealing of oligonucleotides

| Component | Volume/Reaction |
|---|---|
| Forward oligonucleotide (100µM) | 1µl |
| Reverse oligonucleotide (100µM) | 1µl |
| T4 DNA Ligase buffer (NEB) | 1µl |
| T4 Polynucleotide kinase (NEB) | 0.5µl |
| NFW | 6.5µl |

NFW, Nuclease Free Water

Table 4.6 Cycling conditions for phosphorylation and annealing of oligonucleotides

| Temperature | Duration |
|---|---|
| 37ºC | 30 minutes |
| 95ºC to 25ºC | -5ºC/minute |
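The quantities implied by tables 4.5 and 4.6 can be worked through as a short calculation. This is an illustrative sketch using only the tabulated values; the function names are hypothetical.

```python
# Volumes from table 4.5, in microlitres.
VOLUMES_UL = {
    "forward oligonucleotide (100 uM)": 1.0,
    "reverse oligonucleotide (100 uM)": 1.0,
    "T4 DNA ligase buffer": 1.0,
    "T4 polynucleotide kinase": 0.5,
    "nuclease free water": 6.5,
}


def final_concentration_um(stock_um: float, volume_ul: float, total_ul: float) -> float:
    """Concentration after diluting volume_ul of stock into total_ul."""
    return stock_um * volume_ul / total_ul


def ramp_minutes(start_c: float, end_c: float, rate_c_per_min: float) -> float:
    """Time needed to ramp between two temperatures at a constant rate."""
    return abs(start_c - end_c) / rate_c_per_min


total_ul = sum(VOLUMES_UL.values())                      # reaction volume
oligo_um = final_concentration_um(100.0, 1.0, total_ul)  # each oligonucleotide
cool_down = ramp_minutes(95.0, 25.0, 5.0)                # annealing ramp
```

The reaction volume is 10 µl, each oligonucleotide is present at 10 µM, and the ramp from 95ºC to 25ºC takes 14 minutes at 5ºC per minute.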

4.2.1.4 Cloning of Annealed and Phosphorylated Oligonucleotides Into pSpCas9 Vectors

The pSpCas9 vector was a gift from Addgene and was received as a bacterial stab that was streaked onto 2 x YT agar containing ampicillin and grown overnight at 37ºC. A single colony was selected the next day, inoculated into 2 x YT medium containing ampicillin and grown overnight at 37ºC with shaking at 220 rpm. The vector was purified using the Wizard Plus SV Miniprep kit and quantified using a Nanodrop 2000 spectrophotometer.

1 µg of pSpCas9 was digested with BbsI-HF for 1 hour at 37ºC in a final reaction volume of 50 µL. The digested vector was then electrophoresed on a 1% agarose gel for 1 hour at 100 V and gel purified using the QIAEX Gel Extraction kit.

1 µL of annealed and phosphorylated oligonucleotides was diluted in 199 µL of NFW (1:200 dilution) and ligated into the BbsI-digested pSpCas9. Following incubation, 2 µL of the ligation reaction was transformed into XL2 blue cells, which were grown on 2 x YT agar containing ampicillin overnight. The next day, single colonies were inoculated into 15 mL of 2 x YT medium containing ampicillin and grown overnight at 37ºC with shaking. pSpCas9 plasmids were then extracted using a Wizard Plus SV Miniprep kit and sequenced with the pSpCas9_SP primer to confirm successful ligation. Following successful generation of the sgRNA-expressing vectors, the vectors were purified using the Pure Yield Plasmid Miniprep Kit.

4.2.2 Transfection of pSpCas9 Vectors into HAP1 Cells and Enrichment of Transfected Cells with FACS

4.2.2.1 Co-transfection of pSpCas9 Vectors and ssODNs into HAP1 cells

Each of the sgRNA-expressing pSpCas9 vectors was co-transfected with the corresponding ssODN template into separate populations of HAP1 cells using Turbofectin 8.0 (Origene). For each of the two mutations to be generated, 2 control populations were included during transfection:

1. Wild type control: Transfected with a pSpCas9 vector without a gRNA sequence cloned into the BbsI restriction sites. This population of cells became the wild type control for later functional studies.

2. Negative control: Treated with Turbofectin 8.0 but no DNA was added.

Each transfection was completed in duplicate in 25 cm2 flasks, except for the negative control which was completed with a single 25 cm2 flask.

The day before transfection, HAP1 cells were trypsinised, pelleted (400 x g for 2 minutes), washed with 1 x PBS and suspended in 10 mL of IMDM with 10% FBS without antibiotics. Cells were counted and 1,800,000 cells were transferred into 25 cm2 flasks in IMDM medium with 10% FBS and without antibiotics. 12 µg of endotoxin-free pSpCas9 was mixed with 28 µL of the corresponding ssODN (10µM) and Opti-MEM was added to a final volume of 600 µL. The solution was mixed and then incubated for 15 minutes at room temperature. After incubation, 36 µL of Turbofectin 8.0 was added, mixed by gentle pipetting and incubated at room temperature for 30 minutes. For the wild type control, the ssODN template was replaced with 28 µL of Opti-MEM. For the negative control, the DNA was replaced with Opti-MEM. The resulting mixture was then added dropwise to the cells and the cells were incubated for 24 hours at 37ºC with 5% CO2.
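The amounts used per 25 cm2 flask can be converted into a few derived quantities, which may help when scaling the protocol to other vessel sizes. The sketch below uses only the values stated above; the function and variable names are hypothetical.

```python
def picomoles(volume_ul: float, concentration_um: float) -> float:
    """Amount of substance in pmol (1 uL of a 1 uM solution holds 1 pmol)."""
    return volume_ul * concentration_um


FLASK_AREA_CM2 = 25.0

ssodn_pmol = picomoles(28.0, 10.0)            # ssODN per flask
seeding_density = 1_800_000 / FLASK_AREA_CM2  # cells per cm2 at seeding
reagent_ul_per_ug = 36.0 / 12.0               # Turbofectin 8.0 per ug of plasmid
```

Each flask therefore receives 280 pmol of ssODN, is seeded at 72,000 cells per cm2, and uses 3 µL of Turbofectin 8.0 per µg of pSpCas9.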

4.2.2.2 Fluorescence-activated cell sorting

Due to the low transfection efficiency, enrichment of transfected cells was performed based on GFP expression using fluorescence-activated cell sorting (FACS) on the BD Influx Cell Sorter at the Centre for Microscopy and Cellular Analysis (CMCA). FACS was performed for the HAP1 populations transfected with an sgRNA-expressing pSpCas9 vector and the corresponding ssODN template, as well as the wild type controls for both the R1805W and S346F mutations.

Cells were trypsinised, counted and suspended in FACS buffer to a final concentration of ~1 x 10^6 cells/mL. Cells to be collected were selected based on GFP expression, with a negative untransfected control used to identify untransfected cells. GFP+ cells were collected in 1 mL of collection media. Following FACS, the cell suspensions were added to 25 cm2 flasks with the volume made up to 4 mL with additional collection media. The following day the media was removed and replaced with fresh HAP1 media.

4.2.3 Validation of genome engineering in the polyclonal HAP1 populations

1 week after FACS, the enriched cells reached 70% confluency in 25 cm2 flasks and were trypsinised, collected in 15 mL tubes and suspended in 10 mL of 1 x PBS. Cells were counted and 5 x 10^5 cells were collected for genome engineering validation while the remainder of the cells were pelleted by centrifugation (400 x g for 2 minutes), suspended in HAP1 medium and passaged into 75 cm2 flasks. To verify that the genome engineering events had occurred, gDNA was extracted and PCR was performed to amplify a 1 kb or 1.1 kb region of the SRRM2 gene containing the region that was to be edited. Following PCR, a restriction enzyme digest was performed and the unpurified PCR product was sent for sequencing.

4.2.3.1 PCR amplification of gDNA extracted from HAP1 cells

For PCR amplification of a 1.1kb fragment of the SRRM2 gene containing the R1805W codon, PCR was performed using the R1805W_SPF and R1805W_SPR primers and the cycling conditions shown in table 4.7.


Table 4.7 PCR Cycling Conditions for PCR Amplification of the R1805W codon

| Temperature (ºC) | Duration | Cycles |
|---|---|---|
| 94 | 3m | 1 |
| 94 | 30s | 35 |
| 52 | 30s | 35 |
| 72 | 1m | 35 |
| 72 | 7m | 1 |
| 4 | hold | |

For amplification of a 1.0 kb fragment of the SRRM2 gene containing the S346F codon, PCR was performed using the S346F_SPF and S346F_R primers and the cycling conditions shown in table 4.8.

Table 4.8 Thermal cycling conditions for PCR amplification of the S346F codon

| Temperature (ºC) | Duration | Cycles |
|---|---|---|
| 94 | 3m | 1 |
| 94 | 30s | 35 |
| 55 | 30s | 35 |
| 72 | 1m | 35 |
| 72 | 7m | 1 |
| 4 | hold | |
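The programmed hold time of the PCR protocols in tables 4.7 and 4.8 can be estimated with a short calculation. The sketch assumes a single initial denaturation, 35 cycles of three steps and a single final extension, and it ignores the time the thermal cycler spends ramping between temperatures. The function name is hypothetical.

```python
def programme_minutes(initial_s: int, cycle_steps_s: list[int], cycles: int, final_s: int) -> float:
    """Total programmed hold time in minutes, excluding ramping and the 4 degree hold."""
    return (initial_s + cycles * sum(cycle_steps_s) + final_s) / 60


# 94 C for 3 min; 35 x (94 C 30 s, annealing 30 s, 72 C 1 min); 72 C for 7 min.
runtime_min = programme_minutes(180, [30, 30, 60], 35, 420)
```

Both programmes differ only in annealing temperature (52ºC or 55ºC), so each has 80 minutes of programmed hold time before the final 4ºC hold.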

Following PCR amplification, products were visualized on a 1% agarose gel (100 V for 1 hour) to confirm successful amplification prior to the restriction enzyme digest.

4.2.3.2 Verification of genome engineering

For the restriction enzyme digest, 12.5 µL of PCR product was digested with BsrFaI-HF (R1805W) or ApoI (S346F) for 3 hours at 37ºC. The digested products were electrophoresed on a 1% agarose gel for 1 hour at 100V. Sanger sequencing was performed with the R1805W_SPF or S346F_SPF primer for the R1805W and S346F mutations, respectively.


4.2.4 Validation of Successful Genome Engineering in Monoclonal HAP1 Populations

Once the presence of the desired mutations in HAP1 cells was confirmed by sequencing, the cells were single cell diluted into 96-well plates and left to expand for 2-3 weeks to obtain monoclonal cell lines. When the cells reached ~60% confluency, the monoclonal cell lines were assayed for the presence of the desired mutation as previously described for the polyclonal population (4.2.3) with the following alterations:

1. Following trypsinisation, cells were suspended in approximately 2 mL of HAP1 medium and ¾ of the cell suspension was transferred to a 1.5 mL tube, the cells pelleted by centrifugation, the medium aspirated and the cells suspended in 200 µL of 1 x PBS for gDNA extraction.

2. The remaining cell suspensions were made up to 1 mL with HAP1 medium and transferred to 24-well plates, where each monoclonal population was placed in its own well.

4.2.4.1 Single-cell dilution of HAP1 cells transfected with a pSpCas9 vector containing a R1805W gRNA sequence

HAP1 cells were trypsinised and suspended in HAP1 medium. 20 µL of the cell suspension was then diluted in HAP1 medium to a final concentration of 0.5 cells/100 µL and 100 µL of this suspension was aliquoted into each well of a 96-well plate. Cells were monitored periodically for the presence of a single colony to identify monoclonal cell lines. Wells containing no cells or wells seeded with more than 1 cell were excluded.

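The expected outcome of seeding at 0.5 cells per well can be estimated with a short calculation. The sketch assumes that the number of cells per well follows a Poisson distribution, which holds for a well-mixed suspension but is an assumption not stated in the protocol. The function name is hypothetical.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k cells in a well for a given mean cells per well."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


MEAN_CELLS_PER_WELL = 0.5  # 0.5 cells per 100 uL, 100 uL per well

p_empty = poisson_pmf(0, MEAN_CELLS_PER_WELL)
p_single = poisson_pmf(1, MEAN_CELLS_PER_WELL)
p_multiple = 1.0 - p_empty - p_single

# Fraction of wells containing cells that were seeded with exactly one cell.
single_given_growth = p_single / (1.0 - p_empty)
expected_single_wells = 96 * p_single
```

Under this assumption about 61% of wells are empty, about 30% receive a single cell and about 9% receive more than one, which is why wells were inspected for single colonies rather than assumed to be monoclonal.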

4.2.5 Western blot analysis of SRRM2 protein expression in the wild type and mutant HAP1 cell lines

A western blot was performed to ensure SRRM2 was expressed in the wild type and mutant cell lines following CRISPR/Cas9 genome engineering, and to ensure the expression was equivalent between the two cell lines. 2 biological replicates for each cell line were included in the western blot analysis. Whole cell protein extracts were prepared from wild type and mutant cells growing in 25 cm2 flasks when cells reached ~70% confluency. Immunostaining was performed for SRRM2 and α-tubulin.


4.3 Results

4.3.1 Construction of sgRNA-expressing pSpCas9 vectors

A flow diagram of the construction of the sgRNA-expressing vectors is shown in figure 4.7.

4.3.2 Fluorescence-activated cell sorting

A series of gating steps was employed in the BD FACSDiva 8.0.1 flow cytometer software to exclude cellular debris and select for live cells based on size and cellular complexity/granularity (SSC vs. FSC-Par), and to select cells passing through the flow cytometer laser one at a time (singlets) while excluding cells passing through the laser in clumps (doublets) (FSC-Per vs. FSC-Par-Area). An untransfected control HAP1 population (GFP negative) was included in the cell sort to allow for identification and gating of transfected cells within the transfected populations. This gating strategy was applied to all populations to be sorted, although minor changes were applied to account for population shifts between replicate runs.
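The logic of this sequential gating can be sketched in code. The sketch is purely illustrative: the event fields, the rectangular live gate, the area-to-signal ratio used for the singlet gate and every numeric threshold are hypothetical assumptions, not values taken from the sort, and the gates drawn in the cytometer software were set manually.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    fsc: float       # forward scatter (relative size)
    ssc: float       # side scatter (granularity)
    fsc_area: float  # forward-scatter pulse area
    gfp: float       # GFP fluorescence


def in_live_gate(e: Event, fsc_min: float = 10000.0, ssc_max: float = 50000.0) -> bool:
    """Exclude small, low-scatter debris (hypothetical rectangular gate)."""
    return e.fsc >= fsc_min and e.ssc <= ssc_max


def in_singlet_gate(e: Event, max_area_ratio: float = 1.3) -> bool:
    """Exclude clumps, modelled here as events with disproportionate pulse area."""
    return e.fsc_area <= max_area_ratio * e.fsc


def sort_gfp_positive(events: list[Event], control: list[Event]) -> list[Event]:
    """Keep live singlets brighter than every gated untransfected control event."""
    gated_control = [e for e in control if in_live_gate(e) and in_singlet_gate(e)]
    cutoff = max(e.gfp for e in gated_control)  # raises ValueError if control is empty
    return [
        e for e in events
        if in_live_gate(e) and in_singlet_gate(e) and e.gfp > cutoff
    ]
```

Setting the GFP cutoff from the untransfected control mirrors the role of the GFP negative population described above: only events brighter than the control are treated as transfected.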


Figure 4.7 Construction of sgRNA-Expressing Vectors. 1) 20bp gRNA sequences (red) are identified in the genome 5’ to a PAM (NGG) sequence (blue) and in close proximity to the locus to be edited (black; underlined and in bold). 2) Partially complementary oligonucleotides (red and purple) are synthesized with overhangs (black) and are phosphorylated and annealed together. 3) The pSpCas9 vector is digested with BbsI-HF. 4) The annealed and phosphorylated oligonucleotides are ligated into the BbsI restriction sites of the pSpCas9 vector, creating an sgRNA-expressing vector.


[Figure 4.8: flow cytometry plots, panels A-G. Gates shown: Live (65.85%) and Singlets (65.11%). Axes: SSC vs. FSC Par; FSC Per vs. FSC Par Area; GFP vs. FSC Par Area.]

Figure 4.8 Fluorescence-activated cell sorting for generation of the R1805W mutant HAP1 cell line. A) SSC vs. FSC Par gating of live cells (blue) to exclude cellular debris (in black). B) FSC Per vs. FSC Par Area gating of singlets (red) to exclude doublets. C) Untransfected control (GFP negative). D) Wild type control cell line (GFP positive). E-G) Transfected with gRNA1-3 expressing pSpCas9(BB)-2A-GFP vectors (GFP positive).


[Figure 4.9: flow cytometry plots, panels A-F. Gates shown: Live (94.39%) and Singlets (32.24%). Axes: SSC vs. FSC Par; FSC Per vs. FSC Par Area; GFP vs. FSC Par Area.]

Figure 4.9 Fluorescence-activated cell sorting for generation of the S346F mutant HAP1 cell line. A) SSC vs. FSC Par gating of live cells (blue) to exclude cellular debris (in black). B) FSC Per vs. FSC Par Area gating of singlets (red) to exclude doublets. C) Untransfected control (GFP negative). D) Wild type control cell line (GFP positive). E-F) Transfected with gRNA1-2 expressing pSpCas9(BB)-2A-GFP vectors (GFP positive).

4.3.3 Generation of an R1805W Mutant HAP1 Cell Line

Prior to isolation of a R1805W mutant cell line, a preliminary analysis was conducted to determine if the presence of the R1805W genome engineering event could be identified in the polyclonal HAP1 population by a restriction enzyme digest and Sanger sequencing.

4.3.3.1 Verification of Successful Genome Engineering in the Polyclonal HAP1 Populations

Following FACS, cells were expanded in 25 cm2 flasks until they reached 70% confluency. gDNA was extracted using a QIAamp DNA Mini Kit and PCR was performed to amplify a 1.1kb region of the SRRM2 gene containing the R1805W region. A restriction enzyme digest was performed using BsrFaI on the unpurified PCR products.

Cells transfected with R1805W_gRNA2-pSpCas9 and R1805W_ssODN2&3 (figure 4.10, lane 4) or R1805W_gRNA3-pSpCas9 and R1805W_ssODN2&3 (figure 4.10, lane 5) were positive for the R1805W mutation, as indicated by the presence of a band at 754bp.

Cells transfected with R1805W_gRNA1-pSpCas9 and R1805W_ssODN1 were negative for the R1805W mutation, as indicated by the absence of a band at 754bp (figure 4.10, lane 3). Alternatively, the genome engineering event may have occurred within this population at a frequency too low to be detected by agarose gel electrophoresis.
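The band sizes used to score the digest follow from the loss of one BsrFaI site: the wild type fragments of 547bp and 207bp merge into a single 754bp fragment when the site between them is destroyed. The sketch below reproduces this arithmetic. The cut positions are hypothetical, since the text gives only fragment sizes; the only constraint taken from the data is that the 547bp and 207bp fragments are adjacent.

```python
def digest_fragments(amplicon_length: int, cut_positions: list[int]) -> list[int]:
    """Return fragment lengths after cutting a linear amplicon at the given positions."""
    edges = [0] + sorted(cut_positions) + [amplicon_length]
    return [end - start for start, end in zip(edges, edges[1:])]


AMPLICON_BP = 547 + 344 + 207  # 1098 bp, reported as a 1.1kb product

# Hypothetical fragment order 344 | 547 | 207 along the amplicon.
WILD_TYPE_CUTS = [344, 891]
MUTANT_CUTS = [344]  # the site between the 547bp and 207bp fragments is lost

wild_type_bands = sorted(digest_fragments(AMPLICON_BP, WILD_TYPE_CUTS))
mutant_bands = sorted(digest_fragments(AMPLICON_BP, MUTANT_CUTS))
```

The wild type digest yields 207bp, 344bp and 547bp, and the mutant digest yields 344bp and 754bp, so a polyclonal population containing both alleles is expected to show all four bands.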


Figure 4.10 Agarose Gel Electrophoresis Image of BsrFaI-Digested PCR Products Amplified From gDNA Extracted from the Polyclonal HAP1 Cell Lines Transfected with an R1805W-sgRNA-Expressing Vector and an R1805W-ssODN. Lane 1: 1kb+ DNA Ladder, Lane 2: HAP1 cells transfected with an empty pSpCas9 vector (wild type), Lane 3: HAP1 cells transfected with R1805W_gRNA1-pSpCas9 and R1805W_ssODN1, Lane 4: HAP1 cells transfected with R1805W_gRNA2-pSpCas9 and R1805W_ssODN2&3, Lane 5: HAP1 cells transfected with R1805W_gRNA3-pSpCas9 and R1805W_ssODN2&3, Lane 6: No template control. The presence of three bands at 547bp, 344bp and 207bp (lanes 2 and 3) indicates the wild type sequence, whereas the presence of the additional band at 754bp suggests the presence of the R1805W mutation within the polyclonal population.

Following the identification of two populations of HAP1 cells positive for the genome engineering event, PCR products were sequenced and the presence of the genome engineering event was confirmed by the appearance of two peaks at the R1805W locus and at the silent PAM mutation in the sequencing chromatogram for both populations (figures 4.11 and 4.12).


Figure 4.11 Sequencing chromatogram of HAP1 cells transfected with the R1805W_gRNA2-pSpCas9 vector and R1805W_ssODN2&3. In red: a double peak showing the wild type sequence (C) and the desired mutant (T), in blue: a double peak showing the wild type sequence (G) and the silent PAM mutation (T).

Figure 4.12 Sequencing chromatogram of HAP1 cells transfected with the R1805W_gRNA3-pSpCas9 vector and R1805W_ssODN2&3. In red: a double peak showing the wild type sequence (C) and the desired mutant (T), in blue: a double peak showing the wild type sequence (G) and the silent PAM mutation (T).

4.3.3.2 Validation of Genome Engineering in Monoclonal Populations

Following single cell dilution, cells were expanded and periodically monitored for the presence of single colonies. Wells containing more than one colony or containing no colonies were excluded from further analyses.


Figure 4.12 Image Showing a Single Colony From the gRNA2-Transfected Population Expanding Following Single Cell Dilution. A) Colony 1 at day 3. B) Colony 1 at day 7. Images were taken on a Zoe-Fluorescent Cell Imager (Bio-Rad) using the brightfield channel.

Once the cells reached ~60% confluency in the 96-well plates, gDNA extraction, PCR amplification, a restriction enzyme digest and Sanger sequencing were performed to identify monoclonal cell populations containing the desired R1805W mutation.

Figure 4.13 Agarose Gel Electrophoresis Image of BsrFaI-Digested PCR Products Amplified from gDNA Extracted from Monoclonal HAP1 Cell Lines Transfected with R1805W_gRNA2-pSpCas9 and R1805W_ssODN2&3. Lane 1: 1kb+ DNA Ladder, Lane 2-12: Monoclonal cell lines 1-11, Lane 13: NTC. The presence of three bands at 547bp, 344bp and 207bp (lanes 3, 4, 6, 9 and 10) indicates the wild type sequence, whereas the presence of the bands at 754bp and 344bp suggests the presence of the R1805W mutant (lanes 2, 5 and 11). The presence of all 4 bands (754bp, 547bp, 344bp and 207bp) suggests both wild type and mutant sequences (lanes 7, 8 and 12).

From figure 4.13, three populations (lanes 2, 5 and 11) were positive for the R1805W mutation, identified by the presence of two bands at 754bp and 344bp. Three populations (lanes 7, 8 and 12) were positive for both the wild type and R1805W sequences based on the presence of all four bands. Five populations (lanes 3, 4, 6, 9 and 10) were positive for the wild type sequence, indicated by the presence of three bands at 547bp, 344bp and 207bp. PCR products amplified using the R1805W_SPF and R1805W_SPR primers were sequenced using the R1805W_SPF primer to determine the genotype of the cells that correspond to lanes 2, 5, 7 and 11 in figure 4.13.
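The scoring rule applied to each lane can be summarised as a lookup from the observed band pattern to a provisional call. The sketch below is a simplification: it assumes band sizes have already been read from the gel and match the expected sizes exactly, whereas in practice bands are sized against the ladder by eye. The function name and call labels are hypothetical.

```python
WILD_TYPE_BANDS = frozenset({547, 344, 207})
MUTANT_BANDS = frozenset({754, 344})


def call_from_bands(bands: set[int]) -> str:
    """Return a provisional call for a BsrFaI digest band pattern (sizes in bp)."""
    observed = frozenset(bands)
    if observed == WILD_TYPE_BANDS:
        return "wild type"
    if observed == MUTANT_BANDS:
        return "R1805W"
    if observed == WILD_TYPE_BANDS | MUTANT_BANDS:
        return "wild type + R1805W"
    return "uncallable"


lane_2 = call_from_bands({754, 344})            # R1805W
lane_3 = call_from_bands({547, 344, 207})       # wild type
lane_7 = call_from_bands({754, 547, 344, 207})  # both sequences present
```

A digest call is only provisional, since it cannot detect indels elsewhere in the amplicon; this is why the selected populations were also sequenced.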


Figure 4.14 Sequencing Chromatogram of Lane 2 HAP1 Population (figure 4.13). In red, a single peak (T) corresponding to the R1805W mutation. Blue arrow, two overlapping peaks, one the correct SRRM2 sequence and the other a frameshift mutation.

Figure 4.15 Sequencing Chromatogram of Lane 7 HAP1 Population (figure 4.13). In red, a double peak showing the wild type sequence (C) and the mutant (T). In blue, a double peak showing the wild type sequence (G) and the silent PAM mutation (T).

All three HAP1 populations appearing homozygous for the R1805W mutation by the restriction enzyme digest (figure 4.13) were homozygous for R1805W in the sequencing chromatogram, but also had overlapping peaks downstream of the R1805W mutation (figure 4.14), suggesting that an indel and a resulting frameshift mutation had occurred. The population from lane 7 (figure 4.13) that was positive for both the R1805W mutation and the wild type sequence by restriction enzyme digest showed both mutant and wild type peaks for the R1805W mutation and the PAM sequence, with no frameshift mutations (figure 4.15). As HAP1 cells are a haploid cell line, two possibilities exist for the presence of a seemingly diploid karyotype:

1. The cells became diploid prior to the genome engineering event. In the case of the cell lines that were homozygous for the R1805W mutation but also had a frameshift mutation, this would mean the cells contain the R1805W mutation with the correct sequence on one allele while the other allele contains the R1805W mutation plus a frameshift mutation. In the case of the cell population that appeared heterozygous, this cell line may be a true heterozygote with one mutant allele and one wild type allele with no frameshift mutations.

2. During the single cell dilution into 96-well plates, the wells were seeded with more than one cell and each of these populations contains a mixture of cells.

To determine whether the cells became diploid prior to transfection or the 96-well plates were seeded with more than one cell, cells from the ‘heterozygous’ population (figure 4.13, lane 7) and the R1805W population with a frameshift mutation (R1805W+FS) (figure 4.13, lane 2) were single cell diluted into 96-well plates as previously described. Once cells reached ~60% confluency, gDNA extraction, PCR amplification and Sanger sequencing were performed to determine if the ‘heterozygous’ or R1805W+FS cell lines retained their genotype or if a homozygous R1805W cell line could be obtained. This was conducted for 3 ‘heterozygous’ and 3 R1805W+FS colonies from 96-well plates.


Figure 4.16 Representative Sequencing Chromatogram of a Cell Line Originating from the R1805W+FS Cell Line Following Single Cell Dilution. In red: a single homozygous peak for the R1805W mutation. The blue arrow indicates two overlapping peaks, one the correct SRRM2 sequence and the other a frameshift mutation.

Figure 4.17 Representative Sequencing Chromatogram of a Cell Line Originating From the ‘Heterozygous’ Cell Line Following Single Cell Dilution. In red, a double peak showing the wild type sequence (C) and the mutant (T). In blue, a double peak showing the wild type sequence (G) and the silent PAM mutation (T).

Figures 4.16 and 4.17 show sequencing chromatograms obtained from cell lines originating from the R1805W+FS and ‘heterozygous’ cell lines. Both cell lines have the same genotype as the cell line they originated from. This was observed for all sequenced colonies and indicates that the cells became diploid prior to the CRISPR/Cas9 genome engineering event and that they are both heterozygous HAP1 cell lines.

4.3.4 Validation of genome engineering in the S346F polyclonal population

Following FACS of HAP1 cells co-transfected with an ssODN and an S346F sgRNA-expressing pSpCas9 vector, cells were immediately single cell diluted into 96-well plates and left to expand. When cells reached ~60% confluency, gDNA was extracted and a restriction enzyme digestion with ApoI was performed on PCR products amplified from the monoclonal populations.

Figure 4.18 Agarose Gel Electrophoresis Image of ApoI-Digested PCR Products Amplified from gDNA Extracted from Monoclonal HAP1 Cell Lines Transfected with S346F_gRNA1-pSpCas9 and S346F_ssODN1 or S346F_gRNA2-pSpCas9 and S346F_ssODN2. Lane 1: 1kb+ DNA Ladder, Lane 2: wild type control. Lanes 3-9: cells transfected with S346F_gRNA1-pSpCas9 and S346F_ssODN1. Lanes 10-16: cells transfected with S346F_gRNA2-pSpCas9 and S346F_ssODN2. The presence of 1 band at 1kb indicates the wild type sequence. All colonies were positive for the wild type sequence and negative for the S346F sequence.

No colonies were positive for the S346F mutation following the restriction enzyme digestion (figure 4.18), with all colonies showing only the undigested (1 kb) PCR product.

The absence of the digested product suggested that the genome engineering event had been unsuccessful. PCR products amplified from 6 colonies were sequenced to further confirm the absence of the S346F mutation.


Figure 4.19 Representative Sequencing Chromatogram of a S346F_gRNA1-pSpCas9 and S346F_ssODN1 Transfected HAP1 Population. In red, the wild type sequence (C).

Figure 4.20 Representative Sequencing Chromatogram of a S346F_gRNA2-pSpCas9 and S346F_ssODN2 Transfected HAP1 Population. In red, the wild type sequence (C) and marked with a blue arrow, a frameshift mutation.

Sequencing revealed that both S346F_gRNA1-pSpCas9 + S346F_ssODN1 and S346F_gRNA2-pSpCas9 + S346F_ssODN2 transfected cell lines contained either the wild type sequence (figure 4.19) or a frameshift mutation (figure 4.20). The presence of the frameshift mutation together with the double peak suggests that this population of HAP1 cells is also diploid.

4.3.5 Western Blot for SRRM2 Expression in Wild Type and Mutant Cell Lines

Following the generation of the heterozygous HAP1 cell line, subsequently referred to as the mutant HAP1 cell line, the expression of SRRM2 was evaluated in both wild type and mutant (heterozygous) cell lines using a western blot analysis.

4.3.5.1 Quantitation of total protein in wild type and mutant whole lysates

Protein was extracted from wild type and mutant cell lines and total protein in cleared lysates was quantified using the BCA assay. A standard curve was generated from the BSA standards and HAP1 total protein was quantified using the equation generated from the BSA standard trendline, after background absorbance readings were subtracted from the measured protein absorbance.

[Figure 4.21 plot: BSA Standard Curve; x-axis: Protein Concentration (mg/mL); y-axis: Absorbance (562nm)]

Figure 4.21 BSA Standard Curve Generated Using the BCA Assay to Quantify Total Protein in HAP1 Lysates. Total protein in HAP1 lysates was quantified using the following formula: y = 0.5492x + 0.4751, where y = Absorbance at 562nm and x = Protein Concentration (mg/mL). A high coefficient of determination (R² = 0.996) was observed, indicating a strong linear relationship between absorbance at 562nm and protein concentration.
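The use of the standard curve can be sketched as follows. This is a minimal illustration that simply inverts the trendline equation from Figure 4.21; the function name and the example absorbance are hypothetical, and any dilution factor or other adjustment applied to the real lysate readings is not modelled.

```python
def protein_conc_mg_per_ml(absorbance_562, slope=0.5492, intercept=0.4751):
    """Invert the trendline y = slope * x + intercept to recover x (mg/mL)."""
    return (absorbance_562 - intercept) / slope


# Hypothetical background-subtracted absorbance, chosen for illustration only.
example_absorbance = 1.000
example_conc = protein_conc_mg_per_ml(example_absorbance)
print(f"{example_conc:.3f} mg/mL")
```

Readings below the intercept would return a negative concentration, so a sketch like this is only meaningful within the range covered by the BSA standards.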

Table 4.9 Cell Lysate Total Protein Quantitation from Wild type and Mutant HAP1 Cell Lines

              Background Subtracted Absorbance at 562nm
Sample        Replicate 1   Replicate 2   Replicate 3   Average   Protein concentration (mg/mL)
Wild type 1   0.649         0.653         0.662         0.655     0.283
Wild type 2   0.829         0.753         0.853         0.812     0.517
Mutant 1      0.522         0.49          0.538         0.517     0.130
Mutant 2      0.551         0.542         0.548         0.547     0.131

Table 4.9 shows the calculated protein concentrations in the HAP1 cleared lysates for both wild type and mutant cell lines. The average protein concentration for wild type samples 1 and 2 was 0.283 mg/mL and 0.517 mg/mL, respectively; the average protein concentration for mutant samples 1 and 2 was 0.130 mg/mL and 0.131 mg/mL, respectively.

4.3.5.2 Evaluation of SRRM2 protein expression in wild type and mutant HAP1 cell lines

Protein lysates (75 µg) from mutant and wild type HAP1 cell lines were separated on a 7.5% SDS-PAGE gel by electrophoresis and blotted onto a nitrocellulose membrane. Immunodetection of SRRM2 was achieved with the primary rabbit anti-SRRM2 antibody and a secondary goat anti-rabbit antibody. Western blot analysis identified two bands at 300 kDa and 250 kDa corresponding to SRRM2 protein isoforms 1 and 2, respectively, in both wild type (Figure 4.22; 1st panel; lanes 1 and 2) and mutant (Figure 4.22; 1st panel; lanes 3 and 4) cell lines. Expression of SRRM2 isoforms 1 and 2 in wild type and mutant cell lines was similar for all replicates. Tubulin was also detected with the primary mouse anti-tubulin antibody and a secondary goat anti-mouse HRP antibody to assess evenness of loading and transfer (Figure 4.22; panel 2; wild type, lanes 1 and 2; mutant, lanes 3 and 4). Tubulin expression was similar for all replicates in both wild type and mutant cell lines.

Overall, western blot analysis showed that CRISPR/Cas9 genome engineering did not appreciably alter the expression of SRRM2, and expression was equivalent between both cell lines for all replicates. Each lane was loaded and transferred evenly, as evidenced by the similar tubulin bands for all replicates in both cell lines. SRRM2 isoform 1 was the more abundant of the two isoforms present in both HAP1 cell lines, with isoform 2 having a lower expression, as demonstrated by the fainter band present at 250 kDa relative to the stronger isoform 1 band at 300 kDa (Figure 4.22; panel 1; wild type, lanes 1 and 2; mutant, lanes 3 and 4). SRRM2 isoform 3 (34 kDa) was not detected by western blot analysis for wild type or mutant HAP1 cell lines (Figure 4.22; panel 1; wild type, lanes 1 and 2; mutant, lanes 3 and 4), possibly due to the isoform not being expressed in HAP1 cells or being present at a concentration that was undetectable with 75 µg of protein loaded and visualization using a Chemidoc XRS+.


Figure 4.22 Western Blot Analysis for SRRM2 Expression in Wild type and Mutant HAP1 Cell Lines. SRRM2 expression was evaluated in the wild type and mutant (heterozygous) cell lines to determine if CRISPR/Cas9 genome engineering altered the expression of SRRM2. Total protein was extracted from two separate cultures for each cell line and 75 µg of protein was separated on a 7.5% SDS-PAGE gel and then blotted onto a nitrocellulose membrane. SRRM2 expression was evaluated with anti-SRRM2 antibody (first panel), identifying two bands at 300 kDa and 250 kDa that correspond to SRRM2 isoforms 1 and 2, respectively. SRRM2 protein isoform 3 (34 kDa) was not detected in either cell line. Anti-tubulin antibody was used to probe the membrane to assess evenness of loading and transfer (second panel).


4.4 Discussion

4.4.1 CRISPR/Cas9 Genome Engineering Overview

CRISPR/Cas9 genome engineering was employed to generate two mutant cell lines containing SRRM2 mutations associated with FNMTC predisposition: the R1805W mutation, found to co-segregate with FNMTC in the Western Australian FNMTC cohort, and the S346F mutation, previously implicated in FNMTC predisposition [17].

Since gRNAs differ in their ability to induce DSBs in DNA, multiple gRNAs were selected and tested to generate the mutations [116]. For the R1805W mutation, 3 gRNAs were selected; for the S346F mutation, only 2 gRNAs were selected due to limited availability of gRNAs that met the selection criteria (4.2.1.1). For each of the gRNAs selected, a sgRNA-expressing pSpCas9 vector was constructed and a corresponding ssODN containing the desired mutations was purchased. Transfection of the sgRNA-expressing pSpCas9 vectors and ssODNs was achieved using Turbofectin 8.0 and FACS was employed to enrich for transfected cells.

Following single cell dilution, expansion and sequencing of the monoclonal cell lines, heterozygous R1805W cell lines, and homozygous R1805W cell lines with a frameshift (R1805W+FS) mutation on one allele were discovered.

As the HAP1 cell line is haploid, a second round of single cell dilutions was performed for both the heterozygous and R1805W+FS cell lines to determine whether the seemingly diploid karyotype was due to an error in the single cell dilution process. Both the R1805W+FS (figure 4.16) and heterozygous (figure 4.17) cell lines retained their genotype following the single cell dilution process, suggesting that the HAP1 cells became diploid prior to the genome engineering event. For the S346F mutation, CRISPR/Cas9 was unsuccessful in generating the mutation but did induce frameshift mutations in multiple colonies (figure 4.20), suggesting that this population of HAP1 cells also became diploid prior to the genome engineering event.

A western blot was performed on the wild type and heterozygous cell lines to evaluate whether CRISPR/Cas9 genome engineering altered the expression of SRRM2 in the mutant cell line. Two bands corresponding to the two large SRRM2 isoforms (300 kDa and 250 kDa) were observed in all replicates from both cell lines, but the smaller isoform (34 kDa) was not observed in either cell line, possibly due to it not being expressed in HAP1 cells or existing at a concentration too low for this assay (figure 4.22).

4.4.2 Identification of Diploid HAP1 Cell Lines

HAP1 cells have previously been reported to spontaneously become diploid [120]. This was known at the beginning of the genome engineering process; however, following instructions from the HAP1 supplier (Horizon Discovery), transfection was performed promptly after thawing, with the thawed cells allowed to expand to 70% confluency in a 75 cm2 flask before being passaged into 25 cm2 flasks for transfection.

It was recently shown that haploid and diploid HAP1 cells differ in their proliferative rates, with diploids proliferating at a higher rate than their haploid counterparts [125]. A likely explanation for the generation of diploid cell lines following single cell dilution is that diploid colonies were inadvertently selected for the validation of CRISPR/Cas9 genome engineering and haploid cells were excluded. Following single cell dilution, the cells were left to expand and it was noted that some colonies were growing faster than others. Due to time constraints, the faster growing colonies were selected for CRISPR/Cas9 genome engineering validation, while the slower growing colonies were excluded. Given that diploid HAP1 cells grow faster than their haploid counterparts, it seems likely that only diploid colonies were carried forward, resulting in the generation of diploid HAP1 cell lines. This hypothesis could be tested by repeating the experiment and selecting both slow and fast growing colonies to screen for the presence of the genome engineering event. If two overlapping peaks were again observed in the faster growing colonies but only a single peak in the slow growing colonies, this would support the hypothesis; however, it would not confirm the karyotype of the cell lines.

If the same observations were made for both the slow and fast growing colonies, future attempts should first ensure that the cells are haploid prior to transfection. This has reportedly been achieved through single cell sorting of haploid cells by staining with a DNA-binding dye and selecting haploid cells based on the G1 cell cycle peak of the haploid cells relative to the G2 peak of diploid cells [125]. It has also been reported that knock-down of p53 facilitates the maintenance of haploidy by enabling the survival of genetically unstable cells, a feature more common in haploid cells than in diploid cells, and therefore increases the fitness of haploid cells relative to diploid cells and prevents their gradual decline in the population [125].

HAP1 cells were selected for this study due to their haploid nature. This is a major advantage for functional genetic studies, as only a single allele needs to be mutated to produce a desired phenotype [120]. This is illustrated by the plethora of studies conducted on haploid yeast cells [109]. In the case that haploid cells cannot be isolated following the genome engineering event, an alternative cell line may be a better option for CRISPR/Cas9 genome engineering, such as haploid embryonic stem cells, if the protocols for differentiation into the desired cell type can be readily and reproducibly established [126]. This would likely also be more clinically relevant, as it would allow for the study of disease-causing mutations in the cell type in which the disease originates.

4.4.3 The Presence of Frameshift Mutations in HAP1 Cell Lines following CRISPR/Cas9 Genome Engineering

A major question to be resolved is why frameshift mutations were observed in the cell lines following CRISPR/Cas9 genome engineering. To prevent frameshift mutations, silent mutations are placed in the PAM sequence of the ssODN, destroying the PAM sequence and preventing the sgRNA from localizing Cas9 to the genomic region for cleavage after the genome engineering event [116]. The PAM sequence for the Cas9 nuclease used in this study is NGG [124], and each PAM was removed from the ssODN by the introduction of a silent mutation (figures 4.5 and 4.6). However, Cas9 still induced cleavage at the site after the desired genome engineering event, as evidenced by the homozygous R1805W cell line with the frameshift mutation.
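The intended effect of a PAM mutation can be illustrated with a short sketch. The sequences below are hypothetical and are not the SRRM2 target sites or ssODNs used in this study; the function only checks whether a protospacer is immediately followed by an NGG motif, and whether a given base change is silent depends on the reading frame, which is not modelled here.

```python
import re


def find_cas9_sites(dna: str, protospacer: str) -> list[int]:
    """Return start indices where the protospacer is directly followed by NGG."""
    pattern = re.compile(f"(?={re.escape(protospacer)}[ACGT]GG)")
    return [match.start() for match in pattern.finditer(dna.upper())]


# Hypothetical 20-nt protospacer and flanking sequence (illustration only).
protospacer = "GACCTGAAGTCCATCGGTAC"
genomic = "TTAGC" + protospacer + "TGG" + "CATTA"  # intact NGG PAM
donor = "TTAGC" + protospacer + "TGA" + "CATTA"    # PAM altered by one base

print(find_cas9_sites(genomic, protospacer))  # site found in the unedited sequence
print(find_cas9_sites(donor, protospacer))    # no site once the PAM is destroyed
```

Under this simple model the edited allele is no longer a target, which is the expected outcome; the frameshift observed on the R1805W allele shows that the real system did not behave this cleanly.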

An alternative to making a silent mutation in the PAM sequence is to make multiple silent mutations in the gRNA binding sequence of the ssODN, preventing the gRNA from binding to the genomic locus in the same way that the PAM mutation is intended to function [116]. This may be a more viable alternative to PAM mutations, as one can be sure that the gRNA will have little-to-no affinity for the genomic locus if enough bases in the gRNA binding sequence are altered.

4.4.4 Failure to Generate a S346F Mutant Cell Line

CRISPR/Cas9 genome engineering was unsuccessful in generating a S346F mutant HAP1 cell line. Unlike the R1805W attempts, no S346F mutations were observed in any colony, with or without frameshift mutations. The key difference between the R1805W and S346F genome engineering attempts was that for the R1805W mutation, both the gRNAs and ssODN were homologous to the same strand of genomic DNA (forward strand), whereas for the S346F mutation, the gRNAs were homologous to the reverse strand while the ssODN was homologous to the forward strand. ssODNs can be designed for either strand and many researchers will try both strands when trying to generate a mutant cell line. Since the S346F gRNAs and ssODNs were complementary to one another, it is possible that Cas9 cleaved the ssODN and prevented the desired genome engineering event, although the PAM mutations should have prevented this from occurring. The alternative is that the S346F ssODNs were less effective than the R1805W ssODNs. This may have been the case, as the S346F ssODNs contained more silent mutations than the R1805W ssODNs (3 for S346F, 2 for R1805W), and therefore more mismatches to the genome sequence. Additionally, the cut sites were further away from the S346F codon than they were from the R1805W codon (S346F: 22 bp and 21 bp; R1805W: 7 bp and 8 bp, for the two gRNAs that worked).

4.4.5 Conclusion

The aim of this chapter was to establish the protocols for CRISPR/Cas9 genome engineering and employ this tool to generate mutant cell lines containing SRRM2 mutations that have been associated with FNMTC. CRISPR/Cas9 was successfully implemented to generate a heterozygous cell line; however, a homozygous cell line is more desirable for studying the functional consequences of a mutation. That said, the FNMTC patients carrying R1805W mutations, as well as the patients carrying the S346F mutations [17], are all heterozygous for the mutation and wild type sequence, suggesting that if SRRM2 is a susceptibility gene for FNMTC, then the underlying oncogenic processes must occur within a heterozygous cell. It was in this context that the decision was made to perform functional studies on the heterozygous cell line.

Overall, the research presented in this chapter has provided a solid framework for future attempts at CRISPR/Cas9 genome engineering using the HAP1 cell line and generated a mutant cell line to study FNMTC predisposition. In the following chapters, the heterozygous R1805W HAP1 cell line is used as a model system to study R1805W in the context of FNMTC, and is subsequently referred to as the mutant cell line.


Chapter 5 – Quantitative Real-Time PCR for the Characterisation of Alternative Splicing Events in the Wild Type and Mutant R1805W HAP1 Cell Lines

5.1 Introduction

The two-step reverse-transcription (RT), fluorescence-based quantitative real-time PCR (RT-qPCR) method combines traditional PCR with fluorescence-based detection of PCR amplification for the quantification of RNA abundance within a biological sample [127]. The process functions by first reverse transcribing RNA into complementary DNA (cDNA) using oligo-dT and random hexamer primers to prime polyadenylated and other regions of RNA transcripts, respectively [128]. Template RNA can be degraded by Ribonuclease H (RNase H), which detects and cleaves RNA contained within DNA:RNA hybrids, leaving a pool of cDNA for the qPCR reaction that is representative of the transcriptome in terms of the RNA molecules present and their relative abundance [129].

Once a pool of cDNA is generated, PCR amplification is monitored in real time at every cycle of the PCR reaction by measuring the increase in fluorescence intensity [127]. SYBR Green I binds to the minor groove of double-stranded DNA and fluoresces upon binding. The fluorescence is detected at each cycle, allowing amplification to be tracked throughout the PCR reaction [130].


For gene expression analyses, the relative quantity of a target transcript, from different biological samples or replicate runs of the same sample, can be determined by comparison to a housekeeping gene that is quantified in parallel to the target transcript and originates from the same sample [115]. Housekeeping genes, also known as reference genes, are endogenous controls that fulfil the criterion of unregulated expression independent of the experimental condition, and can therefore be used to normalize RT-qPCR data by controlling for variation in experimental conditions, such as the biological source of the RNA sample, the quantity of starting material, or PCR amplification efficiency [127].

Tomsic et al. (2016) performed RNA-seq to identify changes in the splicing pattern of FNMTC patients harboring the S346F mutation and noted differences in the alternative splicing pattern of 1,642 exons, 7 of which were experimentally verified using semi-quantitative PCR (CAMKK2, CDC16, CTNNA1, FBXW4, HBP1, PIM2, SPPL3). These results suggest that the S346F mutation alters the normal splicing function of SRRM2 by reducing its ability to incorporate differentially spliced exons into mature mRNA [17].

Hypothesis: The R1805W mutation alters the normal splicing function of SRRM2 by reducing its ability to facilitate the incorporation of differentially spliced exons into mature mRNA.

The aim of this chapter is to investigate the alternative splicing patterns of the 7 exons previously shown to be differentially spliced in FNMTC patients with the S346F mutation using the wild type and mutant HAP1 cell lines and RT-qPCR.


5.2 Method

Once wild type and mutant HAP1 cell cultures reached 70 % confluency in 25 cm2 flasks, they were trypsinised and counted, and 280,000 cells were passaged into 6 x 25 cm2 flasks, creating 3 biological replicates for both the wild type and mutant HAP1 cell lines. RNA extraction, cDNA synthesis and RT-qPCR were performed for all 3 biological replicates of both the wild type and mutant HAP1 cell lines.

5.2.1 Reverse-Transcription of RNA into cDNA

Complementary DNA (cDNA) was generated from the extracted RNA using the Quantitect Reverse Transcription kit (Qiagen). The kit contains reagents for two different reactions: the first eliminates gDNA and the second generates cDNA.

5.2.1.1 Genomic DNA Elimination

The genomic DNA elimination reaction was performed with 2 µg of RNA in a total volume of 28 µL with the reaction components shown in table 5.1.

Table 5.1 Reaction Components for gDNA Elimination

Component               Volume/Reaction
gDNA Wipeout Buffer     4 µL
Template RNA (2 µg)     3-7 µL
RNase-Free Water        18-21 µL

The reactions were incubated for 2 minutes at 42ºC in a Bio-Rad T100 thermocycler and then placed immediately on ice.
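The template and water volumes for a given RNA sample follow from its measured concentration. The sketch below is an illustration under stated assumptions: the function name is hypothetical, the 2 µg input, 4 µL buffer and 28 µL total are taken from table 5.1, and the example concentration is the first wild type reading reported in table 5.5.

```python
def gdna_elimination_volumes(rna_ng_per_ul, rna_mass_ng=2000.0,
                             total_ul=28.0, wipeout_ul=4.0):
    """Return (RNA volume, water volume) in µL for one gDNA elimination reaction."""
    rna_ul = rna_mass_ng / rna_ng_per_ul
    water_ul = total_ul - wipeout_ul - rna_ul
    if water_ul < 0:
        raise ValueError("RNA is too dilute to fit 2 µg into the reaction volume")
    return rna_ul, water_ul


rna_ul, water_ul = gdna_elimination_volumes(540.3)
print(f"RNA: {rna_ul:.2f} µL, water: {water_ul:.2f} µL")
```

For a sample at 540.3 ng/µL this gives roughly 3.7 µL of RNA and 20.3 µL of water, which sits inside the ranges listed in table 5.1.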

5.2.1.2 Reverse Transcription of cDNA

The reverse transcription reactions were performed in a total volume of 40 µL, with the template RNA being added last. The reaction components are shown below in table 5.2.

Table 5.2 Reaction Components for Reverse Transcription of cDNA

Component                                                             Volume/Reaction
Quantiscript Reverse Transcriptase                                    2 µL
Quantiscript RT Buffer                                                8 µL
RT Primer Mix                                                         2 µL
Template RNA: entire genomic DNA elimination reaction (5.2.1.1)       28 µL

The reactions were incubated at 42ºC for 15 minutes and 95ºC for 3 minutes to synthesize cDNA and to inactivate Quantiscript Reverse Transcriptase, respectively.

5.2.2 Primer Design for Amplification of Alternatively-Spliced Transcripts

The qPCR experiment aimed to quantify the relative abundance of two transcripts from 7 genes (CAMKK2, CDC16, CTNNA1, FBXW4, HBP1, PIM2, SPPL3). For each gene, the two transcripts differed by the presence or absence of an exon, the alternatively spliced or variable exon, and were subsequently named the exon-included (EI) transcript, where the transcript contains the exon, or the exon-skipped (ES) transcript, where the transcript does not contain the exon. Primers were designed to span exon-exon junctions to specifically amplify the EI or ES transcript.
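The principle of junction-spanning primers can be sketched schematically. Everything in this example is hypothetical: the exon sequences are invented, the helper name and the 12 + 8 nucleotide split across the junction are illustrative choices, and real primer design also has to account for melting temperature, secondary structure and 3' end specificity.

```python
def junction_primer(upstream_exon, downstream_exon, left=12, right=8):
    """Build a forward primer from the end of one exon and the start of the next."""
    return upstream_exon[-left:] + downstream_exon[:right]


# Invented exon sequences; exon 3 plays the role of the variable exon.
exon2 = "ATGGCCAAGCTGACCTCAGG"
exon3 = "TTCGATCCGGAATTAGCCTA"
exon4 = "CAGTGGCATTCGAGGACTAA"

ei_transcript = exon2 + exon3 + exon4  # exon-included
es_transcript = exon2 + exon4          # exon-skipped

ei_primer = junction_primer(exon2, exon3)  # spans the exon 2/3 junction
es_primer = junction_primer(exon2, exon4)  # spans the exon 2/4 junction

print(ei_primer in ei_transcript, ei_primer in es_transcript)
print(es_primer in es_transcript, es_primer in ei_transcript)
```

Each primer matches only the transcript that contains its junction, which is the property that lets the EI and ES transcripts be quantified separately.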

Primers were also designed for two housekeeping genes (18S rRNA and HPRT) to normalize the qPCR data. 18S rRNA and HPRT were selected as they have previously been shown to have consistent expression in qPCR experiments using HAP1 cell lines [131].


Figure 5.1 Primer Design for PCR Amplification of Transcripts for Quantitation of Exon Skipping [130].* A schematic showing a pre-mRNA molecule containing 4 exons and producing 2 transcripts that differ by the inclusion or exclusion of an exon (exon 3). The paired arrows indicate primer pairs that will specifically amplify the desired transcript, with red arrows being forward primers spanning exon-exon junctions and blue arrows indicating reverse primers which do not span exon-exon junctions. * Adapted from Harvey et al. 2016

5.2.3 Quantitative PCR Analysis of exon-included and exon-skipped transcripts

The qPCR reactions were performed in a total volume of 25 µL in 4 x 96 well plates using a QuantStudio 6 quantitative thermal cycler (ThermoFisher). qPCR was performed with three technical replicates from cDNA generated from all biological replicates. Included in the qPCR analysis were 14 no template controls (NTC), one for each of the primer pairs, with no cDNA template added.

5.2.3.1 Preparation of primer mixtures

Primer mixtures for each EI and ES transcript and housekeeping genes were prepared by mixing 100 µL of 10 µM of each forward and reverse primer together in a total volume of 200 µL.

5.2.3.2 Preparation of PCR mastermix

A master mix containing the components of the real-time reactions minus primers was made with cDNA produced from wild type or mutant RNA, with an additional 15% to allow for wastage, as shown below:

Table 5.3 Reaction Components for the qPCR Master Mix for each 96-well plate

Component             Volume/Reaction   Master Mix (96 reactions + 15%)
SYBR Green            12.5 µL           1380 µL
cDNA template         2 µL              220 µL
Nuclease-free water   9 µL              994 µL
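The master mix volumes are the per-reaction volumes scaled by 96 reactions plus 15% overage. A minimal sketch of that arithmetic is given below; the function and dictionary names are hypothetical.

```python
def master_mix(per_reaction_ul, n_reactions=96, overage=0.15):
    """Scale per-reaction volumes (µL) to a master mix with fractional overage."""
    scale = n_reactions * (1 + overage)
    return {name: volume * scale for name, volume in per_reaction_ul.items()}


per_reaction = {"SYBR Green": 12.5, "cDNA template": 2.0, "Nuclease-free water": 9.0}
mix = master_mix(per_reaction)
for name, volume in mix.items():
    print(f"{name}: {volume:.1f} µL")
```

Before rounding this gives 1380.0 µL, 220.8 µL and 993.6 µL, close to the volumes listed in table 5.3.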

5.2.3.3 Preparation of 96-well plates for qPCR

1.5 µL of the primer mixtures (0.3 µM of each primer in the 25 µL qPCR reaction) were added to the corresponding wells of 96-well plates. 23.5 µL of the master mix was then added to each well using a multichannel pipette, with the master mix containing the wild type cDNA template pipetted into the associated wild type wells and the master mix containing the mutant cDNA pipetted into the associated mutant wells. The primers and master mix were then pipetted up and down twice to mix. The plate was spun in a bench top centrifuge with a 96-well plate rotor for 30 seconds at 1000 x g and wells were checked for the absence of bubbles.
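The stated final primer concentration can be checked with the dilution relationship C1V1 = C2V2, using the primer mixture volumes from 5.2.3.1. The function name below is a hypothetical helper for illustration.

```python
def diluted_conc(stock_conc_um, stock_volume_ul, final_volume_ul):
    """Apply C1 * V1 = C2 * V2 and return C2 in µM."""
    return stock_conc_um * stock_volume_ul / final_volume_ul


# 100 µL of each 10 µM primer combined in a total volume of 200 µL.
primer_mix_um = diluted_conc(10.0, 100.0, 200.0)

# 1.5 µL of that mixture in a 25 µL qPCR reaction.
final_primer_um = diluted_conc(primer_mix_um, 1.5, 25.0)

print(f"primer mix: {primer_mix_um:.1f} µM, in reaction: {final_primer_um:.2f} µM")
```

Each primer is at 5 µM in the mixture and 0.3 µM in the reaction, in agreement with the value stated above.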

5.2.3.4 qPCR reaction and melt curve analysis

The 96-well plates were loaded into the QuantStudio 6 thermal cycler and the qPCR reaction was completed with the following cycling conditions:

Table 5.4 Cycling conditions for the qPCR Analysis for Alternative Splicing

Segment   Cycles   Temperature (ºC)   Time
1         1        95                 15 minutes
2         40       94                 15 seconds
                   60                 30 seconds
                   72                 30 seconds

After the qPCR run, a melt curve analysis was performed to check the specificity of the PCR products and ensure no off-target amplification occurred.

5.2.3.5 qPCR Data Analysis and Statistical Analysis

After the qPCR runs were completed, Ct values were calculated from a threshold positioned on the logarithmic amplification plot above the negative control background fluorescence and within the exponential phase of amplification. The Ct values were imported from the QuantStudio 6 software into qbase+ 3.1 (Biogazelle) for normalization and analysis. Ct values were converted to calibrated normalized relative quantities (CNRQ) in qbase+, using the 18S rRNA and HPRT housekeeping genes, to account for template quantity differences between biological replicates.
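The idea behind normalization to multiple reference genes can be sketched as follows. This is a simplified illustration and not the qbase+ implementation: it assumes 100% amplification efficiency (a base of 2), scales each gene to its mean Ct across samples, omits inter-run calibration, and uses invented Ct values.

```python
from statistics import geometric_mean, mean


def relative_quantities(cts, efficiency=2.0):
    """Convert Ct values to quantities relative to the mean Ct of the gene."""
    reference_ct = mean(cts)
    return [efficiency ** (reference_ct - ct) for ct in cts]


def normalized_relative_quantities(target_cts, reference_cts, efficiency=2.0):
    """Divide target quantities by the geometric mean of the reference gene quantities."""
    target_rq = relative_quantities(target_cts, efficiency)
    reference_rq = [relative_quantities(cts, efficiency) for cts in reference_cts.values()]
    return [
        rq / geometric_mean([ref[i] for ref in reference_rq])
        for i, rq in enumerate(target_rq)
    ]


# Invented Ct values for three samples that differ only in the amount of template.
target = [24.0, 25.0, 26.0]
references = {"18S rRNA": [10.0, 11.0, 12.0], "HPRT": [22.0, 23.0, 24.0]}
nrq = normalized_relative_quantities(target, references)
print(nrq)
```

Because the target and both reference genes shift by the same number of cycles in this example, the normalized quantities are all 1, showing how differences in input quantity are removed.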

Statistically significant differences between wild type and mutant transcript expression were determined using a two-tailed unpaired Student's t-test on CNRQ values. Data were considered statistically significant for p < 0.05. Statistical analysis was performed using GraphPad Prism 7 software.
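For reference, the test statistic for an unpaired comparison of two groups of three replicates can be computed as in the sketch below. It assumes equal variances (the pooled form of Student's t-test), uses invented CNRQ values, and compares the statistic against the two-tailed critical value for 4 degrees of freedom at p = 0.05 (2.776) instead of computing an exact p-value.

```python
from math import sqrt
from statistics import mean, variance


def unpaired_t(group_a, group_b):
    """Return (t statistic, degrees of freedom) for a pooled-variance t-test."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    t_stat = (mean(group_a) - mean(group_b)) / sqrt(pooled * (1 / n_a + 1 / n_b))
    return t_stat, df


# Invented CNRQ values for three biological replicates per cell line.
wild_type = [0.90, 1.00, 1.10]
mutant = [0.95, 1.05, 1.15]

t_stat, df = unpaired_t(wild_type, mutant)
T_CRITICAL_DF4 = 2.776  # two-tailed, alpha = 0.05, 4 degrees of freedom
print(f"t = {t_stat:.3f}, df = {df}, significant = {abs(t_stat) > T_CRITICAL_DF4}")
```

With only three replicates per group, the difference between means has to be large relative to the spread of the replicates before the statistic exceeds the critical value.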

5.3 Results

5.3.1 RNA Extraction

Extracted RNA concentrations ranged from 495.3 ng/µL to 634.9 ng/µL.

Table 5.5 Concentration and purity of RNA extracted from wild type biological replicates

Biological Replicate   Repeat   Concentration (ng/µL)   A260/A230   A260/A280
1                      1        540.3                   2.01        1.94
1                      2        531.2                   2.01        1.94
2                      1        624.9                   2.05        2.10
2                      2        615.5                   2.04        2.09
3                      1        501.1                   1.98        2.01
3                      2        495.3                   1.98        2.01

Regarding RNA quality, the A260/A230 ratios ranged from 1.94 to 2.10, suggesting low salt contamination. The A260/A280 ratios, a measure of protein and DNA contamination, ranged from 1.94 to 2.11, suggesting little protein or DNA contamination.

Table 5.6 Concentration and purity of RNA extracted from mutant biological replicates

Biological Replicate   Repeat   Concentration (ng/µL)   A260/A230   A260/A280
1                      1        624.8                   1.99        2.03
1                      2        614.1                   1.99        2.03
2                      1        634.9                   2.10        2.10
2                      2        625.5                   2.10        2.09
3                      1        601.2                   1.94        2.11
3                      2        588.3                   1.94        2.11

5.3.2 Agarose Bleach Gel Electrophoresis for Evaluation of RNA integrity

When assessing the integrity of eukaryotic RNA by gel electrophoresis, the appearance of three distinct bands indicates high quality RNA. The top band represents 28S rRNA, the second represents 18S rRNA and the third band represents 5.8S and 5S rRNA. A 28S:18S band intensity ratio < 2:1 or smeared bands indicate degraded RNA.

Figure 5.2 RNA Extracted from Wild Type and Mutant HAP1 Biological Replicates on a 1% TAE and 1.2% v/v Bleach Agarose Gel. Lanes 1-3, wild type biological replicates 1-3. Lanes 4-6, mutant biological replicates 1-3. Lane 7, NTC. Each lane was loaded with 3µg of RNA and 1 x loading buffer and electrophoresed for 1 hour at 100V. 28S and 18S (rRNA) bands are present in all samples but 5.8S/5S rRNA bands are absent. 28S rRNA is roughly twice as intense as 18S, as is observed with non-degraded RNA samples.

28S and 18S rRNA bands (figure 5.2) were clearly visible for all biological replicates and absent from the NTC. The 28S band appears to be twice as intense as the 18S band, indicating intact RNA. The 5.8S and 5S bands were absent from the gel for all biological replicates, possibly due to a low concentration of RNA run on the gel.

5.3.3 qPCR analysis of alternative splicing events on the QuantStudio 6

Prior to normalization of transcript expression, a melt curve analysis was performed to check the specificity of the primers. This was performed by plotting fluorescence as a function of temperature. A single peak was observed for all transcripts, suggesting that no non-specific amplification had occurred (data not shown). Cycle threshold (Ct) values were exported from the QuantStudio 6 software and the data were analysed using the qbase+ software (Biogazelle). CNRQ values were generated from Ct values for wild type and mutant transcripts by normalization to the 18S rRNA and HPRT housekeeping genes. No fluorescence greater than the background was observed for the NTCs.

Table 5.7 Summary of RT-qPCR Data for Alternative Splicing Analysis

              Wild type                        Mutant
Transcript    CNRQ Mean   95% CI               CNRQ Mean   95% CI             P-value
CAMKK2 EI     0.927       (0.658, 1.306)       1.079       (0.90, 1.306)      0.189
CAMKK2 ES     0.926       (0.557, 1.538)       1.08        (0.612, 1.904)     0.435
CDC16 EI      0.947       (0.647, 1.385)       1.056       (0.403, 2.009)     0.360
CDC16 ES      1.026       (0.699, 1.508)       0.974       (0.504, 1.882)     0.786
CTNNA1 EI     0.956       (0.584, 1.567)       1.046       (0.709, 1.543)     0.576
CTNNA1 ES     0.952       (0.715, 1.267)       1.051       (0.748, 1.476)     0.394
FBXW4 EI      1.007       (0.898, 1.129)       0.993       (0.841, 1.173)     0.794
FBXW4 ES      1.111       (0.755, 1.634)       0.9         (0.403, 2.009)     0.388
HBP1 EI       0.99        (0.652, 1.503)       1.01        (0.878, 1.162)     0.863
HBP1 ES       0.794       (0.107, 5.892)       1.26        (0.742, 2.137)     0.428
PIM2 EI       0.984       (0.169, 5.723)       1.016       (0.374, 2.763)     0.949
PIM2 ES       1.302       (0.098, 10.898)      0.969       (2.206, 4.569)     0.929
SPPL3 EI      0.963       (0.128, 7.219)       1.039       (0.261, 4.135)     0.9
SPPL3 ES      0.964       (0.066, 14.074)      1.037       (0.13, 8.27)       0.931

CNRQ, Calibrated normalized relative quantities. Linear Data. P-values calculated using unpaired Student's t-test.


Figure 5.3 1D Plots Showing Log Relative Quantities of RT-qPCR Generated from Wild type and Mutant HAP1 Cell Lines. A) CDC16 EI transcript. B) CDC16 ES Transcript. C) CAMKK2 EI transcript. D) CAMKK2 ES transcript. E) CTNNA1 EI transcript. F) CTNNA1 ES transcript. Error bars show +/- 95% CI. The middle bar is the mean for each biological replicate (n=3).


Figure 5.4 1D Plots Showing Log Relative Quantities of RT-qPCR Generated from Wild type and Mutant HAP1 Cell Lines. A) FBXW4 EI transcript. B) FBXW4 ES Transcript. C) PIM2 EI transcript. D) PIM2 ES transcript. E) SPPL3 EI transcript. F) SPPL3 ES transcript. Error bars show +/- 95% CI. The middle bar is the mean for each biological replicate (n=3).


Figure 5.5 1D Scatter Plots Showing Log Relative Quantities of RT-qPCR Generated from Wild type and Mutant HAP1 Cell Lines. A) HBP1 EI transcript. B) HBP1 ES Transcript. Error bars show +/- 95% CI. The middle bar is the mean for each biological replicate (n=3).

No significant differences were found in the relative expression levels of the EI or ES transcripts between the two cell lines (Table 5.7). However, some variability in the normalized data between biological replicates was observed, as evidenced by the large 95% confidence intervals, specifically for wild type and mutant PIM2 EI, PIM2 ES, SPPL3 EI and SPPL3 ES (figure 5.4), and wild type HBP1 ES (figure 5.5), suggesting that for these transcripts, RT-qPCR had been unsuccessful in precisely quantifying transcript expression.

5.4 Discussion

RT-qPCR was performed to quantify the splicing patterns of 7 exons previously shown to be differentially spliced in FNMTC patients heterozygous for the S346F mutation [17]. For each gene, two transcripts were quantified, the exon-included (EI) transcript and the exon-skipped (ES) transcript, which differ by the presence or absence of the alternatively spliced exon. Primers were designed to span exon-exon junctions to prevent the amplification of unwanted templates. RNA was extracted from 3 wild type and 3 mutant HAP1 cell cultures and cDNA was generated using the QuantiTect Reverse Transcription kit. From the cDNA generated, RT-qPCR was performed on the QuantStudio 6 using SYBR Green I to detect PCR amplification.

The raw Ct values were analysed with the qbase+ software and normalized to the 18S rRNA and HPRT housekeeping genes to generate CNRQ values. Statistical analysis was performed using an unpaired Student's t-test. Overall, no significant difference was observed between the wild type and mutant cell lines for any of the transcripts quantified (Table 5.7). Variability in the CNRQ values between biological replicates was observed, specifically for wild type and mutant PIM2 EI, PIM2 ES, SPPL3 EI and SPPL3 ES, and wild type HBP1 ES (Figures 5.4 and 5.5), as shown by the wide 95% confidence intervals. Interestingly, for each of the transcripts showing large variation between biological replicates, both the wild type and mutant cell lines were affected (except HBP1 EI), suggesting that the variation has a common cause. This could be due to suboptimal primers, in which case a different primer pair may resolve the problem, or to low expression of the target transcript.
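The normalization step described above can be illustrated with a minimal sketch. It is not the qbase+ algorithm, which also applies gene-specific amplification efficiencies and inter-run calibration; here an efficiency of 2 (100%) is assumed, and the function names and Ct values are hypothetical. The target relative quantity is divided by the geometric mean of the reference-gene relative quantities.

```python
from math import prod

def relative_quantity(ct, ct_calibrator, efficiency=2.0):
    """Relative quantity E**(calibrator Ct - sample Ct); E = 2 assumes 100% efficiency."""
    return efficiency ** (ct_calibrator - ct)

def normalised_relative_quantity(target_rq, reference_rqs):
    """Divide the target RQ by the geometric mean of the reference-gene RQs."""
    geo_mean = prod(reference_rqs) ** (1.0 / len(reference_rqs))
    return target_rq / geo_mean

# Hypothetical Ct values for one sample (not data from this study).
target_rq = relative_quantity(ct=26.0, ct_calibrator=27.0)
reference_rqs = [
    relative_quantity(ct=12.0, ct_calibrator=12.0),  # stands in for 18S rRNA
    relative_quantity(ct=22.0, ct_calibrator=24.0),  # stands in for HPRT
]
nrq = normalised_relative_quantity(target_rq, reference_rqs)
```

Using the geometric rather than the arithmetic mean keeps a single highly expressed reference gene from dominating the normalization factor.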

The aim of this chapter was to investigate the alternative splicing patterns of the 7 exons previously shown to be differentially spliced in FNMTC patients with the S346F mutation, using the wild type and mutant HAP1 cell lines and RT-qPCR. The aims were largely met, as most of the transcripts showed similar CNRQ values across the biological replicates. However, for those transcripts that showed large variation, it cannot be concluded from this study whether there was a difference in splicing. Had the generation of the S346F mutation been successful, it would have served as a positive control for this experiment, as it was previously shown to alter the splicing of the exons assessed in this chapter.

Tomsic et al. (2016) performed RNA-Seq to identify exons that were differentially spliced between S346F FNMTC patients and controls, and experimentally confirmed 7 of 1,642 exons with semi-quantitative PCR [17]. To confirm whether SRRM2 mutations alter the splicing pattern of the cell lines, a high-throughput technique such as RNA-Seq should be performed, as it provides data relating to the whole transcriptome rather than only the 14 transcripts assessed in this chapter. Future experiments could be conducted similarly, with RNA-Seq to identify differentially spliced exons between the two cell lines and RT-qPCR to experimentally verify any changes observed. The major advantage of performing this kind of study on a cultured cell line rather than on FNMTC patients is that the variation between individuals is essentially controlled, so any differences in splicing can be attributed to the R1805W mutation rather than to genetic variation between individuals.

Chapter 6 - Flow Cytometric Characterisation of the Wild Type and R1805W Mutant HAP1 Cell Lines' Growth Kinetics

6.1 Introduction

Flow cytometry is a biophysical technology employed to evaluate characteristics of homogeneous or heterogeneous cell populations [132]. As cells pass through a light source, characteristics such as cell size and complexity/granularity can be determined from the scattering of light, which is directly proportional to the structural and morphological properties of the cell. Scattering of light in the direction of the incident light source (forward scatter, FSC) is a measure of cell size, whereas the scattering of light at 90º to the path of the incident light (side scatter, SSC) is a measure of cellular complexity/granularity. Additionally, fluorescence derived from antibodies or fluorescent probes can be measured using flow cytometry to evaluate phenotypic traits associated with the fluorescence signal, as the amount of fluorescence emitted following excitation by a light source is proportional to the amount of fluorescent probe bound to the cell or internalized within it.


Figure 6.1 Scattering of Light in Flow Cytometry. A cell passes through a light source as it flows in a stream. FSC is proportional to cellular size while SSC is proportional to cellular complexity/granularity [132]

The cell cycle is characterised by 3 phases: G0/G1 (Gap1), S phase (DNA synthesis) and G2/M (Gap2/Mitosis). As a cell progresses from G0/G1 to G2/M, its DNA content doubles, with cells in G2/M having twice as much DNA as those in G0/G1, and cells in S phase having an intermediate quantity of DNA. DNA binding dyes such as propidium iodide (PI) can be used in conjunction with flow cytometry to quantify the percentages of cells in each of the cell cycle phases, as PI binds stoichiometrically to DNA and fluoresces with an intensity proportional to DNA content. Cell cycle analysis can be performed to determine oncogenic properties of cell populations, as many oncogenic processes involve alterations in G0/G1 regulators, resulting in an increased proportion of cells in the G2/M phase of the cell cycle and increased cellular proliferation (ref).
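Because PI fluorescence scales with DNA content, the phase of a cell can be read from its signal relative to the G0/G1 peak. The sketch below illustrates this with fixed windows around 1x and 2x the G0/G1 peak position. It is a simplification for illustration only: analysis software fits mathematical models to the overlapping peaks rather than applying hard cut-offs, and the tolerance, function names and event values are assumptions.

```python
def assign_phase(pi_signal, g1_peak, tolerance=0.15):
    """Classify one event by DNA content relative to the G0/G1 peak position."""
    ratio = pi_signal / g1_peak
    if ratio < 1.0 - tolerance:
        return "sub-G1"          # less DNA than a G0/G1 cell
    if ratio <= 1.0 + tolerance:
        return "G0/G1"           # about 1x DNA content
    if ratio < 2.0 - tolerance:
        return "S"               # intermediate DNA content
    if ratio <= 2.0 + tolerance:
        return "G2/M"            # about 2x DNA content
    return ">G2/M"               # clumps or cells with extra DNA

def phase_percentages(events, g1_peak):
    """Percentage of events assigned to each phase."""
    counts = {}
    for signal in events:
        phase = assign_phase(signal, g1_peak)
        counts[phase] = counts.get(phase, 0) + 1
    return {phase: 100.0 * n / len(events) for phase, n in counts.items()}

# Hypothetical PI intensities (arbitrary units), G0/G1 peak assumed at 50.
example_events = [50, 52, 48, 75, 80, 100, 98, 51, 49, 70]
example_percentages = phase_percentages(example_events, g1_peak=50.0)
```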


Figure 6.2 Cell Proliferation Analysis Using the Dye Dilution Method. Each time a cell divides, the daughter cells receive half the dye of the parent generation (A), resulting in a decreasing fluorescence intensity of subsequent generations by a factor of 2 (B). *Obtained from the CellTrace™ Cell Proliferation Kit manual (Thermo Fisher).

In addition to cell cycle analysis using a DNA binding dye, cellular proliferation can be measured by dye dilution and flow cytometry. Carboxyfluorescein succinimidyl ester (CFSE) is a cell-permeable fluorescent dye that binds intracellular proteins and is retained within cells for long periods of time. When a cell divides, each daughter cell receives half the amount of CFSE that the parent cell contained, allowing successive generations within a population to be identified from the decrease (by a factor of 2) in fluorescence intensity. Proliferation assays using dyes such as CFSE, in conjunction with flow cytometry, allow oncogenic characteristics of cells to be identified by determining the number of proliferative events that have occurred within a given population during a defined period.
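Since the dye halves at each division, the number of divisions a cell has completed can be estimated as the base-2 logarithm of the ratio between the generation-zero intensity and the cell's own intensity. The following sketch assumes ideal halving and ignores autofluorescence and peak broadening; the function name and intensities are hypothetical.

```python
from math import log2

def generation_number(intensity, generation_zero_intensity):
    """Estimate completed divisions from CFSE intensity (halved at each division)."""
    divisions = log2(generation_zero_intensity / intensity)
    # Round to the nearest whole generation; brighter-than-parent events count as 0.
    return max(0, round(divisions))
```

For example, an event with one eighth of the generation-zero intensity is assigned to generation 3.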


The hypothesis of this chapter is: The R1805W mutation prevents proper regulation of the cell cycle and the mutant cell line will exhibit greater proliferative characteristics relative to the wild type.

The aims of this chapter are: To employ flow cytometry techniques to characterise the wild type and mutant HAP1 cell lines in general, and specifically to characterise the proliferative characteristics of the cell lines to determine if the mutant exhibits increased oncogenic characteristics relative to the wild type.

6.2 Method

This chapter involved the use of multiple flow cytometers at the CMCA. Flow cytometry experiments were conducted with the BD FACSCanto II, BD LSRFortessa, and BD FACSMelody flow cytometers running the BD FACSDiva 8.0.1 software. A minimum of 50,000 cells were analysed for each replicate and control. Analysis of the cytometry data was performed using the FCS Express 6 and GraphPad Prism 7 software.

6.2.1 Cell Cycle Analysis

Cell cycle analysis was performed to determine if there was a difference in the percentages of cells in the G0/G1 and G2/M phases of the cell cycle for the wild type and mutant HAP1 cell lines by measuring propidium iodide fluorescence on the BD FACSCanto II. 280,000 wild type and mutant HAP1 cells were passaged into 10 x 25 cm2 flasks, creating 3 biological replicates and 2 controls for each cell line to be analysed. The wild type and mutant controls used for cell cycle analysis experiments are detailed below:

1. Unstained controls: Cells were fixed but were not stained with PI (wild type and mutant)

2. Unfixed controls: Cells were not fixed but were stained with PI (wild type and mutant)

6.2.1.1 Preparation of HAP1 Cells for Cell Cycle Analysis

Once cells reached 70% confluency, they were trypsinised and collected in 15 mL tubes. HAP1 medium was added to inactivate the trypsin and the cells were pelleted by centrifugation (400 x g, 2 minutes). The HAP1 medium was removed and the cells were washed twice with 1 x PBS, with pelleting by centrifugation between the washing steps, and suspended in 5 mL of 1 x PBS. Cells were counted and 1,000,000 cells were transferred to 1.5 mL tubes, pelleted by centrifugation (400 x g, 3 minutes) and the PBS removed by aspiration.

Biological replicates and controls were prepared differently and as follows:

1. Biological replicates were fixed on ice for 30 minutes by addition of 1 mL of 70% ethanol (in 1 x PBS) to the pellet and the pellet suspended by gentle vortexing. After fixation, cells were pelleted by centrifugation (400 x g, 2 minutes), the ethanol removed by aspiration, and the cells suspended in 1.5 mL of cell cycle analysis buffer (CCA Buffer) (50 µg/mL propidium iodide and 50 µg/mL RNase in 1 x PBS) and stored on ice.

2. Unstained controls were fixed with 70% ethanol as described for the biological replicates and suspended in 1.5 mL of 1 x PBS and stored on ice.

3. Unfixed controls were suspended in 1.5 mL of CCA Buffer without fixation and stored on ice.

6.2.1.2 Cell cycle analysis of Wild type and Mutant HAP1 cell lines

Histograms of PI fluorescence (DNA content) were generated to identify the G0/G1, S and G2/M phases of the cell cycle. Unstained controls were used to identify unstained cells and unfixed controls were used to identify the apoptotic peak and exclude dead cells from the analysis.

6.2.1.3 Cell Cycle Analysis Statistics

Differences between the wild type and mutant HAP1 cell lines in the percentages of cells in the G0/G1 and G2/M phases of the cell cycle were evaluated using an unpaired nonparametric Mann-Whitney test, and were considered statistically significant at a P value < 0.05.
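For small samples the Mann-Whitney test can be computed exactly by enumerating every way of splitting the pooled values into two groups. The sketch below does this in plain Python and applies it to the G0/G1 percentages reported in Table 6.1. It is an illustrative implementation written for this purpose, not the routine used by GraphPad Prism, and the function names are assumptions.

```python
from itertools import combinations

def u_statistic(a, b):
    """Mann-Whitney U for group a: pairs where a > b, ties counted as 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)

def exact_two_sided_p(a, b):
    """Exact two-sided P value from all possible splits of the pooled values."""
    pooled = list(a) + list(b)
    centre = len(a) * len(b) / 2.0            # expected U under the null
    observed = abs(u_statistic(a, b) - centre)
    indices = range(len(pooled))
    hits = total = 0
    for chosen in combinations(indices, len(a)):
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in indices if i not in chosen]
        total += 1
        # Count splits at least as far from the null expectation as observed.
        if abs(u_statistic(group_a, group_b) - centre) >= observed - 1e-9:
            hits += 1
    return hits / total

# Percentages of cells in G0/G1 for the three biological replicates (Table 6.1).
wild_type_g0g1 = [28.88, 31.67, 32.67]
mutant_g0g1 = [29.85, 26.65, 31.78]
p_g0g1 = exact_two_sided_p(wild_type_g0g1, mutant_g0g1)

# Completely separated groups give the most extreme result possible for 3 vs. 3.
p_most_extreme = exact_two_sided_p([1, 2, 3], [4, 5, 6])
```

Under this exact calculation there are only 20 possible splits of six values into two groups of three, so the smallest two-sided P value attainable with three replicates per group is 2/20 = 0.1.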

6.2.2 Cell Sorting of Large and Small Populations from Wild Type and Mutant HAP1 Cell Lines

The two populations of cells from the wild type and mutant HAP1 cell lines were identified based on size and complexity/granularity and sorted into two tubes containing 1 mL of collection media.

6.2.2.1 Cell Preparation

Wild type and mutant cells in the exponential phase of growth were trypsinised and collected in 15 mL tubes. HAP1 medium was added to inactivate the trypsin and the cells were pelleted by centrifugation (400 x g, 2 minutes). The HAP1 medium was removed and the cells were washed twice with 1 x PBS, with pelleting by centrifugation between the washing steps, and suspended in 5 mL of 1 x PBS. Cells were counted and 1,000,000 cells were transferred to 1.5 mL tubes and made up to 1.5 mL with FACS Buffer.

6.2.2.2 Small and Large Cell Visual Characterisation

Small and large populations of wild type and mutant HAP1 cells separated using the BD FACSMelody flow cytometer were plated into a 12-well plate (2 wells for each of the wild type and mutant large and small populations), and visually inspected using the Zoe Fluorescent Cell Imager (Bio-Rad) for the appearance of singlets (single cells) and doublets (2 or more cells stuck together).

6.2.3 Cell Proliferation Assay

A cell proliferation assay was conducted to determine whether the wild type and mutant HAP1 cell lines differed in their proliferation rate by measuring CFSE fluorescence intensity using the BD LSRFortessa flow cytometer. The cell proliferation assay was conducted 3 days after staining of HAP1 cells with CFSE, with cells kept in culture conditions during the 3-day incubation. For both wild type and mutant HAP1 cells, 3 biological replicates were included in the proliferation assay, together with a number of controls, which are summarized below:

1. Unstained controls: No CFSE or PI staining (wild type and mutant)

2. CFSE-stained controls: Stained on the day of analysis (wild type and mutant)

3. PI-stained controls: Fixed and stained with PI (wild type and mutant)

6.2.3.1 Cell Preparation for the Cell Proliferation Assay

Cells were grown to 70% confluency in a 75 cm2 flask and were transferred into 15 mL tubes, counted, and suspended in 15 mL of 1 x PBS. 1,000,000 cells were added to 9 x 1.5 mL tubes and the volume made up to 1 mL with 1 x PBS.

For biological replicates: 2 µL (10 µM) of CellTrace™ CFSE (Invitrogen) dye solution in DMSO was added to the cell suspension and mixed by gentle vortexing. Cell suspensions were protected from light and incubated for 20 minutes in a 37ºC water bath. After incubation, 5 mL of HAP1 medium was added to each tube, the cells pelleted by centrifugation, and the medium removed by aspiration. The cells were suspended in 10 mL of HAP1 medium, transferred to 75 cm2 flasks and incubated for 3 days at 37ºC and 5% CO2. After 3 days' incubation, cells were trypsinised, counted, and 1,000,000 cells were suspended in 1 mL of 1 x PBS in 1.5 mL tubes. Prior to cell proliferation analysis, 0.5 mL of PI solution in 1 x PBS (50 µg/mL) was added to the cell suspensions and mixed by gentle vortexing, to allow for the identification of apoptotic cells.

For unstained controls: cells were prepared as detailed for the biological replicates but were not stained with CFSE or PI.

For CFSE-stained controls: Cells were prepared as detailed for the biological replicates on the day of the experiment and were not incubated for 3 days. No PI was added.

For PI-stained controls: Cells were prepared as detailed for the biological replicates without CFSE staining and, on the day of the experiment, the cells were fixed in 1 mL of 70% ethanol (in 1 x PBS) on ice for 30 minutes, the ethanol removed following centrifugation (400 x g, 2 minutes) and the cells suspended in 1 mL of 1 x PBS and 0.5 mL of 50 µg/mL PI (in 1 x PBS).

6.2.3.2 Cell Proliferation Assay of Wild Type and Mutant Cell Lines

The unstained controls were used to identify unstained cells. The CFSE-stained controls were employed to identify the generation-zero peak, as the cells had not replicated since staining, and to identify CFSE fluorescence. PI-stained controls were used to identify the apoptotic peak. The PI-stained, CFSE-stained and unstained controls were used as compensation controls to resolve the spectral overlap between CFSE and PI.
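Compensation for two fluorochromes amounts to solving a small linear system: each detector records its own dye plus a fraction of the other dye's signal, and those fractions are estimated from the single-stained controls. The sketch below shows the two-colour case only. It is not the FACSDiva implementation, and the spillover fractions and signal values are hypothetical.

```python
def compensate(cfse_obs, pi_obs, cfse_into_pi, pi_into_cfse):
    """Recover true CFSE and PI signals from observed detector values.

    Assumed model (two dyes, two detectors):
        cfse_obs = cfse_true + pi_into_cfse * pi_true
        pi_obs   = cfse_into_pi * cfse_true + pi_true
    """
    det = 1.0 - cfse_into_pi * pi_into_cfse
    cfse_true = (cfse_obs - pi_into_cfse * pi_obs) / det
    pi_true = (pi_obs - cfse_into_pi * cfse_obs) / det
    return cfse_true, pi_true

# Hypothetical event: 10% of the CFSE signal spills into the PI detector
# and 2% of the PI signal spills into the CFSE detector.
cfse_signal, pi_signal = compensate(1004.0, 300.0, cfse_into_pi=0.10, pi_into_cfse=0.02)
```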

6.2.3.3 Cell Proliferation Statistical Analysis

Three parameters were quantified to evaluate the proliferative characteristics of the wild type and mutant cell lines: 1) Proliferation index: the average number of cells that an initial cell became following cell division; 2) Division index: the average number of cells that a dividing cell became; 3) Percentage of divided cells: the percentage of cells that underwent proliferation [133].
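The three parameters can be derived from the number of events counted in each generation. The sketch below follows the definitions as worded above; other software packages attach different formulas to the same names, so the mapping here is an assumption, and the function name and example counts are hypothetical. The key step is that each generation-i cell represents 1/2^i of an original cell.

```python
def proliferation_metrics(cells_per_generation):
    """cells_per_generation[i] = events counted in generation i (0 = undivided)."""
    # A cell in generation i descends from 1 / 2**i of an original (precursor) cell.
    precursors = [n / 2 ** i for i, n in enumerate(cells_per_generation)]
    total_cells = sum(cells_per_generation)
    total_precursors = sum(precursors)
    divided_cells = sum(cells_per_generation[1:])
    divided_precursors = sum(precursors[1:])
    if divided_precursors == 0:
        division_index = float("nan")   # undefined when no cell has divided
    else:
        division_index = divided_cells / divided_precursors
    return {
        # Average number of cells that an initial cell became.
        "proliferation_index": total_cells / total_precursors,
        # Average number of cells that a dividing cell became.
        "division_index": division_index,
        # Percentage of the initial cells that divided at least once.
        "percent_divided": 100.0 * divided_precursors / total_precursors,
    }

# Hypothetical counts: 100 undivided cells, 200 in generation 1, 400 in generation 2.
example_metrics = proliferation_metrics([100, 200, 400])
```

In this example each generation traces back to 100 original cells, so 300 initial cells became 700 cells, and the 200 initial cells that divided became 600 cells.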

Data were expressed as percentages, means and medians with 95% confidence intervals. Wild type and mutant cell lines were compared using the unpaired nonparametric Mann-Whitney test and calculations were considered statistically significant with a P value < 0.05.


6.3 Results

6.3.1 Preliminary Cell Cycle Analysis Experiment

6.3.1.1 Gating strategy for cell cycle analysis

Cell cycle analysis was performed on the BD FACSCanto II with a 490nm laser.

A series of gating steps was employed in the flow cytometry software to exclude cellular debris and select for live cells based on size and cellular complexity/granularity (SSC-A vs. FSC-A), and to select cells passing through the flow cytometer laser one at a time (singlets) while excluding cells passing through the laser in clumps (doublets) (FSC-H vs. FSC-A). Using this strategy, cellular debris and the majority of doublets present in the sample were excluded from the analysis. This gating strategy was applied to both wild type and mutant HAP1 cells, although minor changes were applied to account for population shifts between replicate runs.
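Singlet discrimination relies on the fact that a clump of cells produces a larger pulse area than a single cell of similar pulse height, so doublets fall off the FSC-H vs. FSC-A diagonal. The gates in this chapter were drawn manually in the cytometry software; the sketch below substitutes a simple ratio threshold purely to illustrate the principle, with a hypothetical function name, tolerance and event values.

```python
def is_singlet(fsc_a, fsc_h, expected_ratio, tolerance=0.15):
    """True if FSC-A / FSC-H lies within a fractional tolerance of the singlet ratio."""
    if fsc_h <= 0:
        return False
    ratio = fsc_a / fsc_h
    return abs(ratio - expected_ratio) <= tolerance * expected_ratio

def gate_singlets(events, expected_ratio, tolerance=0.15):
    """Keep the (fsc_a, fsc_h) events that pass the singlet ratio gate."""
    return [e for e in events if is_singlet(e[0], e[1], expected_ratio, tolerance)]

# Hypothetical events as (FSC-A, FSC-H); the third has excess area for its height.
example_events = [(100, 100), (110, 100), (180, 100), (60, 65)]
example_singlets = gate_singlets(example_events, expected_ratio=1.0)
```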

6.3.1.2 Identification of two distinct populations within the wild type and mutant HAP1 cell lines

It was observed during the preliminary cell cycle analysis experiment that two distinct populations appeared to exist within both the wild type and mutant HAP1 cell lines. These two populations were distinguished by their size and complexity/granularity: one population being smaller and less complex/granular (small) and the other being larger and more complex/granular (large). This was observed on SSC-A vs. FSC-A plots (Figure 6.3, A and D) and on FSC-H vs. FSC-A plots (Figure 6.3, B and E). Based on FSC-H and FSC-A, it appeared that the large population consisted of an abnormally large quantity of doublets, while the small population corresponded to singlets (Figure 6.3, C and D). While doublets are a normal occurrence in flow cytometry analyses, the quantity of doublets observed was abnormal, and could potentially indicate an aneuploid population within the wild type and mutant cell cultures.

The large number of larger cells prevented accurate analysis of the cell cycle, as the mathematical model applied to the PI-DNA histograms predicted that the large population was aneuploid and generated two cell cycle analysis models, one for each of the small and large populations.

The large population could have been excluded from the cell cycle analysis. However, it was considered important to first characterise the two populations before continuing the analysis, as it would only be appropriate to exclude the large population if it consisted of doublets, and inappropriate if it were due to aneuploidy.

[Figure 6.3 image: flow cytometry plots for the wild type (panels A-C) and mutant (panels D-F) cell lines. A and D: SSC-A vs. FSC-A (x 1000) with Live gate. B and E: FSC-H vs. FSC-A (x 1000) with Singlets gate. C and F: Count vs. PI_DNA-A (x 1000) with G0/G1, S and G2/M labelled.]

Figure 6.3 Preliminary Cell Cycle Analysis Experiment. A and D) FSC-A vs. SSC-A gating of wild type (A) and mutant (D) populations to exclude cellular debris. B and E) FSC-H vs. FSC-A to gate singlets. C and F) PI-DNA-A histogram of PI-stained wild type (C) and mutant (F) populations showing the G0/G1, S and G2/M phases of the cell cycle, as well as an elevated PI-DNA signal past the G2/M peaks, corresponding to the large cell population.

[Figure 6.4 image: Count vs. PI_DNA-A (x 1000) histograms for the wild type (A) and mutant (B) cell lines, with fitted G0/G1, S-phase and G2/M components for the small and large populations.]

Figure 6.4 Preliminary Mathematical Models of Wild Type and Mutant Cell Cycle Based on PI Fluorescence Intensity. Mathematical models generated by the FCS Express 6 software of the wild type (A) and mutant (B) cell cycle showing two distinct populations. In red and blue are the models applied to the large and small populations. Note that in (A), the model predicted the small population G2/M to be very small and did not predict a large population G0/G1.

6.3.2 Characterisation of large and small populations within HAP1 cell cultures

To characterise these two populations, cell sorting was performed on the BD FACSMelody to separate the large and small populations from wild type and mutant HAP1 cultures.

6.3.2.1 Gating strategy for separating large and small HAP1 populations

Small and large cells were distinguished from one another based on size and complexity/granularity on a SSC-A vs. FSC-A plot. Gates were created (figure 6.5, A and B) for both the small (red) and large (blue) cells and the two populations were successfully separated.

6.3.2.2 Visual Inspection of Small and Large HAP1 Populations

Sorted cells were plated into 12-well plates and visualized on the Zoe Fluorescent Cell Imager before the cells attached to the bottom of the wells. As can be seen in Figure 6.7, the smaller and less complex population consisted almost entirely of singlets, while the larger and more complex population consisted almost entirely of doublets or clumps of cells. This was observed for both the wild type and mutant HAP1 cell lines and suggests that the large population consisted almost entirely of doublets rather than a large population of aneuploid cells with twice the DNA content of the small population.

[Figure 6.5 image: wild type. Panels A and B: flow cytometry plots. Panels C and D: images of the sorted small cells and sorted large cells.]

Figure 6.5 Cell Sorting of Large and Small Wild Type HAP1 Populations. A) FSC-A vs. SSC-A, showing two cell populations distinguished by size and cellular complexity: in red a smaller and less complex population; in blue a larger and more complex population. B) FSC-H vs. FSC-A, showing the smaller population appearing as singlets and the larger population appearing as doublets. C) Zoe fluorescent cell image of the sorted small population showing singlets. D) Zoe fluorescent cell image of the sorted large population showing doublets.

[Figure 6.6 image: mutant. Panels A and B: flow cytometry plots. Panels C and D: images of the sorted small cells and sorted large cells.]

Figure 6.6 Cell Sorting of Large and Small Mutant HAP1 Populations. A) FSC-A vs. SSC-A, showing two cell populations distinguished by size and cellular complexity: in red a smaller and less complex population; in blue a larger and more complex population. B) FSC-H vs. FSC-A, showing the smaller population appearing as singlets and the larger population appearing as doublets. C) Zoe fluorescent cell image of the sorted small population showing singlets. D) Zoe fluorescent cell image of the sorted large population showing doublets.

Figure 6.7 Images of the Sorted Small and Large Populations. A) Sorted small wild type population. B) Sorted large wild type population. Doublets are marked with arrows; singlets are unmarked. Images were taken on the Zoe Fluorescent Cell Imager using the brightfield channel.

6.3.3 Analysis of wild type and mutant cell cycle using flow cytometry

Having identified an unusually large quantity of doublets or clumps of cells as the likely cause of the elevated peaks in the cell cycle histograms, cell cycle analysis was repeated with an additional gating step to exclude the large population, so that the analysis could be performed solely on the small population (singlets).

6.3.3.1 Gating strategy for cell cycle analysis of wild type and mutant singlets

The gating strategy was essentially as described in 6.3.1.1, with an additional gate placed around the singlet cell cycle (excluding the doublet cell cycle) on a SSC-W vs. PI_DNA-A plot, so that the G2/M peak of the singlets could be distinguished from the G0/G1 peak of the doublets based on the elevated complexity/granularity (SSC-W) of the larger population relative to the smaller population (Figures 6.8 and 6.9).

[Figure 6.8 image: wild type. A: SSC-A vs. FSC-A (x 1000), Small Cells gate 55.89%. B: FSC-H vs. FSC-A (x 1000), Singlets gate 91.39%. C and D: SSC-W vs. PI_DNA-A (x 1000). E: Count vs. PI_DNA-A (x 1000).]

Figure 6.8 Gating Strategy for the Generation of a Wild Type Singlet Cell Cycle Histogram. A) FSC-A vs. SSC-A with a gate placed around the small HAP1 population and excluding the majority of the large population. B) FSC-H vs. FSC-A, showing the gated smaller population in red and some overlapping doublets (blue). C) SSC-W vs. PI_DNA-A showing the singlet cell cycle (red box) and the doublet cell cycle (blue box) with overlapping singlet G2/M and doublet G0/G1 PI_DNA-A fluorescence intensities (black box) but separated based on SSC-W (complexity/granularity). D) SSC-W vs. PI_DNA-A of the gated singlet cell cycle. E) Mathematical model of the singlet cell cycle.

[Figure 6.9 image: mutant. A: SSC-A vs. FSC-A (x 1000), Small Cells gate 57.59%. B: FSC-H vs. FSC-A (x 1000), Singlets gate 90.88%. C and D: SSC-W vs. PI_DNA-A (x 1000). E: Count vs. PI_DNA-A (x 1000).]

Figure 6.9 Gating Strategy for the Generation of a Mutant Singlet Cell Cycle Histogram. A) FSC-A vs. SSC-A with a gate placed around the small HAP1 population and excluding the majority of the large population. B) FSC-H vs. FSC-A, showing the gated smaller population in red and some overlapping doublets (blue). C) SSC-W vs. PI_DNA-A showing the singlet cell cycle (red box) and the doublet cell cycle (blue box) with overlapping singlet G2/M and doublet G0/G1 PI_DNA-A fluorescence intensities (black box) but separated based on SSC-W (complexity/granularity). D) SSC-W vs. PI_DNA-A of the gated singlet cell cycle. E) Mathematical model of the singlet cell cycle.

6.3.3.2 Cell Cycle Analysis Quantitation

A mathematical model was applied to the wild type and mutant singlet cell cycle histogram for each of the biological replicates using the FCS Express 6 software. Chi2 analyses were performed to determine the goodness of fit of the mathematical model to each of the cell cycle histograms generated.

Figure 6.10 Cell Cycle Histograms Generated for Wild Type and Mutant Biological Replicates. The phases of the cell cycle are shown for the wild type cell line. BR1-3: Biological replicates 1-3.

Table 6.1 Summary of Cell Cycle Analysis Data for Wild Type and Mutant HAP1 Cell Lines

Cell line   Biological replicate   % cells in G0/G1   % cells in G2/M   % cells in S phase   G0/G1 CV   G2/M CV   Chi2
Wild type   1                      28.88              13.74             57.39                3.12       2.69      2.48
Wild type   2                      31.67              10.09             58.25                4.90       4.35      2.16
Wild type   3                      32.67              11.13             56.21                3.93       3.39      2.45
Mutant      1                      29.85              11.21             58.95                4.69       4.24      2.24
Mutant      2                      26.65              11.60             61.75                3.64       3.64      1.36
Mutant      3                      31.78              10.81             57.41                3.89       3.48      2.44

CV, coefficient of variation.

All chi2 values for each biological replicate were within an acceptable range (<5), suggesting that the models applied were appropriate (Table 6.1). Coefficient of variation (CV) values were calculated for each G0/G1 and G2/M peak. All CV values were <6, suggesting that there was little variation in PI fluorescence intensity within each G0/G1 and G2/M peak for both wild type and mutant biological replicates (Table 6.1).
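As an illustration of the peak-width metric reported in Table 6.1, the CV of a peak can be expressed as 100 × standard deviation / mean of the PI intensities within that peak. The sketch below assumes a simple sample-based calculation on hypothetical intensities; the function name and the values are illustrative only, and the CVs reported by FCS Express are derived from the fitted model and may be calculated differently.

```python
from statistics import mean, stdev


def peak_cv_percent(intensities):
    """Return the coefficient of variation (%) of the intensities in one peak."""
    return 100.0 * stdev(intensities) / mean(intensities)


# Hypothetical PI_DNA-A values (x 1000) for events inside a G0/G1 gate.
# These are NOT thesis data; they only illustrate the calculation.
g0g1_events = [48.0, 50.0, 52.0, 49.0, 51.0]
cv = peak_cv_percent(g0g1_events)
print(f"G0/G1 CV = {cv:.2f}%")
```

A narrow peak gives a small CV, which is why low CV values are read as consistent staining and acquisition.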

6.3.3.3 Statistical analysis of percentages of cells in G0/G1 and G2/M phases of the cell cycle

Unpaired nonparametric Mann-Whitney tests were performed on the percentages of cells in the G0/G1 and G2/M phases of the cell cycle. No statistically significant differences were found between the wild type and mutant cell lines for the percentages of cells in the G0/G1 (P = 0.70) or G2/M (P = 0.37) phases of the cell cycle.
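For readers unfamiliar with the test, the following is a minimal sketch of an exact two-sided Mann-Whitney U test for small samples without ties, illustrated with the G0/G1 percentages from Table 6.1. The function name and the enumeration approach are assumptions made for illustration; the statistics package used for the thesis analysis may compute its P values differently.

```python
from itertools import combinations


def mann_whitney_exact(x, y):
    """Exact two-sided Mann-Whitney U test for two small samples without ties.

    Returns (U for sample x, two-sided P value).
    """
    pooled = sorted(x + y)
    if len(set(pooled)) != len(pooled):
        raise ValueError("this sketch assumes no tied values")
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    n1, n2 = len(x), len(y)

    def u_from_rank_sum(rank_sum):
        return rank_sum - n1 * (n1 + 1) // 2

    u_obs = u_from_rank_sum(sum(rank[v] for v in x))
    # Under the null hypothesis every assignment of n1 ranks to x is equally likely.
    all_u = [u_from_rank_sum(sum(c))
             for c in combinations(range(1, n1 + n2 + 1), n1)]
    p_low = sum(u <= u_obs for u in all_u) / len(all_u)
    p_high = sum(u >= u_obs for u in all_u) / len(all_u)
    return u_obs, min(1.0, 2 * min(p_low, p_high))


# % cells in G0/G1 for the three biological replicates (Table 6.1).
wild_type_g0g1 = [28.88, 31.67, 32.67]
mutant_g0g1 = [29.85, 26.65, 31.78]
u_stat, p_value = mann_whitney_exact(wild_type_g0g1, mutant_g0g1)
print(f"U = {u_stat}, P = {p_value:.2f}")
```

With only three replicates per group there are just 20 possible rank assignments, so the smallest attainable two-sided P value is 0.10; a significant result at the 0.05 level cannot be reached with this design.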

Figure 6.11 Wild type and Mutant HAP1 Cell Lines Show No Difference in the Percentages of Cells in the G0/G1 or G2/M Phases of the Cell Cycle. No significant difference was observed in the percentages of cells in the G0/G1 (P = 0.70) and G2/M (P = 0.37) phases of the cell cycle between the wild type and mutant HAP1 cell lines.

6.3.4 Analysis of Wild type and Mutant Proliferation Using Flow Cytometry

The cell proliferation assay was conducted on the BD LSRFortessa by the dye dilution method, using a 488 nm laser.

Prior to running the experiment, the CFSE-stained cells and the unstained negative control cells were visualized on the Zoe Fluorescent Cell Imager, using the standard brightfield channel and a 488 nm light source, to confirm that the cells were stained with CFSE.

Figure 6.12 Confirmation of Successful CFSE Staining. A) CFSE-stained cells (brightfield channel), B) CFSE-stained cells (488 nm channel), C) Unstained cells (brightfield channel), D) Unstained cells (488 nm channel). A and B show CFSE-stained cells, with the fluorescence clearly visible under the 488 nm light source (B). C and D show unstained cells; no fluorescence is visible under the 488 nm light source (D), indicating that they are CFSE negative.

6.3.4.1 Gating strategy for cell proliferation assay

The following gating strategy was applied to both wild type and mutant HAP1 cells and controls, although minor changes were applied to account for population shifts between replicate runs.

6.3.4.1.1 Selection of Small Singlet Population for Cell Proliferation Analysis

A series of gating steps was employed in the flow cytometry software to first select the small cell population previously identified in 6.3.2, excluding cellular debris based on size and cellular complexity/granularity (SSC-A vs. FSC-A). Singlets were then selected from the gated small cell population (FSC-H vs. FSC-A). This was achieved by running the unstained controls for the wild type and mutant HAP1 cell lines.

[Figure 6.13 panels: SSC-A vs. FSC-A (x 1000) with the "Small Cells" gate (93.87%); FSC-H vs. FSC-A (x 1000) with the "Singlets" gate (54.85%).]

Figure 6.13 Gating Strategy for the Selection of Singlets from the Small HAP1 Population. Small cells were selected based on size (FSC-A) and complexity/granularity (SSC-A). Singlets from the small population were selected based on FSC-H vs. FSC-A.
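The principle behind the singlet selection in Figure 6.13 can be sketched in a few lines. Single cells have a forward-scatter pulse height roughly proportional to pulse area, whereas doublets have a larger area for a given height. The sketch below assumes a simple ratio window in place of the manually drawn gate used in this study; the function name, the thresholds and the event values are hypothetical.

```python
def gate_singlets(events, min_ratio=0.85, max_ratio=1.15):
    """Keep events whose FSC-H / FSC-A ratio lies within [min_ratio, max_ratio].

    events: list of (fsc_a, fsc_h) tuples.
    The ratio window is a hypothetical stand-in for a hand-drawn polygon gate.
    """
    singlets = []
    for fsc_a, fsc_h in events:
        if fsc_a <= 0:
            continue  # skip events with no usable area signal
        if min_ratio <= fsc_h / fsc_a <= max_ratio:
            singlets.append((fsc_a, fsc_h))
    return singlets


# Hypothetical events: two singlets, one doublet (large area, low height), one invalid.
events = [(60_000, 58_000), (65_000, 66_000), (120_000, 70_000), (0, 10)]
singlets = gate_singlets(events)
print(f"{len(singlets)} of {len(events)} events retained as singlets")
```

In practice the gate boundaries depend on instrument settings and are adjusted per run, as noted for the replicate runs above.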

6.3.4.1.2 Gating Strategy to Compensate for PI and CFSE Spectral Overlap

Prior to performing the proliferation assay, a compensation control was generated to resolve the spectral overlap between CFSE and PI. This was performed with the small singlet population gate applied to the CFSE-stained, PI-stained and unstained controls.

Gates were placed around the stained and unstained populations and the compensation applied to all subsequent samples through the FACSDiva 8.0.1 software.
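The idea behind two-colour compensation can be illustrated with a small worked sketch. It assumes a linear spillover model in which a fixed fraction of each dye's signal appears in the other detector; the function names and all numerical values are hypothetical, and the FACSDiva software performs the equivalent calculation internally from the gated single-stain controls.

```python
def spillover_fraction(stained_primary, stained_secondary,
                       unstained_primary, unstained_secondary):
    """Fraction of a dye's signal detected in the secondary detector.

    Arguments are median intensities of the single-stained and unstained
    controls in the dye's own (primary) and the other (secondary) detector.
    """
    return ((stained_secondary - unstained_secondary)
            / (stained_primary - unstained_primary))


def compensate(obs_cfse, obs_pi, cfse_into_pi, pi_into_cfse):
    """Recover dye-specific signals from observed two-detector intensities.

    Model: obs_cfse = cfse + pi_into_cfse * pi
           obs_pi   = cfse_into_pi * cfse + pi
    """
    det = 1.0 - cfse_into_pi * pi_into_cfse
    true_cfse = (obs_cfse - pi_into_cfse * obs_pi) / det
    true_pi = (obs_pi - cfse_into_pi * obs_cfse) / det
    return true_cfse, true_pi


# Hypothetical control medians (not thesis data).
cfse_into_pi = spillover_fraction(10_000, 1_550, 100, 50)
pi_into_cfse = spillover_fraction(8_000, 260, 50, 100)

# Hypothetical double-stained event.
cfse_signal, pi_signal = compensate(1_010, 650, 0.15, 0.02)
print(f"CFSE = {cfse_signal:.1f}, PI = {pi_signal:.1f}")
```

Solving the pair of linear equations removes the contribution of each dye from the other detector, which is what allows CFSE and PI to be analysed together.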

Figure 6.14 Histograms Generated to Compensate for PI and CFSE Spectral Overlap. A) CFSE-unstained control. B) CFSE-stained control. C) PI-unstained control. D) PI-stained control. Each of the peaks was gated and the compensation control generated from these histograms was applied to all biological replicates. The x-axis shows the biexponential transformation of CFSE or PI fluorescence.

6.3.4.2 Mathematical Modelling of Wild type and Mutant HAP1 Proliferation

Prior to running the biological replicates and following the generation of the compensation controls, CFSE Stained controls were re-analysed by flow cytometry and a gate placed around the CFSE peak to identify the undivided/generation-zero peak for later modelling of cell proliferation from this point.

Figure 6.15 CFSE Histograms of CFSE Stained Controls Generated to Identify the Generation-Zero Peak. CFSE fluorescence histograms of wild type (A) and mutant (B) CFSE stained controls showing the generation-zero (undivided) peak (red). The x-axis shows the biexponential transformation of CFSE fluorescence.

Following the gating of the generation-zero peak, each biological replicate was analysed for CFSE fluorescence and a mathematical model was later applied to the cell proliferation histograms using the FCS Express 6 software. A chi2 statistic was calculated for each model applied to the cell proliferation data to determine the goodness of fit of the models to the experimental data. All chi2 values were <5, suggesting a good fit of the model to the experimental data for all biological replicates.

Figure 6.16 Mathematical Modelling of Wild Type and Mutant Proliferation After CFSE Staining and 3 Days of Growth. Mathematical proliferation models applied to wild type biological replicate 1 (A) and mutant biological replicate 1 (B). The generation-zero (G0) peak is overlaid (red). The generation-one and -two peaks are absent. G3, generation-three. G4, generation-four. G5, generation-five. G6, generation-six. G7, generation-seven. Seven generations of cell division are observed for both wild type and mutant cell lines by the decreasing fluorescence intensity of subsequent peaks. X-axis is the biexponential transformation of CFSE fluorescence area (CFSE-A).

Figure 6.17 Mathematical Models Generated for Each of the Wild Type and Mutant HAP1 Biological Replicates. A-C) Wild type biological replicates 1-3. D-F) Mutant biological replicates 1-3. Each coloured peak represents a successive generation, with CFSE fluorescence intensity decreasing by a factor of 2 for each generation. 7-8 generations were observed for all replicates. X-axis is the biexponential transformation of CFSE fluorescence area (CFSE-A).
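Because CFSE fluorescence halves with each division, the generation of a cell can be estimated from its intensity relative to the generation-zero peak as log2(I0 / I). The following sketch illustrates that relationship only; the function name and intensity values are hypothetical, autofluorescence is ignored, and the peak-fitting model applied in FCS Express is more elaborate than this calculation.

```python
import math


def generation_number(intensity, gen0_intensity):
    """Estimate the division generation from CFSE intensity by dye dilution.

    Each division halves the CFSE signal, so the generation is the rounded
    base-2 logarithm of the ratio to the undivided (generation-zero) peak.
    """
    return max(0, round(math.log2(gen0_intensity / intensity)))


# Hypothetical intensities relative to a generation-zero peak at 80,000.
gen0_peak = 80_000
for cfse in (80_000, 10_000, 2_400):
    print(cfse, "-> generation", generation_number(cfse, gen0_peak))
```

This is why the position of the generation-zero peak must be fixed from the stained controls before the later peaks can be assigned.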


6.3.4.3 Statistical Analysis of CFSE Distributions

Statistical analysis of CFSE distributions involved the calculation of the proliferation index, division index, and the percentage of divided cells (%Dil) for the wild type and mutant HAP1 cell lines.
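To make these metrics concrete, the sketch below computes them from the number of cells assigned to each generation. Definitions of the proliferation and division indices differ between software packages; the conventions used here (back-calculating precursor cells as N_i / 2^i) are assumptions for illustration and are not confirmed to be those used by FCS Express 6. The function name and the counts are hypothetical.

```python
def proliferation_metrics(counts):
    """Summarise a dye dilution experiment.

    counts[i] is the number of cells in generation i (index 0 = undivided).
    A cell in generation i descends from 1 / 2**i of an original cell, so
    counts[i] / 2**i estimates the number of precursors for that generation.
    """
    precursors = [n / 2 ** i for i, n in enumerate(counts)]
    total_precursors = sum(precursors)
    return {
        # Percentage of original cells that divided at least once.
        "percent_divided": 100.0 * sum(precursors[1:]) / total_precursors,
        # Average number of divisions per original cell (one common convention).
        "division_index": sum(i * p for i, p in enumerate(precursors))
        / total_precursors,
        # Final cells per original cell; some packages report this ratio
        # as the proliferation index.
        "fold_expansion": sum(counts) / total_precursors,
    }


# Hypothetical generation counts (generations 0 to 5), not thesis data.
counts = [10, 0, 0, 80, 160, 320]
metrics = proliferation_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Under these conventions, a population concentrated around generation five would show a fold expansion near 2^5 = 32, which gives a sense of scale for index values of the magnitude reported below.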

Figure 6.18 Evaluation of Proliferative Characteristics of Wild type and Mutant HAP1 Cell Lines Using Flow Cytometry. A) Proliferation index of wild type and mutant biological replicates. B) Division index of wild type and mutant biological replicates. C) The %Dil for wild type and mutant biological replicates. D) Percentages of cells in each generation. For (A), (B) and (C), the middle bar represents the mean value. Error bars are ± 95% confidence interval.

Table 6.2 Wild type and Mutant HAP1 Proliferation Kinetics

Measure of proliferation     Wild type       Mutant          P-value
Proliferation Index (PrI)    29.62 ± 2.2     33.13 ± 1.4     0.100
Division Index (DI)          29.62 ± 2.2     33.13 ± 1.4     0.100
% Divided Cells (Dil)        99.39 ± 0.33    99.59 ± 0.36    0.600

Median ± 95% confidence interval, Mann-Whitney test.

No significant difference was observed in the proliferation index between the wild type (29.62 ± 2.2) and mutant (33.13 ± 1.4) cell lines (P = 0.100). No significant difference was observed in the division index between the wild type (29.62 ± 2.2) and mutant (33.13 ± 1.4) cell lines (P = 0.100). No significant difference in the %Dil was observed between the wild type (99.39 ± 0.33) and mutant (99.59 ± 0.36) cell lines (P = 0.600). There was a slight increase in the proliferation index, division index and %Dil for the mutant cell line; however, the differences observed were within the 95% confidence interval.

6.4 Discussion

6.4.1 Preliminary Cell Cycle Analysis and the Identification of Two Distinct Populations Within the Wild type and Mutant HAP1 Cell Lines

During the preliminary cell cycle analysis experiment, two distinct populations were observed to coexist within each of the wild type and mutant cell lines, with one population being larger and more complex than the other. Cell sorting was employed to separate the two populations from both cell lines, and the sorted populations were visually inspected. Visual inspection revealed that the large population consisted almost entirely of doublets and large clumps of cells, while the smaller population consisted almost entirely of singlets, suggesting that the two populations did not differ by karyotype. Visual characterisation alone, however, cannot rule out the possibility that populations differing by karyotype exist within these cell lines. To resolve this issue, a karyotype analysis is required [134].

6.4.2 Analysis of Wild type and Mutant Cell Cycle Using Flow Cytometry

The cell cycle peaks were identified through PI staining of DNA, and the percentages of cells in the G0/G1 and G2/M peaks were quantified from the cell cycle histograms. A gate was placed around the singlet/small population cell cycle peaks on an SSC-W vs. PI_DNA-A plot, so that the overlapping singlet G2/M and doublet G0/G1 peaks could be resolved based on cellular complexity/granularity (SSC-W). No significant difference was observed in the percentages of cells in the G0/G1 (P = 0.70) or G2/M (P = 0.37) phases of the cell cycle between the two cell lines. Overall, these data suggest that the R1805W mutation does not alter the cell cycle.

Future cell cycle experiments could involve synchronizing the cells in the same phase of the cell cycle and sampling them at different time points to track the cells' passage through the various cell cycle checkpoints [135]. Any discrepancy between the wild type and mutant cell lines would indicate that the SRRM2 mutation affects a specific transition, and further targeted analyses could then be undertaken.

6.4.3 Analysis of Wild type and Mutant Proliferation Using Flow Cytometry

The proliferative rate of the wild type and mutant HAP1 cell lines was measured using CFSE staining and the dye dilution method. To evaluate the proliferative characteristics of the cell lines, the proliferation index, division index and percentage of divided cells were calculated from the CFSE distributions. There was no significant difference between the two cell lines for any of the proliferation metrics measured in this experiment, and these data do not support increased proliferative characteristics of the mutant cell line relative to the wild type.

In future, the cell proliferation assay could be repeated, this time with sampling from multiple time points, so that any minor changes in the proliferation rate during the early cycles of replication could be observed, i.e. when there was still a population of undivided cells.

6.4.3 Future Experiments

Quantifying the percentages of cells in the G0/G1 and G2/M phases of the cell cycle and measuring the proliferative rates of the two cell lines are two of the main methods of characterizing growth kinetics. The major limitation of these two experiments is that they do not measure S-phase kinetics. An S-phase experiment using a thymidine analogue such as bromodeoxyuridine (BrdU) could have been conducted to characterise the S-phase of both cell lines [136].

6.4.4 Conclusion

The aim of this research chapter was to employ flow cytometry techniques to characterise the wild type and mutant HAP1 cell lines in general and to investigate the growth kinetics of each. The preliminary cell cycle analysis and the cell sorting of the small and large populations characterised the HAP1 cell line insofar as they ruled out the possibility that the two populations differed entirely due to their karyotype. They could not, however, rule out the possibility of two populations existing within the cell lines and differing by karyotype. The proliferative characteristics of the HAP1 cell lines were successfully investigated in this chapter, although no significant difference between the two cell lines was observed. Additional experiments could be undertaken to further characterise the growth kinetics of the cell lines; however, the data presented here do not support the hypothesis that SRRM2 mutations alter the normal regulation of the cell cycle and increase the proliferative activity of the mutant cell line relative to the wild type.


Chapter 7 - General Discussion and Conclusion

The aims of this thesis were:

1. To generate mutant model cell systems harboring the SRRM2 mutations for later functional studies to assess the clinical significance of these mutations.

2. To investigate whether these mutant cell systems display differences in the splicing pattern of specific pre-mRNAs through quantitative real-time PCR.

3. To employ flow cytometry techniques to characterise the HAP1 cell line in general and to evaluate the proliferative characteristics of these mutant cell systems to determine whether these mutations prevent proper regulation of the cell cycle.

The original project plan was to perform a transient over-expression analysis in HEK293 and Nthy-ori 3-1 cell lines transfected with pcDNA3-EGFP and pcDNA3.1 vector constructs containing the wild type or mutant SRRM2 sequences. Construction of the wild type vectors was successful, yet challenging due to the large size of the SRRM2 ORF. However, construction of the mutant vectors was abandoned due to difficulties associated with the size of the SRRM2 ORF. These challenges were extensive but by no means insurmountable; however, they demonstrated that this method is poorly suited to functional genomics studies of genes with large ORFs. Additionally, transfection of these large plasmid constructs into the cell lines may also have proven challenging. It was in this context that the transient over-expression analysis was abandoned in favor of creating model cell systems of SRRM2 mutations using the new CRISPR/Cas9 technique.

CRISPR/Cas9 was successfully implemented to generate a mutant HAP1 cell line harboring the R1805W mutation; however, the cell line was heterozygous for the mutation. When studying the functional consequences of a mutation in a cell-based system, a homozygous system is preferred so that any functional consequences will be pronounced and clearly detectable. FNMTC patients harboring SRRM2 mutations, both those in the WA cohort with the R1805W mutation and those with the previously identified S346F mutation, are heterozygous rather than homozygous. This indicates that if the SRRM2 mutations do predispose to FNMTC, then the underlying oncogenic processes must be significant enough to induce FNMTC in the heterozygous state. It was in this context that the heterozygous R1805W HAP1 cell line was deemed appropriate to study the association of the R1805W variant with FNMTC.

RT-qPCR was performed to characterise the splicing pattern of the wild type and mutant cell lines and to determine whether there was a difference in the splicing patterns of 7 exons previously shown to be differentially spliced in FNMTC patients heterozygous for the S346F mutation (CAMKK2, CDC16, CTNNA1, FBXW4, HBP1, PIM2, SPPL3).

No significant differences were identified between the wild type and mutant cell lines; however, there was some variation in the data generated between biological replicates, and this may have prevented minor differences in the splicing pattern from being observed. Had the generation of the S346F mutant HAP1 cell line been successful, it would have served as a positive control for the R1805W cell line. Regardless, no large differences were observed between the wild type and mutant cell lines, suggesting that the R1805W mutation does not alter the splicing pattern of the 7 exons investigated. Future studies should include high-throughput RNA-seq to investigate the whole transcriptome, rather than the targeted approach presented in this thesis.

Flow cytometry experiments were first employed to characterise the wild type and mutant HAP1 cell lines in general. In this regard, they were successful in clarifying some ambiguity relating to the population structure of HAP1 cell lines, which is a critical step if this cell line is to be employed in the future for clinical genetics studies. Cell cycle analysis was performed to quantify the percentages of cells in the G0/G1 and G2/M phases of the cell cycle using propidium iodide staining and quantification of peak fluorescence intensity corresponding to cells within the G0/G1 and G2/M peaks. The experiments were completed successfully in this regard; however, no significant differences were observed between the wild type and mutant cell lines. The proliferation kinetics of the two cell lines were characterised with a proliferation assay and the dye dilution method following CFSE staining; again, no significant differences were observed in the proliferation metrics quantified. The proliferation assay could be repeated with a shorter incubation time so that more obvious differences could be observed, although this would be by no means definitive. Considering the negative results obtained for both the cell cycle analysis and the proliferation assay, it seems more likely that the R1805W mutation does not affect the cell cycle or proliferation.

Given the data presented here, substantial progress was made in characterizing the R1805W mutation even though no oncogenic processes were identified in the mutant cell line. This highlights a large problem currently facing the field of clinical genetics: scientists identify variants predicted to be pathogenic, or even of unknown significance, by in silico bioinformatics techniques, but lack a defined methodology to assess these predictions and report the variants to clinicians who, as it currently stands, do not know whether these variants are actionable for screening purposes or treatment. Until great progress is made in the bioinformatics field to narrow down the number of variants identified, the challenge for clinical genetics laboratories is to work through all the known variants associated with a disease, determine pathogenicity using in vitro model cell systems, and create a publicly available index for clinicians.

In conclusion, massively parallel functional studies are required to rapidly sift through variants identified by in silico methods and identify the pathogenic lesions. A total of 7 SRRM2 variants have been identified in individuals with FNMTC, including the previously identified S346F and R1805W mutations found to co-segregate with the disease and predicted to be pathogenic, and an additional 5 variants of unknown significance identified in the WA cohort. Future studies investigating SRRM2 variants and FNMTC could include all 7 of the variants, so that each can be assessed for pathogenicity. CRISPR/Cas9 is an attractive candidate to generate the model cell systems as it can be scaled up to a high-throughput level. In this context, the work presented in this study has made considerable progress in establishing the new technology despite the issues encountered: the system was successful in generating the heterozygous mutant cell line, and plausible hypotheses regarding the causes of the issues identified have been postulated, with simple solutions proposed.

References:

1. Mihai, R., Physiology of the pituitary, thyroid and adrenal glands. Surgery (Oxford), 2011. 29(9): p. 419-427.
2. Pawlina, W., M. Ross, and G. Kaye, Histology: a text and atlas. 2006, Lippincott Williams & Wilkins, Baltimore, MD.
3. Kirsten, D., The thyroid gland: physiology and pathophysiology. Neonatal Network, 2000. 19(8): p. 11-26.
4. Wartofsky, L. and D. Van Nostrand, Thyroid cancer: a comprehensive guide to clinical management. 2016: Springer.
5. De Felice, M. and R. Di Lauro, Thyroid development and its disorders: genetics and molecular mechanisms. Endocrine reviews, 2004. 25(5): p. 722-746.
6. Kumar, V., A.K. Abbas, N. Fausto, and J.C. Aster, Robbins and Cotran pathologic basis of disease. 2014: Elsevier Health Sciences.
7. Ferlay, J., I. Soerjomataram, M. Ervik, R. Dikshit, S. Eser, C. Mathers, M. Rebelo, D. Parkin, D. Forman, and F. Bray, Cancer Incidence and Mortality Worldwide: IARC CancerBase No. 11. Lyon, France: International Agency for Research on Cancer, 2013.
8. Eldridge, A.G., P.A. Sharp, and B.J. Blencowe, The SRm160:300 splicing coactivator is required for exon-enhancer function. Proc. Natl. Acad. Sci, 1999. 96(25): p. 6123-6130.
9. Torre, L.A., F. Bray, R.L. Siegel, J. Ferlay, J. Lortet-Tieulent, and A. Jemal, Global cancer statistics, 2012. CA Cancer J Clin, 2015. 65(2): p. 87-108.
10. Stewart, B.W. and C.P. Wild, World Cancer Report 2014. Lyon: International Agency for Research on Cancer, 2014.
11. Weeks, A.L., S.G. Wilson, L. Ward, J. Goldblatt, J. Hui, and J.P. Walsh, HABP2 germline variants are uncommon in familial nonmedullary thyroid cancer. BMC Med Genet, 2016. 17(1): p. 60.
12. Negri, E., C. La Vecchia, S. Franceschi, and F. Levi, Patterns of mortality from major cancers in Europe. Cancer Epidemiology, Biomarkers & Prevention, 1994. 3(7): p. 531-536.
13. Bonora, E., G. Tallini, and G. Romeo, Genetic Predisposition to Familial Nonmedullary Thyroid Cancer: An Update of Molecular Findings and State-of-the-Art Studies. J Oncol, 2010. 2010: p. 385206.
14. Freed, D., E.L. Stevens, and J. Pevsner, Somatic mosaicism in the human genome. Genes, 2014. 5(4): p. 1064-1094.
15. Futreal, P.A., L. Coin, M. Marshall, T. Down, T. Hubbard, R. Wooster, N. Rahman, and M.R. Stratton, A census of human cancer genes. Nature Reviews Cancer, 2004. 4(3): p. 177-183.
16. Nikiforov, Y.E. and M.N. Nikiforova, Molecular genetics and diagnosis of thyroid cancer. Nat Rev Endocrinol, 2011. 7(10): p. 569-80.
17. Tomsic, J., H. He, K. Akagi, S. Liyanarachchi, Q. Pan, B. Bertani, R. Nagy, D.E. Symer, B.J. Blencowe, and A. de la Chapelle, A germline mutation in SRRM2, a splicing factor gene, is implicated in papillary thyroid carcinoma predisposition. Sci Rep, 2015. 5: p. 10566.
18. Sasanakietkul, T., T.D. Murtha, M. Javid, R. Korah, and T. Carling, Epigenetic modifications in poorly differentiated and anaplastic thyroid cancer. Molecular and Cellular Endocrinology, 2017.
19. Heilo, A., E. Sigstad, and K. Groeholt, Atlas of thyroid lesions. 2010: Springer Science & Business Media.

20. Navas-Carrillo, D., A. Rios, J.M. Rodriguez, P. Parrilla, and E. Orenes-Pinero, Familial nonmedullary thyroid cancer: screening, clinical, molecular and genetic findings. Biochim Biophys Acta, 2014. 1846(2): p. 468-76.
21. Lebastchi, A.H. and G.G. Callender, Thyroid cancer. Curr Probl Cancer, 2014. 38(2): p. 48-74.
22. Haugen, B.R., E.K. Alexander, K.C. Bible, G.M. Doherty, S.J. Mandel, Y.E. Nikiforov, F. Pacini, G.W. Randolph, A.M. Sawka, M. Schlumberger, K.G. Schuff, S.I. Sherman, J.A. Sosa, D.L. Steward, R.M. Tuttle, and L. Wartofsky, 2015 American Thyroid Association Management Guidelines for Adult Patients with Thyroid Nodules and Differentiated Thyroid Cancer: The American Thyroid Association Guidelines Task Force on Thyroid Nodules and Differentiated Thyroid Cancer. Thyroid, 2016. 26(1): p. 1-133.
23. Nixon, I.J., C. Suarez, R. Simo, A. Sanabria, P. Angelos, A. Rinaldo, J.P. Rodrigo, L.P. Kowalski, D.M. Hartl, M.L. Hinni, J.P. Shah, and A. Ferlito, The impact of family history on non-medullary thyroid cancer. Eur J Surg Oncol, 2016. 42(10): p. 1455-63.
24. Burgess, J.R., A. Duffield, S.J. Wilkinson, R. Ware, T.M. Greenaway, J. Percival, and L. Hoffman, Two families with an autosomal dominant inheritance pattern for papillary carcinoma of the thyroid. J Clin Endocrinol Metab., 1997. 82(2).
25. Vanderpump, M.P. and W. Tunbridge, The epidemiology of thyroid diseases. Werner and Ingbar's the thyroid: a fundamental and clinical text, 2005: p. 398-406.
26. Puxeddu, E., C. Durante, N. Avenia, S. Filetti, and D. Russo, Clinical implications of BRAF mutation in thyroid carcinoma. Trends Endocrinol Metab, 2008. 19(4): p. 138-45.
27. Bonora, E., C. Rizzato, C. Diquigiovanni, T. Oudot-Mellakh, D. Campa, M. Vargiolu, M. Guedj, N. Consortium, J.D. McKay, G. Romeo, F. Canzian, and F. Lesueur, The FOXE1 locus is a major genetic determinant for familial nonmedullary thyroid carcinoma. Int J Cancer, 2014. 134(9): p. 2098-107.
28. Yang, S. and J. Ngeow, Familial non-medullary thyroid cancer: unraveling the genetic maze. Endocr Relat Cancer, 2016. 23(12): p. R577-R595.
29. Bauer, A.J., Clinical Behavior and Genetics of Nonsyndromic, Familial Nonmedullary Thyroid Cancer. Front Horm Res, 2013. 41: p. 141-148.
30. Alsanea, O., N. Wada, K. Ain, M. Wong, K. Taylor, P.H. Ituarte, P.A. Treseler, H.-U. Weier, N. Freimer, and A.E. Siperstein, Is familial non-medullary thyroid carcinoma more aggressive than sporadic thyroid cancer? A multicenter series. Surgery, 2000. 128(6): p. 1043-1051.
31. Uchino, S., S. Noguchi, H. Kawamoto, H. Yamashita, S. Watanabe, H. Yamashita, and S. Shuto, Familial nonmedullary thyroid carcinoma characterized by multifocality and a high recurrence rate in a large study population. World J Surg, 2002. 26(8): p. 897-902.
32. Mazeh, H., J. Benavidez, J.L. Poehls, L. Youngwirth, H. Chen, and R.S. Sippel, In patients with thyroid cancer of follicular cell origin, a family history of nonmedullary thyroid cancer in one first-degree relative is associated with more aggressive disease. Thyroid, 2012. 22(1): p. 3-8.
33. Park, Y.J., H.Y. Ahn, H.S. Choi, K.W. Kim, D.J. Park, and B.Y. Cho, The long-term outcomes of the second generation of familial nonmedullary thyroid carcinoma are more aggressive than sporadic cases. Thyroid, 2012. 22(4): p. 356-362.
34. Pinto, A.E., G.L. Silva, R. Henrique, F.D. Menezes, M.R. Teixeira, V. Leite, and B.M. Cavaco, Familial vs sporadic papillary thyroid carcinoma: a matched-case comparative study showing similar clinical/prognostic behaviour. Eur J Endocrinol, 2014. 170(2): p. 321-7.
35. Lee, Y.M., J.H. Yoon, O. Yi, T.Y. Sung, K.W. Chung, W.B. Kim, and S.J. Hong, Familial history of non-medullary thyroid cancer is an independent prognostic factor for tumor recurrence in younger patients with conventional papillary thyroid carcinoma. J Surg Oncol, 2014. 109(2): p. 168-73.
36. Jiwang, L., L. Zhendong, L. Shuchun, H. Bo, and L. Yanguo, Clinicopathologic characteristics of familial versus sporadic papillary thyroid carcinoma. Acta Otorhinolaryngologica Italica, 2015. 35(4): p. 234.
37. Lei, S., D. Wang, J. Ge, H. Liu, D. Zhao, G. Li, and Z. Ding, Single-center study of familial papillary thyroid cancer in China: surgical considerations. World J Surg Oncol, 2015. 13: p. 115.
38. Tavarelli, M., M. Russo, R. Terranova, C. Scollo, A. Spadaro, G. Sapuppo, P. Malandrino, R. Masucci, S. Squatrito, and G. Pellegriti, Familial Non-Medullary Thyroid Cancer Represents an Independent Risk Factor for Increased Cancer Aggressiveness: A Retrospective Analysis of 74 Families. Front Endocrinol (Lausanne), 2015. 6: p. 117.
39. Cao, J., C. Chen, C. Chen, Q.L. Wang, and M.H. Ge, Clinicopathological features and prognosis of familial papillary thyroid carcinoma--a large-scale, matched, case-control study. Clin Endocrinol (Oxf), 2016. 84(4): p. 598-606.
40. Zhang, Q., S. Yang, X.Y. Meng, G. Chen, and R.Z. Pang, Clinical Analysis of Familial Nonmedullary Thyroid Carcinoma. World J Surg, 2016. 40(3): p. 570-3.
41. Maxwell, E.L., F.T. Hall, and J.L. Freeman, Familial Non-Medullary Thyroid Cancer: A Matched-Case Control Study. The Laryngoscope, 2004. 114(12): p. 2182-2186.
42. Ito, Y., K. Kakudo, M. Hirokawa, M. Fukushima, T. Yabuta, C. Tomoda, H. Inoue, M. Kihara, T. Higashiyama, and T. Uruno, Biological behavior and prognosis of familial papillary thyroid carcinoma. Surgery, 2009. 145(1): p. 100-105.
43. Robenshtok, E., G. Tzvetov, S. Grozinsky-Glasberg, I. Shraga-Slutzky, R. Weinstein, L. Lazar, S. Serov, J. Singer, D. Hirsch, and I. Shimon, Clinical characteristics and outcome of familial nonmedullary thyroid cancer: a retrospective controlled study. Thyroid, 2011. 21(1): p. 43-48.
44. Moses, W., J. Weng, and E. Kebebew, Prevalence, clinicopathologic features, and somatic genetic mutation profile in familial versus sporadic nonmedullary thyroid cancer. Thyroid, 2011. 21(4): p. 367-71.
45. Sippel, R.S., N.R. Caron, and O.H. Clark, An evidence-based approach to familial nonmedullary thyroid cancer: screening, clinical management, and follow-up. World journal of surgery, 2007. 31(5): p. 924-933.
46. Triponez, F., M. Wong, C. Sturgeon, N. Caron, D.G. Ginzinger, M.R. Segal, E. Kebebew, Q.-Y. Duh, and O.H. Clark, Does familial non-medullary thyroid cancer adversely affect survival? World journal of surgery, 2006. 30(5): p. 787-793.
47. Rosario, P.W. and M.R. Calsolari, Should a family history of papillary thyroid carcinoma indicate more aggressive therapy in patients with this tumor? Arquivos Brasileiros de Endocrinologia & Metabologia, 2014. 58(8): p. 812-816.
48. Khan, A., J. Smellie, C. Nutting, K. Harrington, and K. Newbold, Familial nonmedullary thyroid cancer: a review of the genetics. Thyroid, 2010. 20(7): p. 795-801.
49. Rios, A., J.M. Rodriguez, D. Navas, A. Cepero, N.M. Torregrosa, M.D. Balsalobre, and P. Parrilla, Family Screening in Familial Papillary Carcinoma: The Early Detection of Thyroid Disease. Ann Surg Oncol, 2016. 23(8): p. 2564-70.
50. He, H., A. Bronisz, S. Liyanarachchi, R. Nagy, W. Li, Y. Huang, K. Akagi, M. Saji, D. Kula, A. Wojcicka, N. Sebastian, B. Wen, Z. Puch, M. Kalemba, E. Stachlewska, M. Czetwertynska, J. Dlugosinska, K. Dymecka, R. Ploski, M. Krawczyk, P.J. Morrison, M.D. Ringel, R.T. Kloos, K. Jazdzewski, D.E. Symer, V.J. Vieland, M. Ostrowski, B. Jarzab, and A. de la Chapelle, SRGAP1 is a candidate gene for papillary thyroid carcinoma susceptibility. J Clin Endocrinol Metab, 2013. 98(5): p. E973-80.
51. Ngan, E.S., B.H. Lang, T. Liu, C.K. Shum, M.T. So, D.K. Lau, T.Y. Leon, S.S. Cherny, S.Y. Tsai, C.Y. Lo, U.S. Khoo, P.K. Tam, and M.M. Garcia-Barcelo, A germline mutation (A339V) in thyroid transcription factor-1 (TITF-1/NKX2.1) in patients with multinodular goiter and papillary thyroid carcinoma. J Natl Cancer Inst, 2009. 101(3): p. 162-75.
52. Gara, S.K., L. Jia, M.J. Merino, S.K. Agarwal, L. Zhang, M. Cam, D. Patel, and E. Kebebew, Germline HABP2 Mutation Causing Familial Nonmedullary Thyroid Cancer. N Engl J Med, 2015. 373(5): p. 448-55.
53. Wong, K., X.-R. Ren, Y.-Z. Huang, Y. Xie, G. Liu, H. Saito, H. Tang, L. Wen, S.M. Brady-Kalnay, and L. Mei, Signal transduction in neuronal migration: roles of GTPase activating proteins and the small GTPase Cdc42 in the Slit-Robo pathway. Cell, 2001. 107(2): p. 209-221.
54. Vega, F.M. and A.J. Ridley, Rho GTPases in cancer cell biology. FEBS letters, 2008. 582(14): p. 2093-2101.
55. Etienne-Manneville, S., Cdc42-the centre of polarity. Journal of cell science, 2004. 117(8): p. 1291-1300.
56. Vasko, V., A.V. Espinosa, W. Scouten, H. He, H. Auer, S. Liyanarachchi, A. Larin, V. Savchenko, G.L. Francis, and A. de la Chapelle, Gene expression and functional evidence of epithelial-to-mesenchymal transition in papillary thyroid carcinoma invasion. Proceedings of the National Academy of Sciences, 2007. 104(8): p. 2803-2808.
57. Engelmann, D., C. Meier, V. Alla, and B.M. Putzer, A balancing act: orchestrating amino-truncated and full-length p73 variants as decisive factors in cancer progression. Oncogene, 2015. 34(33): p. 4287-99.
58. Guazzi, S., M. Price, M. De Felice, G. Damante, M.-G. Mattei, and R. Di Lauro, Thyroid nuclear factor 1 (TTF-1) contains a homeodomain and displays a novel DNA binding specificity. The EMBO Journal, 1990. 9(11): p. 3631.
59. Mizuno, K., F.J. Gonzalez, and S. Kimura, Thyroid-specific enhancer-binding protein (T/EBP): cDNA cloning, functional characterization, and structural identity with thyroid transcription factor TTF-1. Molecular and cellular biology, 1991. 11(10): p. 4927-4933.
60. Civitareale, D., R. Lonigro, A. Sinclair, and R. Di Lauro, A thyroid-specific nuclear protein essential for tissue-specific expression of the thyroglobulin promoter. The EMBO journal, 1989. 8(9): p. 2537.
61. Civitareale, D., M.P. Castelli, P. Falasca, and A. Saiardi, Thyroid transcription factor 1 activates the promoter of the thyrotropin receptor gene. Molecular endocrinology, 1993. 7(12): p. 1589-1595.
62. Fabbro, D., C. Di Loreto, C.A. Beltrami, A. Belfiore, R. Di Lauro, and G. Damante, Expression of thyroid-specific transcription factors TTF-1 and PAX-8 in human thyroid neoplasms. Cancer research, 1994. 54(17): p. 4744-4749.
63. Cantara, S., S. Capuano, C. Formichi, M. Pisu, M. Capezzone, and F. Pacini, Lack of germline A339V mutation in thyroid transcription factor-1 (TITF-

187 1/NKX2. 1) gene in familial papillary thyroid cancer. Thyroid research, 2010. 3(1): p. 4. 64. De Felice, M., C. Ovitt, E. Biffali, A. Rodriguez-Mallon, C. Arra, K. Anastassiadis, P. Macchia, E., M.-G. Mattei, A. Mariano, H. Scholer, L. Macchias, and R. Di Lauro, A mouse model for hereditary thyroid dysgenesis and cleft palate. Nature Genetics, 1998. 19: p. 395-398. 65. Kallel, R., S. Belguith-Maalej, A. Akdi, M. Mnif, I. Charfeddine, P. Galofre, A. Ghorbel, M. Abid, R. Marcos, H. Ayadi, A. Velazquez, and H. Hadj Kacem, Genetic investigation of FOXE1 polyalanine tract in thyroid diseases: new insight on the role of FOXE1 in thyroid carcinoma. Cancer Biomark, 2010. 8(1): p. 43-51. 66. Dathan, N., R. Parlato, A. Rosica, M. De Felice, and R. Di Lauro, Distribution of the titf2/foxe1 gene product is consistent with an important role in the development of foregut endoderm, palate, and hair. Dev Dyn, 2002. 224(4): p. 450-6. 67. Cuesta, I., K.S. Zaret, and P. Santisteban, The forkhead factor FoxE1 binds to the thyroperoxidase promoter during thyroid cell differentiation and modifies compacted chromatin structure. Mol Cell Biol, 2007. 27(20): p. 7302-14. 68. Landa, I., S. Ruiz-Llorente, C. Montero-Conde, L. Inglada-Perez, F. Schiavi, S. Leskela, G. Pita, R. Milne, J. Maravall, I. Ramos, and e. al., The variant rs1867277 in FOXE1 gene confers thyroid cancer susceptibility through the recruitment of USF1/USF2 transcription factors. PLoS Genetics, 2009. 5(9). 69. Tomaz, R.A., I. Sousa, J.G. Silva, C. Santos, M.R. Teixeira, V. Leite, and B.M. Cavaco, FOXE1 polymorphisms are associated with familial and sporadic nonmedullary thyroid cancer susceptibility. Clin Endocrinol (Oxf), 2012. 77(6): p. 926-33. 70. Pereira, J., J. da Silva, R. Tomaz, A. Pinto, M. Bugalho, V. Leite, and B. Cavaco, Identification of a novel germline FOXE1 variant in patients with familial non- medullary thyroid carcinoma (FNMTC). Endocrine, 2015. 49: p. 204-214. 71. Zhao, X., X. Li, and X. 
Zhang, HABP2 Mutation and Nonmedullary Thyroid Cancer. N Engl J Med, 2015. 373(21): p. 2086-7. 72. Zhou, E., Y., Z. Lin, and Y. Yang, HABP2 mutation and Nonmedullary thyroid cancer. N Engl J Med, 2015. 373(21): p. 2084-7. 73. Tomsic, J., H. He, and A. de la Chapelle, HABP2 Mutation and Nonmedullary Thyroid Cancer. N Engl J Med, 2015. 373(21): p. 2084-5. 74. Sponziello, M., C. Durante, and S. Filetti, HABP2 Mutation and Nonmedullary Thyroid Cancer. N Engl J Med, 2015. 373(21): p. 2084-5. 75. Canzian, F., P. Amati, H.R. Harach, J.-L. Kraimps, F. Lesueur, J. Barbier, P. Levillain, G. Romeo, and D. Bonneau, A gene predisposing to familial thyroid tumors with cell oxyphilia maps to chromosome 19p13. 2. The American Journal of Human Genetics, 1998. 63(6): p. 1743-1748. 76. Bevan, S., T. Pal, C. Greenberg, H. Green, J. Wixey, G. Bignell, S. Narod, W. Foulkes, M. Stratton, and R. Houlston, A comprehensive analysis of MNG1, TCO1, fPTC, PTEN, TSHR, and TRKA in familial nonmedullary thyroid cancer: confirmation of linkage to TCO1. The Journal of Clinical Endocrinology & Metabolism, 2001. 86(8): p. 3701-3704. 77. McKay, J., D. Thompson, F. Lesueur, K. Stankov, A. Pastore, C. Watfah, S. Strolz, G. Riccabona, R. Moncayo, and G. Romeo, Evidence for interaction between the TCO and NMTC1 loci in familial non-medullary thyroid cancer. Journal of medical genetics, 2004. 41(6): p. 407-412. 78. Prazeres, H.J., F. Rodrigues, P. Soares, P. Naidenov, P. Figueiredo, B. Campos, M. Lacerda, and T.C. Martins, Loss of heterozygosity at 19p13. 2 and 2q21 in

188 tumours from familial clusters of non-medullary thyroid carcinoma. Familial cancer, 2008. 7(2): p. 141-149. 79. Malchoff, C.D., M. Sarfarazi, B. Tendler, F. Forouhar, G. Whalen, V. Joshi, A. Arnold, and D.M. Malchoff, Papillary Thyroid Carcinoma Associated with Papillary Renal Neoplasia: Genetic Linkage Analysis of a Distinct Heritable Tumor Syndrome 1. The Journal of Clinical Endocrinology & Metabolism, 2000. 85(5): p. 1758-1764. 80. Suh, I., S. Filetti, M.R. Vriens, M.A. Guerrero, S. Tumino, M. Wong, W.T. Shen, E. Kebebew, Q.-Y. Duh, and O.H. Clark, Distinct loci on chromosome 1q21 and 6q22 predispose to familial nonmedullary thyroid cancer: a SNP array-based linkage analysis of 38 families. Surgery, 2009. 146(6): p. 1073-1080. 81. Cavaco, B.M., P.F. Batista, L.G. Sobrinho, and V. Leite, Mapping a new familial thyroid epithelial neoplasia susceptibility locus to chromosome 8p23. 1- p22 by high-density single-nucleotide polymorphism genome-wide linkage analysis. The Journal of Clinical Endocrinology & Metabolism, 2008. 93(11): p. 4426-4430. 82. McKay, J.D., F. Lesueur, L. Jonard, A. Pastore, J. Williamson, L. Hoffman, J. Burgess, A. Duffield, M. Papotti, and M. Stark, Localization of a susceptibility gene for familial nonmedullary thyroid carcinoma to chromosome 2q21. The American Journal of Human Genetics, 2001. 69(2): p. 440-446. 83. Bignell, G.R., F. Canzian, M. Shayeghi, M. Stark, Y.Y. Shugart, P. Biggs, J. Mangion, R. Hamoudi, J. Rosenblatt, and P. Buu, Familial nontoxic multinodular thyroid goiter locus maps to chromosome 14q but does not account for familial nonmedullary thyroid cancer. The American Journal of Human Genetics, 1997. 61(5): p. 1123-1130. 84. He, H., R. Nagy, S. Liyanarachchi, H. Jiao, W. Li, S. Suster, J. Kere, and A. de la Chapelle, A susceptibility locus for papillary thyroid carcinoma on chromosome 8q24. Cancer research, 2009. 69(2): p. 625-631. 85. Wahl, M.C., C.L. Will, and R. 
Luhrmann, The spliceosome: design principles of a dynamic RNP machine. Cell, 2009. 136(4): p. 701-18. 86. Dvinge, H. and R.K. Bradley, Widespread intron retention diversifies most cancer transcriptomes. Genome medicine, 2015. 7(1): p. 45. 87. Blencowe, B.J., Alternative splicing: new insights from global analyses. Cell, 2006. 126(1): p. 37-47. 88. Brosseau, J.-P., J.-F. Lucier, H. Nwilati, P. Thibault, D. Garneau, D. Gendron, M. Durand, S. Couture, E. Lapointe, and P. Prinos, Tumor microenvironment– associated modifications of alternative splicing. Rna, 2014. 20(2): p. 189-201. 89. Oltean, S. and D. Bates, Hallmarks of alternative splicing in cancer. Oncogene, 2014. 33(46): p. 5311. 90. Ladomery, M., Aberrant alternative splicing is another hallmark of cancer. International journal of cell biology, 2013. 2013. 91. Danan-Gotthold, M., R. Golan-Gerstl, E. Eisenberg, K. Meir, R. Karni, and E.Y. Levanon, Identification of recurrent regulated alternative splicing events across human solid tumors. Nucleic Acids Res., 2015. 43(10): p. 5130-44. 92. Yoshida, K., M. Sanada, Y. Shiraishi, D. Nowak, Y. Nagata, R. Yamamoto, Y. Sato, A. Sato-Otsubo, A. Kon, and M. Nagasaki, Frequent pathway mutations of splicing machinery in myelodysplasia. Nature, 2011. 478(7367): p. 64-69. 93. Graubert, T.A., D. Shen, L. Ding, T. Okeyo-Owuor, C.L. Lunn, J. Shao, K. Krysiak, C.C. Harris, D.C. Koboldt, and D.E. Larson, Recurrent mutations in the U2AF1 splicing factor in myelodysplastic syndromes. Nature genetics, 2012. 44(1): p. 53.

189 94. Blencowe, B.J., R. Issner, J.A. Nickerson, and P.A. Sharp, A coactivator of pre- mRNA splicing. GENES & DEVELOPMENT, 1998. 12(7): p. 996-1009. 95. Jeong, S., SR Proteins: Binders, Regulators, and Connectors of RNA. Mol Cells, 2017. 40(1): p. 1-9. 96. Wang, Z. and C.B. Burge, Splicing regulation: from a parts list of regulatory elements to an integrated splicing code. RNA, 2008. 14(5): p. 802-13. 97. Muller-McNicoll, M. and K.M. Neugebauer, How cells get the message: dynamic assembly and function of mRNA-protein complexes. Nat Rev Genet, 2013. 14(4): p. 275-87. 98. Popp, M.W.-L. and L.E. Maquat, The dharma of nonsense-mediated mRNA decay in mammalian cells. Molecules and cells, 2014. 37(1): p. 1-8. 99. Spector, D.L. and A.I. Lamond, Nuclear speckles. Cold Spring Harbor perspectives in biology, 2011. 3(2): p. a000646. 100. Huang, Y. and J.A. Steitz, SRprises along a messenger’s journey. Molecular cell, 2005. 17(5): p. 613-615. 101. Hall, L.L., K.P. Smith, M. Byron, and J.B. Lawrence, Molecular anatomy of a speckle. The Anatomical Record Part A: Discoveries in Molecular, Cellular, and Evolutionary Biology, 2006. 288(7): p. 664-675. 102. Sawada, Y., Y. Miura, K. Umeki, T. Tamaoki, K. Fujinaga, and S. Ohtaki, Cloning and characterization of a novel RNA-binding protein SRL300 with RS domains. Biochimica et Biophysica Acta (BBA)-Gene Structure and Expression, 2000. 1492(1): p. 191-195. 103. Blencowe, B.J., G. Bauren, A.G. Eldridge, R. Issner, J.A. Nickerson, E. Rosonina, and P.A. Sharp, The SRm160:300 splicing coactvator subunits. RNA, 2000. 6(1): p. 111-120. 104. Bessonov, S., M. Anokhina, C.L. Will, H. Urlaub, and R. Lührmann, Isolation of an active step I spliceosome and composition of its RNP core. Nature, 2008. 452(7189): p. 846. 105. Konarska, M.M., A purified catalytically competent spliceosome. Nature Structural and Molecular Biology, 2008. 15(3): p. 222-225. 106. Miele, A., R. Medina, A.J. van Wijnen, G.S. Stein, and J.L. 
Stein, The interactome of the histone gene regulatory factor HiNF-P suggests novel cell cycle related roles in transcriptional control and RNA processing. J Cell Biochem, 2007. 102(1): p. 136-48. 107. Blencowe, B.J., Splicing regulation- the cell cycle connection. Curr Biol, 2003. 13: p. R149-151. 108. Ye, X., Y. Wei, G. Nalepa, and J.W. Harper, The Cyclin E/Cdk2 Substrate p220NPAT Is Required for S-Phase Entry, Histone Gene Expression, and Cajal Body Maintenance in Human Somatic Cells. Molecular and Cellular Biology, 2003. 23(23): p. 8586-8600. 109. Grainger, R.J., J.D. Barrass, A. Jacquier, J.C. Rain, and J.D. Beggs, Physical and genetic interactions of yeast Cwc21p, an ortholog of human SRm300/SRRM2, suggest a role at the catalytic center of the spliceosome. RNA, 2009. 15(12): p. 2161-73. 110. Aranda, P.S., D.M. LaJoie, and C.L. Jorcyk, Bleach gel: a simple agarose gel for analyzing RNA quality. Electrophoresis, 2012. 33(2): p. 366-9. 111. Boyer, P.L. and S. Hughes, Site-Directed Mutagenic analysis of viral polymerases and related proteins. Methods Enzymol, 1996. 275: p. 538-555. 112. Nhor, J. and K. Kristiansen, Site-Directed Mutagenesis. Methods in Molecular Biology, ed. G.N.e.P.M.a.D. In: Bross P. Vol. 232. 2003: Humana Press. 127- 131.

190 113. Orlik, F., C. Andersen, and R. Benz, Site-directed mutagenesis of tyrosine 118 within the central constriction site of the LamB Channel of Escherichia coli. Biophysical Journal, 2002. 88: p. 2466-2475. 114. Sheng, Y., V. Mancino, and B. Birren, Transformation of Escherichia coli with large DNA molecules by electroporation. Nucleic Acids Research, 1995. 23(11). 115. Ishino, S. and Y. Ishino, DNA polymerases as useful reagents for biotechnology - the history of developmental research in the field. Front Microbiol, 2014. 5: p. 465. 116. Ran, F.A., P.D. Hsu, J. Wright, V. Agarwala, D.A. Scott, and F. Zhang, Genome engineering using the CRISPR-Cas9 system. Nature protocols, 2013. 8(11): p. 2281-2308. 117. Cho, S.W., S. Kim, J.M. Kim, and J.S. Kim, Targeted genome engineering in human cells with the Cas9 RNA-guided endonuclease. Nat Biotechnol, 2013. 31(3): p. 230-2. 118. Shen, B., J. Zhang, H. Wu, J. Wang, K. Ma, Z. Li, X. Zhang, P. Zhang, and X. Huang, Generation of gene-modified mice via Cas9/RNA-mediated gene targeting. Cell Res, 2013. 23(5): p. 720-3. 119. Ran, F.A., P.D. Hsu, J. Wright, V. Agarwala, D.A. Scott, and F. Zhang, Genome engineering using the CRISPR-Cas9 system. Nat Protoc, 2013. 8(11): p. 2281- 2308. 120. Essletzbichler, P., T. Konopka, F. Santoro, D. Chen, B.V. Gapp, R. Kralovics, T.R. Brummelkamp, S.M. Nijman, and T. Burckstummer, Megabase-scale deletion using CRISPR/Cas9 to generate a fully haploid human cell line. Genome Res, 2014. 24(12): p. 2059-65. 121. Garneau, J.E., M.E. Dupuis, M. Villion, D.A. Romero, R. Barrangou, P. Boyaval, C. Fremaux, P. Horvath, A.H. Magadan, and S. Moineau, The CRISPR/Cas bacterial immune system cleaves bacteriophage and plasmid DNA. Nature, 2010. 468(7320): p. 67-71. 122. Cong, L., F.A. Ran, D. Cox, S. Lin, R. Barretto, N. Habib, P.D. Hsu, X. Wu, W. Jiang, L.A. Marraffini, and F. Zhang, Multiplex genome engineering using CRISPR/Cas systems. Science, 2013. 339(6121): p. 819-23. 123. Wang, H., H. Yang, C.S. 
Shivalila, M.M. Dawlaty, A.W. Cheng, F. Zhang, and R. Jaenisch, One-step generation of mice carrying mutations in multiple genes by CRISPR/Cas-mediated genome engineering. Cell, 2013. 153(4): p. 910-8. 124. Doench, J.G., E. Hartenian, D.B. Graham, Z. Tothova, M. Hegde, I. Smith, M. Sullender, B.L. Ebert, R.J. Xavier, and D.E. Root, Rational design of highly active sgRNAs for CRISPR-Cas9-mediated gene inactivation. Nat Biotechnol, 2014. 32(12): p. 1262-7. 125. Olbrich, T., C. Mayor-Ruiz, M. Vega-Sendino, C. Gomez, S. Ortega, S. Ruiz, and O. Fernandez-Capetillo, A p53-dependent response limits the viability of mammalian haploid cells. Proc Natl Acad Sci U S A, 2017. 114(35): p. 9367- 9372. 126. Sagi, I., G. Chia, T. Golan-Lev , M. Peretz, U. Weissbein, L. Sui, M. Sauer, O. Yanuka, D. Egli, and N. Benvenisty, Derivation and differentiation of haploid human embryonic stem cells. Nature, 2016. 532(7597): p. 107-111. 127. VanGuilder, H.D., K.E. Vrana, and W.M. Freeman, Twenty-five years of quantitative PCR for gene expression analysis. Biotechniques, 2008. 44(5): p. 619-26. 128. Teste, M.A., M. Duquenne, J.M. Francois, and J.L. Parrou, Validation of reference genes for quantitative expression analysis by real-time RT-PCR in Saccharomyces cerevisiae. BMC Mol Biol, 2009. 10: p. 99.

191 129. Moelling, K., F. Broecker, and K. JE., Rnase H: specificity, mechanisms of action, and antiviral target. Methods Mol Biol., 2014. 1087: p. 71-84. 130. Harvey, S.E. and C. Cheng, Methods for Characterization of Alternative RNA Splicing. Methods Mol Biol, 2016. 1402: p. 229-241. 131. Gowen, B.G., B. Chim, C.D. Marceau, T.T. Greene, P. Burr, J.R. Gonzalez, C.R. Hesser, P.A. Dietzen, T. Russell, A. Iannello, L. Coscoy, C.L. Sentman, J.E. Carette, S.A. Muljo, and D.H. Raulet, A forward genetic screen reveals novel independent regulators of ULBP1, an activating ligand for natural killer cells. Elife, 2015. 4. 132. Adan, A., G. Alizada, Y. Kiraz, Y. Baran, and A. Nalbant, Flow cytometry: basic principles and applications. Crit Rev Biotechnol, 2017. 37(2): p. 163-176. 133. Roederer, M., Interpretation of cellular proliferation data: avoid the panglossian. Cytometry A, 2011. 79(2): p. 95-101. 134. Imataka, G. and O. Arisaka, Chromosome analysis using spectral karyotyping (SKY). Cell Biochem Biophys, 2012. 62(1): p. 13-7. 135. Schori, C. and J.M. Sedivy, Analysis of Cell Cycle Phases and Progression in Cultured Mammalian Cells. Methods, 2007. 41(2): p. 143-150. 136. Leif, R.C., J.H. Stein, and R.M. Zucker, A short history of the initial application of anti-5-BrdU to the detection and measurement of S phase. Cytometry A, 2004. 58(1): p. 45-52.

Appendix I: Buffers and Solutions

1% w/v Agarose Gel

Reagent                               Quantity
Agarose                               0.5 g or 1 g
1 x TAE Buffer                        50 mL or 100 mL
10 mg/mL Ethidium Bromide Solution    3 µL

Agarose is mixed with TAE buffer and heated in a microwave for 90 seconds. The mixture is allowed to cool for 5 minutes, then ethidium bromide is added and mixed. The mixture is then poured into a casting tray and left to cool for 1 hour.

10% Ammonium Persulphate

Reagent                 Quantity    Final concentration
Ammonium Persulphate    0.5 g       10 %

Made up to 5 mL with ddH2O and stored in 100 µL aliquots at -20ºC.

50 mg/mL Ampicillin

Reagent       Quantity    Final concentration
Ampicillin    2.5 g       50 mg/mL

Ampicillin was dissolved in 5 mL of ddH2O at 4ºC on a rotor and later filter sterilized. Stored in 1 mL aliquots and kept at -20ºC.

5% w/v BSA Blocking Buffer

Reagent    Quantity    Final concentration
1 x TBS    9.4 mL      1 x
Tween20    100 µL      0.1% v/v
BSA        0.5 g       5% w/v

Tween20 and BSA were mixed into the 9.4 mL of 1 x TBS on a rotor at 4ºC for 1 hour.

Cell Lysis Buffer

Reagent     Quantity    Final concentration
Tris        0.36 g      40 mM
EDTA        1.752 g     150 mM
Triton-X    80 µL       0.08 %
Glycerol    10 mL       10 %

Reagents were dissolved in 90 mL of ddH2O and the pH adjusted to 7.8. The volume was made up to 100 mL with ddH2O and the buffer was stored at 4ºC. Cell Lysis Buffer was used in aliquots, with 2 mM dithiothreitol (DTT) added immediately before use.


0.5 M EDTA

Reagent    Quantity    Final concentration
EDTA       8.375 g     0.5 M

Made up to 45 mL with ddH2O and the pH adjusted to 8. Stored at room temperature.

FACS Buffer

Reagent         Quantity    Final concentration
EDTA (0.5 M)    1 mL        5 mM
FBS             20 mL       20%

Reagents were added to 100 mL of 1 x PBS, filter sterilized and stored at 4ºC.

Iscove's Modified Dulbecco's Medium

Reagent        Quantity         Final concentration
IMDM Powder    Entire sachet    -
NaHCO3         2.024 g          12.034 mM

Made up to 2 L with ddH2O and filter sterilized in a laminar flow hood.

50 mg/mL Kanamycin

Reagent      Quantity    Final concentration
Kanamycin    2.5 g       50 mg/mL

Kanamycin was dissolved in 5 mL of ddH2O at 4ºC on a rotor and later filter sterilized. Stored in 1 mL aliquots and kept at -20ºC.

10 mg/mL Ethidium Bromide

Reagent             Quantity    Final concentration
Ethidium Bromide    10 mg       10 mg/mL

Made up to 1 mL with ddH2O, protected from light and stored at 4ºC.

1 x Phosphate Buffered Saline (PBS)

Reagent    Quantity    Final concentration
NaCl       80 g        137 mM
KCl        2 g         2.7 mM
Na2HPO4    14.4 g      4.3 mM
KH2PO4     2.4 g       1.4 mM

Made up to 990 mL with ddH2O and the pH adjusted to 7.5. The volume was subsequently made up to 1 L, and the solution was autoclaved and stored at room temperature.

7.5% Polyacrylamide Separating Gel

Reagent                                Quantity
30 % Acrylamide/Bis (29:1) Solution    1.9 mL
4 x Separating Buffer                  1.9 mL
ddH2O                                  3.65 mL
TEMED                                  7.5 µL
10 % Ammonium Persulphate (APS)        25 µL

APS and TEMED were added last. The solution was mixed and poured immediately.

1 x Polyacrylamide Stacking Gel

Reagent                                Quantity
4 x Stacking Buffer                    0.833 mL
30 % Acrylamide/Bis (29:1) Solution    0.433 mL
ddH2O                                  2.03 mL
10 % APS                               33.33 µL
TEMED                                  3.33 µL

APS and TEMED were added last. The solution was mixed and poured immediately.

4 x Stacking Buffer

Reagent      Quantity    Final concentration
Tris Base    24.24 g     0.5 M
10 % SDS     16 mL       0.4 %

Tris was dissolved in 320 mL of ddH2O and the pH adjusted to 8.8. SDS was then added and the volume made up to 400 mL with ddH2O.

4 x Separating Buffer

Reagent      Quantity    Final concentration
Tris Base    72.66 g     1.5 M
10 % SDS     16 mL       0.4 % (w/v)

Tris was dissolved in 320 mL of ddH2O and the pH adjusted to 8.8. SDS was then added and the volume made up to 400 mL with ddH2O.
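The Tris and SDS entries in the two 4 x buffers follow from mass = molarity × volume × molar mass and from a simple stock dilution. The sketch below reproduces that arithmetic. It assumes a molar mass of 121.14 g/mol for Tris base, a standard value that is not stated in this appendix, and the function names are illustrative only. The small differences from the tabulated 24.24 g and 72.66 g presumably reflect rounding of the molar mass.

```python
# Assumed molar mass of Tris base (C4H11NO3) in g/mol; not given in the appendix.
TRIS_BASE_MW = 121.14


def grams_needed(molarity_mol_per_l: float, final_volume_ml: float, molar_mass: float) -> float:
    """Mass of solute (g) required for a target molarity in a given final volume."""
    return molarity_mol_per_l * (final_volume_ml / 1000.0) * molar_mass


def diluted_percent(stock_percent: float, stock_volume_ml: float, final_volume_ml: float) -> float:
    """Final percentage after diluting a stock solution into a larger final volume."""
    return stock_percent * stock_volume_ml / final_volume_ml


# 4 x Stacking Buffer: 0.5 M Tris in 400 mL
stacking_tris_g = grams_needed(0.5, 400, TRIS_BASE_MW)
# 4 x Separating Buffer: 1.5 M Tris in 400 mL
separating_tris_g = grams_needed(1.5, 400, TRIS_BASE_MW)
# Both buffers: 16 mL of 10 % SDS in 400 mL
sds_final_percent = diluted_percent(10, 16, 400)

print(f"Stacking buffer Tris:   {stacking_tris_g:.2f} g")
print(f"Separating buffer Tris: {separating_tris_g:.2f} g")
print(f"SDS final:              {sds_final_percent:.2f} %")
```

The same two helpers can be applied to any other recipe in this appendix where the molar mass of the reagent is known.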

3% Skim Milk Blocking Buffer

Reagent             Quantity    Final concentration
Skim Milk Powder    0.9 g       3% w/v
1 x TBS             30 mL       -

Skim milk powder was dissolved in 1 x TBS on a rotor for 1 hour at room temperature.


1 x Tris-Buffered Saline (TBS)

Reagent      Quantity    Final concentration
Tris Base    4.84 g      40 mM
NaCl         58.44 g     0.145 M

Reagents were dissolved in 1.5 L of ddH2O and the pH adjusted to 7.5. The solution was then made up to 2 L and stored at room temperature.

1 x TBST (1% Tween20)

Reagent    Quantity
1 x TBS    1 L
Tween20    2 mL

Reagents were mixed and stored at room temperature.

50 x Tris Acetate EDTA Buffer (TAE)

Reagent                Quantity    Final concentration
Tris Base              242 g       2 M
Glacial Acetic Acid    57.1 mL     5.71% v/v
0.5 M EDTA (pH 8.0)    100 mL      50 mM
ddH2O                  700 mL      -

Reagents were dissolved in ddH2O in the order they appear in the table. The volume was made up to 1 L with ddH2O and the buffer stored at room temperature.
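The 1 x TAE used for agarose gels is presumably prepared by diluting this 50 x stock, although the dilution step is not given in this appendix. The sketch below shows the usual C1V1 = C2V2 calculation for that step; the function name and the chosen final volumes are illustrative assumptions, not part of the original protocol.

```python
def dilute_stock(stock_x: float, final_x: float, final_volume_ml: float) -> tuple[float, float]:
    """Return (stock volume, diluent volume) in mL for a C1V1 = C2V2 dilution.

    stock_x and final_x are concentrations expressed as 'x' factors
    (e.g. 50 for a 50 x stock, 1 for a 1 x working solution).
    """
    if final_x > stock_x:
        raise ValueError("Final concentration cannot exceed the stock concentration")
    stock_volume_ml = final_x * final_volume_ml / stock_x
    return stock_volume_ml, final_volume_ml - stock_volume_ml


# 1 L of 1 x TAE from the 50 x stock
tae_stock_1l, water_1l = dilute_stock(50, 1, 1000)
# 100 mL of 1 x TAE, the larger gel volume listed for the 1% agarose gel
tae_stock_100, water_100 = dilute_stock(50, 1, 100)

print(f"1 L:    {tae_stock_1l:.0f} mL of 50 x TAE + {water_1l:.0f} mL ddH2O")
print(f"100 mL: {tae_stock_100:.0f} mL of 50 x TAE + {water_100:.0f} mL ddH2O")
```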

1 x Transfer Buffer

Reagent      Quantity
Tris Base    3.03 g
Glycine      14.4 g
Methanol     200 mL

Reagents were dissolved in 800 mL of ddH2O. Stored at 4ºC.

2 x YT Medium

Reagent                Quantity
Bacto Tryptone         16 g
Bacto Yeast Extract    10 g
NaCl                   5 g

Made up to 1 L with ddH2O and mixed thoroughly. Autoclaved and stored at room temperature.

2 x YT Agar

Reagent                Quantity
Bacto Tryptone         16 g
Bacto Yeast Extract    10 g
NaCl                   5 g
Bacto Agar             7.5 g

Reagents were dissolved in 1 L of ddH2O and mixed thoroughly. The solution was autoclaved and allowed to cool to 50ºC. 500 µL of 50 mg/mL ampicillin or kanamycin was then added and mixed, and the solution was poured quickly into petri dishes and allowed to set.

Cell Cycle Analysis Buffer

Reagent             Concentration
Propidium iodide    50 µg/mL
RNase A             50 µg/mL

FSB Buffer

Reagent                     Quantity    Final concentration
Potassium Acetate (1 M)     5 mL        10 mM
MnCl2                       4.45 g      45 mM
CaCl2                       0.78 g      10 mM
KCl                         3.37 g      100 mM
Hexaamminecobalt chloride   0.4 g       3 mM
Glycerol                    50 mL       10%

Made up to 500 mL with ddH2O and stored at 4ºC.

Appendix II: Primer Sequences

Primers Used to Amplify the SRRM2 ORF from the pF1k-SRRM2 vector

Primer     Direction    Sequence (5’→3’)
SRRM2_F    Forward      GCGATATCATGTACAACGGGATCG
SRRM2_R    Reverse      GAGCGGCCGCCTACTTATGGTCGTCATCCTTGTAATCTGGAGACCTGGAGGAG

Primers for Colony PCR

Primer        Direction    Sequence (5’→3’)
SRRM2_F       Forward      GCGATATCATGTACAACGGGATCG
SRRM2_IP12    Reverse      TTGGAGTGGGAGACC

Sequencing Primers

Primer        Direction    Sequence (5’→3’)
SRRM2_IP1     Forward      GAGCAAACGTAAATC
SRRM2_IP2     Forward      TTCTCATACCCCCTC
SRRM2_IP3     Forward      GATCTCACTCTAGAA
SRRM2_IP4     Forward      GAAGCAGATCAGTAT
SRRM2_IP5     Forward      GTCCCTTTCCAGTAC
SRRM2_IP6     Forward      CCAGAACTCCATCAAG
SRRM2_IP7     Forward      GTTCCAGGTCATCAC
SRRM2_IP8     Forward      AAGATCCCGGTCAAG
SRRM2_IP9     Forward      GCTCCATGATGGATG
SRRM2_IP10    Forward      GTGTCAGTGGCAGAA
SRRM2_IP11    Forward      TTCTTCCTCCTCATC
SRRM2_IP12    Reverse      TTGGAGTGGGAGACC
pSpCas9_SP    Forward      CAAGGCTGTTAGAGAGATAATTGG

IP = Internal Primer

Primers for Site-Directed Mutagenesis

Primer      Direction    Sequence (5’→3’)
R1805W_F    Forward      GAAGACAGCGGAGCTGGTCAAGGTCGCGGGTTAC
R1805W_R    Reverse      GTAACCCGCGACCTTGACCAGCTCCGCTGTCTTC

The nucleotide substitution that generates the R1805W mutation is shown in red on the forward and reverse primers.
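Site-directed mutagenesis primer pairs of this kind are typically designed as exact reverse complements of one another, so that both strands carry the substitution. The short sketch below checks that property for the R1805W pair using the sequences listed in this appendix; the helper name is illustrative and the check is not part of the original methods.

```python
# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


# Primer sequences copied from the site-directed mutagenesis table.
R1805W_F = "GAAGACAGCGGAGCTGGTCAAGGTCGCGGGTTAC"
R1805W_R = "GTAACCCGCGACCTTGACCAGCTCCGCTGTCTTC"

# True when the reverse primer is the exact reverse complement of the forward primer.
primers_are_complementary = reverse_complement(R1805W_F) == R1805W_R
print(f"Primer length: {len(R1805W_F)} nt")
print(f"Reverse primer is reverse complement of forward primer: {primers_are_complementary}")
```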


Primers for PCR Amplification of R1805W Codon

Primer        Direction    Sequence
R1805W_SPF    Forward      5’-CTGAACCTAAAGCTCCAGC-3’
R1805W_SPR    Reverse      5’-GAGATGTTCTGGATCTTGATC-3’

Primers for PCR Amplification of S346F Codon

Primer       Direction    Sequence
S346F_SPF    Forward      5’-CTCATTTCCTGTACTCTACCCATCC-3’
S346F_SPR    Reverse      5’-GCCTGGTCGCTGAGGAGAC-3’

Primers for Verification of Successful Genome Engineering Events

Primer        Mutation    Direction    Sequence
R1805W_SPF    R1805W      Forward      5’-CTGAACCTAAAGCTCCAGC-3’
S346F_SPF     S346F       Forward      5’-CTCATTTCCTGTACTCTACCCATCC-3’

RT-qPCR Primers

Primer        Direction    Sequence (5’→3’)
CAMKK2_EIF    Forward      GAAATCAAGCTGCACCCTG
CAMKK2_ESF    Forward      GAAATCAAGATCCTGGTGAAGACC
CAMKK2_R      Reverse      AGCAAGTTTCCAGGCGCTGAC
CDC16_EIF     Forward      CTTCCACACAGCCCTTGGTCTTA
CDC16_ESF     Forward      GGACTACTTCCACACAGAGCAGAC
CDC16_R       Reverse      GAGGTTTCCAATGGCGTAAGC
CTNNA1_EIF    Forward      CTTGTTACACAGGTTACAACCCTTG
CTNNA1_ESF    Forward      TGTTACACAGGTGATTTGATGAAGG
CTNNA1_R      Reverse      TCAGCTGAACAAGTAATTTGTAGACATC
FBXW4_EIF     Forward      ACCTCAACAGTGGGCAGC
FBXW4_ESF     Forward      GACCTCAACAGGAAATGTGTC
FBXW4_R       Reverse      TGCAGGCAGGCCCTTTGACG
HBP1_EIF      Forward      GCATTCACAAGGGCTATGGTTC
HBP1_ESF      Forward      GCATTCACAAGGTTGGTCATC
HBP1_R        Reverse      TCCAGGAGGTAGACATACATCGC
PIM2_EIF      Forward      GACTTTGATGGGACAAGGGTG
PIM2_ESF      Forward      GACTTTGATGACTGCTGTGCC
PIM2_R        Reverse      CTGATCCTCAATCCCTTACCTTAG
SPPL3_EIF     Forward      GATACTTTGTAGGCCTGCTCAC
SPPL3_ESF     Forward      GATACTTTGTAGGGCGACCTCC
SPPL3_R       Reverse      AGAAGGTAACAGTCTGGTGCAGA
HPRT_F        Forward      AGGATTTGGAAAGGGTGTTTATTC
HPRT_R        Reverse      CAGAGGGCTACAATGTGATCG
18SrRNA_F     Forward      GTAACCCGTTGAACCCCATT
18SrRNA_R     Reverse      CCATCCAATCGGTAGTAGCG
