Topoisomerase I is required for the expression of long genes in glial cells and is dysregulated in ASD brain

by

Shuchi Trivedi

Supervisor

Dr. Irina Voineagu

A thesis in fulfilment of the requirements for the degree of

Master of Science

School of Biotechnology and Biomolecular Sciences

University of New South Wales

August 2018


Thesis/Dissertation Sheet

Surname/Family Name: Trivedi
Given Name/s: Shuchi
Abbreviation for degree as given in the University calendar: MSc
Faculty: Science
School: School of Biotechnology and Biomolecular Sciences
Thesis Title: Topoisomerase I is required for the expression of long genes in glial cells and is dysregulated in ASD brain.


Declaration relating to disposition of project thesis/dissertation

I hereby grant to the University of New South Wales or its agents the right to archive and to make available my thesis or dissertation in whole or in part in the University libraries in all forms of media, now or here after known, subject to the provisions of the Copyright Act 1968. I retain all property rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.

I also authorise University Microfilms to use the 350 word abstract of my thesis in Dissertation Abstracts International (this is applicable to doctoral theses only).

Signature: ……………………    Witness Signature: ……………………    Date: ……………………

The University recognises that there may be exceptional circumstances requiring restrictions on copying or conditions on use. Requests for restriction for a period of up to 2 years must be made in writing. Requests for a longer period of restriction may be considered in exceptional circumstances and require the approval of the Dean of Graduate Research.

FOR OFFICE USE ONLY Date of completion of requirements for Award:

ORIGINALITY STATEMENT

‘I hereby declare that this submission is my own work and to the best of my knowledge it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at UNSW or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by others, with whom I have worked at UNSW or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project's design and conception or in style, presentation and linguistic expression is acknowledged.’

Signed ……………………………………………......

Date ……………………………………………......

INCLUSION OF PUBLICATIONS STATEMENT

UNSW is supportive of candidates publishing their research results during their candidature as detailed in the UNSW Thesis Examination Procedure.

Publications can be used in their thesis in lieu of a Chapter if:
• The student contributed greater than 50% of the content in the publication and is the “primary author”, i.e. the student was responsible primarily for the planning, execution and preparation of the work for publication
• The student has approval to include the publication in their thesis in lieu of a Chapter from their supervisor and Postgraduate Coordinator.
• The publication is not subject to any obligations or contractual agreements with a third party that would constrain its inclusion in the thesis

Please indicate whether this thesis contains published material or not.

☒ This thesis contains no publications, either published or submitted for publication

☐ Some of the work described in this thesis has been published and it has been documented in the relevant Chapters with acknowledgement

☐ This thesis has publications (either published or submitted for publication) incorporated into it in lieu of a chapter and the details are presented below

CANDIDATE’S DECLARATION

I declare that:
• I have complied with the Thesis Examination Procedure
• where I have used a publication in lieu of a Chapter, the listed publication(s) below meet(s) the requirements to be included in the thesis.

Name    Signature    Date (dd/mm/yy)

Postgraduate Coordinator’s Declaration (to be filled in where publications are used in lieu of Chapters)

I declare that:
• the information below is accurate
• where listed publication(s) have been used in lieu of Chapter(s), their use complies with the Thesis Examination Procedure
• the minimum requirements for the format of the thesis have been met.

PGC’s Name    PGC’s Signature    Date (dd/mm/yy)


COPYRIGHT STATEMENT

‘I hereby grant the University of New South Wales or its agents the right to archive and to make available my thesis or dissertation in whole or part in the University libraries in all forms of media, now or here after known, subject to the provisions of the Copyright Act 1968. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation. I also authorise University Microfilms to use the 350 word abstract of my thesis in Dissertation Abstract International (this is applicable to doctoral theses only). I have either used no substantial portions of copyright material in my thesis or I have obtained permission to use copyright material; where permission has not been granted I have applied/will apply for a partial restriction of the digital copy of my thesis or dissertation.'

Signed ……………………………………………......

Date ……………………………………………......

AUTHENTICITY STATEMENT

‘I certify that the Library deposit digital copy is a direct equivalent of the final officially approved version of my thesis. No emendation of content has occurred and if there are any minor variations in formatting, they are the result of the conversion to digital format.’

Signed ……………………………………………......

Date ……………………………………………......

Abstract

Autism spectrum disorder (ASD) is a neurodevelopmental disease that is highly heritable but genetically heterogeneous. Its prevalence has increased over the last decade, and it affects 1-2% of the world's population, with a female-to-male ratio of 1:4. Accumulating evidence suggests that ASD may be caused by synaptic dysfunction. At the same time, genes involved in synaptic function are often long genes (>100 kb). Perturbation of long gene expression, and consequently of synaptic homeostasis, has been hypothesized to be involved in neurodevelopmental diseases.

How the expression of long genes in the brain is regulated, and potentially altered in neurodevelopmental disorders, remains unclear. Recent studies have shown that Topoisomerase 1 (TOP1), a ubiquitously expressed enzyme involved in DNA replication and repair, is required for the expression of long genes in neurons. However, whether this effect is cell-type specific in the brain has not been elucidated. Furthermore, genetic variants in topoisomerase genes have been identified in ASD individuals, raising a potential link between the dysregulation of long genes and TOP1 in ASD.

The current study was carried out to (a) assess the cell-type specificity of TOP1’s effect on long gene transcription in the brain and (b) investigate the expression of TOP1 and TOP1-dependent genes in brain tissue from ASD and control individuals. By assessing the effect of TOP1 inhibition on long gene expression in glial cells (i.e. normal human astrocytes), the current study extended the role of TOP1 in regulating long gene expression to all major brain cell types and determined that the expression of long genes was inversely proportional to their length. We also demonstrate for the first time that TOP1 expression is enriched in the cerebellum compared with the human cerebral cortex, and that it is overexpressed in ASD brain samples. Furthermore, some of the TOP1-dependent genes also show increased expression in ASD brain.


Table of Contents

1 Introduction
1.1 Topoisomerases
1.1.1 Mechanism of action of Topoisomerases
1.1.2 Replication and chromosomal organisation
1.1.3 Role of Topoisomerases in transcription
1.1.4 Role of Topoisomerases in CNS development
1.2 Autism Spectrum Disorders
1.2.1 Overview of ASD
1.2.2 ASD Heritability
1.2.3 The genetic basis of ASD
1.3 Relevance of astrocytes as a cellular system for neurodevelopmental disorders
1.3.1 Role of astrocyte-secreted signal molecules in synapse formation, maturation and in synaptic plasticity
1.4 Neuro-glia interactions and its significance in neurodevelopment

2 Investigating the role of TOP1 in regulating long gene transcription in glial cells
2.1 Introduction
2.2 Materials and Methods
2.2.1 Materials
2.2.2 Methods
2.2.2.1 Cell culture
2.2.2.2 Topotecan treatment
2.2.2.3 RNA Extraction
2.2.2.4 cDNA Synthesis
2.2.2.5 Quantitative real-time PCR (qRT-PCR)
2.2.2.6 Cloning of TOP1 sgRNAs into pBA439
2.2.2.7 Transient transfections and quantifying transfection efficiency
2.2.2.8 Microarray Data
2.3 Results
2.3.1 Topoisomerase inhibition by Topotecan leads to reduced expression of long genes implicated in ASD
2.3.2 ASD-associated genes are overrepresented among Topoisomerase 1-dependent genes in human astrocytes
2.3.3 qRT-PCR experiments confirm the downregulation of long genes in response to Topotecan in human primary astrocytes
2.3.4 Silencing of TOP1 by CRISPRi
2.4 Discussion
2.4.1 Overview and functions of long genes involved in neurodevelopmental diseases
2.4.2 The function of TOP1-dependent long genes and potential implications of their downregulation in response to TOP1 inhibition
2.4.2.1 IMMP2L- Inner mitochondrial membrane peptidase 2-like gene
2.4.2.2 and
2.4.3 Towards optimisation of TOP1 silencing by CRISPRi

3 Analysing the expression of TOP1 and TOP1-dependent genes in ASD brain
3.1 Introduction
3.2 Methods
3.2.1 RNA-seq data
3.2.2 RNA-seq data analysis
3.3 Results
3.4 Discussion
3.4.1 Highlighting the effect of TOP1 overexpression

4 Discussion and conclusion

5 Limitations

6 Future Directions

7 References

Table of Tables

Table 1.1 Classification of human Topoisomerases
Table 2.1 Table of Primers
Table 2.2 TOP1 sgRNA sequences
Table 2.3 Reagents and Kits
Table 2.4 cDNA Synthesis reaction mix
Table 2.5 Master mix for qRT-PCR
Table 2.6 Amplification efficiency for primer pairs for qRT-PCR
Table 2.7 Reaction mix for digesting pBA439
Table 2.8 Reaction mix for Annealing
Table 2.9 Reaction mix for ligation reaction
Table 2.10 Comparison of log2 fold changed qRT-PCR and Microarray data
Table 3.1 Number of samples in the RNA-seq dataset

Table of Figures

Figure 1.1 Study overview
Figure 1.2 Mechanism of TOP1 function
Figure 1.3 Interactions and functions of topoisomerases
Figure 1.4 Schematic of astrocyte functions
Figure 2.1 Primer Amplification efficiency
Figure 2.2 Sequence map of pBA439
Figure 2.3 Downregulation of ASD susceptibility genes in NHA cells in response to topotecan treatment
Figure 2.4 Genes differentially expressed upon topotecan treatment are enriched for long genes and ASD susceptibility genes
Figure 2.5 qRT-PCR validation for ASD significant long genes
Figure 2.6 Effect of gene length on fold change in response to topotecan treatment
Figure 2.7 Comparison of transfection efficiency across cell lines
Figure 2.8 TOP1 silencing with CRISPRi
Figure 3.1 Properties of the brain tissue samples included in the Parikshak et al. RNA-seq dataset
Figure 3.2 TOP1 expression in human brain samples
Figure 3.3 Boxplots displaying gene expression differences between ASD and controls
Figure 3.4 Scatterplot displaying gene expression fold-changes between ASD and control samples on a log2 scale in RNA-seq datasets

Acknowledgement

I take this opportunity to thank my supervisor, Irina, for providing this great opportunity to fulfil my dream of studying molecular genetics. Irina has always guided, supported and encouraged me to persist throughout the course and to accomplish the mammoth task that I undertook almost 2 years ago. It was an honour to be part of the Voineagu lab.

I would also like to thank all the wonderful people in the Voineagu lab, Akira, Gavin, Nicole, Adam, Emma and Firoz, for sharing their knowledge and time throughout the journey.


Abbreviations

γH2AX Gamma H2A Histone Family Member X

A2BP1 Ataxin 2-binding protein 1

ADP Adenosine Diphosphate

AMPA α-amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid

ATP Adenosine Triphosphate

B2M Beta-2-Microglobulin

BFP Blue Fluorescent Protein

BLM Bloom syndrome RecQ like helicase

CACNA1C Calcium Voltage-Gated Channel Subunit Alpha1 C

CAMK2b Calcium/calmodulin-dependent protein kinase type II subunit beta

CDC Centres for Disease Control and Prevention

cDNA Complementary DNA

CGI CpG islands

CHD10 Cadherin10

CHD9 Cadherin9

CHIP-seq Chromatin immunoprecipitation Sequencing

CHRNA7 Acetylcholine Receptor, Neuronal Nicotinic, Alpha Subunit-7

CNS Central Nervous System

CNTNAP2 Contactin associated protein like 2

CNV Copy Number Variations

CpG Cytosine phosphate Guanine

CREB cAMP response element binding protein 2

CRISPRi Clustered Regularly Interspaced Short Palindromic Repeats interference

C1q Complement component 1q

CTCF CCCTC-Binding Factor

CV Common Variants

DMSO Dimethyl sulfoxide

DMEM Dulbecco’s Modified Eagle’s Medium

DNA Deoxyribonucleic Acid

DPYD Dihydropyrimidine dehydrogenase

DSB Double Stranded breaks

dsDNA Double stranded DNA

EGR Early growth response

ERG Early Response Genes

FACS Fluorescence-activated cell sorting

FBS Foetal Bovine Serum

FMR1 Fragile X mental retardation 1

FMRP Fragile X mental retardation protein

GABA Gamma Amino Butyric acid

GEO Gene Expression Omnibus

GFP Green Fluorescent Protein

GRIN1 Glutamate Ionotropic Receptor NMDA Type Subunit 1

GWAS Genome-wide association study

HDAC1 Histone Deacetylase 1

HDAC2 Histone Deacetylase 2

Hrp1 Heterochromatin Protein 1

IFIH1 Interferon Induced With Helicase C Domain 1

IMMP2L Inner Mitochondrial Membrane Peptidase Subunit 2

iPSC induced Pluripotent Stem Cells

KAP1 KRAB-associated protein 1

KCNMA1 Potassium calcium-activated channel subfamily M alpha 1

KRAB Krüppel-associated box

LAMB1 Laminin Subunit Beta 1

lncRNA Long non-coding Ribonucleic acid

MACROD2 MACRO Domain Containing 2

MECP2 Methyl CpG Binding Protein 2

MEGF10 Multiple EGF Like Domain 10

MERTK MER Proto-Oncogene, Tyrosine Kinase

mRNA messenger RNA

NHA Normal Human Astrocytes

NLGN1 Neuroligin 1

NPAS3 Neuronal PAS Domain Protein 3

NRCAM Neuronal Cell Adhesion Molecule

NRXN Neurexin

NuRD nucleosome remodeling and deacetylase

PTBP1 Polypyrimidine Tract Binding Protein 1

PXDN Peroxidasin

qRT-PCR Quantitative reverse transcription Polymerase Chain Reaction

RBFOX1 RNA Binding Fox-1 Homolog 1

RECQ RecQ helicase

RFWD2 Ring Finger And WD Repeat Domain 2

RIMS3 Regulating Synaptic Membrane Exocytosis 3

RNA Ribonucleic Acid

RNA pol 2 RNA polymerase 2

ROS Reactive oxygen species

rRNA Ribosomal RNA

rSAP Shrimp alkaline phosphatase

SBF2 SET Binding Factor 2

SFARI Simons Foundation Autism Research Initiative

sgRNA Small guide RNA

Sgs1 Slow growth suppressor 1

SHANK3 SH3 and multiple ankyrin repeat domains 3

SEMA5A Semaphorin 5A

SND1 Staphylococcal nuclease domain-containing 1

SNV Single Nucleotide Variants

SNX5 Sorting nexin 5

SOX5 SRY-Box 5

SR Serine and arginine-rich

SRF Serum Response Factor

SRRM4 Serine/Arginine Repetitive Matrix 4

STK39 Serine/Threonine Kinase 39

SUCLG2 Succinate CoA Ligase GDP- Forming Beta Subunit

TANC2 Tetratricopeptide Repeat, Ankyrin Repeat And Coiled-Coil Containing 2

TAS2R1 Taste 2 Receptor Member 1

TBP TATA-Box Binding Protein

TDRD3 Tudor domain containing protein 3

TF Transcription Factor

TFIIA Transcription Factor II A

TFIID Transcription Factor II D

TSC1 TSC Complex subunit 1


TSC2 TSC Complex subunit 2

TSS Transcription start site

UBE2H Ubiquitin Conjugating Enzyme E2H

UBE3C Ubiquitin Protein Ligase E3C

ZNF594 Zinc Finger Protein 594


1 Introduction

The identity and functional state of a cell are fundamentally governed by the set of genes expressed from its genome. Consequently, fine-tuned regulation of gene expression is required throughout organismal development1, 2. The mechanisms that govern gene expression regulation in higher eukaryotes are highly complex and are known to involve an interplay between the core transcriptional machinery, transcription factors and DNA regulatory regions3. Another layer of complexity results from the function of multiple classes of non-coding RNAs4. In addition, topoisomerases, which have been extensively studied for their role in DNA replication and repair, have recently been implicated in regulating gene expression5-8. In particular, topoisomerases have been shown to play a role in the expression of very long genes (> 100 kb) in neurons9. This observation, combined with genetic data showing that long genes are enriched among those associated with Autism Spectrum Disorders (ASD)10, 11, raises an important question: are topoisomerase-dependent gene expression changes implicated in neurodevelopmental disorders such as ASD?

To begin to address this question, the present study aimed first to assess whether Topoisomerase 1 (TOP1) is required for the expression of long genes not only in neurons, but also in glial cells, which are highly abundant in the brain (Chapter 2). Further, the expression of TOP1 and TOP1-dependent long genes was assessed in brain tissue from ASD cases and healthy controls (Chapter 3).

As a background for this study, the introduction section provides an overview of its two main aspects: the mechanistic roles of topoisomerases and the molecular genetics of ASD. Firstly, the introduction covers the role of topoisomerases in DNA replication and repair, as well as recent data implicating topoisomerases in gene expression regulation. Recent evidence showing a role of topoisomerases in brain development is also discussed. Secondly, the introduction addresses the main aspects of ASD genetics, including evidence for an association of long genes with ASD, and the identification of TOP1 genetic variants in ASD. Finally, the introduction section also discusses the functional importance of astrocytes, the most abundant type of glial cells, which are used as a cellular system in this study.


Figure 1.1 Study overview. TOP1 inhibition has been previously shown to lead to altered transcription of long genes in neurons. In turn, altered synaptic function has been linked to abnormal neural development, thus potentially leading to neurodevelopmental disorders such as ASD. The specific questions addressed in the present study are shown in the blue boxes.

1.1 Topoisomerases

Topoisomerases are ubiquitous enzymes found in all forms of life, from prokaryotes to higher eukaryotes12. These enzymes resolve the topological problems of DNA that arise during crucial cellular processes such as chromatin compaction, the formation of higher-order structures, recombination, replication and transcription13-15. In humans, there are six topoisomerases divided into several subfamilies: Type 1A (TOP3α and TOP3β), Type 1B (nuclear TOP1 and mitochondrial TOP1) and Type IIA (TOP2α and TOP2β)13. Type IIA topoisomerases and TOP3α are present in the nucleus as well as in mitochondria, unlike the Type 1B topoisomerases, each of which is present exclusively in a single cellular organelle15. Topoisomerases are classified according to the number of DNA strand breaks they introduce. Type 1 topoisomerases introduce a break in one of the DNA strands, and the complementary strand passes through the nick; in this way, one supercoil is resolved at a time. Type 2 topoisomerases introduce breaks in both strands of the DNA, allowing another double-stranded DNA segment to pass through, which helps in the catenation and decatenation of DNA molecules; in this way, two supercoils are resolved at a time15, 16.
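The distinction between resolving one and two supercoils per catalytic cycle can be illustrated with a minimal Python sketch. This is a toy model under stated assumptions, not a method used in this study: the class `ClosedDNA`, the function `cycles_to_relax` and the example values (a 10,500 bp closed domain with six excess turns) are hypothetical, and the model ignores enzyme-specific details such as substrate preference or the direction of supercoiling an enzyme can act on.

```python
from dataclasses import dataclass

# Assumed helical repeat of relaxed B-form DNA (base pairs per turn).
RELAXED_BP_PER_TURN = 10.5


@dataclass
class ClosedDNA:
    """Toy model of a topologically closed DNA domain (illustrative only)."""

    length_bp: int
    linking_number: int

    @property
    def supercoils(self) -> float:
        """Excess turns relative to relaxed DNA.

        Positive values mean over-wound (positively supercoiled) DNA,
        negative values mean under-wound (negatively supercoiled) DNA.
        """
        return self.linking_number - self.length_bp / RELAXED_BP_PER_TURN


def cycles_to_relax(dna: ClosedDNA, strands_cut: int) -> int:
    """Count catalytic cycles needed to remove the excess supercoils.

    Simplifying assumption taken from the text: an enzyme that cuts one
    strand (Type 1) removes one supercoil per cycle, and an enzyme that
    cuts both strands (Type 2) removes two supercoils per cycle.
    """
    if strands_cut not in (1, 2):
        raise ValueError("strands_cut must be 1 (Type 1) or 2 (Type 2)")
    step = strands_cut
    excess = dna.supercoils
    cycles = 0
    # Stop once less than one full step of supercoiling remains.
    while abs(excess) >= step:
        excess += -step if excess > 0 else step
        cycles += 1
    return cycles


if __name__ == "__main__":
    overwound = ClosedDNA(length_bp=10_500, linking_number=1_006)
    print("Type 1 cycles:", cycles_to_relax(overwound, strands_cut=1))
    print("Type 2 cycles:", cycles_to_relax(overwound, strands_cut=2))
```

Under these assumptions, the over-wound example domain carries six excess turns, so the Type 1 enzyme needs six cycles and the Type 2 enzyme needs three.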

Table 1.1 Classification of human Topoisomerases

Human Topoisomerases
Type 1A: TOP3α, TOP3β
Type 1B: TOP1, TOPmt
Type IIA: TOP2α, TOP2β

1.1.1 Mechanism of action of Topoisomerases

Type 1A topoisomerases relieve negatively supercoiled DNA (DNA that is under-twisted relative to the relaxed helical repeat of about 10.5 base pairs per turn)17 by introducing transient breaks in a single strand of DNA. Negative supercoiling of the DNA is essential for effective binding and cleavage of the DNA strand. Through its active tyrosine residue, Type 1A attacks the 5’ phosphate group of the 3’ DNA segment (generating a TOP1 cleavage complex, TOP1cc) and generates a nick by a first transesterification reaction (Figure 1.2). Type 1B, through its active tyrosine residue, forms a covalent bond with the 3’ phosphate group of the 5’ DNA segment. Type 1B efficiently resolves both positive and negative supercoiling. Unlike Type 1A, Type 1B can bind dsDNA, generating a nick in a single strand. The free DNA end, which is “non-covalently attached to the enzyme”, rotates freely and thereby resolves the supercoils. The second transesterification results in re-ligation of the DNA segment12, 13, 16, 18. Type 2 topoisomerases cleave both DNA strands simultaneously and help relax positive and negative supercoils. Hence, TOP2 plays an important role in catenation, decatenation and chromatin segregation during mitosis12, 16.


Figure 1.2 Mechanism of TOP1 function. TOP1 resolves topological problems that occur during replication, transcription and chromatin remodelling. The active tyrosine residue of TOP1 attacks the 5’ phosphate group of supercoiled DNA (TOP1cc), generating a nick in a single strand of DNA, and a strand non-covalently bound to TOP1 passes through the nick, reducing the number of DNA supercoils. Figure adapted from Yves Pommier (Nature Reviews Cancer 2006).

1.1.2 Replication and chromosomal organisation

During semi-conservative replication (which involves TOP2α, TOP3α and TOP1)18, the two DNA strands must separate accurately. Unwinding of the DNA causes positive supercoiling (DNA that is over-twisted relative to the relaxed helical repeat of about 10.5 base pairs per turn)17 ahead of the replication fork. These positive supercoils can be resolved by several mechanisms, such as rotation of the replication fork or the replication machinery, and enzymatically by topoisomerases, allowing rapid progression of the replication fork. Rotation of the replication fork or machinery can still cause intertwining of daughter chromatids and positive supercoiling of unreplicated DNA downstream of the replication fork12, 14-16. For these reasons, positive supercoiling is best resolved by Type 1B or Type II (TOP2α and TOP2β) topoisomerases, which are capable of binding to dsDNA. Type 1A topoisomerases (TOP3α and TOP3β), which are limited to binding single-stranded DNA, are unlikely to be involved12. Precatenanes are entangled daughter duplexes that are formed behind a replication fork during strand synthesis17. Rotation of the replication fork can resolve positive supercoiling, but also generates precatenanes of sister chromatids. These precatenanes are resolved by TOP2α15. Negative supercoiling, which is generated to make the DNA accessible to the replication machinery for the initiation of replication, is resolved by TOP1, TOP2α and TOP3α15.

Several studies have found that TOP2 and Type 1A topoisomerases play an important role in segregation at replication termination. At the replication termination site, where two replication forks converge and generate highly supercoiled DNA, TOP2α facilitates decatenation of sister chromatids by interacting with the mammalian RecQ helicase15. Studies carried out in yeast also confirm the role of Top2 in promoting fork fusion and the separation of sister chromatids. Top2 knockdown in yeast leads to failure to complete replication, loss of genomic stability, mitotic arrest and ultimately cell death15. After completion of replication, chromosome condensation is vital for the cell to proceed to mitosis. Condensin, a multi-subunit protein complex that plays primary roles in chromosome assembly and segregation in eukaryotes19, plays an important role in this process and introduces positive supercoiling into the DNA. TOP2 then releases this supercoiling to allow maximum compaction of the chromosome in the nucleus15, 20. SUMOylation of TOP2, a post-translational modification in which proteins of the SUMO family are conjugated to lysine residues of the target protein21, helps to maintain chromatin structure by cohesion at the centromere17, 22.

Type 1A topoisomerases, through their association with helicases (Sgs1 in yeast and BLM in humans) and their ability to preferentially recognise Holliday junctions, are able to resolve double and single Holliday junctions. Holliday junctions are plausible recombination intermediates between a pair of homologous DNA molecules. Neither TOP2 nor Type 1B is able to participate in this process12. The benefit of involving a topoisomerase in resolving double Holliday junctions is that strand exchange is avoided; hence, stalled replication can resume without the exchange of genetic material12. This is advantageous because unwanted exchange of genetic material can introduce deleterious mutations and genomic instability at the site of recombination12.

1.1.3 Role of Topoisomerases in transcription

Transcription requires the recruitment of many transcription factors to the transcription start site (TSS), which prevents the DNA from rotating freely about its own axis and hence generates supercoiling in the DNA12. Recruitment of RNA pol 2 to the TSS causes positive supercoiling ahead of RNA pol 2 and negative supercoiling behind it. Negative supercoiling of DNA promotes the formation of R-loops (RNA–DNA hybrids in which a single-stranded RNA hybridizes to the template strand of a DNA duplex and displaces the non-template strand as a loop)15 by allowing the nascent RNA to bind the DNA template. R-loop formation impedes replication fork migration, compromises genomic stability and facilitates DNA breakage. DNA regions harbouring unmethylated CGIs, highly transcribed genes, the 3’ ends of genes, ribosomal DNA loci and recombination sites are prone to R-loop formation15. Negative supercoiling is resolved by TOP1 and TOP3β, which separate the nascent RNA from the DNA template by unwinding the DNA rather than through the nucleolytic activity of RNase H. Positive supercoiling causes stalling of RNA pol 2, leading to inhibition of transcription. In this case, TOP2 undergoes conformational changes that widen the cleavage gap so that an uncleaved segment of the same or a different DNA molecule can pass through. At promoters, TOP1 and TOP2 help to recruit RNA pol 2, aiding the initiation of transcription15, 23. Topoisomerases facilitate transcription by promoting the formation of stable TFIID (Transcription Factor II D)-TFIIA (Transcription Factor II A) complexes at the promoter12, 14, 23; this creates a bend in the DNA, supporting their role in the architectural organisation of DNA. This bending can also facilitate promoter-enhancer interaction and hence expression of the target gene6. Interestingly, formation of the pre-initiation complex is not affected when TOP1 is catalytically inactive, confirming that its role in transcription initiation extends beyond DNA nicking6. TOP1 also promotes the binding of TF (Transcription Factor) and TBP (TATA-Box Binding Protein) at the transcription start site and augments transcription15. Through its kinase activity, TOP1 phosphorylates splicing factors such as SR proteins. SR proteins localise at exon-intron junctions and actively participate in pre-mRNA processing and transcription23-25.

It has been demonstrated that highly expressed genes require both TOP1 and TOP2 activity for transcription, whereas genes with low expression levels can be transcribed with TOP1 activity alone15.

Recent experiments on TOP1 show that the N-terminal domain of TOP1 interacts with the C-terminal domain of RNA pol 2, establishing a close functional relationship that facilitates transcription. Baranello et al.6 have shown that phosphorylated RNA pol 2 is required for catalytically active TOP1. The association of TOP1 with forms of RNA pol 2 that are phosphorylated on Serine 5 at the TSS and on Serine 2 at the transcription termination site implies a role in both the initiation and the termination of transcription. CHIP-seq experiments showed that TOP1 is distributed with RNA pol 2 along the gene body during the elongation phase, explaining its role in RNA pol 2 elongation kinetics. TOP1 accumulated at the TSS when RNA pol 2 was paused, suggesting that TOP1 uses a “diffuse” mode of recruitment, in contrast to TOP2, which uses a “focal” mode of recruitment6-8.

A study carried out in yeast demonstrated that lack of Top1 causes hyper-recombination in DNA segments rich in ribosomal RNA genes. This hyper-recombination is attributed to the high copy number of rRNA genes per cell, their high rate of transcription by RNA pol 1 and their organisation as tandemly repeated genes. This finding suggests that Top1 plays an important role in the transcription of ribosomal RNA genes and can affect the metabolism of rRNA-rich DNA segments14.

Several studies have consistently reported the involvement of TOP2β in transcription and gene expression. The recruitment of TOP2β to sites of active transcription, such as those marked by H3K4me2, supports this view24. Ju and Rosenfeld have noted the involvement of TOP2β at promoters in signal-dependent gene expression. They revealed that oestrogen receptors bind to promoters and cause structural changes at the TSS that make it unfavourable for transcription. TOP2 introduces transient DNA breaks at these sites to relieve torsional stress and facilitate transcription initiation22, 26.

TOP2β is associated with histone deacetylases, HDAC1 and HDAC2, implying a role in chromatin remodelling24. Top1 is also found to be associated with Hrp1, a chromatin remodeller which takes part in nucleosome remodelling and facilitates transcription at active promoters6.

Neuronal gene expression is regulated in response to external stimuli. Genes that respond earliest are termed early response genes (ERGs) and include transcription factors such as Fos, FosB, Egr1 and Npas4. The promoters of these early response genes are already bound by RNA pol2, CREB (cAMP response element binding protein) and SRF (Serum Response Factor). Hence, these genes are expressed within minutes of a stimulus. Early response genes then regulate the expression of late response genes responsible for neurite outgrowth, synaptogenesis and synaptic maturation. It is believed that expression of these genes under basal conditions is constrained by topological problems of DNA. TOP2β, by creating DSBs (double-strand breaks) at these promoters in an activity-dependent manner, has been implicated in releasing torsional stress and facilitating transcription. It was observed that the TSSs of early response genes lie in the proximity of CTCF (CCCTC-Binding Factor) binding sites. CTCF is a transcriptional repressor protein. TOP2β-mediated DSBs at CTCF binding sites facilitate transcription of ERGs. CTCF-mediated chromatin loops inhibit the interaction between enhancers and promoters, thereby preventing the enhancer transcription that is crucial for early response genes. DSBs arising from randomly induced breaks, replication stress or UV damage activate the DNA damage response, which in turn activates various DNA repair pathways. These repair pathways recruit protein kinases such as ATM, ATR and DNA-PKs, which are believed to phosphorylate H2AX, producing the form commonly known as γH2AX (gamma H2A Histone Family Member X). Hence, the presence of γH2AX indicates DNA damage and activation of the associated repair pathways19. TOP2β, by generating DSBs at CTCF binding sites, as evidenced by the presence of γH2AX, relieves topological stress and promotes enhancer-promoter interaction and thereby expression of early response genes27.

TOP3β has been found to form complexes with TDRD3 (Tudor domain containing protein 3), which helps stabilise TOP3β at active promoters and inhibits R-loop formation. TDRD3 also acts as a mediator in forming a TOP3β-TDRD3-FMRP (Fragile X mental retardation protein) complex. Thus, TOP3β facilitates both transcription and translation15.

Figure 1.3 Interactions and functions of topoisomerases. Overview of interaction of Topoisomerases with various proteins and its role in cellular functions. BLM (Bloom syndrome RecQ like helicase), FMRP (Fragile-X mental retardation protein), HDAC1 (Histone Deacetylase 1), HDAC2 (Histone Deacetylase 2), Hrp1 (Heterochromatin protein 1), RecQ (RecQ helicase), SR (Serine and arginine rich), TBP (TATA-Box Binding Protein), TDRD3 (Tudor domain containing protein 3), TFIIA (Transcription factor IIA), TFIID (Transcription factor IID).

1.1.4 Role of Topoisomerases in CNS development

Type 2 topoisomerases are expressed in a tissue-, cell type- and developmental stage-specific manner. The two Type 2 isoforms, TOP2α and TOP2β, are homologous in their N-termini and enzymatic activity but differ in their C-termini and their expression. TOP2α and TOP2β are expressed at different developmental stages in the central nervous system. TOP2α is expressed in pluripotent stem cells but is replaced by TOP2β when these cells differentiate into neurons, suggesting a crucial role in neurogenesis. TOP2α is highly expressed in proliferating tissues such as spleen, liver and bone marrow, whereas TOP2β is expressed mainly in differentiated cells such as neurons and glia in the brain, suggesting tissue-specific expression of type 2 topoisomerases24. Top1 and Top2β are expressed throughout the developing mouse brain, with Top1 showing higher expression in various cortical regions, including cerebellum and striatum, as well as at excitatory synapses, compared with its lower activity in hippocampus and hypothalamus24.

Many genes involved in normal synapse function are long genes, i.e. >100kb in length, and perturbation of these genes leads to neurodevelopmental diseases such as Angelman syndrome, Rett syndrome and ASD18. Interestingly, King et al. discovered that treating mouse cortical neurons and iPSC-derived human neurons with topotecan, a topoisomerase inhibitor, significantly reduced the expression of long genes9. Topotecan reduced the levels of various synaptic proteins, such as NLGN1, CNTNAP2 and NRXN1, all encoded by long genes and responsible for neurotransmission and synapse formation. The effect of topotecan was reversible, with synaptic plasticity restored on drug washout9, 10. Furthermore, it has been revealed that TOP2β-dependent genes are also long, and that the combined activity of TOP1 and TOP2β is required for transcription elongation of long genes28.
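
As a minimal illustration of the >100kb length criterion used above, the following Python sketch flags long genes from genomic coordinates. The gene names and coordinates are hypothetical placeholders, not values from this study, and measuring length as end minus start plus one (1-based inclusive coordinates) is an assumption.

```python
LONG_GENE_THRESHOLD_BP = 100_000  # >100 kb, the cut-off quoted in the text


def gene_length(start: int, end: int) -> int:
    """Gene length in bp, assuming 1-based inclusive coordinates."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


def is_long_gene(start: int, end: int,
                 threshold: int = LONG_GENE_THRESHOLD_BP) -> bool:
    """Return True if the gene is strictly longer than the threshold."""
    return gene_length(start, end) > threshold


# Hypothetical genes: name -> (start, end); not real annotations.
genes = {"GENE_A": (1_000, 251_000), "GENE_B": (5_000, 45_000)}

# Keep only the genes that exceed the length cut-off.
long_genes = [name for name, (start, end) in genes.items()
              if is_long_gene(start, end)]
```

In practice the coordinates would come from a genome annotation, and the choice between gene-body length and transcript length would need to be stated explicitly.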

Top2b knockout in mice is lethal owing to widespread effects on the nervous system and a failure to breathe in the absence of innervation of the diaphragm18, 22. Top2b knockout embryos showed aberrant activity of genes involved in development, neurogenesis and neural differentiation. Top3b null mice mature normally but have a short lifespan. Deletion of Top1, Top2a or Top3a in mice causes embryonic lethality during early developmental stages. These studies indicate an important role for the different topoisomerases in embryonic CNS development and in normal physiological functions in adult mice18.

To reiterate, topoisomerases are evolutionarily conserved from prokaryotes to eukaryotes and are ubiquitously expressed in all cell types. They play a key role in cellular functions such as replication, transcription, translation and chromatin remodelling by resolving the positive and negative supercoiling generated during these processes. For this reason, mutations in genes encoding topoisomerases are likely to result in dysregulation of key cellular functions, contributing to the pathogenesis of ASD.

Autism Spectrum Disorders

Autism Spectrum Disorder (ASD) is a neurodevelopmental disease affecting approximately 1% of the world's population29-31. According to data published by the CDC in 2018, the current prevalence of ASD is 1:59 in the general population, with males four times more likely to be diagnosed than females32. ASD is mainly characterised by impairments in sociability and language, faculties that are distinctive to humans33. The key features of ASD are minimal social interaction, deficits in verbal and non-verbal communication skills and stereotypic behaviour30, 34. Apart from these core features, ASD co-occurs with other disorders, for example intellectual disability (ID), epilepsy, motor control deficits, metabolic and sleep disorders, tics, anxiety and attention-deficit hyperactivity disorder (ADHD)30, 34. ASD comprises various neurodevelopmental disorders such as autistic disorder (13 cases per 10,000), pervasive developmental disorder/not otherwise specified (PDD/NOS, 20.8 cases per 10,000) and Asperger’s syndrome (2.6 cases per 10,000)35. The prevalence of ASD is four times higher in males than in females worldwide30, 31, 35. The prevalence of ASD diagnosis has increased in the last decade owing to advances in molecular biology techniques such as next-generation sequencing and other high-throughput techniques, improved computational and statistical methodology, increased awareness and broader diagnostic criteria36, 37. Autism was first noted in the 1940s as a genetic disorder by Kanner, and Asperger syndrome was described at around the same time but on a different continent36, 37. ASD was considered a genetic disorder by psychiatrists, but for the general public it was a disease caused by “refrigerator mothers”: children were thought to be unable to develop social skills because of the neglect and coldness of their mothers29, 35, 37.
ASD is widely accepted to be a heterogeneous disease involving a large number of complex genetic mechanisms that do not follow a Mendelian inheritance pattern30, 37. A few environmental factors have also been associated with a small percentage of ASD cases. For example, patients with phenylketonuria who consume a diet containing phenylalanine have a high propensity to develop some of the phenotypic changes associated with ASD. Prenatal exposure to teratogenic agents such as valproic acid (a common medication prescribed to treat epilepsy, seizures and migraine)37, 38, ethanol, thalidomide (prescribed as a sedative and to prevent nausea) and misoprostol (prescribed to prevent gastric ulcers) alters the expression of genes responsible for proliferation, apoptosis, neuronal differentiation and migration, and synaptogenesis, and accounts for some cases of ASD39. ASD is a very complex disease. Diagnosis of ASD is extremely challenging because of the different ages of onset of symptoms, the varying degree of severity of the core phenotypic changes and the involvement of an estimated 200-1000 genes37.

1.2.1 Overview of ASD

The onset of autism often occurs before the age of three36. The average age of ASD diagnosis is about 3.1 years40. A child with ASD fails to establish eye contact during early development, which often goes unnoticed. An inability to make social connections, isolation and “living in their own world” distinguish these children from others, together with restricted development of early language. Repetitive behaviour is also noticed in conversation, in the form of echoing of words. Repetitive movements such as constant rocking during infancy, finger movements, flicking the pages of a book and constant whole-body movements are also observed. The cause of repetitive movements is not known, but they are thought to have a calming effect and seem to be precipitated by stress36. Stereotypic behaviour is also observed in daily routines, where a specific order of events must be strictly followed. Self-injury, hypersensitivity to specific sounds and insensitivity to truly painful stimuli are often seen in children with ASD30, 36.

Neurological symptoms such as epilepsy and insomnia are seen in 25% and 60% of children diagnosed with ASD, respectively. Poor motor skills and clumsiness are also seen in children with ASD. Gastrointestinal symptoms such as diarrhoea, constipation and bloating are observed in 45% of cases, and obesity is a common complication of unknown etiology36.

Diagnostic criteria are published by the American Psychiatric Association in the latest edition of the Diagnostic and Statistical Manual of Mental Disorders36, 37. A checklist of diagnostic criteria is published in the manual, and an extensive and detailed interview with the parents is conducted as a basis for diagnosing ASD. A number of checklists are available, such as CARS (Childhood Autism Rating Scale), GARS (Gilliam Autism Rating Scale) and ABC (Aberrant Behaviour Checklist), which are used by medical professionals as well as schools36.

1.2.2 ASD Heritability

Many factors are associated with the inheritance of ASD. Among these are multiple single-gene neuropsychiatric disorders linked to higher heritability of ASD, including Fragile X syndrome (FMR1 gene), Rett syndrome (MECP2), Tuberous sclerosis syndrome (TSC1/TSC2), Timothy syndrome (CACNA1C) and Angelman syndrome (15q 11-13 chromosomal ). Altogether, these disorders contribute to ~15-20% of ASD cases30, 36. In addition, several common and rare genetic variations, along with de novo mutations, are considered to contribute to the heritability and pathogenesis of ASD.

Multigenic diseases are more difficult to study than single-gene disorders with respect to their heritability. Genome-wide association studies (GWAS) were therefore undertaken once advances in molecular biology techniques allowed population-based studies involving multiple families to be carried out. Owing to the heterogeneity of ASD, simplex families with only one affected member make it challenging to identify rare variants, whereas multiplex families and families with consanguinity have given a better understanding of the genetic landscape of ASD36, 37. Family-based studies have revealed that 20% of children with an older sibling with ASD develop the disorder, and that as the number of probands in the family increases, the heritability increases up to 50%, specifically for male children37. Another study estimated a higher heritability: 40% with only one affected child and almost 60% when more than one child in the family is affected30, 31. Twin studies have estimated the concordance of ASD to be 30-90% in monozygotic twins and 0-30% in dizygotic twins34, 37. It is important to note that estimates of the heritability of ASD differ from one study to another, with the genetic contribution to the disorder ranging from at least 38% up to 90%34.

1.2.3 The genetic basis of ASD

Common genetic variants are defined as genetic variations that occur in the general population at a frequency greater than 5%, whereas rare genetic variants are present at a frequency of less than 5%31. These genetic variations can take the form of copy number variations (CNVs), single nucleotide polymorphisms (SNPs) or structural variations such as duplications, deletions or insertions. They can be either inherited or de novo 31, 34.
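
The 5% frequency cut-off described above can be sketched in a few lines of Python. This is an illustrative sketch only: the variant names and frequencies are hypothetical, and because the text does not say how a frequency of exactly 5% is handled, treating it as rare is an assumption made here.

```python
COMMON_THRESHOLD = 0.05  # 5% minor allele frequency, as quoted in the text


def classify_variant(maf: float, threshold: float = COMMON_THRESHOLD) -> str:
    """Label a variant 'common' or 'rare' from its minor allele frequency.

    A frequency exactly equal to the threshold is labelled 'rare'
    (an assumption; the source text leaves this boundary undefined).
    """
    if not 0.0 <= maf <= 0.5:
        # By definition the minor allele cannot exceed 50% frequency.
        raise ValueError("minor allele frequency must lie between 0 and 0.5")
    return "common" if maf > threshold else "rare"


# Hypothetical variants and frequencies, for illustration only.
example_mafs = {"variant_1": 0.12, "variant_2": 0.004}
labels = {name: classify_variant(maf) for name, maf in example_mafs.items()}
```

De novo status is not captured by frequency alone; it would additionally require parental genotypes.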

It is evident that the genetic architecture of ASD is heterogeneous and complex because it is shaped by a number of common, rare and de novo variants. Also, not all genotypic variants result in similar phenotypic traits in individuals diagnosed with ASD. Many unaffected family members can carry the same CNVs as an affected individual36. Many variants are shared with other neurodevelopmental disorders such as mental retardation, ADHD, seizures and schizophrenia36. The phenotypic outcome of an individual depends on the genetic buffer that the individual harbours to compensate for the effect of rare deleterious variants of large effect34. Thomas Bourgeron has described three scenarios for how an individual may develop ASD. In the first, an individual with a high genetic buffer, which can compensate for even a high dosage of rare deleterious mutations, is at low risk and develops ASD only if highly penetrant de novo mutations occur. In the second, an individual has a moderate genetic buffer, and a moderate dosage of rare deleterious variants is required. In the third, an individual has a low genetic buffer, so low-risk variants are sufficient to cause disease, and a child is at high risk of developing ASD without the requirement for rare deleterious variants34.

Paternal age

There is evidence that paternal age could contribute to a higher prevalence of ASD in the population30. Most de novo mutations are of paternal origin, which is likely due to the high number of cell divisions occurring during spermatogenesis 30. One study demonstrated that the number of mutations increases at a rate of 2.01 mutations per year of the father’s age. As paternal age increases, the rate of de novo mutations in sperm increases, and a child is more likely to inherit deleterious mutations that cannot be compensated for by the specific genetic dosage of the individual. It has been noted that transition and transversion mutations occur 18.2- and 2.55-fold more frequently, respectively, at CpG islands than at non-CpG islands. Although the overall load of de novo mutations in sperm increases with age, paternal age does not account for the type and rate at which these mutations occur. Interestingly, overall recombination events are much more frequent in older mothers than in younger mothers, and de novo mutations also occur at a higher rate in mothers. In spite of this, de novo mutations occurring in sperm due to paternal age are more penetrant and more likely to be inherited by offspring41.

Common inherited variants

Common variants (CV) are found widely in the population (minor allele frequency >5%)31, and the risk associated with each common variant is low42. Several GWAS carried out on family-based cohorts have attempted to find associations between common variants and ASD by performing whole exome sequencing (WES) and whole genome sequencing (WGS). These studies have identified very few genome-wide significant loci, which include polymorphisms in an intergenic region between CDH9 and CDH10, SNPs in an intergenic region between SEMA5A and TAS2R1, and SNPs at MACROD2 31, 37. Why have so few genetic loci been identified from sequencing of the whole genome? This is due to the heterogeneity of the disease, insufficiently large sample sizes and the failure of identical genetic variants to reproduce the same phenotypic traits across different cohorts31, 36, 37. It has been noted that the effect size of common variants is smaller than that of rare variants. Nevertheless, these studies suggest that 15-40%, or even up to 60%, of ASD cases are due to common variants31, 37.

Rare inherited variants

Rare variants are defined as DNA sequence variants present at a minor allele frequency of less than 5% in the general population31. The importance of rare variants in determining the penetrance of genetic factors has recently been established with the advent of WES and WGS. Rare variants can take the form of structural variants, such as chromosomal abnormalities, or sequence variants, i.e. copy number variations or single nucleotide variations (SNVs)36. The interaction of CVs with other CVs and with other factors can further increase the risk of ASD. Studies carried out in ASD cohorts show that inherited CNVs have variable expressivity, which means that a sibling of an ASD proband can carry the same CNV without being diagnosed with ASD42. Also, not all recurrent CNVs are pathogenic; only a few are found to be deleterious42. These issues make it very challenging to find causal variants for ASD. Hence, the need to study rare variants has emerged in order to understand the aetiology of ASD. Rare variants have also been found to have a much higher effect size than common variants and can be equivalent to rare deleterious mutations42. Studies of multiplex families and of families with consanguinity provide more discrete insight into rare inherited variants associated with ASD, owing to identity by descent at multiple alleles and the number of probands affected in the family37. Rare variants are found in many genes involved in synaptic function, such as NLGN3, NLGN4, NRXN1 and SHANK3, making them a focus of future diagnosis and research35, 37, 42. Haploinsufficiency of CHRNA7, caused by a rare microdeletion at 15q13.3, is thought to be the cause of many neurodevelopmental traits36, 43, 44.

Overall, these findings suggest that interplay between common and rare variants and their burden on the genetic architecture of an individual determines the aetiology and heritability of ASD31, 34.

De novo variants

De novo variants can arise as a result of mutations in the paternal germ line or in somatic cells during early embryonic life. De novo variants in the form of CNVs or SNVs do not play a role in the heritability of ASD but have a significant role in its pathogenesis37. This is because, even if the mutation occurs in the germ line, it is not considered heritable, as it is not present in the somatic cells of the parent. An individual harbours very few de novo CNVs or SNVs: any individual can carry 18-74 de novo SNVs in their genome, mainly in noncoding regions, with only 1-4 SNVs found in exons34. Hence, de novo CNVs or SNVs that arise frequently in a specific gene can be associated with ASD37.

De novo CNVs are observed to be larger and to comprise more genes, and they account for 5-15% of ASD cases compared with 1-2% in the general population34. The combined effect of these CNVs depends on their size, content and chromosomal location; that is, whether the CNV is located on a dominant or recessive allele or is linked to the X chromosome30. The most frequent de novo CNVs are 15q11-15q13 duplications, 16p11.2 microdeletions and duplications, and 7q11.23 duplications30, 34, 36, 37. These CNVs occur very rarely by chance and hence are considered to be deleterious37. Interestingly, parents carrying de novo variants have no symptoms, but their progeny are diagnosed with ASD, confirming the penetrance of de novo variants and the heterogeneity of the disease37. Altogether, these CNVs account for almost 1% of ASD cases36. Deletions at 22q11.2 and 22q13.3 are also considered to be very common34. 16p11.2 deletions and duplications seem to differ phenotypically and are associated with delayed speech and cognitive impairments in conjunction with various other symptoms related to autism36.

De novo SNVs resulting in loss-of-function nonsense and frameshift mutations along with mutations in splice sites have been considered to be high impact and deleterious30. De novo variants (odds ratio 3.5-4)31 have been found to be more deleterious compared to common variants (odds ratio 1.2 or greater)30 with an odds ratio that is even higher for the genes highly expressed in brain30, 31.

Chromosomal abnormalities

Chromosomal abnormalities account for 5-7% of ASD cases, and they occur as duplication or deletion of an entire chromosome or of a chromosome segment. Isodicentric duplication of chromosome 15 results in an extra, 47th chromosome. Unequal homologous recombination due to low copy repeats in this region seems to be responsible for the maternal duplication of 15q. This is observed as a de novo event in 1-3% of cases with ASD. Maternal tetrasomy of this region causes more severe phenotypic changes than trisomy36. Genomic imprinting is an evolutionarily conserved epigenetic mechanism that controls the expression of parentally inherited genes without alteration of the DNA sequence. Mammals are diploid organisms harbouring two copies of each gene, but a subset of genes in the mammalian genome is expressed from only one copy. Thus, differential expression of imprinted genes depends on their paternal or maternal origin. Genomic imprinting is achieved in humans by DNA methylation at cytosine-C5, histone modifications, lncRNAs and chromatin modifications. Interestingly, a subset of imprinted genes mainly codes for factors regulating embryonic and neonatal development and brain function45, 46. Paternal duplication of 15q has negligible or no phenotypic effect because of genomic imprinting of this region36. The role of protein-coding regions of the genome in the heritability and pathogenicity of ASD has been widely explored. Besides protein-coding regions, SNVs in non-coding regions of the genome, such as promoters, transcription factor binding sites and noncoding RNAs, have also been found to contribute to altered synaptic pathways and neurodevelopment. Among the SNVs found in non-coding regions, a paternally inherited miR-873-5p variant with altered binding affinity for several high-confidence ASD candidate genes, including NRXN2 and CNTNAP2, has been reported by Williams et al. (2018). This further illustrates the phenotypic diversity and complex heritability of the disease in ASD probands47.

Relevance of astrocytes as a cellular system for neurodevelopmental disorders

Neuroglia were first described as a “sheath around single nerve fibre” as early as 1838 and were later characterised in 1856 by Virchow and his colleagues, who described them as “Nervenkitt” or “nerve cement”, a connective tissue of the CNS that binds the neurons together48, 49. There are several types of glial cells, such as oligodendrocytes, microglia and astrocytes48. Glial cells are evolutionarily conserved in the CNS, from invertebrates such as Caenorhabditis elegans to humans, but the number of glial cells increases as the brain increases in size and complexity. Larger mammals such as elephants and whales have 80% more glial cells than neurons, whereas in humans the ratio of glia to neurons is 50%-50%. This ratio varies between brain regions; for example, in the cerebellum 80% glial cells are present compared to neurons48, 50, 51.

Astrocytes have been studied extensively over the last decade and are widely believed to take part in synaptogenesis through their interaction with neurons. This has been demonstrated by various in vivo and in vitro studies. In vitro experiments carried out in rodent retinal ganglion cell neurons (RGCs) reveal that the presence of astrocytes in cultures leads to the formation of tenfold more synapses and increased synaptic activity in developing neurons, whereas complete removal of astrocytes leads to decreased synapse formation or to neuronal degeneration and death48, 51. Similar studies have been carried out across various species, such as Drosophila, Xenopus, C. elegans and humans, confirming the role of astrocytes in synaptogenesis50.

Studies carried out in rodents suggested that each astrocyte can connect with 100,000 synapses via its interaction with neurons; in humans this number is thought to be even higher, at about 140,000-200,000 synapses48, 50-52.

Astrocytes are a heterogeneous population of cells. Several studies reveal different patterns of morphology, gene expression, and mRNA and protein expression across different layers of the cortex and spinal cord, indicating that there are sub-types of astrocytes with different functional properties 48, 53. Among the several types of astrocytes, protoplasmic astrocytes, which have a radial morphology, are mainly present in grey matter and are closely associated with neurons and blood vessels, forming neuro-glial and neurovascular junctions with major roles in synaptogenesis and modulation of the blood-brain barrier, respectively48, 53. Fibrous astrocytes are mainly present in white matter and have an elongated morphology; they have close connections with oligodendrocytes and play a major role in myelination. Human astrocytes are intrinsically larger than their rodent counterparts. Also, some astrocytes are found to extend their processes across various cortical layers48, 53, 54.

Astrocytes help in the formation of many types of synapses, such as glutamatergic, GABAergic, cholinergic and glycinergic synapses, either by mediating contact with neurons or by secreting various molecules. The mechanisms that regulate synaptogenesis are thus both contact-dependent and mediated by secreted molecules. Neurons are unable to form synapses unless they are contacted by astrocytic processes. For example, contact with astrocytes activates protein kinase C signalling pathways in embryonic neurons and induces synaptogenesis50, 52.

Figure 1.4 Schematic of astrocyte functions. This figure was adapted from, http://houseofmind.tumblr.com/post/18779434067/list-astrocyte-functions

1.3.1 Role of astrocyte-secreted signal molecules in synapse formation, maturation and in synaptic plasticity

Astrocytes produce many signal molecules, such as various thrombospondins, hevin, glypicans and SPARC. Neurexins and neuroligins, which are expressed from long genes (>100kb in length), act in conjunction with these astrocytic molecules and play a pivotal role in synapse formation, maturation and elimination9, 48, 50, 54. Here I discuss the role of these signal molecules in maintaining CNS homeostasis.

Thrombospondin (TSP) is one of the major proteins secreted by astrocytes to induce excitatory synapse formation. There are five TSPs in mammals. Of these, TSP1 and TSP2 are actively secreted during the active period of synaptogenesis; after the first postnatal week, the levels of TSP1 and TSP2 decrease and they are replaced by TSP4. Experiments in which TSP1 and TSP2 were added showed induction of synapse formation. This finding is comparable to studies in which astrocyte-conditioned medium increased the number of synapses formed and depletion of these factors decreased synapse formation. An in vivo study showed that TSP1/2 double knockout mice have fewer cortical excitatory synapses51, 52.

Synthesis of cholesterol by astrocytes is likely to play an important role in synaptogenesis. Apolipoprotein E is believed to bind cholesterol, which can strengthen presynaptic function by promoting the synthesis and maturation of synaptic vesicles51, 52. In vivo studies in mice have shown that decreased lipid synthesis, leading to decreased cholesterol, impairs synaptic development and plasticity50.

Hevin is a protein which is highly expressed in developing and mature astrocytes. Hevin has a significant role in modulating synapse number and size which is comparable to the effect induced by TSPs. Another protein homologous to Hevin is SPARC (secreted protein acidic, rich in cysteine), which specifically antagonises the synaptogenesis induced by hevin. Thus, hevin and SPARC proteins govern the formation and maturation of synapses in CNS51, 52.

Other factors that regulate glutamatergic synaptic strength are glypican-4 and glypican-6. These signalling molecules convert silent synapses to excitatory, functional synapses by increasing AMPA receptors on the surface51, 52. Tumour necrosis factor-α (TNF-α) is involved in generating excitatory synapses by increasing AMPA receptors at excitatory synapses and decreasing GABA receptors at inhibitory synapses. TNF-α is also involved in the homeostatic scaling of synapses by maintaining the strength of synapses in response to neuronal activity50.

Astrocytes are extensively involved in phagocytosis, engulfing damaged organelles and debris and eliminating synapses during early developmental and adult stages of life. They express many genes that contribute to CNS homeostasis. Among these, transforming growth factor-β (TGF-β) secreted by astrocytes indirectly regulates the elimination of synapses by increasing the expression of C1q in neurons, leading to tagging of unwanted synapses. (Complement component 1q is an important component of the complement pathway, which has a crucial role in maintaining immunity; in the CNS, C1q is involved in synaptic pruning and in neural inflammation55.) Astrocytes are equipped with the phagocytic receptors MEGF10 and MERTK, which are involved in synaptic elimination50, 52, 54. MEGF10 and MERTK contribute actively to eliminating synapses in a neural activity-dependent manner by detecting phosphatidylserine in cell debris56. A study published by Chung et al. confirmed these findings, suggesting elimination not only of post-natal synapses but also of excitatory and inhibitory synapses in the adult brain56. Astrocytes deficient in either receptor have a 50% reduced ability to eliminate synapses48, 50, 52. Thus, astrocytes continually contribute to remodelling the synaptic architecture of the brain56.

Neuro-glia interactions and their significance in neurodevelopment

For many years the prevailing notion was that the brain functions mainly through neuron-neuron interactions and that glia do not play a significant role57. In fact, all types of glia can influence and respond to neurotransmission in several ways58. Furthermore, studies carried out in the last decade have provided clear evidence that glial cells, mainly astrocytes, play a very important role in maintaining CNS homeostasis through the formation, maintenance and elimination of synapses. Astrocytes are evolutionarily conserved from invertebrates to highly complex vertebrates such as mammals. Furthermore, as the complexity of the CNS increases, the astrocyte-neuron ratio also increases, supporting a role for astrocytes in the complex nervous system of humans48, 50, 59. Investigation of the molecular, physiological and anatomical properties of astrocytes and synapses strongly suggests that astrocytes are involved in receiving, processing and coding information from neurons. This has given rise to the concept of “tripartite synapses”, in which a synapse is defined as comprising the presynaptic and postsynaptic specializations of the neurons and the glial process that ensheaths them60. For a cell to be a functional unit in the brain, it needs to be able to receive, integrate and code information and to communicate with surrounding neuronal or non-neuronal cells59. Neurons can perform this task because of their specific anatomical structure and intrinsic electrical properties. Astrocytes, which lack electrical excitability, perform the same tasks by elevating cytoplasmic levels of Ca2+ released from stores in the endoplasmic reticulum57, 59. Astrocytes increase Ca2+ levels in the absence of neuronal activity as well as when neurotransmitters are released in response to synaptic activity.
Astrocytes are able to differentiate and process the signals received due to the release of different neurotransmitters at synapses. For example, astrocytes present in stratum oriens (which have both cholinergic and glutamatergic synapses) of hippocampus respond to acetylcholine but not to glutamate from alveus. Conversely, the same astrocytes respond to glutamate from Schaffer collateral (which a glutamatergic synapse)57. The Ca2+ elevations in response to these two pathways is distinct suggesting that astrocytes non-linearly modulate the Ca2+ signals57. Astrocytes release various gliotransmitters in response to neuronal activity which in turn activates pre and post synaptic receptors at the synapses modulating neuronal excitability and synaptic plasticity59.


Gene expression analysis performed in the current study, as well as in vivo studies, has confirmed the production of various cell adhesion molecules, such as neurexins, neuroligins, cadherins and γ-protocadherins, by astrocytes. Localisation of these cell adhesion molecules in astrocyte processes plays an important role in synaptogenesis and mediates astrocyte-synapse interactions60. The functions of these cell adhesion molecules, which are encoded by long genes (>100kb), are discussed in detail in section 2.4.2.

Glia play an important role in connecting synapses and conducting action potentials to distant parts of the brain. Oligodendrocytes are responsible for myelinating axons, and myelinated axons have been noted to conduct action potentials ten times faster than unmyelinated axons58.

It has been shown experimentally that the increased energy requirement of neurons under stress conditions establishes a glucose gradient, which allows intercellular diffusion of glucose from distant astrocytes through the astrocyte network, maintaining synaptic activity in situations of glucose depletion and epilepsy58, 59. Similarly, glucose supply from astrocytes is believed to sustain a subset of neuronal activity under normal physiological conditions58.

Since in vitro experiments on neuro-glia interactions are difficult to translate into in vivo animal behaviours, mathematical models of neuro-glia networks have provided good evidence for the importance of neuro-glia interactions. These models suggested that including artificial astrocytes in a neuro-glia network enhanced its performance compared to a network containing neurons alone. The improvement in network performance also depended on the intrinsic properties of astrocytes and the strength of neuro-glia interactions59.

In addition to the importance of neuro-glia interactions in brain development and function discussed above, the functions of various signalling molecules secreted by astrocytes are discussed in detail in section 1.3.1.

ASD has been identified mainly as a neurodevelopmental disease arising from abnormal synaptic homeostasis. Long genes (>100kb in length) are highly expressed in the brain and are involved in synaptogenesis and synaptic plasticity. Evidence of neuro-glia interactions and the concept of tripartite synapses suggest that astrocytes, like neurons, receive, process, distinguish and transmit synaptic signals. Many long genes, such as neurexins and neuroligins, are highly expressed in astrocytes and play a significant role in synapse maintenance and maturation. The study published by King et al. (2013) demonstrated that inhibition of TOP1 leads to downregulation of long genes linked to synaptic function in neurons. Given the indispensable role of long genes in maintaining synaptic plasticity, our findings strongly suggest that mutation in TOP1 may lead to perturbed expression of long genes and may contribute to the pathogenicity of ASD.


2 Investigating the role of TOP1 in regulating long gene transcription in glial cells

Introduction

Previous studies have demonstrated that TOP1 inhibition by topotecan, as well as the downregulation of TOP1 expression, leads to reduced expression of long genes (>100kb) in neurons9, 61. However, whether this effect is cell type-specific remains unclear. Given the high abundance of astrocytes in the human brain and their important functional interactions with neurons (discussed in the Introduction section), we aimed to determine whether TOP1 plays a role in the efficient transcription of long genes in astrocytes as well. To this end, our lab had generated microarray data from triplicate cultures of human primary astrocytes, with and without topotecan treatment (Methods)62. These data showed downregulation of long gene expression in human primary astrocytes in response to topotecan treatment. However, the microarray data had not been validated by an independent method, and the analysis had not focused on ASD-associated genes. Furthermore, the effect of TOP1 had only been assessed by topotecan treatment, and not by downregulation of TOP1 in human primary astrocytes. Therefore, the first goal of the work presented in this chapter was to validate the microarray results by qRT-PCR using independent topotecan treatment experiments in human primary astrocytes, with a focus on ASD-associated genes.

Moreover, given previous evidence showing that TOP1 affects gene expression in neurons in both a cleavage complex-dependent and a cleavage complex-independent manner61, I aimed to test both of these aspects in astrocytes. Since topotecan stabilizes the TOP1 cleavage complex, gene expression changes observed in response to topotecan treatment are those dependent on the cleavage complex. To determine cleavage complex-independent effects, I also attempted to knock down TOP1 expression in human glioblastoma cells using CRISPRi.

CRISPR/Cas9 has been widely utilised as an inexpensive and efficient system to manipulate and analyse gene expression in mammalian cells. The Class II CRISPR system is an immune mechanism adopted by many bacteria63, 64. Nucleic acid sequences from invading viruses and plasmids are integrated into the host genome between the CRISPR repeats. The foreign DNA (protospacer) and CRISPR repeat sequences are transcribed together, generating crRNA. The crRNA then hybridises with tracrRNA (trans-activating crRNA), and this hybridised RNA forms a complex with the bacterial Cas9 nuclease. The crRNA directs Cas9 to complementary target DNA, provided it lies next to a PAM (protospacer adjacent motif) sequence, where Cas9 generates double-strand breaks (DSBs). This mechanism has been exploited by many researchers to study gene expression. If the nucleolytic domains of Cas9 are mutated, its DNA cleavage activity is abolished; this form is known as dead Cas9 (dCas9). For experimental convenience, the crRNA and tracrRNA are fused together into a single guide RNA (gRNA), which contains a 20-22 nt protospacer. Experiments have shown that when the dCas9-gRNA complex is directed to the promoter of a target gene, it mildly decreases expression of that gene. dCas9 coupled with a transcriptional activator (VP64) or repressor (KRAB) can significantly increase or decrease gene expression, respectively64.

Zinc finger proteins containing a Krüppel-associated box (KRAB) domain are a large family of transcriptional repressor proteins conserved in all vertebrates65, 66. Functions of KRAB domain-containing proteins include transcriptional repression of RNA polymerase I, II and III promoters, binding and splicing of RNA, and control of nucleolar function66. The KRAB domain interacts with the scaffolding protein KAP1. Different domains of KAP1 interact with heterochromatin protein 1 (HP1) and histone deacetylases, and promote chromatin remodelling through recruitment of NuRD and histone lysine methyltransferases to the promoters of target genes. It has been reported that the KRAB/KAP1 complex can induce long-range heterochromatin by spreading repressive chromatin marks such as HP1 and H3K9me3 67, 68. Furthermore, KRAB/KAP1 is believed to repress genes with higher expression and activity. KRAB domain-containing proteins are also involved in generating site-specific DNA methylation during embryonic development, and thus contribute to maintaining epigenetic marks during early development65.

Thus, CRISPRi has proven to be an effective tool to repress gene expression in bacterial as well as human cells67, 68. Our lab has generated a U87 glioblastoma cell line constitutively expressing dCas9-KRAB (Methods). U87 cells were derived from a 44-year-old female with glioblastoma. Glial cells derived from this brain tumour have been used extensively for the study of the nervous system69, 70.

The specific aims of the work presented in this chapter were to:

• Assess the effects of TOP1 inhibition by topotecan on gene expression in human primary astrocytes using qRT-PCR.
• Knock down TOP1 in glial cells using CRISPRi. A prerequisite of this aim was the optimisation of transfection efficiency in human primary astrocytes and U87 cells.


Materials and Methods

2.2.1 Materials

Primary Cells and Cell lines: Normal human astrocytes (NHA) were obtained from Lonza Walkersville Inc (CC-2565). NHA cells were stored in liquid nitrogen until required and were used for a maximum of 25 passages.

U87-dCas9-KRAB cells had been previously generated in the Voineagu lab by lentiviral transduction of pHAGE EF1α dcas9-KRAB into U87 glioblastoma cells (ATCC HTB-H14), which were a gift from Prof. Daniel Lim at UCSF.

HEK293T cells (ATCC CRL-3216) were a gift from Prof. Louise Lutze-Mann at UNSW.

Oligonucleotides for qRT-PCR: qRT-PCR primer sequences were designed using GETprime (http://bbscftools.epfl.ch/getprime)

Criteria for selecting primer pairs were as follows:

• The primer pair should span an exon-exon junction to avoid confounding signal from genomic DNA contamination.
• The primer pair should target the majority of isoforms for each gene.
• All primers should have a melting temperature between 57°C and 59°C.
• Primers should be 19-24 nucleotides in length.

qRT-PCR primer oligonucleotides were synthesised by IDT Asia (Integrated DNA Technologies Pte. Ltd. Singapore) at 25nmole, and are listed in Table 2.1.

Oligonucleotides for cloning TOP1 sgRNAs: sgRNA sequences targeting human TOP1 were designed using the Broad Institute Genetic Perturbation Platform (https://portals.broadinstitute.org/gpp/public/analysis-tools/sgrna-design-crisprai)71. The two top-ranked sgRNAs from this tool were selected. BlpI and BstXI restriction sites were added to the sgRNA oligos for cloning into pBA439 (Addgene). Forward and reverse oligos for each sgRNA (Table 2.2) were synthesized by IDT Asia (Integrated DNA Technologies Pte. Ltd. Singapore) at 25nmole.

FACS buffer: FACS buffer for cell sorting was prepared in MilliQ water by adding 25mM HEPES, 2mM EDTA and 2% FBS. The stock solution was then filtered with 0.45µm syringe filters.


Table 2.1 Table of Primers. Tm= Melting temperature. NT= nucleotides.

Gene Name | Primers | Sequence (5’-3’) | Tm (°C) | No. of NTs
KCNMA1 | KCNMA1 (Forward) | AGTATCACAACAAGGCCCA | 59 | 19
KCNMA1 | KCNMA1 (Reverse) | CAACTTCAACTCTGCGAGG | 59 | 19
NLGN1 | NLGN1 (Forward) | GGTTTCTTGAGTACAGGCG | 59 | 19
NLGN1 | NLGN1 (Reverse) | GGTTTCTTGAGTACAGGCG | 59 | 19
NRCAM | NRCAM (Forward) | TTAGAGTTAAAGCGGCTCCA | 59 | 20
NRCAM | NRCAM (Reverse) | AGATCAAGGTCCCATCCTC | 58 | 19
DPYD | DPYD (Forward) | CGGACATCGAGAGTCCT | 58 | 19
DPYD | DPYD (Reverse) | CTAATTTCTTGGCCGAAGTG | 57 | 20
SND1 | SND1 (Forward) | TGGCATGATCTACCTTGGA | 58 | 19
SND1 | SND1 (Reverse) | TGCTCAGGATTATTAGCTCT | 57 | 21
STK39 | STK39 (Forward) | CCGATGTAGTTATAGTGGCTG | 58 | 21
STK39 | STK39 (Reverse) | CAATCTCCGACCCATCACAG | 59 | 19
SUCLG2 | SUCLG2 (Forward) | TTAACAAGGTGATGGTTGCT | 59 | 21
SUCLG2 | SUCLG2 (Reverse) | GCACCTCTTCAATGTCGACG | 58 | 19
TANC2 | TANC2 (Forward) | ATTTAAGAAGAGCCATGCCA | 58 | 20
TANC2 | TANC2 (Reverse) | AGTAGGTAGGAGACAATCTC | 58 | 22
RFWD2 | RFWD2 (Forward) | AGTCAGTACGACCTTTAGCCAC | 59 | 20
RFWD2 | RFWD2 (Reverse) | CCGGTCAAATTCAATACTAG | 59 | 24
IMMP2L | IMMP2L (Forward) | AGACGCCTTCTTTGAATCCTGGG | 58 | 19
IMMP2L | IMMP2L (Reverse) | CCTATGGTTCTGACAATATC | 58 | 23
TOP1 | TOP1 (Forward) | TCCCAACTGTAGCAAAGATGCC | 57 | 18
TOP1 | TOP1 (Reverse) | GTAACCTTGTTATCATGCCG | 57 | 18

Table 2.2 TOP1 sgRNA sequences. Letters in black are 5’→3’ or 3’→5’ overhangs for restriction sites in vector backbone (pBA439). Letters in ORANGE are 20-22nt long protospacer sequence targeting TOP1 promoter.

TOP1 sgRNA 1 (Forward) | ttgAGGCTGTTACACAACTGCTGgtttaagagc
TOP1 sgRNA 1 (Reverse) | ttagctcttaaacCAGCAGTTGTGTAACAGCCTcaacaag
TOP1 sgRNA 2 (Forward) | ttgTAGGCTGTTACACAACTGCTgtttaagagc
TOP1 sgRNA 2 (Reverse) | ttagctcttaaacAGCAGTTGTGTAACAGCCTAcaacaag

Table 2.3 Reagents and Kits

Materials | Catalogue No. | Supplier
5-alpha competent cells | C2987 | NEB
Amphotericin B | A2942 | Sigma
BlpI | R0585 | NEB
BstXI | R0113 | NEB
Buffer 2.1 | B7202 | NEB
Cutsmart buffer | B7204S | NEB
DMSO | M81802 | Sigma
Dulbecco's Modified Eagle's Medium (DMEM) | 96429 | Sigma
EDTA | 324506 | Sigma
Foetal Bovine Serum | F9423 | Sigma
HEPES | H3375 | Sigma
iScript cDNA Synthesis Kit | 1708891 | Bio-Rad
iTaq Universal SYBR Green Supermix | 1725121 | Bio-Rad
Lipofectamine 3000 Transfection Reagent | L3000008 | ThermoFisher
Lipofectamine LTX Reagent with PLUS Reagent | 15338100 | ThermoFisher
Minimum Essential Medium (MEM) | M4655 | Sigma
miRNeasy Mini Kit | 217004 | Qiagen
Opti-MEM | 31985070 | ThermoFisher
Penicillin/Streptomycin | P4333 | Sigma
Phosphate Buffered Saline (PBS) | 806544 | Sigma
QiaPrep Spin Mini Kit | 27104 | Qiagen
QIAzole Lysis reagent | 79306 | Qiagen
Qubit dsDNA BR Assay Kit | Q32850 | ThermoFisher
Qubit RNA BR Assay Kit | Q10210 | ThermoFisher
RNase Free DNase set | 79254 | Qiagen
Shrimp Alkaline Phosphatase (rSAP) | M0371 | NEB
T4 Buffer | B0202 | NEB
T4 DNA Ligase | M0202 | NEB
T4 Polynucleotide Kinase (PNK) | M0201 | NEB
Topotecan | T2705 | Sigma
Trypsin | 59418C | Sigma
Wizard SV Gel and PCR Clean-UP System | A9281 | Promega


2.2.2 Methods

2.2.2.1 Cell culture

NHA cells and HEK293 cells were cultured in DMEM media supplemented with 10% Foetal Bovine Serum, 1% Penicillin-Streptomycin and 1% Amphotericin B (250µg/ml).

U87-dCas9-KRAB cells were cultured in MEM media supplemented with 10% Foetal Bovine Serum, 1% Penicillin-Streptomycin and 1% Amphotericin B (250µg/ml).

2.2.2.2 Topotecan treatment

Topotecan hydrochloride hydrate (10mg, molecular weight 421.45) was dissolved in 20ml DMSO to obtain a stock concentration of approximately 1200 µM.

Approximately 4.4 x 10^6 NHA cells were seeded onto 10 cm cell culture dishes and incubated at 37°C overnight. The following day, cells were treated by adding 2.5µl of topotecan stock solution (1200 µM) to 10ml DMEM in each culture dish, giving a final concentration of 300nM. RNA extraction was carried out after 24 hours of topotecan treatment.
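The stock and working concentrations of topotecan follow from simple molarity and dilution arithmetic. The minimal Python sketch below reproduces them from the quantities stated above. The function names are illustrative only, and the sketch assumes that the 2.5 µl of added stock is negligible relative to the 10 ml of medium.

```python
def stock_concentration_uM(mass_mg: float, mol_weight: float, volume_ml: float) -> float:
    """Molar concentration (µM) of a compound dissolved in a given volume."""
    micromoles = mass_mg / mol_weight * 1000.0  # mg / (g/mol) = mmol; x1000 gives µmol
    litres = volume_ml / 1000.0
    return micromoles / litres


def working_concentration_nM(stock_uM: float, stock_ul: float, medium_ml: float) -> float:
    """Final concentration (nM) after adding a small stock volume to culture medium.

    Assumption: the added stock volume is ignored in the final volume.
    """
    medium_ul = medium_ml * 1000.0
    return stock_uM * stock_ul / medium_ul * 1000.0  # convert µM to nM


# Values taken from the Methods text: 10 mg, MW 421.45, 20 ml DMSO.
stock = stock_concentration_uM(10.0, 421.45, 20.0)
# 2.5 µl of the nominal 1200 µM stock added to 10 ml DMEM.
final = working_concentration_nM(1200.0, 2.5, 10.0)
print(f"stock = {stock:.0f} µM, working = {final:.0f} nM")
```

The calculated stock is slightly below the nominal 1200 µM, which is consistent with the rounded value quoted in the text.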

2.2.2.3 RNA Extraction

RNA extraction was performed by homogenising each cell pellet in QIAzole lysis reagent, which is designed to maximise RNA yield while inhibiting RNases. RNA was then purified using the Qiagen RNeasy mini kit according to the manufacturer’s protocol. On-column DNase I treatment was carried out using the RNase Free DNase Set according to the manufacturer’s protocol.

The RNA concentration was quantified on a Qubit 2.0 Fluorometer using the RNA Broad-Range (RNA BR) reagent. Genomic DNA concentration in the RNA samples was assessed using the DNA Broad-Range (DNA BR) reagent. All the RNA samples obtained contained < 5% genomic DNA.

RNA integrity is critical for gene expression analysis. Hence, RNA quality was analysed on an Agilent Bioanalyzer 2100 (UNSW Ramaciotti Centre). The RNA Integrity Number (RIN) quantifies RNA quality on a scale from one (severe RNA degradation) to ten (highest quality). The RIN for all samples was between 9.5 and 10.


2.2.2.4 cDNA Synthesis

First strand cDNA synthesis was carried out using the Bio-Rad iScript cDNA synthesis kit, using 1µg of total RNA in a 20 µl reaction as described in Table 2.4. The reaction mix was incubated as follows, according to the manufacturer’s protocol: 5 min at 25°C, 20 min at 46°C, 1 min at 95°C.

Table 2.4 cDNA Synthesis reaction mix

Reagents | Volume per reaction in µl
5x iScript Reaction mix | 4
iScript Reverse Transcriptase | 1
Total RNA (1 µg) | Variable
Nuclease-free H2O | Variable
Total Reaction mix | 20
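The two variable volumes in Table 2.4 depend on the RNA concentration measured for each sample. A minimal sketch of that calculation is given below. The function name and the example concentration of 250 ng/µl are hypothetical and are not taken from the experiments.

```python
def cdna_mix_volumes(rna_ng_per_ul: float, rna_ng: float = 1000.0, total_ul: float = 20.0) -> dict:
    """Volumes (µl) for one cDNA synthesis reaction, following Table 2.4."""
    reaction_mix_ul = 4.0  # 5x iScript Reaction mix
    rt_ul = 1.0            # iScript Reverse Transcriptase
    rna_ul = rna_ng / rna_ng_per_ul
    water_ul = total_ul - reaction_mix_ul - rt_ul - rna_ul
    if water_ul < 0:
        raise ValueError("RNA is too dilute to fit 1 µg into the reaction volume")
    return {
        "reaction_mix_ul": reaction_mix_ul,
        "reverse_transcriptase_ul": rt_ul,
        "total_rna_ul": rna_ul,
        "water_ul": water_ul,
    }


# Hypothetical sample at 250 ng/µl.
mix = cdna_mix_volumes(250.0)
print(mix)
```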

2.2.2.5 Quantitative real-time PCR (qRT-PCR)

qRT-PCR was carried out using SYBR Green mix on a ViiA™ 7 Real Time PCR system (Applied Biosystems), in either 96-well or 384-well plates. Each qRT-PCR reaction was carried out in technical triplicate. Reactions were set up as described in Table 2.5. PCR amplification conditions were as follows: a hot start of 1 cycle at 95°C for 20 sec, followed by 40 amplification cycles of 95°C for 1 sec and 60°C for 20 sec. B2M was included as an endogenous control gene for all qRT-PCR experiments. The suitability of B2M as an endogenous control gene was confirmed using the microarray data, which showed no significant expression differences between groups.

Table 2.5 Master mix for qRT-PCR

Reagents | Quantity in µl (for each reaction)
SYBR Green | 5
RNase free H2O | 2.6
Forward primer | 0.2
Reverse primer | 0.2
cDNA | 2
Total volume | 10


2.2.2.5.1 Analysis of qRT-PCR results and determining Fold Change

CT values (Cycle threshold) were automatically detected by the ABI PCR system. qRT-PCR data was then analysed using the ΔΔCT method as follows.

ΔCT (Test gene) = CT (Test gene) – CT (B2M)

ΔΔCT (Test gene) = ΔCT (Treatment) – ΔCT (Control)

Fold Change = -1 / 2^(-ΔΔCT)

The p-value for determining statistical significance was calculated on ΔCT values using a two-tailed t-test.
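The ΔΔCT calculation can be sketched in a few lines of Python. This is an illustrative sketch rather than the analysis script used in this work. The CT values are hypothetical, the function names are invented for the example, and the sketch assumes that the negative reciprocal is applied only when expression is reduced (2^(-ΔΔCT) < 1), so that downregulation is reported as a negative fold change.

```python
from statistics import mean


def delta_ct(ct_test_gene, ct_b2m):
    """Per-sample ΔCT: CT of the test gene minus CT of the B2M control."""
    return [gene - ref for gene, ref in zip(ct_test_gene, ct_b2m)]


def signed_fold_change(dct_treatment, dct_control):
    """Fold change from mean ΔCT values using the ΔΔCT method.

    Assumption: ratios below 1 are reported as -1 / ratio (negative fold change),
    and ratios of 1 or more are reported unchanged.
    """
    ddct = mean(dct_treatment) - mean(dct_control)
    ratio = 2.0 ** (-ddct)
    return ratio if ratio >= 1.0 else -1.0 / ratio


# Hypothetical CT values for three control and three treated cultures.
control = delta_ct([24.0, 24.5, 23.5], [18.0, 18.5, 17.5])
treated = delta_ct([26.0, 26.5, 25.5], [18.0, 18.5, 17.5])
fold_change = signed_fold_change(treated, control)
print(fold_change)  # prints -4.0
```

In this example the treated ΔCT is two cycles higher than the control ΔCT, which corresponds to a four-fold reduction in expression.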

2.2.2.5.2 Amplification Efficiency

All primer pairs used for qRT-PCR have been assessed to ensure optimal amplification efficiency. Amplification efficiency was determined by using four 4-fold serial dilutions of a cDNA sample, with each dilution being tested in technical triplicates. The slope of the regression line between the log10 of serial dilution factor (x-axis) and the corresponding CT values for each dilution (y-axis) was used to calculate the amplification efficiency (E) as follows:

E = 10^(-1/slope), Efficiency (%) = (E – 1) x 100.

All primer pairs used for qRT-PCR had amplification efficiency > 80% (Table 2.6 and Figure 2.1).
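The efficiency calculation can be illustrated with a short Python sketch. The CT values below are hypothetical and describe an idealised primer pair. The sketch assumes that the x-axis is the log10 of the relative cDNA input (1, 1/4, 1/16, 1/64), so that the regression slope is negative; the helper names are invented for the example.

```python
import math


def regression_slope(xs, ys):
    """Ordinary least-squares slope of y against x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def amplification_efficiency_percent(relative_inputs, mean_cts):
    """Efficiency (%) from a dilution series: E = 10^(-1/slope), (E - 1) x 100."""
    log_inputs = [math.log10(amount) for amount in relative_inputs]
    slope = regression_slope(log_inputs, mean_cts)
    e = 10.0 ** (-1.0 / slope)
    return (e - 1.0) * 100.0


# Hypothetical ideal series: each 4-fold dilution delays the CT by 2 cycles.
relative_inputs = [1.0, 1 / 4, 1 / 16, 1 / 64]
mean_cts = [20.0, 22.0, 24.0, 26.0]
efficiency = amplification_efficiency_percent(relative_inputs, mean_cts)
print(f"{efficiency:.1f}%")
```

A primer pair that doubles the product in every cycle gives a slope of about -3.32 and an efficiency of 100%.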

Table 2.6 Amplification efficiency for primer pairs for qRT-PCR

Primers for specific genes | Amplification Efficiency in %
B2M | 103.31
KCNMA1 | 80.39
NLGN1 | 82.71
NRCAM | 99.25
IMMP2L | 93.76
DPYD | 93.11
SND1 | 143.19
STK39 | 110.50
SUCLG2 | 97.84
TANC2 | 77.46
RFWD2 | 107.36


Figure 2.1 Primer Amplification efficiency. Log10 transformed dilution factor (x-axis) against corresponding CT values (y-axis).


2.2.2.6 Cloning of TOP1 sgRNAs into pBA439

pBA439 (Figure 2.2) was linearized with BlpI and BstXI as described in Table 2.7 and incubated at 37°C for 2 hours. The digested DNA was run on a 0.5% agarose gel for 45 mins at 100V.

The ~9kb DNA band was purified using the Wizard SV Gel and PCR Clean-Up System. The purified DNA was then dephosphorylated by adding 2µl alkaline phosphatase (rSAP) and 4µl Cutsmart buffer, and incubated at 37°C for 60 mins, followed by heat inactivation at 65°C for 10 mins.

TOP1 sgRNA oligos were annealed as described in Table 2.8. The annealing reaction was incubated at 37°C for 30 mins, followed by 95°C for 5 mins, and the temperature was then decreased stepwise by 5°C, with a 1 min incubation at each step, until reaching 25°C.
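The annealing programme can be written out step by step. The short Python sketch below lists each temperature and hold time as described in the text; the function name is illustrative and the sketch does not correspond to any particular thermocycler software.

```python
def annealing_program(start_c: int = 95, end_c: int = 25, step_c: int = 5, hold_min: int = 1):
    """Return the annealing programme as a list of (temperature in °C, minutes)."""
    steps = [(37, 30), (start_c, 5)]  # initial 37°C incubation, then 95°C for 5 min
    temperature = start_c - step_c
    while temperature >= end_c:
        steps.append((temperature, hold_min))  # 1 min at each 5°C decrement
        temperature -= step_c
    return steps


program = annealing_program()
total_minutes = sum(minutes for _, minutes in program)
for temperature, minutes in program:
    print(f"{temperature}°C for {minutes} min")
print(f"total: {total_minutes} min")
```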

Ligation reactions of annealed TOP1 sgRNA oligos and linearized pBA439 were set up as described in Table 2.9 and incubated at 25°C for 90 mins, followed by heat inactivation at 65°C for 10 mins. Three microliters of the ligation mix were transformed into NEB 5-alpha Competent E. coli according to the manufacturer’s protocol. Cloned plasmids were isolated from single ampicillin-resistant colonies using the QiaPrep Spin Mini Kit. Successful cloning of TOP1 sgRNA into pBA439 was confirmed by Sanger sequencing performed at the UNSW Ramaciotti Centre.

Table 2.7 Reaction mix for digesting pBA439

Reagents | Volume
pBA439 | 1µg (X µl)
BlpI | 1 µl
BstXI | 1 µl
NEB buffer 2.1 | 5 µl
H2O | X µl to make total 50µl reaction


Figure 2.2 Sequence map of pBA439. BstXI and BlpI : cloning site for TOP1 sgRNAs. This figure was adapted from https://www.addgene.org/85967/.


Table 2.8 Reaction mix for Annealing

Oligos and Reagents | Volume in µl
Forward Oligo (100µM) | 2
Reverse Oligo (100µM) | 2
T4 PNK | 1
T4 ligase buffer | 2
H2O | X amount to make total 20µl reaction

Table 2.9 Reaction mix for ligation reaction

Reagents | Volume
pBA439 | 150-200ng
Annealed oligos (1:20 diluted) | 5 µl
T4 ligase buffer | 2 µl
T4 ligase enzyme | 1 µl
H2O | X amount to make total 20µl reaction

2.2.2.7 Transient transfections and quantifying transfection efficiency

The efficiency of transfection was first tested in NHA cells, U87-dCas9-KRAB cells and HEK293 cells by transfecting pBA439. To this end, cells were seeded onto 6-well plates such that they were at ~80% confluency on the following day. To optimise the transfection, each transfection reagent was tested at 3 different volumes with a constant amount of DNA.

Transfection with Lipofectamine 3000 and Lipofectamine LTX

Seeded cells were washed with 1ml PBS and pre-warmed fresh MEM media was added to the cells. Lipofectamine 3000 (7.5 µl) and Lipofectamine LTX (15 µl) were each diluted in 125 µl Opti-MEM in separate Eppendorf tubes. In another set of tubes, 2.5 µg DNA and 2.5 µl of P3000 reagent (for Lipofectamine 3000) or LTX-Plus reagent (for Lipofectamine LTX) were diluted in 125 µl Opti-MEM. The diluted Lipofectamine and diluted DNA were combined in one tube, thoroughly mixed and briefly vortexed. The DNA-Lipofectamine mixture was then incubated for 5 mins for Lipofectamine LTX and 15 mins for Lipofectamine 3000, and added dropwise to the cells in freshly replaced media.

Cells were incubated for 24 hours at 37°C, and then the media containing transfection reagents was replaced with 3ml of fresh media. These cells were then incubated for a further 24 hours at 37°C. At 48 hours post-transfection, cells were incubated with 1ml of trypsin at 37°C for 5 min, centrifuged, and then resuspended in 1ml FACS buffer.

FACS sorting was carried out by LSRFortessa SORP and the efficiency of transfection was calculated as the percentage of BFP positive cells.

Based on the results of assessing transfection efficiency, U87-dCas9-KRAB cells were transfected with pBA439-TOP1-sgRNA plasmids and the pBA439 control using the LTX transfection reagent as described above, scaling up to 10-cm dishes: 50 µl LTX, 25 µl LTX-Plus and 25 µg plasmid DNA. Each plasmid was transfected in three culture dishes, in two independent transfection experiments. At 48 hours post-transfection, cells were FACS sorted and BFP-positive cells were re-plated in 6-well plates. Cells were cultured for 6 additional days before RNA extraction.

2.2.2.8 Microarray Data

For microarray analysis, NHA cells were treated with 300nM topotecan or 0.1% DMSO as a control and incubated for 24 or 48 hours. This experiment was carried out in triplicate. RNA extracted from these cells was then analysed on microarrays by the Ramaciotti Centre at UNSW.


Results

2.3.1 Topoisomerase inhibition by Topotecan leads to reduced expression of long genes implicated in ASD

Since previous data showed that transcription of long genes requires TOP1 in neurons9, we investigated whether the same phenomenon occurs in glial cells such as astrocytes. Astrocytes are abundant in the brain and functionally interact with neurons. As described in the introduction, astrocytes play a crucial role in synaptogenesis and also help maintain synaptic plasticity from the early embryonic stage to adulthood, establishing the complex network of the CNS48, 51, 52. Our lab has previously generated microarray data from human primary astrocytes treated with topotecan (Methods)62, to investigate the effect of TOP1 inhibition on gene expression. These data identified approximately 3300 probes significantly differentially expressed between topotecan-treated and control cells.

Building on this dataset, I investigated (a) whether ASD-associated genes were overrepresented among genes differentially expressed in response to topotecan and (b) whether the microarray data could be validated in independent topotecan-treatment experiments using qRT-PCR.

2.3.2 ASD-associated genes are overrepresented among Topoisomerase 1-dependent genes in human astrocytes

Of the total 12,736 genes detected as expressed in human primary astrocytes in the microarray data (detection p-value < 0.05), 2414 unique genes (corresponding to 3303 probes) were differentially expressed in response to topotecan treatment. These genes were assessed for overlap with known ASD-associated genes and long genes (i.e. genes with a genomic length >100kb) (Figure 2.3). Probes were considered differentially expressed if they fulfilled the following criteria: FDR < 0.05 and absolute fold change > 1.362. For genes represented by multiple probes, the probe with the lowest p-value, and its corresponding transcript were included in the analysis.

Long genes were highly enriched among differentially expressed (DEX) genes, with 25% of the DEX genes having a genomic length >100kb, while only 12% of non-DEX genes were long genes (p-value = 3.34E-52, Fisher test) (Figure 2.4).
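The logic of this enrichment test can be illustrated with a one-sided Fisher exact test, computed here as a hypergeometric upper tail using only the Python standard library. This is a sketch under stated assumptions and not the analysis code used in this work. The counts of long genes (604 and 1,239) are approximations back-calculated from the rounded percentages quoted above, and the choice of a one-sided test is an assumption, so the result is not expected to reproduce the reported p-value exactly.

```python
import math


def log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def enrichment_p_value(n_total: int, n_category: int, n_selected: int, n_overlap: int) -> float:
    """P(X >= n_overlap) for X ~ Hypergeometric(n_total, n_category, n_selected).

    Terms are summed in log space to avoid underflow for very small p-values.
    """
    log_denominator = log_comb(n_total, n_selected)
    k_max = min(n_category, n_selected)
    log_terms = [
        log_comb(n_category, k)
        + log_comb(n_total - n_category, n_selected - k)
        - log_denominator
        for k in range(n_overlap, k_max + 1)
    ]
    largest = max(log_terms)
    return math.exp(largest) * sum(math.exp(term - largest) for term in log_terms)


expressed_genes = 12736   # genes detected as expressed (from the text)
dex_genes = 2414          # differentially expressed genes (from the text)
long_dex = 604            # approximation: 25% of 2,414
long_non_dex = 1239       # approximation: 12% of 10,322

p_value = enrichment_p_value(expressed_genes, long_dex + long_non_dex, dex_genes, long_dex)
print(f"one-sided p-value: {p_value:.3g}")
```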


Interestingly, ASD-associated genes were also enriched among DEX genes. ASD-associated genes were obtained from the SFARI database, which manually curates genes supported by multiple lines of evidence72. Among DEX genes, 1.7% were associated with ASD, compared to 0.09% of non-DEX genes (p = 0.0006, Fisher test). Following correction for gene length, the p-value increased from 0.0006 to 0.04 but remained significant (< 0.05).

These data demonstrate that, similarly to neuronal genes9, genes that depend on TOP1 for effective transcription in human primary astrocytes are enriched for ASD genes (Figure 2.4).

Figure 2.3 Downregulation of ASD susceptibility genes in NHA cells in response to topotecan treatment. Scatterplot of the log2 fold change for gene expression in response to topotecan measured by microarrays (y-axis) against log 10 of gene length (x-axis). Each grey circle represents a differentially expressed gene. Red circles represent ASD susceptibility genes. Red vertical line marks 100kb. FC: Fold change. bp: base pairs.


Figure 2.4 Genes differentially expressed upon topotecan treatment are enriched for long genes and ASD susceptibility genes. (a) Barplot displaying the percentage of long genes (>100kb) among differentially expressed (DEX) genes and non-DEX genes. (b) Barplot displaying the percentage of ASD susceptibility genes among DEX and non-DEX genes.

2.3.3 qRT-PCR experiments confirm the downregulation of long genes in response to Topotecan in human primary astrocytes

To assess the validation rate of the microarray data using an independent method, qRT-PCR experiments were carried out on RNA samples distinct from those used for microarray analyses. Ten genes were selected from the 29 genes that fulfilled all 3 criteria of interest: (a) were significantly differentially expressed, (b) have been implicated in ASD, and (c) have a genomic length >100kb.

Topotecan treatment (Methods) was carried out in duplicate independent experiments, with 3 treated and 3 control NHA cultures per experiment. Primers for a total of 9 out of 10 genes showed amplification efficiency >80% and were included to perform qRT-PCR.

Six of the 9 genes tested showed statistically significant downregulation in response to topotecan treatment, in concordance with the microarray data (p-value<0.05, unpaired t-test assuming unequal variance; Figure 2.5). The remaining 3 genes, STK39, SUCLG2 and RFWD2 showed a trend toward downregulated expression, but did not reach statistical significance (Figure 2.5).

Overall, the microarray data showed a 66% validation rate, which is consistent with expected results for the Illumina platform73.

The fold changes observed by qRT-PCR were -3.7 for NLGN1 and -4.1 for IMMP2L (Figure 2.5), while on microarrays the fold changes for the same genes were -2.2 and -1 respectively, consistent with a lower dynamic range of the microarray platform (Table 2.10). Interestingly, there was a direct correlation between gene length and the fold change of gene expression in response to topotecan treatment for both the microarray and the qRT-PCR data (r² = 0.1961 and 0.3196 respectively) (Figure 2.6). These data confirm the observation that TOP1 inhibition leads to the downregulation of long gene expression, and further show that the magnitude of transcriptional repression correlates with the genomic gene length.


Figure 2.5 qRT-PCR validation for ASD significant long genes. Bar plots showing gene expression fold change (y-axis) between topotecan-treated and control cells (n=3 cultures per group). Fold changes were calculated using the ΔΔCT method, with B2M as an endogenous control gene. Statistical significance was assessed using a two-tailed t-test assuming unequal variance. NS: Not significant. P >0.05: NS, p< 0.05: *, p<0.01:**, p< 0.001: ***, p<0.0001: ****. Error bars = Standard deviation of fold change between duplicate treatment experiments


Figure 2.6 Effect of gene length on fold change in response to topotecan treatment.

Gene length (kb) (x-axis), log2 transformed fold change (y-axis).


Table 2.10 Comparison of log2 fold changes from qRT-PCR and microarray data. FC=Fold change.

Genes | qRT-PCR (log2FC) | Microarray (log2FC)
DPYD | -1.273 | -2.458
SND1 | -1.975 | -1.526
IMMP2L | -4.113 | -1.093
KCNMA1 | -2.518 | -2.312
NRCAM | -1.751 | -1.348
NLGN1 | -3.712 | -2.29

2.3.4 Silencing of TOP1 by CRISPRi

To investigate the cleavage complex-independent effects of TOP1 in non-neuronal cells, an attempt was made to silence TOP1 expression using CRISPRi. Although this attempt was not successful, the transfection optimisation experiments and qRT-PCR data provide useful insights for future optimisation of CRISPRi in the lab.

CRISPRi requires simultaneous expression of a dCas9-KRAB construct and an sgRNA against the target region of interest (in this case, the TOP1 promoter). A U87 glioblastoma cell line that stably expresses dCas9-KRAB (referred to as U87-dCas9-KRAB) had previously been generated in the lab by random integration of pHAGE EF1α dcas9-KRAB. Therefore, the TOP1 silencing experiment could be carried out either by double-transfection of dCas9-KRAB and an sgRNA plasmid in NHA cells, or by transfection of the sgRNA plasmid in U87-dCas9-KRAB cells. While using primary NHA cells would be preferable, double-transfection requires high transfection efficiencies. Furthermore, earlier experiments carried out with the U87-dCas9-KRAB cell line revealed transfection efficiencies as low as 0.2% to 0.4%, yielding barely 800 cells from a fully confluent 10cm plate after transfection with Lipofectamine 3000. Thus, to determine the best approach, transfection efficiency was assessed in NHA cells and U87-dCas9-KRAB cells, as well as in HEK293 cells (as a positive control, since these cells are known to be highly transfectable).

Transfection efficiency was tested by transfecting pBA439 using two transfection reagents (Lipofectamine 3000 and Lipofectamine LTX) in NHA cells, U87-dCas9-KRAB, and HEK293 cells.

Transfection efficiency was assessed 48 hours post-transfection, as the percentage of BFP-positive cells quantified by FACS-sorting (Methods).


Transfection methods are mainly classified into three categories: biological, chemical and physical. In the current study we used a cationic lipid-based (Lipofectamine) chemical transfection method45. This experiment showed that Lipofectamine LTX led to higher transfection efficiency across all cell types, and that NHA cells and U87-dCas9-KRAB cells had 3-fold lower transfection efficiency than HEK293 cells. Based on these results, further transfection experiments were carried out using Lipofectamine LTX. The efficiency of chemical transfection depends mainly on the nucleic acid/chemical ratio, solution pH and cell membrane conditions45. Different cell lines are known to differ in their transfectability according to their ability to process foreign DNA/RNA, their cellular function and their cell membrane composition74. The lower than expected transfection efficiency of HEK293 cells (35%) was likely the result of suboptimal gating on the FACS instrument. However, the experiment showed that, relative to HEK293 cells, both NHA and U87-dCas9-KRAB cells had much lower transfection efficiencies, which suggested that double-transfection experiments would not be feasible. A comprehensive study published by Neuhaus et al. in 2016 reported that rapidly dividing cells had higher transfection efficiency than cells that divide at a lower rate. This may explain why HEK293 cells, which divide rapidly, had higher transfection efficiency than NHA and U87-dCas9-KRAB cells. Furthermore, cells with higher phagocytic activity harbour more digestive enzymes and are likely to degrade foreign DNA efficiently, resulting in lower transfection efficiency. Since glial cells also possess phagocytic activity, this could be a likely reason for the lower transfection efficiency of NHA and U87-dCas9-KRAB cells74. CRISPRi experiments were therefore carried out by transfecting pBA439 plasmids expressing sgRNAs against the TOP1 promoter into U87-dCas9-KRAB cells.


Figure 2.7 Comparison of transfection efficiency across cell lines. Lipofectamine 3000 and Lipofectamine LTX were used to transfect three different cell lines: HEK293T, U87-dCas9-KRAB and NHA (x-axis). The y-axis shows the percentage of BFP-positive cells.

Two sgRNAs (sgRNA 1 and sgRNA 2) were designed to target the TOP1 promoter and cloned into the BstXI-BlpI site of pBA439, under the mU6 promoter. Two independent transfection experiments were carried out, each including triplicate transfections of sgRNA 1, sgRNA 2 and an empty pBA439 control (Methods). To select for effectively transfected cells, BFP-positive cells were FACS-sorted 48 hours post-transfection and re-plated in 6-well plates. TOP1 expression was assessed 6 days later by qRT-PCR.

No significant difference in TOP1 expression was observed in cells expressing TOP1 sgRNAs (Figure 2.8), suggesting that CRISPRi silencing was not efficient.


Figure 2.8 TOP1 silencing with CRISPRi. TOP1 gene expression changes in sgRNA-expressing cells compared to control cells transfected with an empty vector (pBA439). (a) sgRNA1. (b) sgRNA2. Fold changes were calculated using the ΔΔCT method, with B2M as an endogenous control gene. Statistical significance was assessed using a two-tailed t-test assuming unequal variance. NS: Not significant. Error bar: standard deviation of fold change values between duplicate transfection experiments.
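The ΔΔCT calculation named in the caption can be sketched as follows. This is a minimal illustration rather than the analysis script used in this study: it assumes approximately 100% amplification efficiency (product doubling at every cycle), and the function names and Ct values are hypothetical. The second helper converts an expression ratio below 1 into the signed notation often used to report fold reductions (for example, a ratio of 0.5 reported as -2).

```python
def ddct_ratio(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Expression ratio (test / control) by the 2^-ddCT method.

    Assumption: ~100% PCR efficiency, i.e. product doubles every cycle.
    """
    # Normalise the target gene to the endogenous control within each sample
    dct_test = ct_target_test - ct_ref_test
    dct_ctrl = ct_target_ctrl - ct_ref_ctrl
    # Compare the normalised values between the two conditions
    ddct = dct_test - dct_ctrl
    return 2.0 ** (-ddct)


def signed_fold_change(ratio):
    """Report ratios below 1 as negative fold changes (0.5 becomes -2.0)."""
    return ratio if ratio >= 1.0 else -1.0 / ratio


# Hypothetical Ct values for a target gene and an endogenous control gene
ratio = ddct_ratio(ct_target_test=26.0, ct_ref_test=18.0,
                   ct_target_ctrl=25.0, ct_ref_ctrl=18.0)
print(ratio, signed_fold_change(ratio))
```

With these made-up values the target crosses threshold one cycle later in the test sample after normalisation, which corresponds to half the control expression level.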


Discussion

2.4.1 Overview and functions of long genes involved in neurodevelopmental diseases

The work presented in this chapter assessed the effect of TOP1 inhibition by topotecan on gene expression in human primary astrocytes and confirmed that, similarly to neurons, astrocytes require TOP1 activity to efficiently transcribe genes longer than 100kb.

Since astrocytes play a major role in the formation, maintenance and elimination of synapses contributing to synaptic plasticity and CNS homeostasis throughout the early developmental stages and adulthood48, 51, 52, our lab aimed to assess the effect of TOP1 inhibition on gene expression in human primary astrocytes by generating microarray data on control and topotecan-treated NHA cells.

The present study completed the assessment of TOP1-dependent gene expression in NHA cells by demonstrating a high validation rate of the microarray data (66%) and focussing on qRT-PCR validation of ASD-associated genes.

We demonstrated that TOP1 inhibition leads to reduced expression of long genes in NHA cells, thereby extending the role of TOP1 in transcriptional regulation to all major brain cell types.

The present study also showed a direct relationship between gene length and the magnitude of transcriptional repression. Longer genes such as KCNMA1, NLGN1 and IMMP2L showed a larger fold change than shorter genes such as SND1 and NRCAM. Surprisingly, although the length of DPYD is comparable to that of the other long genes tested, its fold change was only -2.418, compared with -13.106 and -17.313 for NLGN1 and IMMP2L respectively. This result is likely explained by the fact that DPYD expresses a long isoform as well as several short isoforms9, and the qRT-PCR primers were designed to quantify total gene-level expression.

The study published by King et al. in 2013 was a breakthrough, performing in vitro experiments in mouse cortical neurons as well as human iPS cells. In this study, cells were treated with topotecan and irinotecan (TOP1 inhibitors) to assess their effect on the expression of long genes. The results for the two inhibitors were highly correlated, and the expression of many long genes (>100kb) was downregulated in response to TOP1 inhibition. This study also compared results obtained by other labs that assessed the effect of topotecan on long genes in different cell lines (such as MCF7, 293T, ME16C and HeLa) with those obtained using other topoisomerase inhibitors (irinotecan, camptothecin, doxorubicin). Taken together, the data across multiple cell lines and TOP1 inhibitors supported the notion that TOP1 inhibition leads to decreased expression of long genes. Furthermore, King et al. confirmed this finding by knocking down Top1 with shRNA in mouse cortical neurons9.

King et al. also showed that many of the ASD candidate genes downregulated with topotecan were exceptionally long and were involved in neuronal development and synaptic function9. However, some very long genes associated with ASD, such as IMMP2L are lowly expressed in neurons, and thus have not been assessed for expression changes in response to TOP1 inhibition.

Of the six ASD-associated genes validated by qRT-PCR, only two (KCNMA1 and NLGN1) have previously been shown to be affected by TOP1 inhibition in neural cells9. For the remaining four genes (DPYD, IMMP2L, NRCAM and SND1), this study is the first to demonstrate reduced expression in response to TOP1 inhibition.

2.4.2 The function of TOP1-dependent long genes and potential implications of their downregulation in response to TOP1 inhibition

What are the potential implications of the downregulation of the long genes identified in this study as regulated by TOP1 in glial cells? Here I discuss the cellular functions of the proteins encoded by these long genes, which are regulated by TOP1 and are strongly associated with synaptic function and ASD.

2.4.2.1 IMMP2L - Inner mitochondrial membrane peptidase 2-like gene

Several published studies have implicated mutations in IMMP2L in neurodevelopmental disorders such as Tourette syndrome and ASD75, 76. IMMP2L is a susceptibility gene for Tourette syndrome: affected individuals have been found to carry various mutations, including translocations, deletions, duplications and CNVs, in the region of the long arm of chromosome 7 harbouring IMMP2L75.

IMMP2L processes the mitochondrial proteins cytochrome c1 (Cyc1) and glycerol phosphate dehydrogenase 2 (GPD2). Deletion of Immp2l leads to incomplete processing of cytochrome c1, which can affect the electron transport system and result in increased production of ROS. This leads to formation of the mitochondrial permeability transition pore and release of pro-apoptotic proteins, which eventually results in apoptosis76. Furthermore, a few studies suggest that incomplete processing of GPD2 may cause metabolic and mitochondrial abnormalities76. Immp2l mutations causing incomplete processing of cytochrome c1 and GPD2 give rise to dysfunctional mitochondria, leading to abnormal myelination and neurodegeneration77. Dysfunctional mitochondria show impaired ADP phosphorylation and consequent ATP depletion76, 77.

Mitochondrial dysfunction correlates strongly with reduced neurotransmitter release, in particular of GABAergic (inhibitory) neurotransmitters, and is thereby associated with an overrepresentation of excitatory neurotransmitters and receptors78. Between 12 and 30 months of age there is overproduction of excitatory neurotransmitters78. Inhibitory neurotransmitters are required during this period, and their lack can lead to excitotoxicity, corresponding to the regression seen in children diagnosed with ASD78. Microarray analysis following siRNA knockdown of IMMP2L in astrocytes showed dysregulation of genes involved in CNS development and transcriptional regulation75.

These results suggest that phenotypic changes in neurodevelopmental disorders can be associated with transcriptional changes caused by mutation in IMMP2L75.

2.4.2.2 Neurexins and Neuroligins

Neurexins and neuroligins are pre- and postsynaptic cell adhesion molecules, respectively. In total, three genes encode neurexins (Neurexin 1-3) and five genes encode neuroligins (Neuroligin 1-3, NLGN4X and NLGN4Y). Together with signalling molecules, neurotransmitter receptors and scaffolding proteins, cell adhesion molecules play a significant role in establishing a homeostatic environment at the synapse and maintaining synaptic plasticity. All forms of neurexin and neuroligin have been described as high-confidence ASD candidate genes in the SFARI and AutismKB databases79, 80. Copy number variations, SNPs and deletions have been found in these genes in patients with ASD80-82.

Neurexins and neuroligins have been shown experimentally to induce synapse formation in co-cultured neuronal cells as well as non-neuronal cells such as HEK293 and COS cells80, 81, 83. NLGN1-NRXN complexes help stabilise initial connections, but recruitment of more protein molecules and signalling by NLGN1 are necessary for maturation and maintenance of synapses81. Neurexins interact with other NRXN and non-NRXN molecules via their extracellular domains. Interaction of neurexins with non-neurexin molecules plays an important role in promoting neurite growth, axon guidance, pathfinding and neuron-glia interactions81.

Studies carried out in the last decade have shown key roles of glial cells in synapse formation, maturation and elimination, in maintenance of overall homeostasis, and in brain development in the embryonic and adult brain48, 51, 52. Cell adhesion molecules, including neurexins and neuroligins, are expressed in both neurons and astrocytes. Interestingly, levels of neuroligins in astrocytes are comparable to or even higher than those in neurons. In an in vitro study, simultaneous knockdown of all neuroligins in astrocytes completely blocked neuron-induced elaboration of astrocytes, whereas individual knockdown of NLGN1-3 produced a partial but significant decrease in astrocytic arborisation and neuropil network, highlighting that the roles of the individual neuroligins are non-overlapping but crucial83. Cortical astrocytes express neuroligins 1-3, which control astrocyte morphogenesis through their interaction with neuronal neurexins83.

Neuroligin 1 and 2 knockout mice and Nlgn2 mutant rats display many behavioural symptoms relevant to ASD, such as deficits in learning and memory, stereotypic behaviour, elevated anxiety and altered pain sensation. Some of these symptoms correlate with observations in neurexin 2 KO mice80, 84.

2.4.3 Towards optimisation of TOP1 silencing by CRISPRi

To investigate TOP1 inhibition in a topotecan-independent manner, we decided to silence TOP1 by CRISPRi. To conduct these experiments, I first compared the transfection efficiency of two available transfection agents. These experiments revealed that Lipofectamine LTX performed better than Lipofectamine 3000. However, CRISPRi experiments performed by transient transfection of TOP1 sgRNAs revealed no significant change in TOP1 expression as assessed by qRT-PCR.

Many transfection agents are available, including Lipofectamine 2000, Lipofectamine 3000, FuGENE 6, FuGENE HD and Lipofectamine LTX. These agents can differ in efficiency and toxicity for the cell line of interest. Selecting a transfection agent with minimal off-target effects and cytotoxicity and maximal transfection efficiency is key to any viral or non-viral, transient or stable transfection85. For these reasons it was essential to find a transfection agent with high transfection efficiency. We found that Lipofectamine LTX had higher efficiency than Lipofectamine 3000. As mentioned in the Results, NHA (Normal Human Astrocytes) and the glioblastoma cell line U87-dCas9-KRAB did not differ significantly in transfection efficiency. Because U87-dCas9-KRAB is a glioblastoma cell line, it would be representative of gene expression in astrocytes; hence we used this cell line, which constitutively expresses dCas9-KRAB, for our subsequent CRISPRi experiments.


3 Analysing the expression of TOP1 and TOP1-dependent genes in ASD brain

Introduction

Given that ASD is a heterogeneous, complex neurodevelopmental disorder with heritability of up to approximately 50%37, it remains challenging to elucidate the molecular pathways leading to the diverse phenotypic outcomes in ASD patients. In the last decade, transcriptome analysis of ASD brain tissue samples has emerged as a complementary approach to DNA sequence analysis, with the aim of deciphering the underlying molecular pathways.

The first large-scale transcriptomic analysis of ASD and control brain samples determined that only 13% of the transcriptome differed from the control, reflecting the fact that the general transcriptome organisation in ASD is comparable to that of the normal human brain73. Both this initial study by Voineagu et al.73 and a more recent study86 identified genes differentially expressed between ASD and control brain in the cerebral cortex, but not in the cerebellum. In addition, Liu et al. (2016) compared gene expression in brain tissue from ASD patients and controls, but this analysis was carried out only in prefrontal cortex33. All three studies identified dysregulation of two main groups of genes in ASD. One group, which was markedly downregulated, contained an overrepresentation of neuronal marker genes33, 73, 86. These genes overlapped significantly with ASD susceptibility genes from the SFARI, AutDB and AutismKB databases33, and were overrepresented for categories related to synaptic function and neuronal signalling, such as calcium signalling and long-term potentiation pathways33, 73, 86. This finding may be primarily due to dysregulation of synapses33. Altered expression was also observed for genes involved in regulation of transcription and in cellular, metabolic and biosynthetic processes. Further analysis showed that genes downregulated in ASD brain samples were enriched for common and rare de novo variants and CNVs associated with ASD33, 73, 86.

ASD is characterised by deficits in communication, cognition and sociability in affected individuals. These traits play a key role in distinguishing humans from non-human primates. For this reason, Liu et al. compared human-specific changes in gene expression by analysing RNA-seq data obtained from the prefrontal cortex of humans and non-human primates (chimpanzees and macaque monkeys)33. They found that, among the large number of developmental patterns perturbed in ASD, genes predominantly involved in synaptic function were affected, and these genes harbour mutations linked to ASD. Gene expression analysis across the three species suggested a threefold increase in human-specific developmental genes compared to chimpanzee-specific developmental genes33. The same study also demonstrated that peak expression of synaptic genes is delayed in normal humans compared to chimpanzees and rhesus monkeys, whereas in ASD probands synaptic gene expression peaked earlier, suggesting premature development of the prefrontal cortex in ASD individuals. This study found altered density of the trimethylated H3K4 histone mark and correlated perturbation of gene expression in neuronal cells in ASD prefrontal cortex. Furthermore, they showed significant enrichment of several TF binding sites in synaptic genes that are downregulated in ASD. Among these TFs, EGR1-4 were of particular relevance because of their role in neuronal plasticity. The ability of EGR1 to regulate GABA receptors in the hippocampus suggests that its dysregulation can contribute to an excitatory/inhibitory imbalance in ASD33.

The group of genes upregulated in ASD brain was enriched for immune response related genes33, 73, 86. Since astrocytes and microglia actively participate in inflammatory processes in the brain and also play a key role in synaptic plasticity and CNS homeostasis, the upregulated inflammatory response was hypothesised to be secondary to a synaptic insult73, 86.

In addition to protein-coding genes, long non-coding RNAs (lncRNAs) have also been studied for potential dysregulation in ASD. LncRNAs are a class of non-coding RNA that are >200nt long, 5' capped, polyadenylated and spliced like mRNA but lack long open reading frames4, 87, 88. LncRNAs are expressed in a tissue-specific manner and are highly expressed in brain. Some lncRNAs have primate-specific expression patterns in the brain87. They have more cellular specificity and discrete subcellular localisation compared to proteins4, 86, 87. Parikshak et al. (2016) assessed the expression of lncRNAs in ASD brain samples and identified 60 differentially expressed lncRNAs in ASD86. Nine of these interact with FMRP, an RNA-binding protein strongly associated with ASD34, while a further 20 are targets of miRNA-protein complexes, which suggests that they are regulated and therefore might be functional in ASD pathogenicity86. This study also determined that the primate-specific lncRNAs LINC00689 and LINC00693 are predominantly downregulated in the normal developing brain but upregulated in ASD cortex, suggesting a potential role in ASD pathogenesis86. Some differentially expressed lncRNAs were highly expressed during foetal cortical development, indicating their role in early embryonic stages86.

Transcriptome studies of ASD brain tissue have also highlighted a potential role for alternative splicing in ASD. For example, the splicing factors RBFOX1, SRRM4 and PTBP1 have perturbed expression in ASD brain86. Furthermore, target genes of these splicing factors show consistently perturbed expression in ASD86. In their gene expression analysis, Voineagu et al. (2011) found that the splicing factor A2BP1/RBFOX1 was mainly downregulated in ASD cortex compared to controls73. More than 200 differential splicing events were found, likely resulting from dysregulation of A2BP1/RBFOX1. The top targets of this splicing factor are CAMK2G, NRCAM and GRIN1. Of these three genes, NRCAM and GRIN1 are involved in synaptogenesis73, suggesting that perturbed splicing can also contribute to the dysregulated synaptic plasticity in ASD.

Overall, transcriptome studies of ASD brain highlighted synaptic and immune dysfunction as potential convergent pathways dysregulated in ASD, and also represent a rich resource for testing further hypotheses. In the present study (Chapter 3) we mined published transcriptome data33, 86 to assess the expression of TOP1 and TOP1-dependent genes in ASD.

Transcriptomic analysis of ASD brain reveals perturbed expression of many genes related to chromatin remodelling, transcription factors, lncRNAs and splicing factors, with a major impact on synaptic function from early embryonic life to adulthood, confirming the heterogeneity and complexity of the disease73, 86. The majority of genes expressed in brain are long genes (>100kb), and previous studies have identified de novo mutations in TOP1 in ASD89, with effects on transcription of long genes9, 62. Given that genetic variants in topoisomerases have been associated with ASD, a corollary is that ASD cases might show alterations in the expression of TOP1-dependent genes. A study conducted by our laboratory, validated here by qRT-PCR, confirmed that inhibition of Topoisomerase 1 by topotecan affects the transcription of long genes linked to ASD in astrocytes. Hence, we mined the RNA-seq data from ASD brain published by Parikshak et al. (2016) and Liu et al. (2016). The aims of this chapter are as follows:

• Analyse the expression of TOP1 in ASD and control brain.
• Analyse the expression of TOP1-dependent long genes in ASD brain.


Methods

3.2.1 RNA-seq data

RNA-seq data from Parikshak et al.86 were kindly provided by the Geschwind lab (UCLA) as normalised reads per kilobase per million (RPKM) values. The dataset includes RNA samples from frontal cortex (Brodmann area 9), temporal cortex (Brodmann areas 41 and 42) and cerebellar vermis, from autistic and control individuals (Table 3.1).

Table 3.1 Number of samples in the RNA-seq dataset

                         Phenotype
Brain region           ASD    Control    Total
Frontal Cortex          42         37       79
Temporal Cortex         43         45       88
Cerebellar Vermis       39         45       84
Total                  124        127      251

In the Parikshak et al. dataset, age, gender ratios and RNA integrity numbers (RIN) were not significantly different between ASD and controls or between brain regions (Figure 3.1). Liu et al. performed their data analysis on brain tissue obtained from 40 control individuals (males = 28, females = 12, ages 1-61 years) and 34 ASD individuals (males = 24, females = 10, ages 2-60 years)33. Normalised gene expression data from Liu et al.33 were obtained from GEO (accession number GSE29138).

3.2.2 RNA-seq data analysis

Data were analysed using the R software (https://www.r-project.org/). Statistical significance of gene expression differences between groups was assessed using a non-parametric Wilcoxon rank-sum test, as implemented in the wilcox.test function in R. ASD-associated genes were obtained from the SFARI database: https://gene.sfari.org/.
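The logic of the Wilcoxon rank-sum test used here can be illustrated with a short sketch. The analysis in this study was run with wilcox.test in R; the Python code below is only an illustrative re-implementation under stated assumptions. It uses the normal approximation with tie and continuity corrections, whereas wilcox.test computes exact p-values for small samples without ties, so results may differ slightly for small groups. The function names and expression values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test; returns (U statistic of x, p-value).

    Assumption: normal approximation with tie and continuity corrections.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one observation")
    pooled = list(x) + list(y)
    n = n1 + n2
    ranks = average_ranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Tie-corrected variance of U
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = max(abs(u - mean_u) - 0.5, 0.0) / math.sqrt(var_u)
    p = math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))
    return u, min(p, 1.0)


# Hypothetical expression values for two groups of samples
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [6.0, 7.0, 8.0, 9.0, 10.0]
print(rank_sum_test(group_a, group_b))
```

A rank-based test was a reasonable choice for RPKM data because it makes no assumption that expression values are normally distributed.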


TOP1 expression data from the GTEx consortium was obtained from the GTEx portal web browser: https://www.gtexportal.org/home/gene/TOP1. Gene-level boxplots of gene expression were generated by selecting the brain samples from the GTEx sample list. The fill color of boxplots was then adjusted to highlight cerebellum and cortex samples.

Figure 3.1 Properties of the brain tissue samples included in the Parikshak et al. RNA-seq dataset. (a) Boxplots of age (years). (b) Boxplots of RNA integrity numbers (RIN). (c) Barplots of gender ratios. All covariates are plotted across 6 groups: ASD samples across three brain regions, and control samples (CTL) across three brain regions. Temp: temporal cortex; Front: frontal cortex; Vermis: cerebellar vermis. The bottom, middle and top lines of each box mark the first, second and third quartiles, respectively. Whiskers extend as far as 1.5 x the difference between the first and third quartiles (the interquartile range). Outliers extending beyond this range are plotted individually.


Results

To investigate whether the expression of TOP1 and TOP1-dependent genes is dysregulated in the brains of autistic individuals, I mined RNA-seq data generated by the Geschwind lab (UCLA) from post-mortem brain tissue of ASD individuals and controls (see Methods).

Gene expression levels measured by RNA-seq and normalised as reads per kilobase per million (RPKM) were assessed for differences between ASD and controls as well as between brain regions using a non-parametric Wilcoxon rank-sum test.

Remarkably, we observed significantly higher expression of TOP1 in cerebellum than in cerebral cortex (p = 4.8e-11; Figure 3.2a), which to our knowledge has not previously been reported.

To investigate whether this observation is replicable, I mined RNA-seq data from the GTEx consortium, which includes over a hundred human brain samples. The increased expression of TOP1 in the cerebellum was confirmed using the GTEx data (Figure 3.2b). Not only was TOP1 expression higher in cerebellum than in cerebral cortex, but cerebellum in fact showed the highest TOP1 expression across all brain regions represented in the GTEx data (Figure 3.2b). These data suggest that TOP1 might play an important role in cerebellum; alternatively, this finding could reflect the specific cellular composition of this brain region.

Comparing TOP1 expression between ASD and control samples revealed that the mean expression of TOP1 was higher in ASD than in control samples in both cortex and cerebellum. This difference reached statistical significance for the cortex data (p=0.01; Figure 3.2a). In cerebellum, TOP1 showed a trend of increased expression in ASD, which did not reach statistical significance, likely due to the lower sample size.

The magnitude of TOP1 expression change in ASD in cerebral cortex was moderate, with 6% increase in TOP1 expression in ASD samples relative to controls. This observation is consistent with previously observed low fold-changes of gene expression in ASD brains73, 86 and with low effect sizes of ASD-associated alleles90.


Figure 3.2 TOP1 expression in human brain samples. (a) Boxplots of TOP1 expression data from Parikshak et al., grouped by brain region (ctx: cortex; cb: cerebellum) and phenotype. p: Wilcoxon test p-values. (b) Boxplots of TOP1 expression data from the GTEx consortium (https://www.gtexportal.org/home/gene/TOP1). The bottom, middle and top lines of each box mark the first, second and third quartiles, respectively. Whiskers extend as far as 1.5 x the difference between the first and third quartiles (the interquartile range). Outliers extending beyond this range are plotted individually. RPKM = reads per kilobase of transcript per million mapped reads. TPM = transcripts per million.


Proteins regulating transcription, such as transcription factors, are often expressed at low but tightly regulated levels. Thus, despite the low magnitude of TOP1 expression difference between ASD and controls, the altered TOP1 expression in ASD brain could conceivably have downstream effects on a subset of TOP1-dependent genes. Furthermore, altered expression of long genes has been reported in Rett syndrome, a syndromic form of ASD caused by mutations in the methyl-binding protein MECP236, 91, 92.

To test whether there is any evidence that TOP1 expression differences in ASD brain may have downstream effects, I next investigated the expression of TOP1-dependent genes in ASD and control brain samples. 61 ASD-associated genes that showed reduced expression in human primary astrocytes in response to TOP1 inhibition were detected in the brain data. Of these, 8 showed significant expression differences in ASD relative to control samples (p < 0.05, Wilcoxon test; Figure 3.3): IFIH1, LAMB1, NPAS3, RIMS3, SBF2, SNX5, SUCLG2, UBE2H. Remarkably, 6 of the 8 significant genes showed upregulated expression in ASD (Figure 3.3), consistent with increased TOP1 expression. The magnitude of gene expression changes for these TOP1-dependent genes was also moderate, less than 2-fold (Figure 3.4).

To investigate whether the upregulation of these 6 TOP1-dependent genes in ASD brain is replicable, I used data previously published by Liu et al.33, which included RNA-seq data from prefrontal cortex of 34 ASD and 40 control individuals. The fold-changes of gene expression between ASD and controls were highly consistent between the two independent datasets, with 5 of the 6 upregulated genes (and 7 of all 8 TOP1-dependent dysregulated genes) showing expression differences in the same direction in the two datasets (Figure 3.4). These data suggest that, although of low magnitude, the TOP1 expression differences between ASD and controls may contribute to the dysregulated expression of a subset of genes. These results provide initial evidence for a link between TOP1 and gene expression dysregulation in ASD. Further studies would be required to (a) identify the cause of TOP1 upregulation in ASD brain, and (b) increase TOP1 expression in vitro on a scale similar to that observed in ASD brain, to provide additional evidence for the link between TOP1 expression and downstream gene expression changes.


Figure 3.3 Boxplots displaying gene expression differences between ASD and controls. P: Wilcoxon test p-value. The bottom, middle and top lines of each box mark the first, second and third quartiles, respectively. Whiskers extend as far as 1.5 x the difference between the first and third quartiles (the interquartile range, IQR). Outliers extending beyond this range are plotted individually. Y-axis: reads per kilobase of transcript per million mapped reads (RPKM). Notch: +/- 1.58 IQR/sqrt(n). AUT: autism; CON: control.
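The boxplot elements described in the caption (quartiles, whiskers, outliers and notch) can be reproduced numerically as in the sketch below. This is an illustration under stated assumptions, not the plotting code used for the figure: quartiles are computed by linear interpolation with Python's statistics.quantiles (method "inclusive"), which can differ slightly from the hinges that R's boxplot function uses, and the input values are hypothetical.

```python
import math
import statistics


def boxplot_summary(values):
    """Quartiles, whisker ends, outliers and notch limits for one group."""
    data = sorted(values)
    # Assumption: linear-interpolation quartiles; R's boxplot uses hinges
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations inside the fences
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    # Notch half-width as given in the caption: 1.58 * IQR / sqrt(n)
    half_notch = 1.58 * iqr / math.sqrt(len(data))
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
        "notch": (median - half_notch, median + half_notch),
    }


# Hypothetical expression values with one extreme observation
print(boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

When the notches of two boxes do not overlap, this is commonly read as informal evidence that the two medians differ.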


Figure 3.4 Scatterplot displaying gene expression fold-changes between ASD and control samples on a log2 scale in RNA-seq data from two independent studies: Parikshak et al. (x-axis) and Liu et al. (y-axis).
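The comparison summarised in this scatterplot amounts to computing a log2 fold-change per gene in each dataset and counting the genes whose change has the same sign in both. A minimal sketch is given below. It assumes that fold-changes are taken as the ratio of group means (the summary statistic is not stated above), and the function names, gene labels and values are hypothetical placeholders rather than data from either study.

```python
import math


def log2_fold_change(mean_case, mean_control):
    """log2 of the case/control expression ratio (assumes group means)."""
    return math.log2(mean_case / mean_control)


def direction_concordance(lfc_a, lfc_b):
    """Genes present in both datasets whose log2 fold-changes share a sign.

    Returns (list of concordant genes, number of genes compared).
    """
    shared = sorted(set(lfc_a) & set(lfc_b))
    concordant = [g for g in shared if lfc_a[g] * lfc_b[g] > 0]
    return concordant, len(shared)


# A 6% increase in expression corresponds to a log2 fold-change of ~0.084
print(round(log2_fold_change(1.06, 1.00), 3))

# Hypothetical log2 fold-changes in two independent datasets
dataset_1 = {"geneA": 0.30, "geneB": 0.12, "geneC": -0.25}
dataset_2 = {"geneA": 0.10, "geneB": -0.05, "geneC": -0.40}
print(direction_concordance(dataset_1, dataset_2))
```

On a log2 scale, increases and decreases of the same relative size are symmetric around zero, which is why the sign of the value is enough to compare direction across datasets.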


Discussion

The goal of the work presented in this chapter was to investigate whether the expression of TOP1 is altered in ASD brain tissue compared to controls. Furthermore, we wanted to assess whether long genes showing TOP1-dependent expression in human astrocytes also show altered expression in ASD brain. We found that TOP1 was upregulated in ASD brain compared to controls and, consistently, a subset of TOP1-dependent long genes was also overexpressed in ASD compared to controls.

We identified 8 TOP1-dependent long genes that were dysregulated in ASD compared to controls, with the majority showing overexpression, consistent with increased TOP1 expression. The six genes that were mildly but significantly overexpressed are SNX5, IFIH1, SBF2, NPAS3, UBE2H and SUCLG2.

SNX5 is involved in the endocytosis pathway, which contributes to the immune response93. IFIH1 is an immune response related gene that is mainly upregulated in response to viral infection94. UBE2H is an E2 ubiquitin-conjugating enzyme involved in the ubiquitin-dependent proteolytic system; another enzyme of this system, the E3 ligase UBE3A, has been found to be involved in neural development. Mutations of UBE2H and UBE3A are associated with Angelman syndrome and ASD95. NPAS3 encodes a neuron-specific transcription factor mainly involved in brain development, and mutations in this gene have been implicated in schizophrenia96. SUCLG2 encodes a component of the second succinyl-CoA synthetase isoform, which catalyses GDP-dependent reactions. Knockdown of SUCLG2 results in decreased mtDNA content, cytochrome c oxidase (COX) activity and mitochondrial nucleoside diphosphate kinase (NDPK) activity97. A genome-wide study of exonic CNVs found a rare variant in SUCLG2 in two cases of ASD98.

RIMS3 and LAMB1 are the only two TOP1-dependent long genes that were downregulated in ASD samples in both datasets. RIMS3 is a synaptic vesicle gene, and microdeletions in this gene have been found in many ASD cases. Hence, RIMS3 is considered a strong candidate gene for ASD99. LAMB1 is also an ASD candidate gene, located on the long arm of chromosome 7 (an autism critical region)100. LAMB1 is expressed during early embryonic life and is believed to be associated with nervous system development101. Furthermore, SNPs (single nucleotide polymorphisms) in LAMB1 are also thought to be associated with ASD and with differing degrees of severity of ASD symptoms100, 101.

To validate the expression of the dysregulated genes, we compared the same set of significant genes with the data of Liu et al.33, which revealed that the direction of expression change was the same as in the Parikshak et al.86 data for 7 of the 8 genes. This supports the robustness of our result and indicates that the expression of TOP1-dependent genes is concordant with TOP1 expression.

Interestingly, none of the synaptic adhesion molecule genes such as NLGN1, NRCAM and CNTNAP2 associated with ASD and encoded by very long genes were found to be dysregulated in this analysis.

Given that gene expression regulation is a complex and finely tuned process, the expression of long genes is likely controlled by a combination of factors, and thus TOP1 upregulation may ultimately only impact a subset of genes.

Following on from these analyses, it is important to ask, what would be the potential functional consequences of a mild increase in expression of TOP1 and TOP1-dependent long genes?

Transcriptional regulation is finely tuned during development and throughout adult life. Our data support the notion that the mild increase in TOP1 expression may have an impact on a subset of TOP1-dependent genes. Of these, NPAS3 encodes a transcription factor, and thus its altered expression could have further downstream consequences on neuronal gene transcription.

In addition to transcriptional regulators, synaptic proteins are also known to be highly dose sensitive, with both increases and decreases in expression leading to impaired synaptic activity34. Thus, altered RIMS3 expression, although possibly not a direct consequence of TOP1 overexpression, could lead to downstream effects on synaptic function.

3.4.1 Highlighting the effect of TOP1 overexpression

While this study has been mostly concerned with the effects of TOP1 on transcription, our observation that TOP1 shows increased expression in ASD brain also raises the question of potential effects of TOP1 overexpression on DNA replication and repair, which in turn are linked to genomic instability. Indeed, overexpression of Top1 in yeast has been demonstrated to lead to genomic instability102; in particular, the repetitive rDNA region was destabilised, highlighting a detrimental effect of Top1 overexpression on the eukaryotic genome. ASD cases consistently show an increased burden of SNVs and CNVs37, but the cause of this phenomenon remains unclear. Our result suggests that altered TOP1 expression may contribute not only to gene expression changes but also to genomic abnormalities in ASD, and thus could be one of the key factors contributing to pathogenicity in ASD.


Finally, we also found that TOP1 shows higher expression levels in the cerebellum compared to cortex. This observation suggests that TOP1 could potentially have brain region-specific or cell-type specific roles in the human brain.


4 Discussion and conclusion

Studies carried out in the last five years have found that many genes expressed in the human neuronal lineage, compared with non-neuronal lineages, are long genes103. Neurodevelopmental disorders including ASD are associated with perturbed gene expression in brain regions such as the amygdala and cortex103. The transcriptomes of these regions are highly enriched for long genes involved in synapse formation, maturation and synaptic plasticity. Furthermore, syndromic forms of ASD such as FXS and Rett syndrome are caused by mutations in FMR1 (which encodes FMRP)103 and MECP2103, respectively, and these genes specifically target long genes through translational104 and transcriptional92 regulation of their targets, respectively. These findings support a key role for long genes as major contributors to the pathogenicity of ASD9, 10, 61, 103.

Topoisomerases are ubiquitously expressed enzymes involved in basic cellular functions such as replication, chromatin remodelling, translation and, most notably, transcription, which has recently been extensively explored7, 8, 24. Apart from its involvement in these cellular functions, TOP1 is of specific interest in the nervous system, where an increased metabolic rate and enhanced oxygen consumption produce high oxidative stress and DNA damage, and where many genes must be transcribed efficiently18. ASD is mainly defined by dysregulation of synaptic functions11. Studies published in the last few years have reported de novo mutations in TOP1, as well as decreased transcription of long genes (>100kb in length) linked to ASD and other neurodevelopmental disorders when TOP1 function is impaired, emphasising the role of TOP1 in the pathogenesis of these disorders9, 10, 89.

King et al. showed downregulation of TOP1-dependent long genes by TOP1 inhibition in human neurons9, 62. The present study demonstrated the effect of TOP1 inhibition in astrocytes by analysing previously generated microarray data54 and validated the results by an independent method. Since RNA pol2 is associated with TOP1, inhibition of TOP1 reduces the density of RNA pol2 along the gene body in a length-dependent manner, affecting transcription elongation of long genes but not of short genes (<67kb in length)7, 9. Some genes tested in this study, such as NRXNs, KCNMA1 and NLGN1, show results consistent with the King et al. study: these same long genes (>100kb in length) showed statistically significant decreases in expression (fold changes of -2.19, -1.90 and -1.83, respectively) in mouse neurons treated with topotecan9. These findings support the conclusion that inhibition of TOP1 downregulates the expression of long genes and that the effect is similar across different cell types (MCF7, 293T, HeLa, ME16C9, human astrocytes and mouse neurons9).
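The length thresholds quoted above (long genes >100kb, short genes <67kb) can be turned into a simple grouping of genes by length class. The following is a minimal sketch under stated assumptions rather than the analysis performed in this thesis: the function names, the "intermediate" class for genes between the two thresholds, and the input format are all hypothetical.

```python
LONG_KB = 100.0   # "long gene" threshold quoted in the text (>100kb)
SHORT_KB = 67.0   # "short gene" threshold quoted in the text (<67kb)


def length_class(length_kb):
    """Classify a gene by length using the two thresholds from the text."""
    if length_kb > LONG_KB:
        return "long"
    if length_kb < SHORT_KB:
        return "short"
    # Assumption: genes between the two thresholds get their own class.
    return "intermediate"


def mean_fold_change_by_class(genes):
    """Average the fold change within each length class.

    genes: iterable of (name, length_kb, fold_change) tuples.
    Returns a dict mapping length class -> mean fold change.
    """
    totals = {}
    for _name, length_kb, fold_change in genes:
        cls = length_class(length_kb)
        running_sum, count = totals.get(cls, (0.0, 0))
        totals[cls] = (running_sum + fold_change, count + 1)
    return {cls: s / n for cls, (s, n) in totals.items()}
```

Under a length-dependent effect of TOP1 inhibition, the mean fold change for the "long" class would be expected to be more negative than for the "short" class.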

Following our observation of the effect of TOP1 inhibition in astrocytes, we examined the expression of TOP1 in RNA-seq data from ASD and control brain samples. Interestingly, we found marginal but significant upregulation of TOP1 in ASD probands compared to non-ASD individuals. Previous studies have found rare de novo missense mutations in TOP1, TOP3B and TOPORS9, 89, 105. Missense mutations generally have less effect than more deleterious mutations such as nonsense mutations, frameshift mutations and splice variants105. Hence, whether these mutations are deleterious is questionable, because they are rare and have not been replicated in any other cohort or study published thus far. However, studies carried out in human and mouse neurons and in astrocytes have confirmed that TOP1 inhibition downregulates long genes related to synaptic function. Hence, it is plausible that a loss-of-function mutation in TOP1 could have a significant impact on the pathogenesis of ASD and other neurodevelopmental disorders.

This study successfully validated previously generated microarray data in astrocytes. The validation confirmed that long genes important for synaptic function are downregulated following topoisomerase inhibition in astrocytes. Hence, this study supports the previous findings by King et al. (2013) and indicates that the downregulation of long genes with TOP1 inhibition is comparable across cell types, which is of specific interest given the higher expression of long genes in the brain and their role in synaptic function9, 103. The analysis of TOP1 and TOP1-dependent gene expression in brain tissue showed marginal but significant overexpression of TOP1 in cerebellum compared to the cortex. Intriguingly, the expression of TOP1 and TOP1-dependent genes was higher in ASD brain compared to non-ASD individuals.

ASD has been mainly identified as a neurodevelopmental disease arising from abnormal synaptic homeostasis11, 34. Long genes (>100kb in length) are highly expressed in the brain and are involved in synaptogenesis and synaptic plasticity103. Many long genes, such as neurexins and neuroligins, are highly expressed in astrocytes and play a significant role in synapse maintenance and maturation83. Evidence of neuron-glia interactions and the concept of tripartite synapses suggest that astrocytes, like neurons, receive, process, distinguish and transmit synaptic signals60. The present study identified four novel genes, DPYD, IMMP2L, NRCAM and SND1, with decreased expression in response to TOP1 inhibition in astrocytes. The current study also mined RNA-seq data obtained from brain tissue of ASD and control individuals. Here, we demonstrate for the first time increased expression of TOP1 and of the majority of TOP1-dependent long genes in the brain of ASD individuals compared to controls. As glia play a key role in maintaining CNS homeostasis and brain development48, 51, 52, our in vitro experiments in astrocytes and in vivo analysis of ASD brain strongly support the theory that perturbed expression of TOP1 may lead to decreased or increased expression of long genes in the brain and might contribute to many pathways involved in the pathogenicity of ASD.


5 Limitations

To elucidate the effect of TOP1 inhibition on the expression of long genes, topoisomerase 1 was chemically inhibited with topotecan, which forms cleavage complexes with TOP1 (TOP1cc) when TOP1 is bound to DNA, generating irreversible breaks in DNA106. Inhibiting TOP1 through the formation of TOP1cc perturbs the transcription of many long genes in human neurons9. Another study deleted TOP1 in neurons and compared the resulting decrease in long gene expression with that caused by TOP1cc formation. It found that TOP1cc formation is more deleterious, downregulating more genes than TOP1 deletion61. For this reason, we wanted to analyse the effect of TOP1 silencing on the expression of long genes in astrocytes.

CRISPRi has proven to be a highly specific, robust, cost-effective, reproducible and non-toxic method, compared to RNAi and TALEN, for studying the functions of endogenous eukaryotic genes68. CRISPRi can repress transcription by blocking RNA pol2 and also by inducing heterochromatin formation, which reduces the accessibility of the gene of interest to transcription factors3, 67. Appropriate design and choice of sgRNAs for silencing the target genes is essential for any CRISPRi experiment68, 107. Various tools are available for designing sgRNAs with minimum off-target effects and maximum efficiency. Two TOP1 sgRNAs targeting the TOP1 promoter region were designed using the Broad Institute tool. A U87 cell line stably expressing dCas9-KRAB was transfected with the TOP1 sgRNAs. Quantitative RT-PCR performed 8 days post-transfection did not show significant knockdown of TOP1 compared to control.

There are several possible reasons for not obtaining significant knockdown of TOP1 by CRISPRi. (a) Not all sgRNAs have the same efficiency; hence, it is necessary to find the sgRNA with the optimum effect. For this reason, at least four sgRNAs should be tested to find the one with the best performance. Local chromatin structure and the binding affinity of individual sgRNAs play a significant role in the efficiency of silencing by CRISPRi3, 107. (b) Following the transient sgRNA transfections, qRT-PCR was performed at day 8 to analyse the effect of silencing on the target gene (TOP1), and we did not observe any silencing effect. This could be due to degradation of the sgRNAs before day 8. qRT-PCR could therefore be performed at an earlier time point, e.g. day 3 or day 5, when silencing may still be detectable. Alternatively, stable lentiviral integration of sgRNAs, which gives consistent sgRNA expression and long-term repression of the target gene, has been utilised by many researchers3, 63, 68. (c) Although KRAB is considered the primary transcriptional repressor, a few genes may require the recruitment of other factors for repression3. (d) Mxi1 is another repressor protein expressed in eukaryotes108 which shows robust silencing in S. cerevisiae; if KRAB is not effective, Mxi1 can be tested for gene silencing experiments3. A recent study compared different repressor domains in combination with dCas9-KRAB and concluded that dCas9-KRAB combined with MECP2 showed stronger repression than dCas9-KRAB alone109.
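Knockdown in a qRT-PCR experiment of this kind is commonly quantified with the comparative Ct (2^-ΔΔCt) method. The sketch below assumes that method, a single reference (housekeeping) gene and 100% amplification efficiency; these are assumptions for illustration and are not stated in this section, and the function and parameter names are hypothetical.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Relative target expression (treated vs control) by the 2^-ddCt method.

    Each argument is a mean Ct value: the target gene (e.g. TOP1) and a
    reference gene, measured in sgRNA-treated and control cells.
    Assumes 100% amplification efficiency for both genes.
    """
    delta_ct_treated = ct_target_treated - ct_ref_treated
    delta_ct_control = ct_target_control - ct_ref_control
    delta_delta_ct = delta_ct_treated - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


def percent_knockdown(rel_expr):
    """Convert relative expression into percent knockdown versus control."""
    return (1.0 - rel_expr) * 100.0
```

A relative expression close to 1 (percent knockdown close to 0) would correspond to the absence of detectable silencing described above; comparing values across time points such as day 3, day 5 and day 8 would show whether repression is lost over time.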


6 Future Directions

The focus of this thesis was to determine the effect of topoisomerase inhibition on long genes (>100kb in length) linked to ASD. To achieve this, astrocytes were treated with the topoisomerase inhibitor topotecan. We also wanted to test this hypothesis by silencing the TOP1 promoter using CRISPRi. However, owing to the low transfection efficiency of U87-dCas9-KRAB cells, transient expression of sgRNAs in these cells was not sufficient to silence TOP1.

Firstly, to improve transfection efficiency, other transfection agents such as FuGENE HD85, RNAiMAX or calcium phosphate nanoparticles74 could be explored for delivering sgRNAs or siRNAs into the cells for CRISPRi or RNAi experiments, respectively. Alternatively, lentiviral delivery of sgRNAs could be used to achieve stable expression and robust silencing of TOP1.

Secondly, RNA-seq analysis of ASD and control brain samples from two different datasets showed increased expression of TOP1 in ASD probands compared to controls. Experiments carried out in yeast have shown genomic instability as a result of Top1 overexpression102, highlighting an adverse effect of Top1 overexpression on the eukaryotic genome. Hence, microarray analysis of glial and neuronal cell lines overexpressing TOP1 could give useful insight into the genome-wide effects of TOP1 overexpression.

Lastly, only one published study has identified a de novo mutation in TOP1 in ASD; hence, more studies are required to demonstrate reproducibility. Memory impairment and confusion, indicative of synaptic dysfunction, have been well documented in cancer patients treated with topotecan10. In vivo experiments with mouse models engineered to carry a conditional knockout or knock-in of Top1 in the adult brain, together with analysis of behavioural phenotypes linked to ASD, would offer helpful insights into the effects of Top1.


7 References

1. Brown DD. Gene expression in eukaryotes. Science. 1981;211(4483):667-674.
2. Gibcus JH, Dekker J. The context of gene expression regulation. F1000 biology reports. 2012;4.
3. Gilbert LA, Larson MH, Morsut L, et al. CRISPR-mediated modular RNA-guided regulation of transcription in eukaryotes. Cell. 2013;154(2):442-451.
4. Mercer TR, Dinger ME, Mattick JS. Long non-coding RNAs: insights into functions. Nature reviews Genetics. 2009;10(3):155.
5. Bansal K, Yoshida H, Benoist C, et al. The transcriptional regulator Aire binds to and activates super-enhancers. Nature immunology. 2017;18(3):263.
6. Baranello L, Kouzine F, Levens D. DNA Topoisomerases: Beyond the standard role. Transcription. 2013;4(5):232-237.
7. Baranello L, Wojtowicz D, Cui K, et al. RNA polymerase II regulates topoisomerase 1 activity to favor efficient transcription. Cell. 2016;165(2):357-371.
8. Ma J, Wang MD. DNA supercoiling during transcription. Biophysical reviews. 2016;8(1):75-87.
9. King IF, Yandava CN, Mabb AM, et al. Topoisomerases facilitate transcription of long genes linked to autism. Nature. 2013;501(7465):58.
10. Mabb AM, Kullmann PH, Twomey MA, et al. Topoisomerase 1 inhibition reversibly impairs synaptic function. Proceedings of the National Academy of Sciences. 2014;111(48):17290-17295.
11. Bourgeron T. A synaptic trek to autism. Current opinion in neurobiology. 2009;19(2):231-234.
12. Wang JC. Cellular roles of DNA topoisomerases: a molecular perspective. Nature reviews Molecular cell biology. 2002;3(6):430.
13. Ashour ME, Atteya R, El-Khamisy SF. Topoisomerase-mediated chromosomal break repair: an emerging player in many games. Nature reviews Cancer. 2015;15(3):137.
14. Nitiss JL. Investigating the biological functions of DNA topoisomerases in eukaryotic cells. Biochimica et Biophysica Acta (BBA)-Gene Structure and Expression. 1998;1400(1):63-81.
15. Pommier Y, Sun Y, Huang SN, et al. Roles of eukaryotic topoisomerases in transcription, replication and genomic stability. Nat Rev Mol Cell Biol. 2016;17(11):703-721.
16. Ghilarov D, Shkundina I. DNA topoisomerases and their functions in a cell. Molecular Biology. 2012;46(1):47-57.
17. Vos SM, Tretter EM, Schmidt BH, et al. All tangled up: how cells direct, manage and exploit topoisomerase function. Nature reviews Molecular cell biology. 2011;12(12):827.
18. McKinnon PJ. Topoisomerases and the regulation of neural function. Nature reviews Neuroscience. 2016;17(11):673.
19. Lee J. Roles of cohesin and condensin in chromosome dynamics during mammalian meiosis. Journal of Reproduction and Development. 2013;59(5):431-436.
20. Bell JC, Straight AF. Condensing chromosome condensation. Nature cell biology. 2015;17(8):964.
21. Yang Y, He Y, Wang X, et al. Protein SUMOylation modification and its associations with disease. Open biology. 2017;7(10):170167.
22. Nitiss JL. DNA topoisomerase II and its growing repertoire of biological functions. Nature reviews Cancer. 2009;9(5):327.
23. Solier S, Ryan MC, Martin SE, et al. Transcription poisoning by Topoisomerase I is controlled by gene length, splice sites, and miR-142-3p. Cancer research. 2013;73(15):4830-4839.
24. Vokálová L, Durdiaková J, Ostatníková D. Topoisomerases interlink genetic network underlying autism. International Journal of Developmental Neuroscience. 2015;47:361-368.
25. Tuduri S, Crabbé L, Conti C, et al. Topoisomerase I suppresses genomic instability by preventing interference between replication and transcription. Nature cell biology. 2009;11(11):1315.
26. Ju B-G, Rosenfeld MG. A breaking strategy for topoisomerase IIβ/PARP-1-dependent regulated transcription. Cell Cycle. 2006;5(22):2557-2560.
27. Madabhushi R, Gao F, Pfenning AR, et al. Activity-induced DNA breaks govern the expression of neuronal early-response genes. Cell. 2015;161(7):1592-1605.
28. Puc J, Aggarwal AK, Rosenfeld MG. Physiological functions of programmed DNA breaks in signal-induced transcription. Nature Reviews Molecular Cell Biology. 2017.
29. Freitag CM, Staal W, Klauck SM, et al. Genetics of autistic disorders: review and clinical implications. European Child & Adolescent Psychiatry. 2010;19(3):169-178.
30. De Rubeis S, Buxbaum JD. Recent advances in the genetics of autism spectrum disorder. Current neurology and neuroscience reports. 2015;15(6):36.
31. Gokoolparsadh A, Sutton GJ, Charamko A, et al. Searching for convergent pathways in autism spectrum disorders: insights from human brain transcriptome studies. Cell Mol Life Sci. 2016;73(23):4517-4530.
32. Baio J, Wiggins L, Christensen DL, et al. Prevalence of autism spectrum disorder among children aged 8 years—Autism and Developmental Disabilities Monitoring Network, 11 Sites, United States, 2014. MMWR Surveillance Summaries. 2018;67(6):1.
33. Liu X, Han D, Somel M, et al. Disruption of an evolutionarily novel synaptic expression pattern in autism. PLoS biology. 2016;14(9):e1002558.
34. Bourgeron T. From the genetic architecture to synaptic plasticity in autism spectrum disorder. Nature Reviews Neuroscience. 2015;16(9):551.
35. Abrahams BS, Geschwind DH. Advances in autism genetics: on the threshold of a new neurobiology. Nature reviews Genetics. 2008;9(5):341.
36. Miles JH. Autism spectrum disorders—a genetics review. Genetics in Medicine. 2011;13(4):278-294.
37. Chen JA, Peñagarikano O, Belgard TG, et al. The emerging picture of autism spectrum disorder: genetics and pathology. Annual Review of Pathology: Mechanisms of Disease. 2015;10:111-144.
38. Modabbernia A, Velthorst E, Reichenberg A. Environmental risk factors for autism: an evidence-based review of systematic reviews and meta-analyses. Molecular autism. 2017;8(1):13.
39. Dufour-Rainfray D, Vourc’h P, Tourlet S, et al. Fetal exposure to teratogens: evidence of genes involved in autism. Neuroscience & Biobehavioral Reviews. 2011;35(5):1254-1265.
40. Mandell DS, Novak MM, Zubritsky CD. Factors associated with age of diagnosis among children with autism spectrum disorders. Pediatrics. 2005;116(6):1480-1486.
41. Kong A, Frigge ML, Masson G, et al. Rate of de novo mutations and the importance of father's age to disease risk. Nature. 2012;488(7412):471-475.
42. Buxbaum JD. Multiple rare variants in the etiology of autism spectrum disorders. Dialogues in clinical neuroscience. 2009;11(1):35.
43. Shinawi M, Schaaf CP, Bhatt SS, et al. A small recurrent deletion within 15q13.3 is associated with a range of neurodevelopmental phenotypes. Nature genetics. 2009;41(12):1269-1271.
44. Bacchelli E, Battaglia A, Cameli C, et al. Analysis of CHRNA7 rare variants in autism spectrum disorder susceptibility. American journal of medical genetics Part A. 2015;167(4):715-723.
45. Kim TK, Eberwine JH. Mammalian cell transfection: the present and the future. Analytical and bioanalytical chemistry. 2010;397(8):3173-3178.
46. MacDonald WA. Epigenetic mechanisms of genomic imprinting: common themes in the regulation of imprinted regions in mammals, plants, and insects. Genetics research international. 2012;2012.
47. Williams SM, An JY, Edson J, et al. An integrative analysis of non-coding regulatory DNA variations associated with autism spectrum disorder. Molecular psychiatry. 2018.
48. Allen NJ. Astrocyte regulation of synaptic behavior. Annual review of cell and developmental biology. 2014;30:439-463.
49. Kettenmann H, Verkhratsky A. Neuroglia: the 150 years after. Trends in neurosciences. 2008;31(12):653-659.
50. Allen NJ, Eroglu C. Cell biology of astrocyte-synapse interactions. Neuron. 2017;96(3):697-708.
51. Clarke LE, Barres BA. Emerging roles of astrocytes in neural circuit development. Nature Reviews Neuroscience. 2013;14(5):311.
52. Chung W-S, Allen NJ, Eroglu C. Astrocytes control synapse formation, function, and elimination. Cold Spring Harbor perspectives in biology. 2015;7(9):a020370.
53. Haim LB, Rowitch DH. Functional diversity of astrocytes in neural circuit regulation. Nature Reviews Neuroscience. 2017;18(1):31.
54. Molofsky AV, Krenick R, Ullian E, et al. Astrocytes and disease: a neurodevelopmental perspective. Genes & development. 2012;26(9):891-907.
55. Son M, Diamond B, Santiago-Schwarz F. Fundamental role of C1q in autoimmunity and inflammation. Immunologic research. 2015;63(1-3):101-106.
56. Chung W-S, Clarke LE, Wang GX, et al. Astrocytes mediate synapse elimination through MEGF10 and MERTK pathways. Nature. 2013;504(7480):394.
57. Perea G, Navarrete M, Araque A. Tripartite synapses: astrocytes process and control synaptic information. Trends in neurosciences. 2009;32(8):421-431.
58. Fields RD, Woo DH, Basser PJ. Glial regulation of the neuronal connectome through local and long-distant communication. Neuron. 2015;86(2):374-386.
59. Perea G, Sur M, Araque A. Neuron-glia networks: integral gear of brain function. Frontiers in cellular neuroscience. 2014;8:378.
60. Eroglu C, Barres BA. Regulation of synaptic connectivity by glia. Nature. 2010;468(7321):223.
61. Mabb AM, Simon JM, King IF, et al. Topoisomerase 1 regulates gene expression in neurons through cleavage complex-dependent and-independent mechanisms. PloS one. 2016;11(5):e0156439.
62. Gokoolparsadh A, Fang Z, Braidy N, et al. Topoisomerase I inhibition leads to length-dependent gene expression changes in human primary astrocytes. Genomics data. 2017;11:113-115.
63. Gilbert LA, Horlbeck MA, Adamson B, et al. Genome-scale CRISPR-mediated control of gene repression and activation. Cell. 2014;159(3):647-661.
64. Sander JD, Joung JK. CRISPR-Cas systems for editing, regulating and targeting genomes. Nature biotechnology. 2014;32(4):347.
65. Lupo A, Cesaro E, Montano G, et al. KRAB-zinc finger proteins: a repressor family displaying multiple biological functions. Current genomics. 2013;14(4):268-278.
66. Urrutia R. KRAB-containing zinc-finger repressor proteins. Genome biology. 2003;4(10):231.
67. Groner AC, Tschopp P, Challet L, et al. The Kruppel-associated box repressor domain can induce reversible heterochromatization of a mouse locus in vivo. Journal of Biological Chemistry. 2012:jbc.M112.350884.
68. Larson MH, Gilbert LA, Wang X, et al. CRISPR interference (CRISPRi) for sequence-specific control of gene expression. Nature protocols. 2013;8(11):2180.
69. Allen M, Bjerke M, Edlund H, et al. Origin of the U87MG glioma cell line: Good news and bad news. Science translational medicine. 2016;8(354):354re353-354re353.
70. Dolgin E. Venerable brain-cancer cell line faces identity crisis. Nature news. 2016;537(7619):149.
71. Doench JG, Fusi N, Sullender M, et al. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nature biotechnology. 2016;34(2):184.
72. Abrahams BS, Arking DE, Campbell DB, et al. SFARI Gene 2.0: a community-driven knowledgebase for the autism spectrum disorders (ASDs). Molecular autism. 2013;4(1):36.
73. Voineagu I, Wang X, Johnston P, et al. Transcriptomic analysis of autistic brain reveals convergent molecular pathology. Nature. 2011;474(7351):380.
74. Neuhaus B, Tosun B, Rotan O, et al. Nanoparticles as transfection agents: a comprehensive study with ten different cell lines. RSC Advances. 2016;6(22):18102-18112.
75. Gokoolparsadh A, Fang Z, Braidy N, et al. Transcriptional response to mitochondrial protease IMMP2L knockdown in human primary astrocytes. Biochemical and biophysical research communications. 2017;482(4):1252-1258.
76. Ma Y, Mehta SL, Lu B, et al. Deficiency in the inner mitochondrial membrane peptidase 2-like (Immp2l) gene increases ischemic brain damage and impairs mitochondrial function. Neurobiology of disease. 2011;44(3):270-276.
77. Bertelsen B, Melchior L, Jensen LR, et al. Intragenic deletions affecting two alternative transcripts of the IMMP2L gene in patients with Tourette syndrome. European Journal of Human Genetics. 2014;22(11):1283.
78. Rossignol D, Frye R. Mitochondrial dysfunction in autism spectrum disorders: a systematic review and meta-analysis. Molecular psychiatry. 2012;17(3):290.
79. Betancur C, Sakurai T, Buxbaum JD. The emerging role of synaptic cell-adhesion pathways in the pathogenesis of autism spectrum disorders. Trends in neurosciences. 2009;32(7):402-412.
80. Cao X, Tabuchi K. Functions of synapse adhesion molecules neurexin/neuroligins and neurodevelopmental disorders. Neuroscience research. 2017;116:3-9.
81. Krueger DD, Tuffy LP, Papadopoulos T, et al. The role of neurexins and neuroligins in the formation, maturation, and function of vertebrate synapses. Current opinion in neurobiology. 2012;22(3):412-422.
82. Chubykin AA, Atasoy D, Etherton MR, et al. Activity-dependent validation of excitatory versus inhibitory synapses by neuroligin-1 versus neuroligin-2. Neuron. 2007;54(6):919-931.
83. Stogsdill JA, Ramirez J, Liu D, et al. Astrocytic neuroligins control astrocyte morphogenesis and synaptogenesis. Nature. 2017;551(7679):192.
84. Patel S, Roncaglia P, Lovering RC. Using Gene Ontology to describe the role of the neurexin-neuroligin-SHANK complex in human, mouse and rat and its relevance to autism. BMC bioinformatics. 2015;16(1):186.
85. Hunt MA, Currie MJ, Robinson BA, et al. Optimizing transfection of primary human umbilical vein endothelial cells using commercially available chemical transfection reagents. Journal of biomolecular techniques: JBT. 2010;21(2):66.
86. Parikshak NN, Swarup V, Belgard TG, et al. Genome-wide changes in lncRNA, splicing, and regional gene expression patterns in autism. Nature. 2016;540(7633):423-427.
87. Morris KV, Mattick JS. The rise of regulatory RNA. Nature reviews Genetics. 2014;15(6):423.
88. Qureshi IA, Mattick JS, Mehler MF. Long non-coding RNAs in nervous system function and disease. Brain research. 2010;1338:20-35.
89. Neale BM, Kou Y, Liu L, et al. Patterns and rates of exonic de novo mutations in autism spectrum disorders. Nature. 2012;485:242.
90. Ma D, Salyakina D, Jaworski JM, et al. A genome-wide association study of autism reveals a common novel risk locus at 5p14.1. Annals of human genetics. 2009;73(3):263-273.
91. Gonzales ML, LaSalle JM. The role of MeCP2 in brain development and neurodevelopmental disorders. Current psychiatry reports. 2010;12(2):127-134.
92. Guy J, Cheval H, Selfridge J, et al. The role of MeCP2 in the brain. Annual review of cell and developmental biology. 2011;27:631-652.
93. Lim JP, Gosavi P, Mintern JD, et al. Sorting nexin 5 (SNX5) selectively regulates dorsal ruffle-mediated macropinocytosis in primary macrophages. J Cell Sci. 2015:jcs.174359.
94. Gorman JA, Hundhausen C, Errett JS, et al. The A946T variant of the RNA sensor IFIH1 mediates an interferon program that limits viral infection but increases the risk for autoimmunity. Nature immunology. 2017;18(7):744.
95. Vourc'h P, Martin I, Bonnet-Brilhault F, et al. Mutation screening and association study of the UBE2H gene on chromosome 7q32 in autistic disorder. Psychiatric genetics. 2003;13(4):221-225.
96. Macintyre G, Alford T, Xiong L, et al. Association of NPAS3 exonic variation with schizophrenia. Schizophrenia research. 2010;120(1-3):143-149.
97. Miller C, Wang L, Ostergaard E, et al. The interplay between SUCLA2, SUCLG2, and mitochondrial DNA depletion. Biochimica et Biophysica Acta (BBA)-Molecular Basis of Disease. 2011;1812(5):625-629.
98. Bucan M, Abrahams BS, Wang K, et al. Genome-wide analyses of exonic copy number variants in a family-based study point to novel autism susceptibility genes. PLoS genetics. 2009;5(6):e1000536.
99. Kumar RA, Sudi J, Babatz TD, et al. A de novo 1p34.2 microdeletion identifies the synaptic vesicle gene RIMS3 as a novel candidate for autism. Journal of medical genetics. 2009.
100. Hutcheson HB, Olson LM, Bradford Y, et al. Examination of NRCAM, LRRN3, KIAA0716, and LAMB1 as autism candidate genes. BMC medical genetics. 2004;5(1):12.
101. Kim YJ, Park JK, Kang WS, et al. LAMB1 polymorphism is associated with autism symptom severity in Korean autism spectrum disorder patients. Nordic journal of psychiatry. 2015;69(8):594-598.
102. Sloan R, Huang S-yN, Pommier Y, et al. Effects of camptothecin or TOP1 overexpression on genetic stability in Saccharomyces cerevisiae. DNA repair. 2017;59:69-75.
103. Zylka MJ, Simon JM, Philpot BD. Gene length matters in neurons. Neuron. 2015;86(2):353-355.
104. Greenblatt EJ, Spradling AC. Fragile X mental retardation 1 gene enhances the translation of large autism-related proteins. Science. 2018;361(6403):709-712.
105. Iossifov I, Ronemus M, Levy D, et al. De novo gene disruptions in children on the autistic spectrum. Neuron. 2012;74(2):285-299.
106. Li M, Liu Y. Topoisomerase I in human disease pathogenesis and treatments. Genomics, proteomics & bioinformatics. 2016;14(3):166-171.
107. Heman-Ackah SM, Bassett AR, Wood MJA. Precision modulation of neurodegenerative disease-related gene expression in human iPSC-derived neurons. Scientific reports. 2016;6:28420.
108. Manni I, Tunici P, Cirenei N, et al. Mxi1 inhibits the proliferation of U87 glioma cells through down-regulation of cyclin B1 gene expression. British journal of cancer. 2002;86(3):477.
109. Yeo NC, Chavez A, Lance-Byrne A, et al. An enhanced CRISPR repressor for targeted mammalian gene regulation. Nature methods. 2018:1.
