Discovery of cancer splicing and associated auto-regulatory networks through cross-species circadian analysis

A dissertation submitted to the Graduate School of the University of Cincinnati in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

In the Department of Pharmacology & Systems Physiology of the College of Medicine

By

Krithika Ramasamy

University of Cincinnati

Fall 2019

Thesis Advisor and Committee Chair: Dr. Nathan Salomonis

ABSTRACT

Disruption of circadian rhythm can lead to serious sleep disorders and predispose individuals to a number of life-threatening diseases, including cancer. Circadian splicing adds a regulatory layer to the timekeeping mechanism in plants, flies and mammals. The circadian regulation of alternative splicing, including that of core clock genes themselves, is speculated to be a central mediator of clock function; however, no comprehensive analyses of this mechanism exist to date. To develop an improved understanding of circadian splicing in mammals, I describe a series of comprehensive analyses of circadian splicing within and across diverse healthy mouse, baboon and human tissues, spanning thousands of samples. These analyses confirm that conserved tissue-specific and tissue-shared circadian splicing events (CSEs) are frequent and can be identified from multiple study designs; we also recommend workflows for accurate circadian splicing analysis. Our analysis identifies a higher number of tissue-specific CSEs than of tissue-specific circadian gene expression events. At the transcriptional level, temperature-sensitive and other circadian splicing factors (SFs) are also rhythmic in the majority of tissues. Cross-tissue CSEs frequently contain binding sites for these circadian SFs, which likely target specific CSEs and regulate splicing in peripheral tissues. Notably, these evolutionarily conserved and pan-tissue CSEs frequently affect previously defined cancer regulators and RNA-binding proteins previously implicated in thermoregulation and splicing autoregulation. I demonstrate the importance of these circadian findings specifically in lung cancer, suggesting the existence of novel putative chronotherapeutic targets. To support the broader research community, we have developed an easy-to-use online web portal to explore and compare these results across species. As such, these data have the potential to highlight intriguing new roles for splicing regulation in normal circadian biology.


ACKNOWLEDGEMENTS

I am grateful to everyone who helped me in big and small ways throughout my graduate studies. I would like to express my deepest gratitude to my mentor and academic advisor, Dr. Nathan Salomonis, without whose constant support and guidance this work would not have been possible. He has always been there and ready to go the extra mile for his students. His encouragement and constructive advice helped me gain a holistic understanding of computational approaches for big data and stay on track.

I am thankful to my advisor Dr. Christian I. Hong for his insights and technical guidance throughout the development of this project. I would also like to thank the other members of my thesis committee, Dr. Jarek Meller, Dr. Tongli Zhang and Dr. Yana Zavros, for their guidance, insights and valuable input throughout my graduate work. I would like to thank Dr. Steven Kleene for his oversight and commitment to the success of students in the Systems Biology program. I sincerely appreciate all the effort and help that Mrs. Jeannie Cummins has provided since my first day in this program.

I thank Dr. Meenakshi Venkatasubramanian for her time in helping me with programming and genomic data analysis, and Andrew Rosselot for his contribution to experimental validation. I thank the other members of my lab: Kahish Chetal for developing analysis pipelines, Stuart Hay for building visualization platforms and Aishwarya Kulkarni for her efforts in motif-based analysis. I am grateful to other chronobiology researchers from the Hogenesch lab, Dr. Marc Ruben and Dr. Gang Wu, for their contributions and advice in shaping specific analyses.

I sincerely appreciate the Department of Pharmacology and Systems Physiology, UC, and the Division of Biomedical Informatics, CCHMC, for the opportunity to conduct my doctoral thesis research and for all the exposure that has led me to this point.

Finally, I would like to thank my husband Rahul K Siva for being generous, understanding and my pillar of strength; without him, my dream of this doctorate would not have come true. I am fortunate to have him and my daughter Isha Karthik, who made this whole experience a memorable one. I am blessed to have my parents-in-law Kalavathy Venkatachalam and Sivagaminathan Kanakasundaram and my mom Jeya Subramanian. I express my heartfelt gratitude for their abundant love and support.

TABLE OF CONTENTS

Chapter 1: Introduction

1.1: Pioneering studies defining the nature of circadian clock

1.2: Physiology of circadian rhythm

1.3: Pathology

1.4: Rationale behind this study

1.5: Objectives of the Thesis

Chapter 2: Discovery of cancer splicing and associated auto-regulatory networks through cross-species circadian analysis

2.1: ABSTRACT

2.2: INTRODUCTION

2.3: MATERIALS AND METHODS

2.3.1: Evaluation Datasets

2.3.2: Algorithms and tools used for the analysis

2.3.3: Transcriptomic Analyses

2.3.4: Periodicity Analysis

2.3.5: Conserved Splicing Event Analysis

2.4: RESULTS

2.4.1: Reliable detection of circadian splicing from time-course RNA-Seq data

2.4.2: Distinct tissue and phase-dependent impacts of circadian splicing

2.4.3: Circadian splicing regulates distinct pathways from circadian gene expression

2.4.4: Conserved circadian splicing from mouse to primates

2.4.5: Circadian splicing mediates splicing factor autoregulation and is a hallmark of lung cancer

2.4.6: Interactive navigation and exploration of cross-tissue and cross-species circadian splicing

2.5: DISCUSSION

Chapter 3: Discussion

REFERENCES CITED

FIGURES

Figure 1.1. Core molecular machinery in Neurospora crassa.

Figure 1.2. Evolutionary similarity in the molecular structure of core clock in fly and mammals.

Figure 1.3. Synchronization of central clock and peripheral clock.

Figure 1.4. Implications of circadian rhythm and alternative splicing in disease.

Figure 1.5. Schematic of circadian alternative splicing.

Figure 2.1. MultiPath-PSI algorithm for estimation of alternative splicing.

Figure 2.2. Circadian splicing analysis within and across diverse mammalian models.

Figure 2.3. Circadian splicing impacts distinct tissues and augments circadian gene expression.

Figure 2.4. Circadian splicing selectively impacts spliceosome and cancer regulators.

Figure 2.5. Circadian splicing is conserved across species and tissues.

Figure 2.6. Human circadian splicing is associated with autoregulation of splicing factors and cancer splicing variation.

Figure S1. Detection of circadian splicing event predictions from diverse rhythmicity algorithms and technological platforms.

Figure S2. Rhythmic splicing of core circadian genes.

Figure S3. Circadian splicing profile of RNA-binding proteins in mouse cerebellum.

Figure S4. Tissue and circadian splicing events predicted from diverse baboon organ systems.

TABLES

Table 1. The most common cross-tissue murine CSEs.

Table 2. Conserved cross-tissue transcriptionally regulated circadian splicing factors.

Table 3. Top-ranked conserved cross-tissue murine-baboon CSEs.

Table S1. Circadian regulators and their splicing events in the tissues in which they are expressed.

Table S2. Cross-tissue CSEs in 5 or more mouse tissues.

Table S3. RBPs targeting cross-tissue CSEs from POSTAR 2.0.

Table S5. Top 50 CSEs conserved in mouse and baboon (tissue matched to same or related tissues).

Table S7. CSEs conserved between mouse and human.

Table S8. Literature evidence for circadian spliced genes in multiple cancer progression and metastatic pathways.

Chapter 1: Introduction

1.1: Pioneering studies defining the nature of circadian clock

Almost every organism, including animals, birds, plants and insects, is sensitive to diurnal variations in light intensity and other environmental factors. In response to these variations, organisms switch between states of activity and rest, otherwise known as the sleep-wake cycle. Because this change in physiological behavior is coordinated with the time of day, it is referred to as circadian rhythm (CR) (circa = around, diem = day). Circadian rhythm, or the circadian clock, is an internal biological timekeeping mechanism that synchronizes rhythmic behavior with the day-night cycle. The circadian clock consists of three integral components: 1) Input, 2) Oscillator and 3) Output [1].

1. Input - Signals from the environment, such as light, temperature and feeding, that coordinate and reset the clock. Because the free-running period of the clock is not always exactly 24 h, resetting of the clock is essential.

2. Oscillator - The central molecular clock, which perceives the input signals and adjusts the behavior of the organism to the environment.

3. Output - Downstream target genes altered by the oscillator to initiate the required physiological changes.
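The need for resetting can be made concrete with simple arithmetic. The sketch below shows how far an unentrained clock drifts relative to the 24-h solar day. It assumes a constant free-running period (tau) and ignores any resetting by input cues. The function names and the 24.5-h example value are illustrative and are not taken from a specific study.

```python
def phase_drift_hours(tau_h: float, days: int, day_length_h: float = 24.0) -> float:
    """Cumulative drift (h) of a free-running clock relative to the external day.

    Positive values: the internal clock runs late (tau > day length).
    Negative values: the internal clock runs early (tau < day length).
    """
    return (tau_h - day_length_h) * days


def internal_time_of_day(tau_h: float, days: int, day_length_h: float = 24.0) -> float:
    """External clock hour (0-24) at which an internal event anchored at hour 0 recurs."""
    return phase_drift_hours(tau_h, days, day_length_h) % day_length_h


# Hypothetical clock with a 24.5-h free-running period that is never reset.
print(phase_drift_hours(24.5, 7))  # 3.5
```

A 0.5-h daily error adds up to 3.5 h after one week. Without daily input cues to reset the oscillator, the clock would progressively lose alignment with the environment.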

Although the role and function of a central circadian clock have been studied extensively for over 60 years, one of the earliest reports of a biological rhythm dates back to 1729. In that year de Mairan, a geophysicist and early chronobiologist, observed rhythmic leaf movement in a legume plant. A striking observation of his work was that the opening and closing of leaves in Mimosa pudica was not driven by external light conditions but by one or more intrinsic mechanisms. The field recognized de Mairan's work only a century later [2]. Since then, the field has evolved into a modern understanding of circadian biology that highlights key biochemical and molecular mechanisms [3]. Focused research in model organisms such as Neurospora crassa, Arabidopsis thaliana, Drosophila and Mus musculus has been central to this improved understanding of the clock [4-6].

Some of the most informative molecular mechanisms underlying the biological clock were decoded by in-depth mutagenesis screens. In these screens, mutants with altered rhythms or arrhythmic phenotypes were identified and associated with specific genomic loci [7, 8]. Simple model organisms like Neurospora crassa (N.c.) have been used extensively to study diurnal variations since the early 1960s [4]. N.c. is a filamentous fungus whose conidiation can be entrained by variations in light intensity. N.c. undergoes robust 24-h conidiation cycles when exposed to a light-dark regimen but conidiates continuously under constant light [9]. As such, the N.c. circadian rhythm has been well characterized both biochemically and genetically. In 1973, mutations were identified in the gene frequency (frq) [10] that shorten or lengthen the circadian period with no major effect on responses to light or temperature variations. Further studies [11] showed that frq is crucial for clock organization and is itself a core component of the N.c. molecular clock.


Lee et al., (2000) Science 289:107-110

Figure 1.1. Core molecular machinery in Neurospora crassa. Transcriptional and translational feedback loops of the clock repressor FREQUENCY (FRQ) and the activators WHITE COLLAR-1 and -2 (WC-1, WC-2) [12] (Permission to reuse the figure was obtained from the corresponding author).

In the 1980s, an important clock activator in N.c. was identified: the White Collar complex (WCC), formed by WC-1 and WC-2 [13]. The discovery of the WCC led several chronobiologists to decode the core circadian molecular regulatory network in N.c. [12] (Figure 1.1).

WCC and FRQ form interlocked feedback loops, so the abundance of each is regulated by the other during the light/dark cycle. Oscillations in the expression of FRQ and WCC modulate the expression of the clock output genes ccg-1 and ccg-2 [14]. This work made it evident that the central clock is governed by multiple transcriptional and translational feedback loops regulated by both activators and repressors. Notably, this feature is highly conserved from cyanobacteria to plants (Arabidopsis thaliana) to mouse and human [15].

In addition to N.c., one of the most extensively studied circadian model organisms is Drosophila. In the 1950s, Colin Pittendrigh studied the eclosion rhythm of Drosophila pseudoobscura, which is entrained by light cycles and by changes in temperature. This work laid the foundation for modern circadian biology research [16]. Until the early 1980s, various physiological phenomena regulated by CR in Drosophila had been well studied, but the molecular pathways mediating the central clock remained unclear. Advances in recombinant DNA technology then made it possible to clone, in 1984, the first clock gene from a multicellular organism, period (per) [17, 18]. The work was carried out by three researchers, Jeffrey C. Hall, Michael Rosbash and Michael Young, who later won the Nobel Prize in 2017 for this milestone. Rosbash and Hall went on to determine the function of per [19], and Young discovered a second clock gene, timeless (tim) [20]. These discoveries led to the molecular characterization of the circadian clock and its regulation via multiple transcriptional and translational feedback loops, first in flies and later in mammals. PER and TIM act as repressors of the CLOCK-CYCLE dimer, an E-box-binding transcription factor that activates per and tim [21].

Initial findings in Neurospora and Drosophila drove the circadian research field to identify mammalian homologs of core clock genes. In 1994, Dr. Takahashi and his colleagues discovered the Clock gene in mouse via mutagenesis screening [22]. Following that, several other genes of the mammalian clock were identified, acting as transcriptional activators (BMAL1) or repressors (CRY1, CRY2 and PER1, PER2, PER3). In addition, Sun et al. (1997) [23] and Tei et al. (1997) [24] characterized multiple transcripts of the mouse Per (mPer) genes. These transcripts show sequence similarity to Drosophila per, demonstrating evolutionary conservation of the molecular clock (Figure 1.2).

Rong-Chi Huang Biomed J. 2018 Feb; 41(1): 5–8

Figure 1.2. Evolutionary similarity in the molecular structure of the core clock in fly and mammals. In the fly clock (left), the repressors PER and TIMELESS accumulate and repress the activity of the E-box-binding transcriptional activators CLOCK and CYCLE. The mammalian PER1, 2, 3 and CRY1, 2 proteins (right) are the counterparts of fly PER and TIM. They are regulated in a similar fashion through transcriptional and translational feedback loops with BMAL1-CLOCK [3] (Permission to reuse the figure was obtained from the corresponding author).

1.2: Physiology of circadian rhythm

Three important features of the circadian clock are:

1. It is endogenous, meaning that it does not require external time cues to initiate the cyclical process.

2. It can be synchronized, or reset, to the environment. This process, known as entrainment, is essential for alignment with local environmental time.

3. It maintains its circadian period at different temperatures within the physiological range, a property referred to as temperature compensation [25, 26].
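Temperature compensation is usually quantified with the temperature coefficient Q10, the factor by which a rate changes for every 10 °C increase:

Q10 = (R2 / R1) ^ (10 / (T2 - T1))

Most enzymatic reactions have a Q10 of roughly 2 to 3, whereas circadian periods typically give values close to 1. The minimal sketch below computes Q10 from free-running periods measured at two temperatures. It assumes the clock's rate is 1/period, so R2/R1 = P1/P2. The function name and the example values are hypothetical.

```python
def q10_from_periods(period1_h: float, temp1_c: float,
                     period2_h: float, temp2_c: float) -> float:
    """Q10 of the clock rate, treating rate as 1/period.

    Q10 = (R2 / R1) ** (10 / (T2 - T1)), where R2 / R1 = P1 / P2.
    """
    if temp2_c == temp1_c:
        raise ValueError("the two temperatures must differ")
    rate_ratio = period1_h / period2_h  # faster clock -> shorter period
    return rate_ratio ** (10.0 / (temp2_c - temp1_c))


# Hypothetical measurements at 27 and 37 degC.
print(q10_from_periods(24.0, 27.0, 24.0, 37.0))  # 1.0 (fully compensated)
print(q10_from_periods(24.0, 27.0, 12.0, 37.0))  # 2.0 (typical uncompensated reaction)
```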

Figure 1.3. Synchronization of central clock and peripheral clock. The SCN entrains clocks in peripheral tissues via multiple routes, including temperature, hormones, food and other mechanisms [27]. (Permission to reuse the figure was obtained from the corresponding author).

The central circadian clock in mammals is located in the suprachiasmatic nucleus (SCN) of the hypothalamus [28]. The SCN receives light signals from the environment via retinal ganglion cells, which relay information from the external environment to various regions of the brain. The SCN is in turn thought to coordinate peripheral clocks in other organs, primarily through hormonal control [29] (Figure 1.3). The significance of the SCN in governing the central clock became clear when an SCN transplant study in hamsters linked SCN function to rhythmic behavior [30]. Further studies have shown that seven crucial core clock genes are rhythmically expressed in the SCN. The SCN clock is robust to dramatic changes in the environment. For example, acute variations in ambient temperature within the physiological range do not affect the free-running period of the clock, a property referred to as temperature compensation. This characteristic is essential for maintaining homeostasis against environmental noise.

While the SCN is essential for mediating central oscillations in the system, circadian genes were reported to retain rhythmicity in cultured rat fibroblasts. These reports raised the possibility that cell-autonomous clocks are present in all cells of peripheral tissues [31]. The development of a PER2::LUCIFERASE (PER2::LUC) reporter system in mouse was extremely useful in determining the role of the SCN in peripheral clock function [32]. The creation of an SCN-specific clock mutant mouse demonstrated that peripheral tissues can sustain rhythms on their own. This suggests that peripheral synchronizers can align peripheral clocks independently of the central clock.
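Reporters such as PER2::LUC produce continuous bioluminescence traces from which the free-running period of a tissue or cell culture can be estimated. The minimal sketch below assumes evenly sampled data. It removes a linear baseline, then selects the lag within a circadian search window that maximizes the Pearson autocorrelation. Autocorrelation is used here only for illustration; dedicated approaches such as damped-sine fitting or periodograms are common in practice. The function name, the 16-32 h window and the synthetic trace are assumptions.

```python
import numpy as np


def estimate_period_autocorr(times_h, signal, min_period_h=16.0, max_period_h=32.0):
    """Estimate the period (h) of an evenly sampled trace by autocorrelation.

    Removes a linear baseline, then returns the lag in [min_period_h, max_period_h]
    at which the Pearson correlation between the trace and its shifted copy peaks.
    """
    t = np.asarray(times_h, dtype=float)
    x = np.asarray(signal, dtype=float)
    dt = t[1] - t[0]  # assumes a constant sampling interval
    slope, intercept = np.polyfit(t, x, 1)
    x = x - (slope * t + intercept)  # detrend the baseline drift
    lo = max(1, int(round(min_period_h / dt)))
    hi = min(len(x) - 2, int(round(max_period_h / dt)))
    best_lag, best_r = lo, -np.inf
    for lag in range(lo, hi + 1):
        r = np.corrcoef(x[:-lag], x[lag:])[0, 1]
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag * dt


# Hypothetical 5-day trace sampled every 10 min: a 25-h rhythm on a rising baseline.
t = np.arange(720) / 6.0
trace = 100.0 + 3.0 * t + np.cos(2 * np.pi * t / 25.0)
print(f"{estimate_period_autocorr(t, trace):.1f} h")  # 25.0 h
```

The resolution of this estimate is limited by the sampling interval. Real traces, which damp and carry noise, would usually be fitted with a model instead.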

Substantial work has been dedicated to identifying core molecular clock components. Nevertheless, molecular screens alone are inadequate to understand the broader role of specific molecules and the physiological regulation of the clock in mammals. In the early 2000s, several studies addressed this knowledge gap by profiling tissues with DNA microarrays [33]. Initial transcriptome profiling of peripheral tissues included the analysis of mouse skeletal muscle [34]. This study found that ~4% of the transcriptome is modulated by CR and that muscle-specific genes undergo rhythmic regulation. Human transcriptome studies produced similar findings and showed that exercise can entrain the clock in skeletal muscle [35].

Comparative studies of cycling transcripts in mouse liver and SCN [36] estimated that up to 10% of all transcripts are rhythmic. These studies consistently find that cellular clocks are synchronized to the central pacemaker, yet still undergo tissue-specific circadian transcription with differences in phase. Building on these pioneering studies, others have reported hundreds of tissue-specific circadian transcripts in tissues such as heart and skin [37, 38].
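Profiling studies use periodicity-detection algorithms to decide which transcripts are rhythmic. As a minimal illustration, the sketch below fits a single-component cosinor model with a fixed 24-h period by ordinary least squares. Cosinor regression is a stand-in chosen for simplicity and is not necessarily the method used in the studies cited above. The model y(t) = M + A cos(2πt/τ - φ) is linearized as M + β cos(ωt) + γ sin(ωt), where β = A cos φ and γ = A sin φ. The function name and the example time course are hypothetical.

```python
import numpy as np


def cosinor_fit(times_h, values, period_h=24.0):
    """Fit y = M + A*cos(2*pi*t/period - phi) by least squares.

    Returns (mesor M, amplitude A, acrophase in hours, R^2).
    """
    t = np.asarray(times_h, dtype=float)
    y = np.asarray(values, dtype=float)
    w = 2.0 * np.pi / period_h
    # Linearized design matrix: intercept, cos(wt), sin(wt)
    X = np.column_stack([np.ones_like(t), np.cos(w * t), np.sin(w * t)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    mesor, beta, gamma = coef
    amplitude = float(np.hypot(beta, gamma))
    # beta = A*cos(phi), gamma = A*sin(phi); the peak occurs at t = phi / w
    acrophase_h = float((np.arctan2(gamma, beta) / w) % period_h)
    residuals = y - X @ coef
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - float(np.sum(residuals ** 2)) / ss_tot if ss_tot > 0 else 0.0
    return float(mesor), amplitude, acrophase_h, r2


# Hypothetical transcript sampled every 2 h for 48 h, peaking at 6 h.
t = np.arange(0, 48, 2)
expr = 5.0 + 2.0 * np.cos(2 * np.pi * (t - 6.0) / 24.0)
print(cosinor_fit(t, expr))
```

A genome-wide rhythmicity call would also need a significance test for the cosine terms, corrected for multiple testing across genes.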

While DNA microarrays provide reasonable detection of transcription for known genes (depending on the array design), the technology has several limitations for precisely quantifying the circadian transcriptome. For example, some array platforms enable the detection of alternative splicing (AS), but their designs can limit such analyses. Up to 30% of splicing events can be novel, including intron-retention events, and microarray-based strategies do not detect them [39]. Recent advances in high-throughput sequencing address these pitfalls and have refined our understanding of the regulation of clock-controlled genes. In 2012, Hughes et al. [40] sequenced the Drosophila brain transcriptome in wild-type and period-null flies. In addition to rhythmic transcripts already reported in earlier studies, they identified novel rhythmic genes and splicing events. In the same study, the rhythmic behavior of hundreds of circadian genes changed when flies were exposed to different light regimens (12-h light-dark cycles (LD) or constant darkness (DD)). Importantly, the use of per-null mutants allows such analyses to reveal the extent of post-transcriptional regulation influenced by CR. RNA-Seq of the liver circadian transcriptome further revealed miRNA regulation of circadian transcript expression [41]. Up to 15% of mRNA transcripts were found to be rhythmic in the mouse liver. A striking observation of this study is that knocking out specific miRNAs alters the abundance, stability and translation efficiency of these cyclical transcripts [41]. Next-generation sequencing has also significantly advanced the study of AS. Circadian AS analysis in the plant Arabidopsis thaliana has demonstrated that several circadian transcripts, including core circadian genes such as CCA1, RVE2 and LHY, undergo extensive AS. Intron retention frequently introduces premature termination codons (PTCs) that can trigger nonsense-mediated decay (NMD) in plants [42].
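Intron-retention isoforms are often screened computationally as NMD candidates. The screen locates the first in-frame stop codon and compares its position with the last exon-exon junction. The sketch below applies the exon-junction ("50-nucleotide") rule described for mammalian cells. Plant NMD also responds to other features, such as long 3′ UTRs, so this is a simplification. The function names, threshold default and toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(mrna: str, cds_start: int):
    """0-based mRNA position of the first in-frame stop codon, or None."""
    for i in range(cds_start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return i
    return None


def is_nmd_candidate(mrna: str, cds_start: int, exon_lengths,
                     min_distance_nt: int = 50) -> bool:
    """Exon-junction rule: stop codon more than min_distance_nt upstream of the last junction."""
    if len(exon_lengths) < 2:
        return False  # single-exon transcripts have no exon-exon junction
    stop = first_in_frame_stop(mrna.upper(), cds_start)
    if stop is None:
        return False
    last_junction = sum(exon_lengths[:-1])  # mRNA coordinate where the last exon begins
    return last_junction - stop > min_distance_nt


# Toy transcript: start codon, 10 codons, stop codon at position 33, then 100 nt.
toy = "ATG" + "AAA" * 10 + "TAA" + "C" * 100
print(is_nmd_candidate(toy, 0, [120, 16]))  # True: stop lies 87 nt upstream of the junction
```

Retaining an intron typically shifts or introduces an in-frame stop codon upstream of downstream junctions, which is why such isoforms often satisfy this rule.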

Such studies shed light on the post-transcriptional regulation of the clock. However, the extent of the interaction between transcriptional and post-transcriptional control remained unknown until nascent-RNA sequencing of mouse liver was performed [43]. An intriguing observation of this study is that only 42% of circadian genes were found to have rhythmic isoform expression. The others showed steady-state or arrhythmic isoform expression, which was deemed likely to impact protein expression. Such results highlight the extent of the crosstalk between two processes: CR and post-transcriptional modification.

While transcriptional and translational feedback loops are now considered central to clock regulation, a broader role for alternative splicing in CR received little attention until recently. In 2014, researchers in the John Hogenesch lab reported what was then the most extensive mammalian map of circadian gene regulation, surveying 12 mouse tissues and brain regions [44]. This study reported several insights into how the circadian clock may affect organ physiology, including peak expression of many rhythmic transcripts around dawn and dusk of the subjective day. These data further suggested three points:

- systems-level circadian adaptations that anticipate changes in the day-night cycle;
- rhythmic expression of more than 1,000 non-coding RNAs with potential roles in clock regulation;
- the possibility that commercial drugs targeting circadian genes differ in efficacy depending on the time of dosing.

Such studies have the potential to revolutionize our understanding of the role of the circadian clock in organismal physiology. Mounting evidence suggests that CR is involved not only in jet lag-associated sleeplessness and insomnia but also in regulating immune responses, cell metabolism and the cell cycle, among other processes [45, 46].

1.3: Pathology

Circadian clock disruption has been implicated in various metabolic, cardiovascular, behavioral, neurological and sleep disorders, and even in cancer. The six classic chronotypes in mammals, based on the sleep-wake cycle, are:

Normal - active from sunrise to sunset

Lark and advanced sleep phase disorder (ASPD) - active in the early morning before sunrise and resting while daylight persists

Owl and delayed sleep phase disorder (DSPD) - peak activity during late evening or night

Non-24-hour - sleep-wake patterns that are not entrained to the 24-h day

Advanced sleep phase disorder (ASPD) is rare and has been linked to genetic mutations in the hPER2 gene, whereas delayed sleep phase disorder (DSPD) is the more frequently observed chronotype [47]. The role of the circadian clock in disease gained attention after several reports that decreased cognitive alertness and insomnia made night-shift workers prone to accidents [48]. Melatonin is a hormone naturally produced at the onset of the sleep phase. Its levels oscillate under the robust influence of the circadian clock, and it is administered to people with sleep disorders. Lowered melatonin synthesis increases the risk of various diseases, including stroke, atherosclerosis and reduced blood pressure [49].

Apart from melatonin, the stress hormone cortisol is also robustly rhythmic. It is controlled by the hypothalamic-pituitary-adrenal (HPA) axis and is essential for homeostasis [50, 51]. Cortisol has been strongly associated with epilepsy: the distribution of seizures in epileptic patients shows a sharp rise during the awakening phase (around 4 am) [52]. Altered sleep-wake patterns and sleeplessness have been reported in people with neurological diseases, including Alzheimer’s and Parkinson’s diseases [53].

Alzheimer’s disease is associated with brain aging and a reduced ability to protect against oxidative damage and neurodegeneration. In this respect, deletion of Bmal1 has been reported to cause synaptic vulnerability to oxidative stress [53]. This vulnerability is suspected to be involved in the accumulation of amyloid precursor protein (APP) in Alzheimer’s disease. Patients with early Parkinson’s disease are reported to have reduced SCN activity and altered circadian gene expression. This altered rhythm impacts the production of circadian hormones like melatonin and cortisol in the blood and worsens the symptoms of insomnia [54]. Beyond cognitive disturbances, understanding of the molecular clock's role in behavioral disorders is also growing. Common psychiatric disorders have been linked to disruption of the circadian clock [55]. These range from mood disturbance, obsession and depression to manic conditions and potentially schizophrenia. Melatonin has been shown to decrease symptoms in various psychological disorders [56]. These strong associations underscore the significance of CR for mental health and well-being.

Disruption of the circadian clock has been implicated in several diseases along the brain-gut axis. Metabolic disorders are strongly correlated with a misaligned clock. Examples include, but are not restricted to, diet-induced diabetes, cardiovascular diseases, obesity and cancer [57-59]. The prevalence of metabolic disorders in the USA is approximately 35%, or about 1 in 3 people. Diabetes mellitus is a common metabolic disorder characterized by impaired insulin secretion or action. It is often referred to as a lifestyle disease because diet, sedentary lifestyle, exercise and sleep strongly affect an individual's glucose intake and utilization [59]. Hormones indispensable for glucose homeostasis, such as insulin and ghrelin, oscillate across the sleep-activity and feeding-fasting phases [60]. Clock mutant animals show impaired insulin secretion or sensitivity, which results in dysfunctional glucose metabolism [59]. Various animal model studies have identified a role for the circadian clock in regulating vascular and cardiomyocyte function. Among the physiological fluctuations modulated by the clock, a sharp rise in blood pressure during the awakening period is notable because of its correlation with increased deaths due to cardiac arrest.

In hamsters prone to cardiomyopathy, phase alteration of the light-dark cycle decreased survival by 11% due to hypertrophic cardiomyopathy [61]. In a pressure-overload cardiac hypertrophy mouse model, disruption of clock function has been shown to alter the architecture of the vasculature and reduce contractility. It also induced the differential expression of key hypertension genes in cardiomyocytes [62]. This evidence makes a strong case for the circadian clock as a determining factor in cardiovascular diseases.

Along the brain-gut axis, there is evidence that CR modulates not only organ-level physiology but also the gut microbiome [63]. It has long been known that the microbiota in the epithelial lining, especially in the gut, interact with the host [63]. Altered diet perturbs cross-talk between the SCN and peripheral organs such as the gut, liver and pancreas. This phenomenon has been shown to affect the microbial content of the gut, causing microbial dysbiosis. These same studies show that specific gut bacteria, such as Enterobacter aerogenes, have a CR synchronized by host melatonin production [63]. Distorted clock function can also disrupt the intestinal barrier, which is vital for host defense. Damage to the mucosal barrier promotes proinflammatory activity and increases the chance of infection. Combined with improper gating of the cell cycle, this damage may ultimately result in gastric cancer. A classic feature of the cell cycle is its set of tightly regulated checkpoints that verify the fidelity of DNA replication. Proper functioning of the cell cycle requires the expression of specific cyclins during specific phases [46].

CLOCK and BMAL1 promote the expression of Wee1. The WEE1 kinase in turn inhibits mitosis-promoting factor (MPF), a complex of cyclin B1 and CDK1, thereby regulating the entry of cells from G2 into M phase [46]. Hence, clock dysfunction has gathered attention for its potential to impact oncogenic pathways. Several reports provide strong evidence that the clock is disrupted in cancer [64]:

- differential expression of the core clock genes Per1 and Per2 in breast cancer;
- suppression of Per and Cry genes in esophageal cancer;
- clock gene malfunction in non-Hodgkin lymphoma, leukemia, pancreatic cancer and endometrial cancer.

The significance of the biological clock for organismal health and its involvement in almost all physiological responses are thus widely appreciated. However, a detailed, multidimensional map of the molecular activity of the circadian clock has yet to be elucidated. Such a map would enable chronotherapy strategies for treating clock-induced pathologies.

Figure 1.4: Implications of circadian rhythm and alternative splicing in disease. Circadian rhythm and alternative splicing are influenced by intrinsic and external variations, such as mutations and environmental changes. These variations can cause clock misalignment, resulting in metabolic diseases, neurological disorders and cancer (Left). Mis-splicing or mutations in splicing factors can also result in neurodegenerative diseases and cancer (Right).

Among the post-transcriptional modifications that a nascent mRNA undergoes, alternative splicing (AS) is a critical process that generates multiple protein isoforms with different biological functions from the same gene. Like CR, AS is a widespread mechanism, reported to affect >90% of human multi-exon genes [65]. It offers an evolutionary advantage by increasing proteomic diversity across cell types and organ systems. Because AS is involved in the regulation of almost all protein-coding genes, it is not surprising that it has been potentially implicated in up to 60% of genetic disorders [66] (Figure 1.4). These disorders arise through either mis-splicing of exons or dysfunctional splicing machinery. AS is regulated at multiple levels, including:

1. The regulation of the spliceosome machinery. The spliceosome is composed of proteins and small nuclear RNAs that mediate exon-exon splicing. These molecules regulate each step of exon-exon ligation (e.g., splice-site recognition, recruitment of auxiliary proteins, lariat formation and cleavage at the 3' splice site) (https://www.wikipathways.org/index.php/Pathway:WP411).

2. The speed of RNA polymerase during transcription [67].

3. Epigenetic variation that alters chromatin structure, also known as chromatin remodeling, changing the accessibility of the gene for transcription and splicing [68].
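In RNA-seq analyses, the outcome of a cassette-exon splicing decision is commonly summarized as percent spliced-in (PSI), the fraction of transcripts that include the exon. A minimal junction-count sketch follows. It uses a generic definition, not the MultiPath-PSI algorithm listed for Chapter 2, and the function name and read-count threshold are illustrative assumptions. Two junctions support inclusion while only one supports skipping, so the inclusion junction counts are averaged to put both isoforms on the same per-junction scale.

```python
from typing import Optional


def cassette_exon_psi(upstream_incl_reads: int, downstream_incl_reads: int,
                      skipping_reads: int, min_reads: int = 10) -> Optional[float]:
    """Percent spliced-in (0-1) for a cassette exon from junction read counts."""
    # Average the two inclusion junctions so that each isoform is counted once per molecule.
    inclusion = (upstream_incl_reads + downstream_incl_reads) / 2.0
    total = inclusion + skipping_reads
    if total < min_reads:
        return None  # too few reads for a reliable estimate
    return inclusion / total


print(cassette_exon_psi(20, 20, 20))  # 0.5
```

Computing PSI at each sampled time point yields a splicing time series, which can then be tested for rhythmicity in the same way as gene expression.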

Splicing is extensively involved in neuronal development and neurotransmission. For example, it alters protein isoforms and regulates the expression of voltage-gated calcium channels and neurotransmitter receptors at the postsynaptic surface. Both receptor and calcium channel genes undergo AS to produce protein isoforms with different functions [69, 70]. Mutations causing aberrant splicing and splicing factor disruption are reported in various neurological disorders [71]. These include amyotrophic lateral sclerosis (ALS), glioblastoma multiforme, spinal muscular atrophy, Prader-Willi syndrome, schizophrenia, myotonic dystrophy and fragile X syndrome.

Among these neurological disorders and others not noted here, fragile X syndrome has also been linked to CR modulation through altered mRNA binding/regulation. Fragile X syndrome is caused by mutations in Fragile X mental retardation 1 (FMR1), which encodes an mRNA-binding protein that regulates mRNA localization and translation [72]. Patients with fragile X syndrome have been reported to have sleep disorders and altered melatonin levels. Although cross-talk between these two mechanisms is likely given their prevalence, circadian post-transcriptional control has not been widely studied, unlike its transcriptional counterpart.

1.4: Rationale behind this study

The interplay between the central molecular clock and downstream pathways in the periphery is orchestrated by thousands of clock-controlled genes. Understanding the mechanistic control of these genes is essential for building an in-depth understanding of the circadian clock. Genomic analyses have yielded many potential chronotherapeutic targets at the gene level. Two of the primary mechanisms regulating gene expression are mRNA transcription and translation of mRNA isoforms into proteins. To obtain an improved understanding of the post-transcriptional control of circadian gene expression, it is necessary to understand both mRNA- and protein-level regulation of CR-regulated genes. In a study using DNA microarrays, Duffield (2003) [73] found that only up to 10% of genes undergo rhythmic expression at steady-state mRNA levels. These results were largely confirmed in an independent study using nascent-RNA sequencing of mouse liver [43], which found that 15% of all nascent RNA transcripts are rhythmic, of which only 42% display mRNA-level oscillation. This disparity suggests an important role for post-transcriptional regulation, which may underlie novel regulatory feedbacks.
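Taking the reported 15% and 42% at face value, a short worked check shows what this disparity means for all nascent transcripts and for the rhythmic subset (simple arithmetic on the cited figures, not an additional result):

```latex
\begin{aligned}
P(\text{rhythmic nascent RNA} \cap \text{rhythmic mRNA}) &= 0.15 \times 0.42 = 0.063 \approx 6\%,\\
P(\text{no mRNA rhythm} \mid \text{rhythmic nascent RNA}) &= 1 - 0.42 = 0.58 .
\end{aligned}
```

On this reading, more than half of rhythmically transcribed genes lack detectable mRNA-level oscillation.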

Circadian splicing is the rhythmic alternative splicing of mRNA over a 24hr period. AS occurs co-transcriptionally or shortly after transcription and affects the type, translational efficiency, stability and localization of the protein isoforms produced [1].

Transcripts undergoing rapid changes in abundance likely require variable, and preferably short, half-lives [1]. AS can be a beneficial mechanism for establishing such short-term variations and modulating cyclical changes in mRNA (Figure 1.5).

[Figure 1.5 schematic: pre-mRNA with exons Ex-1, Ex-2, Ex-3; Isoform-1 (Ex-1, Ex-2, Ex-3; exon inclusion) and Isoform-2 (Ex-1, Ex-3; exon exclusion).]

Figure 1.5: Schematic of circadian alternative splicing. Illustration of how cassette-exon splicing can produce two alternatively spliced isoforms which are preferentially expressed at different times of day; exclusion event during day and inclusion event during night.


In Neurospora crassa (N.c.), rhythmic splicing of frq produces two isoforms, the long and short FRQ isoforms [74]. The ratio of these two isoforms was shown to be sensitive to temperature and essential for maintaining a robust circadian rhythm. In Drosophila, altered rhythmic expression of two spliced isoforms of per, perA and perB', has been shown to establish differences in CR patterns [75]. Expression of specific per isoforms in transgenic flies can rescue locomotor activity defects, demonstrating an interaction between mRNA regulation and the circadian clock. The significance of circadian splicing is further supported by an exon-array analysis of mouse liver upon food entrainment and fasting [76]. These authors investigated the regulation of circadian splicing in liver and experimentally evaluated rhythmic exons in mouse liver, lung and kidney using qPCR. Circadian genes, including the core clock molecules Npas2, Clock and Nr1d1, were also observed to be rhythmically spliced in liver in this study. These authors further distinguished the contributions of the central clock or systemic inputs from those of the local peripheral clock using Vipr2-/- mice, showing that selected splicing events are under the control of the local clock while others are controlled by the central clock network. This is intriguing evidence supporting tissue-specific regulation of splicing to fine-tune tissue function. To our knowledge, this is the most extensive circadian splicing analysis conducted in mammalian tissues, with samples collected every 6hrs for 24hrs and analyzed using Affymetrix exon arrays. More recently, it was demonstrated that even slight changes in temperature can induce wide variation in the splicing pattern of rhythmically spliced genes [77]. This study further suggests that rhythmic splicing can be regulated by temperature independently of direct control by the clock, and that this mechanism can be controlled by heat-induced coupling of phosphorylation and dephosphorylation. This work specifically identifies cold-sensitive splicing events in mouse intestinal fibroblasts that could be validated in neuronal cells, mouse liver and cerebellum.

To develop a detailed understanding of the regulation of circadian splicing, it is imperative to study the drivers of splicing: circadian SFs. SFs in general affect various molecular processes at the RNA level, such as splicing, mRNA stability, translation and protein localization. Several RNA-binding proteins have been found to mediate the mRNA stability of clock and clock-controlled genes, such as Per2, Cry1 [78] and AANAT [79], by hnRNP I, D and Q, respectively. Recent studies of circadian splicing regulator mutants in the plant Arabidopsis thaliana have implicated a number of novel regulatory mechanisms. Prmt5 has been shown to alter period length, resulting in delayed expression of clock genes such as Toc1, LHY, CCA1 and PRR9 [80]. The involvement of Gemin2 in the regulation of AS of Toc1, CCA1 and Prr9 [81], and of SKIP in AS of PRR9 and other clock genes [82], suggests the importance of RNA-binding proteins in clock function. In mouse, Cirbp, a temperature-sensitive RNA-binding protein, is differentially regulated under the direct control of the clock but is also activated upon specific peripheral stimuli [76]. These data suggest that multiple regulatory inputs, such as temperature, food and tissue-specific photosensitive genes, are involved in the orchestration of rhythmicity [83]. While these studies have revealed novel candidate splicing and mRNA regulators in diverse model organisms, the splicing regulatory control circuits have not been extensively defined in mammals, due to a lack of comprehensive transcriptomic and functional studies. However, recently produced RNA-Seq datasets, with increasing sequencing depth and temporal resolution and profiled across tissues, provide the potential for careful analyses of alternative splicing [44]. The potential of such datasets for splicing analysis has not yet been explored, even by the recently reported isoform deconvolution study [84]. Importantly, while such studies implicate possible mechanisms of mRNA isoform regulation under circadian control, there are important concerns regarding the application of mRNA isoform deconvolution to infer alternative splicing impacts. Such inferences are particularly problematic when no exon/intron-junction-specific analyses are employed and deconvolution predictions are derived from low-depth, single-end RNA-Seq data. Similarly, exon-array analyses are themselves likely to be limited in their ability to assess circadian splicing, due to both a lack of specificity and the low information content of the probes (depending on the platform used).

1.5 Objectives of the Thesis

1. To recommend reliable computational methods for circadian splicing analyses: Circadian splicing analyses are emerging, and there is no gold-standard method for analyzing large-scale genomic/transcriptomic data to predict circadian splicing events (CSEs). This thesis aims to perform a comparative evaluation of multiple datasets from independent studies and multiple periodicity detection algorithms to detect consistent CSEs.

2. To study the global impact of circadian splicing in mammalian tissues: To our knowledge, no studies have investigated circadian splicing at the transcriptome level across species in multiple mammalian tissues. Such studies require RNA-Sequencing data with support for individual exon-exon or exon-intron junctions conserved at the nucleotide level across species. Similarly, there are no reports on the extent of circadian splicing in mouse tissues and its significance in regulating the organism’s clock physiology. Understanding the potential mode of regulation for circadian splicing, and identifying which splicing regulators likely mediate such events, is vital to understanding the role of circadian splicing in tissue function at the gene and pathway level.

3. To clarify the role of circadian splicing in cancer: Previously, circadian genes inferred from human tissues were found to be highly dysregulated in diverse cancers, in particular lung adenocarcinoma [85]. Similar studies are required to determine whether circadian splicing plays a central or peripheral role in cancer cell homeostasis, progression or therapy resistance.

In this thesis, I have worked to clarify these research questions and establish a definitive role for the circadian control of alternative splicing in both healthy and disease states. This work further aims to provide guidelines for further studies of circadian splicing and computational web-based tools to independently interrogate circadian splicing across tissues and species by other researchers.

Chapter 2: Discovery of cancer splicing and associated auto-regulatory networks through cross-species circadian analysis

Authors

Krithika R Subramanian1,2, Stuart Hay2, Kashish Chetal2, Andrew Rosselot1, Audrey Crowther2,3, Anukana Bhattacharjee2, Meenakshi Venkatasubramanian2,4, Marc Ruben5, Christian I Hong1,5*, Nathan Salomonis1,2,3,4,6,*

1Department of Pharmacology and Systems Physiology, University of Cincinnati, OH, 45267

2Division of Biomedical Informatics, Cincinnati Children’s Hospital and Medical Center, OH, 45229

3Division of Immunology, Cincinnati Children’s Hospital and Medical Center, OH, 45229

4Department of Electrical Engineering and Computer Science, University of Cincinnati, Cincinnati, Ohio, 45221, USA.

5Department of Pediatrics, Chronobiology Center, Cincinnati Children’s Hospital and Medical Center, OH, 45229

6Department of Biomedical Informatics, University of Cincinnati, OH, 45267

2.1: ABSTRACT

Disruption of circadian rhythm can lead to serious sleep disorders and predispose to a number of life-threatening diseases, including cancer. The circadian regulation of alternative splicing, including that of core-clock genes themselves, is speculated to be a central mediator of clock function; however, no comprehensive analyses of this mechanism exist to date. Here, we performed a rigorous and unbiased analysis of circadian splicing events (CSEs) within and across diverse healthy mouse, baboon and human tissues, spanning over a thousand samples, to identify shared global trends and discrete regulatory mechanisms. Our analysis finds that, unlike circadian gene expression, CSEs tend to be tissue-specific rather than shared across tissues, with the highest extent in brain. Surprisingly, close to one-third of all mouse CSEs were conserved in primates when matching precise exon-exon junction positions. Many of these conserved CSEs likely fine-tune the core clock by altering the amplitude of circadian and splicing regulators, in part through autoregulation and nonsense-mediated decay. Further, we find that disruption of circadian splicing is the principal hallmark of oncogenic splicing in lung adenocarcinoma compared to matched tumor-adjacent controls. We enable broad exploration of these species-specific and -shared CSEs through an interactive web resource called CircaSplice.

2.2: INTRODUCTION

The temporal rhythmic expression of genes plays an important role in tissue homeostasis and physiological adaptation in all organisms. Multiple regulatory mechanisms exist for both intrinsic and extrinsic regulation of circadian rhythm, including thermoregulation, feeding and light entrainment [76, 86-88]. While systemic regulation of circadian rhythm in peripheral tissues depends on both neuronal and hormonal signals from the master clock in the suprachiasmatic nucleus (SCN), tissue-specific circadian activity requires robust peripheral circadian clocks regulating tissue-specific clock-controlled genes that impact diverse physiological pathways [76, 89, 90]. Unlike circadian transcription, alternative mechanisms for circadian gene regulation, such as post-transcriptional regulation, are not well established in mammals, leaving an incomplete picture of circadian gene regulation. In mammals, nearly all genes undergo alternative splicing to produce cell-type-specific mRNA isoforms that can alter protein function, expression or localization [91]. A role for alternative splicing in clock regulation has been previously demonstrated in basic model organisms, such as the filamentous fungus Neurospora crassa. For example, in Neurospora, rhythmic splicing of the core circadian regulator frq produces two isoforms, a long and a short one, the ratio of which is essential for maintaining robust circadian rhythms under temperature variations [74, 92, 93]. Similarly, in Drosophila, the rhythmic expression of spliced isoforms of the core circadian regulator per affects mRNA abundance and PER accumulation, a key mediator of circadian rhythm [75]. More recently, emerging evidence suggests that rhythmic splicing tends to be tissue-specific (based on findings from liver), co-varies with circadian gene expression and may be regulated by systemic inputs, like food and temperature variations, independent of central clocks [76, 87].

Alternative splicing is primarily regulated by RNA-binding proteins (RBPs), which comprise spliceosomal proteins and their regulators that mediate pre-mRNA splicing through their expression, activity or localization. To demonstrate a role for alternative splicing in clock regulation, it is necessary to identify rhythmic splicing events and the splicing regulators whose expression or activity is mediated by the core clock or external cues. Multiple RBPs have been implicated in circadian control of mRNA stability in rodents, including that of Per2 [94], Cry1 [78] and AANAT [95] by PTBP1, hnRNPD and hnRNPQ, respectively. Additional studies have demonstrated that splicing regulators, such as Prmt5 [80], Gemin2 [81] and SKIP [82], link circadian rhythms and alternative splicing in plants. Importantly, the role of alternative splicing in the control of circadian gene expression could be more impactful than previously appreciated. For example, circadian splicing could in principle regulate the tissues in which circadian genes are expressed, or modulate isoform expression independent of transcriptional control. While prior studies have revealed novel candidate circadian splicing regulators in diverse model organisms, the underlying splicing regulatory control circuits have not been extensively defined in mammals, due to a lack of comprehensive transcriptomic and functional studies. However, recently produced RNA-Seq datasets, with increasing sequencing depth, temporal resolution and multi-tissue coverage, provide the potential for careful analyses of circadian splicing [40].

While a few mammalian studies provide circadian sampling of many diverse tissues with RNA-Sequencing, these studies lack sufficiently deep sequencing (>40 million paired-end reads), lack sufficient density of circadian sampling (every 6hr vs. every 2hr) and/or use a restricted sampling period (24hr versus 48hr), which inherently limits the confidence of their results [44, 96]. Such cross-tissue sampling is important for the identification of reproducible core circadian splicing events. Executing such study designs presents several challenges for the reliable detection of circadian splicing events. Firstly, splicing event detection requires exon-exon or exon-intron spanning reads, with sufficient sequencing depth for event detection. Methods for isoform deconvolution are extremely problematic for such purposes, as these algorithms are restricted to known and typically limited isoforms, require very deep paired-end sequencing to gain additional confidence, and frequently become inaccurate when too many isoform models are supplied or when more than two isoforms are expressed in a sample [39, 97]. At the same time, full-length isoform sequencing remains too cost-prohibitive to offer sufficient sensitivity to circadian variation over a 48hr period with adequate sampling. While a recent study evaluated cross-tissue and cross-species circadian isoform regulation, its use of isoform deconvolution to infer splicing from extremely low-depth, single-end RNA-sequencing data raises serious concerns about the accuracy of such predictions [84]. Similar concerns potentially exist when using exon-based microarrays with limited features to infer circadian alternative splicing [84]. Further, while rhythmicity detection algorithms have been well validated for use in circadian gene expression studies, their application to the evaluation of rhythmic splicing, in which missing values are frequent, remains largely untested, without clear benchmarks for their evaluation.

Here, we conduct an in-depth and rigorous meta-analysis of existing circadian RNA-Seq datasets in mouse, baboon and human tissues to define shared and tissue-specific splicing variation, putative regulators and impacted pathways. By assaying both known and novel discrete splicing events from RNA-Sequencing, these analyses are able to identify diverse mRNA impacts and alternative splicing mechanisms not detectable through conventional gene, exon or isoform deconvolution analyses [84]. Our approach for the identification of conserved circadian splicing is rigorous in that we match rhythmic splice junctions at the nucleotide level between species (genomic coordinate liftover), rather than at the gene or isoform level, for analogous tissue regions. Comparison of multiple bioinformatics tools for rhythmicity analysis indicates that low-temporal-resolution RNA-Seq can be used to accurately assess circadian regulation of splicing, as compared to high-resolution data, using well-established rhythmicity detection algorithms.

While circadian gene expression changes are largely common across tissues, circadian splicing events (CSEs) are primarily tissue-specific and typically do not covary with transcription. Importantly, we find that distinct pathways are regulated by tissue-specific circadian splicing versus circadian gene expression. A common set of putative splicing regulators was found to be rhythmic across multiple tissues, most notably the cold-inducible RNA-binding protein (CIRBP), many of which show strong evidence of regulating their own splicing (e.g., CIRBP, FUS, SRSF5, TRA2A, SFPQ). These global splicing observations were partially validated in three independent ways: 1) in postmortem human tissues using an in silico tissue circadian inference workflow (CYCLOPS), 2) with primate multi-tissue time-course RNA-Seq data, and 3) through comprehensive in silico knockdown screening. These analyses highlight the conservation of autoregulatory feedback loops, as evidenced by their cross-tissue/cross-species regulation and RBP knockdown. Surprisingly, human circadian splicing was found to be the dominant splicing signal in lung cancer, consistently perturbed in patients versus their matched controls. These cross-tissue/cross-species circadian splicing predictions are provided as an interactive online web portal (CircaSplice - https://circasplice.cchmc.org) to enable broad reuse and in silico testing of new hypotheses (e.g., comparisons to disease).

2.3: MATERIALS AND METHODS

2.3.1: Evaluation Datasets

To enable meta-analyses of circadian gene regulation across diverse tissues and species, a number of large datasets of varying size and complexity were analyzed in this work. First, multiple murine circadian transcriptomics datasets were evaluated independently and compared for consistent circadian gene expression and splicing. Two of these datasets were obtained from a microarray and RNA-Seq study [44]: GSE54650, an Affymetrix Gene 1.0 ST microarray dataset with 2-hour (hr) sampling for 48hr across 12 adult tissues (CT 18hr – CT 64hr); and GSE54651, a matched paired-end deep RNA-Seq dataset (FASTQ), with samples collected from 12 tissues every 6hrs for 48hrs (CT 22hr – CT 64hr). To validate the use of different rhythmicity detection algorithms, an additional deep paired-end RNA-Seq dataset (GSE73552), with samples collected every 2hr for 24hr (CT 0hr – CT 22hr) with 3 replicates, was separately analyzed [98]. To identify conserved circadian splicing events (CSEs), a large non-human primate dataset (Papio anubis) with shallow single-end RNA-Seq (GSE98965) was reanalyzed (FASTQ), with samples collected across 64 different tissue regions every 2hrs for 24hrs [96]. To identify conserved CSEs in humans, exon-exon junction counts were obtained from the GTEx consortium (https://GTExportal.org) for 6 shared mouse tissues with sufficient evidence of reliable circadian pseudo-temporal ordering.

Two human lung adenocarcinoma deep paired-end RNA-Seq datasets, comprising tumors and matched tumor-adjacent controls, were evaluated for circadian splicing signatures: The Cancer Genome Atlas (TCGA - dbGaP phs000178.v1.p1, with exon-exon junction counts from the Broad Firehose database), with 517 tumors and 59 controls, and a Korean patient cohort (GSE40419) with 77 matched tumor and control biopsies [99].

2.3.2: Algorithms and tools used for the analysis

AltAnalyze is a cross-platform software package designed to perform comprehensive end-to-end analyses for both gene expression and alternative splicing study designs, on both microarray and RNA-Seq platforms [67]. The RMA algorithm, implemented in Affymetrix Power Tools, is called by AltAnalyze for expression normalization of exon- or junction-level probesets to calculate gene-level and alternative exon-level results. Although exon and junction arrays include probesets spanning well-annotated exons, gene arrays (e.g., the Affymetrix Gene 1.0 ST array) also possess exon-level probes for a more restricted set of gene exons. For RNA-Seq data, gene expression estimates are reported as non-log normalized Reads Per Kilobase per Million mapped reads (RPKM) values for all expressed exon-exon junctions of a gene, or as TPM values when using the Kallisto processing option (pseudoalignment of reads). To calculate alternative splicing estimates for known exons from exon microarrays, AltAnalyze uses the FIRMA or splicing-index method to calculate differential exon expression relative to gene expression. For this thesis, splicing-index values were applied for gene array analyses.
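For reference, the standard definitions behind the quantities named above can be sketched in Python. This is a minimal illustration, not AltAnalyze's code: the function names (`rpkm`, `tpm`, `splicing_index`) are hypothetical, the RPKM and TPM formulas are the textbook definitions, and the splicing index is written in its commonly used form (log2 ratio of gene-normalized exon signal between two conditions), assuming non-log intensities as input.

```python
import math
from typing import List


def rpkm(read_count: int, feature_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of feature per Million mapped reads."""
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


def tpm(read_counts: List[int], lengths_bp: List[int]) -> List[float]:
    """Transcripts Per Million: length-normalize first, then scale to sum to 1e6."""
    rates = [c / length for c, length in zip(read_counts, lengths_bp)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


def splicing_index(exon_a: float, gene_a: float, exon_b: float, gene_b: float) -> float:
    """log2 ratio of gene-normalized exon signal, condition A versus condition B.

    Assumes non-log intensities; with log2 inputs this is (eA - gA) - (eB - gB).
    """
    return math.log2((exon_a / gene_a) / (exon_b / gene_b))


print(rpkm(500, 2000, 10_000_000))         # 500 reads on a 2 kb feature, 10M mapped reads
print(splicing_index(8.0, 4.0, 2.0, 4.0))  # exon relatively 4x higher in condition A
```

A positive splicing index indicates the exon is relatively more included in condition A after accounting for overall gene expression.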

Figure 2.1. MultiPath-PSI algorithm for estimation of alternative splicing. A. Example formula for calculating Percent Spliced In (PSI) values from the software MultiPath-PSI. H is an exon-exon junction of interest and junctions B, D, F, G, H are overlapping exon-exon or exon-intron junctions used to calculate the PSI value. B. Relative performance of AltAnalyze compared to other contemporary splicing analysis algorithms using precision and recall from the simulated splicing data (http://altanalyze.readthedocs.io/en/latest/Algorithm).

To enable fast and accurate analyses, AltAnalyze uses a Percent Spliced In (PSI) splicing method called MultiPath-PSI for RNA-Seq alternative splicing analyses. This algorithm was introduced in AltAnalyze version 2.1.1 and requires aligned BAM files as input. Similar to other recently reported local splicing variation (LSV)-based methods, such as MAJIQ (https://majiq.biociphers.org) and LeafCutter (https://github.com/davidaknowles/leafcutter), MultiPath-PSI considers the detected junctions within a restricted genomic interval to assess local junction expression differences. Unlike these alternative approaches, MultiPath-PSI considers all known and novel exon-exon junctions in a sample or cell and computes each junction's relative detection compared to the local background of all genomically overlapping junctions that can be directly associated with the gene of interest (sharing at least one known gene-associated splice-site) (Figure 2.1A).
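A minimal sketch of the local PSI idea described above, assuming all supplied junctions already belong to the gene of interest and treating "overlapping" as sharing any genomic position. The names `overlaps` and `local_psi` are hypothetical, and MultiPath-PSI's actual rules (shared gene-associated splice sites, exon-intron reads, strand handling) are not reproduced. The 20-read minimum described for MultiPath-PSI is applied here to the summed local background, which is one possible reading.

```python
from typing import Dict, Optional, Tuple

# A junction is (start, end) on one chromosome and strand, with start < end.
Junction = Tuple[int, int]


def overlaps(a: Junction, b: Junction) -> bool:
    """True if the two junction intervals share at least one genomic position."""
    return a[0] <= b[1] and b[0] <= a[1]


def local_psi(counts: Dict[Junction, int], junction: Junction,
              min_reads: int = 20) -> Optional[float]:
    """PSI of `junction` relative to every overlapping junction (itself included).

    Returns None when the summed local background has fewer than `min_reads`
    reads, i.e. the event is treated as not measurable in this sample.
    """
    background = sum(c for j, c in counts.items() if overlaps(j, junction))
    if background < min_reads:
        return None
    return counts[junction] / background


# Hypothetical cassette exon spanning 200-300, flanked by exons ending at 100 and starting at 400.
counts = {(100, 200): 30, (300, 400): 30, (100, 400): 10}
print(local_psi(counts, (100, 200)))  # inclusion junction versus the skipping junction
print(local_psi(counts, (100, 400)))  # skipping junction versus all three junctions
```

In this toy cassette exon, the upstream inclusion junction competes only with the skipping junction, whereas the skipping junction overlaps, and therefore competes with, all three.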

This calculation provides a more inclusive and conservative estimate of LSV. The same junctions are used to identify high-confidence intron retention events, evidenced by pairs of exon-intron and intron-only mapping paired-end reads sufficiently detected at both ends (5' and 3') of a given intron. This more stringent algorithm reports counts for exon-intron spanning reads only when multiple exon-intron spanning reads are detected at both ends of the intron and a matching paired-end read contained entirely within the intron is present (BAMtoExonBed module of AltAnalyze). PSI values are calculated for any junction with a minimum PSI difference of 0.1 (10% PSI) between any samples or cells. PSI values are reported only for samples or cells in which sufficient read depth is present for a given splicing event (minimum of 20 reads per examined junction interval). Junctions are grouped into unique junction clusters when reporting results, to identify redundant splicing events. Unique junction cluster IDs are determined by examining the overlap of exon-exon and exon-intron junctions for a given gene to identify connected subgraphs in the exon-exon/exon-intron network.
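Grouping junctions into connected subgraphs and applying the 0.1 PSI-difference filter can be sketched with a small union-find. This assumes that two junctions are connected when their genomic intervals overlap (AltAnalyze's exact connectivity rule may differ), and the names `junction_clusters` and `passes_dpsi` are hypothetical.

```python
from collections import defaultdict
from typing import List, Optional, Sequence, Tuple

Junction = Tuple[int, int]  # (start, end) on one chromosome and strand


def overlaps(a: Junction, b: Junction) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def junction_clusters(junctions: Sequence[Junction]) -> List[List[Junction]]:
    """Connected components of the junction-overlap graph (union-find)."""
    parent = {j: j for j in junctions}

    def find(j: Junction) -> Junction:
        while parent[j] != j:
            parent[j] = parent[parent[j]]  # path halving keeps trees shallow
            j = parent[j]
        return j

    js = list(junctions)
    for i, a in enumerate(js):
        for b in js[i + 1:]:
            if overlaps(a, b):
                parent[find(a)] = find(b)  # merge the two components

    groups = defaultdict(list)
    for j in js:
        groups[find(j)].append(j)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


def passes_dpsi(psi_by_sample: Sequence[Optional[float]], min_dpsi: float = 0.1) -> bool:
    """True if observed PSI values differ by at least `min_dpsi` between any two samples."""
    observed = [p for p in psi_by_sample if p is not None]
    if len(observed) < 2:
        return False
    return max(observed) - min(observed) >= min_dpsi
```

Because the skipping junction of a cassette exon overlaps both inclusion junctions, all three fall into one cluster even though the two inclusion junctions do not overlap each other.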

To ensure that the approach is accurate, we benchmarked it against several recently reported methods for local splicing variation analysis (MAJIQ, LeafCutter, rMATS) (Figure 2.1B). As a gold standard to evaluate prediction accuracy and error rates, we developed a simulated RNA-Seq dataset derived from known and novel isoform predictions for pluripotent stem cells (PSC) and in vitro derived day 30 cardiomyocytes (CM). To create this simulated dataset, we first predicted known and novel isoforms and expression estimates using Cufflinks for biological triplicate samples (PSC and CM). The Cufflinks isoform GTF and expression values (isoforms.fpkm_tracking) were supplied as input to the software Polyester to produce simulated RNA-Seq reads at a depth of 100 million paired-end reads using the software's default parameters 7. The resulting FASTA files were supplied as input for genome alignment (hg19) with the software STAR to produce BAM files with read-strand predictions. The aligned sequencing data were evaluated with each analysis method and benchmarked against isoform-derived alternative splicing events (exon-exon and exon-intron junctions) comparing CM and PSC.

Supervised group comparison analyses with MultiPath-PSI were performed using an FDR-corrected (Benjamini-Hochberg) empirical Bayes moderated t-test P-value < 0.05. For each method, splicing events were ranked according to the empirical score or p-value reported by each algorithm for all reported events. Precision and recall curves were generated in MATLAB.

Comparison of unique splice-junction clusters from each algorithm (determined from the junction- graph of junction genomic coordinates output from each method) to these gold-standards, indicated that MultiPath-PSI has the greatest overall accuracy and lowest error rates, based on precision and recall estimates. These same trends were reproduced when considering each specific primary mode of splicing regulation (cassette-exon, alternative splice-site, intron retention), with intron retention showing the greatest gains in precision and recall from MultiPath-PSI.
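The Benjamini-Hochberg adjustment and precision/recall points along a ranked event list are standard computations. Since the original curves were produced in MATLAB, the following Python sketch is only an illustration, with hypothetical function names (`benjamini_hochberg`, `precision_recall_points`); it assumes the ranked list is ordered best-first and that gold-standard events are identified by hashable IDs.

```python
from typing import Hashable, List, Sequence, Set, Tuple


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q values), returned in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity with a running minimum.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def precision_recall_points(ranked_events: Sequence[Hashable],
                            gold_standard: Set[Hashable]) -> List[Tuple[float, float]]:
    """(precision, recall) after each position of a best-first ranked event list."""
    points = []
    true_positives = 0
    for k, event in enumerate(ranked_events, start=1):
        if event in gold_standard:
            true_positives += 1
        points.append((true_positives / k, true_positives / len(gold_standard)))
    return points
```

Plotting the returned pairs gives the precision-recall curve; a method that ranks true events first keeps precision high while recall rises.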

JTK-Cycle

JTK cycle is a popular non-parametric algorithm used to identify circadian or rhythmic elements in large-scale gene expression datasets [100]. The JTK algorithm is based on two independent methods: the Jonckheere-Terpstra (JT) test and Kendall’s tau (KT) test. The JT test is a rank-based ordering model that identifies trends in a dataset. KT is a rank-order correlation test used to determine the correlation coefficient for pairs of ordered samples. JTK cycle uses both of these algorithms to test for group ordering. Statistical significance is reported as p-values adjusted for multiple testing using Bonferroni correction and as Benjamini-Hochberg false discovery rates (q values). JTK cycle searches for rhythmic elements using the user-defined period and identifies the optimal combination of period and phase that minimizes the exact p-value of the KT correlation between the compared time series.
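The core idea of scoring rhythmicity with Kendall's tau against a reference waveform can be sketched as below. This is a deliberately simplified illustration, not JTK cycle itself: it scans candidate phases of a cosine reference at one fixed period and keeps the phase with the highest Kendall tau-b, without JT-based ordering, multiple periods or exact p-values. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b rank correlation (tie-corrected), by O(n^2) pair counting."""
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            if dx != 0 and dy != 0:
                if (dx > 0) == (dy > 0):
                    concordant += 1
                else:
                    discordant += 1
    n0 = n * (n - 1) / 2
    denom = math.sqrt((n0 - ties_x) * (n0 - ties_y))
    return (concordant - discordant) / denom if denom > 0 else 0.0


def best_phase(values: Sequence[float], times: Sequence[float],
               period: float = 24.0, phase_step: float = 2.0) -> Tuple[float, float]:
    """Scan cosine references at one fixed period; return (phase, tau) with the highest tau."""
    best_tau, best_ph = -2.0, 0.0
    phase = 0.0
    while phase < period:
        reference = [math.cos(2 * math.pi * (t - phase) / period) for t in times]
        tau = kendall_tau_b(values, reference)
        if tau > best_tau:
            best_tau, best_ph = tau, phase
        phase += phase_step
    return best_ph, best_tau
```

Because only ranks enter the score, any monotone distortion of the waveform (for example a log transform of expression) leaves the selected phase unchanged.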

ARSER

ARSER is an autoregressive (AR) spectral estimation algorithm that identifies rhythmic patterns in a dataset using a harmonic regression-based model [101]. ARSER first detrends each time series to remove any linear trend and identifies the optimal period between 20 and 30 hr using AR spectral analysis. It then applies a harmonic regression (HR)-based model to fit a sinusoidal waveform to the tested time series. The reported amplitude is the difference between the peak and trough of the fitted curve, and the phase is the time of peak expression of a given genomic feature. Based on the phase, period and amplitude, ARSER determines rhythmicity and reports a q value (FDR).
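An ARSER-like fit can be illustrated with ordinary least squares in NumPy. Two substitutions are made and should be kept in mind: the period is chosen by scanning 20 to 30 hr for the smallest residual sum of squares instead of by AR spectral estimation, and the linear trend is fitted jointly with the harmonic terms instead of being removed first. Amplitude is reported as peak-to-trough, matching the description above. The names `fit_harmonic` and `scan_periods` are hypothetical.

```python
import numpy as np


def fit_harmonic(times, values, period):
    """Least-squares fit of y = m + s*t + a*cos(wt) + b*sin(wt) at one period.

    Returns (residual sum of squares, peak-to-trough amplitude, peak time in hr).
    """
    t = np.asarray(times, dtype=float)
    y = np.asarray(values, dtype=float)
    w = 2 * np.pi / period
    design = np.column_stack([np.ones_like(t), t, np.cos(w * t), np.sin(w * t)])
    coef, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ coef) ** 2))
    a, b = coef[2], coef[3]
    # a*cos(wt) + b*sin(wt) = A*cos(wt - phi), with A = hypot(a, b) and phi = atan2(b, a)
    amplitude = float(np.hypot(a, b))
    peak_time = float((np.arctan2(b, a) / w) % period)
    return rss, 2 * amplitude, peak_time


def scan_periods(times, values, periods=np.arange(20.0, 30.5, 0.5)):
    """Pick the candidate period with the smallest RSS (stand-in for AR spectra)."""
    best = min(((fit_harmonic(times, values, p), float(p)) for p in periods),
               key=lambda item: item[0][0])
    (_, peak_to_trough, peak_time), period = best
    return period, peak_to_trough, peak_time
```

For a noiseless 24 hr cosine with a linear drift, sampled every 2 hr for 48 hr, this recovers a 24 hr period, the peak-to-trough amplitude and the peak time.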

Harmonic regression:

Harmonic regression (HR) and Fourier transformation are traditionally applied in circadian biology to identify oscillating elements and their patterns. HR is useful in circadian analyses where features vary in their rhythmic patterns. HR fits sinusoidal waveforms of different wavelengths to a time series [102]. To generate the harmonics, appropriate sine and cosine transformations are applied and regression is performed to fit a curve to the tested time series. The optimal period is obtained from the frequency with peak spectral power, and the phase from the time of peak expression. Both p and q values are reported by the HR R package.
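The Fourier route to the optimal period, taking the frequency with peak spectral power, can be sketched with NumPy's FFT. This assumes evenly spaced samples with no missing values; note that with 48 hr of data the resolvable periods are limited to 48/k hr (48, 24, 16, ...). `dominant_period` is a hypothetical name.

```python
import numpy as np


def dominant_period(values, sampling_interval_hr):
    """Period (hr) of the strongest non-zero frequency in the periodogram."""
    y = np.asarray(values, dtype=float)
    y = y - y.mean()  # remove the mesor so the zero-frequency bin does not dominate
    power = np.abs(np.fft.rfft(y)) ** 2
    freqs = np.fft.rfftfreq(len(y), d=sampling_interval_hr)  # cycles per hour
    k = int(np.argmax(power[1:])) + 1  # skip the zero-frequency bin
    return 1.0 / freqs[k]


t = np.arange(0, 48, 2)  # 2 hr sampling over 48 hr
print(dominant_period(3 + np.cos(2 * np.pi * t / 24), 2.0))
```

The coarse frequency grid is one reason regression-based fits, which can test arbitrary candidate periods, are often preferred for short circadian series.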

RAIN (Rhythmicity Analysis Incorporating Nonparametric methods)

RAIN is a non-parametric algorithm applied to detect periodicity in time-series data and is freely available as an R/Bioconductor package [103]. Its authors describe it as robust for circadian analyses because it addresses two main limitations of other algorithms: (1) noise in circadian data is assumed to follow a Gaussian distribution, which is often not the case for biological data; and (2) circadian data can have saw-tooth or sharp waveforms that may not be detected by regular sinusoidal or cosinor curve fitting. RAIN can detect both symmetric and asymmetric waveforms of any period. RAIN orders the time-series samples into rising and falling parts and applies an umbrella alternative to test for rhythmicity. Unlike other periodicity detection algorithms, RAIN does not constrain the shape of the rising or falling parts and can therefore, in principle, detect waveforms of any shape.
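The umbrella (rise-then-fall) idea behind RAIN can be illustrated with a toy score: for each candidate peak position, count the fraction of sample pairs that increase before the peak and decrease after it. This is not RAIN's statistic (RAIN applies umbrella rank tests with p-values across candidate periods and peak shapes); it only shows why no particular waveform shape is assumed. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def umbrella_score(values: Sequence[float], peak: int) -> float:
    """Fraction of pairs rising up to `peak` and falling after it (1.0 = perfect umbrella)."""
    n = len(values)
    rising = [(values[i], values[j]) for i in range(peak + 1) for j in range(i + 1, peak + 1)]
    falling = [(values[i], values[j]) for i in range(peak, n) for j in range(i + 1, n)]
    hits = sum(a < b for a, b in rising) + sum(a > b for a, b in falling)
    pairs = len(rising) + len(falling)
    return hits / pairs if pairs else 0.0


def best_peak(values: Sequence[float]) -> Tuple[int, float]:
    """Peak position (sample index) with the highest umbrella score."""
    scores = [(umbrella_score(values, p), p) for p in range(len(values))]
    score, peak = max(scores, key=lambda s: s[0])
    return peak, score


print(best_peak([1, 2, 3, 5, 4, 2]))     # roughly symmetric rise and fall
print(best_peak([1, 2, 3, 4, 5, 6, 1]))  # saw-tooth: slow rise, sharp drop
```

Only the order of values matters, so a slow rise followed by a sharp drop scores as well as a sinusoid.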

2.3.3: Transcriptomic Analyses

For splicing predictions, RNA-Seq FASTQ files were aligned to the reference genome (mm10, Panu_3.0/papAnu4) and transcriptome using the STAR software. The resulting genome-aligned BAM files were processed in AltAnalyze (version 2.1.1) to obtain exon-exon and exon-intron spanning reads and subsequently to calculate percent spliced-in values, determine the mode of each alternative splicing event and predict protein-level impacts, using the local splicing variation approach MultiPath-PSI (see http://altanalyze.readthedocs.io/en/latest/Algorithms for algorithm details and benchmarking) [104]. This algorithm calculates an inclusive Percent Spliced In (PSI) ratio based on all genome-overlapping exon-exon and exon-intron junctions associated with the evaluated gene. In parallel, gene expression was quantified from

Affymetrix Gene 1.0 ST arrays (CEL files) using the RMA algorithm through AltAnalyze. This array platform contains exon-level probes which are used to quantify gene expression (constitutive exons) and alternative exons (all exons in the transcript definition model). Alternative exons were quantified using the default previously described splicing-index method [105]. For murine RNA-

Seq splicing analysis, only splicing events with no missing values were considered for downstream analyses. For baboon analyses, following PSI (Percent Spliced In) calculation, if two of the 12 time-points had missing values with adjacent time-point PSI measurements, the missing time-point value was imputed (average of the adjacent PSI values). Out of the 64 baboon tissue regions evaluated, only one (Iris) was excluded as it possessed incomplete timepoints. To predict circadian splicing events from human tissues, pre-processed exon-exon junction read-counts were obtained from the Genotype-Tissue Expression database (GTEx) version 7 (https://GTExportal.org). The

GTEx database provides RNA-Seq data from over 1,000 healthy individuals across more than 40 different tissues from men and women between 20 and 80 years of age [106]. We restricted these analyses to human tissue regions with that could be reliably phase ordered using CYCLOPs [107] and matching tissues in mouse (aorta, heart atrium, liver, lung, brown adipose and white adipose).
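The two PSI-related steps above can be sketched as follows. `junction_psi` is a simplified stand-in for the MultiPath-PSI idea of scoring one junction against all junctions that overlap it; it is not AltAnalyze's implementation. `impute_missing_psi` reads the baboon rule as allowing at most two missing time-points, each filled with the mean of its two measured neighbors; that reading, and treating the first and last time-points as having no neighbor (no circular wrap-around), are our assumptions. Both function names are hypothetical.

```python
from typing import List, Optional, Sequence


def junction_psi(junction_reads: int, overlapping_reads: Sequence[int]) -> Optional[float]:
    """PSI of one junction relative to every junction overlapping it.

    Returns None when no reads support any of the junctions (a missing value).
    """
    total = junction_reads + sum(overlapping_reads)
    if total == 0:
        return None
    return junction_reads / total


def impute_missing_psi(psi: Sequence[Optional[float]],
                       max_missing: int = 2) -> Optional[List[Optional[float]]]:
    """Fill isolated missing PSI values with the mean of the adjacent time-points.

    Returns None (event discarded) if too many values are missing or a
    missing value lacks a measured neighbor on either side.
    """
    missing = [i for i, value in enumerate(psi) if value is None]
    if len(missing) > max_missing:
        return None
    filled = list(psi)
    for i in missing:
        if i == 0 or i == len(psi) - 1:
            return None  # no neighbor on one side (no wrap-around assumed)
        left, right = psi[i - 1], psi[i + 1]
        if left is None or right is None:
            return None
        filled[i] = (left + right) / 2
    return filled


# Usage: 30 reads on the junction of interest, 10 on the competing junction.
print(junction_psi(30, [10]))
series = [0.20, None, 0.40, 0.50, 0.45, 0.30, None, 0.25, 0.20, 0.30, 0.35, 0.40]
print(impute_missing_psi(series))
```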

A custom Python script was used to convert the junction counts and coordinates to the junction.bed format produced by TopHat and other transcriptome alignment tools, for direct processing in AltAnalyze. Supervised alternative splicing analyses were additionally performed, as previously described [104], for 473 RBP short-hairpin RNA knockdowns in HepG2 and K562 cell lines from ENCODE, 14 hematopoietic cell-type comparisons and 23 lymphoblast cell line RBP knockdowns. RBP knockdown differential splicing was assessed using MultiPath-PSI with the default empirical Bayes moderated t-test (p<0.05) and delta PSI > 0.1. TCGA lung adenocarcinoma (LUAD) and matched control exon-exon junction counts were obtained from the Broad Firehose database and provided as input to AltAnalyze for PSI quantification (Ensembl 72 database). LUAD and matched control samples from an independent Korean patient cohort were analyzed directly from FASTQ files, using the same workflow applied to the mouse and baboon samples for splicing quantification and analysis. LUAD alternative splicing events were determined using a paired t-test with Bonferroni multiple-testing correction and a delta PSI > 0.1. Since these splicing events were compared to human CYCLOPS predictions that lacked intron-retention quantification, intron-retention splicing events were excluded from the Venn diagram comparisons. For human and mouse analyses, Ensembl 72 definitions were used in AltAnalyze (EnsMart72 database); for baboon, Ensembl 96 definitions were used (EnsMart96). PSI violin plots were produced in GraphPad Prism, and all other plots in AltAnalyze, ggplot or Excel.
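The junction-count conversion could look like the sketch below, which writes one junction as a TopHat-style junctions.bed (BED12) line: two blocks flank the intron, the score column holds the read count, and the gap between the blocks is the intron. The input convention (1-based, inclusive intron start and end) and the fixed anchor length are assumptions for illustration; the GTEx junction identifiers and the custom script actually used here may differ, and `junction_to_bed12` is a hypothetical name.

```python
def junction_to_bed12(chrom: str, intron_start: int, intron_end: int,
                      strand: str, read_count: int, name: str,
                      anchor: int = 50) -> str:
    """Return a tab-separated BED12 line describing one splice junction.

    intron_start/intron_end: 1-based, inclusive first/last intronic base
    (assumed input convention). BED itself is 0-based and half-open.
    """
    intron_start0 = intron_start - 1   # 0-based first intronic base
    intron_end0 = intron_end           # 0-based, exclusive end of the intron
    chrom_start = max(0, intron_start0 - anchor)
    chrom_end = intron_end0 + anchor
    left_block = intron_start0 - chrom_start   # upstream exon anchor length
    right_offset = intron_end0 - chrom_start   # start of the downstream block
    fields = [chrom, chrom_start, chrom_end, name, read_count, strand,
              chrom_start, chrom_end, "255,0,0", 2,
              f"{left_block},{anchor}", f"0,{right_offset}"]
    return "\t".join(str(field) for field in fields)


print(junction_to_bed12("chr1", 101, 200, "+", 7, "JUNC00000001", anchor=10))
```

Keeping the intron as the gap between two blocks means any tool that reads BED12 blocks can recover both splice-sites directly from `chromStart`, `blockSizes` and `blockStarts`.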

2.1.4: Periodicity Analysis

The resulting RNA-Seq and array time-course data were analyzed using three independent periodicity detection workflows: MetaCycle (ARSER, JTK_CYCLE) [108], RAIN [103] and Harmonic Regression (HR) [102]. For datasets with multiple replicates, genes or spliced exons cycling in at least one replicate were used for comparisons. Because the datasets differed in time resolution, sequencing depth/quality and the extent of missing PSI values, different statistical significance cut-offs were pre-selected for different datasets.
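The replicate rule above, combined with dataset-specific cut-offs, could be expressed as in the sketch below. It assumes each replicate time-course yields per-feature p-values, computes Benjamini-Hochberg q-values (MetaCycle already reports these, so this is for illustration only), calls a feature rhythmic at q<0.05 for high-resolution data or p<0.05 otherwise, and takes the union across replicates. All function names and feature labels are hypothetical.

```python
from typing import Dict, List, Set


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted q-values, returned in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, keeping q-values monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        qvalues[i] = running_min
    return qvalues


def rhythmic_features(pvalues: Dict[str, float], high_resolution: bool,
                      alpha: float = 0.05) -> Set[str]:
    """Call rhythmic features using q-values (high resolution) or raw p-values."""
    names = list(pvalues)
    raw = [pvalues[name] for name in names]
    scores = benjamini_hochberg(raw) if high_resolution else raw
    return {name for name, score in zip(names, scores) if score < alpha}


def union_across_replicates(replicates: List[Dict[str, float]],
                            high_resolution: bool) -> Set[str]:
    """A feature counts as cycling if it is called in at least one replicate."""
    called: Set[str] = set()
    for pvalues in replicates:
        called |= rhythmic_features(pvalues, high_resolution)
    return called


rep1 = {"feature_A": 0.001, "feature_B": 0.2, "feature_C": 0.04}
rep2 = {"feature_A": 0.3, "feature_B": 0.01, "feature_C": 0.5}
print(sorted(union_across_replicates([rep1, rep2], high_resolution=True)))
```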

For the evaluation of circadian expressed genes (CEGs) from high-resolution time-course data (microarray) and the Atger liver RNA-Seq dataset, a MetaCycle-estimated Benjamini-Hochberg FDR (q-value) < 0.05 was considered significant. For low-resolution RNA-Seq analyses (six-hour resolution or low-depth single-end RNA-Seq), an alternative MetaCycle threshold (p<0.05) was applied to detect rhythmic gene expression or alternative splicing. The same thresholds were selected for the HR algorithm, which reports p and q values. As RAIN does not report an FDR-corrected test, we required a reported p<0.05 for both CEGs and CSEs. Because human GTEx samples have no information on the time of tissue collection, they were ordered in a pseudo-temporal manner using the machine learning algorithm Cyclic Ordering by Periodic Structure (CYCLOPS). CYCLOPS estimates a phase for each sample based on the expression of core clock genes (e.g., Clock and Bmal1). The pseudo-temporally ordered samples were then analyzed with cosinor regression (modified R package) to identify significantly rhythmic splicing events (p<0.05) [107]. Circadian phase for downstream analyses was obtained directly from the MetaCycle default output. When comparing CSEs to matching CEGs, phase differences greater than 4 hours were considered out-of-phase and those less than 4 hours in-phase.
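The in-phase/out-of-phase rule can be sketched as below. Because circadian phase lies on a 24-hour circle, the sketch measures the shorter distance around the clock (so peaks at 23h and 1h differ by 2h); that choice, and treating a difference of exactly 4h as in-phase, are our assumptions rather than details stated in the workflow. The function names are hypothetical.

```python
def phase_difference(phase_a: float, phase_b: float, period: float = 24.0) -> float:
    """Shortest distance between two peak times on a circular clock (hours)."""
    diff = abs(phase_a - phase_b) % period
    return min(diff, period - diff)


def classify_phase(cse_phase: float, ceg_phase: float, cutoff: float = 4.0) -> str:
    """Label a CSE relative to its matching CEG using a 4-hour cut-off."""
    if phase_difference(cse_phase, ceg_phase) > cutoff:
        return "out-of-phase"
    return "in-phase"


print(classify_phase(23.0, 1.0))  # peaks on either side of circadian time 0
print(classify_phase(2.0, 10.0))
```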

2.1.5: Conserved Splicing Event Analysis

Genomic positions of mouse CSEs (exon-exon or exon-intron junctions) were lifted over to human or baboon using the appropriate genome conversion chain files on the UCSC Genome Browser website (http://genome.ucsc.edu/cgi-bin/hgLiftOver). For each CSE, the primary exon-exon or exon-intron junction coordinates, in addition to the secondary (most frequently expressed alternative junction) coordinates, were compared between organisms; both splice-sites of at least one CSE junction were required to match for a splicing event to be considered a valid orthologous splicing event. Tissue specificity of each CSE was compared using a custom Python script, matching reasonably associated tissue regions between the two species (e.g., mouse cerebellum with any brain region in baboon) to broadly assess tissue conservation of circadian splicing (convertPSIConservedCoordinatesToBED function in AltAnalyze's clustering.py module).
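The orthology rule can be illustrated as follows, assuming the mouse junction coordinates have already been lifted over (for example with the UCSC liftOver tool) and are represented as (chromosome, strand, donor, acceptor) tuples. An event is accepted when both splice-sites of at least one of its junctions (primary or secondary) match a junction of the corresponding event in the other species. The tissue map is a hypothetical fragment, not the matching table used in this analysis, and the function names are illustrative.

```python
from typing import Dict, Iterable, Set, Tuple

# (chromosome, strand, donor splice-site, acceptor splice-site)
Junction = Tuple[str, str, int, int]

# Hypothetical fragment of a mouse -> baboon tissue correspondence table.
TISSUE_MATCH: Dict[str, Set[str]] = {
    "cerebellum": {"cerebellum", "pons", "prefrontal cortex"},
    "skeletal muscle": {"Muscle-gastrocnemius"},
}


def is_orthologous(lifted_junctions: Iterable[Junction],
                   target_junctions: Iterable[Junction]) -> bool:
    """True if both splice-sites of at least one lifted junction match exactly."""
    target = set(target_junctions)
    return any(junction in target for junction in lifted_junctions)


def tissue_conserved(mouse_tissues: Iterable[str], baboon_tissues: Iterable[str]) -> bool:
    """True if any mouse tissue with the CSE maps to a baboon tissue with the CSE."""
    baboon = set(baboon_tissues)
    return any(TISSUE_MATCH.get(tissue, set()) & baboon for tissue in mouse_tissues)


mouse_event = [("chr7", "+", 1000, 1500), ("chr7", "+", 1000, 1620)]
baboon_event = [("chr7", "+", 1000, 1620), ("chr7", "+", 980, 1500)]
print(is_orthologous(mouse_event, baboon_event), tissue_conserved(["cerebellum"], ["pons"]))
```

Requiring an exact match on both splice-sites, rather than overlap, prevents a lifted junction that shares only its donor (or only its acceptor) from being counted as the same event.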

Pathway Enrichment and Network Analyses

Pathway enrichment analyses for Gene Ontology (GO) terms were performed with GO-Elite in AltAnalyze [109]. Inferred regulatory and protein interaction networks were produced using the NetPerspective algorithm in AltAnalyze [110]. NetPerspective uses interactions from WikiPathways, KEGG and HMDB, experimentally derived transcription factor targets, annotated drug-protein interactions, microRNA target predictions and speculative protein interactions from BioGRID.
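As an illustration of how a GO term enrichment z-score can be computed, the sketch below uses the hypergeometric-approximation z-score introduced with MAPPFinder, from which GO-Elite descends; we have not verified that GO-Elite's implementation matches it exactly, so treat this as an assumption. Here N is the number of measured genes, R the number of circadian genes among them, n the number of measured genes annotated to the term and r the number of circadian genes annotated to it. The function name is hypothetical.

```python
import math


def term_z_score(r: int, n: int, R: int, N: int) -> float:
    """Over-representation z-score of r circadian genes among n term genes.

    Expected hits are n*R/N; the variance includes a finite-population
    correction (1 - (n - 1)/(N - 1)) for sampling without replacement.
    """
    if not (0 < n < N) or not (0 < R < N):
        raise ValueError("requires 0 < n < N and 0 < R < N")
    expected = n * R / N
    variance = n * (R / N) * (1 - R / N) * (1 - (n - 1) / (N - 1))
    return (r - expected) / math.sqrt(variance)


# 6 circadian genes among 20 term genes, with 10 circadian genes out of 100 measured.
print(round(term_z_score(r=6, n=20, R=10, N=100), 3))
```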

Data Availability

All analyzed datasets are publicly available as noted in the above sections, with all major intermediate result files provided in a dedicated web repository at https://www.synapse.org/#!Synapse:syn20833776/files/. These files include the direct output of MetaCycle, RAIN, Harmonic Regression, AltAnalyze and the cross-species genomic comparative analyses for the evaluated datasets.

2.4: RESULTS

2.4.1: Reliable detection of circadian splicing from time-course RNA-Seq data

As a relatively unbiased survey of circadian splicing, our study aimed to perform a primary survey in 12 murine tissues with both microarray and deep RNA-Seq, verify the use of specific algorithms against complementary RNA-Seq data (liver) and subsequently identify conserved mouse CSEs in both baboon and human (Fig. 2.2A). For baboon analyses, an existing collection of 64 tissues with 12 circadian time-points, profiled by shallow single-end RNA-Seq, provides a potential means to evaluate circadian splicing. Although not sufficiently sensitive for unbiased discovery of CSEs, these data would be expected to enable the detection of conserved CSEs present in at least one of many tissues. For these analyses, we quantified alternative splicing using a previously described, accurate local splicing variation approach (MultiPath-PSI), because it directly measures alternative splicing from overlapping exon-exon or exon-intron junctions rather than conflating such estimates with problematic isoform predictions [104]. Because this method is unbiased, it identifies known and novel alternative splicing and alternative promoter events that are conserved between species when junctions are compared at the genome coordinate level. As recently demonstrated, circadian gene regulation can be inferred from very large postmortem patient datasets in which the circadian phase of each sample is estimated from the expression of core circadian rhythm regulators, such as PER, CLOCK and BMAL1 [107]. In particular, the GTEx cohort, which contains thousands of postmortem samples for 35 tissues, enables pseudo-temporal ordering of samples to infer circadian gene regulation. Although these samples were deeply sequenced, donor and postmortem-interval effects, coupled with the uncertainty of circadian phase estimation, are likely to limit the specificity of such data for unsupervised detection of CSEs; nevertheless, these data should enable the reliable identification of conserved events.

In principle, circadian splicing can be detected from high-resolution (e.g., every two hours for 48 hours) or low-resolution time-course data (e.g., every 6 hours for 48 hours), from RNA-Seq or splicing-sensitive microarray data. A previously described, rich mammalian multi-tissue circadian time-course dataset includes both low- and high-resolution data from two potentially splicing-sensitive platforms, allowing both platform and resolution effects to be evaluated [44]. This dataset includes transcriptomic data collected from 12 mouse tissues, sampled every 6 hours by RNA-Seq and every 2 hours by DNA microarrays. Prior to rhythmic gene and splicing detection, we reprocessed all raw data using a single computational platform (AltAnalyze) to produce splicing and gene-expression estimates from the Affymetrix Gene 1.0 ST array and RNA-Seq platforms.

Given the lack of global gold-standard CSEs, we evaluated both the technical platform and time-scale resolution as variables using an independent, deeply sequenced, high-resolution time-course with biological replicates in liver (Atger et al. 2015) [98]. Since the Atger dataset only profiled liver, we used it as a comparative benchmark to determine whether the Zhang microarray dataset, with 24 samples per tissue, or the Zhang RNA-Seq dataset, with only 8 samples per tissue, should be used to define CSEs across tissues. While a number of periodicity prediction algorithms have been developed, they are principally used for gene-level analyses rather than alternative splicing. To assess the performance of different methods for circadian splicing, we compared multiple recently reported rhythmicity detection algorithms: 1) MetaCycle (ARSER, JTK_CYCLE), 2) Harmonic Regression (HR) and 3) RAIN [102, 103, 108]. Because the well-validated periodicity algorithm ARSER in MetaCycle, unlike HR and RAIN, is unable to account for biological replicates, the union of significant circadian features reported in separate replicates was considered for the Atger dataset. Although MetaCycle can apply multiple periodicity algorithms and the Lomb-Scargle algorithm is recommended for irregular time-course data and data with multiple replicate time series, we excluded Lomb-Scargle for a fair comparison, even though the Atger liver dataset has 4 independent time-courses. We find that for circadian gene expression, MetaCycle, HR and RAIN produce generally similar percentages of overlap between the three evaluated liver RNA-Seq and array datasets. RAIN predicted the most circadian genes and the highest number of overlapping genes (913 genes shared by all three circadian datasets), but also the lowest percentage of overlap (27% versus 56% with HR, comparing Atger to the Zhang microarray) (Fig. 2.2B, Fig. S1A). As previously noted, compared to the Zhang RNA-Seq, the Zhang microarray data offer more power to detect circadian gene expression differences due to increased time-resolution (2hr versus 6hr sampling over 48hr).

Figure 2.2. Circadian splicing analysis within and across diverse mammalian models. A) Study design overview. Principal tissues and species are indicated, with lines from each denoting tissues profiled in circadian time-course RNA-Seq studies (mouse and baboon) or computationally inferred (human). The varying circadian temporal resolution of the different datasets is indicated (e.g., 8 versus 12 time-points, over a 48- or 24-hour time-course, respectively). B) Comparative analysis (Venn diagram) of circadian gene expression in two mouse liver time-courses (Zhang et al. and Atger et al.) for three different rhythmicity detection algorithms (MetaCycle, Harmonic Regression (HR), RAIN) and two measurement types (RNA-Seq, Affymetrix Gene 1.0 ST). The numbers of shared circadian expressed genes are indicated. C) Equivalent comparisons for circadian alternative splicing events (exon-level) between platforms and datasets. These events are compared for shared alternative exons, rather than splice-junctions, to allow comparison with the gene array exon results. D) Comparison of phase ordering of mouse liver CSEs predicted in Atger et al. and viewed through Zhang et al. RNA-Seq for matching splicing events. E) Overlapping algorithm predictions within a single liver time-course (Atger) for circadian genes (left) and alternative splicing events (right).

When these rhythmicity algorithms were applied to circadian splicing predictions from all three datasets, the Atger RNA-Seq produced the largest number of CSEs, three times greater than the Zhang RNA-Seq (Fig. 2.2C). We again attribute this difference to the increased time-resolution of the Atger dataset, as both the Atger and Zhang RNA-Seq datasets used deep paired-end sequencing. Importantly, these overlaps were not random, as the predicted circadian phase distribution of splicing events was similar between the Atger and Zhang predictions (Fig. 2.2D). In the Atger dataset, RAIN, HR and MetaCycle identified comparable numbers of overlapping splicing events, similar to circadian gene expression (Fig. 2.2E, Fig. S1B). From these analyses, the most striking observation is the low overlap between alternatively spliced exons identified with the Zhang Gene 1.0 ST arrays and those from the Atger dataset. Specifically, while 27% of CSEs from RNA-Seq analyses could be confirmed between Atger and Zhang, less than 2% (35 rhythmically spliced exons) could be confirmed when comparing the Zhang array to Atger RNA-Seq predictions with MetaCycle. We attribute this low overlap to the restricted exon-region repertoire of this microarray, which in this setting appears to identify unsupported events and to have poor sensitivity.

To test this interpretation, we compared all heart samples to cerebellum using either the array or RNA-Seq samples (a non-circadian comparison). Only 6% of the tissue-specific alternative exons from the array analysis could be confirmed by RNA-Seq when considering all exons, which increased to 28% when only exons with alternative cassette-exon annotations were compared (Fig. S1C). By comparison, circadian gene expression predictions were more similar between the Atger RNA-Seq and Zhang array datasets than between the Zhang and Atger RNA-Seq datasets, suggesting improved detection of circadian gene expression with greater time resolution. Hence, we find that increased time-resolution (2hr versus 6hr sampling) improved the detection of mutually confirmed circadian expressed genes (CEGs) by over 3-fold (Zhang array versus RNA-Seq) and CSEs by over 4-fold (Atger RNA-Seq versus Zhang RNA-Seq). While MetaCycle and HR had a similar extent of overlapping splicing events in Zhang and Atger, it is notable that MetaCycle identified more previously validated CSEs (qPCR) from an independent study (6 out of 17 CSEs) (Fig. S1D). Given these results, we proceeded with MetaCycle analyses of the Zhang RNA-Seq for splicing and the Zhang gene arrays for CEG detection.


2.4.2: Distinct tissue and phase-dependent impacts of circadian splicing

To understand the global impact of circadian splicing at a systems level, we compared its occurrence and specificity with that of circadian gene expression among tissues. Application of MetaCycle to all 12 mouse tissue regions identified evidence of circadian splicing in 3,971 genes, corresponding to 8,510 splicing events (5,902 event-clusters). As previously demonstrated, genes essential for proper function of the central clock are themselves rhythmic in most tissues at the level of gene transcription (e.g., Per1, Per2, Cry1). When we performed the same analysis for circadian splicing, we found that one third of all genes essential for maintaining circadian rhythm (Mouse Phenotype Ontology) or known to regulate circadian gene expression (Gene Ontology) are predicted to undergo circadian splicing in at least one tissue, including Per1, Per2, Cry1, Cry2, Arntl and Prmt5 (46 out of 128) (Table S1). Interestingly, rather than a unified mechanism of alternative splicing for these core regulators, distinct modes of alternative exon regulation were observed. In the case of Per2 (cerebellum, adrenal), a novel alternative 3' splice-site (4bp from the defined splice acceptor) in the third-to-last exon is predicted to result in nonsense-mediated decay (NMD) of the transcript at two of the examined time-points, accounting for ~15% of expressed isoforms (based on reciprocal junction expression). Per1 (brown fat (BFAT), cerebellum) and Cry1 (lung) were found to undergo alternative promoter regulation, Cry2 (cerebellum) and Per1 (adrenal) showed a variable extent of intron retention, and Arntl (hypothalamus) showed alternative cassette-exon inclusion (Fig. 2.3A, Fig. S2A). In general, however, the global distribution of splicing-event types was similar between tissue regions, with cassette-exon alternative splicing representing the dominant mode of regulation (43-59% per tissue) (Fig. 2.3B).
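The Per2 example rests on two pieces of reasoning that can be made explicit: a 4 bp shift of the 3' splice-site is not a multiple of three, so it changes the reading frame, and a frameshift that creates a premature termination codon (PTC) well upstream of the last exon-exon junction is expected to trigger NMD. The sketch below encodes the commonly cited ~50-nucleotide rule; it is a simplified heuristic for illustration, not AltAnalyze's protein-impact prediction, and the sequence and positions used are toy values. All function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def shifts_frame(splice_site_shift_bp: int) -> bool:
    """A splice-site shift that is not a multiple of 3 changes the reading frame."""
    return splice_site_shift_bp % 3 != 0


def first_stop(cds: str, frame_start: int = 0) -> int:
    """Index of the first in-frame stop codon at or after frame_start, or -1."""
    for i in range(frame_start, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i
    return -1


def predicts_nmd(ptc_position: int, last_junction_position: int,
                 distance_nt: int = 50) -> bool:
    """~50-nt rule: a PTC more than ~50 nt upstream of the last exon-exon
    junction (both in transcript coordinates) is predicted to trigger NMD."""
    return last_junction_position - ptc_position > distance_nt


print(shifts_frame(4), shifts_frame(3))
print(first_stop("ATGAAATAGCCC"))
```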

Consistent with the original mouse study by Zhang et al., our analysis of all 12 mouse tissues indicates the same rank ordering of tissues for circadian gene-expression impacts, with the highest number of CEGs in liver, kidney and lung and the lowest in the brain (cerebellum, hypothalamus and brainstem) (Fig. 2.3C). However, unlike circadian gene expression, the tissues with the lowest extent of circadian gene expression (neuronal) had the highest extent of circadian splicing, with the lowest extent of circadian splicing in BFAT and liver. While splice-isoform diversity has previously been shown to be greatest in neuronal tissues [111], it is not clear whether this inherent increase in isoform diversity accounts for the increased number of brain CSEs. Furthermore, while the large majority of CEGs were shared among tissue regions (75%), less than half of the CSEs were present in more than one tissue (37%) and less than 3% in 5 or more tissues (Fig. 2.3D).

CSEs in genes associated with RNA binding were among those most frequently found across the tissues examined (Table 1, Table S2).
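The tissue-sharing percentages above are simple counts over the tissues in which each feature is called rhythmic. A minimal sketch, assuming a hypothetical mapping from each CSE (or CEG) identifier to the set of tissues in which it is significant; the event labels below are placeholders, not results:

```python
from typing import Dict, Set


def sharing_fractions(calls: Dict[str, Set[str]], many: int = 5) -> Dict[str, float]:
    """Fraction of features rhythmic in more than one tissue and in >= `many` tissues."""
    total = len(calls)
    multi = sum(1 for tissues in calls.values() if len(tissues) > 1)
    broad = sum(1 for tissues in calls.values() if len(tissues) >= many)
    return {"more_than_one": multi / total, f"at_least_{many}": broad / total}


example = {
    "event_1": {"liver"},
    "event_2": {"liver", "kidney"},
    "event_3": {"cerebellum", "brainstem", "hypothalamus", "lung", "heart"},
    "event_4": {"adrenal"},
}
print(sharing_fractions(example))
```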


Figure 2.3. Circadian splicing impacts distinct tissues and augments circadian gene expression. A) SashimiPlot of rhythmic splicing of core circadian regulatory genes (Per1, Per2) from mouse adrenal gland. Curved lines indicate exon-exon splice-junctions, with read counts indicated at selected circadian time-points. Yellow indicates the predicted circadian alternative exon regions. The corresponding annotated gene-exon structures of representative isoforms are shown below the plot, along with the AltAnalyze annotated exons. B) Global distribution of CSEs by alternative-exon/intron event-type (AltAnalyze) in each evaluated tissue. C) Number of CSEs and CEGs for all analyzed mouse tissues, ranked in decreasing order of specificity for circadian splicing versus gene expression (MetaCycle). D) Tissue-specificity of the observed CSEs versus CEGs, considering splicing events or genes that are rhythmic in one or more than one tissue. E) CSEs broken down into those with one alternative exon (grey), two to three alternative exons (green) or more than three alternative exons. F) Overlap of CSEs and CEGs. Liver and kidney have the largest percentage of CSEs occurring in genes that also show evidence of circadian gene expression regulation. G) Probability density of all observed CSEs and CEGs (overlapping and not overlapping) at distinct circadian temporal phases (0-24hr circadian time). H) Comparison of the phase of each coincident CEG and CSE. CSEs are divided into those considered in-phase (<4hr peak difference) with their corresponding CEG and out-of-phase (>4hr peak difference). Out-of-phase CSEs may represent inclusion or exclusion junctions; hence, the pattern can indicate synergistic or antagonistic effects. I) Example CSEs that induce NMD and their phase relationship to their own gene expression. The PSI values for the indicated splicing event are shown in red, along with whether that splicing event indicates the NMD or coding isoform. J) SashimiPlot of the CSEs from panel I.

Symbol         Alt-Exons     Protein Prediction  # CSE Tissues  # CEG Tissues
Pisd-ps1       E2.1          UNK                 10             0
Slmap          E11.1|E12.1   alt-coding          10             0
Ktn1           E39.1         alt-N-terminus      10             0
Cd200          E3.4          NMD                 9              8
6720401G13Rik  E9.1          alt-N-terminus      9              3
Tardbp         novel         retained_intron     9              3
Eif4h          E5.2          UNK                 9              0
Hnrnpc         E2.7          5'UTR               8              3
Fubp1          E3.1          truncated           8              4
5830417I10Rik  E6.1          UNK                 8              1
Hnrpdl         E6.8          UNK                 8              1
Ubp1           E9.1          alt-coding          8              0
App            E8.1          alt-coding          8              0
Immt           E5.7          alt-coding          8              0
Hnrnph1        E12.1         alt-N-terminus      7              3
Phldb1         E21.1         alt-N-terminus      7              4
Hnrnpa1        E8.1          alt-coding          7              5
Srsf5          I6.1          retained_intron     7              4
Gtf2i          E17.2         alt-coding          7              1
Taz            E4.16         UNK                 7              2
Tpm3           E9.2          alt-C-terminus      7              2
Tra2a          I2.1          retained_intron     7              5
Srsf10         E3.1          alt-N-terminus      7              2
Prrc2c         E36.1         alt-C-terminus      7              3
Tnxb           E18.1|E19.1   alt-coding          7              3
Fip1l1         E11.1         alt-coding          7              1
Tmpo           E7.1|E8.1     alt-coding          7              2
Mat2a          I8.1          retained_intron     7              3
2210408F21Rik  E8.1          alt-N-terminus      7              0
Gm3435         I6.1          retained_intron     7              0

Table 1. The most common cross-tissue murine CSEs

It has been previously speculated that tissue-specific regulation of circadian splicing may augment the phase and amplitude of circadian transcripts, as a means of transcriptional fine-tuning of circadian rhythm [76]. Additionally, circadian-regulated genes have been shown to express more known mRNA isoforms than non-circadian genes [44]. While short-read RNA-Seq does not directly quantify isoform expression, we find that 14-24% of the rhythmically spliced genes across the mouse tissues have at least 2 alternative exons oscillating in a temporal fashion, with cerebellum showing the highest relative number of exons per CSE gene (24% of CSEs) (Fig. 2.3E). While liver and kidney have a relatively high percentage of genes that were jointly alternatively spliced and differentially expressed in a circadian manner (43% and 34%), the majority of genes with CSEs did not overlap with CEGs (Fig. 2.3F). Hence, in most tissues, CSEs are not shared with CEGs, with liver being a notable exception.

Phase (acrophase) is the time at which peak expression of a CEG is observed. CEG phase shifts have been reported in different pathological conditions, implicating phase regulation as a crucial component of a functional clock [112, 113]. Importantly, we find that CSEs and CEGs by and large have distinct peak phase distributions (Fig. 2.3G). However, both CSEs and CEGs peak most frequently around the same times of day, particularly shortly before subjective dawn and dusk. We observe this pattern in all tissues except heart, kidney and liver. These three organs have two small peaks during the day, suggesting that their physiological functions could be entrained multiple times by other systemic inputs such as feeding. Notably, when CEGs that also undergo circadian splicing are compared for phase differences, 60-83% of these CSEs differ by more than 4hr from the peak time of circadian expression, which may be due to differences in the coupling of rhythmic synthesis and degradation of mRNA (Fig. 2.3H). Many such in-phase and out-of-phase CSEs are predicted to induce NMD when circadian gene expression is lowest, potentially enhancing rather than offsetting circadian gene expression (Fig. 2.3I,J). We specifically illustrate Cirbp here, as it is the most frequent cross-tissue RBP CEG, present in 11 out of 12 tissues. These data support a model in which circadian splicing variation primarily impacts cassette-exon splicing, affecting multiple isoforms per tissue, which in turn favors fine-tuning of gene expression, particularly in neuronal cells with inherently greater isoform complexity.

2.4.3: Circadian splicing regulates distinct pathways from circadian gene expression

To understand the biological impact of diverse tissue CSEs, we performed a Gene Ontology enrichment analysis in each tissue region and compared the results to CEG-enriched terms (Fig. 2.4A,B). These analyses highlight predominantly distinct biological classes of circadian regulated genes, with few terms shared between CSEs and CEGs (e.g., transcriptional regulation, actin cytoskeleton, synapse). While unfolded protein response genes are unique to CEGs, CSEs are commonly enriched in pathways mediating splicing control, mRNA binding and chromatin regulation. We find that many RBP genes are expressed in a circadian manner across most mouse tissues, including those previously implicated in thermoregulation (Cirbp, Rbm3 and Hspa8), with many of these rhythmically spliced in multiple tissues (Table 1, 2). Among these, Cirbp was previously shown to be alternatively spliced in response to temperature changes [114].


Figure 2.4. Circadian splicing selectively impacts spliceosome and cancer regulators. A, B) Bubble chart of enriched Gene Ontology terms for CEGs (A) and CSEs (B). Increasing bubble size corresponds to increasing GO-Elite enrichment z-scores. Blue terms are shared between CSEs and CEGs; red terms are unique to CSEs or CEGs. C, D) Correlation heatmap of the gene expression profiles of the most frequent common-tissue RNA binding proteins (RBPs) against either the top common-tissue CSEs in cerebellum (C) or the corresponding RBP CSEs in the indicated tissue comparisons (D). In panel D, the gene expression vector across the analyzed time-points for each indicated tissue is compared to the PSI vector in the same tissue or a different tissue based on the matching time-points, as each gene and each splicing event are not necessarily circadian regulated in the same tissues. E) Protein-protein interactions between core cancer (Myc) and circadian gene networks and common-tissue CSEs (found in >4 tissues). Core circadian genes (green fill), Myc (red fill), common cross-tissue CSEs (sky blue fill) and shared interactors (grey fill) are shown. RNA-binding proteins with putative roles in splicing regulation are indicated with purple labels. Red edges = Myc interactions; green edges = core circadian gene interactions. Red node outlines indicate genes with known cancer-implicated alternative splicing events. Green node outlines indicate annotated regulators of circadian rhythm (CR).

As an orthogonal analysis of circadian splicing regulation, we analyzed cross-tissue CSEs for statistical enrichment of previously defined splicing factor binding sites. For this purpose, we used a previously described cross-species CLIP-seq RNA target database, POSTAR 2.0 [115]. Analysis of the top cross-tissue splicing events, oscillating in 5 or more tissues, identified enriched motifs for RBPs involved in thermoregulation and cancer therapy response (Cirbp, Rbm3) [116] and neurological disease (Tardbp and Fus) [117-119] (Table S3). Notably, all of these RBPs were also found to be circadian regulated at the level of gene expression across tissues, and in particular in brain (Fig. S3A, B). As a means to identify potential target relationships of these RBPs, we directly compared their gene expression to the PSI profiles of the top cross-tissue CSEs in cerebellum (Fig. 2.4C). Notably, Cirbp and Fus expression is correlated with the majority of the cross-tissue CSEs in almost all tissues, followed by Hspa8, Mbnl2 and Rbm3. In general, CSEs and circadian RBPs segregated into two populations, suggesting that they are coregulated. The same pattern is observed when each RBP is compared directly to its own CSEs in a tissue-specific manner, suggestive of autoregulatory splicing feedback of these RBPs with increasing or decreasing gene expression (Fig. 2.4D). To identify possible connections to the core circadian control machinery and core cancer networks, we constructed a gene-gene and transcription factor-target interaction network composed of common-tissue mouse CSEs, core circadian regulators and Myc interactions (Fig. 2.4E). Several of these genes are previously described regulators of circadian rhythm (Cirbp, Rbm3, Immt), interact with these components (Csnk1 and Ddx17), have known cancer-associated splicing events or share protein interactions with Myc (Hnrnpc, Ddx17, App, Sqstm1, Max).
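The expression-versus-PSI comparison in Fig. 2.4C can be sketched as a correlation between each RBP's expression vector and each CSE's PSI vector over matched time-points. The text does not state the correlation metric, so Pearson correlation is an assumption here, and the RBP and event labels in the usage lines are placeholders rather than results. A constant profile has zero variance and would raise a division error, so such features would need to be filtered first.

```python
import math
from typing import Dict, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length profiles (matched time-points)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rbp_psi_correlations(expression: Dict[str, Sequence[float]],
                         psi: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], float]:
    """Correlate every RBP expression profile with every CSE PSI profile."""
    return {(rbp, event): pearson(expr, values)
            for rbp, expr in expression.items()
            for event, values in psi.items()}


expression = {"RBP_A": [1.0, 2.0, 3.0, 2.0]}
psi = {"event_X": [0.1, 0.2, 0.3, 0.2], "event_Y": [0.3, 0.2, 0.1, 0.2]}
print(rbp_psi_correlations(expression, psi))
```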

Circadian splicing factor   Mm (12 tissue regions)   Baboon (63 tissue regions)
Cirbp                       11                       13
Snw1                        9                        11
Rbm45                       9                        18
Fus                         9                        15
Rbm3                        8                        15
Srsf5                       7                        15
Srsf1                       6                        7
Rbm10                       4                        24
U2af2                       2                        22

Table 2. Conserved cross-tissue transcriptionally regulated circadian splicing factors.

2.4.4: Conserved circadian splicing from mouse to primates

Our finding that circadian splicing has a pattern of tissue specificity distinct from that of circadian gene expression raises the question of whether such observations are biologically valid or simply an artifact of low-resolution time-course data. To determine whether these findings are supported in an independent circadian time-course, we applied the same computational workflow to RNA-Seq data collected at 2hr intervals for 24hrs across 64 baboon tissues. While this shallowly sequenced dataset does not offer high sensitivity to detect CSEs, we used these data to confirm our global and event-level findings. As no prior alternative splicing analyses have been performed in this non-human primate, we first defined global gene and splicing differences across all tissues. A marker gene analysis (MarkerFinder algorithm in AltAnalyze) found that tissue-specific markers are consistently expressed between tissue replicate samples in baboon, with shared gene markers found more often in related lineages (Fig. 2.5A). A similar pattern was observed for tissue-specific splicing, which, relative to gene expression, provided improved tissue-specific markers in most regions, in particular the brain (Fig. 2.5B, Table S4). For example, one of the most specific splicing events for baboon cerebellum was found in the gene CADPS2, which encodes a cerebellum-specific isoform that is mis-spliced in some autism cases due to an exon deletion in the gene [120] (Fig. S4A,B).

Using MetaCycle, we identified 6,622 unique rhythmically oscillating events across all baboon tissues. While thyroid had the greatest number of both CSEs and CEGs, bone marrow and lymph nodes possessed the greatest circadian splicing relative to gene expression, with spleen, kidney and specific brain regions possessing the lowest (Fig. S4C). Other brain regions showed a high occurrence of circadian splicing relative to gene expression, in particular the amygdala, olfactory bulb, pineal gland and cerebellum (all among the top 10 ranked CSE tissues). All of these brain regions relay daily sleep-wake rhythm information to the SCN via distinct pathways. In contrast, brain regions such as the paraventricular nuclei (PVN), putamen, pons and prefrontal cortex had the lowest circadian splicing to gene expression ratios. Circadian splicing was largely tissue-specific, with a relatively small percentage of CSEs shared across tissues (3.7% in 5 or more tissue regions and 34% in at least two regions), comparable to the results from mouse. These results support our finding in mice that circadian splicing is biased towards specific neuronal regions and is relatively low in kidney and liver (Fig. 2.5C).
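The two summary statistics used above, the fraction of CSEs shared by at least k tissues and the per-tissue ratio of CSEs to CEGs, reduce to simple counting. The sketch below assumes rhythmicity calls are available as a mapping from a (hypothetical) event ID to the set of tissues in which that event was called circadian; the input structures and toy numbers are illustrative only.

```python
def sharing_fractions(event_tissues, thresholds=(2, 5)):
    """Fraction of events called rhythmic in at least k tissues, for each k."""
    n = len(event_tissues)
    return {k: sum(1 for ts in event_tissues.values() if len(ts) >= k) / n
            for k in thresholds}


def rank_cse_to_ceg(cse_counts, ceg_counts):
    """Rank tissues by CSEs per CEG; tissues without any CEGs are skipped."""
    ratios = {t: cse_counts[t] / ceg_counts[t]
              for t in cse_counts if ceg_counts.get(t, 0) > 0}
    return sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy inputs
events = {
    "ev1": {"Thyroid"},
    "ev2": {"Thyroid", "Lung"},
    "ev3": {"Amygdala", "Pons", "Liver", "Kidney", "Spleen"},
    "ev4": {"Lung"},
}
print(sharing_fractions(events))  # {2: 0.5, 5: 0.25}
print(rank_cse_to_ceg({"Bone-marrow": 300, "Kidney": 50},
                      {"Bone-marrow": 100, "Kidney": 100}))
# [('Bone-marrow', 3.0), ('Kidney', 0.5)]
```

A ratio rather than a raw count is what allows a tissue with few rhythmic genes overall to still rank highly for circadian splicing.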


Figure 2.5. Circadian splicing is conserved across species and tissues. To confirm that consistent alternative splicing events can be identified in baboon, unique tissue-specific genes (A) and alternative splicing events (B) are indicated for 64 baboon (Papio anubis) dissected tissue regions, using the MarkerFinder algorithm in AltAnalyze. C) Relative magnitude of CSEs versus circadian expressed genes (CEGs) in baboon and mouse for the best-matched tissues. D) An example conserved baboon-mouse CSE is shown for the gene TCERG1, by matching exon-exon junction genomic coordinates between species (UCSC genome browser liftover). E) All conserved CSEs (n=796) across all 64 analyzed baboon tissues, with CSEs matching between equivalent or adjacent tissue regions (e.g., mouse brain stem and baboon pons) indicated as “tissue-conserved”. CSEs conserved across multiple mouse and baboon tissues are considered “multi-tissue conserved”. F) Protein-protein and protein-DNA interaction network (NetPerspective in AltAnalyze) of tissue-conserved CSEs. Background color indicates manually curated tissue, cancer or common biological trends, identified through gene-set enrichment (ToppFun). Red arrows indicate predicted transcriptional regulatory interactions, while grey arrows indicate gene activation (WikiPathways, KEGG).

To identify splicing events and cross-tissue circadian networks that are likely indicative of conserved functions across mammals, we aligned genomic coordinates from these 64 primate tissue regions to those observed in mouse. Surprisingly, 29% of baboon CSEs were conserved in mouse (n=1,900). Of these, 25% occurred in similar tissue types (e.g., Muscle-gastrocnemius in baboon and skeletal muscle in mouse; n=473), and 6% were present in multiple matched baboon and mouse tissues (n=116, cross-tissue conserved) (Fig. 2.5D,E, Table 3, Table S5). Similar to mouse, these cross-tissue conserved CSEs were most highly enriched in mRNA processing, alternative splicing and notably neurodegenerative disease (e.g., synapse formation and neurofibrillary tangles: MAPT, APP, HNRNPA1, FUS), thermoregulation (CIRBP, RBM3) and cancer/splicing-promoting gene networks (implicating PRMT5/WDR77) [121].
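The exact-coordinate matching described above can be sketched as follows. The sketch assumes baboon junctions have already been converted to mouse coordinates (for example with the UCSC liftOver tool) and that a curated baboon-to-mouse tissue equivalence table exists. Requiring identical chromosome, start, end and strand mirrors the exact-junction criterion stated in the Discussion. The `Junction` type, the `tissue_map` entries, the coordinates, and the rule that "multi-tissue conserved" means at least two shared equivalent tissues are all assumptions for illustration.

```python
from typing import Dict, NamedTuple, Set, Tuple


class Junction(NamedTuple):
    """An exon-exon (or exon-intron) junction in one genome assembly."""
    chrom: str
    start: int
    end: int
    strand: str


def match_conserved(lifted_baboon: Dict[Junction, Set[str]],
                    mouse: Dict[Junction, Set[str]],
                    tissue_map: Dict[str, str]
                    ) -> Tuple[Set[Junction], Set[Junction], Set[Junction]]:
    """Classify baboon CSE junctions (already in mouse coordinates) that
    exactly match a mouse CSE junction.

    Returns (conserved, tissue_conserved, multi_tissue_conserved)."""
    conserved = set(lifted_baboon) & set(mouse)
    tissue_conserved, multi = set(), set()
    for j in conserved:
        # Mouse tissues equivalent to the baboon tissues where the event cycles
        mapped = {tissue_map[t] for t in lifted_baboon[j] if t in tissue_map}
        shared = mapped & mouse[j]
        if shared:
            tissue_conserved.add(j)
        if len(shared) >= 2:
            multi.add(j)
    return conserved, tissue_conserved, multi


# Hypothetical toy junctions and tissue equivalences
j1 = Junction("chr1", 100, 200, "+")
j2 = Junction("chr2", 300, 400, "-")
j3 = Junction("chr3", 10, 20, "+")
baboon = {j1: {"Muscle-gastrocnemius", "Pons"}, j2: {"Liver"}, j3: {"Heart"}}
mouse = {j1: {"Skeletal muscle", "Brain stem"}, j2: {"Kidney"}}
tissue_map = {"Muscle-gastrocnemius": "Skeletal muscle", "Pons": "Brain stem",
              "Liver": "Liver", "Heart": "Heart"}
conserved, tissue_conserved, multi = match_conserved(baboon, mouse, tissue_map)
print(len(conserved), len(tissue_conserved), len(multi))  # 2 1 1
```

Exact matching trades sensitivity for specificity: a junction shifted by even one base after liftover is not counted.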

Further, many of the same RBPs that were rhythmically expressed across mouse tissues were also rhythmically expressed across baboon tissues (RBM45, CIRBP, RBM3, FUS, SRSF5, SNW1, FIP1L1). In addition, CSEs in many of these same RBPs were observed in baboon (CIRBP, RBM10, HNRNPA1, FUS, SRSF5, SRSF3, U2AF2, HNRNPD, PRPF6 and RBM42), suggesting a more complex mode of gene regulation. Network analysis of conserved multi-tissue CSEs indicates that CSEs tend to form RNA-binding and neuronal disease regulatory hubs, with core and tissue-specific splicing networks as the principal protein-interaction hubs (Fig. 2.5F).
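The network in Fig. 2.5F was built with NetPerspective from protein-protein and protein-DNA interactions. The sketch below only illustrates one common way to pick out hubs once such an edge list exists, by ranking genes by their number of distinct interaction partners. Degree as the hub criterion, the `min_degree` cutoff and the gene labels are assumptions for illustration, not the procedure used for the figure.

```python
from collections import Counter


def rank_hubs(edges, min_degree=3):
    """Rank nodes of an undirected interaction network by degree.

    Duplicate and reversed edges are counted once; self-loops are ignored."""
    unique = {frozenset((a, b)) for a, b in edges if a != b}
    degree = Counter(node for pair in unique for node in pair)
    return [(node, d) for node, d in degree.most_common() if d >= min_degree]


# Hypothetical edge list (gene_B -> gene_A duplicates gene_A -> gene_B)
edges = [("gene_A", "gene_B"), ("gene_A", "gene_C"), ("gene_A", "gene_D"),
         ("gene_C", "gene_B"), ("gene_B", "gene_A")]
print(rank_hubs(edges))  # [('gene_A', 3)]
```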

Columns: gene symbol; baboon exon region; predicted protein impact; # Pn = number of baboon CSE tissues; # Mm = number of murine CSE tissues; # Match = number of matching murine and baboon CSE tissues; matching tissues listed as murine: baboon.

Symbol   Exon region  Impact           # Pn  # Mm  # Match  Matching tissues (murine: baboon)
HNRNPA1  E7.1|E8.1    alt-coding       20    7     5        Heart, Hypo, Cere, Adrenal, Kidney: Heart, Medial-globus-pallidus, Adrenal-cortex, Optic-nerve-head, Kidney-cortex, Prefrontal-cortex
MAT2A    I8.1         retained-intron  10    6     4        Heart, BFAT, Cere, BS: Heart, Pineal-gland, Dorsomedial-hypothalamus, Omental-fat
RBM3     I3.1         retained-intron  13    6     4        WFAT, BFAT, Hypo, Cere: White-adipose-pericardial, Ventromedial-hypothalamus
APP      E13.1        alt-coding       10    6     4        WFAT, Hypo, BFAT, Kidney: Omental-fat, Pons, Kidney-medulla
BAG6     E24.1        alt-coding       14    5     4        Adrenal, Cere, Hypo, BS: Lateral-globus-pallidus, Adrenal-medulla, Habenula, Amygdala, Adrenal-cortex
CSNK1A1  E6.1         alt-coding       7     6     3        WFAT, BFAT, BS: Cerebellum, White-adipose-mesenteric, White-adipose-subcutaneous
UBTF     E9.1         alt-coding       5     3     3        Hypo, Liver, BS: Liver, Optic-nerve-head
TIAL1    E7.1         NMD              6     4     3        Hypo, Cere, BS: Habenula
RPS24    E6.1         alt-C-terminus   8     3     3        Adrenal, Cere, Liver: Putamen, Liver, Amygdala, Adrenal-cortex
SPTAN1   E56.1        alt-coding       3     4     3        Hypo, BFAT, BS: Lateral-globus-pallidus, Omental-fat
SHTN1    E17.1        alt-C-terminus   2     3     3        Hypo, Cere, BS: Medial-globus-pallidus, Optic-nerve-head
SMARCA2  E32.1        alt-coding       2     3     3        WFAT, BFAT, BS: Cerebellum, Omental-fat
SRSF5    I8.1         retained-intron  24    5     3        Hypo, BFAT, BS: Ventromedial-hypothalamus, Paraventricular-nuclei, Pons, Pineal-gland, Dorsomedial-hypothalamus, Omental-fat, Amygdala, Prefrontal-cortex
TCERG1   E23.1        NMD              7     6     3        WFAT, Cere, BS: White-adipose-pericardial, Optic-nerve-head
PCBP2    E11.1        alt-coding       6     6     3        Hypo, Cere, Lung: Cerebellum, Habenula, Lungs, Pons
CAMK2G   E19.2        alt-coding       4     4     3        Adrenal, Cere, Hypo: Habenula, Adrenal-cortex

Table 3. Top ranked conserved cross-tissue murine-baboon CSEs.

2.4.5: Circadian splicing mediates splicing factor autoregulation and is a hallmark of lung cancer

In light of our findings that CSEs are conserved across species, we next investigated whether these findings were conserved in human and specifically involved in cancer. To gain additional support for cross-species conserved CSEs, we extended our comparative analysis to 6 human tissues, with hundreds of evaluated samples produced by the GTEx consortium. As the time of death of these postmortem RNA-Seq samples is unknown, we relied on an existing predictive computational approach, CYCLOPS [107], to infer the circadian time of day of each sample within a 24h period. CYCLOPS is a machine learning algorithm that predicts the pseudo-temporal ordering of tissue samples based on the oscillatory expression and relative phase of known circadian genes, whose paired expression forms an ellipse in 24h temporal space. Rhythmicity in the ordered samples was then assessed with cosinor regression rather than MetaCycle, due to the irregular spacing of time points in the pseudo-temporally ordered data [122]. Analysis of six mouse-matched tissues with sufficient CYCLOPS confidence for temporal ordering identified a total of 5,403 unique exon-exon junction clusters (12,138 CSEs) in 3,819 genes (Table S6). Genomic liftover of these splicing events to mouse again confirmed conservation of CSEs in splicing factors, also observed in baboon (Table S7). Several of the cross-species RBP CSEs were observed in multiple human, mouse and baboon tissues, including splicing factors (SRSF5, SRSF3) and regulators of polyadenylation (FIP1L1) and of alternative C-terminal exon inclusion (TCERG1). Among these six tissues, the fraction of murine-conserved CSEs ranged from 6% to 26% when only considering CSEs occurring in the same mouse and human tissues (Fig. 2.6A).
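Cosinor regression is ordinary least squares on cosine and sine terms of a fixed period, so it accepts the irregularly spaced pseudo-times that a sample-ordering method produces. The sketch below fits a single-component, 24h cosinor model to one event's PSI values and reports mesor, amplitude, peak time and R². It is an illustration under those assumptions: it omits the significance test, the function names and PSI values are hypothetical, and it is not the implementation used in this chapter [122].

```python
import math


def _solve3(a, b):
    """Solve a 3x3 linear system a x = b by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def cosinor(times_h, values, period=24.0):
    """Fit y(t) = M + beta*cos(w*t) + gamma*sin(w*t), with w = 2*pi/period.

    Time points may be irregularly spaced. Returns the mesor M, amplitude
    sqrt(beta^2 + gamma^2), peak time (acrophase, in hours) and R^2."""
    w = 2 * math.pi / period
    X = [[1.0, math.cos(w * t), math.sin(w * t)] for t in times_h]
    # Normal equations: (X^T X) coef = X^T y
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(X, values)) for i in range(3)]
    mesor, beta, gamma = _solve3(xtx, xty)
    fitted = [mesor + beta * c + gamma * s for _, c, s in X]
    mean_y = sum(values) / len(values)
    ss_tot = sum((y - mean_y) ** 2 for y in values)
    ss_res = sum((y - f) ** 2 for y, f in zip(values, fitted))
    return {
        "mesor": mesor,
        "amplitude": math.hypot(beta, gamma),
        "peak_h": (math.atan2(gamma, beta) / w) % period,
        "r2": 1 - ss_res / ss_tot if ss_tot > 0 else 0.0,
    }


# Hypothetical PSI values at irregular pseudo-times (hours), peaking at 6h
times = [0.5, 3.1, 7.9, 11.2, 14.8, 19.3, 22.6]
psi_values = [0.4 + 0.2 * math.cos(2 * math.pi * (t - 6) / 24) for t in times]
fit = cosinor(times, psi_values)
print({k: round(v, 3) for k, v in fit.items()})
```

Because the period is fixed, the model stays linear in its three coefficients; the peak time follows from writing A cos(w(t - phi)) as beta cos(wt) + gamma sin(wt).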

Comparing the human-mouse conservation results with the mouse-baboon results, we find high concordance with murine-conserved baboon CSEs (>50%), arguing that these are valid conserved CSEs, with close to one-third of all detected mouse CSEs conserved in human or non-human primate (Fig. 2.6B).


Figure 2.6. Human circadian splicing is associated with autoregulation of splicing factors and cancer splicing variation. A) Conserved circadian splicing events (CSEs) from mouse to human, inferred from GTEx consortium postmortem samples ordered by CYCLOPS, relative to all predicted human CSEs. CSEs are subdivided into those shared between human and mouse or distinct to each, based on computationally assigned tissue specificity. SC Fat = subcutaneous fat. The percentage and number of shared CSEs are shown in red. B) Global conservation of circadian splicing for events shared in mouse and human (Mm-Hs) or mouse and Papio anubis (Mm-Pn). C) A proposed model of circadian splicing factor autoregulation by splicing factor protein expression (negative feedback loops). D) Example of potential autoregulation of Srsf5 by intron retention (Sashimi plots). Two time points of mouse cerebellum RNA-Seq (Zhang) are shown on the left and SRSF5 knockdown (human HEPG2 cells, ENCODE) with matched controls on the right. E) RNA binding proteins (RBPs) with evidence of autoregulation. Violin plots of percent spliced-in (PSI) values from the MultiPath-PSI algorithm in human ENCODE RBP knockdown (KD) cell line RNA-Seq. At least two replicates per KD were performed for all RNA-Seq KD comparisons, with all examples significant by empirical Bayes t-test (p<0.05). F) t-SNE plot of all detected PSI splicing-event values (<10% missing values per event) in lung adenocarcinoma (LUAD) patient samples and matched tumor-adjacent controls from the TCGA cohort. G) Alternative splicing events from LUAD versus tumor-adjacent control MultiPath-PSI splicing estimates in the TCGA and Korean cohorts, based on a paired t-test (patient-control matched) with Bonferroni multiple-testing correction (p<0.05), compared to CYCLOPS healthy lung CSEs. H) Statistical enrichment of the TCGA tumor versus control splicing events (-log10 p-value, GO-Elite Fisher exact test) against 500 alternative splicing signatures from ENCODE RBP KD, lymphoblast RBP KD, hematopoietic cell population comparisons and CYCLOPS healthy lung CSEs. The number of signatures is shown on the x-axis.

Our cross-species circadian splicing data suggest a model in which higher RBP expression feeds back to regulate the alternative splicing of the RBP's own transcripts, either directly or indirectly (Fig. 2.6C). To more explicitly evaluate whether the predicted CSEs that occur within RBPs are indeed autoregulated, we re-analyzed existing RNA-Seq data from recent RBP knockdown (KD) studies. Because the KDs are partial and are not targeted to the predicted autoregulated exons/introns, KD should effectively model a decrease in RBP gene expression [123]. Differential splicing analysis of ENCODE cell lines (K562 and HEPG2) finds direct support for autoregulation of the same CSEs (genomic coordinate matched) for CIRBP, FUS, SRSF5, SRSF1, FIP1L1, U2AF2, TRA2A, SFPQ, RBM39, PRMT5 and HNRNPC (Fig. 2.6D,E).
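A simplified version of the knockdown comparison is sketched below. It computes PSI directly from inclusion and exclusion junction reads and compares knockdown and control replicates with a Welch t statistic. This is an assumption-laden stand-in, not the MultiPath-PSI algorithm or the empirical Bayes test used for Fig. 2.6E, and the read counts are hypothetical. Under the feedback model, an event whose inclusion rises with RBP expression would be expected to show a negative dPSI when the RBP is knocked down.

```python
from math import sqrt
from statistics import mean, variance


def psi(inclusion_reads, exclusion_reads):
    """Percent spliced-in as a fraction: inclusion / (inclusion + exclusion)."""
    total = inclusion_reads + exclusion_reads
    return inclusion_reads / total if total else float("nan")


def welch_t(a, b):
    """Welch's t statistic for two groups with at least two replicates each."""
    se2 = variance(a) / len(a) + variance(b) / len(b)
    if se2 == 0:
        raise ValueError("zero variance in both groups")
    return (mean(a) - mean(b)) / sqrt(se2)


def knockdown_effect(kd_counts, ctrl_counts):
    """dPSI (KD minus control) and Welch t for one splicing event.

    kd_counts / ctrl_counts: (inclusion, exclusion) read counts per replicate."""
    kd = [psi(i, e) for i, e in kd_counts]
    ctrl = [psi(i, e) for i, e in ctrl_counts]
    return {"dPSI": mean(kd) - mean(ctrl), "t": welch_t(kd, ctrl)}


# Hypothetical read counts for a retained intron in an RBP's own transcript
result = knockdown_effect(kd_counts=[(20, 80), (25, 75)],
                          ctrl_counts=[(60, 40), (70, 30)])
print(round(result["dPSI"], 3))  # -0.425
```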

Importantly, several of these splicing factors have previously been shown or predicted to undergo autoregulation [124-128]. Notably, these data predict distinct mechanisms of regulation for different RBPs, with some appearing to promote NMD with increased RBP gene expression (e.g., SRSF5, CIRBP, U2AF2) and others showing repression of the event with increased RBP gene expression (e.g., TRA2A, FUS).

As noted, cross-tissue CSEs from our analyses impinge upon a number of pathways critical in cell differentiation, cell growth, tumorigenesis and metastasis, with 31 of our 38 cross-tissue mouse CSEs or their associated genes previously reported to have oncogenic roles in diverse cancers (Table S8). To test the hypothesis that CSEs play an important role in human cancer biology, we performed an additional RNA-Seq splicing analysis of two large independent lung adenocarcinoma (LUAD) cohorts, each with tumor biopsies and matched histologically normal adjacent controls (TCGA, Korean cohort) [129, 130]. We focused on LUAD because it is the most common subtype of lung cancer, a leading cause of cancer-related deaths worldwide. For these analyses, we considered paired supervised comparisons. We presumed that the adjacent control biopsies represent normal healthy tissue, isolated at different times of day. As such, each lung biopsy would be expected to express a different repertoire of circadian genes and splicing events depending on the time it was isolated, with each tumor and its adjacent control isolated in close proximity to each other. Of note, due to the size of these control datasets, phase information cannot be reliably predicted; hence, we looked for global splicing differences between tumors and adjacent controls. Dimensionality reduction of all detected TCGA splicing events shows a clear separation of all LUAD tumors and controls (Fig. 2.6F). Comparison of alternative splicing between matched tumors and adjacent controls with lung CSEs finds that lung CSEs are broadly and consistently dysregulated in LUAD (22-32%), suggestive of a global disruption of circadian rhythm (Fig. 2.6G, Table S9). While this cancer overlap was frequent, we asked whether it was highly enriched in, and specific to, circadian splicing in lung (Fig. 2.6H). To do so, we compared all LUAD TCGA splicing events to those from 500 alternative splicing comparisons in a previously compiled database of RBP splicing factor knockdowns (ENCODE, Gene Expression Omnibus), cell-type-specific signatures and CYCLOPS CSEs. This analysis shows that lung cancer alternative splicing events are highly specific to lung CSEs, indicating that circadian splicing represents the dominant signal among these comparisons in lung cancer. Interestingly, when these shared alternative splicing events were analyzed in the iLINCS web portal for cancer therapeutic response signatures, lung cancer-associated drug responses comprised 4 of the top 5 associated gene-expression signatures, suggesting these splicing events may represent therapeutic targets (Table S10).
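The specificity test in Fig. 2.6H is described as a GO-Elite Fisher exact test. The sketch below shows the underlying one-sided hypergeometric calculation for one query set (for example, LUAD splicing events) against several reference signatures. It assumes the universe is all events detected in both datasets, omits multiple-testing correction, and uses hypothetical function names and toy sets rather than GO-Elite itself.

```python
from math import comb, log10


def overlap_pvalue(universe, query, signature):
    """One-sided hypergeometric P(overlap >= observed), i.e. the upper tail
    of a Fisher exact test on the 2x2 overlap table."""
    q, s = query & universe, signature & universe
    n_total, n_sig, n_query = len(universe), len(s), len(q)
    k = len(q & s)
    tail = sum(comb(n_sig, i) * comb(n_total - n_sig, n_query - i)
               for i in range(k, min(n_sig, n_query) + 1))
    return tail / comb(n_total, n_query)


def rank_signatures(universe, query, signatures):
    """Rank reference signatures by enrichment (-log10 p), strongest first."""
    scored = [(name, -log10(overlap_pvalue(universe, query, sig)))
              for name, sig in signatures.items()]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)


# Hypothetical toy sets: 10 detected events, 5 tumor-vs-control events
universe = set(range(10))
query = {0, 1, 2, 3, 4}
signatures = {"lung_CSE": {0, 1, 2, 3, 4}, "unrelated_KD": {5, 6, 7, 8, 9}}
for name, score in rank_signatures(universe, query, signatures):
    print(name, round(score, 2))
```

Restricting both sets to a shared universe matters: events that could never be detected in one dataset would otherwise inflate or deflate the apparent overlap.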

2.4.6: Interactive navigation and exploration of cross-tissue and cross-species circadian splicing

To enable broad exploration of predicted circadian splicing events, both within individual species and across species, we created an interactive online web portal called CircaSplice for the research community (https://circasplice.cchmc.org). This interface has two main navigation windows: 1) an interactive heatmap browser and 2) a splicing-event-level viewer to examine individual CSE patterns, periodicity detection statistics, genome coordinates and orthology. This interface allows users to explore CSEs within specific tissues, to compare CSEs from one tissue or species to another, and to export the associated results.


Figure 2.7. CircaSplice web portal. Interactive cross-species and cross-tissue circadian splicing web interface.

2.5: DISCUSSION

The role and impact of circadian splicing in mammals has remained poorly understood, due in large part to a lack of validated computational techniques for analysis, of sufficiently “deep” evaluation datasets and of an understanding of which events and processes are conserved. The work presented here attempts to overcome these deficiencies and clarify these roles through the application of a rigorous, integrated bioinformatics approach. Our results implicate the autoregulation of circadian splicing factors in the control of rhythmic splicing and, more broadly, in cancer.

As noted, a challenge in these studies is the question of how reliable circadian splicing predictions are when the underlying data are suboptimal. For this reason, we have focused our comparative analyses on an existing cross-tissue murine time-course, owing to its greater sequencing depth, and relied on more speculative predictions from baboon and human as secondary and tertiary validation. These analyses have been designed to be rigorous, by requiring that both exon-exon and exon-intron junctions map exactly between genomes and that associated predictions also match at the level of associated tissues. For these reasons, our analyses are expected to favor specificity over sensitivity.

Our results shed important new light on the role of tissue-shared and species-conserved circadian splicing events. Previously, liver exon-array analyses with confirmatory qPCR demonstrated that selected liver CSEs do not tend to be regulated across tissues [76]. Our data, comparing CSEs across dozens of tissues, support the finding that CSEs are largely tissue-specific. In the same prior study, liver CSEs were found to covary with CEGs; however, our analysis finds that liver is actually an outlier in this regard, with most CSEs occurring independently of CEGs. Further, this prior study, as well as others [131], found that CSEs can alter circadian phase and amplitude, for which the current study also finds evidence.

Through comparative analyses with existing large perturbation references, such as RNA ENCODE, we have been able to identify novel candidate regulatory mechanisms for the control of circadian splicing, namely splicing RBP autoregulation and NMD. Such RBP autoregulation was also previously shown in plants to impact the oscillatory behavior of target transcripts [131].

Interestingly, these data may highlight a previously unrecognized source of variability (diurnal variation) in the expression of splicing factors measured in prior in vivo and in vitro studies at different times of day [124]. Our analysis further highlights important and potentially clinically actionable targets for cancer therapy through the modulation of key circadian pathways, but also raises questions regarding the precise relevance of such signals in a tumor. For example, it is entirely feasible that broad alterations in circadian splicing in lung cancer reflect a lack of systemic circadian synchronization and/or dysregulation of CEGs that may alter a tumor's ability to respond to standard chemotherapeutics. Importantly, these data set the stage for more extensive autoregulatory and cancer regulatory validations, enabled through this approach.

A secondary but important contribution of this work is the identification of optimal computational workflows that can be used to tease out complex circadian alternative splicing regulation and mechanisms. In particular, we find that while a number of rhythmicity detection algorithms produce relatively consistent CEG predictions, CSE predictions vary more between computational workflows. While we relied on the software MetaCycle (ARS) for these studies, such an approach is not optimal for studies with replicate time-course measurements. Even when such replicates are present, without sufficient sequencing depth and proper circadian time-courses (48hr), results from such studies will remain speculative unless meta-analyses are performed. Importantly, we show that unbiased and sensitive splicing-detection approaches are needed to resolve complex circadian splicing differences, as exon-microarray-based predictions could not resolve CSEs mutually detected by deep RNA-Seq analyses. Indeed, a substantial portion of detected CSEs were in novel exons or retained introns that are not typically detected by microarrays or deconvolution analyses.
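One simple way to quantify how consistently different rhythmicity workflows call the same events is a pairwise Jaccard index over their call sets. The text does not state which agreement metric underlies this comparison, so the sketch below is an assumption; the event IDs are hypothetical and the workflow names are used only as labels.

```python
from itertools import combinations


def jaccard(a, b):
    """Size of the intersection divided by size of the union (1.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_agreement(calls):
    """Jaccard agreement for every pair of workflows.

    calls: {workflow name: set of event IDs called rhythmic}."""
    return {(x, y): jaccard(calls[x], calls[y])
            for x, y in combinations(sorted(calls), 2)}


# Hypothetical call sets from three workflows
calls = {"ARS": {"ev1", "ev2", "ev3"},
         "JTK": {"ev2", "ev3", "ev4"},
         "RAIN": {"ev3"}}
for pair, score in pairwise_agreement(calls).items():
    print(pair, round(score, 2))
```

Lower pairwise scores for CSE call sets than for CEG call sets would be one way to express the greater workflow dependence of splicing predictions.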

While our human analyses were limited to tissues in which reliable pseudo-temporal ordering could be obtained (Materials and Methods), the analyses presented here can clearly be extended to other non-diseased tissue collections, as well as to distinct cancers not considered here. Such work will require improved human circadian predictions for cancer control tissues, or the use of conserved splicing events obtained from model organisms, to determine relevant circadian splicing impacts. Finally, given the breadth and complexity of circadian splicing in different tissues and organisms, a detailed exploration of these findings required the creation of new online tools able to query CSEs at both global and individual event levels. For this reason, we have created an online computational platform that can query CSEs within and across tissues and/or species. Such an approach has significant potential when ultimately coupled with comparative multi-omics assessments and gene regulatory mechanisms. We hope to extend the features of this platform in the future to enable such higher-order insights and to delve deeper into the regulatory impact of circadian splicing in both healthy and diverse disease states.

ACKNOWLEDGMENTS

We thank John J. Hogenesch for his critical discussions and input related to this work. This work was funded in part by the University of Cincinnati Systems Biology Graduate Program and funding from the National Institutes of Health R01CA226802 (NS), R21CA227379 (CIH), and R01DK117005 (CIH).

Author contributions: The manuscript was written by NS, KRS, CIH and AB. Analyses were performed/advised by KRS, AR, AC, MK, MR, NS. The website was created by SH and KRS.


Figure S1. Detection of circadian splicing event predictions from diverse rhythmicity algorithms and technological platforms. CEG (A) and CSE (B) predictions in 12 mouse tissues (Zhang et al.) from three different computational workflows (MetaCycle ARS-JTK, Harmonic Regression (HR) and RAIN). C) Common tissue cassette-exon alternative splicing events shared between RNA-Seq and Gene 1.0 ST array comparisons from Zhang et al. (analysis in AltAnalyze). D) Comparison of matching qPCR predictions for each of the evaluated algorithms in cerebellum from Zhang et al.

Table S1 – Circadian regulators and the tissues in which their splicing events (CSE) and gene expression (CEG) are circadian.

#CSE and #CEG: number of tissues with circadian splicing of the event and with circadian expression of the gene, respectively ("-" = none).

Regulator   AltExons         #CSE  #CEG  CSE tissues               CEG tissues
Per1        E2.1             2     12    BF|Ce                     Ad|Ao|BF|BS|Ce|He|Hy|Ki|Li|Lu|SM|WF
Per2        E21.1            1     12    Ad                        Ad|Ao|BF|BS|Ce|He|Hy|Ki|Li|Lu|SM|WF
Per1        E23.1            1     12    Ad                        Ad|Ao|BF|BS|Ce|He|Hy|Ki|Li|Lu|SM|WF
Usp2        E6.2             3     12    Ad|SM|Ao                  Ad|Ao|BF|BS|Ce|He|Hy|Ki|Li|Lu|SM|WF
Usp2        E3.1             2     12    SM|He                     Ad|Ao|BF|BS|Ce|He|Hy|Ki|Li|Lu|SM|WF
Nr1d1       E5.1             2     12    BF|WF                     Ad|Ao|BF|BS|Ce|He|Hy|Ki|Li|Lu|SM|WF
Nr1d2       E6.1             1     12    BS                        Ad|Ao|BF|BS|Ce|He|Hy|Ki|Li|Lu|SM|WF
Arntl       E3.1|E4.1        1     12    Hy                        Ad|Ao|BF|BS|Ce|He|Hy|Ki|Li|Lu|SM|WF
Dbp         E1.4             1     12    Ad                        Ad|Ao|BF|BS|Ce|He|Hy|Ki|Li|Lu|SM|WF
Bhlhe41     E5.2             2     11    Ad|Ce                     Ad|Ao|BF|Ce|He|Hy|Ki|Li|Lu|SM|WF
Cry1        E1.1             1     11    Lu                        Ad|Ao|BF|BS|Ce|He|Ki|Li|Lu|SM|WF
Cry2        E3.2             1     9     Ce                        Ao|BF|Ce|He|Ki|Li|Lu|SM|WF
Gtf2ird1    E27.1            4     7     WF|Hy|He|SM               Ao|BF|BS|Ce|He|Li|SM
Gtf2ird1    E24.3            3     7     Ad|Hy|SM                  Ao|BF|BS|Ce|He|Li|SM
Gtf2ird1    E31.1            3     7     Hy|Li|Lu                  Ao|BF|BS|Ce|He|Li|SM
Sfpq        E9.1             3     6     Hy|Ao|SM                  Ad|BS|Ce|He|Hy|WF
Dtnbp1      E7.1             1     5     WF                        Ad|Ao|BF|Ki|Li
Dtnbp1      E1.1             1     5     He                        Ad|Ao|BF|Ki|Li
Csnk1e      E10.2            3     4     BF|WF|He                  Ao|Ce|He|Lu
Ppara       E4.1             1     4     Ao                        BF|Ki|Li|WF
Ogt         E2.1             3     3     Hy|Ki|Ce                  BF|He|Li
Hnrnpu      I4.1             5     2     Ad|Hy|SM|BS|Lu            Hy|Ki
Kdm1a       E3.1             5     2     SM|He|Ki|BS|Li            BS|Lu
Ppp1cc      I7.1             3     2     Li|Ao|Lu                  BS|Li
Fmr1        E15.2            2     2     Ad|Ce                     Ki|WF
Kdm1a       E10.1            2     2     Hy|Ce                     BS|Lu
Csnk1d      E13.1            2     2     Ad|Ki                     Ki|Lu
Pml         E6.4             2     2     Ao|Lu                     Li|SM
Fmr1        E13.1            1     2     Ki                        Ki|WF
Rora        E9.1             1     2     Ce                        Li|Lu
Hnf1b       E4.1             1     2     Ki                        Ki|Li
Rai1        E5.2             1     2     Ad                        Li|WF
Rai1        E8.1             1     2     Ad                        Li|WF
Ncor1       E41.1            4     1     He|Ao|Ce|Lu               Ki
Ncor1       E32.3            4     1     Ad|He|Ki|BS               Ki
Ncoa2       E15.1            3     1     Li|Ki|Ce                  Li
Fbxl3       E3.4             3     1     He|BS|Ce                  Ki
Prmt5       E2.1             2     1     Hy|BS                     Li
Ncor1       E25.2            2     1     Ki|Lu                     Ki
Ncor1       E49.1            2     1     Ao|Ce                     Ki
Crem        E15.2            1     1     Ce                        Li
Sik3        I22.1_46216210   1     1     Li                        WF
Nr2f6       E4.7             1     1     Li                        Ki
Rab3a       E1.5             1     1     Ad                        Lu
Ncam1       E25.1            1     1     Ao                        Hy
Ncor1       E22.1            1     1     Ce                        Ki
Ncor1       E47.1            1     1     Ce                        Ki
Ncor1       E37.1            1     1     Lu                        Ki
Ncor1       E45.1            1     1     Ao                        Ki
App         E8.1             8     0     WF|Li|Ce|BF|He|Ki|Lu|Hy   -
U2af1l4     E1.31            6     0     Ao|BS|Ad|Ce|He|Ki         -
Rbm4        E1.8             3     0     Ad|He|Li                  -
Mta1        E4.1             3     0     Ad|WF|Li                  -
Mta1        E16.1            3     0     Ad|WF|Ce                  -
Gria2       E18.1|E17.1      3     0     Hy|BS|Ce                  -
Cadps2      E36.1            3     0     Hy|Ce|Lu                  -
Fyn         E14.1|E13.1      2     0     WF|Lu                     -
Mycbp2      E53.1            2     0     BS|Ce                     -
Adcyap1r1   E15.1            2     0     BS|Ce                     -
App         E16.1            2     0     BF|Lu                     -
Huwe1       E61.1|E62.1      1     0     Ki                        -
Huwe1       E79.2            1     0     BS                        -
Huwe1       E24.1            1     0     BS                        -
Rbm4        E3.1             1     0     WF                        -
U2af1l4     E3.1             1     0     WF                        -
Usf1        E3.1             1     0     Ao                        -
Mycbp2      E57.1            1     0     Ce                        -
Mta1        E9.1             1     0     Ad                        -
Kdm2a       E17.2            1     0     Ao                        -
Adcyap1r1   E1.5             1     0     BS                        -
Cadps2      E20.1            1     0     Lu                        -
Cadps2      E27.1            1     0     Ce                        -


Figure S2. Rhythmic splicing of core circadian genes. A) SashimiPlots illustrating alternative promoter regulation of Cry1 and differential cassette-exon inclusion of Arntl.

Table S2 – Cross-tissue CSEs in 5 or more mouse tissues

Symbol          AltExons            # CSE tissues  # CEG tissues
Pisd-ps1        E2.1                10             0
Slmap           E11.1|E12.1         10             0
Ktn1            E39.1               10             0
Cd200           E3.4                9              8
6720401G13Rik   E9.1                9              3
Tardbp          I8.1_32335111       9              3
Eif4h           E5.2                9              0
Hnrnpc          E2.7                8              3
Fubp1           E3.1                8              4
5830417I10Rik   E6.1                8              1
Hnrpdl          E6.8                8              1
Ubp1            E9.1                8              0
App             E8.1                8              0
Immt            E5.7                8              0
Hnrnph1         E12.1               7              3
Phldb1          E21.1               7              4
Hnrnpa1         E8.1                7              5
Srsf5           I6.1                7              4
Gtf2i           E17.2               7              1
Taz             E4.16               7              2
Tpm3            E9.2                7              2
Tra2a           I2.1                7              5
Srsf10          E3.1                7              2
Prrc2c          E36.1               7              3
Tnxb            E18.1|E19.1         7              3
Fip1l1          E11.1               7              1
Tmpo            E7.1|E8.1           7              2
Mat2a           I8.1                7              3
2210408F21Rik   E8.1                7              0
Gm3435          I6.1                7              0
P4ha1           E10.1               6              11
Rbm3            E2.3                6              7
Pcbp2           E12.1               6              3
Epb4.1          E19.1               6              2
Mapt            E3.1                6              3
Dalrd3          I9.1                6              2
Hnrnpc          E8.1                6              3
Nt5c2           E6.1                6              2
Fmo1            E10.1               6              9
Srsf5           E1.11               6              4
Kif21a          E33.1               6              1
Map4            E23.1               6              2
Gpx8            E2.1                6              2
Sidt2           E15.1               6              4
Gngt2           E4.1                6              1
Rps24           E6.2                6              1
Foxk2           E10.1               6              3
Ddx17           E12.2               6              5
Luc7l           E3.1                6              1
Kifc3           E5.1                6              3
Tcerg1          E23.1               6              1
Rab6a           E6.1                6              1
Clk4            E4.1                6              1
Hdac5           E6.1                6              1
Prpsap2         E5.1                6              1
Son             E11.1               6              1
Mrpl24          E1.5                6              2
Zfp207          E8.1                6              2
Dctn6           E4.2                6              0
Tbc1d9b         E22.1               6              0
Entpd4          E8.1                6              0
Srsf11          E6.1                6              0
Gtpbp5          E3.1                6              0
Eif4g2          E15.1               6              0
Tial1           E7.1                6              0
U2af1l4         E1.31               6              0
Srsf6           I3.1                6              0
Csnk1a1         E5.1                6              0
Tbp             E2.3                6              0
Fam193b         I6.1_55549962       6              0
Rnps1           E2.1                6              0
Hsf1            E9.1                6              0
Ctage5          E8.1                6              0
Fam115a         E13.2               5              1
Cirbp           E5.5                5              11
Tef             E5.1                5              11
Clpx            E6.1                5              6
Prr13           E2.5                5              6
Snx14           E16.1               5              5
Dnm1l           E4.2                5              4
Ppap2a          E3.1|E2.1           5              4
P2rx4           E9.1                5              3
Tmem134         E4.4                5              1
Mbnl2           E12.1               5              7
Mprip           E26.1               5              5
Snrk            E5.2                5              5
Wbp1            E2.3                5              3
Asph            E12.1               5              4
Sqstm1          E7.4                5              2
Dnajc5          E7.1                5              3
Hp1bp3          E4.1                5              1
Camk2g          I16.1_20741378      5              3
Camk2g          E14.1               5              3
Max             E2.1                5              3
Camk2d          E17.2               5              3
Sars            I10.1               5              2
Mef2d           E1.4                5              3
Ei24            E7.1                5              2
Prpf4b          E20.1               5              4
Kdm1a           E3.1                5              2
Tjp1            E28.3               5              2
Arhgef1         E15.2               5              2
Ube2j1          E7.1                5              2
Ces1d           E10.1               5              4
Cox19           E3.1                5              2
Rapgef1         E12.2|E13.1         5              3
Herc4           E17.1               5              1
Hnrnpu          I4.1                5              2
Tardbp          I7.3                5              3
Srsf7           E7.2                5              1
Lsm14a          E6.1                5              1
Aftph           E8.1                5              1
Tusc3           I10.1_39157350      5              1
Rabggtb         E4.1                5              2
Abcd4           E11.2               5              2
Atp11a          I31.1               5              2
Ablim1          E27.3               5              2
Add1            E4.2                5              2
Ubqln1          E8.1                5              2
Srsf3           E4.1                5              1
Aplp2           E7.1                5              1
Agfg1           E12.1               5              1
Abi1            E9.1                5              1
Luc7l2          E4.6                5              1
Gnas            E8.2                5              1
Zcrb1           E4.1                5              1
Hnrnpr          E3.2                5              1
Azi2            E8.4                5              1
Clk4            I7.1                5              1
Cdk8            E7.1                5              1
Hnrnpl          E9.1                5              2
Myl6            E6.1                5              2
Ly6e            E2.1                5              3
Med15           E11.1               5              2
Snrnp70         I8.2                5              1
Osbpl9          E16.1               5              1
E4f1            E10.2               5              1
Bag6            E24.2               5              0
Srp54b          E16.1               5              0
Ubap2l          E28.1               5              0
Pank2           E6.2|E7.1           5              0
Itgb3bp         E4.1                5              0
Gpbp1           E6.1                5              0
Ip6k2           E6.1                5              0
Pacsin3         I3.1                5              0
Lrrfip2         E4.1                5              0
Rbm7            E2.1                5              0
Ap2a1           E18.1               5              0
Tmem234         I2.1                5              0
Rhot1           E20.1               5              0
Gas5            E4.19               5              0
Ilf3            E5.2                5              0
Tmem87b         E12.1               5              0
Epb4.9          E6.1                5              0
Pcnp            E2.6                5              0
Nbr1            E20.2               5              0
Cr1l            E2.3                5              0
Gtpbp5          E10.3               5              0
Slc39a13        E6.2                5              0

Figure S3. Circadian splicing profile of RNA-binding proteins in mouse cerebellum. A) Phase-ordered CSEs displayed as a heatmap from mouse cerebellum at the indicated RNA-Seq time-points. B) Matching gene-array phase-ordered CEGs corresponding to RBPs in cerebellum.

Table S3. RBPs targeting cross-tissue CSEs from POSTAR 2.0

Splicing event   Targeting RBPs
Cd200            Celf1, Celf4, Fus, U2af2
Cirbp            Cirbp, Fus, Celf4, U2af2
Csnk1a1          Celf1, Celf4, Cirbp, Fus, U2af2
Ddx17            Celf1, Celf4, Cirbp, Fus, Fmr1, U2af2, SRSF2
Eif4g2           Celf1, Celf4, Cirbp, Fus, SRSF1, SRSF2, U2af2
Eif4h            Celf1, Celf4, Cirbp, Fus, SRSF1, SRSF2, U2af2
Fam115a          Celf4, Fus, U2af2
Fubp1            Celf1, Celf4, Cirbp, Fus, U2af2
Hnrnpc           Celf4, Cirbp, Fus, U2af2
Ktn1             Celf1, Cirbp, Fus, SRSF2, U2af2
Nbr1             Celf4, Cirbp, Fus, SRSF1, SRSF2, U2af2
Prrc2c           Celf4, Cirbp, Fus, SRSF1, SRSF2, U2af2
Rab6a            Celf4, Cirbp, Fus, Fmr1, SRSF1, U2af2
Rbm3             Fus, SRSF1, SRSF2, U2af2
Sqstm1           Celf1, Celf4, Cirbp, Fus, SRSF1, SRSF2, U2af2
Srsf10           Cirbp, Fus, SRSF1, SRSF2, U2af2
Srsf5            Celf4, Cirbp, Fus, SRSF1, SRSF2, U2af2
Tardbp           Celf4, Cirbp, Fus, SRSF1, SRSF2, U2af2
Ubp1             Celf1, Celf4, Cirbp, Fus, SRSF1, SRSF2, U2af2
Mat2a            Celf4, Cirbp, Fus, SRSF1, SRSF2, U2af2


Figure S4. Tissue and circadian splicing events predicted from diverse baboon organ systems. SashimiPlots of predicted tissue-specific CSEs in two baboon brain regions, cassette-exon inclusion in CADPS2 in cerebellum (A) and repression of intron retention in TSPOAP1 in amygdala (B), from the software MarkerFinder. C) Ranked baboon tissue regions with the highest number of CSEs relative to CEGs.


Table S5: Top 50 CSEs conserved in Mm and baboon (tissue matched to same or related tissues)

Baboon (Pn) event           # Pn CSE tissues  Mouse (Mm) event            # Mm CSE tissues  # Matching murine and baboon CSE tissues
HNRNPA1:E7.1|E8.1           20                Hnrnpa1:E2.7                7                 5
MAT2A:I8.1                  10                Mat2a:E2.7                  6                 4
MAT2A:E8.1                  8                 Mat2a:E3.1|E2.1             6                 4
RBM3:E3.1                   13                Rbm3:E15.3                  6                 4
APP:E13.1                   10                App:E2.6                    6                 4
BAG6:E24.1                  14                Bag6:E8.1                   5                 4
CSNK1A1:E6.1                7                 Csnk1a1:E8.1                6                 3
UBTF:E9.1                   5                 Ubtf:E8.1                   3                 3
BAG6:E24.1                  13                Bag6:E8.1                   5                 3
TIAL1:E7.1                  6                 Tial1:E5.1                  4                 3
TIAL1:I7.1                  1                 Tial1:E5.1                  4                 3
RPS24:E6.1                  8                 Rps24:E6.1                  3                 3
MAT2A:I8.1                  10                Mat2a:E6.1|E5.1             5                 3
SPTAN1:E56.1                3                 Sptan1:E6.1                 4                 3
SHTN1:E17.1                 2                 4930506M07Rik:E6.1|E5.1     3                 3
SMARCA2:E32.1               2                 Smarca2:I6.1                3                 3
SHTN1:E16.1|E17.1           2                 4930506M07Rik:I6.1          3                 3
MAT2A:E8.1                  8                 Mat2a:E15.3                 5                 3
SRSF5:E8.1                  24                Srsf5:E3.1                  5                 3
TIAL1:E8.1                  1                 Tial1:E9.1                  4                 3
APP:E13.1                   9                 App:E8.1                    5                 3
CSNK1A1:E6.1                4                 Csnk1a1:E20.1               6                 3
TCERG1:E23.1                7                 Tcerg1:E21.1                6                 3
PCBP2:E11.1                 6                 Pcbp2:E3.1                  3                 3
CAMK2G:E19.2                4                 Camk2g:E9.1                 4                 3
SHTN1:E16.1                 2                 4930506M07Rik:E21.1         3                 3
PAM:E24.3                   4                 Pam:E10.1                   3                 2
HNRNPH1:E9.1                11                Hnrnph1:E3.1                3                 2
MYL6:E6.1.52060183          22                Myl6:E6.2                   2                 2
CSNK1A1:E11.2               5                 Csnk1a1:E4.1                3                 2
GRIN1:E21.1                 1                 Grin1:E16.1                 2                 2
PCBP2:I11.1                 7                 Pcbp2:E16.1                 4                 2
PCBP2:I11.1                 7                 Pcbp2:E10.2                 3                 2
SFPQ:E12.1                  16                Sfpq:E4.3                   3                 2
SRSF5:E7.1                  14                Srsf5:E4.1                  5                 2
QKI:E7.2                    10                Qk:E10.2                    3                 2
RHOT1:E21.1                 4                 Rhot1:E2.6                  5                 2
CCDC90B:E4.1                9                 Ccdc90b:E6.1                4                 2
QKI:E9.1                    4                 Qk:E11.1                    3                 2
FERMT2:E3.1                 2                 Fermt2:E7.1                 4                 2
GRIN1:E21.1                 1                 Grin1:E15.1                 2                 2
PPP6R2:E13.1                2                 Ppp6r2:E15.1                2                 2
MAT2A:I8.1                  10                Mat2a:E6.1                  3                 2
SNRNP70:I8.1                3                 Snrnp70:E18.2               4                 2
CADM3:E3.2                  1                 Cadm3:E17.1                 2                 2
ENSPANG00000023695:E24.1    6                 Mprip:E17.1                 4                 2
PCBP2:I11.1                 4                 Pcbp2:E5.1                  3                 2
GRIN1:E21.1                 1                 Grin1:E3.1                  2                 2
CIRBP:E8.2                  7                 Cirbp:E3.1                  5                 2

Table S7: CSEs conserved between mouse and human

Human CSE                                            Mouse CSE                          # Hs CSE tissues  # Mm CSE tissues
HNRNPC:E1.17-E8.2|E3.4-E8.2                          Hnrnpc:E1.3-E3.2|E2.7-E3.2         5                 7
HNRNPC:E1.17-E3.4|E1.17-E8.2                         Hnrnpc:E1.3-E3.2|E2.7-E3.2         5                 7
PPAP2A:E1.4-E3.2|E1.4-E4.1                           Ppap2a:E1.5-E3.1|E2.1-E4.1         4                 4
ACIN1:E11.2-E13.3|E12.6_23536513-E13.3               Acin1:E13.1-E16.3|E15.3-E16.3      4                 3
PCNP:E1.6-E3.2_101298643|E1.6-E3.3                   Pcnp:E1.5-E2.6|E1.5-E2.7           5                 5
ZNF207:E10.3-E11.1|E10.3-E12.1                       Zfp207:E7.6-E9.1|E8.1-E9.1         6                 5
ZNF207:E10.3-E12.1|E11.2-E12.1                       Zfp207:E7.6-E9.1|E8.1-E9.1         6                 5
ZNF207:E10.3-E11.1|E10.3-E12.1                       Zfp207:E7.6-E8.1|E7.6-E9.1         6                 4
ZNF207:E10.3-E12.1|E11.2-E12.1                       Zfp207:E7.6-E8.1|E7.6-E9.1         6                 4
PRPSAP2:E11.2-E13.1|E12.2-E13.1                      Prpsap2:E5.1-E7.1|E4.1-E7.1        3                 5
PRPSAP2:E11.2-E13.1|E12.2-E13.1                      Prpsap2:E4.1-E5.1|E4.1-E7.1        3                 6
RAB6A:E5.2-E7.1|E6.1-E7.1                            Rab6a:E6.1-E7.1|E5.1-E7.1          3                 5
RAB6A:E5.2-E7.1|E6.1-E7.1                            Rab6a:E4.1-E6.1|E5.1-E7.1          3                 4
RAB6A:E4.1-E6.1|E5.2-E7.1                            Rab6a:E6.1-E7.1|E5.1-E7.1          3                 5
RAB6A:E4.1-E6.1|E5.2-E7.1                            Rab6a:E4.1-E6.1|E5.1-E7.1          3                 4
SRSF5:E3.18-E4.8|E3.18-E4.1                          Srsf5:E6.1-I6.1|E5.1-E7.1          5                 5
SRSF5:E3.18-E4.1|E3.18-E4.8                          Srsf5:E6.1-I6.1|E5.1-E7.1          4                 5
ACIN1:E11.2-E12.1|E11.2-E13.3                        Acin1:E13.1-E16.3|E15.3-E16.3      3                 3
SRSF10:E2.2-E4.5_24301565|E2.3_24305201-E3.1         Srsf10:E2.1-E4.1|E2.2-E3.1         5                 7
UBP1:E10.2-E11.1|E10.2-E12.1                         Ubp1:E8.1-E10.1|E9.1-E10.1         4                 8
APP:E12.1-E14.1|E12.1-E13.1                          App:E6.2-E9.1|E8.1-E9.1            4                 6
RHOT1:E19.2-E23.1|E21.2-E23.1                        Rhot1:E18.2-E20.1|E18.2-E21.1      4                 5
POSTN:E21.2-E22.1|E21.2-E24.3                        Postn:E20.1-E22.2|E21.1-E22.2      4                 3
HNRNPH3:E2.1-E3.2|E2.1-E4.2                          Hnrnph3:E2.1-E4.1|E3.1-E4.1        5                 3
UBP1:E10.2-E11.1|E10.2-E12.1                         Ubp1:E8.1-E9.1|E8.1-E10.1          4                 4
POSTN:E21.2-E24.3|E22.1-E24.3                        Postn:E20.1-E22.2|E21.1-E22.2      4                 3
NFIC:E10.3-E12.1|E10.3-I14.2                         Nfic:E9.2-E10.1|E9.2-I11.2         6                 2
HNRNPH3:E2.1-E3.2|E2.1-E4.2                          Hnrnph3:E2.1-E4.1|E3.1-E4.1        5                 3
VEGFA:E5.21-E6.1|E5.21-E7.1                          Vegfa:E5.1-E7.2|E6.2-E7.2          4                 2
SRSF3:E4.1-E5.1|E4.1-E6.1                            Srsf3:E3.1-E5.2|E4.1-E5.2          6                 4
FXR1:E19.2_180688055-E20.1|E19.2_180688055-E22.1     Fxr1:E16.1-E17.1|E14.1-E17.1       5                 2
FXR1:E19.2_180688055-E21.1|E19.2_180688055-E22.1     Fxr1:E16.1-E17.1|E14.1-E17.1       4                 2
H2AFY:E10.4-E11.2|I5.2-E11.2                         H2afy:E10.2-E11.2|E9.3-E11.2       5                 3
HP1BP3:E1.3-E4.1|E3.5-E4.1                           Hp1bp3:E1.6-E5.1|E4.3-E5.1         4                 5
LSM14B:E4.2_60701396-E5.1|E4.2_60701396-E7.1         Lsm14b:E4.1-E5.1|E3.2-E5.1         4                 2
H2AFY:E6.4-E10.3|I5.2-E11.2                          H2afy:E10.2-E11.2|E9.3-E11.2       5                 3
BCL2L1:E2.4-E3.5|E2.4-E3.6                           Bcl2l1:E1.2-E4.2|E2.6-E4.2         5                 2
VEGFA:E5.21-E7.1|E6.2-E7.1                           Vegfa:E5.1-E6.1|E5.1-E7.2          3                 2
SON:E11.1-E17.3|E16.3-E17.3                          Son:E10.1-E12.2|E11.1-E12.2        4                 5
HDGF:E8.3-E10.2|E9.1-E10.2                           Hdgf:E6.1-E8.2|E7.1-E8.2           5                 2
SEC31A:E19.1-E22.2|E19.1-E20.1                       Sec31a:E15.1-E16.1|E13.1-E16.1     4                 3
SEC31A:E19.1-E22.2|E19.1-E20.1                       Sec31a:E15.1-E16.1|E13.1-E16.1     4                 3
VEGFA:E5.21-E6.1|E5.21-E7.1                          Vegfa:E5.1-E6.1|E5.1-E7.2          4                 2
PHLDB1:E21.2-E24.1|E23.3-E24.1                       Phldb1:E16.1-E19.1|E18.2-E19.1     5                 2
LTA4H:E20.1-E21.2|E20.1-E22.1                        Lta4h:E17.1-E18.1|E16.1-E18.1      4                 3
LTA4H:E20.1-E21.2|E20.1-E22.1                        Lta4h:E17.1-E18.1|E16.1-E18.1      4                 3
EIF4H:E4.3-E5.1|E4.3-E4.6                            Eif4h:E4.1-E5.1|E4.1-E7.1          4                 9
MLX:E2.1-E2.10|E2.8-E2.10                            Mlx:E3.1-E4.2|E2.1-E4.2            5                 3
MLX:E2.1-E2.10|E2.8-E2.10                            Mlx:E3.1-E4.2|E2.1-E4.2            5                 3

87

Table S8: Literature evidence for circadian spliced genes in multiple cancer progression and metastatic pathways.

Cross-tissue circadian spliced gene (exon) | Implicated cancer type(s) / role in cancer | Literature evidence
Eif4h: Ex-5 | Colorectal cancer | PMID:20473909
App: Ex-8 | Ovarian cancer, non-squamous cell lung carcinoma, breast cancer and colon cancer | PMID:26840089
Cd200: Ex-2 | Tumor growth and survival | PMID:27108386
Csnka1: Ex-5 | Breast and colon cancer | PMID:25306547
Prrc2c: Ex-36 | Lung cancer | PMID:24371231
Ddx17 | Breast cancer | PMID:24910439
Foxk2 | Breast cancer | PMID:27773593
Fubp1 | Splicing of oncogene MDM2 | PMID:24798327
Gpx4 | Ferroptotic cell death | PMID:24439385
Hnrnpc | Metastasis in glioblastoma | PMID:22907752
Hnrpdl | Colon cancer | PMID:30052712
Mat2a | Colon and liver cancer | PMID:26416353
Fmo1 | Differentially regulated in primary vs metastatic tumors | PMID:17123152
Prpsap2 | Osteosarcoma | PMID:22292074
Rab6a | Cell cycle progression and breast cancer | PMID:20064528
Rbm3 | Prostate cancer | PMID:23667174, PMID:28118608
Slmap | Endothelial dysregulation | PMID:25527523
Sqstm1 | Oncogene | PMID:29738493
Srsf1, Srsf5 | Multiple cancers via the Mcl-1 pathway | PMID:23284704
Tardbp | Hepatocellular carcinoma | PMID:23389994
Ubp1 | Cell cycle and cancer progression | PMID:30241344
Fam115a | Prostate cancer | PMID:28052017
Clpx | Cell proliferation and metastasis | PMID:27389535


Chapter 3: Discussion

In this thesis I report the organism-level impacts of circadian splicing, its conservation across mammals based on a large compendium of mouse, baboon and human RNA-Seq datasets, and key circadian splicing regulators predicted to orchestrate rhythmic splicing, and I suggest a novel role for circadian splicing in lung adenocarcinoma (LUAD). Our results implicate auto-regulation of circadian splicing factors in the control of rhythmic splicing and, more broadly, in cancer. Circadian splicing serves as an additional layer of control for molecular oscillations in peripheral clocks. Seminal work in this field over the last few decades (Chapter 1) has shown that alternative splicing adds an important dimension to the temporal regulation of the circadian clock. A recent study similar to our own attempted to characterize differential isoform expression driven by the circadian clock.

This study suggests that circadian alternative splicing can be identified from gene array analyses or isoform deconvolution [74]. However, our data suggest that such analyses are likely highly problematic, because these methods lack the accuracy needed to infer splicing differences directly; we argue that this requires precise comparison of splicing events across species. While true full-length isoform detection is ultimately needed to elucidate the functional impact of observed CSEs, isoform deconvolution predictions should be considered highly speculative and are likely less informative with regard to circadian splicing, especially when sequencing depth is insufficient or isoform complexity is high. For these reasons, approaches based on local splicing variation are considered the gold standard for understanding and interpreting isoform-level variation from short-read sequencing data [132].

Nonetheless, challenges remain in properly accounting for read depth and missing PSI values when events are evaluated by different rhythmicity detection algorithms. Although isoform deconvolution analyses can be informative, they do not provide direct evidence of exon/junction regulation or intron retention, so such results should be considered speculative.
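To make the local-splicing-variation view concrete, the minimal sketch below computes a percent spliced in (PSI) value per time point for a single two-junction event and marks low-coverage time points as missing. It assumes, for illustration only, that the first junction in an event ID such as HNRNPC:E1.17-E8.2|E3.4-E8.2 supports inclusion and the second supports exclusion. The function names, the 10-read threshold and the read counts are hypothetical and are not the parameters used in our analyses.

```python
import math
from typing import List, Sequence


def junction_psi(inclusion_reads: int, exclusion_reads: int, min_reads: int = 10) -> float:
    """PSI for one two-junction event in one sample (hypothetical helper).

    Returns NaN (a missing PSI) when total junction support is below min_reads.
    """
    total = inclusion_reads + exclusion_reads
    if total < min_reads:
        return math.nan
    return inclusion_reads / total


def psi_time_course(inclusion: Sequence[int], exclusion: Sequence[int],
                    min_reads: int = 10) -> List[float]:
    """PSI at each time point; low-coverage time points become NaN."""
    return [junction_psi(i, e, min_reads) for i, e in zip(inclusion, exclusion)]


def fraction_missing(psi: Sequence[float]) -> float:
    """Fraction of time points with a missing (NaN) PSI."""
    return sum(math.isnan(p) for p in psi) / len(psi)


# Hypothetical junction read counts for one event over four time points.
psi = psi_time_course([30, 25, 4, 40], [10, 25, 3, 10])
print(psi)                    # [0.75, 0.5, nan, 0.8]
print(fraction_missing(psi))  # 0.25
```

Raising `min_reads` makes each PSI estimate more precise but produces more missing time points, which is the trade-off between read depth and missing PSI described above.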

In this study, we performed a transcriptome-wide splicing analysis of a rich, deeply sequenced RNA-Seq dataset spanning 12 different mouse tissues and evaluated the conservation of predicted events in baboon and human. Because the analyzed baboon RNA-Seq data are of low depth with short reads, and the human data rely on a pseudo-temporal ordering that may not be accurate, these data pose inherent challenges for any study of circadian splicing. For this reason, we focused our comparative analyses on an existing cross-tissue murine time course, owing to its greater sequencing depth, and relied on the more speculative baboon and human predictions as secondary and tertiary validation. These analyses were designed to be rigorous, requiring that both exon-exon and exon-intron junctions map exactly between genomes and that the associated predictions also match at the level of the tissues in which they were detected.

To report statistically significant and reproducible CSEs, we compared widely accepted periodicity detection algorithms in the field, as well as datasets acquired on different analysis platforms and under different experimental conditions. Importantly, the computational workflow adopted here yields technical recommendations for researchers who intend to explore circadian splicing. Specifically, we note:

1. While it is attractive to use existing gene array data to infer circadian splicing, we found that such predictions could not be reproduced from RNA-Seq data.

2. Deeper sequencing (40-60 million reads) with tighter time resolution (every 2-4 h) for at least 48 h is recommended for circadian splicing analysis, as it yielded 4 times more CSEs than low-resolution (6 h) data. Nonetheless, new studies are required to validate such predictions using independent RNA-Seq cohorts collected in a similar manner.

3. Among the algorithms tested, ARSER in MetaCycle cannot handle missing values or biological replicate time courses. Since the primary dataset used in our study had no replicates, we chose MetaCycle, as it identified the highest number of experimentally validated CSEs from an external dataset.

4. The distribution of CSE peak phases varies widely for the same tissue across datasets. Hence, phases should be compared only within the same experimental design, as comparisons between different datasets may produce spurious observations.
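Recommendations 3 and 4 concern how rhythmicity tests treat missing values and how peak phase is reported. The sketch below is a single-component cosinor regression written with NumPy; it is not ARSER, JTK_CYCLE or MetaCycle, it omits the significance testing those methods provide, and the 24 h period, the function name and the simulated series are assumptions for illustration. Missing PSI values are simply dropped before the least-squares fit, and the peak phase is reported in hours on the dataset's own time axis.

```python
import numpy as np


def cosinor_fit(times_h, psi, period_h=24.0):
    """Single-component cosinor fit of a PSI time course (illustrative only).

    Model: psi(t) = mesor + b*cos(w*t) + c*sin(w*t), with w = 2*pi/period_h.
    Time points with NaN PSI are dropped before the least-squares fit.
    Returns (mesor, amplitude, peak_phase_h).
    """
    t = np.asarray(times_h, dtype=float)
    y = np.asarray(psi, dtype=float)
    keep = ~np.isnan(y)  # drop missing PSI values
    t, y = t[keep], y[keep]
    w = 2.0 * np.pi / period_h
    design = np.column_stack([np.ones_like(t), np.cos(w * t), np.sin(w * t)])
    (mesor, b, c), *_ = np.linalg.lstsq(design, y, rcond=None)
    amplitude = float(np.hypot(b, c))
    # b*cos(wt) + c*sin(wt) = amplitude*cos(w*t - theta), theta = atan2(c, b),
    # so the peak falls at t = theta / w (wrapped into [0, period_h)).
    peak_phase_h = float((np.arctan2(c, b) / w) % period_h)
    return float(mesor), amplitude, peak_phase_h


# Hypothetical 48 h series sampled every 4 h, peaking at 6 h, with one missing point.
times = np.arange(0, 48, 4)
psi = 0.5 + 0.2 * np.cos(2 * np.pi * (times - 6) / 24)
psi[3] = np.nan
mesor, amp, phase = cosinor_fit(times, psi)
print(round(mesor, 3), round(amp, 3), round(phase, 2))  # 0.5 0.2 6.0
```

Because the phase is measured against each study's own time origin and sampling scheme, comparing peak phases between independently designed datasets implicitly assumes those references are equivalent, which is one reason to restrict phase comparisons to a single experimental design.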

To our knowledge, this is the first study to decode circadian splicing across multiple tissues.

This unbiased approach has yielded thousands of novel mammalian CSEs and several hundred CSEs conserved among distant mammalian species. Given that hundreds of thousands of splicing events potentially exist in the analyzed organisms, the observation that one-third of murine circadian splicing events are conserved suggests that such events, by and large, do not occur by chance alone.
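The argument that conserved CSEs are unlikely to arise by chance can be framed as an overlap test. The sketch below uses a one-sided hypergeometric test from SciPy; it is not an analysis reported in this thesis, the function name is hypothetical, and every count in the example is invented for illustration. The background should be restricted to events that could have been detected in both species, since an overly large universe would inflate the apparent significance.

```python
from scipy.stats import hypergeom


def conservation_enrichment(n_background, n_other_species, n_mouse_cse, n_overlap):
    """One-sided hypergeometric test for the overlap of two event sets.

    n_background: events testable in both species (the universe)
    n_other_species: events called circadian in the other species
    n_mouse_cse: events called circadian in mouse
    n_overlap: events called circadian in both
    Returns (fold_enrichment, p_value), where p_value = P(overlap >= n_overlap).
    """
    expected = n_mouse_cse * n_other_species / n_background
    # scipy's hypergeom(M, n, N): M = population size, n = successes, N = draws
    p_value = hypergeom.sf(n_overlap - 1, n_background, n_other_species, n_mouse_cse)
    return n_overlap / expected, float(p_value)


# Hypothetical counts, chosen only to illustrate the calculation.
fold, p = conservation_enrichment(100_000, 5_000, 3_000, 1_000)
print(round(fold, 2), p < 1e-10)  # 6.67 True
```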

While far better circadian predictions could clearly be obtained from data with higher temporal resolution than that used in our murine and baboon analyses, our work indicates that lower-resolution data can be used not only to reliably identify conserved circadian splicing events, but also to elucidate novel regulatory mechanisms and systems-level impacts when such data are integrated with circadian gene expression profiles.

Based on this approach, the circadian splicing and transcription patterns across mouse tissues ranked liver, kidney and lung as the tissues with the highest number of circadian genes, while three brain regions (cerebellum, brainstem (BS) and hypothalamus) had the highest extent of circadian splicing. The brain responds quickly to even acute or mild environmental changes and stress, and these responses involve changes in signaling networks and synaptic connections [45]. The timely expression of specific splicing regulators, epigenetic factors and other transcription factors in brain tissues is crucial for every step of neuronal development [96]. Previous studies addressing the role of AS in brain development and neurogenesis have also reported that splicing occurs very frequently in the brain and that such events are highly conserved [133], [134]. Our baboon analyses reveal a similar excess of circadian splicing relative to circadian transcription. We further find that up to one-quarter of rhythmically spliced genes have at least two major oscillating exons, with the cerebellum possessing the greatest number of multi-exon CSEs. Because CR coordinates the tissue-specific, rapid oscillatory expression of clock-regulated genes, it may be advantageous for such genes to have more than two alternatively spliced forms, offering additional regulatory control [44].
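The multi-exon CSE count can be approximated by grouping event IDs by gene. In this hypothetical sketch, the number of distinct events per gene is used as a proxy for oscillating exons (two events may share an exon, so this can overcount). The input is a handful of mouse event IDs taken from the cross-species CSE table above, used only as a toy example and not as a per-tissue call set; the function name is an assumption.

```python
from collections import defaultdict


def genes_with_multiple_events(event_ids, min_events=2):
    """Group 'Gene:junction1|junction2' event IDs by gene symbol.

    Returns (sorted genes with >= min_events distinct events,
             fraction of all genes that meet the threshold).
    """
    events_by_gene = defaultdict(set)
    for event_id in event_ids:
        gene, _, junctions = event_id.partition(":")
        events_by_gene[gene].add(junctions)
    multi = sorted(g for g, ev in events_by_gene.items() if len(ev) >= min_events)
    return multi, len(multi) / len(events_by_gene)


# Mouse event IDs from the cross-species table, used here as a toy input.
events = [
    "Zfp207:E7.6-E9.1|E8.1-E9.1", "Zfp207:E7.6-E8.1|E7.6-E9.1",
    "Rab6a:E6.1-E7.1|E5.1-E7.1", "Rab6a:E4.1-E6.1|E5.1-E7.1",
    "Hnrnpc:E1.3-E3.2|E2.7-E3.2", "App:E6.2-E9.1|E8.1-E9.1",
    "Son:E10.1-E12.2|E11.1-E12.2", "Srsf3:E3.1-E5.2|E4.1-E5.2",
]
multi, frac = genes_with_multiple_events(events)
print(multi, round(frac, 3))  # ['Rab6a', 'Zfp207'] 0.333
```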

Alternative splicing is a highly dynamic process driven by the regulation of SFs and auxiliary proteins. Adapting such intricate processes for the temporal regulation of isoform expression adds an extra layer of complexity to the regulatory nature of the circadian clock. To orchestrate such regulation, SFs and associated proteins targeting rhythmically spliced genes could be crucial in the regulation of CR. We noticed that several temperature-sensitive RNA binding proteins, including Cirbp, Rbm3 and SR proteins, as well as Hspa8 and other chaperones, are rhythmic in multiple tissues, which indicates that temperature could be an important physiological cue in coordinating AS and the clock. A classic feature of these daily oscillations is the synchronization of cell-autonomous peripheral clocks to the master clock [29]. It is widely acknowledged that body temperature can serve as a resetting cue from the SCN to peripheral tissues [30]. Analogous to food entrainment, a recent study found that minor temperature fluctuations alter splicing [32]. This dimension of temperature-sensitive circadian splicing, driven by changes in body temperature, has only recently been recognized [34]. In this prior work, temperature-regulated SFs (SRSF2 and SRSF7) regulating U2af26 splicing were found to be critical regulators of rhythmic splicing.

We note that our pathway analyses highlight enrichment of these SFs in CSEs found across mouse tissues, but not in circadian expressed genes. To better understand this relationship, we examined the expression patterns of RNA binding proteins and their corresponding rhythmically spliced exons, which showed a strong correlation for many genes, including Cirbp, Rbm3, Tardbp, Fus and Srsf5. These alternative exons were frequently enriched in binding sites for Fus, Tardbp, the thermo-regulated RNA binding proteins Cirbp and Rbm3, and Srsf1 and Srsf2. Moreover, several of these RNA binding proteins (Cirbp, Fus, Rbm3 and Srsf5) were found likely to autoregulate their own transcripts, potentially achieving synchronization and resetting of circadian splicing. Although regulation of SFs via phosphorylation has been demonstrated [77], to our knowledge, autoregulation of circadian-regulated splicing factors in mammals has not previously been investigated.
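The two lines of evidence above, correlation between an RNA binding protein's expression and the PSI of its candidate target exons, and enrichment of its binding sites among CSEs, can be sketched as below. This is a hypothetical illustration: Pearson correlation, a one-sided Fisher's exact test against a background of non-rhythmic events, the function names and all numbers are assumptions, not necessarily the statistics used in this work.

```python
import numpy as np
from scipy.stats import fisher_exact, pearsonr


def rbp_psi_correlation(rbp_expression, target_psi):
    """Pearson correlation between an RBP's expression and a target exon's PSI
    across matched time points; points where either value is NaN are skipped."""
    x = np.asarray(rbp_expression, dtype=float)
    y = np.asarray(target_psi, dtype=float)
    keep = ~(np.isnan(x) | np.isnan(y))
    r, p = pearsonr(x[keep], y[keep])
    return float(r), float(p)


def binding_site_enrichment(cse_with_site, n_cse, bg_with_site, n_bg):
    """One-sided Fisher's exact test: is the RBP's binding site more frequent in
    CSEs than in a background set of non-rhythmic splicing events?"""
    table = [[cse_with_site, n_cse - cse_with_site],
             [bg_with_site, n_bg - bg_with_site]]
    odds_ratio, p = fisher_exact(table, alternative="greater")
    return float(odds_ratio), float(p)


# Hypothetical values for illustration only.
r, _ = rbp_psi_correlation([1, 2, 3, 4, 5, 6], [0.1, 0.2, np.nan, 0.4, 0.5, 0.6])
odds, p = binding_site_enrichment(40, 100, 100, 1000)
print(round(r, 3), round(odds, 2), p < 1e-6)  # 1.0 6.0 True
```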

Cirbp and Rbm3 are known to impart post-transcriptional changes to clock-controlled genes via polyadenylation [116], but their role in splicing regulation has not yet been reported. Just as it is important to identify the central modulators of circadian splicing, it is equally vital to determine the missing link between the central clock and these RNA binding proteins. In addition to temperature and food, we believe splicing can respond to other zeitgebers, such as light and sterol hormones. We note that several such pathway genes are rhythmically spliced in many tissues, mostly in a tissue-specific manner: downstream targets of the light-sensitive pigment pathway (Rdh5 and Rdh13), though not the pigment genes themselves (opsins); the hormone-sensitive retinoid X receptor genes Rxra and Rxrb; and the hypoxia-inducible Hif1a and its downstream targets Mmp19 and Mmp23 (12 h rhythm). These observations support the idea of multiple circadian splicing feedback networks, in which coarse transcriptional control is complemented by finer splicing corrections in peripheral tissues.

Alternative splicing is implicated in various genetic disorders, other diseases and cancer (Chapter 1). Similarly, the pathological relevance of CR in neurological and metabolic disorders and cancer is well recognized. We find that a large proportion of our predicted cross-tissue circadian spliced genes are significant players in cancer progression and metastatic pathways (Chapter 2). Network analysis suggests that these circadian spliced genes interact directly with core clock genes and Myc. These key regulators contribute to cancer development through aberrations in the cell cycle, cell growth and cell movement, mis-splicing of splicing regulators, and other mechanisms (Chapter 2). Indeed, the same circadian spliced exons of cross-tissue CSEs such as App, Hnrpdl, Eif4h, Csk1a and Prrc2c are directly implicated in different types of cancer [135], [136]. Other cross-tissue rhythmically spliced genes, such as Clpx, Eif4g2, Fmo1, Fubp1, Gpx4, Prpsap2, Sqstm1, Rbm3, Slmap and Tardbp, are differentially regulated in different cancer types (Table S8). A substantial fraction of these multi-tissue CSEs (20 of 58, about 34%) are also rhythmic in human tissues, a remarkable observation that opens several avenues for circadian splicing research in patients with a dysfunctional clock. Ultimately, improving human circadian splicing predictions will require larger healthy control and matched cancer datasets, or improved means of assigning circadian phase to existing samples.

Our work provides novel insights into the mechanistic controls and potential regulatory networks mediating rhythmic splicing. Future directions include more detailed comparisons of tissue-specific circadian splicing impacts in diverse cancer datasets with matched healthy tissues.

Further, it will be necessary to associate observed circadian splicing events with specific cancer gene regulatory programs to determine their precise roles and their potential contribution to chemotherapy resistance.

As such, we may ultimately be able to identify novel isoform-specific targets for chronotherapy.

REFERENCES CITED

1. Kojima, S., D.L. Shingle, and C.B. Green, Post-transcriptional control of circadian rhythms. J Cell Sci, 2011. 124(Pt 3): p. 311-20.

2. McClung, C.R., Plant circadian rhythms. Plant Cell, 2006. 18(4): p. 792-803.

3. Huang, R.C., The discoveries of molecular mechanisms for the circadian rhythm: The 2017 Nobel Prize in Physiology or Medicine. Biomed J, 2018. 41(1): p. 5-8.

4. Baker, C.L., J.J. Loros, and J.C. Dunlap, The circadian clock of Neurospora crassa. FEMS Microbiol Rev, 2012. 36(1): p. 95-110.

5. Meyer-Bernstein, E.L. and A. Sehgal, Molecular regulation of circadian rhythms in Drosophila and mammals. Neuroscientist, 2001. 7(6): p. 496-505.

6. Mas, P., Circadian clock function in Arabidopsis thaliana: time beyond transcription. Trends Cell Biol, 2008. 18(6): p. 273-81.

7. Price, J.L., Genetic screens for clock mutants in Drosophila. Methods Enzymol, 2005. 393: p. 35-60.

8. Loros, J.J., A. Richman, and J.F. Feldman, A recessive circadian clock mutation at the frq locus of Neurospora crassa. Genetics, 1986. 114(4): p. 1095-110.

9. Sargent, M.L., W.R. Briggs, and D.O. Woodward, Circadian nature of a rhythm expressed by an invertaseless strain of Neurospora crassa. Plant Physiol, 1966. 41(8): p. 1343-9.

10. Feldman, J.F. and M.N. Hoyle, Isolation of circadian clock mutants of Neurospora crassa. Genetics, 1973. 75(4): p. 605-13.

11. Gardner, G.F. and J.F. Feldman, The frq locus in Neurospora crassa: a key element in circadian clock organization. Genetics, 1980. 96(4): p. 877-86.

12. Lee, K., J.J. Loros, and J.C. Dunlap, Interconnected feedback loops in the Neurospora circadian system. Science, 2000. 289(5476): p. 107-10.

13. Russo, V.E., Blue light induces circadian rhythms in the bd mutant of Neurospora: double mutants bd,wc-1 and bd,wc-2 are blind. J Photochem Photobiol B, 1988. 2(1): p. 59-65.

14. Loros, J.J., S.A. Denome, and J.C. Dunlap, Molecular cloning of genes under control of the circadian clock in Neurospora. Science, 1989. 243(4889): p. 385-8.

15. Dunlap, J.C. and J.J. Loros, Making Time: Conservation of Biological Clocks from Fungi to Animals. Microbiol Spectr, 2017. 5(3).

16. Pittendrigh, C.S., On Temperature Independence in the Clock System Controlling Emergence Time in Drosophila. Proc Natl Acad Sci U S A, 1954. 40(10): p. 1018-29.

17. Bargiello, T.A. and M.W. Young, Molecular genetics of a biological clock in Drosophila. Proc Natl Acad Sci U S A, 1984. 81(7): p. 2142-6.

18. Reddy, P., et al., Molecular analysis of the period locus in Drosophila melanogaster and identification of a transcript involved in biological rhythms. Cell, 1984. 38(3): p. 701-10.

19. Siwicki, K.K., et al., Antibodies to the period gene product of Drosophila reveal diverse tissue distribution and rhythmic changes in the visual system. Neuron, 1988. 1(2): p. 141-50.

20. Sehgal, A., et al., Rhythmic expression of timeless: a basis for promoting circadian cycles in period gene autoregulation. Science, 1995. 270(5237): p. 808-10.

21. Tataroglu, O. and P. Emery, Studying circadian rhythms in Drosophila melanogaster. Methods, 2014. 68(1): p. 140-50.

22. Vitaterna, M.H., et al., Mutagenesis and mapping of a mouse gene, Clock, essential for circadian behavior. Science, 1994. 264(5159): p. 719-25.

23. Sun, Z.S., et al., RIGUI, a putative mammalian ortholog of the Drosophila period gene. Cell, 1997. 90(6): p. 1003-11.

24. Tei, H., et al., Circadian oscillation of a mammalian homologue of the Drosophila period gene. Nature, 1997. 389(6650): p. 512-6.

25. Merrow, M., K. Spoelstra, and T. Roenneberg, The circadian cycle: daily rhythms from behaviour to genes. EMBO Rep, 2005. 6(10): p. 930-5.

26. Bodenstein, C., I. Heiland, and S. Schuster, Temperature compensation and entrainment in circadian rhythms. Phys Biol, 2012. 9(3): p. 036011.

27. Mohawk, J.A., C.B. Green, and J.S. Takahashi, Central and peripheral circadian clocks in mammals. Annu Rev Neurosci, 2012. 35: p. 445-62.

28. Welsh, D.K., J.S. Takahashi, and S.A. Kay, Suprachiasmatic nucleus: cell autonomy and network properties. Annu Rev Physiol, 2010. 72: p. 551-77.

29. Pilorz, V., C. Helfrich-Forster, and H. Oster, The role of the circadian clock system in physiology. Pflugers Arch, 2018. 470(2): p. 227-239.

30. Ralph, M.R. and M. Menaker, A mutation of the circadian system in golden hamsters. Science, 1988. 241(4870): p. 1225-7.

31. Balsalobre, A., F. Damiola, and U. Schibler, A serum shock induces circadian gene expression in mammalian tissue culture cells. Cell, 1998. 93(6): p. 929-37.

32. Yoo, S.H., et al., PERIOD2::LUCIFERASE real-time reporting of circadian dynamics reveals persistent circadian oscillations in mouse peripheral tissues. Proc Natl Acad Sci U S A, 2004. 101(15): p. 5339-46.

33. Hughes, M., et al., High-resolution time course analysis of gene expression from pituitary. Cold Spring Harb Symp Quant Biol, 2007. 72: p. 381-6.

34. McCarthy, J.J., et al., Identification of the circadian transcriptome in adult mouse skeletal muscle. Physiol Genomics, 2007. 31(1): p. 86-95.

35. Zambon, A.C., et al., Time- and exercise-dependent gene regulation in human skeletal muscle. Genome Biol, 2003. 4(10): p. R61.

36. Panda, S., et al., Coordinated transcription of key pathways in the mouse by the circadian clock. Cell, 2002. 109(3): p. 307-20.

37. Tong, M., et al., Circadian expressions of cardiac ion channel genes in mouse might be associated with the central clock in the SCN but not the peripheral clock in the heart. Biol Rhythm Res, 2013. 44(4): p. 519-530.

38. Wu, G., et al., Population-level rhythms in human skin with implications for circadian medicine. Proc Natl Acad Sci U S A, 2018. 115(48): p. 12313-12318.

39. Jha, A., M.R. Gazzara, and Y. Barash, Integrative deep models for alternative splicing. Bioinformatics, 2017. 33(14): p. i274-i282.

40. Hughes, M.E., et al., Deep sequencing the circadian and diurnal transcriptome of Drosophila brain. Genome Res, 2012. 22(7): p. 1266-81.

41. Du, N.H., et al., MicroRNAs shape circadian hepatic gene expression on a transcriptome-wide scale. Elife, 2014. 3: p. e02510.

42. Filichkin, S.A. and T.C. Mockler, Unproductive alternative splicing and nonsense mRNAs: a widespread phenomenon among plant circadian clock genes. Biol Direct, 2012. 7: p. 20.

43. Menet, J.S., et al., Nascent-Seq reveals novel features of mouse circadian transcriptional regulation. Elife, 2012. 1: p. e00011.

44. Zhang, R., et al., A circadian gene expression atlas in mammals: implications for biology and medicine. Proc Natl Acad Sci U S A, 2014. 111(45): p. 16219-24.

45. Scheiermann, C., Y. Kunisaki, and P.S. Frenette, Circadian control of the immune system. Nat Rev Immunol, 2013. 13(3): p. 190-8.

46. Masri, S., M. Cervantes, and P. Sassone-Corsi, The circadian clock and cell cycle: interconnected biological circuits. Curr Opin Cell Biol, 2013. 25(6): p. 730-4.

47. Dodson, E.R. and P.C. Zee, Therapeutics for Circadian Rhythm Sleep Disorders. Sleep Med Clin, 2010. 5(4): p. 701-715.

48. Akerstedt, T., Work hours, sleepiness and the underlying mechanisms. J Sleep Res, 1995. 4(S2): p. 15-22.

49. Jin, Y., T.Y. Hur, and Y. Hong, Circadian Rhythm Disruption and Subsequent Neurological Disorders in Night-Shift Workers. J Lifestyle Med, 2017. 7(2): p. 45-50.

50. Stephens, M.A. and G. Wand, Stress and the HPA axis: role of glucocorticoids in alcohol dependence. Alcohol Res, 2012. 34(4): p. 468-83.

51. Koch, C.E., et al., Interaction between circadian rhythms and stress. Neurobiol Stress, 2017. 6: p. 57-67.

52. van Campen, J.S., et al., Seizure occurrence and the circadian rhythm of cortisol: a systematic review. Epilepsy Behav, 2015. 47: p. 132-7.

53. Musiek, E.S., et al., Circadian clock proteins regulate neuronal redox homeostasis and neurodegeneration. J Clin Invest, 2013. 123(12): p. 5389-400.

54. Breen, D.P., et al., Sleep and circadian rhythm regulation in early Parkinson disease. JAMA Neurol, 2014. 71(5): p. 589-595.

55. Karatsoreos, I.N., Links between Circadian Rhythms and Psychiatric Disease. Front Behav Neurosci, 2014. 8: p. 162.

56. Pacchierotti, C., et al., Melatonin in psychiatric disorders: a review on the melatonin involvement in psychiatry. Front Neuroendocrinol, 2001. 22(1): p. 18-32.

57. Engin, A., Circadian Rhythms in Diet-Induced Obesity. Adv Exp Med Biol, 2017. 960: p. 19-52.

58. Takeda, N. and K. Maemura, Circadian clock and cardiovascular disease. J Cardiol, 2011. 57(3): p. 249-56.

59. Onaolapo, A.Y. and O.J. Onaolapo, Circadian dysrhythmia-linked diabetes mellitus: Examining melatonin's roles in prophylaxis and management. World J Diabetes, 2018. 9(7): p. 99-114.

60. Gnocchi, D. and G. Bruscalupi, Circadian Rhythms and Hormonal Homeostasis: Pathophysiological Implications. Biology (Basel), 2017. 6(1).

61. Penev, P.D., et al., Chronic circadian desynchronization decreases the survival of animals with cardiomyopathic heart disease. Am J Physiol, 1998. 275(6): p. H2334-7.

62. Martino, T.A., et al., Disturbed diurnal rhythm alters gene expression and exacerbates cardiovascular disease with rescue by resynchronization. Hypertension, 2007. 49(5): p. 1104-13.

63. Paulose, J.K., et al., Human Gut Bacteria Are Sensitive to Melatonin and Express Endogenous Circadian Rhythmicity. PLoS One, 2016. 11(1): p. e0146643.

64. Savvidis, C. and M. Koutsilieris, Circadian rhythm disruption in cancer biology. Mol Med, 2012. 18: p. 1249-60.

65. Tazi, J., N. Bakkour, and S. Stamm, Alternative splicing and disease. Biochim Biophys Acta, 2009. 1792(1): p. 14-26.

66. Lopez-Bigas, N., et al., Are splicing mutations the most frequent cause of hereditary disease? FEBS Lett, 2005. 579(9): p. 1900-3.

67. Fong, N., et al., Pre-mRNA splicing is facilitated by an optimal RNA polymerase II elongation rate. Genes Dev, 2014. 28(23): p. 2663-76.

68. Zhou, Y., Y. Lu, and W. Tian, Epigenetic features are significantly associated with alternative splicing. BMC Genomics, 2012. 13: p. 123.

69. Bell, T.J., et al., Cell-specific alternative splicing increases calcium channel current density in the pain pathway. Neuron, 2004. 41(1): p. 127-38.

70. Li, Q., J.A. Lee, and D.L. Black, Neuronal regulation of alternative pre-mRNA splicing. Nat Rev Neurosci, 2007. 8(11): p. 819-31.

71. Feng, D. and J. Xie, Aberrant splicing in neurological diseases. Wiley Interdiscip Rev RNA, 2013. 4(6): p. 631-49.

72. Gould, E.L., et al., Melatonin profiles and sleep characteristics in boys with fragile X syndrome: a preliminary study. Am J Med Genet, 2000. 95(4): p. 307-15.

73. Duffield, G.E., DNA microarray analyses of circadian timing: the genomic basis of biological time. J Neuroendocrinol, 2003. 15(10): p. 991-1002.

74. Diernfellner, A., et al., Long and short isoforms of Neurospora clock protein FRQ support temperature-compensated circadian rhythms. FEBS Lett, 2007. 581(30): p. 5759-64.

75. Cheng, Y., B. Gvakharia, and P.E. Hardin, Two alternatively spliced transcripts from the Drosophila period gene rescue rhythms having different molecular and behavioral characteristics. Mol Cell Biol, 1998. 18(11): p. 6505-14.

76. McGlincy, N.J., et al., Regulation of alternative splicing by the circadian clock and food related cues. Genome Biol, 2012. 13(6): p. R54.

77. Preussner, M., et al., Body Temperature Cycles Control Rhythmic Alternative Splicing in Mammals. Mol Cell, 2017. 67(3): p. 433-446 e4.

78. Woo, K.C., et al., Circadian amplitude of cryptochrome 1 is modulated by mRNA stability regulation via cytoplasmic hnRNP D oscillation. Mol Cell Biol, 2010. 30(1): p. 197-205.

79. Klein, D.C., P.H. Roseboom, and S.L. Coon, New light is shining on the melatonin rhythm enzyme: the first postcloning view. Trends Endocrinol Metab, 1996. 7(3): p. 106-12.

80. Sanchez, S.E., et al., A methyl transferase links the circadian clock to the regulation of alternative splicing. Nature, 2010. 468(7320): p. 112-6.

81. Schlaen, R.G., et al., The spliceosome assembly factor GEMIN2 attenuates the effects of temperature on alternative splicing and circadian rhythms. Proc Natl Acad Sci U S A, 2015. 112(30): p. 9382-7.

82. Wang, X., et al., SKIP is a component of the spliceosome linking alternative splicing and the circadian clock in Arabidopsis. Plant Cell, 2012. 24(8): p. 3278-95.

83. Dubruille, R. and P. Emery, A plastic clock: how circadian rhythms respond to environmental cues in Drosophila. Mol Neurobiol, 2008. 38(2): p. 129-45.

84. El-Athman, R., et al., A Computational Analysis of Alternative Splicing across Mammalian Tissues Reveals Circadian and Ultradian Rhythms in Splicing Events. Int J Mol Sci, 2019. 20(16).

85. Shilts, J., G. Chen, and J.J. Hughey, Evidence for widespread dysregulation of circadian clock progression in human cancer. PeerJ, 2018. 6: p. e4327.

86. Shakhmantsir, I. and A. Sehgal, Splicing the Clock to Maintain and Entrain Circadian Rhythms. J Biol Rhythms, 2019: p. 748730419868136.

87. Preussner, M. and F. Heyd, Temperature-controlled Rhythmic Gene Expression in Endothermic Mammals: All Diurnal Rhythms are Equal, but Some are Circadian. Bioessays, 2018. 40(7): p. e1700216.

88. Bartok, O., et al., Adaptation of molecular circadian clockwork to environmental changes: a role for alternative splicing and miRNAs. Proc Biol Sci, 2013. 280(1765): p. 20130011.

89. Yeung, J., et al., Transcription factor activity rhythms and tissue-specific chromatin interactions explain circadian gene expression across organs. Genome Res, 2018. 28(2): p. 182-191.

90. Meireles-Filho, A.C.A., et al., cis-regulatory requirements for tissue-specific programs of the circadian clock. Curr Biol, 2014. 24(1): p. 1-10.

91. Pan, Q., et al., Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat Genet, 2008. 40(12): p. 1413-5.

92. Colot, H.V., J.J. Loros, and J.C. Dunlap, Temperature-modulated alternative splicing and promoter use in the Circadian clock gene frequency. Mol Biol Cell, 2005. 16(12): p. 5563-71.

93. Liu, Y., et al., Thermally regulated translational control of FRQ mediates aspects of temperature responses in the neurospora circadian clock. Cell, 1997. 89(3): p. 477-86.

94. Woo, K.C., et al., Mouse period 2 mRNA circadian oscillation is modulated by PTB-mediated rhythmic mRNA degradation. Nucleic Acids Res, 2009. 37(1): p. 26-37.

95. Kim, T.D., et al., Rhythmic control of AANAT translation by hnRNP Q in circadian melatonin production. Genes Dev, 2007. 21(7): p. 797-810.

96. Mure, L.S., et al., Diurnal transcriptome atlas of a primate across major neural and peripheral tissues. Science, 2018. 359(6381).

97. Li, Y.I., et al., Annotation-free quantification of RNA splicing using LeafCutter. Nat Genet, 2018. 50(1): p. 151-158.

98. Atger, F., et al., Circadian and feeding rhythms differentially affect rhythmic mRNA transcription and translation in mouse liver. Proc Natl Acad Sci U S A, 2015. 112(47): p. E6579-88.

99. Deng, M., et al., FirebrowseR: an R client to the Broad Institute's Firehose Pipeline. Database (Oxford), 2017. 2017.

100. Hughes, M.E., J.B. Hogenesch, and K. Kornacker, JTK_CYCLE: an efficient nonparametric algorithm for detecting rhythmic components in genome-scale data sets. J Biol Rhythms, 2010. 25(5): p. 372-80.

101. Yang, R. and Z. Su, Analyzing circadian expression data by harmonic regression based on autoregressive spectral estimation. Bioinformatics, 2010. 26(12): p. i168-74.

102. Luck, S., et al., Rhythmic degradation explains and unifies circadian transcriptome and proteome data. Cell Rep, 2014. 9(2): p. 741-51.

103. Thaben, P.F. and P.O. Westermark, Detecting rhythms in time series with RAIN. J Biol Rhythms, 2014. 29(6): p. 391-400.

104. Muench, D.E., et al., SKI controls MDS-associated chronic TGF-beta signaling, aberrant splicing, and stem cell fitness. Blood, 2018. 132(21): p. e24-e34.

105. Emig, D., et al., AltAnalyze and DomainGraph: analyzing and visualizing exon expression data. Nucleic Acids Res, 2010. 38(Web Server issue): p. W755-62.

106. Consortium, G., The Genotype-Tissue Expression (GTEx) project. Nat Genet, 2013. 45(6): p. 580-5.

107. Anafi, R.C., et al., CYCLOPS reveals human transcriptional rhythms in health and disease. Proc Natl Acad Sci U S A, 2017. 114(20): p. 5312-5317.

108. Wu, G., et al., MetaCycle: an integrated R package to evaluate periodicity in large scale data. Bioinformatics, 2016. 32(21): p. 3351-3353.

109. Zambon, A.C., et al., GO-Elite: a flexible solution for pathway and ontology over-representation. Bioinformatics, 2012. 28(16): p. 2209-10.

110. McLendon, P.M., et al., An Unbiased High-Throughput Screen to Identify Novel Effectors That Impact on Cardiomyocyte Aggregate Levels. Circ Res, 2017. 121(6): p. 604-616.

111. Yeo, G., et al., Variation in alternative splicing across human tissues. Genome Biol, 2004. 5(10): p. R74.

112. Huisman, S.A., et al., Colorectal liver metastases with a disrupted circadian rhythm phase shift the peripheral clock in liver and kidney. Int J Cancer, 2015. 136(5): p. 1024-32.

113. Yang, X., et al., Down regulation of circadian clock gene Period 2 accelerates breast cancer growth by altering its daily growth rhythm. Breast Cancer Res Treat, 2009. 117(2): p. 423-31.

114. Gotic, I., et al., Temperature regulates splicing efficiency of the cold-inducible RNA-binding protein gene Cirbp. Genes Dev, 2016. 30(17): p. 2005-17.

115. Hu, B., et al., POSTAR: a platform for exploring post-transcriptional regulation coordinated by RNA-binding proteins. Nucleic Acids Res, 2017. 45(D1): p. D104-D114.

116. Zhu, X., C. Buhrer, and S. Wellmann, Cold-inducible proteins CIRP and RBM3, a unique couple with activities far beyond the cold. Cell Mol Life Sci, 2016. 73(20): p. 3839-59.

117. Fujioka, Y., et al., FUS-regulated region- and cell-type-specific transcriptome is associated with cell selectivity in ALS/FTLD. Sci Rep, 2013. 3: p. 2388.

118. Barmada, S.J. and S. Finkbeiner, Pathogenic TARDBP mutations in amyotrophic lateral sclerosis and frontotemporal dementia: disease-associated pathways. Rev Neurosci, 2010. 21(4): p. 251-72.

119. Rutherford, N.J., et al., Novel mutations in TARDBP (TDP-43) in patients with familial amyotrophic lateral sclerosis. PLoS Genet, 2008. 4(9): p. e1000193.

120. Sadakata, T., M. Washida, and T. Furuichi, Alternative splicing variations in mouse CAPS2: differential expression and functional properties of splicing variants. BMC Neurosci, 2007. 8: p. 25.

121. Rengasamy, M., et al., The PRMT5/WDR77 complex regulates alternative splicing through ZNF326 in breast cancer. Nucleic Acids Res, 2017. 45(19): p. 11106-11120.

122. Plumel, M., et al., Circadian Analysis of the Mouse Cerebellum Proteome. Int J Mol Sci, 2019. 20(8).

123. Van Nostrand, E.L., et al., A Large-Scale Binding and Functional Map of Human RNA Binding Proteins. bioRxiv, 2018.

124. Humphrey, J., et al., FUS ALS-causative mutations impact FUS autoregulation and the processing of RNA-binding proteins through intron retention. bioRxiv, 2019.

125. Yang, S., R. Jia, and Z. Bian, SRSF5 functions as a novel oncogenic splicing factor and is upregulated by oncogene SRSF3 in oral squamous cell carcinoma. Biochim Biophys Acta Mol Cell Res, 2018. 1865(9): p. 1161-1172.

126. Sun, S., et al., SF2/ASF autoregulation involves multiple layers of post-transcriptional and translational control. Nat Struct Mol Biol, 2010. 17(3): p. 306-12.

107 127. Pervouchine, D., et al., Integrative transcriptomic analysis suggests new autoregulatory

splicing events coupled with nonsense-mediated mRNA decay. Nucleic Acids Res, 2019.

47(10): p. 5293-5306.

128. Grellscheid, S., et al., Identification of evolutionarily conserved exons as regulated targets

for the splicing activator tra2beta in development. PLoS Genet, 2011. 7(12): p. e1002390.

129. Seo, J.S., et al., The transcriptional landscape and mutational profile of lung

adenocarcinoma. Genome Res, 2012. 22(11): p. 2109-19.

130. Cancer Genome Atlas Research Network, Comprehensive molecular profiling of lung adenocarcinoma. Nature, 2014. 511(7511): p. 543-50.

131. Schöning, J.C., et al., Auto-regulation of the circadian slave oscillator component AtGRP7 and regulation of its targets is impaired by a single RNA recognition motif point mutation. Plant J, 2007. 52(6): p. 1119-30.

132. Vaquero-Garcia, J., et al., A new view of transcriptome complexity and regulation through the lens of local splicing variations. Elife, 2016. 5: p. e11752.

133. Raj, B. and B.J. Blencowe, Alternative Splicing in the Mammalian Nervous System: Recent Insights into Mechanisms and Functional Roles. Neuron, 2015. 87(1): p. 14-27.

134. Su, C.H., D. D, and W.Y. Tarn, Alternative Splicing in Neurogenesis and Brain Development. Front Mol Biosci, 2018. 5: p. 12.

135. Alam, S., H. Suzuki, and T. Tsukahara, Alternative splicing regulation of APP exon 7 by RBFox proteins. Neurochem Int, 2014. 78: p. 7-17.

136. Pandey, P., et al., Amyloid precursor protein and amyloid precursor-like protein 2 in cancer. Oncotarget, 2016. 7(15): p. 19430-44.
