University of Connecticut OpenCommons@UConn

Doctoral Dissertations University of Connecticut Graduate School

10-6-2015 Understanding the Alternative Splicing-regulated Transcriptomic Changes During Mouse Retinal Development and Disease Through Global and Gene-centric Approaches Devi Krishna Priya Karunakaran University of Connecticut - Storrs, [email protected]


Recommended Citation
Karunakaran, Devi Krishna Priya, "Understanding the Alternative Splicing-regulated Transcriptomic Changes During Mouse Retinal Development and Disease Through Global and Gene-centric Approaches" (2015). Doctoral Dissertations. 930. https://opencommons.uconn.edu/dissertations/930

ABSTRACT

Understanding the Alternative Splicing-regulated Transcriptomic Changes During Mouse Retinal Development and Disease Through Global and Gene-centric Approaches

Devi Krishna Priya Karunakaran, Ph.D.
University of Connecticut, 2015

Alternative splicing (AS) is an important layer of gene regulation and has been shown to control various cellular processes including splicing, mRNA export, translation and cell cycle. Misregulation of AS has been implicated in many diseases. However, the role of AS in retinal development and disease remains unexplored. To address this question, I employed a two-pronged approach, i.e., a gene-centric approach and an en masse transcriptome analysis approach.

For the first approach, I studied the role of an alternative splicing factor, Sfrs10, and a kinase, Citron Kinase (CitK), and its spliced isoforms in murine retinal development and disease. Expression analysis of Sfrs10 in mouse and human retinae showed that, unlike in mouse, it was not expressed in the normal human retina and was observed only in the age-related macular degeneration (AMD) retina, suggesting a specific role in response to oxidative stress. In parallel, I showed that the loss of CitK affected the cell division of a subset of retinal progenitor cells, which in turn affected late neurogenesis, specifically that of the Islet1+ bipolar neurons.

In the second approach, global analyses were performed by employing RNA deep sequencing on cytoplasmic and nuclear fractions of developing retinal tissue. We investigated whether the nuclear transcriptome is ahead of the cytoplasmic transcriptome, such that the cell executes the current molecular program while preparing for the next program through de novo transcription. I also employed a custom bioinformatics pipeline to reverse-engineer the order in which the molecular programs are set up as the retinal tissue develops. Further, I extended the study to the Nrl gene knockout to identify the molecular pathways perturbed in the absence of the gene. Here, our bioinformatics strategy could predict the perturbed molecular programs well before their histological manifestation. We also compared our methodology with existing methods of data analysis and show that our pipeline could give information on the transcription kinetics of genes segregated into each bin. Thus, this pipeline was employed in the temporal comparison of a triple microRNA cluster knockout and its wild-type counterpart across different stages of development.


Understanding the Alternative Splicing-regulated Transcriptomic Changes During Mouse Retinal Development and Disease Through Global and Gene-centric Approaches

Devi Krishna Priya Karunakaran

B.S., Madras University, 2006
M.S., Madurai Kamaraj University, 2008

A Dissertation Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy at the University of Connecticut

2015


Copyright 2015 by Devi Krishna Priya Karunakaran


APPROVAL PAGE

Doctor of Philosophy Dissertation

Understanding the Alternative Splicing-regulated Transcriptomic Changes During Mouse Retinal Development and Disease Through Global and Gene-centric Approaches

Presented by

Devi Krishna Priya Karunakaran, B.S., M.S.

Major Advisor: ______

Rahul Kanadia, Ph.D.

Associate Advisor: ______

Akiko Nishiyama, Ph.D.

Associate Advisor: ______

Daniel Mulkey, Ph.D.

Associate Advisor: ______

Barbara Mellone, Ph.D.

Associate Advisor: ______

Marie Cantino, Ph.D.

University of Connecticut

2015


Dedicated to my Grandmother!


ACKNOWLEDGEMENTS

I would like to thank God for the blessings that He has showered on me throughout these years and for helping me realize my dream.

First and foremost, I would like to sincerely thank Dr. Rahul Kanadia for being the great mentor that he is to me. He not only taught me to do science but also to think like a scientist. I thank him for giving me the opportunity to work in his lab. He has always been a strong support and a great motivator. He had trust in my capabilities when I was not sure whether I could do certain things. The constant push given by him has made me achieve what I wanted. I am very grateful to have had a mentor who was more like a father figure. I would like to sincerely thank him for his belief and trust in me as well as for his encouragement and everlasting patience. Special thanks for those speeches that woke me up to reality as well as for those times you have made me laugh!

I would like to thank my collaborators, Dr. Ion Mandoiu and Dr. Sahar Al Seesi of the Computer Science and Engineering department at the University of Connecticut, for always being there to help me with bioinformatics troubleshooting, and Sahar for patiently teaching me how to understand the scripts and to work independently with bioinformatics. I would like to thank my committee members, Dr. Akiko Nishiyama, Dr. Daniel Mulkey and Dr. Barbara Mellone, for their constant support and valuable suggestions during the meetings. They were instrumental in keeping me focused and in giving constructive criticism of my thesis work.

I would like to thank the people who made the lab a fun place to be in. First, I would like to thank my past lab member, Dr. Abdul Rouf Banday, who was very helpful in teaching me bioinformatics tools and giving me valuable suggestions on my project. I would also like to thank my undergraduate mentees, Christopher Lemoine, who is currently pursuing his Ph.D. in the Kanadia lab, Nisarg Chhaya and Katery Hyatt, who were great to work with and were ready to help me anytime I needed help. I would like to thank former Research for Undergraduates (REU) students, Marybeth Baumgartner, who is currently pursuing her Ph.D. in the Kanadia lab, Katherine Delaney and Anouk Olthof, for helping me with my project during their time as summer research fellows. I would like to thank the undergraduates Amye Black and Whitney Washburn for briefly helping me with my project.

I would like to thank Dr. Marie Cantino for teaching me electron microscopy and helping me with my Sfrs10 project. I would also like to thank Steve Daniels and David Serwanski of the Nishiyama lab for helping me with the same. I would like to thank Dr. Joanne Conover and Dr. Joseph Loturco for giving me valuable insights during our tri-lab meetings. I would like to thank Dr. Xinnian Chen, Kristen Kimball, Penny Dobbins, Ed Lechowicz and Ann Hamlin for making teaching PNB 2274 fun. I would like to thank Kathy Keheller and Linda Armstrong for helping me with the administrative side of the course whenever I needed them. I would like to thank other faculty members, staff and other students of PNB for encouraging me during my journal club talks. I would like to thank the animal facility staff, especially Kevin and Teresa, and other members of the facility for taking good care of my mice and helping me whenever I needed help.

I would like to thank Bevan, who has been a constant support and help whenever I needed it. I would like to thank him for all his help, like a brother in a country outside of my home country. I would like to thank Ashley, his fiancée and my past lab mate, who is great fun to be with. I would like to extend my heartfelt thanks to Seethe, who was, is and will always be a great friend, sister and a great person, who shared living space with me, made me feel at home away from home, and took care of me during my times of frustration and ill-health. Thanks a ton ma! I would like to thank my friends, Raji, Priyanka, Anitha, Hems, Shankar, Shanmu and Prathiba, who are in constant touch and have always been supportive and encouraging of my work.

Last but not least, I would like to extend my heartfelt thanks to my family, without whom I would not have made it to this point in my career. I would like to thank my Paati, who is no longer with me, my Appa and Amma, and Matty for letting me achieve my dream of pursuing my doctoral studies in neuroscience miles away from them. My sisters and brothers-in-law were my strong pillars of support when I felt like I missed home. I would like to thank Dekka, Lavan, Ratty, Arjun, BIL and Jiju for truly making my PhD an enjoyable journey. I would also like to thank my nieces and nephews, who were my primary stress-busters. I am truly blessed to have them! My acknowledgements would be incomplete without thanking my husband, Vishnu, who put up with me and patiently waited for me to finish my studies to start our life together. He has been a great source of encouragement and support, without which I could not have managed to finish my PhD. I would also like to thank my in-laws for supporting me in this journey. Thank you!


TABLE OF CONTENTS

Acknowledgements
List of Figures
List of Tables
Chapter 1: Retinal Development
  1.1 Retina as a model system
  1.2 Structure of the vertebrate retina
  1.3 Development of the mouse retina
  1.4 Gene regulation in the developing retina
Chapter 2: Alternative splicing
  2.1 Transcription and Splicing
  2.2 Alternative Splicing
    2.2.1 Types of alternative splicing
    2.2.2 Factors influencing alternative splicing
    2.2.3 Role in diseases
    2.2.4 Role in CNS development
    2.2.5 Role in cell cycle regulation
    2.2.6 Alternative splicing factors
Chapter 3: Role of the alternative splicing factor, Sfrs10 in mouse retinal development
  3.1 SR
  3.2 Serine-Arginine rich alternative splicing factor 10 (Sfrs10)
    3.2.1 Role in development and metabolism
    3.2.2 Role in diseases and stress-response
  3.3 Rationale
  3.4 Background
  3.5 Results
    3.5.1 Sfrs10 is expressed across mouse retinal development
    3.5.2 Sfrs10 is expressed in differentiating retinal cells
    3.5.3 Sfrs10 marks red/green cone photoreceptors between P8-P14
    3.5.4 Sfrs10 is peri-nuclear in mouse rod photoreceptors
    3.5.5 Sfrs10 is 100% conserved in mammals
    3.5.6 Sfrs10 expression is conserved in mouse, rat and chicken retinae
    3.5.7 Sfrs10 expression is pan-retinal in mouse, rat and chicken
    3.5.8 SFRS10 is not expressed in normal human retinae but is upregulated in AMD retinae
    3.5.9 Human Sfrs10 promoter is different from that of rodents
    3.5.10 Sfrs10 forms stress-related speckles
    3.5.11 Sfrs10 role in hypoxic stress
  3.6 Discussion
  3.7 Materials and methods
Chapter 4: Role of the cell cycle regulator, Citron Kinase in rat retinal development
  4.1 Cell cycle regulation
  4.2 Citron Kinase (CitK)
    4.2.1 Discovery of Citron Kinase
    4.2.2 Expression of CitK in the developing brain
    4.2.3 Flathead mutant
  4.3 Rationale
  4.4 Background
  4.5 Results
    4.5.1 Citron kinase is enriched during embryonic retinal development
    4.5.2 Number of progenitor cells in the KO retina is comparable to that in the WT retina at E12 and E13
    4.5.3 Cell death of a subset of progenitor cell population is observed in the KO retina by E14
    4.5.4 Absence of Islet1+ bipolar cell production in the KO retina
    4.5.5 CitK KO retina undergoes severe degeneration by P14
  4.6 Discussion
  4.7 Materials and methods
Chapter 5: Global analyses to understand the regulation of transcriptome by alternative splicing
  5.1 Technologies employed to understand alternative splicing at the global level
    5.1.1 Microarray
      5.1.1.1 Principle
      5.1.1.2 Advantages and disadvantages
      5.1.1.3 Applications
    5.1.2 High-throughput sequencing
      5.1.2.1 Illumina sequencing
  5.2 Customized bioinformatics pipeline and microarray to understand the biological processes underpinning normal and aberrant retinal development
    5.2.1 Rationale
    5.2.2 Background
    5.2.3 Results
      5.2.3.1 RNA-Seq of fractionated retina
      5.2.3.2 Validation of RNA-Seq data
      5.2.3.3 RNA-seq revealed high-resolution transcription kinetics
      5.2.3.4 Transcriptionally coupled genes revealed molecular programs in the developing retina
      5.2.3.5 Extending the analysis to other time points through the custom microarray
      5.2.3.6 Comparison with other analysis methods
      5.2.3.7 Temporal analysis combined with static analysis of Nrl WT and KO RNA-Seq is more informative
      5.2.3.8 Binning vs. DE based analysis of Nrl WT and KO data
    5.2.4 Discussion
    5.2.5 Materials and methods
  5.3 Extension of the pipeline to understand the effect of triple microRNA knockout in mouse retinal development
    5.3.1 Rationale
    5.3.2 Background
    5.3.3 Results and Discussion
      5.3.3.1 Transcriptomics summary of the static comparison
      5.3.3.2 Transcriptomics summary of the temporal comparison
      5.3.3.3 DAVID analysis of static and temporal comparison
      5.3.3.4 Covariance and coefficient correlation analysis results
    5.3.4 Future directions
    5.3.5 Materials and methods
Chapter 6: Conclusions and Future Directions
List of References
Appendix 1
Supplementary Information for Chapter 5
  Supplementary figures
  Supplementary table legends


LIST OF FIGURES

Chapter 1: Retinal Development
  Figure 1.1 Stereotypic architecture of the retina
  Figure 1.2 Development of the retina from neural ectoderm
  Figure 1.3 Birth order of retinal cell types
Chapter 2: Alternative splicing
  Figure 2.1 Steps in the splicing process
  Figure 2.2 Mechanism of alternative splicing
  Figure 2.3 Types of Alternative Splicing
Chapter 3: Role of the alternative splicing factor, Sfrs10 in mouse retinal development
  Figure 3.1 Expression analysis of Sfrs10 during mouse retinal development
  Figure 3.2 SFRS10 expression across postnatal retinal development
  Figure 3.3 Specific enrichment of SFRS10 in red/green cone photoreceptors
  Figure 3.4 Differential expression of SFRS10 in rod photoreceptors versus cone photoreceptors
  Figure 3.5 Sfrs10 is 100% conserved in mammals
  Figure 3.6 Expression of Sfrs10 is conserved in mouse, rat and chicken retina
  Figure 3.7 Sfrs10 expression is pan-retinal in mouse, rat and chicken
  Figure 3.8 SFRS10 is not expressed in normal human retinae
  Figure 3.9 Red/green opsin is dispersed throughout the photoreceptor membrane in AMD
  Figure 3.10 SFRS10 is upregulated in degenerating retina
  Figure 3.11 CpGPLOT analysis of Sfrs10 promoter region in mouse, rat and human Sfrs10
  Figure 3.12 Comparative analysis of mSfrs10 and hSfrs10 promoter region
  Figure 3.13 SFRS10 forms independent stress-related speckles
Chapter 4: Role of the cell cycle regulator, Citron Kinase in rat retinal development
  Figure 4.1 Expression analysis of Citron Kinase during rat retinal development
  Figure 4.2 Number of Progenitor cells in WT and KO retinae are comparable at E12
  Figure 4.3 Number of RPCs and retinal ganglion cells in WT and KO retinae are comparable at E13
  Figure 4.4 Number of RGCs in WT and KO retinae are comparable at E14 but the number of RPCs in S-phase are fewer in the KO retina
  Figure 4.5 Reduction in RPCs without change in RGCs between WT and KO retinae at E16
  Figure 4.6 Reduction in the ONBL in the CitK KO retina along with absence of bipolar cells at P2
  Figure 4.7 Bipolar cells are not produced in the KO retina at P4
  Figure 4.8 KO retina is severely compromised by P14
  Figure 4.9 Model of the effect of loss of Citron Kinase on RPCs
Chapter 5: Global analyses to understand the regulation of transcriptome by alternative splicing
  Figure 5.1 Binning strategy for RNA-Seq data
  Figure 5.2 Validation of high-resolution transcriptome by RNA-Seq
  Figure 5.3 Custom bioinformatics pipeline revealed transition in biological processes
  Figure 5.4 Custom microarray revealed isoform/ coherence and validated RNA-Seq
  Figure 5.5 Temporal comparison of P21-Nrl-WT and P21-Nrl-KO to P0
  Figure 5.6 RNA-Seq revealed progression of biological programs across normal and aberrant retinal development
  Figure 5.7 Transcriptomics summary of static comparison
  Figure 5.8 Transcriptomics summary of temporal comparison
  Figure 5.9 DAVID analysis output of Static and Temporal comparisons
  Figure 5.10 Covariance analysis of genes enriching for GABA A receptor activity

LIST OF TABLES

Chapter 3: Role of the alternative splicing factor, Sfrs10 in mouse retinal development
  Table 3.1 Case history of specimens analyzed
  Table 3.2 List of primers used in Sfrs10 RT-PCR analysis
Chapter 4: Role of the cell cycle regulator, Citron Kinase in rat retinal development
  Table 4.1 List of primers used in CitK RT-PCR analysis
Chapter 5: Global analyses to understand the regulation of transcriptome by alternative splicing
  Table 5.1 Skeletal muscle-specific genes expression. The table shows FPKM values of skeletal muscle-specific genes in E16CE, P0CE and P0NE samples
  Table 5.2 Distribution of 2007 transcripts in P0NE_Only sample
  Table 5.3 The table shows our analysis strategy, highlighted in grey and the three other analysis variants we compared with
  Table 5.4 Read mapping statistics and rRNA levels in the E16CE, P0CE, and P0NE samples
  Table 5.5 An example of a “P0NE_Only” gene, Ces5a, whose FPKM unit in P0NE is comparable to those of other genes with known established expression kinetics (Nrl, Nr2e3, Gngt2)


Chapter 1: Retinal development

“The development of an organism may be considered as the execution of a ‘developmental program’ present in the fertilized egg. A central task of developmental biology is to discover the underlying algorithm from the course of development.” – Aristid Lindenmayer

As a developmental biologist, I acknowledge the words of Lindenmayer, which aptly capture the essence of every developmental biologist’s goal, i.e., to understand the course of the dynamic process of development and the intricate regulatory networks that dictate the decisions taken at every step during development. Thus, the focus of my research was to understand the role of RNA processing, specifically alternative splicing, and how it regulates the transcriptome in the developing mouse retina. To this end, I employed a two-pronged approach: a gene-centric approach using (i) an alternative splicing factor called Serine-arginine rich splicing factor 10 (Sfrs10), where I studied its role in mouse retinal development and disease, and (ii) a kinase called Citron Kinase, where I studied the role of its spliced isoforms in retinal progenitor cells in the developing rat retina; and an en masse approach using deep RNA sequencing.

In this chapter, I will discuss the model system used in this study, the mouse retina. First, I will discuss the history of retinal research and why the retina has been a preferred model system for many neurobiologists. This is followed by the architecture of the vertebrate retina and how visual information flows from the input neurons in the retina to the brain. Then, I will explain how the retina develops from the diencephalon, the various models proposed to understand the cell fate of retinal progenitor cells, and the birth order of retinal cells. Lastly, I will discuss the transcription factors that are known to regulate the production of each retinal cell type.


1.1 Retina as a model system

The eye, as an organ, has long been studied, and its history dates back to the fourth century B.C., when Plato mentioned in his writings that the light from the eye captured objects with its rays. Later, in 1604, Johannes Kepler was the first to give the ‘theory of retinal image’, explaining that vision occurs through a picture of visible things on the white, concave surface of the retina [1]. Since then, the retina and image formation in the eye have been of great interest to many neuroscientists. An important breakthrough in the retinal field happened in 1887, when Ramón y Cajal started working with the avian retina, as he considered the retina to be the most suitable system to study the basic organization of the nervous system [2]. In 1893, he published his findings on retinal biology and mentioned how the retina is an advantageous structure for neurobiologists because of its accessibility, its stereotypic architecture with cell bodies and synapses, and the linear flow of visual information. The studies made on the retinal circuitry paved the way for his ‘neuron doctrine’ [2]. Thereafter, the cell fate of retinal progenitor cells and the ‘competence model’ of the retinal progenitor cells were studied [3]. In 1996, the birth order of the retinal neurons was characterized by Cepko and her group, who use the retina as a model system to explore central nervous system development, as its size and location lend themselves to experiments both in vivo and in vitro [3, 4]. Recent advancements in technology, along with a plethora of literature on the expression kinetics of many genes playing vital roles in key processes during its development, have bettered our understanding of retinal development. Thus, the aforementioned advantages of the retina, along with the exhaustive literature in the field, have made it a great system to investigate the role of RNA processing, specifically alternative splicing, during its development and in retinal diseases.


1.2 Structure of the vertebrate retina

The retina is a thin layer of neural tissue that lines the inner part of the eye. It converts light into electrochemical signals, which are then relayed to the brain via the optic nerve. The retina comprises three nuclear layers and two synaptic layers [3, 5] (Fig 1.1). The retinal pigmented epithelium (RPE) forms a single layer of cells attached to the choroid and the outer segments of photoreceptors. This layer provides nutrients to, and removes wastes from, the photoreceptor cells. Next to the RPE is the outer nuclear layer (ONL). This layer contains the nuclei of the light-detecting photoreceptor cells: rod photoreceptors and cone photoreceptors. Rod photoreceptor cells are more numerous than cones (Fig 1.1) and are significantly more sensitive, as they can be activated by individual photons of light [6]. Three types of cone photoreceptors exist in the retina: red, green, and blue, which are activated by red light, green light and blue light, respectively [7]. In mice, there are only red/green and blue cone photoreceptors; they constitute about 3% of the cells in the ONL and are much less sensitive than rods but allow for color vision [8]. The inner nuclear layer (INL) contains the nuclei of interneurons and of one glial cell type, the Müller glia. These interneurons include horizontal cells, bipolar cells, and amacrine cells (Fig 1.1). The outer plexiform layer (OPL) is the synaptic layer where photoreceptor cell axons synapse with bipolar cell and horizontal cell dendrites. The layer closest to the vitreous is known as the ganglion cell layer (GCL) (Fig 1.1), which contains the main output neurons of the retina, namely the ganglion cells [3]. The long ganglion cell axons comprise the optic nerve, which terminates in the lateral geniculate nucleus in the brain, from which the information is relayed to the visual cortex. The inner limiting membrane is the most proximal to the vitreous; it separates the first layer of the retina from the vitreous humor and contains the end-feet of the Müller glia (Fig 1.1). The inner plexiform layer (IPL) contains synapses between bipolar cells, amacrine cells and ganglion cell dendrites [3].

During phototransduction, light interacts with the photoreceptor outer segments, and rhodopsin in rods and cone-opsins in cones undergo a conformational change and become activated [9]. As a result, the photoreceptors decrease their release of neurotransmitters to the interneurons of the INL. Photoreceptor interaction with bipolar cells varies with the type of bipolar cell, but this interaction relays the information to the ganglion cells [9] (Fig 1.1). Lateral modulation of the signal by horizontal cells and amacrine cells allows cells from one region of the retina to influence cells of another region. Horizontal cells receive input from photoreceptors and provide a feedback mechanism to other photoreceptor cells. This in turn helps to regulate and coordinate the input from multiple photoreceptor cells [9]. Amacrine cells modulate the interaction between bipolar cells and ganglion cells. They receive input from bipolar cells and also serve as a feedback mechanism for those cells [9]. Once the signal reaches the ganglion cells, it is transmitted to the brain. Müller glia processes extend from the vitreous humor side to the photoreceptors, acting like a ‘photoptic tube’ through which light can easily pass. This direct passage of light is facilitated by the fact that the glia contain very few organelles, which, if present, would scatter much of the light passing through [10].

1.3 Development of the mouse retina

In mouse, the eye begins to develop when the optic vesicles evaginate from the diencephalic region of the neural tube around E9.5 [11]. After this evagination, each optic vesicle invaginates to form an optic cup around E10.5 (Fig 1.2). The inner lining of the optic cup gives rise to the neural retina, while the outer lining forms the RPE [11] (Fig 1.2).

The retina is a very good model system because it has a stereotypic architecture and the birth order of its cell types is well characterized. Ganglion cells are born first, at E11.5, followed by cone photoreceptors, horizontal cells and amacrine cells (Fig 1.3). Amacrine cells are also produced postnatally. Müller glia and bipolar cells are produced postnatally, while rod photoreceptor cells are produced throughout retinal development, with peak production at birth [3] (Fig 1.3). In 1996, Cepko et al. proposed the ‘competence model’, in which progenitor cells in early embryonic development are capable of producing more than one neuronal cell type, a capacity determined mainly by intrinsic signals rather than environmental cues [3]. Progenitor cells later in development, in contrast, were restricted to giving rise to only one type of neuron [3]. Thus, although a retinal progenitor cell is considered multipotent, at any given time during development a given progenitor cell is restricted in its ability to produce certain cell types [12]. For example, progenitor cells fated to produce bipolar cells do not have the competence to produce neurons born embryonically, even when placed in an embryonic milieu [12]. The reverse of this experiment was also performed, where progenitors fated to produce embryonically born neuronal types could not produce them when placed in the postnatal environment [12]. This substantiated the view that progenitor cells have the ability to produce all retinal neurons, but that their competence is restricted by the developmental time period in which they are isolated. Thus, the cell fate of retinal neurons is determined both by intrinsic factors inherent to the progenitor cells, which determine their competence to produce subsets of neurons, and by extrinsic signals, which influence the specific type of neuron that will eventually be produced in a given competence state [12].

1.4 Gene regulation in the developing mouse retina

Identification of the gene regulatory networks connecting various pathways during retinal development has led to a better understanding not only of retinal development but also of the mechanisms that regulate these processes. The role of one of the master regulatory genes, Pax6, has been well established in Drosophila, where ectopic expression of the gene was shown to induce eye formation with normal morphology [13]. Pax6 has been shown to play a crucial role in maintaining the multipotency of the retinal progenitor cells [14]. It acts upstream of the basic helix-loop-helix (bHLH) family of transcription factors, many of which are key players in retinal development. One member of the bHLH family, Math5, has been shown to positively regulate the formation of ganglion cells, while it is negatively regulated by the Notch-Delta pathway [15]. Key transcription factors in photoreceptor formation include Nrl and Trβ2, which compete to keep rod and cone photoreceptor cell production in balance [16]. In the absence of Nrl, all rod photoreceptors become cone-like. Nrl, along with its target Nr2e3, is essential for rod photoreceptor production [17]. Nrl and another factor, Crx (cone-rod homeobox), are together required for the maintenance of rod and cone photoreceptors [18-20]. Factors crucial for bipolar cell production and maintenance include bhlhb4 and Otx2: bhlhb4 has been shown to be essential for rod bipolar cell maturation [21], while Otx2 has been shown to be crucial for both photoreceptors and bipolar cells [22]. Other bHLH factors such as Mash1 and Math3 are also needed for proper bipolar cell production [23]. The homeobox gene Chx10 is primarily present in retinal progenitor cells during early retinal development; later, it is expressed in the bipolar cells and Müller glia and is required for their production [21, 24]. Another factor, Sox2, has been shown to play a vital role in embryonic and adult neural progenitors and is expressed in Müller glia [25]. Thus, the transcriptional regulation controlling the production of various retinal cell types is well established. However, the underlying layer of isoform regulation of most of these genes still remains unexplored. An understanding of isoform-level regulation can greatly help therapeutic strategies in which a particular isoform is either knocked down or upregulated without altering expression at the gene level. To this end, in the next chapter, I discuss the importance of alternative splicing (AS) and how this layer of regulation has diverse roles, including in central nervous system development.


Figures:

Figure 1.1

Fig 1.1: Stereotypic architecture of the retina: Hematoxylin and eosin stain of an adult mouse retinal section superimposed with a schematic representation of retinal cell types. OS=outer segments of photoreceptors; IS=inner segments of photoreceptors; ONL=outer nuclear layer; OPL=outer plexiform layer; INL=inner nuclear layer; IPL=inner plexiform layer; GCL=ganglion cell layer; RP=rod photoreceptor; CP=cone photoreceptor; RBP=rod bipolar; CBP=cone bipolar; AC=amacrine cell; GC=ganglion cell; MG=Müller glia


Figure 1.2

Fig 1.2. Development of the retina from neural ectoderm: Retinal development begins around embryonic day (E) 8.5 with the evagination of the diencephalon to form the optic vesicle, followed by the formation of the lens placode with signals from the optic vesicle at E9. Inductive signals between the lens placode and optic vesicle result in the formation of the optic cup and lens vesicle at E10. At E10.5, the lens is formed and the bilayered optic cup forms the outer non-neural retinal pigmented epithelial (RPE) layer and the neural retinal layer. Adapted and modified from Colozza G. et al [26].


Figure 1.3

Fig 1.3. Birth order of retinal cell types: The black arrow indicates the timeline from E11.5 to postnatal day (P) 14, where P0 is the day of birth. The first cell type to be born is the output neuron, the ganglion cell (green), followed by cone photoreceptors (red), the interneurons horizontal cells (dark blue) and amacrine cells (mustard yellow), bipolar cells (fuchsia), and Müller glia (orange). Rod photoreceptor (white) production occurs across development with a peak at P0. Adapted and modified from Cepko et al [3].


Chapter 2: Alternative Splicing

This chapter focuses mainly on RNA processing, specifically alternative splicing (AS). My research combines retinal development and AS to understand the role of AS in the developing mouse retina. To this end, this chapter describes the processes of transcription and splicing, followed by the process of alternative splicing, the diverse roles played by AS in cell cycle, metabolism, disease, and CNS development, and the factors that aid this process, called alternative splicing factors.

2.1 Transcription and Splicing

Development of an organism is a tightly controlled dynamic process, which involves the regulation of various genes at multiple levels of complexity. At the cellular level, numerous decisions are made, including proliferation, division, differentiation and determination. All these processes require the expression of various genes at different phases, and their expression must therefore be tightly regulated at every phase. The development of the central nervous system follows the same paradigm. A neural progenitor cell is capable of undergoing an asymmetric division (into a progenitor cell and a neuron) or a symmetric division, which yields either two neurons or two progenitor cells. For each of these decisions to be executed, various cell cycle regulating genes and fate determining genes are involved. Spatio-temporal expression of these genes is often regulated at the transcriptional and post-transcriptional levels. The transcriptional level of regulation refers to the transcriptome changes that occur in a tissue-specific and time-specific manner.

Regulation of gene expression can occur at the DNA level, through the usage of alternative transcription start sites and polyadenylation sites, or through total silencing of a particular gene or a set of genes by regulating the levels of a transcription factor. Often, gene expression is regulated at the post-transcriptional level, where different isoforms of a gene are generated, thereby allowing for diversity at the proteome level as well as in mRNA localization, transport, and turnover. Certain features of an mRNA, such as its 3’ untranslated region (3’UTR) and exon-exon junctions, also determine the patterns of its turnover, transport, subcellular localization and translation [27]. The exon-junction complex (EJC) plays an important role in the post-transcriptional regulation of mRNA: it is recruited during splicing, remains associated with the mRNA, and allows the export of the transcript to the cytoplasm [28].

Although the central dogma (information from DNA flows through a temporary messenger RNA in the nucleus, which is then exported to the cytoplasm and translated into proteins that carry out all major structural and functional activities in a cell) treats RNA as an intermediate, there is another layer of regulation at the post-transcriptional level that ultimately decides the type of protein synthesized. RNA processing, or regulation of RNA, gained importance in the late 1970s, when the “one gene one protein” hypothesis was questioned because of a discordance between the number of protein coding genes and the number of protein products in higher mammals. The first step in RNA processing is the production of mature RNA by the process called splicing.

Splicing: Splicing is co-transcriptional and is carried out by a complex called the spliceosome, which is made of five small nuclear RNAs and a large number of accessory proteins [29, 30]. This splicing machinery is required to recognize the splice sites and to catalyze the two trans-esterification steps. In the primary RNA transcript (pre-mRNA), which is a copy of the DNA, the exon-intron boundary (AG/GT) is marked by a 5’ splice site (SS) (discovered in 1978 [31]), and the intron-exon boundary (AG/G) is marked by a 3’ splice site [32] and a branch point [33, 34] (Fig 2.1). The machinery assembles at these splice sites, beginning with the recognition of the 5’SS by the U1 snRNP and the binding of splicing factor 1 (SF1) to the branch point and of the U2 auxiliary factor (U2AF) to the polypyrimidine tract [35] (Fig 2.1). This step results in the formation of the E complex, which then becomes the energy-dependent, pre-spliceosomal ‘A’ complex after SF1 is replaced by U2 snRNP at the branch point [35] (Fig 2.1). Further, the U4/U6-U5 tri-snRNP complex is recruited, leading to the formation of the ‘B’ complex (Fig 2.1), which is then converted to the catalytically active ‘C’ complex after conformational changes [35]. This remodeling of the complex leads to the excision of the intron as a ‘lariat’ structure and the joining of exons to form the mature transcript (Fig 2.1).
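The boundary rule described above (introns begin with GT at the 5’SS and end with AG at the 3’SS) can be illustrated with a minimal sketch. The function name and the toy sequences are hypothetical and chosen only for illustration; real splice-site recognition also depends on the branch point, the polypyrimidine tract and the extended consensus, none of which is modeled here.

```python
def has_canonical_splice_sites(intron: str) -> bool:
    """Return True if an intron sequence starts with GT and ends with AG.

    The sequence is given 5'->3' on the sense strand; RNA input (U) is
    converted to DNA letters (T) before checking.
    """
    seq = intron.upper().replace("U", "T")
    # Need at least the two dinucleotides to make the check meaningful.
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


# Hypothetical toy introns, not taken from any real gene.
toy_intron = "GTAAGTCTGACTTTTTTTCCCTTTCAG"
non_canonical = "GCAAGTTTCAG"

print(has_canonical_splice_sites(toy_intron))     # GT ... AG
print(has_canonical_splice_sites(non_canonical))  # GC ... AG
```

Such a check only flags the dinucleotides at the intron ends; it says nothing about whether the spliceosome would actually use a given site.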

The decision to include or exclude an exon is governed by another level of regulation called alternative splicing (AS) (Fig 2.2). The process of AS is aided by cis-acting elements in the pre-mRNA and by trans-acting factors, the RNA-binding proteins, which play an important role in various aspects of post-transcriptional regulation of both nuclear and cytoplasmic mRNA, including alternative splicing, localization, export and translational regulation, besides playing a vital role in constitutive splicing [36].

2.2 Alternative Splicing

Alternative splicing (AS) has long been thought to occur at the initial step of splicing, during SS recognition and early spliceosome assembly [37]. Although that is mostly the case, recent evidence has shown that this decision can be made at different stages of splicing, including later stages of spliceosome assembly, via the interaction between cis-acting and trans-acting factors that either promote or repress the recognition of the SS, or during either of the two trans-esterification steps (Fig 2.2). AS is one of the mechanisms that contribute to proteome diversity as well as to the tissue-specificity of many genes, thereby regulating gene expression in a spatio-temporal manner. Indeed, recent high-throughput sequencing studies on various human tissues have shown that 50% or more of alternatively spliced isoforms are differentially expressed among the examined tissues. The concept of AS was first introduced in 1977 with the discovery of exons and introns in the adenovirus hexon gene [38, 39]. Walter Gilbert predicted that different combinations of exons could be spliced together to give diversity in mRNA isoforms [40]. Thus, AS can be seen as an important mechanism to create protein diversity in response to various developmental, environmental and metabolic cues [41-43].

AS gained impetus with the realization that there are fewer human protein coding genes (PCGs) than there are proteins in humans. With the advent of technologies such as splicing-sensitive microarrays and high-throughput RNA sequencing (HTS), the importance and occurrence of AS in various tissues have come to light. Datasets obtained from these technologies have facilitated the understanding of AS-regulated events in the contexts of development, differentiated tissues, and normal versus diseased states. In humans, about 90% of multi-exonic genes in various tissues employ alternative splicing [44, 45]. While some AS events are species-specific, AS is mostly an evolutionarily conserved process, which highlights its importance very early in evolution. An example of a species-specific AS event is that of the transient receptor potential cation channel V1 (Trpv1) gene. A trigeminal ganglion-specific isoform of this gene is expressed in vampire bats but not in fruit-feeding bats. This isoform has a lower temperature threshold, thereby enabling the vampire bats to sense warm-blooded animals [46].

2.2.1 Types of Alternative Splicing

Alternative splicing can occur in multiple ways to produce different combinations of exons and yield various isoforms. Some of the frequently occurring types of AS events include i) inclusion of the alternate exon, ii) exclusion of the alternate exon, iii) alternative 5’ SS selection, iv) alternative 3’ SS selection, and v) intron retention, where an entire intron or a part of it is retained along with the exons. Alternative splice sites are often termed ‘cryptic donor’ or ‘cryptic acceptor’ [47, 48] (Fig 2.3). There are also other, less frequent complex events such as mutually exclusive AS events, alternative transcription start sites, and usage of different polyadenylation signals. The frequency of these events varies among invertebrates and vertebrates. It has been shown that in higher vertebrates, including mouse and humans, the most frequently occurring AS event is exon skipping (almost 40% of AS events), while in Drosophila, exon skipping, alternate acceptor, alternate donor and intron retention events are observed at similar frequencies, with exon skipping accounting for about 20% of events [48].
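The event types listed above can be made concrete with a small classifier that compares the exon coordinates of two isoforms. This is only a sketch under stated assumptions: exons are hypothetical `(start, end)` pairs on the plus strand, so an exon's end is the donor (5’ SS) side and its start is the acceptor (3’ SS) side; the function name and the toy coordinates are invented for illustration and do not correspond to any published tool or real gene.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end) on the plus strand, both inclusive


def classify_event(reference: List[Exon], variant: List[Exon]) -> str:
    """Classify the difference between two isoforms of the same gene."""
    only_ref = sorted(set(reference) - set(variant))
    only_var = sorted(set(variant) - set(reference))

    if not only_ref and not only_var:
        return "no difference"
    # The variant simply lacks (or gains) whole exons.
    if only_ref and not only_var:
        return "exon skipping"
    if only_var and not only_ref:
        return "exon inclusion"

    if len(only_ref) == 1 and len(only_var) == 1:
        (r_start, r_end), (v_start, v_end) = only_ref[0], only_var[0]
        if r_start == v_start:
            return "alternative 5' splice site"  # same acceptor, new donor
        if r_end == v_end:
            return "alternative 3' splice site"  # same donor, new acceptor
        if r_end < v_start or v_end < r_start:
            return "mutually exclusive exons"    # non-overlapping swap

    # One variant exon spanning two reference exons and the intron between.
    if len(only_ref) == 2 and len(only_var) == 1:
        (a_start, _), (_, b_end) = only_ref
        if only_var[0] == (a_start, b_end):
            return "intron retention"

    return "complex or unclassified"


reference = [(1, 100), (201, 300), (401, 500)]
print(classify_event(reference, [(1, 100), (401, 500)]))
print(classify_event(reference, [(1, 100), (201, 320), (401, 500)]))
print(classify_event(reference, [(1, 100), (181, 300), (401, 500)]))
print(classify_event(reference, [(1, 100), (201, 500)]))
```

The labels are relative to whichever isoform is passed as `reference`; real annotation pipelines also handle strand, multiple simultaneous events and transcript ends, which this sketch deliberately ignores.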

Differences in the type of AS event are also seen between regions of the mRNA, i.e., the coding region versus the untranslated regions (UTRs). It has been shown that exon skipping is more frequent in the coding region, while intron retention is more frequent in the 5’UTR. Also, more cryptic acceptors are found in the coding sequence than in the UTRs. The less frequent complex events mostly occur in the coding region [48].

2.2.2 Factors influencing alternative splicing


Various factors, such as secondary structure, splice site selection, and cis-acting elements, can influence an AS event. Secondary structure can affect AS events by masking the splice sites or the binding sites for splicing factors. For example, a stem-loop secondary structure has been shown to sequester alternative exon 6B of the chicken β-tropomyosin pre-mRNA, leading to its exclusion [49]. Another example is that of the Ras gene, where the IDX exon of the Ras pre-mRNA can form a secondary structure with an ISS, preventing the binding of hnRNP H [50]. The helicase P68 exposes this binding site, allowing hnRNP H to bind. Some small nucleolar RNAs (snoRNAs) have also been implicated in AS regulation. For example, snoRNA HBII-52 has been shown to regulate the AS of the Htr2c pre-mRNA by binding to an ESS in exon Vb to aid its inclusion [51].

The selection of splice sites is also partially influenced by secondary structures in the pre-mRNA. An example is that observed in the Dscam pre-mRNA in Drosophila. The exon 6 cluster of this pre-mRNA contains 48 mutually exclusive exons; together with the other alternatively spliced exon clusters of the gene, this allows for up to 38,016 potential isoforms [52]. The combination of a conserved sequence downstream of exon 5 (the docking site) and another conserved sequence, a variant of which lies upstream of each exon 6 variant (the selector sequence), allows for the inclusion of only one exon 6 variant. The other variants are excluded by the binding of hrp36, a Drosophila homolog of hnRNP A, to the selector sequence [53].
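The figure of 38,016 Dscam isoforms follows from simple combinatorics if one exon is chosen from each mutually exclusive cluster. The sketch below assumes the commonly cited cluster sizes of 12 (exon 4), 48 (exon 6), 33 (exon 9) and 2 (exon 17); only the exon 6 count of 48 is stated in the text above, so the other three values should be read as an assumption of this example.

```python
from math import prod

# Number of mutually exclusive variants per Dscam exon cluster.
# Only the exon 6 value (48) comes from the text; the others are the
# commonly cited counts and are assumptions of this sketch.
cluster_sizes = {
    "exon 4": 12,
    "exon 6": 48,
    "exon 9": 33,
    "exon 17": 2,
}

# One variant is chosen independently from each cluster, so the number
# of possible isoforms is the product of the cluster sizes.
potential_isoforms = prod(cluster_sizes.values())
print(potential_isoforms)  # 12 * 48 * 33 * 2 = 38016
```

This counts potential combinations only; it does not imply that every combination is expressed.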

Alternative exons are shorter than constitutive exons and are flanked by longer introns, but they are more conserved than constitutive exons, especially at the exon-intron boundary; this conservation extends 80-100 nucleotides into the intron, where cis-regulatory elements are embedded. The nature of cis-acting elements and their target binding proteins depends on their position relative to the regulated exons. Proteins such as Nova1, Nova2, Fox1, Fox2, hnRNP L, hnRNP F, and hnRNP H have been shown to act as either activators or repressors depending on the location of their binding site. For example, Nova1 binds to an intron splicing enhancer (ISE) in the Gabrg2 (GABA A receptor, γ2) pre-mRNA and enhances the inclusion of exon 9, while it binds to the exon splicing suppressor (ESS) in exon 4 of its own pre-mRNA and prevents its inclusion [54]. hnRNP L can either activate or repress depending on the location of its binding site relative to the regulated 5’ SS [55]. Similarly, hnRNP H can promote or repress spliceosomal assembly depending on whether the G-rich sequences it binds are downstream of the 5’SS or in the exon [55]. Sometimes, these factors work in combination to activate or inhibit the splicing process. For example, Nova1 and Nova2 can bind to an ESS and inhibit the formation of the ‘E’ complex by altering its composition before hnRNPs bind, ultimately inhibiting the binding of U1 snRNP. In contrast, if Nova1 and Nova2 bind to an ISE downstream of the alternative exon, they promote the formation of the ‘A’, ‘B’ and ‘C’ complexes during splicing [56].

2.2.3 Role in diseases

AS plays an important role in disease. Disrupted splicing has been implicated in cancer, in metabolic disorders such as obesity, and in disorders arising from gene mutations [57-59].

A well-known example of disrupted RNA processing and AS events is observed in a class of disorders called ‘microsatellite expansion disorders’. Microsatellites are repeats of very short sequences (~1 to 10 nucleotides), which vary in number between individuals and can cause disease when the number expands beyond a threshold. This expansion can cause either loss of protein function or gain of aberrant protein function when it lies in the open reading frame, or gain of function of the RNA that contains the expansion [60]. A few examples of diseases caused by RNA gain of function are myotonic dystrophy types 1 and 2 (DM1 & DM2) [61, 62], fragile X-associated tremor ataxia syndrome (FXTAS) and spinocerebellar ataxia 8 (SCA8) [63]. In DM1, the CUG repeat located in the 3’UTR of the Dmpk mRNA expands beyond its normal range of 5-38 repeats to between 50 and more than 2,500 repeats [64], while DM2 is caused by expanded CCTG repeats in intron 1 of the Znf9 gene [65]. The clinical symptoms of DM1 and DM2 are quite similar, except that DM2 is less severe and does not present in the congenital form [62, 64].

The proteome diversity offered by AS is exploited in various types of cancer. Since there is a thin line of separation between the normal cell cycle and cancerous growth, most known AS events in cancer involve reversion, in the adult stage, to isoforms normally expressed during the cell cycle, thereby leading to cancerous growth.

AS regulates several genes that play important roles in promoting invasive behavior. In addition, the epithelial-to-mesenchymal transition (EMT), a process that often plays a role in the acquisition of invasive behavior by cancer cells, is accompanied by a reprogramming of AS [66]. Also, once cells become cancerous, they suppress the apoptotic pathway. Alternatively spliced genes involved in the apoptotic pathway therefore express their anti-apoptotic isoforms, thereby aiding growth [67].

Some of the apoptotic genes that are alternatively spliced include Bcl-x, Caspase2 and Fas. AS events in exon 2 of Bcl-x create two isoforms: Bcl-x(L), which has anti-apoptotic effects, and Bcl-x(S), a pro-apoptotic isoform [68].


Casp2 mRNA is alternatively spliced to produce two major isoforms: Caspase-2L, a full-length coding isoform with pro-apoptotic properties, and Caspase-2S, an isoform with an additional 61-nucleotide exon. Inclusion of this exon results in a frameshift leading to a premature stop codon, thus creating a short-lived nonsense-mediated decay substrate. This isoform was shown to protect against cell death [69, 70].

Another example is the Fas pre-mRNA, where the full-length protein coding isoform (Fas-L) is pro-apoptotic. It also produces a shorter isoform, which excludes the 63-nucleotide-long exon 6. This event deletes the transmembrane domain and gives rise to the soluble form of Fas, which is capable of inhibiting FasL-mediated cell death [71].
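The contrast between the two examples above comes down to whether the alternative exon length is a multiple of three: the 61-nucleotide Casp2 exon shifts the reading frame, whereas skipping the 63-nucleotide Fas exon 6 removes whole codons and keeps the downstream frame intact. A minimal sketch follows; the function name is hypothetical, and the check considers exon length only, not the actual sequence or the position of stop codons.

```python
def exon_frame_effect(exon_length_nt: int) -> str:
    """Describe how adding or removing an exon of this length affects the frame."""
    remainder = exon_length_nt % 3
    if remainder == 0:
        # Whole codons are gained or lost; the downstream frame is preserved.
        return f"in-frame ({exon_length_nt // 3} codons)"
    # Downstream codons are read in a shifted frame.
    return f"frameshift (+{remainder} nt)"


# Exon lengths taken from the text above.
print("Casp2 61-nt exon:", exon_frame_effect(61))
print("Fas exon 6 (63 nt):", exon_frame_effect(63))
```

A frameshift does not by itself guarantee nonsense-mediated decay; that depends on where the resulting premature stop codon falls relative to the exon-exon junctions.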

2.2.4 Role in central nervous system development

AS plays an important role in defining tissue-specificity. This could, in part, be explained by tissue-specific expression of the splicing factors and of their target mRNAs. Among all human tissues, the brain has been shown to employ the highest degree of AS. Many brain-specific AS regulators have been identified, such as nPTB, NOVA1, NOVA2, and the Hu/elav proteins, to name a few. In addition, more than 300 RNA binding proteins with region- and cell-specific expression have been identified in proliferating and post-mitotic brain cells. For example, PTB is highly expressed in progenitors and downregulated in differentiated neurons, where its neuronal paralog, nPTB, is upregulated. Recently, it has been shown that the switch from PTB to nPTB is an important post-transcriptional regulatory step in neuronal differentiation [72].


Nova proteins, another family of brain-specific factors, are also involved in the programming of different types of neuronal cells. Nova1 and Nova2 are differentially expressed in the postnatal mouse brain, with Nova2 being highly expressed in the neocortex and hippocampus and Nova1 being expressed in the hindbrain and spinal cord [73]. Conditional knockout analysis of Nova2 has shown that Nova2 regulates ~7% of brain-specific splicing in the neocortex and that Nova2-dependent AS regulates the expression of mRNA transcripts that encode proteins with synaptic functions [73].

2.2.5 Role in cell cycle regulation

During normal development and differentiation, dynamic changes in the expression and activity of the splicing machinery are coordinately regulated to modulate alternative splicing events in a spatio-temporal fashion and according to the physiological needs of the cell. The cell cycle is a key process during development, allowing for the proliferation of progenitor cells and for the execution of decisions such as determination and differentiation into a particular lineage once cells exit the cell cycle.

Therefore, numerous intricate networks work in a well-coordinated manner to execute these cellular processes. Perturbations caused by mutations that alter the expression or activity of a splicing factor can lead to an array of human diseases. Cell cycle regulation is also a tightly controlled process, which keeps the cell in check and prevents it from becoming cancerous. Any deviation in the expression of genes that act as ‘checkpoints’ in the cell cycle can lead to uncontrolled growth of cells and thus to cancer. Alternative splicing therefore plays an important role in cell cycle regulation by regulating the isoforms of various important genes involved in the cell cycle.


Many genes are employed at various phases of the cell cycle, including cyclin D1, CD44, Rac1, and Ras. Cyclin D1a is the more common, full-length isoform containing five exons. A variant isoform, cyclin D1b, is polyadenylated at a particular site in exon 4 [74]. Although both isoforms can associate with CDK4 and regulate CDK activity similarly [75], D1a can shuttle between the nucleus and cytoplasm in a cell-cycle dependent manner owing to a phosphorylation site at its C-terminus, while D1b remains nuclear because it lacks this phosphorylation site [75, 76]. Cyclin D1b is known to be upregulated in breast and prostate cancers [77]. Mutations in D1a at the phosphorylation site can also mimic D1b, and the resulting increase in nuclear levels of the protein could be more oncogenic [78].

Another key player of the cell cycle that is highly regulated by AS is CD44. Gunther et al. have shown that CD44 contains variant exons 4-7 and that isoforms containing variant exons 6-7 were expressed specifically in a metastasizing pancreatic carcinoma cell line [79]. Various combinations of ten variant exons in CD44 lead to numerous isoforms of CD44 [80]. An isoform containing variant exon 6 was seen to be expressed in B and T lymphocytes after lymphocyte activation [81]. The isoform normally expressed in epithelial cells contains variant exons 8-10 [82].

Rac1, a member of the Ras superfamily of GTPases, has two variant isoforms. Like Ras, Rac1 shuttles between an active, GTP-bound form and an inactive, GDP-bound form. Rac1 activates transcription by NF-κB and activates AKT kinase, both members of important pathways implicated in various cancers [83, 84]. An isoform of Rac1 called Rac1b, which contains a 57-nucleotide exon 3b, has been shown to be expressed in colorectal cancer.


Another important gene involved in the cytokinesis step of cell cycle regulation is Citron Kinase (CitK). It is an effector of the Rho family of GTPases and is expressed in the proliferative cells of the developing brain. There are two known isoforms of this gene: a full-length protein-coding isoform called citron kinase and a shorter isoform lacking the region that encodes the kinase domain. This gene was employed to study the role of alternative splicing in retinal progenitor cells in the developing rat retina and the function of other newly identified isoforms of CitK. Background on this gene is discussed in the next section, and the results obtained in the study are discussed in chapter 4.

2.2.6 Alternative Splicing factors

Trans-acting factors that facilitate the process of AS are called alternative splicing factors (ASFs). These factors can either repress or promote the inclusion of an exon in the final transcript. Accordingly, there are two large families of ASFs, namely serine-arginine rich (SR) proteins and heterogeneous nuclear ribonucleoproteins (hnRNPs). While SR proteins are usually splicing enhancers that promote the inclusion of the alternative exon, members of the hnRNP family suppress the inclusion of an exon. SR proteins typically contain one or two RNA-recognition motifs (RRMs) and one or two RS dipeptide containing domains. The RS domain of these proteins interacts with other splicing factors and facilitates the recruitment of spliceosomal components such as U1 snRNP to the 5’ SS or U2AF65, the largest subunit of U2AF, to the 3’SS (SR proteins are discussed in detail in chapter 3) [85]. hnRNPs contain RRM-type and hnRNP K homology (KH) domain-type RNA binding domains and play a key role in pre-mRNA splicing and mRNA metabolism. They act as repressors by affecting the formation of the E and A complexes during spliceosome assembly. Other repressors include polypyrimidine-tract binding protein (PTB or hnRNP I), which blocks the binding of U2AF, and hnRNP A1, which binds to intronic splicing suppressors (ISS) located upstream of exon 3 in the HIV Tat pre-mRNA and prevents the binding of U2 snRNP [86]. Also, the tissue-specific factors Fox1 and Fox2 inhibit the formation of the E’ complex by binding to intronic sequences and preventing SF1 from binding to the Calca pre-mRNA [87]. In addition to repressing directly, splicing inhibitors can also block the binding of activators to enhancers. The Hu/ELAV family of proteins inhibits U1 snRNP binding by competing with the binding of TIA1 to an AU-rich sequence downstream of the 5’ SS of exon 23a of the neurofibromatosis type 1 pre-mRNA [88]. Another example is hnRNP A1 binding to an ESS upstream of the Tra2b-dependent exon splicing enhancer in exon 7 of the SMN2 pre-mRNA, which inhibits the formation or stabilization of the U2 snRNP complex [89].

Thus, whether or not an alternative exon is included in the transcript is ultimately determined by the concentration or activity, and the combination, of activators and repressors. For example, the SR protein 9G8 (SFRS7) and hnRNP F and H regulate the splicing of exon 2 of α-tropomyosin by competing for the same binding site [90]. Similar regulation exists for β-tropomyosin exon 6B, where there is antagonistic regulation between hnRNP A1 and the SR proteins ASF/SF2 (SFRS1) and SC35 [91]. PTB and the CELF (CUGBP- and ETR-3 like factor) protein family, including CUGBP1, act antagonistically to regulate the inclusion of exon 5 of the troponin T type 2 pre-mRNA [92].


Figures:

Figure 2.1

Fig 2.1 Steps in the splicing process. The basic splicing mechanism begins with the recognition of the 5’ splice site and branch point by U1 snRNP and U2AF (‘E’ complex). This leads to the recruitment of U2 snRNP, resulting in the formation of the ‘A’ complex, which results in the first trans-esterification step. Recruitment of the tri-snRNP complex (U4-U6-U5) to the pre-mRNA leads to the second trans-esterification step. Finally, the intron is excised as a ‘lariat’ and the exons are fused to form the mature RNA.


Figure 2.2

Fig 2.2 Mechanism of alternative splicing. A) Presence of splicing enhancers, such as SR proteins (shown in ‘blue’), on the pre-mRNA facilitates the recruitment of the splicing machinery, leading to the inclusion of the alternate exon. B) Presence of splicing repressors, such as hnRNPs (shown in ‘magenta’), on the pre-mRNA prevents the recruitment of the splicing machinery, leading to the exclusion of the alternate exon.


Figure 2.3

Fig 2.3. Types of Alternative Splicing. A-E) Various types of alternative splicing events include A) exon skipping B) mutually exclusive event C) alternative 5’ splice site D) alternative 3’ splice site E) intron retention.


Chapter 3: Role of the alternative splicing factor, Sfrs10 in mouse retinal development

This chapter approaches the question of the role of alternative splicing (AS) specifically through an alternative splicing factor, Serine-arginine rich alternative splicing factor 10 (Sfrs10), in mouse retinal development and disease. This approach to understanding the role of AS follows the same paradigm as understanding the process of transcription through transcription factors. First, expression analysis of this gene, at both the mRNA and protein level, is discussed, followed by the investigation of its role as a stress-response gene in retinae with age-related macular degeneration.

3.1 SR proteins

The serine/arginine (SR) proteins are a highly conserved family of splicing regulators. The discovery of SR proteins dates back to studies in Drosophila, when genetic screening identified SWAP (suppressor-of-white-apricot), Tra and Tra2 as splicing factors [93, 94]. Characterization of their sequences led to the identification of the RS domain. Identification of the factors ASF/SF2 and SC35 from human cell lines extended their structural characterization to reveal the presence of at least one RNA recognition motif (RRM) and another region that is not an exact match to the RRM but partially matches the consensus sequences and was referred to as the RRM homolog (RRMH). These proteins are called “SR proteins” and were defined by Roth and his colleagues based on five criteria [95]: 1. All members have a phosphoepitope commonly recognized by the antibody mAb104. 2. They co-purify in a two-step salt precipitation procedure. 3. They contain one or two RRMs at the N-terminus and an arginine-serine (RS) dipeptide containing domain at the C-terminus. 4. The sizes of these proteins on SDS-PAGE are conserved from Drosophila to humans. 5. These proteins can complement splicing-deficient S100 extracts [95]. SR proteins, besides acting as exon-splicing enhancers (ESE), have also been implicated in mRNA export, localization, nonsense-mediated decay (NMD) and translation [85, 96-98]. When they act as ESEs, they recruit the splicing machinery to the splice site on the pre-mRNA via protein interactions of their RS domain [99-101]. Kinetic analyses have shown that the relative activity of SR proteins determines the magnitude of splicing efficiency [85]. This activity depended on the number of SR proteins and the distance between the ESE and the intron. Also, the activation of splicing was proportional to the number of RS repeats within the RS domain [102].

Location: SR proteins are usually enriched in nuclear compartments termed ‘speckles’, which occur throughout the nucleus [85, 103]. Speckles can be of two types: inter-chromatin granule clusters (IGCs), which are about 20-25 nm in diameter, and perichromatin fibrils, which are about 5 nm in diameter. IGCs are usually the storage/assembly sites for pre-mRNA splicing factors, while the fibrils are sites of actively transcribing genes and splicing [104, 105]. Biochemical analyses have shown that RS domains are responsible for bringing the SR proteins to these speckles. The RS domain, the RRM and their phosphorylation status are required for the recruitment of SR proteins from IGCs to perichromatin fibrils.

Phosphorylation and concentration of SR proteins: Phosphorylation of SR proteins is vital for their activity in both constitutive and alternative splicing and for their localization. Serines in the RS domain are phosphorylated, and this phosphorylation status determines the activity of the protein [85]. Regulation of SR protein activity by phosphorylation has been implicated in cancers and other diseases. Phosphorylation is required for efficient SS recognition. A number of SR protein kinases have been identified that specifically phosphorylate the serine residues in the RS domain. They include SR protein kinase (Srpk), Clk/Sty kinase, cyclin-dependent kinases (Cdk), and topoisomerases [106, 107].

Also, SR proteins function in a concentration-dependent manner. SRp38 has been shown to facilitate inclusion of the Flip exon of the Gria2 (Glutamate receptor, ionotropic, AMPA2) pre-mRNA, whereas the mutually exclusive Flop exon is included in its absence. Interestingly, both of these exons contain SRp38-binding sites, and it was shown that intracellular concentrations of SRp38 and differential binding to the exons influenced whether the Flip or Flop exon was included [108].

SR proteins as repressors: A balance between activation and repression regulates the inclusion or exclusion of an exon in the primary transcript [86, 109]. Although there are several members in the SR protein family, some members have distinct endogenous binding sites; for example, SRSF3 and SRSF4 show no overlap in the binding sites of their targets [110, 111]. A few SR proteins operate cooperatively with overlapping binding sites, such as SRSF1 and SRSF2, which can compensate for one another in exon inclusion/exclusion [112]. Sometimes, SR proteins bind to intronic regions and function as negative regulators. An example is SF2/ASF, which, during adenovirus infection, binds to an intronic repressor element located upstream of the 3’ SS at the branch point sequence in the adenovirus pre-mRNA [113]. This binding prevents the recruitment of U2AF and thus blocks splicing.

Role of SR proteins in mRNA export: SR proteins are known to shuttle between the cytoplasm and nucleus depending on their phosphorylation status. SF2/ASF, SRp20 and 9G8 have been shown to shuttle between the two compartments. It has been shown that the SR proteins 9G8 and SRp20 promote nuclear export of the intronless histone H2A mRNA by binding to a 22-nucleotide sequence within the H2A mRNA in mammals and frog. These SR proteins have also been shown to hand over mRNA to the nuclear export factor, Tip-associated protein (Tap) [114, 115]. Another SR protein, the yeast protein Npl3p, has been shown to assist mRNA export in yeast, where the phosphorylation of its RS domain regulates the efficiency of its export function [116].

Role of SR proteins in translation: SR proteins have been shown to play diverse roles in RNA processing, including translation of mRNA. They can influence translation either directly or indirectly. Indirectly, for example, SF2/ASF influences the AS of the pre-mRNA of Mnk2, a kinase that regulates translation initiation [58]. Directly, SF2/ASF has been shown to influence translation by associating with polyribosomes and mediating the recruitment of components of the mammalian target of rapamycin (mTOR) pathway, which releases the inhibitor of cap-dependent translation, thereby promoting translation [117]. Also,

SRp20 has been shown to promote translation of a viral RNA initiated at the internal ribosome entry site [118].

3.2 Serine-Arginine rich alternative splicing factor 10 (Sfrs10)

SFRS10 (Tra2beta) belongs to the SR protein family. Sfrs10 is a mammalian homolog of Drosophila Tra2beta, which regulates the AS of genes involved in sex determination of the fly [119, 120] and causes sex reversal in Drosophila due to the altered splicing of doublesex (dsx) transcripts [121]. Sfrs10 contains an RRM and two RS domains, one at the

N-terminus and the other at the C-terminus. Sfrs10 has been shown to activate splicing of some targets via (A)GAA binding motifs in its RRM [122], while for some targets, it can

activate through its N-terminal RS domain (RS1), which has been shown to be vital and highly conserved between vertebrates and non-vertebrates. Sfrs10 protein lacking the RS1 domain has been shown to act as a splicing repressor, suggesting a dual role as an enhancer and a repressor during splicing [123].

3.2.1 Role in development and metabolism

In higher vertebrates, loss of Sfrs10 results in mouse embryonic lethality around embryonic day (E) 7.5-8.5 [124]. Brain-specific knockout of Sfrs10 has been shown to result in perinatal death, where pups die by P1 and show malformed brains [122]. The role of Sfrs10 in various tissues such as testis, intestine and vascular smooth muscle has been studied.

SFRS10 has been shown to be involved in the regulation of splicing of the clathrin light chain B (Cltb) at the vesicles, which is involved in receptor-mediated endocytosis [125,

126], and in splicing of Mypt, which is involved in vascular smooth muscle diversification [127]. SFRS10 has been shown to alter splicing of Lpin1 [128], which is a key regulator of lipid metabolism, thereby leading to increased lipogenesis and VLDL secretion in mice [129-131]. It has been shown that Sfrs10 plays a crucial role in neuronal development and survival through its targets such as Sgol2 and Tubd1 [125]. Additionally, knockdown of Tra2b has been shown to perturb p21-mediated cell cycle inhibition, thereby resulting in apoptosis [125].

3.2.2 Role in diseases and in stress-response

Sfrs10 was originally identified as a stress response gene called RA301 that was strongly induced in cultured astrocytes during reoxygenation following hypoxia and in ischemic rat brain [132]. The same group showed that knockdown of Sfrs10 in astrocytes subjected to hypoxia/reoxygenation stress caused a 3-fold reduction in the release of the neuroprotective interleukin-6 (IL6). This was the first report of an ASF linked to an active stress-response mechanism. Other evidence of stress-induced upregulation of Sfrs10 was shown by Tsukamoto et al [133], who observed that middle carotid artery occlusion caused upregulation of Sfrs10 in vascular smooth cells. In gastric cell lines, exposure to arsenite led to changes in the splice patterns of Sfrs10 itself, and the protein was translocated to the cytoplasm, in part due to a change in its phosphorylation status [134].

In the same study, knockdown of Sfrs10 altered inclusion of exons responsible for the central variable region of CD44, which led to suppression of cell growth. Also, carotid artery occlusion-induced ischemia in the brain resulted in hyperphosphorylation and translocation of Sfrs10 into the cytoplasm of the affected neurons [70]. This shift in Sfrs10 was concomitant with changes in the splice site usage in the known initiator caspase, ICH-1/caspase2 gene. Overexpression of Sfrs10 has been shown to upregulate p53 levels [135]. Reduced levels of Tra2b have been observed in various diseases including stroke, breast cancer [136] and neurodegenerative diseases like spinal muscular atrophy

(splicing of SMN) [137-139] and in tauopathies like Alzheimer’s disease and frontotemporal dementia with Parkinsonism (FTDP) [136, 140]. The gene microtubule-associated protein tau (MAPT) undergoes AS to produce six brain-specific protein-coding isoforms. The ratio of these isoforms is perturbed by mutations in the splicing elements of the gene, which leads to the irregular ratios of Mapt transcripts observed in patients affected with FTDP. It was also shown that mutations in Mapt produced abnormal Tau proteins, causing neurodegeneration. Sfrs10 regulates the inclusion of exon 10 in Tau pre-mRNA. Inclusion or exclusion of exon 10 leads to the generation of isoforms with three or four microtubule-binding repeats; the ratio of these isoforms under normal conditions equals one and is essential for proper neuronal function. An imbalance in this ratio leads to neurodegenerative diseases such as Alzheimer’s disease [141, 142].

3.3 Rationale

The alternative splicing factor, Sfrs10, has been shown to play an important role in brain development. I wanted to explore the function of Sfrs10 in the developing mouse retina. To this end, I first characterized the gene and its protein across mouse retinal development. The expression of alternative splicing factor, Sfrs10 was examined across various species as the Sfrs10 protein is 100% conserved between all mammals including mouse and humans. This led to the analysis of Sfrs10 protein by immunohistochemistry on sections of retinae obtained from normal humans.

Surprisingly, Sfrs10 protein was not expressed in normal human retinae, unlike in mouse, where it was ubiquitously expressed by adulthood. Since Sfrs10 is a well-known stress-response gene, I investigated retinal sections obtained from patients with age-related macular degeneration (AMD). Further, I also sought to explore the function of this gene under hypoxic stress in a cell culture system; Kuwano et al [143] subsequently showed that Sfrs10 plays a protective role by promoting the formation of the anti-apoptotic isoform of Bcl2 (Bcl2α) under stress.

The following sections include content from two of my publications [144, 145], titled “The expression analysis of Sfrs10 and Celf4 during mouse retinal development” and

“Expression Analysis of an Evolutionarily Conserved Alternative Splicing Factor,

Sfrs10, in Age-related Macular Degeneration”.


3.4 Background

Age-related macular degeneration (AMD) is the leading cause of blindness in the aging population in the developed world. According to the National Eye Institute, approximately 2 million people in the US were affected with AMD in 2010, and the number is estimated to rise to 3.6 million by 2030 (http://www.nei.nih.gov/eyedata/amd.asp). AMD affects the macula, the region in the retina that is responsible for sharp, central vision.

AMD can be of two distinct forms: dry AMD (non-exudative or atrophic) or wet AMD

(exudative or neo-vascular) of which dry AMD is the most common. AMD is a multifactorial disease that includes genetic, environmental and physiological components. The underlying cause of AMD is thought to be the hypoxic condition experienced by the photoreceptors leading to their degeneration [146-152].

Rod photoreceptors consume more O2 per gram of tissue weight than any other cell in the body [6]. This constant high energy demand makes the photoreceptors more susceptible to hypoxic stress. Factors such as oxidative stress, accumulation of autoxidative lipofuscin in the lysosomes of retinal pigmented epithelial (RPE) cells [153-

158] and accumulation of drusen in between Bruch’s membrane [159, 160] and the epithelial layer affect the RPE which results in the senescence of these cells. RPE not only plays a vital role in supplying nutrients and oxygen from choroidal vasculature to the photoreceptor cells, but also in removing the metabolic wastes from the photoreceptors.

Since the outer segments of the photoreceptors interact with RPE, senescence of the latter affects the normal functioning of the former [161, 162].

Genome wide association studies have shown SNPs in genes including VegfA,

VegfR2, Arms2, Htra1, and CFH to have significant association with AMD [163-169].


However, the role of alternative splicing in the pathogenesis of AMD is not well understood. Members of SR protein family such as SFRS1 have been shown to play a role in the pathogenesis of AMD. Phosphorylated SFRS1 was shown to promote proximal site selection in exon 8 of VEGF to generate the angiogenic isoform, VEGF (165) in AMD

[170, 171]. We investigated the role of SFRS10 in AMD.

3.5 Results

3.5.1 Sfrs10 is expressed across mouse retinal development

To determine whether Sfrs10 was expressed during mouse retinal development, RT-PCR was performed across retinal development. The position of the primers used to amplify the coding sequence of Sfrs10 is shown in Fig 3.1A. Expression was observed across retinal development with two distinct isoforms at postnatal day (P)4, P10 and P14 (Fig

3.1B). Sequence analysis of both these isoforms revealed that the isoform at 1234 bp was the canonical isoform encoding a full-length Sfrs10 protein. The higher MW isoform at 1511 bp was similar to the previously reported exon 2a-containing isoform. Inclusion of exon 2a introduces a premature stop codon, which results in nonsense-mediated decay of this transcript. Gapdh was used as a control (Fig 3.1B’). 5’ rapid amplification of cDNA ends (5’RACE) was performed to investigate whether alternate transcription start sites were being employed for Sfrs10 across retinal development. For this, PCR was performed on a 5’-RACE ready cDNA library from retinae at different developmental time points. While the forward nested primers were in the 5’ linker sequence, the reverse primer was in exon 4 of Sfrs10. The PCR product was then purified and subjected to restriction digestion with EcoNI, which yielded two bands of the predicted sizes, 380 bp and 130 bp (Fig 3.1C). There was no change in the transcription start site

across retinal development. A similar investigation across different developmental time points was carried out to check for alternate polyA signal usage in Sfrs10 (NM_009186.4), as it contains three putative polyA signals (1672 bp, 1762 bp, 2247 bp). 3’ rapid amplification of cDNA ends (3’RACE) was employed with forward primers designed upstream of each polyA signal and a reverse primer in the linker sequence, followed by restriction digestion with BsrI, which yielded two bands of the predicted sizes, 407 bp and 230 bp.

Again, there was no change: the second polyA signal, at 1762 bp, was utilized for the Sfrs10 transcript throughout retinal development (Fig 3.1D).
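The expected band sizes in the two digests follow from simple arithmetic on the amplicon length and the position of the restriction site. The following is a minimal sketch in Python, not the procedure used in this work. The function name `digest_fragments` is hypothetical, the amplicon lengths (510 bp and 637 bp) are inferred here by summing the reported bands, and the cut positions are assumptions chosen to reproduce those bands, since the orientation of each site within the amplicon is not stated in the text.

```python
def digest_fragments(amplicon_length, cut_positions):
    """Return fragment sizes (bp) for a linear amplicon cut at the given positions.

    cut_positions are distances in bp from the 5' end of the amplicon.
    """
    cuts = sorted(set(cut_positions))
    if any(c <= 0 or c >= amplicon_length for c in cuts):
        raise ValueError("cut positions must lie strictly inside the amplicon")
    edges = [0] + cuts + [amplicon_length]
    # Each fragment spans two consecutive edges.
    return [right - left for left, right in zip(edges, edges[1:])]


# Assumed values: lengths inferred from the reported bands, cut positions hypothetical.
print(digest_fragments(510, [380]))  # 5'RACE product, single EcoNI site
print(digest_fragments(637, [407]))  # 3'RACE product, single BsrI site
```

A single band pattern that stays the same at every time point is what one would expect if the same transcription start site and polyA signal are used throughout development.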

3.5.2 Sfrs10 is expressed in differentiating retinal cells

Next, we employed IHC with AB#1 on mouse retinal sections across postnatal development to ascertain the cell types expressing Sfrs10. At P0, Sfrs10 was observed in the nuclei in the ganglion cell layer (GCL) (Fig 3.2A). At P4, it was observed at the bottom of the outer neuroblastic layer (ONBL), besides the GCL (Fig 3.2B). Sporadic Sfrs10+ cells were also observed in the middle of the ONBL of the P4 retina. Next, Sfrs10 was costained with syntaxin, which was localized to the cytoplasm surrounding the Sfrs10+ nuclei. At P8, Sfrs10 was observed in all of the cell types in the INL, including Müller glia, horizontal cells, bipolar cells, and amacrine cells, and in the GCL (Fig 3.2D). In addition, in the newly formed ONL at P8, a few Sfrs10+ cells were observed. A similar pattern was observed in the ONL of P10 and P14 retinal sections (Fig 3.2E, F).

3.5.3 Sfrs10 marks red/green cone photoreceptors between P8-P14

The sporadic Sfrs10+ cells in the ONL at P8, P10 and P14 could either be rod photoreceptors or cone photoreceptors. To distinguish between these two possibilities,

P8 and P10 retinal sections were stained for expression of Sfrs10 and red/green opsin.


Three independent antibodies for Sfrs10 (AB#1, AB#2, and AB#3) were employed to confirm the specificity of the antibody (Fig 3.3A). Since the primary antibodies to detect both Sfrs10 and red/green opsin were raised in rabbit, they could not be detected simultaneously. Instead, serial IHC was performed, first with Sfrs10 followed by red/green opsin. Since Sfrs10 and red/green opsin are segregated into separate cellular compartments, i.e. nucleus and cytoplasm, respectively, we were able to employ serial

IHC. Here, it was observed that Sfrs10+ nuclei were enveloped by red/green opsin staining thereby confirming the identity of these cells to be red/green opsin cone photoreceptors (Fig 3.3B-D”). All three Sfrs10 antibodies showed similar pattern, thereby confirming the specific enrichment of Sfrs10 in red/green cone photoreceptors. Further, we performed immunoblot analysis on P0 retinal extracts with the two antibodies (AB#1

& AB#2), which again showed a single immunoreactive band at the expected molecular weight of Sfrs10 (Fig 3.3G, H).

3.5.4 Sfrs10 is peri-nuclear in mouse rod photoreceptors

Analysis of Sfrs10 expression across postnatal development ending at P14 showed that the rod photoreceptors did not express Sfrs10. However, it is known that rod photoreceptors undergo rapid outer segment genesis between P13 and P17 and terminal differentiation including synapse formation is thought to be complete by P21 [161, 172].

To test whether Sfrs10 is expressed in rod photoreceptors during their terminal differentiation, IHC was performed for Sfrs10 on P22 retinal sections, which showed

Sfrs10 expression in rod photoreceptors, albeit in a different pattern (Fig 3.4A). Unlike cone photoreceptors, where Sfrs10 was found in a diffuse nuclear pattern, in rod photoreceptors it was localized in a peri-nuclear ring-like structure. Since photoreceptors do not have a large cytoplasmic space around the nucleus, it was not clear if Sfrs10 was in the cytoplasm or the nucleus. To further delineate the localization of Sfrs10, 6 μm retinal sections were costained with antibodies against Sfrs10 and LaminB2, which marks the inner membrane of the nuclear envelope [173]. Here, Sfrs10 was found on the DAPI-stained region but did not overlap completely with LaminB2, thereby confirming its localization on the nuclear side of the nuclear envelope (Fig 3.4A-A’’).

To investigate the role of Sfrs10 in mouse retinal development, we wanted to use a retina-specific conditional knockout of Sfrs10. While I waited for the chimeras to arrive from the KOMP facility at the University of California, Davis, I simultaneously started addressing the question through en-masse transcriptomic analysis using RNA sequencing, which I will discuss in Chapter 5.

3.5.5 Sfrs10 is 100% conserved in mammals

Sfrs10 has three functional domains namely, N-terminal RS1 domain (31-113 amino acids (AA)), C-terminal RS2 domain (231-287 AA) and RNA recognition motif (118-

196 AA) (Fig 3.5A). AA sequences corresponding to these functional domains were compared across different vertebrate species to assess the percentage of conservation of Sfrs10. The selected species were Danio rerio (zebrafish), Xenopus tropicalis (frog),

Gallus gallus (chicken), Pan troglodytes (chimp), Bos taurus (cow), Canis lupus familiaris (dog), Homo sapiens (humans), Sus scrofa (pig), Mus musculus (mouse), Rattus rattus (rat) and Delphinus delphis (dolphin). The selection was based on the availability of the entire AA sequence in each class.

The entire RS1 domain was conserved 100% between chicken and all the mammalian species (yellow background, Fig 3.5B) while only the RS dipeptide region

showed 100% conservation in all species (Fig 3.5B). The RS2 domain was similar to the RS1 domain in that there was 100% conservation in the RS dipeptide region among all the species (yellow background, Fig 3.5B’’), with the entire domain conserved 100% among the mammals (Fig 3.5B’’). Within the RRM of SR proteins there are two motifs called ribonucleoprotein 1 (RNP1), an octamer, and ribonucleoprotein 2 (RNP2), a hexamer, which are highly conserved [174-177]. In Sfrs10, the RNP1 sequence is RGFAFVYF (159-166 AA) and the RNP2 sequence is LGVFGL (125-130 AA). Comparison of the RNP1 motif across the species showed 100% conservation except for zebrafish and frog, each of which had one AA change (phenylalanine to leucine in zebrafish, alanine to serine in frog) (red box,

Fig 3.5B’). Similarly, comparison of RNP2 motif showed 100% conservation across all the species (blue box, Fig 3.5B’). Interestingly, the entire Sfrs10 AA sequence showed 100% conservation in mammals including the linker regions between the functional domains.
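The percentage-conservation figures quoted above can be illustrated with a short calculation on aligned sequences. The sketch below, in Python, is an illustration under stated assumptions and not the alignment pipeline used for Fig 3.5. The function names are hypothetical, the sequences are assumed to be pre-aligned and of equal length, and the frog RNP1 variant is inferred from the alanine-to-serine change described in the text (the motif contains a single alanine) and is not copied from the figure.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions at which two equal-length sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def conserved_columns(aligned_seqs):
    """For each alignment column, report whether every sequence has the same residue."""
    return [len(set(column)) == 1 for column in zip(*aligned_seqs)]


rnp1_mammal = "RGFAFVYF"  # RNP1 octamer as given in the text (159-166 AA)
rnp1_frog = "RGFSFVYF"    # assumed: alanine replaced by serine, per the text

print(percent_identity(rnp1_mammal, rnp1_frog))
print(conserved_columns([rnp1_mammal, rnp1_frog]))
```

Applied to whole-domain alignments, the same per-column check distinguishes a region that is identical in every species (such as the RS dipeptide region) from one that is identical only within mammals.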

3.5.6 Sfrs10 expression is conserved in mouse, rat and chicken retinae

Given that Sfrs10 is highly conserved at the protein level across different species, we wanted to investigate whether its conservation was reflected in its expression. We chose mouse, rat, chicken, and zebrafish as these are widely used model organisms for retinal research. We employed immunoblot analysis using an antibody that recognizes the conserved N-terminal of the protein (black box, Fig 3.6A). A single band at the predicted (39 kDa) size was observed in mouse, rat, and chicken retinal extract (Fig 3.6B).

This band was at the same molecular weight (MW) as the positive control, which was the protein extract prepared from HEK-293T cells transfected with a construct expressing full-length Sfrs10 protein. Notably, no immunoreactivity was observed in the lane with zebrafish retinal extract (Fig 3.6B).


3.5.7 Sfrs10 expression is pan-retinal in mouse, rat and chicken

To ascertain the cell type-specific expression of Sfrs10, retinal sections from adult mouse, rat and chicken retina were subjected to IHC with the same antibody used for immunoblot analysis (black box, Fig 3.6A). In mouse, Sfrs10 was robustly expressed in all the cell types of the retina, including ganglion cells, amacrine cells, Müller glia, bipolar cells, horizontal cells and both cone and rod photoreceptors. The expression pattern in photoreceptors differed: Sfrs10 was expressed diffusely across the nucleus in cone photoreceptors, while the expression was peri-nuclear in rod photoreceptors (Fig 3.7A).

In rat, Sfrs10 expression was similar to that of mouse and was observed in all cell types of the retina (Fig 3.7B). Similarly, in chicken, Sfrs10 was expressed in both the ganglion cell layer and in the inner nuclear layer, but in the photoreceptor layer there was a minor difference in the expression pattern. Sfrs10 expression was not observed in a subset of cells, which by the staining pattern was deduced to be rod photoreceptors (Fig 3.7C).

3.5.8 SFRS10 is not expressed in normal human retinae but is upregulated in AMD retinae

Given that Sfrs10 is a known stress response gene that is 100% conserved between mouse and human, we sought to investigate its expression in normal and AMD human retinae. For this, three independent normal retinal sections were subjected to IHC with anti-Sfrs10 antibody (Table 3.1).

Sample #1 showed no Sfrs10 expression in any of the retinal layers (Fig 3.8A).

Similarly, sample #2 showed no SFRS10 expression (Fig 3.8B). Again, sample #3 showed no SFRS10 expression, except for low levels in a few cells in the ganglion cell layer (Fig 3.8C).


Table 3.1: Case history of specimens analyzed

Source | Sample No. | Age | Sex | Category | Hours post mortem fix | Cause of Death | Medical History
Abcam | 1 | 50 | M | Normal | unknown | unknown | unknown
NDRI | 2 | 74 | F | Normal | 4 | COPD | no history available
NDRI | 3 | 74 | F | Normal | 5 | Cancer | Lung cancer, pneumonia, congestive heart failure
NDRI | 4 | 88 | F | AMD | 7.8 | Respiratory failure | Endometrial cancer, osteoporosis, hypertension, glaucoma, macular degeneration, hemangioma, aortic stenosis, insomnia
NDRI | 5 | 86 | F | AMD | 7.4 | Abscess | Macular degeneration, diabetes, cataract surgery, hemicolectomy, hypertension, hypercholesterolemia
NDRI | 6 | 89 | M | AMD | 6.5 | CVA | Macular degeneration, pneumonia, hypoxic respiratory failure, dementia
UCHC | 7 | 76 | F | unknown | 4.0 - 5.0 | Acute cardiac arrest | unknown

Abbreviations: CVA – Cerebro Vascular Accident (Stroke); COPD – Chronic Obstructive Pulmonary Disease; NDRI – National Disease Research Interchange; UCHC – University of Connecticut Health Center.

Next, we wanted to investigate the expression in AMD, so we obtained three independent AMD samples from NDRI (Table 3.1). Here, the level of degeneration in the three samples was determined by the localization of L/M opsins in the cone photoreceptors by IHC. Under normal conditions, opsins are restricted to the membrane of the outer segments (OS) [178], as shown in normal sample #3 (Fig 3.9A). However, in the AMD retinae, L/M opsins were observed in the entire cone photoreceptor membrane (Fig 3.9B-D). This is in agreement with previous reports indicating that opsins in a degenerating retina relocalize to the entire photoreceptor membrane due to aberrant opsin trafficking [178]. Notably, in sample #4, vertical alignment of the OS was lost and the thickness of the outer nuclear layer (ONL) was reduced compared to the other two AMD retinae, indicating that this retina had a higher degree of degeneration (Fig 3.9B).

Subsequently, all of these retinal sections were subjected to IHC with anti-Sfrs10 antibody. Sample #4 showed a distinct upregulation of SFRS10 in a speckled pattern (Fig

3.10A). Similar upregulation was seen in a few cells in all three nuclear layers in sample

#5, where the layers were comparable to that in a normal retina (Fig 3.10B). The expression in sample #6 was similar to previous two samples with upregulation seen as distinct speckles in ganglion cells as well as in some cells in the INL and the ONL (Fig

3.10C). In addition, we analyzed the foveal/parafoveal regions from sample #7 by serial

IHC. Again, the staining for red/green opsin showed that the opsin was redistributed throughout the membrane of the photoreceptor, indicating that the sample was undergoing degeneration (Fig 3.10D). In the case of SFRS10, the staining in red and green cone photoreceptors (encapsulated by red/green opsin) was diffuse throughout the nucleus (solid white arrows in the inset, Fig 3.10D), while a speckled expression was observed in rod photoreceptors (open white arrow in the inset, Fig 3.10D).

3.5.9 Human Sfrs10 promoter is different from that of rodents


The lack of expression in normal human retina could be due to a difference in the transcriptional regulation of mouse and human Sfrs10. To investigate the Sfrs10 promoter region, multiple sequence alignment was performed on the 400 bp region upstream of the transcription start site in human, mouse and rat (Fig 3.12A). The homology between the mouse and rat promoters was 80% (Fig 3.12A), but the homology between the mouse and human promoters was 34% (Fig 3.12A’), and that between the rat and human promoters was 32% (Fig 3.12A”). Moreover, the promoters in all three organisms lacked a conventional TATA sequence (Fig 3.12A). Since the mouse and rat promoters were highly similar and both species showed pan-retinal expression, we sought to investigate whether their promoters contained hallmarks of promoters associated with ubiquitously expressed genes. For this, the presence of GC-rich regions or CpG islands was investigated, as they are usually associated with “active” chromatin structures like those of house-keeping genes [179,

180]. Using CpG island detection tools such as CpG Island Searcher and CpGPlot, it was found that mouse and rat Sfrs10 promoter region had CpG islands (Fig 3.11A, B).

These CpG islands had average GC content of 60-70% and observed/expected CpG ratio

>60%. However, the human Sfrs10 promoter did not have CpG islands (Fig 3.11C).
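The two quantities used above to call a CpG island, GC content and the observed/expected CpG ratio, can be computed directly from a sequence. The following Python sketch is a simplified illustration and does not reproduce CpG Island Searcher or CpGPlot. The function names are hypothetical, and the default window (200 bp) and thresholds (GC content of at least 0.5, observed/expected ratio of at least 0.6) are assumptions based on commonly cited criteria and not the settings used in this work.

```python
def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CG * N) / (#C * #G)."""
    seq = seq.upper()
    n_c, n_g = seq.count("C"), seq.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (n_c * n_g)


def cpg_island_windows(seq, window=200, step=1, min_gc=0.5, min_oe=0.6):
    """Start positions of windows that meet both thresholds (assumed defaults)."""
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        if gc_content(chunk) >= min_gc and cpg_obs_exp(chunk) >= min_oe:
            hits.append(start)
    return hits


# Toy sequences for illustration only; these are not the Sfrs10 promoters.
cpg_rich = "CG" * 100
at_rich = "AT" * 100
print(gc_content(cpg_rich), cpg_obs_exp(cpg_rich), cpg_island_windows(cpg_rich))
print(gc_content(at_rich), cpg_obs_exp(at_rich), cpg_island_windows(at_rich))
```

Under this scheme, a promoter with no qualifying window, as reported for human Sfrs10, yields an empty list, while the mouse and rat promoters would be expected to yield one or more overlapping windows.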

3.5.10 Sfrs10 forms stress-related speckles

Besides the upregulation of Sfrs10 in AMD retinae, it also showed a speckled pattern of expression. This pattern has been described for other SR proteins such as SC35 [181,

182]. In addition, Sfrs10 has been shown to colocalize with SC35 in human neuroblastoma cell lines [183]. Thus, human retinal sections were co-stained for Sfrs10 and SC35, which showed that SFRS10+ speckles did not overlap with SC35+ speckles

(Fig 3.13A-B’’). This led us to investigate whether SFRS10+ speckles were indeed stress

granules. HSF1 is known to form stress granules under various kinds of stress [184]. Also,

HSF1 was shown to colocalize with SFRS10 in human colon cancer cell lines [185].

Therefore, AMD retinal sections were co-stained with Sfrs10 and HSF1, which showed that SFRS10+ speckles did not overlap with HSF1+ stress granules. Nonetheless, the presence of HSF1+ granules confirmed that this retina is under stress and that SFRS10+ speckles are distinct and stress induced (Fig 3.13C-D’’).

3.5.11 Sfrs10 role in hypoxic stress

To test whether Sfrs10 plays a pro-apoptotic or an anti-apoptotic role under hypoxic stress, I employed a cell culture system to mimic human AMD conditions. I used three different cell lines, namely human embryonic kidney (HEK) cells, HeLa cells and retinal cells derived from retinae by Dr. Kamla Dutt. I had planned to induce hypoxic stress in these cells using 1) cobalt chloride, 2) arsenite and 3) a hypoxic chamber.

My preliminary tests showed that Sfrs10 was expressed in these cell lines under normal conditions. Meanwhile, another group published findings on the same question and showed that Sfrs10 acts as an anti-apoptotic gene by competing with miR-204 and promoting the expression of the Bcl2α isoform, which is a protective isoform of Bcl2 [143].

3.6 Discussion

Sfrs10 as a nuclear marker for red/green cone photoreceptors

ASFs are known to auto-regulate their levels. Indeed, Sfrs10 is one of the seven RNA-splicing associated genes that contain “exonic” class of “ultraconserved elements” [186]. This class of RNA-binding proteins is shown to auto-regulate their levels by the inclusion of a stop codon containing exon, which is highly conserved in the

vertebrate genome. This auto-regulation is thought to be critical for the maintenance of cellular homeostasis of various classes of RNA-binding proteins. This is concordant with the expression pattern of Sfrs10 observed across retinal development. The higher MW isoform contains exon 2a, which introduces a premature stop codon, targeting the transcript to the NMD pathway and thereby regulating the levels of the lower MW canonical isoform.

Sfrs10 expression in postnatal retinal development follows the order of terminal differentiation of retinal neurons. Ganglion cells are the first cells to differentiate, followed by amacrine and horizontal cells at P4. The costaining of syntaxin and Sfrs10 in the

ONBL of the P4 retina suggests that these cells are most likely differentiating amacrine and horizontal cells. This observation is in agreement with the previous report where it was shown that syntaxin marks both differentiating amacrine and horizontal cells in the postnatal developing retina [187]. At P8, bipolar cells, amacrine cells, and cone photoreceptors differentiate, followed by rod photoreceptors by P21. This suggests that

Sfrs10-mediated AS is involved in genes that regulate the terminal differentiation of the retinal cells. Further, Sfrs10 expression specifically in red/green cone photoreceptors suggests that Sfrs10-mediated AS is specifically required by this subset of cone photoreceptors during development and for their maintenance. This observation would make Sfrs10 one of the first alternative splicing factors to mark red/green cone photoreceptors at P8. Additionally, it would be a valuable nuclear marker of cone photoreceptors at this stage in retinal development as well as in retinal diseases.

Sfrs10 is not detected in the zebrafish retina

Sfrs10 is a widely studied alternative splicing factor that is 100% conserved at the

AA level in most mammals. At the nucleotide level, Sfrs10 is one of the seven RNA-splicing associated genes that contain the “exonic” class of “ultraconserved elements” [186].

This class of RNA-binding proteins is shown to auto-regulate their levels by the inclusion of a stop codon-containing exon, which is highly conserved in the vertebrate genome. This auto-regulation is thought to be critical for the maintenance of cellular homeostasis of various classes of RNA-binding proteins. As Sfrs10 is one of the very few highly conserved genes, we investigated the expression of Sfrs10 in the retina of widely studied model organisms, including mouse, rat, chicken and zebrafish. Immunoblot analysis showed expression in mouse, rat and chicken, which suggests that its transcription is conserved along with its AA sequence. As predicted, no immunoreactivity was observed in the lane containing zebrafish retinal extract (Fig 3.6B). The antibody employed here was directed toward the N-terminus of Sfrs10, which is not conserved in zebrafish (black box, Fig 3.6A). Thus, the absence of immunoreactivity could reflect a failure of the antibody to recognize zebrafish Sfrs10, which might still be expressed in the zebrafish retina.

Sfrs10 is upregulated only in response to stress in human retinae

Interestingly, SFRS10 is 100% conserved at the AA level between mouse and human, yet it is not expressed normally in the human retina. This was surprising, but in agreement with its role as a stress response gene that would be normally suppressed.

At the level of transcription regulation, CpG islands are shown to be associated with promoters of house-keeping genes and active genes by a comprehensive analysis of

CpG islands in the European Molecular Biology Laboratory (EMBL) database [180, 188].

In all, the constitutive expression of Sfrs10 in mouse and rat retinae (Fig 3.7A, B) along with the presence of CpG islands (Fig 3.11A, B) suggests that it might regulate the AS of constitutively expressed genes in these organisms. In contrast, SFRS10 is not expressed

in the normal human retinae (Fig 3.8A, B, C), and its promoter lacks CpG islands (Fig

3.11C), which suggests that SFRS10 might not be required for general maintenance. In contrast, it was upregulated in AMD retinae (Fig 3.10A, B, C), which is consistent with its role as a stress response gene. Interestingly, sample #7, for which there was no diagnosis of AMD, showed upregulation of SFRS10 suggesting that it was experiencing stress (Fig 3.10D). Since SFRS10 upregulation is often linked to hypoxic stress, one can extrapolate that retinal sample #7 was most likely undergoing hypoxic stress. In addition, the redistribution of red/green opsin staining throughout the membrane of photoreceptor

(Fig 3.10D) indicates retinal degeneration [178].

In all, the upregulation of SFRS10 in AMD retinae suggests that it might be required for AS of a subset of genes involved in hypoxic stress response. For instance, a gene might normally be expressed but under stress conditions, it might undergo AS shift and the isoform responding to stress is regulated by SFRS10. Some of the known targets of

Sfrs10 like Creb1, Pank2 are known to play a key role in metabolism [122, 189]. It could be that there is an increased demand for the isoform regulated by Sfrs10 under hypoxic stress. It is to be noted that both the aforementioned targets have isoforms in the retina.

Future investigation involves testing of the splice pattern shifts in these targets under hypoxic stress.

Sfrs10 does not co-localize with SC35 domain and is not part of stress-granules in human AMD retinae

Most SR proteins have been shown to be a part of the SC35 “nuclear speckle” under normal conditions. The presence of Sfrs10+ speckles independent of the SC35 domain in AMD retinae (Fig 3.13A-B”) suggests that Sfrs10 and SC35 might not interact in the retina under hypoxic stress. It could be that SC35 regulates the splicing of genes that are normally required by the cell whereas Sfrs10 independently regulates a specific subset of genes that are required only under stress conditions. This partitioning of the SR proteins might provide an efficient response to stresses such as hypoxia.

The lack of overlap between Sfrs10 speckles and HSF1 stress granules in AMD retina (Fig 3.13C-D”) suggests that Sfrs10 does not interact with HSF1. It could also be that HSF1 regulates the transcription of genes that are required early in the stress response while Sfrs10 regulates the AS of the subset of genes that might be required later in the stress response. Overall, our data suggest that Sfrs10 might not be required for normal maintenance or functioning of neurons in human retina but is predominantly active under hypoxic stress, which is thought to be the underlying cause of AMD.

3.7 Materials and Methods

Animal Procedures

All procedures with mice were performed in accordance with the animal protocol approved by the Institutional Animal Care and Use Committee at the University of Connecticut. CD1 or ICR mice from Charles River Laboratory, MA, were employed for all experiments. Chicken eyes were also obtained from Charles River Laboratory, Storrs, CT. The rat strain used was Wistar and was purchased from Charles River Laboratory. Zebrafish were obtained from Dr. Sylvain De Guise’s laboratory at the University of Connecticut, Storrs, CT.

Human samples

Slides with human retinal sections were obtained from Abcam, MA (http://www.abcam.com/) and from the National Disease Research Interchange (NDRI) (http://ndriresource.org/). In both cases, the authors were not involved in the procurement of the samples. Preprocessed de-identified retinal sections with the associated diagnosis were obtained for IHC analysis. Therefore, in this case the authors were issued a waiver from the University of Connecticut Institutional Review Board for human subject research. Briefly, the case history for all the samples was provided by NDRI. Also, to distinguish the AMD samples from the normal ones, pathological diagnosis was performed by Dr. Federico Gonzalez-Fernandez, MD, PhD, who collaborates with NDRI for the ophthalmologic pathology project. The parameters used in the diagnosis of AMD include accumulation of soft drusen beneath the retinal pigmented epithelium, disciform scar formation indicated by the fragmentation of Bruch’s membrane, and geographic atrophy.

With regard to the one sample (sample #7) obtained from the University of Connecticut Health Center (UCHC), consent from the next of kin, where applicable, was obtained prior to post-mortem retinal tissue procurement. This protocol was approved by the University of Connecticut institutional review board for human subject research. The sample was de-identified before it was further processed in Dr. Kanadia’s laboratory for IHC analysis, and the work was conducted in compliance with the Health Insurance Portability and Accountability Act.

Reverse transcriptase-polymerase chain reaction (RT-PCR)

Retinae from different developmental time points in CD1 mice (E12, E14, E16, E18, P0, P4, P10, and P14) were harvested and total RNA was prepared in Trizol following the manufacturer’s protocol (Invitrogen). For cDNA synthesis, 1 μg of total RNA from retinae harvested at various time points was used [190].

Table 3.2: List of Primers used in Sfrs10 RT-PCR analysis

Primer name    Primer sequence (5' - 3')
F1             AGCTTGACAGCTTCAGGAAAGGCC
R1             ATCCCAGCTCTATGGCAGGTTTCAG
Gapdh Fwd      ACAGTCAAGGCCGAGAATGGGAA
Gapdh Rev      TCAGATGCCTGCTTCACCACCTTCT

PCR to detect Sfrs10 isoforms was performed for 30 cycles (95°C for 30 seconds; 58°C for 30 seconds; 72°C for 1.5 minutes) with primers listed in Table 3.2. Gapdh was used as the control. PCR used to detect Gapdh was performed with the same temperature profile as described above. All PCR products were resolved on a 2.5% agarose gel.
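As a quick sanity check on the primers and cycling program described above, the following minimal Python sketch reports primer length and GC content and adds up the time spent in the programmed cycling steps. The primer sequences are copied from Table 3.2; the function names are illustrative and not part of the original protocol, and the time estimate deliberately ignores ramping and any initial denaturation or final extension, since these are not stated in the text.

```python
# Primer sequences copied from Table 3.2; function names are illustrative.
PRIMERS = {
    "F1": "AGCTTGACAGCTTCAGGAAAGGCC",
    "R1": "ATCCCAGCTCTATGGCAGGTTTCAG",
    "Gapdh Fwd": "ACAGTCAAGGCCGAGAATGGGAA",
    "Gapdh Rev": "TCAGATGCCTGCTTCACCACCTTCT",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_fraction(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def cycling_minutes(cycles, step_seconds):
    """Minutes spent in the programmed steps only (no ramping, no
    initial denaturation or final extension, which the text omits)."""
    return cycles * sum(step_seconds) / 60


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, GC = {gc_fraction(seq):.0%}")
    # 30 cycles of 30 s, 30 s and 1.5 min (90 s), as described above.
    print(cycling_minutes(30, [30, 30, 90]), "minutes in cycling steps")
```

The reverse-complement helper is included because reverse primers such as R1 are written 5' to 3' on the opposite strand, so it is their reverse complement that appears in the sense-strand sequence.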

RLM-RACE

Retinae from different developmental time points in CD1 mice (E15.5, E17.5, P0, P5, and P14) were harvested and total RNA was prepared as described above. 5′ RNA ligase-mediated rapid amplification of cDNA ends (5′ RLM-RACE) (First Choice RLM-RACE kit from Invitrogen) was employed to determine the transcription initiation site(s) of Sfrs10 mRNA. This method amplifies cDNA only from full-length, capped mRNA, therefore allowing identification of the actual 5′ ends of mRNAs. Total RNA from different time points was first treated with Calf Intestine Alkaline Phosphatase (CIP) to remove the 5′ phosphate from degraded mRNA, rRNA, and tRNA. Samples were then treated with Tobacco Acid Pyrophosphatase (TAP) to remove the cap structure of intact mRNAs, leaving a 5′ phosphate group on this mRNA subset only, followed by ligation of an RNA adapter to the decapped mRNAs. Reverse transcription and subsequent PCR amplification using adapter-specific outer and inner primers (provided in the kit) and a gene-specific reverse primer were performed to allow the 5′ ends of mRNA transcripts to be mapped. The gene-specific reverse primer for Sfrs10 is 5′-CCAAACACGCCAAGACAACAGTTG-3′. Instead of cloning, the PCR products were purified and then subjected to restriction digestion with EcoNI to check whether the same product is obtained throughout development. Uncut and cut products were resolved on 2% agarose gels.
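The selectivity of the CIP and TAP steps described above can be illustrated with a small Python sketch that tracks only the chemical state of each RNA's 5′ end. This is a simplified model under the assumption that adapter ligation requires a 5′ phosphate; the enum and function names are hypothetical and are not taken from the kit documentation.

```python
from enum import Enum


class FivePrimeEnd(Enum):
    CAP = "cap"              # intact, full-length mRNA
    PHOSPHATE = "phosphate"  # degraded mRNA, rRNA, tRNA
    HYDROXYL = "hydroxyl"    # dephosphorylated end


def cip(end):
    """CIP removes a free 5' phosphate; a capped end is left untouched."""
    return FivePrimeEnd.HYDROXYL if end is FivePrimeEnd.PHOSPHATE else end


def tap(end):
    """TAP removes the cap, exposing a 5' phosphate."""
    return FivePrimeEnd.PHOSPHATE if end is FivePrimeEnd.CAP else end


def receives_adapter(end):
    """Assumption: the RNA adapter is ligated only to a 5' phosphate."""
    return tap(cip(end)) is FivePrimeEnd.PHOSPHATE


if __name__ == "__main__":
    for end in FivePrimeEnd:
        print(f"{end.value:>9}: adapter ligated = {receives_adapter(end)}")
```

In this model only RNAs that start out capped end up carrying the adapter, which is why the subsequent adapter-primed PCR reports true 5′ ends rather than degradation products.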

3’RACE

Retinae from different developmental time points in CD1 mice (E15.5, E17.5, P0, P5, and P14) were harvested and total RNA was prepared as described above. Reverse transcription was performed using Moloney Murine Leukemia Virus Reverse Transcriptase (M-MLV RT) with the addition of the 3’RACE adapter to the RNA mix. The cDNAs obtained were used for PCR amplification with a gene-specific forward primer and adapter-specific outer and inner primers. The sequence of the gene-specific primer is 5’-CTTTCTAAATATCAATGCTTAACCAGAACCATTC-3’. Next, the PCR products were purified and then subjected to restriction digestion with BsrI to check whether the same product is obtained throughout development. Uncut and cut products were resolved on 2% agarose gels.

Immunohistochemistry (IHC)

All of the experiments were performed on 10-16 μm cryosections obtained from different time points in CD1 mice except for the 6-month retina, which was obtained from C57BL/6 mice. The cryosections were first hydrated in phosphate-buffered saline (PBS, pH 7.4) and washed three times (5 minutes each at room temperature (RT)), followed by incubation with PBTS buffer (1X PBS with 0.1% Triton X-100, 0.2% BSA and 0.02% SDS) for an hour at RT. Primary antibody (AB#1 - rabbit anti-Sfrs10, 1:750, Fitzgerald Inc., Product # 70R-1420; AB#2 - rabbit anti-Sfrs10, 1:500, Sigma-Aldrich, Product # AV40528; AB#3 - rabbit anti-Sfrs10, 1:500, Cemines, Product # AB/SF100; mouse anti-syntaxin-1 (HPC-1), 1:300, Santa Cruz Biotechnology Inc., Product # sc12736; mouse anti-islet1, 1:300, Developmental Studies Hybridoma Bank, Product # 40.2D6; mouse anti-Pkc-α, 1:300, Calbiochem, Product # OP74; mouse anti-glutamine synthetase, 1:300, Chemicon International, Product # MAB302; mouse anti-SC35, 1:300, Abcam, Product # ab11826; rat anti-HSF1, 1:100, Abcam, Product # ab61382) was incubated in PBTS buffer overnight at 4ºC. Sections were washed with PBTS buffer containing 4’,6-diamidino-2-phenylindole (DAPI) (Roche Diagnostics) 10 times (15 minutes each at RT). Following the washes, secondary antibody (anti-rabbit antibody conjugated with Alexa 488, 1:750, Product # 21206; anti-mouse antibody conjugated with Alexa 568, 1:750, Product # A10037; Invitrogen) was incubated in PBTS buffer overnight at 4ºC. Sections were washed with PBTS buffer 7 times (15 minutes each at RT), rinsed with PBS and covered with Prolong Gold anti-fade reagent (Invitrogen) and coverslip glass.

Serial IHC

Here the aforementioned IHC protocol was employed with one modification. Upon completion of the nuclear antigen detection by the first antibody (primary – rabbit anti-Sfrs10, 1:750 and secondary – Alexa488, 1:750), a 10-minute incubation at RT with 4% paraformaldehyde was conducted. Subsequently, the cytoplasmic antigen detection by the second antibody (primary – rabbit anti-red/green opsin, 1:300, Millipore, Product # AB5405 and secondary – Alexa568, Invitrogen, 1:750, Product # 11011) was performed.

Immunoblot

Retinal tissue extracts from mouse, rat, chicken and zebrafish were prepared in RIPA buffer (50 mM Tris (pH 8.0), 150 mM sodium chloride, 1% Igepal, 0.5% SDS) containing 1X protease inhibitor cocktail (cOmplete mini, EDTA-free, Roche Diagnostics). Following the protein estimation of the extracts, 50 μg of protein was resolved on a 4-20% Tris-glycine gradient gel (Invitrogen). The proteins were transferred to a positively charged nylon membrane (Invitrogen), which was then subjected to immunoblot analysis as described previously [191]. Rabbit anti-Sfrs10 (1:1000; Fitzgerald Inc., Product # 70R-1420) was used to detect Sfrs10 in the extract. Mouse anti-Gapdh (1:500; Sigma Aldrich, Product # G8795) was used as the loading control.

Bioinformatics tools and databases

Alignment analysis was carried out using the ClustalW tool (http://www.ebi.ac.uk/Tools/msa/clustalw2/). Promoter regions were characterized using tools such as CpGPlot (http://emboss.bioinformatics.nl/cgi-bin/emboss/cpgplot) and CpG Island Searcher (http://cpgislands.usc.edu/).
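CpG island prediction tools of this kind generally score a sliding window by its GC content and by the ratio of observed to expected CpG dinucleotides. The Python sketch below illustrates that calculation only; it is not the CpGPlot implementation, and the window size and thresholds (GC above 50%, observed/expected above 0.6) are commonly used defaults assumed here for illustration, since the parameters used in this work are not stated.

```python
def cpg_stats(window):
    """Return (GC percent, observed/expected CpG ratio) for one window.

    Observed/expected = (number of CpG * window length) / (number of C * number of G).
    """
    w = window.upper()
    n = len(w)
    c, g = w.count("C"), w.count("G")
    cpg = w.count("CG")
    gc_percent = 100.0 * (c + g) / n
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_percent, obs_exp


def scan_windows(seq, window=100, step=1):
    """Yield (start, GC percent, obs/exp) for each window along seq."""
    for start in range(0, len(seq) - window + 1, step):
        gc_percent, obs_exp = cpg_stats(seq[start:start + window])
        yield start, gc_percent, obs_exp


def looks_cpg_rich(gc_percent, obs_exp, min_gc=50.0, min_obs_exp=0.6):
    """Assumed thresholds; adjust to match the tool settings actually used."""
    return gc_percent > min_gc and obs_exp > min_obs_exp


if __name__ == "__main__":
    toy = "CGCGCGCGATATATAT"  # toy sequence, not a real promoter
    for start, gc, oe in scan_windows(toy, window=8, step=4):
        print(start, round(gc, 1), round(oe, 2), looks_cpg_rich(gc, oe))
```

A full island caller would additionally require a minimum run of consecutive qualifying windows (for example 200 bp), which is omitted here for brevity.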

Image Acquisition and 3D reconstruction

Confocal fluorescence microscopy was performed using Leica SP2. Images were subsequently processed using IMARIS (Bitplane Inc., CA) and Adobe Photoshop CS4 (Adobe Systems Inc., CA).

Figures:

Figure 3.1

Fig 3.1. Expression analysis of Sfrs10 during mouse retinal development. A: Schematic representation of Sfrs10 gene showing exons (orange arrows, E1-E9), intron (black line) and the primers (green arrows, F1 & R1) used for RT-PCR analysis. B: Gel image showing two RT-PCR products of Sfrs10, top band (+ exon 2a), bottom band (- exon 2a). B’: Gel image showing Gapdh expression. C: 5’RACE analysis across retinal development. Shown here is the position of the primers and the restriction site for EcoNI. D: 3’RACE analysis across retinal development. Shown here is the position of the primers and the restriction site for BsrI. E: Schematic representation of the gene after the RACE analysis [144].

Figure 3.2

Fig 3.2. SFRS10 expression across postnatal retinal development. A, B, D, E, F: Immunohistochemistry across postnatal retinal development with rabbit anti-Sfrs10 (green). Nuclei are marked with DAPI (blue). Scale bar: 30 μm. GCL, Ganglion cell layer; INL, Inner nuclear layer; ONL, Outer nuclear layer; ONBL, Outer neuroblastic layer. C: Immunohistochemistry for Sfrs10 (green) co-stained for Syntaxin (red) which marks differentiating amacrine cells. Nuclei are marked with DAPI (blue) [144].

Figure 3.3

Fig 3.3. Specific enrichment of SFRS10 in red/green cone photoreceptors. A: Schematic of Sfrs10 protein with RRM (RNA recognition motif) flanked by two RS (Serine-Arginine dipeptide rich) domains. Above and below the schematic are boxes representing the epitope positions for antibody (AB) #1 (blue box, amino acids 1-51), AB#2 (purple box, amino acids 2-51) and AB#3 (yellow box, amino acids 84-97). B-D”: Serial IHC with rabbit anti-Sfrs10 AB#1 (green) and rabbit anti-red/green opsin (red) and the merged image at P8 (B-B”), P10 (C-C”) and P14 (D-D”). E-E”: Serial IHC with rabbit anti-Sfrs10 AB#2 (green) and rabbit anti-red/green opsin (red) and the merged image (E”). F-F”: Serial IHC with rabbit anti-Sfrs10 AB#3 (green) and rabbit anti-red/green opsin (red) and the merged image (F”). Scale bars 30 μm. G, H: Immunoblot analysis for Sfrs10 (AB#1 (G), AB#2 (H)) on retinal protein extracts from P0 mouse [144].

Figure 3.4

Fig 3.4. Differential expression of SFRS10 in rod photoreceptors versus cone photoreceptors. A: IHC on P22 retinal section for SFRS10 (green), laminB2 (red). Nuclei are marked with DAPI (blue). Shown here is the outer nuclear layer. Scale bar 5 μm. A’: Magnified image of the boxed region in (A) showing perinuclear localization of SFRS10 in laminB2 positive cells. Scale bar 3 μm. A”: Schematic representation of SFRS10 localization observed in A [144].

Figure 3.5

Fig 3.5. Sfrs10 is 100% conserved in mammals. A: Schematic of Sfrs10 protein with RRM (RNA recognition motif) flanked by two RS (Serine-Arginine dipeptide rich) domains. B-B”: Amino acid alignment of RS1 domain (B), RRM domain (B’), RS2 domain (B”) of Sfrs10 protein from different species [145].

Figure 3.6

Fig 3.6. Expression of Sfrs10 is conserved in mouse, rat and chicken retina. A: Alignment of Sfrs10 amino acid (1-100) sequences from mouse, rat, chicken and zebrafish. The black box marks the region used as antigen to raise Sfrs10 antibodies. B: Immunoblot analysis for Sfrs10 (green) on retinal protein extracts from mouse, rat, chicken and zebrafish. The last lane has extract from HEK-293T cells that were expressing exogenous Sfrs10. Gapdh (red) serves as the loading control [145].

Figure 3.7

Fig 3.7. Sfrs10 expression is pan-retinal in mouse, rat and chicken. A-C”: IHC with rabbit anti-Sfrs10 (green) on retinal sections of adult mouse (A), adult rat (B) and adult chicken (C). DAPI (blue) marks all nuclei (A’, B’, C’). Scale Bar represents 50µm [145].

Figure 3.8

Fig 3.8. SFRS10 is not expressed in normal human retinae. A-C: IHC with rabbit anti-Sfrs10 (green) on sections obtained from individuals with normal retinae. DAPI (blue) marks all nuclei. Scale Bar represents 50µm [145].

Figure 3.9

Fig 3.9. Red/green opsin is dispersed throughout the photoreceptor membrane in AMD. A-D: IHC with rabbit anti-red/green opsin (red) on sections obtained from individuals with normal (A) and AMD (B-D) retinae. DAPI (blue) marks all nuclei [145].

Figure 3.10

Fig 3.10. SFRS10 is upregulated in degenerating retina. A-C: IHC with rabbit anti-Sfrs10 (green) on sections obtained from individuals with AMD retinae. DAPI (blue) marks all nuclei. D: Serial IHC with rabbit anti-Sfrs10 (green) and rabbit anti-red/green opsin (red) on parafoveal section obtained from sample #7. DAPI (blue) marks all nuclei. Inset shows the expression of SFRS10 in the red and green cone photoreceptors indicated by solid white arrow and rod photoreceptors indicated by open white arrow. Scale Bar represents 50µm [145].

Figure 3.11

Fig 3.11. CpGPlot analysis of the Sfrs10 promoter region in mouse, rat and human. A-C: Plots generated by the CpGPlot tool used for CpG island prediction. A, B: Mouse and rat Sfrs10 promoter regions showed CpG islands; C: No CpG islands were seen in the human Sfrs10 promoter region [145].

Figure 3.12

Fig 3.12. Comparative analysis of mSfrs10 and hSfrs10 promoter region. A-A”: Comparison of Sfrs10 promoter in mouse vs. rat (A), mouse vs. human (A’), and rat vs. human (A”) [145].

Figure 3.13

Fig 3.13. SFRS10 forms independent stress-related speckles. A-A”: IHC on human retinal section with anti-Sfrs10 (green), anti-SC35 (red). Nuclei are marked with DAPI (blue). B-B”: Magnified image of the boxed region in A-A” showing non-overlapping Sfrs10+ speckles and SC35+ speckles. C-C”: IHC with anti-Sfrs10 (green), anti-HSF1 (red). Nuclei are marked with DAPI (blue). D-D”: Magnified image of the boxed region of C-C” showing non-overlapping Sfrs10+ speckles and HSF1+ stress granules [145].

Chapter 4: Role of the cell cycle regulator, Citron Kinase, in rat retinal development

This chapter addresses the role of AS in development through a cell cycle regulating gene, Citron kinase, in retinal progenitor cells in the developing rat retinae. Since the knockout of this gene has a severe phenotype, we were able to characterize the phenotype and dissect the cell types affected by the absence of this gene. We also characterized novel isoforms of this gene, which could be implicated in the various roles played by this gene.

4.1 Cell cycle and its regulation:

The cell cycle is a key process by which the single-celled zygote matures into a multicellular organism. It includes a series of events that lead to the division of a cell into daughter cells. It comprises three main phases: interphase, the preparatory phase in which the cell grows and accumulates the nutrients needed for cell division; the mitotic phase, in which the cell contents are split between two daughter cells; and cytokinesis, the final stage of division, which leads to two separate daughter cells. There are checkpoints at each phase to ensure proper cell division.

Interphase is further divided into the first growth phase (G1), the synthesis phase (S) and the second growth phase (G2). During G1, the cell grows in size and replicates all of its organelles in preparation for division. In addition, mRNA and protein are synthesized, which will be used during the S phase [192]. The importance of the G1 phase is that this is when the cell decides whether it will commit to division or whether it will leave the cell cycle [193]. The second phase is called the S or synthesis phase. As the two daughter DNA strands are produced during replication, they recruit histones to form sister chromatids. Eventually, these identical strands of DNA become attached at a central point by a protein named cohesin [192]. Cohesin is a member of the structural maintenance of chromosomes (SMC) family of proteins, which affect chromosome architecture. At the end of S phase, the DNA is checked for successful replication using a set of checkpoint controls. If cells have successfully replicated their chromosomes, they pass to the third stage of interphase, G2. During G2, a second stage of growth occurs in the cell. This stage has an important checkpoint known as the G2/M checkpoint that must be passed for the cell to be able to enter mitosis [192, 194].

Mitosis begins with prophase, during which chromosome condensation occurs through condensin recruitment. In addition, cohesin, which until this point encircles the entire chromosome, is removed from the ends of the chromosome and concentrates at the center. This allows for the individual “arms” of the chromosomes to become visible. This is also the stage during which the mitotic spindle forms and when, in eukaryotes, the centrioles migrate to opposite poles of the cell and begin to radiate their microtubules. Prometaphase begins with the dissolution of the nuclear envelope into membrane vesicles that will evenly distribute to the daughter cells and is vital for spindle attachment. Additionally, kinetochores form around the centromere. It is a dynamic phase with constant polymerization and de-polymerization of the microtubules that are attached to the kinetochores, resulting in chromosomes being moved in each direction by microtubules from both poles of the cell [194]. At the end of prometaphase, each chromosome has microtubules attached from each pole of the spindle to the kinetochores of the centromere, which is described as “chromosome biorientation”. Metaphase is an immobile phase where chromosomes line up at the equator of the cell known as the metaphase plate. Anaphase is marked by the separation of sister chromatids due to activation of the anaphase promoting complex (APC/C) [194]. Once all the kinetochores are attached, APC/C becomes active and complexes with Cdc20, which then targets securin for degradation, freeing the cysteine protease separase [194]. Separase hydrolyzes the cohesin holding the two sister chromatids together, thereby facilitating chromatid separation. Telophase is the last stage of mitosis during which the nuclear envelope reforms, and the chromosomes de-condense [194].

Cytokinesis, which can begin as early as anaphase, brings the cell cycle to completion. This is the time when the cytoplasm of the two daughter cells divides equally [194]. It begins with the assembly of a cleavage furrow once the site of separation is chosen. The furrow contains actin, myosin, and other cytoskeletal proteins that are organized into a contractile ring known as the actomyosin ring [195]. Thereafter, the ring contracts, generating a membrane barrier between the contents of each daughter cell in addition to constricting the components of the spindle midzone into a structure called the midbody [195]. This is followed by a process called abscission, where the furrow seals itself, finally generating two completely separate cells [195]. The end-product of mitosis is two identical daughter cells, each with a complete set of chromosomes (46 in humans) [194].

Regulation of the cell cycle is crucial to cell survival, as dysregulation could lead to aberrant cell division and to the eventual formation of tumors [194, 196]. On the other hand, mutations in genes required for completion of stages of the cell cycle could result in aberrant cells which would then undergo apoptosis. Therefore, a balance between cell division and cell death has to be maintained during development for the proper growth of an organism.

4.2 Citron Kinase

Rho GTPases are a family of GTPases found in all eukaryotic cells. When in their active GTP-bound state, these GTPases perform regulatory functions through specific interactions with target/effector proteins that include serine/threonine kinases, tyrosine kinases, lipid kinases, lipases, scaffold proteins, and oxidases. Through their interactions with effectors, these GTPases control some of the most fundamental eukaryotic processes: cell movement, polarity, morphogenesis, and cell division. In cell division, Rho GTPases influence the activity of CDK molecules during G1 and microfilament and microtubule organization during M phase, and play a vital role in contractile ring formation at the cleavage furrow during cytokinesis [197].

4.2.1 Discovery of Citron Kinase

Citron Kinase was discovered in 1998 by Di Cunto et al. during a screen for new kinases that were specifically expressed in mouse primary keratinocytes [198]. It was originally called KK-1, and 5’ rapid amplification of cDNA ends (RACE)-PCR of KK-1 transcripts did not show any significant extension of the coding sequence. However, sequencing results from the 3’ RACE-PCR showed that part of the coding sequence was identical to that of Citron [198]. Two major isoforms of Citron Kinase were identified. The first isoform, called ‘Citron Kinase (CitK)’, is a full-length protein containing the following domains: kinase domain, coiled-coil domain with a leucine zipper, Rho/Rac binding domain, zinc finger region, pleckstrin homology (PH) domain, and a proline-rich SH3-binding domain [198]. The second, shorter isoform of Citron Kinase had all domain coding regions except for the kinase domain and was called Citron N (Neuron/Non-Kinase) (CitN). Madaule et al. showed that the latter isoform is expressed specifically in the nervous system [199]. It was shown that CitN is associated with the Golgi apparatus of hippocampal neurons in culture. While the overexpression of CitN has been shown to accumulate filamentous actin and protect the Golgi apparatus from rupture, its knockdown has been shown to result in a disorganized Golgi apparatus [200]. On the other hand, CitK has been shown to play a crucial role during cytokinesis, specifically during the contractile process at the cleavage furrow that eventually leads to the separation of the two daughter cells [197, 199]. Later, it was shown that CitK exerts its role in cytokinesis through the di-phosphorylation of the regulatory light chain of Myosin II, which is important for creating contractile forces at the cleavage furrow and for assembly of stress fibers [199].

4.2.2 Expression of CitK in the developing brain

To understand the expression patterns of CitK and CitN in the developing mouse, RNA in situ hybridization experiments were performed on both embryos and postnatal brains using two probes, one that was specific for the kinase domain and would only hybridize with the CitK transcript, and the other that was specific to the coiled-coil region, which is present in both CitK and CitN mRNA transcripts [201]. Analysis revealed that the signal for both probes was highly robust from as early as embryonic day (E) 10.5 to E16.5, with the highest expression in the developing CNS, including the retina, with further localization to the proliferating areas of the neural tube [201].

Protein analysis of CitK and CitN during various developmental time-points using specific antibodies revealed that only CitK was present during embryonic time-points prior to E16.5, while CitN expression dominated after E16.5 into the postnatal developmental time-points. Thus, it was shown that in the developing CNS, CitK was specifically expressed by proliferating cells, namely progenitor cells, whereas CitN was expressed by post-mitotic, differentiated neurons [201]. The presence of a second transcription start site for CitN was inferred from the finding that CitN protein levels were not decreased in a null mutant for CitK.

4.2.3 Flathead mutant

A spontaneous autosomal recessive mutation was identified in 1998 by Cogswell et al. in an inbred colony of Wistar rats and was termed ‘flathead’ (fh/fh) since the mutants showed diminished brain growth beginning early in embryonic development [202]. Perinatally, these mutants could be identified by their flattened skulls. The flathead mutation is a frame-shift mutation caused by the deletion of G-C in exon 1 of the kinase domain in the CitK gene on chromosome 12 [203]. This results in a premature stop codon, and the transcript is subjected to the nonsense-mediated decay pathway; therefore, no protein product is produced. Physiologically, the fh/fh rats displayed tremors, ataxia, seizures, and abnormal growth in the retina, cerebellum and hippocampus during postnatal development [202]. Nissl staining of coronal sections of P21 wildtype and fh/fh rats revealed that the fh/fh brain was approximately one-half the size of the wildtype [Figure 4B]. In addition, immunofluorescence analysis revealed binucleated neurons present in the thalamus, midbrain, hindbrain, spinal cord and cerebellum [203]. Pulse-chase experiments with BrdU injections at E15 and harvest at P12 showed individual cells containing two BrdU+ nuclei, suggesting that the progenitor cells did not undergo cytokinesis in the fh/fh mutant [203]. Thus, the defect in cytokinesis in these mutants resulted in increased cell death within the proliferative zones of the brain, thereby wiping out the progenitor pool, which would then result in fewer neurons being born and thus a decreased brain size in the rat [203].

4.3 Rationale

The cell cycle is one of the key processes in a developing tissue, and its regulation is crucial for the normal development of an organism. Mutation or mis-splicing of any of the genes involved in the cell cycle can greatly affect development, sometimes leading to the death of the organism. To understand the role of alternative splicing in cell cycle regulation and how it informs normal retinal development, we studied the Citron Kinase gene, which has been shown to play a key role in cytokinesis and is known to have many isoforms. The CitK mutant rat model called ‘flathead’ was used to study the role of CitK in progenitor cells of the developing rat retina.

The following sections contain the contents of the publication, Karunakaran et al. [204], titled “Loss of Citron kinase affects a subset of progenitor cells that alters late but not early neurogenesis in the developing rat retina”.

4.4 Background

During retinal development, the balance between self-renewal of RPCs vs neurogenesis allows for simultaneous RPC amplification and neurogenesis [205, 206]. This produces a functional adult retina with six neuronal subtypes including rod photoreceptors, cone photoreceptors, bipolar cells, horizontal cells, amacrine cells and retinal ganglion cells and one glial cell type, Müller glia [3, 5]. Early RPCs are multipotent cells that are known to shift in their competence to produce different neurons over time [14, 207, 208]. Recently, it was shown that there are RPCs that have a temporal bias in producing a subset of neurons [209]. Together, this suggests a dynamic regulation of RPC cell division that interprets intrinsic and extrinsic cues to regulate neurogenesis. Cell division in the retina occurs such that the S-phase of the cell cycle occurs at the basal end of the outer neuroblastic layer (ONBL) followed by the inter-kinetic movement of the nucleus towards the apical end where the M-phase of the cell cycle occurs [210, 211]. Furthermore, the plane of cell division at the apical end has been reported to regulate whether an RPC undergoes symmetric vs. asymmetric cell division [212]. This process is what regulates RPC amplification vs. neurogenesis. It is here that the regulators of cytokinesis can influence symmetric vs. asymmetric cell division and in turn regulate retinal development. One such gene that regulates cytokinesis is citron kinase. Here, I have analyzed the role of Citron Kinase and how its spliced isoforms inform the cell division of retinal progenitor cells and in turn retinal development.

4.5 Results

4.5.1 Citron kinase is enriched during embryonic retinal development

CitK has 47 exons, with exons 1 through 12 encoding the kinase domain. Upstream of exon 12, there is an alternative transcription start site that is utilized to produce a truncated protein called Citron N, which lacks the kinase domain [201]. PCR amplification of the region coding for the kinase domain with primers F1-R3 and F3-R8 (shown as green arrows in Fig 4.1A) showed robust expression starting at embryonic day (E)12, followed by a steady decline leading to absence by postnatal day (P)8 (Fig 4.1B.i, ii). However, the primer pair F12-R17 showed that the alternative transcription start site upstream of exon 12 was being utilized at P10 and P14 (Fig 4.1B.iv). This was further confirmed by the F16-R30 and F29-R41 primer pairs, which showed a robust PCR product at P14 and lower levels at P8 and P10 (Fig 4.1B.v, vi). Interestingly, this was not the case for the F35-R46 primer pair, where the levels were not similar (Fig 4.1B.vii). The identity of the upper band observed at E12 with the F35-R46 primers was not interrogated (Fig 4.1B.vii). Moreover, PCR with the primer pair F8-R12 (positions shown in Fig 4.1A) showed two bands at 594 bp and 410 bp (Fig 4.1B.iii). Sequence analysis of both these isoforms revealed that the isoform at 594 bp was the canonical isoform including all exons in between, while the lower MW isoform at 410 bp lacked exon 9 (schematic representation below Fig 4.1B.iii). For the primers (F12-R17) amplifying the region between exon 12 and exon 17, there were two bands at 843 bp and 552 bp (Fig 4.1B.iv). Again, sequence analysis of both these isoforms revealed that the isoform at 552 bp was the canonical isoform including all exons, while the higher MW isoform at 843 bp was produced by the usage of the cryptic splice donor at nucleotide position 42306705 (NCBI ref - AC_000080.1) (AGT/GTAG) in intron 13 (schematic representation below Fig 4.1B.iv). Since our analysis interrogated short stretches of CitK mRNA, we could not determine the specific combinations in which the two alternative splicing events were utilized. For this, we designed a new set of primers (Fig 4.1C, schematic representation) in the cryptic donor region in exon 13 and at the junction of exons 8 and 9. RT-PCR analysis with F8-RC13 showed two bands at E12 and E14. The higher MW band at 895 bp corresponds to the isoform containing a part of retained intron 13 with exon 9 included, while the lower MW isoform at 711 bp excludes exon 9 (Fig 4.1C, left). From E16 onwards, only the higher MW isoform was observed. Analysis was not carried out beyond P0, as CitK expression shuts down postnatally. To find whether exon 9 is expressed with the isoform containing canonical exon 13, RT-PCR analysis with F8/9-R14 was performed. This analysis showed two bands across development, with robust expression observed at E12. The higher MW band at 869 bp corresponds to the isoform containing exon 9 with a part of intron 13 retained, and the lower MW band at 782 bp corresponds to the isoform with exon 9 and canonical exon 13 (Fig 4.1C, right).
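The band sizes reported above can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the original analysis: it takes the band sizes exactly as stated in the text and computes the size difference between the two bands for each primer pair. The reading-frame check (divisibility by 3) and the function names are illustrative assumptions.

```python
# Band sizes (bp) as reported in the text for each primer pair:
# (higher-MW band, lower-MW band).
BANDS = {
    "F8-R12": (594, 410),    # exon 9 included / exon 9 excluded
    "F12-R17": (843, 552),   # cryptic donor used / canonical
    "F8-RC13": (895, 711),   # exon 9 included / exon 9 excluded
    "F8/9-R14": (869, 782),  # cryptic donor used / canonical
}


def band_difference(pair):
    """Return the size difference (bp) between the two bands of a primer pair."""
    upper, lower = BANDS[pair]
    return upper - lower


def summarize():
    """Map each primer pair to (difference in bp, difference is a multiple of 3)."""
    summary = {}
    for pair in BANDS:
        diff = band_difference(pair)
        summary[pair] = (diff, diff % 3 == 0)
    return summary


if __name__ == "__main__":
    for pair, (diff, whole_codons) in summarize().items():
        note = "whole codons" if whole_codons else "not a multiple of 3"
        print(f"{pair}: {diff} bp ({note})")
```

With the numbers as reported, both comparisons that differ by exon 9 (F8-R12 and F8-RC13) give the same 184 bp difference, and the F8/9-R14 comparison gives 87 bp, which equals 29 codons, the same length as the 29 amino acids mentioned in the Discussion. The F12-R17 difference (291 bp) is computed from the values as given in the text.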

Next, we interrogated the expression of CitK protein by section IHC in E12 and E16 retinae. This analysis showed CitK to be localized in discrete puncta at the apical tip, where RPCs undergo M-phase of the cell cycle (Fig 4.1D, 4.1E, Insets). This observation is in agreement with Di Cunto et al., who reported expression of CitK in the developing retina by whole mount in situ hybridization at E11.5, when the retina contains only RPCs, which suggests expression of CitK in RPCs [201]. Moreover, in the developing brain, CitK expression was found at the cytokinesis furrow in progenitor cells [203]. Thus, CitK was considered to be expressed in RPCs. Absence of signal in the KO retina confirmed the specificity of the CitK antibody (Fig 4.1D’, E’). Finally, to confirm that the punctate staining of CitK corresponds to RPCs, E12 sections were costained with anti-PH3 antibody (Fig 4.1F’), which marks M-phase progenitor cells. This showed overlap in signal for PH3 and CitK (Fig. 4.1F-F”, Inset).

4.5.2 Number of progenitor cells in the KO retina is comparable to that in the WT retina at E12 and E13

Given the robust expression of CitK in early RPCs, we wanted to interrogate its role in retinal development. For this, we pulsed pregnant female rats at E12 with EdU one hour prior to harvest. E12 WT and KO retinal sections were then subjected to IHC with anti-Ki67 and anti-PH3 along with EdU detection (Fig 4.2A - F). Here, we quantified EdU+ (S-phase) RPCs as a percentage of total Ki67+ RPCs, which showed no difference in S-phase RPCs in CitK KO compared to WT (Fig 4.2G). Similarly, PH3+ (M-phase) RPCs as a percentage of total Ki67+ RPCs did not show any difference between KO and WT (Fig 4.2H).
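The quantification described above expresses marker-positive cells as a percentage of all Ki67+ RPCs. A minimal sketch of that calculation is given below; the cell counts are hypothetical placeholders chosen for illustration and are not data from this study, and the function name is an assumption.

```python
def percent_of_rpcs(marker_positive, ki67_positive):
    """Return marker-positive cells as a percentage of all Ki67+ RPCs."""
    if ki67_positive <= 0:
        raise ValueError("Ki67+ count must be positive")
    return 100.0 * marker_positive / ki67_positive


# Hypothetical per-section counts: (EdU+ cells, Ki67+ cells).
# These numbers are illustrative only.
sections = [(120, 400), (95, 380), (110, 410)]

percentages = [percent_of_rpcs(edu, ki67) for edu, ki67 in sections]

if __name__ == "__main__":
    for (edu, ki67), pct in zip(sections, percentages):
        print(f"EdU+ {edu} / Ki67+ {ki67} = {pct:.1f}%")
```

The same function applies to PH3+ (M-phase) cells by passing the PH3+ count as the first argument.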

Next, we wanted to interrogate the effect of loss of CitK on RPCs over time. So, we performed an EdU pulse-chase experiment. For this, we pulsed pregnant females at E12 followed by harvest at E13. Again, WT and KO retinal sections were subjected to IHC with Ki67 and PH3 along with EdU detection (Fig 4.3A - F). EdU+ RPCs (S-phase) as a percentage of total Ki67+ RPCs did not change in KO compared to WT (Fig 4.3I). This was also true for PH3+ (M-phase) RPCs (Fig 4.3J). Since RPCs did not show an observable phenotype in the CitK KO retina, we wanted to interrogate whether neurogenesis was affected. Since retinal ganglion cells (RGCs) were the only cell type produced at this point, we employed Islet1 antibody to mark the RGCs. Specifically, we looked for Islet1+ cells that were also EdU+, so as to interrogate RGCs produced after the EdU pulse (Fig 4.3G - H”). Again, we did not see a significant difference in RGC production in KO compared to WT (Fig 4.3K).

4.5.3 Cell death of a subset of progenitor cell population is observed in the KO retina by E14

Next, we pulsed E12 pregnant females with EdU followed by embryo harvest at E14. Retinal sections from KO and WT embryos were subjected to IHC with Ki67 and PH3 along with EdU detection (Fig 4.4A - F). Quantification of EdU+ (S-phase) RPCs as a percentage of total Ki67+ RPCs showed a statistically significant decrease in RPCs in the CitK KO compared to WT (Fig 4.4I). This suggested that there was a loss of RPCs in the CitK KO retina, meaning that either the RPCs died or there was an increase in RGC production at the expense of these RPCs. To distinguish between these two possibilities, we performed IHC with Islet1 along with EdU detection (Fig 4.4G – H”). Quantification was restricted to the ganglion cell layer (GCL) to take into account only the differentiated ganglion cells and not include newly born Islet1+ RGCs or amacrine cells (ACs) in the outer neuroblastic layer (ONBL). Quantification of Islet1+ cells that were also EdU+ as a percentage of total Islet1+ cells in the GCL did not show a statistically significant difference in KO compared to WT (Fig 4.4J). This led us to interrogate the possibility that RPCs were undergoing cell death in CitK KO retinae. Indeed, TUNEL analysis showed increased cell death in CitK KO compared to WT (Fig 4.5A, A’, B).

Similar analysis was extended to E16 retinae that were harvested from embryos where the mother had been pulsed with EdU at E12. Here, a significant decrease in the size of the ONBL and a concomitant increase in TUNEL+ cells in the KO retina were observed (Fig. 4.5C, C’, D). To confirm that the TUNEL+ cells were RPCs, sections that were subjected to TUNEL assay were stained for Ki67. Here, we observed that many TUNEL+ cells were Ki67+ as well (Fig. 4.5E – F”’). The overlap was confirmed using the section tool in IMARIS 7.3 software, where the cross hairs mark the cell under interrogation, with the panel on the right showing the longitudinal axis and the one at the bottom showing the horizontal axis. In both cases, overlap was observed for TUNEL and Ki67 positivity (Fig. 4.5E’, F’ – F”’). However, there were some cells in the ONBL that were TUNEL+ but Ki67-. These cells had increased intensity of TUNEL signal, which correlates with the extent of cell death, and the lack of Ki67 positivity might be due to protein degradation.

4.5.4 Absence of Islet1+ bipolar cell production in the KO retina

The increase in cell death of RPCs after E14 led us to extend our analysis to postnatal development. At P0, the KO retinal tissue was visibly smaller than the WT retina. IHC analysis with different markers, such as anti-PH3, anti-Islet1, anti-Ki67 and anti-Pax6 antibodies, showed that the GCL remained uncompromised (Fig 4.6A - D’). However, the ONBL was half the size of its WT counterpart and showed signs of rosette formation. Nevertheless, PH3+ and Ki67+ RPCs were observed in the P0 CitK KO retina (Fig. 4.6A’, C’). Similarly, Islet1+ and Pax6+ RGCs and ACs were observed in the KO retina (Fig. 4.6B’, D’). Interestingly, at P2, Islet1+ RGCs and ACs were observed in the WT and KO retinae (Fig. 4.6F, F’). However, Islet1+ bipolar cells (BPCs) [213] were not observed in the KO retinae (Fig. 4.6F’). Overall, the KO retinae showed rosette formation, but some Ki67+ RPCs were still observed (Fig. 4.6G’). At P4, the KO retina was structurally compromised, with large rosettes in the ONBL. Again, Islet1 marked a subset of BPCs in the WT retina, but these cells were not observed in the KO retina (Fig. 4.7A, B). To distinguish between BPCs and ACs in the ONBL, Pax6 was used to mark amacrine cells (Fig. 4.7A’, A”, B’, B”). Thus, cells positive for both Islet1 and Pax6 were amacrine cells in the ONBL, while those positive for Islet1 alone were BPCs (Fig. 4.7A, A”, boxed region). To interrogate whether the KO retina failed to produce BPCs, we pulsed P0 pups with EdU followed by retinal harvest at P4 and IHC with Islet1 and EdU (Fig. 4.7C – D””). This analysis showed that Islet1+ EdU+ BPCs were indeed produced in the WT retina (Fig. 4.7C”, inset) but were not observed in the KO retina. However, EdU+ cells were observed in the ONBL of the KO retina (Fig. 4.7D”).

4.5.5 CitK KO retina undergoes severe degeneration by P14

IHC analysis at P7 with rhodopsin showed that rhodopsin+ cells were present in the KO retina, but the retina lacked lamination or canonical morphological characteristics (Fig. 4.8A’). At P14, the KO retina had deteriorated such that the ONL and INL collapsed to a thin layer of cells that showed positivity for markers such as red/green opsin and rhodopsin (Fig. 4.8C – D’). However, the RGC layer was still present and positive for Islet1 and Pax6 (Fig. 4.8E – F’).

4.6 Discussion

Role of alternative splicing in Citron kinase expression

Here we discovered an alternatively spliced isoform of CitK that is developmentally regulated such that exon 9 is excluded at E12 and E14. Exon 9 encodes amino acids 370-431, which are part of the kinase domain. Thus, the isoform lacking exon 9 would most likely lack the kinase function. In addition, we discovered another alternatively spliced isoform in which a cryptic donor in intron 13 is used, adding 29 amino acids. These amino acids are added to the structural maintenance of chromosome (SMC) domain, which could alter the nuclear function of CitK. Such a nuclear function has been reported in Drosophila, where the CitK homolog, called sticky, is required for heterochromatin-mediated gene silencing through HP1 localization and H3-K9 methylation [214].

RPC death does not begin until E14

At E12, RT-PCR analysis showed the highest levels of CitK expression, but its loss did not affect RPCs at E12 or E13. This suggests that RPCs in the E12 and E13 retina may have a compensatory mechanism that allows them to undergo cytokinesis in the absence of CitK. By E14, we begin to see a reduction in RPCs that is coincident with the appearance of TUNEL+ cells, suggesting a requirement for CitK in RPCs. Interestingly, RGC production was not aberrant in CitK KO retinae, which is in agreement with the observation that RPCs at E12 and E13 remain uncompromised. Indeed, RGC production begins at E12, and the resistance of E12 RPCs to loss of CitK allows for normal RGC production. The observation of cell death beginning at E14 raises the issue as to why it happens at E14 and not earlier (Fig. 4.9). One possibility is that there exist one or more compensatory genes (factor(s) X) that allowed RPCs to survive at E12 and E13, and that these genes were downregulated by E14 (Fig. 4.9). There are other Rho-dependent kinases, such as Rho-associated protein kinase (ROCK) and Cdc42, that have been shown to play a regulatory role in the contractility of the cleavage furrow during cytokinesis [197-199, 215]. Aurora, another member of the serine/threonine kinase family, and Aurora and Ipl1-like midbody-associated protein (AIM-1) have also been shown to play a critical role in cell division, especially at the midbody furrow [216, 217]. Nir1, a mammalian homolog of retinal degeneration B in Drosophila, has been shown to have a very similar function to that of CitK [218]. Therefore, it is possible that one or all of these genes could compensate for loss of CitK function.

Late embryonic and postnatal neurogenesis is affected in the KO retina

The depletion of RPCs that began at E14 reached its peak at E16, resulting in a severely compromised retina at birth. Notably, pulse-chase experiments with EdU showed that RPCs that were in S-phase at E12 (time of EdU pulse) had survived the cell cycle and had re-entered the cell cycle at E16, as shown by EdU+ cells that were Ki67+. These cells survived because they most likely did not express CitK or, if they did express CitK, they also had the aforementioned compensatory factor. This agrees with the observation in the brain, where it was shown that only ~50% of neurogenic cytokineses were affected [219]. Thus, it is possible that RPCs can be grouped into two categories of neurogenic cytokinesis: one that is dependent on CitK and susceptible to cell death in the KO retina, and another that is independent of CitK, which results in RPCs that remain at P0 (Fig. 4.9). Moreover, the RPCs in the P0 KO retina fail to produce bipolar cells, as shown by the pulse-chase experiments where RPCs that took up EdU at P0 (time of pulse) did not give rise to EdU+ Islet1+ BPCs at P4. It is also possible that RPCs in the KO retina failed to undergo cell division as a secondary effect of the severely compromised retina. This possibility is unlikely, given that there are distinct EdU+ nuclei in the ONBL of the P4 retina, which suggests that the progeny of the progenitor cells that took up EdU were still alive (Fig.4.7B1”). In all, our data showed that loss of CitK did not affect early embryonic RPCs, resulting in normal early neurogenesis, while altering late embryonic RPCs and affecting subsequent neurogenesis.

Here we present a model where a subset of RPCs requires CitK function, and this subset of progenitor cells goes on to produce Islet1+ BPCs (Fig. 4.9, top). In the absence of CitK, expression of factor(s) X allows RPCs to survive until E14, when the levels of these factor(s) drop, making CitK function crucial for RPC survival. This in turn results in the death of these RPCs, such that there are none left at birth to produce the Islet1+ BPCs in the CitK KO retina (Fig. 4.9, bottom).

4.7 Materials and methods

Animal Procedures

All procedures with rats were performed in accordance with the animal protocol approved by the Institutional Animal Care and Use Committee at the University of Connecticut and in compliance with the regulations of ‘The Association for Research in Vision and Ophthalmology’ for the use of animals in research. Wistar rats from Charles River Laboratory, MA, were employed for RT-PCR analysis. All CitK mutants (Citkfh/fh, flathead rats), heterozygous and wildtype littermates were generated from a breeding colony maintained at the University of Connecticut.

Reverse transcriptase-polymerase chain reaction (RT-PCR)

Retinae from different developmental time points in Wistar rats (embryonic day (E)12, E14, E16, E18, postnatal day (P)0, P4, P10, and P14) were harvested and total RNA was prepared in Trizol following the manufacturer’s protocol (Invitrogen). For cDNA synthesis, 1 μg of total RNA from retinae harvested at various time points was used [190]. PCR to examine the expression of the Citron Kinase gene was performed with the primers across the gene listed in Table 4.1. RT-PCR thermocycler conditions were 33 cycles (95°C for 30 seconds; 58°C for 30 seconds; 72°C for 50 seconds). Gapdh was used as control. Primers used to amplify Gapdh are listed in Table 4.1. For Gapdh, RT-PCR thermocycler conditions were 30 cycles (95°C for 30 seconds; 58°C for 30 seconds; 72°C for 50 seconds). All PCR products were resolved on a 2.5% agarose gel. The products were then excised, cloned into pGEMT vector (Promega, catalog # A1360) and sequenced with T7 primer to confirm their identities.
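As a small worked example of the cycling conditions above, the sketch below adds up the hold times per cycle and multiplies by the number of cycles. It is an illustration only: it counts just the three stated steps and ignores ramp times and any initial denaturation or final extension, none of which are specified in the text. The function name is an assumption.

```python
def cycling_seconds(cycles, step_seconds):
    """Total hold time (s) for a given number of cycles of the listed steps."""
    return cycles * sum(step_seconds)


# Step durations in seconds: 95°C, 58°C, 72°C (as stated in the text).
STEPS = (30, 30, 50)

citk_seconds = cycling_seconds(33, STEPS)   # CitK primers: 33 cycles
gapdh_seconds = cycling_seconds(30, STEPS)  # Gapdh primers: 30 cycles

if __name__ == "__main__":
    print(f"CitK:  {citk_seconds} s ({citk_seconds / 60:.1f} min)")
    print(f"Gapdh: {gapdh_seconds} s ({gapdh_seconds / 60:.1f} min)")
```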

Table 4.1: List of primers used in the CitK RT-PCR analysis

Primer name (# represents exon)   Primer sequence (5' - 3')
F1                                CGGTAGCGGAGAGATGTTGAAGTTCAAGT
R3                                TAACCACCTGCACTTCGGCGAAGT
F3                                ACTTCGCCGAAGTGCAGGTGGTTA
R8                                GCAACAGAGACCCTCGAACTTCAGTCTCT
F8                                AGAGACTGAAGTTCGAGGGTCTCTGTTGC
F8/9                              AATAACATCCGGAACTCTCCTCCCC
R12                               GCTCTCTGATGTCGTGGAGAAGCTGAAGA
F12                               TCTTCAGCTTCTCCACGACATCAGAGAGC
RC13                              AGGTCCTTAGCACTCGCTAAGCCA
R14                               CTGGAACATTCTCCCACTTCAGGCT
R17                               TCCTTCAGTCTGTTCTCTCTACGCTCCATG
F16                               TGCAGAACATCCGGCAGGCAAA
R30                               AGGGCGAGCTTCAGCTCATTGTACTG
F29                               CACGAGAAGGTGAAAATGGAAGGCA
R41                               CCGATGAGGATACTGTAATTGGTGAAGTGG
F35                               GGTGGAAGAATTTGAGCTGTGCCTTC
R46                               TCTGTCCGGCCCTCTCTGTCTCGGTA
GAPDH Fwd                         ACAGTCAAGGCCGAGAATGGGAA
GAPDH Rev                         TCAGATGCCTGCTTCACCACCTTCT
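A quick consistency check on the primer list is to compute reverse complements. The sketch below, which is not part of the original methods, tests whether a forward and a reverse primer named for the same exon are reverse complements of each other, meaning that they anneal to the same site on opposite strands. Only a subset of the primers from Table 4.1 is included, and the helper names are assumptions.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Subset of primers copied from Table 4.1 (5' - 3').
PRIMERS = {
    "F1": "CGGTAGCGGAGAGATGTTGAAGTTCAAGT",
    "R3": "TAACCACCTGCACTTCGGCGAAGT",
    "F3": "ACTTCGCCGAAGTGCAGGTGGTTA",
    "R8": "GCAACAGAGACCCTCGAACTTCAGTCTCT",
    "F8": "AGAGACTGAAGTTCGAGGGTCTCTGTTGC",
    "R12": "GCTCTCTGATGTCGTGGAGAAGCTGAAGA",
    "F12": "TCTTCAGCTTCTCCACGACATCAGAGAGC",
}


def shared_site(forward, reverse):
    """True if the two named primers are exact reverse complements."""
    return reverse_complement(PRIMERS[forward]) == PRIMERS[reverse]


if __name__ == "__main__":
    for fwd, rev in [("F3", "R3"), ("F8", "R8"), ("F12", "R12")]:
        print(f"{fwd}/{rev} share a site: {shared_site(fwd, rev)}")
```

For the sequences as listed, the F3/R3, F8/R8 and F12/R12 pairs each come out as exact reverse complements, which is consistent with adjacent amplicons sharing a primer-binding site in exons 3, 8 and 12.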

EdU pulse experiments

Pregnant heterozygous females at E12 were first weighed and injected with 1 ml of 25 mM EdU in PBS/100mg of body weight, and embryos were harvested either one hour later at E12 or at E13, E14 or E16. P0 pups were first weighed and injected with 0.4 ml of 25 mM EdU in PBS/100mg of body weight, and retinae were harvested at P4.

Immunohistochemistry (IHC)

All of the experiments were performed on 10-16 μm cryosections obtained from different time points in CitK KO, heterozygous and WT littermates. For embryonic analysis, cryosections were subjected to antigen retrieval as described by the manufacturer (Vector laboratories, Catalog # H-3300) followed by IHC. The sections were then hydrated in phosphate-buffered saline (PBS, pH 7.4) and washed three times (5 minutes each at room temperature (RT)), followed by incubation with PBTS buffer (blocking/permeabilization buffer) (1X PBS with 0.1% triton-X 100, 0.2% BSA and 0.02% SDS) for an hour at RT. Primary antibodies (mouse anti-Islet1, 1:300, Developmental Studies Hybridoma Bank, Product # 40.2D6; mouse anti-Ki67, BD Biosciences, Product # 556003; rabbit anti-Phospho Histone H3 (PH3), 1:300, Bethyl laboratories, inc., Product # IHC-00061; rabbit anti-Pax6, 1:300, Covance, Product # PRB-278P; mouse anti-rhodopsin, clone 4D2, 1:300, Millipore, Product # MABN15; rabbit anti-red/green opsin, 1:300, Millipore, Product # AB5405) were incubated in PBTS buffer overnight at 4ºC. Sections were washed with PBTS buffer containing 4’,6-diamidino-2-phenylindole (DAPI) (Roche diagnostics) 10 times (15 minutes each at RT). Following the washes, secondary antibodies (anti-rabbit antibody conjugated with Alexa 488, 1:750, Product # 21206; anti-mouse antibody conjugated with Alexa 568, 1:750, Product # A10037; Invitrogen) were incubated in PBTS buffer overnight at 4ºC. Sections were washed with PBTS buffer 7 times (15 minutes each at RT), rinsed with PBS and mounted in Prolong gold anti-fade reagent (Invitrogen) under a glass coverslip.

For the detection with mouse anti-CRIK/CitK antibody (1:50, BD Biosciences, Product # 611376), the IHC protocol described above was employed, except that the blocking buffer consisted of 1X PBS with 0.5% triton-X and 5% normal goat serum (NGS) containing DAPI, followed by incubation of the primary antibody in permeabilization buffer (1X PBS with 0.25% triton-X and 2.5% NGS) overnight at 4ºC. Sections were washed 3 times with the permeabilization buffer, followed by incubation with secondary antibody in permeabilization buffer for 2 hours at RT. Sections were washed with permeabilization buffer 5 times (10 minutes each at RT), rinsed with PBS and mounted in Prolong gold anti-fade reagent (Invitrogen) under a glass coverslip.

For EdU detection, the Click-iT EdU Alexa fluor 647 imaging kit (Molecular probes, catalog # C10340) was used. The steps were followed as per instructions in the user manual.

TUNEL assay

The In situ Cell death detection kit, TMR red (Roche diagnostics, catalog # 12156792910) and the TUNEL Apo-green detection kit (Biotool, Catalog # B31112) were used to assay cell death in the embryonic retinal tissue at E14 and E16 as per instructions in the user manual. To determine whether the dying cells were retinal progenitor cells, E16 retinal sections were first stained with mouse anti-Ki67, following the IHC protocol described earlier. The sections were then fixed and the TUNEL assay was carried out.

Image Acquisition and 3D reconstruction

Confocal fluorescence microscopy was performed using a Leica SP2. Images were subsequently processed using IMARIS 7.3 (Bitplane Inc., CA) and Adobe Photoshop CS4 (Adobe Systems Inc., CA). For counting, the spot tool in IMARIS software was employed. For confirming the overlap between Ki67 and TUNEL signal, the section tool was employed. The placement of the cross hairs shows the signal in the longitudinal (shown on right) and horizontal (shown at the bottom) axes at the given position.


Statistical analysis

Data were analyzed in Microsoft Excel 2007. Statistical significance was determined using the Student’s t test (p ≤ 0.05). Data are presented as mean ± SEM.
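The summary statistics named above can be sketched in a few lines. The example below is illustrative only and is not the Excel analysis used in this study: it computes mean ± SEM and the t statistic of an unpaired, equal-variance Student's t test. The text does not state whether the test was paired or unpaired, or one- or two-tailed, so the unpaired equal-variance form is an assumption, and the input values are hypothetical.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) for a list of values."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


def student_t(a, b):
    """Unpaired, equal-variance Student's t statistic and degrees of freedom."""
    na, nb = len(a), len(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    std_err = math.sqrt(pooled * (1 / na + 1 / nb))
    t = (statistics.mean(a) - statistics.mean(b)) / std_err
    return t, na + nb - 2


if __name__ == "__main__":
    # Hypothetical percentages for two groups (not data from this study).
    group_a = [1, 2, 3, 4, 5]
    group_b = [2, 3, 4, 5, 6]
    print("group A mean, SEM:", mean_sem(group_a))
    print("t, df:", student_t(group_a, group_b))
```

The p value is then obtained from the t distribution with the returned degrees of freedom; the Python standard library does not provide that distribution, so this sketch stops at the t statistic.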


Figures

Figure 4.1


Fig 4.1. Expression analysis of Citron Kinase during rat retinal development. A: Schematic representation of CitK mRNA showing exons (orange boxes) and the primers (green arrows) used for RT-PCR analysis. B: Gel images showing the expression of CitK at the retinal timepoints indicated on top, along with the forward and reverse primers used. Primers in (i) exons 1 and 3; (ii) exons 3 and 8; (iii) exons 8 and 12; (iv) exons 12 and 17, with schematic representation of isoforms detected in (iii) and (iv); (v) exons 16 and 30; (vi) exons 29 and 41; (vii) exons 35 and 46; (viii) Gapdh. For (vi) and (vii), the gel images at the bottom show a low-exposure image. C: Schematic representation of the combinations of isoforms identified in CitK and the primers used for RT-PCR analysis (green arrows, F8-RC13 and F8/9-R14). (Left) Gel image showing the expression of the isoform containing the cryptic donor in exon 13, with or without exon 9 included, across embryonic day (E)12, E14, E16, E18 and postnatal day (P)0. (Right) Gel image showing the expression of the isoform containing exon 9, with or without the cryptic donor in exon 13, across E12, E14, E16, E18 and P0. D-E’: IHC with mouse anti-CitK antibody (green) on E12 WT (D) and KO (D’) retinae and E16 WT (E) and KO (E’) retinae. Insets show the localization of CitK (arrow heads). F-F”: IHC with mouse anti-CitK antibody (green) along with rabbit anti-PH3 antibody (red) on E12 retinal sections. DAPI (blue) marks all the nuclei. ONBL – Outer neuroblastic layer; GCL – Ganglion cell layer [204].


Figure 4.2

Figure 4.2. Number of progenitor cells in WT and KO retinae is comparable at E12. IHC on retinal sections obtained from embryonic day (E)12 embryos where the mother was pulsed with EdU one hour prior to harvest. A, D: Retinal progenitor cells (RPCs) positive for Ki67 (green) and PH3 (red) in WT (A) and KO (D). B, E: RPCs positive for EdU (magenta) in WT (B) and KO (E). DAPI (blue) marks all nuclei. C, F: Merged image showing RPCs positive for Ki67 (green) and EdU (magenta) in WT (C) and KO (F). DAPI (blue) marks all nuclei. Inset shows the higher magnification image of the boxed region in the apical end of the ONBL where the solid arrow points to an RPC that is EdU+ and Ki67+. G: Quantification of S-phase RPCs (EdU+ cells) as a percentage of all RPCs (Ki67+ cells) in WT (blue, n=6) and KO (red, n=9). H: Quantification of M-phase RPCs (PH3+ cells) as a percentage of all RPCs (Ki67+ cells) in WT (blue, n=6) and KO (red, n=9). ONBL – Outer neuroblastic layer; GCL – Ganglion cell layer [204].


Figure 4.3


Fig 4.3: Numbers of RPCs and retinal ganglion cells in WT and KO retinae are comparable at E13. IHC on retinal sections obtained from embryonic day (E)13 embryos where the mother was pulsed with EdU at E12. A, D: Retinal progenitor cells (RPCs) positive for Ki67 (green) and PH3 (red) in WT (A) and KO (D). B, E: RPCs positive for EdU (magenta) in WT (B) and KO (E). C, F: Merged image showing RPCs positive for Ki67 (green) and EdU (magenta) in WT (C) and KO (F). DAPI (blue) marks all nuclei. Inset shows the higher magnification image of the boxed region in the apical end of the ONBL where the solid arrow points to an RPC that is EdU+ and Ki67+. G-H”: Retinal sections showing EdU+ cells (magenta) in WT (G) and KO (H) along with IHC for Islet1 (green) in WT (G’) and KO (H’). Shown in G” and H” are the merged images of G & G’ and H & H’, respectively. Inset shows the higher magnification image of the boxed region in the GCL where the solid arrow points to an RGC that is EdU+ and Islet1+. DAPI (blue) marks all nuclei. I: Quantification of S-phase RPCs (EdU+ cells) as a percentage of all RPCs (Ki67+ cells) in WT (blue, n=7) and KO (red, n=10). J: Quantification of M-phase RPCs (PH3+ cells) as a percentage of all RPCs (Ki67+ cells) in WT (blue, n=7) and KO (red, n=10). K: Quantification of retinal ganglion cells (RGCs) born at/after E12 by determining Islet1+ and EdU+ cells as a percentage of all Islet1+ cells within the GCL in WT (blue, n=3) and KO (red, n=7). ONBL – Outer neuroblastic layer; GCL – Ganglion cell layer [204].


Figure 4.4


Fig 4.4: Number of RGCs in WT and KO retinae is comparable at E14, but there are fewer S-phase RPCs in the KO retina. IHC on retinal sections obtained from embryonic day (E)14 embryos where the mother was pulsed with EdU at E12. A, D: Retinal progenitor cells (RPCs) positive for Ki67 (green) and PH3 (red) in WT (A) and KO (D). B, E: RPCs positive for EdU (magenta) in WT (B) and KO (E). C, F: Merged image showing RPCs positive for Ki67 (green) and EdU (magenta) in WT (C) and KO (F). DAPI (blue) marks all nuclei. Inset shows the higher magnification image of the boxed region in the apical end of the ONBL where the solid arrow points to an RPC that is EdU+ and Ki67+. G-H”: Retinal sections showing EdU+ cells (magenta) in WT (G) and KO (H) along with IHC for Islet1 (green) in WT (G’) and KO (H’). Shown in G” and H” are the merged images of G & G’ and H & H’, respectively. Inset shows the higher magnification image of the boxed region in the GCL where the solid arrow points to an RGC that is EdU+ and Islet1+. DAPI (blue) marks all nuclei. I: Quantification of S-phase RPCs (EdU+ cells) as a percentage of all RPCs (Ki67+ cells) in WT (blue, n=3) and KO (red, n=9). Student’s t-test, p=0.01. J: Quantification of retinal ganglion cells (RGCs) born after E12 by determining Islet1+ and EdU+ cells as a percentage of all Islet1+ cells within the GCL in WT (blue, n=6) and KO (red, n=7). ONBL – Outer neuroblastic layer; GCL – Ganglion cell layer [204].


Figure 4.5


Fig 4.5. Reduction in RPCs without change in RGCs between WT and KO retinae at E16. A, A’: TUNEL assay (red) on embryonic day (E)14 WT (A) and KO (A’) retinae. DAPI (blue) marks all the nuclei. B: Quantification showing cell death (TUNEL+ in A & A’) in E14 WT (blue, n=6) and KO (red, n=8) retinal sections. Student’s t-test, p=0.005. C, C’: TUNEL assay (red) on E16 WT (C) and KO (C’) retinae. DAPI (blue) marks all the nuclei. D: Quantification showing cell death (TUNEL+ in C & C’) in E16 WT (blue, n=3) and KO (red, n=6) retinal sections. Student’s t-test, p=0.05. E-F”’: Identity of TUNEL+ cells by IHC for Ki67 (red) and TUNEL (green) in E16 WT (E) and KO (F) retinal sections. The boxed region in E is shown as a high magnification image in E’, with the section function in IMARIS showing the overlap of cells positive for both Ki67 and TUNEL (details in Materials and methods). Boxed regions in F are shown as high magnification images in F’-F”’, with the section function in IMARIS showing the overlap of cells positive for both Ki67 and TUNEL. ONBL – Outer neuroblastic layer; GCL – Ganglion cell layer [204].


Figure 4.6

Fig 4.6. Reduction in the ONBL in the CitK KO retina along with absence of bipolar cells at P2. A - D’: IHC on postnatal day (P)0 retinal sections showing PH3 (green) in WT (A) and KO (A’); Islet1 (green) in WT (B) and KO (B’); Ki67 (green) in WT (C) and KO (C’); Pax6 (green) in WT (D) and KO (D’). E - H’: IHC on P2 retinal sections showing PH3 (red) in WT (E) and KO (E’); Islet1 (green) in WT (F) and KO (F’); Ki67 (green) in WT (G) and KO (G’); Pax6 (green) in WT (H) and KO (H’). DAPI (blue) marks all the nuclei. ONBL – Outer neuroblastic layer; GCL – Ganglion cell layer [204].


Figure 4.7

Fig 4.7. Bipolar cells are not produced in the KO retina at P4. A – B”: IHC on postnatal day (P)4 retinal sections for Islet1 (green) in WT (A) and KO (B) and for Pax6 (red) in WT (A’) and KO (B’). Merged images of A & A’ and B & B’ are shown in A” and B”, respectively. In A and A’, the dashed box marks the Islet1+ & Pax6 –ve BPCs. The arrow heads point to cells with expression of both Pax6 and Islet1. C – D”’: IHC on sections of P4 retinae harvested from rats pulsed with EdU at P0. Expression of Islet1 (green) in WT (C) and KO (D) and EdU detection in WT (C’) and KO (D’). Merged images of C & C’ and D & D’ are shown in C” and D”, respectively. Inset in C” shows a higher magnification image of the boxed region highlighting bipolar cells born at/after P0. DAPI marks all the nuclei. ONBL – Outer neuroblastic layer; GCL – Ganglion cell layer [204].


Figure 4.8


Fig 4.8. KO retina is severely compromised by P14. A – B’: IHC on postnatal day (P)7 retinal sections showing rhodopsin (green) and Pax6 (red) in WT (A) and KO (A’); Islet1 (green) in WT (B) and KO (B’). C – F’: IHC on P14 retinal sections showing red/green opsin (red) in WT (C) and KO (C’); rhodopsin (green) in WT (D) and KO (D’); Pax6 (red) in WT (E) and KO (E’); Islet1 (green) in WT (F) and KO (F’). DAPI marks all the nuclei. ONL – Outer nuclear layer; INL – Inner nuclear layer; GCL – Ganglion cell layer [204].


Figure 4.9

Fig 4.9. Model of the effect of loss of Citron Kinase on RPCs. Shown here is a developmental time line spanning E12 to P4. The top panel reflects the scenario in WT retinae and the bottom panel reflects the scenario in KO retinae. The orange triangle shows the expression levels of CitK as they decrease over time. The brown triangle shows the expression kinetics of a potential compensatory factor (Factor X), which decreases by E14. The circles represent RPCs, where open circles represent CitK -ve RPCs while solid circles (orange) represent CitK+ RPCs. CitK is dispensable for RPCs at E12 but is required for their survival beginning at E14, shown as a reduction in the number of solid circles in the KO scenario. This reduction in CitK+ RPCs results in drastic loss by P0, such that Islet1+ bipolar cells are not produced in the KO retinae [204].


Chapter 5: Global analyses to understand the regulation of transcriptome by alternative splicing

This chapter discusses the global approach to understanding the role of AS in retinal development. To this end, we employed RNAseq and a custom microarray. We also devised a custom bioinformatics pipeline to extract biological meaning from the large dataset obtained from RNAseq. This method was compared to other methods routinely employed by biologists, and we showed that our pipeline supports more biological inferences. Therefore, this strategy was extended to another RNAseq dataset obtained from various developmental time points from wild type and triple microRNA cluster knockout retinae.

5.1 Technologies employed to understand alternative splicing at the global level

Alternative splicing (AS) has been shown to be a key regulatory step in gene expression and is one of the mechanisms that generate cellular and functional complexity in higher eukaryotes. Gene-targeted approaches highlighted the importance of AS in various tissues, but those analyses could only go so far. Therefore, an en masse approach of transcriptome analysis was taken to understand the global picture of the various regulatory networks that govern gene expression as a whole in a tissue-specific manner.

Global analyses of gene expression mostly relied on microarray profiling and the expressed sequence tag (EST) approach based on cDNA sequences, which have shown that about two-thirds of human genes contain one or more alternatively spliced exons. Due to the limited depth of coverage and sensitivity offered at the time by microarray and conventional sequencing technologies, the extent of AS events in humans was not known. Whole-genome tiling arrays were also employed to capture transcriptome complexity, but this technology could not capture splice-junction information. Other methods, such as event-specific arrays, have been used, but they have been hampered by limited specificity, high costs and difficulty in data analysis. The EST approach provided partial sequences of individual cDNA clones but is sensitive to cloning biases. Other technologies that were employed include serial analysis of gene expression (SAGE) [220] and parallel signature sequencing, which also could not provide information on specific splicing events. In 2007, the potential of RNA sequencing (RNAseq) was first demonstrated by polony multiplex analysis of gene expression (PMAGE), which allowed for the detection of as few as 0.3 RNA copies per cell [221]. RNAseq offers increased depth of coverage and sensitivity. Thus, Illumina-based RNAseq technology has become the preferred approach for most biologists who are interested in understanding the transcriptome at single-nucleotide resolution.

5.1.1 Microarray

A microarray is a multiplex array on a solid substrate, which is used as a high-throughput screening technique to measure the expression levels of a large number of genes simultaneously. The concept was first introduced by Tse Wen Chang in 1983, when he illustrated “antibody microarrays” [222]. While DNA microarray technology is the most widely used type of microarray, other types, including protein, peptide, cellular and glycoarrays, are routinely performed. The two platforms currently available for DNA microarrays are glass microarrays, which involve spotting on a glass slide, and high-density oligonucleotide arrays, also called “chips”, which involve in situ oligonucleotide synthesis.

5.1.1.1 Principle

The principle behind DNA microarrays includes three main steps:

1. Array preparation – DNA fragments are usually synthesized and then covalently linked, via surface engineering, to a solid surface such as a silicon chip or glass slide. The spot size is usually less than 200 µm in diameter. Each spot contains a few picomoles of a specific sequence, called a probe.

2. Sample preparation – isolation of DNA or mRNA and preparation of cDNA; these nucleic acids are known as targets.

3. Hybridization and acquisition – the immobilized probes are hybridized to the targets. Probe-target hybridization is detected quantitatively using fluorophore- or chemiluminescence-labeled targets. The signal strength gives a relative measure of gene expression in the sample.

After data acquisition, the data are analyzed using appropriate software, followed by quantification and interpretation.

5.1.1.2 Advantages and Disadvantages

Some of the advantages of microarray include:

• Simultaneous detection of multiple probes

• Initial and faster screening in disease diagnosis

• Gene level and protein level detection makes the gene to protein to disease correlation easier.

• Cost-effective – glass type DNA microarrays are affordable.

• No additional equipment is required – hybridization step does not require any other equipment.

• Increased detection sensitivity – glass microarrays have longer target sequences about 2kb, which makes it more sensitive in probe-target hybridization.

Although this method has advantages and is widely used, it has a few disadvantages, including:

• Labor involved in the synthesis, purification and storage of probes.

• Probe bias, where some probes might bind their targets more strongly, while others might have variations at the single-nucleotide level, which in turn might affect hybridization.

• Cross-hybridization, where homology among closely related gene family members might cause a target to hybridize to a different probe.

• Upregulation or downregulation might not necessarily be reflected by probe-target hybridization.

• Difficulty in detecting alternative splicing events – although splicing arrays are employed, unambiguous identification of novel isoforms is harder with short probe fragments.

5.1.1.3 Applications

Microarrays are used for different kinds of analysis, such as expression analysis of genes in normal versus diseased conditions; mutation analysis to find single nucleotide polymorphisms (SNPs); genomic analysis to find new genes; disease prognosis, especially in cancer, to find markers that are upregulated compared to normal conditions; drug discovery, to identify the protein products of disease genes, which in turn are used to design drugs that target these products; and toxicology, to find the effects of toxins at the cellular level and the changes in the genetic profile caused by exposure to them.

Recent advancements in the field of deep sequencing, especially in RNA sequencing technology, have largely superseded microarray technology. RNAseq provides depth and nuanced information that is not available from microarrays. Also, the bias introduced by probe binding in microarrays is avoided in RNAseq. Additionally, RNAseq can be used to assemble the whole transcriptome de novo for a new organism, which is not possible with microarrays.

5.1.2 Deep sequencing and RNA sequencing

The advent of high-throughput sequencing-based methods has changed the way in which transcriptomes are studied. RNA sequencing (RNAseq) is widely used in transcriptome analysis to accurately measure gene expression levels and isoform expression levels, and to find novel isoforms. RNAseq involves direct sequencing of complementary DNAs (cDNAs) using high-throughput DNA sequencing technologies, followed by mapping of the sequencing reads to the genome. It provides a more comprehensive understanding of the complexity of eukaryotic transcriptomes in that it allows for the identification of novel exons, exon-intron boundaries, transcription start sites [223], and new splicing variants. It also allows for the precise estimation of isoform expression [224-233]. The ability of RNAseq to construct novel full-length isoform sequences, as well as to give their expression levels, has made this technology widely used in the study of en masse transcription regulation.

One of the earliest technologies to obtain a comprehensive profile of the transcriptome, including lowly expressed transcripts, was polony multiplex analysis of gene expression (PMAGE), developed by George M. Church and colleagues [221]. It has been shown to provide accuracy in the assessment of mRNA expression, since individual cDNA molecules are subjected to sequencing without any prior library amplification or sub-cloning. The principle of PMAGE is that individual cDNA template molecules are clonally amplified onto polony beads in millions of parallel, compartmentalized droplets formed in a water-in-oil emulsion. Polony sequencing-by-ligation (SBL) provides an accurate, cost-effective and multiplexed platform for high-throughput DNA sequencing. SBL relies on the high fidelity of DNA ligase, which iteratively labels each bead with a fluorophore encoding the identity of a base within the template. Microscopy-based detection of the fluorescent ligation products allows for the assessment of about 5 million cDNA molecules per run, and digital quantification allows for the analysis of a broad range of mRNA expression.

Another form of next-generation sequencing was 454 sequencing, which works on the principle of pyrosequencing. This technology was developed by 454 Life Sciences. It is quite similar to PMAGE, in that DNA is amplified in an emulsion. This method uses luciferase to generate light for detection of individual nucleotides added to the nascent DNA, and these data are used to generate the nucleotide read-out of the sequences. It provides longer reads than Illumina but does not offer the same depth of coverage. While there are many different platforms for sequencing, the most commonly used platform is Illumina.

5.1.2.1 Illumina (Solexa) sequencing

Illumina sequencing technology was invented by Shankar Balasubramanian and David Klenerman, who jointly founded Solexa in 1998 [234]. This sequencing is based on reversible dye-terminator technology. The method involves the formation of DNA clusters through clonal amplification of DNA. It generates shorter reads, but the depth of coverage is much greater than with other methods.

Sample preparation: Samples are prepared using a two-step process: (1) tagmentation and (2) reduced-cycle amplification. Tagmentation is the process in which DNA is fragmented and tagged by transposomes, so that specific adaptor sequences are attached to the 3’ and 5’ ends of the resulting fragments. Subsequently, additional motifs are added through these adaptor sequences in the process called reduced-cycle amplification. These motifs include sequencing primer binding sites, indexes, and oligo sequences for hybridization in the flow cell [235].

Cluster generation: This step employs a cluster station. It comprises a flow cell, which can hold up to eight samples (one per lane of the flow cell). The denatured DNA template is loaded onto the flow cell, where it anneals to the oligos that are already bound to the flow cell surface. Once the template is bound, a complementary strand is synthesized by extending these oligos, creating a double-stranded template in which the new strand is covalently attached to the flow cell surface. The double-stranded template is then denatured and the original DNA template is washed away. The strand that is left then bends over to hybridize with a neighboring lawn oligo and is copied. This is called bridge amplification, and it is repeated for many cycles to create an exponential amplification of these strands [236].

Sequencing: For sequencing, the reverse strands are cleaved and washed away. The 3’ ends of the forward strands are also blocked. The read-1 primer is added, which binds to its binding site on the forward strand. Fluorescently labeled dNTPs with temporarily blocked 3’ hydroxyl groups are added. With the incorporation of each nucleotide, fluorescence is emitted following excitation by a light source, and the emission wavelength indicates which nucleotide has been added to the read. Following this, the temporary block at the 3’ hydroxyl group is removed, allowing for the incorporation of another nucleotide, and the process is repeated. After the read length is reached, the read product is denatured and washed away. Then, an index-1 primer is added to the flow cell and the first index read is generated in the same way as read 1. In the case of paired-end reads, once the forward strand is sequenced, the 3’ ends are unblocked and the index-2 read is generated. Once it is generated, the excess is washed away. Polymerase then generates a complementary strand and the resulting double strand is denatured. This time, the forward strands are cleaved and washed away and the 3’ ends of the reverse strands are blocked to inhibit hybridization. Then, the read-2 primer is added, followed by fluorescently labeled dNTPs, and the above-mentioned process is repeated for this strand [231, 237].

Reads generated from pooled samples are sorted based on the unique indexing sequences with which they were labeled during sample preparation. Using paired-end reads increases the alignment depth, thereby reducing ambiguity within the alignment.

Data analysis: During the analysis of the large dataset, the reads obtained are mapped to the reference genome. Once the alignment is done, the data can be interrogated for novel isoforms, splicing events, single nucleotide polymorphisms, insertions and deletions, microRNAs, long non-coding RNAs, etc. [231].

5.2 Customized bioinformatics pipeline and microarray to understand the biological processes underpinning normal and aberrant retinal development

5.2.1 Rationale:

One of the long-standing goals in the retinal field has been to obtain the complete repertoire of the transcriptome underpinning various biological processes and to understand how the transcriptome shifts either during developmental progression or in a pathological condition. To this end, we employed RNA sequencing technology on fractionated developing mouse retina and devised a custom bioinformatics strategy to extract biological meaning from the large dataset. The fractionation facilitated the interrogation of the cytoplasmic transcriptome separately from that of the nucleus, thereby allowing us to check whether the nuclear transcriptome was preparing for the next molecular event while transcribing genes for the current step, while the cytoplasmic transcriptome would mainly reflect transcripts that will be translated into proteins. Also, to understand the degree of alternative splicing usage by genes expressed in the developing retina, we estimated the expression levels of the constituent isoforms of genes to inform the gene expression estimates.

Also, in parallel, we devised a custom unique splice junction microarray, which contained the unique splice junctions for isoforms of genes expressed in the RNAseq data. The pipeline was extended to Nrl wild-type and knockout mouse retinae (Courtesy: Dr. Anand Swaroop, National Eye Institute) and compared to another methodology that is generally used by other biologists, to find the best way of analyzing large datasets.

5.2.2 Background

The retina has been the most accessible part of the developing central nervous system, with a wealth of information on the detailed birth order of its cell types and on many genes involved in executing specific programs such as cell cycle regulation, cell fate determination, and neuronal differentiation. However, a comprehensive gene regulatory network has still not been achieved, as a gene-centric approach can only go so far. To address this issue, transcriptome capture to identify co-transcriptionally regulated genes across retinal development has previously been attempted and was of great value [238]. However, these efforts were hampered by the lack of depth of the captured transcriptome and the lack of fractionation to gain higher resolution. Another concern was that at any given time the retina consists of different cell types with varied transcriptomes, which makes it difficult to find meaning in co-transcriptionally regulated genes. We wanted to investigate whether higher depth of the captured transcriptome through RNA-Seq with minimal cross-compartment (nucleus-cytoplasm) normalization could resolve this issue.

Here we report analysis of RNA-Seq data from the cytoplasmic and nuclear transcriptomes of the developing retina. We show that combinatorial use of RNA-Seq with our custom bioinformatics strategy revealed the precise order of gene activation and transitions in processes during retinal development. Transition in gene expression was validated and resolved at the isoform level through our custom microarray. Importantly, we show proof of principle by extending our methodology to analyze RNA-Seq data from P21-Nrl-WT and KO retinae. Our approach, which focuses on understanding the temporal progression in gene expression during normal/aberrant development, can be extended to development and disease progression of other tissues.

5.2.3 Results

5.2.3.1 RNA-Seq of fractionated retina

RNA for deep sequencing was obtained from retinae at E16, as it is the midpoint in embryonic development, and at P0, as it is a major transition in development. RNA-Seq was performed on rRNA-depleted RNA captured from the cytoplasmic extract (CE) of E16 and P0 along with the nuclear extract (NE) of P0. The rationale was that by comparing the CE across development, we would capture mRNAs that were most likely translated into proteins, which in turn would reveal transitions in biological processes during retinal development. Also, comparing the cytoplasmic transcriptome minimizes the contamination of unspliced transcripts contributed by the nuclear fraction, which might inflate the FPKM values of isoforms and in turn the gene expression. On the other hand, comparison of P0CE to P0NE would reveal transcriptome dynamics within a time point.

The two-way comparison (time and fraction) would capture change in transcription kinetics with three distinct sets of transcripts: transcripts in both fractions; transcripts exclusively in the CE; and transcripts exclusively in the NE. Overall, RNA deep sequencing yielded 99.28, 117.38, and 127.46 million paired-end reads from E16CE, P0CE, and P0NE, respectively. An important decision for RNA-Seq analysis is setting the threshold for gene expression, which in the field ranges from 0.3–1.0 FPKM [239]. This range suggests that the threshold for any dataset must be vetted through empirical evidence. Thus, we interrogated a range of FPKM values to set the threshold for gene expression in the retina. We found that 1 FPKM was the appropriate value, as genes with known retinal expression were above this threshold, hence considered expressed. In contrast, skeletal muscle-specific genes were below 1 FPKM, and were considered not expressed in the retina (Table 5.1).

Table 5.1: Skeletal muscle-specific genes expression. The table shows FPKM values of skeletal muscle-specific genes in E16CE, P0CE and P0NE samples.

Skeletal-muscle gene    E16CE FPKM    P0CE FPKM    P0NE FPKM
Tnnt3                   0.10          0.11         0.00
Tnnt1                   0.94          0.50         0.50
Tnni3k                  0.00          0.02         0.10
Tnni2                   0.30          0.21         0.20
Tnni1                   0.10          0.90         0.44
Tnnc2                   0.00          0.00         0.10
Tnnc1                   0.63          0.95         0.91
Tnn                     0.00          0.00         0.10

While the low-level expression (<1 FPKM) of skeletal muscle-specific genes might have a yet-to-be-identified biological function in the retina, we reasoned that in the absence of any literature support it would be safe to consider them as not expressed, to ensure high specificity of our analysis at the cost of a possible slight loss in sensitivity. Once the threshold was set, the reads were subjected to our custom bioinformatics pipeline as described in materials and methods. The output of mapping and gene expression quantification was reported in Fragments Per Kilobase of transcript per Million mapped reads (FPKM) units.
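The FPKM unit and the 1 FPKM expression threshold can be illustrated with a minimal Python sketch. This is not the pipeline described in materials and methods; the function names `fpkm` and `is_expressed` are hypothetical, and the treatment of a value exactly at the threshold as expressed is an assumption. The FPKM values are those of Table 5.1.

```python
# Illustrative sketch only; not the dissertation's actual pipeline.
EXPRESSION_THRESHOLD = 1.0  # FPKM cut-off adopted in the text


def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def is_expressed(value, threshold=EXPRESSION_THRESHOLD):
    # Assumption: a value exactly at the threshold counts as expressed.
    return value >= threshold


# FPKM values from Table 5.1, in the order (E16CE, P0CE, P0NE).
TABLE_5_1 = {
    "Tnnt3": (0.10, 0.11, 0.00),
    "Tnnt1": (0.94, 0.50, 0.50),
    "Tnni3k": (0.00, 0.02, 0.10),
    "Tnni2": (0.30, 0.21, 0.20),
    "Tnni1": (0.10, 0.90, 0.44),
    "Tnnc2": (0.00, 0.00, 0.10),
    "Tnnc1": (0.63, 0.95, 0.91),
    "Tnn": (0.00, 0.00, 0.10),
}

# Expression call per gene and sample; every call is False at 1 FPKM.
muscle_calls = {
    gene: [is_expressed(v) for v in values]
    for gene, values in TABLE_5_1.items()
}
```

With these definitions, a hypothetical 2 kb transcript covered by 100 fragments in a library of 50 million mapped fragments sits exactly at 1 FPKM, which gives a sense of how few fragments separate an expressed from a non-expressed call at this depth.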

5.2.3.2 Validation of RNA-Seq data

A detailed explanation of how to run the scripts in the customized bioinformatics pipeline and how to perform the downstream analysis is given in Appendix 1. We used genes with established expression kinetics to objectively assess the sensitivity of RNA sequencing [240]. For example, Fgf15, Sfrp2, Atoh7 and Irx4 are known to have higher expression levels at E16 than at P0, which was reflected in E16CE compared to P0CE data (Fig. 5.2A) [238, 241-245]. Likewise, expression of Fabp7, Gngt2, Nr2e3, Nrl, and Rho was as predicted in that it was higher in P0CE compared to E16CE (Fig. 5.2B) [238, 246-252]. Finally, Pax6 showed little variation between E16CE and P0CE (Fig. 5.2B), which was also as expected [253, 254]. The transcriptional kinetics of some of these genes was independently validated by qPCR analysis across retinal development (E14, E16, E18, P0, P2, P4, P10, P25), thereby confirming the robustness of both the RNA-Seq data and the bioinformatics approach used to assign expression and binning [255].

We also used the same paradigm of genes with established expression kinetics to determine the level of cross-contamination between P0CE and P0NE RNA. To determine the level of nuclear RNA contaminating the cytoplasmic extract, we checked the expression of genes whose transcripts are predominantly nuclear, such as Xist, Malat1, Tsix and Neat1 [255-258]. Indeed, for all four genes, the FPKM values were significantly higher in the NE compared to the CE (Fig. 5.2C). Determining the level of cytoplasmic RNA contamination in the nuclear extract presented a unique challenge, as the majority of the RNA in the CE would be expected to be in the NE. For this, we examined expression of replication-dependent histone genes, as they are intronless and are known to be efficiently exported to the cytoplasm [259]. Indeed, histone genes had higher FPKM in the CE than in the NE (Fig. 5.2D). Furthermore, replication-dependent histone genes also serve to account for genomic DNA contamination. These genes lack introns and do not require splicing, so histone genomic DNA would be read as histone mRNA and inflate FPKM values in the NE, which was not the case (Fig. 5.2D), thus confirming minimal genomic contamination. Finally, genomic DNA contamination would result in FPKM values >0 for all genes; however, we observed 0 FPKM in the NE for a large number (804) of genes. In all, these controls suggest that there was minimal genomic DNA contamination in our fractionated NE RNA-Seq data.

5.2.3.3 RNA-seq revealed high-resolution transcription kinetics

We wanted to test the fidelity of our approach in capturing in vivo kinetics. Inherent to our binning strategy is the identification of co-transcriptionally regulated genes. Therefore, the aforementioned binning strategy (Fig. 5.1) was employed for the E16CE vs. P0CE comparison and the P0CE vs. P0NE comparison to extract the transcription kinetics. For example, genes in the E16CE_Only bin had FPKM below threshold in P0CE. This suggests that transcription of these genes was initiated at/before E16 and was downregulated after E16 or just before P0. Genes in the P0CE_Only bin were transcribed after E16, but before P0. Genes in the OR_E16CE bin suggest that their transcription was initiated at/before E16 and downregulated after E16, but before P0, such that their FPKM was not below threshold in P0CE. Overall, 12,041 genes were expressed (Additional file 1: Figure S2), of which 10,369 were non-differentially represented (Non_DR) between E16CE and P0CE (Additional file 2: Table S1.1, S1.2, Additional file 1: Figure S2). Further analysis of alternative splicing status showed that genes in the Non_DR bin were alternatively spliced at a higher level (42%) compared to those undergoing transcriptional change (37%, Additional file 1: Figure S2).
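The logic of the bins named above can be sketched in Python as follows. The 1 FPKM threshold comes from the text, but the criterion used here to call a gene over-represented (a two-fold difference, `min_fold=2.0`) is purely an assumption for illustration; the actual pipeline may rely on a statistical test instead. The function name `assign_bin` is hypothetical.

```python
def assign_bin(fpkm_a, fpkm_b, name_a="E16CE", name_b="P0CE",
               threshold=1.0, min_fold=2.0):
    """Assign one gene to a bin from its FPKM in two samples.

    min_fold is a hypothetical over-representation criterion, used
    here only to make the sketch concrete.
    """
    a_on = fpkm_a >= threshold
    b_on = fpkm_b >= threshold
    if not a_on and not b_on:
        return "No_Ex"                 # below threshold in both samples
    if a_on and not b_on:
        return f"{name_a}_Only"        # expressed in the first sample only
    if b_on and not a_on:
        return f"{name_b}_Only"        # expressed in the second sample only
    if fpkm_a >= min_fold * fpkm_b:
        return f"OR_{name_a}"          # over-represented in the first sample
    if fpkm_b >= min_fold * fpkm_a:
        return f"OR_{name_b}"          # over-represented in the second sample
    return "Non_DR"                    # expressed in both, no marked change


example_bins = [
    assign_bin(5.0, 0.2),   # E16CE_Only
    assign_bin(0.5, 2.0),   # P0CE_Only
    assign_bin(10.0, 3.0),  # OR_E16CE
    assign_bin(4.0, 5.0),   # Non_DR
]
```

The same function can be reused for the within-time-point comparison by passing `name_a="P0CE"` and `name_b="P0NE"`.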

Likewise, the binning strategy was employed with the P0CE-P0NE comparison, which showed an increase in the number of expressed genes, which was mostly accounted for by the 2007 genes in the P0NE_Only bin (Additional file 1: Figure S2). Investigation of the alternative splicing status showed that genes over-represented in P0NE (OR_P0NE) employed the highest degree of alternative splicing (68%) compared to other bins including OR_P0CE (36%) and Non_DR (52%) (Additional file 1: Figure S2). Next we combined E16CE-P0CE transcription kinetics with those observed in the P0CE-P0NE comparison. Briefly, we took genes in a bin from the E16CE-P0CE comparison and interrogated their distribution in the different bins in the P0CE-P0NE analysis, which yielded high-resolution transcription kinetics. The term “high-resolution transcription kinetics” encapsulates both temporal and detection sensitivity. For example, amongst the genes in the not expressed (No_Ex, 22,331) and E16CE_Only (632) bins, 1628 and 379 were detected above threshold in the P0NE_Only bin, respectively (Fig. 5.2E and 5.2J). Similarly, 86 genes from the 384 genes in the OR_E16CE bin and 1710 genes from the 10,369 genes in the Non_DR bin were upregulated in the P0NE compared to the P0CE (Fig. 5.2F and G). In contrast, 35 genes of the 255 genes in the OR_P0CE bin were downregulated in P0NE compared to P0CE, while FPKM for 13 genes was below threshold (Fig. 5.2H). Likewise, 11 genes of the 401 genes found in P0CE_Only in the E16CE-P0CE comparison were downregulated in P0NE compared to P0CE and 94 genes had FPKM below threshold (Fig. 5.2I). Overall, there were 2007 genes with transcripts exclusively in P0NE, of which 1084 were protein coding genes, 582 were Gm clones, 214 were Riken clones, and the rest were non-coding RNA genes (Table 5.2).
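Combining the two comparisons amounts to cross-tabulating, for every gene, its bin in E16CE-P0CE against its bin in P0CE-P0NE. A minimal sketch of that bookkeeping is shown below; the function name `cross_tabulate` and the gene identifiers are hypothetical placeholders, not data from this study.

```python
from collections import Counter


def cross_tabulate(bins_first, bins_second):
    """Count genes for each (bin in first comparison, bin in second) pair.

    Both arguments map a gene identifier to its bin label; only genes
    present in both mappings are counted.
    """
    shared = bins_first.keys() & bins_second.keys()
    return Counter((bins_first[g], bins_second[g]) for g in shared)


# Hypothetical toy input, for illustration only.
e16ce_vs_p0ce = {
    "geneA": "No_Ex",
    "geneB": "E16CE_Only",
    "geneC": "Non_DR",
    "geneD": "No_Ex",
}
p0ce_vs_p0ne = {
    "geneA": "P0NE_Only",
    "geneB": "P0NE_Only",
    "geneC": "OR_P0NE",
    "geneD": "No_Ex",
}

kinetics = cross_tabulate(e16ce_vs_p0ce, p0ce_vs_p0ne)
```

Each key of `kinetics` corresponds to one trajectory of the kind reported above, for example genes that were not expressed in either cytoplasmic sample but were detected only in the P0 nuclear extract.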

Table 5.2: Distribution of 2007 transcripts in P0NE_Only sample

Type of RNA                                  # genes
Protein coding genes                         1084
Gm clones                                    582
Riken clones                                 214
Miscellaneous/Anti-sense/non-coding/rRNA     35
Pseudogenes                                  33
snoRNA                                       21
microRNA                                     27
lincRNA                                      5
snRNA                                        6

5.2.3.4 Transcriptionally coupled genes revealed molecular programs in the developing retina

Our objective here was to employ RNA-Seq to find transcriptionally coupled genes so that we could leverage them to discover molecular programs being employed during retinal development. For this, genes were subjected to DAVID analysis (Fig. 5.3A-I). Interestingly, our first submission of genes (632; Fig. 5.2E) in the E16CE_Only bin to DAVID did not enrich for any statistically significant (Benjamini <0.05) functions (Fig. 5.3B). However, other bins, with fewer or more genes than E16CE_Only, yielded many functions (Fig. 5.6, Additional file 3: Table S2.1). For example, genes in the OR_P0CE bin enriched for 7 functions, of which the top hit was “visual perception” (Fig. 5.3B, Additional file 3: Table S2.1), showing that this function was initiated just before birth. This showed that transcriptionally coupled genes could inform biological processes that were executed at that developmental timepoint.
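The significance criterion used here (Benjamini <0.05) can be illustrated with a short sketch, assuming that the “Benjamini” value corresponds to the standard Benjamini-Hochberg adjusted p-value. The function name and the example p-values are hypothetical and are not taken from the DAVID output of this study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw enrichment p-values for four annotation terms.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
significant = [a < 0.05 for a in adj]
```

In this toy input three terms have raw p-values below 0.05, but only the first remains significant after adjustment, which is why a bin can fail to enrich for any function at this cut-off even when individual terms look promising.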

One caveat to our binning strategy was that while it grouped genes based on transcription kinetics, it separated genes participating in a common biological process (identified by DAVID) into different bins. This in turn would prevent extraction of the expression kinetics of genes known to participate in executing a biological process of interest. To address this issue, we devised a subtractive iterative approach using GeneMANIA (Fig. 5.3C-II) to identify genes participating in a common biological process from the different bins. For example, 17 genes in the OR_P0CE bin (E16CE-P0CE) that enriched for visual perception by DAVID were used as bait in GeneMANIA, followed by the aforementioned subtractive iterations (Fig. 5.3C; 1x-3x) to generate a final list of 36 genes (Fig. 5.3C, Right). Subsequently, each gene was assigned to its respective bin in both E16CE-P0CE and P0CE-P0NE comparisons (Fig. 5.3D-III). Redistribution of the genes into their respective bins is shown in Fig. 5.3E. For visual perception, the transcription of Rdh5, which converts all-trans retinal to 11-cis retinal [260], was initiated at/before E16, shut down just before birth and initiated again at birth, as we find it in the P0NE_Only bin in P0CE-P0NE (Fig. 5.3E). In contrast, Crb1, Cngb3, Pcdh15 and Rgs9, which play a role in phototransduction [261, 262] and structural support/maintenance of photoreceptors [263, 264], were transcribed at/before E16 and upregulated at P0, as they were over-represented in P0NE in P0CE-P0NE (Fig. 5.3E). Rp1, which is a photoreceptor-specific microtubule-associated protein [265], was the only gene that was transcribed at P0 (P0CE_Only in E16CE-P0CE) and continued to be upregulated, as it was over-represented in P0NE in P0CE-P0NE (Fig. 5.3E). Transcription of Guca1a was initiated between E16 and P0 (P0CE_Only in E16CE-P0CE), except it was turned off before P0 (P0CE_Only in P0CE-P0NE) (Fig. 5.3E). Through this method we were able to deconstruct the precise activation of genes involved in many aspects of vision acquisition/phototransduction during embryonic development.

The same analysis was performed on all the bins in the P0CE-P0NE comparison. Specifically, genes in the OR_P0NE bin showed enrichment for 120 GO terms, one of which was “synapse” (Additional file 3: Table S2.2). This functional enrichment agrees with studies showing that synaptogenesis occurs postnatally in the rodent retina [266]. Further analysis of genes underlying the GO term “synapse” showed transcription initiation of the AMPA receptor subunit genes, including Gria1, Gria2 and Gria4, before/at E16 (Non_DR in E16CE-P0CE) (Additional file 1: Figure S3). Similarly, transcription of Grik2, which encodes a subunit of the ionotropic kainate receptor, was also initiated before/at E16 (Additional file 1: Figure S3). Gad2, which is necessary for the production of the inhibitory neurotransmitter GABA, was transcribed before E16, while Gad1 transcription was initiated after E16 prior to birth (Additional file 1: Figure S3). Overall, genes involved in presynaptic activity were activated mostly during embryonic development (Additional file 1: Figure S3). In contrast, genes involved in postsynaptic activity had overlapping transcriptional activation, with a subset of genes (Grid2, Grid1, Grik5, Grin3a, Ryr2 and Shank2) that were specifically activated in P0NE (Additional file 1: Figure S3).

Finally, employing the same analysis for genes in P0NE_Only, which reflected de novo transcription, enriched for 14 GO terms, where voltage-gated calcium ion channel activity was one of the top hits (Additional file 3: Table S2.2). Again, this enrichment agreed with previous studies showing that calcium channel activity is crucial for the construction of functional synapses, which occurs postnatally [266].

5.2.3.5 Extending the analysis to other time points through the custom microarray

To confirm our RNA-Seq findings and extend our analysis across retinal development, we leveraged our RNA-Seq data to design a custom microarray. The array was designed to validate isoform kinetics en masse by assaying for unique exon-exon junctions of a subset of genes across retinal development. The junctions were selected based on the following criteria: 1) the gene must have more than one isoform expressed in the RNA-Seq data; and 2) the exon-exon junction must be unique, such that it is not found more than once among all of the isoforms for that gene in the Ensembl database. In all, the microarray had 28,575 probes for 5581 genes and was employed to interrogate expression of these isoforms in the cytoplasmic transcriptomes of E12, E16, E18, P0, P4, P10 and P25 retinae. Data obtained were subjected to K-means clustering with the Genesis software, which was set to generate 10 clusters (Additional file 4: Table S3). Based on the overall patterns across time, the clusters were organized into three groups: embryonic, embryonic + postnatal and postnatal clusters (Fig. 5.4A-C), subsequently referred to as groups 1, 2, and 3, respectively.
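The two junction-selection criteria can be sketched as follows. This is an illustrative reconstruction under simplifying assumptions, not the script used to design the array: isoforms are represented as ordered lists of exon identifiers, a junction is a pair of consecutive exons, and the function name and the toy isoforms are hypothetical.

```python
from collections import Counter


def unique_junctions(annotated_isoforms, expressed_isoforms):
    """Return, per expressed isoform, the junctions found in no other isoform.

    annotated_isoforms: dict mapping isoform ID -> ordered list of exon IDs
                        (all annotated isoforms of one gene).
    expressed_isoforms: set of isoform IDs detected in the RNA-Seq data.
    """
    # Criterion 1: more than one isoform of the gene must be expressed.
    if len(expressed_isoforms & annotated_isoforms.keys()) < 2:
        return {}
    # Count in how many annotated isoforms each junction occurs.
    counts = Counter()
    for exons in annotated_isoforms.values():
        counts.update(set(zip(exons, exons[1:])))
    # Criterion 2: keep junctions that occur in exactly one isoform.
    selected = {}
    for iso_id in sorted(expressed_isoforms & annotated_isoforms.keys()):
        exons = annotated_isoforms[iso_id]
        own = [j for j in zip(exons, exons[1:]) if counts[j] == 1]
        if own:
            selected[iso_id] = own
    return selected


# Hypothetical gene with three annotated isoforms, two of them expressed.
toy_gene = {
    "iso1": ["e1", "e2", "e3"],
    "iso2": ["e1", "e3"],
    "iso3": ["e1", "e2", "e3", "e4"],
}
probes = unique_junctions(toy_gene, {"iso1", "iso2"})
```

In this toy gene only the exon-skipping junction of `iso2` qualifies, since both junctions of `iso1` are shared with `iso3`; an isoform without any unique junction therefore cannot be assayed directly by such a probe.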

Next, we applied DAVID analysis to each cluster and found that embryonic clusters (Clusters 1 and 2) enriched for functions such as cell cycle regulation and cell projection organization (Fig. 5.4D). For the embryonic + postnatal clusters (Clusters 3 - 9), the functional GO terms that were enriched were those required for cell cycle regulation and terminal differentiation of neurons, such as vesicle-mediated transport, synapse formation, negative regulation of apoptosis, and axon guidance (Fig. 5.4E). Finally, the sole postnatal cluster (Cluster 10) was the only one that enriched for functions such as visual perception, photoreceptor cell differentiation and sensory perception of light (Fig. 5.4F). In all, the isoform-specific microarray confirmed RNA-Seq findings and further revealed the complexity of alternative splicing employed by the developing retina.

Thus, binning of the RNA-Seq data was crucial for the identification of co-transcriptionally regulated genes, which in turn gave us the temporal transitions in biological programs enriched by the transcriptionally coupled genes (Fig. 5.6).

5.2.3.6 Comparison with other analysis methods

Our analysis pipeline is characterized by two main features, namely the temporal nature of the analysis and the binning strategy, which allow us to perform functional analysis on genes with very specific expression kinetics. The current norm in the literature for next-generation sequencing-based functional analysis is simple differential expression (DE) analysis, where the whole list of DE genes is fed into DAVID or a similar functional annotation analysis platform. The list of DE genes can sometimes be too large, exceeding the input size limit of the functional analysis tool. To overcome this problem, a sub-list is sometimes selected based on prior knowledge of the gene functions [267]. Most analyses are also done in a static manner, where samples represent two conditions at the same time point [268]. In this section, we study the effect of binning and temporal analysis on the result by varying the analysis strategy. We first apply our analysis strategy to RNA-Seq data from P21-Nrl-WT and P21-Nrl-KO (Courtesy Dr. Anand Swaroop; NEI) [268]. Then we apply three variants of the analysis pipeline, listed in Table 5.3, to the same data, and we compare the results.

Table 5.3: The table shows our analysis strategy (marked with an asterisk; highlighted in grey in the original table) and the three other analysis variants we compared with.

Temporal analysis coupled with binning strategy*     |  Static analysis coupled with binning strategy
Temporal analysis coupled with simple DE analysis    |  Static analysis coupled with simple DE analysis

5.2.3.7 Temporal analysis combined with static analysis of Nrl WT and KO RNA-Seq is more informative

The loss of Nrl results in a cell-fate switch from rod to cone photoreceptors [269]. This made the Nrl-KO an ideal system to test our hypothesis that temporal comparison would yield more information than static analysis. First we performed a static comparison between P21-Nrl-WT and P21-Nrl-KO data (Additional file 1: Figure S4, Additional file 5: Table S4.1), similar to the one previously reported [269]. The objective of transcriptomic analysis of wild-type and knockout tissue is to find genes undergoing change in order to reveal the biological change that results from the absence of that gene. Surprisingly, DAVID analysis of genes undergoing dynamic changes in gene expression in the static comparison of P21-Nrl-WT and KO enriched for a couple of generic functions that could not give any meaning in terms of the knockout phenotype (Additional file 3: Table S2.5).

Our RNA-Seq analysis of data from the E16CE to P0CE and P0CE to P0NE comparisons showed that comparison across time (∆/time) was crucial in revealing biologically relevant meaning from co-transcriptionally regulated genes. Therefore, we introduced the variable of time by comparing P21-Nrl-WT and P21-Nrl-KO data separately to our P0 data (P0CE + P0NE) (Additional file 1: Figure S4, Additional file 5: Table S4.2, S4.3). The rationale was that ∆/time would reveal unique sets of genes in the P0 vs. P21-Nrl-KO comparison, which in turn would reveal changes in the biological processes. In both comparisons, there were several bins with co-transcriptionally regulated genes (Additional file 1: Figure S4), which is not surprising considering the major developmental shift from newly born developing retinae at P0 to fully functional P21 retinae. Also, DAVID analysis yielded significant GOterms for all the bins in the temporal analysis (Additional file 3: Table S2.3, S2.4). Moreover, there were many common functions in the P21_Only bin (P21WT_Only and P21KO_Only) in both analyses. The most relevant issue was to ascertain the overlap in the identity of genes for the same process in the P21_Only bin in both sets of comparisons. For example, one of the common functions in the P21_Only bin was “visual perception”. Interrogating the genes underlying the GO term “visual perception” in these bins showed that 21 genes were common to both the P0-P21WT and P0-P21KO analyses. However, 3 genes (Gnat1, Gucy2f, Rpgr) were found only in the P0-P21WT comparison, and one gene (Glra1) was specific to the P0-P21KO comparison (Fig. 5.5, left). The three genes unique to the WT comparison are known to operate specifically in rod photoreceptors, which are absent in the Nrl-KO retina [270-272]. On the other hand, Glra1 is an important gene for cone-bipolar cells, which might be undergoing adaptive changes in the Nrl-KO retina [273]. The GO terms enriched by genes in the P21_Only bin in the P0 vs. P21-Nrl-KO comparison were also informative. One of the functions enriched in DAVID was “regulation of blood pressure” (Fig. 5.5, right). Analysis of the function of the genes underlying this enrichment revealed that most of them were engaged in vasodilation, suggesting that the Nrl-KO retina was undergoing vasculature restructuring/dilation as a secondary effect of the cell-fate switch. Moreover, a recent report showed that the Nrl-KO retina develops dilated retinal blood vessels and leakage at P60 [274]. This showed that our gene expression/binning strategy captured the molecular signature at P21 for a phenotype that manifests histologically at P60.
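The overlap analysis of the “visual perception” gene lists can be illustrated with plain set operations. The following is a minimal sketch, not the code used in this study: the four named genes come from the text, whereas the 21 shared genes are not listed here and are therefore represented by hypothetical placeholder names.

```python
# Hypothetical placeholders for the 21 shared genes, which the text does not name.
shared_genes = {f"shared_gene_{i:02d}" for i in range(1, 22)}

# Genes underlying "visual perception" in the P21_Only bin of each comparison.
wt_visual_perception = shared_genes | {"Gnat1", "Gucy2f", "Rpgr"}  # P0 vs. P21-Nrl-WT
ko_visual_perception = shared_genes | {"Glra1"}                    # P0 vs. P21-Nrl-KO

common = wt_visual_perception & ko_visual_perception   # present in both comparisons
wt_only = wt_visual_perception - ko_visual_perception  # lost in the knockout
ko_only = ko_visual_perception - wt_visual_perception  # gained in the knockout

print(len(common), sorted(wt_only), sorted(ko_only))
```

Under these assumptions the WT list contains 24 genes and the KO list 22, of which 21 are shared, so the informative genes are the few that fall outside the intersection.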

5.2.3.8 Binning vs. DE based analysis of Nrl WT and KO data

Here we have introduced a new strategy for RNAseq data analysis. To understand the advantages afforded by this approach, we compared our binning method to the more commonly used DE method. First, we analysed the static comparison between Nrl-WT and Nrl-KO data by the binning method and the DE-based method. The DE-based method showed a large number (1116) of differentially expressed genes (Additional file 6: Table S5.1). Similar data were obtained by the binning method, except that the DE genes were now in bins labelled OR and Only, which reduced the large list of DE genes into manageable quanta. Moreover, the inherent value of the bins is the inferred transcriptional kinetics. However, the binning method has a cost, which is reflected in the few statistically significant biological functions enriched by DAVID analysis. In contrast, the DE-based method generates a large list, which requires the investigator to decide the fold-change that might be relevant for his/her study. The one advantage of the DE method is that it produces a wide range of functional enrichments (12 GOterms) for the DE gene list in DAVID analysis (Additional file 7: Table S6.3).

Next, we analyzed the temporal comparisons P0 vs. P21 Nrl-WT and P0 vs. P21 Nrl-KO by both the binning method and the DE-based method (Additional file 6: Table S5.2, S5.3). Here the binning method revealed gene transcription kinetics across time, which is inherently valuable for deciphering developmental changes in normal and aberrant situations. The DE-based method produced a large DE gene list, but it did not by itself reveal how gene transcription changed over time (Additional file 7: Table S6.1, S6.2). Thus, it would require further deconstruction of the list, which is not necessary with the binning strategy. Also, DAVID analysis of the binned data provided enrichment of biological functions tethered to the specific gene expression profile. This is of great value, as one of the central goals of performing RNAseq is to capture transcriptome change over time (Additional file 3: Table S2.3, S2.4). In all, the two approaches of RNAseq analysis have merits that can be leveraged to effectively analyze RNAseq data.

5.2.4 Discussion

Amongst the co-transcriptionally regulated genes identified by our binning strategy, genes that remained transcriptionally unaltered employed a higher degree of alternative splicing than those undergoing dynamic regulation. This suggests that during development the major transcription initiations and terminations might lay down the foundation, while the proteome diversity generated through alternative splicing might be engaged in resolving the finer details, such as neuronal subtype specification and terminal differentiation.

Another advantage of the binning strategy was that the large transcriptome data was quantized, allowing us to interrogate the dataset for genes with established expression kinetics. The purpose of this was to challenge the binned data generated based on the 1 FPKM threshold for known gene expression patterns (Fig. 5.2A&B).

Once the dataset was validated, we interrogated the status, in the P0CE-P0NE comparison, of genes from the various bins of the E16CE-P0CE comparison, which revealed dynamic transitions in the transcription kinetics within the same developmental stage. For example, the 632 genes in the E16CE_only bin in the E16CE-P0CE comparison bifurcate into the No_Ex and P0NE_Only bins in the P0CE-P0NE comparison (Fig. 5.2E). This suggests that while some genes remain off, some are re-initiated, and it predicts that these transcripts will appear in the CE of the next developmental stage. This was confirmed by our quantitative PCR analysis for Nr2e3, Nrl and Rho across postnatal retinal development [255]. The presence of transcripts for 1084 protein coding genes in the P0NE_Only bin suggested that we had captured de novo transcription of genes that might be required for the next developmental program. For example, Ces5a is a ~36 kb gene that has an FPKM below threshold in E16CE and P0CE, but an FPKM of 161 in P0NE. This value is much higher than the FPKM of genes such as Nr2e3 (35.4 FPKM), Nrl (6.1 FPKM) and Gngt2 (22 FPKM) (Table 5.4).

Table 5.4: An example of a “P0NE_Only” gene, Ces5a, whose FPKM unit in P0NE is comparable to those of other genes with known established expression kinetics (Nrl, Nr2e3, Gngt2)

Genes    E16CE FPKM    P0CE FPKM    P0NE FPKM
Ces5a    0.073539      0.420057     161.0296
Nrl      0.558403      15.88869     6.104338
Nr2e3    0.008781      28.08        35.45415
Gngt2    10.06518      91.92727     22.08052

This indicates that Ces5a is transcribed at a level at least comparable to that of genes with established physiological roles, yet it is not observed in P0CE, where it could be translated. One possibility is that there is active regulation of its export, although this warrants further investigation.

The intrinsic value of identifying co-transcriptionally regulated genes is the expectation that they might reveal the biological processes being executed by the developing retina. Our bioinformatics pipeline can deconstruct the order of activation of specific genes engaged in executing a specific biological process, so that one can begin to generate gene regulatory networks underlying retinal development. A key feature of our pipeline is the use of GeneMANIA to find potential partners of the core set of genes from a specific bin that enrich for a function in our DAVID analysis (Fig. 5.3C). A priori, one would predict a progressive increase in the number of genes with sequential application of the GeneMANIA part of the pipeline (Fig. 5.3C). However, we observed a reduction in the number of genes (Fig. 5.3C, right). This suggests that leveraging RNA-Seq data to remove genes that were not expressed in the retina enriched for those genes relevant to retinal development and function at the time point under investigation.

Next, we applied our analysis pipeline to find co-transcriptionally regulated genes in the P0 and P21-Nrl-WT comparison and the P0 and P21-Nrl-KO comparison (Additional file 1: Figure S4, Additional file 5: Table S4). One of the salient features of this analysis was that temporal analysis was more informative than static comparison. One explanation is that temporal analysis created bins that were developmentally regulated, which through DAVID analysis revealed changes in biological processes. For example, there is no cell cycle occurring at P21, so the majority of the cell cycle genes should be inactivated. Indeed, we observed cell cycle in the P0_Only bin in both the P0 vs. P21-Nrl-WT and P0 vs. P21-Nrl-KO analyses (Additional file 3: Table S2.3, S2.4). These genes would show up as not expressed in static analysis. Similarly, genes in the P21_Only bin enriched for functions such as ion channel activity, ion transport, visual perception, synapse, voltage-gated ion channel activity, neurotransmission and others (Additional file 3: Table S2.3, S2.4). This was as expected, as the retina is fully functional at P21 compared to P0. The advantage of our strategy is that it allowed us to understand the progression in gene expression kinetics in normal development and leverage that to understand how this progression deviates in the knockout retina. When the P21_Only bin (either P21WT_Only or P21KO_Only) was analyzed through DAVID, we found many functions that were common to both sets of comparisons; however, examination of the genes underlying these functions revealed subtle differences between the two bins (P21WT_Only and P21KO_Only) (Additional file 3: Table S2.3, S2.4). This suggested that while many of the functions remain unaltered in the KO, there are subtle changes in the manner in which they might be executed. For example, “visual perception” showed up in the P21_Only bins of both the P0 vs. P21-Nrl-WT and P0 vs. P21-Nrl-KO comparisons (Fig. 5.5). There were 24 genes underlying enrichment of this function in the WT comparison, while there were 22 genes in the KO comparison (Fig. 5.5).

Upon comparing the gene identities from both sets, subtle differences emerged that allowed us to find biological meaning in the change of a single gene, such as the rod photoreceptor-specific gene Gnat1, which was absent in the Nrl-KO retina, which lacks rod photoreceptors [270]. Finding Gnat1 through temporal analysis raises the question of whether it would have been found in static analysis. While Gnat1 was present in the P21WT_Only bin in static analysis, the rest of the genes that would normally be part of the GO term “visual perception” were in the Non_DR bin. Thus, without a priori knowledge one would not pick out this specific gene from the entire list of genes in the P21WT_Only bin. Temporal analysis combined with our gene expression and binning strategy, followed by our custom bioinformatics pipeline, was able to find these subtle changes, which was not possible with static analysis (Additional file 1: Figure S4). One could find these subtle changes in the static analysis by looking at specific genes, but that requires a priori knowledge. The advantage of whole-transcriptome analysis is that one can find patterns computationally, which can be leveraged to obtain new insights without the need for a priori knowledge. For example, GOterms such as potassium channel complex, sodium channel activity, synaptic vesicle, calcium ion transport and regulation of blood pressure (Additional file 3: Table S2.4) were enriched by genes in the P21_Only bin in the P0 vs. P21-Nrl-KO comparison, but were absent in the WT comparison.

This finding suggests that there are specific anomalies in the Nrl-KO retina. Given that the majority of the rod photoreceptors have converted to cone photoreceptors in the Nrl-KO retina, changes in ion transport and synaptogenesis are to be expected [17, 275]. However, regulation of blood pressure seemed out of place for the Nrl-KO retina. Indeed, closer examination of the genes underlying this function revealed the need to examine vasodilation in the Nrl-KO retina. Notably, a previous report showed that the Nrl-KO retina develops dilated retinal blood vessels and leakage at P60 [274], confirming the prediction made through shifts in the molecular signatures identified by our temporal analysis. Importantly, our analysis predicted an outcome that manifests at P60 based on gene expression pattern changes occurring between P0 and P21.

5.2.5 Materials and Methods

Animal Procedures

All experiments used CD1 mice from Charles River Laboratory, MA. All mouse procedures were compliant with the protocols approved by the University of Connecticut’s Institutional Animal Care and Use Committee (IACUC).

RNA fractionation

Retinae were dissected from E16 embryos and P0 pups and fractionated as described previously [255]. Once the fractions were obtained, Trizol (Invitrogen, CA, cat # 15596-026) was used to prepare total RNA as per the manufacturer’s instructions.

Library preparation for deep sequencing

After the total RNA was prepared from the two fractions, ribosomal RNA (rRNA) was removed using the Ribozero Ribosomal RNA removal kit (Epicenter, WI, cat # RZH1046) following the manufacturer’s instructions. The removal of rRNA was confirmed by gel electrophoresis, and the rRNA-depleted RNA was used for RNA-Seq library preparation. The RNA-Seq library was prepared using the Script-seq mRNA seq library preparation kit (Cambio, UK, cat # SS10906). The library was deep sequenced in multiple runs using the Illumina Hi-Seq 2000 platform at the University of Connecticut Health Center Deep sequencing core facility. P21 WT and Nrl-KO RNA-Seq data were shared with us by Dr. Anand Swaroop, National Eye Institute [268].

RNA-Seq analysis

CD1 reference creation

The transcriptome captured by deep sequencing was obtained as short paired-end reads.

We analyzed the RNA-Seq data from each sample through riboPicker [276], an algorithm to identify reads derived from rRNA, which showed minimal (0.02 – 0.34%) rRNA transcripts (Table 5.5).

Table 5.5: Read mapping statistics and rRNA levels in the E16CE, P0CE, and P0NE samples

Sample    Percentage of transcriptome    Percentage of    # mapped bases
          mapped read pairs              rRNA reads       in Gb
E16CE     61.26%                         0.34%            7.54
P0CE      52.85%                         0.19%            7.69
P0NE      25.42%                         0.03%            4.02

Next, reads were mapped to the mouse genome and the transcriptome to create a reference. The mouse genome sequence (mm10, NCBI build 38) was downloaded from the UCSC database [277, 278] together with the GTF for the Ensembl transcript library (release 68) (http://genome.ucsc.edu). The paired-end reads from E16 cytoplasmic extract (CE), P0CE and P0 nuclear extract (NE) were mapped separately to the mm10 genome and to the transcript sequences extracted according to the Ensembl transcript library coordinates. Mapping was done using bowtie [279], allowing three mismatches in a seed length of 30 bases. For each sample, the two sets of read alignments (genome and transcriptome) were merged using the HardMerge tool from the NGSTools suite [280]. HardMerge keeps alignments of reads that align uniquely to the transcriptome, uniquely to the genome only, or uniquely to each reference where both alignments agree. Other alignments are discarded. This initial mapping was used to perform mismatch analysis with another tool in the NGSTools suite (Additional file 1: Figure S1). Accordingly, the first 6 and last 32 bases of each read were trimmed. The trimmed reads (62 bp) were remapped to the genome and transcriptome using the aforementioned mapping parameters, and were once again merged using the HardMerge rules. Since the RNA-Seq was performed on retinal RNA extracted from CD1 strain mice, the resulting alignments from all three samples were pooled and used to call Single Nucleotide Variations (SNVs) using SNVQ [280]. The CD1 reference genome was created by modifying the mm10 reference to reflect the inferred SNVs. Transcript sequences were extracted from the modified genome.
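The trimming and merging rules described above can be sketched as follows. This is only an illustration of the stated rules, not the NGSTools implementation: the function names, the representation of an alignment as a (chromosome, start) tuple, and the assumption that transcriptome hits have already been converted to genome coordinates are all hypothetical. The 100-base input read length is inferred from the 62 bp trimmed length and is likewise an assumption.

```python
from typing import List, Optional, Tuple

Hit = Tuple[str, int]  # hypothetical alignment record: (chromosome, genomic start)

TRIM_START = 6   # bases removed from the start of each read (from the text)
TRIM_END = 32    # bases removed from the end of each read (from the text)


def trim_read(sequence: str) -> str:
    """Drop the first 6 and the last 32 bases of a read."""
    return sequence[TRIM_START:len(sequence) - TRIM_END]


def hard_merge(genome_hits: List[Hit], transcriptome_hits: List[Hit]) -> Optional[Hit]:
    """Return the alignment to keep, or None if the read is discarded.

    One reading of the rules in the text: keep a read that maps uniquely to
    one reference and not at all to the other, or uniquely to both references
    at the same genomic position.
    """
    unique_genome = genome_hits[0] if len(genome_hits) == 1 else None
    unique_transcriptome = transcriptome_hits[0] if len(transcriptome_hits) == 1 else None

    if unique_transcriptome is not None and not genome_hits:
        return unique_transcriptome
    if unique_genome is not None and not transcriptome_hits:
        return unique_genome
    if unique_genome is not None and unique_genome == unique_transcriptome:
        return unique_genome
    return None  # ambiguous or conflicting alignments are discarded


trimmed = trim_read("ACGT" * 25)  # a made-up 100-base read
print(len(trimmed))
```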

Gene Expression Analysis

E16CE, P0CE and P0NE reads were mapped against the created CD1 transcriptome reference. The P21 WT and KO single-end reads were mapped against the C57BL6 reference transcriptome (version 68). Mapping was done using bowtie, allowing one mismatch in a seed length of 30 bases. Gene expression was estimated using IsoEM, a novel expectation-maximization algorithm that estimates isoform frequency from single and paired RNA-Seq reads. IsoEM exploits read disambiguation information provided by the distribution of insert sizes generated during sequencing library preparation, and takes advantage of base quality scores, strand, and read pairing information. Isoform expression is reported in Fragments Per Kilobase per Million mapped reads (FPKM) units, and gene expression is the sum of the FPKM of its constituent isoforms [281]. For gene differential expression, two methods were run: GFOLD [282] and Fisher’s exact test with housekeeping gene normalization as suggested by Bullard et al. [283]. Gapdh was used as the housekeeping gene for this analysis. Genes were called differentially expressed if they showed ≥2 fold expression in one sample by both methods. GFOLD was run on the CD1 transcriptome-aligned reads with default parameters and a p-value of 0.01. Fisher’s exact test was run on the estimated number of reads mapped per kilobase of gene length (calculated from IsoEM-estimated FPKM values). Similar to GFOLD, a p-value of 0.01 was used for Fisher’s exact test.
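The Fisher’s exact test arm of the differential expression call can be sketched as below. This is a minimal, standard-library illustration under stated assumptions, not the code used in this study: the 2x2 table layout (gene vs. housekeeping gene, sample A vs. sample B), the function names and the example counts are hypothetical, and the concordance requirement with GFOLD is not reproduced.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def table_probability(x: int) -> float:
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    observed = table_probability(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(
        p for p in (table_probability(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-7)
    )


def is_differentially_expressed(gene_a: int, gene_b: int,
                                housekeeping_a: int, housekeeping_b: int,
                                alpha: float = 0.01, min_fold: float = 2.0) -> bool:
    """Call DE when the housekeeping-normalized fold change is >= 2 and p < 0.01.

    Inputs are integer read counts (rounded reads per kilobase); the
    housekeeping counts are assumed to be non-zero.
    """
    p_value = fisher_exact_two_sided(gene_a, gene_b, housekeeping_a, housekeeping_b)
    ratio_a = gene_a / housekeeping_a
    ratio_b = gene_b / housekeeping_b
    low, high = sorted((ratio_a, ratio_b))
    fold = high / low if low > 0 else float("inf")
    return p_value < alpha and fold >= min_fold


# Made-up counts: the gene is four-fold higher in sample A after normalization.
print(is_differentially_expressed(200, 50, 1000, 1000))
```

Normalizing against a housekeeping gene inside the contingency table means that differences in sequencing depth between the two samples do not by themselves produce a significant call.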

Binning strategy

Samples were analyzed in pairs, and genes were classified based on their expression levels (expressed vs. not expressed), differential gene expression calls, and the number of expressed isoforms for the gene. 1 FPKM was set as the threshold for expression. Genes were then classified into one of the following bins (Fig. 1) based on yes/no calls. First, genes with an expression level of less than 1 FPKM were placed in the not expressed (not_Ex) bin (Fig. 1i). The remaining genes, which were expressed, were further categorized into the following bins (Fig. 1ii). Genes expressed exclusively in one sample were placed in one of the ONLY bins (Fig. 1iii). Differential expression calls were made for genes expressed in both samples (Fig. 1iv). If a gene passed both GFOLD and Fisher’s test, it was placed in the over-represented (OR) bin. Genes that failed either or both of these tests were placed in the non-differentially represented (non_DR) bin. Bins of expressed genes were subcategorized based on the alternative splicing status of the genes. This categorization included single and multiple isoform bins (SI, MI). There were genes with multiple isoforms that were individually below threshold, but for which the sum of the FPKM values of these isoforms was above threshold. These were placed in the multiple isoforms below threshold (MIBT) bin. Similarly, genes expressed with multiple isoforms where only one isoform was above threshold were placed in the multiple isoforms one above threshold (MOAT) bin.
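The binning rules above can be expressed as two small decision functions. The sketch below is an illustration rather than the pipeline code: the function names are hypothetical, the DE call is passed in as a precomputed value, and treating an isoform as detected when its FPKM is greater than zero is an assumption. The Ces5a values are taken from Table 5.4.

```python
THRESHOLD = 1.0  # FPKM threshold for calling a gene or isoform expressed


def assign_bin(fpkm_a, fpkm_b, over_represented_in=None,
               name_a="A", name_b="B", threshold=THRESHOLD):
    """Place a gene into not_Ex, an Only bin, an OR bin or non_DR.

    over_represented_in is the sample name if the gene passed both DE tests,
    otherwise None.
    """
    expressed_a = fpkm_a >= threshold
    expressed_b = fpkm_b >= threshold
    if not expressed_a and not expressed_b:
        return "not_Ex"
    if expressed_a and not expressed_b:
        return f"{name_a}_Only"
    if expressed_b and not expressed_a:
        return f"{name_b}_Only"
    if over_represented_in in (name_a, name_b):
        return f"OR_{over_represented_in}"
    return "non_DR"


def splicing_status(isoform_fpkms, threshold=THRESHOLD):
    """Sub-classify an expressed gene as SI, MI, MIBT or MOAT."""
    detected = [value for value in isoform_fpkms if value > 0]
    above = [value for value in detected if value >= threshold]
    if len(detected) <= 1:
        return "SI"    # single isoform
    if not above:
        return "MIBT"  # several isoforms, none individually above threshold
    if len(above) == 1:
        return "MOAT"  # several isoforms, exactly one above threshold
    return "MI"        # several isoforms above threshold


# Ces5a in the P0CE vs. P0NE comparison (FPKM values from Table 5.4).
print(assign_bin(0.420057, 161.0296, name_a="P0CE", name_b="P0NE"))
```

Because gene-level FPKM is the sum over isoforms, splicing_status is meant to be applied only to genes that assign_bin has already placed in an expressed bin.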

Functional annotation analysis

Genes belonging to each bin were analyzed for enrichment of individual GOterms to find whether co-transcriptionally regulated genes had overlapping functions, using the Database for Annotation, Visualization and Integrated Discovery, DAVID (Fig. 3A-I) [284, 285]. Default parameters (≤0.05 Benjamini score) were used for all analyses. The gene lists enriching for GOterms were run through GeneMANIA to identify potential partners [286]. First, we took the gene list underlying a biological process identified by DAVID and used it as bait in GeneMANIA (Fig. 3C-II), which identified potential partners for genes from the primary list. The potential partners identified by GeneMANIA are based on all published literature and databases, which could introduce a partner that is relevant in another tissue but might not be expressed in the retina. To eliminate such genes, we selected only those genes that were expressed in our RNA-Seq data. Subsequently, this short list of genes was added to the primary list to generate a secondary list, which was used again as bait in GeneMANIA. This process was repeated until convergence, which was reached after three iterations (Fig. 3C-II).
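The iterative expansion described above amounts to a fixed-point loop. The sketch below illustrates that loop under stated assumptions: GeneMANIA is an online tool, so the partner lookup is represented by a hypothetical callable (find_partners), and the toy network and gene names are made up for the example.

```python
def expand_gene_list(seed_genes, find_partners, expressed_genes, max_rounds=10):
    """Grow a bait list until no new expressed partners are found.

    find_partners is a hypothetical callable standing in for a GeneMANIA
    query: it takes a set of genes and returns their potential partners.
    Returns the final gene set and the number of rounds that were run.
    """
    current = set(seed_genes)
    for round_number in range(1, max_rounds + 1):
        partners = set(find_partners(current))
        # Keep only partners that are expressed in the RNA-Seq data and are new.
        new_genes = (partners & set(expressed_genes)) - current
        if not new_genes:
            return current, round_number  # converged: nothing left to add
        current |= new_genes
    return current, max_rounds


# Toy network with made-up gene names, for illustration only.
toy_network = {
    "geneA": {"geneB", "geneX"},
    "geneB": {"geneC"},
    "geneC": {"geneA"},
    "geneX": {"geneY"},
}


def toy_partners(genes):
    return set().union(*(toy_network.get(gene, set()) for gene in genes))


final_genes, rounds = expand_gene_list(
    {"geneA"}, toy_partners, expressed_genes={"geneA", "geneB", "geneC"}
)
print(sorted(final_genes), rounds)
```

Filtering on expression inside the loop, rather than once at the end, keeps unexpressed partners from recruiting further genes in later rounds.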

Microarray design

We designed a custom microarray with Affymetrix to interrogate en masse the presence/absence of unique exon-exon junctions in isoforms expressed in the RNA-Seq data. After mapping the RNA-Seq reads from the three samples (E16-CE, P0-CE and P0-NE) to the Ensembl 68 transcripts and running IsoEM to estimate the FPKM values for each of the three samples, genes expressed in any of the three samples were selected. Exon-exon junctions that were unique among expressed transcripts in genes with more than one expressed transcript were then selected; junctions with flanking sequences of length ≤ 12 bases were eliminated. As a result, we included probes for 28,574 unique junctions from 11,923 transcripts on the custom Affymetrix microarray chip.
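The junction selection rule can be sketched as a filter over the expressed transcripts of one gene. This is an illustrative reading of the criteria above, not the design code: the data layout, the function name and the junction identifiers are hypothetical, and applying the 12-base cut-off to the shorter of the two flanks is an assumption.

```python
from collections import Counter

MAX_SHORT_FLANK = 12  # junctions with a flank of <= 12 bases are eliminated


def select_unique_junctions(expressed_transcripts):
    """Pick junctions found in exactly one expressed transcript of a gene.

    expressed_transcripts maps a transcript ID to a list of
    (junction_id, left_flank_length, right_flank_length) tuples.
    """
    if len(expressed_transcripts) < 2:
        return {}  # only genes with more than one expressed transcript qualify

    # Count in how many transcripts each junction occurs.
    usage = Counter(
        junction_id
        for junctions in expressed_transcripts.values()
        for junction_id in {junction[0] for junction in junctions}
    )

    selected = {}
    for transcript_id, junctions in expressed_transcripts.items():
        kept = [
            junction_id
            for junction_id, left, right in junctions
            if usage[junction_id] == 1 and min(left, right) > MAX_SHORT_FLANK
        ]
        if kept:
            selected[transcript_id] = kept
    return selected


# Made-up gene with two expressed transcripts sharing the e1-e2 junction.
example_gene = {
    "T1": [("e1-e2", 50, 60), ("e2-e3", 60, 40)],
    "T2": [("e1-e2", 50, 60), ("e2-e4", 60, 10)],
}
print(select_unique_junctions(example_gene))
```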

Custom Affymetrix data analysis

Cytoplasmic RNA (1 µg) was prepared from retinae harvested from E12, E16 and E18 embryos and from P0, P4, P10 and P25 mice, and was processed at the Yale Center for Genome Analysis. Expression levels of probe targets were computed from the raw intensity values using the Robust Multichip Average (RMA) algorithm [287, 288], which was performed with the affy R package [289]. Subsequently, the data were processed through the Gene Expression Similarity Investigation Suite (Genesis) (v1.7.6) for k-means clustering [290]. Here, we ran 500 iterations to generate a total of 10 clusters that fell into three categories based on expression kinetics. These trends were defined as embryonic, postnatal, and embryonic + postnatal. The gene lists belonging to each cluster were separately analyzed for functional enrichment using DAVID.

5.3 Extension of the pipeline to understand the effect of triple microRNA knockout in mouse retinal development

5.3.1 Rationale

We devised a custom bioinformatics pipeline, which we leveraged to extract biological processes underpinning retinal development, and extended the analysis to a wildtype and knockout paradigm. Once we had established the workability of our pipeline, we wanted to extend it to study the temporal progression of wildtype and knockout retinae and to use this information to extract the pathways affected in the knockout during the course of aberrant development. Here, we used RNAseq data obtained from normal and triple microRNA (miR 183/96/182) knockout retinae (courtesy of Dr. Lijin Dong, National Eye Institute) to identify molecular signatures that change in the knockout compared to the wildtype developing retina. These data were used to examine the expression of one of the key targets of the microRNA cluster, Mak. Quantitative PCR results from Dr. Dong have shown that this gene is upregulated in the knockout compared to the wildtype, and this result was used to assess the output of our pipeline.

5.3.2 Background

microRNAs are a class of small non-coding RNAs of about 22 nucleotides in length. This evolutionarily conserved class of RNAs plays an important role in post-transcriptional regulation by binding to the 3’UTR of target mRNAs via an RNA interference mechanism. Since their discovery in C. elegans, about 2000 microRNAs have been identified, and they have been shown to play a key role in various cellular processes such as cell proliferation, regeneration, hematopoiesis and neuronal differentiation. They are also known to regulate cellular pathways, as multiple microRNAs can target one mRNA and one microRNA can have multiple mRNA targets. microRNAs have diverse temporal and tissue-specific expression patterns across different species. Previous studies on various microRNAs in primary cell culture have shown that they play a vital role in the regulation of angiogenesis of choriocapillaries, oxidative damage prevention and proper maintenance of cultured RPEs. They are also strongly linked to ER stress, implicating them in various diseases such as AMD. Approximately 78 microRNAs have been shown to be expressed only in the retina, implying the importance of microRNAs in the regulation of gene expression in the retina [291, 292]. In Drosophila, microRNAs have been suggested to play a role in photoreceptor differentiation, while in murine models they have been linked to lens differentiation [292]. In mice, inactivation of Dicer, which is required for microRNA maturation, has been shown to cause progressive degeneration of the retina [293].

More than 30% of animal microRNAs are organized into clusters (miRBase), which are groups of microRNA precursors with an inter-microRNA distance of less than 10 kb on the same genomic strand [294]. One of the highly conserved clusters of miRNAs is miR-183/96/182, which is located in an intergenic region of human chromosome 7q. In mouse, the sequences coding for these miRNAs are located within 4 kb of each other on chromosome 6qA3 and are transcribed in the same orientation. They have been shown to give very similar expression patterns in zebrafish whole-mount experiments [295]. The miR-96/182/183 cluster plays an important role in both development and disease and is considered a specific cluster for sensory organ development, particularly in the retina [296] and in hair cells of the inner ear [297]; its expression has also been documented in several cancers including breast, ovarian, endometrial, bladder, prostate, lung, hepatocellular and thyroid [298-300]. Inhibition of the miRNA cluster has been shown to induce the p53/apoptosis signaling pathway [301]. The temporal and spatial expression of the miR-96/-182/-183 cluster has been shown to correlate with functional maturation of the inner ear [297]. Expression of miR-96/182/183 across retinal development has been shown to be minimal at embryonic time points, with a steady increase postnatally and a peak in the adult retina [302]. In situ hybridization of the retina has shown the miR-183/96/182 cluster to be highly expressed in photoreceptors and interneurons of the inner nuclear layer (INL), and the cluster was shown to be down-regulated during dark adaptation and up-regulated in the light-adapted retina [303]. In various disease models, including the three photoreceptor-linked mouse models of RP (Opsin/, 307 and rds), a human retinitis pigmentosa model, expression of the miR-183/96/182 cluster was shown to be dramatically decreased [303], suggesting a correlation between decreased expression and retinal disease. Inhibition of the miR-183/96/182 cluster has also been shown to cause severe retinal degeneration in sponge (for the cluster) transgenic mice [296, 303]. Despite their implication in retinal degeneration, the role of miR-96/182/183 in normal retinal development is not well understood.

The vertebrate retina is a part of the central nervous system and contains six types of neurons, namely rod photoreceptors, cone photoreceptors, amacrine cells (AC), bipolar cells (BP), horizontal cells (HC) and ganglion cells (GC), and one type of glial cell, the Müller glia. The rod and cone photoreceptors are highly specialized neurons, divided into morphologically and functionally distinct compartments: a synaptic terminal, an outer segment, an inner segment, and a cilium connecting the outer and inner segments [304]. Outer segments (OS) are formed initially from the primary cilia and contain an axoneme, which begins at the basal bodies and passes through a transition zone, also called the connecting cilium, into the outer segment. The miR-182/96/183 cluster has been shown to play a vital role in the maintenance of cone photoreceptor outer segments [296]. Global analysis of the cluster knockout has shown enrichment for genes involved in synaptogenesis, synaptic transmission and phototransduction [296], but a systematic analysis to find the primary, secondary and tertiary effects of the cluster knockout in retinal development has not been done.

5.3.3 Results and discussion

5.3.3.1 Transcriptomics summary of the static comparison

RNA sequencing (RNAseq) data from three developmental time points, postnatal day (P) 5, P11 and P27, from wild type (WT) and triple microRNA knockout (KO) retinae were analyzed. Static comparison was performed, where the wild type and knockout at a given time point were compared to each other (Fig 5.7A). The binning data showed that 14,544 genes were expressed in the P5WT-P5KO comparison, 14,598 genes in the P11WT-P11KO comparison and 13,778 genes in the P27WT-P27KO comparison (Fig 5.7B-D). In the P5WT-KO comparison, of the expressed genes, 13,643 genes were non-differentially represented (Non_DR), while 12 genes were over-represented in P5KO (OR_P5KO) and only one gene was over-represented in P5WT (OR_P5WT) (Fig 5.7B). 319 and 569 genes were present in the P5WT_only and P5KO_only bins, respectively. In the P11WT-KO comparison, of the expressed genes, 13,643 genes were non-differentially represented (Non_DR), while 9 genes were over-represented in P11WT (OR_P11WT) and only one gene was over-represented in P11KO (OR_P11KO). 331 and 614 genes were present in the P11WT_only and P11KO_only bins, respectively (Fig 5.7C). In the P27WT-KO comparison, of the expressed genes, 13,133 genes were non-differentially represented (Non_DR), while 3 genes were over-represented in P27KO (OR_P27KO) and 8 genes were over-represented in P27WT (OR_P27WT). 277 and 357 genes were present in the P27WT_only and P27KO_only bins, respectively (Fig 5.7D).
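Because every expressed gene falls into exactly one bin, the bin counts reported above should add up to the number of expressed genes in each comparison. The short sketch below performs that bookkeeping check on the numbers given in the text; the dictionary layout and key names are illustrative choices, not part of the pipeline.

```python
# Bin counts for the three static comparisons, as reported in the text.
static_bins = {
    "P5":  {"expressed": 14544, "Non_DR": 13643, "OR_WT": 1, "OR_KO": 12,
            "WT_only": 319, "KO_only": 569},
    "P11": {"expressed": 14598, "Non_DR": 13643, "OR_WT": 9, "OR_KO": 1,
            "WT_only": 331, "KO_only": 614},
    "P27": {"expressed": 13778, "Non_DR": 13133, "OR_WT": 8, "OR_KO": 3,
            "WT_only": 277, "KO_only": 357},
}

bin_totals = {}
for stage, counts in static_bins.items():
    # Sum every bin except the reported total of expressed genes.
    bin_totals[stage] = sum(
        value for name, value in counts.items() if name != "expressed"
    )
    print(stage, bin_totals[stage], bin_totals[stage] == counts["expressed"])
```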

Alternative splicing status: When the alternative splicing (AS) status of each bin was interrogated in the P5WT-KO comparison, 59% of genes in the Non_DR bin employed AS, while 38% of genes in the P5WT_Only bin and 47% of genes in the P5KO_Only bin employed AS (Fig 5.7B). A similar trend was seen in the other two static comparisons. In the P11WT-KO analysis, 59% of genes in the Non_DR bin employed AS, while 36% of genes in the P11WT_Only bin and 42% of genes in the P11KO_Only bin employed AS (Fig. 5.7C). In the P27WT-KO analysis, 58% of genes in the Non_DR bin employed AS, while 35% of genes in the P27WT_Only bin and 41% of genes in the P27KO_Only bin employed AS (Fig. 5.7D).

Thus, genes in the Non_DR bin showed a greater degree of AS, a pattern that has consistently been seen in the other analyses of retinal developmental time points as well. It further emphasizes the importance of regulation at the isoform level through AS when expression at the gene level does not change very much.

5.3.3.2 Transcriptomics summary of the temporal comparison

Temporal comparison was performed, where the developmental time points were compared as the course of development progressed, separately in WT and KO. In this strategy, the P0 wild type data, which had been used for comparison with the P21 Nrl WT and KO data, were compared with P5WT and P5KO to understand the changes that could have manifested in the P5KO retina. Also, P5 was compared to P11, and P11 was then compared to P27 (Fig 5.8A).

The binning data of the P0WT-P5WT comparison showed that 15,055 genes were expressed (Fig 5.8B). Out of these 15,055 genes, 10,814 genes were present in the Non_DR bin and 307 genes in the OR_P0WT bin, while 1081 and 2854 genes were present in the P0WT_Only and P5WT_Only bins, respectively (Fig 5.8B). In the P0WT-P5KO comparison, 10,886 out of the 15,118 expressed genes were in the Non_DR bin (Fig 5.8C). There were 316, 1000 and 2916 genes in the OR_P0WT, P0WT_Only and P5KO_Only bins, respectively (Fig 5.8C). There was a slight increase in the number of expressed genes in the P0WT-P5KO comparison, which was reflected in the P5KO_only and Non_DR bins. However, no gene was over-represented in P5WT or P5KO compared to P0WT.

In the P5WT-P11WT comparison, 14,546 genes were expressed, out of which 13,402 genes were present in the Non_DR bin (Fig 5.8D). 7 and 10 genes were present in the OR_P5WT and OR_P11WT bins, while 571 and 556 genes were present in the P5WT_only and P11WT_only bins (Fig 5.8D). In the KO comparison between P5 and P11, 14,849 genes were expressed, out of which 13,526 genes were present in the Non_DR bin (Fig 5.8E). 6 and 4 genes were present in the OR_P5KO and OR_P11KO bins, while 731 and 582 genes were present in the P5KO_only and P11KO_only bins (Fig 5.8E).

In the P11WT-P27WT comparison, 14,586 genes were expressed, out of which 12,804 genes were present in the Non_DR bin (Fig 5.8F). 9 and 6 genes were present in the OR_P11WT and OR_P27WT bins, while 1165 and 602 genes were present in the P11WT_only and P27WT_only bins (Fig 5.8F). In the KO comparison between P11 and P27, 14,614 genes were expressed, out of which 13,144 genes were present in the Non_DR bin (Fig 5.8G). 8 and 2 genes were present in the OR_P11KO and OR_P27KO bins, while 1113 and 347 genes were present in the P11KO_only and P27KO_only bins (Fig 5.8G).

Alternative splicing status: When the AS status of each bin was interrogated in the P0WT-P5WT and P0WT-P5KO comparisons, genes in the Non_DR bin showed the highest degree of AS, with 64% of genes having multiple isoforms (Fig 5.8B,C). A similar pattern was observed in the other comparisons, where the Non_DR bin had about 59% of its genes with multiple isoforms and exhibited a higher degree of AS than the other bins (Fig 5.8D-G). Genes in the “Only” bins were predominantly ones with a single isoform and therefore no AS event. Overall, about 51 – 80% of genes in the “Only” bins across all the temporal comparisons had a single isoform (Fig 5.8D-G).
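The per-bin AS percentages reported above amount to a simple tally. The following is a minimal sketch of that tally, not the pipeline used in this work: the function name, the dictionary layout and the toy gene names are hypothetical, and a gene is assumed to employ AS when it has more than one annotated isoform.

```python
from collections import defaultdict


def as_percent_by_bin(gene_bins, isoform_counts):
    """Return {bin name: percent of genes in that bin with >1 isoform}.

    gene_bins      : dict mapping gene name -> bin name (hypothetical layout)
    isoform_counts : dict mapping gene name -> number of isoforms detected
    """
    totals = defaultdict(int)
    multi_isoform = defaultdict(int)
    for gene, bin_name in gene_bins.items():
        totals[bin_name] += 1
        # Assumption: more than one isoform means the gene employs AS.
        if isoform_counts.get(gene, 1) > 1:
            multi_isoform[bin_name] += 1
    return {b: 100.0 * multi_isoform[b] / totals[b] for b in totals}


# Toy data, invented for illustration only.
gene_bins = {
    "geneA": "Non_DR", "geneB": "Non_DR", "geneC": "Non_DR",
    "geneD": "P5WT_Only", "geneE": "P5WT_Only",
}
isoform_counts = {"geneA": 3, "geneB": 2, "geneC": 1, "geneD": 1, "geneE": 2}

as_percent = as_percent_by_bin(gene_bins, isoform_counts)
for bin_name, pct in sorted(as_percent.items()):
    print(f"{bin_name}: {pct:.1f}% of genes employ AS")
```

Keeping the gene-to-bin assignment and the isoform counts in separate mappings lets the same tally be reused for every pairwise comparison.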

5.3.3.3 DAVID analysis of static and temporal comparison

Once the bins were obtained, the next step was to analyze the functions enriched by the transcriptionally co-segregated genes present in each bin. DAVID analysis was performed on each bin from every pairwise comparison in the static and temporal analyses. Many unique GOterms emerged in the temporal analysis, whereas far fewer were enriched in the static analysis.

Static Comparison: In the static analysis of the P5WT-KO comparison, the only bin that enriched for functions was P5KO_Only. There were 7 GOterm enrichments, and most of these functions were associated with synaptic transmission, including neurotransmitter receptor activity, ion channel activity, GABA receptor activity, and synapse (Fig 5.9). It has been shown that in the triple microRNA knockout, beta waves measured by electroretinogram were flat, suggesting that there could be deficits in synaptic transmission (unpublished data from Dr. Dong). In the P11WT-KO analysis, there were two enrichments, for the P11WT_Only and P11KO_Only bins, which mainly enriched for muscle-related functions. In the P27WT-KO comparison, a few functions relating to cellular maintenance, including translation and cellular matrix, were enriched (Fig 5.9).

Temporal comparison: DAVID analysis of the temporal comparison yielded many more functional enrichments than the static comparison. Interestingly, the P5KO_Only bin in the P0-P5KO comparison yielded muscle-related and synapse-related functions, suggesting that these molecular programs could be affected as early as P5 (Fig 5.9). By anchoring the comparisons to P0WT, the deviation from normal development could be extracted.

The P5WT-P11WT comparison had many interesting functional enrichments. Notably, genes belonging to the P5WT_Only bin enriched for ‘cell fate commitment’, which is consistent with the developing mouse retina around P5, when many cells, including amacrine cells, cone cells and bipolar cells, begin to differentiate. Genes in both the P11WT_Only and P11KO_Only bins enriched for the ‘GABA receptor activity’ function and, interestingly, different numbers of genes underlay this function in the two bins. This function was seen at P5 in the static analysis, while in the temporal analysis it appeared much later. In the P11WT-P27WT comparison, there were a few functional enrichments relating to the immune system by genes in P11WT_Only, which could be due to the development of glial cells, as DAVID analysis could have related glial-associated genes to immune response (Fig 5.9).

5.3.3.4 Covariance and correlation coefficient analysis results

Covariance and the correlation coefficient are used to describe the linear relationship between two variables, in this case two developmental time points. They describe how one time point behaves with respect to another. When a developmental time point is compared to itself, it has the highest possible correlation, which is 1. A correlation can be positive, negative or neutral. The closer the gene expression changes at one time point are to those at another time point, the stronger the correlation and the closer it is to 1. When the genes behave in a totally opposite manner between two time points, the correlation becomes negative. Usually, when the value of the sample correlation coefficient, r, ranges from 0 – 0.2, it is considered a very weak or negligible correlation, while 0.2 – 0.4 is considered weak, 0.4 – 0.7 moderate, 0.7 – 0.9 strong and 0.9 – 1 very strong. This correlation gives a sense of how a developmental time point behaves in terms of overall gene pattern changes in every bin or for a particular function. Here, we used the genes enriching for the function ‘GABA receptor activity’ in the P11WT_Only bin to see how this time point behaves in relation to the other time points in both the WT and the KO (Fig 5.10).
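As an illustration of the interpretation scale described above, the sketch below computes a sample correlation coefficient for two expression vectors and maps it onto the strength categories. It is only a sketch: the function names are hypothetical, and the treatment of values that fall exactly on a boundary (for example r = 0.4) is an assumption, since the quoted ranges share their end points.

```python
import math


def pearson_r(x, y):
    """Sample (Pearson) correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)


def correlation_strength(r):
    """Map |r| onto the categories in the text (boundary handling is assumed)."""
    magnitude = abs(r)
    if magnitude < 0.2:
        return "very weak"
    if magnitude < 0.4:
        return "weak"
    if magnitude < 0.7:
        return "moderate"
    if magnitude < 0.9:
        return "strong"
    return "very strong"


# A vector compared with itself gives r = 1; a reversed trend gives r = -1.
print(pearson_r([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(pearson_r([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))
print(correlation_strength(0.59), "/", correlation_strength(0.97))
```

The strength label is taken from the magnitude of r, so a strong negative correlation is reported as strong rather than weak.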

The covariance and correlation analysis showed that when a time point was compared to itself, the correlation was 1. When P0WT was compared to all other time points, including P5WT & KO, P11WT & KO and P27WT & KO, the r value was in the range of 0.2 – 0.59, where it was least correlated with P27 (r = 0.2) and most correlated with P5 (r = 0.59) (Fig 5.10). Next, when P5 was compared in a similar manner, it had the highest correlation with its KO counterpart (r = 0.96), closely followed by P11 (r = 0.9), then P27 (r = 0.7), and it was, strangely, least correlated with P0 (r = 0.59) (Fig 5.10). This trend explains the course of development, where a given time point is closer to the next developmental time point than to the previous one. P5KO followed a similar pattern in that it was closest to P5WT (Fig 5.10). P11WT was almost equally closely related to both P5 and P11 WT and KO, with r values of about 0.87 – 0.9. The interesting observation was in P11KO, which showed a higher correlation with P27WT (r = 0.97) than with its own WT counterpart (r = 0.92) (Fig 5.10). This suggests that P11KO might have gene expression patterns similar to those in P27WT, thereby indicating a developmental shift in the KO, where the patterns observed by P27 in normal development might arise sooner, around P11, in the KO (Fig 5.10). However, this result has to be vetted through more covariance analyses of other functional enrichments to see if this pattern truly emerges.

5.3.4 Future directions

Currently, I am employing this strategy to find the correlation of genes enriching for various functions in each bin to check whether the pattern of P11KO and P27WT correlation is consistent. If so, it can help in understanding retinal development in the absence of this triple microRNA cluster as a whole, and might suggest a re-wiring of the networks in the P11KO to mimic the gene regulatory networks at P27 in the wild-type retina.

5.3.5 Materials and methods

Binning strategy

The binning strategy employed in this analysis is the same as that described in section 5.2.5 under ‘Binning strategy’.

Functional annotation analysis

DAVID analysis to extract functional enrichments was done in the same way as described in section 5.2.5 under ‘Functional annotation analysis’.

Covariance and correlation coefficient analysis

Covariance and correlation coefficients were calculated using the corresponding functions in Microsoft Excel, which determine the correlation between two variables. The sample correlation coefficient (r) gives the strength of the correlation.
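For readers who prefer a scriptable route, the same two quantities can be obtained outside Excel. The sketch below is an assumption-laden equivalent, not the procedure used here: it uses the sample (n − 1) covariance, which may or may not match the Excel function that was applied, and the time point labels and expression values are toy data invented for illustration.

```python
import statistics


def sample_covariance(x, y):
    """Sample covariance of two equal-length lists (n - 1 denominator, assumed)."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)


def correlation_matrix(samples):
    """Pairwise correlation r = cov(x, y) / (s_x * s_y) for every pair of samples.

    samples : dict mapping a sample label -> list of expression values,
              with genes in the same order in every list.
    """
    result = {}
    for name_i, values_i in samples.items():
        for name_j, values_j in samples.items():
            cov = sample_covariance(values_i, values_j)
            spread = statistics.stdev(values_i) * statistics.stdev(values_j)
            result[(name_i, name_j)] = cov / spread
    return result


# Toy expression values for four hypothetical genes at three time points.
samples = {
    "P5WT": [1.0, 2.0, 3.0, 4.0],
    "P5KO": [1.1, 2.1, 2.9, 4.2],
    "P27WT": [4.0, 1.0, 3.0, 2.0],
}

matrix = correlation_matrix(samples)
for (a, b), r in matrix.items():
    print(f"{a} vs {b}: r = {r:.2f}")
```

Dividing the covariance by the product of the two standard deviations is what rescales it to the range −1 to 1, which is why a time point compared with itself gives exactly 1.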

Figures: Figure 5.1

Fig 5.1 Binning strategy for RNA-Seq data: The binning protocol shown here is for two theoretical samples, A and B. The schematic on the top shows the different steps in the binning protocol, and the outcomes are shown as bar graphs underneath (purple boxes). In the bar graphs, FPKM units are shown on the y-axis and the gene is represented as a bar in green (sample A) and orange (sample B). Colored lines within the bar represent the constituent isoforms (yellow boxes). The dashed line represents the threshold (1 FPKM) of gene expression.
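The binning logic summarized in this caption can be sketched for two samples, A and B, as follows. This is an illustrative sketch only. The 1 FPKM expression threshold comes from the caption, but the criterion for over-representation is defined in section 5.2.5 and is not reproduced here, so the fold-change cutoff below is a hypothetical stand-in, as are the function name and the choice to treat exactly 1 FPKM as expressed.

```python
def assign_bin(fpkm_a, fpkm_b, threshold=1.0, fold_cutoff=2.0):
    """Assign a gene to a bin from its FPKM in samples A and B.

    threshold   : expression threshold (1 FPKM, as in the caption)
    fold_cutoff : HYPOTHETICAL stand-in for the over-representation test
    """
    a_expressed = fpkm_a >= threshold  # assumption: the threshold is inclusive
    b_expressed = fpkm_b >= threshold
    if not a_expressed and not b_expressed:
        return "No_Ex"
    if a_expressed and not b_expressed:
        return "A_Only"
    if b_expressed and not a_expressed:
        return "B_Only"
    # Both samples express the gene: decide between OR_A, OR_B and Non_DR.
    if fpkm_a >= fold_cutoff * fpkm_b:
        return "OR_A"
    if fpkm_b >= fold_cutoff * fpkm_a:
        return "OR_B"
    return "Non_DR"


# Toy FPKM pairs, invented for illustration.
for a, b in [(0.2, 0.5), (5.0, 0.3), (0.3, 5.0), (10.0, 3.0), (3.0, 10.0), (4.0, 5.0)]:
    print(a, b, assign_bin(a, b))
```

Checking the expression threshold before the over-representation test keeps the "Only" bins separate from the OR bins, so a gene that is absent in one sample is never reported as merely over-represented in the other.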

Figure 5.2

Fig 5.2 Validation of high-resolution transcriptome by RNA-Seq. A-D. Expression of genes with established expression kinetics including (A) Fgf15, Sfrp2, Atoh7, Irx4, (B) Fabp7, Gngt2, Nr2e3, Nrl, Rho, Pax6, (C) Malat1, Xist, Tsix, Neat1, (D) Hist2h2bb, Hist2h2aa2, Hist1h4k and Hist1h4f shown as bar graphs with FPKM on the y-axis (log scale for A-C) for E16CE (blue), P0CE (red), and P0NE (black). E-J. Combined E16CE-P0CE and P0CE-P0NE high-resolution transcription kinetics: E16CE_Only (E), OR_E16CE (F), Non_DR (G), OR_P0CE (H), P0CE_Only (I) and No_Ex (J). OR_E16CE – Over-represented in E16CE; OR_P0CE – Over-represented in P0CE; Non_DR – Non-differentially represented.

Figure 5.3

Fig 5.3 Custom bioinformatics pipeline revealed transition in biological processes: A. E16CE-P0CE comparison is shown with its bins as boxes that were used to extract gene lists for DAVID analysis, and the GOterms for functions enriched by these lists were curated (Detailed list of GOterms in Additional file 2: Table S1.1, S1.2). This process is represented by the roman numeral I. B. An example showing the output of Part I, where the OR_P0CE bin was chosen from the E16CE-P0CE comparison. The genes that enriched for a function in each bin in A were then subjected to the pipeline shown in C (II), which starts with gene list entry to GeneMANIA followed by (arrow going up) identification of new partners. (Right) Output of the pipeline in C (II), where the primary gene list (17 genes) that enriched for “Visual Perception” function is shown in the first column. The three iterations of the pipeline in part II are denoted as 1X, 2X and 3X. D (III). Genes in the final list were assigned to their bins in the E16CE-P0CE and P0CE-P0NE comparisons. E. Output for D (III), with rows showing the distribution of genes in bins from the E16CE-P0CE comparison and columns reflecting genes in bins from the P0CE-P0NE comparison.
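The output described for panel E, with bins from one comparison as rows and bins from the other as columns, is in effect a cross-tabulation. The sketch below shows one way such a table could be built. It is not the pipeline's implementation: the function name and the gene-to-bin assignments are hypothetical and invented for illustration.

```python
from collections import Counter


def cross_tabulate(bins_first, bins_second):
    """Count genes for every (bin in first comparison, bin in second) pair.

    bins_first / bins_second : dict mapping gene name -> bin name.
    Genes missing from either comparison are skipped.
    """
    table = Counter()
    for gene, row_bin in bins_first.items():
        col_bin = bins_second.get(gene)
        if col_bin is not None:
            table[(row_bin, col_bin)] += 1
    return table


# Hypothetical assignments for four placeholder genes.
e16ce_vs_p0ce = {"gene1": "OR_P0CE", "gene2": "OR_P0CE", "gene3": "Non_DR", "gene4": "P0CE_Only"}
p0ce_vs_p0ne = {"gene1": "Non_DR", "gene2": "OR_P0NE", "gene3": "Non_DR"}

table = cross_tabulate(e16ce_vs_p0ce, p0ce_vs_p0ne)
for (row_bin, col_bin), count in sorted(table.items()):
    print(f"{row_bin} x {col_bin}: {count}")
```

Reading along a row shows where the genes from one bin of the first comparison end up in the second comparison.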

Figure 5.4

Fig 5.4 Custom microarray revealed isoform/gene expression coherence and validated RNA-Seq. A-C. Shown here is a centroid view of K-means clusters of isoform-specific probes across retinal development (Clusters given in Additional file 4: Table S3). The y-axis shows arbitrary units (-3 to 3) of expression, and the developmental time is shown on top and bottom. D-F. Selected GOterms enriched by DAVID analysis for genes in clusters A-C.

Figure 5.5

Fig 5.5 Temporal comparison of P21-Nrl-WT and P21-Nrl-KO to P0. Shown is an example of common and unique GOterms identified in the P21_only bin in both P0 vs. P21-Nrl-WT (P21WT_Only) and P0 vs. P21-Nrl-KO (P21KO_Only) comparisons and the genes underlying them. Left: common function (visual perception); Right: unique to P0 vs. P21-Nrl-KO (regulation of blood pressure) (Detailed list of GOterms in Additional file 2: Table S1.3, S1.4).

Figure 5.6

Fig 5.6 RNA-Seq revealed progression of biological programs across normal and aberrant retinal development. A. Schematized representation of the kinetics of the molecular programs identified by RNA-Seq (E16CE-P0CE-P0NE), represented with different colors; the shapes represent the gene expression kinetics. B. Extension of our temporal analysis strategy to P21-Nrl-WT and P21-Nrl-KO by comparing them to P0 (P0CE + P0NE) revealed molecular programs in normal development (P21-Nrl-WT) and unique programs in aberrant development (P21-Nrl-KO). C. Schematic representation of a cell at P0, where the cytoplasm and the nucleus are temporally synchronized, and the cell with the dotted line shows that the nuclear transcriptome is shifted forward temporally compared to that of the cytoplasm.

Figure 5.7

Fig 5.7. Transcriptomics summary of static comparison: A. Strategy employed in the static comparison, where wild type and knockout are compared. B-D. Transcriptome summary of the comparison between P5WT-P5KO (B), P11WT-P11KO (C) and P27WT-P27KO (D). The white box represents all genes examined for this study. The Venn diagram within it represents the different bins. The alternative splicing status of each bin is shown as a pie chart.

Figure 5.8

Fig 5.8. Transcriptomics summary of temporal comparison: A. Strategy employed in the temporal comparison, where developmental time points are compared to each other as development progresses. B-G. Transcriptome summary of the comparison between P0WT-P5WT (B), P0WT-P5KO (C), P5WT-P11WT (D), P5KO-P11KO (E), P11WT-P27WT (F) and P11KO-P27KO (G). The white box represents all genes examined for this study. The Venn diagram within it represents the different bins. The alternative splicing status of each bin is shown as a pie chart.

Figure 5.9

Fig 5.9. DAVID analysis output of static and temporal comparisons. The table shows the two types of comparison, static and temporal, with their pairwise comparisons and the GOterm functional enrichments of genes belonging to the respective bins.

Figure 5.10

Fig 5.10 Covariance analysis of genes enriching for GABA A receptor activity. Covariance and correlation analysis of genes enriching for ‘GABA A receptor activity’ across different time points in both the wild-type and miR182/96/183 knockout retinae. Cells in ‘red’ indicate the highest correlation between those time points, while cells in ‘blue’ indicate weaker correlation.

Chapter 6: Conclusion and Future Directions

In Chapter 3, it was shown that Sfrs10 is expressed in differentiating retinal neurons and that its expression is maintained in the adult retina. Its expression implies the usage of RNA processing, including AS, by retinal neurons during terminal differentiation. The extent to which AS is employed during neuronal differentiation and function is not well understood, and our data on the expression of Sfrs10 suggest that it might play a crucial role. Indeed, neuron-specific knockout of Sfrs10 has been shown to cause perinatal lethality due to massive apoptosis in ventricular regions of the cortex. Thus, a retina-specific knockout of Sfrs10 will be a valuable tool in helping us understand the role of Sfrs10 and its downstream targets in the developing retina. Additionally, using an inducible Cre, Sfrs10 can be ablated at various stages of development to check whether Sfrs10 is required for the maintenance of retinal neurons post-differentiation. The expression of Sfrs10 specifically in the red/green cone photoreceptors during postnatal retinal development suggested Sfrs10-regulated alternative splicing in that subset of cone photoreceptors. Currently, there are no clear nuclear markers for red/green cone photoreceptors. Thus, Sfrs10 can serve as a valuable marker for these cone cells.

The upregulation of SFRS10 in AMD retinae suggests that it might be required for AS of a subset of genes involved in the hypoxic stress response. For instance, a gene might normally be expressed but, under stress conditions, undergo an AS shift, with the stress-responsive isoform being regulated by SFRS10. It could be that there is an increased demand for the isoform regulated by Sfrs10 under hypoxic stress. Identifying the targets of Sfrs10 in AMD could pave the way for therapeutic intervention in disease progression. It was shown that Sfrs10, indeed, plays an anti-apoptotic role by regulating the alternative splicing of Bcl2 in colon cancer. It was also shown that Sfrs10 competes with miR-204 to bind to Bcl2. Thus, the same pathway could be activated under hypoxic stress to exert a protective role. A mouse model of retinal degeneration such as rd1 (where rod photoreceptors die as early as P13) could be employed to check whether, by knocking down miR-204 or by overexpressing Sfrs10, rod photoreceptors survive longer in this disease model.

In Chapter 4, it was shown that Citron kinase is expressed in a specific subset of retinal progenitor cells and that its expression diminishes postnatally. Loss of CitK leads to the absence of the Islet1+ subset of bipolar cells. This suggests that CitK might be present in those progenitor cells that would eventually give rise to Islet1+ bipolar cells postnatally. This observation corroborates the ‘competence’ model, in which it was shown that a retinal progenitor cell, once determined to become a bipolar cell, cannot make embryonically born neurons even when placed in the embryonic environment. By employing conditional knockout mice for Citron kinase crossed with inducible Chx10-cre mice, we can study the role of Citron kinase in bipolar cells postnatally and whether Citron N has a role to play in the terminal differentiation of these cells. Another question that remains unanswered is the identity of the factor that compensates for the loss of CitK in the knockout until E14, after which the massive apoptosis of retinal progenitor cells begins. Since other members of the ROCK/ROK family are present, it would be interesting to knock down each one in turn to see whether they play the role of the compensatory factor for CitK loss. Further, the role of Citron N has not been studied in the retina; if we could specifically knock down the kinase domain of CitK, the role of Citron N could be investigated.

Two novel isoforms of CitK were identified, one lacking exon 9 and another retaining intron 13. The former isoform would result in a protein that lacks the kinase domain, and the latter would have an altered SMC domain, suggesting an influence on the nuclear activity of CitK, which has been reported in Drosophila. Thus, investigating the roles of these isoforms separately can provide more information on the diverse roles of CitK in retinal progenitor cells. For this, isoform-specific short hairpin RNAs (shRNAs) have been engineered. Using RNA interference, these shRNAs will be introduced into dividing progenitor cells by P0 in vivo retinal electroporation. Post-electroporation, retinae can be harvested at various time points and immunohistochemistry analysis will be performed on these tissues. This will allow us to identify the isoform-specific aspects of the phenotype and could possibly highlight both the nuclear and cytoplasmic roles of Citron kinase, attributing each role to a specific isoform.

In Chapter 5, global analyses of transcriptome changes and how alternative splicing informs retinal development were discussed. It was shown that genes that did not change in their transcriptional levels had the highest level of alternative splicing at the isoform level. RNAseq data obtained from fractionated retinae revealed that the nuclear transcriptome was temporally ahead of the cytoplasmic transcriptome, in that the nucleus was preparing for the next molecular event while executing the current program. A customized bioinformatics pipeline was devised to extract biological meaning from large datasets. This strategy was also compared to other methodologies routinely employed by other biologists. The application of this pipeline is diverse, in that it could be used to investigate transitions in molecular programs across tissue types and cell states, including investigations of disease progression. This pipeline was also employed to understand the progression of uterine cancer, where RNAseq data were obtained from tumor samples from five different regions as well as from different stages of cancer. The goal here is to identify the chronology in which each of the tumors developed and whether the tumors that developed at later stages were formed de novo or from metastasized cells. By identifying tumor-specific molecular signatures and finding the underpinning biological programs, the relationship of each tumor to the others with respect to a given biological function begins to emerge, thereby allowing us to determine the similarities and differences between the tumors.

Another analysis that employed this pipeline investigated the role of a microRNA cluster knockout in retinal development. Since the cluster, miR 182-96-183, is known to play a role in photoreceptor and bipolar cell maintenance, one would predict that its absence would lead to aberrations in pathways involved in photoreceptor function/maintenance. To test whether we can identify these molecular signatures early in development, RNAseq data from various stages of retinal development were obtained from the knockout and its wild-type counterpart. Additionally, downstream targets of this miRNA cluster are known, which would help us leverage the changes we observe in the bins obtained through our analysis. Currently, this work is in progress, where we employed the automated upstream and downstream bioinformatics pipeline to extract all the pathways enriched by each bin in a pairwise comparison along with the whole gamut of genes involved in each pathway. We are also employing the nodal analysis where, for example, P5WT is compared to P11KO RNAseq data and this comparison is examined alongside the P5WT-P11WT comparison. The rationale here is that by comparing these two pairwise comparisons, we can detect the divergence from the normal pathway as it precipitates. This way, primary, secondary and tertiary effects of the miRNA cluster KO can be established.

List of References

1. Russell GA: Chapter 6: after Galen Late Antiquity and the Islamic world. Handbook of clinical neurology 2010, 95:61-77. 2. Piccolino M: Cajal and the retina: a 100-year retrospective. Trends in neurosciences 1988, 11(12):521-525. 3. Cepko CL, Austin CP, Yang X, Alexiades M, Ezzeddine D: Cell fate determination in the vertebrate retina. Proceedings of the National Academy of Sciences of the United States of America 1996, 93(2):589-595. 4. Brownlee C: Biography of Constance L Cepko. Proceedings of the National Academy of Sciences of the United States of America 2004, 101(1):14-15. 5. Livesey FJ, Cepko CL: Vertebrate neural cell-fate determination: lessons from the retina. Nature reviews Neuroscience 2001, 2(2):109-118. 6. Arden GB, Sidman RL, Arap W, Schlingemann RO: Spare the rod and spoil the eye. The British journal of ophthalmology 2005, 89(6):764-769. 7. Brown PK, Wald G: VISUAL PIGMENTS IN SINGLE RODS AND CONES OF THE HUMAN RETINA. DIRECT MEASUREMENTS REVEAL MECHANISMS OF HUMAN NIGHT AND COLOR VISION. Science 1964, 144(3614):45-52. 8. Curcio CA, Sloan KR, Kalina RE, Hendrickson AE: Human photoreceptor topography. The Journal of comparative neurology 1990, 292(4):497-523. 9. Purves D, Lotto RB, Williams SM, Nundy S, Yang Z: Why we see things the way we do: evidence for a wholly empirical strategy of vision. Philosophical transactions of the Royal Society of London Series B, Biological sciences 2001, 356(1407):285-297. 10. Franze K, Grosche J, Skatchkov SN, Schinkinger S, Foja C, Schild D, Uckermann O, Travis K, Reichenbach A, Guck J: Muller cells are living optical fibers in the vertebrate retina. Proceedings of the National Academy of Sciences of the United States of America 2007, 104(20):8287-8292. 11. Graw J: Eye development. Current topics in developmental biology 2010, 90:343-386. 12. Belliveau MJ, Cepko CL: Extrinsic and intrinsic factors control the genesis of amacrine and cone cells in the rat retina. Development 1999, 126(3):555-566. 13. 
Halder G, Callaerts P, Gehring WJ: Induction of ectopic eyes by targeted expression of the eyeless gene in Drosophila. Science 1995, 267(5205):1788-1792. 14. Marquardt T, Ashery-Padan R, Andrejewski N, Scardigli R, Guillemot F, Gruss P: Pax6 is required for the multipotent state of retinal progenitor cells. Cell 2001, 105(1):43-55. 15. Wang SW, Kim BS, Ding K, Wang H, Sun D, Johnson RL, Klein WH, Gan L: Requirement for math5 in the development of retinal ganglion cells. Genes & development 2001, 15(1):24-29. 16. Ng L, Lu A, Swaroop A, Sharlin DS, Swaroop A, Forrest D: Two transcription factors can direct three photoreceptor outcomes from rod precursor cells in mouse retinal development. The Journal of neuroscience : the official journal of the Society for Neuroscience 2011, 31(31):11118- 11125. 17. Cheng H, Aleman TS, Cideciyan AV, Khanna R, Jacobson SG, Swaroop A: In vivo function of the orphan nuclear receptor NR2E3 in establishing photoreceptor identity during mammalian retinal development. Human molecular genetics 2006, 15(17):2588-2602. 18. Hennig AK, Peng GH, Chen S: Regulation of photoreceptor gene expression by Crx-associated transcription factor network. Brain research 2008, 1192:114-133. 19. Furukawa T, Morrow EM, Li T, Davis FC, Cepko CL: Retinopathy and attenuated circadian entrainment in Crx-deficient mice. Nature genetics 1999, 23(4):466-470.

176

20. Corbo JC, Lawrence KA, Karlstetter M, Myers CA, Abdelaziz M, Dirkes W, Weigelt K, Seifert M, Benes V, Fritsche LG et al: CRX ChIP-seq reveals the cis-regulatory architecture of mouse photoreceptors. Genome research 2010, 20(11):1512-1525. 21. Bramblett DE, Pennesi ME, Wu SM, Tsai MJ: The transcription factor Bhlhb4 is required for rod bipolar cell maturation. Neuron 2004, 43(6):779-793. 22. Wang S, Sengel C, Emerson MM, Cepko CL: A gene regulatory network controls the binary fate decision of rod and bipolar cells in the vertebrate retina. Developmental cell 2014, 30(5):513- 527. 23. Hatakeyama J, Tomita K, Inoue T, Kageyama R: Roles of homeobox and bHLH genes in specification of a retinal cell type. Development 2001, 128(8):1313-1322. 24. Dhomen NS, Balaggan KS, Pearson RA, Bainbridge JW, Levine EM, Ali RR, Sowden JC: Absence of chx10 causes neural progenitors to persist in the adult retina. Investigative ophthalmology & visual science 2006, 47(1):386-396. 25. Surzenko N, Crowl T, Bachleda A, Langer L, Pevny L: SOX2 maintains the quiescent progenitor cell state of postnatal retinal Muller glia. Development 2013, 140(7):1445-1456. 26. Colozza G, Locker M, Perron M: Shaping the eye from embryonic stem cells: Biological and medical implications. World journal of stem cells 2012, 4(8):80-86. 27. Moore MJ: From birth to death: the complex lives of eukaryotic mRNAs. Science 2005, 309(5740):1514-1518. 28. Le Hir H, Gatfield D, Izaurralde E, Moore MJ: The exon-exon junction complex provides a binding platform for factors involved in mRNA export and nonsense-mediated mRNA decay. The EMBO journal 2001, 20(17):4987-4997. 29. Nilsen TW: The spliceosome: the most complex macromolecular machine in the cell? BioEssays : news and reviews in molecular, cellular and developmental biology 2003, 25(12):1147-1149. 30. Will CL, Luhrmann R: Splicing of a rare class of introns by the U12-dependent spliceosome. Biological chemistry 2005, 386(8):713-724. 31. 
Breathnach R, Benoist C, O'Hare K, Gannon F, Chambon P: Ovalbumin gene: evidence for a leader sequence in mRNA and DNA sequences at the exon-intron boundaries. Proceedings of the National Academy of Sciences of the United States of America 1978, 75(10):4853-4857. 32. Lazowska J, Jacq C, Slonimski PP: Sequence of introns and flanking exons in wild-type and box3 mutants of cytochrome b reveals an interlaced splicing protein coded by an intron. Cell 1980, 22(2 Pt 2):333-348. 33. Senapathy P, Shapiro MB, Harris NL: Splice junctions, branch point sites, and exons: sequence statistics, identification, and applications to genome project. Methods in enzymology 1990, 183:252-278. 34. Taggart AJ, DeSimone AM, Shih JS, Filloux ME, Fairbrother WG: Large-scale mapping of branchpoints in human pre-mRNA transcripts in vivo. Nature structural & molecular biology 2012, 19(7):719-721. 35. Matera AG, Wang Z: A day in the life of the spliceosome. Nature reviews Molecular cell biology 2014, 15(2):108-121. 36. Blencowe BJ: Alternative splicing: new insights from global analyses. Cell 2006, 126(1):37-47. 37. Kornblihtt AR, Schor IE, Allo M, Dujardin G, Petrillo E, Munoz MJ: Alternative splicing: a pivotal step between eukaryotic transcription and translation. Nature reviews Molecular cell biology 2013, 14(3):153-165. 38. Berget SM, Moore C, Sharp PA: Spliced segments at the 5' terminus of adenovirus 2 late mRNA. Proceedings of the National Academy of Sciences of the United States of America 1977, 74(8):3171-3175.

177

39. Chow LT, Gelinas RE, Broker TR, Roberts RJ: An amazing sequence arrangement at the 5' ends of adenovirus 2 messenger RNA. Cell 1977, 12(1):1-8. 40. Modrek B, Lee C: A genomic view of alternative splicing. Nature genetics 2002, 30(1):13-19. 41. Salomonis N, Conklin BR: Stem cell pluripotency: alternative modes of transcription regulation. Cell cycle (Georgetown, Tex) 2010, 9(16):3133-3134. 42. Salomonis N, Schlieve CR, Pereira L, Wahlquist C, Colas A, Zambon AC, Vranizan K, Spindler MJ, Pico AR, Cline MS et al: Alternative splicing regulates mouse embryonic stem cell pluripotency and differentiation. Proceedings of the National Academy of Sciences of the United States of America 2010, 107(23):10514-10519. 43. Nilsen TW, Graveley BR: Expansion of the eukaryotic proteome by alternative splicing. Nature 2010, 463(7280):457-463. 44. Pan Q, Shai O, Lee LJ, Frey BJ, Blencowe BJ: Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nature genetics 2008, 40(12):1413- 1415. 45. Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, Kingsmore SF, Schroth GP, Burge CB: Alternative isoform regulation in human tissue transcriptomes. Nature 2008, 456(7221):470-476. 46. Gracheva EO, Cordero-Morales JF, Gonzalez-Carcacia JA, Ingolia NT, Manno C, Aranguren CI, Weissman JS, Julius D: Ganglion-specific splicing of TRPV1 underlies infrared sensation in vampire bats. Nature 2011, 476(7358):88-91. 47. Kim E, Goren A, Ast G: Alternative splicing: current perspectives. BioEssays : news and reviews in molecular, cellular and developmental biology 2008, 30(1):38-47. 48. Sammeth M, Foissac S, Guigo R: A general definition and nomenclature for alternative splicing events. PLoS computational biology 2008, 4(8):e1000147. 49. Libri D, Balvay L, Fiszman MY: In vivo splicing of the beta tropomyosin pre-mRNA: a role for branch point and donor site competition. Molecular and cellular biology 1992, 12(7):3204- 3215. 50. 
Guil S, Gattoni R, Carrascal M, Abian J, Stevenin J, Bach-Elias M: Roles of hnRNP A1, SR proteins, and p68 helicase in c-H-ras alternative splicing regulation. Molecular and cellular biology 2003, 23(8):2927-2941. 51. Bazeley PS, Shepelev V, Talebizadeh Z, Butler MG, Fedorova L, Filatov V, Fedorov A: snoTARGET shows that human orphan snoRNA targets locate close to alternative splice junctions. Gene 2008, 408(1-2):172-179. 52. Celotto AM, Graveley BR: Alternative splicing of the Drosophila Dscam pre-mRNA is both temporally and spatially regulated. Genetics 2001, 159(2):599-608. 53. Olson S, Blanchette M, Park J, Savva Y, Yeo GW, Yeakley JM, Rio DC, Graveley BR: A regulator of Dscam mutually exclusive splicing fidelity. Nature structural & molecular biology 2007, 14(12):1134-1140. 54. Dredge BK, Darnell RB: Nova regulates GABA(A) receptor gamma2 alternative splicing via a distal downstream UCAU-rich intronic splicing enhancer. Molecular and cellular biology 2003, 23(13):4687-4700. 55. Erkelenz S, Mueller WF, Evans MS, Busch A, Schoneweis K, Hertel KJ, Schaal H: Position- dependent splicing activation and repression by SR and hnRNP proteins rely on common mechanisms. Rna 2013, 19(1):96-102. 56. Chen M, Manley JL: Mechanisms of alternative splicing regulation: insights from molecular and genomics approaches. Nature reviews Molecular cell biology 2009, 10(11):741-754. 57. Venables JP: Aberrant and alternative splicing in cancer. Cancer research 2004, 64(21):7647- 7654.

58. Karni R, de Stanchina E, Lowe SW, Sinha R, Mu D, Krainer AR: The gene encoding the splicing factor SF2/ASF is a proto-oncogene. Nature structural & molecular biology 2007, 14(3):185-193. 59. Faustino NA, Cooper TA: Pre-mRNA splicing and human disease. Genes & development 2003, 17(4):419-437. 60. Orr HT, Zoghbi HY: Trinucleotide repeat disorders. Annual review of neuroscience 2007, 30:575-621. 61. Brook JD, McCurrach ME, Harley HG, Buckler AJ, Church D, Aburatani H, Hunter K, Stanton VP, Thirion JP, Hudson T et al: Molecular basis of myotonic dystrophy: expansion of a trinucleotide (CTG) repeat at the 3' end of a transcript encoding a protein kinase family member. Cell 1992, 69(2):385. 62. Kanadia RN, Johnstone KA, Mankodi A, Lungu C, Thornton CA, Esson D, Timmers AM, Hauswirth WW, Swanson MS: A muscleblind knockout model for myotonic dystrophy. Science 2003, 302(5652):1978-1980. 63. Dick KA, Margolis JM, Day JW, Ranum LP: Dominant non-coding repeat expansions in human disease. Genome dynamics 2006, 1:67-83. 64. Larkin K, Fardaei M: Myotonic dystrophy--a multigene disorder. Brain research bulletin 2001, 56(3-4):389-395. 65. Liquori CL, Ricker K, Moseley ML, Jacobsen JF, Kress W, Naylor SL, Day JW, Ranum LP: Myotonic dystrophy type 2 caused by a CCTG expansion in intron 1 of ZNF9. Science 2001, 293(5531):864-867. 66. Thiery JP: [Epithelial-mesenchymal transitions in cancer onset and progression]. Bulletin de l'Academie nationale de medecine 2009, 193(9):1969-1978; discussion 1978-1969. 67. Schwerk C, Schulze-Osthoff K: Regulation of apoptosis by alternative pre-mRNA splicing. Molecular cell 2005, 19(1):1-13. 68. Boise LH, Gonzalez-Garcia M, Postema CE, Ding L, Lindsten T, Turka LA, Mao X, Nunez G, Thompson CB: bcl-x, a bcl-2-related gene that functions as a dominant regulator of apoptotic cell death. Cell 1993, 74(4):597-608. 69. 
Iwanaga N, Kamachi M, Aratake K, Izumi Y, Ida H, Tanaka F, Tamai M, Arima K, Nakamura H, Origuchi T et al: Regulation of alternative splicing of caspase-2 through an intracellular signaling pathway in response to pro-apoptotic stimuli. The Journal of laboratory and clinical medicine 2005, 145(2):105-110. 70. Daoud R, Mies G, Smialowska A, Olah L, Hossmann KA, Stamm S: Ischemia induces a translocation of the splicing factor tra2-beta 1 and changes alternative splicing patterns in the brain. The Journal of neuroscience : the official journal of the Society for Neuroscience 2002, 22(14):5889-5899. 71. Izquierdo JM, Majos N, Bonnal S, Martinez C, Castelo R, Guigo R, Bilbao D, Valcarcel J: Regulation of Fas alternative splicing by antagonistic effects of TIA-1 and PTB on exon definition. Molecular cell 2005, 19(4):475-484. 72. Coutinho-Mansfield GC, Xue Y, Zhang Y, Fu XD: PTB/nPTB switch: a post-transcriptional mechanism for programming neuronal differentiation. Genes & development 2007, 21(13):1573-1577. 73. Ule J, Ule A, Spencer J, Williams A, Hu JS, Cline M, Wang H, Clark T, Fraser C, Ruggiu M et al: Nova regulates brain-specific splicing to shape the synapse. Nature genetics 2005, 37(8):844-852. 74. Betticher DC, Thatcher N, Altermatt HJ, Hoban P, Ryder WD, Heighway J: Alternate splicing produces a novel cyclin D1 transcript. Oncogene 1995, 11(5):1005-1011. 75. Lu F, Gladden AB, Diehl JA: An alternatively spliced cyclin D1 isoform, cyclin D1b, is a nuclear oncogene. Cancer research 2003, 63(21):7056-7061.

76. Solomon DA, Wang Y, Fox SR, Lambeck TC, Giesting S, Lan Z, Senderowicz AM, Conti CJ, Knudsen ES: Cyclin D1 splice variants. Differential effects on localization, RB phosphorylation, and cellular transformation. The Journal of biological chemistry 2003, 278(32):30339-30347. 77. Burd CJ, Petre CE, Morey LM, Wang Y, Revelo MP, Haiman CA, Lu S, Fenoglio-Preiser CM, Li J, Knudsen ES et al: Cyclin D1b variant influences prostate cancer growth through aberrant androgen receptor regulation. Proceedings of the National Academy of Sciences of the United States of America 2006, 103(7):2190-2195. 78. Alt JR, Cleveland JL, Hannink M, Diehl JA: Phosphorylation-dependent regulation of cyclin D1 nuclear export and cyclin D1-dependent cellular transformation. Genes & development 2000, 14(24):3102-3114. 79. Gunther K, Dworak O, Remke S, Pfluger R, Merkel S, Hohenberger W, Reymond MA: Prediction of distant metastases after curative surgery for rectal cancer. The Journal of surgical research 2002, 103(1):68-78. 80. Ponta H, Sherman L, Herrlich PA: CD44: from adhesion molecules to signalling regulators. Nature reviews Molecular cell biology 2003, 4(1):33-45. 81. Arch R, Wirth K, Hofmann M, Ponta H, Matzku S, Herrlich P, Zoller M: Participation in normal immune responses of a metastasis-inducing splice variant of CD44. Science 1992, 257(5070):682-685. 82. Galiana-Arnoux D, Lejeune F, Gesnel MC, Stevenin J, Breathnach R, Del Gatto-Konczak F: The CD44 alternative v9 exon contains a splicing enhancer responsive to the SR proteins 9G8, ASF/SF2, and SRp20. The Journal of biological chemistry 2003, 278(35):32943-32953. 83. Keely PJ, Westwick JK, Whitehead IP, Der CJ, Parise LV: Cdc42 and Rac1 induce integrin-mediated cell motility and invasiveness through PI(3)K. Nature 1997, 390(6660):632-636. 84. Perona R, Montaner S, Saniger L, Sanchez-Perez I, Bravo R, Lacal JC: Activation of the nuclear factor-kappaB by Rho, CDC42, and Rac-1 proteins. Genes & development 1997, 11(4):463-475. 85. 
Shepard PJ, Hertel KJ: The SR protein family. Genome biology 2009, 10(10):242. 86. Zhu J, Mayeda A, Krainer AR: Exon identity established through differential antagonism between exonic splicing silencer-bound hnRNP A1 and enhancer-bound SR proteins. Molecular cell 2001, 8(6):1351-1361. 87. Zhou HL, Lou H: Repression of prespliceosome complex formation at two distinct steps by Fox-1/Fox-2 proteins. Molecular and cellular biology 2008, 28(17):5507-5516. 88. Barron VA, Lou H: Alternative splicing of the neurofibromatosis type I pre-mRNA. Bioscience reports 2012, 32(2):131-138. 89. Doktor TK, Schroeder LD, Vested A, Palmfeldt J, Andersen HS, Gregersen N, Andresen BS: SMN2 exon 7 splicing is inhibited by binding of hnRNP A1 to a common ESS motif that spans the 3' splice site. Human mutation 2011, 32(2):220-230. 90. Crawford JB, Patton JG: Activation of alpha-tropomyosin exon 2 is regulated by the SR protein 9G8 and heterogeneous nuclear ribonucleoproteins H and F. Molecular and cellular biology 2006, 26(23):8791-8802. 91. Expert-Bezancon A, Sureau A, Durosay P, Salesse R, Groeneveld H, Lecaer JP, Marie J: hnRNP A1 and the SR proteins ASF/SF2 and SC35 have antagonistic functions in splicing of beta-tropomyosin exon 6B. The Journal of biological chemistry 2004, 279(37):38249-38259. 92. Charlet BN, Logan P, Singh G, Cooper TA: Dynamic antagonism between ETR-3 and PTB regulates cell type-specific alternative splicing. Molecular cell 2002, 9(3):649-658. 93. McGuffin ME, Chandler D, Somaiya D, Dauwalder B, Mattox W: Autoregulation of transformer-2 alternative splicing is necessary for normal male fertility in Drosophila. Genetics 1998, 149(3):1477-1486.

94. Zachar Z, Garza D, Chou TB, Goland J, Bingham PM: Molecular cloning and genetic analysis of the suppressor-of-white-apricot locus from Drosophila melanogaster. Molecular and cellular biology 1987, 7(7):2498-2505. 95. Zahler AM, Lane WS, Stolk JA, Roth MB: SR proteins: a conserved family of pre-mRNA splicing factors. Genes & development 1992, 6(5):837-847. 96. Krainer AR, Conway GC, Kozak D: The essential pre-mRNA splicing factor SF2 influences 5' splice site selection by activating proximal sites. Cell 1990, 62(1):35-42. 97. Krainer AR, Conway GC, Kozak D: Purification and characterization of pre-mRNA splicing factor SF2 from HeLa cells. Genes & development 1990, 4(7):1158-1171. 98. Long JC, Caceres JF: The SR protein family of splicing factors: master regulators of gene expression. The Biochemical journal 2009, 417(1):15-27. 99. Graveley BR: Sorting out the complexity of SR protein functions. Rna 2000, 6(9):1197-1211. 100. Lam BJ, Hertel KJ: A general role for splicing enhancers in exon definition. Rna 2002, 8(10):1233-1241. 101. Matlin AJ, Clark F, Smith CW: Understanding alternative splicing: towards a cellular code. Nature reviews Molecular cell biology 2005, 6(5):386-398. 102. Bradley T, Cook ME, Blanchette M: SR proteins control a complex network of RNA-processing events. Rna 2015, 21(1):75-92. 103. Caceres JF, Misteli T, Screaton GR, Spector DL, Krainer AR: Role of the modular domains of SR proteins in subnuclear localization and alternative splicing specificity. The Journal of cell biology 1997, 138(2):225-238. 104. Misteli T, Spector DL: Protein phosphorylation and the nuclear organization of pre-mRNA splicing. Trends in cell biology 1997, 7(4):135-138. 105. Sacco-Bubulya P, Spector DL: Disassembly of interchromatin granule clusters alters the coordination of transcription and pre-mRNA splicing. The Journal of cell biology 2002, 156(3):425-436. 106. Zhou Z, Fu XD: Regulation of splicing by SR proteins and SR protein-specific kinases. 
Chromosoma 2013, 122(3):191-207. 107. Zhong XY, Ding JH, Adams JA, Ghosh G, Fu XD: Regulation of SR protein phosphorylation and alternative splicing by modulating kinetic interactions of SRPK1 with molecular chaperones. Genes & development 2009, 23(4):482-495. 108. Feng Y, Chen M, Manley JL: Phosphorylation switches the general splicing repressor SRp38 to a sequence-specific activator. Nature structural & molecular biology 2008, 15(10):1040-1048. 109. Zahler AM, Damgaard CK, Kjems J, Caputi M: SC35 and heterogeneous nuclear ribonucleoprotein A/B proteins bind to a juxtaposed exonic splicing enhancer/exonic splicing silencer element to regulate HIV-1 tat exon 2 splicing. The Journal of biological chemistry 2004, 279(11):10077-10084. 110. Anko ML, Muller-McNicoll M, Brandl H, Curk T, Gorup C, Henry I, Ule J, Neugebauer KM: The RNA-binding landscapes of two SR proteins reveal unique functions and binding to diverse RNA classes. Genome biology 2012, 13(3):R17. 111. Anko ML, Neugebauer KM: RNA-protein interactions in vivo: global gets specific. Trends in biochemical sciences 2012, 37(7):255-262. 112. Pandit S, Zhou Y, Shiue L, Coutinho-Mansfield G, Li H, Qiu J, Huang J, Yeo GW, Ares M, Jr., Fu XD: Genome-wide analysis reveals SR protein cooperation and competition in regulated splicing. Molecular cell 2013, 50(2):223-235. 113. Lindberg A, Gama-Carvalho M, Carmo-Fonseca M, Kreivi JP: A single RNA recognition motif in splicing factor ASF/SF2 directs it to nuclear sites of adenovirus transcription. The Journal of general virology 2004, 85(Pt 3):603-608.

114. Huang Y, Steitz JA: Splicing factors SRp20 and 9G8 promote the nucleocytoplasmic export of mRNA. Molecular cell 2001, 7(4):899-905. 115. Huang Y, Gattoni R, Stevenin J, Steitz JA: SR splicing factors serve as adapter proteins for TAP-dependent mRNA export. Molecular cell 2003, 11(3):837-843. 116. Gilbert W, Siebel CW, Guthrie C: Phosphorylation by Sky1p promotes Npl3p shuttling and mRNA dissociation. Rna 2001, 7(2):302-313. 117. Karni R, Hippo Y, Lowe SW, Krainer AR: The splicing-factor oncoprotein SF2/ASF activates mTORC1. Proceedings of the National Academy of Sciences of the United States of America 2008, 105(40):15323-15327. 118. Fitzgerald KD, Chase AJ, Cathcart AL, Tran GP, Semler BL: Viral proteinase requirements for the nucleocytoplasmic relocalization of cellular splicing factor SRp20 during picornavirus infections. Journal of virology 2013, 87(5):2390-2400. 119. Baker BS: Sex in flies: the splice of life. Nature 1989, 340(6234):521-524. 120. Hoshijima K, Inoue K, Higuchi I, Sakamoto H, Shimura Y: Control of doublesex alternative splicing by transformer and transformer-2 in Drosophila. Science 1991, 252(5007):833-836. 121. Hedley ML, Maniatis T: Sex-specific splicing and polyadenylation of dsx pre-mRNA requires a sequence that binds specifically to tra-2 protein in vitro. Cell 1991, 65(4):579-586. 122. Grellscheid S, Dalgliesh C, Storbeck M, Best A, Liu Y, Jakubik M, Mende Y, Ehrmann I, Curk T, Rossbach K et al: Identification of evolutionarily conserved exons as regulated targets for the splicing activator tra2beta in development. PLoS genetics 2011, 7(12):e1002390. 123. Zhu J, Krainer AR: Pre-mRNA splicing in the absence of an SR protein RS domain. Genes & development 2000, 14(24):3166-3178. 124. Mende Y, Jakubik M, Riessland M, Schoenen F, Rossbach K, Kleinridders A, Kohler C, Buch T, Wirth B: Deficiency of the splicing factor Sfrs10 results in early embryonic lethality in mice and has no impact on full-length SMN/Smn splicing. 
Human molecular genetics 2010, 19(11):2154-2167. 125. Storbeck M, Hupperich K, Gaspar JA, Meganathan K, Martinez Carrera L, Wirth R, Sachinidis A, Wirth B: Neuronal-specific deficiency of the splicing factor Tra2b causes apoptosis in neurogenic areas of the developing mouse brain. PloS one 2014, 9(2):e89020. 126. Stamm S, Casper D, Hanson V, Helfman DM: Regulation of the neuron-specific exon of clathrin light chain B. Brain research Molecular brain research 1999, 64(1):108-118. 127. Fu K, Mende Y, Bhetwal BP, Baker S, Perrino BA, Wirth B, Fisher SA: Tra2beta protein is required for tissue-specific splicing of a smooth muscle myosin phosphatase targeting subunit alternative exon. The Journal of biological chemistry 2012, 287(20):16575-16585. 128. Pihlajamaki J, Lerin C, Itkonen P, Boes T, Floss T, Schroeder J, Dearie F, Crunkhorn S, Burak F, Jimenez-Chillaron JC et al: Expression of the splicing factor gene SFRS10 is reduced in human obesity and contributes to enhanced lipogenesis. Cell metabolism 2011, 14(2):208-218. 129. Csaki LS, Reue K: Lipins: multifunctional lipid metabolism proteins. Annual review of nutrition 2010, 30:257-272. 130. Peterfy M, Phan J, Reue K: Alternatively spliced lipin isoforms exhibit distinct expression pattern, subcellular localization, and role in adipogenesis. The Journal of biological chemistry 2005, 280(38):32883-32889. 131. Yao-Borengasser A, Rasouli N, Varma V, Miles LM, Phanavanh B, Starks TN, Phan J, Spencer HJ, 3rd, McGehee RE, Jr., Reue K et al: Lipin expression is attenuated in adipose tissue of insulin-resistant human subjects and increases with peroxisome proliferator-activated receptor gamma activation. Diabetes 2006, 55(10):2811-2818.

132. Matsuo N, Ogawa S, Imai Y, Takagi T, Tohyama M, Stern D, Wanaka A: Cloning of a novel RNA binding polypeptide (RA301) induced by hypoxia/reoxygenation. The Journal of biological chemistry 1995, 270(47):28216-28222. 133. Tsukamoto Y, Matsuo N, Ozawa K, Hori O, Higashi T, Nishizaki J, Tohnai N, Nagata I, Kawano K, Yutani C et al: Expression of a novel RNA-splicing factor, RA301/Tra2beta, in vascular lesions and its role in smooth muscle cell proliferation. The American journal of pathology 2001, 158(5):1685-1694. 134. Takeo K, Kawai T, Nishida K, Masuda K, Teshima-Kondo S, Tanahashi T, Rokutan K: Oxidative stress-induced alternative splicing of transformer 2beta (SFRS10) and CD44 pre-mRNAs in gastric epithelial cells. American journal of physiology Cell physiology 2009, 297(2):C330-338. 135. Huang Q, Raya A, DeJesus P, Chao SH, Quon KC, Caldwell JS, Chanda SK, Izpisua-Belmonte JC, Schultz PG: Identification of p53 regulators by genome-wide functional analysis. Proceedings of the National Academy of Sciences of the United States of America 2004, 101(10):3456-3461. 136. Watermann DO, Tang Y, Zur Hausen A, Jager M, Stamm S, Stickeler E: Splicing factor Tra2-beta1 is specifically induced in breast cancer and regulates alternative splicing of the CD44 gene. Cancer research 2006, 66(9):4774-4780. 137. Hofmann Y, Lorson CL, Stamm S, Androphy EJ, Wirth B: Htra2-beta 1 stimulates an exonic splicing enhancer and can restore full-length SMN expression to survival motor neuron 2 (SMN2). Proceedings of the National Academy of Sciences of the United States of America 2000, 97(17):9618-9623. 138. Hofmann Y, Wirth B: hnRNP-G promotes exon 7 inclusion of survival motor neuron (SMN) via direct interaction with Htra2-beta1. Human molecular genetics 2002, 11(17):2037-2049. 139. Lorson CL, Hahnen E, Androphy EJ, Wirth B: A single nucleotide in the SMN gene regulates splicing and is responsible for spinal muscular atrophy. 
Proceedings of the National Academy of Sciences of the United States of America 1999, 96(11):6307-6311. 140. Li J, Chen XH, Xiao PJ, Li L, Lin WM, Huang J, Xu P: Expression pattern and splicing function of mouse ZNF265. Neurochemical research 2008, 33(3):483-489. 141. Jiang Z, Tang H, Havlioglu N, Zhang X, Stamm S, Yan R, Wu JY: Mutations in tau gene exon 10 associated with FTDP-17 alter the activity of an exonic splicing enhancer to interact with Tra2 beta. The Journal of biological chemistry 2003, 278(21):18997-19007. 142. Kondo S, Yamamoto N, Murakami T, Okumura M, Mayeda A, Imaizumi K: Tra2 beta, SF2/ASF and SRp30c modulate the function of an exonic splicing enhancer in exon 10 of tau pre-mRNA. Genes to cells : devoted to molecular & cellular mechanisms 2004, 9(2):121-130. 143. Kuwano Y, Nishida K, Kajita K, Satake Y, Akaike Y, Fujita K, Kano S, Masuda K, Rokutan K: Transformer 2beta and miR-204 regulate apoptosis through competitive binding to 3' UTR of BCL2 mRNA. Cell death and differentiation 2015, 22(5):815-825. 144. Karunakaran DK, Congdon S, Guerrette T, Banday AR, Lemoine C, Chhaya N, Kanadia R: The expression analysis of Sfrs10 and Celf4 during mouse retinal development. Gene expression patterns : GEP 2013, 13(8):425-436. 145. Karunakaran DKP, Banday AR, Wu Q, Kanadia R: Expression Analysis of an Evolutionarily Conserved Alternative Splicing Factor, Sfrs10, in Age-Related Macular Degeneration. PloS one 2013, 8(9):e75964. 146. Algvere PV, Marshall J, Seregard S: Age-related maculopathy and the impact of blue light hazard. Acta ophthalmologica Scandinavica 2006, 84(1):4-15. 147. Bressler SB: Introduction: Understanding the role of angiogenesis and antiangiogenic agents in age-related macular degeneration. Ophthalmology 2009, 116(10 Suppl):S1-7. 148. Campochiaro PA: Ocular neovascularisation and excessive vascular permeability. Expert opinion on biological therapy 2004, 4(9):1395-1402.

149. Curcio CA, Johnson M, Huang JD, Rudolf M: Aging, age-related macular degeneration, and the response-to-retention of apolipoprotein B-containing lipoproteins. Progress in retinal and eye research 2009, 28(6):393-422. 150. Ding X, Patel M, Chan CC: Molecular pathology of age-related macular degeneration. Progress in retinal and eye research 2009, 28(1):1-18. 151. Feigl B: Age-related maculopathy - linking aetiology and pathophysiological changes to the ischaemia hypothesis. Progress in retinal and eye research 2009, 28(1):63-86. 152. Verhoeff FH, Grossman HP: The Pathogenesis of Disciform Degeneration of the Macula. Transactions of the American Ophthalmological Society 1937, 35:262-294. 153. Boulton M, Marshall J: Effects of increasing numbers of phagocytic inclusions on human retinal pigment epithelial cells in culture: a model for aging. The British journal of ophthalmology 1986, 70(11):808-815. 154. Dorey CK, Wu G, Ebenstein D, Garsd A, Weiter JJ: Cell loss in the aging retina. Relationship to lipofuscin accumulation and macular degeneration. Investigative ophthalmology & visual science 1989, 30(8):1691-1699. 155. Suter M, Reme C, Grimm C, Wenzel A, Jaattela M, Esser P, Kociok N, Leist M, Richter C: Age- related macular degeneration. The lipofusion component N-retinyl-N-retinylidene ethanolamine detaches proapoptotic proteins from mitochondria and induces apoptosis in mammalian retinal pigment epithelial cells. The Journal of biological chemistry 2000, 275(50):39625-39630. 156. Berman ER: Retinal pigment epithelium: lysosomal enzymes and aging. The British journal of ophthalmology 1994, 78(2):82-83. 157. Holz FG, Bellmann C, Margaritidis M, Schutt F, Otto TP, Volcker HE: Patterns of increased in vivo fundus autofluorescence in the junctional zone of geographic atrophy of the retinal pigment epithelium associated with age-related macular degeneration. 
Graefe's archive for clinical and experimental ophthalmology = Albrecht von Graefes Archiv fur klinische und experimentelle Ophthalmologie 1999, 237(2):145-152. 158. Bird A: Age-related macular disease. The British journal of ophthalmology 1996, 80(1):2-3. 159. Guymer R, Luthert P, Bird A: Changes in Bruch's membrane and related structures with age. Progress in retinal and eye research 1999, 18(1):59-90. 160. Hogan MJ: Role of the retinal pigment epithelium in macular disease. Transactions - American Academy of Ophthalmology and Otolaryngology American Academy of Ophthalmology and Otolaryngology 1972, 76(1):64-80. 161. Young RW: The renewal of photoreceptor cell outer segments. The Journal of cell biology 1967, 33(1):61-72. 162. Snodderly DM, Sandstrom MM, Leung IY, Zucker CL, Neuringer M: Retinal pigment epithelial cell distribution in central retina of rhesus monkeys. Investigative ophthalmology & visual science 2002, 43(9):2815-2818. 163. Fang AM, Lee AY, Kulkarni M, Osborn MP, Brantley MA, Jr.: Polymorphisms in the VEGFA and VEGFR-2 genes and neovascular age-related macular degeneration. Molecular vision 2009, 15:2710-2719. 164. Zareparsi S, Buraczynska M, Branham KE, Shah S, Eng D, Li M, Pawar H, Yashar BM, Moroi SE, Lichter PR et al: Toll-like receptor 4 variant D299G is associated with susceptibility to age-related macular degeneration. Human molecular genetics 2005, 14(11):1449-1455. 165. Lazzeri S, Orlandi P, Figus M, Fioravanti A, Cascio E, Di Desidero T, Agosta E, Canu B, Sartini MS, Danesi R et al: The rs2071559 AA VEGFR-2 genotype frequency is significantly lower in neovascular age-related macular degeneration patients. TheScientificWorldJournal 2012, 2012:420190.

166. Kaur I, Cantsilieris S, Katta S, Richardson AJ, Schache M, Pappuru RR, Narayanan R, Mathai A, Majji AB, Tindill N et al: Association of the del443ins54 at the ARMS2 locus in Indian and Australian cohorts with age-related macular degeneration. Molecular vision 2013, 19:822-828. 167. Ricci F, Zampatti S, D'Abbruzzi F, Missiroli F, Martone C, Lepre T, Pietrangeli I, Sinibaldi C, Peconi C, Novelli G et al: Typing of ARMS2 and CFH in age-related macular degeneration: case-control study and assessment of frequency in the Italian population. Archives of ophthalmology 2009, 127(10):1368-1372. 168. Lu F, Shi Y, Qu C, Zhao P, Liu X, Gong B, Ma S, Zhou Y, Zhang Q, Fei P et al: A genetic variant in the SKIV2L gene is significantly associated with age-related macular degeneration in a Han Chinese population. Investigative ophthalmology & visual science 2013. 169. Nakata I, Yamashiro K, Akagi-Kurashige Y, Miyake M, Kumagai K, Tsujikawa A, Liu K, Chen LJ, Liu DT, Lai TY et al: Association of genetic variants on 8p21 and 4q12 with age-related macular degeneration in Asian populations. Investigative ophthalmology & visual science 2012, 53(10):6576-6581. 170. Amin EM, Oltean S, Hua J, Gammons MV, Hamdollah-Zadeh M, Welsh GI, Cheung MK, Ni L, Kase S, Rennel ES et al: WT1 mutants reveal SRPK1 to be a downstream angiogenesis target by altering VEGF splicing. Cancer cell 2011, 20(6):768-780. 171. Nowak DG, Amin EM, Rennel ES, Hoareau-Aveilla C, Gammons M, Damodoran G, Hagiwara M, Harper SJ, Woolard J, Ladomery MR et al: Regulation of vascular endothelial growth factor (VEGF) splicing from pro-angiogenic to anti-angiogenic isoforms: a novel therapeutic strategy for angiogenesis. The Journal of biological chemistry 2010, 285(8):5532-5540. 172. LaVail MM: Kinetics of rod outer segment renewal in the developing mouse retina. The Journal of cell biology 1973, 58(3):650-661. 173. 
Wakabayashi T, Mori T, Hirahara Y, Koike T, Kubota Y, Takamori Y, Yamada H: Nuclear lamins are differentially expressed in retinal neurons of the adult rat retina. Histochemistry and cell biology 2011, 136(4):427-436. 174. Landsman D: RNP-1, an RNA-binding motif is conserved in the DNA-binding cold shock domain. Nucleic acids research 1992, 20(11):2861-2864. 175. Birney E, Kumar S, Krainer AR: Analysis of the RNA-recognition motif and RS and RGG domains: conservation in metazoan pre-mRNA splicing factors. Nucleic acids research 1993, 21(25):5803-5816. 176. Bandziulis RJ, Swanson MS, Dreyfuss G: RNA-binding proteins as developmental regulators. Genes & development 1989, 3(4):431-437. 177. Novoyatleva T, Heinrich B, Tang Y, Benderska N, Butchbach ME, Lorson CL, Lorson MA, Ben-Dov C, Fehlbaum P, Bracco L et al: Protein phosphatase 1 binds to the RNA recognition motif of several splicing factors and regulates alternative pre-mRNA processing. Human molecular genetics 2008, 17(1):52-70. 178. Shelley EJ, Madigan MC, Natoli R, Penfold PL, Provis JM: Cone degeneration in aging and age-related macular degeneration. Archives of ophthalmology 2009, 127(4):483-492. 179. Tazi J, Bird A: Alternative chromatin structure at CpG islands. Cell 1990, 60(6):909-920. 180. Larsen F, Gundersen G, Lopez R, Prydz H: CpG islands as gene markers in the human genome. Genomics 1992, 13(4):1095-1107. 181. Zhang G, Taneja KL, Singer RH, Green MR: Localization of pre-mRNA splicing in mammalian nuclei. Nature 1994, 372(6508):809-812. 182. Hall LL, Smith KP, Byron M, Lawrence JB: Molecular anatomy of a speckle. The anatomical record Part A, Discoveries in molecular, cellular, and evolutionary biology 2006, 288(7):664-675. 183. Li SJ, Qi Y, Zhao JJ, Li Y, Liu XY, Chen XH, Xu P: Characterization of Nuclear Localization Signals (NLSs) and Function of NLSs and Phosphorylation of Serine Residues in Subcellular and

Subnuclear Localization of Transformer-2beta (Tra2beta). The Journal of biological chemistry 2013, 288(13):8898-8909. 184. Cotto J, Fox S, Morimoto R: HSF1 granules: a novel stress-induced nuclear compartment of human cells. Journal of cell science 1997, 110 ( Pt 23):2925-2934. 185. Kajita K, Kuwano Y, Kitamura N, Satake Y, Nishida K, Kurokawa K, Akaike Y, Honda M, Masuda K, Rokutan K: Ets1 and heat shock factor 1 regulate transcription of the Transformer 2beta gene in human colon cancer cells. Journal of gastroenterology 2013. 186. Ni JZ, Grate L, Donohue JP, Preston C, Nobida N, O'Brien G, Shiue L, Clark TA, Blume JE, Ares M, Jr.: Ultraconserved elements are associated with homeostatic control of splicing regulators by alternative splicing and nonsense-mediated decay. Genes & development 2007, 21(6):708-718. 187. Alexiades MR, Cepko CL: Subsets of retinal progenitors display temporally regulated and distinct biases in the fates of their progeny. Development 1997, 124(6):1119-1131. 188. Larsen F, Gundersen G, Prydz H: Choice of enzymes for mapping based on CpG islands in the human genome. Genetic analysis, techniques and applications 1992, 9(3):80-85. 189. Zhou B, Westaway SK, Levinson B, Johnson MA, Gitschier J, Hayflick SJ: A novel pantothenate kinase gene (PANK2) is defective in Hallervorden-Spatz syndrome. Nature genetics 2001, 28(4):345-349. 190. Kanadia RN, Shin J, Yuan Y, Beattie SG, Wheeler TM, Thornton CA, Swanson MS: Reversal of RNA missplicing and myotonia after muscleblind overexpression in a mouse poly(CUG) model for myotonic dystrophy. Proceedings of the National Academy of Sciences of the United States of America 2006, 103(31):11748-11753. 191. Lin X, Miller JW, Mankodi A, Kanadia RN, Yuan Y, Moxley RT, Swanson MS, Thornton CA: Failure of MBNL1-dependent post-natal splicing transitions in myotonic dystrophy. Human molecular genetics 2006, 15(13):2087-2097. 192. Nurse P, Masui Y, Hartwell L: Understanding the cell cycle. 
Nature medicine 1998, 4(10):1103-1106. 193. Hara K, Tydeman P, Kirschner M: A cytoplasmic clock with the same period as the division cycle in Xenopus eggs. Proceedings of the National Academy of Sciences of the United States of America 1980, 77(1):462-466. 194. Nurse P: Universal control mechanism regulating onset of M-phase. Nature 1990, 344(6266):503-508. 195. Guertin DA, Trautmann S, McCollum D: Cytokinesis in eukaryotes. Microbiology and molecular biology reviews : MMBR 2002, 66(2):155-178. 196. Chen HZ, Tsai SY, Leone G: Emerging roles of E2Fs in cancer: an exit from cell cycle control. Nature reviews Cancer 2009, 9(11):785-797. 197. Madaule P, Eda M, Watanabe N, Fujisawa K, Matsuoka T, Bito H, Ishizaki T, Narumiya S: Role of citron kinase as a target of the small GTPase Rho in cytokinesis. Nature 1998, 394(6692):491-494. 198. Di Cunto F, Calautti E, Hsiao J, Ong L, Topley G, Turco E, Dotto GP: Citron rho-interacting kinase, a novel tissue-specific ser/thr kinase encompassing the Rho-Rac-binding protein Citron. The Journal of biological chemistry 1998, 273(45):29706-29711. 199. Madaule P, Furuyashiki T, Eda M, Bito H, Ishizaki T, Narumiya S: Citron, a Rho target that affects contractility during cytokinesis. Microscopy research and technique 2000, 49(2):123-126. 200. Camera P, da Silva JS, Griffiths G, Giuffrida MG, Ferrara L, Schubert V, Imarisio S, Silengo L, Dotti CG, Di Cunto F: Citron-N is a neuronal Rho-associated protein involved in Golgi organization through actin cytoskeleton regulation. Nature cell biology 2003, 5(12):1071-1078.

201. Di Cunto F, Imarisio S, Hirsch E, Broccoli V, Bulfone A, Migheli A, Atzori C, Turco E, Triolo R, Dotto GP et al: Defective neurogenesis in citron kinase knockout mice by altered cytokinesis and massive apoptosis. Neuron 2000, 28(1):115-127. 202. Cogswell CA, Sarkisian MR, Leung V, Patel R, D'Mello SR, LoTurco JJ: A gene essential to brain growth and development maps to the distal arm of rat chromosome 12. Neuroscience letters 1998, 251(1):5-8. 203. Sarkisian MR, Li W, Di Cunto F, D'Mello SR, LoTurco JJ: Citron-kinase, a protein essential to cytokinesis in neuronal progenitors, is deleted in the flathead mutant rat. The Journal of neuroscience : the official journal of the Society for Neuroscience 2002, 22(8):RC217. 204. Karunakaran DK, Chhaya N, Lemoine C, Congdon S, Black A, Kanadia R: Loss of citron kinase affects a subset of progenitor cells that alters late but not early neurogenesis in the developing rat retina. Investigative ophthalmology & visual science 2015, 56(2):787-798. 205. Wong LL, Rapaport DH: Defining retinal progenitor cell competence in Xenopus laevis by clonal analysis. Development 2009, 136(10):1707-1715. 206. He J, Zhang G, Almeida AD, Cayouette M, Simons BD, Harris WA: How variable clones build an invariant retina. Neuron 2012, 75(5):786-798. 207. Turner DL, Cepko CL: A common progenitor for neurons and glia persists in rat retina late in development. Nature 1987, 328(6126):131-136. 208. Jensen AM, Raff MC: Continuous observation of multipotential retinal progenitor cells in clonal density culture. Developmental biology 1997, 188(2):267-279. 209. Hafler BP, Surzenko N, Beier KT, Punzo C, Trimarchi JM, Kong JH, Cepko CL: Transcription factor Olig2 defines subpopulations of retinal progenitor cells biased toward specific cell fates. Proceedings of the National Academy of Sciences of the United States of America 2012, 109(20):7882-7887. 210. Buchman JJ, Tsai LH: Putting a notch in our understanding of nuclear migration. 
Cell 2008, 134(6):912-914. 211. Del Bene F: Interkinetic nuclear migration: cell cycle on the move. The EMBO journal 2011, 30(9):1676-1677. 212. Silva AO, Ercole CE, McLoon SC: Plane of cell cleavage and numb distribution during cell division relative to cell differentiation in the developing retina. The Journal of neuroscience : the official journal of the Society for Neuroscience 2002, 22(17):7518-7525. 213. Elshatory Y, Deng M, Xie X, Gan L: Expression of the LIM-homeodomain protein Isl1 in the developing and mature mouse retina. The Journal of comparative neurology 2007, 503(1):182-197. 214. Sweeney SJ, Campbell P, Bosco G: Drosophila sticky/citron kinase is a regulator of cell-cycle progression, genetically interacts with Argonaute 1 and modulates epigenetic gene silencing. Genetics 2008, 178(3):1311-1325. 215. Amano M, Nakayama M, Kaibuchi K: Rho-kinase/ROCK: A key regulator of the cytoskeleton and cell polarity. Cytoskeleton (Hoboken, NJ) 2010, 67(9):545-554. 216. Glover DM, Leibowitz MH, McLean DA, Parry H: Mutations in aurora prevent centrosome separation leading to the formation of monopolar spindles. Cell 1995, 81(1):95-105. 217. Terada Y, Tatsuka M, Suzuki F, Yasuda Y, Fujita S, Otsu M: AIM-1: a mammalian midbody-associated protein required for cytokinesis. The EMBO journal 1998, 17(3):667-676. 218. Litvak V, Tian D, Carmon S, Lev S: Nir2, a human homolog of Drosophila melanogaster retinal degeneration B protein, is essential for cytokinesis. Molecular and cellular biology 2002, 22(14):5064-5075.

187

219. LoTurco JJ, Sarkisian MR, Cosker L, Bai J: Citron kinase is a regulator of mitosis and neurogenic cytokinesis in the neocortical ventricular zone. Cerebral cortex (New York, NY : 1991) 2003, 13(6):588-591. 220. Yamamoto M, Wakatsuki T, Hada A, Ryo A: Use of serial analysis of gene expression (SAGE) technology. Journal of immunological methods 2001, 250(1-2):45-66. 221. Kim JB, Porreca GJ, Song L, Greenway SC, Gorham JM, Church GM, Seidman CE, Seidman JG: Polony multiplex analysis of gene expression (PMAGE) in mouse hypertrophic cardiomyopathy. Science 2007, 316(5830):1481-1484. 222. Chang TW: Binding of cells to matrixes of distinct antibodies coated on solid surface. Journal of immunological methods 1983, 65(1-2):217-223. 223. Tsuchihara K, Suzuki Y, Wakaguri H, Irie T, Tanimoto K, Hashimoto S, Matsushima K, Mizushima- Sugano J, Yamashita R, Nakai K et al: Massive transcriptional start site analysis of human genes in hypoxia cells. Nucleic acids research 2009, 37(7):2249-2263. 224. Cloonan N, Forrest AR, Kolle G, Gardiner BB, Faulkner GJ, Brown MK, Taylor DF, Steptoe AL, Wani S, Bethel G et al: Stem cell transcriptome profiling via massive-scale mRNA sequencing. Nature methods 2008, 5(7):613-619. 225. Cloonan N, Grimmond SM: Transcriptome content and dynamics at single-nucleotide resolution. Genome biology 2008, 9(9):234. 226. Marguerat S, Wilhelm BT, Bahler J: Next-generation sequencing: applications beyond genomes. Biochemical Society transactions 2008, 36(Pt 5):1091-1096. 227. Morin R, Bainbridge M, Fejes A, Hirst M, Krzywinski M, Pugh T, McDonald H, Varhol R, Jones S, Marra M: Profiling the HeLa S3 transcriptome using randomly primed cDNA and massively parallel short-read sequencing. BioTechniques 2008, 45(1):81-94. 228. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature methods 2008, 5(7):621-628. 229. 
Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, Snyder M: The transcriptional landscape of the yeast genome defined by RNA sequencing. Science 2008, 320(5881):1344- 1349. 230. Shendure J: The beginning of the end for microarrays? Nature methods 2008, 5(7):585-587. 231. Shendure J, Ji H: Next-generation DNA sequencing. Nature biotechnology 2008, 26(10):1135- 1145. 232. Wilhelm BT, Marguerat S, Watt S, Schubert F, Wood V, Goodhead I, Penkett CJ, Rogers J, Bahler J: Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide resolution. Nature 2008, 453(7199):1239-1243. 233. Wang Z, Gerstein M, Snyder M: RNA-Seq: a revolutionary tool for transcriptomics. Nature reviews Genetics 2009, 10(1):57-63. 234. Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, Hall KP, Evers DJ, Barnes CL, Bignell HR et al: Accurate whole human genome sequencing using reversible terminator chemistry. Nature 2008, 456(7218):53-59. 235. Caruccio N: Preparation of next-generation sequencing libraries using Nextera technology: simultaneous DNA fragmentation and adaptor tagging by in vitro transposition. Methods in molecular biology (Clifton, NJ) 2011, 733:241-255. 236. Holt RA, Jones SJ: The new paradigm of flow cell sequencing. Genome research 2008, 18(6):839- 846. 237. Tipu HN, Shabbir A: Evolution of DNA sequencing. Journal of the College of Physicians and Surgeons--Pakistan : JCPSP 2015, 25(3):210-215. 238. Blackshaw S, Harpavat S, Trimarchi J, Cai L, Huang H, Kuo WP, Weber G, Lee K, Fraioli RE, Cho SH et al: Genomic analysis of mouse retinal development. PLoS biology 2004, 2(9):E247.

188

239. Hart T, Komori HK, LaMere S, Podshivalova K, Salomon DR: Finding the active genes in deep RNA-seq gene expression studies. BMC genomics 2013, 14:778. 240. Consortium SM-I, Consortium SM-I: A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium. Nature biotechnology 2014, 32(9):903-914. 241. Kurose H, Bito T, Adachi T, Shimizu M, Noji S, Ohuchi H: Expression of Fibroblast growth factor 19 (Fgf19) during chicken embryogenesis and eye development, compared with Fgf15 expression in the mouse. Gene expression patterns : GEP 2004, 4(6):687-693. 242. Behesti H, Papaioannou VE, Sowden JC: Loss of Tbx2 delays optic vesicle invagination leading to small optic cups. Developmental biology 2009, 333(2):360-372. 243. Hufnagel RB, Riesenberg AN, Quinn M, Brzezinski JAt, Glaser T, Brown NL: Heterochronic misexpression of Ascl1 in the Atoh7 retinal cell lineage blocks cell cycle exit. Molecular and cellular neurosciences 2013, 54:108-120. 244. Chiodini F, Matter-Sadzinski L, Rodrigues T, Skowronska-Krawczyk D, Brodier L, Schaad O, Bauer C, Ballivet M, Matter JM: A positive feedback loop between ATOH7 and a Notch effector regulates cell-cycle progression and neurogenesis in the retina. Cell reports 2013, 3(3):796-807. 245. Jin Z, Zhang J, Klar A, Chedotal A, Rao Y, Cepko CL, Bao ZZ: Irx4-mediated regulation of Slit1 expression contributes to the definition of early axonal paths inside the retina. Development 2003, 130(6):1037-1048. 246. Yoshida S, Mears AJ, Friedman JS, Carter T, He S, Oh E, Jing Y, Farjo R, Fleury G, Barlow C et al: Expression profiling of the developing and mature Nrl-/- mouse retina: identification of retinal disease candidates and transcriptional regulatory targets of Nrl. Human molecular genetics 2004, 13(14):1487-1503. 247. Corbo JC, Myers CA, Lawrence KA, Jadhav AP, Cepko CL: A typology of photoreceptor gene expression patterns in the mouse. 
Proceedings of the National Academy of Sciences of the United States of America 2007, 104(29):12069-12074. 248. Cheng H, Khan NW, Roger JE, Swaroop A: Excess cones in the retinal degeneration rd7 mouse, caused by the loss of function of orphan nuclear receptor Nr2e3, originate from early-born photoreceptor precursors. Human molecular genetics 2011, 20(21):4102-4115. 249. Omori Y, Katoh K, Sato S, Muranishi Y, Chaya T, Onishi A, Minami T, Fujikado T, Furukawa T: Analysis of transcriptional regulatory pathways of photoreceptor genes by expression profiling of the Otx2-deficient retina. PloS one 2011, 6(5):e19685. 250. Swaroop A, Xu JZ, Pawar H, Jackson A, Skolnick C, Agarwal N: A conserved retina-specific gene encodes a basic motif/leucine zipper domain. Proceedings of the National Academy of Sciences of the United States of America 1992, 89(1):266-270. 251. Swaroop A, Kim D, Forrest D: Transcriptional regulation of photoreceptor development and homeostasis in the mammalian retina. Nature reviews Neuroscience 2010, 11(8):563-576. 252. Swain PK, Hicks D, Mears AJ, Apel IJ, Smith JE, John SK, Hendrickson A, Milam AH, Swaroop A: Multiple phosphorylated isoforms of NRL are expressed in rod photoreceptors. The Journal of biological chemistry 2001, 276(39):36824-36830. 253. Walther C, Gruss P: Pax-6, a murine paired box gene, is expressed in the developing CNS. Development 1991, 113(4):1435-1449. 254. Hsieh YW, Yang XJ: Dynamic Pax6 expression during the neurogenic cell cycle influences proliferation and cell fate choices of retinal progenitors. Neural development 2009, 4:32. 255. Banday AR, Baumgartner M, Al Seesi S, Karunakaran DK, Venkatesh A, Congdon S, Lemoine C, Kilcollins AM, Mandoiu I, Punzo C et al: Replication-dependent histone genes are actively transcribed in differentiating and aging retinal neurons. Cell cycle (Georgetown, Tex) 2014, 13(16):2526-2541.

189

256. Donohoe ME, Zhang LF, Xu N, Shi Y, Lee JT: Identification of a Ctcf cofactor, Yy1, for the X chromosome binary switch. Molecular cell 2007, 25(1):43-56. 257. Kay GF, Penny GD, Patel D, Ashworth A, Brockdorff N, Rastan S: Expression of Xist during mouse development suggests a role in the initiation of X chromosome inactivation. Cell 1993, 72(2):171-182. 258. Sasaki YT, Ideue T, Sano M, Mituyama T, Hirose T: MENepsilon/beta noncoding RNAs are essential for structural integrity of nuclear paraspeckles. Proceedings of the National Academy of Sciences of the United States of America 2009, 106(8):2525-2530. 259. Marzluff WF: Metazoan replication-dependent histone mRNAs: a distinct set of RNA polymerase II transcripts. Current opinion in cell biology 2005, 17(3):274-280. 260. Simon A, Lagercrantz J, Bajalica-Lagercrantz S, Eriksson U: Primary structure of human 11-cis retinol dehydrogenase and organization and chromosomal localization of the corresponding gene. Genomics 1996, 36(3):424-430. 261. Kohl S, Baumann B, Broghammer M, Jagle H, Sieving P, Kellner U, Spegal R, Anastasi M, Zrenner E, Sharpe LT et al: Mutations in the CNGB3 gene encoding the beta-subunit of the cone photoreceptor cGMP-gated channel are responsible for achromatopsia (ACHM3) linked to chromosome 8q21. Human molecular genetics 2000, 9(14):2107-2116. 262. He W, Cowan CW, Wensel TG: RGS9, a GTPase accelerator for phototransduction. Neuron 1998, 20(1):95-102. 263. Pellikka M, Tanentzapf G, Pinto M, Smith C, McGlade CJ, Ready DF, Tepass U: Crumbs, the Drosophila homologue of human CRB1/RP12, is essential for photoreceptor morphogenesis. Nature 2002, 416(6877):143-149. 264. Ahmed ZM, Riazuddin S, Ahmad J, Bernstein SL, Guo Y, Sabar MF, Sieving P, Riazuddin S, Griffith AJ, Friedman TB et al: PCDH15 is expressed in the neurosensory epithelium of the eye and ear and mutant alleles are responsible for both USH1F and DFNB23. Human molecular genetics 2003, 12(24):3215-3223. 265. 
Liu Q, Zuo J, Pierce EA: The retinitis pigmentosa 1 protein is a photoreceptor microtubule- associated protein. The Journal of neuroscience : the official journal of the Society for Neuroscience 2004, 24(29):6427-6436. 266. Wachtmeister L: Oscillatory potentials in the retina: what do they reveal. Progress in retinal and eye research 1998, 17(4):485-521. 267. Jhaveri DJ, O'Keeffe I, Robinson GJ, Zhao QY, Zhang ZH, Nink V, Narayanan RK, Osborne GW, Wray NR, Bartlett PF: Purification of neural precursor cells reveals the presence of distinct, stimulus-specific subpopulations of quiescent precursors in the adult mouse hippocampus. The Journal of neuroscience : the official journal of the Society for Neuroscience 2015, 35(21):8132-8144. 268. Brooks MJ, Rajasimha HK, Roger JE, Swaroop A: Next-generation sequencing facilitates quantitative analysis of wild-type and Nrl(-/-) retinal transcriptomes. Molecular vision 2011, 17:3034-3054. 269. Mears AJ, Kondo M, Swain PK, Takada Y, Bush RA, Saunders TL, Sieving PA, Swaroop A: Nrl is required for rod photoreceptor development. Nature genetics 2001, 29(4):447-452. 270. Calvert PD, Krasnoperova NV, Lyubarsky AL, Isayama T, Nicolo M, Kosaras B, Wong G, Gannon KS, Margolskee RF, Sidman RL et al: Phototransduction in transgenic mice after targeted deletion of the rod transducin alpha -subunit. Proceedings of the National Academy of Sciences of the United States of America 2000, 97(25):13913-13918. 271. Roepman R, Bernoud-Hubac N, Schick DE, Maugeri A, Berger W, Ropers HH, Cremers FP, Ferreira PA: The retinitis pigmentosa GTPase regulator (RPGR) interacts with novel transport-like

190

proteins in the outer segments of rod photoreceptors. Human molecular genetics 2000, 9(14):2095-2105. 272. Karan S, Frederick JM, Baehr W: Novel functions of photoreceptor guanylate cyclases revealed by targeted deletion. Molecular and cellular biochemistry 2010, 334(1-2):141-155. 273. Ivanova E, Muller U, Wassle H: Characterization of the glycinergic input to bipolar cells of the mouse retina. The European journal of neuroscience 2006, 23(2):350-364. 274. Roger JE, Ranganath K, Zhao L, Cojocaru RI, Brooks M, Gotoh N, Veleri S, Hiriyanna A, Rachel RA, Campos MM et al: Preservation of cone photoreceptors after a rapid yet transient degeneration and remodeling in cone-only Nrl-/- mouse retina. The Journal of neuroscience : the official journal of the Society for Neuroscience 2012, 32(2):528-541. 275. Strettoi E, Mears AJ, Swaroop A: Recruitment of the rod pathway by cones in the absence of rods. The Journal of neuroscience : the official journal of the Society for Neuroscience 2004, 24(34):7576-7582. 276. Schmieder R, Lim YW, Edwards R: Identification and removal of ribosomal RNA sequences from metatranscriptomes. Bioinformatics (Oxford, England) 2012, 28(3):433-435. 277. Flicek P, Amode MR, Barrell D, Beal K, Brent S, Chen Y, Clapham P, Coates G, Fairley S, Fitzgerald S et al: Ensembl 2011. Nucleic acids research 2011, 39(Database issue):D800-806. 278. Fujita PA, Rhead B, Zweig AS, Hinrichs AS, Karolchik D, Cline MS, Goldman M, Barber GP, Clawson H, Coelho A et al: The UCSC Genome Browser database: update 2011. Nucleic acids research 2011, 39(Database issue):D876-882. 279. Langmead B, Trapnell C, Pop M, Salzberg SL: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome biology 2009, 10(3):R25. 280. Duitama J, Srivastava PK, Mandoiu, II: Towards accurate detection and genotyping of expressed variants from whole transcriptome sequencing data. BMC genomics 2012, 13 Suppl 2:S6. 281. 
Nicolae M, Mangul S, Mandoiu, II, Zelikovsky A: Estimation of alternative splicing isoform frequencies from RNA-Seq data. Algorithms for molecular biology : AMB 2011, 6(1):9. 282. Feng J, Meyer CA, Wang Q, Liu JS, Shirley Liu X, Zhang Y: GFOLD: a generalized fold change for ranking differentially expressed genes from RNA-seq data. Bioinformatics (Oxford, England) 2012, 28(21):2782-2788. 283. Bullard JH, Purdom E, Hansen KD, Dudoit S: Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC bioinformatics 2010, 11:94. 284. Huang da W, Sherman BT, Lempicki RA: Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nature protocols 2009, 4(1):44-57. 285. Huang da W, Sherman BT, Lempicki RA: Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic acids research 2009, 37(1):1-13. 286. Warde-Farley D, Donaldson SL, Comes O, Zuberi K, Badrawi R, Chao P, Franz M, Grouios C, Kazi F, Lopes CT et al: The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function. Nucleic acids research 2010, 38(Web Server issue):W214-220. 287. Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP: Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics (Oxford, England) 2003, 4(2):249-264. 288. Irizarry RA, Bolstad BM, Collin F, Cope LM, Hobbs B, Speed TP: Summaries of Affymetrix GeneChip probe level data. Nucleic acids research 2003, 31(4):e15. 289. Gautier L, Cope L, Bolstad BM, Irizarry RA: affy--analysis of Affymetrix GeneChip data at the probe level. Bioinformatics (Oxford, England) 2004, 20(3):307-315. 290. Sturn A, Quackenbush J, Trajanoski Z: Genesis: cluster analysis of microarray data. Bioinformatics (Oxford, England) 2002, 18(1):207-208.

191

291. Sundermeier TR, Palczewski K: The physiological impact of microRNA gene regulation in the retina. Cellular and molecular life sciences : CMLS 2012, 69(16):2739-2750. 292. Lewis MA, Steel KP: MicroRNAs in mouse development and disease. Seminars in cell & developmental biology 2010, 21(7):774-780. 293. Damiani D, Alexander JJ, O'Rourke JR, McManus M, Jadhav AP, Cepko CL, Hauswirth WW, Harfe BD, Strettoi E: Dicer inactivation leads to progressive functional and structural degeneration of the mouse retina. The Journal of neuroscience : the official journal of the Society for Neuroscience 2008, 28(19):4878-4887. 294. Altuvia Y, Landgraf P, Lithwick G, Elefant N, Pfeffer S, Aravin A, Brownstein MJ, Tuschl T, Margalit H: Clustering and conservation patterns of human microRNAs. Nucleic acids research 2005, 33(8):2697-2706. 295. Wienholds E, Kloosterman WP, Miska E, Alvarez-Saavedra E, Berezikov E, de Bruijn E, Horvitz HR, Kauppinen S, Plasterk RH: MicroRNA expression in zebrafish embryonic development. Science 2005, 309(5732):310-311. 296. Lumayag S, Haldin CE, Corbett NJ, Wahlin KJ, Cowan C, Turturro S, Larsen PE, Kovacs B, Witmer PD, Valle D et al: Inactivation of the microRNA-183/96/182 cluster results in syndromic retinal degeneration. Proceedings of the National Academy of Sciences of the United States of America 2013, 110(6):E507-516. 297. Sacheli R, Nguyen L, Borgs L, Vandenbosch R, Bodson M, Lefebvre P, Malgrange B: Expression patterns of miR-96, miR-182 and miR-183 in the development inner ear. Gene expression patterns : GEP 2009, 9(5):364-370. 298. Li P, Sheng C, Huang L, Zhang H, Huang L, Cheng Z, Zhu Q: MiR-183/-96/-182 cluster is up- regulated in most breast cancers and increases cell proliferation and migration. Breast cancer research : BCR 2014, 16(6):473. 299. 
Mihelich BL, Khramtsova EA, Arva N, Vaishnav A, Johnson DN, Giangreco AA, Martens-Uzunova E, Bagasra O, Kajdacsy-Balla A, Nonn L: miR-183-96-182 cluster is overexpressed in prostate tissue and regulates zinc homeostasis in prostate cells. The Journal of biological chemistry 2011, 286(52):44503-44511. 300. Liu Y, Han Y, Zhang H, Nie L, Jiang Z, Fa P, Gui Y, Cai Z: Synthetic miRNA-mowers targeting miR- 183-96-182 cluster or miR-210 inhibit growth and migration and induce apoptosis in bladder cancer cells. PloS one 2012, 7(12):e52280. 301. Tang H, Bian Y, Tu C, Wang Z, Yu Z, Liu Q, Xu G, Wu M, Li G: The miR-183/96/182 cluster regulates oxidative apoptosis and sensitizes cells to chemotherapy in gliomas. Current cancer drug targets 2013, 13(2):221-231. 302. Krol J, Busskamp V, Markiewicz I, Stadler MB, Ribi S, Richter J, Duebel J, Bicker S, Fehling HJ, Schubeler D et al: Characterizing light-regulated retinal microRNAs reveals rapid turnover as a common property of neuronal microRNAs. Cell 2010, 141(4):618-631. 303. Zhu Q, Sun W, Okano K, Chen Y, Zhang N, Maeda T, Palczewski K: Sponge transgenic mouse model reveals important roles for the microRNA-183 (miR-183)/96/182 cluster in postmitotic photoreceptors of the retina. The Journal of biological chemistry 2011, 286(36):31749-31760. 304. Insinna C, Besharse JC: Intraflagellar transport and the sensory outer segment of vertebrate photoreceptors. Developmental dynamics : an official publication of the American Association of Anatomists 2008, 237(8):1982-1992.

192

Appendix 1:

1. Steps to run the bioinformatics pipeline

Analyses from RNAseq data involve mapping the reads obtained from sequencing onto the reference genome or transcript libraries or exon-exon junction libraries or combinations of these. The gene expression level is then estimated from unique reads.

Biological meaning can be extracted by comparing the expression levels of genes between samples.

To extract biological information from the RNAseq data, we used a customized bioinformatics pipeline that organized genes into separate bins. The genes were binned based on their transcriptional kinetics across two samples. The genes belonging to each bin were then analyzed using DAVID to identify the functions enriched among them. Additionally, other potential partners of these genes in a pathway were identified using another tool called GeneMANIA.

The following sections describe the step-wise procedure for executing the bioinformatics pipeline mentioned above. The description begins with accessing the remote server on which the scripts are executed. This is followed by the main scripts of our pipeline and the tools they employ: Bowtie for read mapping, IsoEM for estimating the isoform frequencies of genes from mapped reads, and GFOLD and Fisher’s exact test for making differential expression calls.

1.1. Server Access

To access the remote server, an account has to be created by Dr. Ion Mandoiu of the Computer Science and Engineering department. Once the account is created, the user has to download two programs, WinSCP and PuTTY. WinSCP is a file-transfer client used to browse and manage the directories, scripts, and script output stored on the server, while PuTTY provides the command-line terminal in which commands are typed. The server runs Linux; hence the commands are Linux shell commands. Once both programs are installed, the user can log on through WinSCP using their username and the password provided by Dr. Mandoiu. Once logged in, the user can change his/her password with the command “passwd”.

The default server is cnv1, so the host is cnv1.engr.uconn.edu. The default port is 22. After the password change, the user will see their home directory as the default start screen in WinSCP. All directories, fastq files and scripts for running the pipeline are stored in the “data1” directory under the root directory. To switch to that directory, the user should type “cd” followed by the path to that directory. To log in through PuTTY, the user should use the same password. (Note: no characters appear while the user types his/her password.) Once the user hits enter, the prompt “cnv1%” will appear. The screen in PuTTY always shows this prompt at the start of the line; when it is displayed, the user can type a command. For example, to change from the home directory to data1, the user can type

cnv1% cd /data1

If the user wants to create a directory under their name in data1, the command is

cnv1% mkdir Krishna

This will create a directory named “Krishna” under data1 directory.

Other directories can be created in the same way. Because a directory is created in the user's current location, the user should check where they are before creating one; directories created in the wrong place are hard to find later. The command “pwd” shows the current location. If needed, the user can then move to the place where the new directory should be created by typing “cd” followed by the path of that location.

1.2. Common command lines

Some of the commonly used command lines are as follows:

1. To check the current location of the user: pwd

2. To create a new directory: mkdir followed by the directory name

3. To copy a directory and its contents: cp --recursive <source directory> <new directory name>

4. To create short links (shortcuts to files or directories): ln -s <path of the original file> shortcutname.same extension as the original file

5. To unzip or zip files: gunzip or gzip followed by the path name

6. To check if the script is running: ps x

7. To abort a running script: kill followed by the ID (it will show up when the user uses the ps x command)

8. To find a particular string in the output: grep followed by that string (it could be a gene name or a word)

9. To run a script and save its log file: /pathname of the script >& outputname.log & (The trailing “&” runs the script in the background, which enables it to keep running even when the user terminates the session.)

10. To concatenate two files: cat filename1.extension filename2.same extension > newname.same extension
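As a brief illustration, several of the commands listed above can be tried out safely in a scratch directory. The sketch below is only an example: the temporary directory stands in for data1, and the file names and their contents are hypothetical.

```shell
# Hypothetical sandbox: a temporary directory stands in for /data1.
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1

mkdir Krishna                                   # item 2: create a directory
printf 'Nrl\t12.5\nCrx\t30.1\n' > Krishna/a.txt # two made-up tab-separated records
printf 'Sfrs10\t4.2\n' > Krishna/b.txt          # one made-up record

cp -r Krishna Krishna_backup                    # item 3: -r is the short form of --recursive
ln -s "$sandbox/Krishna/a.txt" link_to_a.txt    # item 4: short link to a file

gzip Krishna/b.txt                              # item 5: creates Krishna/b.txt.gz
gunzip Krishna/b.txt.gz                         #         and restores Krishna/b.txt

cat Krishna/a.txt Krishna/b.txt > merged.txt    # item 10: concatenate two files
hit=$(grep Nrl merged.txt)                      # item 8: find the lines containing "Nrl"
echo "$hit"
```

Because everything happens inside a temporary directory, the sketch does not touch any pipeline files.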

1.3. Building index files:

To analyze the RNAseq data, a reference file (index file) must first be built. This has to be done for each species. For mouse, separate index files were built for the C57BL/6 and CD1 strains. The CD1 reference was used in the E16CE, P0CE, P0NE comparison, while the C57BL/6 reference was used in the analysis of the Nrl wildtype (WT) and knockout (KO) and the microRNA WT and KO RNAseq data. Building the index file requires two reference files. The first is the FASTA file, which contains the genome sequence of the species under investigation. It can be obtained from the UCSC genome browser (https://genome.ucsc.edu/).

For mouse, the Mus musculus assembly mm10 was used. This reference genome is that of the C57BL/6 strain. The second required file is the gene transfer format (GTF) file, which contains information about gene structure such as gene name, start and end positions, frame, and strand. This file was obtained from the Ensembl browser (www.ensembl.org); the mouse file (version 68) was used to generate the index file. The downloaded file contains the isoform IDs, but not the gene IDs, which are needed for generating the index file.

Script used:

#!/bin/tcsh
set genome=/import1/UCSC/mm10/mm10.fa;
set GTF=/import1/GTF/mm10Ensembl68wGeneNames.gtf;
set isoSequences=/import1/bowtie_index/mm10/mm10Ens68_polyA200_randOrder.fa;
set Ens_index=/import1/bowtie_index/mm10/mm10Ens68_polyA200_randOrder.fa;

The fields highlighted in green are species-specific; those shown here are for mouse. These fields have to be changed when working with a different species.

Before the above-mentioned script is executed, the “createGTFWithGeneIDs” script must be run. It looks up each Ensembl transcript ID of the downloaded GTF in a clusterFile that contains both transcript and gene IDs, and generates a new GTF file that contains the gene IDs. When executing this script, the file names of the downloaded GTF, the clusterFile with geneIDs and transcriptIDs, and the output GTF must be provided as inputs one, two, and three, respectively. The output files are created in the directory in which the user is located when the command is entered. Therefore, the user should first move to the intended destination directory by entering the change directory command, cd, followed by the destination path. Following is the ‘createGTFWithGeneIDs’ script:

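# Usage: createGTFWithGeneIDs <downloaded GTF> <clusterFile> <output GTF>
GTF=$1
clusterFile=$2
outputGTF=$3
rm -f tmp.GTF sortedClusterFile.txt tmp.txt
# Prefix each GTF line with its transcript ID (field 12 without the quotes and semicolon), then sort by that ID
awk '{print substr($12,2,length($12)-3), $0}' $GTF | sort -k 1,1 > tmp.GTF
# Sort the clusterFile by transcript ID so that it can be joined with the GTF
sort -k 1,1 $clusterFile > sortedClusterFile.txt
join sortedClusterFile.txt tmp.GTF > tmp.txt
# Rebuild each GTF line, writing the gene ID from the clusterFile as the gene_id value
awk '{print $3 "\t" $4 "\t" $5 "\t" $6 "\t" $7 "\t" $8 "\t" $9 "\t" $10 "\t" $11, "\42" $2 "\42;", $13, $14}' tmp.txt > $outputGTF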
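To see what the script does to an individual line, it can help to run its steps on a toy input. The sketch below assumes a two-line GTF and a two-line clusterFile with made-up IDs and gene names; it also fixes the locale with LC_ALL=C, which is an addition here so that sort and join use the same ordering.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
export LC_ALL=C   # assumption: fixed collation so that sort and join agree

# Toy GTF (two exons) and toy clusterFile (transcript ID, gene name); all IDs are made up.
printf 'chr1\tprotein_coding\texon\t100\t200\t.\t+\t.\tgene_id "ENSMUSG0001"; transcript_id "ENSMUST0001";\n' > toy.gtf
printf 'chr2\tprotein_coding\texon\t300\t400\t.\t-\t.\tgene_id "ENSMUSG0002"; transcript_id "ENSMUST0002";\n' >> toy.gtf
printf 'ENSMUST0002 Crx\nENSMUST0001 Nrl\n' > toy_clusters.txt

# Same steps as the script: key each line by transcript ID, join, rebuild the GTF line.
awk '{print substr($12,2,length($12)-3), $0}' toy.gtf | sort -k 1,1 > tmp.GTF
sort -k 1,1 toy_clusters.txt > sortedClusterFile.txt
join sortedClusterFile.txt tmp.GTF > tmp.txt
awk '{print $3 "\t" $4 "\t" $5 "\t" $6 "\t" $7 "\t" $8 "\t" $9 "\t" $10 "\t" $11, "\42" $2 "\42;", $13, $14}' tmp.txt > toy_withGeneNames.gtf
cat toy_withGeneNames.gtf
```

In the output, the value that follows gene_id is the name taken from the toy clusterFile, while the transcript_id is left unchanged.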

CD1 reference genome:

Single nucleotide variations arising from the strain difference between C57BL/6 and CD1 are first accounted for to create a CD1 reference genome, and reads are then mapped to this newly created reference genome.

1.4. Input files:

The input files for the analysis are FASTQ files, which contain the unmapped raw reads obtained from the RNAseq run. Nowadays, reads are obtained as paired-end reads (~120 bp per end). Therefore, each sample has two FASTQ files, one for each end. The user then unzips the files, if needed, and creates a short link to these files in their directory in the root directory.

1.5. Read mapping: Bowtie

Reads are mapped using the read aligner Bowtie, which efficiently aligns short reads (~35 bp). Bowtie uses a Burrows-Wheeler index to index the genome. The output is usually obtained in the sequence alignment/map (SAM) format. Bowtie forms the basis for other tools such as TopHat, Cufflinks, Crossbow and Myrna. The inputs for Bowtie are an index file and fastq files (raw reads). These fastq files can contain single or paired-end reads.

The alignments reported by Bowtie are randomized to avoid “mapping bias”, in which the aligner systematically fails to report a particular class of good alignments and thereby causes spurious holes in the mapping. A reported alignment cannot have more than “N” mismatches (N = 0–3) in the first “L” bases of the read (L ≥ 5). The first L bases are called the “seed”. In addition, the sum of the Phred quality values at all mismatched positions may not exceed “E” (default is 70).
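The mismatch policy described above can be expressed as a small check. The following is a simplified sketch and not Bowtie's implementation: the function name check_policy, the sequences, and the thresholds are hypothetical, base qualities are assumed to be Phred+33 encoded, and any additional processing of quality values by the aligner is ignored.

```shell
# check_policy READ REF QUAL N L E  -> prints "valid" or "invalid"
# (hypothetical helper; READ, REF and QUAL must have the same length)
check_policy() {
  awk -v seq="$1" -v ref="$2" -v qual="$3" -v N="$4" -v L="$5" -v E="$6" 'BEGIN {
    # lookup table: Phred+33 quality character -> Phred value
    for (i = 33; i <= 126; i++) phred[sprintf("%c", i)] = i - 33
    seedMM = 0; qsum = 0
    for (i = 1; i <= length(seq); i++) {
      if (substr(seq, i, 1) != substr(ref, i, 1)) {
        if (i <= L) seedMM++                  # mismatch inside the seed
        qsum += phred[substr(qual, i, 1)]     # quality at every mismatched position
      }
    }
    if (seedMM <= N && qsum <= E) print "valid"; else print "invalid"
  }'
}

# Two mismatches (positions 4 and 9) with Phred values 40 and 20, i.e. a sum of 60.
r1=$(check_policy ACGTACGTAC ACGAACGTTC IIIIIIII5I 2 5 70)
r2=$(check_policy ACGTACGTAC ACGAACGTTC IIIIIIII5I 2 5 40)
echo "$r1 $r2"
```

In this toy case only one mismatch falls inside the 5-base seed, so the alignment is accepted when E is 70 and rejected when E is lowered to 40.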

Required Inputs: (Bowtie, IsoEM command lines along with these inputs are a part of Run-analysis.sh script)

set genome=/import1/genome/mm10/mm10.fa;
set knownGeneGTF=/import1/GTF/mm10Ensembl68wGeneNames.gtf;
set index=/import1/bowtie_index/mm10/mm10Ens68_polyA200_randOrder.fa;
set isoSamFile=mm10_Ens68_coordinates;
set genomeSamFile=mm10_Ens68_in_genome_coordinates;
#set geneClusters=/import1/GTF/mm10Ens68_TransID_GeneName.txt;
set isoEM_Dir=/home/sahar/isoem-clip-polyA/bin;
set fastqDir=/data1/krishna/nrlko/data

Bowtie command:

bowtie --chunkmbs 512 -k 20 -l 30 -n 3 -p 16 -e 250 --sam --sam-nohead \
$index ${fastqDir}/${condition}-${replicate}.fastq \
${condition}-${replicate}-$isoSamFile.sam

1.6. IsoEM

The isoform expectation-maximization (IsoEM) algorithm is employed to infer the frequency of each isoform of a gene from the mapped reads. The algorithm uses the distribution of insert sizes (mean and standard deviation), which is determined during sequencing library preparation, together with base quality scores, strand and read pairing information. IsoEM starts from the Bowtie alignments of the reads to the known isoforms and converts these read alignments to genome coordinates. The converted alignments are then combined to obtain the candidate isoforms for each read, and a “line sweep” technique is used to detect the compatibility between reads and isoforms, during which reads are grouped based on their isoform compatibility. Because reads can match multiple isoforms, IsoEM estimates the isoform frequencies by iteratively performing the following two steps until the estimates converge:

• E-step: The expected number of reads, n(j), originating from each isoform j is computed based on the computed weights (w), assuming that the current frequency of the isoform, f(j), is correct.

• M-step: For each isoform j, f(j) is updated and normalized based on the expected read counts from the E-step.
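The two steps can be illustrated with a toy calculation. The sketch below is a deliberate simplification and not the IsoEM implementation: it assumes a single gene with two isoforms (A and B) of equal length, hypothetical read counts, and equal weights for every compatible read-isoform pair.

```shell
em=$(awk 'BEGIN {
  nA = 30; nB = 10; nAB = 60        # hypothetical read counts: A only, B only, compatible with both
  total = nA + nB + nAB
  fA = 0.5; fB = 0.5                # initial guess for the isoform frequencies
  for (iter = 1; iter <= 200; iter++) {
    # E-step: expected reads per isoform; ambiguous reads are split by the current frequencies
    eA = nA + nAB * fA / (fA + fB)
    eB = nB + nAB * fB / (fA + fB)
    # M-step: new frequency = share of all reads attributed to the isoform
    newA = eA / total; newB = eB / total
    delta = newA - fA; if (delta < 0) delta = -delta
    fA = newA; fB = newB
    if (delta < 1e-9) break         # stop once the estimates no longer change
  }
  printf "%.3f %.3f\n", fA, fB
}')
echo "$em"
```

With these counts the estimates settle at 0.75 for isoform A and 0.25 for isoform B: the 60 ambiguous reads end up divided in the same 3:1 ratio as the uniquely assigned reads.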

Script used:

set option="--directional";
set average=320;
set std=60;
isoem -a --directional -G $knownGeneGTF P0-all-sort.sam -polyA 200
mv ${condition}-${replicate}-$genomeSamFile.iso_estimates ${condition}-${replicate}.iso_estimates
mv ${condition}-${replicate}-$genomeSamFile.gene_estimates ${condition}-${replicate}.gene_estimates

1.7. Making the differential expression (DE) calls:

Once the reads are mapped and the isoform frequency is estimated in Fragments Per Kilobase per Million mapped reads (FPKM) units, the threshold for gene expression is set. For the retina, based on the known expression kinetics of genes, the threshold was set to 1. A gene, whose FPKM is calculated as the sum of the FPKMs of its constituent isoforms, is therefore considered expressed when its FPKM ≥ 1, and differential expression (DE) calls are made for expressed genes. To make DE calls, two independent algorithms were employed, namely GFOLD and Fisher’s exact test. The overall script that is executed is the “analysis-pipeline” script, which is as follows:

#!/bin/sh
sample1=$1
sample2=$2

#1
../bin/run-all-methods $sample1 $sample2
#2
../bin/getGFOLDIsoDEIntersection ${sample1}_${sample2} # no second parameter for isoform
../bin/getGFOLDIsoDEIntersection ${sample1}_${sample2} gene
##3
../bin/ExpressedNotExpressedAnalysis ${sample1}_${sample2}
#
##4, 5, 6, and 7
../bin/DivideOnlyGroupsToOneAndMultipleIsoforms ${sample1}_${sample2}
../bin/GetDIEGenes ${sample1}_${sample2}
../bin/MergeFiles ${sample1} ${sample2} ${sample1}_${sample2}
## 7
../bin/LabelGeneSubgroupsInAllPartitionsFiles ${sample1}_${sample2}
#8
../bin/GetStats

#clean up
#rm -f *List
#rm -f *.tmp
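The expression threshold described at the beginning of this section can be illustrated on a toy table. In the sketch below, the file layout (isoform ID, gene name, FPKM) and all values are hypothetical; the sketch simply sums the isoform FPKMs of each gene and applies the FPKM ≥ 1 cutoff.

```shell
work=$(mktemp -d)
# Hypothetical isoform table: isoform ID, gene name, isoform FPKM (tab-separated)
printf 'Nrl-201\tNrl\t0.6\nNrl-202\tNrl\t0.7\nXist-201\tXist\t0.4\n' > "$work/toy_iso.txt"

# Gene FPKM = sum of the FPKMs of its isoforms; a gene is called expressed when the sum is >= 1
calls=$(awk -F '\t' '{sum[$2] += $3}
  END {for (g in sum) print g, (sum[g] >= 1 ? "expressed" : "not_expressed")}' "$work/toy_iso.txt" | sort)
echo "$calls"
```

Neither Nrl isoform reaches an FPKM of 1 on its own in this toy table, but their sum (1.3) does, so the gene is called expressed.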

This script encompasses the following eight scripts:

i. Run-all-methods

#!/bin/bash
geneGTF=/import1/GTF/mm10Ensembl68wGeneNames.gtf
isoDE=/home/sahar/code/FishersTest/isode
isoGTF=/import1/GTF/mm10Ensembl68.gtf
sample1=$1
sample2=$2
echo "in run-all-methods"
echo $sample1
echo $sample2

# The number of mapped reads in million is taken from the bowtie log. Eventually needs to be computed from the sam file for automation
#get the number of aligned reads
count1=`awk '{if ($3 != "*") print $1}' ${sample1}-mm10_Ens68_in_genome_coordinates.sam | sort --parallel 6 | uniq | wc -l`
count1InMillion=$(( count1/1000000 ))
count2=`awk '{if ($3 != "*") print $1}' ${sample2}-mm10_Ens68_in_genome_coordinates.sam | sort --parallel 6 | uniq | wc -l`
count2InMillion=$(( count2/1000000 ))
../bin/runFisher $sample1 $sample2 $count1InMillion $count2InMillion
../bin/runGFOLD ${sample1}-mm10_Ens68_in_genome_coordinates ${sample2}-mm10_Ens68_in_genome_coordinates $geneGTF gfold.geneout
../bin/runGFOLD ${sample1}-mm10_Ens68_in_genome_coordinates ${sample2}-mm10_Ens68_in_genome_coordinates $isoGTF gfold.out
echo "end run-all-methods"
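The read-counting step in this script can be tried on a toy file. The sketch below uses a hypothetical four-line, header-less SAM fragment containing only the first three columns (read name, flag, reference name); the GNU-specific --parallel option is dropped so that it runs with any sort.

```shell
work=$(mktemp -d)
# Toy records: r1 aligns twice, r2 is unaligned (reference name "*"), r3 aligns once
printf 'r1\t0\tchr1\nr1\t256\tchr2\nr2\t4\t*\nr3\t16\tchr1\n' > "$work/toy.sam"

# Count the distinct read names that have at least one alignment
count=$(awk '{if ($3 != "*") print $1}' "$work/toy.sam" | sort | uniq | wc -l | tr -d ' ')
echo "$count"

# The conversion to millions uses shell integer arithmetic, which truncates
millions=$(( 25999999 / 1000000 ))
echo "$millions"
```

Because shell integer arithmetic discards the fractional part, 25,999,999 mapped reads would be passed on as 25 million, and a sample with fewer than one million mapped reads as 0.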

This script in turn calls two other scripts: (1) runGFOLD and (2) runFisher.

GFOLD:

The generalized fold change (GFOLD) algorithm uses the Bayesian posterior probability distribution of the fold change to estimate the relative difference in gene expression between two samples. It is also employed for the isoform-level analysis of differential expression. This method is preferred over methods that rely on the raw fold change or on a p-value alone to estimate the differential expression of a gene.

Script used: runGFOLD

#!/bin/sh
sample1=$1
sample2=$2
GTF=$3
output=$4
echo "runGfold"
# count reads per annotated feature in each sample
gfold count -ann $GTF -tag ${sample1}.sam -o ${sample1}.${output}.read_cnt
gfold count -ann $GTF -tag ${sample2}.sam -o ${sample2}.${output}.read_cnt
# compare the two samples
gfold diff -sc 0.01 -s1 ${sample1}.${output} -s2 ${sample2}.${output} -suf .read_cnt -o $output
echo "end runGfold"


If the fold change threshold needs to be changed, the parameter to change is in the formatGFOLDfiles script.

formatGFOLDfiles script:

#!/bin/bash
gfoldFile=$1
FCThreshold=$2
rm -f ${gfoldFile}.${FCThreshold}.isoDEFormat
awk -v FCt=$FCThreshold '{
    FC=2^$4
    if (FC < 1 && FC > 0) FC=(1/FC)
    if ($5>$6) Dir=1; else Dir=2
    if (FC >= FCt && $2!=0) diff="DE"; else diff="nonDE"
    if ($1 != "#") print $1, $5, $6, $2, Dir, FC, diff
}' $gfoldFile > ${gfoldFile}.${FCThreshold}.isoDEFormat

The field highlighted in red needs to be changed to “2” for the script to apply the input fold change, together with the corresponding change in the ‘getGFOLDisoDEintersection’ script.
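The fold-change arithmetic in formatGFOLDfiles can be tried on single values. The following is a minimal sketch under stated assumptions: the helper name `call_de` is hypothetical, and it leaves out the script's additional requirement that the second column be non-zero.

```shell
#!/bin/bash
# Convert a log2 fold change to an unsigned linear fold change and call DE.
call_de() {
    # $1: log2 fold change, $2: linear fold-change threshold
    awk -v lfc="$1" -v FCt="$2" 'BEGIN {
        FC = 2 ^ lfc
        if (FC < 1 && FC > 0) FC = 1 / FC   # express down-regulation as a value >= 1
        if (FC >= FCt) diff = "DE"; else diff = "nonDE"
        printf "%.2f %s\n", FC, diff
    }'
}

call_de 1.5 2    # 2^1.5 is about 2.83, above the threshold
call_de -2 2     # 2^-2 = 0.25, inverted to 4
call_de 0.5 2    # 2^0.5 is about 1.41, below the threshold
```

With a threshold of 1, as passed by getGFOLDIsoDEIntersection, every non-zero fold change satisfies `FC >= FCt`, so the fold-change filter is effectively applied later, at the intersection with the Fisher's test output.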

Fisher’s exact test:

Fisher’s exact test is another method used to estimate the differential expression of genes between two samples. It calculates a probability score (p-value) that is an exact measure of the deviation from the null hypothesis, whereas the G-test and the Chi-square test are approximations that become unreliable when the sample size is small.
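As a worked illustration of the test itself, the sketch below computes a two-sided Fisher's exact p-value for a 2x2 table in awk. It is not the isode program: how isode builds its contingency table is not shown in the source, so the function name `fisher_exact` and the example tables are hypothetical.

```shell
#!/bin/bash
# Two-sided Fisher exact test on the 2x2 table [[a, b], [c, d]].
# The p-value is the sum of the probabilities of all tables with the same
# margins that are no more probable than the observed table.
fisher_exact() {
    awk -v a="$1" -v b="$2" -v c="$3" -v d="$4" '
    # log(n!) as a sum of logs
    function lfact(n,   i, s) { s = 0; for (i = 2; i <= n; i++) s += log(i); return s }
    # hypergeometric probability of the table whose top-left cell is x
    function hyper(x,   num, den) {
        num = lfact(r1) + lfact(r2) + lfact(c1) + lfact(c2)
        den = lfact(n) + lfact(x) + lfact(r1 - x) + lfact(c1 - x) + lfact(r2 - c1 + x)
        return exp(num - den)
    }
    BEGIN {
        r1 = a + b; r2 = c + d; c1 = a + c; c2 = b + d; n = r1 + r2
        lo = (c1 - r2 > 0) ? c1 - r2 : 0      # smallest feasible top-left cell
        hi = (r1 < c1) ? r1 : c1              # largest feasible top-left cell
        pobs = hyper(a)
        p = 0
        for (x = lo; x <= hi; x++) {
            px = hyper(x)
            if (px <= pobs * (1 + 1e-7)) p += px   # small tolerance for rounding
        }
        printf "%.4f\n", p
    }'
}

fisher_exact 3 1 1 3    # 34/70
fisher_exact 8 2 1 5    # 400/11440
```

Every probability is computed exactly from the fixed margins rather than from an asymptotic distribution, which is why the test remains valid for small counts.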

Script used: runFisher

#!/bin/sh
Fisher=/home/sahar/code/FishersTest/isode
echo "in runIsoDE"
sample1=$1
sample2=$2
NumberOfMappeReadsinMillion1=$3
NumberOfMappeReadsinMillion2=$4
pair=${sample1}_${sample2}
echo $pair
rm -f ${pair}*.iso_estimates
rm -f ${pair}*.gene_estimates
rm -f ${pair}.DE
rm -f ${pair}_wgeneName.DE
rm -f ${pair}.geneDE

# isoform level: set FPKM values below 1 to 0 and keep entries expressed in both samples
key-merge ${sample1}.iso_estimates ${sample2}.iso_estimates > ${pair}.iso_estimates
awk '{if ($2 < 1) fpkm1 = 0; else fpkm1 = $2; if ($3 < 1) fpkm2 = 0; else fpkm2 = $3; if (fpkm1 != 0 && fpkm2 != 0) print $1 "\t" fpkm1 "\t" fpkm2}' ${pair}.iso_estimates > ${pair}Tab.iso_estimates

# gene level: same filtering
key-merge ${sample1}.gene_estimates ${sample2}.gene_estimates > ${pair}.gene_estimates
awk '{if ($2 < 1) fpkm1 = 0; else fpkm1 = $2; if ($3 < 1) fpkm2 = 0; else fpkm2 = $3; if (fpkm1 != 0 && fpkm2 != 0) print $1 "\t" fpkm1 "\t" fpkm2}' ${pair}.gene_estimates > ${pair}Tab.gene_estimates

$Fisher ${pair}Tab.iso_estimates $NumberOfMappeReadsinMillion1 $NumberOfMappeReadsinMillion2 2 0.01 housekeeping ENSMUST00000147954 > ${pair}.DE
$Fisher ${pair}Tab.gene_estimates $NumberOfMappeReadsinMillion1 $NumberOfMappeReadsinMillion2 2 0.01 housekeeping Gapdh > ${pair}.geneDE

# $Fisher ${pair}Tab.iso_estimates $NumberOfMappeReadsinMillion1 $NumberOfMappeReadsinMillion2 2 0.01 total > ${pair}.DE
# $Fisher ${pair}Tab.gene_estimates $NumberOfMappeReadsinMillion1 $NumberOfMappeReadsinMillion2 2 0.01 total > ${pair}.geneDE
echo "end run IsoDE"

The number highlighted in red (the value 2 in the $Fisher calls) is the specified fold change. It can be changed to suit each set of data, but the change takes effect only if the fold change in the formatGFOLD script is changed accordingly. The field highlighted in green (ENSMUST00000147954) is the transcript ID of the mouse Gapdh gene. When data from a different species are analyzed, this field should be changed to the species-specific Gapdh isoform that is expressed in the data.

ii. getGFOLDisoDEintersection

This script takes the intersection of the GFOLD and Fisher’s test results to select genes that pass a p-value threshold of 0.01 and have a fold change ≥ 2.

#!/bin/sh
pair=$1
subset=$2 # gene if for gene DE and nothing (do not supply parameter) if isoform DE
echo "In getGFOLDIsoDEIntersection"
echo $pair $subset

../bin/formatGFoldFiles gfold.${subset}out 1 # 1 will not eliminate DE calls based on fold change of GFOLD; it will be done on the intersection with isoDE; so it would be based on isoem FPKMs

rm -f ${pair}.${subset}DE.IdFpkm1Fpkm2DirDE
awk '{print $1, $2, $3, $5, $7}' ${pair}.${subset}DE > ${pair}.${subset}DE.IdFpkm1Fpkm2DirDE
rm -f gfold.${subset}out.IDDirDE
awk '{print $1, $5, $7}' gfold.${subset}out.1.isoDEFormat > gfold.${subset}out.IDDirDE
rm -f keymerge.tmp
key-merge ${pair}.${subset}DE.IdFpkm1Fpkm2DirDE gfold.${subset}out.IDDirDE > keymerge.tmp
rm -f ${pair}.${subset}DE.IdFpkm1Fpkm2DirDEGFoldDirDE
awk '{if (NF == 7) print $0}' keymerge.tmp > ${pair}.${subset}DE.IdFpkm1Fpkm2DirDEGFoldDirDE # eliminate lines with no isoDE output; isoDE was run only on genes expressed in both samples (FPKM > 0.1)

# merged columns: ID, FPKM1, FPKM2, isoDE direction, isoDE call, GFOLD direction, GFOLD call
rm -f ${pair}.${subset}IsoDEGFOLDOverExpIn1List
awk '{if ($4 == $6 && $5 == "DE" && $7 == "DE" && $2 > $3) print $1, $2, $3, "1", ($2/$3), "DE"}' ${pair}.${subset}DE.IdFpkm1Fpkm2DirDEGFoldDirDE > ${pair}.${subset}IsoDEGFOLDOverExpIn1List
rm -f ${pair}.${subset}IsoDEGFOLDOverExpIn2List
awk '{if ($4 == $6 && $5 == "DE" && $7 == "DE" && $3 > $2) print $1, $2, $3, "2", ($3/$2), "DE"}' ${pair}.${subset}DE.IdFpkm1Fpkm2DirDEGFoldDirDE > ${pair}.${subset}IsoDEGFOLDOverExpIn2List
rm -f ${pair}.${subset}IsoDEGFOLDNonDEList
awk '{if ($4 != $6 || $5 != "DE" || $7 != "DE") print $1, $2, $3, "nonDE"}' ${pair}.${subset}DE.IdFpkm1Fpkm2DirDEGFoldDirDE > ${pair}.${subset}IsoDEGFOLDNonDEList

if [ "$subset" != "gene" ]
then
    # isoform level: add the gene name for each transcript ID
    rm -f ${pair}_wgeneID.${subset}IsoDEGFOLDOverExpIn1List
    key-merge ../mm10Ens68TransID_GeneName.txt ${pair}.${subset}IsoDEGFOLDOverExpIn1List | awk '{if (NF == 7) print $0}' > ${pair}_wgeneID.${subset}IsoDEGFOLDOverExpIn1List
    rm -f ${pair}_wgeneID.${subset}IsoDEGFOLDOverExpIn2List
    key-merge ../mm10Ens68TransID_GeneName.txt ${pair}.${subset}IsoDEGFOLDOverExpIn2List | awk '{if (NF == 7) print $0}' > ${pair}_wgeneID.${subset}IsoDEGFOLDOverExpIn2List
    rm -f ${pair}_wgeneID.${subset}IsoDEGFOLDNonDEList
    key-merge ../mm10Ens68TransID_GeneName.txt ${pair}.${subset}IsoDEGFOLDNonDEList | awk '{if (NF == 5) print $0}' > ${pair}_wgeneID.${subset}IsoDEGFOLDNonDEList
else
    # gene level: extract the gene IDs of each list
    rm -f ${pair}.${subset}IsoDEGFOLD*List.GeneIDs
    awk '{print $1}' ${pair}.${subset}IsoDEGFOLDOverExpIn1List > ${pair}.${subset}IsoDEGFOLDOverExpIn1List.GeneIDs
    awk '{print $1}' ${pair}.${subset}IsoDEGFOLDOverExpIn2List > ${pair}.${subset}IsoDEGFOLDOverExpIn2List.GeneIDs
    awk '{print $1}' ${pair}.${subset}IsoDEGFOLDNonDEList > ${pair}.${subset}IsoDEGFOLDNonDEList.GeneIDs
fi

echo "End getGFOLDIsoDEIntersection"

The field highlighted in red has to be changed to the desired fold change, together with the corresponding changes in the ‘formatGFOLDfiles’ and ‘runFisher’ scripts, for the new fold change to take effect in the analysis. This script works directionally: a gene that GFOLD reports as “differentially expressed” is then checked against the p-value threshold (0.01) in the output of the runFisher script. Only genes that pass both checks become “DE genes”. If a gene is “DE” by GFOLD but does not pass the set p-value in Fisher’s test, the gene does not pass the DE test.
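The three-way split performed by getGFOLDIsoDEIntersection can be checked on a few merged rows. This is a minimal sketch with hypothetical gene names and values; the column layout (ID, FPKM1, FPKM2, Fisher direction, Fisher call, GFOLD direction, GFOLD call) follows the awk conditions in the script, while the function name `classify` and the output labels are illustrative.

```shell
#!/bin/bash
# Classify merged rows: ID FPKM1 FPKM2 FisherDir FisherCall GFOLDDir GFOLDCall
classify() {
    awk '{
        # both methods call DE and agree on the direction
        both = ($4 == $6 && $5 == "DE" && $7 == "DE")
        if (both && $2 > $3)      print $1, "OverExpIn1"
        else if (both && $3 > $2) print $1, "OverExpIn2"
        else if (!both)           print $1, "nonDE"
    }'
}

printf '%s\n' \
    'geneA 10 2 1 DE 1 DE' \
    'geneB 3 12 2 DE 2 DE' \
    'geneC 10 2 1 nonDE 1 DE' \
    'geneD 10 2 1 DE 2 DE' | classify
```

Here geneC is rejected because only GFOLD calls it DE, and geneD is rejected because the two methods disagree on the direction of change.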

iii. ExpressedNotExpressedAnalysis

This script uses a threshold of 1 to determine whether a gene is expressed (FPKM ≥ 1) or not (FPKM < 1).
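The ExpressedNotExpressedAnalysis script itself is not reproduced here, so the following is only a sketch of the thresholding rule described above. The function name `label_expression`, the label names and the toy FPKM values are all hypothetical.

```shell
#!/bin/bash
# Label each gene by the samples in which its FPKM reaches the threshold.
# Input columns: ID, FPKM in sample 1, FPKM in sample 2.
label_expression() {
    awk -v t="${1:-1}" '{
        e1 = ($2 >= t); e2 = ($3 >= t)
        if (e1 && e2) label = "both"
        else if (e1)  label = "only_sample1"
        else if (e2)  label = "only_sample2"
        else          label = "not_expressed"
        print $1, label
    }'
}

printf '%s\n' 'geneA 5.2 0.3' 'geneB 1 1' 'geneC 0.9 0.99' 'geneD 0 7' | label_expression
```

An FPKM of exactly 1 counts as expressed, matching the FPKM ≥ 1 rule stated in the text.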

iv. DivideOnlyGroupsToOneAndMultipleIsoforms

This script further subdivides the “Only” group of genes into genes with a single isoform and genes with multiple isoforms.

v. GetDIEGenes

vi. MergeFiles

It merges the output of the above mentioned scripts.

vii. LabelGeneSubgroupsInAllPartitionsFiles

This assigns the label to each gene by merging the output of the above scripts.

viii. GetStats

This gives the final number of genes in each category (bin) and sub-category, along with the number of mapped reads and the number of non-expressed genes.
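The GetStats script is not reproduced in this appendix. As a rough sketch of the kind of tally it reports, the snippet below counts genes per label; the function name `count_bins`, the two-column input format (gene ID, label) and the labels are assumptions made for illustration.

```shell
#!/bin/bash
# Count how many genes carry each label (second column) and sort by label.
count_bins() {
    awk '{n[$2]++} END {for (k in n) print k, n[k]}' | sort
}

printf '%s\n' 'g1 both' 'g2 both' 'g3 not_expressed' 'g4 only_sample1' | count_bins
```

The final `sort` is needed because awk does not guarantee any iteration order over array keys.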

1.8: Binning:

Once the genes are categorized based on their FPKM units and their differential expression, they are subjected to DAVID analysis to identify the functional enrichments of those genes.

2. Downstream analysis of the binned data


2.1: Finding the functional enrichment through DAVID analysis.

The Database for Annotation, Visualization and Integrated Discovery (DAVID) is a tool that provides functional enrichments for a given set of genes. Additionally, it reports other major biological functions related to the gene list and finds functionally similar genes in the species genome that are not listed in the query. DAVID also provides protein domain and motif information along with gene-disease associations.

It is one of the most sought-after data mining tools for investigating the biological meaning of RNAseq data. DAVID mines the available literature and studies to give functional enrichments for the genes. Usually, a gene list that has an underlying common feature, such as transcriptional kinetics, enriches for more meaningful functions than a random list of genes. For example, the most up- or down-regulated genes between two samples have been shown to be involved in specific, interesting biological processes compared to genes that do not change between the two. The functions enriched by DAVID can be validated either from a priori knowledge or by performing wet lab experiments. The size of the gene list is also an important factor in determining the statistical significance of a function: a larger gene list can yield a more statistically significant enrichment.

The advantages of DAVID include its flexibility to use different species backgrounds for analyses. Besides giving the functional enrichment with statistical significance, DAVID also provides annotation categories, protein-protein interactions, protein functional domains, disease associations, bio-pathways, sequence features, homology, and gene tissue expression.


Following are the steps to identify the functional enrichments of genes belonging to different bins:

1. Highlight the “Upload” tab.

2. Paste the gene list in the white space under “A. Paste a list”.

3. Choose “Ensembl_Gene_ID” in the drop-down box under “Step 2: Select Identifier”.

4. Choose “Gene list” under “Step 3: List type”. Hit “Submit list”.

5. Click “Submit to conversion tool”, which will be next to the drop-down box in option 1.

6. Another tab will open. Click on the red arrow next to “official_gene_symbol” under “convert all” under “Summary of Ambiguous gene IDs”.

7. Click “Submit converted list to DAVID as a gene list”. Give a name to the list instead of “new_converted_list”.

8. Go to the neighbouring tab that was already open; the “list” tab will now be highlighted.

9. Choose the “Mus musculus” species and click on “Select species”. If any species other than Mus musculus is selected (check the current background), go to the “Background” tab, choose “Mus musculus” from it and click “Use”.

10. Click “Functional Annotation Clustering”.

11. Repeat the step above: select the species again and click on “Functional Annotation Clustering” again.

12. Another window will open. Click on “Download file”. Save the text file under the same name you gave earlier.

2.2: Finding the potential interactors of genes involved in a pathway through GeneMANIA

GeneMANIA is a tool that finds genes related to a set of input genes by functional association. GeneMANIA mines data from the literature on genetic interactions, pathways, co-expression, protein domain similarity, protein-DNA and protein-protein interactions, gene and protein expression, phenotypic screening profiles, and co-localization studies. Sometimes, orthology studies are also used to define a partner of a gene in a complex or a pathway. The input set of genes is the “bait”, as it defines the rest of the potential partners.

GeneMANIA works best with at most 50 input genes and makes gene function predictions on the basis of Gene Ontology annotation patterns. The generated network is accompanied by its data sources, such as PubMed, BioGRID or Pfam.

Co-expression data are usually collected from Gene Expression Omnibus (GEO) datasets associated with a publication. Physical interaction data are mainly derived from protein-protein interaction studies in the primary literature and from databases such as BioGRID and Pathway Commons.

In genetic interaction data, two genes are functionally associated if perturbation of one gene affects the other gene. Two genes are also linked if they share a protein domain; this information is obtained from InterPro, SMART and Pfam. Co-localization is another way of linking two genes, when they are shown to be expressed in the same tissue or their gene products are present at the same cellular location. Connections between two genes also come from pathway data, where they are shown to participate in the same reaction within a pathway (data collected from Reactome, BioCyc and Pathway Commons).

Predicted connections are usually associated with orthology.

Following are the steps involved in identifying the network in GeneMANIA:

1. Input the list of genes, which is considered the “bait”.

2. Select the species.

3. The input genes are shown as black nodes and the new potential interactors as grey nodes.

4. Save the network and the list of genes.

The scripts for the above-mentioned customized bioinformatics pipeline were written by Dr. Sahar AlSeesi under the guidance of Dr. Ion Mandoiu.


Supplementary Information for Chapter 5

Supplementary figures – Additional file 1

Figure S1


Figure S1 (Related to Materials and Methods): Read position mismatch analysis. A – C. Reads obtained from deep RNA sequencing for E16CE (A), P0CE (B) and P0NE (C) were subjected to mismatch analysis by HardMerge alignment. The x-axis shows the read position (1 – 101) and the y-axis shows the percentage of reads with a mismatch.


Figure S2

Figure S2 (Related to Fig. 5.2): Transcriptome summary for E16CE-P0CE & P0CE-P0NE comparison: A & B. Transcriptome summary of the comparison between E16CE-P0CE and P0CE-P0NE, respectively. The white box represents all genes examined for this study. The Venn diagram within it represents the different bins. C. Alternative splicing (AS) status of “expressed in both” groups (OR_E16CE, Non_DR, OR_P0CE, and OR_P0NE) in E16CE-P0CE and P0CE-P0NE comparisons.


Figure S3

Figure S3 (Related to Fig. 5.4): Downstream analysis of genes belonging to the OR_P0NE bin enriching for the GOterm “Synapse”: A. Output of the downstream gene annotation analysis pipeline parts II and III for the OR_P0NE category, which enriched for synapse. B. Genes resulting from the DAVID analysis for the GOterm “synapse” were further subdivided into presynaptic or postsynaptic, along with the time point of their transcription initiation. Up (↑) and down (↓) arrows indicate upregulation or downregulation of transcription at that time point.


Figure S4

Figure S4 (Related to Fig. 5.5): Transcriptome summary for static & temporal comparison: A. Transcriptome summary of the comparison between P21-Nrl-WT and P21-Nrl-KO (Static comparison). The white box represents all genes examined for this study. The Venn diagram within it represents the different bins. B & C. Transcriptome summary of the comparison between P0 vs. P21-Nrl-WT and P0 vs. P21-Nrl-KO, respectively (Temporal comparison). The white box represents all genes examined for this study. The Venn diagram within it represents the different bins.


Supplementary tables

Legends:

Additional file 2: Table S1: Output of Binning in E16CE – P0CE and P0CE – P0NE comparisons: Custom bioinformatics pipeline and binning of E16CE-P0CE and P0CE-P0NE comparisons at the gene level (S1.1, S1.3) and isoform level (S1.2, S1.4) as discussed in the strategy in Fig. 1C.

Additional file 3: Table S2: DAVID analysis output for temporal comparisons: DAVID output for genes belonging to bins in E16CE-P0CE comparison (S2.1), P0CE-P0NE comparison (S2.2), P0 vs. P21-Nrl-WT comparison (S2.3) and P0 vs. P21-Nrl-KO comparison (S2.4).

Additional file 4: Table S3: Clusters obtained in the microarray analysis: Clusters generated through K-means clustering in Genesis for the microarray.

Additional file 5: Table S4: Output of Binning in P0 – P21WT and P0 – P21KO comparisons: Custom bioinformatics pipeline and binning of P0 (P0CE+P0NE) vs. P21WT-rep1, P0 vs. P21WT-rep2 and P0 vs. P21-WT-rep3 comparisons (S4.1) and P0 (P0CE+P0NE) vs. P21KO-rep1, P0 vs. P21KO-rep2 and P0 vs. P21-KO-rep3 comparisons (S4.2).

Additional file 6: Table S5: DE based analysis results for static and temporal comparisons: DE based analyses for static (P21WT vs. P21KO) (S5.1) and temporal (P0 vs. P21WT and P0 vs. P21KO) comparisons (S5.2, S5.3).

Additional file 7: Table S6: DAVID analysis output of DE based analyses for static and temporal comparisons: Results of DAVID analysis based on DE genes for P0 vs. P21WT (S6.1), P0 vs. P21KO (S6.2), and P21WT vs. P21KO (S6.3) comparisons.
