
Cell Cycle Regulation in the Florida Red Tide Dinoflagellate, Karenia Brevis


Medical University of South Carolina MEDICA

MUSC Theses and Dissertations

2011

Cell Cycle Regulation in the Florida Red Tide Dinoflagellate, Karenia brevis

Stephanie Alexandra Brunelle

Medical University of South Carolina

Follow this and additional works at: https://medica-musc.researchcommons.org/theses

Recommended Citation

Brunelle, Stephanie Alexandra, "Cell Cycle Regulation in the Florida Red Tide Dinoflagellate, Karenia brevis" (2011). MUSC Theses and Dissertations. 176. https://medica-musc.researchcommons.org/theses/176

This Dissertation is brought to you for free and open access by MEDICA. It has been accepted for inclusion in MUSC Theses and Dissertations by an authorized administrator of MEDICA. For more information, please contact [email protected].

Cell Cycle Regulation in the Florida Red Tide Dinoflagellate, Karenia brevis

By

Stephanie Alexandra Brunelle

A dissertation submitted to the faculty of the Medical University of South Carolina in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the College of Graduate Studies.

Department of Molecular and Cellular Biology and Pathobiology

2011

Approved by:

Chairman, Advisory Committee

TABLE OF CONTENTS

ACKNOWLEDGMENTS

LIST OF FIGURES

LIST OF TABLES

LIST OF ABBREVIATIONS

ABSTRACT

CHAPTER 1: Introduction

CHAPTER 2: Global protein and translation patterns in Karenia brevis over the diel cycle

Introduction

Methods

Results

Discussion

CHAPTER 3: Characterization and regulation of S-phase and proteins in K. brevis

Introduction

Methods

Results

Discussion

CHAPTER 4: S-phase Expression of PCNA: Possible Post-translational Modifications, CDK Dependence, and its Potential Utility as a Biomarker of Growth Status in Karenia brevis Red Tides

Introduction

Methods

Results

Discussion

CHAPTER 5: Conclusions, Significance, and Future Directions

REFERENCES

APPENDIX 1

APPENDIX 2

ACKNOWLEDGMENTS

I would like to extend a heartfelt thank you to my mentor, Dr. Fran Van Dolah. Throughout this endeavor, she provided wonderful support and training. I would also like to thank my committee members, who provided advice and shared knowledge that made this project successful: Dr. Michael Janech, Dr. Lauren Ball, Dr. David Kurtz and Dr. Paul McDermott. The members of the Van Dolah laboratory were immensely helpful and were not only wonderful colleagues but dear friends as well: Jeanine Morey, Emily Monroe, Jillian Johnson, Mackenzie Zippay and Peter Feltman. I would also like to thank the members of the Marine Biotoxins Program, including Tod Leighfield, and the Marine Biomedicine and Environmental Sciences students. Lastly, I need to thank my family, Alyce and Roger Brunelle, who provided the love and encouragement I required to complete this PhD program.

LIST OF FIGURES

CHAPTER 1

FIGURE 1: Tree of life

CHAPTER 2

FIGURE 1: Polysome profile of L. polyedrum over circadian time

FIGURE 2: Polysome profile of K. brevis cultures during the light and dark phase

FIGURE 3: Flow cytometry over the K. brevis cell cycle: polysome experiment

FIGURE 4: Analysis of translational activity over the cell cycle/diel cycle of K. brevis: polysome fractionation

FIGURE 5: Two-dimensional gels of K. brevis proteins: extraction optimization

FIGURE 6: Flow cytometry over the K. brevis cell cycle: 2D study

FIGURE 7: 2D gels of K. brevis proteins over the cell cycle

FIGURE 8: Flow cytometry over the K. brevis cell cycle: DIGE study

FIGURE 9: 2D gels: resolved proteins labeled with Cy2, Cy3 and Cy5

FIGURE 10: Significantly changing proteins as determined by DIGE in K. brevis

FIGURE 11: Scans of experimental gels: resolved proteins stained with SYPRO Ruby

FIGURE 12: Trends of significantly changing proteins in K. brevis

FIGURE 13: 2DE gels of K. brevis nuclear proteins

CHAPTER 3

FIGURE 1: Multiple alignments of the sequences of S-phase proteins

FIGURE 2: Phylogenetic tree of PCNA

FIGURE 3: Three-dimensional protein models of K. brevis PCNA

FIGURE 4: Spliced leader sequences on S-phase transcripts

FIGURE 5: Flow cytometry over the cell cycle

FIGURE 6: qPCR analysis of S-phase transcripts over the cell cycle in K. brevis

FIGURE 7: Western blot analysis of S-phase over the cell cycle in K. brevis

FIGURE 8: Peptide competition assay of anti-K. brevis PCNA

FIGURE 9: Immunolocalization of PCNA

FIGURE 10: Flow cytometry of the cell population after an extended dark phase

CHAPTER 4

FIGURE 1: Multiple alignments of the SUMO and ubiquitin polypeptides

FIGURE 2: K. brevis PCNA amino acid sequence: analysis of potential sumoylation and ubiquitin sites

FIGURE 3: The ubiquitin and SUMO proteins in K. brevis

FIGURE 4: Immunoprecipitation of K. brevis PCNA

FIGURE 5: Immunoprecipitation of K. brevis PCNA, western blot with anti-SUMO antibody

FIGURE 6: Growth curve and flow cytometry data of K. brevis in 125 mL flasks

FIGURE 7: Western blot analysis of PCNA over the K. brevis growth curve

FIGURE 8: Western blot analysis of PCNA in log and stationary phase cultures

FIGURE 9: Pattern of ubiquitination in K. brevis stationary phase cultures

FIGURE 10: Full length nucleotide sequence of a cyclin-dependent kinase in K. brevis

FIGURE 11: Amino acid sequence of full length K. brevis cyclin-dependent kinase

FIGURE 12: Partial alignment of K. brevis CDKs: cyclin binding motif

FIGURE 13: Cyclin-dependent kinase inhibitors (CDKIs), olomoucine and fascaplysin

FIGURE 14: SYTOX staining of K. brevis

FIGURE 15: PCNA protein abundance throughout S-phase

FIGURE 16: PCNA protein abundance at LDT12 after inhibition with CDKIs

FIGURE 17: Linear correlation of % S-phase and PCNA protein abundance

LIST OF TABLES

CHAPTER 2

Table 1: Proteins found to significantly change over the diel cycle using DIGE

Table 2: Nuclear protein identifications using tandem MS (LTQ)

CHAPTER 3

Table 1: Primers used for qPCR and SL-specific PCR

Table 2: Peptides used for antibody development

Table 3: Prevalence and diversity of selected S-phase specific genes represented in the K. brevis library

Table 4: Amino acid polymorphisms in full length PCNA sequences from K. brevis

CHAPTER 4

Table 1: Sequences from K. brevis with homology to cyclin dependent kinases and cyclins

Table 2: Fascaplysin range finder: Flow cytometry and SYTOX results

Table 3: Olomoucine range finder: Flow cytometry and SYTOX results

Table 4: Cyclin Dependent Kinase Inhibitor study: Flow cytometry and SYTOX results

LIST OF ABBREVIATIONS

2-DE two dimensional electrophoresis

CAK cyclin-dependent kinase activating kinase

CDK cyclin dependent kinase

CDKI cyclin dependent kinase inhibitor

DIGE difference in-gel electrophoresis

eIF eukaryotic translation initiation factor

EST expressed sequence tag

HAB harmful algal bloom

MALDI TOF matrix-assisted laser desorption/ionization time of flight

LTQ linear trap quadrupole

PCNA proliferating cell nuclear antigen

PCR polymerase chain reaction

pRb retinoblastoma protein

qPCR quantitative polymerase chain reaction

RFC replication factor C

RP restriction point

RPA replication protein A

RnR ribonucleotide reductase

SL spliced leader

SUMO small ubiquitin-like modifier

ABSTRACT

The dinoflagellate Karenia brevis produces harmful algal blooms in the Gulf of Mexico that cause extensive marine animal mortalities and human illness nearly annually. The molecular mechanisms controlling cell cycle entry in this dinoflagellate are important because bloom development occurs through asexual cell division. The cell cycle of K. brevis is phased to the diel cycle such that any cells entering the cell cycle on a given day do so synchronously. However, a microarray study over a diel/cell cycle found no change in cell cycle transcript levels in actively dividing populations, including genes that code for replication fork proteins, which are typically activated in eukaryotes by transcription at the G1/S transition. There are many examples of post-transcriptional regulation of physiological processes in dinoflagellates, and this suggests that cell cycle progression may also be regulated post-transcriptionally. The studies presented in this dissertation address the question of whether the cell cycle in K. brevis is regulated at the level of translation. We first characterized the global pattern of translation activity over the diel cycle using polysome profiling. We found that, unlike Lingulodinium polyedrum, a dinoflagellate widely used to study diurnal rhythms, translation was not limited to the dark phase. We next used a 2-DE gel proteomic approach to identify diel patterns of global and specific changes in protein expression, with particular interest in cell cycle proteins. Although a number of proteins displayed diel patterns of expression, we were unable to characterize the behavior of cell cycle proteins using this method. We therefore selected four DNA replication fork genes (PCNA, replication factor C, replication protein A, and ribonucleotide reductase 2) that are classically regulated via transcription at the G1/S phase transition, and characterized their expression by qPCR, western blotting and immunolocalization. Sequence analysis revealed a 5' spliced leader sequence on each of these transcripts, indicating that they are likely under post-transcriptional control, and qRT-PCR confirmed that their transcript levels were indeed unchanged over the cell cycle. Sequence analysis and protein modeling were next used to develop peptide antibodies for western blotting to investigate their protein abundance over the cell cycle. The K. brevis replication fork proteins PCNA, RFC, RPA and RnR2 were shown to change over the cell cycle with highest expression at S-phase, suggesting translational control. Furthermore, a post-translational modification appeared to occur on the K. brevis PCNA protein in S-phase cells, resulting in a ~9 kDa increase in size. We hypothesize that this represents modification by either SUMO or ubiquitin, two post-translational modifications observed on PCNA in other eukaryotes during S-phase. The occurrence of this post-translational modification coincided with a shift in nuclear location of PCNA from peripheral to bound, as determined by immunolocalization. Since the expression of replication fork proteins appears to change over the cell cycle without changes in transcription, CDKs were investigated to begin to understand how these drivers of the cell cycle may be controlling S-phase entry at the translational level. Both the CDK4-specific inhibitor, fascaplysin, and the pan-CDK inhibitor, olomoucine, inhibited progression through the G1/S transition as well as the production of the PCNA protein. These results lead us to propose that the dinoflagellate utilizes CDK-dependent translational control of cell cycle entry, as opposed to the transcriptional control seen in most eukaryotes.

Chapter 1: Introduction

Dinoflagellates are ancient, unicellular eukaryotes. Along with ciliates and apicomplexans, they are members of the superphylum Alveolata. To place the unusual molecular characteristics of K. brevis in an evolutionary context, it is useful to note that most of the eukaryotes on the planet are microbial, in terms of abundance and evolutionary diversity, and make up the vast majority of the phylogenetic tree of eukaryotes (Lukes et al., 2009). The position of dinoflagellates in the tree of life (Figure 1) shows that they have a long evolutionary history of 900 million years, during which they have solved biological problems in remarkable ways that often differ from the models of eukaryotic cell biology (Lukes et al., 2009).

Dinoflagellates possess an unusually large amount of DNA in their haploid genome, up to 1 x 10^12 bp, or ~6 times the human genome (Taylor, 1987). K. brevis has a genome of approximately 1 x 10^11 bp contained within 121 chromosomes (Lidie et al., 2005 and references therein). Dinoflagellates represent the only eukaryotic phylum that lacks histones that typically aid in chromatin packaging. Their DNA is fully eukaryotic in sequence organization, with large amounts of repetitive DNA (Minguez et al., 1994), but is unusual in that many genes occur in large arrays of tandemly repeated copies. It is thought that the high copy number of genes and the expansion of the genome are due to gene or whole genome duplication events as well as possible retroposition of cDNA back into the genome (Lin, 2011). Dinoflagellate genomes have also undergone extensive horizontal gene transfer, which is due to multiple endosymbiotic events in which a dinoflagellate ancestor engulfed both a prokaryote (genes transferred from plastid to nucleus) and eukaryotes (nucleus to nucleus gene transfer) (Nosenko and Bhattacharya, 2004; Hackett et al., 2004; Andersson, 2005). To date, unlike in other eukaryotes with large genomes, few introns have been identified in dinoflagellate genes. The chromosomes of dinoflagellates exist in a liquid crystal state in which a core is tightly condensed and thought to be made up of non-coding sequence, while peripheral arches of exposed DNA are thought to be areas undergoing active transcription (Costas and Goyanes, 2005). The large sizes of dinoflagellate genomes have to date made whole genome sequencing impractical. However, genes for many conserved cellular processes have been found through dinoflagellate EST sequencing projects (Amphidinium carterae and Lingulodinium polyedrum, Bachvaroff et al., 2004; K. brevis, Lidie et al., 2005; Alexandrium tamarense, Hackett et al., 2005; Symbiodinium, Leggat et al., 2007).

The majority of dinoflagellates are marine phytoplankton that are important contributors to the base of the food web and can grow to high abundance, forming "blooms". The blooms of certain dinoflagellates are termed harmful algal blooms (HABs) because the potent toxins they produce, when present at high concentrations, have adverse effects on human and ecosystem health. The unarmored, fucoxanthin-containing dinoflagellate, Karenia brevis, produces the lipophilic neurotoxins known as brevetoxins and is responsible for causing red tides in the Gulf of Mexico, particularly off the west coast of Florida. Both humans and marine fauna are detrimentally affected by the ingestion and inhalation of brevetoxins produced by K. brevis. In Florida, red tides cost an estimated US$ 20 million in remediation and lost tourism revenue annually (Anderson et al., 2000). Although toxic red tides have been observed in Florida since the 1840s, the occurrence of these HABs appears to have increased in frequency and extent over the past quarter century (Hallegraeff, 1993; Van Dolah, 2000; Anderson et al., 2002). Therefore, understanding the mechanisms that control the proliferation of dinoflagellates is critical in order to develop the capacity to predict the occurrence, monitor the development, and mitigate the impacts of blooms.

The development of a harmful algal bloom can be divided into four sequential stages: initiation, growth, maintenance and termination (Steidinger et al., 1998). K. brevis is found in background concentrations (1-1000 cells liter^-1) throughout the Gulf of Mexico (Tester and Steidinger, 1997; Steidinger et al., 1998). Initial growth of blooms occurs offshore at depth, where K. brevis can outcompete faster growing phytoplankton because of its low light adaptation and its ability to swim directionally to acquire nutrients near the bottom, including inorganic as well as organic sources of nitrogen and phosphorus (Sinclair et al., 2009; Stumpf et al., 2008). Recent studies on circulation patterns in the Gulf of Mexico suggest that K. brevis blooms may be initiated by the transport of nutrients from the northern Gulf of Mexico onto the West Florida Shelf in the summer, via circulation driven by normal summer northward winds, with the Mississippi River as a source of these nutrients at depth (Stumpf et al., 2008). Another hypothesis is that nutrient input from large scale processes such as dust storms originating in the Sahara desert is involved in bloom initiation (Steidinger et al., 1998; Walsh and Steidinger, 2001). Once initiated, K. brevis exhibits a low division rate, averaging 0.3 div/day in both field and laboratory populations (Heil et al., 2001; Magana and Villareal, 2006; Richardson, 2000; Richardson et al., 2006; Redalje et al., 2008). K. brevis is able to adapt to a broad range of light intensities (Evans et al., 2001; Schaffer et al., 2007), and aggregated cells can occur in daylight where nutrients from coastal water egress or fish kills can maintain populations in near surface waters (Walsh et al., 2004). Lab studies have shown that K. brevis actively seeks nutrients, and these findings support the hypothesis that offshore, near-bottom aggregates of K. brevis cells may be the seed populations for bloom initiation, since upwelling and winds transport these populations toward the coast to form near surface blooms (Janowitz and Kamykowski, 2006).
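To give a sense of what an average division rate of 0.3 div/day implies, the following is a minimal sketch that assumes simple, uninterrupted exponential growth, N(t) = N0 x 2^(rate x t), with no losses from grazing, mortality or advection. The starting concentration is the upper end of the background range given above; the target concentration and the function names are hypothetical and chosen only for illustration.

```python
import math


def doubling_time_days(divisions_per_day):
    """Days needed for the population to double at a constant division rate."""
    return 1.0 / divisions_per_day


def days_to_reach(start_cells_per_l, target_cells_per_l, divisions_per_day):
    """Days of uninterrupted exponential growth needed to go from start to target.

    Assumes N(t) = N0 * 2**(rate * t) with no cell losses (an idealization).
    """
    doublings = math.log2(target_cells_per_l / start_cells_per_l)
    return doublings / divisions_per_day


RATE = 0.3          # div/day, the average division rate cited in the text
BACKGROUND = 1_000  # cells per liter, upper end of the background range in the text
TARGET = 100_000    # cells per liter, hypothetical value used only for illustration

print(round(doubling_time_days(RATE), 2))                  # days per doubling
print(round(days_to_reach(BACKGROUND, TARGET, RATE), 1))   # days to reach TARGET
```

Under these assumptions the population doubles roughly every three days, which illustrates why physical transport and accumulation, as well as cell division, matter for bloom formation.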

Although extensive research has gone into understanding the oceanographic conditions that permit the initiation and dispersal of K. brevis blooms, the molecular mechanisms controlling cell proliferation in this dinoflagellate remain poorly understood.

Since growth of K. brevis results almost entirely from vegetative cell division, cell cycle entry is a key regulator of bloom progression. Therefore, in order to understand the factors affecting the growth of these blooms and to potentially develop a growth-specific biomarker, it is necessary to understand the cell cycle of this organism.

The Influence of Photoperiod and Circadian Rhythms in Dinoflagellate Biology

Many dinoflagellate cellular and molecular processes are not only adapted to the photoperiod, but display circadian entrainment. Numerous studies in dinoflagellates have shown that the cell cycle is linked to the photoperiod (Sweeney and Hastings, 1958; Olsen and Chisholm, 1986; Homma and Hastings, 1989; Roenneberg, 1996; Taroncher-Oldenburg, 1997; Van Dolah and Leighfield, 1999; Leighfield and Van Dolah, 2001; Litaker et al., 2002; Dagenais-Bellefeuille et al., 2008), resulting in synchronous cell cycle progression in populations dividing at a rate of one division per day. In "phased" populations, cell division rates are [...] Lingulodinium polyedrum (formerly Gonyaulax polyedra), cell division is restricted to the early hours of the subjective morning (Roenneberg, 1995). In Amphidinium operculatum, S-phase begins 10 hours after the onset of light and mitosis begins 14-16 hours after this cue (Leighfield and Van Dolah, 2001). A. operculatum is dawn cued, such that a delay in the dark/light transition results in an equivalent delay in cell cycle phases. In contrast, Gambierdiscus toxicus was found to be dusk cued, with mitosis occurring 6 hours after the onset of the dark phase (Van Dolah et al., 1995).

In K. brevis, the dark to light transition serves to entrain the cell cycle, with S-phase beginning approximately 6 hours after the onset of light, as determined by flow cytometry (Van Dolah and Leighfield, 1999). When placed in continuous light, K. brevis continues to proceed through the cell cycle in a phased manner, and the proportion of the cell population progressing through the cell cycle increases to 70%. This formally demonstrates that the cell cycle is under the control of a circadian rhythm (Brunelle et al., 2007). Although the cell cycle behavior of K. brevis has been characterized, the molecular mechanisms controlling cell cycle progression and the genes and proteins involved in these processes are not as well described and warrant further investigation.
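The idea of a dawn-cued cell cycle can be illustrated with a small sketch: if S-phase onset is a fixed number of hours after the dark-to-light transition, then delaying that transition delays S-phase by the same amount. The offsets below are the values stated in the text; the clock times, the date and the function name are hypothetical and serve only as an example.

```python
from datetime import datetime, timedelta

# Hours after the dark-to-light transition at which S-phase begins,
# as stated in the text for two dawn-cued species.
S_PHASE_OFFSET_H = {
    "Karenia brevis": 6,
    "Amphidinium operculatum": 10,
}


def s_phase_onset(species, lights_on):
    """Predicted S-phase onset for a dawn-cued species, given the lights-on time."""
    return lights_on + timedelta(hours=S_PHASE_OFFSET_H[species])


# Hypothetical light schedule: lights on at 06:00, then delayed by 3 hours.
normal_dawn = datetime(2011, 1, 1, 6, 0)
delayed_dawn = normal_dawn + timedelta(hours=3)

shift = (s_phase_onset("Karenia brevis", delayed_dawn)
         - s_phase_onset("Karenia brevis", normal_dawn))
print(shift)  # the S-phase delay equals the 3-hour delay in dawn
```

A dusk-cued species would be modeled the same way, with the offset measured from the light-to-dark transition instead.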

The Dinoflagellate Cell Cycle

Dinoflagellates have a eukaryotic cell cycle consisting of a G1 phase, characterized by growth; S-phase, consisting of DNA replication; a G2 phase, also characterized by growth as well as editing of DNA; and mitosis, or cell division (Vaulot et al., 1986; Van Dolah et al., 2007; Brunelle et al., 2007). However, they have acquired some unique characteristics, such as permanently condensed chromosomes, a nuclear envelope that does not break down during mitosis, and a mitotic spindle that is extranuclear (Taylor, 1987), which suggest that there may be unique adaptations to some of the classical eukaryotic cell cycle regulatory processes. Nonetheless, many conserved cell cycle genes appear to be present in dinoflagellates, based on EST library analysis, including the central cell cycle regulatory complexes made up of cyclins and cyclin dependent kinases (CDKs).

CDKs regulate two checkpoints in the eukaryotic cell cycle, the G1/S transition and the entry into mitosis. Differential transcript abundance of cell cycle genes is centrally driven by the phosphorylation activity of CDK at the G1/S and G2/M checkpoints. At S-phase entry, CDKs are activated by the binding of newly expressed cyclin, upon which both proteins are phosphorylated. The activated CDK/cyclin complex then initiates DNA synthesis by phosphorylating the retinoblastoma protein (pRb). pRb represses the gene transcription required for the G1 to S phase transition by binding to the transactivation domain of the E2F transcription factor (Giacinti and Giordano, 2006). In late G1, when the CDK/cyclin complex phosphorylates pRb, pRb releases E2F, allowing the expression of genes that encode the proteins necessary for S-phase progression. Surprisingly, a microarray study of the global transcriptome in K. brevis over the cell cycle reported the absence of any change in the transcript levels of either S-phase or M-phase specific cell cycle genes (Van Dolah et al., 2007). Further evidence of post-transcriptional regulation of cell cycle genes in K. brevis comes from PCR data, which show that the spliced leader sequence is present in all cell cycle genes examined to date. Additionally, no members of the E2F transcription factor family have been identified in dinoflagellates to date. We therefore hypothesize that the cell cycle of K. brevis is controlled at the translational or post-translational level.

Post-transcriptional Control of Gene Expression in Dinoflagellates

Processes that are under transcriptional control in most eukaryotes, including the expression of stress response genes and cell cycle genes, are not transcriptionally controlled in K. brevis, as demonstrated in microarray studies (Van Dolah et al., 2007; Lidie, 2007). A possible explanation for the predominance of translational control in K. brevis came from a study by Lidie and Van Dolah (2007) that identified an identical 22 bp 5' spliced leader sequence on a diverse set of K. brevis mRNAs, indicative of a trans-splicing mechanism. The spliced leader trans-splicing mechanism was first characterized in trypanosomes (Murphy, Atkins and Agabian, 1986), which are kinetoplastid parasites only distantly related to dinoflagellates. In trypanosomes, transcription is constitutive and generates polycistronic messages. These polycistronic transcripts are processed to mature messages when the mRNA splicing machinery binds to polypyrimidine stretches located in the intergenic regions of the primary transcripts. Trans-splicing of a 37 nt spliced leader cleaves these regions and simultaneously results in capping and polyadenylation (Vanhamme and Pays, 1995). It is during and after these mRNA processing steps that the expression of individual genes is controlled. Through this process, as well as differential rates of translation and RNA degradation, rapid changes in protein levels can occur (Archer et al., 2011). All mRNAs from trypanosomes are substrates for the trans-splicing reaction. Since its initial discovery in trypanosomes, trans-splicing has been found in a variety of protists and some metazoans, which display smaller, varying percentages of mRNAs that are processed through trans-splicing (Davis, 1996). Zhang et al. (2007) have reported that this spliced leader (SL) sequence is common to all dinoflagellates, and present on most if not all nuclear-encoded mRNAs. Thus it appears that post-transcriptional control plays a major role in dinoflagellate gene expression.
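As a rough illustration of how the presence of a conserved 5' spliced leader might be screened for in a set of transcript sequences, the sketch below checks whether each sequence begins with a given 22-nt leader, optionally tolerating a few mismatches. This is not the method used in the studies cited above. The leader shown is a hypothetical placeholder rather than the real dinoflagellate SL sequence, and the function names are illustrative.

```python
# Hypothetical 22-nt placeholder; NOT the real dinoflagellate spliced leader.
PLACEHOLDER_SL = "ACGTACGTACGTACGTACGTAC"


def has_spliced_leader(transcript, leader=PLACEHOLDER_SL, max_mismatches=0):
    """Return True if the 5' end of the transcript matches the leader.

    The first len(leader) bases are compared position by position and at most
    max_mismatches differences are tolerated. Transcripts shorter than the
    leader never match.
    """
    prefix = transcript[:len(leader)].upper()
    if len(prefix) < len(leader):
        return False
    mismatches = sum(1 for a, b in zip(prefix, leader.upper()) if a != b)
    return mismatches <= max_mismatches


def fraction_with_leader(transcripts, **kwargs):
    """Fraction of transcripts whose 5' end carries the leader."""
    if not transcripts:
        return 0.0
    hits = sum(has_spliced_leader(t, **kwargs) for t in transcripts)
    return hits / len(transcripts)


# Toy sequences: one carries the placeholder leader, one does not.
example = [PLACEHOLDER_SL + "GGGATGAAA", "GGGATGAAACCCTTTGGGAAACCC"]
print(fraction_with_leader(example))
```

In practice, EST or cDNA reads are often truncated at the 5' end, so a real screen would also need to handle partial leaders; that is left out of this sketch.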

Translational Control in Dinoflagellates

Although there are steps between transcription and translation that are possible levels of control, such as differential mRNA stability, there are numerous examples of translational control of specific proteins in dinoflagellates, where mRNA levels are constant but protein levels change. In L. polyedrum, luciferin-binding protein (Mittag et al., 1994), NADP-dependent isocitrate dehydrogenase (Akimoto et al., 2005) and GAPDH (Fagan et al., 1999) are all controlled at the translational level. These processes are all under circadian control. Iron-superoxide dismutase (Okamoto et al., 2001) and nitrate reductase (Ramalho et al., 1995) were also shown to be regulated at the translational level by a circadian rhythm in L. polyedrum. To better understand the dynamics of translational regulation over the circadian cycle, studies have been carried out investigating circadian rhythms in translation. In L. polyedrum, the bulk production of polysomes was shown to be under circadian control, with bound polysomes and protein synthesis occurring mainly during the subjective night when cells were placed in constant dim light (Schroder-Lorenz and Rensing, 1987). The presence of this rhythm in L. polyedrum further complicates understanding how specific cell cycle proteins might be regulated in a timely manner that does not coincide with the circadian-controlled pattern of nighttime bulk translation.

The studies presented in this dissertation were undertaken to further our understanding of cell cycle regulation in the Florida red tide dinoflagellate, Karenia brevis, by addressing three specific aims:

1. To characterize the extent and patterns of changes in protein expression and translation over the diel cycle.

2. To characterize S-phase specific genes and determine if their proteins are regulated at the level of translation.

3. To identify possible post-translational modifications of PCNA observed in S-phase, determine if the expression of PCNA is CDK dependent, and examine its utility as a biomarker for growth in K. brevis blooms.

To better understand the extent to which gene expression is regulated at the level of protein synthesis in K. brevis, Chapter 2 first investigates circadian patterns of protein translation using polysome profiling. A proteomic approach is then used to examine global changes in protein abundances over the diel cycle, and nuclear proteomic methods are optimized to improve the study of cell cycle proteins in the future. Chapter 3 then investigates the expression and behavior of specific S-phase proteins involved in the DNA replication fork, using western blotting and immunolocalization. These studies reveal that these S-phase proteins are regulated at the level of translation, as opposed to the transcriptional control found in most eukaryotes, and identify some apparent post-translational modifications to PCNA. The studies in Chapter 4 focus on PCNA, to determine the CDK dependence of this S-phase protein. Chapter 4 also explores the post-translational modification mechanisms, ubiquitin and SUMO, identified for the first time in a dinoflagellate, that may be responsible for changes in localization and molecular weight of K. brevis PCNA during S-phase. Lastly, Chapter 4 demonstrates the potential of PCNA to be employed as a biomarker for the growth status of toxic blooms of K. brevis. Finally, Chapter 5 provides a synthesis of the above studies, draws conclusions from the results, and identifies future directions for this research.

Figure 1: Tree of life showing the placement of the dinoflagellates within the eukaryotic crown group Alveolata. This group includes the dinoflagellates (outlined in red), ciliates and apicomplexans. Adapted from Baldauf, 2003.

Chapter 2: Global protein and translation patterns in Karenia brevis over the diel cycle

INTRODUCTION

Many physiological processes appear to be under post-transcriptional control in dinoflagellates. The best studied examples to date include proteins shown to be under circadian control, in particular those involved in bioluminescence in the model dinoflagellate, Lingulodinium polyedrum. In L. polyedrum, circadian expression of genes involved in bioluminescence, such as luciferin-binding protein and luciferase, is controlled by changes in rates of protein translation while mRNA levels remain constant (Morse et al., 1989; Mittag et al., 1998). Differential expression of luciferin-binding protein and luciferase over circadian time is controlled by the RNA-binding protein CCTR (circadian-controlled translational repressor), which binds to the 3' untranslated region (UTR) and prevents translation (Mittag, 1996). The binding of CCTR to RNA is under circadian control and its activity is highest during the light phase. During the dark phase, CCTR releases the mRNAs of luciferin-binding protein and luciferase, allowing translation to occur. Other genes under circadian, translational control in dinoflagellates include GAPDH (Fagan et al., 1999), peridinin-chl a-binding protein (Le et al., 2001), and superoxide dismutase (Okamoto et al., 2001). These proteins show highest abundance during the day. Therefore, it appears that different proteins in dinoflagellates have distinct circadian patterns.

Circadian Rhythms

Circadian rhythms are defined as oscillations in biochemical, physiological and behavioral functions of an organism with a periodicity of approximately 1 day (Sancar et al., 2000). Evolutionarily, circadian rhythms confer adaptive advantage, allowing organisms to anticipate and prepare for environmental changes (Hastings, 2003), and this has been demonstrated in some organisms, such as cyanobacteria (Ouyang et al., 1998) and ground squirrels (DeCoursey et al., 1997). Unicellular, photosynthetic, marine organisms were most likely the first to evolve a circadian clock, utilizing the 24-hour periodicity of their early environment to optimize their use of energy resources and minimize their exposure to damaging UV irradiation (Roenneberg et al., 1997). Circadian oscillations function at the cellular level, even in multicellular organisms, where oscillations in one cell type may differ from those in another (Roenneberg and Mittag, 1996). There are three main components of the circadian clock: a central oscillator that regulates the 24-hour periodicity, an input pathway that entrains the oscillator to environmental cues such as light, and an output pathway that couples the oscillator to circadian responses (Christie and Briggs, 2001). The photoperiod is the main external signal, or zeitgeber (German for "time giver"), in setting circadian rhythms (Roenneberg et al., 1997). Although light is what entrains the rhythm, a rhythm is identified as circadian only if it continues in the absence of the light cue or other external cues such as temperature.
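One common way to test whether a measured rhythm has a period of approximately one day is to fit a cosine curve to a time series collected under constant conditions. The sketch below is a minimal, assumption-laden illustration of such a "cosinor" fit, not a procedure taken from the studies cited above: for each candidate period it fits y = m + a cos(2 pi t / T) + b sin(2 pi t / T) by least squares and keeps the period with the smallest residual. The data are synthetic and noise-free, and all names are hypothetical.

```python
import math


def solve3(a, b):
    """Solve a 3x3 linear system a x = b by Gaussian elimination with pivoting."""
    n = 3
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def cosinor_rss(times, values, period):
    """Residual sum of squares of the least-squares fit
    y ~ m + a*cos(2*pi*t/period) + b*sin(2*pi*t/period)."""
    w = 2.0 * math.pi / period
    rows = [(1.0, math.cos(w * t), math.sin(w * t)) for t in times]
    # Normal equations: (A^T A) coef = A^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    coef = solve3(ata, aty)
    return sum((y - sum(c * x for c, x in zip(coef, r))) ** 2
               for r, y in zip(rows, values))


def estimate_period(times, values, candidates):
    """Candidate period (hours) whose cosinor fit has the smallest residual."""
    return min(candidates, key=lambda p: cosinor_rss(times, values, p))


# Synthetic, noise-free data: sampled every 2 h for 72 h, true period 24 h.
times = [2.0 * k for k in range(36)]
values = [10.0 + 3.0 * math.cos(2.0 * math.pi * (t - 6.0) / 24.0) for t in times]
candidates = [18.0 + 0.5 * k for k in range(25)]  # 18.0 to 30.0 h in 0.5 h steps

print(estimate_period(times, values, candidates))
```

With real measurements, noise and damping would have to be considered, and a period near 24 h alone would not establish that a rhythm is circadian; persistence under constant conditions, as described above, is the defining criterion.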

A number of intracellular processes have been shown to have a circadian pattern.

In plants, many processes that are controlled by the circadian clock include gene

2 expression, Ca + levels, and some activities (Somers, 1999). Circadian regulation of gene expression may occur at the level of transcription, transcript abundance, translation and/or posttranslational processing (Somers, 1999). A microarray study of showed that expression of approximately 2% of genes 15 cycled with a circadian rhythm (Schaffer et aI., 2001). Circadian rhythms regulate bioluminescence, cell aggregation, photosynthesis and cell division in dinoflagellates.

Similar to other systems, dinoflagellate RNA and protein synthesis are required for the maintenance of circadian rhythms, and protein phosphorylation events play a role in determining period length (Wijnen and Young, 2006). Microarray analysis of circadian gene expression in the dinoflagellates Pyrocystis lunula and K. brevis showed approximately 2 and 3% of genes changing, respectively, over 24 h in continuous light (Okamoto and Hastings, 2003; Van Dolah et al., 2007). However, the expression patterns did not reveal any concerted changes in transcript levels associated with many known circadian processes. Since all circadian regulated genes studied to date are under translational control in dinoflagellates, it may be that these organisms use translational control for all circadian regulated functions. The degree to which dinoflagellates use translational control in the cell overall is not known. Until recently, it was accepted that circadian rhythms in most eukaryotes are transcriptionally controlled. However, it was recently found that in the eukaryotic green alga Ostreococcus tauri, circadian rhythms persisted in the absence of transcription (O'Neill et al., 2011). The recent discovery of a spliced leader sequence in dinoflagellates (Lidie and Van Dolah, 2007; Zhang et al., 2007), which is used in trypanosomes for post-transcriptional regulation of gene expression, also suggests that dinoflagellates may be using post-transcriptional control.

The Central Role of the Photoperiod in Dinoflagellate Physiology

The diel cycle plays a key role in many aspects of dinoflagellate physiology, including photosynthesis, vertical migration behavior, and bioluminescence. Cells of K. brevis are positively phototactic and negatively geotactic (Kamykowski et al., 1998), which results in their concentration in the upper water column during the day, where they can access light for photosynthesis. Therefore, the photoperiod has a strong influence on K. brevis' swimming behavior. In addition, it has been shown that cell division is regulated by the diel cycle in many dinoflagellates (Sweeney and Hastings, 1958; Olsen and Chisholm, 1986; Homma and Hastings, 1989; Roenneberg, 1996; Taroncher-Oldenburg, 1997; Van Dolah and Leighfield, 1999; Leighfield and Van Dolah, 2001; Litaker et al., 2002; Dagenais-Bellefeuille et al., 2008). The dinoflagellate has a eukaryotic cell cycle consisting of a G1 phase, characterized by growth, S-phase, consisting of DNA replication, a G2 phase, in which cells edit replicated DNA, and mitosis, or cell division (Vaulot et al., 1986; Partensky and Vaulot, 1991; Van Dolah et al., 2007; Brunelle et al., 2007). In K. brevis, the dark-to-light transition serves to entrain the cell cycle, with S-phase beginning approximately 6 hours after onset of light (Van Dolah and Leighfield, 1999). The persistence of this rhythm in constant light demonstrated that cell cycle progression is under control of a circadian rhythm, which is in turn entrained to the diel cycle (Brunelle et al., 2007).

A global transcriptome analysis in K. brevis using a microarray found approximately 10% of genes changing with respect to the light/dark cycle. However, many genes involved in diel-dependent processes, including cell cycle genes, did not change mRNA levels in a diel-dependent fashion (Van Dolah et al., 2007). This raised the question as to whether the cell cycle, like other circadian regulated processes in dinoflagellates, is regulated post-transcriptionally and whether a change in protein expression over the cell/diel cycle might be observable using a parallel proteomic approach.

A complicating aspect of using proteomic analysis of the cell cycle is the observation, in the dinoflagellate L. polyedrum, of a diurnal rhythm of translation. In this species, polysome analysis demonstrated that the bulk of messages are translated during the dark (Schroder-Lorenz and Rensing, 1987). Little actively translated, polysome-associated RNA was present during the subjective day, and the relative proportion of messages associated with polysomes increased to reach a maximum during the subjective night period, which corresponds to G2 phase of the cell cycle (Figure 1). This corresponds with the time when the rate of protein synthesis reaches its maximum, as measured in cell-free extracts by in vitro protein synthesis using [35S]methionine (Schroder-Lorenz and Rensing, 1987).

The studies herein were undertaken to better understand the extent to which protein levels in Karenia brevis vary over the diel cycle, and in particular, whether cell cycle proteins change in expression. Because there are currently no data on translation patterns in K. brevis, we first sought to determine if a circadian rhythm controls the bulk translation of messages over the diel cycle as in Lingulodinium. The polysome analysis indicates that bulk translation does not change over the diel cycle. However, this does not preclude the likelihood that specific proteins needed for diel dependent processes are differentially expressed. A proteomic approach was therefore taken to assess diel changes in global protein expression and to potentially identify proteins that were differentially expressed. Few proteomic studies have been attempted to date in dinoflagellates because the lack of genome sequence information limits the ability to make accurate identifications of peptides (Akimoto et al., 2004; Chan et al., 2004; Chan et al., 2005). With the recent availability of a K. brevis cDNA library containing approximately 22,671 unique genes to provide a database of K. brevis specific amino acid sequences (Lidie et al., 2005), a proteomic approach for investigations into this organism became feasible for the first time. Protein extraction methods were first optimized for a two dimensional gel electrophoresis proteomic protocol for K. brevis. This method was then adapted for difference in-gel electrophoresis (DIGE) analysis for quantitative comparison between timepoints collected within a diel cycle. DIGE has been shown to broaden the dynamic range of analysis by detecting low abundance proteins without saturating the highly abundant ones, and it provides quantitative results (Renaut et al., 2006).

METHODS

Cultures and Culturing Conditions

K. brevis (Wilson isolate) was grown in 0.22 µm filtered seawater (36‰) obtained from the seawater system at the Florida Institute of Technology field station, Vero Beach, Florida, enriched with f/2 nutrients (Guillard, 1973), modified with the use of ferric sequestrene in place of EDTA·Na2 and FeCl3·6H2O and the addition of 0.01 µM selenous acid. Cultures were maintained in 1 liter glass bottles at 25°C ± 1°C with a 16:8 light:dark cycle, and cool white light at 50-60 µmol photons m-2 sec-1, determined using a LI-COR LI-250 Light Meter (LI-COR BioSciences, Lincoln, NE). Experimental cultures of K. brevis were grown to mid log phase under nutrient replete conditions in order to maximize the cell division rate, which has previously been established at ~0.5 div/day.

Cultures for Polysome Analysis

Triplicate one liter cultures were harvested on culture day 7, during log phase growth. Cells were sampled at LDT2, 6, 10, 14, 18 and 22, where LDT refers to light-dark-time, with "lights-on" at LDT0 and "lights-off" at LDT16. At LDT2 and 6 cells are in G1 of the cell cycle, LDT10 and 14 represent S-phase, and LDT18 and 22 represent G2/Mitosis. Cells were harvested by centrifugation at 600 x g for 10 min. and sea water was removed by aspiration. Cell pellets were then extracted for polysome analysis (see below).
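The sampling scheme described above can be summarized as a small lookup, sketched here in Python purely for illustration. The names `PHASE_BY_LDT` and `describe_time_point` are hypothetical; the phase assignments and the lights-off time (LDT16) are taken from the text.

```python
# Illustrative lookup of the polysome sampling schedule described in the text.
# LDT = light-dark-time; lights-on is LDT0 and lights-off is LDT16.
LIGHTS_OFF_LDT = 16

# Cell cycle phase assigned to each sampling time in the text.
PHASE_BY_LDT = {2: "G1", 6: "G1", 10: "S", 14: "S", 18: "G2/M", 22: "G2/M"}


def describe_time_point(ldt):
    """Return (LDT, light condition, cell cycle phase) for a sampling time."""
    condition = "light" if ldt < LIGHTS_OFF_LDT else "dark"
    return ldt, condition, PHASE_BY_LDT[ldt]


schedule = [describe_time_point(t) for t in sorted(PHASE_BY_LDT)]
for ldt, condition, phase in schedule:
    print("LDT%-2d  %-5s  %s" % (ldt, condition, phase))
```

Under this scheme the two G1 and two S-phase samples fall in the light phase, and both G2/Mitosis samples fall in the dark phase.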

Cultures for 2-Dimensional Electrophoresis

Cultures for 2-DE analysis were harvested in triplicate during log phase on day 7 at LDT2, 10 and 18, corresponding with G1, S phase and G2/Mitosis of the cell cycle, respectively, for initial proteomic analysis. Cells were harvested by centrifugation at 600 x g for 10 min. and sea water was removed by aspiration. Protein was then extracted from cell pellets (see below).

Cultures for Difference In-Gel Electrophoresis

One liter cultures were harvested in quadruplicate during log phase on day 7 at LDT2, 8, 14 and 20 to increase statistical power. Again, cells were harvested by centrifugation at 600 x g for 10 min. and sea water was removed. Protein was then extracted from cell pellets (see below).

Cultures for Nuclear Isolations

One liter cultures of cells were harvested from mid log phase cultures to study the nuclear proteome during G1 and S-phase. Cells were harvested by centrifugation at 600 x g for 10 min. and sea water was removed. Protein was then extracted from nuclear pellets as described below. Nuclear proteins from two liters were pooled due to the low protein concentration of lysates.

Cell Cycle Analysis

Samples for flow cytometric analysis were taken from all experimental cultures to verify their progression status through the cell cycle. At each experimental time point, 15 mL of culture were fixed with 2% glutaraldehyde and stored at 4°C for at least 24 hours. Cells were then centrifuged 10 min. at 600 x g. To extract cellular pigments, cells were resuspended in -20°C methanol for a minimum of 4 hours. The cells were then centrifuged 10 min. at 2000 x g and DNA was stained by the addition of 10 µg/mL propidium iodide (PI, Sigma, St. Louis, Missouri) (final concentration) in 1 mL PBS containing 10 mg/mL RNase (Sigma) and 0.5% Tween 20. Flow cytometry was performed on an Epics MXL4 flow cytometer (Beckman Coulter, Miami, Florida). Multicycle software (Phoenix Flow Systems, San Diego, CA) was used to obtain the percentages of cells in each stage of the cell cycle, or cell cycle distribution.

Polysome profiles

One liter cultures of K. brevis were harvested by centrifugation, 600 x g for 10 min., and cell pellets were resuspended in 1 ml of resuspension buffer (0.2 M Tris, pH 7.5, 2.5M KCl, 100 mM MgCl2, 0.5% (v/v) Triton X-100, 1 mM DTT, and 100 mg/ml cycloheximide). The resuspended cells were homogenized on ice by pipetting with a P1000 pipetman (Gilson Inc., Washington, GA) and drawing the suspension 10 times through a 10 cc syringe with a 22 g needle. Homogenates were centrifuged for 10 min. at 10,000 x g to pellet nuclei, mitochondria, and . To prepare polysomes, supernatants were layered onto 15-50% linear sucrose gradients and centrifuged for 100 min. at 40,000 rpm in an SW-55Ti rotor (Beckman Instruments) at 4°C. Gradients were profiled using an ISCO fractionation system equipped with a flow cell and UV detector (A254). Based on the A254 profiles, absorbance peaks were identified as the 40S and 60S ribosomal RNA, monosome (80S), and polysomes containing increasing numbers of ribosomes. Confirmation of the identity of the monosome and polysomes was obtained by running an identical gradient in the presence of EDTA, which causes ribosomal runoff and consequently the absence of monosome and polysome peaks.

Protein Extraction

A matrix was designed to test multiple homogenization and rehydration (solubilization) buffers that are typically used in two-dimensional electrophoresis in order to optimize conditions for K. brevis proteins. Urea-detergent based homogenization buffers, Gooley (5.8M urea, 2.3% CHAPS (3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulphonate), 2mM tributyl phosphine and 40mM Tris) and Modified Witzmann's (9M urea, 4% NP-40, and 1% dithiothreitol (DTT)), were tested, as well as trifluoroethanol buffer (11.6mM Tris-HCl, 5.8M urea, 2.3M thiourea, 2.3% CHAPS, 65 mM 1,4-dithioerythritol in a 50:50 mixture of trifluoroethanol). Other homogenization buffers tested included SDS boil buffer (5% sodium dodecyl sulfate, 5% β-mercaptoethanol, 10% glycerol, 60 mM Tris, pH 6.8), 1M sucrose in triethanolamine, and a 10% trichloroacetic acid/acetone precipitation. In each case the protein pellet from one 1L culture, containing about 10^6 cells total, was suspended in the rehydration buffer and applied to the isoelectric focusing strip to run in the first dimension, followed by SDS-PAGE.

Proteins were also extracted using TRIzol® Reagent (Invitrogen, Carlsbad, CA) (phenol/chloroform reagents) following the manufacturer's protocol with slight modifications. Four mLs of TRIzol® Reagent were used per 10^6 cells to ensure removal of pigments, which interfere with 2D analysis. RNA was then removed by chloroform extraction, followed by DNA precipitation with ethanol, and subsequently proteins were precipitated with isopropanol. After washing 4 times with 8 mLs of 0.3 M guanidine hydrochloride, protein pellets were dried and resuspended in a "DIGE buffer" consisting of 7M urea, 2M thiourea, 4% w/v CHAPS, 30 mM Tris, pH 8.5.

2D Gel Analysis

Following optimization (above), proteins were extracted using TRIzol® Reagent as described above. Protein concentrations were determined using the Bradford assay (Bio-Rad, Inc., Hercules, CA). One hundred micrograms of protein were rehydrated in DIGE buffer with 1M DTT and 0.2% Biolytes (Bio-Rad) for a total volume of 155 µL and isoelectrically focused on 11 cm Bio-Rad ReadyStrip IPG (Immobilized pH Gradient) strips, pH 3-10, using a Bio-Rad Protean IEF Cell. The IPG strips were then loaded and run in the second dimension on 8-16% acrylamide Criterion gels (Bio-Rad, Inc., Hercules, CA).

Gels were then fixed with 10% methanol/7% acetic acid for 30 minutes at room temperature and stained with SYPRO Ruby (Bio-Rad, Inc., Hercules, CA) protein stain overnight, then destained with 10% methanol/7% acetic acid for 30 minutes at room temperature before imaging in PDQuest 2-D Analysis software (Bio-Rad Laboratories, UK). After imaging gels, protein spots were matched using PDQuest software. A G1 (LDT2) sample was chosen as the master gel against which protein abundances on all gels were compared. A t-test was used in PDQuest to identify potential changes relative to LDT2. Significantly changing proteins were then determined using a 1-way ANOVA of spot intensities in Microsoft Excel.
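To illustrate the one-way ANOVA applied to spot intensities, the following Python sketch computes the F statistic from first principles for a single spot measured in replicate at three time points. The intensities are made-up toy values, not data from this study, and the function name `one_way_anova_f` is hypothetical; the analysis described above was carried out in Microsoft Excel.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of replicate groups."""
    all_values = [v for group in groups for v in group]
    grand_mean = sum(all_values) / len(all_values)

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups
    )
    # Within-group sum of squares: spread of replicates around their group mean.
    ss_within = sum(
        (v - sum(group) / len(group)) ** 2 for group in groups for v in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy spot intensities (arbitrary units) for three time points, three replicates each.
toy_spot = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(toy_spot)
print(f_stat, df_b, df_w)
```

The F statistic would then be compared against the F distribution with the reported degrees of freedom to obtain a p-value, a step left to a statistics package here.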

Difference In-Gel Electrophoresis (DIGE)

TRIzol® Reagent was used to isolate protein from each sample as described above. After extraction, protein pellets were dried and resuspended in 200 µL DIGE labeling buffer (7M urea, 2M thiourea, 4% w/v CHAPS, 30 mM Tris, pH 8.5). The Bradford protein assay (Bio-Rad, Inc., Hercules, CA) was then used to obtain protein sample concentrations. For the CyDye labeling, twenty-five micrograms of whole cell protein lysate from each replicate at each time point were labeled with fluorescent cyanine dyes, Cy5 or Cy3, following the manufacturer's recommended protocols (GE Healthcare, Piscataway, NJ). A pooled sample containing equal amounts of protein from all samples was used as a normalizer, or internal control, run on each gel, which improves gel-to-gel repeatability in differential expression calculations. Two hundred micrograms of the pooled sample were labeled with Cy2 (twenty-five micrograms for each of the 8 experimental gels). All protein samples were labeled with 400 pmol of the amine reactive cyanine dyes, freshly dissolved in anhydrous dimethyl formamide. The labeling reaction was incubated on ice in the dark for 30 minutes and the reaction was terminated by addition of 10 nmol lysine. Twenty-five micrograms of experimental protein samples labeled with Cy3 and Cy5 were mixed with 25 micrograms of the Cy2 labeled pooled normalizer, and focused on 11 cm Bio-Rad ReadyStrip IPG (Immobilized pH Gradient) strips, pH 4-7, using a Bio-Rad Protean IEF Cell. After isoelectric focusing, the strips were run in the second dimension on 8-16% Criterion acrylamide gels (Bio-Rad Laboratories, Hercules, CA). Gels were then fixed in 10% methanol/7% acetic acid for 30 minutes. The gels were imaged using an Fx laser imager equipped with a blue laser.

The image files were imported into DeCyder software (GE Healthcare, Piscataway, NJ) for analysis and the scanned gel images were processed to find all possible protein spots. A master gel was determined using the software and all gels were matched to the master gel for mobility in both dimensions. Difference In-gel Analysis was used to normalize and quantify protein spots in all gels. Biological Variation Analysis (BVA) was then used for statistical analysis: a one-way ANOVA to test for significantly changing proteins across gels, and a t-test with a Bonferroni correction for proteins changing between LDT2 and each of the other three time points. BVA processes multiple gel images and performs gel-to-gel matching of spots, allowing for quantitative comparison of protein expression across multiple gels. Once significantly changing protein spots were determined, all gels were post stained with SYPRO Ruby (Bio-Rad Laboratories) and spots of interest were excised using a Bio-Rad Spot Cutter (Bio-Rad Laboratories, Hercules, CA) for protein isolation.
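The arithmetic behind the DIGE design can be checked with a short Python sketch. The counts and the 25 µg labeling amount come from the text; the significance level of 0.05 is an assumption made only to show how a Bonferroni correction over three comparisons works, and all variable names are illustrative.

```python
# Design values taken from the text.
TIME_POINTS = ["LDT2", "LDT8", "LDT14", "LDT20"]
REPLICATES_PER_TIME_POINT = 4   # quadruplicate cultures
SAMPLES_PER_GEL = 2             # one Cy3-labeled and one Cy5-labeled sample per gel
UG_PER_LABELING = 25            # micrograms of protein per labeling reaction

# Assumption for illustration only: a conventional family-wise alpha of 0.05.
ASSUMED_ALPHA = 0.05

n_samples = len(TIME_POINTS) * REPLICATES_PER_TIME_POINT
n_gels = n_samples // SAMPLES_PER_GEL

# The Cy2-labeled pooled normalizer is run on every gel.
pooled_normalizer_ug = n_gels * UG_PER_LABELING

# LDT2 is compared with each of the other time points.
n_comparisons = len(TIME_POINTS) - 1
bonferroni_alpha = ASSUMED_ALPHA / n_comparisons

print(n_samples, n_gels, pooled_normalizer_ug, n_comparisons)
print(round(bonferroni_alpha, 4))
```

Sixteen samples run two per gel give the 8 experimental gels, and 8 gels at 25 µg each account for the 200 µg of pooled normalizer labeled with Cy2.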

Gel plugs were washed with 50 mM ammonium bicarbonate/50% acetonitrile, vortexed for 20 min. and sonicated 3 times for 20 min. at room temperature in a water bath. After dehydrating with 100% acetonitrile for 5 min., gel plugs were digested with 0.1 µg/µL trypsin (Promega Trypsin Gold) overnight at 37°C to produce peptides. The peptides were then extracted from the gel plugs, washed with 50 mM ammonium bicarbonate and dried using a Savant speed vacuum (Thermo Scientific, Waltham, MA). After the peptide samples were dried, they were resuspended in 10 µL 0.1% trifluoroacetic acid. Samples were vortexed for 20 min., sonicated for 20 min. to solubilize and applied to C18 ZipTips® (Millipore, Billerica, MA) to further concentrate. Samples were then eluted in 2 µL of a matrix consisting of 10 mg of α-cyano-4-hydroxycinnamic acid (Sigma-Aldrich, St. Louis, MO) in 50% acetonitrile and 0.5% trifluoroacetic acid for MALDI-TOF analysis. After applying the samples in matrix to a MALDI-TOF target plate, MALDI-TOF analysis was performed at MUSC by A. Bland. MALDI-TOF-TOF target plates similarly prepared were sent to the MUSC Mass Spectrometry Core Facility and data were acquired using a MALDI TOF TOF (Applied Biosystems 4800 Proteomics Analyzer TOF-TOF).

Mass spectra results from the MALDI-TOF were used for peptide mass fingerprinting, using the K. brevis EST library and the NCBI (GenBank) non-redundant sequence database as sources of theoretical peptide masses of proteins stored in the databases. The K. brevis cDNA library was translated in 6 frames by Dr. John Schwacke (Dept. of Epidemiology and Biostatistics, MUSC) using the EMBOSS (European Molecular Biology Open Software Suite) program Transeq, which translates nucleotide sequences to amino acid sequences using the standard coding table, and uploaded into the MASCOT search engine (Matrix Science, Boston, MA). Tandem mass spectra from the MALDI-TOF TOF were searched against the K. brevis database and non-redundant sequences on GenBank using MASCOT software, with carboxamidomethylation of cysteine (+57) as a fixed modification, for identification of proteins. Fragment ion scores were generated using MALDI-TOF TOF analysis. Protein scores for MS data and ion scores for MS/MS data above the 95% C.I. (confidence interval) were considered a significant match.
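For readers unfamiliar with six-frame translation, the idea can be sketched in a few lines of Python. This is a conceptual illustration using the standard genetic code, not the EMBOSS Transeq implementation; the function names and the frame labels F1-F3 and R1-R3 are hypothetical, and the input sequence is a toy example rather than a K. brevis EST.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate complete codons only; unrecognized codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Return translations of three forward and three reverse-complement frames."""
    forward = seq.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames["F%d" % (offset + 1)] = translate(forward[offset:])
        frames["R%d" % (offset + 1)] = translate(reverse[offset:])
    return frames


frames = six_frame_translation("ATGGCCTAA")
for name in sorted(frames):
    print(name, frames[name])
```

Translating every frame is useful for EST data because the reading frame and strand of each contig are not known in advance, so peptides matched by the search engine can originate from any of the six translations.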

Nuclear Isolations

Cells were harvested by centrifugation at 600 x g for 10 minutes. Cell pellets were then resuspended in 10 mL of buffer consisting of 0.25M sucrose, 10 mM Tris and 5 mM CaCl2. Cells were allowed to equilibrate in buffer for 20 minutes on ice and homogenized with a Dounce homogenizer 10 times. Homogenized cells were then centrifuged at 2200 x g for 6 minutes at 4°C and the pellets were resuspended in the aforementioned buffer. This homogenization step was repeated, followed by centrifugation for 5 minutes at 3000 x g at 4°C. Pellets were resuspended in 5 mLs of 10 mM Tris, with 5 mM MgCl2. This resuspension was layered over a gradient of 4 mL of 2.2 M sucrose and 12 mL of 1.6 M sucrose. The gradients were then centrifuged at 12,000 x g for 25 minutes at 4°C. The bottom of the tube contained a pellet of purified nuclei. The sucrose was removed by aspiration and an aliquot of the nuclei was brought up in phosphate buffered saline and stained with 10 µg/mL propidium iodide for examination under a fluorescent microscope to assess purity. One mL TRIzol® Reagent was then added to the nuclear fraction for protein extraction following the manufacturer's protocol. After washing 4 times with 8 mLs of 0.3 M guanidine hydrochloride, protein pellets were dried and resuspended in 100 µL of DIGE buffer for 2D gel analysis. One hundred micrograms of protein were rehydrated and run using the 2D gel methods described above. Nuclear proteins were also cut from the 2D gels using the Bio-Rad Spot Cutter and trypsin digested, and these samples were run on a Finnigan LTQ (linear ion trap mass spectrometer) at the MUSC Proteomics Core Facility. The resulting spectra were searched using SEQUEST software (Thermo Finnigan, West Palm Beach, FL), with oxidation of methionine (+16) as a variable modification and carboxamidomethylation of cysteine (+57) as a fixed modification, against the translated K. brevis database and the GenBank non redundant species database. Spectra were also manually inspected for b and y ions.
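The effect of the fixed and variable modifications on the peptide masses considered by a search engine can be sketched as follows. This is a simplified illustration, not the SEQUEST or MASCOT algorithm: the function name `candidate_masses` is hypothetical, the peptide is a toy example, and the monoisotopic residue masses are standard reference values rounded to five decimal places.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
CARBAMIDOMETHYL_CYS = 57.02146   # fixed modification, the "+57" in the text
OXIDATION_MET = 15.99491         # variable modification, the "+16" in the text


def candidate_masses(peptide):
    """Return neutral monoisotopic masses for 0..n oxidized methionines.

    The fixed modification is applied to every cysteine; the variable
    modification may be present on any number of the methionines.
    """
    base = WATER + sum(RESIDUE_MASS[aa] for aa in peptide)
    base += CARBAMIDOMETHYL_CYS * peptide.count("C")
    n_met = peptide.count("M")
    return [round(base + k * OXIDATION_MET, 4) for k in range(n_met + 1)]


masses = candidate_masses("MC")
print(masses)
```

A fixed modification shifts every cysteine-containing peptide by the same amount, whereas a variable modification multiplies the number of candidate masses that must be searched, which is one reason variable modifications are usually kept to a minimum.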

RESULTS

Global Translational Activity over the Diel Cycle

In order to determine whether protein translation occurs primarily during the dark phase as observed in L. polyedrum, active translation in K. brevis over the diel cycle was investigated using polysome profiling. Polysome extraction and profiling methods were first optimized for K. brevis and the initial studies examined light (LOTS) vs. dark

(LDT20) phase cultures in mid log phase growth (Figure 2). We were able to clearly distinguish fractions containing free ribosomes, represented by 40S and 60S ribosomal

RNA, RNA present as monosomes (80S), and polysomal RNA containing up to six ribosomes (Figure 2A). The identity of the monosome and polysome peaks was confirmed by treating the preparation with 30 mM EDTA, which causes the release of ribosomes from the mRNA and thereby the loss of monosome and polysome peaks (Figure 2C). There did not appear to be any major difference in the polysome peak sizes between light (Figure 2A) and dark (Figure 2B) phase polysome profiles. (The difference in slopes is due to variability in the sucrose gradients). This is dramatically different from the nighttime ribosomal loading reported in L. polyedrum (Figure 1).

However, since this preliminary experiment investigated only two time points over the 24 hour diel cycle, it is possible that we missed a window of more active translation. Therefore we next obtained polysome profiles from triplicate cultures harvested every 3 hours throughout the diel cycle. To more closely mimic conditions in the L. polyedrum study, cells were kept in continuous light, which enabled us to assess whether any resulting pattern is circadian. To confirm that the cells in this experiment were actively growing, and therefore requiring the expression of known diel-dependent metabolic processes, we performed cell cycle analysis on the cell populations used for polysome preparations (Figure 3). Approximately 40% of the cell population of all cultures was synchronously proceeding through the cell cycle. However, no major changes in polysome loading were observed at any time point (Figure 4). Thus, it does not appear that a circadian rhythm of translation occurs in K. brevis. Instead, K. brevis may be producing individual proteins as demanded by physiological and metabolic processes over the diel cycle. Since these polysome studies did not show a pattern of bulk translation, a proteomic approach was used to examine patterns of total cellular protein expression on 2-DE gels and the behavior of individually resolved proteins.

Optimization of Protein Extraction Methods for K. brevis 2-DE analysis

Numerous buffers are used for proteomic studies depending on the type of tissue (plant, animal, etc.) and the proteins of interest (membrane, nuclear, cytosolic, etc.). Therefore, optimization of protein extractions for 2-DE analysis of K. brevis was first necessary since there are limited studies employing these techniques in dinoflagellates. Traditional buffers that are often used in 2-DE proteomic analyses were tested for efficiency in extracting proteins from K. brevis. For 2-DE analysis, it is necessary to obtain protein extracts that will produce the highest number of protein spots and minimize nucleic acid contamination so that the proteins will be clearly resolved on the gels without streaking patterns. Multiple buffers for homogenization of proteins were tested, including: Gooley buffer, Modified Witzmann's buffer, SDS Boil buffer, trifluoroethanol buffer, sucrose/triethanolamine buffer, a trichloroacetic acid/acetone precipitation, and phenol/chloroform (TRIzol® Reagent; Invitrogen, Carlsbad, CA). Many of the buffers tested were not suitable for protein isolation, as evidenced by a lack of protein spots on the 2-DE gels. Three extraction buffers (Gooley, modified Witzmann's, and TRIzol® Reagent) were successful for protein extraction, as evidenced by protein spots on a 2-DE gel. Of the three best results, protein extracts using TRIzol® Reagent clearly resulted in gels containing a greater number of protein spots compared to gels run with traditional urea/detergent based homogenization buffers (Figure 5). This extraction method was therefore used in further studies in K. brevis.

Global Analysis of Protein Expression over the Diel Cycle using 2D and SYPRO Ruby

For initial insight into the K. brevis proteome over the diel cycle, protein samples were obtained from triplicate cultures at 3 time points, LDT2, LDT10 and LDT18. These time points were chosen since LDT2 displays the highest proportion of cells in G1, LDT10 represents early S-phase when cell cycle proteins necessary for S-phase progression should be abundant, and LDT18 is a dark phase time point when the cycling cell population is in G2 or mitosis. Flow cytometry data show that approximately 50% of the cell population was proceeding through the cell cycle in this experiment (Figure 6), and therefore if changes in the proteome were to occur over the cell cycle they might be visible on the 2-DE gels. When G1 (LDT2), S-phase (LDT10) and G2/Mitosis (LDT18) protein profiles on the 2D gels were compared using PDQuest software and a built-in Student's t-test, 47 protein abundances, out of 304 protein spots detected, changed significantly over the diel/cell cycle. Figure 7 illustrates changing intensities of 3 protein spots between G1, S phase and G2/Mitosis time points. Although these 2-DE gels showed promise in terms of spot resolution, these preliminary results from the 2-DE using SYPRO did not provide adequate matching between gels for powerful statistical analysis. The difficulty in protein spot matching using PDQuest in this initial study warranted the use of a more stringent and quantitative approach with additional replication at each time point.

Global Proteomic Analysis over the Diel Cycle Using DIGE

For the next set of analyses, DIGE was used to increase the quality of matching of proteins between gels, facilitated by the use of a normalizer sample on each gel, which allows for a higher accuracy of statistical analysis and increased sensitivity. For the DIGE studies, samples were collected for flow cytometry and proteins were extracted from quadruplicate cultures at 4 time points (every 6 hours) over the diel cycle: LDT2 (G1), LDT8 (early S-phase), LDT14 (mid-S-phase) and LDT20 (late G2/early mitosis). Flow cytometry results show that ~50% of the cell population was proceeding through the cell cycle (Figure 8). Proteins were labeled with Cy3 or Cy5, run pairwise in two dimensions in the presence of the pooled normalizer labeled with Cy2, and analyzed using DeCyder software. A minimum of 470 proteins were reproducibly detected on each 2-DE gel (Figure 9). Of these, 21 proteins were found to change significantly

(ANOVA, p

LDT14 directly, and 83.3% were shown to have the same trend (Table 1).

After statistical analysis was carried out using DeCyder, all gels were stained with SYPRO Ruby for total protein (Figure 11). Proteins were matched and compared to the matching pattern in DeCyder software in order to pick the significantly changing proteins for identification by mass spectrometry. Fourteen of the significantly changing proteins were sent to the MUSC Mass Spectrometry Core for analysis by MALDI-TOF-TOF. The other 7 significantly changing proteins did not have high enough abundance for mass spectrometry analysis. Comparison of the peptide sequences obtained from MALDI-TOF TOF with the translated K. brevis EST library provided putative matches; however, none of these met the generally accepted significance cut-off of greater than 95% C.I. This may be because there was a mixture of proteins in the gel plugs that could not be separated using MALDI-TOF TOF, or because the proteins were of low abundance. Although no proteins could confidently be assigned an identity, this study identified several trends in protein abundance over the diel cycle, in which the most dramatic changes occurred late in the light phase from LDT8 to LDT14, which also corresponds to S-phase of the cell cycle. The three general trends that were observed over the diel cycle in these proteins are shown in Figure 12.

Proteomic Analysis of Isolated Nuclei

Because 2D gel analysis of whole cell lysates detected only ~500 of the most highly expressed proteins in the cell, subcellular fractionation was next used to increase the possibility of identifying proteins that localized to the nucleus, particularly those involved in cell cycle regulation, by removing highly expressed proteins from other cellular compartments. Nuclear isolations should facilitate the analysis of both proteins that are differentially expressed and proteins that are imported to and exported from the nucleus over the cell cycle.

A wider range of isoelectric points, 3-10 pI, was used to allow for better separation of potentially basic proteins, as expected for nuclear proteins. SYPRO Ruby was used for staining and PDQuest software was used to identify spots and analyze intensity. Twenty-six prominent nuclear spots were then picked, extracted, and trypsin digested. Nuclear peptides were sent to the MUSC Mass Spectrometry Core Facility for analysis on the LTQ MS/MS to determine if these proteins could be identified using SEQUEST software to search the translated K. brevis EST library and GenBank nr database.

The number of nuclear proteins detected was much smaller (~100) than that found in whole cell lysates (Figure 13). Positive matches to the K. brevis EST library were obtained for eighteen nuclear proteins by LTQ MS/MS, or approximately 18% of the proteins detected on the gels (Table 3). Ten of the 18 proteins had hits to contigs in the EST library that have significant homology to known genes on GenBank. Of the proteins with known function, most are of known nuclear localization. Table 3 shows the number of peptides identified in each sample and the total ion score generated from the identification of these peptides. The total ion score is calculated by weighting the ion scores for all individual peptides matched to the protein that is associated with this peptide and the MS/MS spectrum. The ion score is calculated for each peptide matched from MS/MS peak lists and is significant if it meets the 95% confidence interval. The substantially improved success rate for identifying proteins from the nuclear preparations is most likely due to a combination of the highly conserved nature of many nuclear proteins and the use of reverse phase C18 LC-MS/MS, which is more sensitive and permits separation of peptide mixtures.
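The proportions reported in this section can be recomputed with a short Python sketch. The counts are taken from the text; the helper name `percent` is illustrative, and because 470 is a minimum spot count and ~100 an approximate one, the corresponding percentages should be read as approximate.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# 2-DE with SYPRO Ruby: 47 changing spots out of 304 detected.
changing_2de = percent(47, 304)

# DIGE: 21 changing spots out of a minimum of 470 detected per gel.
changing_dige = percent(21, 470)

# Nuclear preparations: 18 identified proteins out of ~100 detected.
identified_nuclear = percent(18, 100)

print(changing_2de, changing_dige, identified_nuclear)
```

The smaller changing fraction under DIGE, with quadruplicate cultures and Bonferroni-corrected comparisons, is in line with that analysis being the more stringent of the two.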

DISCUSSION

The studies in Chapter 2 were undertaken to gain insight into diel patterns of protein expression in K. brevis and, in particular, whether a proteomic approach would be informative of changes in the expression of cell cycle proteins, since the cell cycle is tightly linked to the diel cycle. Mounting evidence from studies of circadian rhythms (Mittag, 1996; Fagan et al., 1999; Le 2001; Okamoto et al., 2001), microarray studies (Okamoto and Hastings 2003; Van Dolah, 2007; Lidie 2007) and the identification of spliced leader trans-splicing in dinoflagellates (Lidie and Van Dolah, 2007; Zhang et al., 2007) points to a predominance of translational control in this group of protists.

Schroder-Lorenz and Rensing (1987) demonstrated that translational activity in the dinoflagellate L. polyedrum occurs almost exclusively during the night based on polysome profiles (Figure 2D). In that species, little RNA was associated with ribosomes at the time of cell division, ~CT0. By CT6, a large monosome peak appears, suggesting that initiation of translation has occurred, but little RNA is present in the polysome pool. However, by CT12, which corresponds with the beginning of dark in the circadian day, polysome peaks increase in size relative to the monosome, and by CT17 much of the RNA is found in the larger polysome classes with 3 or more ribosomes. This time corresponds roughly to when cells are exiting S-phase and entering G2 phase in the diel-phased (LD 8:16) cell cycle of Lingulodinium (Dagenais-Bellefeuille et al., 2008). By CT24, when cells have once again moved into mitosis, little translation activity is evident. Using protein synthesis in cell free extracts, it was also shown that protein degradation displayed the same circadian rhythm as the synthesis rate (Schroder-Lorenz and Rensing, 1987). This suggests concerted protein turnover and replacement prior to cell division.

The polysome experiments in this chapter were undertaken to determine if a similar pattern of protein synthesis is evident in K. brevis. Somewhat surprisingly, the polysome profiles in K. brevis over the diel/cell cycle did not show any time point where "bulk translation" was evident (Figures 2 and 4). A large monosome peak was present at all time points examined over the diel cycle, with the peaks corresponding to messages containing two ribosomes always being the dominant form of polysome. At no time point did the larger polysomes dominate, although peaks representing up to six ribosomes were resolved at most time points. We are confident that the absence of a time point representing "bulk translation" is not an artifact due to ribosomal runoff, because increasing the cycloheximide concentration two-fold to 200 mg/ml did not alter the profiles (data not shown). Furthermore, flow cytometry demonstrated that approximately 50% of the cell population was proceeding through the cell cycle on the day the study was performed; therefore the absence of a strong protein synthesis peak cannot be attributed to a lack of metabolic activity, although it could be diminished by out-of-phase cells.
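The comparison of polysome peaks "relative to the monosome" can be made quantitative by integrating the absorbance trace over each region of the gradient. The sketch below is an illustration only, not the analysis used in this study: the function names, the window boundaries, and the trace values are all hypothetical, and a real A254 profile would first need baseline subtraction.

```python
def trapezoid_area(xs, ys):
    """Integrate ys over xs using the trapezoid rule."""
    return sum(
        (xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2.0
        for i in range(len(xs) - 1)
    )


def window_area(position, a254, window):
    """Area under the A254 trace between window = (start, end), inclusive."""
    start, end = window
    points = [(x, y) for x, y in zip(position, a254) if start <= x <= end]
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return trapezoid_area(xs, ys)


def polysome_to_monosome_ratio(position, a254, monosome_window, polysome_window):
    """Ratio of polysome area to monosome (80S) area in one gradient profile."""
    return window_area(position, a254, polysome_window) / window_area(
        position, a254, monosome_window
    )


# Synthetic, baseline-subtracted trace (NOT data from this study).
position = [0, 1, 2, 3, 4, 5, 6]              # arbitrary gradient position units
a254 = [0.0, 2.0, 0.0, 1.0, 1.0, 1.0, 0.0]    # absorbance at 254 nm

ratio = polysome_to_monosome_ratio(
    position, a254, monosome_window=(0, 2), polysome_window=(2, 6)
)
print(f"Polysome/monosome area ratio: {ratio:.2f}")
```

Under these assumptions, a time point with "bulk translation" would show a markedly higher ratio than the others, whereas a roughly constant ratio across time points would match the pattern described for K. brevis.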

There are limited studies on diel and circadian translation patterns in dinoflagellates, most of which have been carried out in Lingulodinium. The study by Schroder-Lorenz and Rensing (1987), which identified a circadian rhythm of polysome production, was based on previous findings that a circadian rhythm of protein synthesis occurs in this dinoflagellate (Volknandt and Hardeland, 1984; Donner et al., 1985; Cornelius et al., 1985) and that the bulk of total protein appeared to be synthesized mainly during the phase corresponding to the night period (Rensing and Schill, 1985). However, a study in Alexandrium fundyense measuring total protein concentration every 2 hours over the diel cycle showed a protein production pattern where protein concentration peaked at the end of the light phase and reached a minimum at the end of the dark phase (Taroncher-Oldenburg, 1997), the opposite of what was observed in L. polyedrum. In eukaryotes such as and mammals, there does not appear to be a time point of bulk translation during the cell cycle. However, during mitosis, 5' cap-dependent translation is turned off (Pyronnet and Sonenberg, 2001). During this process, translating ribosomes are stalled on mRNAs to protect the messages from degradation during mitosis, and this allows the cells to rapidly resume active translation upon entry into the G1 phase of the following cell cycle, which is energy efficient for the cell (Sivan and Stein, 2008). K. brevis undergoes mitosis during the dark phase of the diel cycle with the peak occurring from LDT18-LDT22. There was a small decrease in polysomes at the LDT22 time point relative to the preceding time points that may reflect mitosis; however, with only 50% of the cells proceeding through the cell cycle, the other 50% must still be in G1 and this may account for the continued presence of translationally active RNA at this time point. However, there was no evidence of bulk translation at any time during the diel cycle, leaving open the question of whether we might expect diel dependent changes in individual proteins, based on their needs at different times during the photoperiod or cell cycle.

The analysis of proteomes using 2D gel separation of proteins followed by identification using MALDI-TOF mass spectrometry has been used successfully in many organisms for which whole genome sequences have been obtained (Gevaert and Vandekerckhove, 2000). Although sequencing a complete dinoflagellate genome is not currently feasible due to the massive size and complexity of these genomes (5 x 10^10 bp for K. brevis, or almost 30x the human genome), recent large scale EST sequencing projects in dinoflagellates provide the first tools to investigate global protein products using MALDI-TOF based peptide mass fingerprinting or MALDI-TOF/TOF analysis of peptide sequences. In the current project, an EST database of K. brevis was used that contained 22,671 unique genes. Based on the analysis of dinoflagellate genome sizes (Hou and Lin, 2010) we estimate that this database contains approximately one-half of the expressed sequences in this organism (Brunelle and Van Dolah, 2011). The traditional 2-DE gel analysis with SYPRO Ruby staining used in this study allowed for resolution of approximately 300 K. brevis proteins from 10 - 200 kDa and pI 4 to 7 (Figures 5 and 7). Preliminary data from using this method to compare protein profiles from G1, S-phase and G2/Mitosis showed that there did appear to be changes in abundant proteins over the cell cycle (Figure 7). This preliminary analysis suggested we pursue the question of diel dependent proteome changes using a method that results in less inter-gel variability and offers higher sensitivity, DIGE.

The use of an internal standard in DIGE, run on every gel, allows for less inter-gel variability and a lower limit of detection. This also allows for more statistically accurate analysis across multiple gels by normalizing individual samples compared with the internal standard (Renaut et al., 2006). Using the DIGE approach, ~470 protein spots were detected (Figure 9) and 4.5% of these proteins were found to change significantly. This percentage is somewhat lower than anticipated, based on the prevalence of circadian-controlled processes in dinoflagellates. There are limited published studies using proteomics in dinoflagellates. However, a study using traditional 2-DE methods in L. polyedrum on larger format gels showed that only 3.1% of detected proteins (28 of 900) changed over the course of the diel cycle (Akimoto et al., 2004). Of these, Rubisco made up the majority of the spots identified, with the other nine proteins representing largely functions. The behavior of the differentially expressed proteins in Lingulodinium fell into three patterns: 1) increasing during early day, with maximum abundance during late day and into night, 2) increasing during late day with maximum abundance during night, and 3) decreasing during the day, absent or minimal in late day, then increasing to maximum abundance in late night. This is interesting in light of the previous observations of bulk translation and protein turnover at night in this species.
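The percentages compared above can be cross-checked against the spot counts given in the text. This is a minimal sketch using only figures stated in this chapter; the variable names are illustrative, and the ~470 spot count is approximate.

```python
# K. brevis DIGE experiment (values from the text; spot count is approximate).
k_brevis_spots = 470
k_brevis_fraction_changing = 0.045
k_brevis_changing = round(k_brevis_spots * k_brevis_fraction_changing)

# L. polyedrum 2-DE study (Akimoto et al.): 28 of 900 detected proteins changed.
l_polyedrum_changing = 28
l_polyedrum_spots = 900
l_polyedrum_percent = 100.0 * l_polyedrum_changing / l_polyedrum_spots

print(f"K. brevis: about {k_brevis_changing} changing spots")
print(f"L. polyedrum: {l_polyedrum_percent:.1f}% of detected proteins changed")
```

The first result agrees with the twenty-one significantly changing proteins discussed for the K. brevis DIGE study, and the second reproduces the 3.1% reported for L. polyedrum.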

The patterns of abundance among eight of the twenty-one changing proteins in K. brevis showed similar trends: 1) decreasing during early day, increasing during late day and into the night with maximal abundance in late night, 2) decreasing in early day with maximum abundance attained in late day and maintained at night, 3) decreasing during the day and minimal at night (Figure 11). Fourteen of the twenty-one significantly changing proteins from the K. brevis DIGE study were analyzed by MALDI-TOF/TOF mass spectrometry but no significant matches were found to either the K. brevis EST library or to the GenBank database. The incompleteness of the EST sequences available for peptide mass fingerprinting, together with the incomplete coverage of the expressed genome in any dinoflagellate to date, is the likely reason for our limited success in identifying proteins. Additionally, protein spots extracted from gels often contain a mixture of proteins; therefore a liquid chromatography step such as LC-MS will provide better separation of peptides. Efforts to develop full length cDNA libraries from dinoflagellates using the spliced leader sequence to amplify from the 5' ends may improve the quality of the existing sequence data available for peptide analysis (Lin et al., 2010).

Since our particular focus for this study was cell cycle related proteins, we

reasoned that nuclear proteomics may be a more successful approach for identification of

proteins of interest since high abundance cytoplasmic and chloroplast proteins are

removed from the sample. Nuclear proteomics have been used to successfully identify

DNA replication proteins under cell cycle regulation such as replication factor C (subunit

5), proliferating cell nuclear antigen, and replication protein A (14 kDa subunit) in human

lymphoma cells (Henrich et al., 2007). It has been estimated using a bioinformatic approach that 31.5%, 21.9%, 26.3%, 25.7% and 25% of the proteins in the total cell proteome are nuclear in S. cerevisiae, C. elegans, D. melanogaster, mouse and human, respectively (Kumar and Raghava, 2009). Therefore, the 2D analysis of nuclear proteins in K. brevis in the current study most likely grossly underestimates the number of nuclear proteins, since only ~100 protein spots were detected. This number was also much smaller than the number of nuclear proteins detected in plants using 2D gels, MALDI-TOF and peptide mass fingerprinting, which has been reported as 500-700 proteins in Arabidopsis (Bae et al., 2003). However, of the ~100 proteins detected on the gels containing K. brevis nuclear protein, eighteen proteins were significantly matched to sequences in the K. brevis EST library when analyzed by LC-LTQ MS/MS after tryptic digestion (Table 3). Seven of these proteins have been experimentally shown to have nuclear localization. Eight of the proteins are hypothetical; therefore little can be said of their function in the nucleus. One of the seven nuclear proteins identified was a eukaryotic initiation factor (eIF4A-3). This translation factor has been shown to be nuclear in mammalian cells (Holzmann et al., 2000; Ferrailuolo et al., 2004) and plant cells (Koroleva et al., 2009). Since eIF4A-3 interacts with translation initiation factors eIF4G and eIF4B, and is localized to the nucleus, Ferrailuolo et al., 2004, suggest that this eIF is involved in the "pioneer round" of translation and show that this protein is a nucleocytoplasmic shuttling protein. The pioneer round of translation refers to the first time the reading frame of an mRNA is read, possibly occurring in the nucleus (Ishigaki et al., 2001). Additionally, nuclear proteomic studies in mammalian systems, specifically human Raji lymphoma cells, have identified eIF3 subunit 2 and eIF6 in the nucleus and eIF4B, eIF2 subunit 3, eIF4H, eIF5A and eIF1A in the DNA binding proteome isolated from nuclei (Henrich et al., 2007). Two of the significantly identified proteins in the K.

brevis proteome, tropomodulin and a calpain family cysteine protease containing protein,

have cytosolic localization. These proteins are both involved in actin organization and

cytoskeletal remodeling (Sung et al., 1992; Smith et al., 2000) and may reflect

contamination of other cellular components. components, such as ,

have been identified in nuclear proteomic studies; therefore this does not seem to be an unusual result (Henrich et al., 2007). To be most informative, subcellular proteomics

requires highly purified fractions; however cross-contamination is a common problem

(Schirmer and Gerace, 2002). In order to identify a higher number of nuclear proteins, a

larger number of nuclei will need to be isolated from K. brevis and likely other dinoflagellates. Obtaining a pure, concentrated sample is non-trivial and will be

necessary for successful dinoflagellate proteomics. The use of large format gels would

also be beneficial for separating an increased number of spots. Additionally, since the nucleus is a dynamic cellular structure with proteins constantly moving in and out with a

short residence time, a time course study using nuclear proteomics may be a way to identify more nuclear proteins. Chromatin associated proteins should be identifiable since the extraction method used here (phenol/chloroform: TriZol or

TriReagent) has been used to extract such proteins as PCNA, which binds to chromatin,

in K. brevis. The proteins selected for mass spectrometry analysis were the most

abundant proteins within the nuclear gels, therefore it will be interesting to see what other

proteins can be identified that may be involved in gene regulation, cell cycle, and other

metabolic processes involving the nucleus in K. brevis. However, the use of a sensitive

mass spectrometry method, and separation chemistry prior to MS/MS such as LC-MS/MS, as well as additional sequence information from large scale sequencing projects

in dinoflagellates will yield more biological data.

The presence of eukaryotic initiation factors in dinoflagellates indicates that these

organisms have the eukaryotic protein synthesis machinery. Eukaryotic IF-5A has been identified in Crypthecodinium cohnii (Chan et al., 2002); and eIF4A, eIF3-8 as well as an elongation factor (1 alpha) are present in Amphidinium carterae (Bachvaroff and Place, 2008). Also, a gene data set from Alexandrium catenella identified eIF4A and 3 elongation factors (Toulza et al., 2010). The K. brevis cDNA library has 9 ESTs to eIFs, with differing homology scores or e-values, including eIF2B (low homology), eIF5 (high homology) and, as mentioned above, eIF4A-3 (high homology) (marinegenomics.org).

Therefore, although dinoflagellates may be using translational machinery to regulate cellular processes in a manner that is divergent from other well-studied eukaryotes, the necessary components appear to be present within the dinoflagellate genome/proteome.

Further investigation into the translational machinery of dinoflagellates should yield information about how these factors that are similar to other eukaryotes are being used in these unusual protists.

The few dinoflagellate proteomic studies for which published data exist have used a traditional 2-DE approach for species identification between different HAB species including Prorocentrum micans, P. minimum, P. sigmoides, P. dentatum, Scrippsiella trochoidea, Karenia longicanalis, K. digitata, K. mikimotoi, as well as K. brevis (Chan et al., 2004). This study focused only on species specific 2D gel spot patterns and no protein identifications were attempted for any of the above species. Another study in a dinoflagellate investigated the protein profiles between toxic and non-toxic strains of Alexandrium minutum (Chan et al., 2005). Although the protein profiles between the toxic and non-toxic strains differed, no protein identifications were made. Lastly, a study investigating protein profiles of P. triestinum under different environmental stress conditions showed that they were indeed different, but again proteins were not identified (Chan et al., 2004b). Akimoto et al. (2005) is the only study published to date that successfully identified any dinoflagellate proteins, or attempted to assign meaningful physiological inference from their expression, although the proteins identified were dominated by different forms of rubisco, a highly expressed chloroplast protein. The current study demonstrates that isolation of the cellular compartment of interest is a useful approach to enrich for proteins of interest.

For the proteomic analysis in this chapter, three different mass spectrometers were used: MALDI-TOF, MALDI-TOF/TOF and LTQ (LC/MS). Unfortunately, peptide mass fingerprinting using the MALDI-TOF did not provide useful data since protein coverage is low from K. brevis ESTs. When more complete sequence information is available for dinoflagellates, peptide mass fingerprinting may become a useful tool since it is a fast, single mass spectrometry approach based on known peptide sequences. Although the MALDI-TOF/TOF is more sensitive, provides another layer of identifying data, and yields sequence information, since peptides are generated and then MS/MS spectra from these peptides are produced, we were still unable to identify K. brevis proteins. However, this may be due to low abundance or, again, mixtures that could not be separated. LC/MS using an LTQ is sensitive, can separate mixtures that may be present in sample proteins, and can detect post-translational modifications using high resolution and accurate mass. The fact that peptides are separated and concentrated before they are introduced to the ion source greatly enhances the ability to obtain useful sequence information. Using this technique, significant matches of nuclear proteins to the K. brevis EST library were successfully made. This approach is more costly and time consuming, but appears to be essential for successful identification of proteins in an organism for which genome sequence data is not available.

SUMMARY

The studies presented in Chapter 2 provide the first analysis of the proteome of K. brevis with respect to one of its most important physiological drivers, the diel cycle.

Unlike L. polyedrum, the only other dinoflagellate studied, K. brevis does not display a diel pattern of bulk translation. Rather, it appears that proteins are differentially expressed over the diel cycle, with patterns that likely reflect their daytime or nighttime peak in activity. Using DIGE, we found 4.5% of the detected protein spots significantly changing in expression, although we were not successful in identifying them based on peptide mass fingerprints. For the analysis of cell cycle related proteins, proteomic analysis of isolated nuclei proved to be more productive, with 18 of ~100 proteins being matched to ESTs in the K. brevis library when analyzed by LTQ MS/MS. Although limited significant matches were made using the K. brevis EST library, these studies show the feasibility and usefulness of proteomics in this dinoflagellate, even in the absence of genomic sequence data. Although transcriptomic studies have contributed a wealth of information to dinoflagellate molecular biology, mRNA levels are even less likely to correlate well with protein levels in dinoflagellates than in most eukaryotes, due to the presence of widespread post-transcriptional regulation of gene expression. Proteomics studies, as well as those studies investigating translational activity and post-translational modifications, can begin to describe the cellular state more accurately. Genome sequencing or the large scale production of full length cDNA libraries in dinoflagellates will provide the means for increased protein identifications in the future.

[Figure 1 image not reproduced. Panel label: CT0. X-axis: increasing number of ribosomes.]

Figure 1: Polysome profile of L. polyedrum over circadian time (Schroder-Lorenz and Rensing, 1987). This study served as a positive control for the K. brevis polysome analysis. Shown is the 80S, or monosome, fraction (arrow). To the right of the monosomes are the polysomes, or those mRNAs with increasing number of ribosomes. On the x-axis is increasing number of ribosomes present in the polysome fractions, based on sedimentation in a sucrose gradient, and on the y-axis is optical density at 254 nm. CT = Circadian Time

[Figure 2 image not reproduced. Panels: (A) day culture, (B) night culture (LDT20), (C) 30 mM EDTA (LDT20). X-axis: increasing number of ribosomes. Y-axis: optical density at 254 nm.]

Figure 2: (A) Polysome profile of a K. brevis culture during the light phase, or subjective day. Shown are the 80S, or monosome fraction, the 40S and the 60S ribosomal subunits; and bracketed to the right of the monosomes are the polysomes, or those mRNAs with increasing number of ribosomes. (B) Polysome profile of a dark phase culture. (C) Polysome profile of a K. brevis culture treated with 30 mM EDTA, which causes ribosomal run-off (negative control). On the x-axis is increasing number of ribosomes present in the polysome fractions, based on sedimentation in a sucrose gradient, and on the y-axis is optical density at 254 nm. LDT = Light Dark Time

[Figure 3 image not reproduced. Plot title: Flow Cytometry. Series: %G1, %S phase, %G2/M. X-axis: Light Dark Time (2, 6, 10, 14, 18, 22). Y-axis: percent of cell population (0-100).]

Figure 3: Flow cytometry over the K. brevis cell cycle showing that approximately 50% of the cell population was proceeding through the cell cycle in this study. LDT, or light/dark time, is shown on the x-axis with the percent of the cell population in each phase of the cell cycle on the y-axis. LDT2 and 6 = G1, LDT10 = early S-phase, LDT14 = mid S-phase, LDT18 = transition from S-phase to G2/Mitosis, and LDT22 = mitosis.

Figure 4: Analysis of translational activity over the cell cycle/diel cycle of K. brevis, using polysome fractionation. The arrow shows the 80S or monosome fraction (combination of the 40S and the 60S ribosomal subunits) and to the right of the monosome are the polysomes, or those mRNAs with increasing number of ribosomes. LDT2 and 6 represent G1, LDT10 and 14, S-phase, and LDT18 and 22, G2/Mitosis. LDT18 and 22 are dark phase samples.

[Figure 4 image not reproduced. Panels: LDT2 (G1), LDT6 (G1), LDT10 (S), LDT14 (S), LDT18 (G2/M), LDT22 (G2/M). X-axis: increasing number of polysomes.]

[Figure 5 image not reproduced. Panels: Gooley buffer, modified Witzmann's buffer, phenol/chloroform. Each gel spans pH 4 to pH 7 and 10 kDa to 250 kDa.]

Figure 5: 2D gels using traditional urea/detergent based homogenization buffers: (A) Gooley buffer and (B) a modified Witzmann's buffer, and 2D gel using phenol/chloroform reagents (C).

[Figure 6 image not reproduced. Series: G1, S-phase, G2/M. X-axis: Light/Dark Time in Hours (LDT), 0-24. Y-axis: percent of cell population (0-100).]

Figure 6: Flow cytometry over the K. brevis cell cycle. LDT, or light/dark time, is shown on the x-axis with the percent of the cell population in each phase of the cell cycle on the y-axis. LDT2 = G1, LDT10 = early S-phase, and LDT18 = transition from S-phase to G2/Mitosis.

[Figure 7 image not reproduced. Gel panels: LDT2, LDT10, LDT18. Histograms (G1, S, G2M) are shown for spots SSP 3404, SSP 4203 and SSP 4606.]

Figure 7: 2D gels displaying profiles of K. brevis proteins harvested over the cell cycle (G1, S phase and G2/Mitosis). Proteins that appear to be changing in relative abundance are boxed in yellow and represented in the histograms to the right. LDT = Light/Dark Time in hours.

[Figure 8 image not reproduced. Series: G1, S, G2/M. X-axis: Light/Dark Time in Hours (LDT): 2, 8, 14, 20. Y-axis: percent of cell population (0-100).]

Figure 8: Flow cytometry over the K. brevis cell cycle. LDT, or light/dark time, is shown on the x-axis with the percent of the cell population in each phase of the cell cycle on the y-axis. LDT2 = G1, LDT8 = early S-phase, LDT14 = mid-S-phase and LDT20 = transition from G2 to Mitosis.

[Figure 9 image not reproduced. Panels: Gel 1 (Cy2: normalizer, Cy3: LDT2, Cy5: LDT8); Gel 2 (Cy2: normalizer, Cy3: LDT2, Cy5: LDT8); Gel 7 (Cy2: normalizer, Cy3: LDT14, Cy5: LDT20). Each gel spans pH 4 to pH 7 and 10 kDa to 250 kDa.]

Figure 9: Scans of three representative gels showing resolved proteins labeled with Cy2

(a pooled sample from each experimental sample, used as a normalizer), Cy3 and Cy5.

Figure 10: Significantly changing proteins as determined by DIGE. Dotted lines depict the change in log abundance of a protein spot labeled with Cy3 vs Cy5 at two time points within a given gel. The solid lines represent the average intensity of the same spot across all gels. Where dotted lines are missing, the spot was absent from individual gels. (A) Spot 333 decreases during the light phase, appears to increase during the dark phase and is present in all 16 experimental samples and 8 normalizer, or pooled, samples. (B) Spot 189 increases from LDT8 to LDT14 and is present in 12 of 24 samples. (C) Spot 369 follows a similar trend to spot 333 and is present in all 24 samples. (D) Spot 235 increases during the light phase and is present in all 24 samples. (E) Spot 627 increases from LDT8 to 14 and is present in all samples. (F) Spot 204 also increases during the light phase (LDT8 to 14) and is present in all samples. (G) Spot 43 is present in all samples and increases during the light phase. (H) Spot 117 steadily increases during the light phase and appears to decrease at night, being present in 21 of 24 samples.

[Figure 10 image not reproduced. Panel labels: Spot 189, p=0.00062; Spot 333, p=0.00015; Spot 369, p=0.0016; Spot 235, p=0.0032; Spot 627, p=0.0092; Spot 204, p=0.0099; Spot 43, p=0.011; Spot 117, p=0.012. X-axis: Time (LDT2, LDT8, LDT14, LDT20). Y-axis: log abundance.]

[Figure 11 image not reproduced. Eight gels are shown: four labeled Cy3-LDT2 / Cy5-LDT8 and four labeled Cy3-LDT14 / Cy5-LDT20. Each gel spans pH 4 to pH 7 and 10 kDa to 250 kDa.]

Figure 11: Scans of all experimental gels showing resolved proteins after final staining with SYPRO Ruby. Spots determined to significantly change were picked from all gels in which the proteins were present and were then pooled for tryptic digestion. This was done in order to maximize protein abundance from the gel plugs to increase possibility of identification.


[Figure 12 image not reproduced. Panels: Trend 1: Spots 627, 43, 117 and 235; Trend 2: Spots 204 and 189; Trend 3: Spots 333 and 369. X-axis: Time (LDT2, LDT8, LDT14, LDT20). Y-axis: log abundance.]

Figure 12: Trends of significantly changing proteins in K. brevis. Trend 1, proteins decreased during early day, increased during late day and into the night with maximal abundance in late night; Trend 2, proteins decreased in early day with maximum abundance attained in late day and maintained at night; Trend 3, proteins decreased during the day and were minimal at night.

[Figure 13 image not reproduced. Panels: LDT5 (G1) and LDT14 (S-phase). Each gel spans pH 3 to pH 10 and 10 kDa to 250 kDa.]

Figure 13: 2DE gels of K. brevis nuclear proteins. LDT5 (G1) and LDT14 (S-phase) were compared. Approximately 100 protein spots were detected.

Table 1: Proteins found to significantly change over the diel cycle using DIGE. Shown are the master spot numbers assigned by DeCyder software, the number of samples in which the spot was present (of 24, including Cy2, Cy3 and Cy5 from each of 8 gels), the results from a Student's t-test comparing each time point to LDT2, the average ratio compared to LDT2, and the results from a one-way ANOVA comparing the means across all time points. Note: not all spots could be detected by SYPRO Ruby staining and therefore cut from the gels. Additional gels were run, swapping the original Cy dyes, in order to verify the trend that was occurring between LDT8 and LDT14; 15 of 18 protein abundances agreed (83.3%).

Master No.  Appearance  T-test (LDT2 vs.     ANOVA     Original trend  Repeated with  Note
                        other time points)   p value   LDT8-14         dye swap
333         24(24)      0.00073              0.00015   down            down (2/3)
189         12(24)      0.056                0.00062   up              up
369         24(24)      0.11                 0.0016    down            down
235         24(24)      0.77                 0.0032    up              up
8           6(24)       -                    0.0041    -               -              not confirmed with SYPRO
627         24(24)      0.58                 0.0092    up              up
204         24(24)      0.21                 0.0099    up              up (2/3)
43          24(24)      0.89                 0.011     up              up
117         21(24)      0.06                 0.012     up              down
340         24(24)      0.69                 0.014     down            down (2/3)
573         9(24)       -                    0.014     -               -              not confirmed with SYPRO
314         24(24)      0.85                 0.016     up              up (2/3)
367         24(24)      0.47                 0.018     down            down
223         12(24)      -                    0.022     -               -              not confirmed with SYPRO
339         24(24)      0.14                 0.026     down            down (2/3)
192         24(24)      0.73                 0.035     up              up (2/3)
608         24(24)      0.92                 0.036     up              up
595         18(24)      0.34                 0.042     down            up             not confirmed with SYPRO
629         21(24)      0.44                 0.042     up              up (2/3)       not confirmed with SYPRO
176         15(24)      0.82                 0.045     up              up             not confirmed with SYPRO
271         24(24)      0.91                 0.045     up              down           not confirmed with SYPRO
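The dye-swap agreement quoted in the Table 1 caption (15 of 18, 83.3%) can be recomputed from the trend columns of the table. The sketch below is illustrative only: the data structure and variable names are assumptions, the "(2/3)" annotations are ignored, and spots without a recorded trend (8, 573, 223) are left out.

```python
# (master spot number, original trend LDT8-14, trend in dye-swap repeat),
# transcribed from Table 1; "(2/3)" annotations are dropped.
trends = [
    (333, "down", "down"), (189, "up", "up"), (369, "down", "down"),
    (235, "up", "up"), (627, "up", "up"), (204, "up", "up"),
    (43, "up", "up"), (117, "up", "down"), (340, "down", "down"),
    (314, "up", "up"), (367, "down", "down"), (339, "down", "down"),
    (192, "up", "up"), (608, "up", "up"), (595, "down", "up"),
    (629, "up", "up"), (176, "up", "up"), (271, "up", "down"),
]

agreeing = [spot for spot, original, repeat in trends if original == repeat]
disagreeing = [spot for spot, original, repeat in trends if original != repeat]
agreement_percent = 100.0 * len(agreeing) / len(trends)

print(f"{len(agreeing)} of {len(trends)} spots agree ({agreement_percent:.1f}%)")
print(f"Spots with reversed trend after dye swap: {disagreeing}")
```

Under this reading of the table, the three spots whose trend reversed in the dye-swap gels are 117, 595 and 271.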

Table 2: Nuclear protein identifications using tandem MS (LTQ). Shown are the number of peptides identified in each sample, the sequence of each peptide, the total ion scores, the match to the contig sequence in the K. brevis cDNA library using SEQUEST, the protein ID of the top hit to the K. brevis contig from GenBank with BLASTx e-value, and the species to which the sequence has the highest homology. Shown in bold are the proteins known to have nuclear localization.

Total # Peptides BLAST Ion Peptide Sequences Protein 10 Organism e-value of Identified Score Contig from Librarv contig APSILDLK, IDLQLLR, LLDVRICR, DPGAILPDER, ATHELSEGIYN, QKLLDVRICR, ICRQCLHQLR, HSGQILQRHVCVALR, LLDVRICRQCLHQLR

9 33 Contig 6616 tropomodulin 3.00E-11 beetle DELTLEGI, ELAEQSQK, FMNNPFR, RDELTLEGI, VFHMIQQR, EQVYDIYR, DVIVQAQSGTGK, DPQVLFLSPTR, FMNNPFRVlVK, AlEAGVHIVSGTPGR, FMNNPFRVLVKR, VKGQIVCLNVLAGMASKMFVLDEADEMLNR

eukaryotic initiation factor 4A-3 13 66 Contig 5255 (eIF4A-3) 3.00E-102 apicomplexan GVAGQRARR, SSATGEQTGSI, GCRVSESSPR, SARQNTRCTR, SlPGASlSRRER, VSESSPRDERSAR, G PGITE HARG HQPGAAHTK, PPGSCRPMEPGESRVAS, AGGGQPDPGLSSPAWR 9 58 Contig 5491 hypothetical protein 0.57 0. sativa RELTCH, RKCMQR, CLTKVHR, ARCRCl TK, AWGAQHHR, contlg 3533, UnaJ domam protem 6 29 KCMQRIPR Rev 6675 (transcriptional regulator) 3.00E-05 (.;ontlg l~ll. 2 33 GQALTLPR, ARGQALTLPR Rev 3965 hypothetical protein 1.00E-15 M. musculus NACPSLPR, LHDCLGR, LQTMLLCK, QAHSSPVLLGR, TVSKCSENLR, LLQTMLlCKCCC, TVSKCSENLRSDIQK, CSENLRSDIQKTPll,SAFGHDLGTCLHALHR, LHDCLGRLGLDDNLLAK, ANRLLQTMLLCKCCC, AANHLRANRLLQTMLLCK 12 33 Rev 25 Major basic nuclear protein 2.00E-32 Karlodinim GLPPCSKG, SSPQSLPR, GPALASFSR, SSPQSlPRHR 4 38 Contig 3542 hypothetical protein 5.00E-07 M. musculus NLQQMR, SRISNCK, ISNCKVK, ERLSlPR, TALSSKER, \JUIILlYi U,J""tL, 6 35 VKDGAQSTGIDWAK Rev 2952 hypothetical protein 0.096 platypus VVIILl~ vVoJ, J'\'VY 4 38 LSAQELR, DRWVDK, ASTDALPR, ETRTAMK 4727 hypothetical protein 0.021 M. musculus YQPTVGR, QEQRKR, CVIRLSR, VAEVSPLR, LLGVRCWSVKR, .... UIIUY VV.:JV. 6 32 HCDDVHDWPLK Rev 1495 hypothetical protein 0.001 LSAQElR, DRWVDK, ASTDALPR, ETRTAMK, ETRTAMKDR 5 38 Contig 305 hypothetical protein 0.021 M. musculus

2 39 SSPQSLPR,SSPQSLPRHR Contig 3542 hypothetical protein 5.00E-07 M. musculus GSLCTLAR, KRGGSPGGR, GTPPCPRWR, GSLFEGSVPCECRHlEK, RGSGASLGCGEIliHANCSGTMSR 5 33 Rev 7203 similar to SOX3 (DNA binding) 1.00E+00 platypus CKSAELR, EDWQLFR, ERHQTLR, WQVQMGEGR, CPMCGTVQPR, GKPQAQMMMR, LGESVPSASAASAR, TQWAEAlREGR, AKRTQWAEALR, AVRTISEQMMGGK, KERPIRLGESVPSASAASAR, FQARGFEYEVLLDRlVQR, WQVQMGEGRWEDFAESEQR Calpain family cysteine protease 13 34 Contig 9679 containing protein 3.00E-12 IVEKLYGDDK, ETMAEGMWQDAK, DCEElKTLLAEQTSKSK, NIIIVTFARGLKVTADCHIK, AGAKVVTASVMPDKNIIIVTFAR, VVADGHIITSQGPGTSLQFALK, YVAYAATAQQLlSIPAFTFEVRR, VLVPIADGSEDIETACITDVLVRAGAK Contig 9664, DJ-1 family protein (pos. 8 15 Rev 832 regulator of transcription) 6.00E-34 Arabidopsis DELTLEGI, HAEQSQK, FMNNPFR, RDELTLEGI, VFHMIQQR, EQVYDIYR, DVIVQAQSGTGK, DPQVLFLSPTR, FMNNPFRVlVK, ALEAGVHIVSGTPGR, MFVLDEADEMLNR, VKGQIVCLNVLAGMASK eukaryotic initiation factor 4A-3 12 89 Contig 5255 (eIF4A-3) 3.00E-102 apicomplexan RDGRFR, ACYDQR, NFKVWR, EQCGKRGS, VWRGDCTR, DGAGTQGADCGR, CASLSEIASAAR, HSGASSRLQCR, DGCREQCGKR, DGRFRALPNLR, YAWSWAISQPR Contig 4790, SNF2 family chromodomain- 11 32 Rev 4075 helicase (chromatin assembly) 2.00E-05 Chlamydomonas RALHCGR, SLSCEHS, CCDIRR, RLLHHR, QAPAIVCR, CNSELDPR, IIIRCCDIR, FLVLQRNR, CIIITSVTDPIIIR, AHDlLLTIGKV, LHEASCSRGSK, HQIIIRllCRVD, PWLRCIIITSVTDPIliR

13 37 Rev 7117 Apo-Eif4aiii (eIF4A-3) 0.00E+00 H. sapiens

Chapter 3: Characterization and regulation of S-phase genes and proteins in K. brevis

INTRODUCTION

The dinoflagellate cell cycle is typical of that in other eukaryotic organisms, with distinct G1, S, G2, and M phases. In K. brevis, the cell cycle is under circadian control (Brunelle et al., 2007) and is thereby entrained to the light/dark cycle, with S-phase beginning approximately 6 hours after the onset of light and mitosis occurring late in the dark, peaking at 20-22 hours of a 24 hour diel cycle (Van Dolah and Leighfield, 1999). The conserved eukaryotic cell cycle regulatory proteins, cyclins and cyclin dependent kinases (CDK), are present in dinoflagellates (Liu et al., 2005; Zhang et al., 2006; Toulza et al., 2010; Bertomeu et al., 2007; Bertomeu and Morse, 2004; Barbier et al., 2003) and appear to be necessary for cell cycle progression through G1/S and G2/M (Van Dolah and Leighfield, 1999; Bertomeu and Morse, 2004). In higher eukaryotes, CDK-dependent phosphorylation at G1/S activates the transcription of a core set of S-phase genes in a cell cycle dependent manner, thereby ensuring that specific processes are limited to the appropriate cell cycle phase (Rustici et al., 2004; Spellman et al., 1998). Late in the cell cycle, CDK-dependent phosphorylations trigger entry into mitosis, including nuclear envelope breakdown and chromatin condensation. However, dinoflagellates have a number of unusual features that suggest modification of some of these cell cycle control mechanisms.

Dinoflagellate chromosomes are permanently condensed and lack nucleosomes that typically control the degree of chromatin condensation and thereby access for transcription and replication. Instead, the chromatin occurs throughout the cell cycle in a tightly packed liquid crystal state, with peripheral arches of exposed DNA thought to be areas undergoing active transcription (Costas and Goyanes, 2005). Several lines of evidence point to the possibility that transcription in dinoflagellates is constitutive (Van Dolah et al., 2007). No recognizable promoter sequences (e.g., TATA boxes or other) have been identified in dinoflagellate genes. Further, all nuclear encoded gene transcripts examined in K. brevis possess an identical 22 nucleotide leader sequence on their 5' end, termed the spliced leader (SL; Lidie and Van Dolah, 2007; Zhang et al., 2007). SL trans-splicing was first observed in trypanosomes, an early diverging branch of unicellular eukaryotes that carry out constitutive transcription and produce polycistronic messages (Murphy et al., 1986). Trans-splicing of the SL about 100 nucleotides upstream of an open reading frame serves to process these polycistronic pre-mRNAs into individual mature messages, each with a 5' capped SL and a 3' poly A tail. With messages produced constitutively, differential gene expression in trypanosomes is regulated by differential rates of mRNA turnover and differential rates of trans-splicing (Vanhamme and Pays, 1995). The presence of the SL mechanism in dinoflagellates suggests that their transcription may take place constitutively as well. Indeed, many physiological processes in dinoflagellates appear to be under translational control, including bioluminescence (Morse et al., 1989; Mittag et al., 1998), carbon fixation (RUBISCO; Nassoury et al., 2001), glycolysis (GAPDH; Fagan et al., 1999), and photosynthesis (PCP; Le et al., 2001). In K. brevis, a microarray study of diel dependent gene expression suggested that cell cycle control is also post-transcriptional, based on the absence of differential expression of cell cycle gene transcripts (Van Dolah et al., 2007).

During the cell cycle of animals and plants, the G1/S phase transition is triggered by CDK phosphorylation of the retinoblastoma protein (Rb), a "pocket protein" that serves to sequester the transcription factor E2F in inactive cytosolic complexes during G1. Phosphorylation of Rb results in the release of E2F, which translocates to the nucleus and orchestrates the transcription of genes required for DNA synthesis. The Rb/E2F pathway is present in green algae (Robbens et al., 2005; Umen and Goodenough, 2001) and diatoms (Huysman et al., 2010), but is absent from Apicomplexa (Gardner et al., 2005), the nearest relatives to dinoflagellates, and has not been identified in dinoflagellates. Nonetheless, S-phase entry in K. brevis appears to be CDK dependent, as inhibiting CDK activity with the inhibitor olomoucine causes a G1 cell cycle block (Van Dolah and Leighfield, 1999). Given the current paradigm for CDK controlled S-phase entry, it is unclear how transcription-independent S-phase entry might be achieved.

The study in Chapter 3 therefore investigates the regulation of a suite of S-phase specific genes and their protein products that are critical to the DNA synthesis phase of the cell cycle. A survey of S-phase genes present in an extensive EST collection from K. brevis was first carried out to gain insight into how conserved the dinoflagellate S-phase processes are in comparison with other eukaryotes. From this survey, four S-phase genes that have strong sequence homology with orthologs in other eukaryotes were selected for study, all of which are typically under CDK-dependent transcriptional control at the G1/S transition through its activation of E2F. The genes investigated include proliferating cell nuclear antigen (PCNA), replication factor C (RFC), replication protein A (RPA), and ribonucleotide reductase (RnR2). PCNA, RFC, and RPA are members of the replication fork complex, where PCNA is a sliding clamp that forms a homotrimeric ring structure around DNA, endowing DNA polymerase δ with a high degree of processivity (Kuriyan and O'Donnell, 1993). RFC is a complex of 5 subunits that loads PCNA onto the 3' hydroxyl end of the primer strand of the DNA primer-template in an ATP-dependent process (Loor et al., 1997). RPA is a trimeric single stranded DNA binding protein that stabilizes the replication fork, stimulates DNA polymerase δ activity in the presence of the PCNA/RFC complex, and is involved in both initiation and elongation of DNA replication (Lee and Hurwitz, 1990). Ribonucleotide reductase is a tetramer, consisting of two large and two small subunits, that reduces ribonucleotides to deoxyribonucleotides during S-phase and in response to DNA damage.

To assess the regulation of these genes, their transcript levels were first analyzed over the course of the cell cycle using quantitative PCR. Peptide antibodies designed from these sequences were next employed in western blotting and immunolocalization to investigate cell cycle specific changes in protein abundance and/or localization. Although there were no significant changes in transcript levels of these S-phase genes, there were S-phase specific increases in protein abundances. Furthermore, in the case of PCNA, an S-phase specific ~9 kDa increase in molecular weight was observed that may be attributed to a post-translational modification. This S-phase specific shift in the molecular weight of PCNA coincided with its chromatin association as determined by immunolocalization. Together these results suggest that the expression of S-phase specific proteins in K. brevis is independent of their transcription upon entry into S phase.

METHODS

Survey of Cell Cycle Genes in K. brevis

Sequences with significant homology (BLASTx E-value < 10^-4) to cell cycle genes were identified from a K. brevis expressed sequence tag (EST) database containing 65,292 sequences housed at NOAA (Charleston, SC) and available at marinegenomics.org. ESTs were previously assembled into 22,671 unigene contigs using Cap3, with a minimum 30 bp overlap and 70% identity set as assembly criteria. Annotation of contigs and gene ontology analysis were performed in Blast2Go.
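
The E-value screen described above can be illustrated with a minimal sketch. It assumes BLASTx hits exported as tab-separated text in the common 12-column tabular layout, with the E-value in the 11th column. The contig names, subject IDs, and values are hypothetical and are not taken from the K. brevis database.

```python
# Minimal sketch: keep BLASTx hits whose E-value is below a cutoff.
# Assumes 12-column tabular BLAST output (E-value in column 11).
# All contig IDs and hit values below are hypothetical examples.

def filter_hits(lines, max_evalue=1e-4):
    """Return (query, subject, evalue) tuples with evalue < max_evalue."""
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        evalue = float(fields[10])
        if evalue < max_evalue:
            kept.append((fields[0], fields[1], evalue))
    return kept

example_hits = [
    "contig_A\tsubject_1\t62.1\t250\t90\t2\t1\t750\t5\t254\t3e-102\t310",
    "contig_B\tsubject_2\t31.0\t80\t50\t3\t10\t249\t40\t119\t0.57\t35.0",
    "contig_C\tsubject_3\t45.5\t120\t60\t1\t4\t363\t12\t131\t2e-05\t52.4",
]

for query, subject, evalue in filter_hits(example_hits):
    print(query, subject, evalue)
```

Passing a stricter `max_evalue` to the same function would mimic a more stringent selection of candidate genes.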

Analysis of Selected S-phase Gene Sequences

Among the cell cycle genes, sequences for several genes with significant homology (E-value < 10^-50) to S-phase genes on GenBank were selected for further analysis. These include PCNA, RFC, RPA and RnR. To confirm the presence of the spliced leader on contigs lacking the 5' end of the transcripts, PCR was carried out using gene specific reverse primers (Table 1) in conjunction with a truncated primer for the dinoflagellate-specific spliced leader (TCCGTAGCCATTTTGGCTCA). Total RNA (200 ng) was reverse transcribed (Ambion RETROscript kit, Austin, TX) using an oligo (dT) primer (Ambion, Austin, TX). PCR was conducted using Qiagen HotStar master mix (Qiagen, Valencia, CA) with a 400 nM final concentration of each primer. PCR parameters were as follows: 95°C for 1 min; 25 cycles of 94°C for 1 min, 60°C for 1 min, and 72°C for 1 min; and 72°C for 10 min. PCR products were analyzed on an Agilent Bioanalyzer 2100. Amplicons were cloned using a TopoTA cloning kit (Invitrogen, Carlsbad, CA), and purified plasmid DNA was sequenced in both forward and reverse directions using the universal M13 primer (Seqwright, Houston, TX). Sequenced products were aligned with existing sequences using BioEdit software.
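
As a rough in silico counterpart to the PCR check, the following sketch searches a transcript sequence for the truncated spliced leader primer quoted above. The transcript string is made up for illustration, and exact string matching is only a simplification of primer annealing.

```python
# Minimal sketch: locate the truncated spliced leader (SL) primer in a sequence.
# The primer is the one quoted in the text; the transcript is hypothetical.

SL_PRIMER = "TCCGTAGCCATTTTGGCTCA"

def find_sl(sequence, primer=SL_PRIMER):
    """Return the 0-based index of the primer in the sequence, or -1 if absent."""
    return sequence.upper().find(primer)

def distance_to_start_codon(sequence, primer=SL_PRIMER):
    """Nucleotides between the end of the primer match and the first downstream ATG.

    Returns None if either the primer or a downstream ATG is missing.
    """
    seq = sequence.upper()
    sl_pos = seq.find(primer)
    if sl_pos == -1:
        return None
    sl_end = sl_pos + len(primer)
    atg_pos = seq.find("ATG", sl_end)
    if atg_pos == -1:
        return None
    return atg_pos - sl_end

# Hypothetical 5' end: SL primer, a short made-up UTR, then a start codon.
transcript = SL_PRIMER + "AGCCTTGACCGTTCAAGC" + "ATGGCGCTG"
print(find_sl(transcript))                  # index of the SL match
print(distance_to_start_codon(transcript))  # length of the made-up UTR
```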

Multiple Alignments of Amino Acid Sequences

In silico translations of the full length open reading frames of K. brevis PCNA, RPA1, RFC3, RFC5, RnR1 and RnR2 were performed using BioEdit software. To obtain the predicted theoretical molecular weight for each protein, the amino acid sequences were analyzed using the Compute pI/Mw tool on the ExPASy Proteomics server (http://expasy.org/tools/pi_tool.html). For multiple alignments of each protein sequence, the amino acid sequences from other eukaryotes were obtained from GenBank. Species used for this analysis included Trypanosoma brucei, Chlamydomonas reinhardtii, Arabidopsis thaliana, Saccharomyces cerevisiae, and Homo sapiens, to include representatives across multiple diverse phyla. Once imported into BioEdit software, multiple alignments of the 6 species were done using the ClustalW Multiple Alignment application.
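
A theoretical molecular weight of the kind reported by Compute pI/Mw is essentially the sum of average residue masses plus one water molecule. The sketch below illustrates that calculation under those assumptions. It is not the ExPASy implementation, the residue masses are standard average values rounded to four decimals, and the function name is a hypothetical choice.

```python
# Minimal sketch: average molecular weight of an unmodified peptide,
# computed as the sum of average residue masses plus one water.
# Not the ExPASy code; residue masses are standard average values (Da).

RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153

def molecular_weight(peptide):
    """Return the average molecular weight in daltons."""
    return sum(RESIDUE_MASS[aa] for aa in peptide.upper()) + WATER

# The 16-residue PCNA peptide described in the Antibody Development section.
peptide = "DRIADFDLKLMQIESE"
print(f"{peptide}: {molecular_weight(peptide):.1f} Da")
```

The same function applied to a full translated open reading frame gives an estimate comparable to the predicted protein sizes, ignoring any post-translational modification.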

Cultures and Culture Conditions

K. brevis (Wilson isolate) was grown in 0.22 µm filtered seawater (36‰) obtained from the seawater system at the Florida Institute of Technology field station, Vero Beach, Florida, enriched with f/2 nutrients (Guillard, 1973), modified with the use of ferric sequestrene in place of EDTA·Na2 and FeCl3·6H2O and the addition of 0.01 µM selenous acid. Cultures were maintained in 1 liter glass bottles at 25°C ± 1°C with a 16:8 light:dark cycle, and cool white light at 50-60 µmol photons m^-2 sec^-1, determined using a LI-COR LI-250 Light Meter (LI-COR BioSciences, Lincoln, NE). To compare the transcript abundance and protein expression levels of the S-phase genes over the cell cycle, replicate 1 L cultures were grown to mid-log phase when cell concentrations were approximately 10^4 cells/mL. Cells were harvested at times corresponding to G1 (LDT2 and LDT6), S phase (LDT10 and LDT14), and G2/Mitosis (LDT18 and LDT22), where "LDT" designates "light-dark" time, or hours after lights-on in a 24 hour light-dark cycle. A total of five independent cultures were harvested at each time point for qPCR and Western blot analysis. At least three independent cultures at each time point were used for immunolocalization studies. From each culture, a 10 mL aliquot of cells was processed for flow cytometric cell cycle analysis while the remaining cells were used for RNA and protein extractions.
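
The sampling design can be summarized in a small lookup, sketched below. The phase labels follow the text. The light or dark assignment is an assumption derived from the 16:8 light:dark cycle, taking lights-on as LDT 0 and lights-off as LDT 16.

```python
# Minimal sketch: tabulate the sampling schedule described above.
# Phase labels follow the text; the light/dark call assumes lights-on at
# LDT 0 and lights-off at LDT 16 (16:8 light:dark cycle).

LIGHT_HOURS = 16

SAMPLING = {2: "G1", 6: "G1", 10: "S", 14: "S", 18: "G2/M", 22: "G2/M"}

def light_state(ldt):
    """Return 'light' or 'dark' for a given hour after lights-on."""
    return "light" if (ldt % 24) < LIGHT_HOURS else "dark"

for ldt, phase in sorted(SAMPLING.items()):
    print(f"LDT{ldt:<2} {phase:<5} {light_state(ldt)}")
```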

Cell Cycle Analysis

Ten milliliters of culture were collected from each 1 L biological replicate culture (n=5) at each time point. Cells were fixed in 2% glutaraldehyde by adding 50% glutaraldehyde directly to culture medium containing cells and stored at 4°C for at least 12 hours. Cells were then centrifuged at 1750 x g to remove fixative and growth medium. To extract cellular pigments, cells were stored in -20°C methanol for a minimum of 4 hours and centrifuged at 1000 x g, and the methanol was removed. The cells were then stained with 10 µg/mL propidium iodide (final concentration) in phosphate buffered saline (PBS) containing 10 mg/mL RNase and 0.5% Tween 20. DNA analysis was performed on an Epics MXL4 flow cytometer (Beckman Coulter, Miami, FL) using a 5 W argon laser with a 488 nm excitation wavelength and 635 nm emission wavelength. Multicycle software (Phoenix Flow Systems, San Diego, CA) was used to obtain the percentages of cells in each stage of the cell cycle.

qPCR

Forward and reverse primer sets were designed for consensus PCNA, RFC, RPA1 and RnR2 cDNA sequences (Table 1) and optimum annealing temperatures were determined prior to the analysis of experimental samples. The specificity of each primer set was verified by amplicon size determination on an Agilent Bioanalyzer (Santa Clara, CA) and by melt curve analysis. Efficiencies of the primer sets were determined using a standard curve of cDNA produced from K. brevis. Experimental samples were harvested at the aforementioned time points from the five independent cultures by centrifugation at 600 x g for 10 minutes and medium was removed. Total RNA was isolated by the addition of 1 mL of TriReagent to the cell pellet (Molecular Research Center, Inc., Cincinnati, OH) and processed according to manufacturer's instructions. Residual DNA contamination was removed by incubation with 27.3 units RNase-free DNase and processed through a Qiagen RNeasy mini-column (Valencia, CA). RNA was quantified using a Nanodrop spectrophotometer (Wilmington, DE, USA) and qualified using an Agilent Bioanalyzer. For each sample, 200 ng of total RNA was reverse-transcribed (Ambion RETROscript kit, Austin, TX) with oligo (dT) primers in triplicate. Duplicate real-time assays were performed on each reverse-transcribed sample on an Applied Biosystems 7500 Real-Time PCR system (Foster City, CA, USA). A sample volume of 25 µL was used for all assays, consisting of ABI SYBR green (Applied Biosystems) PCR master mix, nuclease-free water, gene-specific primers (400 nM final concentration), and 1 µL cDNA. Assays were run using the following protocol: 95°C for 15 min, 40 cycles of 94°C for 15 s, gene-specific annealing temperature (58°C-62°C) for 40 s, and 72°C for 1 min. To quantify the fold change in mRNA expression of PCNA, RFC, RPA and RnR2, one cDNA sample from K. brevis was used to develop a standard curve for all primer sets relating efficiencies and slope values. PCR products were subjected to melt curve analysis to confirm the presence of a single amplicon. Statistical analysis of the fold-change values (log2 ratio = C-E, where C is the control and E is the experimental samples) was evaluated using a one-way ANOVA in GraphPad Prism 4 (La Jolla, CA).
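
The standard-curve and fold-change calculations mentioned above can be sketched as follows. This is a generic illustration rather than the analysis actually run for this study. It assumes the usual relationship between standard-curve slope and amplification efficiency, E = 10^(-1/slope) - 1, and reads the log2 ratio "C-E" as control Ct minus experimental Ct, which holds only when efficiency is close to 100%. All Ct values and dilutions are made up.

```python
# Minimal sketch: primer efficiency from a standard curve and a log2 ratio.
# Generic illustration with invented Ct values; function names are hypothetical.

def standard_curve_slope(log10_amounts, ct_values):
    """Least-squares slope of Ct versus log10(template amount)."""
    n = len(log10_amounts)
    mean_x = sum(log10_amounts) / n
    mean_y = sum(ct_values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_amounts, ct_values))
    sxx = sum((x - mean_x) ** 2 for x in log10_amounts)
    return sxy / sxx

def efficiency_from_slope(slope):
    """Amplification efficiency; 1.0 corresponds to perfect doubling per cycle."""
    return 10 ** (-1.0 / slope) - 1.0

def log2_ratio(ct_control, ct_experimental):
    """log2 fold change (experimental relative to control), assuming ~100% efficiency."""
    return ct_control - ct_experimental

# Hypothetical 10-fold dilution series: Ct rises 3.32 cycles per dilution.
log10_amounts = [0, -1, -2, -3]
cts = [18.0, 21.32, 24.64, 27.96]

slope = standard_curve_slope(log10_amounts, cts)
print(f"slope = {slope:.2f}, efficiency = {efficiency_from_slope(slope):.2%}")
print(f"log2 ratio = {log2_ratio(22.0, 21.0):.1f}")  # experimental Ct one cycle lower
```

A slope near -3.32 corresponds to an efficiency near 100%, which is the condition under which a simple Ct difference can be read directly as a log2 fold change.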

PCNA Phylogenetic Analysis

Fourteen K. brevis PCNA contigs were aligned with a dinoflagellate (Pyrocystis lunula) PCNA and 4 higher plant PCNAs to generate a preliminary alignment in BioEdit. However, for simplicity of analysis, only one K. brevis contig was used for the final alignment. Contig 2450 was chosen for this analysis since it contained the full open reading frame and had a highly significant e-value by BLASTx analysis. Sixty-four PCNA amino acid sequences were then obtained from NCBI's GenBank from the Archaea and Eucarya domains to produce the final alignment. Eukaryotes represented in the final alignment include animals (vertebrates and invertebrates), fungi, plants and various protists, including other algal species. The prokaryotic Eubacteria were not included in this alignment due to the low level of sequence identity of their clamp-loading protein sequences with Archaea and Eucarya (Indiani and O'Donnell, 2006). Archaebacteria were included in this analysis because PCNA sequences from this domain have known sequence identity to eukaryotes. Alignments were performed in BioEdit to manually search for conserved domains. ProtTest 1.2.7 software was used to determine the best fit model for amino acid replacement. The JTT substitution model was used for PCNA (Jones et al., 1992). ClustalX and MEGA software packages were used to perform phylogenetic analysis. MEGA software was used to generate a neighbor-joining tree with 1000 bootstrap replicates.
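
Tree building of this kind starts from pairwise distances between aligned sequences. The sketch below computes an uncorrected p-distance, which is a deliberately simpler stand-in for the JTT-corrected distances named above and is not what MEGA computes under that model. The aligned sequences are invented and are not PCNA sequences.

```python
# Minimal sketch: uncorrected p-distance between aligned protein sequences.
# This is a simpler stand-in for the JTT-corrected distances used for the tree.
# The aligned sequences below are invented and are not PCNA sequences.

def p_distance(seq_a, seq_b):
    """Fraction of differing sites, ignoring columns with a gap in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared

alignment = {
    "taxon_1": "MLEARLVQGS",
    "taxon_2": "MLEAKLVQGS",
    "taxon_3": "MFEAKL-QAS",
}

names = sorted(alignment)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        print(first, second, round(p_distance(alignment[first], alignment[second]), 3))
```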

PCNA Protein Homology Model

K. brevis PCNA Contig 2450 was used to build a protein homology model. This protein, or amino acid, sequence was BLASTed (protein-protein) against NCBI's GenBank using the SwissProt database, which is smaller and better annotated than the non-redundant database. Cn3D 4.1 software was then used to detect the conserved domains in the K. brevis sequences. The K. brevis sequence was then loaded into Swiss-PdbViewer software. A template upon which to build the model was chosen from the SwissModel database (http://swissmodel.expasy.org/cgi-bin/blastexpdb.cgi) of known crystal structures of proteins. Human PCNA was chosen as the template since the K. brevis PCNA sequence (Contig 2450) had the highest homology to this protein. The homology model was built after threading and aligning the K. brevis sequence with the template. This alignment was then submitted to the Swiss-PdbViewer server for model refinement. Additionally, the ExPASy Proteomics server (http://www.expasy.org/tools/pi_tool.html) was used to calculate the theoretical pI (isoelectric point) and molecular weight of K. brevis PCNA.

Antibody Development

In order to study protein expression, it was necessary to obtain antibodies that

specifically cross-react with proteins of interest in the organism under investigation. Although the S-phase genes are highly conserved in eukaryotes, the dinoflagellates

represent a distinct group of primitive eukaryotes, the , and commercial cell

cycle antibodies tested did not cross-react with K. brevis proteins, with the exception of

anti-RnR2 (Calbiochem). For PCNA, in silico translated sequences and the above protein model were analyzed to determine an optimal region for peptide antibody production suitable for use in immunolocalization, specifically, one occurring in an exposed area of the native folded protein. The resulting short peptide (DRIADFDLKLMQIESE) was BLASTed against an in-house K. brevis EST database and GenBank to ensure specificity. The peptide sequence was submitted to EZBiolab (Carmel, IN, USA) for polyclonal antibody production. Affinity purification was carried out using a SulfoLink Immobilization kit (Thermo Scientific, Rockford, IL, USA). All other peptide antibodies (Table 2) were designed by ProSci, Inc. (Poway, CA, USA) using standard algorithms for hydrophilicity, antigenicity and surface probability, and affinity purified polyclonal antibodies were generated. The peptides used for development of each antibody are listed in Table 2.

Protein Extraction and Western Blotting

Protein was extracted from the same experimental samples used for qPCR using TriReagent, following the manufacturer's protocol. A Bradford protein assay (Bradford, 1976) was used to quantify protein concentration using the Bio-Rad Protein Assay Dye Reagent (Bio-Rad, Hercules, CA, USA). For Western blotting, 10 µg of protein extract were loaded on a 4-12% Bis-Tris gradient gel (Invitrogen Corporation, Carlsbad, CA) and then transferred to a 0.2 µm polyvinylidene fluoride (PVDF) membrane using the Invitrogen iBlot system (Invitrogen Corporation, Carlsbad, CA). The PVDF membrane was then blocked in 5% non-fat milk in Tris buffered saline (TBS) with 0.1% Tween 20 for 1 hour at room temperature. Membranes were then incubated in primary antibodies overnight at 4°C. Peptide antibodies developed against translated K. brevis cDNAs were used at the following dilutions (v/v) in TBS with 0.1% Tween 20 and 5% non-fat milk: anti-KbPCNA, 1:5000; anti-KbRFC5, 1:2000; and anti-RPA1, 1:2000 (Table 2). A commercial antibody was used for the detection of ribonucleotide reductase 2 at a dilution of 1:60 (v/v) (Calbiochem, San Diego, CA). Membranes were washed 3 x 10 minutes in TBS with 0.1% Tween 20. Horseradish peroxidase-linked donkey anti-rabbit Ig was then incubated with the PVDF membrane at 1:2000 in TBS with 0.1% Tween 20 for 1 hour at room temperature. After rinsing membranes in TBS with 0.1% Tween 20 (2 x 10 minutes) and TBS (1 x 10 minutes), the Pierce ECL Super Signal West Pico Kit (Pierce, Rockford, IL) was used for detection. Membranes were exposed to film (Pierce CL-XPosure Film, Thermo Scientific, Rockford, IL) and developed for visualization. Densitometry of the bands of the biological replicates (n=5) was carried out using Alpha Innotech (San Leandro, CA) FluorChem 8900 software. The band intensities of all time points were normalized to LDT2 and significant differences were determined with a one-way ANOVA using GraphPad Prism 4 (La Jolla, CA) software.
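
The normalization and one-way ANOVA steps can be sketched as below. The densitometry values are invented, only three replicates per time point are shown for brevity, and dividing by the mean LDT2 intensity is an assumption about how the normalization was done. The F statistic is computed by hand from sums of squares. This is not GraphPad Prism's implementation, and no p-value is derived.

```python
# Minimal sketch: normalize band intensities to LDT2 and compute a one-way
# ANOVA F statistic by hand. Densitometry values are invented for illustration;
# this is not GraphPad Prism's implementation and no p-value is derived.

def normalize_to_reference(intensities, reference="LDT2"):
    """Divide every replicate by the mean intensity of the reference time point."""
    ref_mean = sum(intensities[reference]) / len(intensities[reference])
    return {tp: [v / ref_mean for v in vals] for tp, vals in intensities.items()}

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within

band_intensity = {
    "LDT2": [100.0, 110.0, 90.0],
    "LDT10": [150.0, 160.0, 140.0],
    "LDT14": [200.0, 210.0, 190.0],
}

normalized = normalize_to_reference(band_intensity)
f_stat, df_b, df_w = one_way_anova_f(list(normalized.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.1f}")
```

The resulting F value would then be compared against the F distribution with the printed degrees of freedom to obtain a p-value, a step left to a statistics package.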

Peptide Competition Assay

Ten micrograms of protein extract from cells in S-phase (LDT10 and LDT14) were used for the PCNA peptide competition experiment. The anti-K. brevis PCNA antibody (3.6 µg) was incubated with 100-fold excess of the peptide (360 µg) to which the antibody was developed, in TBS with 0.1% Tween 20 and 5% milk for 1 hour at 4°C. The experimental membrane was incubated overnight at 4°C with the peptide blocked anti-PCNA. The control blot was incubated overnight at 4°C in the anti-K. brevis PCNA antibody at 1:5000 (v/v) in TBS with 0.1% Tween 20 and 5% non-fat milk. Western blot procedures were then followed as described above.

Immunolocalization

Triplicate cell samples were fixed in 2% paraformaldehyde in seawater for 10 minutes and centrifuged for 10 minutes at 250 x g. Cell pellets were then washed in phosphate buffered saline (PBS), resuspended in 2 mL of -20°C methanol and incubated at 4°C for 10 minutes. The pelleted cells were incubated with normal goat serum (Vector Laboratories, Inc., Burlingame, CA) at 4°C for 45 minutes and then incubated overnight at 4°C with the K. brevis anti-PCNA antibody diluted at 1:100 in PBS with 0.1% Tween 20. Cells were also incubated with the K. brevis anti-PCNA antibody blocked with 100-fold excess of peptide (above) to test for specificity. All cells were then incubated with fluorescein isothiocyanate (FITC)-labeled goat anti-rabbit (Sigma-Aldrich Corp., St. Louis, MO, USA) at 1:100 for 1 hour at room temperature. After 3 x 5 minute washes in PBS with 0.1% Tween 20, cells were counterstained with 0.1 µg/mL DAPI (4',6-diamidino-2-phenylindole) (Molecular Probes) in PBS for 5 min at room temperature to visualize nuclei. The cells were mounted in Fluoromount-G (Southern Biotech, Birmingham, AL, USA). An Olympus Fluorescent Microscope with a FITC filter set was used for imaging. One hundred individual cells from each replicate were counted to determine the percentage of cells stained positively for PCNA.

RESULTS

Survey of Cell Cycle Genes in K. brevis

In order to gain insight into the conservation of genes involved in the K. brevis cell cycle, a survey of genes present in a K. brevis cDNA sequence database containing 65,292 ESTs was carried out. Based on the genome size prediction of Hou and Lin (2009), we predict that this EST database contains approximately half of the K. brevis expressed genes. A total of 130 cell cycle genes were identified based on BLASTx and Gene Ontology, assigned using Blast2Go (Appendix 1). This is the largest sampling of cell cycle genes in a dinoflagellate to date and provides preliminary insight into the conservation of the eukaryotic cell cycle in this divergent lineage. These included conserved eukaryotic genes involved in central cell cycle control, pre-replication complexes, such as minichromosome maintenance proteins, S-phase specific genes involved in DNA replication, and mitotic genes, such as centrins and members of the anaphase promoting complex. Genes directly involved in DNA replication are shown in gray. Six highly conserved S-phase genes were chosen for further analysis in this study and are shown in bold: PCNA, RFC3, RFC5, RPA1, RnR1 and RnR2. The conservation and diversity of these transcripts in K. brevis is shown in Table 3.

Characterization of Selected S-phase genes

PCNA

PCNA gene transcripts were among the most highly expressed ESTs in the library, with 101 identified using an expect value < 10^-10. Among the PCNA ESTs, significant sequence variability exists, as previously reported in other dinoflagellates (Zhang et al., 2006), with the 101 K. brevis ESTs assembling into 11 unique contigs. These sequences have been submitted to GenBank as KbPCNAs (Accession Numbers JF491177-JF491185). Most of the sequence differences between these contigs reside in single nucleotide substitutions found in third codon positions, resulting in synonymous substitutions. Thus the 11 contigs encode nine unique proteins with substitutions found at only 20 conserved sites among 259 amino acids (Table 4). None of the amino acid sequences was identical to any of the nine K. brevis PCNA sequences previously submitted to GenBank (Accession numbers FJ1805026.1 - FJI805034.1), although all represent different combinations of substitutions at the same 20 sites. Altogether, a minimum of 20 unique PCNA genes have been described from K. brevis. All PCNA open reading frames for which full length sequences were obtained are of the same length, 780 nucleotides, with a predicted protein molecular weight of 28 kDa. This predicted molecular weight is similar to what has been identified in other eukaryotes and K. brevis shows a high degree of conservation with model eukaryotes across phyla (Figure 1A).
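
Two points in the paragraph above lend themselves to a short worked example: a 780-nucleotide open reading frame corresponds to 259 amino acids plus a stop codon, and many third-position substitutions leave the encoded amino acid unchanged. The sketch below uses the standard genetic code. The example codons and the short ORF are illustrative only and are not taken from the PCNA contigs.

```python
# Minimal sketch: standard genetic code, translation, and a check of whether a
# single-nucleotide difference between two codons changes the amino acid.
# The example codons are illustrative and not taken from the PCNA contigs.

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def translate(orf):
    """Translate an ORF, stopping at (and excluding) the first stop codon."""
    protein = []
    for pos in range(0, len(orf) - 2, 3):
        aa = CODON_TABLE[orf[pos:pos + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

def is_synonymous(codon_a, codon_b):
    """True if both codons encode the same amino acid."""
    return CODON_TABLE[codon_a.upper()] == CODON_TABLE[codon_b.upper()]

# A 780-nucleotide ORF holds 260 codons: 259 amino acids plus one stop codon.
print(780 // 3 - 1)
print(is_synonymous("GCT", "GCC"))  # third-position change, both alanine
print(is_synonymous("GAT", "GAA"))  # third-position change, Asp versus Glu
print(translate("ATGGCTGCCTAA"))
```

The second comparison shows that not every third-position change is silent, which is consistent with the contigs encoding fewer unique proteins than there are unique nucleotide sequences while still differing at some amino acid sites.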

A neighbor-joining tree of PCNA sequences from diverse eukaryotic phyla and archaea found the K. brevis PCNA to be distant from animals and higher plants, each of which formed monophyletic clades with moderate bootstrap support (81 and 74%, respectively). The dinoflagellates similarly formed a monophyletic group with 99% bootstrap support and fell out most closely with other protists, the kinetoplastids and Apicomplexa, although these deeper branches are not strongly supported (Figure 2). After several unsuccessful attempts to use commercial antibodies for PCNA (made to metazoans), we used this information, along with the multiple alignment in Figure 1A, to guide antibody development. A conserved 16 amino acid peptide (DRIADFDLKLMQIESE) was identified in all K. brevis PCNA sequences and BLASTed against the K. brevis EST database and GenBank to ensure its specificity. The full length PCNA protein sequence (from Contig 2450) was next used to produce a computational 3D protein model in order to determine where on the protein the antibody would bind (Figure 3, top). The K. brevis PCNA monomer has the conserved structure seen in archaea and yeast (Figure 3, bottom) that enables it to form a trimeric DNA clamp. From the computational model of PCNA, it appears that the peptide DRIADFDLKLMQIESE is located on the outside surface of the protein, making this a good candidate for use in immunolocalization.

Replication Protein A

Expressed sequence tags with homology to RPA were analyzed next. RPA is a single strand DNA binding protein that typically consists of three subunits. Only ESTs for RPA1, the largest subunit, responsible for DNA binding, were identifiable in the K. brevis library (Table 3). Twenty-eight ESTs were found, which fold into three contigs. The two full length contigs were submitted to GenBank as KbRPA1a and KbRPA1b (Accession numbers JF491189 and JF491176). These encode proteins of 464 and 465 amino acids, respectively, and have highest homology to Perkinsus (e-value 1.0E-129; GenBank Accession number XP002779439). The predicted molecular weight of these proteins is 51 kDa. This is somewhat smaller than the 70 kDa protein identified as RPA1 in mammals, but is similar in size to that found in trypanosomes (Zick et al., 2005) and other protists. This can be seen in the multiple alignment of RPA1 (Figure 1B). Consistent with this finding, a peptide antibody made to K. brevis RPA1 recognizes a protein of 51 kDa.

Replication Factor C

RFC is made up of five subunits in eukaryotes, of which only subunits 3 and 5 were identified in the K. brevis library. A single EST was identified for RFC subunit 3. The K. brevis RFC3 protein sequence shows high conservation when aligned with other eukaryotes (Figure 1C). The predicted molecular weight of K. brevis RFC3, ~39 kDa, is also highly conserved. This sequence was submitted to GenBank as KbRFC3 (GenBank accession number JF491186). Seven ESTs for RFC subunit 5 folded into two unique contigs, one of which is full length. Similar to RFC3, the K. brevis RFC5 protein sequence also shows homology and all species investigated in the multiple alignment have an amino acid sequence of approximately the same size, with a predicted molecular weight of 39 kDa (Figure 1D). A peptide antibody developed to RFC5 using the translated sequence indeed recognizes a band at 39 kDa (below). These sequences have been submitted to GenBank as KbRFC5a and KbRFC5b (GenBank accession numbers JF491187 and JF491188).

Ribonucleoside Diphosphate Reductase

The two subunits that make up the RnR tetramer, interestingly, are not equally represented in the K. brevis library. RnR1 is highly expressed, with 54 ESTs that fold into 12 unique contigs, encoding at least 3 unique proteins (Table 3). RnR1 shows a very high degree of homology with other eukaryotes until the C-terminus (Figure 1E). C. reinhardtii and S. cerevisiae have longer sequences with predicted molecular weights of 98 and 97.5 kDa, respectively, whereas, unexpectedly, the K. brevis sequence has a molecular weight (~90 kDa) more similar to that of H. sapiens (90 kDa) and A. thaliana (92 kDa). In contrast to RnR1, sequence representation of its partner RnR2 is much lower in the K. brevis EST library, with only 2 ESTs present, which form a single contig (submitted to GenBank as KbRnR2; Accession No. JF699608). RnR2 is also highly conserved (Figure 1F). There is some sequence variation in the N-terminus between the species investigated; however, the predicted molecular weights of all sequences are within a tight range of ~40-45 kDa. Sufficient conservation is present in K. brevis RnR2 that we were able to successfully use an antibody made to human RnR2 for studies in K. brevis (Figure 1F, boxed region), which recognizes a doublet of bands at ~40 kDa, as also found in yeast (Hurd et al., 1987) (below).

Identification of the Spliced Leader Sequence on S-phase Genes

Given the widespread presence of the SL on dinoflagellate gene transcripts, and its importance to gene expression, we next sought to determine if it is present on the S-phase genes under study. Several of the PCNA contigs were found to possess the 5' SL sequence (Table 4) and where the SL was missing, the full 5' UTR sequence is lacking (i.e., this represents incomplete sequence data, not the absence of the SL on a full-length sequence). To confirm the presence of the SL on the other S-phase genes, PCR amplification was carried out on one representative of each using a spliced leader primer and gene-specific primer. We found that the SL was present on each representative examined, in each case approximately 50-100 nucleotides upstream (5') of the start codon (Figure 4). These sequences have been submitted to GenBank (Accession numbers: KbRPA1a - JF491189; KbRFC3 - JF491186; RnR1 - JF491190).

Cell Cycle Dependent Expression of S-phase Gene Transcripts

Flow cytometry (Figure 5) was first carried out to confirm that the cells used in this experiment were actively cycling. The flow cytometric cell cycle distribution indicated that approximately 50% of the cells were actively proceeding through the cell cycle in a diel phased manner as previously described (Van Dolah and Leighfield, 1999). Essentially all cells (~95%) were in G1 at LDT2. The peak of S-phase occurred at LDT14 with the peak of G2+M at LDT18, which accounted for approximately 50% of the cell population proceeding through the cell cycle (viewed most easily as the minimum in % G1). Most cells had completed mitosis and returned to G1 by LDT22. Since the cultures were not 100% synchronous, some variability in the rate at which cultures entered S-phase is evidenced by the variability at the LDT10 time point.

Transcript abundance of the S-phase genes was next assessed by quantitative PCR in five replicate cultures at each time point. There was no significant change in expression of any of the S-phase genes over the cell cycle relative to LDT2 (ANOVA, p>0.05; Figure 6). In particular, during S-phase (LDT10-LDT14) expression levels were either unchanged or lower than at LDT2, the opposite of the behavior found in most eukaryotes. These results are consistent with post-transcriptional regulation of these genes in S-phase and confirm previous observations in K. brevis from a microarray study (Van Dolah et al., 2007).
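The comparison described above, replicate expression values at each time point tested by ANOVA, can be sketched in a few lines. This is a minimal illustration only: the expression values are hypothetical placeholders rather than data from this study, and the function name `one_way_anova_f` is introduced here for illustration. It computes the F statistic and degrees of freedom; converting F to a p-value requires the F distribution, which is left to a statistics package.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over lists of replicates."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of replicates around their own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical fold-change values relative to LDT2, five replicate cultures per time point
expression = {
    "LDT2":  [1.00, 0.95, 1.05, 1.02, 0.98],
    "LDT6":  [0.97, 1.04, 0.92, 1.01, 0.99],
    "LDT10": [0.90, 1.08, 0.85, 1.10, 0.96],
    "LDT14": [0.93, 0.99, 0.91, 1.03, 0.95],
    "LDT18": [1.02, 0.96, 1.00, 0.94, 1.05],
    "LDT22": [0.98, 1.01, 0.97, 1.04, 0.99],
}

f_stat, df_b, df_w = one_way_anova_f(list(expression.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.3f}")
```

With six time points and five replicates each, the test has 5 and 24 degrees of freedom; a small F relative to the critical value corresponds to the "no significant change" outcome reported for the transcripts.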

Abundances of S-phase Proteins Over the K. brevis Cell Cycle

Given the absence of S-phase specific transcription of these genes, their proteins were next investigated to determine if they are differentially expressed, suggesting S-phase specific translation, or if they are constitutively produced and their activity is regulated post-translationally. Protein abundances were compared in cultures during G1 (LDT2 and LDT6), S-phase (LDT10 and LDT14) and G2+M (LDT18 and LDT22) (Figure 7). The Western blots shown are representative of five replicate cultures at each time point, with the mean ± S.E.M. of the band intensities normalized to the mean intensity at LDT2. Although averaged densitometry of RPA1 (Figure 7A) did not indicate a significant increase in band intensity, the blots generally showed a small increase in abundance around LDT10-LDT14, with the protein sometimes, but not always, largely absent by LDT22. RnR2 (Figure 7B) peaked during S-phase, with highest abundance at LDT14 and LDT18. RFC also typically peaked at LDT14 (Figure 7C), but the levels of RFC late in the cell cycle varied. In some replicates, RFC remained nearly as high late in the cell cycle as at LDT14, but in others it was nearly absent. PCNA (Figure 7D and E) showed both increased protein abundance during S-phase and an increase in molecular weight of ~9 kDa beginning at LDT10, with the larger band maximally expressed at LDT14 and LDT18, and both the 28 kDa and 37 kDa forms largely disappearing at LDT22.
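The densitometry summary described above (band intensities normalized to the mean intensity at LDT2, then reported as a mean with S.E.M.) can be sketched as follows. The intensity values and the function name `normalize_to_reference` are hypothetical and serve only to illustrate the arithmetic; they are not taken from the blots in this study.

```python
from math import sqrt
from statistics import mean, stdev

def normalize_to_reference(intensities, reference="LDT2"):
    """Scale every replicate by the mean of the reference time point.

    Returns {time_point: (mean, sem)} of the normalized intensities.
    """
    ref_mean = mean(intensities[reference])
    summary = {}
    for time_point, values in intensities.items():
        scaled = [v / ref_mean for v in values]
        # Standard error of the mean: sample standard deviation / sqrt(n)
        sem = stdev(scaled) / sqrt(len(scaled))
        summary[time_point] = (mean(scaled), sem)
    return summary

# Hypothetical band intensities (arbitrary units), five replicate cultures each
band_intensities = {
    "LDT2":  [1200, 1100, 1300, 1250, 1150],
    "LDT14": [2100, 1900, 2500, 2300, 2000],
    "LDT22": [400, 900, 300, 1100, 500],
}

summary = normalize_to_reference(band_intensities)
for time_point, (avg, sem) in summary.items():
    print(f"{time_point}: {avg:.2f} +/- {sem:.2f}")
```

By construction the reference time point has a normalized mean of 1, so every other value reads directly as a fold change relative to LDT2.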

Since 28 and 37 kDa bands appeared consistently during S-phase in the PCNA Western blots, a peptide blocking experiment was performed to ensure that both bands were specific to PCNA. Figure 8 shows that both bands represent specific binding to the K. brevis PCNA antibody. Therefore, the larger band may be due to a post-translational modification, since the 28 kDa band is the expected size of eukaryotic PCNA and analysis of the amino acid sequence at http://expasy.org/tools/pi_tool.html also predicts K. brevis PCNA to have a mass of approximately 28 kDa.
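A predicted mass of this kind is obtained by summing average residue masses over the translated sequence and adding one water for the free termini. The sketch below illustrates that calculation under stated assumptions: it uses standard average residue masses, the function name `predicted_mass_kda` is introduced here for illustration, and the example peptide is an arbitrary placeholder, not the K. brevis PCNA sequence. It is not the implementation used by the ExPASy tool.

```python
# Average residue masses in daltons (amino acid minus one water)
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524  # added once for the N-terminal H and C-terminal OH

def predicted_mass_kda(sequence):
    """Return the average molecular mass (kDa) of an unmodified polypeptide.

    Raises KeyError for any character that is not one of the 20 standard residues.
    """
    residues = sequence.upper()
    total_da = sum(AVERAGE_RESIDUE_MASS[aa] for aa in residues) + WATER
    return total_da / 1000.0

# Arbitrary placeholder peptide, for illustration only
example = "MKTAYIAKQR"
print(f"{len(example)} residues: {predicted_mass_kda(example):.3f} kDa")
```

Because the calculation ignores post-translational modifications, a band running roughly 9 kDa above the predicted mass is consistent with, but does not prove, covalent addition of a small protein modifier.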

Subcellular Localization of PCNA

The subcellular localization pattern of PCNA in G1 (LDT2), S-phase (LDT14) and G2+M (LDT18) was examined next to determine if there was a correlation between localization and the appearance of the 37 kDa PCNA band observed by Western blotting. Immunofluorescence showed that, unlike most eukaryotic cells, PCNA was present in the nucleus at all timepoints investigated, with 53% of the cells staining positive for PCNA in G1 (LDT2). There was an increased percentage of PCNA positive cells (70%) at the peak of S-phase (LDT14), and 52% remained positive in G2+M at CT18 (Figure 9A). Qualitatively, the S-phase cells stained more brightly and showed a distinctive punctate pattern of chromatin-associated PCNA suggestive of localization to replication forks (Figure 9C), whereas staining in the LDT2 cells was frequently more intense around the periphery of the nucleus (LDT2, arrow), and in LDT18 cells staining was more diffuse rather than punctate. PCNA was not observed in the cytosol.

In the experiment above, 52% of the cells at LDT2 were positive for PCNA (Figure 9B). What is not clear is whether these represent cells that, although they are in G1 as assessed by flow cytometry, have already passed the restriction point and thus have started expressing PCNA. Therefore, to further synchronize the cells in G1, cultures were placed in constant darkness for two and a half light cycles, such that any residual cells "primed" to enter the cell cycle would have completed their cycle. The idea was that when the dark-synchronized cells, arrested in G1, are released into light, PCNA-positive cells would represent only cells proceeding into S-phase unless PCNA is constitutively expressed. Flow cytometry of the dark-synchronized cells showed that approximately 20% of the cell population entered S-phase by LDT14 following release into the light (Figure 10A). Both the total percentage of PCNA positive cells and the percentage of cells with chromatin-associated PCNA increased significantly as cells proceeded into S-phase (LDT14; Figure 10B). The results suggest both that PCNA was newly synthesized and that the chromatin association of PCNA corresponds temporally with the appearance of the 37 kDa form of PCNA during S-phase as observed by Western blot.

DISCUSSION

Eukaryotic cell cycle progression is controlled largely by the transcriptional and/or post-translational activation of highly conserved protein complexes precisely during the cell cycle stage at which they are needed. In humans, yeast, and Arabidopsis, several hundred genes are differentially transcribed over the cell cycle, yet recent comparative analyses reveal that which genes are regulated by transcription or by post-translational mechanisms varies considerably among eukaryotic lineages (Jensen et al., 2006). Thus there is not one universal mechanism for eukaryotic cell cycle regulation. Instead, it appears that transcription and post-translational regulation have co-evolved, yielding different solutions to achieve the cell cycle dependent activation of orthologous protein complexes (Jensen et al., 2006). Trypanosomes, which lack transcriptional control, have evolved yet a different level of regulation, in which differential expression of cell cycle genes is achieved post-transcriptionally by increasing the stability of phase-specific messages through the binding of RNA binding proteins (Archer et al., 2009). Their S-phase genes possess a conserved octameric post-transcriptional control element, CATAGAAG, in either the 5' or 3' UTR that regulates cell cycle-dependent accumulation of these mRNAs (Brown and Ray, 1997; Zick et al., 2005). Since dinoflagellates, although evolutionarily distant from trypanosomes, also appear to rely largely on SL-mediated post-transcriptional regulation of gene expression, we investigated whether K. brevis also uses a post-transcriptional solution to S-phase gene expression. In the current study, no changes were observed in message abundance of four S-phase genes, but increases in their protein expression levels during DNA replication were observed, suggesting that in the dinoflagellate regulation of S-phase takes place at the level of translation. It is possible that the absence of observable increases in S-phase associated transcript levels reflects the fact that only about 50% of the population was proceeding through the cell cycle (Figure 5). However, even when cell cycle synchrony of 70% (the maximum attained for K. brevis) was achieved by placing K. brevis under continuous light, there were no changes in any cell cycle transcripts (Van Dolah et al., 2007).

Three of the genes investigated in this study, PCNA, RFC5, and RPA1, work together as components of the replication fork complex that facilitates DNA synthesis, while the fourth, RnR2, converts ribonucleotides to deoxyribonucleotide substrates for DNA synthesis. Given the coordinated roles of their proteins in DNA synthesis, their relative expression levels in the cDNA library revealed some surprising differences. Dinoflagellates are well known for gene amplifications, with many genes being present in high copy numbers, often in tandem repeat units (Le et al., 1997; Zhang and Lin, 2003; Bertomeu and Morse, 2004; Liu and Hastings, 2006; Bachvaroff and Place, 2008). PCNA is a highly amplified gene, present in 41 copies in (Zhang et al., 2006). In the K. brevis library, 101 ESTs assembled into 11 unique contigs, all of which are distinct from the 9 sequences in GenBank, for a minimum of 20 unique PCNA genes in K. brevis. RnR1 ESTs were similarly highly expressed, with a minimum of 12 unique genes represented. In contrast, RnR2, RFC and RPA appear to be low-copy genes and their expression levels in the library are similarly low. Although EST representation cannot be considered a quantitative measure of relative expression, the sampling of 65,292 ESTs should be sufficient that this difference in representation is not random. Such mismatches in expression levels of functionally related genes have previously been noted in K. brevis, e.g., α- and β-tubulin (Lidie et al., 2005).
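The claim that such a difference in EST representation is unlikely to be random can be illustrated with a simple binomial calculation on the counts reported above for the two RnR subunits (54 RnR1 ESTs versus 2 RnR2 ESTs). The null model here, that an EST from either subunit is equally likely to be sampled, is an assumption made for illustration; it ignores transcript length, gene copy number, and library construction biases, and it is not an analysis performed in this study. The function name `binomial_lower_tail` is likewise introduced here.

```python
from math import comb

def binomial_lower_tail(k, n, p=0.5):
    """Probability of observing k or fewer successes in n trials with success rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# EST counts reported in the text for the two RnR subunits
rnr1_ests = 54
rnr2_ests = 2
total_ests = rnr1_ests + rnr2_ests

# Assumed null model: each RnR EST is equally likely to come from either subunit
prob = binomial_lower_tail(rnr2_ests, total_ests, p=0.5)
print(f"P(RnR2 ESTs <= {rnr2_ests} of {total_ests}) = {prob:.2e}")
```

Under this deliberately simple null model the observed imbalance is extremely improbable, which is in line with the statement that the difference in representation is not a sampling accident.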

Cell Cycle Dependent Expression

Replication Protein A

Replication Protein A, the single strand DNA binding protein that serves to hold the replication fork open, is an example of a cell cycle protein whose cell cycle regulation differs among eukaryotic lineages. In Saccharomyces cerevisiae, all three subunits are differentially transcribed, whereas in Schizosaccharomyces pombe only subunit ssb1 changes at the transcript level, and only RPA2 is differentially expressed in humans. In this study, the behavior of the large subunit of RPA (RPA1) was investigated, the only subunit identified in the K. brevis EST library. The data presented here suggest that the RPA1 gene has not undergone the extensive gene duplication seen in PCNA. No changes in transcript level were observed. There was no statistically significant change in RPA1 expression, although visual inspection of the Western blots suggests a trend toward higher expression during S-phase and minimal expression in mitosis.

Ribonucleotide Reductase

The activity of ribonucleotide reductase, a tetramer consisting of two large and two small subunits that reduces ribonucleotides to deoxynucleotides, is maximal during S-phase in all eukaryotes and in the , human, and Arabidopsis its activity is regulated in part by the S-phase dependent transcription of its large subunit (Trachana et al., 2010). In K. brevis, neither RnR1 (Van Dolah et al., 2007) nor RnR2 (this study) shows a cell cycle-dependent change in transcript levels. Cell cycle activity of RnR in mammalian cells is further controlled post-translationally by an S-phase specific stabilization of the RnR2 subunit lasting until cells pass into mitosis, when it is polyubiquitinated for degradation (Chabes et al., 2003). This stabilization of RnR2 ensures a sufficient supply of dNTPs for replication as well as repair during G2 (Chabes and Thelander, 2000). In K. brevis, the RnR2 protein was present throughout the cell cycle with a significant increase during S-phase (LDT14, p

(Engstrom and Rozell, 1988). However, unlike the mammalian RnR2, the presence of K. brevis RnR2 throughout the cell cycle is similar to what is observed in the trypanosomatid L. m. amazonensis (Lye et al., 1999).

Replication Factor C

RFC consists of five subunits that together form the clamp loading protein, which loads the PCNA homotrimeric ring onto replicating DNA. In yeasts the transcription of RFC subunits does not change during the cell cycle, whereas in humans, transcripts of RFC2-5 subunits increase during S-phase. In K. brevis, RFC5 transcript levels were unchanged while its protein expression, although it did not meet the cut-off for statistical significance, appears to increase in S-phase. An increase in RFC during S-phase would be expected since the PCNA protein, which RFC binds to and loads onto DNA during replication and repair, changes over the K. brevis cell cycle, with highest abundance during S-phase.

Proliferating Cell Nuclear Antigen

PCNA, the homotrimeric ring that forms the replication fork sliding clamp, is another protein that is regulated differently in various eukaryotic lineages. PCNA is transcriptionally controlled in budding yeast and humans, but constitutive in yeast over the cell cycle (Jensen et al., 2006). In K. brevis, the transcript levels were similarly unchanged. In mammals, PCNA protein expression is widely used as a marker of actively growing cells because of its S-phase specificity (Elias, 1997). In K. brevis, this protein was present throughout the cell cycle, although barely visible during mitosis by Western blot. The most notable change in expression was a shift in band size of approximately 9 kDa dominating during S-phase. It is possible that the S-phase specific 37 kDa form of PCNA represents the sequential replacement of the 28 kDa PCNA with a larger isoform. However, there is no precedent for this in other organisms and no such candidates were found in the EST library. Alternatively, this shift in molecular weight may represent post-translational modification of PCNA. Two candidate modifiers are the small ubiquitin-like modifier (SUMO; 9 kDa) and ubiquitin (8.5 kDa), since both modifications to PCNA are known in other eukaryotes. However, total PCNA, both the 28 kDa and 37 kDa forms, is in low abundance by LDT22, which represents mitosis. The presence of these peptides in K. brevis and their possible roles in PCNA modification are addressed in Chapter 4.

The immunolocalization data show that the PCNA protein is localized to the nucleus at all phases of the cell cycle, but its distribution within the nucleus changes. Peripheral staining dominated in G1, while an increase in brightly stained chromatin-associated PCNA was found in S-phase and corresponds with the time at which there is an observed band size shift of PCNA. As the cells entered G2/M, the proportion of cells with chromatin-bound PCNA decreased and PCNA at the nuclear periphery increased. It may be that as the cell finishes DNA replication, PCNA is sequestered to the nuclear envelope for export and ubiquitination-dependent degradation by the 26S proteasome. This is consistent with the decreased expression at this time point observed by Western blot. The observed nuclear localization throughout the cell cycle is unusual when compared to other eukaryotes, but has been observed in the promastigote stage of the trypanosomatid Leishmania donovani, where PCNA is nuclear throughout the cell cycle but has distinct nuclear foci in S-phase, which are thought to be active sites of DNA replication (Kumar et al., 2009). As in K. brevis, the PCNA protein was not found in the cytoplasm during any stage of the L. donovani cell cycle. In human cells, PCNA is absent in G1, is present in a pool at the nuclear periphery in early S-phase and has a more nucleoplasmic pool later in S-phase (Essers et al., 2005). However, it is not currently known if this change in localization is regulated by its post-translational modification.

CONCLUSIONS

Although dinoflagellates possess conserved cell cycle proteins present in all eukaryotes, they have many unusual cell cycle characteristics that suggest some aspects of their cell cycle regulation may differ from the established eukaryotic models. In particular, the presence of the spliced leader on K. brevis gene transcripts suggests post-transcriptional gene regulation. This study demonstrated that S-phase genes classically regulated at the level of transcription appear to be under post-transcriptional control in K. brevis. Among the four S-phase genes studied, there was no evidence for transcriptional activation during S-phase. Nonetheless, their protein products had maximal expression levels during S-phase and, in the case of PCNA, a prominent apparent post-translational modification specific to S-phase.

Figure 1: Multiple alignments of the amino acid sequences of S-phase proteins. (A) Proliferating Cell Nuclear Antigen. The black bar indicates the location of the peptide used to develop the anti-PCNA antibody used for immunolocalization, and the red bar shows the peptide used to develop the anti-PCNA antibody used for Western blotting. (B) Replication Protein A, large subunit. The black bar indicates the location of the peptide used to develop the anti-RPA antibody used for Western blotting. (C) Replication Factor C, subunit 3. (D) Replication Factor C, subunit 5. The black bar indicates the location of the peptide used to develop the anti-RFC5 antibody used for Western blotting. (E) Ribonucleotide Reductase 1. (F) Ribonucleotide Reductase 2. The boxed region shows the highly conserved region of RnR2 against which the human RnR2 peptide antibody used in this study was developed. No antibodies were developed for RFC3 or RnR1. Alignments include the eukaryotes Karenia brevis, Trypanosoma brucei, Chlamydomonas reinhardtii, Arabidopsis thaliana, Saccharomyces cerevisiae, and Homo sapiens.

95 (A)

10 20 30 40 50 n 10 80 90 100 . _ . - 1 - _. - I · · - - 1 - · - - 1 ·· - · 1 - _. - 1 - -- · 1 -· - - 1 · _ . - 1 -· · - 1 -·· - 1 -·· - 1 -· · - 1 - · · - 1 -· - - 1 -· - - I ··· - I · _. - 1 -· - - 1 · - - · 1 H. sapiens PCHA - MFEARLVQCSILKKVLEALKDL IlIEACWD I SSS n.Q ID SSHVSLVQL TLRSE GFDTYRCDRULJU.IGVln. TSlISKILKCA 'lIEDIITLRAEDlIAD TL S. cerevisiae PCHA - In..E EEASLFKRIIDr.FKD CVQLVlWQ CKEDl I QJ VDDSRVLLVSLE I GVEAF QEYRCDHPVTL CMDL TSL SKILRC tnfTDTL TLIADNTPDS I T. brucei PCHA - lfi.. QVLHAIn.WKRL I E CI lJl LVlIEAJfJ- DC1W( GLS I Q ·ID TSHVALVJD.ILLRDDCF T QCERJf SVLf LlJLASL SKVLKIVE TDSL TLRHEDD SDW C. reinhrdtii PCHA - I EARITQbAVLKKLVEALKELVTE r.n FDVSST L QL Q ,ID S~ HV CLV S LALRE D br E ID RCDRJU. If I HFQln.SKILKCAGlIEDTITLKAEDltAD QL A. tha1iana PCHA - lfi..ELRLVQr SLLKKVLE S I KDLVlJDAlffDCSST F SL Q ,ID SSHV GIIDDIITIKADDt DTV K. brevis PCHA J.IVLEAHL QQ ...VLLKKWD .[I(J)L CKEVlJT'DCSEKf L QVQSlID SSHV VALLLRE SArSEl KCD P SDSLKLKWQIJD TV

110 120 1 30 140 150 1'0 110 180 190 200 . . . - 1 - -- - I · · · - I · -- - 1 ·· - - I · _. - 1 - _ . - 1 -· - - 1 - _ . · 1 -· · - 1 -·· - 1 -·· · 1 -·· - 1 -· · - 1 - · · - 1 - · - - 1 · - - - I · ·· - 1 - · - - 1 ···· 1 H. sapiens PCHA VfEAPUQEKVSDYEMKUIDLDVEQL CI P E QEYS CWKMPS EF I CRDL SHI GDAWISC S :\S EL lIGII I KL SQ ------TSIW S. cerevisiae PCHA ILLr ED TKKDRIAEYSLKLJ.mIDADFLKI EEL QYD STLSLP SSEF SKIVRDL SQLSDSI lIDlITKE TI KFV: GDI (. SGSVIIKP ------fVDU T. brucei PCHA TL TSElfGERSRKCEYQLKLLE I E TEAl,fr I PEJ.m SlVTL SSQEF IVRDMTVF 'D TVlJIE ILKE SVKF SSCGDVGE CYALLRASHJ.P TVDPRSKCE SD C . reinhrdtii PCHA TU ESSlJQDRI SDt DLKLMSI ESEHL I P DQE YS I Kl,IPSSE YQRIVRDL TS I GDTVL I SATKE l.I KF STSliDV TAlWTLRJi ------nT1P A . tha1iana PCHA Tn ' ESP TQDKIADI' EUKU ID I DSEHL I P DAEYHS lVIUIPSIIEF SRI CKDLSSI .iDTVV I SVTKE I:vKF STA17DH TAlflVLRQ------UT TV K. brevis PCHA SF QCES GEDDRIADFDLKlJ.IQI ESEHlfE I PE QQYKVLARLP S FMKI CRDLKEF GETMQI QASKE I RrSVQ DI T IMn..KP ------RE S-

210 220 2 30 240 250 2'0 210 280 290 300 ... - 1 -· · · 1 ·· · · 1 - _ . ·1 ·· · · 1 · · - · 1 ··· - 1 · ·· · 1 -·· · 1 -·· - 1 - · · - 1 ··· - 1 -·· · 1 -·· · 1 ··· · 1 · · · - 1 ··· - 1 ··· · 1 -·· ·1 -··· 1 H . sapiens PCHA DKEEEJ ------VTI ElfiIEPVQL Tl ALR'lLlU 1 T TPL SSTVTLSMS VPLVVE IA--Dl( HL KI EDEE S--- S . c erevisiae PCHA EHPE TS ------I KLEJ.m QPVDL T &J ~ LD II K G SS L S D RV G I RL SS EAPAL r Q FD L K--- S FLQFPLAP lIDEE----- T. brucei PCHA VKTEDEEAD CSVRTHS KD I PL r I rvDVRTlIEP I TL SI AKE SP CMVE YSI D- - QV YLRYYLAPKVDDAE ----- C . reinhrdtii PCHA EKPEEQ------TIIDLKEPV Tr PWKLSL TKDLP lVVEYQI G--EL AVKF P KI DDED QM ED A. tha1iana PCHA DKPEDA------lVIE}{KEPVSL SJ: ALR nlSI' T TP LSDTVTI SLSSELPWVEYKVA--El(l YI Ri iLAPKI EEEEDTlJP- K . brevis PCHA EKPEEK------VTL TVHE SVTATI- L CS SVEL L ( PDSPLSVKYELEI 11 l'I-1QI 'YLAPKI DE ------

H . sapiens PCHA S. cerevisiae PCHA T. brucei PCHA C . reinhrdtii PCHA EI QS A . tha1i ana P CHA K . brevis PCHA

96 (B)

10 20 30 40 50 60 10 80 90 100 110 120 · ··· 1···· 1···· 1 ·· · · 1 · ··· 1 ··· ·1 · · ·· 1 ···· 1 ·· · ·1 ···· 1 ·· ·· 1···. 1 · . ·. 1 .·.· 1 ... . 1 ·· · · 1 ···· 1 · · ·· 1 · · ·· 1 · ·· . 1 ···· 1 · ·· · 1 ··· ·1 ···· 1 K. hrevis RPAl T. hrucei RPAl C. reinhardtii RPAl W A'IATL TS GDVARI KSKEDf Alir ---WLRVSELQEV ------KKHKCJ.fl.. SD('}IlISI R - VLA SQFADLVAS ELSlI('CLIKI TAF'VTlITH SDD - VVLhTDLSVVSPf T IVK A . tha1.iana RPAl --UP VSL TEGVVImn..n LEVT SETDlmPVl.QVTELKLIQSKL HQUQESSUR LLSD&T- D :}{lIn SU ISLV1 rQ GTI QU SVIRL THYI ClI- LIQTRR IVVUIQLEVIVEKClrIH S. cerevisiae RPAl -I~ S SV Q L S R. ·Dr HSH TlrKQRYDlrP T G( VYQVYlrTRKSD----- ( AlrSIIRKtrLUlISD( I ·lKALLRlIQAASKI QSMEL QRCDIIRVI lA.EPA IVRERKKYVLLVDDFELVQSR- ---- H. sapiens RPAl --ltV QLSE&AIAAIMQ- --KGDTlH KP ILQVnrIRP IT-----TGtl SPPRYRLLMSD&U IT LSSf1.lLf lQU rPLVEEEQLSSlICVC QI HRF IVItTLKD f, RRVVIUfELEVLKS------

130 140 150 160 110 180 HO 200 210 220 230 240 .. .. 1 .· ·· 1 ···· 1 · · ·· 1 · · · · 1 · · · · 1 · · ·· 1 · · ·· 1 ···· 1 · · ·· 1 ·· ·· 1 · ··· 1 ··· · 1 ··· · 1···· 1 · ··· 1 · ··· 1 · · ·· 1 ·· ·· 1 ···· 1 ···· 1 · · · · 1 ···· 1 · · ·· 1 K. hrevis RPAl ------M T. hrucei RPAl ------M C. reinhardtii RPAl MEVD}{ALI lSTP l-KP Q SST------AD PDJ I

250 260 210 280 2'0 300 3 10 320 330 340 350 3 60 ···· 1 ··· · 1···· 1 · · ·· 1 ·· ·· 1 ·· ·· 1 · ·· · 1 · ·· · 1 ···· 1 · · 1 ··· · 1···· 1 ···· 1 ·· · · 1···· 1 ···· 1···· 1···· 1 · ··· 1···· 1 ···· 1 ···· 1 ·· ·· 1 ··· · 1 K . hrevis RPAl I VD ------Qr f'P I SEI SP'I'HT- KWTI KARVTSKAPVRTF - - KS t KVF SVDLLDAL GEI RSTH 11 T. hrucei RPAl QQ P SQQ ------Q I Q P ID S L S P ~ L &KWWIRARVTDKSE I R'ftflrKP TSQ GKL FSf l LIDE Sil - SI TVF lr C. reinhardtii RPAl TPPS n-IGVSDRK------lrllHKIAQLHPYE1 - 1IWCIRAKVDRKAPLRALP-SKPDVI

310 380 no 400 410 420 430 440 450 460 410 480 ·· 1 ···· 1 ·· ·· 1 ···· 1 · ··· 1···· 1 ···· 1 · · ·· 1···· 1 ···· 1 ···· 1 · · ·· 1 ···· 1···· 1···· 1 ···· 1···· 1 ···· 1 ···· 1 · · ·· 1 ···· 1 ···· 1···· 1 ··· · 1 K. hrevis RPAl Q KF LDVLKP GP Cfrr SKGRVRIADRRVTALlmRYELH DADAIIEPAKDEL T--l EAL lrITSL I QSKT- LP CGVDLCGVVTAVRPLMTVRlrKE G- QELLKRD ITIADDTAT SM T. brucei RPAl VDIff IrPLIVIr QV'IYl SGt QVKlfAIlRKl SlrvtrtrDYELSf D1ITCQI S W SSSI PLQRYUFVPIAILKQRE - V' SLVDVLAWLlrvEELCTIVQRST RELVRRTVKVAD STA - I C. reinhardtii RPAl 'PAEm ISEQLVE KVYVf HKI KVKPAD KKYV'IVKlIEYQI DfTDTTDVSEAAD QDSS- IlMTTA VEVTP I EQLPRR I QRAPVDVMGVVLAL GSY('TVKRKADtrSELPRREVTI&DQS KSV A. tha1.iana RPAl DAVD QIFDKIVV lrvYl .. I SR llLKPAQKUFlrHLP IIDYE IHLDSAST I QP CEDD CT--I PRYHfHFRlrIGDI EllllE- tnrsTTDVItIVSSI S-PTVAIMRKIlL TEVQKRSLQLKDUS RSV S. cerevisiae RPAl DIAT IrE lLQE r KVYYVSKAKL QP Qf TIll THPYEUlLDRD TV I EECFDE Sll--VPKTHf lU IKLDAI QUQE - V1rSlrvDVL U I QT IlrPKI: ELTS - KKF DRRD ITIVDDS fSI H. sapiens RPAl EQVDKl l PLIEVlrKVYYT SK TLK rKQI Til VKlIDYEUTf trlrE TSVUP CEDDKH--LPTVQfDf T I DDLElrKS- KD SLVD IIGICKSYEDATKI TVRStrllREV IYlllDTS "KVV

498 500 510 520 5 3 0 540 550 560 510 580 5 90 600 · · · · 1 ···· 1 ·· · . 1 · .·. 1 .... 1···· 1 ···· 1 ···· 1···· 1···· 1···· 1 ···· 1 ···· 1 · · · · 1···· 1 · · · ·1 · · · · 1···· 1 · · ·· 1 ···· 1 ···· 1 ···· 1···· 1 ···· 1 K . hrevis RPAl TVTLWAER Q ED RVf E C ------YPVVCVKrv ~l r ,Rf SLLA S ~Vf KP TFEEAKRV QQWW S E G& - SSQr LLD I SQTT G ---- GD GG( R1rRllA ------AAftT LAGVR Q ' T . hrucei RPAl DVTLWU- ---EllAI

610 620 630 '40 650 6U 610 680 "0 100 110 120 ··· · 1 · ·· · 1 · ··· 1 ···· 1 ··· · 1···· 1 ·· · · 1 ···· 1 ··· · 1 · · 1 · . ·· 1 ··· · 1 ···· 1 ···· 1 · · ·· 1 ·· ·· 1 ···· 1 ···· 1 ···· 1···· 1 ···· 1 ···· 1 ··· · 1 · ... 1 K. hrevis RPAl ADRL l - DQPEH SV IVQMIU< Q&EVQPL Q CQEPRE Un SSLP CllRRVD SS I CSI CtrR-A KV llIRCRfVDFED QAtIllTTI HEAAQRVL SAEDVRA!fEO T DEI-lAD T. hrucei RPAl L K -PKP DVIDLRCVPVYL KQ----nDTQWYDACP QCUnnKI

138 140 150 160 110 180 .,,0 800 810 820 830 840 ···· 1 ···· 1 ···· 1 · ··· 1 ···· 1 · · ·· 1 ·· · · 1 · · · 1···· 1 ··· · 1 ···· 1 ··· · 1 ·· · 1 · · ·· 1···· 1 ·· · · 1 ···· 1 ·· · · 1 ···· 1 ···· 1 ···· 1 ···· 1 ··· · 1···· 1 K . hrevis RPAl ~~~iM!Ir AVRhRYl EKP LSLWRAKLDT ~1I--- f EVRPtlITWDARPI SR ,EN RV!n..KE IIIQ IL ",}{------T . hrucei RPAl EED --P1ITVTKVil QMmOrRPVLMRLRVKEE t.Lf GlrEDSERVRLlrvvRlTE fi>LDTV'IEDKRQ · QLRQECDEl-IIKCIIU YL------C. reinhardt.ii RPAl ( t EGf AGAL QWKPWQVVVMSKARE Ylr - - - GllRSVRltSA YKV'E IU DWVSESSRLVTL U------A . tha1.iana RPAl YE trQDEE KF ED IIRSVAl TKY Il KL Kl KEE n S- - - DE QRVKA TWKAEKLIlYSStrTRf'MLEA I DKLKI DAlISLP IKAESSUYRSDAJ irS V TS( TRD T SVDARREl lLPAI.lI QV· Q S . cerevisiae RPAl EED-- P IlEI TKITQS I QlUlE't'l)t RI RAREDTYtI - - - DQSRI RY'lVAllLHSU 'x"RAEAD YLADELSKALLA ------­ H . sapiens RPAl DKlf- - EQAI EEVrQllAllTRSf I f RVRVKVE TYll - -- DE SRI TVI-lDVKPVD YRE :l(.RRL Vt-lS I RRSAU1------

850 8U 810 880 "0 no ·· 1 · ·· · ' ···· 1 · ··· 1 ···· 1···· 1 · · · · 1 · ·· · 1 ···· 1 · ·· · 1 - · ·· 1 ···· 1 K. hrevis RPAl T. hrucei RPAl C. reinhardtii RPA1 A. tha1.iana RPAl YGUQYSSD SL! "r TS ClrvCRSUSHVSAUCP TUISEP Qr Qi'Ur TIIA( !PRQHV SY S . cerevisiae RPAl H. sapiens RPAl

97 (C)

10 20 30 40 50 60 10 80 '0 100 --- - 1 - - - - 1 - - - -1 - - - - 1 - -- - 1 - - - - 1 - - - - 1 - -- - 1 - - - - 1 - - - - 1 - -- - 1 - - - - 1 - - - - 1 - -- - 1 - - - - 1 - -- - 1 - - - - 1 - -- - 1 - - - - 1 - - -- I K. brevis RFC3 ------Sl.fAI.LltVDKHRPKSLDEMD YHHDL SARL QU JQPHlfl..FV P S &AbKSTRVY fl.RELYGP TDHIKVE TKSVAPtJP SSP SlfT T. brucei RFC3 H T E f, TL SU lo QP fS TLPWEKYRP TTLDDVVAHEE ILDTTRRUDJ Sf SMPHLLFYl PPCT KTT TI CAHHLF ( KERL WLElUIASDDR&I DVVR C. reinhardtii RFC3 ------I·lPLltVDKYRPUSFDKFVVHKDIADULKKLVAT DFPHTLF y( PP (}f &KKTLVUALL I Y JI,('VEKVRVE TKPWK I DLP SRKLE A. thaliana RFC3 ------I·fl.ltVDKYRPKSLDKVIVHED QKLKKLVSE QDCPHLLPYGP S GS KKTLUIALLKQI YGASAEKVKVEl lfKVDA SRTID S. cerevisiae RFC3 IfS-----TSTEKRSKEULPWEKYRPE TLDEVYGQlJEVITTVRI

118 120 130 140 150 160 110 180 1'0 200 - -- - 1 - -- - 1 - - - - 1 - - - - 1 - - - - 1 - - - - 1 - - - - 1 - - - - 1 - - - - 1 - -- - 1 - - - - 1 - - - - 1 - -- - 1 - -- - 1 - -- - 1 - - - - 1 - - - - 1 - -- - 1 - -- - 1 - - - - I K. brevis RFC3 VD I QVVVS 11 YHLVL TP SDV t,llKD VVlIQL I KEVAAHPPU JI- -- HTf KVVVI EDAGASUP QAQAPLRRTIlEKYlIKTCRIILMCDSASKI IAPLRSRCCA T. brucei RFC3 QQVREFIl STSSI FFQlnJP GUQ TV------TUFKLVILDEAD QMSSDAQ RRIIEKt' TKlWRFCILCIJHI UKIIP QSRCTR C. reinhardtii RFC3 VEL TTL SSlrnHLELlJPADV l.SlIDRYVVQE IIKElfARSRP!f& l.SR&J KVLVLlJEVDRLS QQ &LRR TIoIE SSACRL IMVCSIIVSKVMEPVRSRCL C A. thaliana RFC3 LEL TTL SSTUHVEL TP SDACF QDRYIVQE llKEIfAKlIRP I DTKf KK&YKVLVLlJEVDKL SREAQHSLRRTIIE SSSCRL ILCCl'SSSKVTEAI KSRCLlr S. cerevisiae RFC3 UQ I KDFASTRQI F SKG------I KL IILDEAD TllAAQllALRRVI ERYTKlJT . CVLAl TP L SRCTR H. sapiens RFC3 I E I ST SUYHLEVlJP SDA nSDRVVIQElfl.. KTVl QSQQLE TlISQRD T KVVL LTEVDKL TKDJI QHALRRTIlEKYI·fSTCRL ILCC USTSKVI PPI RSRC

218 220 230 240 250 260 210 280 2'0 380 --- - 1 - -- - 1 - - - - 1 - - - - 1 - -- - 1 - - - - 1 - -- - 1 - - - - 1 - - - - 1 - - - - 1 - -- - 1 - - - - 1 - - - - 1 - -- - 1 - -- - 1 - - - - 1 - - - - 1 - - - - 1 - - - - 1 - - -- I K . brevis RFC3 VRV AP SUEDVQKVL QKVP S SL QLPPI'.rAFKV1.EKS RDLRRALLLLESJ' HAQAUQ F SKD f ILP QE r.wD (" I EKVSKK ILQE QSPKC fEVR fl.. ~'E L T. brucei RFC3 F SPVKKSAMLPRLKL IAREE rvPF TDE GLISAFRLSDGDlfRRCLlJTIlQASSlf Sf G-----E I TEESVYRTT GIJP TP TDVRVIW DIfi.SHlJYAT StfEKV C. reinhardtii RFC3 VRVAAPSDAQVlfEVL Q AKKEIJLVLP AARVVDYP GRlJLRRALL CLEVC QQ YP F GDSQEP QRADWELYIAEV JU m E QSPKQLYLVRSKL l'EL A. thaliana RFC3 VRI11AP SQEEIVKVLEFVAKKE SL QLP QGF IAEKSI SLRRA ILSLE TCRVQUYPF T nQVI SPMDWEEYVAEIA TDUlfKE QS PKKLF QVRGKVYEL S. cerevisiae RFC3 r QPLP Q I ERR WLVHEKLKL SPI I ELSIHYDHRRVLlWLQS C TLDIJPDEDE I SDDVIYE CC &APRP SDL VLKSILEDDlt&TIlliYTL H. sapiens RFC3 VRVPAPSI ED I CHVL STVCKKE GLlJLP SQLAHRLAEKSCRlJLRK LMCEACRVQQ YP} T QE I PE TDltEVYLRE TAlJ1. IVSQQTP QRLLEVR RL L

310 320 330 340 350 360 310 380 -- - - 1 - - - - 1 - - - - 1 - -- - 1 - - - - 1 - - - - 1 - - - - 1 - - - - 1 - -- - 1 - - - - 1 - -- - 1 - -- - 1 - -- - 1 - -- - 1 - -- - 1 - -- - 1 - - - - 1 - - K . brevis RFC3 CLPPDF ILKQU RL IADQRDD K----- Q A utFE S TI.ffiQ SKD IFP S RL SYCASltL TSKP QHSQER------T . brucei RFC3 QQLVVDK STADLVREVHL IVI·fAMD LP-QDCKCFLLIKLl.DVEYil Ar.l TRElfIlJ--I r.vL AFQLVKEAL TQUKP I FE l L C C. reinhardtii RFC3 LAlfCVPPELUIRQL TFELLKR- lfD DE I K-----LE TVSYAA QFE QRL QE & I FH--LEAFVARVIf SlWKTYLVSCr; A. thali ana RFC3 LVIICI PPEVILKRLLHELLKK- LDSELK- ----LEVCHWAJ m UfRL&Q H'H--l EAFV lfSI IT'LIST- F f ------S . cerevisiae RFC3 llKVRS GLAL I DL I E GIVKILED YEL QUEE TRVHLL TKLAD I EYSI SK(w( ND QI Q--f s p v rr h i W ------H. sapiens RFC3 L THCI PPE IDfK LLSELLHlJ- CDCQLK-----GEVP HRL QL SKAI YH--L FV,!\Kl'MJIu.YKKFMED GLE f.lft. -- --

98 (D)

[Amino acid alignment of RFC5: K. brevis, T. brucei, C. reinhardtii, S. cerevisiae, A. thaliana, and H. sapiens.]

(E)

[Amino acid alignment of RnR1: K. brevis, T. brucei, C. reinhardtii, A. thaliana, S. cerevisiae, and H. sapiens.]

(F)

[Amino acid alignment of RnR2: K. brevis, Perkinsus, T. brucei, H. sapiens (isoform 2), C. reinhardtii, A. thaliana, and S. cerevisiae.]

Figure 2: Phylogenetic (Neighbor-joining) tree of PCNA amino acid sequences with percent bootstrap support. Inset: dinoflagellates and kinetoplastids.

Neighbor-joining tree, 1000 bootstrap replicates (MEGA)

[Tree of PCNA sequences. Labeled groups: higher plants; vertebrates and invertebrates; dinoflagellates (99%); Archaea (88%). The K. brevis sequence is labeled "K brevis Contig 2450 PCNA".]

[Panel labels: K. brevis (LAPK binding domain; peptide DRIADFDLKLMQIESEH); S. cerevisiae.]

Figure 3: Three-dimensional protein models of the K. brevis PCNA monomer (top 2 panels). The top left panel shows the location of the peptide that was used for antibody development and subsequent immunolocalization studies. The top right panel shows the L-A-P-K binding domain, where PCNA binds to RFC. Bottom: trimeric archaebacterial (left) and yeast (right) PCNA.

[Nucleotide alignment (positions 1-200) of the K. brevis spliced leader (Kb SL, TCCGTAGCCATTTTGGCTCAA) with the 5' ends of the Kb PCNA 2a, Kb PCNA 2b, Kb PCNA 9, Kb RFC3, Kb RPA1a, and Kb RnR1 transcripts.]

Figure 4: Spliced leader sequences on S-phase transcripts. Outlined are the spliced leader

sequences on the 5' end of each representative transcript and the start codon (ATG)

downstream from the spliced leader.

[Plot: percent of the population in G1, S, and G2/M (y-axis, 0-100) against time in hours (x-axis: LDT2, LDT6, LDT10, LDT14, LDT18, LDT22).]

Figure 5: Flow cytometry data showing the percent of the cell population proceeding through the cell cycle, which is phased to the diel, or light/dark, cycle (n=5 replicate cultures). X-axis is time in the light/dark cycle, with "lights on" at LDT0.

[Plot: log2 ratio of expression (y-axis, -4 to 4) for PCNA, RFC, RPA, and RnR against LD time (x-axis).]

Figure 6: Gene expression results for PCNA, RFC, RPA and RnR2 in K. brevis. The four S-phase transcripts showed no significant change over the cell cycle. LD time is on the x-axis and the log2 of the ratio of expression of each gene relative to its expression at LDT2 is on the y-axis (log2 ratio = C - E, where C is the control cycle threshold and E is the experimental cycle threshold). Significance was determined using a one-way ANOVA (PCNA, p=0.943; RFC, p=0.7943; RPA, p=0.8916; and RnR, p=0.8321).
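The log2 ratio defined in the Figure 6 caption can be illustrated with a minimal sketch. The function names and the cycle threshold values below are hypothetical and are not taken from the dissertation. Converting the log2 ratio to a fold change as 2 raised to that ratio assumes that the amplicon roughly doubles each PCR cycle, which the caption does not state.

```python
def log2_ratio(control_ct: float, experimental_ct: float) -> float:
    """Log2 expression ratio as defined in the caption: C - E."""
    return control_ct - experimental_ct


def fold_change(control_ct: float, experimental_ct: float) -> float:
    """Fold change relative to the control, assuming ~100% PCR efficiency."""
    return 2.0 ** log2_ratio(control_ct, experimental_ct)


# Hypothetical cycle thresholds: control (LDT2) = 24.0, experimental = 23.0.
# The experimental sample crosses threshold one cycle earlier than the control.
ratio = log2_ratio(24.0, 23.0)
change = fold_change(24.0, 23.0)
print(f"log2 ratio = {ratio}, fold change = {change}")
```

A log2 ratio near zero at every time point, as reported for the four transcripts, corresponds to a fold change near 1.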

[Panels A-C (RPA1, RnR2, RFC5) and D-F (D. PCNA: 28 kDa; E. PCNA: 37 kDa; F. Combined PCNA proteins: 28 + 37 kDa). Each panel shows a representative western blot (lanes LDT2, 6, 10, 14, 18, 22) above a densitometry plot against time (hours). ANOVA p-values printed beneath the plots, left to right: top row p=0.3122, *p=0.0233, p=0.1122; bottom row *p=0.0244, *p=0.0019, p=0.1225.]

Figure 7: Western blot analysis of the RPA1, RnR2, RFC5 and PCNA proteins over the cell cycle. A representative blot is shown for each protein. Changes in band density over time relative to LDT2 are shown in the graph below each blot (n=5 replicates). RnR2 (B) and PCNA [both the 28 kDa (D) and 37 kDa (E) forms] changed significantly over the cell cycle (ANOVA, p<0.05). RPA1 (A) and RFC5 (C) did not meet statistical cutoffs, but appeared to increase in S-phase on visual inspection. (F) Total combined PCNA protein did not change significantly over the cell cycle.

[Western blots: PCNA 1:5000 and PCNA Peptide Block; lanes from LDT10 and LDT14 lysates; molecular weight markers at 98, 62, 49, 38, 28 and 17; bands marked NS.]

Figure 8: Peptide competition assay of anti-K. brevis PCNA demonstrating that both the 28 kDa and the 37 kDa bands are specific to the anti-K. brevis PCNA antibody. Protein lysates tested were obtained during two S-phase time points (LDT10 and 14).

Figure 9: Immunolocalization of PCNA. (A) Representative labeling patterns observed in G1, S, and G2/M cells probed with anti-PCNA and a FITC-labeled secondary antibody (green) (i-iii); overlay of FITC with DAPI-stained nuclei (blue), where bright blue nuclei are negative for PCNA (v-vii). Negative controls probed with peptide-blocked anti-PCNA and FITC-labeled secondary antibody (iv, viii). Numbers in parentheses represent the percentage of PCNA-positive cells counted in triplicate fields of a minimum of 100 cells each. n = nucleus, c = cytoplasm and p = periphery of the nucleus. (B) Flow cytometric cell cycle distribution of cells observed in A. (C) Enlarged image of an S-phase cell showing punctate staining, suggesting chromatin association of PCNA; inset: DAPI overlay of the same cell.

[Panel A. PCNA localization, images (i)-(viii): LDT2, G1 (53%); LDT14, S-phase (73%); LDT18, G2/M (52%); PCNA peptide competition assay. Panel B. Cell cycle distribution (G1, S, G2+M) at LDT2, LDT14 and LDT18. Panel C. Chromatin association of PCNA.]

Figure 10: (A) Flow cytometry data showing the percent of the cell population entering the cell cycle when released into light after an extended dark phase during log phase. (B) Percent of cells positively stained for the PCNA protein that are chromatin associated or at the nuclear periphery.

[Panel A. Cell cycle distribution: percent of the population in G1, S, and G2/M (y-axis, 0-100) against time in hours (LDT2, LDT8, LDT14, LDT18). Panel B. PCNA localization pattern: percent of cells with chromatin-bound or peripheral PCNA (y-axis, 0-100) against time in hours (LDT2, LDT8, LDT14, LDT18).]

Table 1: Primers used for qPCR and SL-gene-specific PCR for identification of the presence of the 5' spliced leader sequence.

Gene            Forward primer 5'-3'        Reverse primer 5'-3'        Top hit GenBank accession number
PCNA            ~CTTGAGGCTCACCTGCAAC        GCAAGCCCTTCTCACTACAA        EU078559 (K. brevis)
RPA1            ~GCCAGCATTGACGCCATA         GGTGCTGTGAAGCTGGTGATGA      XP_002786176
RFC             ~AGGAAACAAGGACCGGGCAGT      CGCCAGCATCCTCAATGACAAC      EEY69117
RnR1            ~AGGCAACAGTATTCTTGT         ~CAGATGTACGTGGTCAAA         XP_002785979
RnR2            ~CAAGAGGCGGTCAGCTACAAACT    ~GTGGATGTGGAGCGGAAGTTCAT    XP_002773236
Spliced Leader  ~CCGTAGCCATTTTGGCTC

Table 2: Peptides used for antibody development. The ribonucleotide reductase 2 (RnR2) antibody is a commercially available peptide antibody developed against the human RnR2 amino acid sequence.

Protein                      Peptide
PCNA (Western blot)          NVMLKPRESEKPEEKVTLT
PCNA (Immunolocalization)    DRIADFDLKLMQIESEH
RFC                          GRAVYQSTGVPQPEDID
RPA                          NLLKKPSEYSNGGRRAALHR
RnR2                         QIAMENIHSE

Table 3: Prevalence and diversity of selected S-phase specific genes represented in the K. brevis library.

Gene                                      ESTs  Unique Contigs  Unique Proteins  GenBank Organism Top Hit  Top E-value  GenBank Non- Organism Top Hit  E-value
PCNA                                      109   16              9                                          1.13E-141    Daucus carota                  3.00E-64
replication factor C, subunit 3           1     1               1                Xenopus laevis            4.00E-76     Xenopus laevis                 4.00E-76
replication factor C, subunit 5           7     2               1                Phytophthora infestans    3.00E-87     Thalassiosira pseudonana       1.00E-80
replication protein A, large subunit      30    3               2                Perkinsus marinus         8.00E-71     Arabidopsis thaliana           9.00E-58
ribonucleotide reductase, large subunit   54    12              3                Perkinsus marinus         1.17E-165    Ectocarpus siliculosus         1.00E-161
ribonucleotide reductase, small subunit   2     1               1                Perkinsus marinus         1.20E-129    Trypanosoma brucei             3.00E-128


Chapter 4: S-phase Expression of PCNA: Possible Post-translational Modifications,

CDK Dependence, and its Potential Utility as a Biomarker of Growth Status in

Karenia brevis Red Tides

INTRODUCTION

Proliferating cell nuclear antigen (PCNA) is involved in many essential cellular processes. These include DNA replication, DNA damage repair, chromatin structure maintenance, chromosome segregation and cell-cycle progression (Stoimenov and Helleday, 2009). PCNA was originally detected as an antigen for autoimmune disease in systemic lupus erythematosus patients and was found only in proliferating cell populations (Miyachi et al., 1978). Since then it has also been used as a biomarker for actively dividing cancer cell populations. The crucial involvement of PCNA in cellular proliferation and its tight association with cancer transformation have resulted in the frequent use of PCNA not only as a diagnostic marker but also as a prognostic cell cycle marker (Elias, 1997). Due to its usefulness as a biomarker for proliferation in mammalian systems, PCNA has received wide attention in other biological systems. In the malaria parasite Plasmodium falciparum, an organism that has 5 developmental stages, PCNA expression is tightly regulated during intraerythrocytic development: early, uninucleate ring stages (G1 phase) contain little PCNA mRNA or protein, PCNA induction is first detected in the late ring stage, and expression is maximal in trophozoites (Kilbey et al., 1993). This progression correlates with the onset of parasite DNA replication in the intraerythrocytic cycle. Following this logic, PCNA proteins have been identified in phytoplankton because of their potential as in situ growth rate markers, including the green alga Dunaliella salina and bloom-forming dinoflagellates, including Crypthecodinium cohnii, Amphidinium carterae and catenatum (Leveson and Wong, 1999), as well as P. donghaiense (Liu et al., 2005) and P. piscicida (Zhang et al., 2006). When PCNA expression was investigated over the growth curve in P. donghaiense, it remained at a comparatively high level during exponential growth and decreased during late stationary phase (Liu et al., 2005). Similarly, in the green alga D. salina, PCNA protein abundance was low at the beginning of growth in lab cultures, peaked during exponential growth, and was hardly detectable in late stationary phase. However, in P. piscicida, the abundance of PCNA changed only about two-fold between log and stationary phase, making it of little value as a quantitative marker of growth (Zhang et al., 2006).

An ideal biomarker for the growth status of a cell population is absent under non-growing conditions but abundant in actively growing cells. In Chapter 3 we found a significant increase in the size of PCNA from 28 to 37 kDa and abundant expression of the 37 kDa form only during S-phase in K. brevis. These properties suggest it may be a useful biomarker for bloom growth.

However, Chapter 3 also identified some unusual phenomena that warrant further investigation to better understand the regulation of PCNA in the dinoflagellate. First, we found PCNA transcript levels unchanged over the cell cycle, despite an S-phase specific increase in protein expression. Since PCNA expression is typically activated by CDK-dependent transcription factors, this suggests an unusual regulation of the G1/S checkpoint. Second, the identity of the ~37 kDa protein remains in question. With no precedent for a larger form of PCNA dominating during S-phase in any species studied to date and no evidence for a longer PCNA sequence in the K. brevis EST library, we hypothesized that this shift represents a post-translational modification. However, no post-translational modifications have been described in dinoflagellates to date. The studies in Chapter 4 therefore address three questions that further our understanding of the S-phase specific expression of PCNA in the dinoflagellate:

(1) Does the S-phase specific change in PCNA molecular weight result from post-translational modification, and if so, what is the post-translational modification of PCNA during S-phase?

(2) What CDKs are present in K. brevis and what is the dependence of PCNA protein abundance on CDK activity at the G1/S transition?

(3) Does the S-phase specific form of PCNA have utility as a biomarker for growth rate in K. brevis?

METHODS

Analysis of Post-Translational Modification Proteins and Cyclin-Dependent Kinases

Sequences with significant homology (BLASTx E-value < $10^{-4}$) to genes involved in post-translational modifications, specifically ubiquitination and sumoylation, and to cyclin-dependent kinases (CDKs) were identified from the K. brevis expressed sequence tag (EST) database housed at NOAA CCEHBR. Annotation of contigs and gene ontology analysis were performed in Blast2GO.
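The E-value cutoff described above can be sketched as a simple filter over BLAST hits. This is an illustration only, not the pipeline used in the study: it assumes the hits are available in the standard 12-column tabular BLAST output (E-value in the eleventh column), and the contig and hit identifiers are hypothetical.

```python
def filter_hits(lines, max_evalue=1e-4):
    """Return (query, subject, evalue) for hits with E-value below max_evalue.

    Assumes tab-separated BLAST tabular output with the standard 12 columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore.
    """
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        evalue = float(fields[10])
        if evalue < max_evalue:
            kept.append((fields[0], fields[1], evalue))
    return kept


# Hypothetical example hits; none of these values come from the dissertation.
EXAMPLE_LINES = [
    "contig_A\thit_1\t85.0\t200\t30\t0\t1\t600\t1\t200\t3e-64\t250",
    "contig_B\thit_2\t40.0\t50\t30\t0\t1\t150\t1\t50\t0.5\t30",
    "contig_C\thit_3\t55.0\t120\t54\t0\t1\t360\t1\t120\t1e-4\t60",
]

for query, subject, evalue in filter_hits(EXAMPLE_LINES):
    print(query, subject, evalue)
```

The comparison is strict, so a hit with an E-value exactly equal to the cutoff is excluded, matching the "less than" wording of the threshold.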

In silico translations of the full open reading frames of K. brevis SUMO, ubiquitin and CDKs were performed using BioEdit software. To obtain the predicted theoretical molecular weight of SUMO and ubiquitin, the amino acid sequences were analyzed using the Compute pI/Mw tool on the ExPASy Proteomics server (http://expasy.org/tools/pi_tool.html). For multiple alignments of each protein sequence, the amino acid sequences from other eukaryotes were obtained from GenBank. Species used for the analysis of SUMO included Pfiesteria piscicida, Phaeodactylum tricornutum, Homo sapiens, Schizosaccharomyces pombe, Arabidopsis thaliana, Chlamydomonas reinhardtii, and Saccharomyces cerevisiae; those used for ubiquitin included Plasmodium falciparum, Homo sapiens, Schizosaccharomyces pombe, Arabidopsis thaliana, and Chlamydomonas reinhardtii, to include representatives across multiple phyla. Sequences used for CDKs included Plasmodium falciparum CDK7, Arabidopsis thaliana CDK C, and Chlamydomonas reinhardtii CDK. Once the sequences were imported into BioEdit, multiple alignments were performed using the ClustalW Multiple Alignment application.
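A theoretical molecular weight of the kind reported by a pI/Mw tool can be approximated by summing average residue masses and adding one water molecule for the free termini. The sketch below is an assumption-laden illustration, not the ExPASy implementation: the function name is hypothetical, the mass table holds commonly tabulated average residue masses, and modified residues are not handled. The example sequence is the PCNA immunolocalization peptide listed in Table 2.

```python
# Average residue masses in daltons (amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water per chain for the free N- and C-termini


def molecular_weight(sequence: str) -> float:
    """Theoretical average molecular weight (Da) of an unmodified peptide."""
    sequence = sequence.upper()
    unknown = set(sequence) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS


peptide = "DRIADFDLKLMQIESEH"
print(f"{peptide}: {molecular_weight(peptide) / 1000:.2f} kDa")
```

The same function could be applied to a translated SUMO or ubiquitin open reading frame to estimate how much mass a single conjugated modifier would add to PCNA.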

To determine sites of lysine modification by SUMO and ubiquitin on the K. brevis PCNA, a bioinformatic approach was used. The full-length K. brevis PCNA amino acid sequence was uploaded into the SUMOsp 2.0 sumoylation site prediction program (http://sumosp.biocuckoo.org/prediction.php; Ren et al., 2009). Similarly, the K. brevis PCNA amino acid sequence was submitted to UbPred, a predictor of protein ubiquitination sites (http://ubpred.org/; Radivojac et al., 2010).

Cultures and Culturing Conditions

K. brevis (Wilson isolate) was grown in 0.22 µm filtered seawater (36‰) obtained from the seawater system at the Florida Institute of Technology field station, Vero Beach, Florida, enriched with f/2 nutrients (Guillard, 1973), modified with the use of ferric sequestrene in place of EDTA·Na2 and FeCl3·6H2O and the addition of 0.01 µM selenous acid. Cultures were maintained in 1 liter glass bottles at 25°C ± 1°C with a 16:8 light:dark cycle, and cool white light at 50-60 µmol photons m⁻² sec⁻¹, determined using a LI-COR LI-250 Light Meter (LI-COR BioSciences, Lincoln, NE). Cell counts for growth curve analysis and division rate per day calculations were determined on a Multisizer 3™ Coulter Counter (Beckman Coulter, Inc., Brea, CA).

Cyclin Dependent Kinase Inhibitor Experiments

Flask cultures containing 125 mL of K. brevis were grown to mid-log phase (Day 6). On Day 6, cyclin-dependent kinase inhibitors and carrier controls (fascaplysin/water, olomoucine/DMSO) were added just before LDT0 (lights-on). Olomoucine was chosen as an inhibitor for the CDKI studies since this compound has broad CDK specificity and has been shown to block K. brevis in G1 as well as mitosis (Van Dolah and Leighfield, 1999). Fascaplysin was chosen for its specificity to the G1/S-phase transition CDK, CDK4. Carrier controls of the same volume were added to cell cultures for the duration of experiments (fascaplysin is soluble in water and olomoucine in DMSO). Structures of these compounds are shown in Figure 1. A series of range-finding studies was carried out for each compound in order to optimize cell viability and the effectiveness of each compound in blocking the cell cycle prior to S-phase. Concentrations of fascaplysin tested were 10, 5, 2.5, 1.5, 1, and 0.75 µM; olomoucine concentrations tested were 100, 50, 25, 12.5, and 6.25 µM.

Five independent replicate cultures were sampled for each treatment. To ensure that the CDK inhibitors were not simply killing the cells, viability was assessed at LDT12 (mid-S-phase) using SYTOX (Molecular Probes, Invitrogen) staining after 12 hours of exposure to inhibitor or carrier control (Figure 2). Fifty µL of 5 µM SYTOX were added to 5 mL of K. brevis culture. Cells were incubated for 15 min at room temperature, followed by addition of 200 µL of 50% glutaraldehyde for fixation and incubation for 5 min. Cells were centrifuged for 3 min at 600 x g and placed on a microscope slide for enumeration under an Olympus fluorescence microscope within 10 minutes. One hundred cells were counted to obtain the percentage of viable cells. Fifteen milliliters of culture were fixed for flow cytometric analysis and the remaining cell culture (~100 mL) was centrifuged at 600 x g for 10 min. Protein was then extracted from the cell pellets for western blot analysis using TriReagent following the manufacturer's protocol.

PCNA Expression Pattern in Flask Cultures over the Cell Cycle

K. brevis cultures in 125 mL flasks were grown to mid-log phase (Day 6). Four biological replicates were sampled every 2 hours over the course of S-phase, and LDT2 samples (G1) were taken to compare with the S-phase samples. S-phase time points included LDT10, 12, 14, 16 and 18. Flow cytometry samples were taken from each culture at each time point for cell cycle analysis. The remaining culture was centrifuged for 10 min. at 600 x g to pellet the cells for protein extraction using TriReagent. Proteins were then used for analysis of PCNA throughout S-phase by western blotting.

Growth Rate Studies: Log vs. Stationary Phase Experiments

Five replicate 125 mL flask cultures of K. brevis were grown to mid-log phase (Day 6), late log phase (Day 8 and Day 10), early stationary phase (Day 12) and stationary phase (Day 14) and sampled on the indicated days. Two-mL samples were taken on Days 1, 3, 5 and on the day of sampling (Day 6, 8, 10, 12 or 14) for all cultures in order to construct a growth curve and to calculate division rate. Cells were harvested for protein extraction at LDT12 (mid-S-phase) on each sampling day and a 15 mL aliquot was taken for flow cytometric analysis. The remaining culture (~100 mL) was centrifuged at 600 x g for 10 min. Protein was then extracted from the cell pellets for western blot analysis using TriReagent following the manufacturer's protocol.

For analysis of PCNA expression under different growth rates, 125 mL flask cultures were grown at a light intensity of 70 µE/m²/s to mid-log phase (Day 6, n=5) or stationary phase (Day 12, n=5). To obtain log phase cells with lower growth rates, another set of flasks was wrapped in "neutral density" mesh filters, which attenuate the light the cultures receive to 31 µE/m²/s (Day 6, n=4). All cultures were sampled at LDT12, which corresponds to mid-S-phase, on the appropriate sampling day. Cell counts were obtained every other day during the course of the study in order to produce a growth curve and to calculate growth rate. The equation $K' = \ln(n_2/n_1)/(t_2 - t_1)$ was used to determine growth rate, where $n_1$ and $n_2$ are cell counts at the time points $t_1$ and $t_2$. Divisions per day were then calculated using the equation $K'/\ln 2$. Proteins were extracted and Western blotting (see below) was carried out with anti-K. brevis PCNA antibody. Flow cytometric analysis was performed on 10 mL of culture as described below in order to correlate PCNA band intensity with the percentage of cells in S-phase at LDT12. A linear correlation analysis in GraphPad Prism 4 software was used to determine the relationship between growth rate and PCNA abundance.
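The growth rate and divisions-per-day formulas above, together with the squared correlation coefficient reported for a linear correlation, can be sketched in a few lines of Python. This is a minimal illustration, not the GraphPad Prism procedure used in the study; the function names and the cell counts in the usage lines are hypothetical and are not data from this work.

```python
import math


def growth_rate(n1, n2, t1, t2):
    """Specific growth rate K' = ln(n2/n1) / (t2 - t1); per day if t is in days."""
    return math.log(n2 / n1) / (t2 - t1)


def divisions_per_day(k):
    """Convert K' to divisions per day: K' / ln 2."""
    return k / math.log(2)


def r_squared(xs, ys):
    """Squared Pearson correlation coefficient for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical counts (cells/mL) on Day 2 and Day 6, for illustration only
k = growth_rate(n1=1000, n2=4000, t1=2, t2=6)
print(f"K' = {k:.3f} per day, {divisions_per_day(k):.2f} div./day")
```

A fourfold increase over four days corresponds to two doublings, so the example yields 0.5 divisions per day.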

Cell Cycle Analysis

Ten milliliters of culture were collected from each 1 L biological replicate culture (n=5) at each time point. Cells were fixed in 2% glutaraldehyde by adding 50% glutaraldehyde directly to culture medium containing cells and stored at 4°C for at least 12 hours. Cells were then centrifuged at 1750 x g to remove fixative and growth medium. To extract cellular pigments, cells were stored in -20°C methanol for a minimum of 4 hours and centrifuged at 1000 x g, and the methanol was removed. The cells were then stained with 10 µg/mL propidium iodide (final concentration) in phosphate buffered saline (PBS) containing 10 mg/mL RNase and 0.5% Tween 20. DNA analysis was performed on an Epics MXL4 flow cytometer (Beckman Coulter, Miami, FL) using a 5 W argon laser with a 488 nm excitation wavelength and 635 nm emission wavelength. Multicycle software (Phoenix Flow Systems, San Diego, CA) was used to obtain the percentages of cells in each stage of the cell cycle.

Protein Extraction and Western blotting

Protein was extracted from the experimental samples using TriReagent following the manufacturer's protocol. A Bradford protein assay (Bradford, 1976) was used to quantify protein concentration using the Bio-Rad Protein Assay Dye Reagent (Bio-Rad, Hercules, CA, USA). For western blotting, 10 µg of protein extract were loaded on a 4-12% Bis-Tris gradient gel (Invitrogen Corporation, Carlsbad, CA). For the SUMO experiments, a human recombinant SUMO 1 protein was used as a positive control. This recombinant protein was a gift from Dr. Tilman Heise (Dept. of Biochemistry, MUSC). Proteins were then transferred to a 0.2 µm polyvinylidene fluoride (PVDF) membrane using the Invitrogen iBlot system (Invitrogen Corporation, Carlsbad, CA). The PVDF membrane was then blocked in 5% non-fat milk in Tris buffered saline (TBS) with 0.1% Tween 20 for 1 hour at room temperature. Membranes were then incubated in primary antibodies overnight at 4°C. The anti-KbPCNA antibody developed against the translated K. brevis PCNA sequence was used at 1:5000 in TBS with 0.1% Tween 20 and 5% non-fat milk. Commercial antibodies were used for the detection of SUMO at 1:1000 (Rockland Immunochemicals, Inc., Gilbertsville, PA) and ubiquitin at 1:1000 (Enzo Life Sciences, Inc., Plymouth Meeting, PA) in TBS with 0.1% Tween 20 and 5% non-fat milk. Membranes were washed 3 x 10 minutes in TBS with 0.1% Tween 20. Horseradish peroxidase-linked donkey anti-rabbit Ig was then incubated with the PVDF membrane at 1:2000 for PCNA and SUMO or 1:5000 for ubiquitin, in TBS with 0.1% Tween 20 for 1 hour at room temperature. After rinsing membranes in TBS with 0.1% Tween 20 (2 x 10 minutes) and TBS (1 x 10 minutes), the Pierce ECL Super Signal West Pico Kit (Pierce, Rockford, IL) was used for detection. Membranes were exposed to film (Pierce CL-XPosure Film, Thermo Scientific, Rockford, IL) and developed for visualization. Densitometry of the bands of the biological replicates (n=5) was carried out using Alpha Innotech (San Leandro, CA) FluorChem 8900 software. Significant differences were determined with a one-way ANOVA using GraphPad Prism 4 software (La Jolla, CA).

Peptide Competition Assay

Ten micrograms of protein extract from stationary phase cells at the time corresponding to S-phase (Day 12 and 14, LDT12) were used for the PCNA peptide competition experiment. The anti-K. brevis PCNA antibody (1:5000 or 3.6 µg) was incubated with 100X peptide (360 µg of the peptide to which the antibody was developed) in TBS with 0.1% Tween 20 and 5% milk for 1 hour at 4°C. The experimental membrane was incubated overnight at 4°C with the peptide-blocked anti-PCNA. The control blot was incubated overnight at 4°C in the anti-K. brevis PCNA antibody at 1:5000 in TBS with 0.1% Tween 20 and 5% non-fat milk. Western blot procedures were followed as described above.

Immunoprecipitation

Cells were harvested during S-phase (LDT12-14) and proteins were extracted using TriReagent following the manufacturer's protocol. Protein pellets were solubilized in 200 µL DIGE buffer (7 M urea, 2 M thiourea, 4% w/v CHAPS, 30 mM Tris, pH 8.5). Two micrograms of antibody (anti-PCNA, anti-ubiquitin or anti-SUMO) were added to the protein lysate and incubated at 4°C for 1 hour with rotation. Ten microliters of Protein A/G PLUS-Agarose beads (Santa Cruz Biotechnology, Inc., Santa Cruz, CA) were then added to the protein lysate/antibody mixture and incubated at 4°C overnight with rotation. Immunoprecipitates were collected by centrifugation at 1000 x g for 5 min. at 4°C. Supernatant was aspirated and discarded. Immunoprecipitates were washed 4 times with PBS, repeating the above centrifugation step for 5 min. at 4°C. After the final wash, supernatant was aspirated and discarded and the pellet was resuspended in 40 µL of 1X LDS electrophoresis loading buffer (Invitrogen, Carlsbad, CA). Samples were boiled for 3 min. to remove agarose beads from the immunoprecipitate and centrifuged at 2500 x g for 5 min. at 4°C. Three microliters of reducing agent were then added to 30 µL of the sample and loaded onto 4-12% acrylamide gels for analysis by western blotting.

Stripping of Western Blots

To reprobe blots with a different antibody than had previously been used, a stripping protocol was used as follows. Stripping buffer was prepared using 0.93 g of Tris-HCl, 2 g of sodium dodecyl sulfate and 20 µL of β-mercaptoethanol, brought up to 100 mL and adjusted to pH 6.7. Stripping buffer was warmed to 40°C. The blot to be reused was then incubated in buffer, rotating at 40°C for 20 min. The blot was then washed 4 times in phosphate buffered saline for 20 minutes each. The blot was then washed in 5% non-fat dry milk in Tris buffered saline with 0.1% Tween 20 two times for 20 minutes each. The new primary antibody for analysis was then added and incubated following the optimized procedure.

RESULTS

Possible Post-translational Modifications of PCNA

The presence of multiple forms of the PCNA protein observed in Chapter 3 led to further investigation of possible post-translational modifications of PCNA in K. brevis. PCNA in the eukaryotes that have been studied is known to be both sumoylated and ubiquitinated on lysine residues. Sequences with homology to the sumo and ubiquitin polypeptides are present in the K. brevis EST library. Multiple alignments of both K. brevis sequences with those of other eukaryotes show that they are highly conserved and possess the G-G motif, which binds to the lysine of the modified protein and is present in these polypeptides across phyla (Figure 1). K. brevis ubiquitin has high homology to known ubiquitin sequences (BLASTx E value 10^-33) and the K. brevis SUMO sequence shows high conservation as well (BLASTx E value 10^-38). Amino acid sequence analysis using the ExPASy Compute pI/Mw tool predicts that the K. brevis ubiquitin sequence codes for a polypeptide of 8.5 kDa and the sumo sequence for one of 9 kDa. Analysis of the PCNA amino acid sequence using the ubiquitination prediction program UbPred predicted that K184, K190, and K194 are ubiquitinated with medium confidence (Figure 2), based on the eight known ubiquitin binding domains that range from 20-150 amino acids in length (French et al., 2005). Sequence analysis of K. brevis PCNA using SUMOsp, a sumoylation prediction program, predicted that K190 is sumoylated with medium stringency (Figure 2). This was determined to be a Type II non-consensus motif (ESEKPEE). Other organisms with K190 of PCNA predicted to be sumoylated include H. sapiens and C. reinhardtii. Lysine 238 of K. brevis PCNA was also predicted to be sumoylated with high stringency. This site (VKYE) was determined to be a Type I consensus site (Ψ-K-X-E), where Ψ is A, I, L, M, P, F, or V and X is any amino acid residue. Lysine 164, which has been shown in S. cerevisiae to be the site of modification by ubiquitin and SUMO, was not identified as a possible modification site in K. brevis PCNA.
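The Type I consensus rule quoted above (Ψ-K-X-E, with Ψ drawn from A, I, L, M, P, F, V) can be expressed as a simple pattern search. The sketch below is only an illustration of that rule; it does not reproduce SUMOsp, which also scores non-consensus sites, and the function name is hypothetical. The test strings are the short motif fragments quoted in the text, not the full K. brevis PCNA sequence.

```python
import re

# Type I consensus as described in the text: Psi-K-X-E,
# with Psi in {A, I, L, M, P, F, V} and X any residue.
# The lookahead allows overlapping motifs to be reported.
CONSENSUS = re.compile(r"(?=([AILMPFV]K[A-Z]E))")


def find_consensus_sites(seq):
    """Return (lysine_position, motif) pairs; positions are 1-based within seq."""
    return [(m.start() + 2, m.group(1)) for m in CONSENSUS.finditer(seq.upper())]


# Fragments quoted in the text
print(find_consensus_sites("VKYE"))     # consensus fragment
print(find_consensus_sites("ESEKPEE"))  # non-consensus fragment
```

The first fragment is reported as a match with its lysine at position 2 of the fragment, while the non-consensus fragment yields no match, which is consistent with the Type I versus Type II distinction drawn above.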

Ubiquitin and SUMO are Expressed in K. brevis

The presence of ESTs with homology to ubiquitin and SUMO in the K. brevis library and their predicted interactions with K. brevis PCNA identify these proteins as candidates for the post-translational modification of PCNA in S-phase. Numerous ESTs with homology to members of the ubiquitin pathway have been identified in K. brevis, including 133 sequences listed in Appendix 2. Commercial antibodies were found to have specificity to both polypeptides in K. brevis (Figure 3). The anti-ubiquitin antibody recognizes multiple bands in K. brevis protein lysates at all points in the cell cycle, as expected if ubiquitin is used to modify other protein substrates. However, a short exposure of 5 seconds of the western blot to film shows a band at ~6 kDa, a band at ~14 kDa and a higher molecular weight band at ~55 kDa (Figure 3A). The last two lanes of this gel are samples from LDT16 and 18, respectively. The banding pattern may differ in these two samples because the cell population is in the last part of S-phase, or it might reflect dark-phase specific protein modification, as these time points occur in the dark. The anti-sumo antibody recognizes the sumo polypeptide in K. brevis, shown to run at ~17 kDa. This antibody also recognized the human recombinant proteins used as positive controls (Figure 3B).

Immunoprecipitation of PCNA

To determine whether the 37 kDa band observed during S-phase represents ubiquitin or SUMO modification of PCNA, PCNA was immunoprecipitated from S-phase cells and Western blot analysis was performed using the same PCNA antibody. The anti-K. brevis PCNA antibody immunoprecipitated four prominent forms of the PCNA protein (Figure 4). These included the 28 and ~37 kDa forms that were reported in the Chapter 3 experiments over the K. brevis cell cycle and the growth curve. Additional bands were detected at ~47 kDa and 62 kDa. To test whether any of these bands represented a sumoylated form of PCNA, which would be expected to occur most prominently during S-phase, PCNA was again immunoprecipitated from S-phase cells and subsequently probed with anti-SUMO. This experiment detected a sumoylated form of PCNA at ~47 kDa (Figure 5A). To determine whether this band represents sumoylated PCNA, the blot was stripped of the sumo primary antibody and the HRP-conjugated secondary antibody and reprobed with anti-PCNA. Indeed, this band was present; however, the 28 kDa form was not present in this immunoprecipitate (Figure 5B), leaving its interpretation open to question.

If the 37 kDa band is a sumoylated form and limited to S-phase, we might expect its expression to decrease as cells proceed through the growth curve from log to stationary phase, where growth is diminished. Therefore, cultures of K. brevis (125 mL flasks) were sampled in quadruplicate during mid-S-phase at LDT12 on Day 6, 8, 10, 12 and 14, representing mid-log phase, late log phase, early stationary phase, stationary phase and late stationary phase, respectively (Figure 6A). Flow cytometry shows that on Day 6 at LDT12, 60% of the cell population was in S-phase, with 30% in S-phase on Day 8, 11% on Day 10, 6% on Day 12 and 0% on Day 14 (Figure 6B). On Day 14, only one culture was still alive and had high cell abundance. Since the four other cultures on Day 14 were either dead or had low cell abundance, these samples were not used for PCNA western blot analysis. Western blot data show that the ~37 kDa PCNA is present during S-phase in log phase cultures and decreases reproducibly as cells approach stationary phase (Days 6 through 10) (Figure 7A). In early stationary phase (Day 10), PCNA is barely present. Unexpectedly, a prominent PCNA band at ~47 kDa appears in some, but not all, Day 12 cultures. Given that on Day 14 these cultures began to decline, the presence of this 47 kDa band in some cultures may indicate a tipping point, where some cultures are beginning to decline and others are not yet there. Densitometry of these proteins is shown in Figures 7B and C. Although there was no significant difference using a one-way ANOVA, there is a clear, repeatable trend over the growth curve for both forms of PCNA. The lack of significance may be due to the high variability with which K. brevis cultures enter stationary phase.

These results prompted a confirmatory study in which we compared the expression of PCNA at LDT12 in log phase and stationary phase, 5 replicates each in 125 mL flasks. In the log phase cultures at LDT12, the ~37 kDa PCNA protein is present in all replicates (Figure 8A). Flow cytometry data show that 41.1-64.5% of the log phase cell population was in S-phase at LDT12 when sampling occurred (Figure 8A). In the stationary phase replicates, the ~37 kDa PCNA is barely present in one replicate and absent from all other replicates (Figure 8B). In 3 of the 5 samples, the ~47 kDa band was prominent. Flow cytometry shows that the stationary phase cultures have only 0-8.9% of the cell population proceeding through S-phase (Figure 8B).

The presence of a 47 kDa version of PCNA was unexpected and warranted confirmation of its specificity. Therefore, we carried out a peptide competition assay to confirm that it is in fact PCNA. Figure 8C shows that both the 47 kDa band and the 37 kDa form of PCNA are present. Antibody binding is blocked by a 100-fold excess of peptide antigen. Given that the 47 kDa band appears only in cultures that appear to be on the cusp of dying, we hypothesized that this band might represent a ubiquitinated form of PCNA. To address this question, immunoprecipitation was carried out using the PCNA antibody, followed by Western analysis with the anti-ubiquitin antibody. Unfortunately, none of the immunoprecipitations provided clear evidence of PCNA ubiquitination in K. brevis (data not shown). However, an intriguing result pointing to the possibility of PCNA ubiquitination is shown in Figure 9. A changing pattern of ubiquitination is apparent in protein lysates from LDT12 in Day 10 (early stationary phase) and Day 12 (late stationary phase) cultures (Figure 9A), when protein degradation might be expected to increase. In both Day 10 samples and one Day 12 sample, the ubiquitin pathway components and substrates match. It is in these exact biological replicates that the ~47 kDa PCNA is not present (Figure 9B). Additionally, the ubiquitin monomer is abundant in these samples. In contrast, the Day 12 sample Kb19 shows an entirely different ubiquitination pattern. A prominent band of ~47 kDa appears, while the pool of monoubiquitin almost disappears in this sample. It is in this exact biological replicate that the ~47 kDa PCNA band is quite prominent (Figure 9B). Although these data are indirect, they provide evidence of ubiquitination that is consistent with the appearance of the 47 kDa band recognized by the PCNA antibody.

Candidate Cyclin-Dependent Kinase and Cyclin Genes in the K. brevis Library

Sixteen ESTs with homology to cyclin-dependent kinases are present in the K. brevis EST library, as identified by BLAST and gene ontology searches in GenBank by BLAST2GO (Table 1). These sixteen ESTs assemble into 5 unique contigs. Two have high homology to CDK7 (e-56) and one has homology to CDKC (e-66). One is the reverse complement and the last has poor homology to known CDKs. Using PCR with the spliced leader as forward primer (5' TCCGTAGCCATTTTGGCTCA 3') and a gene-specific reverse primer (5' GTGCAGGAATGTGTTGCCGTTT 3'), the full length sequence for the contig with homology to CDKC was obtained (Contig 5041, Marine Genomics ID 108857, Table 1). This sequence has the 5' spliced leader (Figure 10) and a cyclin binding motif (PLTAIEK), as shown by alignment with other protist and plant CDKs (Figure 11). Residues involved in ATP binding that are conserved within CDK groups and among species are marked with an asterisk (*) above them (Figure 11). Interestingly, the conserved threonine residue present in the CDK T-loop, which is where CDK is phosphorylated and activated by CDK activating kinase (CAK), is substituted with a serine in K. brevis. Although divergent, this residue can still be phosphorylated and activated by a potential CAK, yet to be characterized in K. brevis. This sequence has highest homology to plant CDKC (Arabidopsis, 3e-57) and a green algal CDK (C. reinhardtii, 2e-51). CDKC was shown not to interact with either mitotic or G1 cyclins in the tomato plant (Joubes et al., 2001), but rather to bind to RNA polymerase and regulate transcription. A partial alignment of the K. brevis CDKs shows that these CDKs are more closely related to plant (Arabidopsis), green algal (Chlamydomonas) and, as expected, apicomplexan (Plasmodium) CDKs (Figure 12).

Concentration Dependence of CDK Inhibitor Activity

Since analysis of CDK sequences in the EST library did not identify a candidate G1-specific CDK in K. brevis, we sought to gain insight into which (if any) type of CDK might be involved in regulating S-phase proteins, using two CDK inhibitors of differing specificity (Figure 13). In order to investigate the effects of the CDK inhibitors olomoucine and fascaplysin in K. brevis, it was first necessary to identify inhibitor concentrations that were effective in causing a cell cycle block but were not toxic to the cells, as determined by SYTOX staining (Figure 14). Fascaplysin concentration was tested from 0.75 - 5 µM (Table 2). From this range, a final concentration of 1.5 µM was selected for further studies since it allowed only 5% of the cell population to proceed through S-phase, without extensive loss of viability (Table 2). Olomoucine concentration was tested from 6.25 - 50 µM (Table 3). From this, a final concentration of 12.5 µM was selected for further experiments, as determined by viability (96%) and the percent of the cell population blocked in G1 (93.8%) at LDT12.

PCNA Expression Patterns in Flask Cultures

The growth rates of K. brevis cultures grown in 125 mL flasks were found to be substantially higher (K' = 0.6) than those in the liter bottles used in Chapter 3, and therefore flasks were used for the subsequent growth phase studies. However, because the cells were more synchronous, the pattern of PCNA expression was first established as compared to bottle cultures. The flask cultures in this study were highly synchronous, with ~60% of the cell population in S-phase by LDT12 (Figure 15A). PCNA protein abundance was investigated at LDT2, 10, 12, 14, 16, and 18. At LDT2 (G1) the 37 kDa form of PCNA was already present and more abundant than the 28 kDa form, which dominated this time point in the bottle cultures (Figure 15B). However, the abundance of both forms of PCNA was low. At all other time points, all of which had >50% of cells in S-phase (Figure 15A), only the 37 kDa form was present (Figure 15B). Densitometry shows a decreasing, but variable, trend in the expression levels in LDT18 replicates (Figure 15C). This corresponds to the time when the cell populations are beginning to exit S-phase and is consistent with our previous observations (Figure 15A).

Effects of CDK Inhibition on PCNA Protein Abundance

Once optimal concentrations of olomoucine and fascaplysin were determined, for both cellular viability and G1 block, 5 replicate 125 mL flask cultures were treated with 12.5 µM olomoucine or DMSO carrier, or 1.5 µM fascaplysin or water carrier, just prior to LDT0 and sampled at LDT12 (mid-S-phase) for viability, cell cycle analysis and protein abundance of PCNA. Sampling occurred 12 hours after CDK inhibitor addition, when the cells would normally be in mid-S-phase as shown by the controls. Table 4 shows the results from the flow cytometry, SYTOX viability staining, and cell counts per mL. Both fascaplysin and olomoucine addition resulted in the expected G1 block, with only 0.84% and 2.54%, respectively, of the cell population proceeding through S-phase.

PCNA protein production was absent at LDT12 when cells were treated with olomoucine and fascaplysin (Figure 16). Interestingly, expression of both the 28 and ~37 kDa forms of PCNA was blocked. In the untreated controls at LDT12, only the ~37 kDa band was present, as expected for the peak of S-phase in the flask cultures. For comparison, a protein lysate showing S-phase bands from a 1 L bottle culture at the same time point had both the 28 and 37 kDa bands observed previously. Fascaplysin blocked the production of PCNA more effectively than olomoucine (Figure 16A); however, both inhibitors caused a significant reduction in the production of this protein when normalized to untreated controls at the same time point (Figure 16B, p

ANOVA). These results suggest both that a CDK is involved in regulating the increased expression of PCNA in S-phase and that a CDK with CDK4-like sensitivity to fascaplysin is present in K. brevis and active at G1/S.

Growth Phase Specificity of PCNA Expression may Provide a Biomarker for Bloom Status

Because the 37 kDa form of PCNA appears only during S-phase, it has the potential to serve as a marker of actively growing blooms of K. brevis that might find utility in field monitoring of the growth status of Florida red tides. We therefore compared the expression level of the 37 kDa band at LDT12 in actively growing populations (0.30 - 0.60 div./day), slow growing populations (0.04 - 0.20 div./day), and populations in stationary phase (0 div./day). Growth curves were produced by taking cell counts every other day and on the day of sampling in order to calculate divisions/day in each culture at the time of harvest (Figure 6A). Flow cytometry was used to assess cell cycle distribution in each culture and the percent of cells in S-phase was determined (Figure 6B). When the western blot band intensity was plotted vs. the percent of cells in S-phase for each sample, there was a significant linear relationship between the 37 kDa form of PCNA protein and the percent of cells entering S-phase (linear correlation, r² = 0.7812, p

PCNA may be useful as a marker for actively growing blooms.

DISCUSSION

Potential Post-translational Modifications of PCNA by SUMO and Ubiquitin

In higher organisms, PCNA undergoes two different post-translational modifications that regulate its activity in a cell cycle dependent manner, sumoylation and ubiquitination (Hoege et al., 2002). Ubiquitin-dependent proteolysis refers to the conjugation of the ubiquitin polypeptide to a lysine residue on the target protein for degradation by the 26S proteasome. Conjugation of ubiquitin to the substrate protein is through a three-step pathway (Sorokin et al., 2009): the E1 Ub-activating enzyme forms a high-energy intermediate with ubiquitin; then the E2 Ub-conjugating enzyme carries the activated Ub to ligase E3, which is bound to the substrate (Sorokin et al., 2009); then, once the protein is presented to the proteasome, deubiquitination occurs by deubiquitinating enzymes (DUBs) and ubiquitin can be recycled. Substrates can be either mono- or polyubiquitinated; polyubiquitination can occur via chains or branches and leads to proteasomal degradation. Monoubiquitination is the attachment of one ubiquitin to a protein (Hicke and Dunn, 2003), does not lead to degradation and can be involved in multiple processes. Multiubiquitination (multiple monoubiquitination of several lysines on the substrate) leads to protein destruction by the (Haglund et al., 2003). There is precedent for many proteins to be ubiquitinated at different times over the cell cycle, including the yeast G1 cyclins, Cln2 and Cln3, the yeast CDK inhibitors, Sic1 and Far1, the mammalian G1 regulators, cyclins D and E, and the mammalian CDK inhibitor p27Kip1, which all require phosphorylation before ubiquitination (Ciechanover et al., 2000). There are many other examples of ubiquitin-modified substrates that are specific to each phase of the cell cycle and can be further characterized by non-proteolytic and proteolytic ubiquitination (Wickliffe et al., 2009). Specifically, there is extensive research that has investigated the ubiquitination of PCNA. PCNA can be both mono- and polyubiquitinated. PCNA monoubiquitination has been demonstrated in most eukaryotes, from budding and fission yeasts to human, chicken and Xenopus laevis egg extracts; however, polyubiquitination of PCNA has only been clearly detected in the yeasts and X. laevis egg extracts (Ulrich, 2009). PCNA monoubiquitination promotes translesion DNA repair and polyubiquitination of PCNA promotes error-free DNA repair (Hoege et al., 2002; Gill, 2004). PCNA is also ubiquitinated during normal cell cycle progression. In yeast, metazoans and mammalian cells, PCNA is degraded during S-phase after it is used to prevent re-replication, thus promoting genomic stability. This process is mediated through CRL4Cdt2, which is an E3 ubiquitin ligase (Abbas and Dutta, 2011).

Sumo (small ubiquitin-like modifier) post-translational modifications also have a wide range of effects within the cell (Johnson, 2004). Sumoylation is involved in cell cycle regulation, transcription, cellular localization, DNA repair, mitochondrial activity, plasma membrane ion channels, and chromatin organization (Andreou and Tavernarakis, 2009; Hannoun et al., 2010). Sumo shares only ~18% homology with ubiquitin but has a similar 3-dimensional structure (Bayer et al., 1998). Like ubiquitin, sumo modifies lysine residues of substrate proteins, is expressed in all eukaryotes, and has a conjugation-deconjugation system that is reversible (Meulmeester and Melchior, 2008). The sumo polypeptide is somewhat divergent in eukaryotes, with protein size varying from ~9 to ~11 kDa (Andreou and Tavernarakis, 2009; Hannoun et al., 2010). The E1 activating and E2 conjugating enzymes involved in sumoylation are highly related to those involved in ubiquitination; however, there is only one known E2 enzyme, Ubc9, involved in sumoylation, whereas there are dozens of E2 enzymes in the ubiquitin pathway (Gill, 2004). There is a sumo consensus motif found on many sumo substrates, which consists of the sequence Ψ-K-X-E, where Ψ is a large hydrophobic amino acid and K is the site of sumo conjugation (Rodriguez et al., 2001). PCNA has been found to possess this motif (Ulrich, 2009) and PCNA sumoylation takes place during S-phase or after high doses of DNA damage (Hoege et al., 2002). Sumoylation of PCNA is thought to prevent deleterious hyper-recombination during normal replication, and lack of sumoylation causes increased spontaneous sister chromatid recombination during mitotic growth (Pfander et al., 2005). PCNA can undergo polysumoylation in yeast; however, the function and importance of this modification are currently unclear (Parker et al., 2008; Windecker and Ulrich, 2008).

Sequence analysis shows that one subunit of K. brevis PCNA codes for a protein of ~28 kDa. There are no sequences in the cDNA library that appear to code for a protein with a longer amino acid sequence. Therefore, we hypothesized that the 37 and 47 kDa forms of PCNA may be post-translationally modified forms of PCNA. In eukaryotic cells, PCNA is ubiquitinated in response to DNA damage, promoting the bypass of replication-blocking lesions (Hoege et al., 2002; Stelter and Ulrich, 2003; Kannouche et al., 2004). In S. cerevisiae, PCNA becomes sumoylated in a damage-independent manner during S-phase, preventing unscheduled recombination events (Pfander et al., 2005). Sumo and ubiquitin can also coexist on a single PCNA subunit in yeast (Windecker and Ulrich, 2008). In this study, we identified both the ubiquitin and sumo polypeptides in K. brevis, as well as many genes involved in the ubiquitin and ubiquitin-like systems, including ubiquitin precursors (polyubiquitins), ubiquitin-conjugating and ligating enzymes, members of the 26S proteasome and a sumo-activating enzyme (Appendix 2), providing strong evidence that both of these post-translational modification systems exist in this dinoflagellate. Based on sequence analysis, the K. brevis ubiquitin sequence encodes a polypeptide of 8.5 kDa, typical of other eukaryotes, and the sumo sequence encodes a polypeptide of 9 kDa, somewhat smaller than the typical 10 - 11 kDa. By western blot, ubiquitin was somewhat smaller than expected, ~6 kDa; however, this discrepancy in molecular weight on acrylamide gels is similar to what is observed in other organisms (Ott et al., 1998). The sumo polypeptide migrated somewhat larger than expected at ~17 kDa, but again free sumo has been shown to migrate at ~17 kDa in Xenopus (Wang et al., 2009) and ~15 kDa in Arabidopsis (Colby et al., 2006), while their sequences predict 10.7 - 11.7 kDa and 11 - 12 kDa, respectively. This appears to be common in analysis of sumo proteins by western blotting, as all known sumo proteins are ~100 amino acids long but have aberrant sizes when run on gels. For example, in human cancer cells (Hep2 cells) sumo1 migrates above 16.5 kDa, whereas the amino acid sequence codes for 11.6 kDa (Bailey and O'Hare, 2005). Similarly, the human recombinant sumo1, 2 and 3 run as standards in the current study all migrate at approximately 14 kDa, somewhat smaller than the K. brevis sumo, but larger than their predicted sizes from sequence analysis (10.8 - 11.6 kDa).
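The sequence-predicted sizes discussed above can be approximated by summing average residue masses. The following is a minimal sketch, not the method used in this study: the function name is hypothetical, the residue masses are standard average values, and the example uses the human ubiquitin sequence (not the K. brevis sequence, which is not reproduced here).

```python
# Average residue masses in daltons (standard values; residue = amino acid minus water).
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.02  # one water is added back for the free N- and C-termini


def predicted_mass_kda(sequence):
    """Return the sequence-predicted mass of an unmodified polypeptide in kDa."""
    total = sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER
    return total / 1000.0


# Human ubiquitin (76 residues), used only as an illustration.
HUMAN_UBIQUITIN = (
    "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQL"
    "EDGRTLSDYNIQKESTLHLVLRLRGG"
)

if __name__ == "__main__":
    print(f"{predicted_mass_kda(HUMAN_UBIQUITIN):.2f} kDa")
```

A predicted mass of this kind says nothing about gel migration, which is why the apparent sizes on western blots can differ from the sequence-based values.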

Once it was determined that the sumo and ubiquitin polypeptides were present and highly conserved in K. brevis, the PCNA sequence was analyzed using a bioinformatic approach to investigate whether there were conserved lysine residues or motifs within PCNA where these modifications could occur. When the K. brevis PCNA sequence was analyzed using UbPred, a ubiquitination prediction program (Radivojac et al., 2009), three lysines were recognized as possible sites of ubiquitination with medium confidence (Figure 2). The sites identified were K184, K190 and K194, which do not correspond to the site of PCNA ubiquitination (K164) in most eukaryotes (Shaheen et al., 2010). A lysine residue in the K. brevis PCNA (K165) aligns with this site in all other eukaryotes, but is found within the amino acid context SKEG. The program used for this analysis was developed and trained on a combined set of 266 non-redundant experimentally verified ubiquitination sites, using experimental proteomic data from S. cerevisiae (Hitchcock et al., 2003 and Peng et al., 2003) and known ubiquitination sites involved in human disease from GenBank searches. Therefore, this program may not be the best predictor for dinoflagellate sequences, which appear to be divergent in many respects from the model eukaryotes for which large scale datasets are available. Ubiquitin binding domains are structurally diverse and are found in proteins that contain a variety of structural features and diverse biological functions (Hicke et al., 2005). SUMOsp (Ren et al., 2009) was also used to identify potential sumoylation sites on K. brevis PCNA. The sites of PCNA sumoylation are not as well characterized as those of ubiquitination and do not appear to be as conserved. Two potential sites of sumoylation were identified in K. brevis PCNA (Figure 2): K190 (DKPEE) with medium confidence, which is a non-consensus site, and K238 (VKYE) with high confidence, which is within a well-characterized sumoylation motif (Ψ-K-X-E) found in other organisms. Interestingly, when PCNAs from other organisms were analyzed, SUMOsp also predicted that K190 would be sumoylated in H. sapiens and C. reinhardtii. However, the K238 sumoylation site appears to be unique to dinoflagellate PCNA, as this residue and its amino acid context in K. brevis are also found in Pyrocystis lunula (Okamoto and Hastings, 2003), Pfiesteria piscicida (Zhang et al., 2006), Prorocentrum donghaiense (Zhao et al., 2009) as well as Alexandrium catenella (Sui et al., 2008, unpublished from GenBank), Karenia mikimotoi and Peridinium foliaceum (Hou et al., 2009, unpublished from GenBank).
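The distinction between a consensus and a non-consensus sumoylation site can be illustrated with a simple scan for the Ψ-K-X-E motif. This is only a sketch of the consensus rule, not the SUMOsp algorithm; the function name is hypothetical, and the set of residues accepted as Ψ (hydrophobic) is an assumption, since definitions vary between sources.

```python
# Assumption: residues counted as hydrophobic (Psi) for the purpose of this sketch.
HYDROPHOBIC = set("AVLIMF")


def find_consensus_sumo_sites(sequence):
    """Return (1-based lysine position, 4-residue context) for each Psi-K-X-E match."""
    seq = sequence.upper()
    sites = []
    for i in range(1, len(seq) - 2):
        # seq[i-1] = Psi, seq[i] = K, seq[i+1] = any residue, seq[i+2] = E
        if seq[i] == "K" and seq[i - 1] in HYDROPHOBIC and seq[i + 2] == "E":
            sites.append((i + 1, seq[i - 1:i + 3]))
    return sites


if __name__ == "__main__":
    # Local contexts quoted in the text, scanned in isolation.
    print(find_consensus_sumo_sites("VKYE"))   # context around K238
    print(find_consensus_sumo_sites("DKPEE"))  # context around K190
```

Under these assumptions the VKYE context matches the motif, while DKPEE does not because aspartate is not hydrophobic, which is consistent with K190 being described as a non-consensus site.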

Based on sequence analysis of potential PCNA post-translational modification sites and the dynamics of PCNA over the cell cycle, it appears likely that K. brevis PCNA is modified by sumo and ubiquitin. Immunoprecipitation with anti-PCNA was inconsistent, but revealed both the 28 and 37 kDa forms, as well as bands at 47 kDa and above 62 kDa. We had not previously observed the ~62 kDa band. However, since PCNA can be both polyubiquitinated and polysumoylated (Shaheen et al., 2010; Windecker and Ulrich, 2008), this higher band could be due to one of these processes. When a PCNA immunoprecipitate was probed with anti-SUMO, a band appeared at ~47 kDa. This was higher than expected, based on a 9 kDa SUMO addition; however, this band was also visible in an S-phase whole cell lysate run in parallel and probed with anti-SUMO. When this blot was stripped of the sumo antibody and probed with the anti-PCNA antibody, the same band appeared, lending further support to the hypothesis that this represents sumoylated PCNA. Probing PCNA immunoprecipitates with anti-ubiquitin was uninformative, because the immunoprecipitation method did not consistently yield PCNA bands. However, the comparison of ubiquitin patterns with PCNA patterns over the growth curve in K. brevis identifies a ~47 kDa band that co-migrates with the 47 kDa band recognized by anti-PCNA. This is an interesting result that should be further investigated to determine how ubiquitination changes over the cell cycle. It is expected that, as a cell population ages and no longer needs certain proteins for cellular processes such as DNA replication, those proteins would be degraded by the ubiquitin/proteasome system.

The transition from G1 to S-phase is a critical point when cells activate a division program, commit to duplicating the genome and proceed to divide (Gutierrez et al., 2002). For this transition to occur, cells must pass the restriction point (RP) at some time during G1 when growth factors or mitogens are optimal to ensure successful genome duplication. Passage through the RP, also termed the G1 checkpoint or START in yeast, prior to the G1/S transition regulates entry into the cell cycle and subsequent DNA replication. Cyclin-dependent kinase controls the passage through the RP. Although it appears that the dark to light transition is what entrains cell cycle entry in K. brevis to the photoperiod, this "zeitgeber" is not precisely equivalent to the RP since cells continue to cycle in continuous light. In the green alga Chlamydomonas reinhardtii, a light-dependent commitment point (restriction point) occurs in G1 (Zachleder and Van Den Ende, 1992). In contrast to C. reinhardtii, in Euglena light-dependent commitment points also occur post-G1, although their biochemical equivalency to a restriction point is uncertain (Hagiwara et al., 2001). The results in Chapter 3 suggest that the restriction point in K. brevis occurs prior to LDT2, since PCNA positive cells are present by that time. However, the molecular constituents of the restriction point have yet to be defined in dinoflagellates. The results presented in this chapter show that there is most likely a CDK4-like CDK involved in the G1/S restriction point in K. brevis, since blocking CDK with fascaplysin in G1 prevents S-phase entry and PCNA production.

Cyclin-dependent protein kinases (CDKs) are a large family of serine/threonine kinases. Certain members of the CDK family regulate the cell cycle while others function in different cellular processes, such as central regulators of transcription (Morgan, 1997; Liu and Kipreos, 2000). The CDKs involved in cell cycle regulation have cyclin partners to which they bind, and these complexes act as the main drivers of the eukaryotic cell cycle. The activity of CDK is regulated by multiple mechanisms, which include cyclin association, specific phosphorylations and dephosphorylations, and interactions with CDK inhibitors (CDKIs) (Sherr, 1996; Sherr et al., 1999; Keiichi and Nakayam, 2005). Throughout the cell cycle, cyclins and CDKIs oscillate due to proteolysis (Murray, 2004), and it is the ubiquitin-proteasome pathway that mediates the degradation of these regulatory proteins (Weissman, 2001; Pickart, 2004). In model eukaryotes such as mammals, when the restriction point is reached, the CDK4/6/cyclin D complex phosphorylates the retinoblastoma protein (pRb) during early G1, which leads to the activation of E2Fs, a family of cell cycle regulatory transcription factors that activate transcription of S-phase genes (Cam and Dynlacht, 2003). The E2F transcription factor family is comprised of 8 members (Singh et al., 2020). E2Fs 1, 2 and 3 interact with pRb and have a strong transcriptional activation domain (Slansky and Farnham, 1996). E2Fs 4 and 5 have little or no transcriptional activity, and E2Fs 6-8 do not bind pRb and repress transcription by recruitment of transcriptional co-repressors (Frolov and Dyson, 2004; DeGregori and Johnson, 2006). E2F1, 2 and 3 can strongly induce S-phase entry and their activity is necessary for the induction of many genes required for S-phase progression. pRb inhibits cell cycle progression by repressing the transcriptional activity of E2Fs and their target genes (Dimova and Dyson, 2005). E2Fs are absent from Apicomplexans and, to date, no E2Fs have been identified in dinoflagellates. Therefore, the activation of cell cycle proteins must be occurring through an alternative mechanism.

The K. brevis CDK sequences appear to be divergent from the well-studied yeast and mammalian CDKs and cyclins. In yeasts, there is one CDK that drives the cell cycle, Cdc2 in S. pombe and CDC28 in S. cerevisiae (Simanis et al., 1987). A plant homolog of the CDK cdc2 was first identified in Chlamydomonas and pea (Pisum sativum), followed by numerous other plant species (Francis, 2007). There are seven classes of plant CDKs, A-G, of which the CDK As, Bs, Ds and Fs are involved in cell cycle control (Umeda et al., 2005; Francis, 2007). Table 1 shows that the K. brevis CDKs appear to be closely related to plant CDK C (Arabidopsis and Oryza) as well as protist CDK7 (). Many examples of CDKs and cyclins have been found in protists such as Leishmania mexicana (Hassan et al., 1997), Toxoplasma gondii (Kvaal et al., 2002; Khan et al., 2002), Plasmodium falciparum (Le Roch et al., 2002) and Trypanosoma brucei (Ellis et al., 2004). In L. mexicana, one CDK (or CRK, cyclin-dependent related kinase) has been characterized that is a homolog of cdc2, which is human CDK1 (Gomes et al., 2010). This same study purified and expressed 7 putative recombinant CRKs from L. major. Two putative isoforms of a cdc2-related protein have been identified in T. gondii (Donald et al., 2005). In T. brucei, 8 CRKs have been identified that appear to have different functions over the cell cycle (Gourguechon and Wang, 2009). The CDKs, cyclins, and a CDK effector protein in P. falciparum were identified using homology-based PCR and database mining approaches (Doerig et al., 2008). It was recently found that a Plasmodium CDK (Pfmrk) binds to an effector protein, PfMAT1, a putative CDK activating kinase, which activates Pfmrk; this complex co-localizes to the nucleus and regulates DNA synthesis (Jirage et al., 2010). Phylogenetic analysis of CDK sequences from members of the protist groups Giardia, Trypanosoma and several from Plasmodium showed that these CDKs do not strongly associate with any established CDK family, and it is suggested that although they are orthologs of defined CDK groups, they have diverged to the point that they are not recognizable using sequence-based phylogenetic methods (Guo and Stiller, 2004). A cdc2-related kinase in T. brucei was shown to have a "PQTALRE" motif (Ellis et al., 2004) and in P. falciparum, the CDK-related kinase PfPK6 possesses a SKCILRE motif (Bracchi-Ricard et al., 2000). Therefore, protist CDKs appear to be divergent from metazoan CDKs, although functional conservation exists throughout eukaryotes.

The divergence of dinoflagellate CDK/cyclins may not be surprising given their unusual cell cycle. The Apicomplexan Plasmodium possesses 6 CDK-like kinases that are divergent from the typical eukaryotic CDKs, and to date only 3 of these have shown cyclin-dependent activity in vitro (Doerig et al., 2002). CDKs have been identified in a limited number of other marine phytoplankton. In the diatoms Thalassiosira pseudonana and Phaeodactylum tricornutum, large scale data mining of their recently sequenced genomes identified multiple key cell cycle regulators such as CDKs, CDK interactors and cyclins (Huysman et al., 2010). Five CDKs were identified in P. tricornutum, and it appears that both of these algal species have novel components, including diatom-specific cyclins. Similarly, a dinoflagellate CDK5-like protein has been characterized in Lingulodinium polyedrum by protein microsequencing and phylogenetic analysis (Bertomeu et al., 2007), but this still leaves the question of CDK control of S-phase entry in dinoflagellates unanswered. CDK5 in metazoans is involved in neuronal development (Lalioti et al., 2010). This dinoflagellate CDK5-like kinase was shown to react with an anti-PSTAIRE antibody. The PSTAIRE motif is the conserved cyclin binding motif on CDKs. Thus, this CDK would appear to be involved in the cell cycle. The K. brevis full length CDK sequence has highest homology to the plant CDKC, not mammalian CDK5, and has a motif (PLTAIEK) that aligns with the cyclin binding domain of plants (Arabidopsis), other algae (Chlamydomonas) and other protists (Plasmodium). However, this CDK sequence has conserved ATP binding residues, and the CDK inhibitor (CDKI) studies show that there are G1 CDK homologs in K. brevis.
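The degree to which the cyclin binding motifs named in this chapter depart from the canonical PSTAIRE motif can be expressed as a simple position-by-position identity count. The sketch below is illustrative only: it assumes the seven-residue motifs are compared ungapped and in register, and the function name is hypothetical.

```python
CANONICAL = "PSTAIRE"

# Motifs quoted in the text, keyed by the organism in which each was reported.
MOTIFS = {
    "K. brevis": "PLTAIEK",
    "T. brucei": "PQTALRE",
    "P. falciparum PfPK6": "SKCILRE",
}


def identical_positions(motif, reference=CANONICAL):
    """Count positions at which two equal-length motifs carry the same residue."""
    if len(motif) != len(reference):
        raise ValueError("motifs must have the same length")
    return sum(a == b for a, b in zip(motif, reference))


if __name__ == "__main__":
    for organism, motif in MOTIFS.items():
        n = identical_positions(motif)
        print(f"{organism}: {motif} matches PSTAIRE at {n}/{len(CANONICAL)} positions")
```

An identity count of this kind ignores conservative substitutions, so it gives only a rough indication of motif divergence.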

The cyclin partners of CDKs are equally poorly characterized in dinoflagellates. Cyclins were originally identified because their abundance changed over the cell cycle (Murray, 2004). Cyclins bind to CDKs through the PSTAIRE motif on CDKs, and these binding motifs appear to diverge as CDK sequence and function diverge within and between organisms. This is shown in this Chapter by the divergence of the K. brevis cyclin binding domain from the other motifs (Figures 11 and 12). In kinetoplastids, the CDK-related kinases show the expected biochemical interactions of CDKs and interact with cyclins; however, the PSTAIRE motif is degenerate (Ellis et al., 2004). A mitotic cyclin homolog was identified in the dinoflagellate L. polyedrum through construction of a cDNA library followed by PCR, and was confirmed by functional complementation assays in yeast to act as a mitotic cyclin (Bertomeu and Morse, 2004). Cyclin B homologs have been identified in both K. brevis (Barbier et al., 2003) and Crypthecodinium cohnii (Barbier et al., 1998) by immunoreactivity with a yeast cyclin B antibody. The identification of G1-specific CDKs and cyclins in dinoflagellates will require additional sequencing and, in particular, obtaining full length sequences that are currently largely missing in EST collections.

CDK Inhibition

Since we still lack information on G1-specific CDKs or cyclins in dinoflagellates, we began to investigate how S-phase genes may be modulated in the absence of transcriptional control by testing whether their expression is dependent on CDK activity. To discriminate between CDK classes, we employed both the broad inhibitor olomoucine and the CDK4-specific inhibitor fascaplysin in G1 to inhibit CDK activity. Both CDK inhibitors prevented progression into S-phase and expression of the 37 kDa form of PCNA when added to cells at the dark to light transition (LDT0). The olomoucine results show that there is most likely a CDK1, 2, 4 and/or 6 functional homolog involved in the G1/S transition. The G1-specific block produced by fascaplysin points to the involvement of a CDK4-like protein in the G1/S transition. To inhibit CDK4, fascaplysin binds in the ATP binding pocket of CDK4 by interacting through a bidentate (having two atoms from which electrons can be donated) hydrogen bond donor/acceptor pair, forming one hydrogen bond with the NH backbone of Val96 via the carbonyl of fascaplysin and N1 of the adenine ring (Soni et al., 2000). The inhibitory action of fascaplysin is dependent on CDK4 being bound to its cyclin partner, with an IC50 of 0.35 µM in mammalian cells (Huwe et al., 2003). The specificity of fascaplysin for CDK4 as opposed to CDK2 or CDK6 is based on differences in structure. CDK4 has an acidic residue (E144) where CDK2 has a neutral residue (Q131), and a neutral residue in CDK4 (T102) is substituted for the positively charged K89 in CDK2, leading to a two-unit increase in the formal charge of the ATP binding pocket of CDK2 relative to CDK4, which renders a more favorable electrostatic energy with CDK4 (McInnes et al., 2004). Among mammalian CDKs there is strong sequence homology between the catalytic domains where ATP binds, suggesting similar three-dimensional structure (Gray et al., 1999). Interestingly, most of these residues are conserved among the CDKs within a species, for example in CDKs 1-9 in humans (Gray et al., 1999), and many are conserved even in K. brevis, C. reinhardtii, a diatom and two plant species (Figure 11).
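The two-unit charge difference between the CDK2 and CDK4 ATP binding pockets follows from simple bookkeeping over the two substituted positions. The sketch below is only an illustration of that arithmetic: it assumes side chains at their usual charge states near neutral pH (Asp/Glu = -1, Lys/Arg = +1, all others 0) and considers only the two residue pairs named in the text, not the whole pocket.

```python
# Assumption: formal side-chain charges near neutral pH; histidine is treated as neutral.
SIDE_CHAIN_CHARGE = {"D": -1, "E": -1, "K": +1, "R": +1}


def formal_charge(residues):
    """Sum the formal side-chain charges of a list of one-letter residue codes."""
    return sum(SIDE_CHAIN_CHARGE.get(r, 0) for r in residues)


# The two pocket positions that differ between the kinases, as described in the text.
CDK4_POCKET = ["E", "T"]  # E144, T102
CDK2_POCKET = ["Q", "K"]  # Q131, K89

if __name__ == "__main__":
    cdk4 = formal_charge(CDK4_POCKET)
    cdk2 = formal_charge(CDK2_POCKET)
    print(f"CDK4: {cdk4:+d}, CDK2: {cdk2:+d}, difference (CDK2 - CDK4): {cdk2 - cdk4:+d}")
```

With these assumptions the CDK4 positions sum to -1 and the CDK2 positions to +1, giving the two-unit difference described above.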

Although these results do not indicate a direct link between CDK and S-phase protein expression, they do suggest that some CDK-dependent pathway is responsible for the translation of PCNA in the absence of transcriptional regulation. In Arabidopsis, CDKA, the plant cdc2 homolog active in G1, has been found to interact with the eIF4A translation factor, which suggests a direct link between the cell cycle and the regulation of translation (Hutchins et al., 2004). This interaction is abundant in actively proliferating Arabidopsis cells and absent from cells that are not dividing. CDK1 phosphorylation of the eIF-4E binding proteins (4E-BP1) during mitosis releases eIF4, allowing translation to occur in HeLa (cervical cancer) cells (Heesom et al., 2001), which is another example of CDK-controlled cell cycle progression involving translational control. In K. brevis it may be that G1 CDKs are also interacting with the translational machinery to regulate progression of the cell cycle. Interestingly, no CDKIs such as p21, p27 or p57, which are responsible for preventing the initiation of DNA replication (Furstenthal et al., 2001), have been identified in K. brevis. CDKIs, such as members of the CIP and KIP families, have also not been identified in other protists including Plasmodium (Doerig et al., 2002) or diatoms (Huysman et al., 2010), and it has been suggested that their sequences may be too divergent to allow their identification by homology searches. Clarification of the mechanisms by which CDKs regulate S-phase proteins in dinoflagellates first requires the identification of the CDKs active at G1/S. The current study suggests a protein with CDK4-like activity will be involved.

Growth Rate Specific PCNA

Because of its association with cell proliferation, PCNA has been used for prognosis of tumor and cancer development (Lee et al., 1995) and has potential for studying growth in phytoplankton. Examination of PCNA at short increments over S-phase in K. brevis showed that the ~37 kDa form is quite abundant throughout DNA replication during log phase growth (Figure 15). It was expected that the lower band of 28 kDa would be present at LDT2, at which time it was barely visible, and that the 37 kDa band would gradually increase and dissipate over S-phase. However, due to the high synchronicity of these cultures, if there is a mechanism producing this higher band of PCNA, such as a post-translational modification, most cells may be utilizing this physiological response. In the dinoflagellate P. donghaiense, PCNA remained at a high level during exponential growth and decreased when cultures entered late stationary phase (Liu et al., 2005). In another algal protist, the coccolithophore Pleurochrysis carterae, a 36 kDa PCNA homolog was shown to have higher abundance during the exponential growth phase and was diminished in the stationary growth phase (Lin and Corstjens, 2002).

In the dinoflagellate Pfiesteria piscicida, PCNA was found to be more abundant in the exponential growth phase compared to stationary phase based on western blot analysis (Zhang et al., 2006). Therefore, in this study PCNA protein expression was investigated over the growth curve in K. brevis to determine whether PCNA expression decreases predictably as cells proceed from log into stationary phase. Surprisingly, we repeatedly found a higher molecular weight band of ~47 kDa cross-reactive with the PCNA antibody only in late stationary phase cultures on the cusp of decline. With further development, we propose that this may lend itself to use as a marker of terminating blooms. In order to confirm the results, Day 6 (log phase) S-phase cultures were compared to Day 12 (stationary phase) S-phase cultures, and it was found that the 37 kDa form was prevalent in log phase and absent in the stationary phase cultures (Figure 8). This suggests that PCNA in stationary phase cultures could be marked for degradation when the cell no longer needs this protein for proliferation, possibly by ubiquitination.

Potential Utility of PCNA as a Biomarker for Harmful Algal Bloom Monitoring

The comparison of PCNA abundance in log and stationary phase cultures and cultures with decreased growth rate provided insight into whether PCNA might be useful as a biomarker for growth. PCNA expression levels were compared in log phase cells at different growth rates in order to determine if a linear relationship exists between S-phase progression and PCNA expression levels. We found significant linear correlation between percent of the cell population in S-phase and expression of the 37 kDa form of PCNA. The 37 kDa band was prominent in cultures sampled during S-phase (LDT12) on Day 6, mid-log phase of growth. In stationary phase cultures (Day 12) sampled at the same time (LDT12), this band is completely absent. The neutral density samples provide important information in that they have reduced growth rate and, when sampled at the typical log phase time point, show decreased production of the PCNA protein. Therefore, this growth pattern along with the differences in PCNA abundance may be indicative of the growth state and informative of the phase of K. brevis bloom progression. Indeed, the linear correlation analysis did show a significant relationship between the 37 kDa PCNA protein abundance and the percentage of the cell population in S-phase of the cell cycle (Figure 17). Therefore, not only does the 37 kDa PCNA have the potential for use as a biomarker of actively growing blooms, but the 47 kDa form may also be useful in identifying bloom termination, since this protein only appears in dying cultures.
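A linear correlation of the kind described above can be computed as a Pearson coefficient between percent S-phase and band intensity. The sketch below is a generic illustration, not the analysis behind Figure 17: the function name is hypothetical, the percent S-phase values are those printed for the log and stationary phase replicates in Figure 8, and the densitometry values are invented placeholders because the measured intensities are not given in the text.

```python
import math


def pearson_r(x, y):
    """Return the Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Percent of cells in S-phase at LDT12 (log phase, then stationary phase replicates).
PERCENT_S_PHASE = [51.7, 61.1, 64.5, 41.1, 60.3, 6.8, 0.0, 8.9, 6.6, 6.5]

# HYPOTHETICAL 37 kDa band intensities (arbitrary units), for illustration only.
BAND_INTENSITY = [48.0, 55.0, 60.0, 35.0, 57.0, 4.0, 0.0, 6.0, 5.0, 3.0]

if __name__ == "__main__":
    print(f"r = {pearson_r(PERCENT_S_PHASE, BAND_INTENSITY):.3f}")
```

Testing whether such a coefficient is significant would additionally require a p-value, which a statistics package would normally supply.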

It will be interesting to test the utility of these potential biomarkers and their relationship to bloom progression in the field. Field populations have been shown to have cell cycle behavior similar to that of laboratory cultures, with all bloom patches tested proceeding through a circadian regulated G1, S-phase and G2/Mitosis in a tightly phased and uniform manner (Van Dolah et al., 2008). Also similar to laboratory cultures, there is variation in the percent of the cell population that proceeds through the cell cycle on any given day. Based on these similarities between field populations and laboratory cultures, testing of PCNA protein levels at LDT12-14 could provide predictive information on the growth phase of a bloom. Currently, measurements of in situ growth rates of blooms require either repeated analysis by flow cytometry to determine the fraction of cells dividing over the course of a 24 h period, or 24 h 14C uptake measurements as proxies of growth. The former method requires a research vessel to stay on a bloom patch well into the night, making it infeasible for routine monitoring. The latter method both requires radioactivity and artificially manipulates the sample, thereby introducing error in the measurement. With currents, weather and other physical oceanographic conditions, the results will most likely be different from those in a controlled laboratory setting. Notwithstanding, these studies, for the first time, provide a potential protein biomarker for growth in K. brevis that may be sampled during mid-day, when cells are in S-phase.

CONCLUSIONS

The studies presented in this Chapter provide information on how S-phase gene expression may be regulated in an organism that displays post-transcriptional control of its cell cycle. Although the analysis of CDK contigs present in the K. brevis library did not identify a candidate G1 CDK, the inhibition of S-phase entry by fascaplysin provides evidence for a CDK4-like kinase in this transition in K. brevis. The PCNA protein decreased under CDK inhibitor (CDKI) treatment. Whether CDK is directly or indirectly interacting with these DNA replication proteins, in the absence of transcriptional control, warrants further investigation. Additionally, there appear to be post-translational modifications of PCNA occurring not only over the cell cycle, but over the growth curve as well. Evidence is provided here for the potential sumoylation of PCNA over the cell cycle and ubiquitination of PCNA over the growth curve. These post-translational modifications have been shown to be highly conserved across eukaryotes. Sumoylation is thought to be involved in genomic stability. The prevalence of this form of PCNA in K. brevis may reflect the necessity for genome stability due to its unusually large genome, especially during DNA replication. Post-translational modification may be one level at which this unusual organism regulates cellular processes. These studies provide the first information on post-translational modifications in dinoflagellates. This level of protein regulation may be a very important system in K. brevis and other protists. The results presented in this chapter provide information necessary to determine the level (translational or post-translational) at which to develop biomarkers for bloom progression.

[Figure 1 image: multiple sequence alignments; the alignment columns are not recoverable from the text extraction. (A) SUMO sequences from K. brevis, P. piscicida, P. tricornutum, H. sapiens (SUMO3), S. pombe, A. thaliana (SUMO1), C. reinhardtii and S. cerevisiae (SMT3). (B) Ubiquitin sequences from K. brevis, P. falciparum, H. sapiens, S. pombe, A. thaliana and C. reinhardtii.]

Figure 1: Multiple alignments of (A) the SUMO polypeptide and (B) ubiquitin. Outlined are the conserved G-G motifs that are involved in lysine conjugation on the substrate protein.

[Figure 2 image: K. brevis PCNA amino acid sequence, residues 1-250, with annotated regions A, B and C. Only residues 71-140 are legible in the text extraction:
MNVESLGKIFKMCGPSDSLKLKWQNDADTVSFQCESGEDDRIADFDLKLMQIESEHMEIPEQQYKVLARL]

Figure 2: K. brevis PCNA amino acid sequence used for analysis. (A) DNA polymerase binding site. (B) Conserved ubiquitin and SUMO binding lysine residue in S. cerevisiae. (C) RFC binding site. Boxed in red are the ubiquitin binding lysine residues predicted by the UbPred program and in green are the sumoylated lysine residues predicted by the SUMOsp program.

[Figure 3 image: western blots. (A) Ubiquitin, 5 sec. exposure. (B) SUMO, 30 sec. exposure, with lanes H1, H2 and H3. Molecular weight markers shown (kDa): 62, 49, 38, 28, 17, 14, 6.]

Figure 3: (A) The ubiquitin protein in K. brevis at LDT 2, 10, 12, 14, 16 and 18. (B) The sumo protein in K. brevis. The first 3 lanes are K. brevis protein extracts from LDT2 (G1), LDT12 (S-phase) and LDT18 (G2/M). Lanes 4, 5 and 6 are human recombinant sumo1, 2 and 3, respectively. Ten micrograms of K. brevis protein extract were loaded in each well and five micrograms of recombinant SUMO proteins were loaded.

[Figure 4 image: PCNA IP/PCNA Western. Molecular weight markers (kDa): 98, 62, 49, 38, 28, 17. Labeled bands: 65 kDa PCNA?, heavy chain (50 kDa), 47 kDa PCNA, 37 kDa PCNA, 28 kDa PCNA, light chain (25 kDa).]

Figure 4: Immunoprecipitation using the PCNA antibody to obtain all forms of PCNA. The blot was probed for Western blotting with the same antibody used for immunoprecipitation. The left lane is the negative control (no protein lysate), which shows a faint band at 50 kDa, representing the heavy chain of the antibody. The right lane shows immunoprecipitated forms of PCNA.

[Figure 5 image: western blots. (A) Lanes 1-3; (B) lanes 1-2. Molecular weight markers (kDa): 62, 49, 38, 28, 18, 14.]

Figure 5: (A) Immunoprecipitation using the PCNA antibody to obtain all forms of PCNA. The blot was probed for Western blotting with anti-sumo. Lane 1: Immunoprecipitation of PCNA showing a possible sumoylated form just above 38 kDa. Lane 2: Negative control of no protein lysate showing just the heavy (50 kDa) and light (25 kDa) chains from the antibody. Lane 3: Protein lysate from LDT14 showing components of the sumo pathway/sumo substrates. (B) Blot (A) stripped and probed with PCNA. The left lane is the immunoprecipitated PCNA that is putatively sumoylated and the right lane is the negative control.

[Figure 6 image. (A) Growth curve on a logarithmic y-axis (100 to 100000) against Time (Day), 0 to 14, with sampling points marked for Day 6, 8, 10, 12 and 14; growth rate annotations legible in the plot are K' = 0.53, 0.29, 0.16 and -0.13. (B) Bar chart of cell cycle phase percentages (legend includes %G1 and %G2/M) by Sampling Day (Day 6, 8, 10, 12 and 14); data labels legible in the plot are 60%, 11%, 6% and 0%.]

Figure 6: (A) Combined growth curve of K. brevis in 125 mL flasks. Sampling dates included Day 6, 8, 10, 12 and 14. (B) Flow cytometry data at LDT12 on Day 6, 8, 10, 12 and 14.

[Figure 7 image. (A) Western blot with lanes D6, D8, D10 and D12 repeated for 5 replicates; molecular weight markers (kDa): 62, 49, 38, 28, 17, 14. (B) and (C) Densitometry plots for the 37 kDa and 47 kDa PCNA against Time (Day): 6, 8, 10, 12. ANOVA p-values printed on the densitometry panels: p=0.1079 and p=0.1886.]

Figure 7: Western blot analysis of PCNA over the K. brevis growth curve. (A) PCNA at LDT12 in 5 biological replicates. (B) Densitometry of the ~37 kDa PCNA. (C) Densitometry of the ~47 kDa PCNA.

[Figure 8 image: PCNA at LDT12 in Log vs. Stationary Phase cells, D6 vs. D12. (A) Log phase replicates (n=5); % S-phase: 51.7, 61.1, 64.5, 41.1, 60.3. (B) Stationary phase replicates; % S-phase: 6.8, 0, 8.9, 6.6, 6.5. (C) Lanes D14, PL, D14, PL. Molecular weight markers (kDa): 62, 49, 38, 28, 18, 14.]

Figure 8: PCNA in log and stationary phase cultures. (A) Western blots of PCNA in flask cultures showing the ~37 kDa band during log phase at LDT12. Flow cytometry data show that 55.74% of the log phase cell population was proceeding through S-phase at LDT12. (B) Western blot data showing a PCNA of ~45 kDa at LDT12 in stationary phase cultures. PL=protein lysate from a log phase culture at LDT12 showing S-phase bands. (C) Peptide competition assay showing the specificity of the ~47 kDa form of PCNA. Left panel is a western blot showing a LDT12 protein lysate from Day 14 (late stationary) and log phase cultures. Right panel is a western blot with antibody incubated in the presence of the peptide antigen to block specificity. PL=protein lysate from a log phase flask culture at LDT12 showing absence of S-phase bands.

[Figure 9 image. (A) Ubiquitin in K. brevis: Growth Curve samples. (B) PCNA: Growth Curve samples. Lane labels include D6, D8, D10 and D12 and cultures Kb9, Kb20, Kb19 and Kb7. Molecular weight markers (kDa): 62, 49, 38, 28, 17, 14, 6, 3.]

Figure 9: Pattern of ubiquitination in K. brevis matches the presence of the hypothesized ubiquitinated PCNA in biological replicates over the growth curve. (A) Western blot with anti-ubiquitin antibody showing the putative ubiquitin pathway components and substrates in Day 10 and 12 samples. (B) Western blot of PCNA over the growth curve in two sets of biological replicates. Kb 9, 20, 19 and 7 refer to the same 1 L biological replicate cultures. Highlighted in red is the culture (one biological replicate) which is beyond stationary phase and according to cell counts has cell death occurring.

[Figure 10 image: nucleotide sequence of Kb CDK Contig5041 (positions 1 to ~1660) aligned with the 5' spliced leader, TCCGTAGCCATTTTGGCTCAAG.]

Figure 10: Full length sequence of a cyclin-dependent kinase in K. brevis. The presence of the spliced leader at the 5' end of the sequence is shown in the box, as well as the ATG start codon and the poly A tail at the 3' end.
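The three features named in this caption (5' spliced leader, ATG start codon, 3' poly A tail) can be located programmatically. The following is a minimal sketch, not the method used in this work. It assumes a cDNA sequence in the DNA alphabet and uses the spliced leader shown in Figure 10. The function name, the `min_poly_a` threshold and the toy transcript are hypothetical, and the first ATG downstream of the leader is only a candidate start codon.

```python
import re

# 22-nt spliced leader as shown in Figure 10 (DNA alphabet).
SPLICED_LEADER = "TCCGTAGCCATTTTGGCTCAAG"

def annotate_transcript(seq, min_poly_a=10):
    """Locate the spliced leader, the first downstream ATG and a 3' poly(A) run."""
    seq = seq.upper().replace(" ", "")
    has_sl = seq.startswith(SPLICED_LEADER)
    # Search for a candidate start codon only downstream of the leader, if present.
    search_from = len(SPLICED_LEADER) if has_sl else 0
    atg = seq.find("ATG", search_from)
    # A poly(A) tail is taken to be a run of at least min_poly_a A's at the 3' end.
    tail = re.search(r"A{%d,}$" % min_poly_a, seq)
    return {
        "has_spliced_leader": has_sl,
        "first_atg": atg if atg >= 0 else None,          # 0-based index
        "poly_a_start": tail.start() if tail else None,  # 0-based index
    }

# Hypothetical toy transcript, NOT the K. brevis CDK sequence.
toy = SPLICED_LEADER + "GGCC" + "ATGGCTTGA" + "CCGG" + "A" * 12
result = annotate_transcript(toy)
```

Confirming the true start codon would still require checking the reading frame and the homology of the translated product.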

[Figure 11 image: amino acid alignment of the K. brevis CDK (Contig5041) with CDKs from C. reinhardtii and T. pseudonana and CDKCs from O. sativa and S. lycopersicum.]

Figure 11: Amino acid sequence of K. brevis CDK. This sequence has highest homology to the plant CDKCs found in O. sativa and S. lycopersicum, as well as to the diatom CDK from T. pseudonana and the CDK from the green alga C. reinhardtii. Shown in the black box is the cyclin binding domain present in CDKs. The red box highlights the residue in the T-loop of CDKs that is phosphorylated, which in turn activates the CDK. The asterisks (*) represent the residues that are involved in ATP binding and that are conserved in this K. brevis CDK.

[Figure 12 image: partial alignment of Plasmodium CDK 7, two K. brevis CDK 7-like sequences, a K. brevis CDK C-like sequence, Arabidopsis CDK C and a Chlamydomonas CDK.]

Figure 12: Partial alignment of K. brevis cyclin dependent kinases (CDKs) and CDK 7 from Plasmodium falciparum, CDK C from Arabidopsis thaliana, and a CDK from Chlamydomonas reinhardtii. Boxed region shows the conserved cyclin binding motif that characterizes CDKs.

[Figure 13 image: chemical structures of (A) olomoucine and (B) fascaplysin.]

Figure 13: Cyclin-dependent kinase inhibitors used for the CDKI studies. (A) Olomoucine, shown to inhibit CDKs 1, 2 and 5; (B) fascaplysin, a CDK4-specific inhibitor.


Figure 14: SYTOX staining of K. brevis showing (A) live cells with no nuclear staining and (B) a dead cell with decreased chloroplast autofluorescence and stained nuclei.

[Figure 15 image: (A) cell cycle phase distribution (G1, S, G2/M) versus hours after lights on (0 to 24); (B) PCNA western blot at 2, 10, 12, 14, 16 and 18 hours after lights on, with molecular weight markers in kDa; (C) densitometry of the ~37 kDa PCNA band versus hours after lights on.]

Figure 15: PCNA abundance throughout S-phase. (A) Flow cytometry data showing ~60% of the cell population proceeding through S-phase between LDT12 and 14. (B) Western blot analysis of PCNA throughout S-phase, where the 28 kDa PCNA is only present in a small amount at LDT2. A long exposure was necessary for this band to appear. (C) Densitometry of the ~37 kDa band shows a highly significant difference between LDT2 and all other S-phase time points.

[Figure 16 image: (A) western blot titled "PCNA at LDT12 (mid-S-phase)" with 49, 38 and 28 kDa markers; (B) bar chart of densitometry as percent of control (0 to 100).]

Figure 16: PCNA protein abundance at LDT12 after inhibition with CDKIs. "S-phase bands" refers to the positive control for both forms of PCNA from a 1 liter culture. (A) Western blot analysis. (B) Densitometry of the 5 replicates when normalized to the control.

[Figure 17 image: scatter plot of 37 kDa PCNA abundance (y-axis) versus % S-phase (x-axis, 0 to 80).]

Figure 17: Linear correlation analysis of percent of the cell population in S-phase and the 37 kDa PCNA protein abundance. R2 = 0.7812, p
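For readers who wish to reproduce this kind of analysis, the following is a minimal sketch of how a Pearson correlation and the corresponding R2 can be computed for a simple linear fit. The paired values below are hypothetical placeholders, not the data plotted in Figure 17, and the function and variable names are illustrative assumptions. Only the final line uses a number from the caption.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical placeholder values, NOT the data plotted in Figure 17.
percent_s_phase = [5.0, 20.0, 40.0, 60.0]
pcna_37kda_signal = [8.0, 21.0, 33.0, 52.0]

r = pearson_r(percent_s_phase, pcna_37kda_signal)
r_squared = r ** 2  # for a simple linear regression, R2 equals r squared

# The reported R2 = 0.7812 corresponds to |r| of about 0.884.
reported_abs_r = sqrt(0.7812)
```

In other words, about 78% of the variance in the 37 kDa PCNA signal is accounted for by a linear relationship with the S-phase fraction.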

Table 1: Sequences from K. brevis with homology to cyclin dependent kinases. Shown are the top hit organisms and the e-values.

Marine Genomics ID  Protein                                 Species                  E-value
108857              cyclin dependent kinase C               Arabidopsis lyrata       3.00E-57
2031566             cell cycle-related kinase               Esox lucius              5.00E-22
105637              cyclin-dependent kinase-related kinase  Cryptosporidium hominis  1E-53
2033180             cyclin-dependent kinase-related kinase  C. hominis               6.16E-66
2031949             cyclin-dependent kinase 1; p34cdc2      Dunaliella tertiolecta   3.00E-09
107238              cyclin G associated kinase              Bos taurus               4E-29
2079904             cdc2                                    Drosophila melanogaster  3.00E-08

Table 2: Fascaplysin range finder study results. The concentration determined to induce the highest percent G1 block with the highest viability and best reproducibility was 1.5 µM.

Treatment            %G1    %S     %G2/M  %Viability
Control              39.40  60.60  0.00   95.5
H2O                  40.30  59.53  0.17   98
Fascaplysin 5 µM     90.40  0.27   9.37   84.5
Fascaplysin 2.5 µM   95.50  1.00   3.53   37
Fascaplysin 1.5 µM   92.17  5.00   2.83   80.4
Fascaplysin 0.75 µM  75.73  21.83  2.40   68

Table 3: Olomoucine range finder study results. The concentration determined to induce the highest percent G1 block with highest viability and the best reproducibility was 12.5 µM.

Treatment           %G1    %S     %G2/M  %Viability
Control             51     48.85  0.15   100
Olomoucine 50 µM    97.8   0      2.2    86
DMSO (37.25 µL)     79.6   16.1   4.35   95
Olomoucine 25 µM    98.5   0      1.5    92
Olomoucine 12.5 µM  93.8   0.47   5.73   96
DMSO (9.3125 µL)    26.5   68.05  5.5    96
Olomoucine 6.25 µM  83.97  13.5   2.5    93.5
DMSO (4.656 µL)     44.35  55.3   0.35   92

Table 4: Flow cytometry of K. brevis cell populations at LDT12 after inhibition with the CDKIs, olomoucine or fascaplysin, for 12 hours. Percent viability refers to percent of the cell population viable after a 12 hour exposure to the CDKIs as determined by SYTOX staining. Samples from this study were used for protein abundance analysis.

Treatment           %G1    %S     %G2/M  %Viability  Cell Counts (Ave. cells/mL)
Control             39.6   55.54  4.86   96.8        15,206
H2O                 41.94  54.34  3.76   98.6        15,675
Fascaplysin 1.5 µM  95.84  0.84   3.32   80          15,917
DMSO                58.78  39.58  1.62   98.6        15,354
Olomoucine 12.5 µM  94.84  2.54   2.64   99          13,369
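The size of the S-phase block in Table 4 can be expressed as a relative reduction against a vehicle control. The sketch below is illustrative only. It copies the %S values from Table 4 and assumes that fascaplysin is compared with the H2O control and olomoucine with the DMSO control. That pairing is inferred from the range finder tables and is not stated explicitly here.

```python
# Percent of cells in S-phase at LDT12, copied from Table 4.
s_phase = {
    "Control": 55.54,
    "H2O": 54.34,
    "Fascaplysin 1.5 uM": 0.84,
    "DMSO": 39.58,
    "Olomoucine 12.5 uM": 2.54,
}

# Assumption: each inhibitor is compared with its likely vehicle control.
vehicle_for = {
    "Fascaplysin 1.5 uM": "H2O",
    "Olomoucine 12.5 uM": "DMSO",
}

def percent_reduction(treated, vehicle):
    """Relative drop in the S-phase fraction, as a percentage of the vehicle value."""
    return 100.0 * (vehicle - treated) / vehicle

reductions = {
    drug: round(percent_reduction(s_phase[drug], s_phase[vehicle]), 1)
    for drug, vehicle in vehicle_for.items()
}
# reductions -> {"Fascaplysin 1.5 uM": 98.5, "Olomoucine 12.5 uM": 93.6}
```

Under this pairing, both inhibitors remove more than 90% of the S-phase population seen in the matching vehicle control.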

Chapter 5: Conclusions, Significance and Future Directions

The studies presented in this dissertation were undertaken to gain a better understanding of the processes that regulate the cell division cycle in the dinoflagellate K. brevis. K. brevis is responsible for toxic red tides that occur almost annually in the Gulf of Mexico and cause marine animal mortalities, human illness and economic loss by impacting fisheries and tourism. The study of growth in this organism will lead to a better understanding of the processes driving bloom formation and to the development of field biomarkers for assessing bloom status that will improve modeling and mitigation efforts.

Since growth in blooms results primarily from vegetative cell division, processes regulating the cell cycle are central to bloom growth. The cell cycle of dinoflagellates has been studied for many decades (Sweeney and Hastings, 1958; Olsen and Chisholm, 1986; Homma and Hastings, 1989; Roenneberg, 1996; Taroncher-Oldenburg, 1997; Van Dolah and Leighfield, 1999; Leighfield and Van Dolah, 2001; Litaker et al., 2002; Dagenais-Bellefeuille et al., 2008). However, it has mostly been investigated in terms of cellular physiology, and only in the past decade, with the development of genomics tools, has the molecular biology of these organisms begun to be elucidated.

Dinoflagellates are a unique group of organisms in which to study the cell cycle because of their position in the tree of life and the many novel adaptations found only in this group of primitive, unicellular eukaryotes. The diversity of cellular and molecular systems in microbial eukaryotes is staggering, and some emerging patterns are indicative of convergent evolution, which appears to have played a major role in shaping the overall organization of eukaryotic cells at all levels (Lukes et al., 2009). Dinoflagellates have acquired photosynthesis through secondary (and in the case of K. brevis, tertiary) endosymbiotic events and, through these evolutionary events, have obtained 3 genomes (nuclear, chloroplastic and mitochondrial) (Keeling, 2009). Dinoflagellates deviate from the canonical view of gene expression, and the central dogma, in that they use trans-splicing and have polycistronic messages, which are unlike prokaryotic polycistronic messages in that they seem to contain tandemly repeated copies of the same gene (Zhang et al., 2007; Lidie and Van Dolah, 2007; Bachvaroff and Place, 2008; reviewed in Lukes et al., 2009). These characteristics are missing from their nearest neighbors, the apicomplexans and ciliates, and thus appear to be a secondarily evolved state. Kinetoplastids, such as the parasitic Trypanosoma, also possess these unusual molecular biological characteristics, which are absent from their evolutionary neighbors the Euglenids, although they are separated from dinoflagellates by more than 900 million years. As parasites, trypanosomes have to deal with stressful transitions between the environments found in different host species, in which these organisms have different life stages. It is unclear what conditions have driven their convergent evolution, but both trans-splicing and polycistronic messages appear to confer an adaptive advantage to dinoflagellates and kinetoplastids despite their different life styles.

The overarching objective of this project was to determine how K. brevis regulates its cell cycle in the absence of transcriptional control of gene expression. As the cell cycle is one of many processes tightly coupled to the photoperiod, this project first investigated diel patterns of protein expression in K. brevis. Unlike in the dinoflagellate L. polyedrum, which has been shown to have a circadian rhythm of translation where translation takes place only during the night, in K. brevis there does not appear to be circadian regulation of global protein synthesis; we might therefore expect to see cell cycle proteins synthesized at times corresponding to the cell cycle phases in which they are needed. Proteomic analysis using DIGE found only 4% of the ~500 proteins resolved from total cellular protein to be changing in expression over the diel cycle, and these changes followed varying patterns that would appear to correspond with cellular needs, rather than a bulk turnover during the dark as suggested by the studies in L. polyedrum. Because whole cell protein extracts resolved only the highest expressed proteins in the cell, subcellular fractionation followed by two dimensional gel proteomic methods were developed that succeeded in identifying nuclear proteins for which there were no previously published data in dinoflagellates. We found that nuclear isolation in combination with LTQ MS/MS was essential to obtain successful protein identifications because we currently lack genomic sequence data from which to derive adequate peptide mass fingerprints for K. brevis proteins. The identification of nuclear proteins clearly shows that this method can be used successfully to study K. brevis. Additionally, once mass spectra are obtained, such as those from the studies presented in Chapter 2, the spectra can later be searched against newly acquired sequence information with the hope of identifying more proteins from previous experiments. Nuclear proteomics should provide a wealth of information on dinoflagellate biology since this organism has an unusual nucleus. Nuclear proteomics has recently proved to be a useful tool in studying plant biology, since the plant nucleus also possesses some significant differences in appearance and composition from other eukaryotes, which indicates specific molecular pathways (Erhardt et al., 2010). The nucleus is a dynamic organelle and proteins are constantly moving into and out of this compartment (Gonzalez-Melendi et al., 2000). Therefore, an interesting experiment would be to investigate the nuclear proteome of K. brevis over the cell cycle, since this organism displays post-transcriptional control and appears to use translational control and possibly post-translational modifications to regulate cellular processes, and nuclear proteins can now be successfully identified.

We had hoped to identify DNA replication proteins in K. brevis nuclei, as has been done in the chickpea C. arietinum. The proteins identified in this plant spanned a variety of protein classes, including those involved in signaling, gene regulation, DNA replication and transcription (Panday et al., 2006). Using a large amount of K. brevis nuclear protein may overcome the problem of identifying proteins of specific interest, since nuclear protein samples have low protein quantity and concentration (Erhardt et al., 2010).

Genomic studies in K. brevis have provided a wealth of information that, without EST sequencing and development of a microarray, would have been impossible to obtain. However, genomic analysis is limited in that it cannot provide complete information on cellular and subcellular functions, in which proteins, not genes, govern the functions (Klein and Thongboonkerd, 2004). Proteins are dynamic and many functions are regulated by post-translational modifications, which cannot be investigated using a genomics approach. This becomes especially important in an organism such as K. brevis that uses post-transcriptional control of gene expression. As additional information on the genome of K. brevis becomes available through deep transcriptome sequencing via second generation sequencing technologies, the use of proteomics will become easier and more fruitful.

While this project was underway, a 5' trans-spliced leader sequence was identified on diverse mRNAs in dinoflagellates, including K. brevis (Lidie and Van Dolah, 2007; Zhang et al., 2007). The presence of the spliced leader sequence and polycistronic messages in dinoflagellates suggests a constitutive transcription program similar to that of the trypanosomatids, and much of our current insight into dinoflagellates comes from studies on the well characterized trypanosomatid system of post-transcriptional regulation. In Chapter 3 we reported the presence of the spliced leader sequence on the 5' end of S-phase genes, suggesting that S-phase might be regulated by post-transcriptional mechanisms, and consistent with this we saw no changes in their transcript abundance over the cell cycle. Turning to the trypanosome cell cycle for a model, we found that the trypanosomatid Crithidia fasciculata possesses polyA binding proteins that stabilize the mRNAs of DNA metabolism proteins such as RPA1 specifically during S-phase (Pasion et al., 1994). These polyA binding proteins bind to an octamer sequence that is present in the cycling transcripts and is required for the cell cycle-dependent regulation of their mRNA levels, and it is thought that the binding of these proteins varies during the cell cycle in parallel with the levels of putative target mRNAs (Mittra and Ray, 2004). However, we have not identified a similar octameric motif, or any other conserved motif, in the 5' or 3' UTR of K. brevis S-phase genes. Therefore, it would seem that dinoflagellates have a different solution to the problem of cell cycle phase-specific protein expression. It might be informative to do a comparative study of this process in dinoflagellates and apicomplexans, the nearest relatives to dinoflagellates. Apicomplexans lack the spliced leader mechanism, and transcription in the apicomplexan Plasmodium falciparum is monocistronic; however, its genome contains a dearth of genes encoding transcription factors compared with other eukaryotic genomes, and comparative genomic analysis shows a disproportionately high number of proteins involved in mRNA stability and deadenylase activity, suggesting post-transcriptional control of gene expression to some extent (Coulson et al., 2004; Le Roch et al., 2004). It is thought that these mechanisms may allow the Plasmodium parasites to adapt quickly to environmental changes and could explain the common usage of these mechanisms throughout Apicomplexa (Le Roch et al., 2004). Since dinoflagellates evolved from a common ancestor with the apicomplexans, it may be that dinoflagellates are using a similar approach to life, with enhancement of the spliced leader to mediate mRNA processing.
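A search for a conserved octamer in the UTRs of S-phase genes can be sketched as a shared k-mer screen. This is a simplified illustration, not the analysis performed in Chapter 3. It looks only for exact 8-mers present in every sequence, whereas a real motif search would also allow degenerate positions. The function name, the planted octamer and the toy UTRs are all hypothetical.

```python
def shared_kmers(sequences, k=8):
    """Return the set of exact k-mers that occur in every sequence."""
    if not sequences:
        return set()
    kmer_sets = [
        {seq[i:i + k] for i in range(len(seq) - k + 1)}
        for seq in (s.upper() for s in sequences)
    ]
    return set.intersection(*kmer_sets)

# Hypothetical toy UTRs with a made-up octamer planted in each one.
PLANTED = "ACGTTGCA"
toy_utrs = [
    "TTTT" + PLANTED + "GGGG",
    "CCAA" + PLANTED + "TTCC",
    "GAGA" + PLANTED + "CACA",
]
common = shared_kmers(toy_utrs)
```

An empty result for a set of real UTRs would be consistent with the absence of an exact shared octamer, but it would not rule out a degenerate or structural motif.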

At the start of this project, ESTs for a number of cell cycle genes had been identified in K. brevis but they had not been fully characterized. The studies in Chapter 3 found some unusual phenomena in K. brevis cell cycle proteins. The first is the high copy number of PCNA genes, as evidenced by 11 unique contigs that encode 9 different amino acid sequences that vary only at 20 conserved polymorphic sites. PCNA is also highly amplified in other dinoflagellate species. The alveolate Perkinsus marinus, which is the closest relative to dinoflagellates, seems to represent an intermediate state between the true dinoflagellates and the true apicomplexans. Perkinsus has over 10 copies of the PCNA gene, with identical gene structures, similar nucleotide sequences (88-99%) and almost identical amino acid sequences (Zhang et al., 2010). A spliced leader sequence was also found at the 5' end of P. marinus PCNA. It may be that this highly important gene has not only been conserved throughout evolution but that, in these unusual protists which have a high copy number of certain genes such as PCNA, the duplication events have also been retained. Another highly amplified gene was RnR1. However, its partner RnR2, as well as RFC and RPA, seem to be in low copy number based on EST diversity. If this is the case, it must be that correct protein stoichiometry is achieved at the level of translation. In Chapter 3 we observed S-phase specific increases in the levels of these proteins in the absence of increases in their transcript levels. In Trypanosoma cruzi, PCNA mRNA and protein are detected throughout the cell cycle, similar to K. brevis; however, in T. cruzi there is an increase in both mRNA and protein during S-phase (Calderano et al., 2007). The PCNA protein is always present in the nuclear space but its distribution varies over the cell cycle, with replication sites at the nuclear periphery, indicating that there are storage pools of PCNA in T. cruzi (Calderano et al., 2007). In contrast, in K. brevis anti-PCNA strongly labeled replication sites distributed throughout the nucleus during S-phase. In this case, peripheral pools were apparent in G1 and G2. This indicates that replication forks in K. brevis, which have not been previously characterized, are distributed throughout the chromatin during S-phase.
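The relationship between unique contigs, distinct protein sequences and polymorphic sites described for PCNA can be illustrated with a short sketch. It assumes the translated contigs are already aligned to equal length. The four toy peptides are hypothetical and far shorter than PCNA, and the function name is an illustrative choice.

```python
def polymorphic_sites(aligned):
    """Return 0-based column indices at which aligned sequences differ."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")
    return [i for i in range(length) if len({seq[i] for seq in aligned}) > 1]

# Hypothetical translated contigs; two of them encode the same peptide.
toy_contigs = ["MFEARLVQ", "MFEARLVQ", "MFDARLVQ", "MFEARIVQ"]

unique_proteins = sorted(set(toy_contigs))   # distinct amino acid sequences
sites = polymorphic_sites(unique_proteins)   # columns that vary among them
```

Here four contigs collapse to three distinct peptides that differ at two columns, which mirrors on a small scale how 11 contigs can encode 9 protein sequences varying at 20 sites.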

The investigation of dinoflagellate CDK and cyclin sequences in Chapter 4 revealed that they had limited sequence similarity to the well-characterized CDKs, being more similar to CDKC in plants (Barroco et al., 2003) and CDK 7 in other alveolates such as P. falciparum (Srinivasan and Krupa, 2005). The K. brevis CDKC-like protein has conserved ATP binding sites but a divergent cyclin binding motif. We have likely not identified all of the CDKs in K. brevis in the existing EST library. The CDK inhibitor studies in Chapter 4 using fascaplysin, which is highly selective for CDK4, point to the presence of a CDK4-like protein, as this inhibitor blocked DNA replication as well as PCNA protein production. The timing of the restriction point in K. brevis is currently unknown, and the block in S-phase progression by fascaplysin addition at LDT0 (G1 by flow cytometry analysis) may be indicative of the restriction point occurring sometime around the dark to light transition. This is suggested by the fact that PCNA is present at LDT2, suggesting a subpopulation of cells that have already passed through the restriction point and are primed to proceed into S-phase. By flow cytometry, S-phase does not begin until ~LDT6; however, by the time flow cytometry can identify a cell as being in S-phase based on fluorescent DNA staining, it has already carried out significant DNA replication. Together, these results lead to the possibility that there is/are functional homolog(s) of G1 CDKs in K. brevis that control cell cycle progression. A question raised by the sequence analysis and cell physiology studies involving CDK is how many CDKs might be involved in cell cycle progression in K. brevis. A phylogenetic study of CDKs across phyla identified 14 CDK members in C. elegans, 13 in D. melanogaster, 8 in S. pombe, 7 in S. cerevisiae and 21 in H. sapiens (Liu and Kipreos, 2000), and 6 have been identified in P. falciparum (Jirage et al., 2010). It will be interesting to determine the number of cell cycle CDKs in dinoflagellates and how they drive the K. brevis cell cycle, given the apparent absence of CDK-dependent transcription.

The presence of multiple forms of PCNA protein in Chapter 3 led to the investigation in Chapter 4 of possible post-translational modification mechanisms in K. brevis, specifically sumoylation and ubiquitination. This level of regulation may be essential in dinoflagellate cell cycle control and it has been neglected in dinoflagellate research. In general, eukaryotes use a multitude of post-translational modifications to regulate the cell cycle, including but not limited to phosphorylation by cyclin dependent kinases and other kinases, acetylation of histones by histone acetyltransferases (HATs), ubiquitination by the anaphase promoting complex and by other ubiquitin pathway components, and the more recently discovered sumoylation of transcription factors and PCNA. The fact that K. brevis and other dinoflagellates use mostly post-transcriptional regulation of gene expression leads to the hypothesis that not only translational control but also post-translational modifications may play an even larger role in cellular processes than in other eukaryotes. For example, PCNA can be ubiquitinated, sumoylated, and acetylated, and in yeast PCNA it appears that ubiquitin and SUMO can coexist in a single PCNA subunit. Therefore, this protein has multiple levels of regulation, and these PTMs could be occurring in the dinoflagellate since they are reversible and provide flexibility for the cell.

Chapter 4 provides the first evidence of sumoylation in a dinoflagellate, which warrants further study. In the yeast S. cerevisiae, a recent study found that in exponentially growing cultures PCNA modification by SUMO is more extensive than previously reported and includes oligomeric chains on 2 different lysine residues, K164 and K127 (Windecker and Ulrich, 2008). Changes in molecular weight of K. brevis PCNA during S-phase are consistent with sumoylation during S-phase in exponentially growing cultures. A single homolog for SUMO was identified in the K. brevis EST library. Single SUMO homologs are also found in a variety of protists including Trypanosoma brucei, T. cruzi, Leishmania major, P. falciparum, Theileria annulata, Entamoeba histolytica, Cryptosporidium parvum and C. hominis (Ponder and Bogyo, 2007), as well as yeast. Immunoprecipitation studies in P. falciparum showed that the nuclear protein Sir2 is sumoylated in this protist and it is suggested that this PTM may play an important role in this parasite (Issar et al., 2008). The preliminary data generated here suggesting that PCNA is sumoylated should be confirmed. An informative experiment to further determine PCNA sumoylation in K. brevis would be to produce a recombinant PCNA and either Western blot with an anti-SUMO antibody (which has not proved trivial) or perform an in vitro sumoylation assay to verify whether PCNA can be sumoylated. Alternatively, a mass spectrometry approach could be used to obtain information on sumoylated PCNA. There are signatures that could be identified, such as the lysine on PCNA to which the SUMO G-G motif binds. In the SUMO sequence, there is a conserved threonine residue on the amino side of the motif that is unique to this ubiquitin-like modifier, which would aid in its identification. This method may prove difficult because SUMO can form chains similar to ubiquitin. Additionally, although the ubiquitin studies were not conclusive, an interesting study would be to inhibit the ubiquitin pathway, for example using MG132, and determine if PCNA is modified. Investigation of post-translational control, specifically of growth related proteins, in dinoflagellates will ultimately be necessary in order to gain in-depth information on important cellular processes and how they are controlled.

The last objective of this project included an effort to determine whether PCNA might provide a useful biomarker for growth in Florida red tides. In order to develop successful biomarkers for growth in dinoflagellates, there is a need to understand the basic biology of this organism. K. brevis HAB forecasting is currently done using satellite derived image products, wind predictions and models derived from previous observations and research (Stumpf et al., 2009), which involve integrating physical and biological oceanographic measurements. There are limited data on the physiological processes of K. brevis that are critical to growth of cells and populations in different environments (Sinclair et al., 2009). Therefore, it would be valuable to implement biomarkers that can fine-tune forecasting models by determining bloom trajectory (i.e., growing or declining) on a cellular scale. The results of this study suggested that PCNA at the protein level could prove useful, since this dinoflagellate is regulating its cell cycle at the post-transcriptional level, such that PCR-based methods would not be informative.

The analyses in Chapter 4 show that there is a significant linear correlation between the expression of the S-phase specific 37 kDa PCNA and the growth rate of the population. This protein is virtually absent from stationary phase, making it a particularly promising tool for biomarker development. An unexpected result of Chapter 4 was the identification of a 47 kDa form of PCNA only in cultures beginning to decline. If the hypothesis of PCNA ubiquitination during stationary phase presented in Chapter 4 proves correct, there is potential for this protein to be developed into a bloom termination biomarker.

In summary, the studies presented in this body of research provide a new level of insight into the regulation of protein expression in K. brevis, and in particular the expression S-phase proteins that are regulated in a cell cycle dependent manner. The new information gained has both theoretical and applied value. As K. brevis uses post­ transcriptional control of the cell cycle unlike most eukaryotes, the insight gained from these studies will contribute to our understanding of cell cycle regulation from an evolutionary standpoint. This research has laid the groundwork for further investigations on cell cycle regulation at the translational or post-translational level in this organism.

Additionally, this work identified an applied endpoint: the potential of PCNA to be used as a biomarker for K. brevis bloom progression.

LIST OF REFERENCES

Abbas, T. & A. Dutta (2011). "CRL4Cdt2: master coordinator of cell cycle progression and

genome stability." Cell Cycle 10(2): 241-249.

Akimoto, H., C. Wu, et al. (2004). "Biological rhythmicity in expressed proteins of the

marine dinoflagellate Lingulodinium polyedrum demonstrated by chronological

proteomics." Biochem Biophys Res Commun 315(2): 306-312.

Andersson, J. O. (2005). "Lateral gene transfer in eukaryotes." Cell Mol Life Sci 62(11):

1182-1197.

Andreou, A. M. & N. Tavernarakis (2009). "SUMOylation and cell signalling."

Biotechnology Journal 4(12): 1740-1752.

Archer, S. K., Luu, V. D., de Queiroz, R. A., Brems, S. & Clayton, C. (2009). "Trypanosoma

brucei PUF9 Regulates mRNAs for Proteins Involved in Replicative Processes over

the Cell Cycle." PLoS 5(8): e1000565.

Bachvaroff, T. R., & Place, A. R. (2008). "From stop to start: tandem gene arrangement,

copy number and trans-splicing sites in the dinoflagellate Amphidinium carterae."

PLoS ONE 3: e2929.

Bailey, D. & P. O'Hare (2005). "Comparison of the SUMO1 and ubiquitin conjugation

pathways during the inhibition of proteasome activity with evidence of SUMO1

recycling." Biochem J 392(Pt 2): 271-281.

Baldauf, S. L. (2003). "The Deep Roots of Eukaryotes." Science 300(5626): 1703-1706.

Barbier, M., Leighfield, T. A., Soyer-Gobillard, M. & Van Dolah, F. M. (2003). "Permanent

Expression of a Cyclin B Homologue in the Cell Cycle of the Dinoflagellate Karenia

brevis." Journal of Eukaryotic Microbiology 50(2): 123-131.

Bayer, P., A. Arndt, et al. (1998). "Structure determination of the small ubiquitin-related

modifier SUMO-1." Journal of Molecular Biology 280(2): 275-286.

Bertomeu, T., Rivoal, J. & Morse, D. (2007). "A dinoflagellate CDK5-like cyclin dependent

kinase." Biology of the Cell 99(9): 531-540.

Bertomeu, T. & D. Morse. (2004). "Isolation of a dinoflagellate mitotic cyclin by functional

complementation in yeast." Biochemical and Biophysical Research Communications

323(4): 1172-1183.

Bradford, M. (1976). "A Rapid and Sensitive Method for the Quantitation of Microgram

Quantities of Protein Utilizing the Principle of Protein-Dye Binding." Analytical

Biochemistry 72: 248-254.

Bravo, R. & J. E. Celis (1985). "Changes in the nuclear distribution of cyclin (PCNA) during

S-phase are not triggered by post-translational modifications that are expected to

moderately affect its charge." FEBS Lett 182(2): 435-440.

Bravo, R. & H. Macdonald-Bravo (1985). "Changes in the nuclear distribution of cyclin

(PCNA) but not its synthesis depend on DNA replication." EMBO J 4(3): 655-661.

Brown, L. M. & Ray, D. S. (1997). "Cell cycle regulation of RPA1 transcript levels in the

trypanosomatid Crithidia fasciculata." Nucleic Acids Research 25(16): 3281-3289.

Brunelle, S., Hazard, E. S., Sotka, E. E. & Van Dolah, F. M. (2007). "Characterization of a

dinoflagellate cryptochrome blue-light receptor with a possible role in circadian

control of the cell cycle." Journal of Phycology 43: 509-518.

Brunelle, S. A. (2005). Circadian control in the dinoflagellate Karenia brevis: a role for blue

light and characteristics of a blue light receptor. Biology. Charleston, The Graduate

School of the College of Charleston. M.S.: 80.

Calderano, S.G., Rubin-De-Celis, S.S., Dossin, F.M., Mortara, R.A., Schenkman, S. & Elias,

M.C. (2007). "Storage pools of proliferating cell nuclear antigen PCNA in the

nucleus of Trypanosoma cruzi." XXIII Meeting of the SBPZ.

Celis, J. E. & A. Celis (1985). "Cell cycle-dependent variations in the distribution of the

nuclear protein cyclin proliferating cell nuclear antigen in cultured cells: subdivision

of S phase." Proceedings of the National Academy of Sciences of the United States of

America 82(10): 3262-3266.

Chabes, A. L., Pfleger, C. M., Kirschner, M. W. & Thelander, L. (2003). "Mouse

ribonucleotide reductase R2 protein: A new target for anaphase-promoting complex-

Cdh1-mediated proteolysis." Proceedings of the National Academy of Sciences USA

100: 3925-3929.

Chabes, A. & Thelander, L. (2000). "Controlled Protein Degradation Regulates Ribonucleotide

Reductase Activity in Proliferating Mammalian Cells during the Normal Cell Cycle

and in Response to DNA Damage and Replication Blocks." Journal of Biological

Chemistry 275: 17747-17753.

Chan, L. L., I. J. Hodgkiss, et al. (2005). "Use of two-dimensional gel electrophoresis to

differentiate morphospecies of Alexandrium minutum, a paralytic shellfish poisoning

toxin-producing dinoflagellate of harmful algal blooms." Proteomics 5(6): 1580-

1593.

Chan, L. L., I. J. Hodgkiss, et al. (2004). "Use of two-dimensional gel electrophoresis

proteome reference maps of dinoflagellates for species recognition of causative

agents of harmful algal blooms." Proteomics 4(1): 180-192.

Chan, L. L., I. J. Hodgkiss, et al. (2004). "Proteomic study of a model causative agent of

harmful algal blooms, Prorocentrum triestinum II: the use of differentially expressed

protein profiles under different growth phases and growth conditions for bloom

prediction." Proteomics 4(10): 3214-3226.

Chan, L. L., S. C. Lo, et al. (2002). "Proteomic study of a model causative agent of harmful

red tide, Prorocentrum triestinum I: Optimization of sample preparation

methodologies for analyzing with two-dimensional electrophoresis." Proteomics 2(9):

1169-1186.

Chan, L. L., W. H. Sit, et al. (2006). "Identification and characterization of a "biomarker of

toxicity" from the proteome of the paralytic shellfish toxin-producing dinoflagellate

Alexandrium tamarense ()." Proteomics 6(2): 654-666.

Christie, J. M. & W. R. Briggs (2001). "Blue Light Sensing in Higher Plants." Journal of

Biological Chemistry 276(15): 11457-11460.

Ciechanover, A., A. Orian, et al. (2000). "Ubiquitin-mediated proteolysis: biological

regulation via destruction." BioEssays: news and reviews in molecular, cellular and

developmental biology 22(5): 442-451.

Colby, T., Matthai, A., Boeckelmann, A., & Stuible, H. P. (2006). "SUMO-Conjugating and

SUMO Deconjugating Enzymes from Arabidopsis." Plant Physiology 142: 318-332.

DeCoursey, P. J., J. R. Krulas, et al. (1997). "Circadian Performance of Suprachiasmatic

Nuclei (SCN)-Lesioned Antelope Ground Squirrels in a Desert Enclosure."

Physiology & Behavior 62(5): 1099-1108.

Donald, R.G., Zhong, T., Meijer, L., & Liberator, P.A. (2005). "Characterization of two

T. gondii CKI isoforms." Mol. Biochem. Parasitol. 141(1): 15-27.

Ellis, J., Sarkar, M., Hendriks, E. & Matthews, K. (2004). "A novel ERK-like, CRK-like

protein kinase that modulates growth in Trypanosoma brucei via an autoregulatory C-

terminal extension." Mol Micro 53(5): 1487-1499.

Engstrom, Y. & Rozell, B. (1988). "Immunocytochemical evidence for the cytoplasmic

localization and differential expression during the cell cycle of the M1 and M2

subunits of mammalian ribonucleotide reductase." EMBO J 7: 1615-1620.

Essers, J., Theil, A. F., Baldeyron, C., van Cappellen, W. A., Houtsmuller, A.B., Kanaar, R.

& Vermeulen, W. (2005). "Nuclear Dynamics of PCNA in DNA Replication and

Repair." Molecular and Cellular Biology 25(21): 9350-9359.

Evans, T.J., Kirkpatrick, G.J., Millie, D.F., Chapman, D.J., & Schofield, O. E. (2001)

"Photophysiological response of the toxic red-tide dinoflagellate Gymnodinium breve

(Dinophyceae) under natural sunlight." Journal of Plankton Research 23: 1177-1193.

Fagan, T., Morse, D. & Hastings, J. W. (1999). "Circadian synthesis of a nuclear-encoded

chloroplast glyceraldehyde-3-phosphate dehydrogenase in the dinoflagellate

Gonyaulax polyedra is translationally controlled." Biochemistry 38: 7689-7695.

French, M., K. Swanson, et al. (2005). Identification and Characterization of Modular

Domains That Bind Ubiquitin. Methods in Enzymology. J. D. Raymond, Academic

Press. Volume 399: 135-157.

Gomes, F.C., Ali, N.O., Brown, E., Walker, R.G., Grant, K.M., & Mottram, J.C. (2010)

"Recombinant Leishmania mexicana CRK3:CYCA has protein kinase activity in

the absence of phosphorylation on the T-loop residue Thr178." Mol. Biochem.

Parasitol. 171(2):89-96.

Gardner, M. J., R. Bishop, et al. (2005). "Genome Sequence of Theileria parva, a Bovine

Pathogen That Transforms Lymphocytes." Science 309(5731): 134-137.

Gill, G. (2004). "SUMO and ubiquitin in the nucleus: different functions, similar

mechanisms?" Genes & Development 18(17): 2046-2059.

Gourguechon, S., & Wang, C.C. (2010) "CRK9 contributes to regulation of mitosis and

cytokinesis in the procyclic form of Trypanosoma brucei." BMC Cell Biology

10:68.

Guillard, R. R. L. (1973). Division rates. Cambridge, Cambridge University Press.

Gutierrez, C., E. Ramirez-Parra, et al. (2002). "G1 to S transition: more than a cell cycle

engine switch." Current Opinion in Plant Biology 5(6): 480-486.

Hackett, J. D., D. M. Anderson, et al. (2004). "Dinoflagellates: a remarkable evolutionary

experiment." American Journal of Botany 91(10): 1523-1534.

Hagiwara, S.-y., M. Takahashi, et al. (2001). "Novel Findings Regarding Photoinduced

Commitments of G1-, S- and G2-phase Cells to Cell-cycle Transitions in Darkness

and Dark-induced G1-, S- and G2-phase Arrests in Euglena." Photochemistry and

Photobiology 74(5): 726-733.

Haglund, K., P. P. Di Fiore, et al. (2003). "Distinct monoubiquitin signals in receptor

endocytosis." Trends in Biochemical Sciences 28(11): 598-604.

Hassan, P., D. Fergusson, et al. (2001). "The CRK3 protein kinase is essential for cell cycle

progression of Leishmania mexicana." Molecular and Biochemical Parasitology

113(2): 189-198.

Hastings, M. H. (2003). "Circadian clocks: self-assembling oscillators?" Current Biology

13(17): R681-R682.

Heil, C. A., Vargo, G., Spence, G., Neely, M. B., Merkt, R., Lester, K., & Walsh, J. (2001).

"Nutrient stoichiometry of a Gymnodinium breve Davis (:

Dinophyceae) bloom: what limits blooms in oligotrophic environments?" In: Proceedings of the Ninth International Conference on Harmful Algal Blooms,

Intergovernmental Oceanographic Commission of UNESCO, Paris, France, pp.165-

169.

Henrich, S., S. J. Cordwell, et al. (2007). "The nuclear proteome and DNA-binding fraction of

human Raji lymphoma cells." Biochimica et Biophysica Acta (BBA) - Proteins &

Proteomics 1774(4): 413-432.

Hicke, L., H. L. Schubert, et al. (2005). "Ubiquitin-binding domains." Nat Rev Mol Cell Biol

6(8): 610-621.

Hoege, C., B. Pfander, et al. (2002). "RAD6-dependent DNA repair is linked to modification

of PCNA by ubiquitin and SUMO." Nature 419(6903): 135-141.

Holzmann, K., Gerner, C., PaltI, A., Schafer, R., Obrist, P., Ensinger, C., Grimm, R., &

Sauermann, G. (2000). "A human common nuclear matrix protein homologous to

eukaryotic translation initiation factor 4A." Biochem. Biophys. Res. Commun. 267(1):

339-344.

Homma, K. & J. Hastings (1989). " kinetics, division asymmetry and volume

control at division in the marine dinoflagellate Gonyaulax polyedra: a model of

circadian clock control of the cell cycle." J. Cell Sci. 92(2): 303-318.

Homma, K. & J. W. Hastings (1989). "The S phase is discrete and is controlled by the

circadian clock in the marine dinoflagellate Gonyaulax polyedra." Experimental Cell

Research 182(2): 635-644.

Hou, Y. & S. Lin (2009). "Distinct Gene Number-Genome Size Relationships for Eukaryotes

and Non-Eukaryotes: Gene Content Estimation for Dinoflagellate Genomes." PLoS

ONE 4(9): e6978.

Hurd, H. K., C. W. Roberts, et al. (1987). "Identification of the gene for the yeast

ribonucleotide reductase small subunit and its inducibility by methyl

methanesulfonate." Mol. Cell. Biol. 7(10): 3673-3677.

Hutchins, A. P., Roberts, G. R., Lloyd, C. W. & Doonan, J. H. (2004). "In vivo interaction

between CDKA and eIF4A: a possible mechanism linking translation and cell

proliferation." FEBS Letters 556: 91-94.

Huwe, A., R. Mazitschek, et al. (2003). "Small Molecules as Inhibitors of Cyclin-Dependent

Kinases." Angewandte Chemie International Edition 42(19): 2122-2138.

Huysman, M., C. Martens, et al. (2010). "Genome-wide analysis of the diatom cell cycle

unveils a novel type of cyclins involved in environmental signaling." Genome

Biology 11(2): R17.

Indiani, A. A. o. D. (2006). "Mechanism of the δ wrench in opening the β sliding clamp."

Journal of Biological Chemistry 278(41): 40272-40281.

Ishigaki, Y., X. Li, et al. (2001). "Evidence for a pioneer round of mRNA translation:

mRNAs subject to nonsense-mediated decay in mammalian cells are bound by

CBP80 and CBP20." Cell 106(5): 607-617.

Issar, N., E. Roux, et al. (2008). "Identification of a novel post-translational modification in

Plasmodium falciparum: protein sumoylation in different cellular compartments."

Cellular Microbiology 10(10): 1999-2011.

Janowitz, G. S. & Kamykowski, D. (2006). "Modeled Karenia brevis accumulation in the

vicinity of a coastal nutrient front." Marine Ecology Progress Series 314: 49-59.

Jensen, L. J., Jensen, T. S., de Lichtenberg, D., Brunak, S., & Bork, P. (2006). "Co-evolution

of transcriptional and post-translational cell-cycle regulation." Nature 443: 594-597.

Jones, D. T., W.R. Taylor & J.M. Thornton (1992). "The Rapid Generation of Mutation Data

Matrices from Protein Sequences." Computer Applications in the Biosciences 8: 275-

282.

Kamykowski, D., Milligan, E. J. & Reed, R. E. (1998). "Biochemical relationships with the

orientation of the autotrophic dinoflagellate Gymnodinium breve under nutrient

replete conditions." Mar. Ecol. Prog. Ser. 167: 105-117.

Khan, F., J. Tang, et al. (2002). "Cyclin-dependent kinase TPK2 is a critical cell cycle

regulator in Toxoplasma gondii." Molecular Microbiology 45(2): 321-332.

Koroleva, O. A., J. W. Brown, et al. (2009). "Localization of eIF4A-III in the nucleolus and

splicing speckles is an indicator of plant stress." Plant Signal. Behav. 4(12): 1148-

1151.

Kumar, D., Minocha, N., Rajanala, K. & Saha, S. (2009). "The distribution pattern of

proliferating cell nuclear antigen in the nuclei of Leishmania donovani."

Microbiology 155: 3748-3757.

Kuriyan, J. & O'Donnell, M. (1993). "Sliding clamps of DNA polymerases." Journal of Molecular

Biology 234: 915-925.

Kvaal, C. A., J. R. Radke, et al. (2002). "Isolation of a Toxoplasma gondii cyclin by yeast

two-hybrid interactive screen." Molecular and Biochemical Parasitology 120(2): 187-

194.

Le, Q. H., Markovic, P., Hastings, J. W., Jovine, R. V. M. & Morse, D. (1997). "Structure

and organization of the peridinin chlorophyll a binding protein gene in Gonyaulax

polyedra." Molecular and General 255: 595-604.

Le, Q. H., R. Jovine, P. Markovic & D. Morse (2001). "Peridinin-Chlorophyll a-Protein is not

implicated in the photosynthesis rhythm of the dinoflagellate Gonyaulax despite

circadian regulation of its translation." Biological Rhythm Research 32: 579-594.

Le Roch, K., C. Sestier, et al. (2000). Activation of a Plasmodium falciparum cdc2-related

kinase by heterologous p25 and cyclin H. Functional characterization of a P.

falciparum cyclin homologue. J Biol Chem 275(12): 8952-8958.

Le Roch, K., Johnson, J.R., Florens, L., Zhou, Y., Santrosyan, A., Grainger, M., Yan, S.F.,

Williamson, K. C., Holder, A.A., Carucci, D.J., Yates, J.R., & Winzeler, E.A. (2004)

"Global analysis of transcript and protein levels across the Plasmodium falciparum

life cycle." Genome Research 14: 2308-2318.

Lee, S. H. & Hurwitz, J. (1990). "Mechanism of elongation of primed DNA by DNA polymerase

delta, proliferating cell nuclear antigen, and activator 1." Proceedings of the National

Academy of Sciences USA 87: 5672-5676.

Leggat, W., O. Hoegh-Guldberg, et al. (2007). "Analysis of an EST library from the

dinoflagellate (Symbiodinium sp.) symbiont of reef-building corals." Journal of

Phycology 43(5): 1010-1021.

Leighfield, T. A. & F. M. Van Dolah (2001). "Cell cycle regulation in a dinoflagellate,

Amphidinium operculatum: identification of the diel entraining cue and a possible

role for cyclic AMP." Journal of Experimental Marine Biology and Ecology 262(2):

177-197.

Lidie, K. B., Ryan, J. C., Barbier, M., & Van Dolah, F. M. (2005). "Gene expression in

Florida red tide dinoflagellate Karenia brevis: analysis of an expressed sequence tag

library and development of DNA microarray." Marine Biotechnology 7(5): 481-493.

Lidie, K. B. & Van Dolah, F. M. (2007). "Spliced leader RNA-mediated trans-splicing in a

dinoflagellate, Karenia brevis." Journal of Eukaryotic Microbiology 54: 427-435.

Lin, S., H. Zhang, et al. (2010). "Spliced leader-based metatranscriptomic analyses lead to

recognition of hidden genomic features in dinoflagellates." Proceedings of the

National Academy of Sciences 107(46): 20033-20038.

Litaker, R. W., V. E. Warner, et al. (2002). "Effect of diel and interday variations in light on

the cell division pattern and in situ growth rates of the bloom-forming dinoflagellate

Heterocapsa triquetra." Marine Ecology Progress Series 232: 63-74.

Liu, J. & Kipreos, E.T. (2000) "Evolution of cyclin-dependent kinases (CDKs) and CDK-

activating kinases (CAKs): Differential conservation of CAKs in yeast and metazoa."

Molecular Biology and Evolution 17: 1061-1074.

Liu, J., Jiao, N., Hong, H., Luo, T. & Cai, H. (2005). "Proliferating cell nuclear antigen

(PCNA) as a marker of cell proliferation in the marine dinoflagellate Prorocentrum

donghaiense Lu and the green alga Dunaliella salina Teodoresco." Journal of

Applied Phycology 17: 323-330.

Liu, L. Y. & Hastings, J. W. (2006). "Novel and rapidly diverging intergenic sequences between

tandem repeats of the luciferase genes in seven dinoflagellate species." Journal of

Phycology 42: 96-103.

Loor, G., Zhang, S. J., Zhang, P., Toomey, N. L. & Lee, M. Y. W. T. (1997). "Identification

of DNA replication and cell cycle proteins that interact with PCNA." Nucleic Acids

Research 25(24): 5041-5046.

Magana, H. A., & Villareal, T. A. (2006). "The effects of environmental factors on the

growth rate of Karenia brevis (Davis) G. Hansen and Moestrup." Harmful Algae 5:

192-198.

McInnes, C., S. Wang, et al. (2004). "Structural Determinants of CDK4 Inhibition and Design

of Selective ATP Competitive Inhibitors." Chemistry & Biology 11(4): 525-534.

Meulmeester, E. and F. Melchior (2008). "Cell biology: SUMO." Nature 452(7188): 709-

711.

Mittag, M., D.H. Lee & J.W. Hastings (1994). "Circadian expression of the luciferin-

binding protein correlates with the binding of a protein to the 3' untranslated region of

its mRNA." Proceedings of the National Academy of Sciences USA 91(12): 5257-

5261.

Mittag, M. (1996). "Conserved circadian elements in phylogenetically diverse algae."

Proceedings of the National Academy of Sciences 93(25): 14401-14404.

Mittag, M., Li, L. & Hastings, J. W. (1998). "The mRNA level of circadian regulated

Gonyaulax luciferase remains constant over the cycle." Chronobiology International

15: 93-98.

Morgan, D. O. (1997). "Cyclin-dependent kinases: Engines, Clocks, and Microprocessors."

Annual Review of Cell and Developmental Biology 13(1): 261-291.

Morse, D., P. M. Milos, et al. (1989). "Circadian regulation of bioluminescence in Gonyaulax

involves translational control." Proc Natl Acad Sci USA 86(1): 172-176.

Murphy, W. J., Watkins, K. P. & Agabian, N. (1986) "Identification of a novel Y branch

structure as an intermediate in trypanosome mRNA processing: evidence for

transsplicing." Cell 47:517-525.

Murphy, R. F. (2005). "Location proteomics: a systems approach to subcellular location."

Biochemical Society Transactions 33: 535-538.

Murray, A. W. (2004). "Recycling the Cell Cycle: Cyclins Revisited." Cell 116(2): 221-234.

Nosenko, T. & D. Bhattacharya (2007). "Horizontal gene transfer in chromalveolates." BMC

Evol Biol 7: 173.

Okamoto, O. K., D. L. Robertson, et al. (2001). "Different regulatory mechanisms modulate

the expression of a dinoflagellate iron-superoxide dismutase." J Biol Chem 276(23):

19989-19993.

O'Neill, J. S., G. van Ooijen, et al. (2011). "Circadian rhythms persist without transcription in

a eukaryote." Nature 469(7331): 554-558.

Ott, D. E., Coren, L. V., Copeland, T. D., Kane, B.P., Johnson, D. G., Sowder, R. C.,

Yoshinaka, Y., Oroszlan, S., Arthur, L. O., & Henderson, L. E. (1998). "Ubiquitin is

covalently attached to the p6Gag proteins of human immunodeficiency virus type 1

and simian immunodeficiency virus and to the p12Gag protein of Moloney Murine

Leukemia Virus." Journal of Virology 72(4): 2962-2968.

Ouyang, Y., C. R. Andersson, et al. (1998). "Resonating circadian clocks enhance fitness in

cyanobacteria." Proceedings of the National Academy of Sciences 95(15): 8660-

8664.

Panday, A., Choudhary, M.K., Bhushan, D., Chattopadhyay, A., Chakraborty, S., Datta, A.,

& Chakraborty, N. (2006) "The Nuclear Proteome of Chickpea (Cicer arietinum L.)

Reveals Predicted and Unexpected Proteins." Journal of Proteome Research 5(12):

3301-3311.

Parker, J. L., Bucceri, A., Davies, A. A., Heidrich, K., Windecker, H. & Ulrich, H. D. (2008).

"SUMO modification of PCNA is controlled by DNA." The EMBO Journal 27:

2422-2431.

Partensky, F., D. Vaulot, et al. (1991). "Growth and cell cycle of two closely related red tide-

forming dinoflagellates: Gymnodinium nagasakiense and G. CF. Nagasakiense."

Journal of Phycology 27(6): 733-742.

Pickart, C. M. (2004). "Back to the Future with Ubiquitin." Cell 116(2): 181-190.

Ponder, E. L. and M. Bogyo (2007). "Ubiquitin-Like Modifiers and Their Deconjugating

Enzymes in Medically Important Parasitic ." Eukaryotic Cell 6(11): 1943-

1952.

Radivojac, P., V. Vacic, et al. (2010). "Identification, analysis, and prediction of protein

ubiquitination sites." Proteins: Structure, Function, and Bioinformatics 78(2): 365-

380.

Ramalho, C. B., J. W. Hastings, et al. (1995). "Circadian Oscillation of Nitrate Reductase

Activity in Gonyaulax polyedra Is Due to Changes in Cellular Protein Levels." Plant

Physiology 107(1): 225-231.

Ren, J., X. Gao, et al. (2009). "Systematic study of protein sumoylation: Development of a

site-specific predictor of SUMOsp 2.0." Proteomics 9(12): 3409-3412.

Renaut, J., J. Hausman and M. E. Wisniewski. (2006). "Proteomics and low-temperature

studies: bridging the gap between gene expression and metabolism." Physiologia

Plantarum 126(1): 97-109.

Robbens, S., Khadaroo, B., Camasses, A., Derelle, E., Ferraz, C., Inze, D., Van de Peer, Y.,

& Moreau, H. (2005). "Genome-wide analysis of core cell cycle genes in the

unicellular green alga, Ostreococcus tauri." Mol. Biol. Evol. 22: 589-597.

Rodriguez, M. S., C. Dargemont, et al. (2001). "SUMO-1 Conjugation in Vivo Requires Both

a Consensus Modification Motif and Nuclear Targeting." Journal of Biological

Chemistry 276(16): 12654-12659.

Roenneberg, T. (1995). "The effects of light on the Gonyaulax circadian system." Ciba Found.

Symp. 183: 117-133.

Roenneberg, T., & Mittag, M. (1996). "The circadian program of algae." Seminars in Cell and

Developmental Biology 7: 753-763.

Roenneberg, T. (1996). "The complex circadian system of Gonyaulax polyedra." Physiologia

Plantarum 96(4): 733-737.

Roenneberg, T., M. Merrow, & B. Eisensamer (1997/98). "Cellular mechanisms of circadian

systems." Zoology 100: 273-286.

Rustici, G., Mata, J., Kivinen, K., Lio, P., Penkett, C. J., Burns, G., Hayles, J., Brazma, A.,

Nurse, P., & Bahler, J. (2004). "Periodic gene expression program of the fission yeast

cell cycle." Nature Genetics 36(8): 809-817.

Sancar, A., C. T., R.J. Thresher, F. Araujo, J. Mo, S. Ozgur, E. Vegas, L. Dawut, & C.P.

Selby (2000). "Photolyase/cryptochrome family blue-light photoreceptors use light

energy to repair DNA or set the circadian clock." Cold Spring Harbor Symposia on

Quantitative Biology 65: 157-171.

Schaffer, R., J. Landgraf, et al. (2001). "Microarray Analysis of Diurnal and Circadian-

Regulated Genes in Arabidopsis." The Plant Cell Online 13(1): 113-123.

Schaffer, B. A., Kamykowski, D., McKay, L., Sinclair, G., & Milligan, E. J. (2007). "A

comparison of photoresponse among ten different Karenia brevis (Dinophyceae)

isolates." Journal of Phycology 43: 702-714.

Schirmer, E.C., & Gerace, L. (2002) "Organellar proteomics: the prizes and pitfalls of

opening the nuclear envelope." Genome Biology 3(4): 1-4.

Schroder-Lorenz, A. & Rensing, L. (1987). "Circadian changes in protein-synthesis rate and protein

phosphorylation in cell-free extracts of Gonyaulax polyedra." Planta 170: 7-13.

Sherr, C. J. (1996). "Cancer Cell Cycles." Science 274(5293): 1672-1677.

Sherr, C. J. & Roberts, J.M. (1999). "CDK inhibitors: positive and negative regulators of G1-

phase progression." Genes & Development 13(12): 1501-1512.

Sinclair, G., D. Kamykowski, et al. (2009). "Growth, uptake, and assimilation of ammonium,

nitrate, and urea, by three strains of Karenia brevis grown under low light." Harmful

Algae 8(5): 770-780.

Sivan, G. & O. Elroy-Stein (2008). "Regulation of mRNA Translation during cellular

division." Cell Cycle 7(6): 741-744.

Somers, D. E. (1999). "The Physiology and Molecular Bases of the Plant Circadian Clock."

Plant Physiology 121(1): 9-19.

Soni, R., L. Muller, et al. (2000). "Inhibition of Cyclin-Dependent Kinase 4 (Cdk4) by

Fascaplysin, a Marine Natural Product." Biochemical and Biophysical Research

Communications 275(3): 877-884.

Sorokin, A., E. Kim, et al. (2009). "Proteasome system of protein degradation and

processing." Biochemistry (Moscow) 74(13): 1411-1442.

Spellman, P. T., Sherlock, G., Zhang, M. Q., Iyer, V. R., Anders, K., Eisen, M. B., Brown, P.

O., Botstein, D., & Futcher, B. (1998). "Comprehensive identification of cell cycle-

regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization."

Mol. Biol. Cell. 9: 3273-3297.

Stumpf, R. P., R. W. Litaker, et al. (2008). "Hydrodynamic accumulation of Karenia off the

west coast of Florida." Continental Shelf Research 28(1): 189-213.

Stumpf, R. P., M. C. Tomlinson, et al. (2009). "Skill assessment for an operational algal

bloom forecast system." J Mar Syst 76(1-2): 151-161.

Sweeney, B. M. & J. W. Hastings (1958). "Rhythmic Cell Division in Populations of

Gonyaulax polyedra." Journal of Eukaryotic Microbiology 5(3): 217-224.

Taroncher-Oldenburg, G., D. M. Kulis, et al. (1997). "Toxin Variability During the Cell

Cycle of the Dinoflagellate Alexandrium fundyense." Limnology and Oceanography

42(5): 1178-1188.

Toulza, E., Shin, M., Blanc, G., Audic, S., Laabir, M., Collos, Y., Claverie, J., & Grzebyk,

D. (2010). "Transcriptome of proliferating Alexandrium cells." Applied

Environmental Microbiology.

Trachana, K., Jensen, L. J. & Bork, P. (2010). "Evolution and regulation of cellular periodic

processes: a role for paralogs." EMBO Reports 11(3): 233-238.

Ulrich, H. D. (2009a). "Regulating post-translational modifications of the eukaryotic

replication clamp PCNA." DNA Repair 8(4): 461-469.

Ulrich, H. D. (2009b). "Ubiquitin, SUMO and the maintenance of genome stability." DNA

Repair 8(4): 429-429.

Umen, J. G., & Goodenough, U.W. (2001). "Control of cell division by a retinoblastoma

protein homolog in Chlamydomonas." Genes and Development 15: 1652-1661.

Van Dolah, F. M., Leighfield, T. A., Sandel, H.D., & Hsu, C.K. (1995). "Cell division in the

dinoflagellate toxicus is phased to the diurnal cycle and accompanied

by activation of the cell cycle regulatory protein cdc2 kinase." Journal of Phycology

31: 395-400.

Van Dolah, F. M., & T.A. Leighfield (1999). "Diel phasing of the cell-cycle in the Florida red

tide dinoflagellate, Gymnodinium breve." Journal of Phycology 35: 1404-1411.

Van Dolah, F. M., Lidie, K., Morey, J., Brunelle, S., Ryan, J., Monroe, E., & Haynes, B.

(2007). "Microarray analysis of diurnal and circadian regulated genes in the Florida

red tide dinoflagellate, Karenia brevis." Journal of Phycology 43: 741-752.

Van Dolah, F. M., T. A. Leighfield, et al. (2008). "Cell cycle behavior of laboratory and field

populations of the Florida red tide dinoflagellate, Karenia brevis." Continental Shelf

Research 28(1): 11-23.

Vanhamme, L., & Pays, E. (1995) "Control of gene expression in trypanosomes." Microbiol

Rev 59(2): 223-240.

Vaulot, D., Olson, R.J. & Chisholm, S.W. (1986) "Light and dark control of the cell cycle in

two marine phytoplankton species." Experimental Cell Research 167: 38-52.

Volknandt, W. & R. Hardeland (1984). "Circadian rhythmicity of protein synthesis in the

dinoflagellate, Gonyaulax polyedra: a biochemical and radioautographic

investigation." Comparative Biochemistry and Physiology Part B: Comparative

Biochemistry 77(3): 493-500.

Walsh, J.J., Dieterle, D.A., Darrow, B.P., Milroy, S. P., Jolliff, J.K., Lenes, J.M., Weisberg,

R.H., & He, R. (2004). "Coupled biophysical models of Florida red tides." In:

Harmful Algae 2002. Eds. K. A. Steidinger, J. H. Landsberg, C. R. Tomas, G. A.

Vargo. Florida Fish and Wildlife Conservation Commission, Florida Institute of

Oceanography, and Intergovernmental Oceanographic Commission of UNESCO.

Walsh, J. J., J. K. Jolliff, et al. (2006). "Red tides in the Gulf of Mexico: Where, when, and

why?" J. Geophys. Res. 111(C11): C11003.

Weissman, A. M. (2001). "Themes and variations on ubiquitylation." Nat Rev Mol Cell Biol

2(3): 169-178.

Windecker, H. & H. D. Ulrich (2008). "Architecture and Assembly of Poly-SUMO Chains on

PCNA in Saccharomyces cerevisiae." Journal of Molecular Biology 376(1): 221-231.

Wijnen, H. & Young, M.W. (2006) "Interplay of circadian clocks and metabolic rhythms."

Annual Review of Genetics 40: 409-448.

Zachleder, V. and H. V. D. Ende (1992). "Cell cycle events in the green alga Chlamydomonas

eugametos and their control by environmental factors." J Cell Sci 102(3): 469-474.

Zhang, H., Hou, Y., & Lin, S. (2006). "Isolation and characterization of proliferating cell

nuclear antigen from the dinoflagellate Pfiesteria piscicida." Journal of Eukaryotic

Microbiology 53: 1-9.

Zhang, H., Hou, Y., Miranda, L., Campbell, D. A., Sturm, N. R., Gaasterland, T., & Lin, S.

(2007). "Spliced leader RNA trans-splicing in dinoflagellates." Proceedings of the

National Academy of Sciences USA 104(11): 4618-4623.

Appendix 1: Cell cycle genes present in K. brevis. Contigs with homology to S-phase genes are shaded. Contigs in bold are S-phase genes investigated in this study. Marine Genomics ID is shown and sequences can be found at www.marinegenomics.org.

Marine Genomics ID | Sequence Description | Sequence Length | #Hits | Minimum e-value | Mean Similarity | #GOs | Gene Ontologies
MGID1986592 | aaa atpase | 737 | 10 | 1.63E-62 | 71.90% | 7 | P:mitosis; C:spindle; C:centrosome; F:ATP binding; F:-severing ATPase activity; P:cell division; C:microtubule
MGID103202 | aaa atpase | 571 | 10 | 3.39E-17 | 54.50% | 8 | C:cytoplasm; C: complex; F:protein binding; P:mitosis; P:nervous system development; F:motor activity; C:spindle pole; C:microtubule
MGID2059523 | atp-dependent dna ligase | 1700 | 10 | 2.27E-05 | 45.90% | 8 | F:DNA ligase (ATP) activity; P:cell cycle; P:DNA repair; F:ATP binding; F:DNA binding; P:cell division; P:DNA replication; P:DNA recombination
MGID2033955 | budding uninhibited by benzimidazoles 1 beta | 969 | 10 | 1.28E-16 | 53.60% | 11 | P:apoptosis; F:protein serine/threonine kinase activity; P:cell division; F:ATP binding; P:mitosis; C:outer kinetochore of condensed chromosome; P:negative regulation of cell cycle; C:cytoplasm; C:nucleus; P:protein amino acid phosphorylation; P:serine family amino acid metabolic process
MGID1961650 | budding uninhibited by benzimidazoles 3 homolog | 752 | 3 | 1.88E-05 | 46.67% | 0 |
MGID2079501 | caltractin ame: full=centrin | 1295 | 10 | 1.37E-09 | 63.10% | 6 | P:ciliary or flagellar motility; P:mitosis; F:calcium ion binding; C:centrosome; C:bacterial-type flagellum; P:cell division
MGID1944632 | caltractin ame: full=centrin | 438 | 10 | 2.59E-31 | 79.30% | 5 | P:ciliary or flagellar motility; P:mitosis; C:bacterial-type flagellum; F:calcium ion binding; P:cell division
MGID1950142 | caltractin ame: full=centrin | 381 | 10 | 6.95E-16 | 62.00% | 5 | P:ciliary or flagellar motility; P:mitosis; C:bacterial-type flagellum; F:calcium ion binding; P:cell division
MGID1986476 | cdc23 (cell division cycle homolog) | 696 | 10 | 1.72E-15 | 53.80% | 12 | C:cytosol; F:binding; P:negative regulation of ubiquitin-protein ligase activity during mitotic cell cycle; P:mitotic metaphase plate congression; P:cell division; P:anaphase-promoting complex-dependent proteasomal ubiquitin-dependent protein catabolic process; P:G1 phase of mitotic cell cycle; P:positive regulation of ubiquitin-protein ligase activity during mitotic cell cycle; P:regulation of exit from mitosis; P:regulation of mitotic metaphase/anaphase transition; C:nucleoplasm; C:anaphase-promoting complex
MGID2025806 | cdc27 nuc2-like protein | 1010 | 10 | 7.28E-18 | 52.60% | 1 | F:binding
MGID2063280 | cdc2-like protein | 460 | 10 | 3.62E-41 | 71.10% | 4 | P:protein amino acid phosphorylation; F:ATP binding; F:protein serine/threonine kinase activity; P:serine family amino acid metabolic process
MGID1984286 | cdc-like kinase 2 | 769 | 8 | 1.23E-15 | 55.25% | 7 | P:peptidyl-tyrosine phosphorylation; F:ATP binding; F:protein tyrosine kinase activity; C:nucleus; P:protein amino acid autophosphorylation; F:protein serine/threonine kinase activity; P:serine family amino acid metabolic process
MGID2031566 | cell cycle related kinase | 1453 | 10 | 1.75E-56 | 53.90% | 8 | P:protein amino acid phosphorylation; P:cell cycle; F:ATP binding; C:nucleus; F:cyclin-dependent protein kinase activity; P:cell division; ; P:serine family amino acid metabolic process
MGID2079185 | cell division | 908 | 10 | 1.11E-27 | 55.70% | 2 | P:cell cycle; F:binding
MGID1950185 | cell division control protein | 696 | 4 | 8.03E-05 | 43.25% | 5 | C:cytoplasm; F:exonuclease activity; F:nucleic acid binding; P:rRNA processing; C:nucleus
MGID1944473 | cell division control protein 2 | 512 | 10 | 2.04E-32 | 73.40% | 14 | F:RNA polymerase II carboxy-terminal domain kinase activity; C:midbody; C:spindle microtubule; P:mitotic cell cycle G2/M transition DNA damage checkpoint; P:cell division; P:anti-apoptosis; F:ATP binding; P:mitosis; F:cyclin-dependent protein kinase activity; C:nucleus; P:protein amino acid phosphorylation; F:Hsp70 protein binding; P:serine family amino acid metabolic process;
MGID2062699 | cell division cycle protein 23 | 747 | 10 | 2.27E-27 | 59.50% | 3 | P:regulation of mitotic metaphase/anaphase transition; F:binding; C:anaphase-promoting complex
MGID106856 | cell division cycle protein 48 | 594 | 10 | 1.13E-64 | 70.70% | 7 | F:nucleoside-triphosphatase activity; F:ATP-dependent peptidase activity; F:ATP binding; P:proteolysis; F:serine-type endopeptidase activity; P:cell division; P:ATP-dependent proteolysis

MGID2053020 | cell division protein | 758 | 7 | 2.50E-05 | 47.29% | 0 |
MGID2075292 | cell division protein | 880 | 10 | 3.40E-113 | 81.40% | 11 | F:identical protein binding; F:serine-type endopeptidase activity; P:response to cadmium ion; C:plasma membrane; C:cell wall; F:ATP binding; F:nucleoside-triphosphatase activity; P:proteolysis; F:ATP-dependent peptidase activity; P:cell division; P:ATP-dependent proteolysis
MGID1988475 | cell division transporter substrate-binding protein | 796 | 10 | 7.53E-11 | 44.10% | 2 | P:biological_process; C:cellular_component
MGID1942943 | centrin | 304 | 10 | 8.53E-13 | 79.10% | 3 | P:ciliary or flagellar motility; C:bacterial-type flagellum; F:calcium ion binding
MGID1952319 | centrin | 500 | 10 | 6.84E-14 | 91.80% | 7 | F:ATP-dependent helicase activity; F:nucleic acid binding; P:mitosis; F:calcium ion binding; C:centrosome; F:ATP binding; P:cell division
MGID1950294 | centrin | 647 | 10 | 1.28E-51 | 80.00% | 5 | P:ciliary or flagellar motility; P:mitosis; C:bacterial-type flagellum; F:calcium ion binding; P:cell division
MGID1939358 | centrin | 410 | 10 | 1.14E-10 | 80.20% | 7 | F:ATP-dependent helicase activity; F:nucleic acid binding; P:mitosis; F:calcium ion binding; C:centrosome; F:ATP binding; P:cell division
MGID2035389 | centrin | 1074 | 10 | 5.13E-57 | 82.90% | 6 | F:ATP-dependent helicase activity; P:ciliary or flagellar motility; F:nucleic acid binding; F:calcium ion binding; F:ATP binding; C:bacterial-type flagellum
MGID2035650 | centrin | 769 | 10 | 1.39E-75 | 87.60% | 5 | P:ciliary or flagellar motility; P:mitosis; C:bacterial-type flagellum; F:calcium ion binding; P:cell division
MGID2035920 | centrin | 672 | 10 | 1.06E-75 | 87.60% | 5 | P:ciliary or flagellar motility; P:mitosis; C:bacterial-type flagellum; F:calcium ion binding; P:cell division
MGID2039572 | centrin | 764 | 10 | 1.37E-75 | 87.60% | 5 | P:ciliary or flagellar motility; P:mitosis; C:bacterial-type flagellum; F:calcium ion binding; P:cell division
MGID2039826 | centrin | 826 | 10 | 1.59E-75 | 87.60% | 5 | P:ciliary or flagellar motility; P:mitosis; C:bacterial-type flagellum; F:calcium ion binding; P:cell division
MGID2040229 | centrin | 774 | 10 | 2.85E-76 | 88.20% | 5 | P:ciliary or flagellar motility; P:mitosis; C:bacterial-type flagellum; F:calcium ion binding; P:cell division
MGID2021179 | centrin | 815 | 10 | 1.54E-75 | 87.60% | 5 | P:ciliary or flagellar motility; P:mitosis; C:bacterial-type flagellum; F:calcium ion binding; P:cell division
MGID2043662 | centrin | 768 | 10 | 1.39E-75 | 87.60% | 5 | P:ciliary or flagellar motility; P:mitosis; C:bacterial-type flagellum; F:calcium ion binding; P:cell division
MGID2045393 | centrin | 794 | 10 | 1.48E-75 | 87.60% | 5 | P:ciliary or flagellar motility; P:mitosis; C:bacterial-type flagellum; F:calcium ion binding; P:cell division
MGID2046404 | centrin | 770 | 10 | 6.24E-76 | 88.20% | 5 | P:ciliary or flagellar motility; P:mitosis; C:bacterial-type flagellum; F:calcium ion binding; P:cell division
MGID2047451 | centrin | 792 | 10 | 1.48E-75 | 87.60% | 5 | P:ciliary or flagellar motility; P:mitosis; C:bacterial-type flagellum; F:calcium ion binding; P:cell division
MGID2053478 | centrin | 602 | 10 | 2.11E-31 | 81.20% | 7 | F:ATP-dependent helicase activity; F:nucleic acid binding; P:mitosis; F:calcium ion binding; C:centrosome; F:ATP binding; P:cell division
MGID2055568 | centrin | 817 | 10 | 1.56E-75 | 87.60% | 5 | P:ciliary or flagellar motility; P:mitosis; C:bacterial-type flagellum; F:calcium ion binding; P:cell division
MGID2057811 | centrin | 787 | 10 | 2.94E-76 | 88.20% | 5 | P:ciliary or flagellar motility; P:mitosis; C:bacterial-type flagellum; F:calcium ion binding; P:cell division
MGID2072941 | centrin | 789 | 10 | 1.47E-75 | 87.60% | 5 | P:ciliary or flagellar motility; P:mitosis; C:bacterial-type flagellum; F:calcium ion binding; P:cell division
MGID2033952 | centromere protein zw10 | 1370 | 10 | 7.88E-19 | 44.90% | 3 | C:chromosome, centromeric region; P:mitosis; C:nucleus
MGID106806 | centrosomal protein | 860 | 1 | 5.44E-05 | 46.00% | 0 |
MGID2036540 | centrosomal protein 110 | 847 | 10 | 5.65E-07 | 50.90% | 3 | C:centrosome; F:protein binding; P:cell division
MGID1991453 | centrosomal protein 164kda isoform 4 | 890 | 4 | 3.70E-04 | 48.00% | 0 |
MGID1985334 | centrosomal protein 2 isoform 1 isoform 2 | 739 | 10 | 1.27E-06 | 44.90% | 2 | C:; P:cell cycle
MGID1986490 | centrosomal protein 250 | 642 | 10 | 5.21E-05 | 43.10% | 2 | C:cilium; P:cell cycle
MGID2020556 | centrosomal protein 290kda | 1075 | 1 | 5.04E-04 | 48.00% | 0 |

MGID2042570 | centrosomal protein 290kda | 1247 | 10 | 8.70E-06 | 43.00% | 18 | P:vesicle-mediated transport; C:cytosol; P:cell projection organization; F:microtubule minus-end binding; P:eye photoreceptor cell development; C:cell surface; C:centrosome; P:hindbrain development; F:transcription activator activity; C:photoreceptor connecting cilium; P:otic vesicle formation; C:gamma-tubulin complex; P:pronephros development; P:protein transport; C:membrane; C:nucleus; C:tubulin complex; P:positive regulation of transcription
MGID1952149 | chromodomain protein (hp1-like) | 432 | 10 | 1.00E-06 | 65.90% | 11 | C:chromosome, centromeric region; F:zinc ion binding; P:cell cycle; P:regulation of transcription, DNA-dependent; P:chromatin assembly or disassembly; F:histone methyltransferase activity (H3-K9 specific); P:histone H3-K9 methylation; C:nuclear heterochromatin; P:cell differentiation; F:chromatin binding; P:chromatin remodeling
MGID1965108 | ciliary rootlet coiled- rootletin | 815 | 10 | 2.35E-07 | 40.90% | 8 | F:kinesin binding; P:centrosome organization; P:cell cycle; C:ciliary rootlet; P:cell projection organization; C:centriole; F:structural molecule activity; C:kinesin complex
MGID108857 | cyclin dependent kinase | 804 | 10 | 1.52E-51 | 62.10% | 9 | P:protein amino acid phosphorylation; P:positive regulation of cell proliferation; P:multicellular organismal development; P:regulation of mitosis; F:ATP binding; F:cyclin-dependent protein kinase activity; P:cell division; ; P:serine family amino acid metabolic process
MGID1990865 | cyclin domain protein f-box protein | 717 | 6 | 4.26E-08 | 54.50% | 0 |
MGID2032522 | cyclin domain protein f-box protein | 1507 | 10 | 5.08E-14 | 47.70% | 2 | F:protein binding; F:calcium ion binding
MGID2022329 | cyclin fold protein 1 isoform 2 | 430 | 3 | 1.00E-12 | 58.00% | 1 | C:nucleus
MGID2031949 | cyclin-dependent kinase 1 p34cdc2 | 1332 | 10 | 1.87E-09 | 45.70% | 4 | P:protein amino acid phosphorylation; F:ATP binding; F:protein serine/threonine kinase activity; P:serine family amino acid metabolic process
MGID2033180 | cyclin-dependent kinase-related kinase | 1309 | 10 | 6.16E-66 | 61.00% | 5 | P:protein amino acid phosphorylation; F:ATP binding; F:protein serine/threonine kinase activity; F:protein tyrosine kinase activity; P:serine family amino acid metabolic process
MGID2021802 | helicase rad25 family protein | 1249 | 10 | 2.39E-96 | 68.80% | 6 | F:DNA binding; F:ATP-dependent DNA helicase activity; F:ATP binding; P:nucleotide-excision repair; C:nucleus; C:replication fork
MGID2045460 | dna ligase | 1198 | 10 | 6.88E-61 | 56.10% | 6 | P:DNA replication; F:ATP binding; F:zinc ion binding; F:DNA ligase (ATP) activity; P:DNA recombination; P:DNA repair
MGID2021309 | dna ligase iii-like protein | 956 | 10 | 6.38E-37 | 54.40% | 9 | F:protein tyrosine phosphatase activity; P:DNA replication; P:protein amino acid dephosphorylation; F:ATP binding; F:nucleoside-triphosphatase activity; F:DNA ligase (ATP) activity; P:DNA recombination; P:DNA repair; P:tyrosine metabolic process
MGID1960453 | DNA ligase IV | 809 | 10 | 1.76E-23 | 56.90% | 33 | C:focal adhesion; P:positive regulation of neurogenesis; P:response to X-ray; P:response to gamma radiation; F:protein C-terminus binding; P:central nervous system development; P:single strand break repair; P:chromosome organization; P:immunoglobulin V(D)J recombination; P:double-strand break repair via nonhomologous end joining; P:nucleotide-excision repair, DNA gap filling; P:cell division; C:cytoplasm; P:somatic stem cell maintenance; P:negative regulation of neuron apoptosis; C:DNA-dependent protein kinase-DNA ligase 4 complex; P:DNA ligation involved in DNA repair; F:DNA binding; C:condensed chromosome; P:T cell differentiation in the thymus; P:provirus integration; P:DNA ligation involved in DNA recombination; P:T cell receptor V(D)J recombination; P:positive regulation of fibroblast proliferation; P:pro-B cell differentiation; F:DNA ligase (ATP) activity; C:DNA ligase IV complex; P:initiation of viral infection; F:ATP binding; P:in utero embryonic development; P:DNA replication; P:cell cycle; P:isotype switching
MGID2073646 | DNA ligase-like | 655 | 6 | 1.00E-17 | 48% | 3 | F:ligase activity; F:methyltransferase activity; C:plastid

MGID107892 | dna polymerase epsilon subunit | 803 | 10 | 8.10E-13 | 45.80% | 10 | F:DNA-directed DNA polymerase activity; P:DNA replication; F:exonuclease activity; F:DNA binding; F:two-component sensor activity; C:membrane; C:DNA polymerase complex; P:two-component signal transduction system (phosphorelay); C:protein histidine kinase complex; P:phosphorylation
MGID1970469 | dna polymerase iii (subunit gamma tau) | 744 | 1 | 2.05E-04 | 40.00% | 7 | F:DNA-directed DNA polymerase activity; P:DNA replication; F:DNA binding; C:DNA polymerase III complex; F:ATP binding; F:nucleoside-triphosphatase activity; C:nucleus
MGID1971927 | dna polymerase kappa | 793 | 10 | 5.89E-32 | 59.20% | 1 | P:DNA repair
MGID1993226 | dna polymerase lambda (pol lambda) | 828 | 5 | 3.50E-06 | 49.60% | 7 | F:N-acetyltransferase activity; F:DNA binding; C:intracellular; F:aspartic-type endopeptidase activity; F:zinc ion binding; P:DNA integration; P:acyl-carrier-protein biosynthetic process
MGID2053352 | dna replication complex gins protein psf1 | 850 | 10 | 9.65E-23 | 54.70% | 1 | C:integral to membrane
MGID1984213 | dna replication licensing factor mcm6 | 852 | 10 | 1.18E-44 | 57.40% | 4 | F:ATP binding; C:nucleus; F:DNA binding; P:DNA replication initiation
MGID2054360 | dna replication licensing factor mcm6 | 1816 | 10 | 4.97E-98 | 60.60% | 14 | P:regulation of DNA replication initiation; P:cytokinesis; F:zinc ion binding; P:embryonic development ending in birth or egg hatching; P:cell cycle; P:regulation of transcription, DNA-dependent; P:DNA unwinding involved in replication; C:MCM complex; P:reproduction; F:DNA binding; F:ATP binding; C:chromatin; F:ATPase activity; F:chromatin binding
MGID2060973 | dna replication licensing factor mcm7 | 578 | 10 | 1.63E-06 | 57.00% | 5 | P:DNA replication initiation; F:DNA binding; F:ATP binding; F:nucleoside-triphosphatase activity; C:nucleus
MGID1948832 | dna segregation atpase related protein | 459 | 3 | 4.17E-05 | 50.33% | 7 | F:nucleoside-triphosphatase activity; P:cell cycle; C:integral to membrane; F:ATP binding; F:DNA binding; P:cell division; P:chromosome segregation
MGID2067895 | dna segregation atpase related protein | 480 | 4 | 4.18E-05 | 51.50% | 7 | F:nucleoside-triphosphatase activity; P:cell cycle; C:integral to membrane; F:ATP binding; F:DNA binding; P:cell division; P:chromosome segregation
MGID1960293 | dph1 homolog | 786 | 10 | 2.59E-48 | 69.70% | 5 | C:cytoplasm; P:negative regulation of cell cycle; C:nucleus; F:protein binding; P:cell proliferation
MGID1964053 | ef-hand 1 | 613 | 10 | 1.31E-31 | 80.80% | 9 | P:ciliary or flagellar motility; F:ATP-dependent helicase activity; F:nucleic acid binding; P:mitosis; F:calcium ion binding; C:centrosome; F:ATP binding; C:bacterial-type flagellum; P:cell division
MGID2059101 | kinesin-related eg5 1 | 1523 | 10 | 2.85E-47 | 54.30% | 8 | C:cytoplasm; P:microtubule-based movement; C:microtubule associated complex; P:mitosis; F:ATP binding; P:cell division; F:microtubule motor activity; C:microtubule
MGID1947619 | flap endonuclease 1 | 541 | 10 | 2.47E-08 | 60.70% | 6 | F:sequence-specific DNA binding; C:nucleus; F:flap endonuclease activity; F:5'-3' exonuclease activity; P:DNA repair; P:DNA catabolic process
MGID1947939 | gtp-binding protein | 358 | 10 | 1.03E-19 | 57.40% | 4 | P:cell cycle; C:intracellular; P:barrier septum formation; F:GTP binding
MGID2073891 | importin alpha | 1766 | 10 | 1.62E-38 | 46.80% | 10 | F:protein transporter activity; F:protein binding, bridging; P:chromosome condensation; C:nuclear pore; P:M phase of mitotic cell cycle; P:NLS-bearing substrate import into nucleus; C:cytoplasm; P:response to stress; F:GTP binding; P:regulation of mitotic cell cycle
MGID2032753 | katanin p80 (wd repeat containing) subunit b 1 | 813 | 10 | 4.40E-22 | 54.90% | 5 | P:mitosis; C:spindle; C:centrosome; C:microtubule; P:cell division
MGID2032756 | katanin p80 subunit | 785 | 10 | 5.06E-12 | 49.50% | 12 | P:negative regulation of microtubule depolymerization; P:positive regulation of microtubule depolymerization; C:axon; C:microtubule; C:spindle pole; C:centrosome; P:cell division; F:microtubule-severing ATPase activity; F:dynein binding; P:mitosis; F:protein heterodimerization activity; C:dynein complex
MGID1974388 | kinesin family member 15 | 764 | 10 | 7.87E-15 | 57.90% | 10 | P:microtubule-based movement; P:cell proliferation; F:DNA binding; P:mitosis; C:spindle; C:plus-end kinesin complex; C:centrosome; F:ATP binding; F:microtubule motor activity; C:microtubule
MGID2021441 | mediator of dna damage checkpoint 1 | 840 | 10 | 3.60E-14 | 54.20% | 6 | C:cytoskeleton; C:nucleoplasm; F:protein binding; C:focal adhesion; P:cell cycle; P:DNA repair

MGID2031034 | minichromosome maintenance complex component 10 | 833 | 10 | 1.22E-06 | 43.50% | 3 | C:nucleus; F:zinc ion binding; P:DNA replication
MGID2036751 | minichromosome maintenance complex component 4 | 1312 | 10 | 5.63E-43 | 57.20% | 11 | F:DNA helicase activity; F:nucleoside-triphosphatase activity; C:nucleolus; P:regulation of transcription, DNA-dependent; P:DNA unwinding involved in replication; F:ATP binding; F:protein binding; P:DNA replication initiation; F:single-stranded DNA binding; C:nucleoplasm; C:replication fork
MGID2036754 | minichromosome maintenance complex component 4 | 823 | 10 | 3.44E-30 | 57.90% | 11 | F:DNA helicase activity; F:nucleoside-triphosphatase activity; C:nucleolus; P:regulation of transcription, DNA-dependent; P:DNA unwinding involved in replication; F:ATP binding; F:protein binding; P:DNA replication initiation; F:single-stranded DNA binding; C:nucleoplasm; C:replication fork
MGID2058230 | minichromosome maintenance protein 3 | 597 | | 3.00E-07 | | 0 |
MGID1973770 | mitotic checkpoint protein | 751 | 10 | 5.99E-44 | 67.00% | 1 | F:protein binding
MGID1944670 | mitotic checkpoint protein bub3 | 449 | 10 | 3.35E-18 | 57.90% | 2 | P:biological_process; C:CUL4 RING ubiquitin ligase complex
MGID107271 | proliferating cell nuclear antigen | 936 | 10 | 1.11E-134 | 94.50% | 6 | F:DNA polymerase processivity factor activity; F:DNA binding; P:regulation of DNA replication; C:PCNA complex; C:nucleus; C:DNA polymerase complex
MGID1938210 | proliferating cell nuclear antigen | 702 | 10 | 2.54E-123 | 95.10% | 6 | F:DNA polymerase processivity factor activity; F:DNA binding; P:regulation of DNA replication; C:PCNA complex; C:nucleus; C:DNA polymerase complex
MGID1942569 | proliferating cell nuclear antigen | 834 | 10 | 6.37E-21 | 89.70% | 6 | F:DNA polymerase processivity factor activity; F:DNA binding; P:regulation of DNA replication; C:PCNA complex; C:nucleus; C:DNA polymerase complex
MGID1942792 | proliferating cell nuclear antigen | 779 | 10 | 1.44E-43 | 85.20% | 6 | F:DNA polymerase processivity factor activity; F:DNA binding; P:regulation of DNA replication; C:PCNA complex; C:nucleus; C:DNA polymerase complex
MGID1950082 | proliferating cell nuclear antigen | 660 | 10 | 3.20E-52 | 76.30% | 6 | F:DNA polymerase processivity factor activity; F:DNA binding; P:regulation of DNA replication; C:PCNA complex; C:nucleus; C:DNA polymerase complex
MGID1950735 | proliferating cell nuclear antigen | 731 | 10 | 1.94E-10 | 54.40% | 6 | F:DNA polymerase processivity factor activity; F:DNA binding; P:regulation of DNA replication; C:PCNA complex; C:nucleus; C:DNA polymerase complex
MGID1959915 | proliferating cell nuclear antigen | 596 | 10 | 1.66E-73 | 83.00% | 6 | F:DNA polymerase processivity factor activity; F:DNA binding; P:regulation of DNA replication; C:PCNA complex; C:nucleus; C:DNA polymerase complex
MGID1964521 | proliferating cell nuclear antigen | 793 | 10 | 5.14E-137 | 95.50% | 6 | F:DNA polymerase processivity factor activity; F:DNA binding; P:regulation of DNA replication; C:PCNA complex; C:nucleus; C:DNA polymerase complex
MGID1964522 | proliferating cell nuclear antigen | 827 | 10 | 2.18E-117 | 94.20% | 6 | F:DNA polymerase processivity factor activity; F:DNA binding; P:regulation of DNA replication; C:PCNA complex; C:nucleus; C:DNA polymerase complex
MGID1994890 | proliferating cell nuclear antigen | 789 | 10 | 9.39E-115 | 94.50% | 6 | F:DNA polymerase processivity factor activity; F:DNA binding; P:regulation of DNA replication; C:PCNA complex; C:nucleus; C:DNA polymerase complex
MGID2019375 | proliferating cell nuclear antigen | 78> | 10 | 5.38E-131 | 94.70% | 6 | F:DNA polymerase processivity factor activity; F:DNA binding; P:regulation of DNA replication; C:PCNA complex; C:nucleus; C:DNA polymerase complex
MGID2027046 | proliferating cell nuclear antigen | 994 | 10 | 2.28E-141 | 94.90% | 6 | F:DNA polymerase processivity factor activity; F:DNA binding; P:regulation of DNA replication; C:PCNA complex; C:nucleus; C:DNA polymerase complex
MGID2047621 | proliferating cell nuclear antigen | 992 | 10 | 5.05E-141 | 95.50% | 6 | F:DNA polymerase processivity factor activity; F:DNA binding; P:regulation of DNA replication; C:PCNA complex; C:nucleus; C:DNA polymerase complex
MGID2052699 | proliferating cell nuclear antigen | 901 | 10 | 1.22E-135 | 95.40% | 6 | F:DNA polymerase processivity factor activity; F:DNA binding; P:regulation of DNA replication; C:PCNA complex; C:nucleus; C:DNA polymerase complex
MGID2057215 | proliferating cell nuclear antigen | 945 | 10 | 9.41E-142 | 95.10% | 6 | F:DNA polymerase processivity factor activity; F:DNA binding; P:regulation of DNA replication; C:PCNA complex; C:nucleus; C:DNA polymerase complex

MGID2060175 | proliferating cell nuclear antigen | 440 | 10 | 1.16E-47 | 94.30% | 6 | F:DNA polymerase processivity factor activity; F:DNA binding; P:regulation of DNA replication; C:PCNA complex; C:nucleus; C:DNA polymerase complex
MGID2063631 | proliferating cell nuclear antigen | 669 | 10 | 6.10E-116 | 95.00% | 6 | F:DNA polymerase processivity factor activity; F:DNA binding; P:regulation of DNA replication; C:PCNA complex; C:nucleus; C:DNA polymerase complex
MGID2063966 | proliferating cell nuclear antigen | 1921 | 10 | 1.13E-141 | 95.10% | 6 | F:DNA polymerase processivity factor activity; F:DNA binding; P:regulation of DNA replication; C:PCNA complex; C:nucleus; C:DNA polymerase complex
MGID2069044 | proliferating cell nuclear antigen | 626 | 10 | 1.65E-45 | 92.30% | 6 | F:DNA polymerase processivity factor activity; F:DNA binding; P:regulation of DNA replication; C:PCNA complex; C:nucleus; C:DNA polymerase complex
MGID2077091 | proliferating cell nuclear antigen | 1008 | 10 | 2.43E-130 | 90.50% | 6 | F:DNA polymerase processivity factor activity; F:DNA binding; P:regulation of DNA replication; C:PCNA complex; C:nucleus; C:DNA polymerase complex
MGID2075819 | protein phosphatase 2 catalytic alpha isoform | 1392 | 10 | 8.58E-138 | 84.70% | 28 | F:manganese ion binding; P:regulation of DNA replication; F:protein C-terminus binding; P:ceramide metabolic process; C:protein phosphatase type 2A complex; P:response to organic substance; C:microtubule cytoskeleton; P:negative regulation of tyrosine phosphorylation of Stat3 protein; P:regulation of cell adhesion; P:mesoderm development; C:cytosol; F:protein heterodimerization activity; P:regulation of Wnt receptor signaling pathway; P:RNA splicing; P:regulation of cell cycle; F:iron ion binding; P:induction of apoptosis; P:negative regulation of cell growth; P:inactivation of MAPK activity; P:regulation of cell differentiation; P:protein amino acid dephosphorylation; P:regulation of transcription; P:second-messenger mediated signaling; C:plasma membrane; C:soluble fraction; C:nucleus; F:protein serine/threonine phosphatase activity; C:mitochondrion
MGID2081157 | rad17 homolog | 1067 | 10 | 3.11E-14 | 42.30% | 5 | P:cell cycle; F:ATP binding; F:nucleoside-triphosphatase activity; C:nucleus; P:DNA repair
MGID2056083 | replication factor c (activator 1) | 1298 | 10 | 8.76E-81 | 66.30% | 8 | P:DNA replication; P:nucleotide-excision repair, DNA gap filling; C:nucleoplasm; F:DNA clamp loader activity; C:DNA replication factor C complex; F:ATP binding; F:enzyme binding; C:DNA polymerase complex
MGID106618 | replication factor c (activator 1) 38kda | 864 | 10 | 7.34E-50 | 64.20% | 7 | P:DNA replication; F:ATPase activity; F:DNA binding; C:DNA replication factor C complex; F:protein binding; F:ATP binding; C:nucleus
MGID1942001 | replication factor c 36 kda subunit | 498 | 10 | 1.16E-21 | 61.70% | 6 | P:DNA replication; F:DNA clamp loader activity; C:DNA replication factor C complex; F:ATP binding; C:nucleus; C:DNA polymerase complex
MGID2026854 | replication protein 70kda | 1690 | 10 | 2.80E-48 | 50.30% | 15 | P:nucleotide-excision repair, DNA damage removal; P:DNA recombination; P:; F:zinc ion binding; C:cytoskeleton; P:DNA-dependent DNA replication; P:nucleotide-excision repair, DNA gap filling; C:PML body; F:protein binding; C:condensed chromosome; F:single-stranded DNA binding; C:cytoplasm; F:chromatin binding; C:DNA replication factor A complex; C:chromatin
MGID1957189 | retinoblastoma binding protein 4 | 446 | 10 | 6.81E-05 | 49.70% | 5 | P:DNA replication; P:chromatin modification; P:cell cycle; C:nucleus; P:regulation of transcription, DNA-dependent
MGID1964781 | ribonucleoside-diphosphate alpha subunit | 841 | 10 | 2.44E-95 | 71.80% | 14 | C:cytosol; P:purine nucleoside metabolic process; P:S phase of mitotic cell cycle; F:ribonucleoside-diphosphate reductase activity; P:oxidation reduction; F:ATP binding; P:purine deoxyribonucleoside triphosphate biosynthetic process; F:protein binding; P:pyrimidine deoxyribonucleoside triphosphate biosynthetic process; C:ribonucleoside-diphosphate reductase complex; P:regulation of DNA replication; P:purine base metabolic process; P:pyrimidine base metabolic process; P:deoxyribonucleoside diphosphate metabolic process

MGID2022197 | ribonucleoside-diphosphate alpha subunit | 1153 | 10 | 1.56E-147 | 82.50% | 9 | C:ribonucleoside-diphosphate reductase complex; P:DNA replication; F:protein binding; P:oxidation reduction; F:ATP binding; F:ribonucleoside-diphosphate reductase activity; P:purine base metabolic process; P:pyrimidine base metabolic process; P:deoxyribonucleoside diphosphate metabolic process
MGID2033417 | ribonucleoside-diphosphate alpha subunit | 807 | 10 | 1.51E-91 | 78.80% | 8 | P:oxidation reduction; C:ribonucleoside-diphosphate reductase complex; F:protein binding; F:ribonucleoside-diphosphate reductase activity; P:DNA replication; P:purine base metabolic process; P:pyrimidine base metabolic process; P:deoxyribonucleoside diphosphate metabolic process
MGID2033632 | ribonucleoside-diphosphate alpha subunit | 821 | 10 | 6.99E-92 | 79.10% | 8 | P:oxidation reduction; C:ribonucleoside-diphosphate reductase complex; F:protein binding; F:ribonucleoside-diphosphate reductase activity; P:DNA replication; P:purine base metabolic process; P:pyrimidine base metabolic process; P:deoxyribonucleoside diphosphate metabolic process
MGID2044150 | ribonucleoside-diphosphate alpha subunit | 2782 | 10 | 0 | 80.60% | 9 | C:ribonucleoside-diphosphate reductase complex; P:DNA replication; F:protein binding; P:oxidation reduction; F:ATP binding; F:ribonucleoside-diphosphate reductase activity; P:purine base metabolic process; P:pyrimidine base metabolic process; P:deoxyribonucleoside diphosphate metabolic process
MGID1954849 | ribonucleoside-diphosphate large | 161 | 10 | 9.46E-13 | 84.80% | 9 | C:ribonucleoside-diphosphate reductase complex; P:DNA replication; F:protein binding; P:oxidation reduction; F:ATP binding; F:ribonucleoside-diphosphate reductase activity; P:purine base metabolic process; P:pyrimidine base metabolic process; P:deoxyribonucleoside diphosphate metabolic process
MGID2030200 | ribonucleoside-diphosphate large | 837 | 10 | 3.41E-65 | 85.20% | 9 | C:ribonucleoside-diphosphate reductase complex; P:DNA replication; F:protein binding; P:oxidation reduction; F:ATP binding; F:ribonucleoside-diphosphate reductase activity; P:purine base metabolic process; P:pyrimidine base metabolic process; P:deoxyribonucleoside diphosphate metabolic process
MGID2046113 | ribonucleoside-diphosphate large | 892 | 10 | 8.44E-65 | 86.50% | 9 | C:ribonucleoside-diphosphate reductase complex; P:DNA replication; F:protein binding; P:oxidation reduction; F:ATP binding; F:ribonucleoside-diphosphate reductase activity; P:purine base metabolic process; P:pyrimidine base metabolic process; P:deoxyribonucleoside diphosphate metabolic process
MGID2063679 | ribonucleoside-diphosphate large | 494 | 10 | 1.11E-21 | 83.10% | 9 | C:ribonucleoside-diphosphate reductase complex; P:DNA replication; F:protein binding; P:growth or development of symbiont on or near host; P:oxidation reduction; F:ribonucleoside-diphosphate reductase activity; P:purine base metabolic process; P:pyrimidine base metabolic process; P:deoxyribonucleoside diphosphate metabolic process
MGID2057891 | ribonucleoside-diphosphate reductase large chain | 1n4 | 10 | 1.17E-161 | 79.70% | 9 | C:ribonucleoside-diphosphate reductase complex; P:DNA replication;
MGID2056546 | ribonucleotide-diphosphate small | 1399 | 10 | 8.38E-125 | 80.90% | 8 | P:oxidation reduction; F:iron ion binding; F:ribonucleoside-
MGID105468 | similar to Plasmodium falciparum. DNA helicase | 861 | 10 | 8.39E-30 | 63.30% | 7 | P:chromatin modification; F:DNA-dependent ATPase activity; F:helicase activity; F:DNA binding; F:ATP binding; C:nucleus; F:zinc ion binding
MGID108507 | snf2 family dna-dependent atpase | 588 | 10 | 8.11E-25 | 62.00% | 8 | P:intein-mediated protein splicing; F:helicase activity; F:DNA binding;
MGID2051432 | s-phase kinase-associated | 1309 | 10 | 7.57E-48 | 69.60% | 2 | P:ubiquitin-dependent protein catabolic process; F:protein binding
MGID2026304 | timeless interacting protein | 1486 | 10 | 5.34E-08 | 45.40% | 10 | P:regulation of DNA replication during S phase; C:nuclear chromatin; P:DNA replication checkpoint; P:cell division; P:replication fork protection; P:intra-S DNA damage checkpoint; F:protein binding; P:mitosis; P:positive regulation of cell proliferation; C:cytoplasm

Appendix 2: Genes involved in post-translational modifications, including the ubiquitination and sumoylation pathways, present in K. brevis. The contigs with homology to ubiquitin and SUMO polypeptide genes are shaded. The Marine Genomics ID is shown for each contig, and sequences can be found at www.marinegenomics.org.

Marine Genomics ID | Sequence Description | Sequence Length | #Hits | Minimum eValue | Mean Similarity | #GOs | Gene Ontologies
MGID2030002 | 26s protease regulatory subunit 8 | 1418 | 10 | 0 | 88.10% | 6 | C:protein complex; P:protein catabolic process; F:ATP binding; F:nucleoside-triphosphatase activity; C:nucleus; C:cytosol
MGID2034299 | 26s proteasome regulatory atpase subunit | 1407 | 10 | 0 | 90.00% | 6 | C:protein complex; P:protein catabolic process; F:ATP binding; F:nucleoside-triphosphatase activity; C:nucleus; C:cytosol
MGID1978797 | 26s proteasome regulatory particle triple-a atpase su | 325 | 9 | 1.89E-29 | 68.22% | 6 | C:cytoplasm; F:peptidase activity; F:ATPase activity; P:ubiquitin-dependent protein catabolic process; F:ATP binding; C:nucleus
MGID2021373 | ankyrin partial | 802 | 10 | 9.57E-06 | 55.70% | 2 | C:cytoplasm; P:ubiquitin-dependent protein catabolic process
MGID2040404 | ankyrin unc44 | 853 | 8 | 1.41E-05 | 54.63% | 4 | F:ubiquitin-protein ligase activity; C:intracellular; P:ubiquitin cycle; P:protein ubiquitination
MGID2026032 | ariadne homolog 2 | 791 | 10 | 6.29E-10 | 40.70% | 6 | F:protein binding; P:protein ubiquitination; F:ubiquitin-protein ligase activity; C:nucleus; F:zinc ion binding; C:cytosol
MGID2033025 | arm repeat-containing protein | 774 | 8 | 4.06E-06 | 40.75% | 4 | F:ubiquitin-protein ligase activity; C:ubiquitin ligase complex; F:binding; P:protein ubiquitination
MGID1969538 | atlg15100 f911 3 | 599 | 10 | 7.95E-07 | 56.80% | 6 | C:cytoplasmic membrane-bounded vesicle; P:protein ubiquitination; F:ubiquitin-protein ligase activity; F:zinc ion binding; C:plastid; C:ubiquitin ligase complex
MGID2034824 | btb domain containing 1 | 1611 | 10 | 3.47E-08 | 38.90% | 4 | C:cytoplasm; C:protein complex; P:ubiquitin-dependent protein catabolic process; F:protein binding
MGID1975704 | calcium ion binding protein | 733 | 7 | 7.61E-04 | 56.29% | 4 | P:protein ubiquitination; F:ubiquitin-protein ligase activity; F:zinc ion binding; C:ubiquitin ligase complex
MGID1986476 | cdc23 (cell division cycle homolog) | 696 | 10 | 1.72E-15 | 53.80% | 12 | C:cytosol; F:binding; P:negative regulation of ubiquitin-protein ligase activity during mitotic cell cycle; P:mitotic metaphase plate congression; P:cell division; P:anaphase-promoting complex-dependent proteasomal ubiquitin-dependent protein catabolic process; P:G1 phase of mitotic cell cycle; P:positive regulation of ubiquitin-protein ligase activity during mitotic cell cycle; P:regulation of exit from mitosis; P:regulation of mitotic metaphase/anaphase transition; C:nucleoplasm; C:anaphase-promoting complex
MGID2047165 | chp-rich zinc finger | 899 | 10 | 1.39E-06 | 46.40% | 6 | ; F:ubiquitin-protein ligase activity; C:intracellular; P:ubiquitin cycle; F:zinc ion binding; P:protein ubiquitination
MGID1982613 | clan family ubiquitin hydrolase-like cysteine peptidas | 757 | 10 | 1.97E-18 | 45.90% | 6 | C:cytoplasm; C:nucleolus; F:protein binding; P:ubiquitin-dependent protein catabolic process; F:ubiquitin thiolesterase activity; P:protein deubiquitination
MGID2038778 | deltex 4 homolog | 1182 | 10 | 6.21E-06 | 46.90% | 6 | C:cytoplasm; F:binding; P:mRNA transport; P:ubiquitin cycle; F:ubiquitin-protein ligase activity; P:protein ubiquitination
MGID1944780 | dna repair protein | 344 | 10 | 1.29E-09 | 56.30% | 12 | C:cytoplasm; F:DNA-dependent ATPase activity; F:helicase activity; F:DNA binding; F:protein binding; P:protein ubiquitination; F:ATP binding; F:ubiquitin-protein ligase activity; C:nucleus; F:zinc ion binding; C:ubiquitin ligase complex; P:DNA repair
MGID2028070 | dna repair protein putative | 1021 | 10 | 1.96E-10 | 56.50% | 5 | P:protein modification process; P:proteasomal ubiquitin-dependent protein catabolic process; F:damaged DNA binding; P:nucleotide-excision repair; C:nucleus
MGID2057998 | dna repair protein rad23 | 619 | 10 | 4.40E-19 | 56.00% | 5 | P:protein modification process; P:proteasomal ubiquitin-dependent protein catabolic process; F:damaged DNA binding; P:nucleotide-excision repair; C:nucleus
MGID1958756 | f-box and leucine-rich repeat protein 14b | 785 | 10 | 4.44E-08 | 46.20% | 4 | F:ubiquitin-protein ligase activity; C:SCF ubiquitin ligase complex; F:protein binding; P:protein ubiquitination
MGID2041513 | f-box and leucine-rich repeat protein 20 | 831 | 10 | 8.11E-19 | 43.30% | 3 | C:cytoplasm; P:ubiquitin-dependent protein catabolic process; F:protein binding

218 P:ubiquitin-dependent protein catabolic process; F:zinc ion binding; F:peptidyl-arginine C-methyltransferase activity; F:ubiquitin-protein ligase activity; P:protein ubiquitination; F:protein binding; F:histone-arginine N-methyltransferase activity; C:ubiquitin ligase complex; P:sensory perception of MGID2039744 f-box only protein 1923 10 6.40E-75 54.40% 12 sound; C:cytoplasm; C:nucleus; P:protein amino acid methylation

F:protein binding; P:ubiquitin cycle; F:ubiquitin-protein ligase MGID2023224 hect domain and ankyrin repeat e3 ubiquitin protein 852 10 4.66E-33 52.60% 5 activity; C:endoplasmic reticulum; P:protein ubiquitination

P:ubiquitin-dependent protein catabolic process; F:zinc ion binding; C:mitochondrial inner membrane; F:guanyl-nucleotide exchange factor activity; P:spermatogenesis; F:ubiquitin-protein ligase activity; F:ATP binding; F:protein binding; F:heme binding; P:regulation of mitotic metaphase/anaphase transition; P:intracellular protein transport; P:protein amino acid phosphorylation; C:anaphase-promoting complex; P:regulation MGID2033139 hect domain and rld 2 744 10 2.04E-12 48.60% 15 of GTPase activity; P:protein ubiquitination

P:regulation of mitotic metaphase/anaphase transition; F:protein binding; F:heme binding; P:ubiquitin cycle; F:ubiquitin-protein ligase activity; F:zinc ion binding; C:anaphase-promoting MGID2039182 hect domain and rld 2 674 10 2.84E-12 54.10% 8 complex; P:protein ubiquitination

C:intracellular; P:ubiquitin-dependent protein catabolic process; MGID1992109 hect domain and rld 3 833 10 5.43E-15 44.80% 4 F:acid-amino acid ligase activity; P:protein modification process

P:regulation of mitotic metaphase/anaphase transition; F:protein binding; P:protein ubiquitination; F:heme binding; F:ubiquitin-protein ligase activity; F:zinc ion binding; C:anaphase-promoting MGID2057691 hect domain and rld partial 1194 10 2.77E-13 56.70% 7 complex

F:ubiquitin-protein ligase activity; C:intracellular; P:ubiquitin MGID2029262 hect domain-containing family protein 757 10 2.10E-28 51.00% 4 cycle; P:protein ubiquitination

P:negative regulation of microtubule depolymerization; P:chromatin modification; P:ubiquitin-dependent protein catabolic process; F:histone deacetylase activity; F:zinc ion binding; P:protein amino acid deacetylation; P:protein polyubiquitination; C:microtubule; P:regulation of transcription, DNA-dependent; F:tubulin deacetylase activity; C:histone deacetylase complex; F:actin binding; C:cytoplasm; F:histone deacetylase binding; P:nitrogen compound metabolic process; MGID1977338 histone deacetylase 6 739 10 1.77E-16 40.70% 16 C:tubulin complex

P:negative regulation of microtubule depolymerization; P:chromatin modification; P:ubiquitin-dependent protein catabolic process; F:histone deacetylase activity; F:zinc ion binding; P:protein amino acid deacetylation; P:protein polyubiquitination; C:microtubule; P:regulation of transcription, DNA-dependent; F:tubulin deacetylase activity; C:histone deacetylase complex; F:actin binding; C:cytoplasm; F:histone deacetylase binding; P:nitrogen compound metabolic process; MGID2037051 histone deacetylase 6 998 10 3.92E-16 51.20% 16 C:tubulin complex

P:negative regulation of microtubule depolymerization; P:chromatin modification; P:ubiquitin-dependent protein catabolic process; F:histone deacetylase activity; F:zinc ion binding; P:protein amino acid deacetylation; P:protein polyubiquitination; C:microtubule; P:regulation of transcription, DNA-dependent; F:tubulin deacetylase activity; C:histone deacetylase complex; F:actin binding; C:cytoplasm; F:histone deacetylase binding; P:nitrogen compound metabolic process; MGID2076755 histone deacetylase 6 1607 10 7.44E-19 55.90% 16 C:tubulin complex

P:protein folding; F:binding; P:mycelium development; P:protein ubiquitination; F:ubiquitin-protein ligase activity; C:nucleus; MGID2040793 hsp chaperone complex subunit 815 10 3.39E-06 50.00% 8 C:cytosol; C:ubiquitin ligase complex

P:protein folding; F:binding; P:mycelium development; P:protein ubiquitination; F:ubiquitin-protein ligase activity; C:nucleus; MGID2075692 hsp chaperone complex subunit 1607 1.23E-05 48.88% 8 C:cytosol; C:ubiquitin ligase complex

F:binding; P:protein ubiquitination; F:ubiquitin-protein ligase MGID2054805 immediate-early fungal elicitor protein cmpg1 811 10 3.04E-07 51.00% 6 activity; C:plastid; C:mitochondrion; C:ubiquitin ligase complex

P:Wnt receptor signaling pathway; F:protein binding; P:multicellular organismal development; P:ubiquitin-dependent MGID2068259 klh12 xenia ame: full=kelch-like protein 12 1129 10 6.12E-40 47.40% 5 protein catabolic process; P:ubiquitin cycle

C:cytoplasm; P:ubiquitin-dependent protein catabolic process; MGID1946605 leucine rich repeat family expressed 389 10 2.08E-12 51.50% 4 F:protein binding; P:protein modification process

C:neuron projection; P:positive regulation of MAPKKK cascade; P:activation of MAPKKK activity; P:regulation of exocytosis; P:exocrine system development; P:neurotransmitter secretion; P:regulation of transcription, DNA-dependent; C:intrinsic to plasma membrane; F:GTP binding; F:calcium ion binding; P:protein transport; P:small GTPase mediated signal transduction; F:ubiquitin-protein ligase activity; P:protein ubiquitination; F:identical protein binding; C:perinuclear region of cytoplasm; F:GTPase activity; P:regulation of apoptosis; F:transcription factor binding; F:ATP binding; C:transcription MGID2054783 member ras oncogene family 1042 10 1.34E-22 54.70% 21 factor complex

P:regulation of receptor internalization; P:ubiquitin-dependent protein catabolic process; P:melanocyte differentiation; F:zinc ion binding; F:signal transducer activity; P:Notch signaling pathway; C:cytoplasmic vesicle; P:positive regulation of I-kappaB kinase/NF-kappaB cascade; C:early endosome; F:ubiquitin-protein ligase activity; F:actin binding; P:protein autoubiquitination; C:ubiquitin ligase complex; C:perinuclear MGID2079469 mib2 protein 1262 10 1.05E-06 48.60% 15 region of cytoplasm; F:protein homodimerization activity

MGID1944670 mitotic checkpoint protein bub3 449 10 3.35E-18 57.90% 2 P:biological process; C:CUL4 RING ubiquitin ligase complex

P:ubiquitin-dependent protein catabolic process; F:ubiquitin thiolesterase activity; F:squalene-hopene cyclase activity; F:zinc MGID2054988 mynd finger family protein 851 10 1.27E-06 53.00% 5 ion binding; P:protein deubiquitination

P:ubiquitin-dependent protein catabolic process; F:ubiquitin thiolesterase activity; F:squalene-hopene cyclase activity; F:zinc MGID2075908 mynd finger family protein 1362 10 2.00E-06 52.70% 5 ion binding; P:protein deubiquitination

F:NAD or NADH binding; F:zinc ion binding; P:protein amino acid deacetylation; F:histone acetyltransferase binding; C:microtubule; C:chromatin silencing complex; P:negative regulation of striated muscle tissue development; F:tubulin deacetylase activity; F:ubiquitin binding; P:cell division; F:DNA binding; P:chromatin silencing; F:NAD-dependent histone deacetylase activity; P:mitosis; F:transcription factor binding; C:cytoplasm; F:histone deacetylase binding; P:nitrogen compound metabolic process; C:tubulin complex; C:histone MGID2027142 nad-dependent histone deacetylase 1330 10 4.28E-22 53.80% 21 deacetylase complex; C:transcription factor complex

F:ubiquitin-protein ligase activity; C:intracellular; P:ubiquitin MGID1985341 nedd4 protein 693 8 3.06E-04 52.75% 5 cycle; F:protein binding; P:protein ubiquitination

P:sodium ion transport; P:ubiquitin-dependent protein catabolic process; P:response to metal ion; P:cellular sodium ion homeostasis; P:positive regulation of endocytosis; F:sodium channel regulator activity; F:ubiquitin-protein ligase activity; P:excretion; P:protein ubiquitination; F:protein binding; P:regulation of protein catabolic process; P:interspecies interaction between organisms; C:cytoplasm; P:water MGID2055265 nedd4 protein 1633 10 9.01E-12 58.30% 14 homeostasis

P:ubiquitin-dependent protein catabolic process; P:sarcomere organization; F:zinc ion binding; F:ligase activity; F:protein MGID2048506 neuralized homolog 594 10 6.41E-09 65.30% 7 binding; C:muscle tendon junction; C:VCB complex

P:immune response; P:ubiquitin-dependent protein catabolic MGID2036338 otu ubiquitin aldehyde binding 1 1845 10 2.76E-12 45.30% 3 process; F:cysteine-type peptidase activity

P:ubiquitin-dependent protein catabolic process; F:ubiquitin thiolesterase activity; P:ubiquitin cycle; P:protein MGID1991203 peptidase c19 family protein 736 10 1.10E-26 56.40% 4 deubiquitination

C:cytoplasm; P:positive regulation of transcription; F:transcription regulator activity; F:protein binding; P:protein MGID107043 plasmodium falciparum polyubiquitin 904 10 8.11E-63 78.80% 6 ubiquitination; C:nucleus

C:cytoplasm; C:cell wall; C:nucleus; P:protein modification MGID2042393 polyubiquitin 1 1206 10 2.06E-04 50.50% 5 process; C:plasma membrane

MGID1973919 polyubiquitin 2 708 10 2.03E-43 87.50% 2 F:zinc ion binding; P:protein modification process

MGID2025451 polyubiquitin ubi4 824 10 1.77E-18 74.60% 2 F:zinc ion binding; P:protein modification process

F:RNA binding; P:negative regulation of ubiquitin-protein ligase activity during mitotic cell cycle; C:mitochondrion; F:threonine-type endopeptidase activity; P:anaphase-promoting complex-dependent proteasomal ubiquitin-dependent protein catabolic process; P:positive regulation of ubiquitin-protein ligase activity during mitotic cell cycle; F:protein binding; C:proteasome core MGID2048982 proteasome ( macropain) alpha 6 967 10 5.84E-70 73.00% 9 complex; C:nucleus

P:negative regulation of BMP signaling pathway; P:negative regulation of ossification; C:intracellular; F:ubiquitin-protein ligase activity; P:cell differentiation; F:protein binding; P:ectoderm development; P:protein ubiquitination during MGID1947361 smad ubiquitination regulatory factor 1 533 10 2.57E-07 46.00% 8 ubiquitin-dependent protein catabolic process

MGID1945086 small ubiquitin-like modifier 516 10 2.88E-37 70.90% 1 P:protein modification process

F:small protein activating enzyme activity; P:ubiquitin cycle; MGID2030454 sumo-1 activating enzyme e1 n subunit 1121 10 2.65E-27 51.30% 5 F:binding; C:nucleus; F:ligase activity

P:cell differentiation; F:DNA binding; P:mRNA transport; F:ubiquitin-protein ligase activity; C:nucleus; P:protein MGID2030838 uba and wwe domain containing 1 1119 10 1.86E-26 57.70% 6 ubiquitination

C:integral to membrane; P:regulation of protein metabolic process; F:small conjugating protein ligase activity; P:post- MGID107380 ubc6p like ubiquiting conjugating enzyme possible tr 927 10 2.17E-50 75.20% 4 translational protein modification

MGID1945979 ubiquitin 300 10 5.29E-32 100.00% 1 P:protein modification process

C:ribosome; P:protein modification process; F:structural constituent of ribosome; F:zinc ion binding; P:translation; MGID2041303 ubiquitin 40s ribosomal protein s27a fusion 834 10 8.17E-19 85.20% 6 P:ribosome biogenesis

P:ubiquitin-dependent protein catabolic process; F:calcium ion binding; F:ubiquitin thiolesterase activity; C:membrane; MGID1984359 ubiquitin carboxyl-terminal 664 10 5.74E-10 40.90% 5 P:protein deubiquitination

P:RNA splicing; F:protein binding; P:ubiquitin-dependent protein catabolic process; P:mRNA processing; F:ubiquitin thiolesterase MGID2052656 ubiquitin carboxyl-terminal 1234 10 9.00E-96 65.00% 8 activity; C:nucleus; F:zinc ion binding; P:protein deubiquitination

C:cytoplasm; P:protein deubiquitination; P:eating behavior; P:ubiquitin-dependent protein catabolic process; F:ubiquitin MGID2078878 ubiquitin carboxyl-terminal esterase l3 958 10 2.87E-53 60.90% 6 thiolesterase activity; P:adult walking behavior

P:ubiquitin-dependent protein catabolic process; F:ubiquitin thiolesterase activity; P:ubiquitin cycle; P:protein MGID2055229 ubiquitin carboxyl-terminal hydrolase 1402 10 8.96E-10 37.60% 4 deubiquitination

MGID1963697 ubiquitin carboxyl-terminal hydrolase 6 831 10 2.21E-08 57.50% 0

P:ubiquitin-dependent protein catabolic process; F:ubiquitin MGID1945063 ubiquitin carboxyl-terminal hydrolase family protein 454 10 1.11E-05 60.40% 3 thiolesterase activity; P:protein deubiquitination

P:ubiquitin-dependent protein catabolic process; F:ubiquitin thiolesterase activity; P:ubiquitin cycle; P:protein MGID2049337 ubiquitin carboxyl-terminal hydrolase family protein 804 10 3.07E-36 56.80% 4 deubiquitination

C:intracellular; P:ubiquitin-dependent protein catabolic process; MGID2028135 ubiquitin carboxyl-terminal hydrolase isozyme l3 1028 10 3.68E-57 58.30% 4 F:ubiquitin thiolesterase activity; P:protein deubiquitination

MGID1987082 ubiquitin conjugating enzyme 791 10 2.16E-10 49.40% 1 P:protein modification process

P:ubiquitin cycle; P:regulation of protein metabolic process; MGID1984306 ubiquitin conjugating enzyme 806 8 2.26E-55 84.25% 3 F:small conjugating protein ligase activity

P:ubiquitin cycle; P:regulation of protein metabolic process; MGID1946804 ubiquitin conjugating enzyme 336 10 2.06E-10 65.70% 3 F:small conjugating protein ligase activity

P:ubiquitin cycle; P:regulation of protein metabolic process; MGID2024467 ubiquitin conjugating enzyme 865 10 3.61E-81 79.50% 3 F:small conjugating protein ligase activity

P:ubiquitin cycle; P:regulation of protein metabolic process; MGID1976492 ubiquitin conjugating enzyme 1 331 1 4.98E-06 85.00% 3 F:small conjugating protein ligase activity

MGID1968753 ubiquitin conjugating enzyme 7 interacting protein 865 6 1.31E-06 60.83% 2 F:zinc ion binding; F:protein binding

MGID2022383 ubiquitin conjugating enzyme 7 interacting protein 788 10 6.90E-09 57.50% 3 F:zinc ion binding; F:nucleic acid binding; F:protein binding

P:nuclear migration; P:ubiquitin-dependent protein catabolic process; F:ubiquitin thiolesterase activity; P:germ cell migration; P:cellularization; P:negative regulation of proteolysis; P:mystery cell fate differentiation; P:embryonic development; MGID1957111 ubiquitin domain-containing protein 696 10 9.16E-09 59.60% 11 P:endocytosis; P:protein deubiquitination; C:cytoplasm

MGID2063489 ubiquitin domain-containing protein 620 10 7.79E-08 48.40% 1 P:protein modification process

MGID1948161 ubiquitin family protein 435 10 7.24E-74 98.20% 1 P:protein modification process

MGID2032537 ubiquitin family protein 1157 4 1.48E-04 68.25% 1 P:protein modification process

C:extracellular region; P:intein-mediated protein splicing; MGID2039702 ubiquitin family protein 862 10 6.23E-09 47.80% 3 P:pathogenesis

C:cytoplasm; F:ubiquitin protein ligase binding; P:protein autoubiquitination; F:chaperone binding; P:protein polyubiquitination; F:ubiquitin-protein ligase activity; P:embryonic development ending in birth or egg hatching; MGID2020514 ubiquitin fusion degradation (yeast ufd homolog) fam 872 10 3.81E-06 55.10% 9 C:nucleus; C:ubiquitin ligase complex

MGID2051780 ubiquitin isoform cra a 1313 10 4.65E-13 51.30% 2 C:nucleus; P:protein modification process

MGID1962857 ubiquitin ligase complex f-box protein 802 10 9.24E-09 55.70% 1 F:protein binding

C:cytoplasm; F:ubiquitin-specific protease activity; C:nucleolus; F:protein binding; P:ubiquitin-dependent protein catabolic MGID1991202 ubiquitin specific peptidase 15 755 10 6.93E-24 62.40% 6 process; P:protein deubiquitination

P:ubiquitin-dependent protein catabolic process; F:ubiquitin thiolesterase activity; F:zinc ion binding; P:protein MGID106568 ubiquitin specific peptidase 3 923 10 2.03E-08 40.80% 4 deubiquitination

F:Rab GTPase activator activity; C:lysosome; P:ubiquitin-dependent protein catabolic process; F:ubiquitin thiolesterase activity; F:nucleic acid binding; F:calcium ion binding; F:cysteine-type endopeptidase activity; P:regulation of Rab GTPase activity; MGID2073584 ubiquitin specific peptidase 32 1086 10 2.30E-20 45.20% 10 F:protein binding; P:protein deubiquitination

F:glycylpeptide N-tetradecanoyltransferase activity; P:ubiquitin-dependent protein catabolic process; P:N-terminal protein myristoylation; F:ubiquitin thiolesterase activity; C:nucleus; P:acyl-carrier-protein biosynthetic process; P:protein MGID2031434 ubiquitin specific peptidase 36 948 10 1.31E-50 59.70% 7 deubiquitination

C:cytoplasm; P:ubiquitin-dependent protein catabolic process; MGID2031330 ubiquitin specific protease 31 817 10 4.30E-17 46.90% 4 F:ubiquitin thiolesterase activity; P:protein deubiquitination

P:ubiquitin-dependent protein catabolic process; F:ubiquitin thiolesterase activity; F:transcription initiation factor activity; F:zinc ion binding; P:transcription initiation; F:DNA binding; P:ubiquitin cycle; F:queuine tRNA-ribosyltransferase activity; C:transcription factor TFIID complex; P:queuosine biosynthetic MGID1994290 ubiquitin specific protease 43 709 10 3.98E-23 50.80% 12 process; P:protein deubiquitination; P:regulation of transcription

F:ATP binding; F:ubiquitin-protein ligase activity; F:small protein activating enzyme activity; P:ubiquitin cycle; P:protein MGID109427 ubiquitin-activating enzyme e1 263 10 5.05E-06 61.90% 5 ubiquitination

MGID1974830 ubiquitin-associated domain-containing 770 3 1.46E-24 57.00% 0

P:ubiquitin cycle; P:regulation of protein metabolic process; MGID1976634 ubiquitin-conjugating enzyme 466 10 2.92E-54 86.30% 3 F:small conjugating protein ligase activity

P:ubiquitin cycle; P:regulation of protein metabolic process; MGID1976063 ubiquitin-conjugating enzyme 800 10 4.41E-43 73.20% 3 F:small conjugating protein ligase activity

C:cytosol; P:negative regulation of ubiquitin-protein ligase activity during mitotic cell cycle; P:protein polyubiquitination; P:positive regulation of protein ubiquitination; P:BMP signaling pathway; P:anaphase-promoting complex-dependent proteasomal ubiquitin-dependent protein catabolic process; F:ubiquitin-protein ligase activity; P:positive regulation of ubiquitin-protein ligase activity during mitotic cell cycle; MGID1976064 ubiquitin-conjugating enzyme 817 10 4.59E-19 74.50% 9 C:nucleoplasm

P:ubiquitin cycle; P:regulation of protein metabolic process; MGID1975837 ubiquitin-conjugating enzyme 698 10 1.22E-45 91.00% 3 F:small conjugating protein ligase activity

P:ubiquitin cycle; P:regulation of protein metabolic process; MGID1965228 ubiquitin-conjugating enzyme 662 10 1.62E-65 90.30% 3 F:small conjugating protein ligase activity

P:regulation of protein metabolic process; F:small conjugating MGID1964821 ubiquitin-conjugating enzyme 765 10 1.10E-32 68.30% 3 protein ligase activity; P:post-translational protein modification

P:ubiquitin cycle; P:regulation of protein metabolic process; MGID1962935 ubiquitin-conjugating enzyme 614 10 3.81E-23 89.50% 3 F:small conjugating protein ligase activity

P:ubiquitin cycle; P:regulation of protein metabolic process; MGID1954817 ubiquitin-conjugating enzyme 446 10 2.76E-44 81.30% 3 F:small conjugating protein ligase activity

P:ubiquitin cycle; P:regulation of protein metabolic process; MGID1944212 ubiquitin-conjugating enzyme 590 10 1.41E-69 87.40% 3 F:small conjugating protein ligase activity

P:ubiquitin cycle; P:regulation of protein metabolic process; MGID1949460 ubiquitin-conjugating enzyme 529 10 4.55E-41 70.30% 3 F:small conjugating protein ligase activity

P:ubiquitin cycle; P:regulation of protein metabolic process; MGID1941007 ubiquitin-conjugating enzyme 663 10 5.13E-51 74.40% 3 F:small conjugating protein ligase activity

P:ubiquitin cycle; P:regulation of protein metabolic process; MGID2036731 ubiquitin-conjugating enzyme 821 10 5.97E-59 88.00% 3 F:small conjugating protein ligase activity

P:ubiquitin cycle; P:regulation of protein metabolic process; MGID2038699 ubiquitin-conjugating enzyme 1117 10 1.50E-30 62.50% 3 F:small conjugating protein ligase activity

P:ubiquitin cycle; P:regulation of protein metabolic process; MGID2040596 ubiquitin-conjugating enzyme 614 10 2.43E-70 88.60% 3 F:small conjugating protein ligase activity

P:ubiquitin cycle; P:regulation of protein metabolic process; MGID2041278 ubiquitin-conjugating enzyme 644 10 4.05E-66 90.30% 3 F:small conjugating protein ligase activity

P:ubiquitin cycle; P:regulation of protein metabolic process; MGID2046806 ubiquitin-conjugating enzyme 1050 10 2.70E-71 89.20% 3 F:small conjugating protein ligase activity

P:ubiquitin cycle; P:regulation of protein metabolic process; MGID2049051 ubiquitin-conjugating enzyme 840 10 2.47E-55 84.00% 3 F:small conjugating protein ligase activity

P:ubiquitin cycle; P:regulation of protein metabolic process; MGID2056449 ubiquitin-conjugating enzyme 620 10 1.14E-67 88.30% 3 F:small conjugating protein ligase activity

P:ubiquitin cycle; P:regulation of protein metabolic process; MGID2058457 ubiquitin-conjugating enzyme 626 10 4.03E-68 88.90% 3 F:small conjugating protein ligase activity

P:ubiquitin cycle; P:regulation of protein metabolic process; MGID2062954 ubiquitin-conjugating enzyme 360 10 3.90E-35 91.30% 3 F:small conjugating protein ligase activity

P:ubiquitin cycle; P:regulation of protein metabolic process; MGID2066195 ubiquitin-conjugating enzyme 554 10 6.35E-39 69.60% 3 F:small conjugating protein ligase activity

F:ubiquitin-protein ligase activity; P:ubiquitin cycle; P:regulation MGID2041887 ubiquitin-conjugating enzyme e2 578 10 2.32E-53 76.20% 4 of protein metabolic process; P:protein ubiquitination

P:regulation of protein metabolic process; F:small conjugating MGID1985696 ubiquitin-conjugating enzyme e2 z-like 730 10 3.72E-27 55.10% 3 protein ligase activity; P:post-translational protein modification

C:cytoplasm; C:nucleolus; P:apoptosis; P:ubiquitin-dependent protein catabolic process; F:ubiquitin-protein ligase activity; P:regulation of protein metabolic process; P:protein MGID2027441 ubiquitin-conjugating enzyme e2z 1036 10 1.20E-31 52.10% 7 ubiquitination

MGID2045222 ubiquitin-fold modifier 1 472 10 5.51E-13 68.20% 1 P:ubiquitin cycle

MGID1984819 ubiquitin-fusion protein 758 10 7.02E-16 69.20% 1 P:protein modification process

C:Golgi apparatus; F:protein binding; P:protein ubiquitination; MGID2057562 ubiquitin-protein ligase e3 636 10 1.69E-16 57.30% 5 P:signal transduction; F:ubiquitin-protein ligase activity

C:cytoplasmic membrane-bounded vesicle; F:zinc ion binding; MGID2061456 ubiquitin-protein ligase e3 2152 10 2.91E-08 65.80% 3 F:protein binding

MGID2053073 ubiquitin-protein ligase-like 830 5.99E-06 54.57% 1 F:zinc ion binding

C:cytoplasm; F:ubiquitin protein ligase binding; P:protein autoubiquitination; F:chaperone binding; P:protein polyubiquitination; F:ubiquitin-protein ligase activity; P:embryonic development ending in birth or egg hatching; MGID1985376 u-box domain protein 887 10 1.77E-06 51.70% 9 C:nucleus; C:ubiquitin ligase complex

F:ubiquitin-protein ligase activity; C:ubiquitin ligase complex; MGID1964714 u-box domain protein 762 10 1.10E-08 59.00% 3 P:protein ubiquitination

F:protein binding; P:protein ubiquitination; F:ubiquitin-protein ligase activity; F:zinc ion binding; C:plastid; C:ubiquitin ligase MGID2020548 u-box domain protein 854 10 8.83E-08 53.50% 6 complex

F:ubiquitin-protein ligase activity; C:ubiquitin ligase complex; MGID2022434 u-box domain protein 838 10 1.36E-05 58.80% 4 F:binding; P:protein ubiquitination

F:ubiquitin-protein ligase activity; C:ubiquitin ligase complex; MGID1958259 von willebrand factor type a domain containing prote 836 10 6.68E-21 64.70% 4 P:protein ubiquitination; P:intein-mediated protein splicing

F:ubiquitin-protein ligase activity; C:ubiquitin ligase complex; MGID2021019 von willebrand factor type a domain containing prote 784 10 1.05E-25 59.00% 4 P:protein ubiquitination; P:intein-mediated protein splicing

F:ubiquitin-protein ligase activity; C:ubiquitin ligase complex; MGID2056311 wd sam and u-box domain containing 1 830 10 1.47E-12 69.90% 3 P:protein ubiquitination

P:protein amino acid phosphorylation; P:response to stress; P:protein ubiquitination; F:ATP binding; F:ubiquitin-protein ligase activity; F:protein serine/threonine kinase activity; C:ubiquitin MGID104206 wd sam and u-box domain containing 1 680 10 6.93E-14 60.00% 8 ligase complex; P:serine family amino acid metabolic process

F:receptor activity; P:protein amino acid phosphorylation; P:protein ubiquitination; F:ATP binding; F:ubiquitin-protein ligase activity; F:protein serine/threonine kinase activity; C:ubiquitin ligase complex; P:signal transduction; P:serine family amino acid MGID2053718 wd sterile alpha motif and u-box domain containing 1 810 10 1.46E-09 62.30% 9 metabolic process

F:hydrolase activity, hydrolyzing O-glycosyl compounds; P:carbohydrate metabolic process; P:protein ubiquitination; MGID1987113 wd40 repeat protein 673 3.37E-05 43.50% 5 F:ubiquitin-protein ligase activity; C:ubiquitin ligase complex

P:innate immune response; P:protein ubiquitination; C:intrinsic to membrane; P:signal transduction; F:ubiquitin-protein ligase activity; F:transmembrane receptor activity; C:ubiquitin ligase MGID1976985 wd-40 repeat protein 760 10 1.12E-29 51.60% 7 complex

F:protein binding; P:ubiquitin-dependent protein catabolic process; P:ubiquitin cycle; P:locomotion; F:ubiquitin-protein ligase activity; P:embryonic development ending in birth or egg hatching; P:entry of virus into host cell; C:ubiquitin ligase MGID2080865 ww domain containing protein 819 10 5.87E-06 42.90% 9 complex; P:protein ubiquitination

P:ubiquitin-dependent protein catabolic process; C:axon; F:zinc ion binding; P:negative regulation of axon extension; F:DNA binding; F:ubiquitin-protein ligase activity; C:PML body; F:transcription activator activity; F:protein binding; P:positive regulation of transcription, DNA-dependent; C:cytoplasm; MGID1943053 zinc finger 571 10 3.07E-10 53.90% 12 P:protein ubiquitination

P:defense response to fungus; F:ubiquitin-protein ligase activity; P:response to chitin; F:zinc ion binding; F:protein binding; MGID2028256 zinc finger (c3hc4-type ring finger) family protein 1073 10 9.77E-08 65.30% 6 P:protein ubiquitination

C:proteasome core complex; F:threonine-type endopeptidase activity; P:ubiquitin-dependent protein catabolic process; MGID2019194 proteasome (prosome macropain) subunit beta type 1022 10 3.09E-48 65.20% 4 C:nucleus

C:proteasome core complex; F:threonine-type endopeptidase activity; P:ubiquitin-dependent protein catabolic process; MGID2066412 proteasome subunit beta type 7-a precursor 583 10 1.58E-41 75.90% 4 C:nucleus
