
J. Phycol. 42, 96–103 (2005) © 2005 Phycological Society of America DOI: 10.1111/j.1529-8817.2005.00165.x

NOVEL AND RAPIDLY DIVERGING INTERGENIC SEQUENCES BETWEEN TANDEM REPEATS OF THE LUCIFERASE GENES IN SEVEN DINOFLAGELLATE SPECIES¹

Liyun Liu and J. Woodland Hastings²
Department of Molecular and Cellular Biology, Harvard University, 16 Divinity Avenue, Cambridge, Massachusetts 02138, USA

Tandemly arranged luciferase genes were previously reported in two dinoflagellate species, but their intergenic regions were strikingly different and no canonical promoter sequences were found. Here, we examined the intergenic regions of the luciferase genes of five other dinoflagellate species along with those of the earlier two. In all cases, the genes exist in multiple copies and are arranged tandemly, coding for proteins of similar sizes and sequences. However, the 5′ untranslated region, 3′ untranslated region, and intergenic regions of the seven genes differ greatly in length and sequence, except for two stretches that are conserved in the intergenic regions of two pairs of phylogenetically close species. Microsatellites and minisatellites were detected in the intergenic sequences of four species, Alexandrium affine (H. Inoue & Y. Fukuyo) E. Balech, A. tamarense (Lebour) E. Balech, Protoceratium reticulatum (Claparède & Lachmann) Bütschli, and Pyrocystis lunula (Schütt) Schütt, the first three of which have unusually high percentages of particular sets of dinucleotides. Most remarkably, the P. reticulatum intergenic region is almost exclusively made up of 19 nearly identical repeats of an 11-nucleotide sequence. Dinoflagellate luciferase intergenic regions bear similarities to ribosomal genes and to some protein-encoding genes in trypanosomes, both of which are transcribed by RNA polymerase I. It is possible that the transcription of the dinoflagellate genes is catalyzed by an RNA polymerase with novel properties.

Key index words: intergenic region; luciferase; microsatellites; minisatellites; promoter; tandem genes

Abbreviations: Aa, Alexandrium affine; At, Alexandrium tamarense; LCF and lcf, luciferase protein and gene; Lp, Lingulodinium polyedrum (F. Stein) J.D. Dodge; Pf, Pyrocystis fusiformis (Wyville–Thomson ex Haeckel) Blackman; Pl, Pyrocystis lunula; Pn, Pyrocystis noctiluca Murray ex Haeckel; Pr, Protoceratium reticulatum; TEM, transmission electron microscopy; 3′ UTR, 3′ untranslated region; 5′ UTR, 5′ untranslated region

¹Received 19 May 2005. Accepted 20 October 2005.
²Author for correspondence: e-mail [email protected].

Our previous studies of the structure of dinoflagellate genes and their circadian regulation revealed that several occur in tandemly arranged copies (Le et al. 1997, Li and Hastings 1998, Okamoto et al. 2001). Other than for ribosomal genes (Sollner-Webb and Tower 1986) and a few protein-coding genes in Trypanosoma brucei and B. bovis (Lee and Van der Ploeg 1997, Suarez et al. 1998), such an arrangement is not known elsewhere. Indeed, it is well known that the dinoflagellate nucleus is very unusual; its envelope remains intact throughout the cell cycle, with the separation of the chromosomes in mitosis being carried out by an external mitotic spindle (Taylor 1987). Microscopically, the chromosomes have a “cholesteric-like liquid crystalline organization,” with a characteristic banding pattern in transmission electron microscopy and a whorled appearance in the 3D models (Herzog and Soyer 1981, Spector 1984, Soyer-Gobillard and Geraud 1992, Rizzo 2003). Dinoflagellates typically possess a large amount of DNA (up to 40 times that of the human cell) and an enormous number of chromosomes (>100), which lack typical eukaryotic histones and nucleosomes, and remain condensed permanently. This last feature raises the question of how transcription factors and RNA polymerase have access to the DNA, because in other eukaryotes transcription is temporarily repressed when the chromosomes are condensed during mitosis (White et al. 1995, Gottesfeld and Forbes 1997).

To understand the molecular machinery for transcription and its regulation, workers have studied proteins that interact with the chromosomal DNA. Histone-like proteins have been isolated from a few dinoflagellate species (Rizzo 1981) and one from Lingulodinium polyedrum (Lp) binds weakly to DNA (Chudnovsky et al. 2002). In addition, two transcription factors, a WW protein and a TATA box binding protein (TBP), have been identified in the dinoflagellate Crypthecodinium cohnii (Guillebault et al. 2001, 2002). At the sequence level, a C. cohnii TBP homolog differs considerably from those of other eukaryotes, and its expressed protein can bind weakly to a synthetic TATA box and more strongly to a TTTT box.

Sequences in the 5′ upstream region (up to 500 nt from the translation start codon) have been obtained previously for seven dinoflagellate genes (Le et al. 1997, Li and Hastings 1998, Okamoto et al. 2001, Guillebault et al. 2002). None of them have canonical eukaryotic promoter sequences such as a TATA box, a


CAAT box, or an initiator element. Furthermore, no sequence similarities were found among these, except for a 13-nt sequence present in the 5′ upstream region of both the peridinin binding protein and luciferase genes of Lp (Li and Hastings 1998), and GC-rich sequences in both Pyrocystis lunula (Pl) luciferase and Peridinium bipes ferredoxin genes (Yoshikawa et al. 1997, Okamoto et al. 2001). Neither of the above two sequences was demonstrated to mediate transcription.

Dinoflagellate luciferase catalyzes a light-emitting reaction in which the luciferin (a tetrapyrrole) is oxidized to give a product in an electronically excited state (Wilson and Hastings 1998). In Lp, the protein itself was found to consist of three intramolecular repeats, each of which can act as an active luciferase (Li et al. 1997). The coding sequences of luciferase were later isolated from six other photosynthetic dinoflagellate species; they share high sequence similarity to the Lp luciferase gene and possess a similar three-repeat configuration (Okamoto et al. 2001, Liu et al. 2004). Both the Lp and Pl lcf genes were found to occur in tandem copies, separated by intergenic sequences with no similarity to each other or to others in the database (Li and Hastings 1998, Okamoto et al. 2001).

In the present report, the genomic structure and organization of luciferase genes from five additional species were determined and analyzed along with those of Lp and Pl. All were found to occur in tandem copies and, except for closely related species, to have very different intergenic sequences and no evident promoter elements in the region upstream from the 5′ untranslated region (5′ UTR). The results suggest that the RNA polymerase responsible for transcription of the luciferase genes, and probably other dinoflagellate genes, is likely to be very different from those in most eukaryotes.

MATERIALS AND METHODS

Algal cultures: Dinoflagellates were grown in F2 medium under 12:12 LD cycles at 19 °C and a light intensity of 150 µmol photons m⁻² s⁻¹ (Guillard and Ryther 1962, Liu et al. 2004).

Isolation of genomic DNA and total RNA: Cells of Lp and Pyrocystis fusiformis (Wyville–Thomson ex Haeckel) Blackman (Pf) were collected by centrifugation at 5000g for 10 min at 4 °C and those of Protoceratium reticulatum (Pr) and two Alexandrium species by vacuum filtration on Whatman paper 541 (Whatman GFC, Maidstone, UK). Cells were ground into a fine powder under liquid nitrogen in a mortar and the isolation of total RNA and genomic DNA was immediately carried out according to a phenol-based protocol (Bugos et al. 1995). In brief, the powder was mixed quickly with 10 mL extraction buffer (100 mM Tris-HCl, pH 9.0, 200 mM NaCl, 15 mM EDTA, 0.5% Sarkosyl, and 100 mM β-mercaptoethanol freshly added) per gram of cell pellet. To the slurry were added sequentially an equal volume of phenol buffered at pH 8.0 (VWR), 1/5 volume of chloroform, and 1/10 volume of 3 M sodium acetate (pH 5.2). After 15 min incubation at 4 °C, the sample was centrifuged at 10,000g for 30 min. The supernatant was then mixed well with an equal volume of isopropanol and chilled at −70 °C for 20 min. Total nucleic acids were pelleted at 10,000g for 30 min, washed once with 80% ethanol, and dissolved in TE (10 mM Tris, 1 mM EDTA, pH 8.0). One-third volume of 8 M LiCl was then added; after 3 h on ice, RNA was obtained by centrifugation at 10,000g for 30 min. The DNA was recovered from the supernatant by adding an equal volume of isopropanol. Residual DNA in the RNA sample was digested with DNase I (Ambion Inc., Austin, TX, USA) at 0.5 U per 10 µg RNA in 10 µL at 37 °C for 20 min. Similarly, trace RNA was removed from the genomic DNA using RNase A (Sigma Inc., St. Louis, MO, USA). DNase I and RNase A were inactivated by phenol:chloroform (1:1). One-tenth volume of 3 M sodium acetate (pH 5.2) and 2.5 volumes of ethanol were added to the aqueous phase containing DNA or RNA. DNA or RNA was pelleted, washed with 80% ethanol and resuspended in TE. After measurement of concentration by OD at 260 nm, the DNA or RNA was stored at −80 °C until use.

RNA electrophoresis and Northern blotting: Ten micrograms of total RNA were denatured at 75 °C for 5 min, separated on a 1.4% agarose gel by electrophoresis, and transferred onto a N+ nylon membrane (Amersham Biosciences Corp., Piscataway, NJ, USA) as described (Pelle and Murphy 1993). After transfer, the blot was baked at 80 °C for 1 h to fix the RNA and prehybridized for 1 h at 50 °C in the hybridization buffer (20% formamide, 6× SSPE, 0.5% SDS, 5× Denhardt's reagent, and 20 µg/mL sonicated salmon sperm DNA). Afterwards, radioactive probes generated from the full-length Lp lcf DNA by random priming labeling (Amersham) were added to 10⁶ cpm/mL. The hybridization lasted for 12 h and the blot was washed three times consecutively (2× SSPE, 0.1% SDS, 15 min, 20 °C; 0.5× SSPE, 0.1% SDS, 15 min, 60 °C; 0.2× SSPE, 0.1% SDS, 15 min, 60 °C). The blot was detected by a phosphorimaging machine.

Protein extraction and Western blotting: To prepare the total proteins, the cells were harvested and homogenized as for the nucleic acids described above but resuspended in protein extraction buffer (100 mM Tris-HCl, 10 mM EDTA, and 5 mM 2-mercaptoethanol, pH 8.5) (Johnson et al. 1984). After centrifugation, 10 µg of total protein from the supernatant was adjusted to 1× Laemmli gel loading buffer and separated on 10% SDS-polyacrylamide gel by electrophoresis at 200 V for 50 min (Laemmli 1970). The separated protein was then transferred onto nitrocellulose BA 85 (Schleicher & Schuell, Keene, NH, USA) and probed with an affinity-purified luciferase antibody (Knaust et al. 1998). The blot was detected by chemiluminescence (Amersham) following the manufacturer's instructions. Protein concentration was determined by the Bradford method (Bradford 1976).

5′ RACE, 3′ RACE, and PCR amplification of the intergenic regions: Based on the protein coding sequences of Alexandrium affine (Aa), Alexandrium tamarense (At), Pf, Pyrocystis noctiluca Murray ex Haeckel (Pn) and Pr, reported previously (Liu et al. 2004), two nested sense primers for 3′ RACE and two nested antisense primers for 5′ RACE were designed for each species (Table 1). Five micrograms of total RNA were transcribed with a modified oligo dT primer (for 3′ RACE) or a specific primer derived from the coding sequence (for 5′ RACE) to generate cDNA using a kit (Invitrogen, Carlsbad, CA, USA). For 5′ RACE, only amplification of the Pf cDNA yielded specific PCR products. 5′ UTRs for At and Pn were amplified with the nested 5′ RACE primers paired with T7 primer using their respective phage library lysates (Stratagene, La Jolla, CA, USA). The 5′ UTRs of Aa and Pr were not determined. The PCR conditions for 3′ RACE and 5′ RACE were as described in the manual accompanying the kit. The sense primers for the amplification of the intergenic sequences were derived from the 3′ untranslated regions (3′ UTRs), while the antisense primers were the same as those for 5′ RACE. The PCR amplifications were run for one cycle at

94 °C for 1 min, and 35 cycles of 94 °C for 30 s, 55 °C for 1 min and 72 °C for 2–5 min using Taq DNA polymerase (Qiagen, Chatsworth, CA, USA). PCR products were cloned into pCR2.1 (Invitrogen) and sequenced using the Big-dye kit (Applied Biosystems, Foster City, CA, USA). The intergenic sequences of Aa, At, Pf, Pn, and Pr were deposited in GenBank under the accession numbers AY766382, AY766382, AY766384, AY766385, and AY766386, respectively.

Copy number determination: One microgram of genomic DNA or various amounts of the full-length Lp lcf cDNA cloned in pGEX-5X-1 (Pharmacia, Pfizer, Groton, CT, USA) in 10 µL TE were denatured at 95 °C for 2 min after mixing with 10 µL of 0.5 N NaOH and 1.5 M NaCl. The denatured DNA was immediately quenched on ice and subsequently neutralized with 20 µL of 3 M potassium acetate (pH 5.5). The mixture was applied onto a N+ nylon membrane (Amersham) under vacuum using a slot apparatus (Schleicher & Schuell, Keene, NH, USA), followed by rinsing with 1 mL of 10× SSC and 5× SSC. The membrane was then baked at 80 °C for 1 h to immobilize the DNA. Hybridization was carried out as described above for Northern blotting.

Bioinformatics analysis: BioEdit was used to manipulate DNA sequences. A computer program was developed and used for statistical analysis of dinucleotide frequency. Microsatellite and minisatellite sequences were detected by Tandem Repeat Finder (Benson 1999). Searches for sequence homology were conducted by BLAST (Altschul et al. 1990) and sequence alignment by Clustal X (Thompson et al. 1997). To search for potential transcription factor binding sites, the lcf sequences 300 nt upstream of the conceptual translation start site were subjected to interrogation by PLACE (Higo et al. 1999) and TFSEARCH (Heinemeyer et al. 1998).

TABLE 1. Features of dinoflagellate luciferase genes.

                 Aa     At     Lp     Pf     Pl     Pn     Pr
Length (nt)
  Full gene      6132   6762   5495   5365   6463   4762   4303
  IR + 5′ UTR    2162   2798   1527   1381   2320    784    323
  5′ UTR         ND      437     71    464    469    426   ND
  ORF            3714   3711   3723   3726   3711   3699   3711
  3′ UTR          238    253    245    194    432    253    269
GC%
  Full gene        57     60     61     58     57     56     60
  IR + 5′ UTR      57     64     59     62     61     55     44
  3′ UTR           54     52     60     53     52     51     58
  ORF              58     58     62     57     56     57     61

ND means that the sequence could not be determined by RACE PCR.
IR, intergenic region; 3′ UTR, 3′ untranslated region; 5′ UTR, 5′ untranslated region; Aa, Alexandrium affine; At, Alexandrium tamarense; Lp, Lingulodinium polyedrum (F. Stein) J.D. Dodge; Pf, Pyrocystis fusiformis (Wyville–Thomson ex Haeckel) Blackman; Pl, Pyrocystis lunula; Pn, Pyrocystis noctiluca Murray ex Haeckel; Pr, Protoceratium reticulatum.

RESULTS

Luciferase genes from seven bioluminescent species are all organized as tandem repeats: Phylogenetic analysis based on the amino acid sequences of the luciferases from seven species resolved the species into three clades: those of two Alexandrium species and Protoceratium reticulatum in one clade, three Pyrocystis species in another, and that of L. polyedrum itself as a third clade (Liu et al. 2004). In two previous studies, the genes of Lp and Pl were shown to be arranged in tandem and separated by intergenic sequences of different lengths (Li and Hastings 1998, Okamoto et al. 2001). In Lp, the two adjacent copies of lcf in the region studied were identical but in Pl they were slightly different.

Recently, we found that lcf also occurs in tandem repeats in five other luminescent species. To isolate these sequences, we performed a nested PCR by using a combination of forward primers derived from the lcf 3′ UTRs and reverse primers from the 5′ end of the ORFs with genomic DNA as a template (Table 1, supplementary material). We were able to obtain specific PCR products for all five; sequencing of these after cloning confirmed that they are indeed the intergenic regions, flanked by the ORFs of two lcf genes. For each species, at least six independent clones were isolated and sequenced. All were identical except for Pr where one clone was slightly different in sequence from all others. It should be noted that the PCR strategy used would not detect other possible lcf arrangements that might also occur within the genome, such as a head-to-head arrangement, single gene copies, or those with different 3′ UTRs. Thus, while tandem repeats are evidently present, copies in different conformations may also be present.

Intergenic sequences of dinoflagellate luciferases vary in length and sequence: The sizes of different parts of the genes and their base compositions are given in Table 1 for all seven luciferase genes. The precise transcription start site has been determined only for the Lp luciferase gene, so the exact lengths of the intergenic regions for the other six are not known. Western and Northern analyses showed that, as for Lp, Pl, and Pn (Knaust et al. 1998), luciferases from Aa, At, Pf, and Pr have protein masses of 137 kDa and a transcript size of 4 kb (Fig. 1). The 3′ UTRs of these luciferase genes are similar in size (194–432 nt), and the lengths of all 5′ UTRs should be comparable and likely less than 200 nt. Thus, the intergenic regions of luciferases from different species differ in length by nearly 10-fold, from less than 200 nt for Pr to more than 2 kb for two Alexandrium species (Table 1).

Using the BLAST program, pairwise comparisons of the seven luciferases were performed for the sequences upstream of the start codon. Two blocks of sequence similarities were detected in the intergenic region in each of two pairs, Aa/At and Pf/Pn. The first block, which was close to the translational stop codon, embraced 350 nt and included only intergenic sequences. These were 65% identical in Aa/At and 92% in Pf/Pn (Fig. 2). The second block is closer to the translational start codon, encompassing not only the intergenic sequence but also part of the 5′ UTR, with different lengths in the two pairs. In this block, Aa/At shared a 74% identity over a sequence of 110 bases while Pf/Pn lcfs had an 80% identity for a sequence of 310 bases. No significant similarity was observed between other pairs of lcf intergenic regions.
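The dinucleotide-frequency program mentioned under Bioinformatics analysis is not described further in the text. As a minimal sketch of what such an analysis might look like, the Python below counts overlapping dinucleotides and reports each of the 16 as a percentage. The function name `dinucleotide_percentages` and the toy sequence are hypothetical illustrations, not the authors' software or data.

```python
from collections import Counter
from itertools import product


def dinucleotide_percentages(seq):
    """Return the percentage of each of the 16 dinucleotides in `seq`.

    Dinucleotides are counted in overlapping windows (positions 1-2, 2-3, ...).
    Windows containing anything other than A, C, G or T are ignored.
    This is an assumed, simplified analysis, not the program used in the paper.
    """
    seq = seq.upper()
    windows = [seq[i:i + 2] for i in range(len(seq) - 1)]
    windows = [w for w in windows if set(w) <= set("ACGT")]
    counts = Counter(windows)
    total = len(windows)
    return {
        a + b: (100.0 * counts[a + b] / total if total else 0.0)
        for a, b in product("ACGT", repeat=2)
    }


# Hypothetical toy sequence: a (GT)10 microsatellite followed by unique sequence.
toy = "GT" * 10 + "ACGATC"
freqs = dinucleotide_percentages(toy)
for dinuc, pct in sorted(freqs.items(), key=lambda kv: -kv[1])[:3]:
    print(dinuc, round(pct, 1))
```

In a sequence dominated by a GT microsatellite, GT and TG together account for most of the windows, which is the kind of skew the Results attribute to the Aa and At intergenic regions.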

FIG. 1. Sizes of the luciferase protein and mRNA are similar in different species. (A) A protein gel blot of four species probed with affinity-purified polyclonal antibodies against Lingulodinium polyedrum (F. Stein) J.D. Dodge (Lp) LCF and detected by chemiluminescence. Size markers indicate that proteins of all five species are 137 kDa. (B) Northern analysis of total RNA of five species hybridized with radioactively labeled Lp full-length lcf. The mRNAs of the five species are all 4.0 kb.

Microsatellites and minisatellites occur in the intergenic regions of Aa, At, Pr, and Pl luciferases. Visual inspection of the intergenic sequences easily showed a high frequency of GT dinucleotides in Aa and At lcfs. The frequency of dinucleotide sequences was then computed, and the result is given in Table 2 (supplementary material) for all seven species, showing that Aa, At, and Pr had unusually high percentages of certain sets. This highly skewed composition of dinucleotides in these species indicated the presence of microsatellites and minisatellites (Li 1997).

In order to identify microsatellites and minisatellites, we analyzed the seven intergenic sequences by Tandem Repeat Finder (Benson 1999), which is able to detect repeats ranging from 2 nt to 1 kb. The intergenic sequences from the three species noted above, as well as from Pl, contain microsatellites and minisatellites with repeat lengths of 2–15 nt, copy numbers of 2–50, and percentage matches of 75–100 (Table 2). In all four, more than two types of microsatellites and/or minisatellites are present in the same intergenic region. The minisatellite sequences differ among the species, the longest of which was 21 copies of a 10 nt sequence (TGTGTGTGGT) in Aa.

The intergenic sequence of the Pr lcf gene was the most unusual. The combined length of its intergenic sequence and 5′ UTR is only 326 nt, most of which consisted of 19 nearly perfect repeats of an 11 nt sequence, GTCAATCAATC (Fig. 3). The minisatellite sequence was flanked by a proximal 67 bp GC-rich (72%) sequence, likely the 5′ UTR, and a distal 32 nt AT-rich (78%) sequence, potentially a polyadenylation and transcription termination signal. The minisatellite itself contained a tandem repeat of a microsatellite sequence (AATC). A search of GenBank identified a similar minisatellite with eight repeats (ATAAATCAATC) in the genome of the archaeon Methanosarcina acetivorans. Out of 11 nucleotides in a repeat unit, only two differed. Sequencing of 10 independent clones of Pr lcf intergenic sequences revealed two slightly variant ( 6%) sequences, with one shorter by 11 nucleotides.

Potential transcriptional start sites in the intergenic regions: Several programs have been developed to predict transcription factor binding sites based on DNA sequences. We used two programs, PLACE and TFSEARCH, to examine the intergenic lcf sequences. Each predicts a variety of transcription factor binding sites. For instance, the 11-nucleotide sequence (GTCAATCAATC) of Pr lcf intergenic regions closely matched the consensus of the Pbx-1 site (ANCAATCAW; W denotes A or T) and more loosely that of the CCAAT box (RRCCAATNR; R denotes A or G). However, for all seven luciferases, no TATA box was predicted within the 300 bp region upstream of the translation initiation site. When the sequences of the 3′ UTRs of the seven lcfs were compared in pairs, significant identities were seen only for two pairs, Aa/At (85% identical) and Pf/Pn (66% identical).

The copy number of luciferase genes is high in the genome: The Lp and Pl lcf genes have been reported to occur in more than 100 copies per genome (Li and Hastings 1998, Okamoto et al. 2001). Here, we determined the copy numbers of the five other luciferases using the full-length lcf cDNA cloned in a plasmid pGEX-5X-1 as the standard (Fig. 4). Under the assumption that the haploid genome size for all is 200 pg, as established previously for Lp (Spector 1984), the copy numbers of luciferase genes in seven species fall in the range of 44–160 per haploid genome. However, as the genome sizes of the other species are not known, and may be much smaller than that of Lp, copy numbers could be lower, possibly as small as two. Also, it is not known whether the tandem units involve only two genes or three or more.

FIG. 2. Intergenic regions of seven luciferases. The TGAs and ATGs at left and right, respectively, define the beginning and end of the noncoding sequences (3′ untranslated region [3′ UTR], intergenic region and 5′ untranslated region [5′ UTR]) between two tandem units, drawn in proportion to their lengths using the reference ruler shown at the bottom of the diagram. 3′ UTRs are represented by the stippled narrower rectangular boxes and the intergenic region and 5′ UTRs by the wider ones. Regions that show similarity between two pairs of the species (Alexandrium affine/Alexandrium tamarense and Pyrocystis fusiformis [Wyville–Thomson ex Haeckel] Blackman/Pyrocystis noctiluca Murray ex Haeckel) occur at the 3′ and 5′ ends; they are connected with pairs of lines and have the lengths and sequence identities shown. The regions with closely spaced vertical lines denote microsatellites or minisatellites, with the numbers indicating their lengths in nucleotides.

DISCUSSION

An important question that prompted this study is the identity and location of the promoter for the tandemly repeated lcf gene; no promoter is known for any dinoflagellate gene. We had expected to provide an answer by sequencing and identifying conserved elements in the 5′ upstream and intergenic regions of lcf genes from different related species. It turned out that, even though the tandem organization is a feature conserved in all seven lcfs, their intergenic sequences share little similarity, except between very closely related species in certain regions. As earlier reported for the pcp and lcf genes of Lp and lcf of Pl (Le et al. 1997, Li and Hastings 1998, Okamoto et al. 2001), the first 300 bases upstream of the translation start site of the five additional lcfs reported here also do not contain a TATA box typical of many promoters of eukaryotic protein-coding genes (FitzGerald et al. 2004). Nor do they contain −10 (TATAAG) and −35 (TTGACA) elements characteristic of prokaryotic promoters (Rosenberg and Court 1979). In the intergenic sequences of the other six lcfs, neither the 13 nt sequence (CGTGAACGCAGTG) common to the intergenic sequences of Lp pcp and lcf (Li and Hastings 1998), nor the conserved 13-bp sequence (CTTGAATGCAGAA) in the same region of the dinoflagellate pcp (Reichman et al. 2003) was found. Thus, the lcf intergenic regions are not conserved at the sequence level among the dinoflagellates.

However, as suggested from studies of other tandemly repeated genes, transcription could nevertheless be initiated from the intergenic region, or possibly further upstream of the tandem repeats. Other than dinoflagellate genes, tandem organization has been reported only for the rhoptry-associated protein-1 (rap-

TABLE 2. Microsatellite and minisatellite sequences in the intergenic regions of the luciferases.

Species  Start  End   Repeat size  Copy number  Matches (%)  Indels (%)  Consensus sequence
         (position)
Aa         664   693       3          10.0         100           0       TGT
           837   907       2          37.0          75          13       GT
           841   907       9           7.6          82          14       TGTGTGTTG
          1061  1095       4           8.8         100           0       TTGG
          1437  1494      14           4.1          77           0       TGCGTGTATTTGTG
          1516  1550      12           2.9          91           0       GTGCGTCTGCGG
          1566  1610       8           5.6         100           0       GTGCGCGA
          1625  1831      10          20.5          83          10       TGTGTGTGGT
At        2203  2259      10           5.7          80           0       GCGTGCACGC
          2285  2319      12           2.9          91           0       GCGCGTGTGCGT
          2361  2409       2          24.5          82           0       GT
          2427  2494       4          17.5          81           6       GCGT
          2509  2558      12           4.2          97           0       GCAAGCGTGCGT
Pl        1581  1680       2          50.0          77           4       TG
          1588  1679      10           8.8          84          11       GTGTGTGTCT
Pr          30    58      15           1.9         100           0       GTGGTCAATCAATGT
            48   253      11          18.7          94           0       GTCAATCAATC

Aa, Alexandrium affine; At, Alexandrium tamarense; Pl, Pyrocystis lunula; Pr, Protoceratium reticulatum.
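To make the columns of Table 2 concrete: for repeats without indels, the reported copy number is close to the span length divided by the repeat size (for example, positions 48–253 with an 11 nt unit in Pr give 206/11, about 18.7). Rows with indels need not follow this rule; positions 837–907 with a 2 nt unit give 35.5, whereas 37.0 is reported. The sketch below shows that arithmetic and a deliberately simple search for perfect tandem copies of a known unit. It is not Tandem Repeat Finder, which also tolerates mismatches and indels; the function names and the toy sequence are hypothetical.

```python
import re


def copies_from_span(start, end, period):
    """Copy number implied by a 1-based inclusive span and a repeat size."""
    return (end - start + 1) / period


def longest_perfect_tandem_run(seq, unit):
    """Find the longest run of back-to-back exact copies of `unit` in `seq`.

    Returns (start, end, copies) with 1-based inclusive positions, or None
    when there is no run of at least two copies. Only perfect copies are
    counted; mismatches and indels end the run.
    """
    pattern = re.compile("(?:%s){2,}" % re.escape(unit.upper()))
    best = None
    for match in pattern.finditer(seq.upper()):
        copies = (match.end() - match.start()) // len(unit)
        if best is None or copies > best[2]:
            best = (match.start() + 1, match.end(), copies)
    return best


# Span arithmetic for two rows of Table 2.
print(round(copies_from_span(48, 253, 11), 1))   # Pr, GTCAATCAATC
print(round(copies_from_span(664, 693, 3), 1))   # Aa, TGT

# Hypothetical toy sequence with three perfect copies of the Pr repeat unit.
toy = "ACGT" + "GTCAATCAATC" * 3 + "TTGA"
print(longest_perfect_tandem_run(toy, "GTCAATCAATC"))
```

A perfect-match search like this would undercount imperfect repeats such as those reported with 75–94% matches, which is why a tolerant detector was used in the study.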

FIG. 4. Determination of copy numbers of lcfs in seven species by slot-blot hybridization. Various copy numbers (10⁴–10⁷) of Lingulodinium polyedrum (F. Stein) J.D. Dodge (Lp) lcf cloned into pGEX 5X-1 (pGex5x-1_Lplcf) were loaded into the slots to generate a standard after hybridization with radioactively labeled Lp lcf and analysis by densitometry (lower panel). The calculation of the copy numbers of lcfs of the seven species was based on the signal intensity resulting from the hybridization of the Lp lcf with one microgram of genomic DNA of each species (upper panel). The genome sizes of all species are assumed to be the same and equal to 200 pg.
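The arithmetic behind the slot-blot estimate can be sketched as follows. With an assumed haploid genome of 200 pg, one microgram of genomic DNA contains 10⁶ pg / 200 pg = 5000 genomes, so the reported 44–160 copies per genome correspond to roughly 2.2 × 10⁵ to 8 × 10⁵ gene copies per slot, inside the 10⁴–10⁷ standard range. The code below fits a log-log standard curve and converts a sample signal to copies per genome. The fitting approach, the function names, and all signal values are hypothetical; the paper states only that densitometry against the plasmid standard was used.

```python
import math


def genomes_in_sample(dna_mass_pg, genome_mass_pg=200.0):
    """Number of haploid genomes in a DNA sample (200 pg per genome assumed)."""
    return dna_mass_pg / genome_mass_pg


def fit_loglog(standards):
    """Least-squares line through (log10 copies, log10 signal) standard points."""
    xs = [math.log10(copies) for copies, _ in standards]
    ys = [math.log10(signal) for _, signal in standards]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_signal(signal, standards):
    """Invert the fitted standard curve to estimate gene copies in a slot."""
    slope, intercept = fit_loglog(standards)
    return 10 ** ((math.log10(signal) - intercept) / slope)


# Hypothetical densitometry readings for 10^4 to 10^7 plasmid copies.
standards = [(1e4, 20.0), (1e5, 200.0), (1e6, 2000.0), (1e7, 20000.0)]
sample_signal = 1000.0                      # hypothetical genomic DNA slot
total_copies = copies_from_signal(sample_signal, standards)
copies_per_genome = total_copies / genomes_in_sample(1e6)  # 1 µg = 1e6 pg
print(round(copies_per_genome))
```

Because the estimate scales inversely with the number of genomes loaded, a species with a genome smaller than the assumed 200 pg would have proportionally fewer copies per genome for the same signal, which is the caveat raised in the Results.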

1) gene of the apicomplexan B. bovis (Suarez et al. 1998), a few trypanosome genes (Lee and Van der Ploeg 1997) and all eukaryotic ribosomal genes (Sollner-Webb and Tower 1986). rap-1 has two head-to-tail repeats flanked by other genes and separated by a 1 kb intergenic sequence, which has been shown to drive the expression of a reporter gene (Suarez et al. 2004). Similarly, the promoter of all ribosomal genes resides in the intergenic region (Sollner-Webb and Tower 1986), which ranges from 1881 nt in the ascidian urochordate (Degnan et al. 1998) to 30 kb in humans (Gonzalez et al. 1992). Several genes in T. brucei are known to be in multiple tandem copies with an intergenic sequence of 200–400 bases (Lee and Van der Ploeg 1997). They are transcribed from the intergenic promoter as a monocistronic unit, but in some cases from a promoter far upstream as a polycistronic unit (Johnson et al. 1987, Lee et al. 1988). Without exception, the transcript matured from the latter acquires an additional short sequence via trans-splicing (Van der Ploeg et al. 1982).

FIG. 3. Minisatellite sequences of the Protoceratium reticulatum (Pr) lcf intergenic region and their similarity to the sequences of the archaeon Methanosarcina acetivorans (Ma). The Pr intergenic region consists of 326 nt, working back from the end of the sequence at the conceptual translation start codon ATG going up to the first nucleotide immediately downstream of the 3′ untranslated region. The repeats of 11 nucleotides are aligned with the nonconserved nucleotides shown in lower case.

With regard to dinoflagellate lcfs, and in spite of the fact that intergenic regions are not conserved at the sequence level, we are inclined to believe the promoter is harbored there, because no transcript corresponding to the intergenic sequences could be detected by either Northern analysis or RT–PCR and no trans-splicing sequence was found in Lp lcf, whose transcription start site and complete cDNA have been determined (Li and Hastings 1998). However, these data cannot categorically rule out the possibility that the tandemly repeated copies are transcribed dicistronically or even polycistronically, with the intergenic regions being rapidly spliced out and processed. In that case, the intergenic region might or might not be used as the promoter.

Another question concerning transcription relates to the nature and properties of the RNA polymerases. Eukaryotic nuclear RNA polymerases types I, II, and III are responsible for synthesis of ribosomal, messenger, and transfer RNA, respectively. Exceptions include trypanosome variant surface glycoprotein (VSG) and procyclic acidic repetitive protein (PARP) genes, which are tandemly repeated and transcribed by RNA polymerase I (Gunzl et al. 2003). Ribosomal RNA genes, which are also transcribed by RNA polymerase I, use unique intergenic promoters with no well-conserved motifs. This could also be so for the dinoflagellate genes, but so far we have been unable to identify these. Ribosomal intergenic regions possess rapidly evolving minisatellites flanked by conserved unique sequences (Bhatia et al. 1996), as also found for the lcf genes. The possibility that dinoflagellate genes are transcribed by an RNA polymerase different from II is suggested by the finding that in vivo bioluminescence and its circadian rhythmicity are unaffected by α-amanitin, a polymerase II inhibitor (Rossini et al. 2003, Hastings and Taylor, unpublished data), but sensitive to actinomycin D, which inhibits polymerase I (Karakashian and Hastings 1962).

An alternative, but not strongly favored, interpretation of our data is that two or more lcf repeats form a mini-circle in which the intergenic region harbors the sequence elements required for both replication and transcription, as is the case for the ribosomal gene of the protozoan Leishmania (Boucher et al. 2004). Indeed, it has been reported that some chloroplast genes of dinoflagellates are organized as mini-circles with one or two genes per circle (Zhang et al. 1999, Howe et al.

2003). The replication origin and the promoter for chloroplast mini-circles remain to be elucidated. Nonetheless, there is no sequence similarity between the intergenic sequences of the mini-circle genes and those of the lcfs.

Microsatellites and minisatellites are present in four of the seven lcf intergenic regions examined. The intergenic sequence of Pr lcf is quite extraordinary, comprising 19 copies of an 11 nt minisatellite. While microsatellite and minisatellite sequences are widespread in rRNA gene promoters, they are also found in many other regions of protein genes, including 5′ UTRs (Contente et al. 2002), 3′ UTRs (Frisch et al. 2001), introns (Albanese et al. 2001), and promoter regions (Rothenburg et al. 2001). They have been shown to regulate gene activity at both transcriptional and translational levels (Khashnobish et al. 1999). The GT repeats, which are present in Aa, At, and Pl lcf intergenic sequences, can bind to RecA and likely play a role in genetic rearrangements (Dutreix 1997). The GT repeats can also bind to human translin, a protein involved in recombination and telomere maintenance (Jacob et al. 2004).

In spite of the unusual nuclear organization of dinoflagellate DNA and great interest in the mechanism of its transcription, the regulation of the synthesis of luciferase mRNA, or of that for any other protein, remains essentially unknown. The intergenic regions reveal no sequence or motif conservation except for the two pairs from the very closely related species. This rapid divergence of the intergenic sequences of seven luciferases is a hallmark of the promoters of all eukaryotic rRNA genes, as is the head-to-tail tandem arrangement. The presence of microsatellites and minisatellites in the intergenic regions of many lcf genes is also characteristic of eukaryotic rRNA promoters. These similarities, together with the evidence that luciferase genes of dinoflagellates may not be transcribed by RNA polymerase II, suggest that the RNA polymerase for transcribing lcfs of dinoflagellates may be unique or similar to RNA polymerase I.

We thank Dr. Thérèse Wilson for valuable comments. We also thank Ping Liu for technical assistance, Xianghui Liu for software, and Sasha Mirsky for administrative assistance. This research was supported in part by grants from the National Science Foundation (MCB-0343407) and the Office of Naval Research (N00014-03-1-0173) to J.W.H.

Albanese, V., Biguet, N. F., Kiefer, H., Bayard, E., Mallet, J. & Meloni, R.

Boucher, N., McNicoll, F., Laverdiere, M., Rochette, A., Chou, M. N. & Papadopoulou, B. 2004. The ribosomal RNA gene promoter and adjacent cis-acting DNA sequences govern plasmid DNA partitioning and stable inheritance in the parasitic protozoan Leishmania. Nucleic Acids Res. 32:2925–36.

Bradford, M. 1976. A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding. Anal. Biochem. 72:248–54.

Bugos, R. C., Chiang, V. L., Zhang, X. H., Campbell, E. R., Podila, G. K. & Campbell, W. H. 1995. RNA isolation from plant tissues recalcitrant to extraction in guanidine. Biotechniques 19:734–7.

Chudnovsky, Y., Li, J. F., Rizzo, P. J., Hastings, J. W. & Fagan, T. F. 2002. Cloning, expression, and characterization of a histone-like protein from the marine dinoflagellate Lingulodinium polyedrum (Dinophyceae). J. Phycol. 38:543–50.

Contente, A., Dittmer, A., Koch, M. C., Roth, J. & Dobbelstein, M. 2002. A polymorphic microsatellite that mediates induction of PIG3 by p53. Nat. Genet. 30:315–20.

Degnan, B. M., Yan, J. & Lavin, M. F. 1998. Evidence of multiple transcription initiation and termination sites within the rDNA intergenic spacer and rRNA readthrough transcription in the urochordate Herdmania curvata. Mol. Mar. Biol. Biotechnol. 7:294–302.

Dutreix, M. 1997. (GT)n repetitive tracts affect several stages of RecA-promoted recombination. J. Mol. Biol. 273:105–13.

FitzGerald, P. C., Shlyakhtenko, A., Mir, A. A. & Vinson, C. 2004. Clustering of DNA sequences in human promoters. Genome Res. 14:1562–74.

Frisch, R., Singleton, K. R., Moses, P. A., Gonzalez, I. L., Carango, P., Marks, H. G. & Funanage, V. L. 2001. Effect of triplet repeat expansion on chromatin structure and expression of DMPK and neighboring genes, SIX5 and DMWD, in myotonic dystrophy. Mol. Genet. Metab. 74:281–91.

Gonzalez, I. L., Wu, S., Li, W. M., Kuo, B. A. & Sylvester, J. E. 1992. Human ribosomal RNA intergenic spacer sequence. Nucleic Acids Res. 20:5846.

Gottesfeld, J. M. & Forbes, D. J. 1997. Mitotic repression of the transcriptional machinery. Trends Biochem. Sci. 22:197–202.

Guillard, R. R. & Ryther, J. H. 1962. Studies on marine planktonic diatoms. I. Cyclotella nana Hustedt and Detonula confervacea (Cleve). Can. J. Microbiol. 8:229–39.

Guillebault, D., Derelle, E., Bhaud, Y. & Moreau, H. 2001. Role of nuclear WW domains and proline-rich proteins in dinoflagellate transcription. Protist 152:127–38.

Guillebault, D., Sasorith, S., Derelle, E., Wurtz, J. M., Lozano, J. C., Bingham, S., Tora, L. & Moreau, H. 2002. A new class of transcription initiation factors, intermediate between TATA box-binding proteins (TBPs) and TBP-like factors (TLFs), is present in the marine unicellular organism, the dinoflagellate Crypthecodinium cohnii. J. Biol. Chem. 277:40881–6.

Gunzl, A., Bruderer, T., Laufer, G., Schimanski, B., Tu, L. C., Chung, H. M., Lee, P. T. & Lee, M. G. 2003. RNA polymerase I transcribes procyclin genes and variant surface glycoprotein gene expression sites in Trypanosoma brucei. Eukaryot. Cell 2:542–51.

Heinemeyer, T., Wingender, E., Reuter, I., Hermjakob, H., Kel, A. E., Kel, O. V., Ignatieva, E. V., Ananko, E. A., Podkolodnaya, O. A., Kolpakov, F. A., Podkolodny, N. L. & Kolchanov, N. A. 1998. Databases on transcriptional regulation: TRANSFAC, TRRD and COMPEL. Nucleic Acids Res. 26:362–7.

Herzog, M. & Soyer, M. O. 1981. Distinctive features of dinoflag-
2001. Quantitative effects on gene silencing by ellate chromatin—absence of nucleosomes in a primitive spe- allelic variation at a tetranucleotide microsatellite. Hum. Mol. cies Prorocentrum micans E.. Eur. J. Cell Biol. 23:295–302. Genet. 10:1785–92. Higo, K., Ugawa, Y., Iwamoto, M. & Korenaga, T. 1999. Plant cis- Altschul, S., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. acting regulatory DNA elements (PLACE) database. Nucleic 1990. Basic local alignment search tool. J. Mol. Biol. 215: Acids Res. 27:297–300. 403–10. Howe, C. J., Barbrook, A. C., Koumandou, V. L., Nisbet, R. E., Benson, G. 1999. Tandem repeats finder: a program to analyze Symington, H. A. & Wightman, T. F. 2003. Evolution of the DNA sequences. Nucleic Acid Res. 27:573–80. chloroplast genome. Philos. Trans. R. Soc. London. B. Biol. Sci. Bhatia, S., Singh Negi, M. & Lakshmikumaran, M. 1996. Structural 358:99–106. analysis of the rDNA intergenic spacer of Brassica nigra: evo- Jacob, E., Pucshansky, L., Zeruya, E., Baran, N. & Manor, H. 2004. lutionary divergence of the spacers of the three diploid The human protein translin specifically binds single-stranded Brassica species. J. Mol. Evol. 43:460–8. microsatellite repeats, d(GT)n, and G-strand telomeric re- INTERGENIC SEQUENCES OF LUCIFERASES 103

peats, d(TTAGGG)n: a study of the binding parameters. J. Rosenberg, M. & Court, D. 1979. Regulatory sequences involved in Mol. Biol. 344:939–50. the promotion and termination of RNA transcription. Annu. Johnson, C. H., Roeber, J. F. & Hastings, J. W. 1984. Circadian Rev. Genet. 13:319–53. changes in enzyme concentration account for rhythm of en- Rossini, C., Taylor, W., Fagan, T. & Hastings, J. W. 2003. Lifetimes zyme activity in . Science 223:1428–30. of mRNAs for clock-regulated proteins in a dinoflagellate. Johnson, P. J., Kooter, J. M. & Borst, P. 1987. Inactivation of tran- Chronobiol. Int. 20:963–76. scription by UV irradiation of T. brucei provides evidence for a Rothenburg, S., Koch-Nolte, F., Rich, A. & Haag, F. 2001. A poly- multicistronic transcription unit including a VSG gene. Cell morphic dinucleotide repeat in the rat nucleolin gene forms Z- 51:273–81. DNA and inhibits promoter activity. Proc. Natl. Acad. Sci. USA Karakashian, M. W. & Hastings, J. W. 1962. The inhibition of a 98:8985–90. biological clock by actinomycin D. Proc. Natl. Acad. Sci. USA Sollner-Webb, B. & Tower, J. 1986. Transcription of cloned eukar- 48:2130–7. yotic ribosomal RNA genes. Annu. Rev. Biochem. 55:801–30. Khashnobish, A., Hamann, A. & Osiewacz, H. D. 1999. Modulation Soyer-Gobillard, M. O. & Geraud, M. L. 1992. Nucleolus behavior of gene expression by (CA)n microsatellites in the filamentous during the cell-cycle of a primitive dinoflagellate , ascomycete Podospora anserina. Appl. Microbiol. Biotechnol. Prorocentrum micans Ehr, seen by light-microscopy and electron- 52:191–5. microscopy. J. Cell Sci. 102:475–85. Knaust, R., Urbig, T., Li, L. M., Taylor, W. & Hastings, J. W. 1998. Spector, D. L. [Ed.] 1984. Dinoflagellates. Academic Press, New The circadian rhythm of bioluminescence in Pyrocystis is not York, 545 pp. due to differences in the amount of luciferase: a comparative Suarez, C. E., Palmer, G. H., Hotzel, I. & McElwain, T. F. 1998. 
study of three bioluminescent marine dinoflagellates. J. Phycol. Structure, sequence, and transcriptional analysis of the Babesia 34:167–72. bovis rap-1 multigene locus. Mol. Biochem. Parasitol. 93:215–24. Laemmli, U. K. 1970. Cleavage of structural proteins during Suarez, C. E., Palmer, G. H., LeRoith, T., Florin-Christensen, M., assembly of head of bacteriophage-T4. Nature 227:680. Crabb, B. & McElwain, T. F. 2004. Intergenic regions in the Le, Q. H., Markovic, P., Hastings, J. W., Jovine, R. V. M. & Morse, rhoptry associated protein-1 (rap-1) locus promote exogenous D. 1997. Structure and organization of the peridinin-chloro- gene expression in Babesia bovis. Int. J. Parasitol. 34:1177–84. phyll a-binding protein gene in Gonyaulax polyedra. Mol. Gen. Taylor, F. J. R. [Ed.] 1987. The Biology of Dinoflagellates. Blackwell Genet. 255:595–604. Scientific Publications, Oxford, 785 pp. Lee, M. G., Atkinson, B. L., Giannini, S. H. & Van der Ploeg, L. H. Thompson, J. D., Gibson, T. J., Plewniak, F., Jeanmougin, F. & 1988. Structure and expression of the hsp 70 gene family of Higgins, D. G. 1997. The CLUSTAL_X windows interface: Leishmania major. Nucleic Acids Res. 16:9567–85. flexible strategies for multiple sequence alignment aided by Lee, M. G. & Van der Ploeg, L. H. 1997. Transcription of protein- quality analysis tools. Nucleic Acids Res. 25:4876–82. coding genes in trypanosomes by RNA polymerase I. Annu. Van der Ploeg, L. H., Liu, A. Y., Michels, P. A., De Lange, T., Borst, Rev. Microbiol. 51:463–89. P., Majumder, H. K., Weber, H., Veeneman, G. H. & Van Li, L. & Hastings, J. W. 1998. The structure and organization of the Boom, J. 1982. RNA splicing is required to make the messen- luciferase gene in the photosynthetic dinoflagellate Gonyaulax ger RNA for a variant surface antigen in trypanosomes. Nucleic polyedra. Plant Mol. Biol. 36:275. Acids Res. 10:3591–604. Li, L., Hong, R. & Hastings, J. W. 1997. Three functional luciferase White, R. J., Gottlieb, T. 
M., Downes, C. S. & Jackson, S. P. 1995. domains in a single polypeptide chain. Proc. Natl. Acad. Sci. Cell cycle regulation of RNA polymerase III transcription. USA 94:8954–8. Mol. Cell. Biol. 15:6653–62. Li, W.-H. 1997. Molecular Evolution. Sinauer Associates, Inc, Sun- Wilson, T. & Hastings, J. W. 1998. Bioluminescence. Annu. Rev. Cell derland, 487 pp. Dev. Biol. 14:197–230. Liu, L., Wilson, T. & Hastings, J. W. 2004. Molecular evolution of Yoshikawa, T., Takishita, K., Ishida, Y. & Uchida, A. 1997. cloning dinoflagellate luciferases, enzymes with three catalytic do- and nucleotide sequence analysis of the gene coding for mains in a single polypeptide. Proc. Natl. Acad. Sci. USA chloroplast-type ferredoxin from the dinoflagellates 101:16555–60. bipes and Alexandrium tamarense. Fisheries Sci. 63:692–700. Okamoto, O. K., Liu, L., Robertson, D. L. & Hastings, J. W. 2001. Zhang, Z. D., Green, B. R. & Cavalier-Smith, T. 1999. Single gene Members of a dinoflagellate luciferase gene family differ in circles in dinoflagellate chloroplast genomes. Nature 155–9. synonymous substitution rates. Biochemistry 40:15862–8. Pelle, R. & Murphy, N. B. 1993. Northern hybridization—rapid and simple electrophoretic conditions. Nucleic Acids Res. Supplementary Material 21:2783–4. Reichman, J. R., Wilcox, T. P. & Vize, P. D. 2003. PCP gene family in from Hippopus hippopus: low levels of concerted The following supplementary material is availa- evolution, isoform diversity, and spectral tuning of chromo- ble for this article online: phores. Mol. Biol. Evol. 20:2143–54. Rizzo, P. J. 1981. Comparative aspects of basic chromatin proteins in dinoflagellates. Biosystems 14:433–43. Table S1. Primers used for PCR amplifications. Rizzo, P. J. 2003. Those amazing dinoflagellate chromosomes. Cell Table S2. Frequency of dinucleotides in the inter- Res. 13:215–7. genic regions of seven lcfs.
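
As an illustration of the kind of tally summarized in Table S2, dinucleotide frequencies in a sequence can be computed along the lines of the minimal Python sketch below. The function name, the use of overlapping windows, the skipping of ambiguous bases, and the example sequence are all assumptions made here for illustration; the counting procedure actually used for Table S2 is not described in this section.

```python
from collections import Counter


def dinucleotide_frequencies(seq):
    """Return relative frequencies of overlapping dinucleotides in seq.

    Hypothetical helper: windows containing anything other than
    A, C, G, or T (for example N) are skipped.
    """
    seq = seq.upper()
    # Slide a 2-nt window one base at a time along the sequence.
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    pairs = [p for p in pairs if set(p) <= set("ACGT")]
    counts = Counter(pairs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: n / total for pair, n in counts.items()}


# Made-up sequence with a short GT repeat, not taken from any lcf.
example = "GTGTGTGTAC"
freqs = dinucleotide_frequencies(example)
for pair, freq in sorted(freqs.items()):
    print(pair, round(freq, 3))
```

With overlapping windows, a (GT)n tract contributes to both the GT and TG counts, so an excess of both dinucleotides together is what would point to such a repeat.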