<<

Comparative Functional Genomics Reveals Fission Innovation in Centromere Structure, Transcriptional Control, Carbon Metabolism and Antisense Transcription of Meiotic Genes

Nicholas Rhind 1†, Zehua Chen2, Moran Yassour3,4,5*, Dawn A Thompson3*, Brian J Haas2*, Naomi Habib6,7*, Ilan Wapinski3,8*, Sushmita Roy3,9*, Michael F. Lin9, David I Heiman2, Sarah K Young2, Kanji Furuya10, Yabin Guo11, Alison Pidoux12, Huei Mei Chen13, Barbara Robbertse14,27, Jonathan M. Goldberg2, Keita Aoki10, Elizabeth H. Bayne12,28, Aaron M Berlin2, Edward Dobbs12, Livio Dukaj1, Lin Fan2, Michael G FitzGerald2, Courtney French7, Sharvari Gujja2, Klavs Hansen15,29, Dan Keifenheim1, Joshua Z. Levin2, Rebecca A. Mosher16,30, Carolin A. Müller17, Jenna Pfiffner2, Margaret Priest2, Carsten Russ2, Agata Smialowska18, Peter Swoboda20, Sean M Sykes2, Matthew Vaughn15, Sonya Vengrova21, Ryan Yoder14, Qiandong Zeng2, Robin Allshire12, David Baulcombe16, Bruce W. Birnen22, William Brown17, Karl Ekwall18, Manolis Kellis9,3, Janet Letherwood13, Henry Levin11, Hanah Margalit5, Rob Martienssen15, Conrad A. Nieduszynski17, Joey Spatafora14, Nir Friedman5,23, Jacob Z. Dalgaard21, Peter Baumann24,24,26, Hironori Niki10, Aviv Regev3,4,26†, Chad Nusbaum2†

1 Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, 364 Plantation Street, Worcester MA 01605 USA 2 Broad Institute of MIT and Harvard, 320 Charles St., Cambridge MA 02141 USA, 3 Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge MA 02142 USA 4 Department of Biology, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge MA 02139 USA 5 School of Engineering and Computer Science, Hebrew University, Jerusalem, Israel 6 School of Computer Science, The Hebrew University, The Hebrew University 7 Department of Molecular Genetics and Biotechnology, Faculty of Medicine, The Hebrew University, Jerusalem 91120, Israel 8 Department of Systems Biology, Harvard Medical School, 200 Longwood Ave, Alpert 536, Boston, MA 02115 USA 9 Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 32 Vassar St. 32-D510, Cambridge, MA 02139, USA 10 Microbial Genetics Laboratory, Genetic Strains Research Center, National Institute of Genetics, 1111 Yata, Mishima, Shizuoka, Japan 11 Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland 20892, USA 12 Wellcome Trust Centre for Cell Biology, Institute of Cell Biology, School of Biological Sciences, The University of Edinburgh, 6.34 Swann Building, Mayfield Road, Edinburgh EH9 3JR, UK 13 Department of Molecular Genetics and Microbiology, Life Science, room 130, State University of New York, Stony Brook, NY, 11794-5222 14 Department of Botany and Plant Pathology, Oregon State University, Corvallis, Oregon 97331, USA. 15 Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724 16 Department of Plant Sciences, University of Cambridge, Downing Street, Cambridge, CB2 3EA, UK 17 Centre for Genetics and Genomics, University of Nottingham, Queen's Medical Centre, Nottingham, UK 18 Karolinska Institute, Center for Biosciences, Dept of Biosciences and Nutrition, Stockholm, Sweden 19 Department of Life Sciences, Sodertorn Hogskola, Huddinge, Sweden 20 Karolinska Institute, Center for Biosciences at NOVUM, Department of Biosciences and Nutrition, Hälsovägen 7, S-141 83 Huddinge, Sweden 21 Warwick Medical School, University of Warwick, Gibbet Hill Campus, Coventry CV4 7AL, UK 22 Broad Institute of MIT and Harvard, 301 Binney St., Cambridge MA 02141 USA 23 Alexander Silberman Institute of Life Sciences, Hebrew University, Jerusalem, Israel 24 Stowers Institute for Medical Research, Kansas City, MO 25 Department of Molecular and Integrative Physiology, University of Kansas Medical School, Kansas City, KS 26 Howard Hughes Medical Institute

Present addresses: 27 National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, 45 Center Drive, Bethesda, MD, 20892, USA 28 Wellcome Trust Centre for Gene Regulation and Expression, College of Life Sciences, University of Dundee, Dundee DD1 5EH, Scotland, UK 29 Evolva Biotech A/S, Bülowsvej 25, 1870 Frederiksberg C, Denmark 30 The University of Arizona, The School of Plant Sciences, 303 Forbes Bldg., 1140 E. South Campus Dr., Tucson, AZ 87421-0036, USA

*These authors made equivalent contributions †To whom correspondence should be addressed. E-mail: [email protected] (NR), [email protected] (AR), [email protected] (CN)

2 The fission yeast clade, comprising pombe, S. octosporus, S. cryophilus and S. japonicus, occupies the earliest known branch of Ascomycete fungi and is an important model of biology. We present a comparative annotation and expression analysis of these four genomes, and identify the following key innovations in the fission that yield general biological insights. First, there has been a near extinction of transposons and associated innovation of transposon-free centromeres in later branches of the clade. Second, established trans-acting regulators have expanded their scope to control new genes within the context of expanded functional modules for meiosis and stress response. Third, innovations in gene content and regulatory mechanisms explain why, despite the convergent evolution of aerobic ethanol production between fission yeast and the , fission yeasts cannot use ethanol as a primary carbon source. Finally, meiotic genes are subject to extensive antisense transcription during vegetative growth, providing a mechanism for their tight regulation. Our analysis elucidates fundamental evolutionary principles in genome structure and gene regulation, enhances the understanding of fission yeast biology, and provides tools for investigation across the

Schizosaccharomyces clade.

The fission yeast genus Schizosaccharomyces forms a broad and ancient clade within the Ascomycete fungi (Figure 1A) that has a distinct life history from other yeasts (1). For example, fission yeast grow preferentially as haploids, divide by medial fission rather than asymmetric budding, and have evolved a single-celled lifestyle independently from the budding yeasts. Fission yeasts share many characteristics with metazoans that are missing or highly diverged in the budding yeast of Saccharomycotina.

Examples include chromosome structure and metabolism (in which fission yeast have relatively large chromosomes, large repetitive centromeres, heterochromatic histone methylation, chromodomain heterochromatin proteins, siRNA-regulated heterochromatin and TRF-family telomere binding proteins),

G2/M cell cycle control, cytokinesis, the mitochondrial translation code, the RNAi pathway, the

3 signalosome pathway and many spliceosome components. In general, the fission yeast complement of core ortholog genes more closely resembles that of metazoans than do those of other Ascomycetes (2).

However, fission yeasts have also evolved important innovations in carbon metabolism, including aerobic fermentation of glucose to ethanol (3). This convergent evolution with the budding yeast

Saccharomyces cerevisiae offers insight into the evolution of complex phenotypes.

Because of its similarity to metazoans, its experimental tractability and its key contrasts with budding yeast, S. pombe is widely used as a model for basic cell-biological processes and to study genes implicated in human disease. Nonetheless, due to their evolutionary distance from other yeasts and the fact the only S. pombe has been significantly studied, much remains to be learned about the evolution and natural history of fission yeast. To better understand the fission yeasts, we have sequenced, assembled, annotated and analyzed the S. japonicus, S. octosporus and S. cryophilus genomes and transcriptomes, and carried out detailed comparative analyses of genome structure, gene content and gene expression. Fission yeasts are amenable to genetic, cell biological and molecular approaches; with the addition of genome sequences they now possess a comprehensive toolkit to enable direct functional characterization of the full range of cellular and evolutionary processes.

Genome sequence and phylogeny

We sequenced and assembled the genomes of S. octosporus, S. cryophilus and S. japonicus using a variety of clone-based and clone-free whole-genome shotgun (WGS) approaches (Table 1). The genomes are all approximately 11.5 Mb in size. S. octosporus and S. cryophilus have GC contents close to 38%, with S. japonicus at 44%. By comparison, the S. pombe genome is 12.5 Mb in size with a 36%

GC composition. Using physical and genetic map data, the S. octosporus and S. japonicus scaffolds were assembled into 3 full-length chromosomes of similar quality to the finished S. pombe genome

(Figures 1B, S1, S2 and Tables S1, S2). Using a novel approach to identify telomeric sequence in standard WGS sequence data (Methods), we identified telomere-repeats in S. japonicus (GTCTTA), S.

4 octosporus (GGGTTACTT) and S. cryophilus (GGGTTACTT) that matched a one and a half repeat-unit sequence at the putative telomerase-RNA locus, similar to the configuration in S. pombe (GGTTAC) (4).

Using this sequence, we extended the S. japonicus and S. octosporus chromosomes into subtelomeric and telomeric sequence. Details of sequencing, assembly and annotation are provided in supporting online material (SOM) text.

The Schizosaccharomycetes form a monophyletic group, with substantial sequence divergence between species, and low population diversity within species. We constructed a phylogeny of the

Schizosaccharomycetes within the (Figure 1A, S3), based on an analysis of 440 single-copy core orthologs. As expected, the Schizosaccharomyces species form a monophyletic group and, as the earliest branch in Ascomycota, are a sister group to the clade including the filamentous fungi

() and budding yeast (Saccharomycotina). The Schizosaccharomycetes clade is broader than the vertebrates; the average amino acid identity of all 1:1 orthologs between S. pombe and S. japonicus is 55%, similar to that between humans and the cephalochordate amphioxus (Table S3). For the most closely related species, S. cryophilus and S. octosporus, 1:1 orthologs share 85% identity on average, similar to humans and dogs. Despite the breadth of the Schizosaccharomyces clade, the population diversity within each species is low (WB and PB, unpublished data). Sequencing of two other diverse S. pombe isolates, S. pombe NCYC132 and S. pombe var kambucha, revealed less than 1% nucleotide difference between the three strains (Figure S4, Table S4).

Evolutionary eradication of transposons and reorganization of centromere structure

Transposons and other repetitive sequences play a central role in centromere function in species from fungi to mammals, and are thought to be crucial for centromeric function through the maintenance of heterochromatin (5). These sequences are known to evolve rapidly, but the evolutionary relationship between centromeres, transposons and heterochromatin is unclear. In particular, little is known about fungal centromeres, since they have not generally been included in genome assemblies.

5 The distribution of transposons between the fission yeast genomes reveals a dramatic loss of transposons within the clade. The S. japonicus genome harbors 10 families of gypsy-type retrotransposons (SOM text, Figure S5 and Table S5). The sequence divergence of their reverse transcriptases is as great as that between transposons found in S. pombe and S. cerevisiae, suggesting that it is an old group of transposons inherited by the Schizosaccharomycetes from the last common ancestor of the Ascomycetes. However, after the divergence of S. japonicus, most transposons were lost;

S. pombe harbors two closely related retrotransposon families, Tf1 and Tf2; S. cryophilus has a single related retrotransposon, Tcry1; S. octosporus contains no transposons, but does contain sequences related to reverse transcriptase and integrase that may represent remnants of extinct transposons related to Tcry1 (Figure S5, Table S5).

The disappearance of transposons in the post-S. japonicus fission yeast species correlates with the appearance of the cbp1 gene family, marking a transition in the control of centromere function. In S. pombe, Cbp1 proteins bind centromeric repeats and are required for transposon silencing and genome stability (6, 7). Although previously described as orthologs of CENP-B (6), a human centromere- binding protein, these proteins apparently evolved independently within the Schizosaccharomyces lineage from a domesticated Pogo-like DNA transposase (8). The appearance of the cbp1 gene family also correlates with the switch from RNAi-mediated transposon silencing in S. japonicus (see below) to a Cbp1-based mechanism in S. pombe. This correlation suggests that evolution from RNAi-based to

Cbp1-based transposon control allowed the eradication of most transposons from the fission yeast genomes, possibly by promoting recombinational deletion between LTRs (7). Furthermore, the gene content of the cbp1 family is evolving rapidly, with 3 paralogs in S. pombe, 6 in S. octosporus, but none in S. cryophilus or S. japonicus (Figure S6). The late appearance of this gene family suggests that Cbp1- based transposon silencing is a Schizosaccharomyces-specific innovation that arose after the divergence of S. japonicus.

6 The loss of transposons was accompanied by a significant reorganization of chromosome architecture that nonetheless conserves centromere function, suggesting compensatory co-evolution. In

S. japonicus, transposons cluster next to telomeres and centromeres, as is common in metazoans (Figure

1B,C). In the other Schizosaccharomycetes, the subtelomeres and pericentromeres are also repetitive, but they are devoid of transposons (Figure 1C). However, like the S. japonicus transposons, the centromeric and subtelomeric repeats are confined to pericentromeric and subtelomeric regions, respectively. The one exception is a centromeric repeat involved in transcriptional silencing at the S. pombe mating-type locus (9). We confirmed that the centromeres are heterochromatic by the enrichment of histone H3 lysine-9 methylation (Figure S7), and confirmed their genetic function in S. japonicus by meiotic mapping (Table S1).

Centromeric repeats evolve rapidly between species and even between different strains of the same species. However, the sequences of individual repeats on the different centromeres within a strain are very similar (Figure 1B). A comparison of centromere sequences reveals that there is no similarity between the centromeric repeats of S. pombe, S. octosporus or S. cryophilus, consistent with the rapid evolution of centromere structure between different strains of S. pombe (10). However, the overall structure of the S. pombe and S. octosporus centromeres is strikingly similar. They both contain repeated elements arrayed in a larger inverted repeat structure around a unique core sequence containing no recognizable sequence features (Figure 1B). Moreover, the repeats are highly similar between chromosomes, suggesting that they are regularly homogenized by non-reciprocal recombination. This structure contrasts sharply with that of S. japonicus, in which the arrangement of retrotransposons shows no symmetry. The lack of symmetry in S. japonicus suggests that transposition occurs more rapidly than homogenization by recombination. This observation raises the possibility that that suppression of transposition led both to the degeneration of transposon sequences and to the evolution of symmetric centromeric repeats.

7 Remarkably, despite the rapid evolution of centromere sequence and structure, centromere location relative to surrounding gene order is well conserved between S. pombe and S. octosporus (Figure S8).

Thus, neither centromeric translocations nor neo-centromere events have occurred within these lineages, even though centromeres can occur at novel locations in manipulated S. pombe strains (11). This observation suggests that, although gene conversion is involved in maintaining the similarity of centromeric repeats between the different centromeres, crossover recombination events between centromeres are completely suppressed. Nonetheless, S. pombe, S. octosporus and S. cryophilus have retained the repetitive nature of their centromeres, even as they have lost their transposons, implying an important function for the centromeric repeats.

siRNAs are known to be involved in both transposon silencing and centromere function (12), leading us to investigate the interplay between these two roles in the Schizosaccharomyces lineage. In S. pombe, the centromeric repeats produce dicer-dependent siRNAs that are required for maintenance of centromeric structure, function and transcriptional silencing via Argonaute-dependent heterochromatin formation (13). However, transposons are silenced in S. pombe by RNAi-independent mechanisms and do not produce abundant siRNAs (Figures 1B, S9)(6). To investigate whether centromere-directed siRNA production is conserved within the transposon-rich centromeres of S. japonicus, we isolated and sequenced small RNAs from log-phase S. japonicus cultures (Methods). The small RNAs have a modal size of 23 nucleotides and 94% map to transposons, both telomeric and centromeric (Figures 1B, S9).

The fact that siRNAs map to transposons in S. japonicus but not in S. pombe suggests that either the fission yeast RNAi pathway targets repetitive sequences instead of mobile elements per se, or that after the loss of transposons in the post-S. japonicus species, the pathway evolved away from an ancestral role in transposon control to a dedicated role in heterochromatin function. Either way, fission yeast provide a striking example of how the primary sequence and transposon content of centromeres can evolve while maintaining the same trans-acting regulation of heterochromatic structure.

8 Evolution of silencing mechanisms of mating-type loci

The general structure of the mating-type locus and the cis-acting elements that regulate switching of mating type loci in the fission yeast are highly conserved across all four species (Figure S10). Fission yeast switch between the plus (P) and minus (M) mating types by epigenetically-programmed gene conversion (14, 15). The general structure of the mating-type locus is conserved across the

Schizosaccharomyces, with an expressed mat1 that can contain either the P or the M allele and two heterochromatically-silenced donor cassettes: mat2-P and mat3-M (Figures S10, S11). All of the known cis-acting regulatory sequences required for epigenetic imprinting and recombinational switching are conserved in S. octosporus and S. cryophilus (16, 17). Since the mat1 locus is not present in the S. japonicus genome assembly (although it is detectable by physical mapping, Figure S11), it is not clear whether the sequences for imprinting are conserved in this species, although the donor-loci mat2-P and mat3-M are flanked by homologous sequences required for recombinational switching. Nonetheless, the epigenetically-programmed genomic mark associated with mat loci (14) is detectable in other species

(SOM text, Figure S11).

Conversely, those elements that regulate transcriptional silencing of the mating type loci vary dramatically between the species. None of the cis-acting sequences known to be involved in transcriptional repression of the silent cassettes in S. pombe are identifiable in the other species, although the donor cassettes are heterochromatically silenced, as indicated by an enrichment of H3K9me

(Figure S7). In S. japonicus, the silent mat loci directly abut the centromere of Chromosome 3, suggesting that they are silenced by a centromeric position effect. In S. octosporus and S. cryophilus, the mat loci are distant from the centromeres, but they each contain a conserved region of transposon remnants, similar to the S. cryophilus Tcry1 family of transposons, suggesting that transposon sequences may serve as silencing triggers in these species. Interestingly, they also contain inverted repeats, albeit shorter and less similar to each other than the inverted repeats that flank, and are involved in silencing of, the mat2/3 locus in S. pombe. Thus, their silencing strategies may share elements from both S. 9 pombe and S. japonicus. These results suggest that the mechanisms of imprinting and switching have been specifically conserved, but that the strategies for establishing heterochromatin are plastic.

Comparative annotation of the Schizosaccharomyces transcriptomes

We annotated the three genomes sequenced in this work using traditional computational methods, and used comparisons with these genomes to improve the S. pombe annotation (Methods). To validate and refine gene models for all four species, and to determine transcriptional start and stop sites, we deep sequenced cDNA from polyA-enriched RNA (18) using strand-specific libraries (19) (RNA-Seq), and constructed annotation-independent transcript models (Figure S12). We used RNA from four growth conditions from S. pombe, S. octosporus, and S. japonicus (log phase, diauxic shift, early stationary phase and heat shock) and three conditions from S. cryophilus (log phase, diauxic shift and heat shock).

In S. pombe, we were able to fully reconstruct 4277 out of 5065 previously annotated genes. Of the remaining 788 genes, 60% were covered over at least 90% of their length. The RNA-Seq-based transcript models suggested 401 changes to coding exon structure, 95% of which maintained or improved the conserved coding capacity of the gene models (Table 2 and Figure S12) (20), and thus apparently improve the existing annotation. In addition, we identified 253 UTR introns. Lastly, we found 89 new protein-coding genes in S. pombe, 53 of which are conserved (Table 2 and Figure S13).

These newly annotated protein-coding genes had been previously missed because they have short ORFs

(the median ORF length for all 89 genes is 90 amino acids compared to 379 for the full gene set) and short exons (the average length of exons for the 37 multi-exonic genes is 115 nt compared to 1001 for the multi-exonic genes in the full gene set). We constructed similar transcript models for the other three species (Table 2), with 4882, 4988, and 5126 protein-coding transcripts predicted in S. japonicus, S. octosporus and S. cryophilus, respectively.

Despite previous speculation that the intron-rich structure of fission yeast genes may allow metazoan-like alternative splicing (21), we found no evidence for extensive alternative splicing. We

10 found evidence for 433 alternative splicing events in S. pombe in the form of intron retention and alternative splice-donor or -acceptor usage, but no evidence for exon skipping or the use of alternative coding exons; we found similar levels of splice variants in the other species (Table S6). However, many of these variants disrupt the coding capacity of the gene (Figure S14). Furthermore, only a minority of the S. pombe intron-retention events (146/393) are conserved between two or more species. We therefore suspect that the bulk of alternative splicing in fission yeast represents nonproductive splicing variants. Interestingly, in some cases the spliced variant may be the non-productive isoform (Figure

S14c). Whether any of these alternative-splicing events plays a regulatory role remains to be seen.

Transcription is concentrated in well-defined, primarily protein-coding, transcripts

We find that the vast majority of fission yeast transcripts originate from annotated protein coding genes.

Previous work has debated the extent and significance of transcription in intergenic regions (22). In particular, it has been reported that 94% of the S. pombe genome is transcribed (18). Although we see a similar distribution of transcription (91% of nucleotides are covered by at least one RNA-Seq read), we find this statistic is misleading, since the vast majority of transcription, as measured by steady-state polyA-enriched RNA levels, is associated with well-defined transcripts, most of which are protein- coding. Specifically, 37% of intergenic nucleotides (defined as nucleotides between the UTRs of annotated protein coding transcripts) are not detectibly expressed and the transcription of 90% of intergenic nucleotides accounts for only 0.16% of overall polyA-enriched transcript signal. Moreover, the median expression level of nucleotides in exons (99.1% of which are detectably expressed) is 305- fold higher than that of intergenic nucleotides (Figure 2 and Table S7), consistent with similar findings from RNA-Seq in S. cerevisiae (23). The observed intergenic transcription is greatly enriched within origins of DNA replication (Figure S15), which are known to be large, gene-free regions with significant nucleosome-free regions (24-26); such NFRs may provide permissive loci for ectopic transcriptional initiation (27).

11 Transcription of coding genes is heavily biased to the sense strand. Of coding genes, 73% have less than 5% of their RNA-Seq reads on the antisense strand. Genes that have more than 5% antisense reads are highly enriched for convergent transcripts with intergenic distances of less than 200 bp (p < 10-8), but not with those of greater than 200 bp (p > 0.1), suggesting that much antisense transcription is due to read through of 3'-termination sites, as has been previously observed (28) (Figure S16). From these results, we conclude that stable transcripts of the fission yeast genome are primarily associated with known transcription units and that these genomes are pervasively transcribed only in that they are very gene dense. We discuss the specific exceptions of strong anti-sense transcription at specific loci and well-defined intergenic transcripts below.

Gene content and structure are highly conserved

Despite the evolutionary breadth of the fission-yeast clade, as measured by amino-acid divergence, their gene content and structure is remarkably conserved. Of approximately 5000 coding genes in each of the species, 4218 are 1:1:1:1 orthologs across the clade, with the remainder of the orthologous groups containing genes that have been duplicated or deleted since the last common ancestor of the clade

(Tables 3, S8). Moreover, of 3601 S. pombe introns in 2616 spliced 1:1:1:1 orthologs, 2901 (81%) are identical across the four species (Table S9). Of the orthologous groups that have changed, the majority are due to the gain of species- and clade-specific genes (Table S8). These expanded orthologous groups fall into three general categories: transcriptional regulation, cell wall/cell membrane function and meiosis (Table S10). Because gene content is defined by sequence similarity, much of the observed gene gain is likely to be due to rapidly evolving gene families. Indeed, genes involved in both transcriptional regulation and meiosis are also significantly enriched in the fastest evolving gene families (Tables S11, S12). The rapid evolution of transcriptional regulation and cell surface proteins may have allowed fission yeast to adapt to changes in environmental conditions, while that of meiotic genes may be involved in speciation events within the clade.

12 The stability in gene content among the fission yeast is also apparent in the small number of fungal genes missing from the clade. Of 3788 orthologous groups of genes conserved in at least 15 of 20 other sequenced Ascomycetes (Table S13), only 745 (20%) are missing from all fission yeast (Table S14). As is the case of genes gained by fission yeast, this group of genes lost from fission yeast contains many that encode transcription factors, membrane proteins and meiotic proteins, as well as proteins involved in budding. However, the only significantly enriched categories of genes in this group are involved in a sub-pathway of fatty acid biosynthesis and glycogen metabolism (Table S15), which we discuss below.

Therefore, with these exceptions, fission yeast maintain the core complement of fungal genes, a conclusion also supported by the phylogenetic distribution of orthologous groups conserved in fission yeast (SOM text, Figure S17).

Examining the relative rates of gene evolution between the fission yeast clade and the

Saccharomyces clade, we find several classes of genes that are more rapidly evolving within the fission yeast than the in the Saccharomyces, including those involved in membrane synthesis and transport, translation, glycolysis and respiration (Tables S11, S12). The consequences of the rapid evolution of energy metabolism are discussed below. Of the gene classes that evolve relatively slowly in fission yeast, the most striking example is the pre-mRNA splicing machinery (Table S16). This phenomenon reflects the conservation of complex exon structures in fission yeast in contrast to the significant simplification of exon structures and the splicing machinery in budding yeast (21).

Overall, the conservation of gene content, gene order and gene structure within

Schizosaccharomyces is much higher than would be expected given the level of amino acid divergence.

Based on amino acid divergence, we estimate that the fission yeast clade arose about 250 million years ago (Figure S3). However, the conservation of gene content is significantly higher than that within 4- species clades of Saccharomyces or Kluveromyces, both of which have much lower amino acid divergence (Table S17). This discrepancy suggests that either fission yeast amino acid sequence is evolving anomalously quickly, or that its genome structure in unusually stable. 13 Clear evidence for intergenic and antisense non-coding transcripts

We found clear evidence for stable, well-defined noncoding transcripts. We identified 1097 putative transcript models in S. pombe that are strongly supported by strand-specific RNA-Seq data yet contain no obvious coding capacity and do not correspond to known non-coding RNAs, such as rRNAs, snoRNAs or the ter1 telomerase RNA (Figure S18 and Tables 2 and S18). Of these potential ncRNAs,

449 are intergenic and 648 are antisense, overlapping a coding gene on the other strand by at least 30%.

213 of the ncRNAs overlap an annotated UTR on the same strand, suggesting that they may be alternative UTRs. However, the data strongly support 338 of the intergenic and 546 of the antisense ncRNAs as distinct transcripts. Indeed, by northern analysis, we validated that 5 are single transcripts of the size predicted by the RNA-Seq-derived transcript model (Figure S19b).

Of the 338 distinct intergenic ncRNAs in S. pombe, 113 (33%) are conserved in location relative to neighboring coding transcripts, suggesting that a some of these transcripts may be biologically significant. Moreover, 26 (6%) are conserved in sequence in at least one of the other genomes and of these 13 are in conserved locations. These transcripts represent potentially biologically important non- coding RNAs of unknown function. Of the antisense transcripts, 328 (51%) are conserved across two or more genomes (Table S19), suggesting that many of them are biologically significant, consistent with the observation of conserved antisense transcripts in budding yeast (23).

Antisense regulation of meiotic transcription

We found a striking conservation of antisense transcription of meiotic genes. There are 267 S. pombe genes with greater antisense transcription than sense transcription (Table S20). This class of genes is significantly enriched for involvement in meiosis (29) (p < 10-9) and includes 27 of 57 curated meiotic genes (Figures 3, S20, Table S21). The meiotic enrichment of antisense transcription is conserved across all four fission yeast (Figure S20 and Table S22), and is consistent with a similar enrichment observed in S. cerevisiae (23). Many of the genes we identify as antisense targets have been previously

14 proposed to be regulated by alternative splicing in the form of intron retention (30, 31), however these studies did not use strand-specific approaches, making it impossible to distinguish antisense transcripts from unspliced sense transcripts. We validated our RNA-Seq results by measuring transcript levels from several S. pombe antisense-transcribed meiotic genes by strand-specific Northern and q-rtPCR analysis, and by strand-specific splicing assays (Figure S19).

The source of the antisense transcripts of meiotic genes is varied. In some cases, a distinct transcript is expressed on the opposite strand (Figure 3A), in other cases the antisense transcription appears to be due to read-through from an downstream gene (Figure 3B). Regardless of the source, the antisense transcription tends to be stronger than the corresponding sense transcription. Although meiotic genes are highly enriched for genes with more antisense than sense transcription (p < 10-9), they are not enriched among genes with greater than 5% antisense transcription but less than 100% antisense transcription (p = 0.47).

Antisense transcription at meiotic genes does not uniformly decrease as cognate sense transcription increases during meiosis (Figure S19), suggesting that antisense transcription per se may not inhibit sense transcription. This result is in contrast to the anti-correlation observed in S. cerevisiae (23, 27).

However, the antisense transcript itself provides a potential mechanism for tight stoichiometric regulation of expression by ensuring that any leaky transcription from meiotic genes during log growth is bound by its antisense transcript and destroyed by double-strand-specific RNases. In S. pombe, the double stranded RNA produced by bidirectional transcription is a substrate for the RNAi machinery

(13). Indeed, evidence of RNAi-dependent heterochromatin has been observed at several meiotic genes in S. pombe (32). However, we see no evidence for systematic enrichment of siRNA, Ago1 binding or

H3K9 methylation (based on published datasets) at antisense-transcribed genes (33, 34). Moreover, we see no significant effect of dcr1 or ago1 deletions on sense or antisense transcript levels at several antisense-transcribed loci (data not shown). However, in the absence of Dcr1, it is possible that double- strand RNA can be destroyed by Pac1, an essential RNase-III-type double-strand-specific RNase. 15 Global conservation of expression programs within fission yeasts

To examine the evolution of gene expression in the fission yeast clade, we compared the patterns of differential gene expression between the species by correlating the levels of gene expression in each condition. As expected, in S. pombe, expression levels in the stress conditions of diauxic shift, early stationary phase and heat shock are more similar to each other (r = 0.71 - 0.80) than any of them are to log phase (r = 0.25 - 0.63). Surprisingly, we found that expression levels in the other species (with the exception of S. octosporus early stationary phase) showed much less difference between conditions (r =

0.78 - 0.97) and clustered with the S. pombe log-phase sample (Figure S21). These results suggest that the other species show a less robust transcriptional response to environmental conditions, as tested under standard lab conditions.

Despite these differences, the global organization of expression is conserved across species, with dominant programs for ‘growth’ and ‘stress’. For a more detailed look at gene regulation, we analyzed expression patterns across the four conditions and between the four fission yeast by phylogenic clustering to identify conserved modules of gene expression (Figure 4). In general, we find that the pattern of gene expression between species grown in similar conditions is similar, with two dominant patterns of gene expression: one associated with growth (log and heat shock) and another associated with stress (diauxic shift and early stationary phase). Moreover, similar expression clusters are enriched for similar gene annotations across the species. These results show that although the amplitude of expression changes is significantly higher in S. pombe than the other species, the overall patterns of expression are conserved.

Although patterns of transcriptional regulation during diauxic shift are similar across the fission yeast, the clade displays a number of differences when compared to other Ascomycetes (Table S23). In response to glucose depletion, fission yeast specifically up regulate genes involved in mitosis, including those involved in the kinetocore, the spindle pole body and the anaphase-promoting complex. In contrast, fission yeast down regulate genes involved in other aspects of the cell cycle, including genes 16 involved in cell-wall biosynthesis, S-phase DNA damage checkpoints and repair, transcription and splicing, all consistent with imminent withdrawal from the cell cycle. Further, they up regulate genes involved in meiosis, and in meiotic recombination in particular. None of these genes are significantly regulated in diauxic shift in S. cerevisiae (35) or other Ascomycetes (DAT and AR, unpublished data).

The apparently paradoxical up-regulation of mitotic genes and down-regulation of other cell-cycle genes reflects the fission yeast cell cycle, in which G2 is the longest phase of the cycle, so that most cells are in G2. This configuration contrasts with the budding yeast cell cycle, in which most cells are in G1. In response to starvation, fission yeast stop growing and withdraw from the cell cycle, but only after executing final mitotic divisions that reduces their size and allows them to arrest in G1 (36). The up regulation of meiotic genes in response to glucose depletion reflects the fission yeast haploid life style; fission yeast diploids are unstable and are induced to enter meiosis in response to even mild nutrient limitation.

Ancestral cis-regulatory mechanisms associated with novel and expanded functions

By analyzing the transcriptional regulatory mechanisms that control Schizosaccharomyces-specific transcriptional patterns and Schizosaccharomyces-specific genes, we identified conserved motifs associated with novel functions. We used a catalog of DNA motifs and their evolutionary history across

23 Ascomycota (NH, IW, HM, NF and AR, submitted), including three of the four Schizosaccharomyces species (S. japonicus, S. octosporus, and S. pombe). First, we found that several conserved regulatory motifs, all with a highly conserved regulatory function across the Ascomycota phylum, have acquired a new functionality in the stress response in the Schizosaccharomyces clade. For example, the motif bound by Rtg3 in S. cerevisiae is associated with amino acid metabolism genes across the phylum, including Schizosaccharomyces. In fission yeast, however, it is also enriched in genes important for

various stress responses, including H2O2, heat-shock, cadmium, and hyper-osmotic stress (Figure 5A).

Of the stress genes that have Rtg3 motifs in S. pombe, 36% are found only in the Schizosaccharomyces

17 clade. These genes are also associated with the Atf1/Sko1 regulatory motif, a major regulator of stress responses (Figure 5B). Thus, these Schizosaccharomyces-specific stress genes have been co-opted under the control of the stress regulator Atf1, leading to an expanded functional scope for a regulatory system that is not associated with the stress response in other species. We note that Rtg3 does not have a detectable ortholog in the Schizosaccharomyces clade (37), but the motif recognized by Rtg3 in S. cerevisiae is clearly identifiable in fission yeast, suggesting that these regulatory motifs are more conserved than their binding proteins.

We found a similar expansion of the regulatory scope of an established ancestral regulator in the regulation of meiotic genes. In particular, we found two highly conserved cell-cycle related regulatory motifs to be enriched in genes that have antisense transcripts (Figure 5C): the Fkh2 motif (p < 5X10-27), which is bound in S. pombe by the paralogs Fkh1 and Mei4 (38), and the Mbp1-Swi6 motif (p < 4X10-6), bound in S. pombe by MBF (39). Most of the Fkh1/Mei4 target genes with antisense transcripts (80%,

47 genes) are meiotic genes, the majority of which are specific to the Schizosaccharomyces clade

(Figure 5C). This observation suggests that the duplicated Fkh1 and Mei4 transcription factors have gained control over newly acquired genes, co-opting them under the same regulatory module as ancestral cell cycle targets.

Gene content reflects glucose-dependent lifestyle

Both fission yeast and budding yeast from the Saccharomyces clade produce ethanol by aerobic fermentation, making them critical for the human production of ethanol. This metabolic strategy is unusual, in that most other yeast do not ferment in the presence of oxygen, and arose independently in the two lineages (3, 40). In contrast to the convergent evolution of ethanol production, the utilization of ethanol has not converged; although budding yeast can efficiently catabolize ethanol, fission yeast cannot use ethanol as a primary carbon source. The evolution of respirofermentation in budding yeast

18 has been related to changes in gene content, most notably following a whole genome duplication (WGD) event, and in regulatory mechanisms of glucose repression (3, 41).

Similar to budding yeast, fission yeast have duplicate copies of the pyruvate decarboxylase (PDC) gene, needed to funnel pyruvate to fermentation. They also have orthologs of some of the major regulators (both activators and repressors) of respiratory genes, including the HAP2/3/4/5 complex members, the ADR1, TUP and MIG transcriptional regulators, and the major signaling components

(kinases) SNF1, SIP1 and SIP2 (3).

Despite the similarities, there are also substantial distinctions in the gene content between fission yeast and post-WGD budding yeast (such as S. cerevisiae) (Figure S22). The following six examples of evolutionary changes to central genes in carbon metabolism shed light on the distinct metabolic capacities of fission yeast. i) Fission yeast lack the glyoxylate cycle, preventing the use of ethanol as a sole carbon source (40, 42). ii) Although budding yeast use both glycogen and trehalose as internal carbohydrate stores, fission yeast have lost the glycogen biosynthetic pathway. Consistent with this loss, they have higher intracellular levels of trehalose (MS, DAT, and AR, unpublished data), which correlates with the duplication of the tps2 gene for trehalose-phosphate synthase. iii) Fission yeast have fewer pairs of paralogs within glycolysis genes than budding yeast: although post-WGD budding yeast species have paralogous gene pairs for enzymes catalyzing six of ten glycolysis reactions, fission yeast have paralogous gene pairs for enzymes catalyzing only three reactions (glyceraldehyde-3-phosphate dehydrogenase, enolase, and hexokinase), and most notably lack a paralog for the pfk1 gene (which is duplicated in both pre- and post-WGD Saccharomycotina species). iv) Fission yeast have a reduced capacity to generate glucose from other carbon sources. In particular, they lack phosphoenolpyruvate carboxykinase (PCK1), which is required for gluconeogenesis from ethanol. This loss further limits their ability to use ethanol as a primary carbon source. S. japonicus also lacks FBP1, which is required for gluconeogenesis from non-sugar carbon sources, including glycerol. v) Fission yeast have not expanded the ADH genes, as has happened in the post-WGD budding yeast. In contrast to the four 19 ADH1 paralogs in post-WGD species, there is a single adh1 ortholog in S. pombe, S. octosporus and S. cryophilus, and two adh1 paralogs in S. japonicus. The fission yeast ADH1 homologs are similar to S. cerevisiae ADH1, and to the ancestrally reconstructed ADH, both of which are kinetically tuned for ethanol production, rather than consumption (43). vi) Fission yeast lack some, but not all, of the critical regulatory proteins known to control glucose repression or activation of respiratory functions in the absence of glucose in S. cerevisiae. Most notably, they lack CAT8 and SIP4, known to activate gluconeogenesis genes in S. cerevisiae. All of these adaptations are consistent with the inability of fission yeast to consume ethanol as a sole carbon source. The loss of these conserved enzymes highlights the evolution of the fission yeast clade away from generalist metabolism towards one dependent solely on glucose.

The transcriptional regulation of some of these key enzymes further reinforces the similarities and differences of the metabolic strategy of the fission yeast species and the post-WGD species. On the one hand, in species from both clades, as glucose is depleted during the diauxic shift, the expression of respiratory genes (oxidative phosphorylation enzymes, TCA cycle) is induced. On the other hand, the expression of the genes encoding the pyruvate dehydrogenase complex, which channels pyruvate to respiration, and adh1 genes are reduced, and the expression of the ALD genes is highly induced. ALD genes are repressed in the post-WGD species, such as S. cerevisiae (35), and may provide an alternative mechanism for generating acetyl-coA in fission yeast.

These observations explain the previously puzzling lack of ethanol assimilation by S. pombe. The lack of efficient ethanol catabolism by fission yeast demonstrates that aerobic fermentation did not evolve in these species as a way to create a by-product that could be consumed later. Instead, the ethanol is produced as a waste product, possibly because the production of ethanol is toxic to competing micro-organisms. Interestingly, the fission yeast approach to aerobic fermentation appears to have evolved as early as 250 million years ago (Figure S3), long before the budding yeast whole genome

20 duplication, and may represent an early evolutionary response to the rise the glucose-rich fruits of flowering plants (44, 45).

The implications of fission yeast functional comparative genomics

The comparative analysis of genome structure and expression in the fission yeast opens up a new and important eukaryotic clade to modern genomic analysis. It exposes evolutionary innovation within the clade and provides insights into fundamental properties of genome function.

Our analysis of centromere structure and evolution demonstrates that these critical chromosomal features can be dramatically rearranged without altering their function, while being remarkably stable in chromosomal position over evolutionary time. Because of their complicated and repetitive nature, centromeres have generally been excluded from genomic analysis. The well-assembled centromeres of the fission yeast provide rare insight into centromeric biology and will serve as a foundation for further work on centromeric function and evolution across .

The discovery of conserved antisense transcription of a large number of meiotic genes demonstrates that antisense transcription can act in a systematic manner to regulate entire pathways of gene expression. This system affords opportunities to study antisense regulation of gene expression at both the mechanistic, single gene level and also the systematic, gene network level.

Aerobic fermentation is a major evolutionary innovation in microbial metabolism that arose in parallel in the fission yeast and budding yeast lineages. However, the gene content and expression patterns of fission yeast show that this clade evolved the ability to produce ethanol, but not to efficiently consume it as a primary carbon source. Moreover, it has actually lost the ability to metabolize non- glucose carbon sources making it increasingly dependent on glucose-rich environments. This evolutionary trajectory suggests that ethanol production arose as a toxic waste product in a strategy to consume glucose more quickly and that efficient ethanol catabolism was a secondary innovation, likely made possible by the budding yeast whole-genome duplication.

21 In addition to the specific discoveries presented here, the dataset and tools generated will provide powerful tools for future fission yeast research, both on the well-studied S. pombe and on other aspects of the other fission yeast species. For example, S. japonicus possesses a hyphal life style and is poised to become an important model for fungal hyphal development (46). The comparative power of the four genomes will allow for more rigorous analysis of metazoan homologs in the clade, making fission yeast a more powerful model for eukaryotic biology.

22 References 1. S. L. Forsburg, The best yeast? Trends Genet 15, 340 (1999).

2. V. Wood et al., The genome sequence of Schizosaccharomyces pombe Nature 415, 871 (2002).

3. C. L. Flores, C. Rodriguez, T. Petit, C. Gancedo, Carbohydrate and energy-yielding metabolism in

non-conventional yeasts. FEMS Microbiol Rev 24, 507 (2000).

4. J. Leonardi, J. A. Box, J. T. Bunch, P. Baumann, TER1, the RNA subunit of fission yeast

telomerase. Nat Struct Mol Biol 15, 26 (2008).

5. L. H. Wong, K. H. Choo, Evolutionary dynamics of transposable elements at the centromere.

Trends Genet 20, 611 (2004).

6. H. P. Cam, K. Noma, H. Ebina, H. L. Levin, S. I. Grewal, Host genome surveillance for

retrotransposons by transposon-derived proteins. Nature 451, 431 (2008).

7. M. Zaratiegui et al., CENP-B preserves genome integrity at replication forks paused by

retrotransposon LTR. Nature (2010).

8. C. Casola, D. Hucks, C. Feschotte, Convergent domestication of pogo-like transposases into

centromere-binding proteins in fission yeast and mammals. Mol Biol Evol 25, 29 (2008).

9. S. I. Grewal, A. J. Klar, A recombinationally repressed region between mat2 and mat3 loci shares

homology to centromeric repeats and regulates directionality of mating-type switching in fission

yeast. Genetics 146, 1221 (1997).

10. B. Fishel, H. Amstutz, M. Baum, J. Carbon, L. Clarke, Structural organization and functional

analysis of centromeric DNA in the fission yeast Schizosaccharomyces pombe. Mol Cell Biol 8,

754 (1988).

11. K. Ishii et al., Heterochromatin integrity affects chromosome reorganization after centromere

dysfunction. Science 321, 1088 (2008).

12. S. I. Grewal, RNAi-dependent formation of heterochromatin and its diverse functions. Curr Opin

Genet Dev 20, 134 (2010). 23 13. T. A. Volpe et al., Regulation of Heterochromatic Silencing and Histone H3 Lysine-9 Methylation

by RNAi Science 297, 1833 (2002).

14. D. H. Beach, Cell type switching by DNA transposition in fission yeast Nature 305, 682 (1983).

15. R. Egel, The pedigree pattern of mating-type switching in Schizosaccharomyces pombe Curr

Genet 8, 205 (1984).

16. M. Kelly, J. Burke, M. Smith, A. Klar, D. Beach, Four mating-type genes control sexual

differentiation in the fission yeast. EMBO J 7, 1537 (1988).

17. B. Arcangioli, A. J. Klar, A novel switch-activating site (SAS1) and its cognate binding factor

(SAP1) required for efficient mat1 switching in Schizosaccharomyces pombe. EMBO J 10, 3025

(1991).

18. B. T. Wilhelm et al., Dynamic repertoire of a eukaryotic transcriptome surveyed at single-

nucleotide resolution. Nature 453, 1239 (2008).

19. J. Z. Levin et al., Comprehensive comparative analysis of strand-specific RNA sequencing

methods. Nat Methods 7, 709 (2010).

20. M. F. Lin, I. Jungreis, M. Kellis, PhyloCSF: a comparative genomics method to distinguish

protein-coding and non-coding regions Nature Precedings (2010).

21. N. F. Kaufer, J. Potashkin, Analysis of the splicing machinery in fission yeast: a comparison with

budding yeast and mammals. Nucleic Acids Res 28, 3003 (2000).

22. K. Struhl, Transcriptional noise and the fidelity of initiation by RNA polymerase II. Nat Struct Mol

Biol 14, 103 (2007).

23. M. Yassour et al., Strand-specific RNA sequencing reveals extensive regulated long antisense

transcripts that are conserved across yeast species. Genome Biol 11, R87 (2010).

24. M. Gomez, F. Antequera, Organization of DNA replication origins in the fission yeast genome.

EMBO J 18, 5683 (1999).

24 25. M. L. Eaton, K. Galani, S. Kang, S. P. Bell, D. M. Macalpine, Conserved nucleosome positioning

defines replication origins. Genes Dev 24, 748 (2010).

26. A. B. Lantermann et al., Schizosaccharomyces pombe genome-wide nucleosome mapping reveals

positioning mechanisms distinct from those of Saccharomyces cerevisiae. Nat Struct Mol Biol 17,

251 (2010).

27. Z. Xu et al., Bidirectional promoters generate pervasive transcription in yeast. Nature 457, 1033

(2009).

28. M. Zofall et al., Histone H2A.Z cooperates with RNAi and heterochromatin factors to suppress

antisense RNAs. Nature 461, 419 (2009).

29. J. Mata, R. Lyne, G. Burns, J. Bahler, The transcriptional program of meiosis and sporulation in

fission yeast. Nat Genet 32, 143 (2002).

30. A. Moldon et al., Promoter-driven splicing regulation in fission yeast. Nature 455, 997 (2008).

31. N. Averbeck, S. Sunder, N. Sample, J. A. Wise, J. Leatherwood, Negative control contributes to an

extensive program of meiotic splicing in fission yeast. Mol Cell 18, 491 (2005).

32. H. P. Cam et al., Comprehensive analysis of heterochromatin- and RNAi-mediated epigenetic

control of the fission yeast genome. Nat Genet 37, 809 (2005).

33. I. Djupedal et al., Analysis of small RNA in fission yeast; centromeric siRNAs are potentially

generated through a structured RNA. EMBO J 28, 3832 (2009).

34. M. Buhler, N. Spies, D. P. Bartel, D. Moazed, TRAMP-mediated RNA surveillance prevents

spurious entry of RNAs into the Schizosaccharomyces pombe siRNA pathway. Nat Struct Mol

Biol 15, 1015 (2008).

35. J. L. DeRisi, V. R. Iyer, P. O. Brown, Exploring the metabolic and genetic control of gene

expression on a genomic scale. Science 278, 680 (1997).

36. G. Costello, L. Rodgers, D. Beach, Fission yeast enters the stationary phase Go state from either

mitotic G1 or G2 Current Genetics 11, 119 (1986). 25 37. I. Wapinski, A. Pfeffer, N. Friedman, A. Regev, Natural history and evolutionary principles of

gene duplication in fungi. Nature 449, 54 (2007).

38. H. Abe, C. Shimoda, Autoregulated expression of Schizosaccharomyces pombe meiosis-specific

transcription factor Mei4 and a genome-wide search for its target genes. Genetics 154, 1497

(2000).

39. N. F. Lowndes, C. J. McInerny, A. L. Johnson, P. A. Fantes, L. H. Johnston, Control of DNA

synthesis genes in fission yeast by the cell-cycle gene cdc10+ Nature 355, 449 (1992).

40. J. Piskur, E. Rozpedowska, S. Polakova, A. Merico, C. Compagno, How did Saccharomyces

evolve to become a good brewer? Trends Genet 22, 183 (2006).

41. M. Kellis, N. Patterson, M. Endrizzi, B. Birren, E. S. Lander, Sequencing and comparison of yeast

species to identify genes and regulatory elements. Nature 423, 241 (2003).

42. P. de Jong-Gubbels, J. P. van Dijken, J. T. Pronk, Metabolic fluxes in chemostat cultures of

Schizosaccharomyces pombe grown on mixtures of glucose and ethanol. Microbiology 142, 1399

(1996).

43. J. M. Thomson et al., Resurrecting ancestral alcohol dehydrogenases from yeast. Nat Genet 37,

630 (2005).

44. C. D. Bell, D. E. Soltis, P. S. Soltis, The age of the angiosperms: a molecular timescale without a

clock. Evolution 59, 1245 (2005).

45. The Origins of Angiosperms and their Biological Consequences (Cambridge University Press,

1987),

46. K. Enczi, M. Yamaguchi, M. Sipiczki, Morphology transition genes in the dimorphic fission yeast

Schizosaccharomyces japonicus. Antonie Van Leeuwenhoek 92, 143 (2007).

47. Assemblies, annotations, gene sets, and single nucleotide polymorphisms are available in GenBank

and at the Broad Institute Schizosaccharomyces website

26 . The Broad Institute website provides search, visualization and BLAST tools, as well as

download of assemblies, annotations and gene sets. The expression data can be visualized in the

IGV genome browser launched from the Broad Institute Schizosaccharomyces website. The

Schizosaccharomyces annotations will be available at pomBase .

Assemblies reported here have been deposited in GenBank under the following project accession

numbers: ABHY04000000 (S. octosporus), ACQJ02000000 (S. cryophilus), AATM02000000 (S.

japonicus). The S. pombe SNP sequencing data sets have been deposited as [in process]

(NCYC132) and [in process] (SPK1820). The RNA-Seq data sets have been deposited at the

NCBI SRA as [in process]. The S. japonicus siRNA datasets have been deposited at the NCBI

SRA as ERP000479 and [in process].

48. This work was supported by the US National Human Genome Research Institute (NHGRI) under

the Fungal Genome Initiative at the Broad Institute. C.N. and M.F.L. were supported by NHGRI,

US National Institutes of Health (NIH), Department of Health and Human Services; R.M. was

supported by the NIH; K.H. was supported by The Danish Research Council; C.A.N and C.A.M

were supported by the Biotechnology and Biological Sciences Research Council; A.R. was

supported by the Human Frontiers Science Program, a Career Award at the Scientific Interface

from the Burroughs Wellcome Fund, the Sloan Foundation, and an NIH Director’s PIONEER

award. We thank Asao Fujiyama and Atsushi Toyoda for generating DNA sequence, Marcia Lara

and Nicole Stange-Thomann for developing molecular biology protocols, Jim Robinson, Manuel

Garber, Paul Muller for technical advice and support, Amar Klar for providing S. pombe var

kambucha (SPK1820), Leslie Gaffney for assistance with the figures, Kendra Mar and Judy

Mwangi for administrative support, and Christina Cuomo for comments on the manuscript. We

also gratefully acknowledge the contributions of the Broad Institute Sequencing Platform.

27 Supporting Online Material

SOM Text

Materials and Methods

Figs. S1 to S24

Tables S1 to S32

Supplemental References

28

Table 1 Sequence and assembly statistics

somal

-

gs

Species version GC Conti Total Contig Length N50 Contig Scaffolds ScaffoldN50 Total Scaffold Length Chromosomes Total Chromosome Length Chromo (nTel/Gaps* nCen)** S. pombe SP2 36% 7 12.57Mb 3 4.54Mb 12.57Mb 3 12.57Mb 4 (0) S. octosporus SO5 38% 20 11.27Mb 1.78Mb 8 2.58Mb 11.28Mb 3 11.60Mb 9 (1) S. cryophilus SJ4 44% 85 11.13Mb 1.85Mb 33 1.91Mb 11.38Mb 3 11.19Mb 23 (4***) S. japonicus SCY3 38% 194 11.52Mb 290kb 26 1.39Mb 11.55Mb

Sequence data included ABI ABI ABI 454 XLR BAC ABI pOT pJAN Fosmid 454 FLX Paired 454 XLR Paired Paired Paired Paired Fragment 500 Fragment Illumina 75 Species 750bp 750bp 750bp 750bp 250 bases bases 500 bases bases S. pombe S. octosporus 6.4kb 168.5kb 1.7Mb S. cryophilus 718.0kb 1.3Mb S. japonicus 2.8kb 124.1kb 33.0kb 27.7kb 686.5kb 15.5Mb

* Chromosomal gaps are gaps in sequence continuity with the chromosomal scaffolds, including the centromeric gaps ** Non-telomeric and non-centromerric chromosomal gaps exculed gaps in the sub-telomeric and peri- centromeric regions *** All 4 non-telomeric and non-centromeric chromosomal gaps in S. japonicus are within one 15 kb locus

29

Table 2 Annotation Statistics

Genes Genes Exons Exons CDSs CDSs Genes Genes before after before after unchanged changes added deleted S. pombe 5065 5144 9917 10497 4656 408 89 6 S. octosporus 4925 4988 10164 10578 4354 571 65 0 S. cryophilus 5057 5126 10564 11026 4402 655 73 1 S. japonicus 4814 4882 9776 10351 4121 693 66 0 Before refers to standard computational annotation, or in the case of S. pombe, manually curated annotation hosted at GeneDB.org. After refers to after the RNA-seq-based transcript analysis.

Table 3 Conservation of gene content and structure

Orthologous groups Introns same gain loss dup same gain loss S. pombe 4218 321 83 23 2901 297 27 S. octosporus 4218 133 48 5 2901 25 8 S. cryophilus 4218 283 73 11 2901 75 4 Ancestor of Soct and Scry 4218 103 44 15 2901 396 0 Ancestor of Spom, Soct and Scry 4218 339 159 29 2901 415 412 S. japonicus 4218 242 0 18 2901 708 214 Ancestor of Schizosaccharomyes 640 745 Total 2061 1152 101 1916 665

30 Figure Legends

Figure 1 Comparative analysis is Schizosaccharomyces phylogeny and chromosome structure

A) A maximum-likelihood phylogeny of 12 fungal species based 440 core orthologs (each occurring once in each of the genomes) from fly to yeast. A maximum-parsimony analysis produces the same topology. Each approach has 100% bootstrap support for all nodes.

B) The chromosome structure of S. pombe, S. octosporus and S. japonicus. The middle bar in each figure represents the chromosome and its centromere: red for Chromosome 1, blue for Chromosome 2 and yellow for Chromosome 3. Above and below each chromosome are depicted the chromosomes in the other two species to which the genes on the chromosome of interest map, using the same color scheme. Note that the pericentromeric regions of S. pombe and S. octosporus are the same colors, denoting the conservation of gene order around their centromeres. Above the S. pombe and S. japonicus chromosomes are depicted the distributions of transposons and mapping of siRNAs.

C) The centromeric repeat structures of S. pombe, S. octosporus and S. japonicus.

Figure 2 Polyadenylated transcription is predominantly confined to protein coding genes

The S. pombe genome was divided in to five different feature classes: protein coding sequence, intron sequence, untranslated sequence (5'- and 3' UTRs) and intergenic sequence (all nucleotides between

UTRs of protein coding genes). The frequency of RNA-Seq reads was calculated over sequential 20 bp windows across these features; for coding sequence, the frequency of antisense reads was also calculated. Frequency was normalized to the maximum frequency within each feature class to compensate for the different class sizes.

Figure 3 Meiotic genes are subject to antisense transcription

A) Examples of antisense transcription of meiotic genes. Above and below the chromosome coordinates are the coding sequence annotations on the top and bottom strand, respectively. Above and below these

31 are the strand-specific RNA-Seq read densities on a 0-300 scale. Note that the antisense transcription for spo4 is a distinct transcript, but that the antisense transcription of spo6 appears to be read-through transcription from SPBC1778.05c.

B) Enrichment of GO annotations within the set of protein-coding genes with more antisense than sense transcription. All terms with a p value of less than .01 are included, except for high-level terms (i.e. biological process and molecular function).

Figure 4 Expression profiles cluster into similar patterns with conserved biological functions

A) Expression clusters for each species. Gene expression profiles for each species were clustered as described in the Methods. The size of each heat map is proportional to the number of genes in the cluster and the number of genes in each is indicated. Similar cluster sizes and patterns reflect similar expression patterns between the species.

B) A selection of enriched GO terms for each cluster. The color intensity is proportional to the negative logarithm of the hyper-geometric p-value enrichment on a continuous scale of 0-10. Complete GO term enrichments are shown in Table S30.

Figure 5 Conserved regulatory motifs have acquired new functions and new target genes

A) The enrichment of gene classes regulated by the Rtg3-biding motif in 23 Ascomycota. This motif is enriched upstream of amino acid metabolism genes in all Ascomycota. However, in fission yeast, it is specifically enriched upstream of stress-response genes. S. cerevisiae (Scer), S. paradoxus (Spar), S. mikatae (Smik), S. bayanus (Sbay), C. glabrata (Cgla), S. castellii (Scas), K. waltii (Kwal), A. gossypii

(Agos), K. lactis (Klac), S. kluyveri (Sklu), D. hansenii (Dhan), C. guilliermondii (Cgui), C. lusitaniae

(Clus), C. albicans (Calb), C. tropicalis (Ctro), C. parapsilosis (Cpar), C. elongosporus (Celo), Y. lipolytica (Ylip), N. crassa (Ncra), A. nidulans (Anid), S. japonicus (Sjap), S. octosporus (Soct), S. pombe (Spom)

32 B) Enrichment of Rtg3- and Aft1-binding sites in front of stress response genes. Each row represents a gene. The strength of the strongest regulatory site upstream of the gene is indicated in the blue heat map. The expression of the gene in diauxic shift (ds) and early-stationery phase (es) relative to log phase is indicated in the blue-yellow heat map. Genes specific to the fission yeast clade are indicated in orange.

C) Enrichment of Fkh2/Mei4- and MBF-binding sites in front of antisense-transcribed genes. As in b, but each row represents a gene with greater antisense than sense transcription. Gene associated with meiosis (29) are indicated in magenta.

33 Rhind et al. A Aspergillus fumigatus Pezizomycotina Figure 1 Neurospora crassa (Filamentous Fungi)

Saccharomyces cerevisiae

Ashbya gossypii Ascomycota Saccharomycotina Debaryomyces hansenii (Budding Yeast)

Yarrowia lipolytica

Schizosaccharomyces pombe

Schizosaccharomyces octosporus Schizosaccharomyces cryophilus (Fission Yeast)

Schizosaccharomyces japonicus

Ustilago maydis

0.1 Rhizopus oryzae Mucormycotina

B Chromosome 1 Chromosome 2 Chromosome 3

S. pombe siRNA Transposons So synteny Sj synteny

S. octosporus Sp synteny Sj synteny

S. japonicus siRNA Transposons Sp synteny So synteny

C S. pombe

Cen1

Cen2

Cen3

dg/dh repeat lmr Cnt tRNA 10 kb S. octosporus Cen1

Cen2

Cen3

CenA CenB CenC CenE tRNA 10 kb CenBl CenBr CenD CenF

S. japonicus

Cen1L Cen1R

Cen2

Cen3 10 kb tRNA Tj 2 Tj 5 Tj 7 Tj 9 Tj 1 Tj 3 Tj 6 Tj 8 Tj 10 Rhind et al. Figure 2

1.0

0.8

0.6

0.4 Coding Intron

Relative frequency UTR Intergenic 0.2 Antisense

0.0 100 101 102 103 104 105 Relative RNA-seq coverage depth Rhind et al. Figure 3

A Spo4 antisense transcription

spo4 SPBC21C3.19

3,826,000 3,827,000 3,828,000 3,829,000 3,830,000 3,831,000 (bp)

SPBC21C3.17c

Spo6 antisense transcription

spo6

3,101,000 3,102,000 3,103,000 3,104,000 3,105,000 3,106,000 3,107,000 3,108,000 (bp)

SPBC1778.03c SPBC1778.05c fim1

B 10

8

6

4 Significance (log[1/ P -value]) –3 –3 2 –8 –3 –10 –6 5.60 × 10 5.60 × 10 1.33 × 10 1.53 × 10 4.38 × 10 4.54 × 10 0 Meiosis Meiotic Meiotic Nuclear Ascospore chromosome recombination chromosome formation segregation

S. pombe GO annotation

Rhind et al. Mei4/ Figure 5 A B Rtg3Aft1 C Fkh1 MBF 2 2 2 2

1 1 1 1

0 0 0 0 S.jap S.oct S.pom Rtg3 S.jap S.oct S.pom 2 ds es ds es ds es ds es ds es ds es S.jap S.oct S.pom S.jap S.oct S.pom S.jap S.oct S.pom S.jap S.oct S.pom

1

0 Scer Spar Smik Sbay Cgla Scas Kwal Agos Klac Sklu Dhan Cgui Clus Calb Ctro Cpar Celo Ylip Ncra Anid Sjap Soct Spom

,Heat,Cd,Sorbitol,CESR-induced 2 O 2 AA metabolism

Peroxisome, FA metabolism Genes with antisense transcrip ts Mitochondrion

Stress response - H2O2, heat, Cd, sorbitol, CESR-induced No enrichment in function Stress Response Genes : H Stress Response Genes Schizo specific Schizo

None Strong 0.125 80 specific genes

genes None Strong 0.125 80 Meiotic genes Binding site Fold change Binding site Fold change strength strength