
Comparative genomics of biotechnologically important yeasts

Supplementary Appendix

Contents

Note 1 – Summary of literature on ascomycete yeasts used in this study
  CUG-Ser yeasts
  Other Saccharomycotina
  Taphrinomycotina
Note 2 – Genomes overview
  culturing, identification, DNA and total RNA extraction
  sequencing and assembly
  Transcriptome sequencing and assembly
  Table S1. Genome statistics
  Table S2. Annotation statistics
  Figure S1. Genome size, repeat content, and protein conservation measures for the yeasts
  Figure S2. Intron occurrence in 303 predicted orthologs in the yeasts
  Figure S3. Distribution of TY-LTR elements in the yeasts
  Figure S4. Organization of rDNA in yeast species
Note 3 – Organism phylogeny
  Figure S5. Phylogenetic tree inferred with ExaML
  Figure S6. Phylogenetic tree inferred with FastME
Note 4 – Alternative genetic codes: CUG coding for Ser and Ala
  Figure S7. Translation of CUG codons to Ser or Ala
  Table S3. Results of Bagheera predictions of codon usage for selected yeasts
  Dataset S1. Genetic code of P. tannophilus
Note 5 – Correlation of genomically encoded enzymes to metabolic traits
  Assigning genes to enzyme functions
  Figure S8. Fermentative, metabolic, osmotolerance, and temperature-dependent growth of yeasts
  Figure S9. Loss of Complex I genes and of fermentative lifestyle
  Figure S10. Correlation between disaccharide metabolism and genome content
Note 6 – clusters
  Dataset S2. Phylogenetic distribution of gene clusters
  Dataset S3. Pairwise associations of ECs
Note 7 – MAT locus organization and mating-type switching
  Pan-ascomycete synteny at the MAT locus
  Homothallism and heterothallism
  Origin of mating type switching in Saccharomycotina
  Loss of H3K9me2/3 heterochromatin
  Figure S11. Mating-type locus organization and synteny in
  Figure S12. MAT locus isomeric structures in Pachysolen tannophilus and Ascoidea rubescens
References


Note 1 – Summary of literature on ascomycete yeasts used in this study

The organisms used in this study were selected to characterize the genomes of yeasts that are important or potentially important from a biotechnological perspective, and to fill gaps in our understanding of phylogenetic relationships. Many of the yeasts were selected because of their abilities to ferment or metabolize xylose, cellobiose, arabinose, galactose, maltose and other substrates such as methanol, or to produce useful fuels and advanced chemicals. Even though many yeasts and fungi will grow on various carbon sources aerobically, fewer will convert these substrates into ethanol or other chemicals with meaningful yields at significant rates. Some species were selected for their capacities to make lipids and others for their acid-, thermo-, cryo- or osmotolerance. Even though we understand some of the mechanisms determining these traits, we do not know how widespread they are, nor do we know the constellation of metabolic activities supporting these properties. By sequencing the genomes of several species having the traits of interest, and by comparing those genomes to those of species without those traits, we expected to discover new mechanisms. We also sought to better understand diverse species that occupy habitats such as insect guts, soil or exudates of plants. Finally, we wanted to broaden knowledge about yeast species that could be adapted to new uses. Increasingly, biotechnology is seeking new yeasts as “platform organisms” for the production of natural or engineered metabolites. Synthetic biology and metabolic engineering are increasing the capacities of microbes to make specialized products, and rapid xylose fermentation could provide novel routes for the bioconversion of agricultural and wood harvest residues into renewable fuels and chemicals. This literature summary describes the major physiological and genetic properties and growth habits of each species, as documented in the existing literature.
The text is organized to follow our overall taxonomic structure as determined by whole genome comparisons. Selected members of Pezizomycotina and Basidiomycota have been included in our study for phylogenetic comparisons, but they are not covered in this literature summary.

CUG-Ser yeasts

Scheffersomyces stipitis

Scheffersomyces stipitis(1) is a predominantly haploid, heterothallic yeast related to Candida shehatae, Spathaspora passalidarum and several other pentose-metabolizing ascomycetous yeast species(2, 3). Strains of Sch. stipitis are among the best xylose-fermenting yeasts in type culture collections(4-7). Fed-batch cultures of Sch. stipitis produce up to 47 g/L of ethanol from xylose at 30°C under low aeration conditions(5). Sch. stipitis belongs to a group of yeasts that uses genetic code 12, in which CUG codes for serine rather than leucine(8). This makes correct translation of many heterologous genes problematic. Sch. stipitis CBS 6054 is related to several yeasts found as endosymbionts of beetles that inhabit and degrade white-rotted wood(9-11). Unlike Saccharomyces cerevisiae, which regulates fermentation by sensing the presence of fermentable sugars such as glucose, Sch. stipitis induces fermentative activity in response to oxygen limitation(12). A description of this genome was published in 2007(13). The comparative analysis presented in the current publication helps to identify the genomic features underlying its physiological traits.

Spathaspora passalidarum

Spathaspora passalidarum(10), like Sch. stipitis and Candida tenuis (see below), can ferment and assimilate xylose, cellobiose, glucose and maltose. Efficient fermentation of xylose, a five-carbon sugar that is a major component of plant cell walls, is a major step towards effective biofuel production from plant materials. These three yeasts are also unusual in that they are found in symbiotic association with wood-boring beetles, and comparative study may give new insight into the genetic underpinnings of symbiotic relationships(9, 14). A number of other species of Spathaspora have been described with similar properties(15-17). However, not all related species ferment xylose well(18). S. passalidarum is unusual in that it has a strong capacity for xylose fermentation with little or no aeration(19). Moreover, it will co-ferment glucose, xylose and cellobiose, which makes this yeast particularly useful for ethanol production from mixed cellulosic and hemicellulosic sugars(20). An analysis of the S. passalidarum genome was previously published(21).

Candida tanzawaensis

Candida tanzawaensis is likewise representative of the large and mostly unstudied group of yeasts that are often associated with insects, particularly fungus-feeding beetles(22). The phylogenetic classification of C. tanzawaensis was previously studied by rRNA gene sequencing(23), and a number of related yeasts have been identified(24). C. tanzawaensis has mainly been studied from a taxonomic perspective, and relatively little is known about its physiology or biochemistry. Among the best sources of microbial enzymes that can degrade polysaccharides and other components of mushrooms are the microbes growing on basidiomata, which are the fruiting bodies of basidiomycetes. The basidiomata of shelf fungi and mushrooms are composed of highly complex polysaccharides that are difficult to degrade enzymatically, so the present study examined the genome of this yeast to see whether it encodes enzymes capable of degrading unusual polysaccharides, and to look for other traits.

Hyphopichia burtonii

Hyphopichia burtonii is best known for its ability to grow on starch(25). It is halotolerant(26), and has been found in cured meat(27, 28). The species is widespread in nature and appears to be strongly competitive(29). Hyphopichia burtonii, along with Wickerhamomyces anomalus and Saccharomycopsis fibuligera, is one of the spoilage yeasts that cause chalk defects on partially baked breads and traditional cookies(30). H. burtonii appears to have fairly strong amylolytic activity.

Debaryomyces hansenii

Debaryomyces hansenii is a cryotolerant, osmotolerant and halotolerant marine yeast that can tolerate salinity levels up to 24%. This tolerance is largely attributable to its highly inducible glycerol-3-phosphate dehydrogenase activity(31), and its consequent ability to produce large amounts of glycerol(32). D. hansenii is the most common species of yeast found in all types of cheeses and is common in dairies and in brine due to its ability to grow in the presence of salt at low temperature and to metabolize lactic and citric acids. It is widely studied, mainly for its role in cheese production and for its physiological properties. D. hansenii is able to overproduce riboflavin(33) and is normally considered non-pathogenic. The genome of D. hansenii has been previously published(34).


Candida tenuis

Candida tenuis is another xylose-fermenting yeast belonging to the CUG-Ser clade that is found in association with wood- or bark-inhabiting larvae(35, 36), in this case as an endosymbiont of the gut in longhorn beetles (Cerambycidae)(14, 37), where it is found in specialized structures termed mycetomes(37). C. tenuis has long been recognized and is well studied for its ability to ferment xylose(38). It is perhaps best known for its xylose (aldose) reductase(39), which has been engineered to increase activity with NADH as a cofactor(40-42).

Metschnikowia bicuspidata

Metschnikowia bicuspidata is an aquatic yeast(43) that has been reported to infect freshwater prawns(44, 45) and brine shrimp, and to cause mortality when infected shrimp are fed to salmon(46). Several yeasts, including Wickerhamomyces anomalus(47), Kluyveromyces siamensis(48), Pichia membranifaciens(43), and Williopsis saturnus(49), produce killer toxins effective against M. bicuspidata.

Babjeviella inositovora

Kurtzman and Suzuki recently reclassified Pichia inositovora (Golubev & Blagodatskaya) and Yamadazyma inositovora (Billon-Grand)(50) into a new genus as Babjeviella inositovora(51). One unusual feature of this yeast species is that it harbors linear DNA plasmids with a novel killer toxin activity(52, 53). Sequencing showed that the smaller of these elements has four ORFs that account for 95% of the sequence: a putative immunity gene, a B-type DNA polymerase, and proteins similar to the alpha- and gamma-subunits of the Kluyveromyces lactis killer toxin, zymocin(54). In common with other linear extrachromosomal DNA elements, the plasmids have terminal inverted repeat sequences(55). Chitin-binding and hydrophobic domains enable a separate anticodon nuclease element to enter the target cell and cleave specific tRNAs 3' of the wobble nucleotide(56). Recently, Kast et al. have shown that ORF4 of B. inositovora causes specific fragmentation of the 25S and 18S rRNA in the target yeast(57).

Other Saccharomycotina

This clade includes the methylotrophs Ogataea polymorpha, Candida arabinofermentans, and Komagataella pastoris. As mentioned in the main text, methylotrophy is distinguished by the presence of methanol oxidase (EC 1.1.3.13), but some genes for other enzymatic activities are also present in relative abundance. These include formaldehyde transketolase (EC 2.2.1.3), which is completely absent in the CUG yeasts, along with higher levels of ubiquinone-linked NADH dehydrogenase (EC 1.6.5.3) and transaldolase (EC 2.2.1.2).

Komagataella phaffii

The yeast strains commonly called Pichia pastoris, which are frequently used for heterologous protein production(58-61), are now recognized to come from two different species(62) in the genus Komagataella. As with Ogataea polymorpha, these yeasts can produce large amounts of heterologous proteins. These are often expressed using the alcohol oxidase (AOX1) promoter following induction on methanol, although other promoter systems have also been developed(63). Most of the widely used biotechnology strains of "Pichia pastoris", including CBS 7435, its derivatives GS115 and X-33, and the host strains for the popular Invitrogen expression system, are the species Komagataella phaffii(62). Others, including strain DSM 70382, which is also used in biotechnology, are the species Komagataella pastoris(62). Complete genome sequences of K. phaffii CBS 7435(64) and GS115(65), and a draft genome sequence of K. pastoris DSM 70382(66), have previously been reported. The genomes of K. phaffii and K. pastoris are significantly different in sequence, approximately 10% divergent at the nucleotide level. Our phylogenomic analyses were done using the GS115 (K. phaffii) genome sequence(65).

Ogataea polymorpha

Ogataea polymorpha(67), which is still commonly known as Hansenula polymorpha, is one of two methylotrophic yeasts frequently used for heterologous protein expression, the other being Komagataella species ("Pichia pastoris")(58, 59). Most of the interest in these two yeasts stems from their abilities to produce large amounts of proteins from genes driven by alcohol oxidase (AOX1) promoters following induction on methanol. One advantage of O. polymorpha over K. phaffii is its higher level of thermotolerance(68). From a physiological perspective, most of the research on O. polymorpha has centered on its thermotolerance(69), its alcohol oxidases(70, 71) and the development of technologies for improved genetic manipulation(72, 73), although some studies have used O. polymorpha as a platform for metabolic pathway engineering(74). The group of strains previously called "Hansenula polymorpha" is now recognized to consist of two different species(67, 75): strains NCYC 495 and CBS 4732 are Ogataea polymorpha, while strain DL-1 is Ogataea parapolymorpha. We sequenced the genome of O. polymorpha strain NCYC 495 in this study, and the genome of O. parapolymorpha strain DL-1 was previously sequenced by Ravin et al.(76). The O. polymorpha and O. parapolymorpha genomes are approximately 10% divergent in sequence.

Candida arabinofermentans

Candida arabinofermentans(77) was isolated from insect frass, and related strains have been found in association with bark beetles(78). Like O. polymorpha and K. phaffii, it is methylotrophic(79). However, its ability to ferment L-arabinose to ethanol(80) makes it almost unique among the various classified yeast species(81). It had not been well studied before its genome was sequenced to identify genes related to arabinose utilization(82).

Pichia membranifaciens

Pichia membranifaciens produces organic acids and is a common food and wine spoilage yeast. It is the type species of the genus Pichia, which was previously a polyphyletic group based largely on ascospore morphology. P. membranifaciens is often found as a film on the surface of table wine, olives and cheeses(83-85), where it carries out oxidative metabolism. It will tolerate up to 11% ethanol by volume and has occasionally been reported as a contaminant in sugar cane juice, sulfite fermentation plants and natural tequila fermentations(86). It produces acetaldehyde, ethyl acetate, and iso-amyl acetate(87). Strains of P. membranifaciens have been reported with a strain-dependent tolerance of 2 to 3 M NaCl(88), which probably accounts for its occurrence in olive brines(84, 89, 90). P. membranifaciens produces killer toxins active against Brettanomyces bruxellensis and grey mould disease of grapevines(91, 92). Despite its prevalence and apparent osmotolerance, P. membranifaciens had not been well studied from genetic or molecular biological perspectives prior to its genome sequencing.

Dekkera bruxellensis

Dekkera bruxellensis (anamorph Brettanomyces)(93) is a hemiascomycetous yeast with a genome size of about 13 Mb. D. bruxellensis is best known for its wine spoilage activity(94, 95), in which it produces volatile ethylphenols(96-98). D. bruxellensis is particularly difficult to deal with because, like S. cerevisiae, it is a facultative anaerobe(99-101) and has a high tolerance for ethanol(101). D. bruxellensis is found in sourdough starters, where it contributes to the acidity of the bread(102). D. bruxellensis will grow at low pH. It uses a wide range of carbon sources under aerobic conditions, and at least some strains will produce ethanol from xylose(103) and cellobiose(104) under oxygen limitation or anaerobiosis. These features have led some researchers to suggest D. bruxellensis as a promising yeast for the fermentation of lignocellulosic hydrolysates(103, 105).

Pachysolen tannophilus

Pachysolen tannophilus is a rare and unusual yeast, having been isolated by Boidin and Adzet in 1957(79), and reportedly once again in 2010(106). It is best known as the first yeast shown to ferment xylose directly to ethanol(107, 108). James et al. developed a genetic system for P. tannophilus very early(109). Aside from xylose and galactose, P. tannophilus has the unusual ability to ferment glycerol to ethanol(110, 111). Early in its study, the presence(112) and the induction(113, 114) of xylose reductase (XOR, XYL1) and xylitol dehydrogenase (XID, XYL2) were shown to be instrumental in xylose assimilation. Multiple forms of XOR were demonstrated(115), and cultivation of P. tannophilus on nitrate was shown to enable aerobic growth on xylose while blocking fermentation(116). By selecting for rapid growth on xylitol and nitrate while preventing ethanol oxidation, mutants with significantly higher xylose fermentation properties can be obtained(38). Soon, however, Scheffersomyces shehatae (previously Candida shehatae), a yeast known to be associated with wood-boring insects(117), eclipsed the capacities of the best P. tannophilus isolates(118). Later, a broad screening of unconventional yeasts for xylose metabolism showed that Sch. stipitis, which is closely related to Sch. shehatae, had even greater xylose-fermenting capacity(119). The number of new publications on P. tannophilus has fallen in recent years, but it remains of interest mainly because of its unusual abilities to ferment xylose and glycerol while resisting compounds in tanning liquors and various waste streams. A draft genome of P. tannophilus was previously reported(120), in addition to the complete, annotated genome sequence reported here.

Kluyveromyces lactis

Kluyveromyces lactis (formerly Saccharomyces lactis) is a heterothallic species with a predominantly haplontic cycle (anamorph: Candida sphaerica). Many strains were originally isolated from milk-derived products, and its name is derived from its ability to assimilate lactose and convert it to lactic acid. The natural habitat of K. lactis, however, is diverse. K. lactis has become attractive to science and biotechnology owing to its distinct metabolic and physiological properties. One of the most defining features of K. lactis is its ability to ferment galactose with high efficiency(121), and its consequent use in producing ethanol from cheese whey(122, 123). K. lactis has been used extensively as a probiotic(124). Aside from its use in dairy applications, K. lactis is well recognized for carrying linear DNA plasmids encoding killer factors(125, 126). Like the closely related S. cerevisiae, K. lactis carries circular plasmids that enabled the early development of an autonomous replication-based genetic system(127-129). Dujon, Sherman and Fischer, et al. sequenced and published its genome in 2004(130), and that sequence was used in the present comparative analysis.

Eremothecium (Ashbya) gossypii

The filamentous fungus Eremothecium gossypii, previously known as Ashbya gossypii, was first described in 1929 as a cotton pathogen transmitted by sucking insects(131). In addition to cotton, it infects other agricultural crops such as citrus fruits. It is likely better known as an appealing model to study filamentous growth(132) because it is haploid with a small genome, has efficient gene targeting, propagates plasmids and grows on defined media(133). E. gossypii is used in industry for the production of riboflavin(134-136). The E. gossypii genome has been used to annotate the genome of S. cerevisiae, to which it is closely related. The genomic sequence used in the present study was previously published(133, 137).

Saccharomyces cerevisiae

Saccharomyces cerevisiae is so well known that, proverbially, it needs no introduction. It holds the distinction of being the first eukaryote with a fully sequenced genome(138). This fundamental work, which is employed in the present analysis, has been cited more than 1500 times. Probably one of the most defining features of S. cerevisiae is its capacity for anaerobic growth with only minimal nutritional supplements(139-141) and its simultaneous production of ethanol. S. cerevisiae lacks the native capacity for fermentation of cellobiose, xylose or arabinose, but all three of these features have been engineered(142-144). S. cerevisiae’s capacity for anaerobic growth on xylose has been demonstrated, but redox balancing and ATP production appear to be limiting when compared to anaerobic growth on glucose(145-150). S. cerevisiae has a very active glycolytic pathway for ethanol production, a high capacity for glucose uptake by facilitative transport and the capacity for anaerobic growth, but it lacks mitochondrial Complex I (NADH:ubiquinone oxidoreductase), which is responsible for the oxidation of NADH coupled to proton translocation, and consequently for ATP synthesis. S. cerevisiae’s extensive genetic system(151, 152), the creation of a nearly complete collection of deletion mutants(153, 154), extensive analysis of protein-protein interactions(155-157) and expression studies(158, 159) have made S. cerevisiae the best understood eukaryotic organism(160, 161).

Hanseniaspora valbyensis

Hanseniaspora valbyensis is an unusual yeast that is often found in traditional balsamic vinegar and cider fermentations. Balsamic vinegar is made by a natural fermentation of cooked grape juice, which is rich in pectins, has a high sugar content and a low pH. H. valbyensis can therefore propagate in a stressful environment(162). H. valbyensis is known for its significant pectinolytic activity(163), which could account for its prevalence in balsamic and natural cider fermentations(162, 164-166). It also possesses endo-glucanase activity(166). It does not produce high levels of ethanol but does form significant amounts of ethyl and phenethyl acetate(166). Its osmotolerance, acid tolerance and capacities for polysaccharide depolymerization make it of potential biotechnological interest.

Cyberlindnera jadinii

Cyberlindnera jadinii is the correct name for the well-known taxon Candida utilis, which is its asexual state and has been used since the early 1900s as a fodder yeast for livestock and as a dietary supplement for humans. The species grows well on pentoses and tolerates lignin by-products, which has made it attractive for fermentation of spent sulfite liquor from paper processing(167, 168) or wood hydrolysates(169). It shows significant lipase activity and has been used to add value to the waste oil processing industry(170, 171). C. utilis has also been reported to synthesize (R)-phenylacetylcarbinol, a pharmaceutical precursor for L-ephedrine production(172-176). C. jadinii has been isolated from a variety of substrates, but the natural habitat of this species is uncertain. Some strains are from clinical sources, and the species appears to be a low-grade opportunistic pathogen(177). From a physiological perspective, C. jadinii is Crabtree negative, which means that cell yields can be high under aerobic conditions(178-180).

Wickerhamomyces anomalus

Wickerhamomyces anomalus, previously known as Pichia anomala and Hansenula anomala, is frequently associated with spoilage or processing of food and grain products. Its capacity for growth on a wide range of carbon sources at low pH, under high osmotic pressure and with little or no oxygen enables it to propagate in a wide range of environments(181). It is a non-Saccharomyces wine yeast that contributes to wine aroma through the production of volatile compounds. In recent years it has been used as a biocontrol agent against other fungi due to its ability to produce mycocin killer toxins(182-184). It has been studied for its cyanide-resistant alternative oxidase activity(185, 186), and it possesses an active beta-glucosidase that plays a role in wine fermentations(187-190). It also produces phytase(191) and an exo-beta-(1,3)-glucanase(191). Schneider et al. published a draft genome of this yeast in 2012(192), but the genomic sequence, annotation and analysis reported here were performed at JGI.

Ascoidea rubescens

Lindeau collected Ascoidea rubescens from sap flowing from felled beeches in the royal forest at Wolbeck. Brefeld studied the isolate and gave a comprehensive description in 1891. Ascoidea species live in close association with bark beetles, in slime fluxes of trees or behind bark. All species appear primarily to be disseminated by insect vectors(193-195). The A. rubescens genome was sequenced principally for phylogenetic comparisons.

Yarrowia lipolytica

Yarrowia lipolytica is well recognized for its ability to produce lipids from glycerol or other carbon sources(196, 197). Its lipogenic activity, along with roles in other processes, has made it very important from a biotechnological perspective(85, 198-200). Genetic transformation, expression and secretion systems are well developed for Y. lipolytica(201-207). These tools have been used to engineer Y. lipolytica for the production of a wide range of products such as citric(208), succinic(209) and arachidonic(210) acids(211) and erythritol(212), along with higher value eicosapentaenoic (EPA) and docosahexaenoic (DHA) essential fatty acids(213-217). Y. lipolytica mutants have been used extensively in the study of peroxisome biogenesis and function(218, 219). The copy of the genome of Yarrowia lipolytica strain CLIB122 used in the present study was obtained from Génolevures [Release Version: 2012/02/09](130).

Nadsonia fulvescens

Nadsonia fulvescens var. elongata has no significant biotechnological applications. It is a relatively rare soil yeast(220, 221) that is characterized by a unique life cycle(222). N. fulvescens is one of the few yeasts that divide by bipolar budding, but it is not closely related to other species with similar budding characteristics(223, 224). Following conjugation between the parent and the first budding cell, the zygote moves into a second bud formed at the opposite end of the parent cell. A septum then arises between the parental cell and the second bud, and the new bud becomes an ascus. This species forms ubiquinone with a side chain of six isoprene units, whereas neighboring species and other yeast species form ubiquinone with nine isoprene units, thus providing an unusual contrast. We sequenced its genome mainly to clarify the phylogeny of basal ascomycete yeast species.

Tortispora caseinolytica

Tortispora caseinolytica (previously Candida caseinolytica)(225, 226) is notable for its apparent taxonomic diversity when its gene sequences from the large-subunit rRNA, the mitochondrial small-subunit rRNA, and cytochrome oxidase II are compared to those of other yeasts(79). It occurs in the rotting tissues of Opuntia cactus in the Sonoran Desert. It does not ferment any sugars and it assimilates only a limited number of carbon compounds, including 2- and 5-ketogluconic acids. It exhibits strong extracellular proteolytic activity on casein at pH 6.5, but gelatin is not hydrolyzed or is only weakly hydrolyzed by a few strains(225). When cultivated on a medium containing casein, it secretes an active, broad-range protease. Like those of N. fulvescens and A. rubescens, the genome of T. caseinolytica was sequenced mainly to clarify phylogeny and to discover new genes from a taxonomically unusual organism.

Lipomyces starkeyi

Lipomyces starkeyi has been studied extensively, and its capacity for lipid production has been recognized since the 1950s(227-229). In recent years it has been studied mainly for its capacity to produce and accumulate large amounts of lipid from a wide range of carbon sources(230), including starch(231, 232), xylose(230, 233-237) and even sewage sludge(231, 238). With optimal ratios of carbon to nitrogen, L. starkeyi will accumulate more than 60% lipid by weight(233), making it a prime candidate for biodiesel production from renewable biomass(230, 231, 236-238). A transformation system for L. starkeyi has recently been described(239). Its genome was sequenced in this study for its biotechnological and biofuel potential as well as to clarify its taxonomic status.

Taphrinomycotina

Pneumocystis jirovecii

Pneumocystis jirovecii is a fungus that causes severe pneumonia in immune-compromised patients. This genome lacks virulence factors and most amino acid biosynthesis enzymes and presents reduced GC content and size. Together with epidemiological observations and inability to grow in culture, these features suggest that P. jirovecii is an obligate parasite specialized in the colonization of human lungs. This genome sequence will boost research on this deadly pathogen, a major cause of mortality in patients with impaired immune systems. The genome of P. jirovecii was published previously(240) and is included here principally for phylogenetic comparisons(235).

Schizosaccharomyces pombe

Schizosaccharomyces pombe, or "fission yeast", is a model organism in molecular and cell biology. The sequence of the S. pombe genome was published in 2002 by a consortium led by the Sanger Institute, making S. pombe the sixth model eukaryotic organism whose genome was fully sequenced(241, 242). Approximately 120 natural strains of S. pombe have been isolated. The genetic system of S. pombe is highly developed, which has allowed the collection of almost 5,000 heterozygous diploid deletion mutants that have enabled annotation of function through comparisons with budding yeast(243).

Taphrina deformans

Taphrina deformans is a fungus responsible for peach leaf curl, an important plant disease. It is phylogenetically assigned to the Taphrinomycotina subphylum, which includes the fission yeasts and the mammalian pathogens of the genus Pneumocystis. Its role in peach leaf curl has been recognized since about 1926(244), and it has been studied principally as a plant pathogen. Its production of indole-3-acetic acid and cytokinin is responsible for the deformation of plant leaves following infection(245-247). The genome of T. deformans was published in 2013 and is used here for phylogenetic comparisons(248).

Saitoella complicata

Saitoella complicata was included in this study for comparative analysis of poorly sampled yeast lineages. This rare fungus occupies a unique position as one of a small group (Taphrinomycotina) of the earliest diverging Ascomycota(249-251). Its distinct position and saprobic life style (capable of living in an environment rich in organic matter that lacks oxygen), while belonging to a phylogenetic group containing plant and animal pathogens, make this species important for sequencing. A draft genome of S. complicata has been previously published(252), but the complete, annotated sequence produced by JGI has been used in the present analysis.

Note 2 – Genomes overview

The genomes of Saccharomycotina yeasts are, in general, compact (13.2 Mb on average) (Table S1, Fig. S1) with low repeat content (3% repetitive on average; Table S1). The smallest genome in Saccharomycotina is O. polymorpha at 9.0 Mb, and the largest is L. starkeyi at 21.2 Mb, followed by Y. lipolytica at 20.5 Mb. The same pattern holds true for the Taphrinomycotina yeast genomes (Table S1). The two ascomycete yeast subphyla differ in these respects from the filamentous fungi of Pezizomycotina, Basidiomycota, and early diverging fungi, which generally have larger, more repetitive genomes.

Predicted proteomes of the ascomycete yeasts are, like their genomes, compact at 3.5k to 8.2k proteins (Table S2, Fig. S1), which again contrasts with the filamentous fungi, which tend to have more predicted proteins. Using MCL clustering(253), we inferred protein family relationships among some 342 fungi, including the ascomycete yeasts, and found a core proteome of 2.5k to 4.8k proteins that is conserved throughout the fungi (Fig. S1). A substantial fraction of yeast proteomes consists of proteins specific to Ascomycota (absent from Basidiomycota and early diverging fungi), and generally a small fraction consists of organism-specific proteins (Fig. S1). We noticed that the predicted coding genes of the yeasts tend to be intron-poor (Table S2).
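The breakdown of each proteome into core, Ascomycota-specific and organism-specific fractions described above can be illustrated with a minimal Python sketch. It assumes the clustering step has already produced protein families, and it treats a family as "core" when it has members in every taxonomic group in the comparison; that rule, the function names, and the four toy organisms are hypothetical and are not taken from the analysis behind Fig. S1.

```python
from collections import defaultdict

def classify_families(clusters, group_of):
    """Label each protein family by its taxonomic spread.

    clusters: dict family_id -> list of (organism, protein_id) members
    group_of: dict organism -> taxonomic group label (assumed labels)
    """
    all_groups = set(group_of.values())
    labels = {}
    for fam, members in clusters.items():
        organisms = {org for org, _ in members}
        groups = {group_of[org] for org in organisms}
        if len(organisms) == 1:
            labels[fam] = "organism-specific"
        elif groups == {"Ascomycota"}:
            labels[fam] = "Ascomycota-specific"
        elif groups == all_groups:
            # Assumed definition of "core": present in every group.
            labels[fam] = "core"
        else:
            labels[fam] = "other"
    return labels

def proteome_breakdown(clusters, labels):
    """Count proteins per organism in each family category."""
    counts = defaultdict(lambda: defaultdict(int))
    for fam, members in clusters.items():
        for org, _ in members:
            counts[org][labels[fam]] += 1
    return {org: dict(c) for org, c in counts.items()}

# Hypothetical toy input: four organisms, four families.
group_of = {
    "yeast_A": "Ascomycota",
    "yeast_B": "Ascomycota",
    "basidio_C": "Basidiomycota",
    "early_D": "early diverging fungi",
}
clusters = {
    "F1": [("yeast_A", "p1"), ("yeast_B", "p2"),
           ("basidio_C", "p3"), ("early_D", "p4")],
    "F2": [("yeast_A", "p5"), ("yeast_B", "p6")],
    "F3": [("yeast_A", "p7"), ("yeast_A", "p8")],
    "F4": [("yeast_B", "p9"), ("basidio_C", "p10")],
}

labels = classify_families(clusters, group_of)
breakdown = proteome_breakdown(clusters, labels)
print(labels)
print(breakdown["yeast_A"])
```

A stricter or looser definition of "core" (for example, presence in a minimum fraction of organisms) would change only the `groups == all_groups` test.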
To remove potential bias caused by differing genome completeness and gene-calling methods, we used a set of 303 single-copy orthologs from the 30 yeast genomes to quantify introns (Fig. S2). These genes have few introns in Saccharomycotina yeasts, consistent with previous work(254) showing that the ancestors of the fungi and the early-diverging fungal lineages were intron-rich and that introns were lost in later-derived species. Lipomyces starkeyi is an exception, with an average of three introns per gene. In contrast to Saccharomycotina, the yeasts of Taphrinomycotina tend to have introns, most notably Pneumocystis jirovecii, which has an average of six introns per gene in these 303 orthologs (Fig. S2).

We used the methods described by Carr et al.(255) to identify Ty family long terminal repeat (LTR) retrotransposons in the yeast genomes. As previously reported, S. cerevisiae has a large number of complete Ty elements, as well as signatures of partial insertions in the form of LTR-only elements. Although Ty elements are relatively rare in other yeasts, their distribution shows lineage-specific expansion, suggesting horizontal transfer as previously proposed. S. cerevisiae primarily contains Ty1 and Ty2 elements, which are rare in other yeasts, whereas Ty3 and Ty5 are rare in S. cerevisiae but found in several other lineages.

Since the ribosomal DNAs are among the most abundantly transcribed genes in the genome, their organization and stoichiometry are expected to be under strong evolutionary constraint. However, the rDNA loci have been reorganized several times during yeast evolution. In all species except Pneumocystis(256), tens to hundreds of copies of the 18S, 5.8S and 26S rDNA genes are arranged as a tandem array (or multiple arrays). In some species, such as S. cerevisiae, the 5S genes are also included within the repeating unit of this array, whereas in others, such as Y. lipolytica, the 5S genes instead occur at dozens of dispersed loci around the genome(257).

Surprisingly, we find evidence for at least five evolutionary transitions between these two modes of organization of the 5S genes (Fig. S4) rather than the single transition from dispersed to tandem organization that was previously postulated(257). We infer that the 5S genes were dispersed in the ancestor of Ascomycota, became incorporated into the main rDNA array around the time of the Ascoidea divergence, but then secondarily became dispersed again in several taxa including Komagataella (Fig. S4).

Yeast culturing, identification, DNA and total RNA extraction

Yeast strains were grown in 500 ml Erlenmeyer flasks containing potato dextrose broth (CM0962, Oxoid, Thermo Fisher Scientific) and, depending on the taxon, incubated at 25 °C for 2 to 7 days until log phase was reached. Cultures were spun down, and the cell pellets were quickly resuspended in approximately 10 ml of sterile water, snap frozen in pre-folded aluminum bags, and stored at -80 °C until processed. DNA and total RNA were extracted using Qiagen Genomic-tip 500/G and RNeasy Maxi Kit columns, respectively (Genomic-tip 500/G, 13362; RNeasy Maxi Kit, 75162), following the manufacturer's instructions. RNA integrity was assessed with the Agilent 2100 Bioanalyzer (Agilent Technologies) following the manufacturer's protocols. Species (strain) identity was confirmed prior to genome sequencing by PCR amplification and Sanger sequencing (3730XL, Thermo Fisher Scientific) of the universal fungal DNA barcode, the internal transcribed spacer (ITS) region, employing standard protocols(258).

Genome sequencing and assembly

Seven genomes (Ascoidea rubescens, Babjeviella inositovora, Candida arabinofermentans, Candida tanzawaensis, Cyberlindnera jadinii, Hyphopichia burtonii, and Nadsonia fulvescens var. elongata) were sequenced using the Illumina platform only, combining Illumina fragment libraries (270 bp insert size) with 4 kbp long mate-pair (LMP) libraries. The fragment library was produced from genomic DNA that was sheared using the Covaris E210 (Covaris) and size-selected using Agencourt Ampure Beads (Beckman Coulter). The DNA fragments were treated with end repair, A-tailing, and adapter ligation using the TruSeq DNA Sample Prep Kit (Illumina) and purified using Agencourt Ampure Beads (Beckman Coulter). For the LMP libraries, genomic DNA was sheared to the desired insert size using the Hydroshear®. The ends of the fragments were ligated with biotinylated adapters containing loxP. The adapter-ligated DNA fragments were circularized via recombination by a Cre excision reaction. The circularized DNA was digested using 4-base-cutter restriction enzymes, followed by self-ligation and inverse PCR. Both types of libraries were sequenced on Illumina HiSeq machines. Illumina data were QC filtered for artifact/process contamination and subsequently assembled with AllPathsLG (259). The genome of Cyberlindnera jadinii was further improved by closing gaps with Pacific Biosciences reads using PBJelly (260).

The remaining nine genomes were sequenced using a hybrid approach, which combined 454 (Roche) standard and paired-end (PE) pyrosequencing with optional Illumina fragments and/or Sanger fosmid sequences. For 454 libraries, genomic DNA was fragmented by nebulization. The ends of the fragments were treated with end repair and ligated with adapters using the Titanium or 454 Rapid Library Preparation Kits (454). Adapter-ligated products were immobilized on beads and treated with nick repair. Single strands were eluted to generate the final library. For 454 PE libraries, genomic DNA was sheared to the desired insert size using the Hydroshear®. The ends of the fragments were ligated with biotinylated adapters containing loxP. The adapter-ligated DNA fragments were circularized via recombination by a Cre excision reaction. The circularized DNA was randomly sheared using the Covaris E210, ligated with adapters, and amplified via PCR. Limited amounts of Illumina fragment data were produced as described above for all nine hybrid assemblies except for Candida caseinolytica and Ogataea (Hansenula) polymorpha and assembled with Velvet (261). In addition, fosmid libraries were prepared for O. polymorpha, Pachysolen tannophilus, and Pichia membranifaciens. For these, genomic DNA was sheared using the Hydroshear® and size selected for 40 kb using a pulsed-field gel. The long inserts were cloned into the pCC1FOS vector. The draft assemblies were produced from Roche (454) standard and PE data, with optional Sanger fosmid reads and/or shredded consensus from Velvet-assembled Illumina data, followed by gapResolution automated gap closure (262).

Transcriptome sequencing and assembly

The transcriptomes of Ogataea polymorpha, Nadsonia fulvescens, and Wickerhamomyces anomalus were sequenced using 454 (Roche) technology. mRNA was purified from total RNA using oligo dT beads, chemically fragmented, and reverse transcribed using random hexamers and SSII. The ends of the fragments were treated with end repair and ligated with adaptors using the 454 Rapid Library Preparation Kit (454). The libraries were sequenced using 454 (Roche) and assembled using Newbler (http://www.454.com/).

For the remaining 13 transcriptomes, an Illumina protocol was used. mRNA was purified from total RNA using oligo dT beads, chemically fragmented, and reverse transcribed using random hexamers and SSII. The second strand was synthesized using a dNTP/dUTP mix. The ends of the fragments were treated with end repair, A-tailing, and ligation of adapters. The second strand was removed using AmpErase UNG, followed by PCR to enrich for the final library. The Illumina libraries were sequenced using HiSeq and assembled using Rnnotator (263).

Organism                                    MycoCosm portal id  Scaffolds  Assembly length  Scaffold N50  Scaffold L50  Repeat covered regions  Scaffold gaps  GC %   Scaffolds >=2kbp
Ascoidea rubescens NRRL Y-17699             Ascru1              63         17,503,998       6             1,035,938     17,300                  38             32.85  47
Babjeviella inositovora NRRL Y-12698        Babin1              49         15,221,050       6             933,105       1,490                   161            48.13  38
Candida arabinofermentans NRRL YB-2248      Canar1              62         13,234,078       6             701,640       6,984                   91             34.74  52
Candida caseinolytica NRRL Y-17796          Canca1              6          9,180,161        2             3,113,199     485                     43             45.42  6
Candida tanzawaensis NRRL Y-17324           Canta1              16         13,136,204       4             1,512,535     1,137                   36             45.34  13
Cyberlindnera jadinii NRRL Y-1542           Cybja1              76         13,018,415       6             700,888       1,620                   316            44.50  56
Hanseniaspora valbyensis NRRL Y-1626        Hanva1_1            647        11,464,179       11            332,595       20,073                  516            26.71  646
Hansenula polymorpha NCYC 495 leu1.1        Hanpo2              7          8,974,850        4             1,302,532     742                     1              47.86  7
Hyphopichia burtonii NRRL Y-1933            Hypbu1              27         12,403,110       3             1,853,527     5,071                   78             34.99  16
Lipomyces starkeyi NRRL Y-11557             Lipst1_1            117        21,267,709       7             1,341,299     2,503                   322            46.92  117
Metschnikowia bicuspidata NRRL YB-4993      Metbi1              48         16,055,203       3             2,544,904     3,529                   373            47.85  48
Nadsonia fulvescens var. elongata DSM 6958  Nadfu1              20         13,746,944       4             1,627,684     4,388                   44             39.25  16
Pachysolen tannophilus NRRL Y-2460          Pacta1_2            198        12,597,116       3             1,751,653     20,466                  385            29.93  28
Pichia membranifaciens NRRL Y-2026          Picme2              11         11,584,685       3             1,676,949     2,867                   134            45.12  11
Saitoella complicata NRRL Y-17804           Saico1              35         14,136,186       11            436,079       1,788                   39             52.60  35
Wickerhamomyces anomalus NRRL Y-366-8       Wican1              46         14,145,566       3             2,188,197     2,641                   161            34.55  46

Table S1. Genome statistics The MycoCosm portal id can be used to access the genome portals of the respective organisms on MycoCosm (http://genome.jgi.doe.gov/fungi) using the format http://genome.jgi.doe.gov/. A group page for the organisms used in this study can be found at http://genome.jgi.doe.gov/biotech_yeasts/.
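The values in Table S1 are consistent with the convention in which "Scaffold N50" is the smallest number of scaffolds whose combined length reaches at least half of the assembly, and "Scaffold L50" is the length of the shortest scaffold in that set. That convention is an assumption here, since the table does not state it. Under it, the two columns could be computed roughly as follows; the function name and the toy lengths are illustrative only.

```python
def scaffold_n50_l50(lengths):
    """Return (n50_count, l50_length) for a list of scaffold lengths.

    Assumed convention (inferred from the values in Table S1):
    n50_count  = fewest scaffolds covering >= 50% of the assembly,
    l50_length = length of the shortest scaffold among them.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return count, length
    raise ValueError("no scaffold lengths given")


# Hypothetical toy assembly, not data from the study.
toy_lengths = [50, 40, 30, 20, 10]
print(scaffold_n50_l50(toy_lengths))
```

In the toy case the two longest scaffolds (50 + 40) already exceed half of the total length of 150, so the count is 2 and the length is 40.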

Organism                                    Number of genes  Median intron length  Avg. number of introns  Number with pfam  Proteases  Transporters  Secretome  sscp
Ascoidea rubescens NRRL Y-17699             6,802            109                   0.38                    4,782             167        667           269        41
Babjeviella inositovora NRRL Y-12698        6,403            73                    0.26                    4,833             187        710           260        36
Candida arabinofermentans NRRL YB-2248      5,861            95                    0.33                    4,495             152        617           221        20
Candida caseinolytica NRRL Y-17796          4,657            73                    0.20                    3,875             160        548           172        21
Candida tanzawaensis NRRL Y-17324           5,895            82                    0.24                    4,577             163        618           233        22
Cyberlindnera jadinii NRRL Y-1542           6,038            67                    0.34                    4,835             179        739           296        38
Hanseniaspora valbyensis NRRL Y-1626        4,800            94                    0.19                    3,417             119        493           149        10
Hansenula polymorpha NCYC 495 leu1.1        5,177            46                    0.20                    4,382             163        664           187        7
Hyphopichia burtonii NRRL Y-1933            6,002            82                    0.22                    4,769             169        694           269        20
Lipomyces starkeyi NRRL Y-11557             8,192            56                    1.84                    5,777             212        922           292        53
Metschnikowia bicuspidata NRRL YB-4993      5,851            93                    0.26                    4,396             159        579           261        60
Nadsonia fulvescens var. elongata DSM 6958  5,657            90                    0.57                    4,696             174        600           169        13
Pachysolen tannophilus NRRL Y-2460          5,675            97                    0.32                    4,616             165        643           189        13
Pichia membranifaciens NRRL Y-2026          5,546            114                   0.22                    4,309             147        618           216        28
Saitoella complicata NRRL Y-17804           7,034            56                    1.23                    5,404             225        710           290        36
Wickerhamomyces anomalus NRRL Y-366-8       6,423            58                    0.42                    5,160             198        843           288        22

Table S2. Annotation statistics. Pfam annotation of predicted genes was performed using HMMER (v3.0) against the Pfam database (v27)(264) and AntiFam (v3.0)(265). Proteases and transporters were identified using BLAST against the MEROPS database (v9.2)(266) and the Transporter Classification Database (v. July 15 2011)(267), respectively. The secretome was predicted using a combination of SignalP (v4.1), TargetP (v1.1), TMHMM (v2.0) and WoLF-PSORT (v0.2)(268), using methods similar to those described by Emanuelsson et al.(269). Small secreted cysteine-rich proteins (sscp) were identified as secreted proteins that were shorter than 300 amino acids, lacked Pfam annotations, and contained 2% or more cysteine residues.
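The sscp criteria in the caption of Table S2 amount to a simple filter over the predicted secretome. A minimal sketch is given below, assuming that secretion status and Pfam status have already been determined by the tools listed above and are passed in as booleans; the function and parameter names are hypothetical.

```python
def is_sscp(sequence, is_secreted, has_pfam_annotation,
            max_length=300, min_cys_fraction=0.02):
    """Apply the three sscp criteria to one predicted protein.

    sequence: amino-acid sequence as a string.
    is_secreted / has_pfam_annotation: results of upstream predictions
    (assumed to be available; they are not computed here).
    """
    if not is_secreted or has_pfam_annotation:
        return False
    # "less than 300 amino acids" is read as a strict inequality.
    if len(sequence) == 0 or len(sequence) >= max_length:
        return False
    cys_fraction = sequence.upper().count("C") / len(sequence)
    return cys_fraction >= min_cys_fraction


# Hypothetical 100-residue protein with 3 cysteines (3%).
toy_protein = "M" + "C" * 3 + "A" * 96
print(is_sscp(toy_protein, is_secreted=True, has_pfam_annotation=False))
```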

Figure S1. Genome size, repeat content, and protein conservation measures for the yeasts. Yeasts newly sequenced for this study are shown in bold.

[Figure S2 plot: average introns per gene (y-axis, 0 to 7) versus percent intronless genes (x-axis, 0 to 100), with one labeled point per yeast species.]

Figure S2. Intron occurrence in 303 predicted orthologs in the yeasts.
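The two quantities plotted in Figure S2 can be derived from per-gene intron counts over the 303 orthologs of each species. The sketch below assumes those counts are available as a plain list of integers; the function name and the toy input are illustrative and are not taken from the study.

```python
def intron_summary(intron_counts):
    """Return (average introns per gene, percent intronless genes)."""
    n_genes = len(intron_counts)
    if n_genes == 0:
        raise ValueError("no genes given")
    average = sum(intron_counts) / n_genes
    intronless = sum(1 for count in intron_counts if count == 0)
    return average, 100.0 * intronless / n_genes


# Hypothetical counts for four orthologs in one species.
print(intron_summary([0, 0, 1, 3]))
```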

Figure S3. Distribution of TY-LTR elements in the yeasts.

Figure S4. Organization of rDNA genes in yeast species. 5S rDNA genes are highlighted in red. For each species, BLAST searches were used to infer the organization of the tandem rDNA array and to search for dispersed 5S rDNA genes. Square brackets and the subscript n indicate that the structure of the complete repeating unit of the rDNA array was determined for that species. In many cases the 18S-5.8S-26S unit structure was not present in genomic scaffolds but was inferred from EST cluster data. ND, not determined. In Pneumocystis there is only a single copy of the 18S-5.8S-26S cluster, and we identified two 5S genes at different locations.

Note 3 – Organism phylogeny

The 38 genome sequences were phylogenetically investigated using the DSMZ phylogenomics pipeline as previously described (270-277), using BLAST (278), OrthoMCL (279), MUSCLE (280), RASCAL (281), GBLOCKS (282), and MARE (283). Clusters of orthologs were generated using OrthoMCL, inparalogs were removed, and the remaining sequences were aligned with MUSCLE and filtered with RASCAL and GBLOCKS. Three distinct supermatrices were compiled: (i) a ‘full’ supermatrix based on all filtered alignments comprising at least four sequences; (ii) a ‘MARE’ matrix obtained by removing relatively uninformative and low-coverage genes from the ‘full’ matrix using MARE under default values, except that deleting organisms was disallowed; and (iii) a ‘core’ matrix comprising only those genes present in all organisms. The ‘full’ supermatrix used 7,297 genes (2,094,063 characters), the ‘MARE’ matrix used 1,559 genes (364,126 characters), and the ‘core’ matrix used 418 genes (166,611 characters).

Maximum-likelihood and maximum-parsimony trees were inferred from the concatenated alignments with RAxML (284) and PAUP* (285), respectively, as previously described (270-277). The selected model for all three matrices was PROTCATLGF (the LG model of amino acid evolution (286) in conjunction with the CAT approximation for modeling rate heterogeneity (284) and empirical amino-acid frequencies). The resulting trees had log likelihoods of (i) -48,541,040.42, (ii) -11,399,822.76, and (iii) -5,676,545.75, respectively. The best maximum-parsimony (MP) trees found had lengths of (i) 9,198,654, (ii) 2,156,869, and (iii) 1,093,186 steps (not counting uninformative characters).
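The selection rules for the ‘full’ and ‘core’ matrices described above reduce to two occupancy thresholds. The following sketch illustrates only those two rules; it does not reproduce the MARE filtering or any part of the actual pipeline. The input format (a mapping from gene name to the set of organisms with a filtered alignment row) and all names are assumptions made for illustration.

```python
def select_genes(presence, n_organisms, min_sequences=4):
    """Split genes into the 'full' and 'core' sets by taxon occupancy.

    presence: dict mapping gene name -> set of organisms represented
    in that gene's filtered alignment (hypothetical input format).
    """
    full = sorted(gene for gene, orgs in presence.items()
                  if len(orgs) >= min_sequences)
    core = sorted(gene for gene, orgs in presence.items()
                  if len(orgs) == n_organisms)
    return full, core


# Hypothetical occupancy for three genes across five organisms.
toy_presence = {
    "gene1": {"A", "B", "C", "D", "E"},
    "gene2": {"A", "B", "C", "D"},
    "gene3": {"A", "B"},
}
print(select_genes(toy_presence, n_organisms=5))
```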
To assess potential artifacts caused by model oversimplification and standard bootstrapping, the full matrix was also analyzed with ExaML version 3.0.7 (287) and one model per gene (estimated during the search for the best tree) in conjunction with the partition bootstrap (288); TNT (289) was used for inferring MP trees and MP partition bootstrap values. Moreover, the entire proteomes were subjected to a Genome BLAST Distance Phylogeny (GBDP) analysis (290) using the greedy-with-trimming approach in conjunction with formula d5, an e-value filter of 10^-8, and pseudo-bootstrapping (291).

We performed a partition bootstrap analysis of the full matrix with ExaML(287) and TNT(289). In contrast to the trees shown in Figure 1 in the main text, this computationally expensive approach used (for ML) one model per gene and, to obtain branch support, bootstrapped entire genes rather than individual alignment positions(288). The results shown in Figure S5 strongly support the findings depicted in Figure 1.

The entire proteomes were also subjected to a Genome BLAST Distance Phylogeny (GBDP) analysis(292, 293). This method is of interest because the complete genomic information enters tree reconstruction, with corrections for paralogs on a pairwise basis instead of a global determination of orthology(294, 295), which considerably decreases overall computation time. The tree shown in Figure S6 confirms most major groupings and yields no significantly supported conflicts compared to the previous phylogenetic analyses. Support is slightly lower overall, however; for instance, Taphrinomycotina receives only weak support, and the relative positioning of Lipomyces and Tortispora is unresolved.
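The distinction between a standard bootstrap and the partition bootstrap mentioned above is that whole genes, rather than alignment columns, are resampled with replacement. A minimal sketch of one replicate is shown below. It is not the ExaML or TNT implementation; the input format, the gap padding for taxa missing from a gene, and all names are assumptions made for illustration.

```python
import random


def partition_bootstrap(gene_alignments, rng):
    """Build one partition-bootstrap replicate.

    gene_alignments: dict gene -> dict taxon -> aligned sequence
    (all sequences within a gene have equal length).
    Genes are drawn with replacement and concatenated per taxon;
    taxa absent from a drawn gene are padded with gaps (assumption).
    """
    genes = list(gene_alignments)
    sampled = [rng.choice(genes) for _ in genes]
    taxa = sorted({t for g in genes for t in gene_alignments[g]})
    replicate = {taxon: "" for taxon in taxa}
    for gene in sampled:
        alignment = gene_alignments[gene]
        width = len(next(iter(alignment.values())))
        for taxon in taxa:
            replicate[taxon] += alignment.get(taxon, "-" * width)
    return sampled, replicate


# Hypothetical two-gene data set.
toy = {
    "geneA": {"sp1": "MKT", "sp2": "MRT"},
    "geneB": {"sp1": "LL", "sp2": "LI", "sp3": "VV"},
}
sampled_genes, supermatrix = partition_bootstrap(toy, random.Random(1))
print(sampled_genes, supermatrix)
```

Repeating this for many replicates and inferring a tree from each would give gene-level support values, which is the idea behind the numbers reported in Figure S5.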

Figure S5. Phylogenetic tree inferred with ExaML with one individually estimated model per gene and rooted with Batrachochytrium dendrobatidis. Numbers on branches are ML (left) and MP (right) partition bootstrap support values. The branches are scaled in terms of the expected number of substitutions per site.

Figure S6. Phylogenetic tree inferred with FastME from interproteomic distances calculated with GBDP and rooted with Batrachochytrium dendrobatidis. Numbers on branches are GBDP pseudo-bootstrap support values. Long terminal branches are due to the distinct scaling used by GBDP in combination with formula d5.

Note 4 – Alternative genetic codes: CUG coding for Ser and Ala

Yeasts of the CUG-Ser clade use an altered genetic code in which CUG codons are translated as Ser rather than the canonical Leu (296-301). This change is due to alterations in the tRNA that decodes CUG (tRNACAG), which is recognized by seryl-tRNA synthetase and, to a lesser degree, leucyl-tRNA synthetase, resulting in a mixture of roughly 97% Ser and 3% Leu residues(302, 303) at CUG-encoded sites in proteins. To investigate the origins of this genetic code change, we inspected predicted tRNACAGs for the presence of three sequence features indicative of Ser translation(304): a G33 residue 5’ to the anticodon, which may lower rates of leucylation(305), a Ser identity element in the variable loop, and a G discriminator base (Fig. S7A).

The assembled genomes of selected yeasts were submitted to the Bagheera web server(306) (Table S3). For an unambiguously “standard table” yeast like Lipomyces starkeyi, Bagheera predicts a 7:146 ratio of alternative to standard codon usage based on sequence similarity, suggesting that L. starkeyi uses the standard table. Similarly, for a CUG-Ser yeast like Candida tanzawaensis, the ratio is 32:15, suggesting alternative codon usage. However, for P. tannophilus the ratio is 14:12, only slightly in favor of alternative codon usage. This, combined with our observation of tRNA differences from CUG-Ser clade yeasts, suggests that translation of CUG codons in P. tannophilus may be neither standard (Leu) nor the same as in the CUG-Ser clade (Ser).

Two plasmids were designed to investigate usage of the alternate genetic code in P. tannophilus, both containing hygromycin resistance genes under the control of native glyceraldehyde-3-phosphate dehydrogenase promoters (1000 bp) and terminators (300 bp). One version used a standard bacterial aminoglycoside phosphotransferase (APH) gene, while the other contained the same APH gene codon-optimized for expression in CUG yeasts. In particular, in the codon-optimized APH, 9 CUG codons were replaced with alternate leucine codons to preserve the integrity of the amino acid sequence in CUG yeasts. When the constructs were introduced into P. tannophilus, thousands of transformants were obtained with the codon-optimized version (Fig. S7B), while none were obtained under identical conditions with the version containing CUG codons (Fig. S7C).

Oligonucleotides were obtained from Integrated DNA Technologies (IDT, Coralville, IA). Plasmids were constructed with the Gibson method for in vitro enzymatic DNA assembly (307), using PCR-amplified products with 30 overlapping regions. Plasmids were sequenced at Functional Biosciences Inc. (Madison, WI) to confirm that their assembly was successful. DNA was prepared for integrative transformations by digesting plasmids with BamHI and SfoI, followed by DNA purification. Transformations were carried out using a method similar to the standard lithium acetate transformation protocol (308). A dilute culture was grown in 50 mL of YPD overnight, and cells were harvested after reaching an OD600 near 0.2. After addition of the transformation mixture, cells were subjected to a 30-minute incubation step at 30 °C, a 20-minute heat shock step at 40 °C, and a 2-hour recovery step in 3 mL of YPD, and were then plated onto YPD media containing 125 µg/mL of hygromycin. Transformants were visible after about three days of outgrowth at 30 °C.

That P. tannophilus could be transformed to hygromycin resistance only in the absence of CUG codons highlights the importance of considering alternate codon systems when attempting to engineer non-conventional yeast species.
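The recoding step described above, in which CUG codons are swapped for other leucine codons, can be illustrated with a short sketch. The source does not say which leucine codons were used for the 9 replacements, so the default replacement (TTG) is an arbitrary assumption, and the function name and toy sequence are hypothetical. The coding sequence is handled as DNA, where CUG appears as CTG.

```python
# The five leucine codons other than CTG (CUG) in the standard code.
LEUCINE_ALTERNATIVES = {"TTA", "TTG", "CTT", "CTC", "CTA"}


def replace_cug_codons(cds, replacement="TTG"):
    """Replace in-frame CTG codons in a coding sequence.

    Returns (recoded sequence, number of codons replaced).
    Only codons in the reading frame starting at position 0 are
    changed; out-of-frame CTG triplets are left untouched.
    """
    if replacement not in LEUCINE_ALTERNATIVES:
        raise ValueError("replacement must be a non-CTG leucine codon")
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_replaced = codons.count("CTG")
    recoded = "".join(replacement if c == "CTG" else c for c in codons)
    return recoded, n_replaced


# Hypothetical toy coding sequence: ATG CTG GCT CTG TAA.
print(replace_cug_codons("ATGCTGGCTCTGTAA"))
```

In practice the replacement codon would usually be chosen with the host's codon usage in mind rather than fixed to a single codon.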

Figure S7. Translation of CUG codons to Ser or Ala. (A) tRNACAG features associated with translation of CUG codons to Ser are limited to the CUG-Ser clade of yeasts. Basal to the CUG-Ser clade, M. bicuspidata and B. inositovora lack either the Ser identity element or the discriminator base, respectively, possibly indicating gradual acquisition of Ser tRNACAG sequence features. P. tannophilus, like other yeasts related to it, lacks the characteristic features of Ser tRNACAG. (B-C) P. tannophilus cells were transformed with a hygromycin resistance construct either codon-optimized to use alternate leucine codons (B) or containing CUG codons (C). Variable colony sizes can reflect differences in the locus, copy number, or stability of genomic insertions.

Species          Alternative  Standard  Indiscriminative
C. tanzawaensis  32           15        45
D. hansenii      26           17        44
P. tannophilus   14           12        54
A. rubescens     11           4         24
L. starkeyi      7            143       41

Table S3. Results of Bagheera predictions of codon usage for selected yeasts. Bagheera predictions were based on the following web server parameters: Reference species for gene prediction = Saccharomyces cerevisiae S288C; Alignment method = MAFFT; Alignment configuration = End gap free.

Dataset S1. Genetic code of P. tannophilus. Provided in a separate Excel file.

Note 5 – Correlation of genomically encoded enzymes to metabolic traits

Growth profiles for the yeasts (309) are summarized in the context of the phylogenetic tree in Fig. S8. For yeasts in which the gene complement conflicted with the reported growth data (e.g., genes present but (-) on a substrate), in which growth was reported as variable, or in which the strain sequenced was not the strain with reported growth data, we tested the growth of the sequenced strain on the substrate in question. Cultures were incubated in yeast nitrogen base (YNB; duplicate tubes) for one month at 25 °C (or 20 °C in the case of Ascoidea rubescens). Controls were carbon-free YNB and YNB with glucose. Results by substrate were:

D-xylose: Candida tanzawaensis NRRL Y-17324 was confirmed D-xylose (-) as originally reported (309), Metschnikowia bicuspidata NRRL YB-4993 (+), Ogataea polymorpha NCYC 495 (+), Pichia membranifaciens NRRL Y-2026 (-), Wickerhamomyces anomalus NRRL Y-366-8 (+)
galactose: Ogataea polymorpha NCYC 495 (-), Wickerhamomyces anomalus NRRL Y-366-8 (+), Ascoidea rubescens NRRL Y-17699 (-), Nadsonia fulvescens var. elongata DSM 6958 (+)
L-rhamnose: Ascoidea rubescens NRRL Y-17699 (-), in contrast to a previous report (309)
maltose: Ogataea polymorpha NCYC 495 (+), Dekkera bruxellensis CBS 2499 (+), Eremothecium gossypii ATCC 10895 (+)
raffinose: Dekkera bruxellensis CBS 2499 (-)
sucrose: Ogataea polymorpha NCYC 495 (+), Dekkera bruxellensis CBS 2499 (+)
melibiose: Debaryomyces hansenii CBS 767 (+)
lactose: Scheffersomyces stipitis CBS 6054 (+), Debaryomyces hansenii CBS 767 (+), Candida tenuis NRRL Y-1498 (+), Lipomyces starkeyi NRRL Y-11557 (+)
cellobiose: Ogataea polymorpha NCYC 495 (-), Dekkera bruxellensis CBS 2499 (+), Pachysolen tannophilus NRRL Y-2460 (+), Eremothecium gossypii ATCC 10895 (+)

Our broad phylogenetic study confirmed and elaborated on prior findings that loss of respiration Complex I (RC1) preceded the gain of bacterial dihydroorotate dehydrogenase (Ura1) in S. cerevisiae and closely related genera (Fig. S9). These two features, along with other evolutionary changes such as expansion of genes for facilitated uptake of sugars, enabled anaerobic growth in these yeasts. Unexpectedly, two other fermentative yeasts (Nadsonia fulvescens and Schizosaccharomyces pombe) showed diminished RC1 components but did not acquire Ura1 (Fig. S9).

The utilization of several disaccharides correlated broadly with the presence of genes encoding well-characterized enzymes, but several exceptions are noteworthy (Fig. S10). The genomes of many yeasts from the CUG-Ser clade lack recognizable SUC2 homologs, but each of these yeasts can still utilize sucrose. In each of these cases, the genome encodes multiple predicted alpha- or beta-glucosidases. An attractive hypothesis is that these enzymes have a broader specificity that allows them to bind to and hydrolyze glucose disaccharides more generally, a property known for some S. cerevisiae alpha-glucosidases(310). A similar relationship may exist among alpha-glucosidases for Sch. stipitis(311). In support of this hypothesis, within the CUG-Ser clade only H. burtonii and D. hansenii encode beta-fructosidase(s), and these yeasts can utilize raffinose (a trisaccharide in which a galactose monomer is linked to the glucose monomer of sucrose, a configuration likely to block access by alpha- or beta-glucosidases). Surprisingly, A. rubescens scored positive on many disaccharide growth assays, but its genome is predicted to encode few of the enzymes characterized in disaccharide catabolism (Fig. S10). If the positive growth scores are not artifacts, this yeast species may be a fruitful source of novel genes. Notably, the L. starkeyi genome contains multiple homologs encoding each enzyme family, which is consistent with its broad spectrum of disaccharide utilization.

Assigning genes to enzyme functions

Predicted protein sequences were assigned enzyme functions by three methods: PRIAM (312) profile searches, TBLASTN searches of the genome assemblies using query protein sequences from characterized pathways in model organisms, and BLASTP searches of the same query proteins against the full predicted protein catalog of each organism. In the case of discrepancies between the PRIAM, TBLASTN, and BLASTP results, we manually inspected the results to resolve ambiguities. For example, to search the yeast genomes more thoroughly for homologs of known GAL pathway genes, we used TBLASTN to search the six translated reading frames of the genomes for matches to the known Gal1, Gal7, and Gal10 proteins. Using an e-value threshold of 10^-5, we found the same complement of enzymes found by PRIAM, with one exception: TBLASTN detected one GAL10 gene in K. phaffii, whereas PRIAM detected none. The K. phaffii annotation (NCBI accession PRJNA39439) lacked any predicted protein sequence that we could assign to a Gal10 function (UDP-glucose 4-epimerase; EC 5.1.3.2), whereas the TBLASTN search identified a high-confidence hit from 1,583,243 to 1,584,238 on 4, where, presumably due to the inherent inaccuracies of automated annotation, a gene had not been predicted. We were thus able to assign a single putative GAL10 gene to K. phaffii (Fig. 3).

XYL genes were detected by mapping the characterized proteins (313) (Xyl1: Picst3|89614; Xyl2: Picst3|86924; Xyl3: Picst3|68734) to their PRIAM assignments (EC 1.1.1.175, EC 1.1.1.9, and EC 2.7.1.17, respectively), and in turn finding these PRIAM assignments in the proteins of the study set. However, this approach was apparently not sufficiently sensitive, because it left Xyl1 and/or Xyl2 undetected in several yeasts capable of growing on D-xylose (309). We therefore inspected the TBLASTN and BLASTP results and selected the TBLASTN results (e-value threshold 10^-5) for Fig. 3 because they produced the fewest disagreements between gene complement and published growth data.

Methylotrophy genes were detected similarly by mapping K. phaffii queries, AOX (ALOX1_PICP7/ALOX2_PICP7), CAT (CCA37834), DAS (CCA39320/ACN76560), FDH (CCA39210), FGH (CCA39282), FLD (CCA39112), and TKL (CCA37054), to EC 1.1.3.13, EC 1.11.1.6, EC 2.2.1.3, EC 1.2.1.2, EC 3.1.2.12, EC 1.1.1.284, and EC 2.2.1.1, respectively; Fig. 3 shows the PRIAM results. Rhamnose genes were detected with BLASTP using the queries Rha1 (Picst3|50944), Lra2 (Picst3|63908), Lra3 (Picst3|50672), and Lra4 (Picst3|64442); Fig. 3 shows the results using an e-value threshold of 10^-30.

Genes involved in disaccharide metabolism in Fig. S10 were identified using TBLASTN and an e-value threshold of 10^-5 with the following protein queries: alpha-glucosidase: S. cerevisiae CBS 2354 MAL32 (GenBank: CCB84898.1); beta-fructosidase: S. cerevisiae S288c SUC2 (NCBI Reference Sequence: NP_012104.1); alpha-galactosidase: S. pombe 972h- MEL1 (NCBI Reference Sequence: NP_595012.1); beta-galactosidase: S. stipitis CBS 6054 LAC4 (NCBI Reference Sequence: XP_001382569.2); beta-glucosidase: S. stipitis CBS 6054 BGL5 (NCBI Reference Sequence: XP_001387350.1).
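The e-value thresholds used in these searches can be applied as a post-filter on BLAST tabular output. The sketch below assumes the default 12-column tabular layout of BLAST+ (`-outfmt 6`), in which the e-value is the eleventh field; whether the original searches were run or filtered this way is not stated, and the sequence identifiers in the example are made up.

```python
def filter_hits(tabular_lines, max_evalue=1e-5):
    """Keep (query, subject, e-value) for hits at or below max_evalue.

    tabular_lines: iterable of tab-separated lines in the default
    12-column BLAST+ tabular layout (an assumption about the input).
    """
    kept = []
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        evalue = float(fields[10])  # 11th column holds the e-value
        if evalue <= max_evalue:
            kept.append((fields[0], fields[1], evalue))
    return kept


# Hypothetical hits; the identifiers are not from the study.
toy_lines = [
    "Gal10_query\tscaffold_4\t62.5\t330\t120\t3\t1\t330\t1000\t1990\t2e-120\t410",
    "Gal10_query\tscaffold_7\t30.1\t90\t60\t2\t5\t94\t200\t470\t0.003\t35.0",
]
print(filter_hits(toy_lines))
```

Passing `max_evalue=1e-30` would correspond to the stricter threshold quoted for the rhamnose genes.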

Figure S8. Fermentative, metabolic, osmotolerance, and temperature-dependent growth of yeasts. Throughout this text we refer to the fermentative and metabolic abilities of the various yeasts. These abilities are summarized here. Adapted from Kurtzman et al. (75).

Figure S9. Loss of Complex I genes and evolution of fermentative lifestyle. Homologs to the catalytic components of Respiration Complex I (RC1; blue; EC 1.6.5.3 NADH:ubiquinone oxidoreductase; EC 1.6.99.3 NADH dehydrogenase; EC 1.6.5.11 NADH dehydrogenase (quinone)) were identified using the PRIAM pipeline(314). Alternative oxidase (Picst3|67332) was detected using BLASTP(315) and an e-value of 1e-50. URA9 (red) was detected using the K. lactis URA9 sequence (Q6CTX8|PYRD2_KLULA from Swissprot(316)) as a query and an e-value of 1e-50. URA1 (green) was detected using the 3610|Sacce1 sequence as a query and an e-value of 1e-100. Fermentation capabilities (orange) are from Kurtzman et al.(75).

Figure S10. Correlation between disaccharide metabolism and genome content. Utilization (orange) was scored as follows: +, positive growth; v, variable among strains tested; s, slow growth; w, weak growth; -, no growth. The number of homologs encoding each enzyme (green) was detected using TBLASTN (e-value cutoff = 1.0e-5, no low complexity filtering) with the following query genes: S. cerevisiae CBS3254 MAL32, alpha-glucosidase; Sch. stipitis CBS 6054 BGL5, beta-glucosidase; S. cerevisiae S288c SUC2, beta-fructosidase; S. pombe 972h MEL1, alpha-galactosidase; K. lactis NRRL Y-1140 LAC4, beta-galactosidase.

Note 6 – Gene clusters

Genes for enzymes in the same metabolic pathway are frequently adjacent to one another in genomes where they are coordinately regulated(317). Examples include clustering of genes for cellulose(311), urea, allantoin, galactose(318), and maltose degradation(310, 319, 320). Based on these earlier observations, we hypothesized that genes having related metabolic activities would be likely to remain proximal to one another in the various genomes if the physiological function of their combined activities depended on the presence of both metabolic activities(321).

To test this hypothesis, we scanned the 38 genomes for genes encoding proteins with known enzymatic activities (i.e., EC numbers). We then scanned the scaffolds of each species and recorded the positions of genes for enzymatic activities that shared a common metabolite. Clusters were designated as two or more genes sharing one or more common metabolites, and separated by no more than six ORFs from one another on a scaffold, along with intervening genes. The combination of ECs found within a given cluster was designated as a cluster profile (CP), and many of these were found to recur in multiple genomes. The genes showing two or more metabolic and genomic associations were then plotted against our phylogenetic tree. We recorded 200 CPs that occurred in one or more species. We also performed a taxonomically independent analysis by counting the number of instances in which the gene for each enzyme occurred in any other pairwise cluster. Genes for 255 enzymes were found in CPs consisting of two or more related genes (Dataset S2).

The number of CPs identified depends on the number of intervening ORFs allowed between genes for metabolically associated activities. Allowing for more intervening ORFs expands the number of clusters but also incorporates more proximal genes that could code for proteins metabolically unrelated to those in the CP.
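The cluster definition above (genes sharing at least one metabolite and separated by no more than six ORFs) can be sketched as a sliding-window scan along one scaffold, with linked pairs merged into clusters. This is an illustrative reading of the definition, not the procedure actually used: the input format, the single-linkage merging, and all names are assumptions, and the intervening genes that the text includes in a cluster are not returned here.

```python
def find_clusters(orfs, max_intervening=6):
    """Group enzyme genes on one scaffold into clusters.

    orfs: list of (gene_id, ec_number_or_None, set_of_metabolites)
    in scaffold order. Two enzyme genes are linked when they share a
    metabolite and at most max_intervening ORFs lie between them.
    """
    parent = list(range(len(orfs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, (_, ec_i, mets_i) in enumerate(orfs):
        if ec_i is None:
            continue
        last = min(len(orfs), i + max_intervening + 2)
        for j in range(i + 1, last):
            _, ec_j, mets_j = orfs[j]
            if ec_j is not None and mets_i & mets_j:
                parent[find(j)] = find(i)

    groups = {}
    for i, (gene, ec, _) in enumerate(orfs):
        if ec is not None:
            groups.setdefault(find(i), []).append((gene, ec))
    return [members for members in groups.values() if len(members) >= 2]


def cluster_profile(cluster):
    """Cluster profile (CP): the sorted set of EC numbers in a cluster."""
    return tuple(sorted({ec for _, ec in cluster}))


# Hypothetical scaffold; gene identifiers are made up.
toy_scaffold = [
    ("g1", "2.7.1.6", {"galactose-1-phosphate"}),
    ("g2", None, set()),
    ("g3", "2.7.7.12", {"galactose-1-phosphate", "UDP-glucose"}),
]
for cluster in find_clusters(toy_scaffold):
    print(cluster_profile(cluster))
```

Raising `max_intervening` links more distant genes, which mirrors the trade-off described in the text between finding more clusters and pulling in unrelated neighbors.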
Genes often associated with activities for substrate assimilation include those for permeases, transporters, and regulatory proteins, most of which are difficult to annotate. Examining the genomic context of CPs in specific taxa can therefore facilitate identification and annotation of other activities related to that metabolic activity. The number of CPs scored also depends on the numbers of subunits, orthologs, or isoforms of each enzymatic activity present in the genome. If multiple isoforms or subunits of a single enzymatic activity are present in the cell, there are more opportunities for the encoding genes to be paired with other related genes. For example, three separate DNA-directed RNA polymerases occur in the cell, each of which has multiple subunits, thereby increasing the frequency with which components of this activity can be paired with genes for other related enzymes. Genes for a particular enzyme can occur in CPs with several other related genes, and some CPs occur across a wide taxonomic range. Pairwise CPs in a single organism are common and scattered throughout the species examined. Other CPs were observed in up to 14 species over wide taxonomic ranges. Some CPs consist of genes for three or four metabolically related enzymes. These sometimes occur in a single organism, but sets of CPs consisting of three or four activities were identified in multiple species as well. Metabolic traits that depend on the presence of a few specific enzymatic activities can show close proximal associations over wide taxonomic ranges. Galactose metabolism is one of the best examples. 
Galactokinase [EC 2.7.1.6] occurs in 16 pairwise associations with EC 2.7.7.12 (UDP-glucose--hexose-1-phosphate uridylyltransferase) and in 4 pairwise associations with EC 5.1.3.2 (UDP-glucose 4-epimerase). All three enzymes must be present in the genome, along with galactose permease, for a species to metabolize galactose. The capacity to metabolize galactose is widely distributed among yeasts and other fungi, but is particularly prevalent in the CUG-Ser yeasts (Dataset S3). Genes for the metabolism of N-acetylglucosamine [EC 2.7.1.1, EC 3.5.1.25, and EC 3.5.99.6] are likewise frequently clustered and frequently found among the CUG-Ser yeasts. As had been previously noted, genes coding for β-glucosidases [EC 3.2.1.21] are frequently found in association with cellulases (endo-glucanases; EC 3.2.1.4) and are occasionally found in association with EC 3.2.1.58 (glucan 1,3-β-glucosidases)(311). Surprisingly, genes encoding α-glucosidases (maltase; EC 3.2.1.20) were found more frequently in association with β-glucosidases than with glucan 1,4-α-glucosidases (glucoamylases; EC 3.2.1.3), but this might be attributable to the greater abundance of α- and β-glucosidases in the genomes of the organisms studied. Also surprisingly, genes encoding glycogen synthases [EC 2.4.1.11] and 1,3-β-glucan synthases [EC 2.4.1.34] were frequently found in association with one another, but this might be attributable to their common precursor (UDP-glucose). The presence of particular CPs can indicate how metabolic activities differ among taxonomic groups. For example, genes involved in biotin synthesis fell into two clusters that exhibited different phylogenetic and taxonomic distributions. The first, consisting of dethiobiotin synthase (EC 6.3.3.3) and adenosylmethionine---8-amino-7-oxononanoate transaminase (EC 2.6.1.62), was prevalent in the Saccharomycotina. The second, consisting of dethiobiotin synthase and biotin synthase (EC 2.8.1.6), prevailed among the other phylogenetic clades. These three genes function in sequence (EC 2.6.1.62, 6.3.3.3 and 2.8.1.6) in the biotin synthetic pathway, and in the Saccharomycotina, EC 2.6.1.62 is found in a tandem bidirectional association with EC 6.3.3.3. In the CUG-Ser yeasts, biotin synthase (EC 2.8.1.6) is often clustered with genes for 60S ribosomal protein L22 and ribonuclease III, but the significance of this (if any) is not known. 
Genes coding for nitrate reductases (NADPH) [EC 1.7.1.3] occur in 10 pairwise clusters with genes for nitrite reductases [NAD(P)H] [EC 1.7.1.4] across a wide taxonomic range, and in some instances these genes are adjacent or proximal to the gene for nitrate permease. Instances in which nitrate reductase or nitrite reductase occurs alone in a genome were not counted, but because nitrite is toxic, the presence of nitrate permease and reductase would likely require the presence of nitrite reductase in the genome as well. Some of the most conspicuous CPs were found with DNA-directed RNA polymerase [EC 2.7.7.6], which occurred in 23 pairwise associations in the 38 organisms (Dataset S3). Almost all of these pairings [CPs 18, 43, 68, 78, 107, 140, 142 and 193] occur in purine metabolism, where GTP and ATP are used to make RNA. In S. cerevisiae, the three forms of RNA polymerase (RNA polymerase I, II and III) are made up of a total of 31 subunits, so there are more opportunities for subunits of DNA-directed RNA polymerase to be proximal to genes for related enzymes. By comparison, DNA-directed DNA polymerase [EC 2.7.7.7], which has 3 genes associated with this function in S. cerevisiae, was found paired to related enzymes in only three instances. In Y. lipolytica, it was found proximal to genes for three other enzymes acting on its substrates: pyruvate kinase [EC 2.7.1.40], nucleoside-diphosphate kinase [EC 2.7.4.6] and UMP/CMP kinase [EC 2.7.4.14]. It was also associated with pyruvate kinase in Komagataella phaffii and Cyberlindnera jadinii (Dataset S3). RNA-directed RNA polymerase activity [EC 2.7.7.48], which has 12 subunits in S. cerevisiae, was proximally associated with a related enzymatic activity, nucleoside-diphosphate kinase [EC 2.7.4.6], only once, in the basidiomycete Ustilago maydis.
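The pairwise association counts reported in this note can be illustrated with a short Python sketch. It is a sketch under stated assumptions rather than the procedure used to build Dataset S3: the function name, the species labels, and the toy cluster profiles are hypothetical, and every pair of ECs within a cluster is counted even if the two enzymes are linked only through a third member.

```python
from collections import Counter
from itertools import combinations


def count_pairwise_associations(profiles_by_species):
    """Count, over all species, how often each pair of ECs shares a cluster.

    profiles_by_species: dict mapping species name -> list of cluster
    profiles, each profile being a set of EC numbers.
    """
    pair_counts = Counter()
    for profiles in profiles_by_species.values():
        for profile in profiles:
            # Sorting gives each pair one canonical orientation.
            for pair in combinations(sorted(profile), 2):
                pair_counts[pair] += 1
    return pair_counts


# Hypothetical input, not data from this study.
toy_profiles = {
    "species_A": [{"2.7.1.6", "2.7.7.12"}],
    "species_B": [{"2.7.1.6", "2.7.7.12", "5.1.3.2"}],
    "species_C": [{"1.7.1.3", "1.7.1.4"}],
}
pair_counts = count_pairwise_associations(toy_profiles)
for pair, n in pair_counts.most_common():
    print(pair, n)
```

Because each subunit or isoform gene contributes its own pairings, an activity encoded by many genes accumulates higher counts in a tally of this kind, which is the effect described above for the RNA polymerases.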

Dataset S2. Phylogenetic distribution of gene clusters
Dataset S3. Pairwise associations of ECs
(Datasets S2 and S3 are provided as separate Excel files.)


Note 7 – MAT locus organization and mating-type switching

Cell type in yeasts is specified by the genotype at the mating-type (MAT) locus. This locus commonly contains two genes (a1 and a2) in MATa haploid cells, and two different genes (α1 and α2) in MATα haploids(322, 323). The a2 and α1 genes code, respectively, for a-specific and α-specific transcription activators that are the master determinants of the two haploid cell types. The a1 and α2 proteins, which are simultaneously present only in diploid cells, are homeodomain proteins that form a heterodimer that represses transcription of haploid-specific functions such as mating. There are, however, many exceptions to this typical picture, with several losses of individual MAT genes having occurred in different species. For example, the a2 transcription activator was lost in an ancestor of S. cerevisiae.

Pan-ascomycete synteny at the MAT locus. Our data reveal, for the first time, synteny conservation of genes near the MAT locus across the ascomycete phylogenetic tree, extending the known relationship between Pezizomycotina and Saccharomycotina(324) to Taphrinomycotina. The MAT locus is linked to each of NVJ2, APC5, APN2, SLA2 and SUI1 in at least one species in all three subphyla (Fig. 4 – main text). This conservation of context shows that cell type has been controlled during more than a billion years(325) of ascomycete evolution by a single orthologous locus (MAT), despite gross changes of the gene content at this locus such as the loss of all HMG domain genes in S. cerevisiae. The stability of MAT contrasts with the frequent turnover of sex-determination loci in animals and plants(326-328).

Homothallism and heterothallism. The reproductive modes of yeast species can be characterized as heterothallic or homothallic, and the structures of their MAT loci correlate with these modes (Fig. 4 – main text). 
In heterothallic species, each haploid strain is either MATa or MATα, and mating is possible only between haploids of opposite MAT genotypes. Fifteen genomes in our dataset appear to be from heterothallic species (Fig. 4C – main text) because their genome assemblies contain either MATa genes or MATα genes, but not both. Only one of these, C. jadinii, is diploid (having MATa and MATα genes at allelic locations). Fourteen other species appear to be homothallic (Fig. 4D – main text), based on the presence of both MATa and MATα genes at non-allelic locations in their genome assemblies. The inferred homothallic or heterothallic structure of the genomes generally agrees with previous reports of the reproductive mode of each species(75), with four exceptions (D. bruxellensis, H. valbyensis, C. jadinii and N. fulvescens; Fig. 4B – main text). In homothallic species, any strain can mate with any other strain. At the cellular level, homothallism often occurs by mating-type switching, a DNA rearrangement process by which the genes at the MAT locus of a cell are replaced by genes of the opposite type, converting a MATa haploid cell to a MATα cell or vice versa. Such homothallism is called secondary homothallism because mating still occurs only between MATa and MATα cells, as in heterothallism. Homothallic species in the family Saccharomycetaceae, such as S. cerevisiae and K. lactis, are secondary homothallics that switch mating types by a well-characterized copy-and-paste mechanism wherein DNA at the MAT locus is replaced by DNA copied from one of two non-expressed ‘cassette’ loci called HMLα and HMRa(323, 329). Secondary homothallism was also recently described in two methylotrophic yeasts, O. polymorpha and K. phaffii(330, 331) (Fig. 4D – main text). These methylotrophs switch mating types by a flip/flop mechanism: a chromosomal region that has MATa genes at one end, and MATα genes at the other end, undergoes inversion to swap the MAT genes between an expression context and a repression context. 
Transcription of one set of MAT genes is repressed by a centromere (in O. polymorpha) or a telomere (in K. phaffii) located just outside the invertible region. Inversion of the region moves the MATa and MATα genes relative to the repressing element, activating the previously silent genes and silencing the previously active genes. Inversion occurs by recombination between two identical ~2 kb sequences that form an inverted repeat (IR) at the ends of the region.

Primary homothallism in yeasts is defined as the ability of any cell to mate with any other cell of the same species; in other words, these species do not have distinct ‘a’ and ‘α’ cell types. The molecular details of how cells of these species sense their , or avoid responding to their own pheromones, are unknown. Four yeasts in our dataset are possible primary homothallics: Scheffersomyces stipitis and Debaryomyces hansenii (CUG-Ser clade)(332), Lipomyces starkeyi (basal Saccharomycotina), and Pneumocystis murina (Taphrinomycotina)(333) (Fig. 4D – main text). Among these, only Sch. stipitis has been demonstrated experimentally to be haploid and homothallic, with Mendelian segregation of markers in meiosis.

Origin of mating type switching in Saccharomycotina. Our new sequence data suggested that Pachysolen tannophilus and Ascoidea rubescens are secondary homothallics that can switch mating types by a flip/flop inversion mechanism similar to that of O. polymorpha and K. phaffii (Fig. 4D – main text). In both of these species, a chromosomal region containing MATa and MATα genes is flanked by an inverted repeat (IR) formed by two copies of a 2 kb sequence (blue triangles in Fig. 4D – main text). We confirmed that in P. tannophilus, the orientation of the 8-kb MAT region located between the two IRs can be inverted by growing cells in nitrogen limitation media (Fig. S12A), similar to the mechanism of mating type switching in the other methylotrophic yeasts O. polymorpha and K. phaffii. In A. rubescens, one copy of the IR is located at a telomere, similar to the organization in K. phaffii. 
We examined three strains of A. rubescens by PCR of genomic DNA, and confirmed that both possible orientations of the MAT region exist in this species (Fig. S12B). Genomic DNA preparations made from different colonies of the same A. rubescens strain appear to vary in the relative abundance of the two orientations, suggesting frequent mating type switching in A. rubescens (Fig. S12). Because A. rubescens is an outgroup to the methylotroph and Saccharomycetaceae clades, the evolutionary origin of mating-type switching in Saccharomycotina can be traced to the point of divergence between this species and other Saccharomycotina (Fig. 4A – main text). The ancestral mechanism of switching in Saccharomycotina can be inferred to have been a flip/flop inversion system with two MAT-like loci (one expressed and one silent), which later evolved into a copy-and-paste system with three loci (one expressed and two silent) in the Saccharomycetaceae family(334). Surprisingly, we found that another yeast in the ‘methylotrophs’ clade, P. membranifaciens, has three MAT-like loci, which may indicate a parallel origin of a MAT/HML/HMR system in this species.

Loss of H3K9me2/3 heterochromatin. In Schizosaccharomyces pombe, transcription of the silent MAT loci is repressed by di- or tri-methylation of lysine 9 of histone H3 (H3K9me2/3), forming a type of heterochromatin that is almost universal in eukaryotes but has been lost in S. cerevisiae and most other Saccharomycotina(334). We found that Lipomyces starkeyi is the only sequenced Saccharomycotina species with orthologs of genes involved in H3K9me2/3 heterochromatin processes(335). It has orthologs of S. pombe Swi6 (LSTA_112810), a chromodomain/chromoshadow protein that binds to H3K9me2/3 nucleosomes and is central to heterochromatin-mediated gene silencing; of the H3K9 demethylase Epe1 (LSTA_69885); and a SET domain protein (LSTA_103455) that is a candidate ortholog of the H3K9 methyltransferase Clr4, although it lacks a chromodomain. L. starkeyi also has a family of retroelements coding for a protein fusion between a Tf1-like integrase and a Swi6-like chromodomain. Swi6-like chromodomains specifically bind H3K9me2/3, so the presence of these retroelements confirms that L. starkeyi has an H3K9 methyltransferase. Our results therefore suggest that mating-type switching, which cannot operate without a mechanism to silence the non-expressed MAT genes, evolved in the Saccharomycotina lineage relatively soon after the ancestral H3K9me2/3 form of heterochromatin was lost (Fig. 4A – main text). However, the mechanism of silencing by SIR proteins, as occurs in S. cerevisiae, did not appear until much later(334) (Fig. 4A – main text). Whether the first switching systems in Saccharomycotina silenced their extra MAT genes by proximity to centromeres or telomeres, or by some other mechanism, will be an interesting question for future studies. Independently, an analogous switching system using H3K9me2/3 silencing evolved in Schizosaccharomyces.
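The genome-based inference of reproductive mode used in this note (MATa or MATα genes alone, versus both at allelic or non-allelic locations) can be summarized as a simple decision rule. The Python sketch below is only an illustration of that rule; the function name, argument names, and returned labels are hypothetical, and the rule cannot by itself distinguish primary from secondary homothallism.

```python
def classify_reproductive_mode(has_mat_a, has_mat_alpha, allelic=False):
    """Infer reproductive mode from the MAT gene content of an assembly.

    has_mat_a / has_mat_alpha: whether MATa / MATalpha genes were found.
    allelic: True if both gene sets sit at allelic locations, as expected
             for a diploid assembly of a heterothallic species.
    """
    if has_mat_a and has_mat_alpha:
        if allelic:
            return "heterothallic (diploid)"
        return "homothallic"
    if has_mat_a or has_mat_alpha:
        return "heterothallic"
    return "undetermined"


# Hypothetical observations for three assemblies.
observations = {
    "assembly_1": dict(has_mat_a=True, has_mat_alpha=False),
    "assembly_2": dict(has_mat_a=True, has_mat_alpha=True),
    "assembly_3": dict(has_mat_a=True, has_mat_alpha=True, allelic=True),
}
modes = {name: classify_reproductive_mode(**obs) for name, obs in observations.items()}
print(modes)
```

A call of this kind reflects only what the assembly contains, so it can disagree with the reported biology of a species, as noted above for the four exceptions.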


Figure S11. Mating-type locus organization and synteny in Ascomycota. (A) Phylogenetic tree showing inferred points of origin of mating-type switching, loss of H3K9me-mediated heterochromatin, and gain of Sir1-mediated silencing and HO endonuclease. (B) Summary of literature on homo- or heterothallism of each species(309). Discrepancies with genome data are highlighted in bold. (C,D) Schematic gene organization around MAT in selected species. Species were categorized as either heterothallic (C) or homothallic (D) depending on whether MATa and MATα genes are present in the same haploid genome assembly. Red outlines indicate species known or predicted to be capable of mating-type switching (secondary homothallism). Gene names in bold (NVJ2, APC5, SLA2, SUI1 and APN2) show conserved synteny among Taphrinomycotina, Pezizomycotina and Saccharomycotina near the MAT locus. To maximize data, for Wickerhamomyces and Pneumocystis we used different congeneric species for the MAT locus and phylogenomic analyses.


Figure S12. MAT locus isomeric structures in Pachysolen tannophilus and Ascoidea rubescens. (A) Mating-type switching in P. tannophilus by inversion of MAT genes. The schematic on the left shows the organization of the two copies of a 2-kb inverted repeat sequence (IR-L and IR-R), which includes the 3’ end of the gene SLA2, relative to the MATa and MATα genes, in the type strain CBS4044. The orientation of the invertible region is assessed by PCR amplification of genomic DNA using primers A-D. Growth of CBS4044 in YPD media yields amplification products only from primer pairs AB and CD, indicating haploid cells with the chromosomal orientation as drawn. After growth of CBS4044 in nitrogen limitation media (NaKG: 0.5% sodium acetate, 1% , 1% glucose), PCR products are also seen from the AC and BD primer pairs, indicating recombination between the two copies of the IR in some cells in the culture. Strain ACY4044 (a) is isogenic to CBS4044 but has the opposite orientation of the invertible region. It was isolated by growing CBS4044 in NaKG for 24 h, followed by dilution and plating for single colonies on YPD agar and screening the MAT region orientation by colony PCR. The chromosome structure diagram is a composite of data from our assembly and from Liu et al.(336), which are both from the type strain of P. tannophilus. (B) Organization of the MAT region in A. rubescens. A 2-kb inverted repeat includes the 5’ end of the MATa1 gene. A telomere (TEL) is located immediately to the right of one copy of the IR (IR-R). The MATa and MATα genes are separated by approximately 45 kb of noncoding DNA. The 12 gel panels show PCR amplification, with 4 primer pairs as indicated, of genomic DNA isolated from three strains of A. rubescens including the one whose genome was sequenced (NRRL Y-17699). For each strain, we isolated genomic DNA separately from four colonies cut from YM agar (A. rubescens is hyphal and did not grow well in liquid culture). Most of the DNA preparations show a predominance of either the AB and CD products (corresponding to the orientation as drawn) or the AC and BD products (corresponding to recombination between the two IRs), but with a significant presence of the alternative orientation. 
These cultures could be either a mix of two haploid cell types (with a majority unswitched and a minority switched), or a mixture of diploid and unswitched haploid cells. The chromosome structure diagram represents a merge between scaffolds 22 and 25 of our genome assembly of the type strain of A. rubescens. This junction, and the presence of the telomere beside IR-R, were confirmed by PCR. The consensus telomeric repeat sequence of A. rubescens is 5’-(TGCTGGATGA)n.
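The logic used to read the primer-pair assays in this legend (AB and CD products for the orientation as drawn, AC and BD products for the inverted orientation) can be written as a small lookup. The Python sketch below is illustrative only; the function name and returned labels are hypothetical, and it treats product presence as a yes/no call, ignoring the band intensities that were used to judge which orientation predominates.

```python
def interpret_primer_products(products):
    """Infer MAT region orientation from the primer pairs that gave a product.

    products: iterable of primer-pair names ("AB", "CD", "AC", "BD")
              for which a PCR product was observed.
    """
    observed = set(products)
    as_drawn = {"AB", "CD"} <= observed   # both junctions of the drawn orientation
    inverted = {"AC", "BD"} <= observed   # both junctions after IR recombination
    if as_drawn and inverted:
        return "mixed orientations"
    if as_drawn:
        return "orientation as drawn"
    if inverted:
        return "inverted orientation"
    return "inconclusive"


print(interpret_primer_products(["AB", "CD"]))
print(interpret_primer_products(["AB", "CD", "AC", "BD"]))
```

A "mixed orientations" call corresponds to the cultures described above in which both sets of products were detected.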


References

1. Kurtzman CP & Suzuki M (2010) Phylogenetic analysis of ascomycete yeasts that form coenzyme Q-9 and the proposal of the new genera Babjeviella, Meyerozyma, Millerozyma, Priceomyces, and Scheffersomyces. Mycoscience 51(1):2-14.
2. Kurtzman CP (1990) Candida shehatae – Genetic diversity and phylogenetic relationships with other xylose-fermenting yeasts. Antonie van Leeuwenhoek 57(4):215-222.
3. Vaughan Martini AE (1984) Genome comparison between Pichia stipitis and some related imperfect species. Annali della Facolta di Agraria Universita degli Studi di Perugia 38:331-336.
4. Du Preez JC, Van Driessel B, & Prior BA (1989) D-Xylose fermentation by Candida shehatae and Pichia stipitis at low dissolved oxygen levels in fed batch cultures. Biotechnology Letters 11(2):131-136.
5. Du Preez JC, Van Driessel B, & Prior BA (1989) Ethanol tolerance of Pichia stipitis and Candida shehatae strains in fed-batch cultures at controlled low dissolved oxygen levels. Applied Microbiology and Biotechnology 30(1):53-58.
6. du Preez JC, Bosch M, & Prior BA (1986) Xylose fermentation by Candida shehatae and Pichia stipitis: effects of pH, temperature and substrate concentration. Enzyme and Microbial Technology 8(6):360-364.
7. van Dijken JP, van den Bosch E, Hermans JJ, de Miranda LR, & Scheffers WA (1986) Alcoholic fermentation by 'non-fermentative' yeasts. Yeast 2(2):123-127.
8. Laplaza JM, Torres BR, Jin YS, & Jeffries TW (2006) Sh ble and Cre adapted for functional genomics and metabolic engineering of Pichia stipitis. Enzyme and Microbial Technology 38(6):741-747.
9. Urbina H, Schuster J, & Blackwell M (2013) The gut of Guatemalan passalid beetles: a habitat colonized by cellobiose- and xylose-fermenting yeasts. Fungal Ecology 6(5):339-355.
10. Nguyen NH, Suh SO, Marshall CJ, & Blackwell M (2006) Morphological and ecological similarities: wood-boring beetles associated with novel xylose-fermenting yeasts, Spathaspora passalidarum gen. sp. nov. and Candida jeffriesii sp. nov. Mycol Res 110(Pt 10):1232-1241.
11. Suh SO, White MM, Nguyen NH, & Blackwell M (2004) The status and characterization of Enteroramus dimorphus: a xylose-fermenting yeast attached to the gut of beetles. Mycologia 96(4):756-760.
12. Passoth V, Cohn M, Schafer B, Hahn-Hagerdal B, & Klinner U (2003) Analysis of the hypoxia-induced ADH2 promoter of the respiratory yeast Pichia stipitis reveals a new mechanism for sensing of oxygen limitation in yeast. Yeast 20(1):39-51.
13. Jeffries TW, et al. (2007) Genome sequence of the lignocellulose-bioconverting and xylose-fermenting yeast Pichia stipitis. Nat Biotechnol 25(3):319-326.
14. Jones KG, Dowd PF, & Blackwell M (1999) Polyphyletic origins of yeast-like endocytobionts from anobiid and cerambycid beetles. Mycological Research 103:542-546.
15. Cadete RM, et al. (2013) Spathaspora brasiliensis sp. nov., Spathaspora suhii sp. nov., Spathaspora roraimanensis sp. nov. and Spathaspora xylofermentans sp. nov., four novel (D)-xylose-fermenting yeast species from Brazilian Amazonian forest. Antonie Van Leeuwenhoek 103(2):421-431.
16. Cadete RM, et al. (2012) Diversity and physiological characterization of D-xylose-fermenting yeasts isolated from the Brazilian Amazonian Forest. PLoS One 7(8):e43135.


17. Cadete RM, et al. (2009) Spathaspora arborariae sp. nov., a d-xylose-fermenting yeast species isolated from rotting wood in Brazil. FEMS Yeast Res 9(8):1338-1342.
18. Barbosa AC, Cadete RM, Gomes FC, Lachance MA, & Rosa CA (2009) Candida materiae sp. nov., a yeast species isolated from rotting wood in the Atlantic Rain Forest. Int J Syst Evol Microbiol 59(Pt 8):2104-2106.
19. Hou X (2012) Anaerobic xylose fermentation by Spathaspora passalidarum. Appl Microbiol Biotechnol 94(1):205-214.
20. Long TM, et al. (2012) Cofermentation of glucose, xylose, and cellobiose by the beetle-associated yeast Spathaspora passalidarum. Appl Environ Microbiol 78(16):5492-5500.
21. Wohlbach DJ, et al. (2011) Comparative genomics of xylose-fermenting fungi for enhanced biofuel production. Proceedings of the National Academy of Sciences of the United States of America 108(32):13212-13217.
22. Suh SO, McHugh JV, & Blackwell M (2004) Expansion of the Candida tanzawaensis yeast clade: 16 novel Candida species from basidiocarp-feeding beetles. Int J Syst Evol Microbiol 54(Pt 6):2409-2429.
23. Kurtzman CP & Robnett CJ (1998) Identification and phylogeny of ascomycetous yeasts from analysis of nuclear large subunit (26S) ribosomal DNA partial sequences. Antonie Van Leeuwenhoek International Journal of General and Molecular Microbiology 73(4):331-371.
24. Kurtzman CP (2001) Six new anamorphic ascomycetous yeasts near Candida tanzawaensis. FEMS Yeast Res 1(3):177-185.
25. Ramakrishna N, Lacey J, & Smith JE (1996) Colonization of barley grain by Penicillium verrucosum and ochratoxin A formation in the presence of competing fungi. Journal of Food Protection 59(12):1311-1317.
26. Ohisa N, Suzuki N, Miura M, Endo K, & Fujita Y (2012) Halotolerant yeast participates in the cavity formation of hand-pulled dry noodles. J Jpn Soc Food Sci 59(9):442-446.
27. Simoncini N, Rotelli D, Virgili R, & Quintavalla S (2007) Dynamics and characterization of yeasts during ripening of typical Italian dry-cured ham. Food Microbiol 24(6):577-584.
28. Virgili R, et al. (2012) Biocontrol of Penicillium nordicum growth and ochratoxin A production by native yeasts of dry cured ham. Toxins (Basel) 4(2):68-82.
29. Ramakrishna N, Lacey J, & Smith JE (1996) The effects of fungal competition on colonization of barley grain by Fusarium sporotrichioides on T-2 toxin formation. Food Addit Contam 13(8):939-948.
30. Deschuyffeleer N, et al. (2011) Identification and characterization of yeasts causing chalk mould defects on par-baked bread. Food Microbiol 28(5):1019-1027.
31. Thome PE & Trench RK (1999) Osmoregulation and the genetic induction of glycerol-3-phosphate dehydrogenase by NaCl in the euryhaline yeast Debaryomyces hansenii. Mar Biotechnol (NY) 1(3):230-238.
32. Andre L, Nilsson A, & Adler L (1988) The Role of Glycerol in Osmotolerance of the Yeast Debaryomyces hansenii. Journal of General Microbiology 134:669-677.
33. Guma-Cintron Y, Bandyopadhyay A, Rosado W, Shu-Hu W, & Nadathur GS (2015) Transcriptomic analysis of cobalt stress in the marine yeast Debaryomyces hansenii. FEMS Yeast Res 15(8).
34. Sacerdot C, et al. (2008) Promiscuous DNA in the nuclear genomes of hemiascomycetous yeasts. FEMS Yeast Res 8(6):846-857.
35. Suh SO, Nguyen NH, & Blackwell M (2005) Nine new Candida species near C. membranifaciens isolated from insects. Mycol Res 109(Pt 9):1045-1056.
36. Wohlbach DJ, et al. (2011) Comparative genomics of xylose-fermenting fungi for enhanced biofuel production. Proc Natl Acad Sci U S A 108(32):13212-13217.


37. Grunwald S, Pilhofer M, & Holl W (2010) Microbial associations in gut systems of wood- and bark-inhabiting longhorned beetles [Coleoptera: Cerambycidae]. Syst Appl Microbiol 33(1):25-34.
38. Jeffries TW & Kurtzman CP (1994) Strain selection, taxonomy, and genetics of xylose-fermenting yeasts. Enzyme and Microbial Technology 16(11):922-932.
39. Neuhauser W, Haltrich D, Kulbe KD, & Nidetzky B (1997) NAD(P)H-dependent aldose reductase from the xylose-assimilating yeast Candida tenuis. Isolation, characterization and biochemical properties of the enzyme. Biochem J 326(Pt 3):683-692.
40. Petschacher B & Nidetzky B (2008) Altering the coenzyme preference of xylose reductase to favor utilization of NADH enhances ethanol yield from xylose in a metabolically engineered strain of Saccharomyces cerevisiae. Microb Cell Fact 7:9.
41. Biswas D, Pandya V, Singh AK, Mondal AK, & Kumaran S (2012) Co-factor binding confers substrate specificity to xylose reductase from Debaryomyces hansenii. PLoS One 7(9):e45525.
42. Khattab SM, Watanabe S, Saimura M, & Kodaki T (2011) A novel strictly NADPH-dependent Pichia stipitis xylose reductase constructed by site-directed mutagenesis. Biochem Biophys Res Commun 404(2):634-637.
43. Golubev WI (2011) Differentiation between aquatic and terrestrial Metschnikowia species of based on their sensitivity to Pichia membranifaciens mycocins. Microbiology 80(2):154-157.
44. Su HY, et al. (2005) Upregulation of actin-like gene expression in giant freshwater prawns Macrobrachium rosenbergii infected with Metschnikowia bicuspidata. Dis Aquat Organ 66(2):175-180.
45. Chen SC, et al. (2007) Metschnikowia bicuspidata dominates in Taiwanese cold-weather yeast infections of Macrobrachium rosenbergii. Dis Aquat Organ 75(3):191-199.
46. Moore MM & Strom MS (2003) Infection and mortality by the yeast Metschnikowia bicuspidata var. bicuspidata in chinook salmon fed live adult brine shrimp (Artemia franciscana). Aquaculture 220(1-4):43-57.
47. Sun HY, Wang K, Chi Z, Xu HM, & Chi ZM (2012) Simultaneous production of single cell protein and killer toxin by Wickerhamomyces anomalus HN1-2 isolated from mangrove ecosystem. Process Biochemistry 47(2):251-256.
48. Buzdar MA, Chi Z, Wang Q, Hua MX, & Chi ZM (2011) Production, purification, and characterization of a novel killer toxin from Kluyveromyces siamensis against a pathogenic yeast in crab. Appl Microbiol Biotechnol 91(6):1571-1579.
49. Peng Y, Chi Z, Wang X, & Li J (2010) beta-1,3-glucanase inhibits activity of the killer toxin produced by the marine-derived yeast Williopsis saturnus WC91-2. Mar Biotechnol (NY) 12(4):479-485.
50. Billongrand G (1989) A New Ascosporogenous Yeast Genus - Yamadazyma Gen-Nov. Mycotaxon 35(2):201-204.
51. Kurtzman CP, Babjeviella K, & Suzuki M (2011) In: Kurtzman, C.P., Fell, J.W., Boekhout, T. (2011) The Yeasts, a Taxonomic Study. 5th edition. Elsevier, Amsterdam. pp 329-331.
52. Ligon JM, Bolen PL, Hill DS, Bothast RJ, & Kurtzman CP (1989) Physical and biological characterization of linear DNA plasmids of the yeast Pichia inositovora. Plasmid 21(3):185-194.
53. Hayman GT & Bolen PL (1991) Linear DNA plasmids of Pichia inositovora are associated with a novel killer toxin activity. Curr Genet 19(5):389-393.
54. Klassen R & Meinhardt F (2003) Structural and functional analysis of the killer element pPin1-3 from Pichia inositovora. Mol Genet Genomics 270(2):190-199.


55. Jeske S, Meinhardt F, & Klassen R (2007) Extranuclear inheritance: Virus-like DNA-elements in yeast. Progress in Botany, eds Esser K, Luttge U, Beyschlag W, & Murata J, Vol 68, pp 98-129.
56. Satwika D, Klassen R, & Meinhardt F (2012) Anticodon nuclease encoding virus-like elements in yeast. Appl Microbiol Biotechnol 96(2):345-356.
57. Kast A, Klassen R, & Meinhardt F (2014) rRNA fragmentation induced by a yeast killer toxin. Mol Microbiol 91(3):606-617.
58. Celik E & Calik P (2012) Production of recombinant proteins by yeast cells. Biotechnol Adv 30(5):1108-1118.
59. Romanos MA, Scorer CA, & Clare JJ (1992) Foreign gene expression in yeast: a review. Yeast 8(6):423-488.
60. Dikicioglu D, Wood V, Rutherford KM, McDowall MD, & Oliver SG (2014) Improving functional annotation for industrial microbes: a case study with Pichia pastoris. Trends Biotechnol 32(8):396-399.
61. Puxbaum V, Mattanovich D, & Gasser B (2015) Quo vadis? The challenges of recombinant protein folding and secretion in Pichia pastoris. Appl Microbiol Biotechnol 99(7):2925-2938.
62. Kurtzman CP (2009) Biotechnological strains of Komagataella (Pichia) pastoris are Komagataella phaffii as determined from multigene sequence analysis. J Ind Microbiol Biotechnol 36(11):1435-1438.
63. Prielhofer R, et al. (2013) Induction without methanol: novel regulated promoters enable high-level expression in Pichia pastoris. Microb Cell Fact 12:5.
64. Kuberl A, et al. (2011) High-quality genome sequence of Pichia pastoris CBS7435. J Biotechnol 154(4):312-320.
65. De Schutter K, et al. (2009) Genome sequence of the recombinant protein production host Pichia pastoris. Nat Biotechnol 27(6):561-566.
66. Mattanovich D, et al. (2009) Genome, secretome and glucose transport highlight unique features of the protein production host Pichia pastoris. Microb Cell Fact 8:29.
67. Suh SO & Zhou JJ (2010) Methylotrophic yeasts near Ogataea (Hansenula) polymorpha: a proposal of Ogataea angusta comb. nov. and Candida parapolymorpha sp. nov. FEMS Yeast Res 10(5):631-638.
68. Ah Kang H & Gellissen G (2008) Protein Production in Hansenula polymorpha. Protein Science Encyclopedia (Wiley-VCH Verlag GmbH & Co. KGaA).
69. Levine DW & Cooney CL (1973) Isolation and characterization of a thermotolerant methanol-utilizing yeast. Appl Microbiol 26(6):982-990.
70. Kato N, Omori Y, Tani Y, & Ogata K (1976) Alcohol oxidases of Kloeckera sp. and Hansenula polymorpha. Catalytic properties and subunit structures. Eur J Biochem 64(2):341-350.
71. Reid MF & Fewson CA (1994) Molecular characterization of microbial alcohol dehydrogenases. Crit Rev Microbiol 20(1):13-56.
72. Agaphonov MO, Beburov M, Ter-Avanesyan MD, & Smirnov VN (1995) A disruption-replacement approach for the targeted integration of foreign genes in Hansenula polymorpha. Yeast 11(13):1241-1247.
73. Faber KN, et al. (1992) Chromosomal targeting of replicating plasmids in the yeast Hansenula polymorpha. J Gen Microbiol 138(11):2405-2416.
74. Voronovsky AY, Rohulya OV, Abbas CA, & Sibirny AA (2009) Development of strains of the thermotolerant yeast Hansenula polymorpha capable of alcoholic fermentation of starch and xylan. Metab Eng 11(4-5):234-242.
75. Kurtzman CP, Fell JW, & Boekhout T, eds (2011) The Yeasts, a Taxonomic Study (Elsevier, Amsterdam).

76. Ravin NV, et al. (2013) Genome sequence and analysis of methylotrophic yeast Hansenula polymorpha DL1. BMC Genomics 14:837.
77. Kurtzman CP & Dien BS (1998) Candida arabinofermentans, a new L-arabinose fermenting yeast. Antonie Van Leeuwenhoek 74(4):237-243.
78. Rivera FN, et al. (2009) Gut-associated yeast in bark beetles of the genus Dendroctonus Erichson (Coleoptera: Curculionidae: Scolytinae). Biological Journal of the Linnean Society 98(2):325-342.
79. Kurtzman CP, Fell JW, & Boekhout T (2011) Definition, Classification and Nomenclature of the Yeasts. Yeasts: A Taxonomic Study, Vols 1-3, 5th Edition:3-5.
80. Fonseca C, Spencer-Martins I, & Hahn-Hagerdal B (2007) L-Arabinose metabolism in Candida arabinofermentans PYCC 5603T and Pichia guilliermondii PYCC 3012: influence of sugar and oxygen on product formation. Appl Microbiol Biotechnol 75(2):303-310.
81. Knoshaug EP, Franden MA, Stambuk BU, Zhang M, & Singh A (2009) Utilization and transport of l-arabinose by non-Saccharomyces yeasts. Cellulose 16(4):729-741.
82. DOE Joint Genome Institute (2013) Candida arabinofermentans NRRL YB-2248 Genome sequencing. European Nucleotide Archive.
83. Diaz C, Molina AM, Nahring J, & Fischer R (2013) Characterization and dynamic behavior of wild yeast during spontaneous wine fermentation in steel tanks and amphorae. Biomed Res Int 2013:540465.
84. Arroyo-Lopez FN, Querol A, Bautista-Gallego J, & Garrido-Fernandez A (2008) Role of yeasts in table olive production. Int J Food Microbiol 128(2):189-196.
85. Westall S & Filtenborg O (1998) Spoilage yeasts of decorated soft cheese packed in modified atmosphere. Food Microbiology 15(2):243-249.
86. Lachance MA (1995) Yeast communities in a natural tequila fermentation. Antonie Van Leeuwenhoek 68(2):151-160.
87. Rankine BC (1966) Pichia membranaefaciens, a yeast causing film formation and off-flavor in table wine. Amer J Enol Viticult 17(2):82-86.
88. Aguiar C & Lucas C (2000) Yeasts killer/sensitivity phenotypes and halotolerance. Food Technology and Biotechnology 38(1):39-46.
89. Oliveira M, et al. (2004) Biotechnology of olive fermentation of 'Galega' Portuguese variety. Grasas Y Aceites 55(3):219-226.
90. Marquina D, et al. (1992) Characterization of the Yeast Population in Olive Brines. Letters in Applied Microbiology 14(6):279-283.
91. Santos A & Marquina D (2004) Killer toxin of Pichia membranifaciens and its possible use as a biocontrol agent against grey mould disease of grapevine. Microbiology 150(Pt 8):2527-2534.
92. Santos A, San Mauro M, Bravo E, & Marquina D (2009) PMKT2, a new killer toxin from Pichia membranifaciens, and its promising biotechnological properties for control of the spoilage yeast Brettanomyces bruxellensis. Microbiology 155(Pt 2):624-634.
93. Cai JP, Roberts IN, & Collins MD (1996) Phylogenetic relationships among members of the ascomycetous yeast genera Brettanomyces, Debaryomyces, Dekkera, and Kluyveromyces deduced by small-subunit rRNA gene sequences. International Journal of Systematic Bacteriology 46(2):542-549.
94. Woolfit M, Rozpedowska E, Piskur J, & Wolfe KH (2007) Genome survey sequencing of the wine spoilage yeast Dekkera (Brettanomyces) bruxellensis. Eukaryot Cell 6(4):721-733.
95. Renouf V, Claisse O, & Lonvaud-Funel A (2007) Inventory and monitoring of wine microbial consortia. Appl Microbiol Biotechnol 75(1):149-164.
96. Kwast KE, Burke PV, & Poyton RO (1998) Oxygen sensing and the transcriptional regulation of oxygen-responsive genes in yeast. J Exp Biol 201(Pt 8):1177-1195.

97. Suarez R, Suarez-Lepe JA, Morata A, & Calderon F (2007) The production of ethylphenols in wine by yeasts of the genera Brettanomyces and Dekkera: A review. Food Chemistry 102(1):10-21.
98. Dias L, Pereira-da-Silva S, Tavares M, Malfeito-Ferreira M, & Loureiro V (2003) Factors affecting the production of 4-ethylphenol by the yeast Dekkera bruxellensis in enological conditions. Food Microbiology 20(4):377-384.
99. Prochazka E, Polakova S, Piskur J, & Sulo P (2010) Mitochondrial genome from the facultative anaerobe and petite-positive yeast Dekkera bruxellensis contains the NADH dehydrogenase subunit genes. FEMS Yeast Res 10(5):545-557.
100. Blomqvist J, Nogue VS, Gorwa-Grauslund M, & Passoth V (2012) Physiological requirements for growth and competitiveness of Dekkera bruxellensis under oxygen-limited or anaerobic conditions. Yeast 29(7):265-274.
101. Moktaduzzaman M, et al. (2015) Galactose utilization sheds new light on sugar metabolism in the sequenced strain Dekkera bruxellensis CBS 2499. FEMS Yeast Res 15(2).
102. Meroth CB, Hammes WP, & Hertel C (2003) Identification and population dynamics of yeasts in sourdough fermentation processes by PCR-denaturing gradient gel electrophoresis. Appl Environ Microbiol 69(12):7453-7461.
103. Galafassi S, et al. (2011) Dekkera/Brettanomyces yeasts for ethanol production from renewable sources under oxygen-limited and low-pH conditions. J Ind Microbiol Biotechnol 38(8):1079-1088.
104. Reis AL, et al. (2014) Oxygen-limited cellobiose fermentation and the characterization of the cellobiase of an industrial Dekkera/Brettanomyces bruxellensis strain. Springerplus 3:38.
105. Blomqvist J, et al. (2011) Fermentation of lignocellulosic hydrolysate by the alternative industrial ethanol yeast Dekkera bruxellensis. Lett Appl Microbiol 53(1):73-78.
106. Okabe M (JP2010000024-A.
107. Schneider H, Wang PY, Chan YK, & Maleszka R (1981) Conversion of D-Xylose into ethanol by the yeast Pachysolen tannophilus. Biotechnology Letters 3(2):89-92.
108. Slininger PJ, Bothast RJ, Van Cauwenberge JE, & Kurtzman CP (1982) Conversion of D-xylose to ethanol by the yeast Pachysolen tannophilus. Biotechnol Bioeng 24(2):371-384.
109. James AP & Zahab DM (1982) A Genetic System for Pachysolen tannophilus, a Pentose-Fermenting Yeast. Journal of General Microbiology 128(Oct):2297-2301.
110. Maleszka R, Wang PY, & Schneider H (1982) Ethanol-Production from D-Galactose and Glycerol by Pachysolen tannophilus. Enzyme and Microbial Technology 4(5):349-352.
111. Liu X, Jensen PR, & Workman M (2012) Bioconversion of crude glycerol feedstocks into ethanol by Pachysolen tannophilus. Bioresour Technol 104:579-586.
112. Smiley KL & Bolen PL (1982) Demonstration of D-Xylose reductase and D-Xylitol dehydrogenase in Pachysolen tannophilus. Biotechnology Letters 4(9):607-610.
113. Bolen PL & Detroy RW (1985) Induction of NADPH-linked D-xylose reductase and NAD-linked xylitol dehydrogenase activities in Pachysolen tannophilus by D-xylose, L-arabinose, or D-galactose. Biotechnol Bioeng 27(3):302-307.
114. Bicho PA, Runnals PL, Cunningham JD, & Lee H (1988) Induction of Xylose Reductase and Xylitol Dehydrogenase Activities in Pachysolen tannophilus and Pichia stipitis on Mixed Sugars. Appl Environ Microbiol 54(1):50-54.
115. Verduyn C, Frank J, Vandijken JP, & Scheffers WA (1985) Multiple forms of xylose reductase in Pachysolen tannophilus CBS4044. FEMS Microbiology Letters 30(3):313-317.
116. Jeffries TW (1983) Effects of Nitrate on Fermentation of Xylose and Glucose by Pachysolen tannophilus. Bio-Technology 1(6):503-506.

117. Buckley HR & Vanuden N (1967) Candida Shehatae Sp N a Yeast Associated with Wood-Destroying Insects. Mycopathologia Et Mycologia Applicata 32(4):297-&.
118. Du Preez JC & Van Der Walt JP (1983) Fermentation of D-xylose to ethanol by a strain of Candida shehatae. Biotechnology Letters 5(5):357-362.
119. Toivola A, Yarrow D, van den Bosch E, van Dijken JP, & Scheffers WA (1984) Alcoholic fermentation of d-xylose by yeasts. Appl Environ Microbiol 47(6):1221-1223.
120. Liu X, Kaas RS, Jensen PR, & Workman M (2012) Draft Genome Sequence of the Yeast Pachysolen tannophilus CBS 4044/NRRL Y-2460. Eukaryotic Cell 11(6):827-827.
121. Moulin G & Galzy P (1978) Remarks on the metabolism of Kluyveromyces lactis van der walt. Mycopathologia 66(1-2):73-76.
122. Lahamguillaume M, Moulin G, & Galzy P (1979) Selection of Yeast Strains Due to Production of Alcohol on Whey. Lait 59(588):489-496.
123. Siso MIG (1996) The biotechnological utilization of cheese whey: A review. Bioresource Technology 57(1):1-11.
124. Kumura H, Tanoue Y, Tsukahara M, Tanaka T, & Shimazaki K (2004) Screening of dairy yeast strains for probiotic applications. J Dairy Sci 87(12):4050-4056.
125. Gunge N, Tamaru A, Ozawa F, & Sakaguchi K (1981) Isolation and characterization of linear deoxyribonucleic acid plasmids from Kluyveromyces lactis and the plasmid-associated killer character. J Bacteriol 145(1):382-390.
126. Tommasino M, Ricci S, & Galeotti CL (1988) Genome organization of the killer plasmid pGK12 from Kluyveromyces lactis. Nucleic Acids Res 16(13):5863-5878.
127. Bianchi MM, et al. (1987) Transformation of the yeast Kluyveromyces lactis by new vectors derived from the 1.6 µm circular plasmid pKD1. Current Genetics 12(3):185-192.
128. Das S & Hollenberg CP (1982) A high-frequency transformation system for the yeast Kluyveromyces lactis. Curr Genet 6(2):123-128.
129. Chen XJ (1996) Low- and high-copy-number shuttle vectors for replication in the budding yeast Kluyveromyces lactis. Gene 172(1):131-136.
130. Dujon B, et al. (2004) Genome evolution in yeasts. Nature 430(6995):35-44.
131. Wallace GB (1929) Diseases of plants. Rept. Dept. Agric. Tanganyika Territory for the year ending 31st March, 1928:40-42.
132. Harris SD & Momany M (2004) Polarity in filamentous fungi: moving beyond the yeast paradigm. Fungal Genet Biol 41(4):391-400.
133. Dietrich FS, et al. (2004) The Ashbya gossypii genome as a tool for mapping the ancient Saccharomyces cerevisiae genome. Science 304(5668):304-307.
134. Stahmann KP, Revuelta JL, & Seulberger H (2000) Three biotechnical processes using Ashbya gossypii, Candida famata, or Bacillus subtilis compete with chemical riboflavin production. Applied Microbiology and Biotechnology 53(5):509-516.
135. Jimenez A, Santos MA, Pompejus M, & Revuelta JL (2005) Metabolic engineering of the purine pathway for riboflavin production in Ashbya gossypii. Appl Environ Microbiol 71(10):5743-5751.
136. Abbas CA & Sibirny AA (2011) Genetic control of biosynthesis and transport of riboflavin and flavin nucleotides and construction of robust biotechnological producers. Microbiol Mol Biol Rev 75(2):321-360.
137. Brachat S, et al. (2003) Reinvestigation of the Saccharomyces cerevisiae genome annotation by comparison to the genome of a related fungus: Ashbya gossypii. Genome Biol 4(7):R45.
138. Goffeau A, et al. (1996) Life with 6000 genes. Science 274(5287):546, 563-547.
139. Andreasen AA & Stier TJ (1953) Anaerobic nutrition of Saccharomyces cerevisiae. I. Ergosterol requirement for growth in a defined medium. J Cell Physiol 41(1):23-36.

140. Andreasen AA & Stier TJ (1954) Anaerobic nutrition of Saccharomyces cerevisiae. II. Unsaturated fatty acid requirement for growth in a defined medium. J Cell Physiol 43(3):271-281.
141. Albers E, Larsson C, Liden G, Niklasson C, & Gustafsson L (1996) Influence of the nitrogen source on Saccharomyces cerevisiae anaerobic growth and product formation. Appl Environ Microbiol 62(9):3187-3195.
142. Ha SJ, et al. (2011) Engineered Saccharomyces cerevisiae capable of simultaneous cellobiose and xylose fermentation. Proc Natl Acad Sci U S A 108(2):504-509.
143. Becker J & Boles E (2003) A modified Saccharomyces cerevisiae strain that consumes L-arabinose and produces ethanol. Applied and Environmental Microbiology 69(7):4144-4150.
144. Kotter P & Ciriacy M (1993) Xylose Fermentation by Saccharomyces cerevisiae. Applied Microbiology and Biotechnology 38(6):776-783.
145. Sonderegger M & Sauer U (2003) Evolutionary engineering of Saccharomyces cerevisiae for anaerobic growth on xylose. Appl Environ Microbiol 69(4):1990-1998.
146. Sonderegger M, Jeppsson M, Hahn-Hagerdal B, & Sauer U (2004) Molecular basis for anaerobic growth of Saccharomyces cerevisiae on xylose, investigated by global gene expression and metabolic flux analysis. Appl Environ Microbiol 70(4):2307-2317.
147. Eliasson A, Christensson C, Wahlbom CF, & Hahn-Hagerdal B (2000) Anaerobic xylose fermentation by recombinant Saccharomyces cerevisiae carrying XYL1, XYL2, and XKS1 in mineral medium chemostat cultures. Appl Environ Microbiol 66(8):3381-3386.
148. Kuyper M, et al. (2005) Metabolic engineering of a xylose-isomerase-expressing Saccharomyces cerevisiae strain for rapid anaerobic xylose fermentation. FEMS Yeast Res 5(4-5):399-409.
149. Kuyper M, et al. (2003) High-level functional expression of a fungal xylose isomerase: the key to efficient ethanolic fermentation of xylose by Saccharomyces cerevisiae? FEMS Yeast Res 4(1):69-78.
150. Kuyper M, Winkler AA, van Dijken JP, & Pronk JT (2004) Minimal metabolic engineering of Saccharomyces cerevisiae for efficient anaerobic xylose fermentation: a proof of principle. FEMS Yeast Res 4(6):655-664.
151. Boeke JD, LaCroute F, & Fink GR (1984) A positive selection for mutants lacking orotidine-5'-phosphate decarboxylase activity in yeast: 5-fluoro-orotic acid resistance. Mol Gen Genet 197(2):345-346.
152. Sikorski RS & Hieter P (1989) A system of shuttle vectors and yeast host strains designed for efficient manipulation of DNA in Saccharomyces cerevisiae. Genetics 122(1):19-27.
153. Brachmann CB, et al. (1998) Designer deletion strains derived from Saccharomyces cerevisiae S288C: a useful set of strains and plasmids for PCR-mediated gene disruption and other applications. Yeast 14(2):115-132.
154. Giaever G, et al. (2002) Functional profiling of the Saccharomyces cerevisiae genome. Nature 418(6896):387-391.
155. Uetz P, et al. (2000) A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature 403(6770):623-627.
156. Ho Y, et al. (2002) Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 415(6868):180-183.
157. Krogan NJ, et al. (2006) Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature 440(7084):637-643.
158. Ross PL, et al. (2004) Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents. Mol Cell Proteomics 3(12):1154-1169.

159. Spellman PT, et al. (1998) Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol Biol Cell 9(12):3273-3297.
160. Hu Y, et al. (2007) Approaching a complete repository of sequence-verified protein-encoding clones for Saccharomyces cerevisiae. Genome Res 17(4):536-543.
161. Lee TI, et al. (2002) Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 298(5594):799-804.
162. Solieri L, Landi S, De Vero L, & Giudici P (2006) Molecular assessment of indigenous yeast population from traditional balsamic vinegar. J Appl Microbiol 101(1):63-71.
163. Panon G, Massiot P, & Drilleau JF (1995) Pectinolytic enzymes production by yeasts in cider fermentation. Sciences Des Aliments 15(1):31-42.
164. Michel A, Bizeau C, & Drilleau JF (1988) Survey of the Yeast Flora Found in Cider Factories Located in the West of France. Sciences Des Aliments 8(3):359-368.
165. Valles BS, Bedrinana RP, Tascon NF, Simon AQ, & Madrera RR (2007) Yeast species associated with the spontaneous fermentation of cider. Food Microbiol 24(1):25-31.
166. Xu Y, Zhao GA, & Wang LP (2006) Controlled formation of volatile components in cider making using a combination of Saccharomyces cerevisiae and Hanseniaspora valbyensis yeast species. J Ind Microbiol Biotechnol 33(3):192-196.
167. Gold D, Mohagheghi A, Cooney CL, & Wang DIC (1981) Single-Cell Protein-Production from Spent Sulfite Liquor Utilizing Cell-Recycle and Computer Monitoring. Biotechnology and Bioengineering 23(9):2105-2116.
168. McKee LA & Quicke GV (1977) Yeast production on spent sulfite liquor. South African Journal of Science 73(12):379-381.
169. Harris EE, Hajny GJ, & Johnson MC (1951) Protein Evaluations of Yeast Grown on Wood Hydrolyzate. Industrial & Engineering Chemistry 43(7):1593-1596.
170. Moftah OA, et al. (2012) Adding value to the oil cake as a waste from oil processing industry: production of lipase and protease by Candida utilis in solid state fermentation. Appl Biochem Biotechnol 166(2):348-364.
171. Grbavcic SZ, Dimitrijevic-Brankovic SI, Bezbradica DI, Siler-Marinkovic SS, & Knezevic ZD (2007) Effect of fermentation conditions on lipase production by Candida utilis. Journal of the Serbian Chemical Society 72(8-9):757-765.
172. Khan TR & Daugulis AJ (2010) Application of solid-liquid TPPBs to the production of L-phenylacetylcarbinol from benzaldehyde using Candida utilis. Biotechnol Bioeng 107(4):633-641.
173. Tripathi CM, Agarwal SC, & Basu SK (1997) Production of L-phenylacetylcarbinol by fermentation. Journal of Fermentation and Bioengineering 84(6):487-492.
174. Shin HS & Rogers PL (1995) Biotransformation of benzaldehyde to L-phenylacetylcarbinol, an intermediate in L-ephedrine production, by immobilized Candida utilis. Applied Microbiology and Biotechnology 44(1-2):7-14.
175. Rogers PL, Shin HS, & Wang B (1997) Biotransformation for L-ephedrine production. Adv Biochem Eng Biotechnol 56:33-59.
176. Shin HS & Rogers PL (1996) Production of L-phenylacetylcarbinol (L-PAC) from benzaldehyde using partially purified pyruvate decarboxylase (PDC). Biotechnology and Bioengineering 49(1):52-62.
177. Hazen KC (1995) New and emerging yeast pathogens. Clin Microbiol Rev 8(4):462-478.
178. Verduyn C (1991) Physiology of yeasts in relation to biomass yields. Antonie Van Leeuwenhoek 60(3-4):325-353.
179. Postma E, Kuiper A, Tomasouw WF, Scheffers WA, & Vandijken JP (1989) Competition for glucose between the yeasts Saccharomyces cerevisiae and Candida utilis. Applied and Environmental Microbiology 55(12):3214-3220.

180. Ordaz L, Lopez R, Melchy O, & de la Torre M (2001) Effect of high-cell-density fermentation of Candida utilis on kinetic parameters and the shift to respiro-fermentative metabolism. Appl Microbiol Biotechnol 57(3):374-378.
181. Passoth V, Fredlund E, Druvefors UA, & Schnurer J (2006) Biotechnology, physiology and genetics of the yeast Pichia anomala. FEMS Yeast Res 6(1):3-13.
182. Buzzini P & Martini A (2001) Large-scale screening of selected Candida maltosa, Debaryomyces hansenii and Pichia anomala killer toxin activity against pathogenic yeasts. Med Mycol 39(6):479-482.
183. Comitini F, De Ingeniis J, Pepe L, Mannazzu I, & Ciani M (2004) Pichia anomala and Kluyveromyces wickerhamii killer toxins as new tools against Dekkera/Brettanomyces spoilage yeasts. FEMS Microbiol Lett 238(1):235-240.
184. Fredlund E, Druvefors U, Boysen ME, Lingsten KJ, & Schnurer J (2002) Physiological characteristics of the biocontrol yeast Pichia anomala J121. FEMS Yeast Res 2(3):395-402.
185. Minagawa N, Sakajo S, Komiyama T, & Yoshimoto A (1990) Essential role of ferrous iron in cyanide-resistant respiration in Hansenula anomala. FEBS Lett 267(1):114-116.
186. Minagawa N & Yoshimoto A (1987) The induction of cyanide-resistant respiration in Hansenula anomala. J Biochem 101(5):1141-1146.
187. Sabel A, Martens S, Petri A, Konig H, & Claus H (2014) Wickerhamomyces anomalus AS1: a new strain with potential to improve wine aroma. Annals of Microbiology 64(2):483-491.
188. Spagna G, Barbagallo RN, Palmeri R, Restuccia C, & Giudici P (2002) Properties of endogenous beta-glucosidase of a Pichia anomala strain isolated from Sicilian musts and wines. Enzyme and Microbial Technology 31(7):1036-1041.
189. Swangkeaw J, Vichitphan S, Butzke CE, & Vichitphan K (2009) The characterisation of a novel Pichia anomala beta-glucosidase with potentially aroma-enhancing capabilities in wine. Annals of Microbiology 59(2):335-343.
190. Swangkeaw J, Vichitphan S, Butzke CE, & Vichitphan K (2011) Characterization of beta-glucosidases from Hanseniaspora sp and Pichia anomala with potentially aroma-enhancing capabilities in juice and wine. World Journal of Microbiology & Biotechnology 27(2):423-430.
191. Jijakli MH & Lepoivre P (1998) Characterization of an Exo-beta-1,3-Glucanase Produced by Pichia anomala Strain K, Antagonist of Botrytis cinerea on Apples. Phytopathology 88(4):335-343.
192. Schneider J, et al. (2012) Genome sequence of Wickerhamomyces anomalus DSM 6766 reveals genetic basis of biotechnologically important antimicrobial activities. FEMS Yeast Res 12(3):382-386.
193. Batra LR & Francke-Grosmann H (1961) Contributions to Our Knowledge of Ambrosia Fungi. I. Ascoidea hylecoeti sp. nov. (Ascomycetes). American Journal of Botany 48(6):453.
194. de Hoog GS & Smith MT (1998) Ascoidea Brefeld & Lindau (1891). In: Kurtzman, C.P., Fell, J.W., Boekhout, T. (2011) The Yeasts, a Taxonomic Study. 5th edition. Elsevier, Amsterdam, pp. 325-328. pp 136-140.
195. Walker LB (1931) Studies on Ascoidea rubescens: I. History and Development. Mycologia 23(1):51.
196. Papanikolaou S & Aggelis G (2002) Lipid production by Yarrowia lipolytica growing on industrial glycerol in a single-stage continuous culture. Bioresour Technol 82(1):43-49.
197. Beopoulos A, et al. (2009) Yarrowia lipolytica as a model for bio-oil production. Prog Lipid Res 48(6):375-387.

198. Groenewald M, et al. (2014) Yarrowia lipolytica: safety assessment of an oleaginous yeast with a great industrial potential. Crit Rev Microbiol 40(3):187-206.
199. Lanciotti R, Vannini L, Lopez CC, Gobbetti M, & Guerzoni ME (2005) Evaluation of the ability of Yarrowia lipolytica to impart strain-dependent characteristics to cheese when used as a ripening adjunct. International Journal of Dairy Technology 58(2):89-99.
200. Suzzi G, et al. (2001) Proteolytic, lipolytic and molecular characterisation of Yarrowia lipolytica isolated from cheese. Int J Food Microbiol 69(1-2):69-77.
201. Chen DC, Beckerich JM, & Gaillardin C (1997) One-step transformation of the dimorphic yeast Yarrowia lipolytica. Appl Microbiol Biotechnol 48(2):232-235.
202. Ledall MT, Nicaud JM, & Gaillardin C (1994) Multiple-Copy Integration in the Yeast Yarrowia lipolytica. Current Genetics 26(1):38-44.
203. Muller S, Sandal T, Kamp-Hansen P, & Dalboge H (1998) Comparison of expression systems in the yeasts Saccharomyces cerevisiae, Hansenula polymorpha, Kluyveromyces lactis, Schizosaccharomyces pombe and Yarrowia lipolytica. Cloning of two novel promoters from Yarrowia lipolytica. Yeast 14(14):1267-1283.
204. Nicaud JM, et al. (2002) Protein expression and secretion in the yeast Yarrowia lipolytica. FEMS Yeast Res 2(3):371-379.
205. Nicaud JM, Fournier P, Labonnardiere C, Chasles M, & Gaillardin C (1991) Use of ars18 based vectors to increase protein production in Yarrowia lipolytica. Journal of Biotechnology 19(2-3):259-270.
206. Barth G & Gaillardin C (1997) Physiology and genetics of the dimorphic fungus Yarrowia lipolytica. FEMS Microbiol Rev 19(4):219-237.
207. Madzak C, Gaillardin C, & Beckerich JM (2004) Heterologous protein expression and secretion in the non-conventional yeast Yarrowia lipolytica: a review. J Biotechnol 109(1-2):63-81.
208. Liu X, Lv J, Zhang T, & Deng Y (2015) Citric acid production from hydrolysate of pretreated straw cellulose by Yarrowia lipolytica SWJ-1b using batch and fed-batch cultivation. Prep Biochem Biotechnol 45(8):825-835.
209. Jost B, et al. (2015) The influence of oxygen limitation for the production of succinic acid with recombinant strains of Yarrowia lipolytica. Appl Microbiol Biotechnol 99(4):1675-1686.
210. Damude HG, et al. (US2006094092-A1; WO2006055322-A2; KR2007085669-A; WO2006055322-A3; CN101437952-A; US7588931-B2; CA2585178-C.
211. Finogenova TV, Morgunov IG, Kamzolova SV, & Chernyavskaya OG (2005) Organic acid production by the yeast Yarrowia lipolytica: A review of prospects. Applied Biochemistry and Microbiology 41(5):418-425.
212. Rywinska A, Marcinkiewicz M, Cibis E, & Rymowicz W (2015) Optimization of medium composition for erythritol production from glycerol by Yarrowia lipolytica using response surface methodology. Prep Biochem Biotechnol 45(6):515-529.
213. Andre A, et al. (2009) Biotechnological conversions of bio-diesel-derived crude glycerol by Yarrowia lipolytica strains. Engineering in Life Sciences 9(6):468-478.
214. Damude HG, et al. (2014) US 08815566.
215. Damude HG, et al. (2014) US 08685682.
216. Xie D, Jackson EN, & Zhu Q (2015) Sustainable source of omega-3 eicosapentaenoic acid from metabolically engineered Yarrowia lipolytica: from fundamental research to commercial production. Appl Microbiol Biotechnol 99(4):1599-1610.
217. Xue Z, et al. (2013) Production of omega-3 eicosapentaenoic acid by metabolic engineering of Yarrowia lipolytica. Nat Biotechnol 31(8):734-740.
218. Titorenko VI & Rachubinski RA (1998) Mutants of the yeast Yarrowia lipolytica defective in protein exit from the endoplasmic reticulum are also defective in peroxisome biogenesis. Molecular and Cellular Biology 18(5):2789-2803.

219. Eitzen GA, Szilard RK, & Rachubinski RA (1997) Enlarged peroxisomes are present in oleic acid-grown Yarrowia lipolytica overexpressing the PEX16 gene encoding an intraperoxisomal peripheral membrane peroxin. J Cell Biol 137(6):1265-1278.
220. Golubev VI, Babeva IP, Vinovarova ME, & Tyurin VS (1987) New Strains of Rare Yeast Species of the Genus Nadsonia. Microbiology 56(3):366-370.
221. Starkey RL & Henrici AT (1927) The occurrence of yeasts in soil. Soil Science 23(1):33-46.
222. Swart WC, van Wyk WJP, Pohl HC, & Kock LFJ (2010) Variation in mitochondrial activity over the life cycle of Nadsonia fulvescens. African Journal of Microbiology Research 4(16):1727-1732.
223. Walker WF (1985) 5-S ribosomal-RNA sequences from Ascomycetes and evolutionary implications. Systematic and Applied Microbiology 6(1):48-53.
224. Golubev WI, Smith MT, Poot GA, & Kock JL (1989) Species delineation in the genus Nadsonia sydow. Antonie Van Leeuwenhoek 55(4):369-382.
225. Phaff HJ, Starmer WT, Lachance MA, & Ganter PF (1994) Candida caseinolytica sp. nov., a new species of yeast occurring in necrotic tissue of Opuntia and Stenocereus species in the southwestern United States and Baja California, Mexico. Int J Syst Bacteriol 44(4):641-645.
226. Lachance MA & Kurtzman CP (2013) The yeast genus Tortispora gen. nov., description of Tortispora ganteri sp. nov., Tortispora mauiana f.a., sp. nov., Tortispora agaves f.a., sp. nov., Tortispora sangerardonensis f.a., sp. nov., Tortispora cuajiniquilana f.a., sp. nov., Tortispora starmeri f.a., sp. nov. and Tortispora phaffii f.a., sp. nov., reassignment of Candida caseinolytica to Tortispora caseinolytica f.a., comb. nov., emendation of Botryozyma, and assignment of Botryozyma, Tortispora gen. nov. and Trigonopsis to the family Trigonopsidaceae fam. nov. Int J Syst Evol Microbiol 63(Pt 8):3104-3114.
227. Lodder J & Kreger-Van Rij NJW (1952) The yeasts-a taxonomic study pp xi + 713 pp.-xi + 713 pp.
228. Deinema MH & Landheer CA (1956) Composition of fats, produced by Lipomyces starkeyi, under various conditions. Arch Mikrobiol 25(2):193-200.
229. Deinema MH (1961) Intra- and extra-cellular lipid production by yeasts. Mededel Landbouwhogesch Te Wageningen 61(2):1-54.
230. Wang RL, Wang JC, Xu RH, Fang Z, & Liu AZ (2014) Oil production by the oleaginous yeast Lipomyces starkeyi using diverse carbon sources. Bioresources 9(4):7027-7040.
231. Wild R, et al. (2010) Lipids from Lipomyces starkeyi. Food Technology and Biotechnology 48(3):329-335.
232. Tsakona S, et al. (2014) Formulation of fermentation media from flour-rich waste streams for microbial lipid production by Lipomyces starkeyi. J Biotechnol 189:36-45.
233. Zhao X, Kong XL, Hua YY, Feng B, & Zhao ZB (2008) Medium optimization for lipid production through co-fermentation of glucose and xylose by the oleaginous yeast Lipomyces starkeyi. European Journal of Lipid Science and Technology 110(5):405-412.
234. Gong Z, et al. (2012) Co-fermentation of cellobiose and xylose by Lipomyces starkeyi for lipid production. Bioresour Technol 117:20-24.
235. Oguri E, Masaki K, Naganuma T, & Iefuji H (2012) Phylogenetic and biochemical characterization of the oil-producing yeast Lipomyces starkeyi. Antonie Van Leeuwenhoek 101(2):359-368.
236. Anschau A, Xavier MC, Hernalsteens S, & Franco TT (2014) Effect of feeding strategies on lipid production by Lipomyces starkeyi. Bioresour Technol 157:214-222.
237. Huang C, et al. (2014) Bioconversion of corncob acid hydrolysate into microbial oil by the oleaginous yeast Lipomyces starkeyi. Appl Biochem Biotechnol 172(4):2197-2204.

238. Angerbauer C, Siebenhofer M, Mittelbach M, & Guebitz GM (2008) Conversion of sewage sludge into lipids by Lipomyces starkeyi for biodiesel production. Bioresour Technol 99(8):3051-3056.
239. Calvey CH, Willis LB, & Jeffries TW (2014) An optimized transformation protocol for Lipomyces starkeyi. Curr Genet 60(3):223-230.
240. Cisse OH, Pagni M, & Hauser PM (2012) De novo assembly of the Pneumocystis jirovecii genome from a single bronchoalveolar lavage fluid specimen from a patient. MBio 4(1):e00428-00412.
241. Wood V, et al. (2012) PomBase: a comprehensive online resource for fission yeast. Nucleic Acids Res 40(Database issue):D695-699.
242. Wood V, et al. (2002) The genome sequence of Schizosaccharomyces pombe. Nature 415(6874):871-880.
243. Kim DU, et al. (2010) Analysis of a genome-wide set of gene deletions in the fission yeast Schizosaccharomyces pombe. Nat Biotechnol 28(6):617-623.
244. Carne WM (1926) Leaf curl of Peach and Nectarine (Taphrina deformans). Journal of the Department of Agriculture 3(4):472-474.
245. Perley JE & Stowe BB (1966) On the ability of Taphrina deformans to produce indoleacetic acid from tryptophan by way of tryptamine. Plant Physiol 41(2):234-237.
246. Sziraki I, Balazs E, & Kiraly Z (1975) Increased levels of cytokinin and indoleacetic acid in peach leaves infected with Taphrina deformans. Physiological Plant Pathology 5(1):45-50.
247. Sommer NF (1961) Production by Taphrina deformans of substances stimulating cell elongation and division. Physiologia Plantarum 14(3):460-&.
248. Cisse OH, et al. (2013) Genome sequencing of the plant pathogen Taphrina deformans, the causal agent of peach leaf curl. MBio 4(3):e00055-00013.
249. Nishida H & Sugiyama J (1993) Phylogenetic relationships among Taphrina, Saitoella, and other higher fungi. Molecular Biology and Evolution 10(2):431-436.
250. Liu Y, et al. (2009) Phylogenomic analyses support the monophyly of Taphrinomycotina, including Schizosaccharomyces fission yeasts. Mol Biol Evol 26(1):27-34.
251. Sjamsuridzal W, et al. (1997) Evolutionary relationships of members of the genera Taphrina, Protomyces, Schizosaccharomyces, and related taxa within the archiascomycetes: Integrated analysis of genotypic and phenotypic characters. Mycoscience 38(3):267-280.
252. Nishida H, Hamamoto M, & Sugiyama J (2011) Draft genome sequencing of the enigmatic yeast Saitoella complicata. Journal of General and Applied Microbiology 57(4):243-246.
253. Enright AJ, Van Dongen S, & Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30(7):1575-1584.
254. Stajich JE, Dietrich FS, & Roy SW (2007) Comparative genomic analysis of fungal genomes reveals intron-rich ancestors. Genome Biol 8(10):R223.
255. Carr M, Bensasson D, & Bergman CM (2012) Evolutionary genomics of transposable elements in Saccharomyces cerevisiae. PLoS One 7(11):e50978.
256. Nahimana A, et al. (2000) Determination of the copy number of the nuclear rDNA and beta-tubulin genes of Pneumocystis carinii f. sp. hominis using PCR multicompetitors. J Eukaryot Microbiol 47(4):368-372.
257. Dujon B (2010) Yeast evolutionary genomics. Nat Rev Genet 11(7):512-524.
258. Stielow JB, et al. (2015) One fungus, which genes? Development and assessment of universal primers for potential secondary DNA barcodes. Persoonia 35:242-263.
259. Gnerre S, et al. (2011) High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc Natl Acad Sci U S A 108(4):1513-1518.
260. English AC, et al. (2012) Mind the gap: upgrading genomes with Pacific Biosciences RS long-read sequencing technology. PLoS One 7(11):e47768.

261. Zerbino DR & Birney E (2008) Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 18(5):821-829.
262. LaButti K, Foster B, Han C, Brettin T, & Lapidus A (2009) Gap Resolution: A Software Package for Improving Newbler Genome Assemblies. Report Number: LBNL-1899E Abs.
263. Martin J, et al. (2010) Rnnotator: an automated de novo transcriptome assembly pipeline from stranded RNA-Seq reads. BMC Genomics 11:663.
264. Finn RD, et al. (2014) Pfam: the protein families database. Nucleic Acids Res 42(Database issue):D222-230.
265. Eberhardt RY, et al. (2012) AntiFam: a tool to help identify spurious ORFs in protein annotation. Database (Oxford) 2012:bas003.
266. Rawlings ND, Barrett AJ, & Bateman A (2012) MEROPS: the database of proteolytic enzymes, their substrates and inhibitors. Nucleic Acids Res 40(Database issue):D343-350.
267. Saier MH, Jr., Tran CV, & Barabote RD (2006) TCDB: the Transporter Classification Database for membrane transport protein analyses and information. Nucleic Acids Res 34(Database issue):D181-186.
268. Horton P, et al. (2007) WoLF PSORT: protein localization predictor. Nucleic Acids Res 35(Web Server issue):W585-587.
269. Emanuelsson O, Brunak S, von Heijne G, & Nielsen H (2007) Locating proteins in the cell using TargetP, SignalP and related tools. Nat Protoc 2(4):953-971.
270. Abt B, et al. (2013) Genome sequence of the thermophilic fresh-water bacterium Spirochaeta caldaria type strain (H1(T)), reclassification of Spirochaeta caldaria, Spirochaeta stenostrepta, and Spirochaeta zuelzerae in the genus Treponema as Treponema caldaria comb. nov., Treponema stenostrepta comb. nov., and Treponema zuelzerae comb. nov., and emendation of the genus Treponema. Stand Genomic Sci 8(1):88-105.
271. Abt B, et al. (2012) Complete genome sequence of the termite hindgut bacterium Spirochaeta coccoides type strain (SPN1(T)), reclassification in the genus Sphaerochaeta as Sphaerochaeta coccoides comb. nov. and emendations of the family Spirochaetaceae and the genus Sphaerochaeta. Stand Genomic Sci 6(2):194-209.
272. Anderson I, et al. (2011) Novel insights into the diversity of catabolic metabolism from ten haloarchaeal genomes. PLoS One 6(5):e20237.
273. Breider S, et al. (2014) Genome-scale data suggest reclassifications in the Leisingera-Phaeobacter cluster including proposals for Sedimentitalea gen. nov. and Pseudophaeobacter gen. nov. Front Microbiol 5:416.
274. Frank O, et al. (2014) Complete genome sequence of the Phaeobacter gallaeciensis type strain CIP 105210(T) (= DSM 26640(T) = BS107(T)). Stand Genomic Sci 9(3):914-932.
275. Goker M, Scheuner C, Klenk HP, Stielow JB, & Menzel W (2011) Codivergence of mycoviruses with their hosts. PLoS One 6(7):e22252.
276. Scheuner C, et al. (2014) Complete genome sequence of Planctomyces brasiliensis type strain (DSM 5305(T)), phylogenomic analysis and reclassification of Planctomycetes including the descriptions of Gimesia gen. nov., Planctopirus gen. nov. and Rubinisphaera gen. nov. and emended descriptions of the order Planctomycetales and the family Planctomycetaceae. Stand Genomic Sci 9:10.
277. Spring S, et al. (2010) The genome sequence of Methanohalophilus mahii SLP(T) reveals differences in the energy metabolism among members of the Methanosarcinaceae inhabiting freshwater and saline environments. Archaea 2010:690737.
278. Altschul SF, et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25(17):3389-3402.

279. Li L, Stoeckert CJ, Jr., & Roos DS (2003) OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res 13(9):2178-2189.
280. Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32(5):1792-1797.
281. Thompson JD, Thierry JC, & Poch O (2003) RASCAL: rapid scanning and correction of multiple sequence alignments. Bioinformatics 19(9):1155-1161.
282. Talavera G & Castresana J (2007) Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst Biol 56(4):564-577.
283. Meusemann K, et al. (2010) A phylogenomic approach to resolve the arthropod tree of life. Mol Biol Evol 27(11):2451-2464.
284. Stamatakis A (2006) RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22(21):2688-2690.
285. Swofford DL (2002) PAUP*: Phylogenetic Analysis Using Parsimony (*and Other Methods), Version 4.0 b10. (Sinauer Associates, Sunderland).
286. Le SQ & Gascuel O (2008) An improved general amino acid replacement matrix. Mol Biol Evol 25(7):1307-1320.
287. Stamatakis A & Aberer AJ (2013) Novel parallelization schemes for large-scale likelihood-based phylogenetic inference. IEEE 27th International Parallel and Distributed Processing Symposium, pp 1195–1204.
288. Siddall ME (2010) Unringing a bell: Metazoan phylogenomics and the partition bootstrap. Cladistics 26:444–452.
289. Goloboff PA, Farris JS, & Nixon KC (2008) TNT, a free program for phylogenetic analysis. Cladistics 24(5):774-786.
290. Meier-Kolthoff JP, Auch AF, Klenk HP, & Goker M (2014) Highly parallelized inference of large genome-based phylogenies. Concurr Comp-Pract E 26(10):1715-1729.
291. Meier-Kolthoff JP, Auch AF, Klenk HP, & Goker M (2013) Genome sequence-based species delimitation with confidence intervals and improved distance functions. BMC Bioinformatics 14:60.
292. Meier-Kolthoff JP, Auch AF, Klenk HP, & Göker M (2014) GBDP on the grid: a genome-based approach for species delimitation adjusted for an automated and highly parallel processing of large data sets. Hochleistungsrechnen in Baden-Württemberg – Ausgewählte Aktivitäten im bwGRiD 2012, eds Schulz JC & Hermann S (KIT Scientific Publishing).
293. Meier-Kolthoff JP, Auch AF, Klenk HP, & Goker M (2014) Highly parallelized inference of large genome-based phylogenies. Concurr Comp-Pract E 26(10):1715-1729.
294. Auch AF, Henz SR, Holland BR, & Goker M (2006) Genome BLAST distance phylogenies inferred from whole plastid and whole mitochondrion genome sequences. BMC Bioinformatics 7:350.
295. Henz SR, Huson DH, Auch AF, Nieselt-Struwe K, & Schuster SC (2005) Whole-genome prokaryotic phylogeny. Bioinformatics 21(10):2329-2335.
296. Kawaguchi Y, Honda H, Taniguchi-Morimura J, & Iwasaki S (1989) The codon CUG is read as serine in an asporogenic yeast Candida cylindracea. Nature 341(6238):164-166.
297. Miranda I, Silva R, & Santos MA (2006) Evolution of the genetic code in yeasts. Yeast 23(3):203-213.
298. Ohama T, et al. (1993) Non-universal decoding of the leucine codon CUG in several Candida species. Nucleic Acids Res 21(17):4039-4045.
299. Osawa S & Jukes TH (1995) On codon reassignment. J Mol Evol 41(2):247-249.
300. Schultz DW & Yarus M (1996) On malleability in the genetic code. J Mol Evol 42(5):597-601.

301. Sugita T & Nakase T (1999) Non-universal usage of the leucine CUG codon and the molecular phylogeny of the genus Candida. Syst Appl Microbiol 22(1):79-86.
302. Suzuki T, Ueda T, & Watanabe K (1997) The 'polysemous' codon--a codon with multiple amino acid assignment caused by dual specificity of tRNA identity. EMBO J 16(5):1122-1134.
303. Santos MA & Tuite MF (1995) The CUG codon is decoded in vivo as serine and not leucine in Candida albicans. Nucleic Acids Res 23(9):1481-1486.
304. Santos MA, Gomes AC, Santos MC, Carreto LC, & Moura GR (2011) The genetic code of the fungal CTG clade. C R Biol 334(8-9):607-611.
305. Santos MA, Ueda T, Watanabe K, & Tuite MF (1997) The non-standard genetic code of Candida spp.: an evolving genetic code or a novel mechanism for adaptation? Mol Microbiol 26(3):423-431.
306. Muhlhausen S & Kollmar M (2014) Predicting the fungal CUG codon translation with Bagheera. BMC Genomics 15:411.
307. Gibson DG, et al. (2009) Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat Methods 6(5):343-345.
308. Gietz RD & Schiestl RH (2007) High-efficiency yeast transformation using the LiAc/SS carrier DNA/PEG method. Nat Protoc 2(1):31-34.
309. Kurtzman CP, Fell JW, & Boekhout T eds (2011) The Yeasts, a Taxonomic Study (Elsevier, Amsterdam).
310. Brown CA, Murray AW, & Verstrepen KJ (2010) Rapid expansion and functional divergence of subtelomeric gene families in yeasts. Curr Biol 20(10):895-903.
311. Jeffries TW & Van Vleet JR (2009) Pichia stipitis genomics, transcriptomics, and gene clusters. FEMS Yeast Res 9(6):793-807.
312. Claudel-Renard C, Chevalet C, Faraut T, & Kahn D (2003) Enzyme-specific profiles for genome annotation: PRIAM. Nucleic Acids Res 31(22):6633-6639.
313. Jeffries TW, et al. (2007) Genome sequence of the lignocellulose-bioconverting and xylose-fermenting yeast Pichia stipitis. Nat Biotechnol 25(3):319-326.
314. Claudel-Renard C, Chevalet C, Faraut T, & Kahn D (2003) Enzyme-specific profiles for genome annotation: PRIAM. Nucleic Acids Res 31(22):6633-6639.
315. Altschul SF, et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25(17):3389-3402.
316. Bairoch A, et al. (2005) The Universal Protein Resource (UniProt). Nucleic Acids Res 33(Database issue):D154-159.
317. Burhans DT, et al. (2006) Non-random clustering of stress-related genes during evolution of the S. cerevisiae genome. BMC Evol Biol 6:58.
318. Webster TD & Dickson RC (1988) The organization and transcription of the galactose gene cluster of Kluyveromyces lactis. Nucleic Acids Res 16(16):8011-8028.
319. Hittinger CT, Rokas A, & Carroll SB (2004) Parallel inactivation of multiple GAL pathway genes and ecological diversification in yeasts. Proc Natl Acad Sci U S A 101(39):14144-14149.
320. Wong S & Wolfe KH (2005) Birth of a metabolic gene cluster in yeast by adaptive gene relocation. Nat Genet 37(7):777-782.
321. Sugino RP & Innan H (2012) Natural selection on gene order in the genome reorganization process after whole-genome duplication of yeast. Mol Biol Evol 29(1):71-79.
322. Bennett RJ & Johnson AD (2005) Mating in Candida albicans and the search for a sexual cycle. Annu Rev Microbiol 59:233-255.
323. Haber JE (2012) Mating-type genes and MAT switching in Saccharomyces cerevisiae. Genetics 191(1):33-64.

324. Butler G, et al. (2004) Evolution of the MAT locus and its Ho endonuclease in yeast species. Proc Natl Acad Sci U S A 101(6):1632-1637.
325. Hedges SB, Blair JE, Venturi ML, & Shoe JL (2004) A molecular timescale of eukaryote evolution and the rise of complex multicellular life. BMC Evol Biol 4:2.
326. Gempe T & Beye M (2011) Function and evolution of sex determination mechanisms, genes and pathways in insects. Bioessays 33(1):52-60.
327. Graves JA (2013) How to evolve new vertebrate sex determining genes. Dev Dyn 242(4):354-359.
328. Charlesworth D (2015) Plant contributions to our understanding of sex chromosome evolution. New Phytol 208(1):52-65.
329. Rajaei N, Chiruvella KK, Lin F, & Astrom SU (2014) Domesticated transposase Kat1 and its fossil imprints induce sexual differentiation in yeast. Proc Natl Acad Sci U S A 111(43):15491-15496.
330. Hanson SJ, Byrne KP, & Wolfe KH (2014) Mating-type switching by chromosomal inversion in methylotrophic yeasts suggests an origin for the three-locus Saccharomyces cerevisiae system. Proc Natl Acad Sci U S A 111(45):E4851-4858.
331. Maekawa H & Kaneko Y (2014) Inversion of the chromosomal region between two mating type loci switches the mating type in Hansenula polymorpha. PLoS Genet 10(11):e1004796.
332. Butler G (2010) Fungal sex and pathogenesis. Clin Microbiol Rev 23(1):140-159.
333. Almeida JM, Cisse OH, Fonseca A, Pagni M, & Hauser PM (2015) Comparative genomics suggests primary homothallism of Pneumocystis species. MBio 6(1).
334. Hickman MA, Froyd CA, & Rusche LN (2011) Reinventing heterochromatin in budding yeasts: Sir2 and the origin recognition complex take center stage. Eukaryot Cell 10(9):1183-1192.
335. Allshire RC & Ekwall K (2015) Epigenetic Regulation of Chromatin States in Schizosaccharomyces pombe. Cold Spring Harb Perspect Biol 7(7):a018770.
336. Liu X, Kaas RS, Jensen PR, & Workman M (2012) Draft genome sequence of the yeast Pachysolen tannophilus CBS 4044/NRRL Y-2460. Eukaryot Cell 11(6):827.
