Reconstructing the Evolution of Xylose Fermentation in Scheffersomyces stipitis, by Kevin Correia

Reconstructing the evolution of xylose fermentation in Scheffersomyces stipitis

by Kevin Correia

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy
Graduate Department of Chemical Engineering and Applied Chemistry
University of Toronto

© Copyright 2019 by Kevin Correia

Abstract

Lignocellulose is a renewable feedstock that can be fermented to biofuels and biochemicals, but its use is limited by several techno-economic challenges, including xylose fermentation. Scheffersomyces stipitis has been identified as an efficient xylose fermenter, but does not ferment well in industrial conditions. For 30 years, the scientific community has been engineering Saccharomyces cerevisiae, the preferred yeast for industrial biotechnology, to ferment xylose to ethanol with the xylose reductase (XR)-xylitol dehydrogenase (XDH) genes from Sch. stipitis; however, recombinant Sac. cerevisiae’s titres, rates, and yields for ethanol from xylose lag those of Sch. stipitis. This performance gap may be due to aspects of Sac. cerevisiae’s metabolism that hinder xylose fermentation, aspects of Sch. stipitis’ metabolism that enable xylose fermentation, or a combination of both. The focus of this thesis is to improve our understanding of budding yeast metabolism beyond Sac. cerevisiae, and ultimately to reverse engineer how Sch. stipitis’ metabolism has evolved to ferment xylose to ethanol efficiently, using improved genome annotations and metabolic modelling.

To obtain higher-quality genome annotations, ortholog groups in 33 yeasts and fungi spanning 600 million years in Dikarya were predicted using OrthoMCL. Over 500 enzyme families were curated with phylogenetic reconstruction to resolve inconsistencies. These ortholog assignments are more accurate than existing ortholog databases, and span diverse yeasts. Next, the gains and losses of metabolic genes were reviewed to identify important and recurring events in the evolution of metabolism in budding yeasts. Duplications were found to play an important role in the evolution of metabolism, including changes in enzyme cofactor preference and localization. The curated pan-genome and functional annotations were used to reconstruct genome-scale metabolic networks of the 33 species; the reconstructions have more genomic and metabolic coverage than those made with previous methods.

Finally, the Sch. stipitis metabolic network was used to simulate xylose fermentation. NADH kinase and NADP phosphatase (NADPase), an orphan enzyme in eukaryotes, were found to be critical to xylose fermentation in these metabolic flux simulations. Pho3.2p was the sole NADPase candidate showing expression during xylose fermentation; its activity was confirmed in vitro. Xylose fermentation evolved in the Scheffersomyces-Spathaspora clade following gene duplications for XR and an acid phosphatase. This thesis provides a foundation for unravelling how metabolism has evolved in the yeast pan-genome.

Dedication

Dedicated to my family, and especially to my late father, who was my first math and science teacher.
Luis da Costa Correia (December 12, 1959 - December 29, 2014)

Acknowledgements

My doctoral experience has been one of the most rewarding experiences in my life. I have been amazed at how much I have been able to learn during my time at the University of Toronto.

First and foremost, I would like to thank Prof. Radhakrishnan Mahadevan for taking me on as a Master's student and eventually allowing me to bypass to the Ph.D. program. I am thankful for the freedom he gave me to pursue my research interests. I also thank my reading committee for guiding me throughout my doctoral research: Prof. Grant Allen for challenging me to make my research more accessible, Prof. Amy Caudy for inviting me to her lab meetings, Prof. Elizabeth Edwards for sitting as an examiner in my defences, and Prof. Jack Pronk for sitting as my external examiner and for his helpful suggestions to improve my thesis.

My labmates in the Laboratory for Metabolic Systems Engineering have been a great source of friendship and learning, including Peter Li, Pratish Gawand, Patrick Hyland, Chris Gowen, Kayla Nemr, Naveen Venayak, Shyam Srinivasan, Jeong Chan Joo, and Taeho Kim. I thank Prof. Alexander F. Yakunin and Anna Khusnutdinova for their help and guidance with enzymology; Prof. Hung Lee, Xin Wen, and Mehdi Dashtban for their help with Scheffersomyces stipitis genetics; and Dean Robson, Weijun Gao, and Daniel Tomchyshyn for their help with IT.

I thank my family for their support throughout my Ph.D. It was surprising to hear my parents ask when I would graduate because when I was growing up they always told me to stay in school. They should have specified a time limit. I thank Maiko Sugai, my better half and second brain, for her support, and Pierre Sawtell for his sense of humour.

I would also like to thank the taxpayers of Ontario and Canada who fund research programs across the country. Direct financial support was provided by the Natural Sciences and Engineering Research Council of Canada through the Bioconversion project and the CREATE M3 scholarship.

A special thanks to all the failures and rejections that brought me here.

Contents

List of Tables
List of Figures

1 Abbreviations

2 Introduction
  2.1 Motivation
  2.2 Challenges and Objectives
  2.3 Contributions
    2.3.1 Pan-genome creation
    2.3.2 Pan-genome analysis
    2.3.3 Pan-genome-scale metabolic network reconstruction
    2.3.4 Analysis of xylose fermentation in Sch. stipitis
    2.3.5 Other contributions

3 Literature review
  3.1 History and physiology of xylose fermentation in yeasts
    3.1.1 Xylose catabolic pathways
    3.1.2 Native pentose fermentation by yeasts
    3.1.3 Engineering xylose fermentation in Saccharomyces cerevisiae
    3.1.4 Opportunities to improve our understanding of xylose fermentation
  3.2 Functional genome annotation
    3.2.1 Biology is technology
    3.2.2 Evolving machines with unknown origins
    3.2.3 Homology and analogy in biology
    3.2.4 Structural and functional genome annotation
    3.2.5 Identifying orthologous proteins
    3.2.6 Opportunities to improve functional genome annotation
  3.3 Reconstruction and analysis of metabolism in silico
    3.3.1 The worm: the first full-scale reconstructed network of biology
    3.3.2 Genome-scale network reconstruction
    3.3.3 Flux balance analysis
    3.3.4 Scheffersomyces stipitis genome-scale network reconstruction and analysis
    3.3.5 Opportunities to improve genome-scale network reconstruction and analysis for Scheffersomyces stipitis
  3.4 Summary of the literature and synthesis
    3.4.1 Xylose fermentation
    3.4.2 Functional genome annotation
    3.4.3 Genome-scale network reconstruction
    3.4.4 Synthesis of xylose fermentation in yeasts, genome annotations, and metabolic modelling
  3.5 Hypotheses, objectives, and organization of the thesis
    3.5.1 Curation of the yeast pan-genome
    3.5.2 Analysis of the yeast pan-genome
    3.5.3 In silico metabolic network reconstruction for the yeast pan-genome
    3.5.4 Reverse engineering xylose fermentation in Scheffersomyces stipitis

4 AYbRAH: an open-source ortholog database for yeasts and fungi
  4.1 Abstract
  4.2 Introduction
  4.3 Methods
    4.3.1 Initial construction of AYbRAH
    4.3.2 AYbRAH curation
    4.3.3 Annotating misidentified and unidentified proteins
    4.3.4 Comparison of AYbRAH to existing phylogenomic databases
    4.3.5 Subcellular localization prediction
    4.3.6 Literature references
  4.4 AYbRAH overview
    4.4.1 The AYbRAH web portal.
  4.5 AYbRAH curation
    4.5.1 Over-clustering by OrthoMCL
    4.5.2 Under-clustering by OrthoMCL
  4.6 Comparison of AYbRAH to other ortholog identification methods.
    4.6.1 BLASTP scoring metrics.
    4.6.2 Comparison of AYbRAH to well-established phylogenomic databases.
  4.7 Applications of a curated ortholog database.
  4.8 Conclusion

5 Reconstructing the evolution of metabolism in yeasts
  5.1 Abstract
  5.2 Introduction
  5.3 Methods
    5.3.1 Refinement of the yeast species topology
    5.3.2 Reconstruction of gene duplications and losses
    5.3.3 Flux balance analysis
  5.4 Results & Discussion
    5.4.1 Refined yeast species tree topology
    5.4.2 Evolution of the pyruvate dehydrogenase bypass
    5.4.3 Evolution of NADH dehydrogenase
    5.4.4 Heteromerization of enzyme complexes
    5.4.5 Ribosomal subunit duplications.
    5.4.6 Changes in enzyme localization via gene duplications
    5.4.7 Redox cofactor changes in enzymes
    5.4.8 Horizontal gene transfer
  5.5 Conclusion

6 Fungi pan-genome-scale network reconstruction
  6.1 Abstract
  6.2 Introduction
  6.3 Methods
    6.3.1 Network reconstruction
    6.3.2 Biomass equation formulations
    6.3.3 Flux balance analysis
  6.4 Results
    6.4.1 Expanded genomic coverage in the fungal pan-GENRE
    6.4.2 Expanded metabolic coverage in the fungal pan-GENRE
    6.4.3 Amino acid yields vary across yeast strains
    6.4.4 Comparison.
