University of Groningen Genome Sequencing and Analysis Of
Total Page:16
File Type:pdf, Size:1020Kb
University of Groningen Genome sequencing and analysis of the filamentous fungus Penicillium chrysogenum van den Berg, Marco A.; Albang, Richard; Albermann, Kaj; Badger, Jonathan H.; Daran, Jean-Marc; Driessen, Arnold; Garcia-Estrada, Carlos; Fedorova, Natalie D.; Harris, Diana M.; Heijne, Wilbert H. M. Published in: Nature Biotechnology DOI: 10.1038/nbt.1498 IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document version below. Document Version Publisher's PDF, also known as Version of record Publication date: 2008 Link to publication in University of Groningen/UMCG research database Citation for published version (APA): van den Berg, M. A., Albang, R., Albermann, K., Badger, J. H., Daran, J-M., Driessen, A. J. M., ... Bovenberg, R. A. L. (2008). Genome sequencing and analysis of the filamentous fungus Penicillium chrysogenum. Nature Biotechnology, 26(10), 1161-1168. DOI: 10.1038/nbt.1498 Copyright Other than for strictly personal use, it is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license (like Creative Commons). Take-down policy If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim. Downloaded from the University of Groningen/UMCG research database (Pure): http://www.rug.nl/research/portal. For technical reasons the number of authors shown on this cover page is limited to 10 maximum. Download date: 10-02-2018 Supplementary Figure 1. Ortholog comparison of main functional classes in different fungal genomes. Pen: Penicillium chrysogenum, Anig: Aspergillus niger, Ater: Aspergillus terreus, Afla: Aspergillus flavus, Acla: Aspergillus clavatus, Aory: Aspergillus oryzae, Afum: Aspergillus fumigatus, Enid: Emericella nidulans, Hcap: Histoplasma capsulatum, Cimm: Coccidioides immitis, Pans: Podospora anserina, Pchr: Phanerochaete chrysosporium, Umay: Ustilago maydis, Gzea: Gibberella zeae, Tree: Trichoderma reesei, Ncra: Neurospora crassa, Mgri: Magnaporthe grisea, Scer Saccharomyces cerevisiae. The size of the filled circles is proportional to the number of ortholog genes in each category. Categories have been sorted in respect to the number of ortholog genes over all indicated genomes increasing from left to right. Category “All”represents the total number of ortholog genes in respect to P. chrysogenum for the listed fungal genomes. Pen Anig Ater Afla Acla Aory Afum Enid Hcap Cimm Pans Pchr Umay Gzea Tree Ncra Mgri Scer All EnergyCell fate Cell cycle TransporterMetabolism Protein fate Transcription Homeostasis Rescue, death Communication Cellular transport Protein synthesis Unclassified proteins Supplementary Figure 2. Ortholog comparison of functional classes related to metabolism and energy in different fungal genomes. Pen: Penicillium chrysogenum, Anig: Aspergillus niger, Ater: Aspergillus terreus, Afla: Aspergillus flavus, Acla: Aspergillus clavatus, Aory: Aspergillus oryzae, Afum: Aspergillus fumigatus, Enid: Emericella nidulans, Hcap: Histoplasma capsulatum, Cimm: Coccidioides immitis, Pans: Podospora anserina, Pchr: Phanerochaete chrysosporium, Umay: Ustilago maydis, Gzea: Gibberella zeae, Tree: Trichoderma reesei, Ncra: Neurospora crassa, Mgri: Magnaporthe grisea, Scer Saccharomyces cerevisiae. The size of the filled circles is proportional to the number of ortholog genes in each category. Categories have been sorted in respect to the number of ortholog genes over all indicated genomes increasing from left to right. The numbers on the x-Axis represent the following functional categories: 1 biosynthesis of nonprotein amino acids, 2 biosynthesis of polyketides, 3 biosynthesis of alkanes, alkenes, alkanals, alkanols, 4 anaerobic aromate catabolism, 5 biosynthesis of aminoglycoside antibiotics, 6 biosynthesis of ß-lactams, 7 biosynthesis of peptide antibiotics, 8 aliphatic hydrocarbon catabolism, 9 biosynthesis of secondary products derived from L-phenylalanine and L-tyrosine, 10 isoprenoid biosynthesis, 11 aerobic aromate catabolism, 12 catabolism of secondary metabolites, 13 biosynthesis of amines, 14 fermentation, 15 aminosaccharide biosynthesis, 16 biosynthesis of alkaloids, 17 nitrogen and sulfur metabolism, 18 degradation of amino acids of the cysteine-aromatic group, 19 biosynthesis of acetoacetate, acetone, hydroxybutyric acid, 20 degradation of amino acids of the glutamate group, 21 assimilation of ammonia, biosynthesis of the glutamate group, 22 breakdown of lipids, fatty acids and isoprenoids, 23 degradation of amino acids of the aspartate group, 24 pentose-phosphate pathway oxidative branch, 25 metabolism of energy reserves (e.g. glycogen, trehalose), 26 fatty acid biosynthesis 27 biosynthesis of derivatives of dehydroquinic acid, shikimic acid and chorismic acid, 28 purine nucleotide metabolism, 29 degradation of amino acids of the pyruvate family, 30 glycolipid biosynthesis, 31 phosphate metabolism, 32 oxidation of fatty acids, 33 polysaccharide biosynthesis, 34 biosynthesis of secondary monosaccharides, 35 biosynthesis of secondary products derived from L-lysine, L-arginine and L-histidine, 36 extracellular metabolism, 37 biosynthesis of the cysteine-aromatic group, 38 glyoxylate cycle, 39 biosynthesis of the aspartate family, 40 glycolysis and gluconeogenesis, 41 metabolism of vitamins, cofactors, and prosthetic groups, 42 polynucleotide degradation, 43 deoxyribonucleotide metabolism, 44 pyrimidine nucleotide metabolism, 45 biosynthesis of the pyruvate family (alanine, isoleucine, leucine, valine) and D-alanine, 46 biosynthesis of secondary products derived from L-glutamic acid, L-proline and L-ornithine, 47 urea cycle, biosynthesis of polyamines and creatine, 48 biosynthesis of secondary products derived from L-tryptophan, 49 electron transport and membrane-associated energy conservation, 50 tricarboxylic-acid pathway (citrate cycle, Krebs cycle, TCA cycle), 51 respiration, 52 phospholipid biosynthesis, 53 C-1 compound catabolism, 54 biosynthesis of porphyrins, 55 biosynthesis of glycosides, 56 biosynthesis of sulfuric acid and L-cysteine derivatives, 57 metabolism of cyclic and unusual nucleotides, 58 pentose-phosphate pathway non oxidative branch, 59 degradation of amino acids of the hydroxyamino-acid group, 60 biosynthesis of derivatives of homoisopentenyl pyrophosphate, 61 biosynthesis of cobalamins. Penicillium chrysogenum specific proteins Penicillium chrysogenum proteins with orthologs A Protein synthesis B Transcription 12 (1%) Protein fate 123 (3%) 67 (2%) Cell cycle Transporter Metabolism 35 (1%) 126 (3%) 2165 (15%) Energy Energy Cellular transport Unclassified proteins 17 (1%) 291 (3%) 105 (3%) (32%) Metabolism Communication / Cell cycle 283 (7%) Signal transduction 474 (4%) 36 (1%) Transcription Cell rescue, death 867 (6%) 130 (3%) Protein synthesis Regulation of / interaction 249 (2%) with cellular environment 33 (1%) Protein fate 803 (6%) Subcellular localization 205 (5%) Transposable elements Transporter 224 (2%) 690 (5%) Unclassified proteins (72%) Transposable elements 59 (2%) Subcellular localization Cellular transport 2212 (16%) 804 (6%) Regulation of / interaction Communication / with cellular environment signal transduction 217 (2%) 271 (2%) Cell rescue, death 666 (5%) C all 4 non-syntenic assemblies Protein synthesis Protein fate 0 (0%) 9 (2%) Transporter Transcription 10 (2%) 25 (5%) Cellular transport Cell cycle 7 (2%) 12 (2%) Communication / signal transduction Energy 4 (1%) 2 (1%) Metabolism Cell rescue, death 18 (3%) 8 (2%) Regulation of / interaction with cellular environment 7 (2%) Subcellular localization 42 (7%) Transposable elements 18 (3%) Unclassified proteins 458 (74%) Supplementary Table 1. Genome statistics comparison between different filamentous fungi. P. chrysogenum A.niger A. nidulans A. fumigatus Af293 Chromosomes/Scaffolds Total Length (Kb) 32,224 33,931 30,069 28,810 GC content (%) 48.9 50.4 50.0 49.8 Number of Protein Coding Genes 12,941 14,165 10,662 9,632 Mean Gene Length (bp) 1,515 1,573 1,868 1,478.1 Gene Density 2,490 2,395 3,151 2,990 Percent Coding (%) 56.6 55.2 50.0 49.4 Genes with Introns (%) 83.5 87.0 86.9 78.6 Exons Average size (bp) 434 370 436 504 Number 41,996 50,629 35,797 28,254 Mean # per Gene 3 3.5 3.4 2.8 GC Content (%) 52.9 53.7 53.3 54 Total Length (bp) 18,238,634 18,733,984 15,477,748 14,241,720 Introns Average size (bp) 87.4 97.2 91 82 Number 28,326 36,464 24,792 18,619 Mean # per Gene 2.2 2.6 2.4 1.8 GC Content (%) 45.3 45.3 45.8 46.7 Total Length (bp) 2,475,921 3,544,638 2,243,391 1,521,138 Intergenic Region GC Content (%) 44.4 46.4 47.4 45.9 Mean Length (bp) 842 822 1,137 1,322 Longest intergenic region (bp) 44,111 19,212 13,654 55,846 RNA tRNA number 145 269 n/a 179 5S rRNA number 28 56 n/a 33 Supplementary table 2. Features of the four non-syntenic supercontigs Non-syntenic Contigs 67 69 73 74 All four Genome Length (Kb) 292 168 229 709 1398 32183 Coding (%) 33.8 37.5 36.3 35.0 35.3 56.6 C+G content (%) 48.6 48.1 48.0 48.1 48.2 48.9 Genes 128 78 103 327 636 13653 Genes (< 100 aa) 6 3 4 8 21 283 Genes with introns (%) 71.1 70.5 82.5 76.1 75.5 83.5 Mean No. exons 2.78 2.50 3.07 2.67 2.75 3.07 Mean protein length (bp) 302 309 307 288 302 446 Mean gene length (aa) 905 927 922 865 905 1338 Repeat element density 0.0177 0.0163 0.0196 0.0158 0.01735 0.0104 TE elements 30 25 11 65 131 559 Pseudogenes 19 11