BMC Genomics (BioMed Central)
Research article, Open Access

Comparison of protein coding gene contents of the fungal phyla Pezizomycotina and Saccharomycotina

Mikko Arvas*1, Teemu Kivioja2, Alex Mitchell3, Markku Saloheimo1, David Ussery4, Merja Penttila1 and Stephen Oliver5

Address: 1VTT, Tietotie 2, Espoo, P.O. Box 1500, 02044 VTT, Finland, 2Biomedicum, P.O. Box 63 (Haartmaninkatu 8), FI-00014 University of Helsinki, Finland, 3EMBL Outstation – Hinxton, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK, 4Center for Biological Sequence Analysis, BioCentrum-DTU, The Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark and 5University of Manchester, Michael Smith Building, Oxford Road, Manchester, M13 9PT, UK

Email: Mikko Arvas* - [email protected]; Teemu Kivioja - [email protected]; Alex Mitchell - [email protected]; Markku Saloheimo - [email protected]; David Ussery - [email protected]; Merja Penttila - [email protected]; Stephen Oliver - [email protected]
* Corresponding author

Published: 17 September 2007
Received: 23 April 2007
Accepted: 17 September 2007

BMC Genomics 2007, 8:325 doi:10.1186/1471-2164-8-325
This article is available from: http://www.biomedcentral.com/1471-2164/8/325

© 2007 Arvas et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background: Several dozen fungi, encompassing traditional model organisms, industrial production organisms and human and plant pathogens, have been sequenced recently and their particular genomic features analysed in detail. In addition, comparative genomics has been used to analyse specific subgroups of fungi.
Notably, analysis of the phylum Saccharomycotina has revealed major events of evolution such as the recent genome duplication and subsequent gene loss. However, little has been done to gain a comprehensive comparative view of the fungal kingdom. We have carried out a computational genome-wide comparison of the protein coding gene content of Saccharomycotina and Pezizomycotina, which include industrially important yeasts and filamentous fungi, respectively.

Results: Our analysis shows that, based on genome redundancy, the traditional model organisms Saccharomyces cerevisiae and Neurospora crassa are exceptional among fungi. This can be explained by the recent genome duplication in S. cerevisiae and the repeat induced point mutation mechanism in N. crassa. Interestingly, in Pezizomycotina a subset of protein families related to plant biomass degradation and secondary metabolism are the only ones showing signs of recent expansion. In addition, Pezizomycotina have a wealth of phylum specific, poorly characterised genes with a wide variety of predicted functions. These genes are well conserved in Pezizomycotina, but show no signs of recent expansion. The genes found in all fungi except Saccharomycotina are slightly better characterised and predicted to encode mainly enzymes. The genes specific to Saccharomycotina are enriched in transcription and mitochondrion related functions. Mitochondrial ribosomal proteins in particular seem to have diverged from those of Pezizomycotina. In addition, we highlight several individual gene families with interesting phylogenetic distributions.

Conclusion: Our analysis predicts that all Pezizomycotina, unlike Saccharomycotina, can potentially produce a wide variety of secondary metabolites and secreted enzymes, and that the responsible gene families are likely to evolve fast. Both types of fungal products can be of commercial value or, on the other hand, cause harm to humans.
In addition, a great number of novel predicted and known enzymes are found in all fungi except Saccharomycotina. Therefore further studies and exploitation of fungal metabolism appear very promising.

Background

At least 36 fungal genomes are already available at public databases and several more are being sequenced. Of eukaryotes, Fungi is the kingdom with the most sequenced species. The sequenced genomes cover fungal species broadly and include industrially, medically and agriculturally important species with very diverse genomes. For these reasons fungal genomes form one of the most attractive eukaryotic datasets for comparative genomics research and method development. Moreover, the possibilities for in silico exploration of the diversity of biological functions in these organisms have been revolutionized.

The fungal kingdom is divided into Chytridiomycota, Zygomycota, Glomeromycota, Ascomycota, Basidiomycota and Deuteromycota. Most sequenced fungi belong to Ascomycota, which is in turn divided into Pezizomycotina, Saccharomycotina and Taphrinomycotina. Pezizomycotina, commonly referred to as filamentous fungi, grow mostly in a filamentous form and degrade biomass to free sugars to be used as a source of carbon and energy. In contrast, Saccharomycotina, commonly referred to as yeasts, grow mostly in a unicellular form. They generally have no capability to degrade biomass and accordingly live on free sugars.

Comparative genomics has been applied extensively within the phylum Saccharomycotina (for review see [1]), where a whole genome duplication followed by massive gene loss and specialization has been shown to have occurred [2]. This has been further confirmed by a broader analysis of differential genome evolution in Saccharomycotina species [3] that also demonstrated the usefulness of multi-species protein clustering.

Studies of Pezizomycotina have concentrated on descriptions of individual genomes or comparisons inside a class, such as the Eurotiales genomes of Aspergillus fumigatus [4], A. oryzae [5] and A. nidulans [6]. A. fumigatus and A. oryzae had previously been considered asexual organisms, but comparative analysis showed that their genomes have the same potential for sexual reproduction as A. nidulans. Sequencing of the Neurospora crassa genome [7] revealed the full effect of the mechanism called Repeat Induced Point mutations (RIP). RIP specifically mutates duplications that are longer than 400 bp and more than 80% identical, thus effectively preventing gene duplications and keeping the genome extremely non-redundant (for review see [8]).

Extensive comparative genomic databases are available for some groups of eukaryotic species, such as Ensembl for mammals [9]. Several comparative databases also exist for fungi, such as MIPS [10], CBS Genome Atlases [11] and e-Fungi [12], but none offers a level of detail comparable to Ensembl. However, high quality software to set up such a database is freely available from the Generic Genome Browser (GBrowse) [13] and Ensembl projects. Most importantly, these support interfaces to BioPerl [14], which currently offers the most extensive sequence analysis programming library.

Our goal is to explain differences in phenotypes of fungi through differences in their genotypes and to deduce evolutionary processes that have shaped fungi. To this end we have used TRIBE-MCL [15] protein clustering and InterProScan [16] protein domain analysis to compare genome open reading frame (ORF) contents of Pezizomycotina and Saccharomycotina species in a large data set of 33 fungal genomes. TRIBE-MCL divides the genomes comprehensively into protein clusters, while InterProScan detects known protein domains. We have also set up a GBrowse based comparative genomics web interface that enables easy browsing of our data. We detected protein clusters specific to Pezizomycotina, to fungi other than Saccharomycotina, and to Saccharomycotina, and characterised them with Funcat gene classifications [17-19] and Protfun protein function predictions [20].

We found that in Pezizomycotina protein clusters related to plant biomass degradation and secondary metabolism are the only ones with multiple proteins in each species. Pezizomycotina also have an interesting expansion of a transcription factor protein family and a wealth of enzymes and subsequent metabolic capabilities not found in Saccharomycotina. In contrast, Saccharomycotina specific protein clusters are enriched in transcription and mitochondrion related functions.

Results

Overview of clustering results

To compare the ORF content of Pezizomycotina and Saccharomycotina we selected a set of 33 species shown in Table 1. Fourteen of them are Pezizomycotina and 13 are Saccharomycotina. In addition, one Archeascomycota, four Basidiomycota and one Zygomycota were included as reference species outside the two phyla.

To compare the ORF content systematically, we first divided all the ORFs into groups related by sequence similarity. For this we used the protein clustering software TRIBE-MCL [15]. It uses a graph clustering algorithm to cluster proteins based on the results of an all-against-all BLASTP [21]. The results have been shown to correlate well with manual family classifications [15]. The sizes of TRIBE-MCL clusters are influenced by the inflation value parameter (r), which is limited to values between 1.1 and 4.5. Consequently, it also affects the sensitivity, specificity, total count of clusters and count of orphan clusters (i.e.
Table 1: Genome data
Phylum Subphylum Name Abbreviation