The Genome Sequence of the Soft-Rot Fungus Penicillium

bioRxiv preprint doi: https://doi.org/10.1101/197368; this version posted October 2, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 1 The genome sequence of the soft-rot fungus Penicillium 2 purpurogenum reveals a high gene dosage for lignocellulolytic 3 enzymes 4 5 Wladimir Mardonesa,*, Alex Di Genovab,c,*, María Paz Cortésb,c, Dante Travisanyb, 6 c, Alejandro Maassb,c,d, Jaime Eyzaguirrea,+. 7 8 a Facultad de Ciencias Biológicas, Universidad Andrés Bello, Av. República 217, 9 Santiago, Chile. 10 b Fondap Center for Genome Regulation, Av. Blanco Encalada 2085, 3rd floor, 11 Santiago, Chile. 12 c Center for Mathematical Modeling, University of Chile, Av. Beauchef 851, 7th 13 floor, Santiago, Chile. 14 d Department of Mathematical Engineering, University of Chile, Av. Beauchef 15 851, 5th floor, Santiago, Chile. 16 17 * These authors contributed equally to this work. 18 19 + To whom correspondence should be addressed: Telephone: (562) 26618070. 20 FAX: (562) 26980414. E-mail: [email protected] 21 22 1 bioRxiv preprint doi: https://doi.org/10.1101/197368; this version posted October 2, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 23 Abstract 24 The high lignocellulolytic activity displayed by the soft-rot fungus P. purpurogenum 25 has made it a target for the study of novel lignocellulolytic enzymes. We have 26 obtained a reference genome of 36.2Mb of non-redundant sequence (11,057 27 protein-coding genes). The 49 largest scaffolds cover 90% of the assembly, and 28 CEGMA analysis reveals that our assembly covers most if not all all protein-coding 29 genes. RNASeq was performed and 93.1% of the reads aligned within the 30 assembled genome. These data, plus the independent sequencing of a set of 31 genes of lignocellulose-degrading enzymes, validate the quality of the genome 32 sequence. P. purpurogenum shows a higher number of proteins with CAZy motifs, 33 transcription factors and transporters as compared to other sequenced Penicillia. 34 These results demonstrate the great potential for lignocellulolytic activity of this 35 fungus and the possible use of its enzymes in related industrial applications. 36 37 Keywords: Penicillium purpurogenum; genome sequencing; lignocellulose 38 biodegradation; CAZymes; RNASeq, Illumina 39 2 bioRxiv preprint doi: https://doi.org/10.1101/197368; this version posted October 2, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 40 41 1. Introduction 42 Lignocellulose is by far the most abundant renewable resource on earth, and it 43 represents a highly valuable source of raw material for different industrial 44 processes, among them the production of second-generation biofuels such as 45 bioethanol (Ragauskas et al. 2006). Lignocellulose contains several 46 polysaccharides, mainly cellulose, hemicelluloses and pectin. Hydrolysis of these 47 polysaccharides can be achieved chemically or enzymatically, the second method 48 having the advantage of being environmentally friendly and free of potentially 49 poisonous side-products (Blanch 2012). The saccharification process to obtain the 50 monosaccharide components requires a variety of enzymes particularly due to the 51 complex structure of the hemicelluloses and pectin. These enzymes are secreted 52 by a number of bacteria and fungi (Dehority 1962; Kang et al. 2004). A thorough 53 understanding of the lignocellulose-degrading systems is important in order to 54 prepare enzyme cocktails that can efficiently degrade a variety of lignocellulose 55 raw materials of different chemical composition (Rosenberg 1978). Although the 56 study of individual enzymes (purification and characterization) may give valuable 57 information on their properties, it is a rather time-consuming process. The recent 58 developments in the “omics” sciences and bioinformatics have provided novel 59 techniques for the analysis of genomes, transcriptomes and secretomes of cells 60 and organisms, allowing a rapid identification and characterization of sets of 61 enzymes (Conn 2003). 3 bioRxiv preprint doi: https://doi.org/10.1101/197368; this version posted October 2, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 62 Fungi are the preferred sources of lignocellulolytic enzymes. Among them, 63 the genera Trichoderma (Merino and Cherry 2007) and Aspergillus (Kang et al. 64 2004) have been the subject of detailed studies and several of their enzymes have 65 been used for industrial applications. Less well known are the enzymes from 66 Penicillia. However, several members of this genus have been reported to be good 67 producers of cellulases and xylanases. A strain of Penicillium decumbens (Liu et 68 al. 2013) has been used for industrial-scale cellulase production in China since 69 1996, and Gusakov and Sinitsyn (2012) have reviewed the production of cellulases 70 by a set of Pencillium species, some of whose enzymes show superior 71 performance to those of the better-known Trichoderma. 72 Our laboratory has used as a model for the study of lignocellulolytic 73 enzymes a locally isolated strain of Penicillium (Musalem et al. 1984), which has 74 been registered in the ATCC as MYA-38. This soft-rot fungus grows on a variety of 75 lignocellulolytic natural carbon sources (i.e. sugar beet pulp, corncob, etc.) (Steiner 76 et al. 1994). It secretes to the medium a large number of cellulose- and xylan- 77 degrading enzymes, some of which have been characterized and sequenced 78 (Chávez et al. 2006; Ravanal et al. 2010), evidencing a high and plastic 79 lignocellulolytic enzyme activity, which may potentially have industrial applications 80 in raw material processing. 81 In this article we provide a genome sequence for this Penicillium generated 82 using a joint approach involving second and third generation sequencing 83 technologies (Illumina (Bentley et al. 2008) and PacBio (Eid et al. 2009) 84 respectively), together with annotated gene models for this fungus, analyzing novel 85 genes focused on lignocellulolytic activity. This fungus was found, among the 4 bioRxiv preprint doi: https://doi.org/10.1101/197368; this version posted October 2, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 86 Penicillia analyzed, to have the highest number of CAZymes. An RNAseq analysis 87 of the fungus grown on sugar beet pulp, and the independent sequencing of a set 88 of genes of lignocellulose-degrading enzymes confirm the quality of the gene 89 models. Due to the high number of CAZymes identified, we propose that this 90 Penicillium strain is a powerful source of enzymes for the industrial lignocellulose 91 biodegradation process. 92 93 2. Materials and methods 94 2.1 Fungal strain and culture conditions 95 Penicillium purpurogenum ATCC strain MYA-38 was grown in Mandel's medium as 96 described previously (Hidalgo et al. 1992). Liquid cultures were grown for 4 days at 97 28°C in an orbital shaker (200 rpm) using 1% glucose or sugar beet pulp as carbon 98 sources. Fungus grown on glucose was used for DNA isolation and genome 99 sequencing; cultures grown on sugar beet pulp were utilized for the transcriptome 100 analysis. 101 102 2.2 Genome sequencing and assembly 103 For genomic DNA preparation, the fungus grown on glucose was filtered and 104 frozen with liquid nitrogen in a mortar, and then powdered with a pestle. The 105 genomic DNA was purified using the Genomic DNA Purification Kit (Thermo 106 Fischer Scientific, USA) according to the manufacturer’s instructions. 5 bioRxiv preprint doi: https://doi.org/10.1101/197368; this version posted October 2, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 107 Genome sequencing was carried out in two stages and with two different 108 technologies. First, Illumina sequencing was performed on a HiSeq2000 instrument 109 utilizing two genomic libraries: 180bp (paired-end) and 2Kb (mate-pair) insert sizes, 110 with 100 bp read length. Second, PacBio sequencing technology was used to 111 produce long reads (over 1Kb). 112 The genome assembly process was carried out in two steps. First, both 113 Illumina libraries were assembled with ALLPAHTS-LG software (version r43019) 114 (Gnerre et al. 2011) using a 200X coverage in order to build high quality contigs 115 and scaffolds. Second, the Illumina assembly was scaffolded with the long PacBio 116 reads using SSPACE-Long-Reads version 1.1 (Boetzer and Pirovano 2014). 117 Finally, the assembly process was validated using CEGMA (Parra et al. 118 2007), by aligning 248 highly conserved eukaryotic proteins to the resulting 119 scaffolds. Since the CEGMA proteins are highly conserved, alignment methods 120 can identify their exon-intron structures on the assembled genome, thus allowing 121 estimation of the completeness of the assembled genome in terms of gene 122 numbers.

The Genome Sequence of the Soft-Rot Fungus Penicillium

Peach Brown Rot: Still in Search of an Ideal Management Option

Phylogeny and Nomenclature of the Genus Talaromyces and Taxa Accommodated in Penicillium Subgenus Biverticillium

Diversity and Extracellular Enzymes of Endophytic Fungi Associated with Cymbidium Aloifolium L

Phylogeny and Nomenclature of the Genus Talaromyces and Taxa Accommodated in Penicillium Subgenus Biverticillium

Bioactive Compounds Produced by Strains of Penicillium and Talaromyces of Marine Origin

New Insights in the Study of Strawberry Fungal Pathogens

(Oryza Sativa L.) and Defense Against Nematodes Njira

Aspergillus and Penicillium Species

An Online Resource for Marine Fungi

Aspergillus, Penicillium and Related Species Reported from Turkey

Diversity, Ecological Role and Biotechnological Potential of Antarctic Marine Fungi

The UK National Culture Collection (UKNCC) Biological Resource: Properties, Maintenance and Management