Pan-Genome Analyses of Model Fungal Species
Total Page:16
File Type:pdf, Size:1020Kb
RESEARCH ARTICLE McCarthy and Fitzpatrick, Microbial Genomics 2019;5 DOI 10.1099/mgen.0.000243 Pan-genome analyses of model fungal species Charley G. P. McCarthy1,2 and David A. Fitzpatrick1,2,* Abstract The concept of the species ‘pan-genome’, the union of ‘core’ conserved genes and all ‘accessory’ non-conserved genes across all strains of a species, was first proposed in prokaryotes to account for intraspecific variability. Species pan- genomes have been extensively studied in prokaryotes, but evidence of species pan-genomes has also been demonstrated in eukaryotes such as plants and fungi. Using a previously published methodology based on sequence homology and conserved microsynteny, in addition to bespoke pipelines, we have investigated the pan-genomes of four model fungal species: Saccharomyces cerevisiae, Candida albicans, Cryptococcus neoformans var. grubii and Aspergillus fumigatus. Between 80 and 90 % of gene models per strain in each of these species are core genes that are highly conserved across all strains of that species, many of which are involved in housekeeping and conserved survival processes. In many of these species, the remaining ‘accessory’ gene models are clustered within subterminal regions and may be involved in pathogenesis and antimicrobial resistance. Analysis of the ancestry of species core and accessory genomes suggests that fungal pan-genomes evolve by strain-level innovations such as gene duplication as opposed to wide-scale horizontal gene transfer. Our findings lend further supporting evidence to the existence of species pan-genomes in eukaryote taxa. DATA SUMMARY of a species (Fig. 1). The pan-genome of a species is then usually subdivided into two components. (i) The ‘core’ All the genomic sequence data has been previously genome, containing genes conserved across all observed uploaded to the National Center for Biotechnology Information (NCBI) GenBank, and links to relevant genomes from a species. These genes are usually, but not always, essential for the viability of an individual organism articles or NCBI BioProject pages are included in ‘ ’ ‘ ’ Table S1 (available with the online version of this arti- [2]. (ii) The accessory or dispensable genome, containing cle). Gene model prediction and post-processing pan- genes specific to sets of isolate genomes or individual isolate genome analysis pipelines are available from https:// genomes within a species. These genes could influence phe- github.com/chmccarthy/pangenome-pipelines. notypic differences between isolates; for example, in bacter- ia, antibiotic-resistant and antibiotic-susceptible isolates of the same species may have different accessory genomes [2]. INTRODUCTION A species’ pan-genome can evolve as a consequence of life- Many fields of eukaryote functional and comparative geno- style: sympatric species may have large pan-genomes (and mics rely on the use of curated reference genomes intended thus a large degree of intraspecific variation), while environ- to be broadly representative of a given species. Regardless of mentally isolated or highly specialized species have smaller their quality, reference genomes do not and cannot contain pan-genomes [2–5]. The existence of a species pan-genome all genetic information for a species due to genetic and in prokaryotes was first demonstrated across eight patho- genomic variation between individuals within a species [1]. genic strains of Streptococcus agalactiae in 2005 [6], and was To account for such variation, it has become increasingly quickly confirmed by similar analysis of exemplar bacteria common to refer to species with multiple genomes and archaea, including Haemophilus influenzae, Escherichia sequenced in terms of their ‘pan-genome’, which is defined coli and Sulfolobus islandicus [7–11]. Over 40 prokaryote as the union of all genes observed across all isolates/strains species had their pan-genomes described in the literature by Received 5 September 2018; Accepted 23 November 2018; Published 4 February 2019 Author affiliations: 1Genome Evolution Laboratory, Department of Biology, Maynooth University, Maynooth, Co. Kildare, Ireland; 2Human Health Research Institute, Maynooth University, Maynooth, Co. Kildare, Ireland. *Correspondence: David A. Fitzpatrick, [email protected] Keywords: fungal pangenomes; comparative genomics; yeast; Aspergillus; Cryptococcus. Abbreviations: 100GS, 100-genomes strains; BGC, biosynthetic gene cluster; CGN, conserved genomic neighbourhood; DP, dispensable pathway; GO, gene ontology; HGT, horizontal gene transfer; NCBI, National Center for Biotechnology Information; SGD, Saccharomyces Genome Database; UPR, unfolded protein response. Data statement: All supporting data, code and protocols have been provided within the article or through supplementary data files. Five supplementary tables and seven supplementary figures are available with the online version of this article. 000243 ã 2019 The Authors This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. 1 McCarthy and Fitzpatrick, Microbial Genomics 2019;5 2013 [2]. Many tools for pan-genome analysis have been published in recent years, which utilize methods such as IMPACT STATEMENT whole-genome alignment, read mapping, clustering algo- Recent prokaryotic genomic studies of multiple individu- – rithms or de Bruijn graph construction [12 16]. als from the same species has uncovered large differen- Although the concept of the species pan-genome is well- ces in the gene content between individuals. It has established in comparative prokaryote genomics, it has only become increasingly common to refer to species with recently been extended to comparative intraspecific studies multiple genomes sequenced in terms of their ‘pan- of eukaryotes. This is despite repeated observation of intra- genome’. The pan-genome is the union of ‘core’ con- specific genomic content variation in eukaryotes dating served genes and all ‘accessory’ non-conserved genes back to the first intraspecific comparative analyses of Sac- across all strains of a species. Species pan-genomes charomyces cerevisiae genomes in the mid-2000s [17–20]. have been analysed in many prokaryotic species, but The relative dearth of eukaryotic pan-genome analysis in have been recently demonstrated in eukaryotes such as the literature is due in part to the relative difficulty of plants and fungi as well. Here, we have investigated the sequencing and analysing large eukaryotic genome datasets pan-genomes of four model fungal species namely, Sac- relative to prokaryotes [21]. Additionally, while horizontal charomyces cerevisiae, Candida albicans, Cryptococcus gene transfer (HGT) is thought to be the driving influence neoformans var. grubii and Aspergillus fumigatus. Each in prokaryotic gene family and pan-genome evolution, species is a model organism for fungal evolutionary biol- HGT occurs in far lower rates in eukaryotes and is more dif- ogy, genomics and comparative genomics. Our results ficult to detect [22–26]. Despite these challenges, there have show that between 80 and 90 % of gene models per been a number of recent studies of intraspecific variation strain are core genes that are highly conserved, many of within diverse eukaryote taxa that show strong evidence for which are involved in housekeeping and conserved sur- the existence of a eukaryotic pan-genome in some form. For vival processes. The remaining accessory gene models example, comparative analysis of nine diverse cultivars of are clustered within subterminal regions, and may be ~ Brassica oleracea found that 19 % of all genes analysed involved in pathogenesis and antimicrobial resistance. ~ were part of the B. oleracea accessory genome, with 2 % of Analysis of the ancestry of species core and accessory these being cultivar-specific [27]. A similar comparison of genomes suggests that fungal pan-genomes evolve by seven geographically diverse wild soybean (Glycine soja) strain-level innovations such as gene duplication as strains found approximately the same 80 : 20 proportion of opposed to wide-scale horizontal gene transfer. Our find- core to accessory gene content within the wild soybean pan- ings lend further supporting evidence to the existence of genome, while larger accessory genome sizes have been species pan-genomes in eukaryote taxa. reported in wheat, maize, grasses and Medicago [28–32]. Individual strains of the coccolitophore Emiliania huxleyi have an accessory complement of up to 30 % of their total within the Z. tritici species pan-genome, which is thought to gene content, which varies with geographical location [33]. arise from the species’ own genome defence mechanisms In fungi, a number of studies of the Saccharomyces cerevi- inducing polymorphisms as opposed to gene duplication siae pan-genome, including a recent large-scale analysis of events [37]. Peter et al. [36] observed a large proportion of genome evolution across 1011 strains, have shown evidence accessory genes within Saccharomyces cerevisiae appear to for an accessory genome of varying size, as well as large var- have arisen via introgression from closely related Saccharo- iation in subterminal regions across multiple Saccharomyces myces species, with a smaller number originating from HGT cerevisiae strains [13, 34–36], and recent analysis of the events with other yeasts [36]. Zymoseptoria tritici pan-genome found that up to 40 % of