Okagaki et al. BMC Genomics (2016) 17:135
DOI 10.1186/s12864-016-2491-y

RESEARCH ARTICLE Open Access

Comparative genome analysis and genome evolution of members of the Magnaporthaceae family of fungi

Laura H. Okagaki1,3, Joshua K. Sailsbery1,2, Alexander W. Eyre1,3 and Ralph A. Dean1,3*

Abstract

Background: Magnaporthaceae, a family of ascomycetes, includes three fungi of great economic importance that cause disease in cereal and turf grasses: Magnaporthe oryzae (rice blast), Gaeumannomyces graminis var. tritici (take-all disease), and Magnaporthe poae (summer patch disease). Recently, the sequenced and assembled genomes for these three fungi were reported. Here, the genomes were compared for orthologous genes in order to identify genes that are unique to the Magnaporthaceae family of fungi. In addition, ortholog clustering was used to identify a core proteome for the Magnaporthaceae, which was examined for diversifying and purifying selection and evidence of two-speed genome evolution.

Results: A genome-scale comparative study was conducted across 74 fungal genomes to identify clusters of orthologous genes unique to the three Magnaporthaceae species as well as species-specific genes. We found 1149 clusters that were unique to the Magnaporthaceae family of fungi, with 295 of those containing genes from all three species. Gene clusters involved in metabolic and enzymatic activities were highly represented in the Magnaporthaceae-specific clusters. Transcriptional regulators were also highly represented in both the Magnaporthaceae-specific clusters and the species-specific genes. In addition, we examined the relationship between gene evolution and distance to repetitive elements found in the genome. We observed no correlation between diversifying or purifying selection and distance to repetitive elements, and no increased rate of evolution in secreted or small secreted proteins.
Conclusions: Taken together, these data show that at the genome level, there is no evidence to suggest multi-speed genome evolution or that proximity to repetitive elements plays a role in diversification of genes.

Keywords: Magnaporthaceae, Magnaporthe, Gaeumannomyces, Two-speed genome, Zig zag model, Comparative genomics, CAzymes, Transcription factors, Diversifying selection, Purifying selection

* Correspondence: [email protected]
1Center for Integrated Fungal Research, North Carolina State University, 851 Main campus Drive, Raleigh, NC 27606, USA
3Department of Plant Pathology, North Carolina State University, Raleigh, NC, USA
Full list of author information is available at the end of the article

© 2016 Okagaki et al. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Background

Genome comparison studies have become critical to understanding the evolutionary relationships between similar species. Genome sequencing and expression data have become more cost-effective and easier to generate, resulting in an increase in the number of available genomes for analysis. In mycology, many of the genomes are poorly annotated, resulting in a need for large-scale genome analysis to identify genes that have similar function. For pathogens, comparisons can help to find novel drug targets, mechanisms of infection, or common genes that might shed light on pathogenic and non-pathogenic lifestyles.

Homologs are genes that are shared among related organisms and can be used for genome comparisons. Homologs can fall into two different subclasses: orthologs and paralogs. Orthologs are derived from a common ancestor but usually diverge by speciation, resulting in retention of similar functions during evolution. In contrast, paralogs typically diverge after speciation, are the result of gene duplication events, and may or may not retain similar functions. Orthologs and paralogs can be useful tools in genome comparison studies because they can highlight genes shared among species that are important to conserved biological processes or can reveal those genes that are unique to a particular subset of fungi, such as families of fungi or fungi with a specific lifestyle. Several algorithms have been developed to study orthologs across species, but most are limited to comparisons between only two species. OrthoMCL [1] is an algorithm used for the identification of orthologs between multiple species. Developed by Li et al. [2], OrthoMCL uses multiple steps including BLASTp and Markov clustering in order to group genes into likely orthologous clusters. Using such algorithms, genes with similar functions as well as those genes unique to each species can be identified.

The Magnaporthaceae family of fungi contains several economically important plant pathogens. Among the pathogenic members of this family are Magnaporthe oryzae, Gaeumannomyces graminis var. tritici, and Magnaporthe poae. M. oryzae is known as the rice blast fungus and causes disease in rice, wheat, and barley following landing of conidia on the host plant leaf [3, 4]. Upon germination on the hydrophobic leaf surface, the formation of a specialized infection structure, the appressorium, is stimulated. The appressorium penetrates the leaf surface, allowing the fungus to invade and spread in the plant tissue. M. oryzae outbreaks have been known to devastate vast acreages of rice on a regular basis and are a major concern for global food security [4, 5]. More recently, M. oryzae has also been shown to cause disease on other cultivated grasses including barley and wheat, increasing its threat to the food supply [3, 6]. G. graminis var. tritici is the causative agent of take-all disease in wheat [3, 7]. Unlike M. oryzae, which targets the leaf of the plant, G. graminis var. tritici attacks the roots of wheat plants, resulting in root rot. Hyphae of the soil-borne fungus wrap around the root and invade the root structure, causing tissue necrosis and subsequent killing of the plant [3, 7]. M. poae, the causative agent of summer patch disease in turf grasses, acts in a similar manner to G. graminis var. tritici and attacks the roots of grasses, causing root rot and subsequent host plant death [3].

Identification of proteins that are involved with host-pathogen interactions has, until recently, relied on molecular biology techniques at the bench. For plant pathogens, several classes of proteins are frequent targets of further study including carbohydrate active

103 fungal proteomes were performed by Zhao et al. [9], and showed for M. oryzae, G. graminis var. tritici, and M. poae that GHs were the most abundant class. Targets of GHs include cellulose, glycans, glucans, and chitin, suggesting both plant and fungal targets for this enzyme class [8, 9].

Fungal effector proteins are secreted proteins, often less than 250 amino acids in length, which interact with host plant proteins in order to modulate the host immune system and promote infection [10, 11]. Effector proteins have been shown to be highly diversifying [3, 6, 11–24] and may be undergoing accelerated evolution. Studies in M. oryzae have shown that some effector proteins are undergoing high rates of diversification in order to evade the host immune response, suggesting that there is selection pressure by the host environment to rapidly accumulate non-synonymous mutations [3, 12, 15, 21, 22, 24, 25]. These data suggest that diversification of genes through mutation is one mechanism for fungi to evolve to escape plant recognition. This concept of two-speed genome evolution, where virulence genes evolve more rapidly than other genes, has implicated repetitive DNA elements, including retrotransposons, in the increased rate of evolution in effector proteins [12–22]. Together, CAzymes and small secreted proteins are critical to initial host-pathogen interactions that allow a fungal pathogen to degrade and enter host cells while modulating their response to invasion. With more recent advances in bioinformatics, both CAzymes and small secreted proteins of special interest can be identified and characterized prior to studying them at the bench.

The goal of this study was two-fold: first, to identify genes and gene clusters that are unique to the Magnaporthaceae family of fungi and may be involved in pathogenesis; and second, to identify a core proteome of conserved genes and functional clusters that are undergoing rapid diversification. First, the protein sequences from 74 fungal genomes, including the genomes of M. oryzae, G. graminis var. tritici, and M. poae, were chosen from the Broad Institute’s Fungal Genome Initiative [26] for OrthoMCL analysis. The genomes included plant and animal pathogens as well as model fungi, such as Saccharomyces cerevisiae. OrthoMCL clusters that contained only genes from the Magnaporthaceae family of fungi and unclustered genes that are species-specific were further analyzed. Gene Ontology annotation (GO annotation) [27], and InterProScan [28] protein domain identification