Fungal Intron Evolution: Why a Small Genome Has Many Introns? Kemin Zhou, Alan Kuo, Asaf Salamov, and Igor Grigoriev
Introduction

Here we ask why Sporobolomyces roseus, which has one of the smallest genomes, has one of the highest intron numbers of all fungal genomes, and we address the question in the context of fungal intron evolution. In this study we used a statistical comparative genomics approach to examine intron number evolution among 16 fungal genomes.

Table 1. Fungal genomes used in this study.

Phylum           Database  Species
Ascomycota       Aspni1    Aspergillus niger
Ascomycota       Mycfi1    Mycosphaerella fijiensis
Ascomycota       Mycgr1    Mycosphaerella graminicola
Ascomycota       Necha2    Nectria haematococca
Ascomycota       Picst3    Pichia stipitis
Ascomycota       Trire2    Trichoderma reesei
Ascomycota       Trive1    Trichoderma virens
Basidiomycota    copci1    Coprinus cinereus
Basidiomycota    cryneo1   Cryptococcus neoformans
Basidiomycota    Lacbi1    Laccaria bicolor
Basidiomycota    Phchr1    Phanerochaete chrysosporium
Basidiomycota    Pospl1    Postia placenta
Basidiomycota    Sporo1    Sporobolomyces roseus
Basidiomycota    ustma1    Ustilago maydis
Zygomycota       Phybl1    Phycomyces blakesleeanus
Chytridiomycota  Batde5    Batrachochytrium dendrobatidis

Figure 1. Whole genome phylogenetic tree and intron gain/loss estimates with the Linear Least Square (LLS) method. The average number of coding exons from GCAS is labeled next to each database name (used as an abbreviation of the species name). Bootstrap values are all 100% except for the two values shown in light blue boxes. Each value on a branch represents the estimated intron gain or loss. The major phyla and subphyla are labeled. The number in the circle is the estimated number of coding exons of the ancestor of fungi.
[Figure not reproduced. Labeled taxa: Basidiomycota, Agaricomycotina, Agaricomycetes, Pucciniomycotina, Ustilaginomycotina, Ascomycota, Saccharomycotina, Pezizomycotina, Dothideomycetes, Sordariomycetes, Eurotiomycetes, Zygomycota, Chytridiomycota.]

No intron loss for S. roseus (Sporo1)

Figure 2. Intron relative location distribution. A regression line was drawn with the data, excluding the extreme values at both ends. The dips at both ends are due to edge effects. Data are grouped for every 1%.
[Figure not reproduced. One panel per genome; x-axis: Percent Relative Location from 5'-End; y-axis: Count.]

Figure 3. Estimating the number of exons in the ancestor of fungi with relative intron location.
[Figure not reproduced. x-axis: Mean Relative Intron Location; y-axis: Mean Number of Exons.]

Exon number reduction half loss rule. S. roseus is an exception

Table 2. Intron evolution within genomes. Coding exon numbers were compared between genes of different conservation levels. The genes were divided into four conservation groups: all, genes conserved in all genomes (GCAS); between, genes conserved between different phyla (GCBP); phylum, genes conserved within the same phylum (GCWP); and species, species-specific genes (SSG). On the poster, the t-test p-values are colored red if less than 10e-4, pink if less than 10e-3, yellow if less than 10e-2, and green if less than 0.05.

dbname   all   p-val     between  p-val     phylum  p-val     species
Aspni1   3.76  8.61E-08  3.31     3.24E-09  3.02    0.002406  2.83
Batde5   5.90  0.000305  5.18     0.224571  4.86    8.79E-07  3.57
copci1   7.32  0.000322  6.61     8.94E-18  5.65    4.17E-34  4.38
cryneo1  7.29  0.000356  6.69     0.171129  6.37    1.32E-06  5.18
Lacbi1   7.89  2.61E-07  6.84     2.97E-10  6.11    1.43E-35  4.83
Mycfi1   2.49  0.5253    2.44     0.065689  2.52    5.96E-09  2.23
Mycgr1   2.48  0.087217  2.59     0.000774  2.75    0.00569   2.91
Necha2   3.33  0.001856  3.09     0.002464  2.97    0.00023   3.14
Phchr1   7.08  1.40E-05  6.32     2.41E-22  5.18    5.14E-12  4.34
Phybl1   6.18  0.002574  5.68     0.003718  6.54    2.89E-14  4.21
Picst3   1.44  0.425451  1.41     0.510146  1.44    0.098549  1.54
Pospl1   6.90  0.31744   6.69     0.983844  6.68    1.28E-06  5.92
Sporo1   7.21  0.677986  7.29     0.519538  7.48    0.405048  7.21
Trire2   3.31  3.97E-05  2.99     0.001305  2.85    0.046557  3.06
Trive1   3.35  5.51E-06  2.99     0.000228  2.84    0.045978  2.95
ustma1   1.67  0.846634  1.69     0.778802  1.67    0.003047  1.90

Figure 4. Half loss rule. The figure shows the linear relationship between the average number of exons in species-specific genes (SSG) and that in genes conserved in all species (GCAS). Sporo1 is an exception, although the correlation remains statistically significant when it is included (p-value 9.996e-06).
[Figure not reproduced. x-axis: Exon Number Conserved in All; y-axis: Species-specific Exon Number. In-plot annotation: y = 0.503 x + 1.172; No Sporo1, p-val=8.196e-07.]

Most frequent and the shortest exon length and evidence of intron loss

Figure 8. The shortest most frequent exon length. Top half: mean exon length as a function of the number of introns. Setting x to 60-70 in the fitted equation gives an estimated exon length of 66-86 nt. The bottom half simply plots the total exon length against the number of introns.
[Figure not reproduced. x-axis: Number of Introns; y-axes: Average Exon Length (top), Total Exon Length (bottom). The fitted equation is only partly legible in the extracted text: "−0.7812x" and "L = 1060.1e −1.8961x +198.5". The bottom panel carries the relation G = L(x + 1).]

Figure 9. Exon length distribution. Exons shorter than 500 nt from all 16 genomes are plotted by exon phase. Exon phase is defined as the remainder of the exon length divided by 3. The total numbers of exons (all sizes) in each phase are shown in the legend. Phase 0 exons dominate.
[Figure not reproduced. x-axis: Exon Length; y-axis: Count. Legend: phase 0, 280067; phase 1, 206910; phase 2, 206494.]

Reverse transcriptase has divergent effects on exon number

Figure 13. Relationship between the average number of exons and the amount of reverse transcriptase (RT). At the species level, there is a positive correlation for 10 out of 16 genomes. At more conserved levels there is a negative correlation.
[Figure not reproduced. One panel per genome, with series for all, between, phylum, and species; x-axis: log (Total Number of RT); y-axis: Average Number of Exons.]

Intron length variability

Intron lengths in fungi are roughly log-normally distributed, so our analysis was carried out on a log scale. The average intron length from GCAS ranges from 56 to 109 nt, while that from SSG ranges from 58 to 131 nt. Our analysis clearly showed an overall trend in most genomes for less conserved genes to have longer introns, but less conserved genes have shorter introns in P. stipitis and U. maydis, the only two genomes where intron loss has

Table 3. Conserved genes have shorter introns. The average of the log intron length (ALIL) was compared between conservation levels all and species. Only Aspni1 showed no significant difference. The two genomes with the most intron loss, ustma1 and Picst3, showed the opposite trend. Column diffexp is the difference between the natural exponentials of the two ALIL values (species minus all), in nt.

dbname   All    Species  diffexp  P-value
Aspni1   4.277  4.279    0.2      0.88794301
Batde5   4.605  4.653    5.0      1.47E-05
copci1   4.130  4.208    5.0      4.93E-22
cryneo1  4.083  4.175    5.7      4.99E-30
Lacbi1   4.035  4.319    18.6     0
Mycfi1   4.288  4.776    45.8     1.82E-81
Mycgr1   4.283  4.878    58.9     7.27E-137
Necha2   4.180  4.235    3.7      9.06E-05
Phchr1   4.026  4.069    2.5      8.99E-08
Phybl1   4.584  4.653    7.0      2.87E-12
Picst3   4.428  4.251    -13.6    0.01634488
Pospl1   4.200  4.507    23.9     5.11E-129
Sporo1   4.412  4.529    10.2     7.64E-31
Trire2   4.460  4.527    6.0      0.00102148
Trive1   4.361  4.450    7.3      5.26E-08
ustma1   4.689  4.506    -18.2    0.0001481

Summary and Discussion

Why S. roseus has the smallest genome

No intron loss detected by phylogenetic tree method, or very few by the relative intron
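The half loss rule of Figure 4 can be checked directly against the Table 2 values. The following is a minimal sketch, assuming that the x-axis of Figure 4 corresponds to the "all" (GCAS) column and the y-axis to the "species" (SSG) column of Table 2. The function names are illustrative and do not come from the poster; the slope and intercept are the ones printed in the figure (y = 0.503 x + 1.172, fitted without Sporo1).

```python
# Mean coding-exon numbers transcribed from Table 2: genome -> (GCAS, SSG).
EXONS = {
    "Aspni1": (3.76, 2.83), "Batde5": (5.90, 3.57), "copci1": (7.32, 4.38),
    "cryneo1": (7.29, 5.18), "Lacbi1": (7.89, 4.83), "Mycfi1": (2.49, 2.23),
    "Mycgr1": (2.48, 2.91), "Necha2": (3.33, 3.14), "Phchr1": (7.08, 4.34),
    "Phybl1": (6.18, 4.21), "Picst3": (1.44, 1.54), "Pospl1": (6.90, 5.92),
    "Sporo1": (7.21, 7.21), "Trire2": (3.31, 3.06), "Trive1": (3.35, 2.95),
    "ustma1": (1.67, 1.90),
}

# Regression line annotated in Figure 4 (assumption: x = GCAS, y = SSG).
SLOPE, INTERCEPT = 0.503, 1.172


def predicted_ssg(gcas):
    """SSG exon number expected from the Figure 4 line for a given GCAS value."""
    return SLOPE * gcas + INTERCEPT


def residuals(table):
    """Observed SSG minus the SSG predicted by the line, per genome."""
    return {name: ssg - predicted_ssg(gcas) for name, (gcas, ssg) in table.items()}


if __name__ == "__main__":
    # List genomes from the largest positive deviation to the largest negative one.
    for name, res in sorted(residuals(EXONS).items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:8s} {res:+.2f}")
```

With these values, Sporo1 lies about 2.4 exons above the line, the largest deviation among the 16 genomes, which is consistent with the poster describing it as the exception to the rule.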
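The log-scale intron length comparison in Table 3 can be translated back to nucleotides. The sketch below assumes that ALIL is a natural-log mean, so that exp(ALIL) is the geometric mean intron length in nt. Under that assumption, the tabulated diffexp values are reproduced by exp(ALIL species) minus exp(ALIL all). The function names are illustrative and not taken from the poster.

```python
import math

# Transcribed from Table 3: genome -> (ALIL all, ALIL species, diffexp as printed).
ALIL = {
    "Aspni1": (4.277, 4.279, 0.2), "Batde5": (4.605, 4.653, 5.0),
    "copci1": (4.130, 4.208, 5.0), "cryneo1": (4.083, 4.175, 5.7),
    "Lacbi1": (4.035, 4.319, 18.6), "Mycfi1": (4.288, 4.776, 45.8),
    "Mycgr1": (4.283, 4.878, 58.9), "Necha2": (4.180, 4.235, 3.7),
    "Phchr1": (4.026, 4.069, 2.5), "Phybl1": (4.584, 4.653, 7.0),
    "Picst3": (4.428, 4.251, -13.6), "Pospl1": (4.200, 4.507, 23.9),
    "Sporo1": (4.412, 4.529, 10.2), "Trire2": (4.460, 4.527, 6.0),
    "Trive1": (4.361, 4.450, 7.3), "ustma1": (4.689, 4.506, -18.2),
}


def geometric_mean_nt(alil):
    """Back-transform an average log intron length to nucleotides."""
    return math.exp(alil)


def diff_nt(alil_all, alil_species):
    """Difference of back-transformed means (species minus all), in nt."""
    return geometric_mean_nt(alil_species) - geometric_mean_nt(alil_all)


if __name__ == "__main__":
    for name, (a, s, printed) in ALIL.items():
        print(f"{name:8s} all={geometric_mean_nt(a):6.1f} nt  "
              f"species={geometric_mean_nt(s):6.1f} nt  "
              f"diff={diff_nt(a, s):+6.1f}  (table: {printed:+.1f})")
```

The back-transformed values also agree with the ranges quoted in the text: roughly 56 to 109 nt for GCAS and 58 to 131 nt for SSG. Only Picst3 and ustma1 give a negative difference.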