1 Evidence for suppression of immunity as a driver for genomic
2 introgressions and host range expansion in races of Albugo candida, a generalist
3 parasite
4 Mark McMullan(a,e)†, Anastasia Gardiner(a)†, Kate Bailey(a), Eric Kemen(c), Ben J.
5 Ward(b), Volkan Cevik(a), Alexandre Robert-Seilaniantz(a), Torsten Schultz-Larsen(a), Alexi
6 Balmuth(d), Eric Holub(f), Cock van Oosterhout(b)‡, Jonathan D. G. Jones(a)‡
7 a - The Sainsbury Laboratory, Norwich Research Park, Norwich, NR47UH, UK;
8 b - School of Environmental Sciences, University of East Anglia, Norwich Research
9 Park, Norwich NR4 7TJ, UK; c - Max Planck Research Group Fungal Biodiversity,
10 Max Planck Institute for Plant Breeding Research, Carl-von-Linne Weg 10, Cologne,
11 50829, Germany; d - J.R. Simplot Company, 5369 Irving Street, Boise, Idaho 83706,
12 USA; e - The Genome Analysis Centre, Norwich Research Park, Norwich, NR47UH,
13 UK; f - University of Warwick, School of Life Sciences, Warwick Crop Centre,
14 Wellesbourne, CV35 9EF, UK.
15
16 † Authors contributed equally to this work.
17 ‡ Corresponding Authors: Jonathan D. G. Jones, The Sainsbury Laboratory,
18 Norwich Research Park, Norwich, NR47UH, UK; tel +44(0)1603450400;
19 [email protected] and Cock van Oosterhout, School of Environmental
20 Sciences, University of East Anglia, Norwich Research Park, Norwich NR47TJ, UK;
21 tel. +44(0)1603592921; [email protected]
22 Competing interests. The authors declare that they have no competing financial,
23 professional or personal interests that might have influenced the performance or
24 presentation of the work described in this manuscript.
25
26
27 Abstract
28 How generalist parasites with wide host ranges can evolve is a central question in
29 parasite evolution. Albugo candida is an obligate biotrophic parasite that consists of
30 many physiological races that each specialize on distinct Brassicaceae host species. By
31 analyzing genome sequence assemblies of five isolates, we show they represent three
32 races that are genetically diverged by ~1%. Despite this divergence, their genomes are
33 mosaic-like, with ~25% being introgressed from other races. Sequential infection
34 experiments show that infection by adapted races enables subsequent infection of
35 hosts by normally non-infecting races. This facilitates introgression and the exchange
36 of effector repertoires, and may enable the evolution of novel races that can undergo
37 clonal population expansion on new hosts. We discuss recent studies on hybridization
38 in other eukaryotes such as yeast, Heliconius butterflies, Darwin’s finches, sunflowers
39 and cichlid fishes, and the implications of introgression for pathogen evolution in an
40 agro-ecological environment.
41 Introduction
42 Most parasites have a restricted host range and are often unable to exploit even closely
43 related hosts (Thompson 2005; Poulin & Keeney 2014). Compared to necrotrophs that
44 reproduce on dead plant material, obligate biotrophic parasites can only reproduce on
45 living tissue, and thus are intimately associated with their hosts (Thines 2014). This
46 might be expected to result in host specialization. The adaptive evolution of, for
47 example, new effectors that enable more efficient exploitation of one host species,
48 increases the risk of detection in other host species by triggering their immune system
49 (Martin & Kamoun 2012). Due to this trade-off, the saying “Jack of all trades, master
50 of none” is especially true for obligate biotrophic parasites, because natural selection
51 that maximises parasite fitness on one particular host species might lead to
52 specialisation and reduced host range (e.g. Dong et al. 2014). Yet there are generalist
53 biotrophic parasites that appear to have overcome this evolutionary dilemma and show
54 virulence on diverse hosts.
55 Some “generalist” parasite species have solved the dilemma by evolving
56 multiple specialised races, each of which infects different hosts. For example, the
57 eukaryotic oomycete order Albuginales consists entirely of obligate biotrophic
58 pathogens that cause disease on a broad range of plant hosts (Biga 1955; Choi & Priest
59 1995; Walker & Priest 2007). Its largest genus, Albugo, comprises ~50 (usually)
60 specialist pathogens (Choi et al. 2009; Thines et al. 2009; Ploch et al. 2010; Choi et al.
61 2011), yet A. candida (Pers.) Roussel can infect over 200 species of plants in 63
62 genera from the families of Brassicaceae, Cleomaceae and Capparaceae (Saharan &
63 Verma 1992; Choi et al. 2009). A. candida is the causal agent of “white
64 blister rust” disease, which causes 9–60% losses in economically important oilseed and
65 vegetable brassica crops (Saharan & Verma 1992; Harper & Pittman 1974; Meena et
66 al. 2002; Barbetti & Carter 1986). A. candida consists of different physiological races
67 that each usually show high host specificity (Hiura 1930; Pound & Williams 1963;
68 Petrie 1988). Approximately 24 races of A. candida have been defined, based primarily on their host
69 range (Saharan & Verma 1992).
70 A. candida is a diploid organism that reproduces both asexually and
71 sexually (Holub et al. 1995), but the relative importance of both reproductive modes
72 is not well established. During asexual reproduction, diploid zoospores are formed in a
73 special propagule (the zoosporangium) on a thallus of hyphae beneath the leaf
74 epidermis. White blisters comprising large numbers of dehydrated sporangia
75 eventually rupture the epidermis to release inoculum for dispersal. When reproducing
76 sexually, fertilization between two isolates results in non-motile diploid thick-walled
77 oospores that can resist extreme temperatures and desiccation. Although the relative
78 importance of different mechanisms of reproduction in the Albugo life cycle remains
79 poorly understood, clonal reproduction enables rapid population expansion, especially
80 in genetically uniform crop monocultures.
81 A. candida is considered a single species despite the fact it comprises several
82 specialised physiological races that colonize different host plants (cf. Drès & Mallet
83 2002). According to evolutionary and population genetic theory, adaptations and
84 trade-offs associated with host-specialisation combined with strong population
85 structuring can result in adaptive radiation and speciation (Stukenbrock 2013; Abbott
86 et al. 2013). Conceivably, A. candida is an adaptive radiation “in progress,” and the
87 broad host range is realised by ongoing specialisation of independent races that are on
88 the road to speciation (Drès & Mallet 2002). This raises an important question: does
89 parasite specialization inevitably lead to speciation?
90 Albugo spp. infection strongly suppresses host innate immunity and Albugo spp.
91 are unique compared to other microbial plant pathogens in enhancing host
92 susceptibility to secondary infection by otherwise avirulent pathogens, including
93 downy mildews (Cooper et al. 2008a). It has been suggested that enhanced
94 susceptibility imposed by Albugo might accelerate adaptation of other pathogen
95 species to an Albugo-susceptible host (Thines 2014). However, no evolutionary
96 rationale has been put forward to explain why it might be adaptive for Albugo sp. to
97 render its hosts so susceptible to other pathogens that could compete for access to the
98 same resources (Cooper et al. 2008a). Arguably, suppression of host innate immunity
99 could facilitate cohabitation of distinct physiological races and thus may enable
100 genetic exchange between them. Introgression, here defined as the introduction by
101 recombination of syntenic nucleotide variation from a parental donor race into the
102 genome of a recipient race (Hedrick 2013), could slow down genetic divergence, and
103 hence, retard speciation. On the other hand, introgression between races that are well-
104 adapted to exploit different host plants could be maladaptive and strongly selected
105 against because hybrids will inherit effector alleles derived from both parental races.
106 Given that immune recognition of even a single effector is sufficient to trigger the
107 immune response and stop an infection, hybrids that possess an expanded repertoire of
108 effector alleles are likely to have a strong fitness disadvantage on most potential host
109 plants.
110 Much of the ecology and evolution of A. candida remains unknown, but with its
111 many specialized races and a broad host range, this “generalist” plant pathogen is a
112 fascinating study organism. The questions we addressed in the present study are: (1)
113 Are the distinct physiological A. candida races genetically isolated and “on the road to
114 speciation”? (2) Does suppression of host innate immunity enable cohabitation and
115 growth of races with non-overlapping host ranges? To answer these questions, we
116 generated genome sequence assemblies of five isolates that were collected from four
117 host species (Brassica oleracea, B. juncea, Capsella bursa-pastoris, and Arabidopsis
118 thaliana). We sequenced one isolate of one race, and two isolates of each of two
119 additional races. We show that these races are not genetically isolated despite having
120 non-overlapping host ranges. Recombination analysis shows there is widespread
121 genetic exchange between A. candida races, and that hybridisation leading to
122 introgression has occurred numerous times, including exchanges in the recent
123 past. To explain this observation, we examined whether pre-infection of Arabidopsis
124 and Brassica with virulent A. candida races results in enhanced host susceptibility, and
125 found that pre-infection with a virulent strain enables proliferation of an A. candida
126 isolate that would otherwise not colonize that host. For the two races with two isolates,
127 we show that population expansion is by clonal reproduction. We discuss the impact
128 of genetic exchange on A. candida evolution, and consider the implications for
129 pathogen evolution and reproduction in an agro-ecological environment.
130
131 Results
132 A. candida host specificity: single race isolates are host specific
133 AcNc2 was recovered from infected leaves of field-grown A. thaliana Eri-1
134 plants in Norwich (UK) in 2007. AcEm2 was isolated from wild Capsella bursa-
135 pastoris in Kent (UK) in 1993 (Borhan et al. 2008). Isolate AcBoT was harvested from
136 infected inflorescences of B. oleracea cultivar “Bordeaux F1” in Lincolnshire in May
137 2009, and another isolate AcBoL was harvested from infected leaves of B. oleracea in
138 Lincolnshire (UK) in January 2009. Isolates were single-spore purified (Kemen et al.
139 2011). The Ac2V isolate virulent on B. juncea was provided by M. Borhan
140 (Agriculture and Agri-Food Canada; Links et al. 2011) and single-spore purified.
141 We tested the virulence of single race infections of AcNc2, Ac2V and AcBoT
142 on different host species and cultivars. AcNc2 was propagated on A. thaliana Ws-2.
143 Two B. juncea cultivars (“Cutlass” and “Czerniac”) and 18 cultivars of B. oleracea
144 were resistant to AcNc2. We screened 356 A. thaliana accessions for their
145 susceptibility to AcNc2 and 38.5% were susceptible (Table 1, Supplementary files 1,
146 2). Race Ac2V was originally isolated in Canada on B. juncea (Rimmer et al. 2000).
147 We confirmed virulence of Ac2V on B. juncea by spray inoculation of B. juncea
148 “Cutlass” and “Czerniac”. On three-week-old A. thaliana plants, all 107 tested
149 accessions were resistant. Two B. oleracea cultivars were also resistant to Ac2V
150 (Table 1, Supplementary files 1, 2). Race AcBoT was virulent on all fifteen tested
151 cultivars of B. oleracea. In contrast, 34 accessions of A. thaliana were fully resistant to
152 AcBoT, as were B. rapa and B. juncea cultivars (Table 1, Supplementary file 1).
153 These tests confirm that A. candida races show pronounced host specificity to distinct
154 host species. While we cannot prove that there are no host species in nature that
155 support growth of more than one of the three races we define here, our experiments
156 and the available literature strongly suggest that Ac2V grows only on B. juncea and on some
157 accessions of B. rapa, a diploid ancestor of tetraploid B. juncea (Kole et al. 2002); that
158 AcBoT and AcBoL grow only on B. oleracea; and that AcNc2 and AcEm2 grow only
159 on a subset of Arabidopsis and Capsella genotypes. Genetic exchange between races
160 is unlikely to occur unless they colonize the same host. In our study, only the immune-
161 compromised A. thaliana Ws-2-eds1 mutant was susceptible to all races.
162
163 Genome assemblies of A. candida isolates
164 The AcNc2 A. candida assembly was used as the reference in this study, and
165 comprises 34 Mb in 5212 contigs at ~160-fold coverage (Table 2). We assembled
166 ~73% of an estimated 45 Mb genome of A. candida AcNc2 (Voglmayr & Greilhuber
167 1998; Links et al. 2011). In a previous study, a similar proportion (76%) of the A.
168 candida Ac2V race genome was assembled (Links et al. 2011). The unassembled part
169 of the genome (~11 Mb) is likely to include repeats and duplicated sequences. Repeat
170 sequences constitute ~17.4% of the AcNc2 assembly. Approximately 8% of annotated
171 repeats represent collapsed regions with coverage several times higher than average,
172 so the real repeat content may be higher.
173 Ab initio gene predictions were conducted with several gene prediction
174 programs, resulting in 10,907 predicted gene models. About 90% (9,830) of the
175 predicted proteins have homologous sequences in the proteome of A. candida Ac2V
176 race (15,824 genes) (Links et al. 2011). In about one thousand cases where we predict
177 a single-copy gene in the AcNc2 race, Links and co-authors (2011) predicted
178 multi-gene families in the Ac2V race, which explains the discrepancy in the predicted gene
179 number between the two assemblies.
180 Only 37% of AcNc2 proteins showed significant sequence similarity to known
181 proteins (BLASTP E-value ≤ 10⁻⁵). Using the Tribe-MCL algorithm, 3522 genes (32%
182 of the predicted AcNc2 gene repertoire) were clustered into 1020 gene families. The
183 largest gene tribes are protein kinases (187 members), kinesin- (59 genes) and myosin-
184 (44 genes) like proteins and 35 genes homologous to the secreted “CHXC” proteins of
185 Albugo laibachii (Kemen et al. 2011).
186 915 genes in the AcNc2 assembly are predicted to encode putative proteins with
187 amino-terminal secretory signal peptides, but no trans-membrane domain. Only 34%
188 of the predicted secretome was functionally annotated (BLASTP, E-value ≤ 10⁻⁵),
189 including 115 proteins (proteases, hydrolases, elicitin-like proteins, elicitors, protease
190 inhibitors) that could be involved in plant cell wall degradation and protection against
191 host defense enzymes. In addition to the 35 CHXC proteins (Tyler et al. 2006), further
192 candidate virulence factors were identified including 19 homologs of the Phytophthora
193 Crinkler effectors (Haas et al. 2009), and another 23 secreted proteins with “RXLR”
194 and 35 proteins with similar “RXLQ” motifs. Both motifs are located in the N-
195 terminal part of the protein after the predicted signal peptide, thus resembling the RXLR
196 effectors of P. infestans and H. arabidopsidis (Baxter et al. 2010; Haas et al. 2009),
197 but not having any other significant sequence similarity to these proteins.
198 We carried out several assemblies for each isolate using different k-mer lengths,
199 assessed the quality of each assembly with various parameters, and chose the best assembly
200 for each isolate (Table 2). The high similarity of the five A. candida
201 isolates enabled us to conclude we had sequenced three ‘races’, within which AcNc2
202 and AcEm2 were isolates of the same race and AcBoT and AcBoL were also isolates
203 of the same race (Figure 1). Therefore, genome comparisons were first conducted on
204 one representative from each race (AcNc2, Ac2V and AcBoT), from each of which
205 ~33-34 Mb of genome was assembled (Table 2).
206
207 Genome-wide similarity between races with non-overlapping host range
208 To assess the overall genome-wide similarity between races, we performed
209 alignments of reads against the AcNc2 reference assembly. For the majority of the
210 AcNc2 genome, we observed a significant positive correlation between read depth in
211 the reference assembly and mapping depth of the Ac2V and AcBoT reads (r=0.65,
212 P<2.2e-16; Figure 2A). Some AcNc2 regions (3-4% of the assembly) showed low or
213 zero coverage by Ac2V and/or AcBoT reads (Figure 2 Supplement 1), suggesting the
214 presence of highly divergent or unique regions amongst the races. These are gene
215 sparse regions (150 and 234 genes predicted in AcNc2, respectively), without
216 apparent enrichment for genes encoding secreted proteins (χ² = 0.11, d.f. = 1, p >
217 0.7). Amplification of randomly selected AcNc2 genes from these regions revealed
218 that four of the five selected genes are indeed absent or highly diverged in the AcBoT
219 and Ac2V races, and present in the AcNc2 genome.
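As a quick sanity check on the enrichment test reported above, the upper-tail probability of a χ² statistic with one degree of freedom can be computed from the complementary error function. The sketch below is illustrative only: the function name is hypothetical, and the underlying gene counts are not given in the text, so only the reported statistic is used.

```python
import math


def chi2_sf_df1(stat: float) -> float:
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom.

    With d.f. = 1 the statistic is the square of a standard normal deviate,
    so P(X >= stat) = erfc(sqrt(stat / 2)).
    """
    if stat < 0:
        raise ValueError("chi-square statistic must be non-negative")
    return math.erfc(math.sqrt(stat / 2.0))


# Statistic reported in the text for secreted-protein enrichment.
p_value = chi2_sf_df1(0.11)
print(f"p = {p_value:.2f}")
```

The resulting probability lies above 0.7, in line with the reported p > 0.7.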
220 The overall mean level of nucleotide identity in the homologous genomic
221 regions amongst races is ~99% (Figure 2B). We verified 25 polymorphic genomic
222 regions by Sanger sequencing (Supplementary file 3). We used the longest of all
223 contigs from AcNc2, ‘contig 1’ (398,508 bp), to compare the levels of divergence
224 between races versus the number of heterozygous positions within each race. An
225 extremely low proportion of sites (<0.05%) on ‘contig 1’ is heterozygous
226 within the AcNc2, AcEm2 and Ac2V isolates (Figure 3). Isolates AcBoT and
227 AcBoL are more heterozygous than Ac2V and AcNc2, with 0.65% of nucleotide
228 positions in ‘contig 1’ being heterozygous in AcBoT (Figure 3). Importantly, >97% of
229 all heterozygous positions are shared in AcBoL and AcBoT (see below). In between-
230 race comparisons, ~1.0% of nucleotide positions on ‘contig 1’ have diverged between
231 AcBoT, Ac2V and AcNc2.
232
233 Mosaic-like genome structure of A. candida races
234 Polymorphisms are not homogeneously distributed among A. candida races.
235 Some regions of the genome are identical for up to ten thousand base pairs, whereas
236 the local nucleotide identity is as low as 89% in other regions of up to 5 kb (Figure
237 2B). We examined 133 contigs (12,373,253 bp), covering 38% of the reference
238 assembly. Stretches of nucleotide similarity amongst races are distributed in a block-
239 like structure; there are regions where AcNc2 is highly similar (or identical) to AcBoT
240 and significantly (nucleotide divergence π>1%) diverged from Ac2V, and vice versa
241 (Figure 4). By using multiple different algorithms that can detect recombination in
242 DNA sequence data incorporated in the software RDP3 (Martin et al. 2010), we
243 examined whether this pattern can be explained by genetic introgression amongst
244 races. In addition, we used the software HybRIDS
245 (http://www.norwichresearchpark.com/HybRIDS) to perform a probabilistic
246 recombination analysis, calculate the coalescence time of each recombinant block, and
247 visualize the mosaic-like genome structure.
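The block-like pattern described above can be illustrated with a simple window scan over a three-way alignment. This is a minimal sketch, not the algorithm used by RDP3 or HybRIDS: the function names, the window size, and the toy sequences are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def window_identity(a: str, b: str, start: int, size: int) -> float:
    """Fraction of identical positions in one window, ignoring gap columns."""
    pairs = [
        (x, y)
        for x, y in zip(a[start:start + size], b[start:start + size])
        if x != "-" and y != "-"
    ]
    if not pairs:
        return 0.0  # no comparable sites in this window
    return sum(x == y for x, y in pairs) / len(pairs)


def closest_pair_per_window(aln: Dict[str, str], size: int) -> List[Tuple[int, str]]:
    """Label each non-overlapping window with the most similar pair of sequences."""
    names = sorted(aln)
    length = min(len(seq) for seq in aln.values())
    labels = []
    for start in range(0, length - size + 1, size):
        scores = {}
        for i, first in enumerate(names):
            for second in names[i + 1:]:
                key = f"{first}/{second}"
                scores[key] = window_identity(aln[first], aln[second], start, size)
        labels.append((start, max(scores, key=scores.get)))
    return labels


# Toy alignment (invented sequences, not real data): AcNc2 matches AcBoT in
# the first window and Ac2V in the second, mimicking a mosaic structure.
toy_alignment = {
    "AcNc2": "ACGTACGTAC" + "GGGTTTCCCA",
    "AcBoT": "ACGTACGTAC" + "GGATTACCGA",
    "Ac2V": "ACCTAGGTTC" + "GGGTTTCCCA",
}
for start, pair in closest_pair_per_window(toy_alignment, size=10):
    print(start, pair)
```

In real data the windows would span kilobases rather than ten bases, and a formal recombination test is needed before any window is called introgressed.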
248 Recombination analysis of 133 contigs highlighted a total of 675 recombined
249 blocks on 127 contigs that were significant (after Bonferroni correction) in three or
250 more tests using RDP3 (Supplementary file 4). The combined length of all identified
251 blocks is nearly 3 Mb or ~25% of the analysed regions. These blocks indicate regions
252 of genome in one race that derive from another race (or the ancestor of another race).
253 Figure 4 illustrates the effect of genetic introgression on the pattern of nucleotide
254 similarity between the three races in the largest contig of ~400 kb. The sequence
255 (dis)similarity between the three races shows a mosaic-like genome structure with
256 large regions where races AcNc2 and AcBoT show near sequence identity (yellow
257 blocks in Figure 4B), whilst other areas show a high level of sequence similarity
258 between AcNc2 and Ac2V, and AcBoT and Ac2V (indicated by the purple and
259 turquoise regions, respectively). Note that the presence of such well-defined blocks of
260 high sequence similarity in an otherwise diverged genome is characteristic for rare
261 introgression between organisms that show a high (yet incomplete) level of
262 reproductive isolation. Noteworthy too is the fact that a high level of recombination
263 (relative to the mutation rate) would homogenise the sequence divergence between the
264 races, and hence, that this would not result in the observed mosaic-like structure.
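The filtering rule described in this section (a block is retained when it is significant after Bonferroni correction in three or more tests) can be sketched as below. The method labels, p-values, and number of comparisons are hypothetical placeholders, and the correction is applied here in its simplest form (alpha divided by the number of comparisons), which may differ in detail from how RDP3 applies it.

```python
from typing import Dict, List


def supported_blocks(
    p_values: Dict[str, Dict[str, float]],
    n_comparisons: int,
    alpha: float = 0.05,
    min_methods: int = 3,
) -> List[str]:
    """Return blocks that pass the Bonferroni threshold in at least min_methods tests."""
    threshold = alpha / n_comparisons  # simple Bonferroni-corrected cut-off
    kept = []
    for block, per_method in p_values.items():
        n_significant = sum(p < threshold for p in per_method.values())
        if n_significant >= min_methods:
            kept.append(block)
    return kept


# Hypothetical p-values from four unnamed detection methods (not real results).
toy_p_values = {
    "block_A": {"method_1": 1e-9, "method_2": 1e-7, "method_3": 1e-6, "method_4": 0.2},
    "block_B": {"method_1": 1e-9, "method_2": 1e-3, "method_3": 0.01, "method_4": 0.5},
}
kept = supported_blocks(toy_p_values, n_comparisons=1000)
print(kept)
```

Requiring agreement between several methods trades sensitivity for a lower false-positive rate, which suits a setting where introgression is expected to be rare.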
265 Despite the fact that introgression between races is rare, it must have occurred
266 multiple times between the ancestors of the three races given that the coalescence
267 times vary markedly between the different blocks (Figure 5). Assuming a base
268 mutation rate of µ=10⁻⁸ per cell cycle, with 100 cell cycles per year (i.e. a combined
269 mutation rate of 10⁻⁶ per year), analysis in the software HybRIDS shows that the most
270 recent introgression event occurred circa 220 years ago, whilst the oldest event
271 occurred almost 200,000 years ago. The mean age calculated across all introgression
272 events equals 6,237 (±12,594) years (Figure 5). (With a combined mutation rate of
273 µ=10⁻⁷ per base per year, the range in the age of introgression would span from 2,200
274 to 2,000,000 years). Irrespective of the mutation rate, the principal finding is that
275 genetic introgression amongst A. candida races is an ongoing evolutionary process
276 occurring across a wide range of evolutionary times, and that it gives rise to mosaic
277 genomes with the introgression blocks interspersed in the recipient genomic
278 background.
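The rate arithmetic in this paragraph can be made explicit. The sketch below assumes only that an estimated age is inversely proportional to the assumed mutation rate; the function names are hypothetical, and the ages are the values quoted in the text rather than recomputed from sequence data.

```python
def combined_rate(per_cycle_rate: float, cycles_per_year: float) -> float:
    """Per-site mutation rate per year from a per-cell-cycle rate."""
    return per_cycle_rate * cycles_per_year


def rescale_age(age_years: float, rate_used: float, new_rate: float) -> float:
    """Rescale an age estimate to a different rate (age is proportional to 1/rate)."""
    return age_years * rate_used / new_rate


rate = combined_rate(1e-8, 100)  # combined rate per site per year
youngest = rescale_age(220, rate_used=1e-6, new_rate=1e-7)
oldest = rescale_age(200_000, rate_used=1e-6, new_rate=1e-7)
print(rate, youngest, oldest)
```

A tenfold lower rate therefore stretches the quoted range of 220 to 200,000 years to roughly 2,200 to 2,000,000 years, as stated in the text.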
279 A total of 1,655 predicted genes are located in the recombined regions, and
280 amongst these, 125 are predicted to encode secreted proteins. In the introgression
281 regions, we identified 14 genes encoding secreted proteases and hydrolases that in
282 some pathogens act as virulence factors (Lebrun et al. 2009; Monod et al. 2002;
283 Soanes et al. 2007). Thus, recombination between races that results in introgression can
284 act as a mechanism for exchange of virulence gene alleles. However, neither gene
285 density nor dN/dS is enriched or depleted in regions affected by introgression,
286 compared to regions not affected (Paired t-test: T=-0.05, P=0.958; Mann-Whitney test:
287 W=3.15×166, P=0.152, respectively). It is however entirely likely that by studying
288 only a small subset of all the known races, we have underestimated the actual level of
289 introgression across all races.
290
291 Intra-race diversity suggests clonal propagation after creation of novel adapted
292 allele repertoires
293 To better understand the evolution and diversification of A. candida genomes,
294 we analysed the pattern of recombination and nucleotide divergence with the inclusion
295 of two additional A. candida isolates. AcBoL is an additional isolate from the AcBoT
296 race and AcEm2 is an additional isolate of the AcNc2 race (see Figure 1). We found
297 evidence for 581 recombination events, of which 335 included AcBoT or AcBoL as
298 recombinants. These two isolates shared a recombinant block in 97.3% of
299 recombination events (AcBoT and AcBoL = 326; AcBoT = 6; AcBoL = 3). AcNc2
300 and AcEm2 shared a recombinant block in 99.6% of events (AcNc2 and
301 AcEm2 = 246; AcNc2 = 0; AcEm2 = 1). This demonstrates that the AcBoT/AcBoL
302 and AcNc2/AcEm2 races have remained largely unchanged since their initial
303 emergence.
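The sharing percentages quoted above follow directly from the event counts given in parentheses. A small worked check, with a hypothetical helper name:

```python
def shared_fraction(shared: int, only_first: int, only_second: int) -> float:
    """Fraction of recombination events in which both isolates carry the block."""
    total = shared + only_first + only_second
    if total == 0:
        raise ValueError("no events to compare")
    return shared / total


# Counts as reported in the text.
bo_pair = shared_fraction(shared=326, only_first=6, only_second=3)  # AcBoT / AcBoL
nc_pair = shared_fraction(shared=246, only_first=0, only_second=1)  # AcNc2 / AcEm2
print(f"{100 * bo_pair:.1f}% {100 * nc_pair:.1f}%")
```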
304 The nucleotide diversity within genomes (i.e. the observed heterozygosity) and
305 the nucleotide divergence between genomes (i.e. genetic differentiation) can be used to
306 further understand A. candida population biology. Isolates AcBoT and AcBoL were
307 both collected in Lincolnshire in 2009, and the observed heterozygosity in these
308 isolates was at least 13 times higher than that of any of the other races (percentage
309 heterozygous positions in contig 1: AcBoT = 0.653%; AcBoL = 0.644%;
310 AcNc2 = 0.047%; AcEm2 = 0.044%; Ac2V = 0.028%) (see Figure 3). Remarkably,
311 AcBoT and AcBoL are heterozygous for almost all of the same sites; 97.2% of the
312 sites that are heterozygous in AcBoT are also heterozygous in AcBoL (and 98.5% vice
313 versa). This is only consistent with clonal reproduction because after just one
314 generation of sexual reproduction (or selfing with recombination), Mendelian
315 segregation would eradicate this high level of genotypic similarity. Notably, such
316 evidence for clonal propagation of a race would be difficult to obtain for haploid
317 fungal pathogens.
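The segregation argument can be illustrated numerically. Under selfing, a site that is heterozygous in the parent stays heterozygous in an offspring with probability 1/2, so two independent selfed offspring would be expected to share only about half of their heterozygous sites. The sketch below assumes unlinked sites and a parent heterozygous at every simulated site; both are simplifications, and the function names are hypothetical.

```python
import random
from typing import Set


def shared_het_fraction(het_a: Set[int], het_b: Set[int]) -> float:
    """Fraction of heterozygous sites in isolate A that are also heterozygous in B."""
    if not het_a:
        raise ValueError("isolate A has no heterozygous sites")
    return len(het_a & het_b) / len(het_a)


def selfed_offspring_het_sites(n_sites: int, rng: random.Random) -> Set[int]:
    """Sites still heterozygous in one selfed offspring of an all-heterozygous parent.

    Each site is treated as unlinked and remains heterozygous with probability 1/2.
    """
    return {site for site in range(n_sites) if rng.random() < 0.5}


rng = random.Random(1)  # fixed seed so the illustration is repeatable
offspring_1 = selfed_offspring_het_sites(20_000, rng)
offspring_2 = selfed_offspring_het_sites(20_000, rng)
simulated = shared_het_fraction(offspring_1, offspring_2)
print(f"shared after one round of selfing: {simulated:.2f}")
```

The simulated value sits near 0.5, far below the 97.2% and 98.5% sharing observed between AcBoT and AcBoL. Linkage would make retained heterozygosity occur in blocks, but would not change the expected fraction.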
318 With little evidence for recombination and gene flow, most nucleotide
319 divergence between AcBoT and AcBoL must have accumulated through mutation.
320 The nucleotide divergence between these isolates is just 0.030% (121 polymorphisms
321 in 398,508 bp in contig 1) (Figure 3). Assuming a mutation rate of 1 × 10⁻⁸ per cell
322 division, and 100 cell divisions per lineage per year, we estimate that these isolates
323 could have diverged 305 (262-353) years ago. The other pair of isolates, AcNc2 and
324 AcEm2, were collected in Norfolk in 2007 and in Kent in 1993 (160 km apart),
325 respectively. Similar to the former two isolates, AcNc2 and AcEm2 are also nearly
326 identical (nucleotide divergence π=0.046%, i.e. they are >99.95% identical) (Figure 3).
327 If we again assume that their nucleotide divergence arose by mutation alone, their
328 estimated divergence time is 890 (814-970) years. However, unlike AcBoT and
329 AcBoL, AcNc2 and AcEm2 are much less heterozygous (see Figure 3), which
330 suggests that another genetic mechanism, e.g. loss of heterozygosity (Lamour et al.
331 2012), might be operating which has eradicated the gene diversity in the clonally
332 propagating races over time.
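The order of magnitude of the AcBoT/AcBoL divergence time can be checked with a naive calculation that divides the per-site divergence by the combined per-year mutation rate. This is a rough sketch, not a reproduction of the authors' estimator: the exact method and the confidence-interval procedure are not stated in this section, and a model that counts mutations along both lineages would divide by twice the rate instead.

```python
def per_site_divergence(n_differences: int, n_sites: int) -> float:
    """Proportion of sites that differ between two sequences."""
    return n_differences / n_sites


def naive_divergence_years(n_differences: int, n_sites: int,
                           rate_per_site_per_year: float) -> float:
    """Divergence divided by the per-year rate (no factor of two for two lineages)."""
    return per_site_divergence(n_differences, n_sites) / rate_per_site_per_year


# 121 polymorphisms in 398,508 bp of contig 1, as reported in the text.
divergence_percent = 100 * per_site_divergence(121, 398_508)
# Assumed rate: 1e-8 per cell division x 100 divisions per year = 1e-6 per year.
years = naive_divergence_years(121, 398_508, rate_per_site_per_year=1e-6)
print(f"{divergence_percent:.3f}% -> about {years:.0f} years")
```

The naive figure is a little over 300 years, which is of the same order as the reported estimate of 305 years.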
333
334 Sequential infections abolish host specificity of susceptibility to A. candida
335 infection
336 A. candida infection compromises host resistance against otherwise avirulent
337 pathogen species (Cooper et al. 2008a). Conceivably, A. candida could suppress host
338 defenses to otherwise avirulent races of A. candida, enabling co-infection and sexual
339 exchange. To test this, we performed sequential inoculation experiments, identifying
340 races using the genome sequences to create isolate-specific DNA markers. It has long
341 been noted that Albugo spp. have a remarkable capacity to suppress immunity in their
342 hosts. Isolates AcNc2 and AcEm2 can infect ~40% of accessions of A. thaliana, and
343 some Capsella accessions; isolates AcBoT and AcBoL have only been found to infect
344 B. oleracea; and race Ac2V can only infect B. juncea and some accessions of B. rapa
345 (one of the two diploid ancestors of B. juncea) (see Table 1). Race-specific PCR of
346 pre-inoculated plants (Supplementary files 5, 6) shows that pre-infection by AcNc2
347 suppresses resistance in leaves of A. thaliana accession Ws-2 towards the
348 B. juncea-infecting race, Ac2V. Also, pre-infection by AcBoT suppresses B. oleracea
349 resistance towards Ac2V (Figure 6B). Furthermore, defense suppression was so
350 effective that Ac2V was able to complete its life cycle on both A. thaliana Ws-2 and
351 B. oleracea, as observed by successful subsequent infection of B. juncea from the
352 sequentially inoculated plants. In a reciprocal experiment, pre-infection of B. juncea
353 with Ac2V enabled AcNc2 growth on B. juncea (Figure 6C). Therefore, not only can
354 AcNc2 suppress Ac2V recognition on A. thaliana, but Ac2V is also capable of
355 suppressing B. juncea resistance towards AcNc2. TIR-NB-LRR resistance genes likely
356 confer Ac2V resistance in Arabidopsis (McHale et al. 2006), as Ac2V grows on an
357 eds1-1 mutant of A. thaliana Ws-2 (Supplementary file 6; Parker et al. 1996; Cooper et al. 2008a). We
358 hypothesise that suppression of host innate immunity enables co-infection of hosts by
359 races with otherwise non-overlapping host ranges, thus providing a remarkable
360 mechanism to enable sexual genetic exchange between specialised A. candida races.
361
362 Discussion
363 A. candida comprises distinct races that specialize on different plant species (Liu et al.
364 1996; Rimmer et al. 2000) and its physiological races can infect over 200 species of
365 plants. Here, we describe the genomes of five isolates from three races of A. candida
366 that colonize distinct host plant species and appear to have non-overlapping host
367 ranges. Genome analyses show that the races have a mosaic-like genome structure that
368 is consistent with genetic introgression between races that have significantly diverged
369 (mean nucleotide divergence ~1%). Despite this divergence, ~25% of the nucleotide
370 sequence within the analysed genome (3.2 Mb out of 12.4 Mb) is derived from other
371 A. candida races. In total, 675 introgressed blocks were identified on 127 of the 133 analysed contigs
372 and each block was confirmed by at least three independent recombination algorithms.
373 Based on the high nucleotide similarity of the recombined regions, it appears that
374 some genetic exchange must have occurred recently and that introgression has
375 happened multiple times between the ancestors of the three races.
376 An alternative hypothesis for the observed mosaic-like genome structure is that
377 the polymorphisms were already present in the ancestor of A. candida, and that those
378 may have sorted stochastically among descendant host races. This is known as
379 incomplete lineage sorting (Pamilo & Nei 1988; Hobolth et al. 2007), and such
380 ancestrally shared polymorphism is notoriously difficult to distinguish from
381 polymorphisms shared through secondary contact and introgression. However, by
382 estimating the coalescence time of the introgressed regions we showed that
383 hybridisation between the races is an ongoing evolutionary process, with some
384 introgression events occurring as recently as 220 years ago. We therefore reject the
385 hypothesis of incomplete lineage sorting and propose that genetic exchanges between
386 the A. candida host races have occurred through rare but nevertheless significant
387 introgression events.
388 If host-ranges are determined by multiple loci, one might expect that
389 recombination will quickly lead to “super genotypes” with very wide host ranges.
390 However, this is unlikely because of the dual consequences of effector alleles; they not
391 only can contribute to virulence, but if recognized by Resistance (R) gene alleles in a
392 particular host, they can result in the complete loss of virulence on that host (Dodds &
393 Rathjen 2010). Effectors selected to facilitate infection and evade recognition by R
394 genes in one host might be recognised by the R genes in another host and be
395 maladaptive. Thus, genotypes that are highly virulent on multiple hosts are unlikely to
396 arise by recombination between races that specialize on distinct specific hosts,
397 although occasionally, acquiring an effector from one race by introgression might
398 offer an adaptive advantage assuming that the effector is not recognised.
399 Strikingly, each race shared a mosaic pattern of large genomic regions
400 (>10,000 bp) that were virtually identical between two races (and diverged from the
401 third). Given that mutations accumulate over time, this implies that the exchange must
402 have occurred relatively recently. Yet, A. candida is an obligate biotrophic parasite
403 that is highly host-specific. The host-specificity of A. candida races was confirmed in
404 our experiments, and this raises the question of how distinct A. candida races can
405 achieve the physical contact required for them to have sex and recombine. We
406 addressed this question using experimental infections of host plants with multiple
407 races. Crucially, we showed that infection with a virulent race of A. candida
408 suppresses host immunity sufficiently to enable subsequent co-colonization by an
409 otherwise non-virulent race (see also Cooper et al. 2008). Although this opens up
410 competition between races for the same (limited) resource, it also brings a significant
411 evolutionary/ecological advantage by blurring the borders of host range, thereby
412 creating secondary contact zones that enable sexual reproduction and genetic exchange
between races. Given the high host specificity of obligate biotrophic parasites, this
414 ability of A. candida to enable occasional genetic exchange between specialised races
415 could create novel repertoires of effector alleles that would enable colonization of new
416 hosts (“host jumps”). For example, Arabidopsis resistance to race Ac2V can at least in
417 part be explained by the WRR4 gene, a classical NB-LRR R gene, which presumably
recognizes an Ac2V effector (Borhan et al. 2008). Hypothetically, if this Ac2V
effector were to segregate away in hybrid offspring, or be lost through loss of heterozygosity, these
420 offspring could become virulent on WRR4-carrying Arabidopsis hosts.
421 Once a new hybrid race has been established, it can reproduce asexually and
422 clonally on susceptible hosts without continued genetic exchange with other races. We
423 base this inference on the exceptionally high genotypic similarity between independent
424 isolates that infect the same host plant (i.e. AcBoT-AcBoL and AcEm2-AcNc2). For
425 example, AcBoT and AcBoL share >97% of their heterozygous sites. This observation
426 is significant because it rules out sexual reproduction (or selfing with recombination),
427 given that Mendelian segregation would eradicate this high level of genotypic
428 similarity within a single generation. Hence, we conclude that reproduction of AcBoT
429 and AcBoL is asexual and clonal, and we speculate that these races may have derived
430 from a common ancestor that was selected coincident with the onset of widespread B.
431 oleracea cultivation in Europe. The A. thaliana-infecting isolates (AcNc2 and AcEm2)
432 were sampled at a broad spatiotemporal scale (14 years and 100 miles apart), and yet
433 they showed a very high nucleotide similarity (>99.95% identical). This shows that
434 clonal reproduction coincided with their rapid population expansion. Remarkably
435 though, compared to the former two isolates, AcEm2 and AcNc2 had a relatively low
level of observed heterozygosity, and the number of heterozygous sites was more than 13-fold
lower than that of AcBoT and AcBoL. Since these races do not self-fertilize, this
438 suggests that gene conversion or another mechanism might be operating to reduce the
439 nucleotide diversity within their genomes over time. Such loss of heterozygosity
440 (LOH) has also been reported for Phytophthora capsici (Lamour et al. 2012), and in
441 yeast, LOH events can encompass entire chromosomes, which is thought to be
442 explained by the break-induced replication (BIR) mechanism (Diogo et al. 2009).
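The argument above, that Mendelian segregation would erase the shared heterozygosity of AcBoT and AcBoL within one generation, can be illustrated with a small simulation. This is only a sketch under simplifying assumptions that are not part of the original analysis: each parental heterozygous site segregates independently under selfing (Aa x Aa gives 1/4 AA, 1/2 Aa, 1/4 aa), and linkage, mutation and gene conversion are ignored. The function name, the seed and the number of sites are hypothetical.

```python
import random

def simulate_selfed_offspring(n_het_sites, rng):
    """Return the set of parental heterozygous sites that remain heterozygous
    in one offspring produced by selfing (Aa x Aa -> 1/4 AA, 1/2 Aa, 1/4 aa)."""
    retained = set()
    for site in range(n_het_sites):
        allele_1 = rng.choice("Aa")  # allele carried by the first gamete
        allele_2 = rng.choice("Aa")  # allele carried by the second gamete
        if allele_1 != allele_2:
            retained.add(site)
    return retained

rng = random.Random(1)   # fixed seed so that the sketch is reproducible
n_sites = 20000          # hypothetical number of heterozygous sites in the parent
offspring_a = simulate_selfed_offspring(n_sites, rng)
offspring_b = simulate_selfed_offspring(n_sites, rng)

# Fraction of offspring A's heterozygous sites that are also heterozygous in B.
shared_fraction = len(offspring_a & offspring_b) / len(offspring_a)
print(f"heterozygous sites retained in A: {len(offspring_a) / n_sites:.3f}")
print(f"heterozygous sites shared A-B:    {shared_fraction:.3f}")
```

Under these assumptions both fractions are expected to be close to one half, far below the >97% sharing of heterozygous sites reported for AcBoT and AcBoL.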
443 ‘Hybrid speciation’ or ‘recombinational speciation’ is often associated with a
444 mosaic genome (Baack & Rieseberg 2007; Stukenbrock et al. 2012) and because
445 introgressed genes have been ‘pre-tested’ by selection, they are more likely to be
446 adaptive than random changes to the genetic code by mutations (Hedrick 2013). For
447 example, Helianthus anomalus is a wild sunflower species derived via hybridization
448 between two parental species, and its genome is characterised by parental species
449 blocks. In this case, however, hybridisation and speciation occurred over a short
450 evolutionary time-span (10 – 60 generations) (Ungerer et al. 1998), which differs from
451 hybridisation in A. candida, where it appears to be a persistent evolutionary process.
452 Darwin’s finches are a classic example of a young adaptive radiation, and recent
453 research by Lamichhaney et al. (2015) showed that species from different islands show
454 extensive genomic exchange through recent hybridisation. Introgressive hybridization
455 was found to occur throughout the radiation, fuelling the genetic variation in beak
456 shape, and thereby facilitating adaptive evolution. Perhaps more pertinently, the
457 relationships of members of the parasite’s principal host, plants of the Brassica genus,
458 are themselves considered to be the result of a number of hybridisation events, as
defined by the Triangle of U (Nagaharu 1935). Hybridisation events have also been
460 implicated in speciation and adaptive radiation in vertebrates (Nichols et al. 2015).
461 Furthermore, rare introgression events among Heliconius butterflies are believed to
462 have facilitated the exchange of mimicry genes across multiple time points post
463 speciation (Martin et al. 2013). Evidence for more frequent introgression has been
464 observed in the malaria vector species complex (Anopheles gambiae) (Fontaine et al.
465 2015). More examples of such introgression can be anticipated to emerge given the
466 continued improvement and reduction in cost of DNA sequencing methods, and the
development of novel software that enables whole-genome recombination analyses (Ward & van
Oosterhout, submitted).
469 Perhaps more than any other system, studies on yeasts have offered valuable
470 insights into how introgression and hybridisation can affect genome architectures of
471 eukaryotic microorganisms (reviewed by Dujon 2010). Genetic exchanges amongst
472 three strains of Saccharomyces cerevisiae have been quantified using genomic data
473 which show that the rate of outcrossing is remarkably low, with only 314 outcrossing
474 events during circa 16 million cell divisions (Ruderfer et al. 2006). With such a low
475 rate of genetic exchange, the strains accumulate sequence variation by mutation and
476 thus genetically diverge. Given the low level of genetic exchange amongst the A.
477 candida host races, this may also explain their genetic divergence. In yeast, natural
478 hybrids have been reported in many species (Dujon 2010), with hybridisation leading
to nonreciprocal genetic exchange accompanied by the loss of genes and genomic
480 regions. This results in chimeric sequences from which novel lineages could emerge
481 (Greig et al. 2002; Usher & Bond 2009). For example, Lachancea kluyveri possesses
482 mega-base long chromosomal fragments of distinct composition (Payen et al. 2009).
483 Furthermore, the genomes of wine strains of S. cerevisiae contain introgressed regions
484 from S. paradoxus, S. kudriavzevii, S. uvarum, and Zygosaccharomyces bailii (Dujon
485 2010). Given that the introgressed sequences in the genome of S. cerevisiae are nearly
486 identical to those in the donor genomes, hybridisation must have occurred recently,
487 which is similar to what we observe in A. candida. Although introgression appears to
488 be a general phenomenon in yeast genomes, its importance for evolution has yet to be
489 determined (Dujon 2010). The mosaic-like genome structure of A. candida suggests
490 that hybridisation and genetic introgression may play an equally important role in the
491 biology of this oomycete.
492 Introgression can introduce novel adaptive trait combinations (Seehausen 2004;
493 Hedrick 2013) as well as enable the loss by segregation of host-specific “avirulent”
494 effector alleles that can trigger the immune response in potential hosts. In rare novel
495 recombinant races, such maladapted effectors that trigger the response of a specific
496 host may be segregated away, allowing the recombinant to avoid immune recognition
497 and colonise this new host. Once this happens, the new hybrid can rapidly expand its
498 geographic range and population size through clonal reproduction. Hybrids between
499 Phytophthora spp. races can show expanded host range compared to their parental
500 lineages (Ersek et al. 1995) while the importance of virulence gene transfer that
501 subsequently leads to the expansion of pathogens’ host range has also been reported
502 for bacterial and fungal pathogens (Doolittle 1999; Mehrabi et al. 2011).
503 The ability of pathogens to recombine and generate novel recombinant
504 genotypes and subsequently proliferate clonally may be particularly favoured in the
505 agro-ecological environment. Recent fusion between genetically distinct plant
506 pathogens has been shown in Mycosphaerella graminicola (Stukenbrock et al. 2012),
507 where a hybrid speciation event has generated a generalist pathogen of grass species,
508 and in Blumeria graminis (Hacquard et al. 2013), where a mosaic genome structure
509 has been generated by sex between divergent isolates. Adaptation of pathogens to
510 agro-ecosystems can be correlated with a reduction in diversity of recently emerged
511 lineages and at the same time, high levels of genome plasticity (Stukenbrock &
512 Bataillon 2012). This genome plasticity has also been observed in B. graminis
513 (Hacquard et al. 2013). Race specialization observed in the potato late blight pathogen
514 P. infestans, the rice blast pathogen Magnaporthe oryzae, and the wheat yellow rust
515 pathogen Puccinia striiformis (Stukenbrock & Bataillon 2012), may represent the
516 result of a broader mechanism of pathogen adaptation to a crop monoculture. A.
517 candida, which is known to live on both wild weeds (e.g. Arabidopsis thaliana) and
518 important crop species (e.g. B. juncea), thus provides remarkable insights into the
519 impact of recombination in generating new virulent races and subsequent clonal
520 propagation of a novel race. These findings are particularly relevant to modern
521 agricultural methods and the emergence of new epidemic pathogen strains on crop
522 monocultures.
523 Materials and Methods
524 Pathogen isolation and cultivation
525 Albugo candida races were isolated and propagated by first washing
526 zoosporangia from infected leaves and then infecting A. thaliana Ws-eds1 (enhanced
527 disease susceptibility (Parker et al. 1996)) plants. After two weeks, one pustule was
528 punched out and spores were treated on ice for 30 min to release zoospores.
Unhatched zoosporangia were removed by filtration and zoospores were diluted to
~10 zoospores per ml and sprayed on A. thaliana Ws-eds1 plants (~100 μl/plant).
531 This procedure was repeated four times until spores were bulked up on A. thaliana
532 Ws-eds1 plants. Zoosporangia were harvested using a homemade cyclone spore
533 collector (Mehta & Zadoks 1971). Subsequently, A. candida races AcEm2/AcNc2,
AcBoT/AcBoL and Ac2V were propagated and maintained on A. thaliana Ws-2, B.
535 oleracea and B. juncea, respectively.
536 Virulence test
537 Host specificity was tested for the AcNc2, Ac2V and AcBoT races on a number
538 of A. thaliana and Brassica spp. accessions (Supplementary file 1, 2). A. candida
inoculations were performed using the following method: zoospores were suspended in
water (10^5 spores/ml) and incubated on ice for 30 min; the spore suspension was then
sprayed on plants using a spray gun (~700 µl/plant), and plants were incubated in a cold
room in the dark overnight. Infected plants were kept under 10-hour light and 14-hour
543 dark cycles with 20°C day and 16°C night temperature. Plants were scored susceptible if
544 a pathogen was capable of accomplishing its life cycle and sporulation was
545 macroscopically visible within three weeks after plant inoculations.
For sequential infection analyses, we developed A. candida race-specific PCR
547 primers from genome sequences (Supplementary file 5). Regions in all vs. all alignments
548 were identified that lacked read coverage by other isolates. Primers were designed within
549 these regions to amplify products of between 300 and 800 bases and were tested on pure
550 genomic DNA extracts from each isolate.
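The selection of candidate primer regions described above can be sketched as a scan for stretches of a focal isolate's contig that receive no reads from the other isolates. This is an illustrative reconstruction, not the procedure actually used: the per-base coverage list, the function name and the use of 300 bases (the smallest product size mentioned) as the minimum run length are assumptions, and primer placement within a region is not handled.

```python
def zero_coverage_runs(coverage, min_len=300):
    """Return (start, end) half-open intervals in which the combined read depth
    of the other isolates is zero, keeping only runs of at least min_len bases."""
    runs = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth == 0:
            if start is None:
                start = pos          # a zero-coverage run begins here
        elif start is not None:
            if pos - start >= min_len:
                runs.append((start, pos))
            start = None             # the run has ended
    # Handle a run that extends to the end of the contig.
    if start is not None and len(coverage) - start >= min_len:
        runs.append((start, len(coverage)))
    return runs

# Hypothetical per-base depth of reads from the other isolates along one contig.
example_coverage = [5] * 100 + [0] * 350 + [3] * 50 + [0] * 120 + [7] * 10
candidate_regions = zero_coverage_runs(example_coverage)
print(candidate_regions)
```

In this toy example only the 350-base stretch qualifies; the 120-base stretch is too short to hold a product of the stated size.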
551 Primary inoculum was sprayed onto control and test plants. In the case of AcNc2
552 defence suppression assays, both A. thaliana Ws-2 and Ws-eds1 were inoculated; for
AcBoT assays, B. oleracea and Ws-eds1 were inoculated; for Ac2V assays, B. juncea and
Ws-eds1 were inoculated. Following inoculations, plants were incubated in the dark in a
cold room overnight before transferring to a growth cabinet set to the conditions
556 described above. In addition to pathogen treatment, the same numbers of plants were also
557 treated with water in order to serve as a negative infection control. At seven days post-
558 inoculation a secondary infection with the avirulent A. candida race was performed on
559 50% of the plants while the remaining 50% were water treated (Supplementary file 6).
560 Plants were returned to the growth cabinet and cultivated for a further 8 days. Inoculated
561 plant tissue was harvested, washed in sterile water to remove surface adhering spores, and
flash frozen in liquid nitrogen. DNA was prepared using a DNeasy Plant Mini Kit
(Qiagen) as described in the manufacturer's instructions. PCR was performed using
race-specific primers and products were visualised on a 1% agarose gel.
565 Plants from which tissue was harvested were maintained for a further seven days
566 before re-inoculation onto the original host of the otherwise non-virulent pathogen. This
567 was done in order to confirm the completion of the secondarily inoculated pathogen
568 race’s lifecycle on immunosuppressed non-host plants.
569 DNA extraction and sequencing
DNA was extracted from zoosporangia according to the method described in
Mckinney et al. (1995), and Illumina libraries for sequencing were constructed
according to Farrer et al. (2009). Paired-end libraries of 800 bp and 400
bp insert lengths (for the race Ac2V, only one library of ~400 bp) were sequenced
using the Illumina Genome Analyzer II (GA2) platform at the Sainsbury Laboratory
Sequencing Centre. Base calling was done with the Illumina GAP v1.3 pipeline.
576 cDNA extraction and sequencing
577 A. thaliana Ws-0 plants were infected with A. candida AcNc2 and infected
578 plants were harvested at 0, 2, 4, 6, 8 and 10 days after infection. RNA was isolated
579 using TRI Reagent RNA Isolation Reagent (Sigma), and subsequently enriched for
580 mRNA with Dynabeads (Invitrogen). cDNA was prepared using the SMART cDNA
581 Library Construction Kit (Clontech) according to manufacturer’s instructions. These
582 libraries were normalized using Evrogen Duplex-specific nuclease (DSN).
Normalized cDNA libraries were fragmented using a Covaris sonicator, and libraries were
prepared according to the Illumina genomic library preparation kit. Libraries were
585 sequenced on the Illumina GA2 platform.
586 The sequence data have been deposited at the EMBL Nucleotide Sequence
587 Database, with the accession numbers for A. candida AcNc2: SRR1811450,
588 SRR1811464, AcEm2: SRR1806791, AcBoT: SRR1811472, SRR1811473, AcBoL:
589 SRR1811474, Ac2V: SRR1811471.
590 Genome and transcriptome assembly
The genomic assemblies were produced using the program Velvet 1.0.19 (Zerbino
& Birney 2008). Using BLAST (Altschul et al. 1990), the resulting contigs were searched
(BLASTN, E-value ≤ 10^-5) against the genomic sequences of A. thaliana TAIR 9.0 (Kaul
et al. 2000), the fungus Neurospora crassa (Galagan et al. 2003), a collection of bacterial
genomes (such as Xanthomonas sp. and Pseudomonas sp.;
microbialgenomics.energy.gov), and the mitochondrion of Pythium ultimum
(Levesque et al. 2010) to remove potential contamination and mitochondrial DNA.
The assembly of AcNc2 was produced by merging the overlapping contigs from two
Velvet assemblies (based on k-mers 55 and 61) with the Minimus2 genome merge
pipeline (Sommer et al. 2007) and in-house Perl scripts.
Read alignment and mapping were performed using the programs BWA (0.7.3; Li &
Durbin 2009), SAMtools (0.1.17; Li et al. 2009) and BedTools (Quinlan & Hall
2010). Duplicates were removed from mapped reads, and SNP calling and filtering
were done with BCFtools view and varFilter (-D100). Where required, conversion from
vcf to fasta format was done using a version of vcf2fq that had been
modified to include indels (http://sourceforge.net/p/vcftools/feature-requests/19/), and
sequences were then aligned using the mafft online server
(http://mafft.cbrc.jp/alignment/server/).
610 Illumina sequenced cDNA from the AcNc2 infected A. thaliana Ws-0 leaves
611 was assembled using Velvet/Oases (Schulz et al. 2012) with different k-mer lengths
(43, 45, 47, 51, 55, 57, 61, 63). We used various characteristics (total number of
contigs, assembly size, longest contig length, mean contig length, and the
proportion of core eukaryotic genes (KOGs) predicted by CEGMA) to assess
assembly quality. The two best assemblies, based on k-mers 55 and 57, were merged
616 using VMATCH (http://www.vmatch.de/). The cDNA orientation was predicted using
617 Illumina generated cDNA 5’ tags. Using Bowtie aligner (Langmead et al. 2009),
618 cDNA 5′ tags were aligned against the assembled cDNA and, based on tag counts,
619 orientated in the 5′ to 3′ direction.
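The contig-level statistics used above to compare the transcriptome assemblies built with different k-mer lengths can be computed as in the following minimal sketch. The function name and the example contig lengths are hypothetical, and the CEGMA-based proportion of core eukaryotic genes is not reproduced here because it requires running CEGMA itself.

```python
def assembly_summary(contig_lengths):
    """Summarise one assembly from a list of contig lengths (in bases)."""
    if not contig_lengths:
        raise ValueError("an assembly must contain at least one contig")
    total = sum(contig_lengths)
    return {
        "n_contigs": len(contig_lengths),
        "assembly_size": total,
        "longest_contig": max(contig_lengths),
        "mean_contig_length": total / len(contig_lengths),
    }

# Hypothetical contig lengths for two assemblies built with different k-mers.
assemblies = {
    55: [2400, 1800, 950, 400],
    57: [3100, 1500, 700],
}
for kmer, lengths in sorted(assemblies.items()):
    print(kmer, assembly_summary(lengths))
```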
620 Gene prediction and annotation
621 Ab initio gene predictions were performed for A. candida AcNc2 using the
622 Augustus gene prediction package (Stanke et al. 2006), Geneid (Blanco et al. 2002)
623 and GeneMark (Lomsadze et al. 2005). Alternative splice variants were predicted
624 with Augustus. To improve gene predictions, the “hints” files were created using
625 cDNA evidence and gene homology information. The generated library of AcNc2
transcripts was aligned to the AcNc2 assembly using BLAT (Kent 2002), setting
minimal identity to 92%; the “hints” file was produced with the script blat2hints.pl
provided with the Augustus package. The parameters previously obtained for gene
prediction in the A. laibachii genome project were used when running
Augustus and Geneid. GeneMark predictions were made with the default settings.
Consensus gene models were generated with Evigan (Liu et al. 2008). Subsequently,
a catalog of non-overlapping gene models was created from the Evigan, Augustus,
GeneMark and Geneid predictions. The gene space coverage was assessed with
634 CEGMA (Parra et al. 2007).
Functional annotation of AcNc2 proteins was performed by comparing
the predicted protein sequences with protein databases: the UniProtKB (Suzek et al.
2007) and NCBI non-redundant RefSeq (Pruitt et al. 2009) databases were searched
using the BLASTP algorithm (E-value ≤ 10^-5), and the Pfam database (Punta et al. 2012) was
searched with the program hmmscan from HMMER3 (Eddy 2011) with the default
settings. GO terms were assigned with the BLAST2GO pipeline (Conesa et al. 2005).
Gene families were predicted using the Tribe-MCL algorithm, which implements the
Markov cluster (MCL) approach for clustering proteins into families based on
pre-computed sequence similarity information (Enright et al. 2002).
Signal peptides and cleavage sites were predicted by the hidden Markov
model and the neural network algorithm implemented in the SignalP 4.0 program
(Petersen et al. 2011). Transmembrane helices were predicted with the hidden Markov
model in TMHMM 2.0 (Moller et al. 2001).
Candidate cytoplasmic effectors carrying the “RXLR” motif were identified
through a string search of the predicted secreted proteins using the “R[A-Z]L[RQ]”
regular expression in the first 100 residues downstream of the signal peptide cleavage
site. Crinkler homologs were detected using BLASTP searches (E-value ≤ 10^-5)
against the NCBI non-redundant RefSeq database (Pruitt et al. 2009). Homologs of A.
laibachii “CHXC” proteins were identified through Tribe-MCL clustering, through
BLASTP searches (E-value ≤ 10^-5) of the A. laibachii predicted proteome, and through a
string search for the “CH[A-Z]C” motif in the first 100 residues after the predicted signal
peptide.
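The motif searches described above can be illustrated with the two regular expressions quoted in the text. The sketch below is not the script used in the study: the function name, the representation of the cleavage site as the number of signal peptide residues, and the requirement that the whole motif lies inside the 100-residue window are assumptions, and sequences are assumed to be upper-case amino acid strings.

```python
import re

# Regular expressions as quoted in the text.
RXLR_PATTERN = re.compile(r"R[A-Z]L[RQ]")
CHXC_PATTERN = re.compile(r"CH[A-Z]C")

def has_motif(protein, cleavage_site, pattern, window=100):
    """Return True if pattern occurs within the first `window` residues
    downstream of the signal peptide.

    cleavage_site is the number of residues in the predicted signal peptide,
    so protein[cleavage_site] is the first residue of the mature protein.
    """
    region = protein[cleavage_site:cleavage_site + window]
    return pattern.search(region) is not None

# Hypothetical sequences: a 20-residue signal peptide followed by a mature protein.
near = "M" + "A" * 19 + "S" * 10 + "RSLR" + "G" * 200   # motif 10 residues in
far = "M" + "A" * 19 + "S" * 150 + "RSLR" + "G" * 50    # motif beyond the window
print(has_motif(near, 20, RXLR_PATTERN))
print(has_motif(far, 20, RXLR_PATTERN))
```

The same function can be applied with CHXC_PATTERN to screen for the “CHXC” motif.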
657 Repetitive elements
The library of repeats in the AcNc2 assembly was constructed with RepeatScout
(Price et al. 2005) and was joined with the previously made library for A. laibachii
(Kemen et al. 2011). This updated library, in combination with RepeatMasker
(http://www.repeatmasker.org/), was used to identify repeats and their
frequencies. Transposable elements were annotated using TBLASTX searches (E-value
≤ 10^-5) against the database of transposable elements, RepBase (Jurka et al. 2005).
664 Phylogenetic analysis
Phylogenetic trees were built with the MrBayes program (Huelsenbeck & Ronquist
2001) from the nucleotide sequence alignments of the orthologous genes present in
four A. candida races and A. laibachii. Trees were built for 100 single-copy genes (50
core eukaryotic genes and 50 singletons with unknown function). Bayesian
inference and Markov chain Monte Carlo (MCMC) methods were used to
estimate the posterior distribution of model parameters. We used lset=6, gamma
model, mcmc of 10 million, samfreq=7,000 and burnin = 375. The population genetic
parameter “theta” (Θ = 4Neμ) was estimated using the mlRho program (Haubold et al.
2010). Four different topologies were inferred with equal support, which warranted
further investigation into the role of introgression.
675 Bayesian Evolutionary Analysis by Sampling Trees (BEAST) software
676 package version 1.7 (Drummond & Rambaut 2007) was used to produce the race
phylogeny (based on contig 1). BEAST implements Markov chain Monte Carlo
(MCMC) algorithms for Bayesian divergence time dating (Drummond & Rambaut
2007). Bayesian phylogenetic trees were constructed with an HKY+G nucleotide
substitution model under a strict molecular clock (with units in mutations per site) and
a Yule tree prior. We ran ten independent MCMC analyses, each of 10 million steps
with a 10% burn-in. MCMC chain mixing was assessed using Tracer 1.5, which showed
ESS >3000 for each statistic.
684 Recombination analyses and detection of exchanged sequence blocks
Recombination events were statistically identified on contigs ≥10,000 bp using
the software RDP3 with five independent detection algorithms: RDP (Heath et al.
2006), GENECONV (Padidam et al. 1999), Maxchi (Smith 1992), Chimaera (Posada
& Crandall 2001), and 3Seq (Boni et al. 2007). Tests were conducted using a critical
value α=0.05, and P-values were Bonferroni corrected for multiple comparisons of
sequences. Sequences were treated as linear and were derived from unphased base calls,
with one of the nucleotides assigned at random at each polymorphic site. Given that
recombination algorithms use cis mutations to define regions of a sequence that share
the same phylogenetic history, the statistical power to detect recombination is reduced
when using unphased data (Darren Martin pers. comm.). This is because the noise
introduced by unphased base calling erodes any underlying signal of recombination. This
procedure is conservative and underestimates the true number of recombination events because
it reduces sequence similarity between the recombinant and parental sequence.
698 Phylogenetic evidence of recombination was required to confirm a recombination
699 event. Window sizes for each detection method were set to defaults. Only events for
700 which the software identified the parental sequences (i.e. no ‘unknowns’) without
701 ambiguous start and end position of the recombination block are reported and used in
702 the analyses. Furthermore, events were only considered to be genuine if they were
703 supported by at least three of the five detection algorithms. Hence, the estimates of
704 the number of recombination events are conservative.
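The acceptance criteria listed above can be summarised as a simple filter over candidate events. The sketch below is illustrative only: RDP3 applies its own tests internally, and the record layout, field names and the number of comparisons used in the example are hypothetical.

```python
from dataclasses import dataclass

METHODS = ("RDP", "GENECONV", "Maxchi", "Chimaera", "3Seq")

@dataclass
class CandidateEvent:
    p_values: dict                  # method name -> uncorrected P-value
    parents_known: bool             # no 'unknown' parental sequence
    breakpoints_unambiguous: bool   # start and end positions both resolved
    phylogenetic_support: bool      # event confirmed by phylogenetic evidence

def bonferroni(p_value, n_comparisons):
    """Bonferroni-corrected P-value, capped at 1."""
    return min(1.0, p_value * n_comparisons)

def is_accepted(event, n_comparisons, alpha=0.05, min_methods=3):
    """Apply the criteria from the text: at least three of the five methods
    significant after correction, known parents, unambiguous breakpoints,
    and phylogenetic support."""
    n_supporting = sum(
        1
        for method in METHODS
        if method in event.p_values
        and bonferroni(event.p_values[method], n_comparisons) < alpha
    )
    return (
        n_supporting >= min_methods
        and event.parents_known
        and event.breakpoints_unambiguous
        and event.phylogenetic_support
    )

# Hypothetical event detected by three of the five methods.
event = CandidateEvent(
    p_values={"RDP": 1e-4, "Maxchi": 2e-5, "3Seq": 5e-6, "GENECONV": 0.2},
    parents_known=True,
    breakpoints_unambiguous=True,
    phylogenetic_support=True,
)
print(is_accepted(event, n_comparisons=3))
```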
The effects of recombination on the sequence similarity between three genomes
were visualised using newly developed code in the R package HybRIDS (Hybrid
Recombination, Identification and Dating, Software, http://www.elsa.ac.uk/).
708 HybRIDS uses a colour triangle to visualise the sequence similarity between aligned
709 sequences. It calculates the colour of each 100 bp window based on the proportion of
710 SNPs shared between the pairwise sequences. All monomorphic sites were excluded in
711 this calculation. HybRIDS uses the additive colour system in which the primary
712 colours used are red, green, and blue. These colours are plotted on the corners of the
713 RGB colour triangle, which is shown in the legend as a reference. In cases where all
714 SNPs are shared between just two of the three races, the hybrid colour is an exact 50%
715 mix of two primary colours. The hybrid colours are yellow, purple and turquoise, and
716 these colours suggest recent gene exchange between the two races. At such
717 recombined regions, the third race receives its primary colour (because by definition, it
718 must be unique at the 100 bp window and completely dissimilar from the other two
719 races). Older recombined regions are predicted to have accumulated SNPs unique to
720 each race, causing the block of two hybridizing races to diverge. In the graph, the
721 colour of such areas contains more than 50% of the primary colour of that race. The
722 colours in the centre of the triangle are pale and reflect areas where the three races
723 share approximately similar numbers of polymorphisms. The SNPs in these pale
724 regions, and in regions where the colours are close to the primary colours, are more
725 likely caused by mutations than by genetic exchange. Note that primary colours can
also be assigned to regions that may have recombined but that originate from a race
that was not sampled, and hence whose polymorphisms were not included in this
analysis.
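The quantity that underlies the colouring described above, the proportion of SNPs shared by each pair of races within a window, can be sketched as follows. This is written in Python for illustration and is not the HybRIDS implementation (which is an R package); the function name is hypothetical, the three sequences are assumed to be aligned, ungapped and of equal length, and the mapping from proportions to RGB colours is deliberately left out because it is not specified in enough detail here.

```python
def window_sharing(seq_a, seq_b, seq_c, window=100):
    """For each window, return the proportion of polymorphic sites at which
    each pair of sequences carries the same base. Monomorphic sites are
    excluded; a window without polymorphic sites yields None."""
    if not (len(seq_a) == len(seq_b) == len(seq_c)):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, len(seq_a), window):
        end = start + window
        n_poly = ab = ac = bc = 0
        for a, b, c in zip(seq_a[start:end], seq_b[start:end], seq_c[start:end]):
            if a == b == c:
                continue            # monomorphic site, not counted
            n_poly += 1
            ab += a == b
            ac += a == c
            bc += b == c
        if n_poly == 0:
            results.append(None)
        else:
            results.append({"AB": ab / n_poly, "AC": ac / n_poly, "BC": bc / n_poly})
    return results

# Toy alignment with a window of 4 bases: races A and B are identical in the
# first window and differ from C; the second window is monomorphic.
toy = window_sharing("ACGTACGT", "ACGTACGT", "TCGAACGT", window=4)
print(toy)
```

A window in which one pair shares all SNPs, as in the first toy window, corresponds to the situation that the text describes as receiving a hybrid colour for the two similar races and a primary colour for the third.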
729 Dating recombination events
730 We used the sequence divergence of the recombination block between the
731 recombinant and the minor parent (i.e. the sequence donating the recombinant region)
732 to estimate the divergence time since a recombination event. We used two dating
733 methods, a binomial mass function and an analysis with the Bayesian Evolutionary
734 Analysis by Sampling Trees (BEAST) software package version 1.7 (Drummond &
Rambaut 2007). A binomial mass function was used to estimate the mean divergence
time of a block of given size with an observed number of SNPs. In order to correct for
mutation saturation, homoplasy, back mutations and transition / transversion ratios,
we converted the observed number of SNPs into the number of mutations using a JC
correction (Jukes & Cantor 1969). The probability of finding a number of SNPs less than
or equal to the observed number in a block of known size was calculated. The mean
time is found when the binomial mass function returns a probability value p=0.5. This
approach finds the most probable age of the recombination event, and it assumes that
since the recombination event, the block evolved neutrally over t years, and that each
base has the chance to mutate with a probability µ per year. The
algorithm uses a strict molecular clock, and because the mutation rate in oomycetes is
unknown, we assumed µ=10^-6 as well as µ=10^-7 per base per year. The lower value of
the mutation rate, µ=10^-7, was used as a more conservative estimate. Given that we
do not know the mutation rate of oomycetes, the estimated dates are merely an
approximation and are shown to illustrate that the exchanged blocks date back to a
wide range of evolutionary times. The simple dating method based on the binomial
mass function was compared to the more computationally intensive analysis with BEAST
by dating 20 recombination events from “contig 1” of AcNc2 and performing a
linear regression analysis to confirm that the faster binomial algorithm could be applied to all
675 recombination events. The principal aim of these analyses was to identify
whether or not recombinant regions span a range of dates (in line with the expectation
under an introgression model).
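The binomial dating procedure described above can be sketched as a search for the time t at which the cumulative binomial probability of observing no more than the corrected number of SNPs equals 0.5. This is a minimal reconstruction under stated assumptions, not the code used in the study: the function names are hypothetical, the per-site probability of a mutation after t years is taken as 1 - (1 - µ)^t, the JC-corrected count is rounded to the nearest integer, and no factor is applied for divergence along two lineages because the text does not specify one.

```python
import math

def jc_corrected_count(n_snps, block_len):
    """Jukes-Cantor corrected number of mutations in a block."""
    p = n_snps / block_len
    if p >= 0.75:
        raise ValueError("JC correction is undefined for p >= 0.75")
    distance = -0.75 * math.log(1.0 - 4.0 * p / 3.0)
    return distance * block_len

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed in log space for stability."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if k >= n else 0.0
    total = 0.0
    for i in range(k + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_term)
    return min(total, 1.0)

def site_mutation_prob(t_years, mu):
    """Probability that a site has mutated after t years: 1 - (1 - mu)^t."""
    return -math.expm1(t_years * math.log1p(-mu))

def estimate_age(n_snps, block_len, mu, t_max=1e9):
    """Find t (years) at which P(X <= corrected SNP count) = 0.5 by bisection."""
    k = round(jc_corrected_count(n_snps, block_len))
    low, high = 0.0, t_max
    for _ in range(200):
        mid = 0.5 * (low + high)
        # The cumulative probability decreases as t increases.
        if binom_cdf(k, block_len, site_mutation_prob(mid, mu)) > 0.5:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)

# Hypothetical block: 2 SNPs in 10,000 bp, dated under both assumed rates.
age_fast = estimate_age(2, 10000, 1e-6)
age_slow = estimate_age(2, 10000, 1e-7)
print(f"mu = 1e-6: {age_fast:.0f} years; mu = 1e-7: {age_slow:.0f} years")
```

Because the per-site probability depends on µ and t almost only through their product, the tenfold lower mutation rate yields an age estimate that is about ten times older, which is why the two rates bracket the estimates.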
BEAST Bayesian phylogenetic trees were constructed with an HKY+G
nucleotide substitution model under a strict molecular clock (µ = 10^-6) and a Yule tree
prior. We ran ten independent MCMC analyses, each of 10 million steps with a 10%
burn-in, for each of the 20 recombination events. MCMC chain mixing was assessed
using Tracer 1.5, which showed ESS >3000 for each statistic.
762 Note however that the divergence estimates made by the binomial mass
function and the analysis with BEAST are conservative (i.e. the true time of the
recombination event is probably more recent) given that we had only three
765 sequences in the analysis. Consequently, the ‘true’ parental sequence has probably not
766 been sampled, which means that the observed divergence is larger than that of the
767 actual parental (donor) sequence.
768 Substitution rates and selection
769 The substitution rates (non-synonymous substitution rate per non-
770 synonymous site (dN) and synonymous substitution rate per synonymous site (dS))
771 and the ratio of the dN/dS for the orthologous protein-coding sequences between three
isolates were estimated with the M0 model in PAML (Yang 2007). The dN/dS ratio is
traditionally used as an indicator of the strength and type of selective constraints acting
upon a gene. Values of dN/dS ≈ 1 indicate neutral evolution. Values of dN/dS
significantly less than unity indicate purifying selection, whereas values of dN/dS significantly
larger than unity suggest positive selection. Tajima’s D statistic (Tajima 1989)
was not calculated because we do not possess allele frequency data.
778 Acknowledgments
We thank Dr. M. Borhan for providing the Ac2V material; Jodie Pike for
780 technical assistance; M. Burrell, Dr. C Schudoma and Dr. D. MacLean for
781 computational assistance. We thank Sophien Kamoun and Kentaro Yoshida for helpful
782 comments on earlier drafts of the manuscript. The BEAST 1.6.2 analyses were carried
783 out on the High Performance Computing Cluster supported by the Research and
784 Specialist Computing Support service at the University of East Anglia. TSL was
785 supported by the grant DASTI (Danish Agency for Science, Technology and
786 Innovation) and European Research Council Advanced Investigator grant 233376
787 (ALBUGON); CvO, MM and BW are funded by Earth & Life Systems Alliance
788 (ELSA). EBH was supported by the grant NE/H020691 from NERC (UK Natural
789 Environment Research Council).
790 Author contributions
791 AG, MM conducted bioinformatics analysis; AG, MM, CvO and JJ wrote the
792 manuscript; CvO, BW, MM conducted analysis of recombination; KB and VC
performed the co-infection assays and molecular experiments; EK isolated races and
794 established propagation protocols; EK, TSL, ARS, KB and VC performed virulence
795 tests; JJ, CvO and EBH designed the experiments and corrected the manuscript. AG
796 and MM are joint first authors.
798 References
799 Baack, E.J. & Rieseberg, L.H., 2007. A genomic view of introgression and hybrid 800 speciation. Current Opinion in Genetics & Development, 17(6), pp.513–518. 801 Available at: 802 http://www.sciencedirect.com/science/article/pii/S0959437X07001669.
803 Barbetti, M.J. & Carter, E.C., 1986. Diseases of rapeseed., Perth: Western Australia 804 Department of Agriculture.
Baxter, L. et al., 2010. Signatures of Adaptation to Obligate Biotrophy in the Hyaloperonospora arabidopsidis Genome. Science, 330(6010), pp.1549–1551.
Biga, M.L.B., 1955. Review of the species of the genus Albugo based on the morphology of the conidia. Sydowia, 9, pp.339–358.
Blanco, E., Parra, G. & Guigo, R., 2002. Using geneid to Identify Genes. A. Baxevanis, ed., New York: John Wiley & Sons Inc.
Boni, M.F., Posada, D. & Feldman, M.W., 2007. An Exact Nonparametric Method for Inferring Mosaic Structure in Sequence Triplets. Genetics, 176(2), pp.1035–1047. Available at: http://www.genetics.org/content/176/2/1035.abstract.
Borhan, M.H. et al., 2008. WRR4 encodes a TIR-NB-LRR protein that confers broad-spectrum white rust resistance in Arabidopsis thaliana to four physiological races of Albugo candida. Molecular Plant-Microbe Interactions, 21(6), pp.757–768.
Choi, D. & Priest, M.J., 1995. A Key to the Genus Albugo. Mycotaxon, 53, pp.261–272.
Choi, Y.J. et al., 2011. Three new phylogenetic lineages are the closest relatives of the widespread species Albugo candida. Fungal Biology, 115(7), pp.598–607.
Choi, Y.J., Shin, H.D. & Thines, M., 2009. The host range of Albugo candida extends from Brassicaceae through Cleomaceae to Capparaceae. Mycological Progress, 8(4), pp.329–335.
Cooper, A.J. et al., 2008. Basic compatibility of Albugo candida in Arabidopsis thaliana and Brassica juncea causes broad-spectrum suppression of innate immunity. Molecular Plant-Microbe Interactions, 21(6), pp.745–756.
Dong, S. et al., 2014. Effector specialization in a lineage of the Irish potato famine pathogen. Science (New York, N.Y.), 343(6170), pp.552–5. Available at: http://www.ncbi.nlm.nih.gov/pubmed/24482481 [Accessed July 18, 2014].
Doolittle, W.F., 1999. Lateral genomics. Trends in Cell Biology, 9(12), pp.M5–M8. Available at: http://www.sciencedirect.com/science/article/pii/S0962892499016645.
Drès, M. & Mallet, J., 2002. Host races in plant–feeding insects and their importance in sympatric speciation. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 357(1420), pp.471–492. Available at: http://rstb.royalsocietypublishing.org/content/357/1420/471.abstract.
Ersek, T., English, J.T. & Schoelz, J.E., 1995. Creation of Species Hybrids of Phytophthora with Modified Host Ranges by Zoospore Fusion. Phytopathology, 85(11), pp.1343–1347.
Farrer, R.A. et al., 2009. De novo assembly of the Pseudomonas syringae pv. syringae B728a genome using Illumina/Solexa short sequence reads. Fems Microbiology Letters, 291(1), pp.103–111.
Haas, B.J. et al., 2009. Genome sequence and analysis of the Irish potato famine pathogen Phytophthora infestans. Nature, 461(7262), pp.393–398.
Hacquard, S. et al., 2013. Mosaic genome structure of the barley powdery mildew pathogen and conservation of transcriptional programs in divergent hosts. Proceedings of the National Academy of Sciences, 110(24), pp.E2219–E2228. Available at: http://www.pnas.org/content/early/2013/05/21/1306807110.abstract.
Harper, F.R. & Pittman, U.J., 1974. Yield loss by Brassica campestris and B. napus from systemic stem infection by Albugo cruciferarum. Phytopathology, 64, pp.408–410.
Heath, L. et al., 2006. Recombination patterns in aphthoviruses mirror those found in other picornaviruses.
Hiura, M., 1930. Biological forms of Albugo candida (Pers.) Kuntze on some cruciferous plants. Japanese Journal of Botany, 5, pp.1–20.
Hobolth, A. et al., 2007. Genomic Relationships and Speciation Times of Human, Chimpanzee, and Gorilla Inferred from a Coalescent Hidden Markov Model. PLoS Genet, 3(2), p.e7. Available at: http://dx.plos.org/10.1371/journal.pgen.0030007.
Kemen, E. et al., 2011. Gene Gain and Loss during Evolution of Obligate Parasitism in the White Rust Pathogen of Arabidopsis thaliana. Plos Biology, 9(7).
Kole, C. et al., 2002. Linkage mapping of genes controlling resistance to white rust (Albugo candida) in Brassica rapa (syn. campestris) and comparative mapping to Brassica napus and Arabidopsis thaliana. Genome, 45(1), pp.22–27.
Lamour, A.K.H. et al., 2012. Genome sequencing and mapping reveal loss of heterozygosity as a mechanism for rapid adaptation in the vegetable pathogen Phytophthora capsici. Molecular Plant-Microbe Interactions, 25, pp.1350–1360.
Lebrun, I. et al., 2009. Bacterial Toxins: An Overview on Bacterial Proteases and their Action as Virulence Factors. Mini-Reviews in Medicinal Chemistry, 9(7), pp.820–828.
Links, M. et al., 2011. De novo sequence assembly of Albugo candida reveals a small genome relative to other biotrophic oomycetes. BMC Genomics, 12(1), p.503. Available at: http://www.biomedcentral.com/1471-2164/12/503.
Liu, J.Q., P, P. & Rimmer, S.R., 1996. Development of monogenic lines for resistance to Albugo candida from a Canadian Brassica napus cultivar. Phytopathology, 86(9), pp.1000–1004.
Lomsadze, A. et al., 2005. Gene identification in novel eukaryotic genomes by self-training algorithm. Nucleic Acids Research, 33(20), pp.6494–6506.
Martin, D.P. et al., 2010. RDP3: a flexible and fast computer program for analyzing recombination. Bioinformatics, 26(19), pp.2462–2463.
Martin, F. & Kamoun, S. eds., 2012. Effectors in Plant-Microbe Interactions., Wiley-Blackwell.
McHale, L. et al., 2006. Plant NBS-LRR proteins: adaptable guards. Genome Biology, 7(212).
Mckinney, E.C. et al., 1995. Sequence-Based Identification of T-DNA Insertion Mutations in Arabidopsis - Actin Mutants Act2-1 and Act4-1. Plant Journal, 8(4), pp.613–622.
Meena, P.D. et al., 2002. Yield loss in Indian mustard due to White rust and effect of some cultural practices on Alternaria blight and White rust severity. Brassica, 4(1, 2), pp.18–24.
Mehrabi, R. et al., 2011. Horizontal gene and chromosome transfer in plant pathogenic fungi affecting host range. Fems Microbiology Reviews, 35(3), pp.542–554.
Monod, M. et al., 2002. Secreted proteases from pathogenic fungi. International Journal of Medical Microbiology, 292(5-6), pp.405–419.
Padidam, M., Sawyer, S. & Fauquet, C.M., 1999. Possible Emergence of New Geminiviruses by Frequent Recombination. Virology, 265(2), pp.218–225. Available at: http://www.sciencedirect.com/science/article/pii/S0042682299900569.
Pamilo, P. & Nei, M., 1988. Relationship between gene trees and species trees. Molecular Biology and Evolution, 5, pp.568–583.
Parker, J.E. et al., 1996. Characterization of eds1, a mutation in Arabidopsis suppressing resistance to Peronospora parasitica specified by several different RPP genes. Plant Cell, 8(11), pp.2033–2046.
Peng, B. & Kimmel, M., 2005. simuPOP: a forward-time population genetics simulation environment. Bioinformatics, 21(18), pp.3686–3687. Available at: http://bioinformatics.oxfordjournals.org/content/21/18/3686.abstract.
Petrie, G., 1988. Races of Albugo candida (white rust and staghead) on cultivated Cruciferae in Saskatchewan. Canadian Journal of Plant Pathology, 10, pp.142–150.
Ploch, S. et al., 2010. Evolution of diversity in Albugo is driven by high host specificity and multiple speciation events on closely related Brassicaceae. Molecular Phylogenetics and Evolution, 57(2), pp.812–820.
Posada, D. & Crandall, K.A., 2001. Evaluation of methods for detecting recombination from DNA sequences: Computer simulations. Proceedings of the National Academy of Sciences of the United States of America, 98(24), pp.13757–13762.
Poulin, R. & Keeney, D.B., 2014. Host specificity under molecular and experimental scrutiny. Trends in Parasitology, 24(1), pp.24–28. Available at: http://www.cell.com/trends/parasitology/abstract/S1471-4922(07)00281-4.
Pound, G.S. & Williams, P.H., 1963. Biological races of Albugo candida. Phytopathology, 63, pp.1146–1149.
Rimmer, S.R., Mathur, S. & Wu, C.R., 2000. Virulence of isolates of Albugo candida from western Canada to Brassica species. Canadian Journal of Plant Pathology, 22(3), pp.229–235.
Saharan, G.S. & Verma, P.R., 1992. White Rusts: a review of economically important species., International Development Research Centre, Ottawa, Ontario.
Seehausen, O., 2004. Hybridization and adaptive radiation. Trends in Ecology & Evolution, 19(4), pp.198–207. Available at: http://www.sciencedirect.com/science/article/pii/S0169534704000047.
Smith, J.M., 1992. Analyzing the mosaic structure of genes. Journal of Molecular Evolution, 34(2), pp.126–129. Available at: http://dx.doi.org/10.1007/BF00182389.
Soanes, D.M., Richards, T.A. & Talbot, N.J., 2007. Insights from Sequencing Fungal and Oomycete Genomes: What Can We Learn about Plant Disease and the Evolution of Pathogenicity? The Plant Cell Online, 19(11), pp.3318–3326. Available at: http://www.plantcell.org/content/19/11/3318.short.
Stanke, M. et al., 2006. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. Bmc Bioinformatics, 7.
Stukenbrock, E., 2013. Evolution, selection and isolation: a genomic view of speciation in fungal plant pathogens. New Phytol, 199(4), pp.895–907.
Stukenbrock, E.H. et al., 2012. Fusion of two divergent fungal individuals led to the recent emergence of a unique widespread pathogen species. Proceedings of the National Academy of Sciences, 109(27), pp.10954–10959.
Stukenbrock, E.H. & Bataillon, T., 2012. A population genomics perspective on the emergence and adaptation of new plant pathogens in agro-ecosystems. Plos Pathogens, 8(9), p.e1002893.
Thines, M. et al., 2009. A new species of Albugo parasitic to Arabidopsis thaliana reveals new evolutionary patterns in white blister rusts (Albuginaceae). Persoonia, 22, pp.123–128.
Thines, M., 2014. Phylogeny and evolution of plant pathogenic oomycetes—a global overview. European Journal of Plant Pathology, 138(3), pp.431–447. Available at: http://link.springer.com/10.1007/s10658-013-0366-5 [Accessed August 19, 2014].
Thompson, J., 2005. The geographic mosaic of coevolution., Chicago: University of Chicago Press.
Ungerer, M.C. et al., 1998. Rapid hybrid speciation in wild sunflowers. Proceedings of the National Academy of Sciences, 95(20), pp.11757–11762. Available at: http://www.pnas.org/content/95/20/11757.abstract.
Voglmayr, H. & Greilhuber, J., 1998. Genome size determination in Peronosporales (Oomycota) by Feulgen image analysis. Fungal Genetics and Biology, 25(3), pp.181–195.
Walker, J. & Priest, M.J., 2007. A new species of Albugo on Pterostylis (Orchidaceae) from Australia: confirmation of the genus Albugo on a monocotyledonous host. Australian Plant Pathology, 36, pp.181–185.
Zerbino, D.R. & Birney, E., 2008. Velvet: Algorithms for de novo short read assembly using de Bruijn graphs. Genome Research, 18(5), pp.821–829.

Figure titles and legends

Figure 1. Phylogeny shows that the five sequenced Albugo candida isolates fall into three divergent races.
The BEAST tree, constructed using contig 1 (398,508 bp), shows the divergence among the three A. candida races (AcEm2 and AcNc2; AcBoT and AcBoL; Ac2V). Blue bars represent the 95% Highest Posterior Density (HPD) intervals. We used a strict molecular clock fixed at 1.0 so that the relationships are shown on the scale of substitutions per site.

Figure 2. Comparison of A. candida races using alignments of Illumina reads against the AcNc2 assembly. A - Positive correlation between the depths of coverage of the reference assembly (AcNc2) by the Ac2V and AcBoT reads. For reference contigs shorter than 20 kb, the mean coverage was calculated across the whole contig length and log-transformed. For contigs longer than 20 kb, the mean coverage was calculated in sliding windows of 20 kb and log-transformed. The Y-axis shows the log-transformed depth of reference coverage by the Ac2V reads; the X-axis shows the log-transformed depth of reference coverage by the AcBoT reads. B - Nucleotide identity amongst the homologous genomic regions of Ac2V, AcBoT and AcNc2. The mean identity was calculated in sliding windows of 20 kb.

Figure 2 Supplement 1. Coverage of the reference assembly (AcNc2) by Ac2V and AcBoT. Proportion of mean coverage is measured relative to the depth of coverage of the reference reads mapped back to the reference assembly. The proportion of the AcNc2 assembly not covered by AcBoT and/or Ac2V reads (≤ 10% of the mean coverage) was 3.17% (971,030 bp) in Ac2V and 4.21% (1,285,877 bp) in AcBoT.

Figure 3. Nucleotide polymorphism within and between A. candida isolates. Mean (± 5–95% CI) polymorphism, expressed as the percentage of observed heterozygous sites (solid symbols) and the percentage nucleotide divergence (open symbols) at contig 1. Confidence intervals were calculated using a bootstrap of contig 1 after removal of indels. Isolates infecting the same host plant (i.e. AcBoT-AcBoL and AcEm2-AcNc2) show little nucleotide divergence, which indicates that they are genotypically almost identical (i.e. diverged by less than 0.05%). Nevertheless, the Brassica oleracea infecting race (AcBoT and AcBoL) possesses a relatively high heterozygosity compared to the isolates of the Arabidopsis thaliana infecting race. Moreover, most of this heterozygous polymorphism is shared (low nucleotide divergence), and the presence of the majority of heterozygous sites in both isolates is consistent with clonal reproduction.

Figure 4. Variation in sequence similarity between races. A - Alignment of nucleotides between positions 158,779 and 167,382 within “contig 1” of three A. candida races (AcNc2, AcBoT and Ac2V), illustrating two recombination blocks coloured blue and green. Both blocks show high sequence similarity between races. Also shown is the sequence divergence between the blocks. Alignment gaps and monomorphic sites have been removed. B - The sequence similarity at “contig 1” amongst three A. candida races was visualised using the colours of an RGB colour triangle in the software HybRIDS (http://www.norwichresearchpark.com/HybRIDS). Areas where two contigs have the same colour (yellow, purple or turquoise) are indicative of two races sharing the same polymorphisms. The linear plot shows the proportion of SNPs shared in each of the three pairwise comparisons between the races. The X-axis shows the actual base position. The graphs were made in the R package HybRIDS (http://www.norwichresearchpark.com/HybRIDS).

Figure 5. Age of recombination blocks. A - Age of the 675 recombination blocks (mutation rate of μ = 10^-6) estimated using the binomial mass function. B - Boxplot of the median (plus 1st and 3rd quartile) log-age of recombination events in contigs. Only contigs with 8 or more events are shown.
There is no significant difference in the age of events between contigs (GLM: F22,233 = 1.06, p = 0.387).

Figure 6. PCR with race-specific markers on DNA prepared from plant tissue generated from co-infection assays. A - Co-infection assay of AcNc2 followed by Ac2V onto Ws-eds1 and Ws-2. B - Co-infection of AcBoT followed by Ac2V onto Ws-eds1 and B. oleracea (B.o). C - Co-infection of Ac2V followed by AcNc2 onto Ws-eds1 and B. juncea (B.j). Bands highlighted in orange indicate amplification of secondary inoculum on usually non-host plants upon primary inoculation with virulent A. candida. These experiments were repeated multiple times with similar results (see Supplementary file 6).

Tables

Table 1. Virulence of the A. candida races on different plant host accessions.

A. candida race   A. thaliana    B. rapa       B. juncea     B. oleracea
                  +      −       +      −      +      −      +      −
AcNc2             137    219     0      1      0      1      0      18
AcBoT             1a     34      0      1      0      2      15     0
Ac2V              1ab    107     1c     4c     6c     0c     0      2

+ Host-pathogen compatible interactions (number of susceptible accessions)
− Host-pathogen incompatible interactions (number of resistant accessions)
a A. thaliana Ws-eds1 (enhanced disease susceptibility) mutants were susceptible to all tested A. candida races.
b Under laboratory conditions, the cotyledons of the A. thaliana accession Ws-3 were found to be susceptible to Ac2V (Cooper et al. 2008b).
c Data from (Rimmer et al. 2000) incorporated; in this study, one B. rapa cultivar (CrGC1-18, rapid-cycling accession) was infected by the Ac2V race and four other tested cultivars (“Torch”, “Colt”, “Horizon”, “Reward”) were incompatible with the Ac2V race. All analysed cultivars of B. juncea (CrGC4-1S, “Burgonde”, “Domo”, “Cutlass”) were susceptible to Ac2V.

Table 2. Summary of the A. candida AcNc2, AcEm2, AcBoT, AcBoL and Ac2V genome assemblies.
                                 AcNc2        AcEm2        AcBoT        AcBoL        Ac2V
Number of Contigs                5,212        11,581       11,929       11,143       12,210
N50 Length (bp)                  41,078       29,326       14,673       14,953       24,005
N50 Number                       231          284          584          581          353
Mean Contig Length (bp)          6,610        2,668        2,781        2,998        2,770
Assembly Size (bp)               34,454,169   33,409,146   33,184,526   33,409,856   33,823,601
GC Content (%)                   43.19%       43.09%       43.15%       43.15%       43.11%
CEGMA Gene Space Coverage (%)*   93.55%       92.74%       93.55%       93.13%       92.34%
Average Genome Coverage          160          150          140          140          200
Repeat Content (%)               17.4%        NA           NA           NA           NA
Predicted genes                  10,907       NA           NA           NA           NA

* Completeness of the gene space in the different genome assemblies was estimated using the CEGMA pipeline; the presence of over 90% of core eukaryotic genes in the assembly serves as an indication of an overall complete gene space.

Supplementary file captions

Supplementary file 1
List of Arabidopsis thaliana and Brassica spp. accessions assayed in virulence tests with Albugo candida race type isolates AcBoT, Ac2V and AcNc2.

Supplementary file 2
List of Arabidopsis thaliana accessions assayed in virulence tests with Albugo candida isolates AcNc2 and Ac2V.

Supplementary file 3
Polymorphisms between Albugo candida race genomic regions verified with Sanger sequencing.

Supplementary file 4
List of the predicted recombination blocks generated by the RDP3 software.

Supplementary file 5
Details of race-specific primers.

Supplementary file 6
Details of co-infections assays.

[Figure image: tree with tips AcNc2, AcEm2, AcBoT, AcBoL and Ac2V; scale axis from 0.0 to 0.004]
[Figure image, panels A and B: sequences AcNc2, AcBoT and Ac2V; marked positions 158779, 162760, 165018, 167373 and 167854]
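
Note on the assembly statistics in Table 2. The N50 Length and N50 Number rows can be illustrated with a short sketch. It assumes the conventional definition: N50 Length is the length of the shortest contig in the smallest set of longest contigs that together span at least half of the assembly, and N50 Number is the size of that set. The function name `n50_stats` and the toy contig lengths are hypothetical and are not taken from the authors' pipeline.

```python
def n50_stats(contig_lengths):
    """Return (N50 length, N50 number) for a list of contig lengths in bp.

    Assumes the conventional N50 definition; this is an illustrative
    helper, not the procedure used to produce Table 2.
    """
    # Walk through contigs from longest to shortest.
    lengths = sorted(contig_lengths, reverse=True)
    half_assembly = sum(lengths) / 2
    running_total = 0
    for count, length in enumerate(lengths, start=1):
        running_total += length
        # Stop at the first contig that brings the running total
        # to at least half of the total assembly size.
        if running_total >= half_assembly:
            return length, count
    raise ValueError("no contigs supplied")


# Toy example with made-up contig lengths (total 30 bp, half is 15 bp).
toy_lengths = [10, 8, 6, 4, 2]
n50_length, n50_number = n50_stats(toy_lengths)
mean_length = sum(toy_lengths) / len(toy_lengths)
```

In the toy example the two longest contigs (10 and 8 bp) already span more than half of the 30 bp total, so the N50 length is 8 and the N50 number is 2. Under this definition, a larger N50 Length together with a smaller N50 Number indicates a more contiguous assembly, which is how the AcNc2 column compares with the other four.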