1 Evidence for suppression of immunity as a driver for genomic

2 introgressions and host range expansion in races of candida, a generalist

3 parasite

4 Mark McMullana,e†, Anastasia Gardinera†, Kate Baileya, Eric Kemenc, Ben J.

5 Wardb, Volkan Cevika, Alexandre Robert-Seilaniantza, Torsten Schultz-Larsena, Alexi

6 Balmuthd, Eric Holubf, Cock van Oosterhoutb‡, Jonathan D. G. Jonesa‡

7 a - The Sainsbury Laboratory, Norwich Research Park, Norwich, NR47UH, UK;

8 b - School of Environmental Sciences, University of East Anglia, Norwich Research

9 Park, Norwich NR4 7TJ, UK; c - Max Planck Research Group Fungal Biodiversity,

10 Max Planck Institute for Plant Breeding Research, Carl-von-Linne Weg 10, Cologne,

11 50829, Germany; d - J.R. Simplot Company , 5369 Irving street, Boise, Idaho 83706,

12 USA; e - The Genome Analysis Centre, Norwich Research Park, Norwich, NR47UH,

13 UK; f - University of Warwick, School of Life Sciences, Warwick Crop Centre,

14 Wellesbourne, CV35 9EF, UK.

15

16 † Authors contributed equally to this work.

17 ‡ Corresponding Authors: Jonathan D. G. Jones, The Sainsbury Laboratory,

18 Norwich Research Park, Norwich, NR47UH, UK; tel +44(0)1603450400;

19 [email protected] and Cock van Oosterhout, School of Environmental

20 Sciences, University of East Anglia, Norwich Research Park, Norwich NR47TJ, UK;

21 tel. +44(0)1603592921; [email protected]

22 Competing interests. The authors declare that they have no competing financial,

23 professional or personal interests that might have influenced the performance or

24 presentation of the work described in this manuscript.

1

25

26

27 Abstract

28 How generalist parasites with wide host ranges can evolve is a central question in

29 parasite evolution. is an obligate biotrophic parasite that consists of

30 many physiological races that each specialize on distinct Brassicaceae host species. By

31 analyzing genome sequence assemblies of five isolates, we show they represent three

32 races that are genetically diverged by ~1%. Despite this divergence, their genomes are

33 mosaic-like, with ~25% being introgressed from other races. Sequential infection

34 experiments show that infection by adapted races enables subsequent infection of

35 hosts by normally non-infecting races. This facilitates introgression and the exchange

36 of effector repertoires, and may enable the evolution of novel races that can undergo

37 clonal population expansion on new hosts. We discuss recent studies on hybridization

38 in other such as yeast, Heliconius butterflies, Darwin’s finches, sunflowers

39 and cichlid fishes, and the implications of introgression for pathogen evolution in an

40 agro-ecological environment.

41 Introduction

42 Most parasites have a restricted host range and are often unable to exploit even closely

43 related hosts (Thompson 2005; Poulin & Keeney 2014). Compared to necrotrophs that

44 reproduce on dead plant material, obligate biotrophic parasites can only reproduce on

45 living tissue, and thus are intimately associated with their hosts (Thines 2014). This

46 might be expected to result in host specialization. The adaptive evolution of, for

47 example, new effectors that enable more efficient exploitation of one host species,

48 increases the risk of detection in other host species by triggering their immune system

2

49 (Martin & Kamoun 2012). Due to this trade-off, the saying “Jack of all trades, master

50 of none” is especially true for obligate biotrophic parasites, because natural selection

51 that maximises parasite fitness on one particular host species might lead to

52 specialisation and reduced host range (e.g. Dong et al. 2014). Yet there are generalist

53 biotrophic parasites that appear to have overcome this evolutionary dilemma and show

54 virulence on diverse hosts.

55 Some “generalist” parasite species have solved the dilemma by evolving

56 multiple specialised races, each of which infect different hosts. For example, the

57 eukaryotic order Albuginales consists entirely of obligate biotrophic

58 pathogens that cause disease on a broad range of plant hosts (Biga 1955; Choi & Priest

59 1995; Walker & Priest 2007). Its largest genus, Albugo, comprises ~50 (usually)

60 specialist pathogens (Choi et al. 2009; Thines et al. 2009; Ploch et al. 2010; Choi et al.

61 2011), yet A. candida (Pers.) Roussel can infect over 200 species of plants in 63

62 genera from the families of Brassicaceae, Cleomaceae and Capparaceae (Saharan &

63 Verma 1992; Choi et al. 2009). A. candida infections are the causal agent of “white

64 blister rust” disease resulting in 9 – 60% losses on economically important oilseed and

65 vegetable brassica crops. (Saharan & Verma 1992; Harper & Pittma 1974; Meena et

66 al. 2002; Barbetti & Carter 1986). A. candida consists of different physiological races

67 that each usually show high host specificity (Hiura 1930; Pound & Williams 1963;

68 Petrie 1988). ~24 races of A. candida have been defined, based primarily on their host

69 range (Saharan & Verma 1992).

70 A. candida is a diploid organism that reproduces both asexually and

71 sexually (Holub et al. 1995), but the relative importance of both reproductive modes

72 is not well established. During asexual reproduction, diploid zoospores are formed in a

73 special propagule (the zoosporangium) on a thallus of hyphae beneath the leaf

3

74 epidermis. White blisters comprising large numbers of dehydrated sporangia

75 eventually rupture the epidermis to release inoculum for dispersal. When reproducing

76 sexually, fertilization between two isolates results in non-motile diploid thick-walled

77 oospores that can resist extreme temperatures and desiccation. Although the relative

78 importance of different mechanisms of reproduction in the Albugo life cycle remains

79 poorly understood, clonal reproduction enables rapid population expansion, especially

80 in genetically uniform crop monocultures.

81 A. candida is considered a single species despite the fact it comprises several

82 specialised physiological races that colonize different host plants (cf. Drès & Mallet

83 2002). According to evolutionary and population genetic theory, adaptations and

84 trade-offs associated with host-specialisation combined with strong population

85 structuring can result in adaptive radiation and speciation (Stukenbrock 2013; Abbott

86 et al. 2013). Conceivably, A. candida is an adaptive radiation “in progress,” and the

87 broad host range is realised by ongoing specialisation of independent races that are on

88 the road to speciation (Drès & Mallet 2002). This raises the important question; does

89 parasite specialization inevitably lead to speciation?

90 Albugo spp. infection strongly suppresses host innate immunity and Albugo spp.

91 are unique compared to other microbial plant pathogens in enhancing host

92 susceptibility to secondary infection by otherwise avirulent pathogens, including

93 downy mildews (Cooper et al. 2008a). It has been suggested that enhanced

94 susceptibility imposed by Albugo might accelerate adaptation of other pathogen

95 species to an Albugo-susceptible host (Thines 2014). However, no evolutionary

96 rationale has been put forward to explain why it might be adaptive for Albugo sp. to

97 render its hosts so susceptible to other pathogens that could compete for access to the

98 same resources (Cooper et al. 2008a). Arguably, suppression of host innate immunity

4

99 could facilitate cohabitation of distinct physiological races and thus may enable

100 genetic exchange between them. Introgression, here defined as the introduction by

101 recombination of syntenic nucleotide variation from a parental donor race into the

102 genome of a recipient race (Hedrick 2013), could slow down genetic divergence, and

103 hence, retard speciation. On the other hand, introgression between races that are well-

104 adapted to exploit different host plants could be maladaptive and strongly selected

105 against because hybrids will inherit effector alleles derived from both parental races.

106 Given that immune recognition of even a single effector is sufficient to trigger the

107 immune response and stop an infection, hybrids that possess an expanded repertoire of

108 effector alleles are likely to have a strong fitness disadvantage on most potential host

109 plants.

110 Much of the ecology and evolution of A. candida remains unknown, but with its

111 many specialized races and a broad host range, this “generalist” plant pathogen is a

112 fascinating study organism. The questions we addressed in the present study are: (1)

113 Are the distinct physiological A. candida races genetically isolated and “on the road to

114 speciation”? (2) Does suppression of host innate immunity enable cohabitation and

115 growth of races with non-overlapping host ranges? To answer these questions, we

116 generated genome sequence assemblies of five isolates that were collected from four

117 host species (Brassica oleracea, B. juncea, Capsella bursa-pastoris, and Arabidopsis

118 thaliana). We sequenced one isolate of one race, and two isolates of each of two

119 additional races. We show that these races are not genetically isolated despite having

120 non-overlapping host ranges. Recombination analysis shows there is widespread

121 genetic exchange between A. candida races, and that hybridisation leading to

122 introgression has occurred numerous times, which include exchanges in the recent

123 past. To explain this observation, we examined whether pre-infection of Arabidopsis

5

124 and Brassica with virulent A. candida races results in enhanced host susceptibility, and

125 found that pre-infection with a virulent strain enables proliferation of an A. candida

126 isolate that would otherwise not colonize that host. For the two races with two isolates,

127 we show that population expansion is by clonal reproduction. We discuss the impact

128 of genetic exchange on A. candida evolution, and consider the implications for

129 pathogen evolution and reproduction in an agro-ecological environment.

130

131 Results

132 A. candida host specificity: single race isolates are host specific

133 AcNc2 was recovered from infected leaves of A. thaliana Eri-1 field-grown

134 plants in Norwich (UK) in 2007. AcEm2 was isolated from wild Capsella bursa-

135 pastoris in Kent (UK) in 1993 (Borhan et al. 2008). Isolate AcBoT was harvested from

136 infected inflorescences of B. oleracea cultivar “Bordeaux F1” in Lincolnshire in May

137 2009, and another isolate AcBoL was harvested from infected leaves of B. oleracea in

138 Lincolnshire (UK) in January 2009. Races were single spore purified (E Kemen et al.

139 2011). The Ac2V isolate virulent on B. juncea was provided by M. Borhan

140 (Agriculture and Agri-Food, Canada (Links et al. 2011)) and single-spore purified.

141 We tested the virulence of single race infections of AcNc2, Ac2V and AcBoT

142 on different host species and cultivars. AcNc2 was propagated on A. thaliana Ws-2.

143 Two B. juncea cultivars (“Cutlass” and “Czerniac”) and 18 cultivars of B. oleracea

144 were resistant to AcNc2. We screened 356 A. thaliana accessions for their

145 susceptibility to AcNc2 and 38.5 % were susceptible (Table 1, Supplementary files 1,

146 2). Race Ac2V was originally isolated in Canada on B. juncea (Rimmer et al. 2000).

147 We confirmed virulence of Ac2V on B. juncea by spray inoculation of B. juncea

6

148 “Cutlass” and “Czerniac”. On three-week-old A. thaliana plants, all 107 tested

149 accessions were resistant. Two B. oleracea cultivars were also resistant to Ac2V

150 (Table 1, Supplementary files 1, 2). Race AcBoT was virulent on all fifteen tested

151 cultivars of B. oleracea. In contrast, 34 accessions of A. thaliana were fully resistant to

152 AcBoT, as were B. rapa and B. juncea cultivars (Table 1, Supplementary file 1).

153 These tests confirm that A. candida races show pronounced host specificity to distinct

154 host species. While we cannot prove that there are no host species in nature that

155 support growth of more than one of the three races we define here, our experiments

156 and all literature strongly suggests that Ac2V only grows on B. juncea, and on some

157 accessions of B. rapa, a diploid ancestor of tetraploid B. juncea (Kole et al. 2002),

158 AcBoT and AcBoL only grow on B. oleracea, and AcNc2 and AcEm2 can only grow

159 on a subset of Arabidopsis and Capsella genotypes. Genetic exchange between races

160 is unlikely to occur unless they colonize the same host. In our study, only the immune-

161 compromised A. thaliana Ws-2-eds1 mutant was susceptible to all races.

162

163 Genome assemblies of A. candida isolates

164 The AcNc2 A. candida assembly was used as the reference in this study, and

165 comprises 34 Mb in 5212 contigs of ~160-fold coverage (Table 2). We assembled

166 ~73% of an estimated 45 Mb genome of A. candida AcNc2 (Voglmayr & Greilhuber

167 1998; Links et al. 2011). In a previous study, a similar proportion (76%) of the A.

168 candida Ac2V race genome was assembled (Links et al. 2011). The unassembled part

169 of the genome (~11 Mb) is likely to include repeats and duplicated sequences. Repeat

170 sequences constitute ~17.4% of the AcNc2 assembly. Approximately 8% of annotated

7

171 repeats represent collapsed regions with coverage several times higher than average,

172 so the real repeat content may be higher.

173 Ab initio gene predictions were conducted with several gene prediction

174 programs, resulting in 10,907 predicted gene models. About 90% (9,830) of the

175 predicted proteins have homologous sequences in the proteome of A. candida Ac2V

176 race (15,824 genes) (Links et al. 2011). In about one thousand cases, when we predict

177 a single copy gene in the AcNc2 race, Links and co-authors (2011) have predicted

178 multi-gene families in the Ac2V race, explaining the discrepancy in the predicted gene

179 number for two assemblies.

180 Only 37% of AcNc2 proteins showed significant sequence similarity to known

181 proteins (BLASTP E-value ≤ 10-5). Using the Tribe-MCL algorithm, 3522 genes (32%

182 of the predicted AcNc2 gene repertoire) were clustered into 1020 gene families. The

183 largest gene tribes are protein kinases (187 members), kinesin- (59 genes) and myosin-

184 (44 genes) like proteins and 35 genes homologous to the secreted “CHXC” proteins of

185 (E Kemen et al. 2011).

186 915 genes in the AcNc2 assembly are predicted to encode putative proteins with

187 amino-terminal secretory signal peptides, but no trans-membrane domain. Only 34%

188 of the predicted secretome was functionally annotated (BLASTP, E-value ≤ 10-5),

189 including 115 proteins (proteases, hydrolases, elicitin-like proteins, elicitors, protease

190 inhibitors) that could be involved in plant cell wall degradation and protection against

191 host defense enzymes. In addition to the 35 CHxC proteins (Tyler et al. 2006), further

192 candidate virulence factors were identified including 19 homologs of the Phytophthora

193 Crinkler effectors (Haas et al. 2009), and another 23 secreted proteins with “RXLR”

194 and 35 proteins with similar “RXLQ” motifs. Both motifs are located in the N-

8

195 terminal part of protein after the predicted signal peptide, thus resembling the RXLR

196 effectors of P. infestans and H. arabidopsidis (Baxter et al. 2010; Haas et al. 2009),

197 but not having any other significant sequence similarity to these proteins.

198 After carrying out several assemblies based on different k-mer lengths, the

199 quality of each assembly was assessed with various parameters and one best assembly

200 was chosen for each isolate (Table 2). The high similarity of the five A. candida

201 isolates enabled us to conclude we had sequenced three ‘races’, within which AcNc2

202 and AcEm2 were isolates of the same race and AcBoT and AcBoL were also isolates

203 of the same race (Figure 1). Therefore, genome comparisons were first conducted on

204 one representative from each race (AcNc2, Ac2V and AcBoT), from each of which

205 ~33-34 Mb of genome was assembled (Table 2).

206

207 Genome-wide similarity between races with non-overlapping host range

208 To assess the overall genome-wide similarity between races, we performed

209 alignments of reads against the AcNc2 reference assembly. For the majority of the

210 AcNc2 genome, we observed a significant positive correlation between read depth in

211 the reference assembly and mapping depth of the Ac2V and AcBoT reads (r=0.65,

212 P<2.2e-16; Figure 2A). Some AcNc2 regions (3-4% of the assembly) showed low or

213 zero coverage by Ac2V and/or AcBoT reads (Figure 2 Supplement 1), suggesting the

214 presence of highly divergent or unique regions amongst the races. These are gene

215 sparse regions (150 and 234 genes predicted in the AcNc2, respectively), without

216 apparent enrichment for genes encoding for secreted proteins (χ2 = 0.11, d.f. = 1, p >

217 0.7). Amplification of randomly selected AcNc2 genes from these regions revealed

9

218 that four of the selected five genes are indeed absent/or highly diverged in the AcBoT

219 and Ac2V races, and present in the AcNc2 genome.

220 The overall mean level of nucleotide identity in the homologous genomic

221 regions amongst races is ~99% (Figure 2B). We verified 25 polymorphic genomic

222 regions by Sanger sequencing (Supplementary file 3). We used the longest of all

223 contigs from AcNc2, ‘contig 1’ (398,508 bp), to compare the levels of divergence

224 between races versus the number of heterozygous positions within each race. An

225 extremely low proportion of sites (0.03% and 0.01%) on ‘contig 1’ are heterozygous

226 within AcNc2, AcEm2 and Ac2V races, respectively (Figure 3). Races AcBoT and

227 AcBoL are more heterozygous than Ac2V and AcNc2, with 0.65% of nucleotide

228 positions in ‘contig 1’ being heterozygous in AcBoT (Figure 3). Importantly, >97% of

229 all heterozygous positions are shared in AcBoL and AcBoT (see below). In between-

230 race comparisons, ~1.0% of nucleotide positions on ‘contig 1’ have diverged between

231 AcBoT, Ac2V and AcNc2.

232

233 Mosaic-like genome structure of A. candida races

234 Polymorphisms are not homogeneously distributed among A. candida races.

235 Some regions of the genome are identical for up to ten thousand base pairs, whereas

236 the local nucleotide identity is as low as 89% in other regions of up to 5 kb (Figure

237 2B). We examined 133 contigs (12,373,253 bp), covering 38% of the reference

238 assembly. Stretches of nucleotide similarity amongst races are distributed in a block-

239 like structure; there are regions where AcNc2 is highly similar (or identical) to AcBoT

240 and significantly (nucleotide divergence π>1%) diverged from Ac2V, and vice versa

241 (Figure 4). By using multiple different algorithms that can detect recombination in

10

242 DNA sequence data incorporated in the software RDP3 (Martin et al. 2010), we

243 examined whether this pattern can be explained by genetic introgression amongst

244 races. In addition, we used the software HybRIDS

245 (http://www.norwichresearchpark.com/HybRIDS) to perform a probabilistic

246 recombination analysis, calculate the coalescence time of each recombinant block, and

247 visualize the mosaic-like genome structure.

248 Recombination analysis of 133 contigs highlighted a total of 675 recombined

249 blocks on 127 contigs that were significant (after Bonferroni correction) in three or

250 more tests using RDP3 (Supplementary file 4). The combined length of all identified

251 blocks is nearly 3 Mb or ~25% of the analysed regions. These blocks indicate regions

252 of genome in one race that derive from another race (or the ancestor of another race).

253 Figure 4 illustrates the effect of genetic introgression on the pattern of nucleotide

254 similarity between the three races in the largest contig of ~400 kb. The sequence

255 (dis)similarity between the three races shows a mosaic-like genome structure with

256 large regions where races AcNc2 and AcBoT show near sequence identity (yellow

257 blocks in Figure 4B), whilst other areas show a high level of sequence similarity

258 between AcNc2 and Ac2V, and AcBoT and Ac2V (indicated by the purple and

259 turquoise regions, respectively). Note that the presence of such well-defined blocks of

260 high sequence similarity in an otherwise diverged genome is characteristic for rare

261 introgression between organisms that show a high (yet incomplete) level of

262 reproductive isolation. Noteworthy too is the fact that a high level of recombination

263 (relative to the mutation rate) would homogenise the sequence divergence between the

264 races, and hence, that this would not result in the observed mosaic-like structure.

265 Despite the fact that introgression between races is rare, it must have occurred

266 multiple times between the ancestors of the three races given that the coalescence

11

267 times varies markedly between the different blocks (Figure 5). Assuming a base

268 mutation rate of µ=10-8 per cell cycle, with 100 cell cycles per year (i.e. a combined

269 mutation rate of 10-6 per year), analysis in the software HybRIDS show that the most

270 recent introgression event has occurred circa 220 years ago, whilst the oldest event

271 occurred almost 200,000 years ago. The mean age calculated across all introgression

272 events equals 6,237 (±12,594) years (Figure 5). (With a combined mutation rate of

273 µ=10-7 per base per year, the range in the age of introgression would span from 2,200

274 to 2,000,000 years). Irrespective of the mutation rate, the principal finding is that

275 genetic introgression amongst A. candida races is an ongoing evolutionary process

276 occurring across a wide range of evolutionary times, and that it gives rise to mosaic

277 genomes with the introgression blocks interspersed in the recipient genomic

278 background.

279 A total of 1,655 predicted genes are located in the recombined regions, and

280 amongst these, 125 are predicted to encode secreted proteins. In the introgression

281 regions, we identified 14 genes encoding secreted proteases and hydrolases that in

282 some pathogens act as virulence factors (Lebrun et al. 2009; Monod et al. 2002;

283 Soanes et al. 2007). Thus, recombination between races, resulting in introgression can

284 act as a mechanism for exchange of virulence gene alleles. However, neither gene

285 density nor dN/dS are enriched or depleted in regions affected by introgression,

286 compared to regions not affected (Paired t-test: T=-0.05, P=0.958; Mann-Whitney test:

287 W=3.15×166, P=0.152, respectively). It is however entirely likely that by studying

288 only a small subset of all the known races, we have underestimated the actual level of

289 introgression across all races.

290

12

291 Intra-race diversity suggests clonal propagation after creation of novel adapted

292 allele repertoires

293 To better understand the evolution and diversification of A. candida genomes,

294 we analysed the pattern of recombination and nucleotide divergence with the inclusion

295 of two additional A. candida isolates. AcBoL is an additional isolate from the AcBoT

296 race and AcEm2 is an additional isolate of the AcNc2 race (see Figure 1). We found

297 evidence for 581 recombination events, of which 335 included AcBoT or AcBoL as

298 recombinants. These two isolates shared a recombinant block in 97.3% of

299 recombination events (AcBoT and AcBoL = 326; AcBoT = 6; AcBoL = 3). AcNc2

300 and AcEm2 shared a recombinant block in 99.6% of events (AcNc2 and

301 AcEm2 = 246; AcNc2 = 0; AcEm2 = 1). This demonstrates that the AcBoT/AcBoL

302 and AcNc2/AcEm2 races have remained largely unchanged since their initial

303 emergence.

304 The nucleotide diversity within genomes (i.e. the observed heterozygosity) and

305 the nucleotide divergence between genomes (i.e. genetic differentiation) can be used to

306 further understand A. candida population biology. Isolates AcBoT and AcBoL were

307 both collected in Lincolnshire in 2009, and the observed heterozygosity in these

308 isolates was at least 13 times higher than that of any of the other races (percentage

309 heterozygous positions in contig 1: AcBoT = 0.653%; AcBoL = 0.644%;

310 AcNc2 = 0.047%; AcEm2 = 0.044%; Ac2V = 0.028%) (see Figure 3). Remarkably,

311 AcBoT and AcBoL are heterozygous for almost all of the same sites; 97.2% of the

312 sites that are heterozygous in AcBoT are also heterozygous in AcBoL (and 98.5% vice

313 versa). This is only consistent with clonal reproduction because after just one

314 generation of sexual reproduction (or selfing with recombination), Mendelian

315 segregation would eradicate this high level of genotypic similarity. Notably, such

13

316 evidence for clonal propagation of a race would be difficult to obtain for haploid

317 fungal pathogens.

318 With little evidence for recombination and gene flow, most nucleotide

319 divergence between AcBoT and AcBoL must have accumulated through mutation.

320 The nucleotide divergence between these isolates is just 0.030% (121 polymorphisms

321 in 398,508 bp in contig 1) (Figure 3). Assuming a mutation rate of 1 x 10-8 per cell

322 division, and 100 cell divisions per lineage per year, we estimate that these isolates

323 could have diverged 305 (262-353) years ago. The other pair of isolates, AcNc2 and

324 AcEm2, were collected in Norfolk in 2007 and in Kent in 1993 (160 km apart),

325 respectively. Similar to the former two isolates, AcNc2 and AcEm2 are also nearly

326 identical (nucleotide divergence π=0.046%, i.e. they are >99.95% identical) (Figure 3).

327 If we again assume that their nucleotide divergence arose by mutation alone, their

328 estimated divergence time is 890 (814-970) years. However, unlike AcBoT and

329 AcBoL, AcNc2 and AcEm2 are much less heterozygous (see Figure 3), which

330 suggests that another genetic mechanism, e.g. loss of heterozygosity (Lamour et al.

331 2012), might be operating which has eradicated the gene diversity in the clonally

332 propagating races over time.

333

334 Sequential infections abolish host specificity of susceptibility to A. candida

335 infection

336 A. candida infection compromises host resistance against otherwise avirulent

337 pathogen species . Conceivably, A. candida could suppress host defenses to otherwise

338 avirulent races of A. candida, enabling co-infection and sexual exchange. To test this

339 we performed sequential inoculation experiments, identifying races using the genome

14

340 sequences to create isolate-specific DNA markers. thaliana accession Ws-2 leaves

341 towards the B. juncea-infecting race, Ac2V. Also, preinfection by AcBoT suppresses

342 B. oleracea resistance towards Ac2V (Figure 6B). Furthermore, defense suppression

343 was so effective that Ac2V was able to complete its life cycle on both A. thaliana Ws-

344 2 and B. oleracea as observed by successful subsequent infection on B. juncea from

345 the sequentially inoculated plants. In a reciprocal experiment, ). It has long been noted

346 that Albugo sp. have a remarkable capacity to suppress immunity in their hosts Isolates

347 AcNc2 and AcEm2 can infect ~40% of accessions of A. thaliana, and some Capsella

348 accessions; isolates AcBoT and AcBoL have only been found to infect B. oleracea,

349 and race Ac2V can only infect B. juncea and some accessions of B. rapa (one of the

350 two diploid ancestors of B. juncea) (see Table 1). (Cooper et al. 2008a)Race-specific

351 PCR of pre-inoculated plants (Supplementary file 5, 6) shows that preinfection by

352 AcNc2 suppresses resistance in A. pre-infection of B. juncea with Ac2V enabled

353 AcNc2 growth on B. juncea (Figure 6C). Therefore, AcNc2 not only can suppress

354 Ac2V recognition on A. thaliana, but Ac2V is also capable of suppressing B. juncea

355 resistance towards AcNc2. TIR-NB-LRR resistance genes likely confer Ac2V

356 resistance in Arabidopsis (McHale et al. 2006), as Ac2V grows on an eds1-1 mutant of

357 A. thaliana Ws-2 (Supplementary file 6; (Parker et al. 1996)(Cooper et al. 2008a). We

358 hypothesise that suppression of host innate immunity enables co-infection of hosts by

359 races with otherwise non-overlapping host ranges, thus providing a remarkable

360 mechanism to enable sexual genetic exchange between specialised A. candida races.

361

362 Discussion

15

363 A. candida comprises distinct races that specialize on different plant species (Liu et al.

364 1996; Rimmer et al. 2000) and its physiological races can infect over 200 species of

365 plants. Here, we describe the genomes of five isolates from three races of A. candida

366 that colonize distinct host plant species and appear to have non-overlapping host

367 ranges. Genome analyses show that the races have a mosaic-like genome structure that

368 is consistent with genetic introgression between races that have significantly diverged

369 (mean nucleotide divergence ~1%). Despite this divergence, ~25% of the nucleotide

370 sequence within the analysed genome (3.2 Mb out of 12.4 Mb) is derived from other

371 A. candida races. 675 introgressed blocks were identified in the 127 analysed contigs

372 and each block was confirmed by at least three independent recombination algorithms.

373 Based on the high nucleotide similarity of the recombined regions, it appears that

374 some genetic exchange must have occurred recently and that introgression has

375 happened multiple times between the ancestors of the three races.

376 An alternative hypothesis for the observed mosaic-like genome structure is that

377 the polymorphisms were already present in the ancestor of A. candida, and that those

378 may have sorted stochastically among descendant host races. This is known as

379 incomplete lineage sorting (Pamilo & Nei 1988; Hobolth et al. 2007), and such

380 ancestrally shared polymorphism is notoriously difficult to distinguish from

381 polymorphisms shared through secondary contact and introgression. However, by

382 estimating the coalescence time of the introgressed regions we showed that

383 hybridisation between the races is an ongoing evolutionary process, with some

384 introgression events occurring as recent as 220 years ago. We therefore reject the

385 hypothesis of incomplete lineage sorting and propose that genetic exchanges between

386 the A. candida host races have occurred through rare but nevertheless significant

387 introgression events.

16

388 If host-ranges are determined by multiple loci, one might expect that

389 recombination will quickly lead to “super genotypes” with very wide host ranges.

390 However, this is unlikely because of the dual consequences of effector alleles; they not

391 only can contribute to virulence, but if recognized by Resistance (R) gene alleles in a

392 particular host, they can result in the complete loss of virulence on that host (Dodds &

393 Rathjen 2010). Effectors selected to facilitate infection and evade recognition by R

394 genes in one host might be recognised by the R genes in another host and be

395 maladaptive. Thus, genotypes that are highly virulent on multiple hosts are unlikely to

396 arise by recombination between races that specialize on distinct specific hosts,

397 although occasionally, acquiring an effector from one race by introgression might

398 offer an adaptive advantage assuming that the effector is not recognised.

399 Strikingly, each race shared a mosaic pattern of large genomic regions

400 (>10,000bp) that were virtually identical between two races (and diverged from the

401 third). Given that mutations accumulate over time, this implies that the exchange must

402 have occurred relatively recently. Yet, A. candida is an obligate biotrophic parasite

403 that is highly host-specific. The host-specificity of A. candida races was confirmed in

404 our experiments, and this raises the question of how distinct A. candida races can

405 achieve the physical contact required for them to have sex and recombine. We

406 addressed this question using experimental infections of host plants with multiple

407 races. Crucially, we showed that infection with a virulent race of A. candida

408 suppresses host immunity sufficiently to enable subsequent co-colonization by an

409 otherwise non-virulent race (see also Cooper et al. 2008). Although this opens up

410 competition between races for the same (limited) resource, it also brings a significant

411 evolutionary/ecological advantage by blurring the borders of host range, thereby

412 creating secondary contact zones that enable sexual reproduction and genetic exchange

17

413 between races. Given the high-host specificity of obligate biotrophic parasites, this

414 ability of A. candida to enable occasional genetic exchange between specialised races

415 could create novel repertoires of effector alleles that would enable colonization of new

416 hosts (“host jumps”). For example, Arabidopsis resistance to race Ac2V can at least in

417 part be explained by the WRR4 gene, a classical NB-LRR R gene, which presumably

418 recognizes an Ac2V effector (Borhan et al. 2008). Hypothetically, if this Ac2V

419 effector would segregate away in hybrid offspring, or by loss of heterozygosity, these

420 offspring could become virulent on WRR4-carrying Arabidopsis hosts.

421 Once a new hybrid race has been established, it can reproduce asexually and

422 clonally on susceptible hosts without continued genetic exchange with other races. We

423 base this inference on the exceptionally high genotypic similarity between independent

424 isolates that infect the same host plant (i.e. AcBoT-AcBoL and AcEm2-AcNc2). For

425 example, AcBoT and AcBoL share >97% of their heterozygous sites. This observation

426 is significant because it rules out sexual reproduction (or selfing with recombination),

427 given that Mendelian segregation would eradicate this high level of genotypic

428 similarity within a single generation. Hence, we conclude that reproduction of AcBoT

429 and AcBoL is asexual and clonal, and we speculate that these races may have derived

430 from a common ancestor that was selected coincident with the onset of widespread B.

431 oleracea cultivation in Europe. The A. thaliana-infecting isolates (AcNc2 and AcEm2)

432 were sampled at a broad spatiotemporal scale (14 years and 100 miles apart), and yet

433 they showed a very high nucleotide similarity (>99.95% identical). This shows that

434 clonal reproduction coincided with their rapid population expansion. Remarkably

435 though, compared to the former two isolates, AcEm2 and AcNc2 had a relatively low

436 level of observed heterozygosity, and the number of heterozygous sites was >13 times

437 less than that of AcBoT and AcBoL. Since these races do not self-fertilize, this

18

438 suggests that gene conversion or another mechanism might be operating to reduce the

439 nucleotide diversity within their genomes over time. Such loss of heterozygosity

440 (LOH) has also been reported for Phytophthora capsici (Lamour et al. 2012), and in

441 yeast, LOH events can encompass entire chromosomes, which is thought to be

442 explained by the break-induced replication (BIR) mechanism (Diogo et al. 2009).

443 ‘Hybrid speciation’ or ‘recombinational speciation’ is often associated with a

444 mosaic genome (Baack & Rieseberg 2007; Stukenbrock et al. 2012) and because

445 introgressed genes have been ‘pre-tested’ by selection, they are more likely to be

446 adaptive than random changes to the genetic code by mutations (Hedrick 2013). For

447 example, anomalus is a wild sunflower species derived via hybridization

448 between two parental species, and its genome is characterised by parental species

449 blocks. In this case, however, hybridisation and speciation occurred over a short

450 evolutionary time-span (10 – 60 generations) (Ungerer et al. 1998), which differs from

451 hybridisation in A. candida, where it appears to be a persistent evolutionary process.

452 Darwin’s finches are a classic example of a young adaptive radiation, and recent

453 research by Lamichhaney et al. (2015) showed that species from different islands show

454 extensive genomic exchange through recent hybridisation. Introgressive hybridization

455 was found to occur throughout the radiation, fuelling the genetic variation in beak

456 shape, and thereby facilitating adaptive evolution. Perhaps more pertinently, the

457 relationships of members of the parasite’s principal host, plants of the Brassica genus,

458 are themselves considered to be the result of a number of hybridisation events, as

459 defined by the Triangle of U (Nagahara 1935). Hybridisation events have also been

460 implicated in speciation and adaptive radiation in vertebrates (Nichols et al. 2015).

461 Furthermore, rare introgression events among Heliconius butterflies are believed to

462 have facilitated the exchange of mimicry genes across multiple time points post

19

463 speciation (Martin et al. 2013). Evidence for more frequent introgression has been

464 observed in the malaria vector species complex (Anopheles gambiae) (Fontaine et al.

465 2015). More examples of such introgression can be anticipated to emerge given the

466 continued improvement and reduction in cost of DNA sequencing methods, and the

467 development of novel software that enable whole genome recombination (Ward & van

468 Oosterhout, submitted)

469 Perhaps more than any other system, studies on yeasts have offered valuable

470 insights into how introgression and hybridisation can affect genome architectures of

471 eukaryotic microorganisms (reviewed by Dujon 2010). Genetic exchanges amongst

472 three strains of Saccharomyces cerevisiae have been quantified using genomic data

473 which show that the rate of outcrossing is remarkably low, with only 314 outcrossing

474 events during circa 16 million cell divisions (Ruderfer et al. 2006). With such a low

475 rate of genetic exchange, the strains accumulate sequence variation by mutation and

476 thus genetically diverge. Given the low level of genetic exchange amongst the A.

477 candida host races, this may also explain their genetic divergence. In yeast, natural

478 hybrids have been reported in many species (Dujon 2010), with hybridisation leading

479 to the nonreciprocal genetic exchange accompanied by the loss of genes and genomic

480 regions. This results in chimeric sequences from which novel lineages could emerge

481 (Greig et al. 2002; Usher & Bond 2009). For example, Lachancea kluyveri possesses

482 mega-base long chromosomal fragments of distinct composition (Payen et al. 2009).

483 Furthermore, the genomes of wine strains of S. cerevisiae contain introgressed regions

484 from S. paradoxus, S. kudriavzevii, S. uvarum, and Zygosaccharomyces bailii (Dujon

485 2010). Given that the introgressed sequences in the genome of S. cerevisiae are nearly

486 identical to those in the donor genomes, hybridisation must have occurred recently,

487 which is similar to what we observe in A. candida. Although introgression appears to

20

488 be a general phenomenon in yeast genomes, its importance for evolution has yet to be

489 determined (Dujon 2010). The mosaic-like genome structure of A. candida suggests

490 that hybridisation and genetic introgression may play an equally important role in the

491 biology of this oomycete.

492 Introgression can introduce novel adaptive trait combinations (Seehausen 2004;

493 Hedrick 2013) as well as enable the loss by segregation of host-specific “avirulent”

494 effector alleles that can trigger the immune response in potential hosts. In rare novel

495 recombinant races, such maladapted effectors that trigger the response of a specific

496 host may be segregated away, allowing the recombinant to avoid immune recognition

497 and colonise this new host. Once this happens, the new hybrid can rapidly expand its

498 geographic range and population size through clonal reproduction. Hybrids between

499 Phytophthora spp. races can show expanded host range compared to their parental

500 lineages (Ersek et al. 1995) while the importance of virulence gene transfer that

501 subsequently leads to the expansion of pathogens’ host range has also been reported

502 for bacterial and fungal pathogens (Doolittle 1999; Mehrabi et al. 2011).

503 The ability of pathogens to recombine and generate novel recombinant

504 genotypes and subsequently proliferate clonally may be particularly favoured in the

505 agro-ecological environment. Recent fusion between genetically distinct plant

506 pathogens has been shown in Mycosphaerella graminicola (Stukenbrock et al. 2012),

507 where a hybrid speciation event has generated a generalist pathogen of grass species,

508 and in Blumeria graminis (Hacquard et al. 2013), where a mosaic genome structure

509 has been generated by sex between divergent isolates. Adaptation of pathogens to

510 agro-ecosystems can be correlated with a reduction in diversity of recently emerged

511 lineages and at the same time, high levels of genome plasticity (Stukenbrock &

512 Bataillon 2012). This genome plasticity has also been observed in B. graminis

21

513 (Hacquard et al. 2013). Race specialization observed in the potato late blight pathogen

514 P. infestans, the rice blast pathogen Magnaporthe oryzae, and the wheat yellow rust

515 pathogen Puccinia striiformis (Stukenbrock & Bataillon 2012), may represent the

516 result of a broader mechanism of pathogen adaptation to a crop monoculture. A.

517 candida, which is known to live on both wild weeds (e.g. ) and

518 important crop species (e.g. B. juncea), thus provides remarkable insights into the

519 impact of recombination in generating new virulent races and subsequent clonal

520 propagation of a novel race. These findings are particularly relevant to modern

521 agricultural methods and the emergence of new epidemic pathogen strains on crop

522 monocultures.

523 Materials and Methods

524 Pathogen isolation and cultivation

525 Albugo candida races were isolated and propagated by first washing

526 zoosporangia from infected leaves and then infecting A. thaliana Ws-eds1 (enhanced

527 disease susceptibility (Parker et al. 1996)) plants. After two weeks, one pustule was

528 punched out and spores were treated on ice for 30 min to release zoospores.

529 Unhatched zoosporangia were removed by filtration and zoospores were diluted to

530 ~10 zoospore per ml and sprayed on A. thaliana Ws–eds1 plants (~100μl / plant).

531 This procedure was repeated four times until spores were bulked up on A. thaliana

532 Ws-eds1 plants. Zoosporangia were harvested using a homemade cyclone spore

533 collector (Mehta & Zadoks 1971). Subsequently, A. candida races AcEm2/AcNc2,

534 AcBoT/AcBoL and Ac2v were propagated and maintained on A. thaliana Ws-2, B.

535 oleracea and B. juncea, respectively.

536 Virulence test

22

537 Host specificity was tested for the AcNc2, Ac2V and AcBoT races on a number

538 of A. thaliana and Brassica spp. accessions (Supplementary file 1, 2). A. candida

539 inoculations were performed using the following method: zoospores were suspended in

540 water (105 spores/ml) and incubated on ice for 30 min; the spore suspension was then

541 sprayed on plants using a spray gun (~700 µl/plant), and plants were incubated in a cold

542 room in the dark over night. Infected plants were kept under 10-hour light and 14-hour

543 dark cycles with 20°C day and 16°C night temperature. Plants were scored susceptible if

544 a pathogen was capable of accomplishing its life cycle and sporulation was

545 macroscopically visible within three weeks after plant inoculations.

546 For sequential infection analyses, we developed A. candida race specific PCR

547 primers from genome sequences (Supplementary file 5). Regions in all vs. all alignments

548 were identified that lacked read coverage by other isolates. Primers were designed within

549 these regions to amplify products of between 300 and 800 bases and were tested on pure

550 genomic DNA extracts from each isolate.

551 Primary inoculum was sprayed onto control and test plants. In the case of AcNc2

552 defence suppression assays, both A. thaliana Ws-2 and Ws-eds1 were inoculated; for

553 AcBoT assays, B. oleracea and Ws-eds1 were inoculated; for Ac2v assays, B. juncea and

554 Ws-eds1 were inoculated. Following inoculations, plants were incubated in the dark in a

555 cold room over night before transferring to a growth cabinet set to the conditions

556 described above. In addition to pathogen treatment, the same numbers of plants were also

557 treated with water in order to serve as a negative infection control. At seven days post-

558 inoculation a secondary infection with the avirulent A. candida race was performed on

559 50% of the plants while the remaining 50% were water treated (Supplementary file 6).

560 Plants were returned to the growth cabinet and cultivated for a further 8 days. Inoculated

561 plant tissue was harvested, washed in sterile water to remove surface adhering spores, and

23

562 flash frozen in liquid nitrogen. DNA was prepared using a DNeasy Plant Mini Kit

563 (Qiagen) as described in manufacturers instructions. PCR was performed using race-

564 specific primers and products visualised on a 1% agarose gel.

565 Plants from which tissue was harvested were maintained for a further seven days

566 before re-inoculation onto the original host of the otherwise non-virulent pathogen. This

567 was done in order to confirm the completion of the secondarily inoculated pathogen

568 race’s lifecycle on immunosuppressed non-host plants.

569 DNA extraction and sequencing

570 DNA was extracted from zoosporangia according to the method described in

571 Mckinney et al. (1995), and Illumina libraries for sequencing were constructed

572 according to Farrer et al. (Farrer et al. 2009). Paired-end libraries of 800 bp and 400

573 bp insert lengths (for the race Ac2V only one library of ~ 400 bp) were sequenced

574 using Illumina Genome Analyzer II platform at the Sainsbury Laboratory Sequencing

575 Centre (GA2). The base calling was done on the Illumina GAP v1.3 pipeline.

576 cDNA extraction and sequencing

577 A. thaliana Ws-0 plants were infected with A. candida AcNc2 and infected

578 plants were harvested at 0, 2, 4, 6, 8 and 10 days after infection. RNA was isolated

579 using TRI Reagent RNA Isolation Reagent (Sigma), and subsequently enriched for

580 mRNA with Dynabeads (Invitrogen). cDNA was prepared using the SMART cDNA

581 Library Construction Kit (Clontech) according to manufacturer’s instructions. These

582 libraries were normalized using Evrogen Duplex-specific nuclease (DSN).

583 Normalized cDNA libraries were fragmented using Covaris sonicator and libraries

584 prepared according to Illumina genomic library preparation kit. Libraries were

585 sequenced on the Illumina GA2 platform.

24

586 The sequence data have been deposited at the EMBL Nucleotide Sequence

587 Database, with the accession numbers for A. candida AcNc2: SRR1811450,

588 SRR1811464, AcEm2: SRR1806791, AcBoT: SRR1811472, SRR1811473, AcBoL:

589 SRR1811474, Ac2V: SRR1811471.

590 Genome and transcriptome assembly

591 The genomic assemblies were produced using program Velvet 1.0.19 (Zerbino

592 & Birney 2008). Using BLAST (Altschul et al. 1990), resulting contigs were searched

593 (BLASTN, E-value ≤ 10-5) against genomic sequences of A. thaliana TAIR 9.0 (Kaul

594 et al. 2000), fungi Neurospora crassa (Galagan et al. 2003), a collection of bacterial

595 genomes (such as, Xanthomonas sp. and Pseudomonas sp.:

596 microbialgenomics.energy.gov), and against mitochondrion of Pythium ultimum

597 (Levesque et al. 2010) to remove potential contamination and mitochondrial DNA.

598 The assembly of AcNc2 was processed by merging the overlapping contigs from two

599 velvet assemblies (based on the k-mers 55 and 61) with the Minimus2 genome merge

600 pipeline (Sommer et al. 2007) and in-house perl scripts.

601 Read alignment and mapping was performed using programs BWA (0.7.3) and

602 SAMtools (0.1.17) (Li et al. 2009; Li & Durbin 2009) and BedTools (Quinlan & Hall

603 2010). Duplicates were removed from mapped reads and SNP calling and filtering

604 was done with BCFtools view and varFilter (-D100). Where required, conversion to

605 fasta format from vcf was done using the modified version of vcf2fq which has been

606 modified to include indels (http://sourceforge.net/p/vcftools/feature-requests/19/) and

607 then sequences were aligned using mafft online server

608 (http://mafft.cbrc.jp/alignment/server/).

609

25

610 Illumina sequenced cDNA from the AcNc2 infected A. thaliana Ws-0 leaves

611 was assembled using Velvet/Oases (Schulz et al. 2012) with different k-mer lengths

612 (43, 45, 47, 51, 55, 57, 61, 63). We used various characteristics (total number of

613 contigs, assembly size, longest contig length and mean contig length, and the

614 proportion of core eukaryotic genes (KOGs) predicted by CEGMA to assess

615 assemblies’ quality. Two best assemblies based on the k-mers 55 and 57 were merged

616 using VMATCH (http://www.vmatch.de/). The cDNA orientation was predicted using

617 Illumina generated cDNA 5’ tags. Using Bowtie aligner (Langmead et al. 2009),

618 cDNA 5′ tags were aligned against the assembled cDNA and, based on tag counts,

619 orientated in the 5′ to 3′ direction.

620 Gene prediction and annotation

621 Ab initio gene predictions were performed for A. candida AcNc2 using the

622 Augustus gene prediction package (Stanke et al. 2006), Geneid (Blanco et al. 2002)

623 and GeneMark (Lomsadze et al. 2005). Alternative splice variants were predicted

624 with Augustus. To improve gene predictions, the “hints” files were created using

625 cDNA evidence and gene homology information. The generated library of AcNc2

626 transcripts was aligned to the AcNc2 assembly using BLAT (Kent 2002), setting

627 minimal identity to 92; the “hints” file was produced with script blat2hints.pl

628 provided with the Augustus package. The parameters previously obtained for the gene

629 prediction in the A. laibachii genome project were utilized when running the

630 Augustus and Geneid. GeneMark predictions were made with the default settings.

631 Consensus gene models were generated with Evigan (Liu et al. 2008). Subsequently,

632 the catalog of non-overlapping gene models was created from the Evigan, Augustus,

633 Genemark and Geneid predictions. The gene space coverage was assessed with

634 CEGMA (Parra et al. 2007).

26

635 Functional annotations of AcNc2 proteins were performed via comparison of

636 the predicted protein sequences with the protein databases; UniProtKB (Suzek et al.

637 2007) and NCBI non-redundant RefSeq (Pruitt et al. 2009) databases were scanned

638 using BLASTP algorithm (E-value ≤ 10-5); and Pfam database (Punta et al. 2012) was

639 searched with the program hmmscan from HMMER3 (Eddy 2011) with the default

640 settings. GO terms were assigned with BLAST2GO pipeline (Conesa et al. 2005).

641 Gene families were predicted using Tribe-MCL algorithm that implements

642 Markov cluster (MCL) approach for the clustering of proteins into families based on

643 the pre-computed sequence similarity information (Enright et al. 2002).

644 Signal peptides and cleavage sites were predicted by the hidden Markov

645 Model and the neutral network algorithm implemented in SignlP 4.0 program

646 (Petersen et al. 2011). Transmembrane helices were predicted with the hidden Markov

647 Model in TMHMM 2.0 (Moller et al. 2001).

648 Candidate cytoplasmic effectors carrying “RXLR” motif were identified

649 through the string search of the predicted secreted proteins using “R[A-Z]L[RQ]”

650 regular expression in the first 100 residues downstream of the signal peptide cleavage

651 site. Crinkler’s homologs were detected using BLASTP searches (E-value ≤ 10-5)

652 against NCBI non-redundant RefSeq database (Pruitt et al. 2009). Homologs of A.

653 laibachii “CHXC” proteins were identified through Tribe-MCL clustering, also using

654 BLASTP searches (E-value ≤ 10-5) of the A. laibachii predicted proteome, and string

655 search for the “CH[A-Z]C” motif in the first 100 residues after predicted signal

656 peptide.

657 Repetitive elements

658 The library of repeats in AcNc2 assembly was constructed with RepeatScout

659 (Price et al. 2005) and was joined with the previously made library for A. laibachii

27

660 (Eric Kemen et al. 2011). This updated library in combination with RepeatMasker

661 (http://www.repeatmasker.org/) was used for the identification of repeats and their

662 frequencies. Transposon elements were annotated using TBLASTX searches (E-value

663 ≤ 10-5) against the database of transposon elements, RepBase (Jurka et al. 2005).

664 Phylogenetic analysis

665 Phylogenetic trees were built (MrBayes program (Huelsenbeck & Ronquist

666 2001)) for the nucleotide sequence alignments of the orthologous genes present in

667 four A. candida races and A. laibachii. Trees were built for 100 single copy genes (50

668 core eukaryotic genes and 50 singletons with unknown function) Phylogenetic

669 Bayesian inference and Markov chain Monte Carlo (MCMC) methods were used to

670 estimate the posterior distribution of model parameters. We used lset=6, gamma

671 model, mcmc of 10 million, samfreq=7,000 and burnin = 375. Population genetic

672 parameter “theta” (Θ = 4Neμ) was estimated using mlRho program (Haubold et al.

673 2010). Four different topologies were inferred with equal support which warranted

674 further investigation into the role of introgression..

675 Bayesian Evolutionary Analysis by Sampling Trees (BEAST) software

676 package version 1.7 (Drummond & Rambaut 2007) was used to produce the race

677 phylogeny (based on contig 1). BEAST implements Markov chain Monte Carlo

678 (MCMC) algorithms for Bayesian for divergence time dating (Drummond & Rambaut

679 2007). Bayesian phylogenetic trees were constructed with a HKY+G nucleotide

680 substitution model under a strict molecular clock (with units in mutations per site) and

681 a Yule tree prior. We ran ten independent MCMC analyses each of 10 million steps

682 and a 10% burn-in. MCMC chain mixing was assed using Tracer 1.5 which showed

683 ESS >3000 for each statistic.

684 Recombination analyses and detection of exchanged sequence blocks

28

685 Recombination events were statistically identified on contigs ≥10,000bp using

686 the software RDP3 using five independent detection algorithms: RDP (Heath et al.

687 2006), GENECONV (Padidam et al. 1999), Maxchi (Smith 1992), Chimaera (Posada

688 & Crandall 2001), and 3Seq (Boni et al. 2007). Tests were conducted using a critical

689 value α=0.05 and P-values were Bonferroni corrected for multiple comparisons of

690 sequences. Sequences were linear using unphase base calling and the random

691 assignment of one of the nucleotides at each polymorphic site. Given that

692 recombination algorithms use cis mutations to define regions of a sequence that share

693 the same phylogenetic history, the statistical power to detect recombination is reduced

694 when using unphased data (Darren Martin pers. comm.). This is because the signal of

695 unphase base calling erodes any underlying signal of recombination. This procedure

696 is conservative and underestimates the true number of recombination events because

697 it reduces sequence similarity between the recombinant and parental sequence.

698 Phylogenetic evidence of recombination was required to confirm a recombination

699 event. Window sizes for each detection method were set to defaults. Only events for

700 which the software identified the parental sequences (i.e. no ‘unknowns’) without

701 ambiguous start and end position of the recombination block are reported and used in

702 the analyses. Furthermore, events were only considered to be genuine if they were

703 supported by at least three of the five detection algorithms. Hence, the estimates of

704 the number of recombination events are conservative.

705 The effects of recombination on the sequence similarity between three genomes

706 was visualised using a newly developed code in the R package HybRIDS (Hybrid

707 Recombination, Identification and Dating, Software, http://www.elsa.ac.uk/).

708 HybRIDS uses a colour triangle to visualise the sequence similarity between aligned

709 sequences. It calculates the colour of each 100 bp window based on the proportion of

29

710 SNPs shared between the pairwise sequences. All monomorphic sites were excluded in

711 this calculation. HybRIDS uses the additive colour system in which the primary

712 colours used are red, green, and blue. These colours are plotted on the corners of the

713 RGB colour triangle, which is shown in the legend as a reference. In cases where all

714 SNPs are shared between just two of the three races, the hybrid colour is an exact 50%

715 mix of two primary colours. The hybrid colours are yellow, purple and turquoise, and

716 these colours suggest recent gene exchange between the two races. At such

717 recombined regions, the third race receives its primary colour (because by definition, it

718 must be unique at the 100 bp window and completely dissimilar from the other two

719 races). Older recombined regions are predicted to have accumulated SNPs unique to

720 each race, causing the block of two hybridizing races to diverge. In the graph, the

721 colour of such areas contains more than 50% of the primary colour of that race. The

722 colours in the centre of the triangle are pale and reflect areas where the three races

723 share approximately similar numbers of polymorphisms. The SNPs in these pale

724 regions, and in regions where the colours are close to the primary colours, are more

725 likely caused by mutations than by genetic exchange. Note that primary colours can

726 also be assigned to regions that may have recombined, but which originate from a race

727 that was not sampled, and hence, which polymorphisms were not included in this

728 analysis.

729 Dating recombination events

730 We used the sequence divergence of the recombination block between the

731 recombinant and the minor parent (i.e. the sequence donating the recombinant region)

732 to estimate the divergence time since a recombination event. We used two dating

733 methods, a binomial mass function and an analysis with the Bayesian Evolutionary

734 Analysis by Sampling Trees (BEAST) software package version 1.7 (Drummond &

30

735 Rambaut 2007). A binomial mass function was used to estimate the mean divergence

736 time of a block of given size with an observed number of SNPs. In order to correct for

737 mutation saturation, homoplasy, back mutations and transition / transversion ratios,

738 we converted the observed number of SNPs into the number of mutations using a JC

739 correction (Jukes & Cantor 1969). The probability of finding a number of SNPs less

740 or equal to the observed number in a block of known size was calculated. The mean

741 time is found when the binomial mass function returns a probability value p=0.5. This

742 approach finds the most probable age of the recombination event, and it assumes that

743 since the recombination event, the block evolved neutrally over t years, and that each

744 base has the chance to mutate with a probability µ per year (µ=10-6 and 10-7). The

745 algorithm uses a strict molecular clock, and because the mutation rate in is

746 unknown, we assumed µ=10-6 as well as µ=10-7 per base per year. The lower value of

747 the mutation rate of µ=10-7 was used as a more conservative estimate. Given that we

748 do not know the mutation rate of oomycetes, the estimated dates are merely an

749 approximation and shown to illustrate that the exchanged blocks are dated back to a

750 wide range of evolutionary times. The simple dating method based on the binomial

751 mass function was compared to more computationally intensive analysis with BEAST

752 by dating 20 recombination events from the “contig 1” of AcNc2 and performing a

753 linear regression analysis to confirm application of the faster binomial algorithm to all

754 675 recombination events. The principle aim of these analyses was to identify

755 whether or not recombinant regions span a range of dates (in line with the expectation

756 under an introgression model).

757 BEAST bayesian phylogenetic trees were constructed with a HKY+G

758 nucleotide substitution model under a strict molecular clock (µ = 10-6) and a Yule tree

759 prior. We ran ten independent MCMC analyses each of 10 million steps and a 10%

31

760 burn-in for each of the 20 recombination events. MCMC chain mixing was assed

761 using Tracer 1.5 which showed ESS >3000 for each statistic.

762 Note however that the divergence estimates made by the binomial mass

763 function and the analysis with BEAST are conservative (i.e. the true time of the

764 recombination event are probably more recent) given that we had only three

765 sequences in the analysis. Consequently, the ‘true’ parental sequence has probably not

766 been sampled, which means that the observed divergence is larger than that of the

767 actual parental (donor) sequence.

768 Substitution rates and selection

769 The substitution rates (non-synonymous substitution rate per non-

770 synonymous site (dN) and synonymous substitution rate per synonymous site (dS))

771 and the ratio of the dN/dS for the orthologous protein-coding sequences between three

772 isolates were estimated with the M0 model in PAML (Yang 2007). The dN/dS ratio is

773 traditionally used as an indicator of the strength and type of selective constrains acting

774 upon a gene. Values of dN/dS ≈ 1 indicate neutral evolution. Values of dN/dS

775 significantly less than unity indicate purifying selection, whereas dN/dS significantly

776 larger than unity suggest positive selection. The Tajima’s D statistics (Tajima 1989)

777 were not calculated because we do not possess allele frequency data.

778 Acknowledgments

779 We thank Dr. M. Borhan for the providing Ac2V material; Jodie Pike for

780 technical assistance; M. Burrell, Dr. C Schudoma and Dr. D. MacLean for

781 computational assistance. We thank Sophien Kamoun and Kentaro Yoshida for helpful

782 comments on earlier drafts of the manuscript. The BEAST 1.6.2 analyses were carried

783 out on the High Performance Computing Cluster supported by the Research and

784 Specialist Computing Support service at the University of East Anglia. TSL was

32

785 supported by the grant DASTI (Danish Agency for Science, Technology and

786 Innovation) and European Research Council Advanced Investigator grant 233376

787 (ALBUGON); CvO, MM and BW are funded by Earth & Life Systems Alliance

788 (ELSA). EBH was supported by the grant NE/H020691 from NERC (UK Natural

789 Environment Research Council).

790 Author contributions

791 AG, MM conducted bioinformatics analysis; AG, MM, CvO and JJ wrote the

792 manuscript; CvO, BW, MM conducted analysis of recombination; KB and VC

793 performed the co-infection assays and molecular experiment; EK isolated races and

794 established propagation protocols; EK, TSL, ARS, KB and VC performed virulence

795 tests; JJ, CvO and EBH designed the experiments and corrected the manuscript. AG

796 and MM are joint first authors.

797

33

798 References

799 Baack, E.J. & Rieseberg, L.H., 2007. A genomic view of introgression and hybrid 800 speciation. Current Opinion in Genetics & Development, 17(6), pp.513–518. 801 Available at: 802 http://www.sciencedirect.com/science/article/pii/S0959437X07001669.

803 Barbetti, M.J. & Carter, E.C., 1986. Diseases of rapeseed., Perth: Western Australia 804 Department of Agriculture.

805 Baxter, L. et al., 2010. Signatures of Adaptation to Obligate Biotrophy in the 806 Hyaloperonospora arabidopsidis Genome. Science, 330(6010), pp.1549–1551. 807 Available at:

808 Biga, M.L.B., 1955. Review of the species of the genus Albugo based on the 809 morphology of the conidia. Sydowia, 9, pp.339–358.

810 Blanco, E.., Parra, G.. & Guigo, R., 2002. Using geneid to Identify Genes. A. 811 Baxevanis, ed., New York: John Wiley & Sons Inc.

812 Boni, M.F., Posada, D. & Feldman, M.W., 2007. An Exact Nonparametric Method for 813 Inferring Mosaic Structure in Sequence Triplets. Genetics, 176(2), pp.1035– 814 1047. Available at: http://www.genetics.org/content/176/2/1035.abstract.

815 Borhan, M.H. et al., 2008. WRR4 encodes a TIR-NB-LRR protein that confers broad- 816 spectrum white rust resistance in Arabidopsis thaliana to four physiological races 817 of Albugo candida. Molecular Plant-Microbe Interactions, 21(6), pp.757–768. 818 Available at:

819 Choi, D. & Priest, M.J., 1995. A Key to the Genus Albugo. Mycotaxon, 53, pp.261– 820 272. Available at:

821 Choi, Y.J. et al., 2011. Three new phylogenetic lineages are the closest relatives of the 822 widespread species Albugo candida. Fungal Biology, 115(7), pp.598–607. 823 Available at:

824 Choi, Y.J., Shin, H.D. & Thines, M., 2009. The host range of Albugo candida extends 825 from Brassicaceae through Cleomaceae to Capparaceae. Mycological Progress, 826 8(4), pp.329–335. Available at:

827 Cooper, A.J. et al., 2008. Basic compatibility of Albugo candida in Arabidopsis 828 thaliana and Brassica juncea causes broad-spectrum suppression of innate 829 immunity. Molecular Plant-Microbe Interactions, 21(6), pp.745–756. Available 830 at:

831 Dong, S. et al., 2014. Effector specialization in a lineage of the Irish potato famine 832 pathogen. Science (New York, N.Y.), 343(6170), pp.552–5. Available at: 833 http://www.ncbi.nlm.nih.gov/pubmed/24482481 [Accessed July 18, 2014].

34

834 Doolittle, W.F., 1999. Lateral genomics. Trends in Cell Biology, 9(12), pp.M5–M8. 835 Available at: 836 http://www.sciencedirect.com/science/article/pii/S0962892499016645.

837 Drès, M. & Mallet, J., 2002. Host races in plant–feeding insects and their importance 838 in sympatric speciation. Philosophical Transactions of the Royal Society of 839 London. Series B: Biological Sciences, 357(1420), pp.471–492. Available at: 840 http://rstb.royalsocietypublishing.org/content/357/1420/471.abstract.

841 Ersek, T., English, J.T. & Schoelz, J.E., 1995. Creation of Species Hybrids of 842 Phytophthora with Modified Host Ranges by Zoospore Fusion. Phytopathology, 843 85(11), pp.1343–1347. Available at:

844 Farrer, R.A. et al., 2009. De novo assembly of the Pseudomonas syringae pv. syringae 845 B728a genome using Illumina/Solexa short sequence reads. Fems Microbiology 846 Letters, 291(1), pp.103–111. Available at:

847 Haas, B.J. et al., 2009. Genome sequence and analysis of the Irish potato famine 848 pathogen Phytophthora infestans. Nature, 461(7262), pp.393–398. Available at: 849

850 Hacquard, S. et al., 2013. Mosaic genome structure of the barley powdery mildew 851 pathogen and conservation of transcriptional programs in divergent hosts. 852 Proceedings of the National Academy of Sciences, 110(24), pp.E2219–E2228. 853 Available at: 854 http://www.pnas.org/content/early/2013/05/21/1306807110.abstract.

855 Harper, F.R. & Pittma, U.J., 1974. Yield loss by Brassica campestris and B. napus 856 from systemic stem infection by Albugo cruciferarum. Phytopathology, 64, 857 pp.408–410.

858 Heath, L. et al., 2006. Recombination patterns in aphthoviruses mirror those found in 859 other picornaviruses.

860 Hiura, M., 1930. Biological forms of Albugo candida (Pers.) Kuntze on some 861 cruciferous plants. Japanese Journal of Botany, 5(1e20).

862 Hobolth, A. et al., 2007. Genomic Relationships and Speciation Times of Human, 863 Chimpanzee, and Gorilla Inferred from a Coalescent Hidden Markov Model. 864 PLoS Genet, 3(2), p.e7. Available at: 865 http://dx.plos.org/10.1371/journal.pgen.0030007.

866 Kemen, E. et al., 2011. Gene Gain and Loss during Evolution of Obligate Parasitism 867 in the White Rust Pathogen of Arabidopsis thaliana. Plos Biology, 9(7). 868 Available at:

869 Kole, C. et al., 2002. Linkage mapping of genes controlling resistance to white rust 870 (Albugo candida) in Brassica rapa (syn. campestris) and comparative mapping to 871 Brassica napus and Arabidopsis thaliana. Genome, 45(1), pp.22–27.

35

872 Lamour, A.K.H. et al., 2012. Genome sequencing and mapping reveal loss of 873 heterozygosity as a mechanism for rapid adaptation in the vegetable pathogen 874 Phytophthora capsici . Molecular Plant-Microbe Interactions, 25, pp.1350–1360.

875 Lebrun, I. et al., 2009. Bacterial Toxins: An Overview on Bacterial Proteases and 876 their Action as Virulence Factors. Mini-Reviews in Medicinal Chemistry, 9(7), 877 pp.820–828. Available at:

878 Links, M. et al., 2011. De novo sequence assembly of Albugo candida reveals a small 879 genome relative to other biotrophic oomycetes. BMC Genomics, 12(1), p.503. 880 Available at: http://www.biomedcentral.com/1471-2164/12/503.

881 Liu, J.Q., P, P. & Rimmer, S.R., 1996. Development of monogenic lines for resistance 882 to Albugo candida from a Canadian Brassica napus cultivar. Phytopathology, 883 86(9), pp.1000–1004.

884 Lomsadze, A. et al., 2005. Gene identification in novel eukaryotic genomes by self- 885 training algorithm. Nucleic Acids Research, 33(20), pp.6494–6506. Available at: 886

887 Martin, D.P. et al., 2010. RDP3: a flexible and fast computer program for analyzing 888 recombination. Bioinformatics, 26(19), pp.2462–2463. Available at:

889 Martin, F. & Kamoun, S. eds., 2012. Effectors in Plant-Microbe Interactions., Wiley- 890 Blackwell.

891 McHale, L. et al., 2006. Plant NBS-LRR proteins: adaptable guards. Genome Biology, 892 7(212).

893 Mckinney, E.C. et al., 1995. Sequence-Based Identification of T-DNA Insertion 894 Mutations in Arabidopsis - Actin Mutants Act2-1 and Act4-1. Plant Journal, 895 8(4), pp.613–622. Available at:

896 Meena, P.D. et al., 2002. Yield loss in Indian mustard due to White rust and effect of 897 some cultural practices on Alternaria blight and White rust severity. Brassica, 898 4(1, 2), pp.18–24.

899 Mehrabi, R. et al., 2011. Horizontal gene and chromosome transfer in plant 900 pathogenic fungi affecting host range. Fems Microbiology Reviews, 35(3), 901 pp.542–554. Available at:

902 Monod, M. et al., 2002. Secreted proteases from pathogenic fungi. International 903 Journal of Medical Microbiology, 292(5-6), pp.405–419. Available at:

904 Padidam, M., Sawyer, S. & Fauquet, C.M., 1999. Possible Emergence of New 905 Geminiviruses by Frequent Recombination. Virology, 265(2), pp.218–225. 906 Available at: 907 http://www.sciencedirect.com/science/article/pii/S0042682299900569.

36

908 Pamilo, P. & Nei, M., 1988. Relationship between gene trees and species trees. 909 Molecular Biology and Evolution, 5, pp.568–583.

910 Parker, J.E. et al., 1996. Characterization of eds1, a mutation in Arabidopsis 911 suppressing resistance to Peronospora parasitica specified by several different 912 RPP genes. Plant Cell, 8(11), pp.2033–2046. Available at:

913 Peng, B. & Kimmel, M., 2005. simuPOP: a forward-time population genetics 914 simulation environment. Bioinformatics, 21(18), pp.3686–3687. Available at: 915 http://bioinformatics.oxfordjournals.org/content/21/18/3686.abstract.

916 Petrie, G., 1988. Races of Albugo candida (white rust and staghead) on cultivated 917 Cruciferae in Saskatchewan. Canadian Journal of Plant Pathology, 918 10(142e150).

919 Ploch, S. et al., 2010. Evolution of diversity in Albugo is driven by high host 920 specificity and multiple speciation events on closely related Brassicaceae. 921 Molecular Phylogenetics and Evolution, 57(2), pp.812–820. Available at:

922 Posada, D. & Crandall, K.A., 2001. Evaluation of methods for detecting 923 recombination from DNA sequences: Computer simulations. Proceedings of the 924 National Academy of Sciences of the United States of America, 98(24), 925 pp.13757–13762. Available at:

926 Poulin, R. & Keeney, D.B., 2014. Host specificity under molecular and experimental 927 scrutiny. Trends in Parasitology, 24(1), pp.24–28. Available at: 928 http://www.cell.com/trends/parasitology/abstract/S1471-4922(07)00281-4.

929 Pound, G.S. & Williams, P.H., 1963. Biological races of Albugo candida. 930 Phytopathology, 63(1146e1149).

931 Rimmer, S.R., Mathur, S. & Wu, C.R., 2000. Virulence of isolates of Albugo candida 932 from western Canada to Brassica species. Canadian Journal of Plant Pathology, 933 22(3), pp.229–235.

934 Saharan, G.S. & Verma, P.R., 1992. White Rusts: a review of economically important 935 species., International Development Research Centre, Ottawa, Ontario.

936 Seehausen, O., 2004. Hybridization and adaptive radiation. Trends in Ecology & 937 Evolution, 19(4), pp.198–207. Available at: 938 http://www.sciencedirect.com/science/article/pii/S0169534704000047.

939 Smith, J.M., 1992. Analyzing the mosaic structure of genes. Journal of Molecular 940 Evolution, 34(2), pp.126–129. Available at: 941 http://dx.doi.org/10.1007/BF00182389.

942 Soanes, D.M., Richards, T.A. & Talbot, N.J., 2007. Insights from Sequencing Fungal 943 and Oomycete Genomes: What Can We Learn about Plant Disease and the 944 Evolution of Pathogenicity? The Plant Cell Online, 19(11), pp.3318–3326. 945 Available at: http://www.plantcell.org/content/19/11/3318.short.

37

946 Stanke, M. et al., 2006. Gene prediction in eukaryotes with a generalized hidden 947 Markov model that uses hints from external sources. Bmc Bioinformatics, 7. 948 Available at:

949 Stukenbrock, E., 2013. Evolution, selection and isolation: a genomic view of 950 speciation in fungal plant pathogens. New Phytol, 199(4), pp.895–907.

951 Stukenbrock, E.H. et al., 2012. Fusion of two divergent fungal individuals led to the 952 recent emergence of a unique widespread pathogen species. Proceedings of the 953 National Academy of Sciences, 109(27), pp.10954–10959.

954 Stukenbrock, E.H. & Bataillon, T., 2012. A population genomics perspective on the 955 emergence and adaptation of new plant pathogens in agro-ecosystems. Plos 956 Pathogens, 8(9), p.e1002893.

957 Thines, M. et al., 2009. A new species of Albugo parasitic to Arabidopsis thaliana 958 reveals new evolutionary patterns in white blister rusts (). 959 Persoonia, 22, pp.123–128. Available at:

960 Thines, M., 2014. Phylogeny and evolution of plant pathogenic oomycetes—a global 961 overview. European Journal of Plant Pathology, 138(3), pp.431–447. Available 962 at: http://link.springer.com/10.1007/s10658-013-0366-5 [Accessed August 19, 963 2014].

964 Thompson, J., 2005. The geographic mosaic of coevolution., Chicago: University of 965 Chicago Press.

966 Ungerer, M.C. et al., 1998. Rapid hybrid speciation in wild sunflowers. Proceedings 967 of the National Academy of Sciences, 95(20), pp.11757–11762. Available at: 968 http://www.pnas.org/content/95/20/11757.abstract.

969 Voglmayr, H. & Greilhuber, J., 1998. Genome size determination in 970 (Oomycota) by Feulgen image analysis. Fungal Genetics and Biology, 25(3), 971 pp.181–195. Available at:

972 Walker, J. & Priest, M.J., 2007. A new species of Albugo on Pterostylis 973 (Orchidaceae) from Australia: confirmation of the genus Albugo on a 974 monocotyledonous host. Australian Plant Pathology, 36, pp.181–185.

975 Zerbino, D.R. & Birney, E., 2008. Velvet: Algorithms for de novo short read 976 assembly using de Bruijn graphs. Genome Research, 18(5), pp.821–829. 977 Available at:

978

979

38

980 Figures titles and legends

981

982 Figure 1. Phylogeny shows that the five sequenced Albugo candida isolates fall into three

983 divergent races. BEAST tree constructed based on using contig 1 (398,508 bp) shows the

984 divergence among the three A. candida races (AcEm2 and AcNc2; AcBoT and AcBoL; Ac2V). Blue

985 bars represent the 95% Higher Posterior Density (HPD). Here, we used a strict molecular clock

986 fixed at 1.0 in order to show the relationship in the scale of substitutions per site.

987 Figure 2. Comparison of A. candida races using alignments of Illumina reads against the

988 AcNc2 assembly. A - Positive correlation between the depths of coverage of the reference

989 assembly (AcNc2) by the Ac2V and AcBoT reads. For the reference contigs less than 20 kb, the

990 mean coverage was calculated across the whole contig length and log-transformed. For the contigs

991 over 20 kb, the mean coverage was calculated for the sliding window of 20 kb and log-transformed.

992 Y-axis shows the log-transformed depth of the reference coverage by the Ac2V reads; X-axis shows

993 the log-transformed depth of the reference coverage by the AcBoT reads. B – Nucleotide identity

994 amongst the homologous genomic regions of Ac2V, AcBoT and AcNc2. The mean identity was

995 calculated for the sliding window of 20 kb.

996 Figure 2 Supplement 1. Coverage of the reference assembly (AcNc2) by Ac2V and AcBoT.

997 Proportion of mean coverage is a measure relative to the depth of coverage of the reference reads

998 mapped backed to the reference assembly. The proportion of AcNc2 assembly not covered by

999 AcBoT and/or Ac2V reads (≤ 10% of the mean coverage) was 3.17% (971,030 bp) in Ac2V and

1000 4.21% (1,285,877 bp) in AcBoT.

1001 Figure 3. Nucleotide polymorphism within and between A. candida isolates. Mean (±5 – 95%CI)

1002 polymorphism expressed as the percentage observed heterozygote sites (solid symbols) and

1003 percentage nucleotide divergence (open symbols) at contig 1. Confidence intervals were calculated

1004 using a bootstrap of contig 1 after removal of indels. Isolates infecting the same host plant (i.e. 39

1005 AcBoT-AcBoL and AcEm2- AcNc2) show little nucleotide divergence, which indicates that they

1006 are genotypically almost identical (i.e. diverged by less than 0.05%). Nevertheless, the Brassica

1007 oleracea infecting race (AcBoT and AcBoL) possess a relatively high heterozygosity compared to

1008 the isolates of the Arabidopsis thaliana infecting race. Moreover, most of this heterozygous

1009 polymorphism is shared (low nucleotide divergence) and presence of the majority of heterozygous

1010 sites is consistent with clonal reproduction.

1011 Figure 4. Variation in sequence similarity between races. A – Alignment of nucleotides in

1012 between positions 158,779 and 167,382 within “contig 1” of three A. candida races (AcNc2, AcBoT

1013 and Ac2V) illustrating two recombination blocks coloured blue and green. Both blocks show high

1014 sequence similarity between races. Also shown is the sequence divergence in between blocks.

1015 Alignment gaps and monomorphic sites have been removed. B - The sequence similarity at “contig

1016 1” amongst three A. candida races was visualised using the colours of a RBG colour triangular in the

1017 software HybRIDS (http://www.norwichresearchpark.com/HybRIDS). Areas where two contigs

1018 have the same colour (yellow, purple or turquoise) are indicative of two races sharing the same

1019 polymorphisms. The linear plot of the proportion of SNPs shared between the three pairwise

1020 comparisons between the races. Shown on the X-axis is the actual base position. The graphs were

1021 made in the R package HybRIDS (http://www.norwichresearchpark.com/HybRIDS).

1022 Figure 5. Age of recombination blocks: A - Age of the 675 recombination blocks (mutation rate of

1023 μ=10-6) estimated using binomial mass function; B - Boxplot of the median (plus 1st nation blocks

1024 and 3rd quartile) log-age of recombination events in contigs. Only contigs with 8 or more events are

1025 shown. There is no significant difference in age of events between contigs (GLM: F22,233=1.06,

1026 p=0.387).

1027 Figure 6. PCR with race-specific markers on DNA prepared from plant tissue generated from

1028 co-infection assays. A - Co-infection assay of AcNc2 followed by Ac2V onto Ws-eds1 and Ws-2. B

1029 - Co-infection of AcBoT followed by Ac2V onto Ws-eds1 and B. oleracea (B.o). C - Co-infection

40

1030 of Ac2V followed by AcNc2 onto Ws-eds1 and B. juncea (B.j). Bands highlighted in orange indicate

1031 amplification of secondary inoculum on usually non-host plants upon primary inoculation with

1032 virulent A. candida. These experiments were repeated multiple times with similar results (see

1033 Supplementary file 6).

41

1034 Tables

1035 Table 1. Virulence of the A. candida races on different plant host accessions.

1036

A. thaliana B. rapa B. junceae B. oleraceae

A. candida race + − + − + − + −

AcNc2 137 219 0 1 0 1 0 18

AcBoT 1a 34 0 1 0 2 15 0

Ac2V 1ab 107 1c 4c 6c 0c 0 2

1037 1038 + Host-pathogen compatible interactions (number of susceptible accessions)

1039 − Host-pathogen incompatible interactions (number of resistant accessions)

1040 a A. thaliana Ws-eds1 (enhanced disease susceptibility) mutants were susceptible to all

1041 tested A. candida races.

1042 b In the laboratory conditions, the cotyledons of the A. thaliana accession Ws-3 were found

1043 to be susceptible to the Ac2V (Cooper et al. 2008b).

1044 c Data from (Rimmer et al. 2000) incorporated; in this study, one cultivar B. rapa (CrGC1-

1045 18, rapid-cycling accession) was infected by Ac2V race and four other tested cultivars (“Torch”,

1046 “Colt”, “Horizon”, “Reward”) were incompatible with Ac2V race. All analysed cultivars of B.

1047 juncea (CrGC4-1S, “Burgonde”, “Domo”, “Cutlass”) were susceptible to Ac2V.

1048

42

1049 Table 2. Summary of the A. candida AcNc2, AcEm2, AcBoT, AcBoL and Ac2V genome assemblies.

AcNc2 AcEm2 AcBoT AcBoL Ac2V

Number of Contigs 5,212 11581 11,929 11,143 12,210

N50 Length (bp) 41,078 29,326 14,673 14,953 24,005

N50 Number 231 284 584 581 353

Mean Contig Length (bp) 6,610 2668 2,781 2,998 2,770

Assembly Size (bp) 34,454,169 33,409,146 33,184,526 33,409,856 33,823,601

GC Content (%) 43.19% 43.09% 43.15% 43.15% 43.11%

CEGMA Gene Space 93.55% 92.74% 93.55% 93,13% 92.34% Coverage (%)* Average Genome 160 150 140 140 200 Coverage Repeat Content (%) 17.4% NA NA NA NA

Predicted genes 10,907 NA NA NA NA

1050

1051 * - Completeness of the gene space in the different genome assemblies was estimated using CEGMA pipeline; the presence of over 90% of core

1052 eukaryotic genes in the assembly serves as an indication of a overall complete gene space.

43

1053 Supplementary file captions

1054 Supplementary file 1

1055 List of Arabidopsis thaliana and Brassica spp. accessions assayed in virulence tests with

1056 Albugo candida race type isolates AcBoT, Ac2V and AcNc2.

1057 Supplementary file 2

1058 List of Arabidopsis thaliana accessions assayed in virulence tests with Albugo candida

1059 isolates AcNc2 and Ac2v.

1060 Supplementary file 3

1061 Polymorphisms between Albugo candida race genomic regions verified with Sanger

1062 sequencing.

1063 Supplementary file 4

1064 List of the predicted recombination blocks generated by the RDP3 software

1065 Supplementary file 5

1066 Details of race-specific primers.

1067 Supplementary file 6

1068 Details of co-infections assays.

44

AcNc2

AcEm2

AcBoT

AcBoL

Ac2V

0.0 0.001 0.002 0.003 0.004

 ! "# 





   



                       A

AcNc2 AcBoT Ac2V  158779    162760    165018    167373   167854 

B