BMC Evolutionary Biology BioMed Central

Research article Open Access
Evidence of recent interkingdom horizontal gene transfer between bacteria and Candida parapsilosis
David A Fitzpatrick*, Mary E Logue and Geraldine Butler

Address: School of Biomolecular and Biomedical Science, Conway Institute, University College, Dublin, Belfield, Dublin 4, Ireland Email: David A Fitzpatrick* - [email protected]; Mary E Logue - [email protected]; Geraldine Butler - [email protected] * Corresponding author

Published: 24 June 2008 Received: 17 January 2008 Accepted: 24 June 2008 BMC Evolutionary Biology 2008, 8:181 doi:10.1186/1471-2148-8-181 This article is available from: http://www.biomedcentral.com/1471-2148/8/181 © 2008 Fitzpatrick et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract
Background: To date very few instances of interdomain gene transfer into fungi have been identified. Here, we used the emerging genome sequences of Candida albicans WO-1, Candida tropicalis, Candida parapsilosis, Clavispora lusitaniae, Pichia guilliermondii, and Lodderomyces elongisporus to identify recent interdomain HGT events. We refer to these as CTG species because they translate the CTG codon as serine rather than leucine, and share a recent common ancestor.
Results: Phylogenetic and syntenic information indicates that two C. parapsilosis genes originate from bacterial sources. One encodes a putative proline racemase (PR). Phylogenetic analysis also indicates that there were independent transfers of bacterial PR enzymes into members of the Pezizomycotina, and protists. The second HGT gene in C. parapsilosis belongs to the phenazine F (PhzF) superfamily. Most CTG species also contain a fungal PhzF homolog. Our phylogeny suggests that the CTG homolog originated from an ancient HGT event, from a member of the . An analysis of synteny suggests that C. parapsilosis has lost the endogenous fungal form of PhzF, and subsequently reacquired it from a proteobacterial source. There is evidence that Schizosaccharomyces pombe and Basidiomycotina also obtained a PhzF homolog through HGT.
Conclusion: Our search revealed two instances of well-supported HGT from bacteria into the CTG clade, both specific to C. parapsilosis. Therefore, while recent interkingdom gene transfer has taken place in the CTG lineage, its occurrence is rare. However, our analysis will not detect ancient gene transfers, and we may have underestimated the global extent of HGT into CTG species.

Background
Lateral or horizontal gene transfer (HGT) is defined as the exchange of genes between different strains or species [1]. HGT introduces new genes into a recipient genome that are either homologous to existing genes, or belong to entirely new sequence families. Large-scale genomic sequencing of prokaryotes has revealed that gene transfer is an important evolutionary mechanism for these organisms [2,3]. HGT has been linked to the acquisition of drug resistance by benign bacteria [4], and also to the gain of genes that confer the ability to catabolize certain amino acids that are important virulence factors [5]. However, there is much debate as to whether lateral gene transfer is a ubiquitous influence throughout prokaryotic genome evolution [6]. Until recently, the process of gene transfer has been assumed to be of limited significance to eukaryotes [7]. The availability of diverse eukaryotic genome sequence data is dramatically changing our views on the important role gene transfer can play in eukaryotic evolution.

Page 1 of 15 (page number not for citation purposes) BMC Evolutionary Biology 2008, 8:181 http://www.biomedcentral.com/1471-2148/8/181

The rapid increase in fungal sequence data has promoted this kingdom to the forefront of comparative genomics [8]. Whereas there is some documented evidence for HGT between fungal species [9-17] or from bacteria to fungi [18-28] [see additional file 1], overall very few instances have been identified. There are two possible explanations: either gene transfer is indeed extremely rare amongst fungi, or it has not yet been thoroughly studied. To address this question we investigated the frequency of successful recent interdomain HGT events between prokaryotes and yeast species belonging to the CTG clade. We chose this course of action as we expect recent interdomain HGT events to be more readily identified and supported than more ancient transfers.

For the purposes of this study, we define CTG species as the immediate relatives of C. albicans, including C. tropicalis, C. parapsilosis, Clavispora lusitaniae, Pichia guilliermondii, and Lodderomyces elongisporus. These species have been completely sequenced, share a relatively recent common ancestor [29], and translate the codon CUG as serine rather than leucine [30].

We used syntenic, phylogenetic and sequence based analyses to identify two cases of interdomain HGT between prokaryotes and C. parapsilosis, most likely involving the proteobacteria phylum. Our results suggest that extant CTG species do not readily take up exogenous DNA.

Results and discussion
Identification of horizontal gene transfer candidates through Blast database search
We compared all available CTG gene sets against UniProt using BlastP [31]. CTG genes with top database hits to bacterial species were identified as putative horizontally transferred genes and the resultant Blast files were inspected manually. A D. hansenii gene (protein ydhR precursor) with a top database hit to a bacterial sequence was not considered for further analyses as it has previously been described [22]. After this process two genes from C. parapsilosis were considered for further analysis; one encodes a putative proline racemase, and the second encodes a member of the phenazine F superfamily. Related family members were identified by a second round of database searching against GenBank to ensure all available genomic data was utilized.

Proline racemase phylogeny and characterization
The C. parapsilosis gene (designated CPAG_02038) is most similar to a proline racemase homolog from Burkholderia cenocepacia AU 1054 (66% pairwise identity; Figure 1A). Amino acid racemases catalyze the interconversion of L- and D-amino acids by abstraction of the α-amino proton of the enzyme bound substrate [32]. CPAG_02038 lies within a large contig and is also present in a previously published genome survey of C. parapsilosis [33], suggesting that its presence is not the result of contamination. We could not locate any related genes in any other CTG genome (using BlastP or TBlastN). However, family members are widely distributed throughout the prokaryotes, and are also found within the Pezizomycotina.

Figure 1
(A)
CpPR 0 MNQDRLISTIETHTGGEPFRIVTSGLPRLK
BcPR 0 MKISRSLSTVEVHTGGEAFRIVTSGLPRLP
CpPR 30 GDTIVAKRTWIKTHHDEIRKFLMYEPRGHA
BcPR 30 GDTIVQRRAWLKAHADEIRRALMFEPRGHA
CpPR 60 DMYGGYLVDSVSDDADFGVIFLHNEGYSDH
BcPR 60 DMYGGYLTEPVSPNADFGVIFVHNEGYSDH
CpPR 90 CGHGIIALASTAVKLGWVERTQPKTRVGID
BcPR 90 CGHGVIALSTAAVELGWVQRTVPETRVGID
CpPR 120 APCGFIEAFVKWDGEKVGNVRFVNVPSFMY
BcPR 120 APCGFIEAFVQWDGEHAGPVRFVNVPSFIW
CpPR 150 IKDATVDTPSFGEVIGDIAFGGAFYFYMNS
BcPR 150 RRDVSVDTPSFGTVTGDIAYGGAFYFYVDG
CpPR 180 ASLDIPIGLSQVETLRRLGNEVKIAANKKY
BcPR 180 APFDLPVRESAVEKLIRFGAEVKAAANATY
CpPR 210 KVVHPEIAEINHVYGTIIDNISPDGGASQS
BcPR 210 PVVHPEIPEINHIYGTIIANAPRHAGSTQA
CpPR 240 NVCIFADRQVDRSPTGSGTAGRAAQLFARG
BcPR 240 NCCVFADREVDRSPTGSGTGGRVAQLYQRG
CpPR 270 ELQVGQVFTNESIVGSVFSAKVVKQVKYHG
BcPR 270 LLAAGDTLVNESIVGTVFKGRVLRETTVGD
CpPR 300 FDAVIPEVEGNANVIGYANWIVDPDDEIGK
BcPR 300 FPAVIPEVEGSAHICGFANWIVDERDPLTY
CpPR 330 GFLVRE
BcPR 330 GFLVR-
(B)
CpPhzF 0 MSSA-FKQVDVFTSKPFKGNPVAVIMDAN
PlPhzF 0 MSLVPFKQVDVFTHRPFKGNPVAVVMDAQ
CpPhzF 29 LSTEQMQTIANWTNLSETTFVFPATSDKA
PlPhzF 30 LSSIQMQGIANWTNLSETTFILPAENPLA
CpPhzF 59 YYVRIFTPQSELPFAGHPTIGTCHALLES
PlPhzF 60 YRVRIFTPGSELPFAGHPTIGTAHALLEA
CpPhzF 89 LISAKDGVVVQECGAGLVKLTIVPG----
PlPhzF 90 LIQAREGRIVQECGAGLITLNVTERDEGQ
CpPhzF 115 STSFELPDPVITPLSDTQINSLETDLGCA
PlPhzF 120 LITFELPEPTITPLSSEQIDRLESILDCP
CpPhzF 145 DKNLRPALVDVGARWIIARVADAKTVLSA
PlPhzF 150 DRALTPALIDVGARWIVAHTTGAEAVLAT
CpPhzF 175 PAFPQLAKHNNELKATGVSIYGNWSD---
PlPhzF 180 PDYARLLEHDTQMNITGVCLYGAYHEGAE
CpPhzF 202 HIEVRSFAPACGVDEDPVCGSGNGAVAAF
PlPhzF 210 DIEVRSFAPSCGVNEDPVCGSGNGSVAAF
CpPhzF 232 RSHTN---EDKILKSSQGSVVNREGALKL
PlPhzF 240 RHHKVAMIDDKIVHSSQGKKLGRQGSVWL
CpPhzF 259 ISNKKVLVGGDAVTCIEGKIRIT
PlPhzF 270 HSDGKIFVGGSAVTCINGTITI-
(A) An alignment of PR proteins from C. parapsilosis (CpPR, CPAG_02038) and Burkholderia cenocepacia (BcPR) was generated with MUSCLE. These proteins are 66% identical. (B) An alignment of PhzF proteins from Candida parapsilosis (CpPhzF, CPAG_03462) and Photorhabdus luminescens (PlPhzF); these are 61% identical.

We extracted 321 putative proline racemases from 207 organisms, including members of the α, β, γ, and δ-proteobacteria, Actinobacteria, Fungi, Protozoa and Metazoa. Numerous species were found to have several family members [see additional file 2]; all were included for complete comparative purposes. A maximum likelihood (ML) phylogeny was reconstructed from an alignment of all the PR proteins (Figure 2).

There are a large number of polytomies displayed in Figure 2. These probably result from duplication of PR genes followed by diversifying selection, leading to a high degree of sequence heterogeneity. For example, Agrobacterium tumefaciens str. C58 contains three PR homologs [see additional file 2], with an average amino acid pairwise percentage identity of ~31%. Burkholderia cenocepacia AU 1054 contains two proline racemase homologs [see additional file 2], which are only 28% identical. To help resolve the evolutionary history amongst PR homologs we reconstructed an additional ML phylogeny based on a reduced dataset (Figure 3). We also reconstructed a Bayesian phylogeny using the heterogeneous CAT site model. The CAT model can account for site-specific features of sequence evolution and has been found to be more robust than other methods against phylogenetic artifacts such as long branch attraction [34]. The resultant Bayesian phylogeny is highly congruent with the ML phylogeny (not shown).

The putative C. parapsilosis PR homolog lies in a strongly supported (100% bootstrap support (BP)) clade with Burkholderia species (Figures 2 and 3, clade-A). Burkholderia are β-proteobacteria. However, no other β-proteobacteria, or indeed any other bacterial genus, were found within clade-A (Figures 2 and 3).

Although no PR homologs were identified in other CTG species, or indeed in any other of the Saccharomycotina, there are homologs in members of the Pezizomycotina. A Pezizomycotina specific subclade is evident in our phylogeny, containing Phaeosphaeria nodorum, Aspergillus niger and Gibberella zeae (Figures 2 and 3, clade-B, 100% BP). This subclade is found in a strongly supported clade with members of the Actinobacteria (Figure 2, 100% BP), containing Brevibacterium linens and an unclassified marine actinobacterium and excluding Rubrobacter xylanophilus (Figure 2, 87% BP). This suggests that these Pezizomycotina species obtained their PR gene from the Actinobacteridae subclass rather than the Rubrobacteridae subclass. This transfer event is another independent HGT event of a PR gene into fungi, and we hypothesize it occurred early in the Pezizomycotina lineage, as it is shared by three distantly related species. Its patchy phyletic distribution suggests it has been subsequently lost in other Pezizomycotina species.

There are also PR homologs in the Metazoans. These are found in a eukaryote clade that also contains a number of Pezizomycotina representatives (Figures 2 and 3, clade-C, 93% BP). Several scenarios can explain this phylogenetic positioning. Firstly, the PR gene may have been present in the last universal common ancestor of all eukaryotes but has been differentially lost in all lineages except those leading to modern day Metazoa and Pezizomycotina. Alternatively, an ancient gene transfer from bacteria to the last common ancestor (LCA) of Metazoa and Fungi could have occurred, with subsequent gene loss amongst different Metazoan and Fungal lineages. A third hypothesis is that two independent gene transfers have occurred into the Metazoan and Pezizomycotina lineages from unsampled bacterial donors. Finally, a transfer from unsampled bacteria into one of the eukaryote clades (either Metazoa or Pezizomycotina) may have occurred with subsequent transfer from one eukaryotic group to the other.

A. niger, A. oryzae and G. zeae all contain multiple PR homologs [see additional file 2]. One A. niger, one G. zeae and the three A. oryzae PR homologs are nested in a strongly supported Pezizomycotina specific subclade (Figures 2 and 3, clade-D, 100% BP). This subclade is found within a larger, predominantly proteobacterial clade (Figure 2, 74% BP). This implies that there was an independent gene transfer event of a bacterial PR homolog into an ancestral Pezizomycotina species.

The phylogenetic position of the C. parapsilosis PR homolog (Figures 2 and 3) resembles that described for the adenosine deaminase (ADA) gene in the Dekkera bruxellensis genome [21]. In that analysis, the authors suggest


Figure 2
Proline racemase maximum likelihood phylogeny. The optimum model of protein substitution was found to be WAG+G. The number of gamma rate categories was 4 (alpha = 1.163). Bootstrap resampling (100 iterations) was undertaken and percentages are displayed. For display purposes, branches with less than 50% support were collapsed. Letters (A-E) in parentheses are used to distinguish clades and are discussed in the text. Branches are colored according to taxonomic group (Fungi, α-proteobacteria, β-proteobacteria, γ-proteobacteria, Actinobacteria, Trypanosoma, Metazoa, Firmicutes and miscellaneous bacteria). Fungal branches and species names are colored green.
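The Figure 2 legend notes that branches with less than 50% bootstrap support were collapsed, which is one source of the polytomies seen in the tree. The following minimal Python sketch illustrates that display step only. It is not the authors' pipeline: the `Node` class, the function names `collapse_low_support` and `to_newick`, and the five-taxon toy tree (taxa A to E with invented support values) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node; `support` is the bootstrap percentage
    of the branch leading to this node (None for leaves and the root)."""
    name: str = ""
    support: Optional[float] = None
    children: List["Node"] = field(default_factory=list)


def collapse_low_support(node: Node, threshold: float = 50.0) -> Node:
    """Return a copy of the tree in which every internal branch whose
    support is below `threshold` is removed, producing a polytomy."""
    new_children: List[Node] = []
    for child in node.children:
        collapsed = collapse_low_support(child, threshold)
        is_internal = bool(collapsed.children)
        if is_internal and collapsed.support is not None and collapsed.support < threshold:
            # Attach the grandchildren directly to the current node.
            new_children.extend(collapsed.children)
        else:
            new_children.append(collapsed)
    return Node(node.name, node.support, new_children)


def to_newick(node: Node) -> str:
    """Serialize to a Newick-style string (terminating semicolon omitted)."""
    if not node.children:
        return node.name
    inner = ",".join(to_newick(c) for c in node.children)
    label = "" if node.support is None else str(int(node.support))
    return f"({inner}){label}"


# Toy tree with invented taxa and support values.
tree = Node(children=[
    Node(support=100, children=[Node("A"), Node("B")]),
    Node(support=95, children=[
        Node(support=40, children=[Node("C"), Node("D")]),
        Node("E"),
    ]),
])

print(to_newick(tree))                        # ((A,B)100,((C,D)40,E)95)
print(to_newick(collapse_low_support(tree)))  # ((A,B)100,(C,D,E)95)
```

In the toy tree the (C,D) grouping has only 40% support, so it is dissolved and C, D and E become a three-way polytomy, while the well supported (A,B) clade is kept.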

that D. bruxellensis and Burkholderia species received the ADA gene from a species not yet represented in the public sequence databases. Our PR phylogeny suggests a similar event may have occurred within clade-A, which contains only C. parapsilosis and Burkholderia species (Figures 2 and 3). Burkholderia species are known to have a genomic repertoire that allows the transfer and receipt of exogenous DNA [35] and a number of studies have reported successful gene transfers into Burkholderia species [36,37]. It is possible therefore that there have been other successful gene transfers into this bacterial lineage.

Figure 3
Reduced proline racemase maximum likelihood phylogeny with active site alignment. Bootstrap resampling (100 iterations) was undertaken and percentages are displayed. Fungal branches are shown in green. An alignment around the active site is also displayed. Clade letters in parentheses correspond to those in Figure 2. The phylogeny is rooted around the Metazoan/Pezizomycotina specific clade (clade-C); all members of this clade have a threonine at the active site. C. parapsilosis and its phylogenetic neighbors have a threonine instead of a cysteine at the active site (clade-A). A. oryzae, A. niger and G. zeae all contain cysteine at the active site (clade-D). A. flavus, A. oryzae, A. niger, A. nidulans and G. zeae also have cysteine at the active site (clade-C).

The vast majority of amino acids found in living cells correspond to the L-stereoisomer [38]. However, D-amino acids have long been known to occur in the cell walls of Gram positive and Gram negative bacteria, where they are essential components of peptidoglycan [39]. Apart from low levels of D-amino acids derived from spontaneous racemization as a result of aging [40], it was assumed that only L-amino acid enantiomers were present in eukaryotes [41]. However, recent studies have reported the presence of numerous D-amino acids in an array of organisms, including mammals [42]. The first eukaryotic (proline) amino acid racemase has recently been described from the human pathogen Trypanosoma cruzi [43]. A high degree of sequence similarity was observed between the T. cruzi and bacterial homologs [43]. Our phylogeny indicates that T. cruzi obtained its PR homolog through interdomain HGT from a member of the Firmicutes (subclass Clostridia), as it is grouped beside members of this group with a high degree of support (Figure 2, clade-E, 96% BP). We performed database searches [44] against other Protozoan genomes including Trypanosoma brucei, Trypanosoma congolense and Trypanosoma annulata. We located a homolog only in T. vivax.

Previous analysis has shown that T. cruzi and T. vivax are not each other's closest phylogenetic neighbors, relative to the other species sampled [45]. This suggests that an ancestral Trypanosoma gained the PR gene and that multiple losses in different Trypanosoma lineages have subsequently occurred.

Gene order around PR homologs
The C. parapsilosis PR homolog lies close to an ortholog (CPAG_02041) of orf19.1135 from C. albicans (Figure 4). The gene order to the left of this ORF is conserved in all CTG species; the order to the right is conserved in most CTG species apart from C. parapsilosis and L. elongisporus. C. parapsilosis and L. elongisporus are closely related [29], and an examination of synteny suggests that the PR gene (together with a second ORF, cpar5437) was inserted between CPAG_02041 and CPAG_02037 (Figure 4). cpar5437 encodes a neutral amino acid (AA) transporter. The presence of an AA transporter beside the PR homolog is interesting. If the putative proline racemase has a role in amino acid metabolism, then the presence of the transporter may be the result of an adaptive translocation to enhance the activity of the PR gene. Unlike the PR ORF, the AA transporter is fungal in origin. Most CTG species contain a single neutral AA transporter; however C. parapsilosis and D. hansenii have four.

We located tRNA genes for nearly all CTG species beside the large conserved syntenic block (Figure 4). It has been shown that tRNA genes are associated with genomic breakpoints [46]. We hypothesize that a genomic rearrangement has occurred at this site in the LCA of C. parapsilosis and L. elongisporus. We cannot determine if the bacterial PR homolog was inserted into the LCA of L. elongisporus/C. parapsilosis and subsequently lost in L. elongisporus, or gained by C. parapsilosis after speciation.

We also investigated the gene order around the Pezizomycotina PR homologs [see additional file 3]. Gene synteny around the PR homologs found in clade-D (Figures 2 and 3) is not conserved (not shown). Interestingly however, both A. niger and G. zeae in clade-D (Figures 2 and 3) have genes containing a FAD dependent oxidoreductase domain in close proximity to their PR homologs (not shown). According to Pfam [47], FAD dependent oxidases include D-amino acid oxidases, which catalyze the oxidation of neutral and basic D-amino acids into their corresponding keto acids. The presence of these oxidases may be another example of an adaptive translocation to enhance the activity of the PR gene in these Pezizomycotina species.

A. oryzae has three PR homologs (Figures 2 and 3, clade-C). All of these have orthologs in its close relative A. flavus (Figure 3, clade-C), and synteny around these is conserved [see additional file 3 clade-D]. The remaining two species in clade-C are A. niger and G. zeae. There is no evidence of conserved gene order within these species, or with A. oryzae or A. flavus. Gene order around the A. flavus and A. terreus PR homologs found in the Metazoan/Pezizomycotina clade (Figures 2 and 3) is also conserved [see additional file 3], as is the order between A. fumigatus and N. fischeri [see additional file 3]. We could not locate amino acid transporters or FAD dependent oxidases beside any of the PR homologs found in clades B or C (Figure 2).

Proline racemase codon usage
It has been shown that recently acquired genes often display an atypical codon preference when compared to other genes in the genome [48,49]. However, the transferred PR homologs have a codon usage consistent with the rest of their genomes [see additional file 4]. We undertook an analysis of variation in synonymous codon usage on all PR genes shown in Figure 2. Homologs from related species cluster together [see additional file 5]. For example, the Actinobacteria, the Firmicutes and the Burkholderia species all inhabit unique areas in two dimensional correspondence analysis space [see additional file 5].

The majority of fungal and Metazoan PRs are clustered together [see additional file 5]. The C. parapsilosis PR homolog has a codon usage distinct from the other Pezizomycotina fungal PR homologs [see additional file 5],


which is unsurprising as C. parapsilosis belongs to the Saccharomycotina subphylum. The C. parapsilosis homolog is also separate from the Burkholderia (β-proteobacteria) genes with which it forms a closely related phylogenetic group (Figures 2 & 3). This suggests that the gene may have originated from a genome with no other close relatives among the species analyzed here.

Figure 4. Gene order around C. parapsilosis proline racemase gene. Species names and identifiers are shown in each box. Gene identifiers relate to annotations from the Broad Institute [66]. On the left hand side orthologous genes are stacked under one another in pillars. Relative positions of tRNA genes are shown and may indicate a breakpoint. After the breakpoint, synteny is conserved between C. albicans, C. dubliniensis, C. tropicalis, D. hansenii and Cl. lusitaniae. Synteny between C. parapsilosis and L. elongisporus is conserved but differs from the other CTG species. C. parapsilosis has a proline racemase (PR CPAG_02038) and a neutral amino acid transporter (AA cpar5437) insertion in this region. cpar5437 is absent from the Broad gene list but present in our manual gene call.

Proline racemase activity
The PR active site from Trypanosoma cruzi, Clostridium sticklandii, Agrobacterium tumefaciens, Brucella melitensis and


Pseudomonas aeruginosa all contain cysteine at amino acid position 330 [43,50]. This amino acid is essential for enzymatic function, because substitution with serine abolishes activity [41]. However, PR homologs from human, mouse, Rhizobium and Brucella contain a threonine instead of a cysteine at position 330 [41]. We observed that cysteine is found in the equivalent position in many of the bacterial proteins. The Pezizomycotina PR genes found in clade-B and clade-D contain a cysteine at the active site (Figure 3). The PR homologs found in the Metazoan/Pezizomycotina clade (clade-B) have a threonine at position 330. Similarly, the C. parapsilosis PR homolog, together with its relatives from Burkholderia, all contain a threonine (Figure 3). However, Burkholderia species have multiple PR homologs [see additional file 2] with a cysteine at the active site (not shown). It is not clear what effect the substitution has on enzyme activity. It has been suggested that homologs containing threonine at the active site are not true PRs [41], but may instead belong to a superfamily. We cannot detect any difference in the ability of C. parapsilosis, the other CTG species or any of the Pezizomycotina species to utilize D-proline in growth media (data not shown). We therefore cannot confidently infer the function of the PR homologs in the fungi analyzed here.

Phenazine F phylogeny and characterisation
The C. parapsilosis gene (designated CPAG_03462) is most similar to a Photorhabdus luminescens phenazine F (PhzF) protein with 61% pairwise identity (Figure 1B). Phenazines are biologically active compounds, all of which have a characteristic tricyclic ring system and have been shown to confer a selective growth advantage to organisms which secrete them, as they possess broad-spectrum antibiotic activity towards bacteria, fungi and higher eukaryotes [51]. In Pseudomonas, the best studied phenazine producer, PhzF is part of an operon required for the conversion of chorismic acid to phenazine-1-carboxylate (PCA) [52]. PhzF homologs were identified in most of the CTG species tested as well as several other fungal species. However, we could not identify a PhzF homolog in the L. elongisporus genome, even when multiple TBlastN and BlastN searches were used.

PhzF homologs were extracted from GenBank for subsequent phylogenetic analysis. In total 181 representative protein coding sequences distributed amongst 154 organisms were used. These taxa were distributed amongst α, β, γ and δ-proteobacteria, Actinobacteria, Fungi, Firmicutes as well as other bacterial groups.

We aligned all sequences and reconstructed a PhzF ML phylogeny (Figure 5). The C. parapsilosis PhzF homolog is found in a clade with members of the β-proteobacteria (Burkholderia multivorans, Burkholderia cepacia, Burkholderia ambifaria), α-proteobacteria (Roseovarius) and the γ-proteobacteria (Azotobacter vinelandii, Acinetobacter baumannii, Shewanella baltica and Photorhabdus luminescens) (81% BP). In contrast, all other PhzF homologs from CTG species are in a completely separate clade (Figure 5). These form a sister group (63% BP) to PhzF homologs from other Saccharomycotina species (C. glabrata, Saccharomyces cerevisiae, Kluyveromyces lactis and Vanderwaltozyma polyspora). All three clades are grouped together in a larger clade with high support (75% BP).

The sister group relationship between the PhzF homologs from the Ascomycota and the proteobacteria clade is intriguing (Figure 5), as it suggests that an ancestral Saccharomycotina species gained the PhzF homolog from a proteobacterium. The bacterial PhzF gene has subsequently been retained after multiple speciation events, but lost in C. parapsilosis. We hypothesize that C. parapsilosis has recently reacquired a bacterial PhzF homolog from a proteobacterial source, as it is grouped (81% BP) within a proteobacterial subclade. To test this hypothesis we reconstructed constrained trees that placed C. parapsilosis together with the remaining Ascomycota species [see additional file 6 C-H]. The AU test of phylogenetic tree selection [53] showed that the original unconstrained tree (which groups C. parapsilosis with proteobacteria) receives the optimal likelihood tree score, and that the differences in likelihood scores when compared to the constrained trees [see additional file 6] are significant (P < 0.05). This is also supported by spectral analysis [see additional file 7].

Our phylogeny shows that the Schizosaccharomyces pombe PhzF homolog is found in a clade containing all other CTG PhzF homologs (Figure 5 99% BP). Furthermore it is grouped beside D. hansenii (66% BP). S. pombe is not a member of the Saccharomycotina; it belongs to the Taphrinomycotina subphylum. The genome sequences of Schizosaccharomyces japonicus and Schizosaccharomyces octosporus have recently been completed [54]. We could not locate a PhzF homolog in S. japonicus but did locate a homolog in S. octosporus using a TBlastN search strategy. Phylogenetic analysis has shown that S. pombe and S. octosporus are more closely related to one another than to S. japonicus [55]. Therefore we hypothesize that the last common ancestor (LCA) of S. pombe and S. octosporus gained the PhzF gene from an ancestral D. hansenii-like species after speciation from S. japonicus. We reconstructed a constrained tree that placed S. pombe outside the Saccharomycotina clade [see additional file 6B]. The approximately unbiased test of phylogenetic tree selection (AU test) [53] showed that the phylogenetic inferences of the unconstrained tree are significantly better (P < 0.05) than those of the constrained tree [see additional file 6]. This implies that S. pombe has obtained a PhzF homolog from a member of the CTG clade.
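The comparison of constrained and unconstrained trees rests on per-site log-likelihoods computed under each topology. As a rough illustration only, the sketch below uses RELL bootstrap resampling of site log-likelihoods. This is a simpler relative of the AU test, not the AU test itself, which adds multiscale resampling and a bias correction [53]. The function name and the toy values in the usage note are hypothetical and are not taken from this analysis.

```python
import random


def rell_support(site_lnl_a, site_lnl_b, replicates=1000, seed=0):
    """Fraction of bootstrap replicates in which tree A has the higher
    total log-likelihood than tree B (RELL resampling, not the AU test)."""
    if len(site_lnl_a) != len(site_lnl_b):
        raise ValueError("both trees need one log-likelihood per alignment site")
    rng = random.Random(seed)
    n_sites = len(site_lnl_a)
    # Resampling the per-site differences is equivalent to resampling
    # both log-likelihood columns with the same site indices.
    diffs = [a - b for a, b in zip(site_lnl_a, site_lnl_b)]
    wins = 0
    for _ in range(replicates):
        total = sum(diffs[rng.randrange(n_sites)] for _ in range(n_sites))
        if total > 0:
            wins += 1
    return wins / replicates
```

For example, `rell_support(unconstrained, constrained)` with two equal-length lists of site log-likelihoods returns the share of replicates that favor the unconstrained topology; a value near 1 points in the same direction as a significant AU result, but it is not a substitute for it.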


Figure 5. PhzF maximum likelihood phylogeny. The optimum model of protein substitution was found to be WAG+G. The number of gamma rate categories was 4 (alpha = 0.873). Bootstrap resampling (100 iterations) was undertaken and support values are displayed. For display purposes branches with less than 50% support were collapsed. Branches are colored according to their taxonomy (legend groups include Fungi, α-proteobacteria, β-proteobacteria, γ-proteobacteria, Actinobacteria, miscellaneous bacteria and Archaea). Fungal branches are shown in green. The S. pombe PhzF homolog is highlighted with a red rectangle.
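The collapsing of weakly supported branches mentioned in the Figure 5 legend can be sketched as follows. This is a minimal illustration, assuming a tree stored as nested dictionaries with hypothetical keys `name`, `support` and `children`; it is not the software used to draw the figure, and the taxa and support values in the usage note are toy data.

```python
def collapse_low_support(node, threshold=50):
    """Return a copy of the tree in which internal branches with
    bootstrap support below `threshold` are collapsed into their parent."""
    children = node.get("children", [])
    if not children:
        return {"name": node["name"]}  # leaf: copy the label only
    new_children = []
    for child in children:
        collapsed = collapse_low_support(child, threshold)
        grandchildren = collapsed.get("children")
        support = child.get("support")
        if grandchildren and support is not None and support < threshold:
            # Weak internal branch: attach its children directly to this node.
            new_children.extend(grandchildren)
        else:
            new_children.append(collapsed)
    result = {"children": new_children}
    if "support" in node:
        result["support"] = node["support"]
    return result


def to_newick(node):
    """Serialize the nested-dictionary tree as a Newick-style string."""
    if "children" not in node:
        return node["name"]
    inner = ",".join(to_newick(child) for child in node["children"])
    label = str(node["support"]) if "support" in node else ""
    return f"({inner}){label}"
```

With a toy tree `((A,B)90,(C,(D,E)75)40)`, collapsing at the 50% threshold removes only the branch with 40% support, so C and the (D,E) clade attach directly to the root as a polytomy.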

A small basidiomycete clade is evident amongst prokaryote species (Figure 5). Both Ustilago maydis and Malassezia globosa belong to the Ustilaginomycotina subphylum. Therefore our phylogeny implies that an ancestral Ustilaginomycotina species gained a PhzF gene from an unknown bacterial source, and both species have retained this after speciation.

A correspondence analysis of synonymous codon usage for all PhzF homologs was also performed and is shown in additional information [see additional file 8]. The S. pombe PhzF homolog has a codon usage pattern very similar to the D. hansenii protein.

Gene order around PhzF
Analysis of the genes adjacent to the PhzF homolog in C. parapsilosis shows that there is a high conservation of gene synteny and supports our hypothesis that PhzF was recently acquired in this species (Figure 6). Homologs in the other CTG species are located in completely different regions of the genome relative to C. parapsilosis (not shown). For example, the C. albicans PhzF homolog is located between orf19.5619 and orf19.5621, whereas the C. parapsilosis homolog is found between orf19.6689 and

Figure 6. Gene order around C. parapsilosis PhzF gene. Species names and gene identifiers are shown in each box. Orthologous genes are stacked under one another in pillars. The C. parapsilosis PhzF homolog (CPAG_03462) is highlighted with a red box. Synteny relative to the C. parapsilosis PhzF homolog is conserved in all species. Other CTG PhzF homologs are found in completely different regions of the genome relative to C. parapsilosis. L. elongisporus is the only CTG species missing a PhzF gene, and there is no evidence for a pseudogene in the genome.


orf19.6687 relative to C. albicans SC5314 (Figure 6). However, the L. elongisporus genome contains no PhzF homolog, either at a position equivalent to the C. parapsilosis copy or elsewhere in the genome.

We propose that the LCA of L. elongisporus and C. parapsilosis lost the PhzF gene present in the other CTG species, and a second (new) copy was subsequently gained by C. parapsilosis after speciation. We have partial sequence data (unpublished) from Candida orthopsilosis, a species so closely related to C. parapsilosis that it was once designated C. parapsilosis group II [56]. We located a C. orthopsilosis PhzF homolog that is 83% identical (at the amino acid level) to the C. parapsilosis copy. This implies that the common ancestor of C. parapsilosis and C. orthopsilosis acquired the bacterial PhzF homolog after speciation from L. elongisporus.

Mechanisms of gene transfer into fungi are poorly understood. To date no DNA uptake mechanism has been identified in CTG species. Interkingdom conjugation between bacteria and yeast has been observed, however [57-59]. Similarly, Saccharomyces cerevisiae has been shown to be competent for transformation under certain conditions [60]. CTG species are known to interact with bacteria in vivo [61], and it is therefore possible that interkingdom conjugation and transformation may facilitate DNA transfer in C. parapsilosis. These mechanisms may also be applicable to the Pezizomycotina species examined in this analysis.

Conclusion
We investigated the frequency of recent interkingdom gene transfer between CTG and bacterial species. We located two strongly supported incidences of HGT, both within the C. parapsilosis genome. We also located independent transfers into the Pezizomycotina, Basidiomycotina and Protozoan lineages.

We cannot determine the exact origin of the PR homolog (CPAG_02038) found in the C. parapsilosis genome. However, based on its phylogenetic position it either originated from a Burkholderia source, or more likely from an organism not yet represented in the sequence databases. Our PR phylogenetic analysis also suggests there were two independent transfers into Pezizomycotina species, one from an Actinobacterial source and the second from an unknown proteobacterial source. There is also evidence that T. cruzi has obtained its PR homolog from a Firmicutes species. The transferred PR genes analyzed here belong to a superfamily of proline racemases, although we cannot determine their exact function in the fungal species examined. Their proximity to an amino acid transporter (in C. parapsilosis) and a FAD dependent oxidoreductase (in A. niger and G. zeae) suggests they do have a role in amino acid metabolism. Furthermore, evidence of multiple independent transfers into fungi suggests the protein does confer a biological advantage, although we cannot determine what it is. The bacteria-derived PR gene has the potential to be a novel antifungal drug target as there would be no undesired host protein-drug interactions.

The bacterial PhzF homolog (CPAG_03462) found in C. parapsilosis most likely originated from a proteobacterial source. Most CTG species examined contained PhzF homologs, with the exception of L. elongisporus. The crystal structure of the PhzF homolog in S. cerevisiae has been determined and while its function remains unknown, it is not thought to be involved in phenazine production [62]. We postulate that the PhzF homolog present in other CTG species was initially lost by the ancestor of C. parapsilosis and L. elongisporus, but subsequently regained by C. parapsilosis through HGT. The loss of eukaryote genes and subsequent reacquisition of a prokaryotic copy has previously been described in yeast, and can confer specific metabolic capabilities. An analysis of the biotin biosynthesis pathway discovered that the ancestor of Candida, Debaryomyces, Kluyveromyces and Saccharomyces lost the majority of the pathway after the divergence from the ancestor of Y. lipolytica. However, Saccharomyces species have rebuilt the biotin pathway through gene duplication/neofunctionalization after horizontal gene transfer from α and γ proteobacterial sources [20]. The acquisition of the URA1 gene (encoding dihydroorotate dehydrogenase) from Lactobacillus and replacement of the endogenous gene in S. cerevisiae allowed growth under anaerobic conditions [19]. Similarly, acquisition of BDS1 (alkyl-aryl-sulfatase) from proteobacteria may have enabled the survival of S. cerevisiae in a harsh soil environment [19]. Our PhzF phylogeny suggests that the PhzF homolog found in most CTG species originated from an ancient HGT event, from a member of the proteobacteria. Our analysis also shows that S. pombe has obtained a PhzF homolog from a CTG species, most likely one closely related to D. hansenii. There is also phylogenetic evidence showing that an ancestral Ustilaginomycotina species gained a PhzF gene from an unknown bacterial source. We cannot, however, determine the biological advantage to the organisms.

Although it was not the major goal of this study, we did locate HGT from bacteria into fungal genomes outside the CTG clade, and also inter-fungal transfers. In a previous analysis of HGT in diplomonads, fifteen genes were found to have undergone HGT [18]. There is phylogenetic evidence that these genes have undergone independent transfers into other eukaryotic lineages including Fungi. Therefore, in eukaryotes, just as HGT has affected some species more than others [63], there may be groups of genes that are more likely to be taken up through HGT than others. We cannot test this directly however, as we have not


identified all cases of HGT from bacteria to fungi outside the CTG clade.

Our analysis indicates that recent interkingdom gene transfer into extant CTG species is negligible. This supports a previous hypothesis that genetic code alterations block horizontal gene transfer [64]. It should be noted however that we searched for recent bacterial gene transfers into individual CTG species, and not for more ancient transfers. We took this approach because recently gained bacterial genes in a eukaryote genome should be more readily detected than older transfers. Similarly, we have not investigated eukaryote-to-eukaryote transfers. It is therefore possible that we have underestimated the overall rate of HGT into the CTG lineage. The discovery of HGT in other fungal lineages implies that HGT plays an important role in fungal evolution and deserves further analysis. In particular a strategy which can detect ancient gene transfers would be meaningful.

Methods
Sequence data
The complete C. albicans (SC5314) genome (Assembly 19) was obtained from the Candida genome database [65]. The Broad Institute has sequenced and annotated five CTG species (C. albicans (WO-1), C. tropicalis, L. elongisporus, P. guilliermondii, and Cl. lusitaniae). These genomes were obtained directly from the Broad Institute [66]. Gene sets for C. dubliniensis were downloaded from GeneDB [44].

The incomplete C. parapsilosis genome was downloaded from the Sanger Institute [67]. Gene annotations were performed using two separate approaches. The first involved a reciprocal best BLAST [31] search with a cutoff E-value of 10^-7 of Candida albicans SC5314 protein coding genes against the unannotated C. parapsilosis genome. Top BLAST hits longer than 300 nucleotides were retained as putative open reading frames. The second approach involved a pipeline of analysis that combined several different gene prediction programs, including ab initio programs SNAP [68], Genezilla [69], and AUGUSTUS [69], with gene models from Exonerate [70] and Genewise [71] based on alignments of proteins and Expressed Sequence Tags. Putative gene sets from both approaches were imported into Artemis [72] and cross corroborated manually. The resultant gene set contained 5,823 protein-coding genes. The C. parapsilosis genome was also annotated by the Broad Institute, and where possible we have used the gene names they assigned.

The UniProt database (v11.1) was downloaded [73]. Database searches against GenBank refer to release 164.0.

Blast based approach to detect potential horizontally transferred genes
Taking one CTG species at a time, we located gene families of interest by comparing individual protein coding genes against the UniProt database (v11.1) using the BlastP algorithm [31] with a cutoff expectation (E) value of 10^-20. To use all available sequence data, CTG proteins with a top database hit to a bacterial protein in UniProt were extracted for a second round of database searching against GenBank (E value of 10^-20). Proteins which also had a top database hit to a bacterial protein in GenBank were considered as possible incidences of horizontal gene transfer. All putative homologs were extracted from GenBank and searched against the relevant CTG genome to ensure a reciprocal best Blast hit. For completeness, CTG proteins not yet deposited in GenBank were added to gene families of interest where appropriate.

Accession numbers for all sequences used in this analysis can be found in additional material [see additional file 2].

Phylogenetic methods
Gene families were aligned using MUSCLE (v3.6) [74] using the default settings. Obvious alignment ambiguities were corrected manually.

Phylogenetic relationships were inferred using maximum likelihood methods. Appropriate protein models of substitution were selected for each gene family using ModelGenerator [75]. One hundred bootstrap replicates were then carried out with the appropriate protein model using the software program PHYML [76] and summarized using the majority-rule consensus method.

We performed the approximately unbiased test of phylogenetic tree selection [53] to assess whether differences in topology between constrained and unconstrained gene trees are no greater than expected by chance.

Codon usage analysis and spectral analysis
To determine if the putative HGT genes had a different codon usage pattern to the host genome, an analysis of variation in synonymous codon usage was undertaken using the GCUA software [77]. Individual correspondence analyses of raw codon counts for the Candida parapsilosis, Ustilago maydis, Malassezia globosa, Aspergillus flavus, Aspergillus niger, Gibberella zeae, Aspergillus oryzae, Phaeosphaeria nodorum, and Schizosaccharomyces pombe genomes were performed, with the first four principal axes being used to evaluate synonymous codon usage patterns. Similar analyses were also carried out on members of the proline racemase and phenazine F gene families displayed in Figures 2 and 5. We used spectrum [78] to perform a spectral analysis on a subset of the phenazine data.
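The two-round Blast screen described under "Blast based approach to detect potential horizontally transferred genes" can be sketched as a filter over parsed hit tables. This is a hedged illustration, not the authors' pipeline: the tuple layout `(query, subject, evalue, kingdom)`, the function names and the `reverse_best` mapping (best hit of each database protein when searched back against the CTG genome) are assumptions introduced here, and the hits in the usage note are invented toy data.

```python
E_CUTOFF = 1e-20  # expectation-value cutoff stated in the Methods


def top_hits(hits, e_cutoff=E_CUTOFF):
    """Map each query to its best (lowest E-value) hit that passes the cutoff.

    `hits` is an iterable of (query, subject, evalue, kingdom) tuples.
    """
    best = {}
    for query, subject, evalue, kingdom in hits:
        if evalue > e_cutoff:
            continue  # too weak to be considered
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue, kingdom)
    return best


def candidate_hgt(uniprot_hits, genbank_hits, reverse_best):
    """Return queries whose top hit is bacterial in both search rounds and
    whose GenBank top hit finds the query again in the reverse search."""
    round1 = top_hits(uniprot_hits)
    round2 = top_hits(genbank_hits)
    candidates = []
    for query, (_, _, kingdom) in round1.items():
        if kingdom != "Bacteria":
            continue
        second = round2.get(query)
        if second is None or second[2] != "Bacteria":
            continue
        # Reciprocal best hit: the bacterial protein points back to the query.
        if reverse_best.get(second[0]) == query:
            candidates.append(query)
    return sorted(candidates)
```

In a toy run, a gene whose best UniProt and GenBank hits are both bacterial, and whose GenBank hit recovers the gene in the reverse search, is reported; genes with a fungal top hit in either round, or with hits weaker than the cutoff, are dropped. Candidates from such a screen would still need the phylogenetic and synteny checks described above before being called HGT.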


Authors' contributions
DAF, MEL and GB were involved in the design phase. MEL predicted genes in unannotated genomes. DAF sourced homologs, examined synteny and performed phylogenetic analyses. DAF and GB drafted the manuscript. All authors read and approved the final manuscript.

Additional material

Additional file 1
Examples of reported incidences of interkingdom gene transfer between prokaryotes and fungi. One Kluyveromyces lactis gene (KLLA0D19949g) previously highlighted [22] has been omitted as it is no longer recognized as an ORF. Y. lipolytica genes denoted with a * and ^ indicate possible gene duplications after HGT.
[http://www.biomedcentral.com/content/supplementary/1471-2148-8-181-S1.doc]

Additional file 2
GenBank accession numbers for PR (A) and PhzF (B) sequences used in this analysis. Species identified with an * use the accession numbers created by the Broad Institute [66] or the Wellcome Trust Sanger Institute [67].
[http://www.biomedcentral.com/content/supplementary/1471-2148-8-181-S2.doc]

Additional file 3
Gene order around Pezizomycotina proline racemase genes. Species names and identifiers are shown in each box. PR genes are labeled. Gene identifiers relate to annotations from the Broad Institute. Clade letters in parentheses correspond to those in Figure 2. There is evidence for conserved gene synteny between some species, such as A. oryzae and A. flavus (clade-C), and A. flavus/A. terreus and N. fischeri/A. fumigatus in the Metazoan/Pezizomycotina clade (B). The A. flavus gene denoted by a * is absent from the Broad gene set but we were able to locate it with a BlastX search.
[http://www.biomedcentral.com/content/supplementary/1471-2148-8-181-S3.eps]

Additional file 4
Correspondence analysis of codon usage. Correspondence analysis of codon usage in the C. parapsilosis (1), U. maydis (2), M. globosa (3), A. flavus (4), A. niger (5), G. zeae (6), A. oryzae (7), P. nodorum (8), and S. pombe (9) genomes. Transferred genes are highlighted. All have a codon usage similar to the rest of their genomes, which is unsurprising as transferred genes have been shown to ameliorate their codon usage to their hosts [79].
[http://www.biomedcentral.com/content/supplementary/1471-2148-8-181-S4.pdf]

Additional file 5
Correspondence analysis of codon usage in the proline racemase gene family analyzed in this study. Major groups are color-coded. The C. parapsilosis PR gene has a codon usage pattern distinct from other fungal species in this analysis. It is also quite distinct from the Burkholderia (β-proteobacteria) species, which were found to be its phylogenetic neighbors.
[http://www.biomedcentral.com/content/supplementary/1471-2148-8-181-S5.eps]

Additional file 6
Trees for approximately unbiased test for PhzF homologs. Tree A is the original unconstrained topology, which groups C. parapsilosis with proteobacteria. Topology B is a constrained tree that places S. pombe outside the Saccharomycotina clade. Topologies C-H are constrained and place C. parapsilosis amongst the other Saccharomycotina species. Log likelihood scores for each tree are given. To assess whether any differences in topology between the inferred trees are no more significant than expected by chance, we performed the approximately unbiased test. The AU test shows that the unconstrained tree receives the optimal likelihood tree score. Furthermore, the differences in likelihood scores when compared to the constrained trees are significant (P < 0.05). Therefore, based on these results, the placement of the C. parapsilosis homolog in the proteobacterial clade to the exclusion of the Saccharomycotina, and of S. pombe within the Saccharomycotina clade, is significant.
[http://www.biomedcentral.com/content/supplementary/1471-2148-8-181-S6.eps]

Additional file 7
PhzF spectral analysis. Analysis was performed on the Saccharomycotina and selected proteobacterial clade. Bars above the x-axis represent frequency of support for each split. Bars below the x-axis represent the sum of all corresponding conflicts. Cladograms above columns represent the corresponding splits in the data. There is no support for the placement of C. parapsilosis with the other Saccharomycotina species.
[http://www.biomedcentral.com/content/supplementary/1471-2148-8-181-S7.eps]

Additional file 8
Correspondence analysis of codon usage in the PhzF gene family analyzed in this study. Major groups are color-coded. The C. parapsilosis PhzF gene has a codon usage pattern similar to other CTG species analyzed. It is quite distinct from the proteobacterial species that were found to be its phylogenetic neighbors.
[http://www.biomedcentral.com/content/supplementary/1471-2148-8-181-S8.eps]

Acknowledgements
The authors wish to acknowledge the Wellcome Trust Sanger Institute and Broad Institute of MIT & Harvard for releasing data ahead of publication. We would like to acknowledge the financial support of the Irish Research Council for Science, Engineering and Technology (IRCSET), the Irish Health Research Board (HRB) and Science Foundation Ireland (SFI). We wish to acknowledge the SFI/HEA Irish Centre for High-End Computing (ICHEC) for the provision of computational facilities and support. We thank Mike Lorenz and Paul Dyer for fungal strains. We also thank Jason Stajich for help with gene annotations.


References
1. Doolittle WF: Lateral genomics. Trends Cell Biol 1999, 9(12):M5-8.
2. Jain R, Rivera MC, Moore JE, Lake JA: Horizontal gene transfer accelerates genome innovation and evolution. Mol Biol Evol 2003, 20(10):1598-1602.
3. Eisen JA: Assessing evolutionary relationships among microbes from whole-genome analysis. Curr Opin Microbiol 2000, 3(5):475-480.
4. Woo PC, To AP, Lau SK, Yuen KY: Facilitation of horizontal transfer of antimicrobial resistance by transformation of antibiotic-induced cell-wall-deficient bacteria. Med Hypotheses 2003, 61(4):503-508.
5. Martin K, Morlin G, Smith A, Nordyke A, Eisenstark A, Golomb M: The tryptophanase gene cluster of Haemophilus influenzae type b: evidence for horizontal gene transfer. J Bacteriol 1998, 180(1):107-118.
6. Kurland CG, Canback B, Berg OG: Horizontal gene transfer: a critical view. Proc Natl Acad Sci U S A 2003, 100(17):9658-9662.
7. Andersson JO: Lateral gene transfer in eukaryotes. Cell Mol Life Sci 2005, 62(11):1182-1197.
8. Dujon B: Hemiascomycetous yeasts at the forefront of comparative genomics. Curr Opin Genet Dev 2005, 15(6):614-620.
9. Friesen TL, Stukenbrock EH, Liu Z, Meinhardt S, Ling H, Faris JD, Rasmussen JB, Solomon PS, McDonald BA, Oliver RP: Emergence of a new disease as a result of interspecific virulence gene transfer. Nat Genet 2006, 38(8):953-956.
10. Inderbitzin P, Harkness J, Turgeon BG, Berbee ML: Lateral transfer of mating system in Stemphylium. Proc Natl Acad Sci U S A 2005, 102(32):11390-11395.
11. Kavanaugh LA, Fraser JA, Dietrich FS: Recent evolution of the human pathogen Cryptococcus neoformans by intervarietal transfer of a 14-gene fragment. Mol Biol Evol 2006, 23(10):1879-1890.
12. Khaldi N, Collemare J, Lebrun MH, Wolfe KH: Evidence for horizontal transfer of a secondary metabolite gene cluster between fungi. Genome Biol 2008, 9(1):R18.
13. Paoletti M, Buck KW, Brasier CM: Selective acquisition of novel mating type and vegetative incompatibility genes via interspecies gene transfer in the globally invading eukaryote Ophiostoma novo-ulmi. Mol Ecol 2006, 15(1):249-262.
14. Slot JC, Hallstrom KN, Matheny PB, Hibbett DS: Diversification of NRT2 and the origin of its fungal homolog. Mol Biol Evol 2007, 24(8):1731-1743.
15. Slot JC, Hibbett DS: Horizontal transfer of a nitrate assimilation gene cluster and ecological transitions in fungi: a phylogenetic study. PLoS ONE 2007, 2(10):e1097.
16. Waller RF, Slamovits CH, Keeling PJ: Lateral gene transfer of a multigene region from cyanobacteria to dinoflagellates resulting in a novel plastid-targeted fusion protein. Mol Biol Evol 2006, 23(7):1437-1443.
17. Wei W, McCusker JH, Hyman RW, Jones T, Ning Y, Cao Z, Gu Z, Bruno D, Miranda M, Nguyen M, Wilhelmy J, Komp C, Tamse R, Wang X, Jia P, Luedi P, Oefner PJ, David L, Dietrich FS, Li Y, Davis RW, Steinmetz LM: Genome sequencing and comparative analysis of Saccharomyces cerevisiae strain YJM789. Proc Natl Acad Sci U S A 2007, 104(31):12825-12830.
18. Andersson JO, Sjogren AM, Davis LA, Embley TM, Roger AJ: Phylogenetic analyses of diplomonad genes reveal frequent lateral gene transfers affecting eukaryotes. Curr Biol 2003, 13(2):94-104.
19. Hall C, Brachat S, Dietrich FS: Contribution of horizontal gene transfer to the evolution of Saccharomyces cerevisiae. Eukaryot Cell 2005, 4(6):1102-1115.
20. Hall C, Dietrich FS: The Reacquisition of Biotin Prototrophy in Saccharomyces cerevisiae Involved Horizontal Gene Transfer, Gene Duplication and Gene Clustering. Genetics 2007, 177(4):2293-2307.
21. Woolfit M, Rozpedowska E, Piskur J, Wolfe KH: Genome survey sequencing of the wine spoilage yeast Dekkera (Brettanomyces) bruxellensis. Eukaryot Cell 2007, 6(4):721-733.
22. Dujon B, Sherman D, Fischer G, Durrens P, Casaregola S, Lafontaine I, De Montigny J, Marck C, Neuveglise C, Talla E, Goffard N, Frangeul L, Aigle M, Anthouard V, Babour A, Barbe V, Barnay S, Blanchin S, Beckerich JM, Beyne E, Bleykasten C, Boisrame A, Boyer J, Cattolico L, Confanioleri F, De Daruvar A, Despons L, Fabre E, Fairhead C, Ferry-Dumazet H, Groppi A, Hantraye F, Hennequin C, Jauniaux N, Joyet P, Kachouri R, Kerrest A, Koszul R, Lemaire M, Lesur I, Ma L, Muller H, Nicaud JM, Nikolski M, Oztas S, Ozier-Kalogeropoulos O, Pellenz S, Potier S, Richard GF, Straub ML, Suleau A, Swennen D, Tekaia F, Wesolowski-Louvel M, Westhof E, Wirth B, Zeniou-Meyer M, Zivanovic I, Bolotin-Fukuhara M, Thierry A, Bouchier C, Caudron B, Scarpelli C, Gaillardin C, Weissenbach J, Wincker P, Souciet JL: Genome evolution in yeasts. Nature 2004, 430(6995):35-44.
23. Gojkovic Z, Knecht W, Zameitat E, Warneboldt J, Coutelis JB, Pynyaha Y, Neuveglise C, Moller K, Loffler M, Piskur J: Horizontal gene transfer promoted evolution of the ability to propagate under anaerobic conditions in yeasts. Mol Genet Genomics 2004, 271(4):387-393.
24. Brinkman FS, Macfarlane EL, Warrener P, Hancock RE: Evolutionary relationships among virulence-associated histidine kinases. Infect Immun 2001, 69(8):5207-5211.
25. Temporini ED, VanEtten HD: An analysis of the phylogenetic distribution of the pea pathogenicity genes of Nectria haematococca MPVI supports the hypothesis of their origin by horizontal transfer and uncovers a potentially new pathogen of garden pea: Neocosmospora boniensis. Curr Genet 2004, 46(1):29-36.
26. Wenzl P, Wong L, Kwang-won K, Jefferson RA: A functional screen identifies lateral transfer of beta-glucuronidase (gus) from bacteria to fungi. Mol Biol Evol 2005, 22(2):308-316.
27. Garcia-Vallve S, Romeu A, Palau J: Horizontal gene transfer of glycosyl hydrolases of the rumen fungi. Mol Biol Evol 2000, 17(3):352-361.
28. Klotz MG, Klassen GR, Loewen PC: Phylogenetic relationships among prokaryotic and eukaryotic catalases. Mol Biol Evol 1997, 14(9):951-958.
29. Fitzpatrick DA, Logue ME, Stajich JE, Butler G: A Fungal phylogeny based on 42 complete genomes derived from supertree and combined gene analysis. BMC Evol Biol 2006, 6:99.
30. Sugita T, Nakase T: Non-universal usage of the leucine CUG codon and the molecular phylogeny of the genus Candida. Syst Appl Microbiol 1999, 22(1):79-86.
31. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25(17):3389-3402.
32. Cardinale GJ, Abeles RH: Purification and mechanism of action of proline racemase. Biochemistry 1968, 7(11):3970-3978.
33. Logue ME, Wong S, Wolfe KH, Butler G: A genome sequence survey shows that the pathogenic yeast Candida parapsilosis has a defective MTLa1 allele at its mating type locus. Eukaryot Cell 2005, 4(6):1009-1017.
34. Lartillot N, Brinkmann H, Philippe H: Suppression of long-branch attraction artefacts in the animal phylogeny using a site-heterogeneous model. BMC Evol Biol 2007, 7 Suppl 1:S4.
35. Langley R, Kenna DT, Vandamme P, Ure R, Govan JR: Lysogeny and bacteriophage host range within the Burkholderia cepacia complex. J Med Microbiol 2003, 52(Pt 6):483-490.
36. Eberl L, Tummler B: Pseudomonas aeruginosa and Burkholderia cepacia in cystic fibrosis: genome evolution, interactions and adaptation. Int J Med Microbiol 2004, 294(2-3):123-131.
37. Tuanyok A, Auerbach RK, Brettin TS, Bruce DC, Munk AC, Detter JC, Pearson T, Hornstra H, Sermswan RW, Wuthiekanun V, Peacock SJ, Currie BJ, Keim P, Wagner DM: A horizontal gene transfer event defines two distinct groups within Burkholderia pseudomallei that have dissimilar geographic distributions. J Bacteriol 2007.
38. Buschiazzo A, Goytia M, Schaeffer F, Degrave W, Shepard W, Gregoire C, Chamond N, Cosson A, Berneman A, Coatnoan N, Alzari PM, Minoprio P: Crystal structure, catalytic mechanism, and mitogenic properties of Trypanosoma cruzi proline racemase. Proc Natl Acad Sci U S A 2006, 103(6):1705-1710.
39. Lamzin VS, Dauter Z, Wilson KS: How nature deals with stereoisomers. Curr Opin Struct Biol 1995, 5(6):830-836.
40. Fisher GH: Appearance of D-amino acids during aging: D-amino acids in tumor proteins. Exs 1998, 85:109-118.
41. Chamond N, Gregoire C, Coatnoan N, Rougeot C, Freitas-Junior LH, da Silveira JF, Degrave WM, Minoprio P: Biochemical characterization of proline racemases from the human protozoan parasite Trypanosoma cruzi and definition of putative protein signatures. J Biol Chem 2003, 278(18):15484-15494.


42. Wolosker H, Blackshaw S, Snyder SH: Serine racemase: a glial enzyme synthesizing D-serine to regulate glutamate-N-methyl-D-aspartate neurotransmission. Proc Natl Acad Sci U S A 1999, 96(23):13409-13414.
43. Reina-San-Martin B, Degrave W, Rougeot C, Cosson A, Chamond N, Cordeiro-Da-Silva A, Arala-Chaves M, Coutinho A, Minoprio P: A B-cell mitogen from a pathogenic trypanosome is a eukaryotic proline racemase. Nat Med 2000, 6(8):890-897.
44. GeneDB [http://www.genedb.org]
45. Stevens JR, Gibson WC: The evolution of pathogenic trypanosomes. Cad Saude Publica 1999, 15(4):673-684.
46. Fischer G, James SA, Roberts IN, Oliver SG, Louis EJ: Chromosomal evolution in Saccharomyces. Nature 2000, 405(6785):451-454.
47. Pfam [http://pfam.sanger.ac.uk]
48. Medigue C, Rouxel T, Vigier P, Henaut A, Danchin A: Evidence for horizontal gene transfer in Escherichia coli speciation. J Mol Biol 1991, 222(4):851-856.
49. Ochman H, Lawrence JG, Groisman EA: Lateral gene transfer and the nature of bacterial innovation. Nature 2000, 405(6784):299-304.
50. Rudnick G, Abeles RH: Reaction mechanism and structure of the active site of proline racemase. Biochemistry 1975, 14(20):4515-4522.
51. Blankenfeldt W, Kuzin AP, Skarina T, Korniyenko Y, Tong L, Bayer P, Janning P, Thomashow LS, Mavrodi DV: Structure and function of the phenazine biosynthetic protein PhzF from Pseudomonas fluorescens. Proc Natl Acad Sci U S A 2004, 101(47):16431-16436.
52. Parsons JF, Song F, Parsons L, Calabrese K, Eisenstein E, Ladner JE: Structure and function of the phenazine biosynthesis protein PhzF from Pseudomonas fluorescens 2-79. Biochemistry 2004, 43(39):12427-12435.
53. Shimodaira H: An approximately unbiased test of phylogenetic tree selection. Syst Biol 2002, 51(3):492-508.
54. The Schizosaccharomyces group at the Broad Institute [http://www.broad.mit.edu/annotation/genome/schizosaccharomyces_group/MultiHome.html]
55. Bullerwell CE, Leigh J, Forget L, Lang BF: A comparison of three fission yeast mitochondrial genomes. Nucleic Acids Res 2003, 31(2):759-768.
56. Lin D, Wu LC, Rinaldi MG, Lehmann PF: Three distinct genotypes within Candida parapsilosis from clinical sources. J Clin Microbiol 1995, 33(7):1815-1821.
57. Heinemann JA, Sprague GF Jr.: Bacterial conjugative plasmids mobilize DNA transfer between bacteria and yeast. Nature 1989, 340(6230):205-209.
58. Inomata K, Nishikawa M, Yoshida K: The yeast Saccharomyces kluyveri as a recipient eukaryote in transkingdom conjugation: behavior of transmitted plasmids in transconjugants. J Bacteriol 1994, 176(15):4770-4773.
59. Sawasaki Y, Inomata K, Yoshida K: Trans-kingdom conjugation between Agrobacterium tumefaciens and Saccharomyces cerevisiae, a bacterium and a yeast. Plant Cell Physiol 1996, 37(1):103-106.
60. Nevoigt E, Fassbender A, Stahl U: Cells of the yeast Saccharomyces cerevisiae are transformable by DNA under non-artificial conditions. Yeast 2000, 16(12):1107-1110.
61. Hogan DA, Kolter R: Pseudomonas-Candida interactions: an ecological role for virulence factors. Science 2002, 296(5576):2229-2232.
62. Liger D, Quevillon-Cheruel S, Sorel I, Bremang M, Blondeau K, Aboulfath I, Janin J, van Tilbeurgh H, Leulliot N: Crystal structure of YHI9, the yeast member of the phenazine biosynthesis PhzF enzyme superfamily. Proteins 2005, 60(4):778-786.
63. Keeling PJ, Burger G, Durnford DG, Lang BF, Lee RW, Pearlman RE, Roger AJ, Gray MW: The tree of eukaryotes. Trends Ecol Evol 2005, 20(12):670-676.
64. Silva RM, Paredes JA, Moura GR, Manadas B, Lima-Costa T, Rocha R, Miranda I, Gomes AC, Koerkamp MJ, Perrot M, Holstege FC, Boucherie H, Santos MA: Critical roles for a genetic code alteration in the evolution of the genus Candida. Embo J 2007, 26(21):4555-4565.
65. The Candida Genome Database [http://www.candidagenome.org]
66. The Candida group at the Broad Institute [http://www.broad.mit.edu/annotation/genome/candida_group/MultiHome.html]
67. The Wellcome Trust Sanger Institute [http://www.sanger.ac.uk/]
68. Korf I: Gene finding in novel genomes. BMC Bioinformatics 2004, 5:59.
69. Majoros WH, Pertea M, Salzberg SL: TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics 2004, 20(16):2878-2879.
70. Slater GS, Birney E: Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 2005, 6(1):31.
71. Birney E, Clamp M, Durbin R: GeneWise and Genomewise. Genome Res 2004, 14(5):988-995.
72. Rutherford K, Parkhill J, Crook J, Horsnell T, Rice P, Rajandream MA, Barrell B: Artemis: sequence visualization and annotation. Bioinformatics 2000, 16(10):944-945.
73. The UniProt database [ftp://ftp.uniprot.org/pub/databases/]
74. Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 2004, 32(5):1792-1797.
75. Keane TM, Creevey CJ, Pentony MM, Naughton TJ, McInerney JO: Assessment of methods for amino acid matrix selection and their use on empirical data shows that ad hoc assumptions for choice of matrix are not justified. BMC Evol Biol 2006, 6:29.
76. Guindon S, Gascuel O: A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol 2003, 52(5):696-704.
77. McInerney JO: GCUA: general codon usage analysis. Bioinformatics 1998, 14(4):372-373.
78. Charleston MA: Spectrum: spectral analysis of phylogenetic data. Bioinformatics 1998, 14(1):98-99.
79. Lawrence JG, Ochman H: Amelioration of bacterial genomes: rates of change and exchange. J Mol Evol 1997, 44(4):383-397.
