<<

Kollmar et al. BMC Research Notes 2012, 5:88 http://www.biomedcentral.com/1756-0500/5/88

RESEARCH ARTICLE Open Access Evolution of the eukaryotic ARP2/3 activators of the WASP family: WASP, WAVE, WASH, and WHAMM, and the proposed new family members WAWH and WAML Martin Kollmar*, Dawid Lbik and Stefanie Enge

Abstract Background: WASP family proteins stimulate the actin-nucleating activity of the ARP2/3 complex. They include members of the well-known WASP and WAVE/Scar proteins, and the recently identified WASH and WHAMM proteins. WASP family proteins contain family specific N-terminal domains followed by proline-rich regions and C- terminal VCA domains that harbour the ARP2/3-activating regions. Results: To reveal the evolution of ARP2/3 activation by WASP family proteins we performed a “holistic” analysis by manually assembling and annotating all homologs in most of the eukaryotic genomes available. We have identified two new families: the WAML proteins (WASP and MIM like), which combine the membrane-deforming and actin bundling functions of the IMD domains with the ARP2/3-activating VCA regions, and the WAWH protein (WASP without WH1 domain) that have been identified in amoebae, Apusozoa, and the anole lizard. Surprisingly, with one exception we did not identify any alternative splice forms for WASP family proteins, which is in strong contrast to other actin-binding proteins like Ena/VASP, MIM, or NHS proteins that share domains with WASP proteins. Conclusions: Our analysis showed that the last common ancestor of the eukaryotes must have contained a homolog of WASP, WAVE, and WASH. Specific families have subsequently been lost in many taxa like the WASPs in plants, algae, Stramenopiles, and Euglenozoa, and the WASH proteins in fungi. The WHAMM proteins are metazoa specific and have most probably been invented by the Eumetazoa. The diversity of WASP family proteins has strongly been increased by many species- and taxon-specific gene duplications and multimerisations. All data is freely accessible via http://www.cymobase.org. Keywords: ARP2/3 activation, WASP family proteins, WASP, WAVE, WASH, WHAMM

Background The ARP2/3 complex consists of two actin-related pro- Eukaryotes mediate many cellular processes, including teins, Arp2 and Arp3, and five additional subunits [5,8]. phagocytosis, intracellular trafficking, cell division, and The ARP2/3 complex alone is very inefficient in nucle- cell locomotion, by remodelling of the actin cytoskeleton ating actin filaments but becomes activated when bind- [1-4]. In many cases, signals from Rho family G-proteins ing to the sites of existing actin filaments. In addition, it activate proteins of the WASP family (WASP, Wiskott- needs to bind to so-called nucleation promoting factors Aldrich syndrome protein), which stimulate the actin (NPFs). NPFs of the WASP family belong to class I filament branching activity of the ARP2/3 complex [5] NPFs and have characteristic VCA domains at their C- resulting in specialised networks of actin filaments [6,7]. termini consisting of one or more WH2 domains (V; WASP homology 2 domain, also known as verprolin * Correspondence: [email protected] homology domain), a central domain (C; also known as Abteilung NMR basierte Strukturbiologie, Max-Planck-Institut für cofilin homology or connector domain), and an acidic Biophysikalische Chemie, Am Fassberg 11, D-37077 Göttingen, Germany

© 2012 Kollmar et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Kollmar et al. BMC Research Notes 2012, 5:88 Page 2 of 23 http://www.biomedcentral.com/1756-0500/5/88

(A) C-terminal domain. While the WH2 domains bind with actin, membranes, and microtubules) [20], and G-actin the CA domains bind to and activate the ARP2/ JMY [21], which is a WHAMM homolog [22]. Still, 3 complex. there is only limited knowledge about their structure, The founding member of the WASP family was the sequence homology, and phylogenetic distribution. Early WASP protein, which is deficient in Wiskott-Aldrich analyses have revealed that WHAMM might be syndrome. The Wiskott-Aldrich syndrome is an X- restricted to [20], while WASH seems to be linked human genetic disorder that leads to a variety of more widely distributed [19,22]. Like WAVE, WASH defects in the immune system, including thrombocyto- functions in a heteropentameric complex containing penia, compromised immune function, and susceptibility Strumpelin, FAM21, SWIP, and CCDC53, which show to leukemias and lymphomas [9-11]. In a genetic screen some sequence homology to the corresponding proteins for second-site suppressors of a mutation in one of the of the WAVE complex [23]. cAMP receptors, another protein that showed significant Very recently, the genomes of some representative homology to WASP, called Scar, was identified in Dic- eukaryotes have been analysed with respect to their tyostelium discoideum [12]. At the same time the so- WASP family protein content [22]. However, the analy- called WAVE-1 protein (WASP-family verprolin-homo- sis of motor proteins in sequenced insects [24] has logous protein-1) was identified in a database search for shown that it is essential not only to analyse the gen- proteins with sequence homology to the VCA region omes of representative members of the main eukaryotic [13]. The name Scar has subsequently been used for the taxa but to also analyse the genomes of very closely Dictyostelium and the plant homologs, while WAVE has related species, because not even those might contain been used for all metazoan family members. the same set of homologs. In addition, it is essential to WASP proteins are multi-domain proteins consisting search directly in the respective genome sequences and of an N-terminal WH1 domain, a basic domain contain- to manually reconstruct potential homologs because ing many lysine and arginine residues, a GTPase binding datasets of predicted proteins are far from being com- domain (GBD) consisting of a CRIB motif (for Cdc42/ plete. In those automatically generated datasets many Rac interactive binding) followed by an autoinhibitory proteins are missing at all, and most of the predicted domain, a poly-proline region, and the VCA domain. proteins contain mispredicted intronic sequence and The WASP protein by itself is inactive through intramo- miss exons thus hindering the identification of all lecular binding of the C domain to the autoinhibitory homologs. Especially in the case of the WASP family domain [14]. Binding of G-proteins to the GBD domain proteins, that contain many very short domains and releasing the intramolecular link activates the WASP sequence motifs, mispredicted exons easily result in proteins. A comparison of proteins of the WASP and missing domains and thus misleading hypotheses about the WAVE family shows that WAVE shares many of the potential physiological functions. Here we performed a WASP domains, including the polyproline rich region search for WASP family proteins in the available gen- and the VCA domain. Unlike WASP, WAVE family ome assemblies of more than 400 eukaryotes. All members do not contain a GBD domain and they have sequences were identified at the genomic DNA level a novel protein region instead of the WH1 domain, and the genes reconstructed manually. termed WAVE homology domain (WHD). WASP forms a constitutive complex with WIP (WASP interacting Results protein), which binds to the WH1 domain and has been Identification and assembly of WASP family proteins shown to stabilize WASP [15]. Three WIP homologs The WASP family proteins contain many but very short have been identified in the human genome [16], and all sequence motifs and domains. These domains are not seem to be able to bind to the two WASP homologs. In only characteristic for WASP proteins but are also part contrast, WAVE proteins form a 400 kDa heteropenta- of proteins from other families like the WHD domain, meric assembly called the WAVE regulatory complex that is shared by the Nance-Horan syndrome protein that consists of WAVE, a protein from the CYFIP family (NHS), the WH1 domain, that is shared by the Ena/ (e.g. SRA1), NAP1, ABI, and HSPC300 [17,18]. The VASP protein family, the WH2 domain, that is con- three mammalian WAVE homologs seem to participate tained for example in Spire, Cobl, missing-in-metastasis in the same protein complex and thus differences (MIM), and WIP, and the CA domains, that are shared between them are likely to be at the level of tissue dis- for example by certain coronins and class-1 myosins. tribution [17]. Given such short sequence motifs a few missing residues In the last years, several further proteins have been could prevent their identification and lead to a wrong identified that also activate the ARP2/3 complex like classification or even removal from the dataset. Vice WASH (Wiskott-Aldrich syndrome protein and Scar versa, the identification of just one or two of the homolog) [19], WHAMM (WASP homolog associated domains contained in one of the WASP family proteins Kollmar et al. BMC Research Notes 2012, 5:88 Page 3 of 23 http://www.biomedcentral.com/1756-0500/5/88

and subsequent thorough analysis of the genomic region Classification and analysis of the WASP protein family could lead to the identification of so far unrecognized or The WASP proteins are characterised by an N-terminal very divergent members of the WASP family. While this WH1 domain followed by a basic domain (containing subfamily categorization is mainly based on domain many lysins and arginines), a CRIB motif, a domain that organisation patterns, the classification of the duplicated we suggest to name WAID domain (WASP autoinhibi- WASP family genes into subgroups is based on the phy- tory domain), a unique region consisting of many pro- logenetic analysis of the corresponding protein line residues (PPR, poly-proline region) and a VCA sequences. Thus, it is of major importance to obtain the domain (Figure 1). In most of the previous descriptions best sequence data possible and to create the most accu- of WASP proteins, the CRIB motif and the WAID rate sequence alignment. We had to manually assemble domain have been described together as the GBD and annotate all WASP protein family sequences used domain (GTPase binding domain). Here, we suggest in this study mainly for two reasons. A) Automatic gene regarding the CRIB motif and the WAID domain as dis- predictions are error-prone and even those gene predic- tinct domains because they are structurally separated tions are only available for a small subset of all [14], they perform different functions, and they are dif- sequenced eukaryotic genomes [25]. B) Because all ferentially included into WASP and WAWH proteins. WASP family genes encode long unstructured and low- For example, while the fungi of the Chytridiomycota, complexity regions (for example the poly-proline the Blastocladiomycota, the Mucoromycotina, and the regions), and the acidic C-termini (the A domain, also Basidiomycota branches contain both the CRIB and the of low complexity) are often encoded by a single exon, WAID domain, the ascomycotes, the Schizosaccharomy- the automatic gene predictions contain many mispre- cota, and some of the yeast species only encode the dicted exons/introns and unrecognised N-terminal/C- WAID domain (Figure 1). Most yeast species sequenced terminal parts. Most likely due to the long low-complex- so far lost both the CRIB and the WAID domain (Figure ity regions, only a few full-length mRNA/cDNA 1). sequences for WASP family proteins are available. If Several species encode multiple WASP homologs. To partial cDNA/EST clones were found then those reveal subfamily relationships fragmented sequences and encoded in most cases the N-terminal WH1, WHD, the unique regions were omitted from the multiple WAHD, or WMD domains. The eukaryotic genomes sequence alignment that was used to generate the phy- were analysed in an iterative search process, meaning logenetic tree (Supplementary Information). The tree that those genomes, for which the WASP family pro- shows that there have been many species-specific WASP teins could not unambiguously assembled in the first duplications (e.g. in Trichomonas vaginalis, Dictyoste- instance, were reanalysed as soon as further data was lium fasciculatum, Trichoplax adhaerens, Saccoglossus added to the multiple sequence alignments. In this way kowalevski) and multiplications (e.g. in Naegleria gru- the completeness of the search for WASP family pro- beri, Acytostelium subglobosum, Acanthamoeba castella- teins and the accuracy of the gene assembly and annota- nii) but also two clade-specific duplications, one of tion have continuously been re-evaluated and improved. which happened in the ancestor of the Mucoromycotina, In addition to manually assembling all sequences, the and one at the onset of the vertebrates resulting in multiple sequence alignments of the WASP sequences WASP and N-WASP. The duplication and subsequent have been created and were maintained and improved subfunctionalization of the WASP homologs at the ori- manually (Additional file 1). gin of the vertebrates is most probably the result of the The WASP dataset contains 1021 sequences from 408 two whole genome duplications (2R) that happened organisms (Table 1). 848 sequences are complete and an shortly after the emergence of the vertebrates [27]. additional 58 sequences are partially complete. WASP (but not N-WASP) has been lost in birds, and Sequences, for which a small part is missing (up to 5%), duplicates of both WASP and N-WASP are found in were termed “Partials” while sequences, for which a con- fish genomes. The additional duplications in fish gen- siderable part is missing, were termed “Fragments”. This omes are probably the result of another whole genome difference has been introduced because “Partials” are duplication that happened at the origin of the Actinop- not expected to considerably influence the phylogenetic terygii branch [28]. Some of the fish encode a further analysis. WASP family genes were termed pseudogenes WASP homolog that is much shorter than the other if they contain more disablements like frame shifts and WASP types and misses the basic, the central and the miss more conserved sequence regions than can be acidic domain (Figure 1, Orn S-WASP). Therefore, we attributed to sequencing or assembly errors. The differ- suggest naming these WASP homologs S-WASP (short ent WASP subfamilies are distinguished by their domain WASP). Similarly, some of the sequenced Basidiomy- organisations outside the common proline-rich and C- cotes encode one or two further WASP homologs that terminal VCA domains. donotencodeapoly-prolinerichregionandaVCA Kollmar et al. BMC Research Notes 2012, 5:88 Page 4 of 23 http://www.biomedcentral.com/1756-0500/5/88

Table 1 Data statistics WASP WAVE WASH WHAMM WAWH WAML Sequence Total 400 344 165 76 19 17 From WGS 375 332 148 68 17 17 Domains 7 3 2 5 2 Amino acids 207219 251639 78326 58689 14066 8614 Total pseudogenes 3 19 4 Pseudogenes without sequence 1 2 1 Completeness Complete 316 312 138 47 18 17 Partials 33 9 9 7 Fragments 50 21 17 22 1 Species Total 316 169 158 42 11 3 WGS-projects 275 155 139 34 9 3 EST-projects 87 69 59 17 4 1 WGS- and EST-projects 128 87 81 23 6 2

502 WH1 PPR Hs WASP, 53 kDa

505 WH1 PPR Hs N-WASP, 55 kDa

681 WH1 PPR PPR PPR Tv WASP_A, 69 kDa

365 WH1 PPR Orn S-WASP, 40 kDa

0 200 400 600 800 1000 aa loss of WAID domain

633 Saccharomycetaceae WH1 PPR Sc WASP, 68 kDa Acidic region Basic region 511 WH2 domain Wickerhamomycetaceae WH1 PPR Wa WASP, 55 kDa Central region 642 WAID domain Debaryomycetaceae WH1 PPR Deh WASP, 68 kDa CRIB motif 629 loss of CRIB motiv Yarrowia lipolytica WH1 PPR Yl WASP, 64 kDa

640 Pezizomycotina WH1 PPR An WASP, 66 kDa

574 Schizosaccharomycetaceae WH1 PPR Sp WASP, 60 kDa

460 Basidiomycota WH1 PPR Um WASP, 47 kDa Figure 1 Domain organisation of representative WASP proteins. A colour key to the domain names and symbols is given on the right except for the WH1 domain and the poly-proline region (PPR) that are coloured in dark green and yellow, respectively. The abbreviations for the domains are: WH2, WASP homology 2 domain; WAID, WASP autoinhibitory domain. The phylogenetic tree on the left shows the schematic branching of some major taxa of the Dikarya fungi and, as single representative, Yarrowia lipolytica. For each branch, the domain organisation of one representative WASP homolog is shown on the right. The schematic tree has been created based on the most comprehensive phylogenetic analysis of yeast species [26]. Species abbreviations: Hs = Homo sapiens, Tv = Trichomonas vaginalis, Orn = Oreochromis niloticus, Sc = Saccharomyces cerevisiae, Wa = Wickerhamomyces anomalus, Deh = Debaryomyces hansenii, Yl = Yarrowia lipolytica, An = Aspergillus niger, Sp = Schizosaccharomyces pombe, Um = Ustilago maydis. Kollmar et al. BMC Research Notes 2012, 5:88 Page 5 of 23 http://www.biomedcentral.com/1756-0500/5/88

domain. Although these short WASPs in fish and Basi- Classification and analysis of the WAVE protein family diomycotes do not encode CA and VCA domains, In general, WAVE protein homologs consist of an N- respectively, that are the basis of ARP2/3-complex acti- terminal WHD domain followed by a basic region, a vation and therefore a requirement of WASP protein poly-proline region, and a C-terminal VCA domain (Fig- family membership, we would suggest to also group ure 2). A few of the identified WAVE homologs are these proteins to the WASP family because of the strong notable exceptions: 1) WAVE from the stramenopiles sequence homology to the WH1 and CRIB/WAID Ectocarpus siliculosus encodes two WH2 (V) domains domains of the long WASPs. Thus, they were most (Figure 2). 2) Three of the eleven WAVE homologs of likely derived from WASP duplications followed by Trichomonas vaginalis encode short coiled-coils instead domain loss events. Trichomonas vaginalis encodes two of the basic regions and also two WH2 domains. 3) The special WASP homologs that have two (TvWASP_B) plant WAVE homologs do not contain poly-proline and three (TvWASP_A) tandem repeats of the poly-pro- regions. Here, we could identify for the first time some line rich region and the VC domain but that miss the plant-specific sequence motifs in the region between the C-terminal acidic A domain (Figure 1). In comparison basic and the VCA domain that we suggest to name to a previous analysis [22] we were able to identify more WAM 1-4 (WAVE motif, Figure 3). Plant WAVE motif WASP homologs even in species, which had been ana- 1 (WAM1) is common to all plant WAVE homologs lysed before, and to complete sequences for which except the class-3 plant WAVE’s, the Arabidopsis domains had been missing before. First, we also identi- WAVE2B homologs, and the Brassica rapa class-2 fied WASP homologs in Trichomonas vaginalis and the WAVE. Thus, it is the most ancient of the plant WAVE Haptophyte species Emiliania huxleyi. However, we motifs. The WAVE homologs that do not contain the found only one WASP homolog in Caenorhabditis ele- WAM1 motif anymore seem to have lost it class- and gans and all other nematode species. Second, there are species-specific. The WAM2 motif is characterised by a invertebrate WASP homologs that do contain only one number of proline residues separated by leucine residues WH2 domain, like all Acoelomata WASPs sequenced so and followed by a glutamine-tryptophane-arginine tri- far, the Oikopleura dioica WASP, and one of plett (Figure 3). This motif is present in all plant WAVE the Trichoplax adhaerens homologs, TiaWASP_B. Our homologs except the Physcomitrella patens WAVE and extensive analysis also allowed us to identify species out- Arabidopsis WAVE1C homologs. The motif has there- side the metazoa that do encode WASP homologs con- fore been introduced in WAVE proteins after the taining two WH2 domains like Sphaeroforma arctica, separation of the moss species. Two further motifs were which belongs to the Fungi/Metazoa incertae sedis found in all class-1 WAVE homologs and are class-1 branch. specific. WAM3 is a highly conserved motif while the

0 200 400 600 800 1000 1200 1400 1600 1800 2000 aa

559 WHD PPR Hs WAVE1, 62 kDa Acidic region WAM1

Basic region WAM2 498 WH2 domain WAM3 WHD PPR Hs WAVE2, 54 kDa Central region WAM4

502 Coiled coil WHD PPR Hs WAVE3, 55 kDa

634 WHD proline-serine rich Rn WAVE5A, 71 kDa

682 WHD PPR Ecs WAVE, 71 kDa

423 WHD PPR Tv WAVE_D, 55 kDa

821 WHD At WAVE1A, 92 kDa

1020 2028 WHD At WAVE1B, 113 kDa

At WAVE1C, 222 kDa

WHD

1399 WHD At WAVE2A, 152 kDa

1170 WHD At WAVE2B, 129 kDa

Figure 2 Domain organisation of representative WAVE proteins. A colour key to the domain names and symbols is given on the right except for the WHD domain (WAVE homology domain) and the poly-proline region (PPR) that are coloured in light green and yellow, respectively. Species abbreviations: Hs = Homo sapiens, Ecs = Ectocarpus siliculosus, Tv = Trichomonas vaginalis, At = Arabidopsis thaliana. Kollmar et al. BMC Research Notes 2012, 5:88 Page 6 of 23 http://www.biomedcentral.com/1756-0500/5/88

Plant WAVE motif 1 (WAM1) 4 3 2 bits M M D Y N T E Q SH E 1 G D I N S TK E V SD NQS L D CR GR VT P D N H HE V N Y P R L YN D T E L LT ED E A SL I DR E T H N I S M I A S I M E A V E D L S A S S SK S P E V S E P P YR S G K L C T P F K D P Q D E N D I R L G E CE F N ET L S G G L A I A M D T G G S I G AK G S P E VM T R L D G H N G R I S DA I N I M H ED T R D R Q V E F A V 0 TG Y V 2 1 3 4 5 6 7 8 9 20 33 10 11 12 13 14 15 16 17 18 19 21 23 26 32 34 38 40 22 28 30 36 N 24 25 27 29 31 35 37 39 C

Plant WAVE motif 2 (WAM2) 4 3 2 bits P M M LG G 1 P P V PQL L E Q E R A V SKL S SA K AT P P SE S WT I HHD SALQ L L V DS L T VR V D P P T SR L L E M T S L D T E M M Q Q QA R T E G V P H T CA T V N T A G L P PA Q M F A P W I L A S S P P 0 A 1 5 6 2 3 4 7 8 9 25 16 17 22 23 24 26 12 13 14 15 N 10 11 18 19 20 21 C

Plant class-1 specific WAVE motif 3 (WAM3) 4 3 2 bits Q S V F D E H E SNS 1 SS ES E S V S K AV A L DS S E G AE CK T D SL L GM SAL G F P S A G PEN PT F P Q NN I EPS DRT P R S S P G G QD A Q T K I E Q G H D Q I P A N T V F L T S L LT T YAD N Q K P H NL H T A F K A G R I F N G I N V Q GD V C T N P L A E Q S T DQ S T T I T P H H V L E AL G G S N L VHN G V T T A P E Q N H T T K S S S W NAS F L P KPSS S KD V M N A W VK 0 W YV 1 3 4 5 7 9 2 6 8 27 36 43 16 18 21 23 24 25 26 28 29 30 32 34 35 37 39 41 44 10 12 33 40 42 N 11 13 14 15 17 19 20 22 31 38 C

Plant class-1 specific WAVE motif 4 (WAM4) 4 3 2 bits LGH G Q K D P G L D A H Q Q 1 G LV S KS E I F R V L A R E SL L Q G N P E F G TF S H E K V S Q R K K TD E Q V P ET VN T R I N VVVD H T D K EN S S F S D T D L K I R S S S Y RP Y E N I G E TKA K G D S T E Y E H LA L MG P D F ER QRQ D SA E A E TY T R R AH I D E SPV P E S S T I F K S AE RA F R A E A A G S D P S A TK LA R A E C QS V EN G L K D E S H N I TF E E K C T N D N D P T H G I P S E I P S T GQ N G S A D P Y G S V SN G DP S N S G V T I F NN I R E A VK K G I L R P T F DK T M N S E Q R P K L K N F M F T Q N I F H N I G N S S R T S W S F K G T P V S P H T G N E A S E A I TA T N L T A P K R QM I T S G T S L QQ SG Q S S Q QS S VN T L T YNP L SS 0 V FV 5 9 1 3 4 6 8 2 7 24 10 12 14 16 18 19 22 27 29 31 33 35 37 39 47 49 55 56 57 11 13 15 17 20 21 23 25 26 28 30 32 34 36 40 41 42 43 44 45 46 48 50 51 52 54 58 59 60 N 38 53 61 C 4 3 2 bits P H I DGF GS L K P NH E 1 P I D T K H S H M S V K F E E E G V S C H S M N Y T L T D S Y N A SK D D NG D P N S N A G S RF A S D L M I G D F P I V A I D G S A F NS I L C E D S SS HN S P R E N T E K Q Q P GP E F S R E L L K P L L C D N H L QY E Q I Y I Q R Q I GQ R E T E R N K V A R K K T L A I LA S S S E P T G W H L T M P E S GN G K HS N I D Q F Y G R D L D K VN SA SAG SANDTN L KTPLF PPV QMR M S S Q I MD K YY HNTL A RTKNVH VY F V 0 T P I 69 74 75 76 62 66 68 80 82 87 63 64 65 67 71 72 73 77 78 79 81 84 85 86 91 92 93 98 99 70 83 88 89 90 95 96 97 94 102 103 100 108 110 115 120 122 101 105 106 107 111 112 113 114 117 118 119 121 104 109 N 116 C 4 3 2 bits V D D H D M PS I P H G P M C SE Q S 1 R S L N E D L S D C Y H E S S S S I D E D D P S D S S P E T G F V DM A Q F DH PR D Q T D F S N L NN M D Y D L E A R I S Y L D G L E V I L E P L P S S D E S Q T Q A T A A G C S S G S L D D L C K LA L N S E H E Y V F Q D N Y K V E E N R D N E S S H I T D A A G F M S I P K G Q E S E K I N G A A T G Y Q L Q D S D A D Y L I C AC Y L V R H SR G H L F F N LQ F L S L HDK DTF Y R M E W Q C S S S G G V T GT I NYC T Y T T V EYYVTQ Y VH V PH 0 V A M 123 124 127 128 133 142 138 129 134 139 140 143 144 147 154 125 126 130 131 132 135 136 137 141 145 146 149 150 151 152 155 156 157 159 160 161 164 165 166 168 169 170 174 175 176 177 180 181 148 153 162 163 171 172 173 N 158 167 178 179 C

Figure 3 Conserved sequence motifs and domains in plant WAVE proteins. The sequence logos illustrate the sequence conservation within the multiple sequence alignments of three sequence motifs (WAM1, WAM2, and WAM3) and a domain (WAM4) that are specific to plant WAVE proteins. Plant WAVE proteins group into three different classes. WAM1 is present in almost all plant WAVE homologs, WAM2 has been introduced into plant WAVEs after the separation of the moss species, and WAM3 and WAM4 are unique to class-1 plant WAVEs. Kollmar et al. BMC Research Notes 2012, 5:88 Page 7 of 23 http://www.biomedcentral.com/1756-0500/5/88

WAM4 with a length of about 180 residues has the size contain a family-specific N-terminal domain that has of a domain (Figure 3). been call WAHD domain (WASH homology domain, Because many species encode more than one WAVE [4]) in analogy to the WH1 and WHD domains in homolog we sub-grouped the WAVE proteins based on WASP and WAVE, respectively. A short poly-proline a phylogenetic tree calculated from concatenated multi- rich region and a C-terminal VCA domain follow the ple sequence alignments of the WHD and VCA domains WAHD domain (Figure 5). In contrast to the WASP (Figure 4). The plant WAVE homologs clearly group and WAVE proteins the long region between the into two branches that we termed class-1 and class-2. WAHD domain and the poly-proline region is not Several plants contain further WAVE duplications, unique to species and taxa but contains several short which are, so far, species-specific. Vertebrates contain motifs that are present in all WASH proteins (Figure 5). WAVE homologs from up to five classes: A) The three All WASH proteins have this general domain organisa- well-known classes to which the human WAVEs belong. tion except the Trypanosoma species WASH proteins, B) A fourth class that comprises WAVE homologs from which miss the WH2 domain, and the Stramenopiles all major branches but mammalian WAVEs WASH proteins that encode two VCA domains from only Monodelphis domestica and Ornithorhynchus arranged in tandem (Figure 5). Through the use of our anantinus. C) A fifth mammalian specific class that con- manually curated sequence dataset the WASH homolog tains mainly pseudogenes but some potentially coding of Phytophthora sojae could be annotated correctly. transcripts (Figure 4). Most mammals have already lost WhileWASHandWAVEproteinsareeasilydistin- this class, like the primates, the dog, the panda bear, the guished for most species, the WAHD domain of the bats, while others have up to six duplicates (Cavia porr- Phytophthora species is the most divergent of all known cellus). Most of the members are potential pseudogenes WAHDdomains.However,theWASHproteinsofthe becausetheyareencodedbysingleexons,containin- Phytophthora species do encode the other characteristic frame stop codons and frame-shifts, and sometimes motifs of the WASH family. Those features are clearly miss stop-codons and starting methionines. Thus, there distinct from the WAVE homolog found in another are more genetic disablements than can be attributed to sequenced Stramenopiles species, Ectocarpus siliculosus. sequencing and assembly errors. The class-5 WAVE Through manual inspection of the genomic DNA homologs of Rattus norvegicus, Oryctolagus cuniculus, sequences together with manually maintaining the mul- Otolemur garnetii,andDasypus novemcinctus do not tiple sequence alignment we have been able to correct contain reading-frame interrupting mutations and might many automatic gene predictions and to greatly improve therefore present potentially expressed transcripts on initial bioinformatics studies [22]. Thus, we identified although they are all encoded by a single exon. The rat CA and VCA domains in the WASH homologs of kine- WAVE5A is the only class-5 WAVE supported by toplastid species (Trypanosoma and Leishmania species, cDNA data. The mouse paralog WAVE5A, however, Figure 5). In addition, the plant species Physcomitrella contains an in-frame stop codon. patens and Selaginella moellendorfii have a regular C- terminal acidic (A) domain. Classification and analysis of the WASH protein family Vertebrates and almost all other species analysed con- Classification and analysis of the WHAMM protein family tain just one single functional WASH homolog as com- The first member of this protein family, which has been pared to the multiple homologs of WASP and WAVE shown to function as an actin nucleation promoting fac- proteins. The only exceptions so far are the amoebae tor, has been named WHAMM [20]. Subsequently, a Dictyostelium fasciculatum and Acytostelium subglobo- close homolog has been named JMY [21], and, most sum,andtheRhizariaBigelowiella natans that encode recently, invertebrate homologs have been identified and two WASH homologs as result of species-specific gene named WHAMY [22]. A phylogenetic tree of the avail- duplications. In addition, several mammals, and espe- able homologs shows that they all belong to the same cially primates, contain one or many WASH pseudo- protein family (Figure 6). The grouping is similar to that genes that are characterised by consisting of non- of other metazoan protein families showing single mem- functional single exons because of frame-shifts, in-frame bers for the invertebrates and duplicates for the verte- stop-codons, missing terminal stop-codons, and missing brates that are the result of the 2R genome duplication conserved sequence [19]. These pseudogenes are located at the origin of the vertebrates. Another genome dupli- in subtelomeric regions that are hotspots for rearrange- cation followed the divergence of the fish resulting in ments and interchromosomal duplications. It is unlikely some more duplicates in fish genomes. Thus, all mem- but could be possible that some of these pseudogenes bers of this protein family should get the same name, are in fact functional genes because of the general which should be WHAMM because this was the first sequencing problems of these regions. WASH proteins member identified to function as NPF. Kollmar et al. BMC Research Notes 2012, 5:88 Page 8 of 23 http://www.biomedcentral.com/1756-0500/5/88

class -3 -2 lass c vert ebr ate

4 s- s c la la c s s -5

c 98 l a s 61 84 s 92 - 77 1 100 94

e

t

a

N

r

H

b

S e

t 97

r

a e

91 m v

o

n

i

100 e

b

a

e

T

98 r

i

c

h

o

m

91 o

n

a s

87 61

v

i

r

i

d

i

p

l

a

n

t

a c

e l

a

s

s

- 1

2

-

s s a l c

cDNA (gb-id: BC162049)

Figure 4 Phylogenetic tree of the WAVE proteins. The phylogenetic tree of the WAVE protein family was calculated from the multiple sequence alignment of the concatenated WHD and VCA domains using RAxML. For lack of space bootstrap support values are only given for major branches and indicated in percent. The WHD domains of a few NHS proteins have been used as outgroup. The unrooted tree was drawn with FigTree and branches were coloured according to class and taxonomic distributions. For the extended tree see Additional file 1. Kollmar et al. BMC Research Notes 2012, 5:88 Page 9 of 23 http://www.biomedcentral.com/1756-0500/5/88

0 200 400 600 800 1000 aa

468 Acidic region WAHD PPR Hs WASH, 51 kDa Basic region

WH2 domain

4 4 Central region 3 3 2 2 bits bits P V I I T E Y QV Y D V S 1 N Y L E I D G DL Y S G S P 1 EN F V P S D A A D K D E S D E T AL R A F NN I G A S D L LL L F S L L L T L N L Q I T D N L G A S G D N S A NDF R H N M P P F L L D F R GM G L E V W E V V A Q E E A Q SR Y SS A N M M S T R D F N D A S N R S KH S Q A I RS V D L G N T A Q A L R D PLY G M E S K K G I E L I SG P AT S F E EF I A A Q M S P V E I S T R D A M W N L M I N Q D R FR L P I P L A F F P I A Q F N K D E V T VS N YA T D Q S F L E S Y Q Q P A V L N K M I F K K E A H L G S D AK S I I V F L L V F L S V A E N Q R K N T P G F K EQ D KS A P T L V 0 1 2 3 4 5 7 8 9 0 6 7 1 2 3 5 4 6 8 9 10 11 12 13 14 15 16 18 19 20 21 22 23 26 28 30 34 36 39 40 41 42 43 17 24 25 27 29 31 32 33 35 37 38 10 11 13 14 17 N 12 15 16 18 C N C

594 WAHD PPR Trc WASH, 64 kDa

593 WAHD PPR Phr WASH, 65 kDa

Figure 5 Domain organisation of representative WASH proteins. A colour key to the domain names and symbols is given on the right except for the WAHD domain (WASH homology domain) and the poly-proline region (PPR) that are coloured in light green and yellow, respectively. The sequence logos illustrate the sequence conservation within the multiple sequence alignments of two characteristic motifs shared by all WASH protein family members. Species abbreviations: Hs = Homo sapiens, Trc = Trypanosoma cruzei, Phr = Phytophthora ramorum.

Previously, the N-terminal domain of WHAMM has between the two alleles. Therefore, not even the well- been defined differently, sometimes marking the first 50 conserved myosin family proteins could unambiguously residues at the N-terminus [21], sometimes comprising be annotated [25]. the first 170 residues of mammalian WHAMM1 [20,29]. It has been reported initially [21] and subsequently Inthefirstcase,theN-terminaldomainendsbeforea adopted in recent reviews [4,30] that class-2 WHAMM long (about 120 aa) low-complexity region in WHAMM2 homologs (JMY) contain three WH2 domains. The homologs, while in the other case the N-terminal domain region comprising the supposed first WH2 domain is is extended subsequent to this low-complexity region by highly conserved in all vertebrates (Figure 7). However, another 120 residues that are conserved between all this region does not show homology to the WH2 WHAMM homologs. Because all invertebrate WHAMM regions of the WASP family proteins (see also below) homologs also do not contain the low-complexity region, that are characterised by a highly conserved pattern it is most likely that it has been inserted into the N-term- ([VILM]-[LM]-[ASED]-[ASDEQ]-[IL]-[KRQ]-x10-[L]- inal domain of WHAMM2 after the vertebrate [KR]-[KR]-[VTA]). The prediction of a third WH2 WHAMM duplication happened. Therefore, we support domain in class-2 WHAMM homologs is thus most the definition of the longer N-terminal domain that has likely an artefact, or the respective region would com- been called WMD (WHAMM membrane interaction prise an extremely unusual WH2 domain. There is no domain, Figure 7, [30]). experimental data available supporting this motif being We have also found a WHAMM homolog in the a true WH2 domain. Hydrozoa species Clytia hemisphaerica, which is outside Bilaterians and thus extends a previously proposed ori- Identification and domain organisation of the WAWH and gin of WHAMM homologs [22]. The WHAMM homo- WAML families log of Strongylocentrotus purpuratus has been proposed While we extensively searched the eukaryotic genomes to contain a tandem N-terminal WMD and coiled-coil for WASP homologs we observed several hits for the region [22], but this is most likely an artefact from the GBD and VCA domains in amoebozoan genomes. The genome assembly. In fact, the duplicated part (WMD true WASP homologs could easily be identified and domain and coiled-coil region) is surrounded by assembled because of homology hits for the WH1 stretches of ‘N’ bases and has most probably mislead- domain. When we assembled the first of the hits in Dic- ingly been placed inside the supercontig containing the tyostelium discoideum, for which only homology to the WHAMM gene during the genome assembly process. GBD and VCA domains was indicated, we were long The Strongylocentrotus genome is one of the worst searching for potential WH1 domains upstream of the assemblies available because of the high polymorphism GBD domains, expecting even very low sequence Kollmar et al. BMC Research Notes 2012, 5:88 Page 10 of 23 http://www.biomedcentral.com/1756-0500/5/88

-1 ass cl te ra eb rt e v

in v e r t e b r a

t

e

B

2

-

s

s

a

l

c

v h

s e

r fi

t

e

b

r

A

a

2

t

-

e

s

s

c

a l

l a

c

s

s

h - s 2 fi

Figure 6 Phylogenetic tree of the WHAMM proteins. The phylogenetic tree of the WHAMM protein family was calculated with MrBayes from the multiple sequence alignment of the full-length WHAMM sequences excluding the low-complexity insertion of the WMD domain. Posterior probabilities are only given for some major branches because for lack of space. The unrooted tree was drawn with FigTree and branches were coloured according to class and taxonomic distributions. For the extended tree including all posterior probability values see Additional file 1. similarity. However, we were not able to identify any sub-types have been found: A) Type-I that has the same WH1 domain, and when we analysed further amoe- domain organisation as WASP except for the WH1 bozoan genomes it became clear that we have identified domain that is missing. B) Type-II that does not have a new type of the WASP protein family, that we suggest the basic region N-terminal to the GBD domain but has to call WAWH (WASP without WH1 domain). The two conserved CxxC motifs at the N-terminus followed WAWH proteins have a similar domain organisation as by a serine-rich region, and a long coiled-coil region the WASP proteins, except that they miss the N-term- between the GBD domain and the PPR region. C) Type- inal WH1 domain (Figure 8). Three different WAWH III WAWHs have a highly charged region (about 90% of Kollmar et al. BMC Research Notes 2012, 5:88 Page 11 of 23 http://www.biomedcentral.com/1756-0500/5/88

0 200 400 600 800 1000 aa

808 WMD PPR Hs WHAMM1, 91 kDa

988 Hs WHAMM2 WMD PPR 111 kDa WHAMM2/3 insertion

Acidic region 4 WH2 domain 3 2 Coiled coil bits KSASAP HL V R A A GL R 1 SA S S E Q V L G K G L P I L R DSSHW L Q I E T T DE M S R N SA A F K KT P R SS ASS P F R Q F V P Central region T I L Q T T S F P D L NAKQ LK PA D R HNVNG P Y T R I VL VS K 0 YL E 4 8 1 6 7 2 3 5 9 13 14 15 17 22 23 10 11 12 16 18 19 20 21 24 25 26 27 29 32 33 34 37 38 39 28 31 35 36 N 30 C

Figure 7 Domain organisation of representative WHAMM proteins. A colour key to the domain names and symbols is given on the right except for the WMD domain and the poly-proline region (PPR) that are coloured in dark green and yellow, respectively. The class-2 and class-3 WHAMM homologs encode insertions in the WMD domain. For HsWHAMM2 the insertion is symbolized by light-green bars. The sequence logo illustrates the sequence conservation within the multiple sequence alignments of the previously supposed third WH2 domain in class-2 WHAMM homologs. However, the sequence is not in accordance with the consensus WH2 sequence motif. Species abbreviation: Hs = Homo sapiens.

0 200 400 600 800 1000 aa

WAML 759 IMD Pro rich Mm MIM, 82 kDa

535 IMD Mm IRSp53_A, 59 kDa

Acidic region 513 Basic region IMD PPR Eh WAML_A, 57 kDa WH2 domain

492 Coiled coil Eh WAML_B, 54 kDa IMD PPR Central region

510 WASP autoinhibitiory domain (WAID)

IMD PPR Eh WAML_C, 56 kDa SH3 domain

CRIB motif 558 IMD PPR Eh WAML_D, 60 kDa

365 IMD Eh WAML_E, 42 kDa

WAWH 2x CxxC motif 905 Ser rich STN rich PPR Dd WAWH_A, 99 kDa

583 PPR Dd WAWH_B, 65 kDa

297 Aoc WAWH, 33 kDa

Figure 8 Domain organisation of representative WAWH and WAML proteins. A colour key to the domain names and symbols is given on the right except for the IMD domain and the poly-proline region (PPR) that are coloured in dark green and yellow, respectively. Species abbreviations: Mm = Mus musculus, Eh = Entamoeba histolytica, Dd = Dictyostelium discoideum, Aoc = Anolis carolinensis. Kollmar et al. BMC Research Notes 2012, 5:88 Page 12 of 23 http://www.biomedcentral.com/1756-0500/5/88

the sequence consists of glutamate, arginine, and lysine) of about 800 residues between the GBD domain and the 4 3 PPR region. While all amoebae encode one or more of 2 WASP-WH2A bits DA AS R GG I G K G G G G Q V 1 R L I K N SD DQ A T GHG S KA KG A S AV the type-I and type-II WAWHs, the type-III WAWH G A HT P E N A K S RP L E V Q S G L AK Q K R A E I S R L A N I N DN M K L T S S S T S M K F N D T I L S K N V G V TN I R I G 0 G A 3 4 1 2 5 6 7 8 9 10 11 14 16 17 19 20 22 has so far only been found in Entamoeba species. N 12 13 15 18 21 23 C WAWH homologs have also been identified in the Apu- 4 3 sozoa Thecamonas trahens and the lizard Anolis 2 WASP-WH2B bits G DA L Q Q P 1 D Q I G A T E V S E S S A S K D L E G K R V H carolinensis. A N K TG A PR EA N K EM K F L P T Q A D G N SQ T MH D I V K R S R R R V I S R M K T V L A T 0 Q A L 7 1 2 3 4 5 6 8 9 10 11 12 13 14 16 17 19 20 21 22 15 18 Because all amoeba genomes contain at least one N 23 C WASP gene we were searching for WASP homologs in 4 3 Entamoebae for a long time. Low-scoring hits were only 2 WAVE bits A A I Q RK S E T G V 1 D K ER V A N D N N L R obtained for the WH2 domain, and because this domain P S LS Q Q T KE S F P N M K N K S KE GPDTF A G VT Q L ST SAGL Q S D R M K H T Q N RA S MH I I LH 0 7 1 2 3 4 5 6 8 9 19 20 22 10 11 17 21 13 14 16 is very short (Entamoebae genes rarely contain introns N 12 15 18 23 C so that we would have expected longer hits if there 4 3 would be homology to neighbouring regions, which we 2 WASH bits A LEA G N N I K K RSV 1 S G A P T R QA H G K G G A A R T S K A wouldexpectforWASPhomologs)andalsopresentin S A I DH P P QRN D RK R N I I D H Q K V N A G I RG G E P N A E K A S F T L L E S P P V L S S KG VV D PL SH V A M L P P Q D S PHR 0 F S L 2 3 4 7 1 5 6 8 9 11 10 12 13 14 20 many other proteins we did not analyse these hits N 15 16 17 18 19 21 22 23 C further in the first instance. As soon as further amoe- 4 3 bozoan genomes became available we always searched 2 WHAMM-WH2A bits P K R SFH R 1 the Entamoeba genomes again for WASP homologs. As A V R P T Q K S E R HGQ I K S L A EV V Q S S A K D A RL K A Q S QR TM Q L L S L N P T 0 E P D 2 7 1 3 4 5 6 8 9 10 11 13 14 16 17 19 20 21 22 12 15 we started to inspect the short hits for the WH2 N 18 23 C domains further, the corresponding assembled genes 4 3 revealed a similar domain organisation as WASP pro- 2 bits Q WHAMM-WH2B SNN Q 1 D A K K teins starting with the WAID domain (Figure 8). Sur- H I VNE I S L R R K S V PPD R K A G DL VA A L G R M L M NL I V SQM T M K I F M P T QLR T 0 V L RT 1 2 3 4 5 6 7 8 9 14 10 11 12 13 17 19 20 21 22 16 18 23 prisingly, five to seven proteins with a similar domain N 15 C organisation have been found in the three sequenced 4 3 Entamoeba genomes. In contrast to WASP proteins, 2 WAML bits Q 1 A SL D K T V these proteins each encode an IMD domain (IRSp53/ P R K A K G A Q G Q K A GGN G T N G F NR E I F T S A S S S I SS N E I S Q G V D LSDL QA LRP P M A I P 0 T K 4 5 3 6 7 8 9 1 2 10 11 12 13 14 15 16 17 19 20 21 22 18 MIM homology domain) at the N-terminus and lack the N 23 C CRIB motif. IMD domains are found in proteins of the 4 3 IRSp53 (insulin receptor tyrosine kinase substrate p53) 2 WAWH bits V S LD R K K and proteins of the MIM (missing-in-metastasis) family, 1 GR L S I A G K L P S SA A KN A S T G M N N L E GF A A K K P P A T F Q D N Q K K T E NAV G N F QD L I H N FQ S N A Q G R P D Q P I R Q G S N D E S

S GM S L S M A R M TR 0 D Y V V 1 2 4 5 7 8 3 6 9 11 14 20 10 12 13 21 22 16 17 18 19 for example. The MIM family proteins additionally con- N 15 23 C tain proline-rich regions and a C-terminal WH2 domain Figure 9 Sequence conservation of the WH2 domains of WASP (in one of the alternatively spliced transcripts). We family proteins. The sequence logos illustrate the sequence therefore suggest calling these new types of WASP conservation within the multiple sequence alignments of the WH2 family proteins WAML (WASP and MIM like). One domains. Several of the WASP proteins as well as the vertebrate subclass of the WAML proteins comprises a short ver- WHAMM homologs encode two WH2 domains in tandem that are sion that misses the PPR and VCA region. shown separately here. The linker between the N-terminal helix part of the WH2 domain and the C-terminal LKKT motif are of variable length. To align all WH2 domains to each other, four to five gaps Comparison of the WH2 and central domains have been introduced into the linker of all but the WASP-WH2A The WH2 domain is a short sequence motif of 17-27 and WASH alignments. The sequence logos are based on the residues and binds to actin at the barbed-end [31-33]. following number of sequences: WASP-WH2A = 343; WASP-WH2B = The first ten amino acids of the WH2 domain form a 110; WAVE = 321; WASH = 141; WHAMM-WH2A = 63; WHAMM- WH2B = 69; WAML = 14; WAWH = 19. short alpha-helix that binds to the hydrophobic cleft between subdomains 1 and 3 of actin. A linker of variable length follows the helix. After the linker a strongly con- highly conserved large hydrophobic residues (VILM) fol- served LKKT(V) motif anchors the C-terminal part of the lowed by two residues, which might be small (AS), polar WH2 domain into a hydrophobic pocket (via the leucine) (SQ) or negatively charged (DE), and another completely while directing the C-terminal end of the WH2 domain conserved large hydrophobic residue (I or L). The hydro- towards the pointed-end of actin. The WASP family pro- phobic residues bind into the hydrophobic cleft of actin’s teins contain well-conserved WH2 domains (Figure 9). barbed-end. The linker almost always contains a hydro- Based on crystal structures of WH2 domains in complex phobic residue (VIF) that binds into a hydrophobic with actin the N-terminal helix is characterised by two pocket on the actin surface. Based on the 1080 WH2 Kollmar et al. BMC Research Notes 2012, 5:88 Page 13 of 23 http://www.biomedcentral.com/1756-0500/5/88

domain sequences of the WASP family proteins the fol- In contrast to the WH2 domains, the C domains of lowing pattern can be identified: ([VILM]-[LM]-[ASED]- the VCA region have WASP sub-family specific patterns [ASDEQ]-[IL]-[KRQ]-x10-[L]-[KR]-[KR]-[VTA]). The (Figure 10). The general pattern of the C domain can be proposed third WH2 domain of class-2 WHAMM homo- depicted as follows: The center of the motif is charac- logs does not show homology to the N-terminal helix terised by three large hydrophobic residues (VILM) that part of the motif and should therefore not be able to are each separated by three amino acids. A doublet or form a similar structure. The WH2 domains are highly triplet of arginine or lysine residues follows. The C conserved throughout all WASP family proteins (Figure domain is terminated by another large hydrophobic resi- 9) containing linker of four to five residues. WASH pro- due (VILMY). The hydrophobic residues are almost tein WH2 domains and fungi WASP-WH2 domains completely conserved in each subfamily, and the specific (included in the WASP-WH2A sequence logo in Figure combinations could be used as marker. The C domains 9) contain longer linker of ten and seven to eight resi- show stronger sequence conservation than the WH2 dues, respectively. The highest conserved residues of all domains. WH2 domains are the three hydrophobic amino acids of The acidic C-termini of the WASP family proteins are the N-terminal alpha-helix and the LKKV motif. The thought to mediate ARP2/3 binding. Most acidic WHAMM-WH2A sequence logo comprises only verte- domains contain a tryptophan residue as last coding brate sequences and is therefore highly conserved over residues or close to the terminus. While phenylalanines the entire length of 19 residues of the domain. instead of the tryptophans have been reported to be unique to invertebrate WHAMM homologs [22] we have identified WASP homologs, which contain F’s 4 instead of W’s (e.g. in Acoelomata), WASH homologs 3 without any hydrophobic residues in the A domain (e.g. 2 bits V WASP GL G A K H G A A L K K 1 G AA Leishmania WASH proteins), and many WASP and GA Q K S PT S MR E SQA I E I A D AS V E NA R G S Q RNN SP M S E S H E QDD S Q MA D R S NGV A L D T N E N AT KV K GTR R A S R Q L M LQ I 0 NM 1 2 3 4 5 6 7 8 9 WAVE homologs that contain both a tryptophan and a 12 14 16 17 19 10 13 15 18 20 N 11 21 C phenylalanine/tyrosine, both separated by variable num- 4 bers of acidic residues. Thus, it remains an open ques- 3 2 tion whether a tryptophan or just an aromatic residue is WAVE bits D A SR R I VE 1 GNL REP A T T E N HGVV K A LN S needed for proper function, and whether there is a dif- K RAA H I D I S N T VE Q A V V T R A S L M EA A S S I V MS N K K A H T R P S K R G D I RA N S I Q I K L V T M V T E I E Q R L A 0 Q 1 2 3 4 6 7 8 9 5 10 12 14 15 16 19 17 11 13 18 20 N 21 C ference in encoding only a tryptophan or an additional

4 phenylalanine/tyrosine. 3 2 bits Alternative splice forms of WASP family proteins WASH G S FN M Q D K G S 1 GDL K G M A A P A H L K F A V I S SA R A TL I LA L A L A Q A S RLSR D DE S LL M G RQ S A KM V A F M T M S L Surprisingly, alternative splice forms of WASP family K NA Q Q S Q I ES Q MQ S N E QL A G M L T R T V I N VM V T F M 0 G 1 2 4 6 7 8 9 3 5 10 12 13 14 15 16 17 18 19 20 N 11 21 C proteins have not been reported yet except for a neural- 4 specific splicing of WASP through that a truncated 3 WASP protein is generated lacking exon-2 [34]. Exon-2 2 bits T E A WHAMM DTDP H R E encodes the N-terminal five ß-strands of the seven ß- 1 S L A D Q I R S P S RS D L I K A K R E M P EK Q A H V E KV ASL M N L R GS LR Q A I AT N R N N L K M V 0 I K strands of the WH1 domain [35] and it seems impossi- 1 2 3 4 5 6 7 8 9 17 10 12 14 15 16 18 19 20 N 11 13 21 C ble that a functional and folded WH1 domain could be 4 formed without the exon-2 encoded residues. We have 3 2 extensively searched all cDNA/EST databases available WAML bits M 1 G KA QK A AN K A of all species for clones encoding the reported exon-2 D T S CN D L QK E E QV A KG M A V I M E S V D GSN R V S D Q S E N QDV QLL I RRSD I C MY K NM 0 V VT 1 3 5 6 7 8 2 4 9 12 17 10 14 15 16 18 19 20 N 11 13 21 C deletion but have not found any. Interestingly, none of the four to nine isoforms of the three mammalian 4 3 homer proteins, that also contain an N-terminal WH1 2

bits domain, encodes a sequence in which part of the WH1 WAWH M A 1 Q E A R L D M ED VN G AD N D SV KG DN L VK I K I G E D I R A RQ domain is missing [36]. The exon-2 deletion might thus EN I T NTA D G A M G A I LA S R E H GP AS N I E F A N T L F G LR HSR T S Q H ST GLH EFQ V I SKH LVS 0 A 3 5 1 4 6 7 8 9 2 10 16 12 13 14 15 17 18 19 N 11 20 21 C be a very limited event, so that it has not yet been Figure 10 Sequence conservation of the C domains of WASP included in cDNA/EST libraries (although dozens of family proteins. The sequence logos illustrate the sequence cDNA/EST clones from many species exist for the full- conservation within the multiple sequence alignments of the C length WH1 domain), or it might be an experimental domains. The sequence logos are based on the following number artefact. The available data argues for the latter. of sequences: WASP = 349; WAVE = 321; WASH = 154; WHAMM = We have searched all available databases for cDNA/ 69; WAML = 14; WAWH = 19. EST clones representing different splice forms of the Kollmar et al. BMC Research Notes 2012, 5:88 Page 14 of 23 http://www.biomedcentral.com/1756-0500/5/88

400 bps (ex.) 5000 bps (in.) 9 10 11A 11B 12

1 gi|254070614|gb|GL006459.1| (21929bp)

For clarity introns have been scaled down by a factor of 12.66

poly-proline region WH2A WH2B

exon 9 exon 10

10 20 30 40 50 60 70 ....|....|....|....|....|....|....|....|....|....|....|....|....|....| ApcWasp PPPPPPPPPPPPPTASPAGADTSGRGALMEAIRGGASLKTSSHTSNEAPKSADSRGQLLDAIRHGAQLKK

WH2B C A

exon 11B exon 12

80 90 100 110 120 ....|....|....|....|....|....|....|....|....|....|....|... ApcWasp VDNQEDRPVSSGPEEMGGLVGALAKALASRRPACVDSDDDDDDDDDDDDVSDDDDWDD Exon11B VDPSE-RP-SSGSNEANGLVGALARALANRQKQIHGS------exon 11A Figure 11 Gene structure of the alternatively spliced WASP of Aplysia californica. The cartoons outline the gene structure of the alternatively spliced WASP gene from Aplysia californica, ApcWASP. The ApcWASP gene contains a cluster of two mutually exclusive spliced exons, exon11A and exon11B, that span the C-terminal part of the second WH2 domain and the C domain. Dark grey bars and light grey bars mark exons and introns, respectively, and alternative exons are coloured.

WASP family proteins. We were not able to identify any proteins, is alternatively spliced, and encodes at least alternatively spliced exons except for a mutually exclu- five isoforms [37]. The three mammalian homer family sive spliced exon of the Aplysia californica WASP gene proteins have N-terminal WH1 domains (EVH1 (ApcWASP; Figure 11). The cluster of these mutually domain) like the Ena/VASP and the WASP proteins and exclusive spliced exons encodes alternative versions of encode four to nine different isoforms [36]. part of the second WH2 domain and the C domain, which is responsible for the autoinhibition of the WASP Discussion proteins. The autoinihibition is released through binding Initially, the WASP family of proteins consisted of the of activated GTPases, and one possible function of the WASP proteins and the WAVE proteins [7,38]. Both two different versions of the C domain of ApcWASP share the ability to catalyze actin filament branching via might be to form modulated autoinhibited complexes activation of the ARP2/3 complex but are localized and that will be activated by different GTPases. But this is regulated differently. The shared function is encoded in pure speculation and experimental evidence is needed to a similar subsequent set of domains located at the C- reveal a precise mechanism. Based on these findings we termini. A poly-proline or proline-rich region is fol- have searched all genomes of the closest related lowed by a WH2 domain, a central domain, and a C- sequenced species (e.g. Lottia gigantea, Capitalla teleta, terminal acidic domain that together are called the VCA Helobdella robusta) for mutually exclusive spliced exons domain. This set of domains has very recently been but could not identify any. identified in further proteins, WASH [19], WHAMM This situation is in sharp contrast to other proteins [20], and JMY [21] that have also been shown to activate sharing WASP domains like the NHS proteins or the the ARP2/3 complex. The common characteristic of all homer proteins. The Nance-Horan syndrome protein WASP family proteins is thus the VCA domain follow- has an N-terminal WHD domain like the WAVE ing a proline rich region. The aforementioned proteins Kollmar et al. BMC Research Notes 2012, 5:88 Page 15 of 23 http://www.biomedcentral.com/1756-0500/5/88

are regarded as subfamilies, and newly identified pro- Wiskott-Aldrich syndrome, linked to the X-chromo- teins containing a VCA domain but different N-terminal some. The protein causing the syndrome, the NHS pro- domains should be regarded as new subfamilies. The N- tein, contains an N-terminal WHD domain like the terminal domains defining the subfamilies are also pre- WAVE proteins [37,40]. Three NHS family members sent in other proteins (for example the WH1 domain is exist in humans, NHS, NHSL1, and NHSL2, and it present in Ena/VASP proteins, the WHD domain in could be shown that NHS also regulates actin remodel- NHS proteins, and the IMD domain in IRSp53 and ling and cell morphology [37]. Interestingly, the WHD missing-in-metastasis proteins) and therefore most domain of NHS interacts with all three members of the WASP family proteins also belong to other protein Abi protein family, Nap1, Sra1, and HSPC300 [37], as families. has been shown for the mammalian WAVE proteins The analysis of most eukaryotic sequenced genomes [17,18,41]. Thus, if we had wanted to confirm the pre- available allowed us to reconstruct the evolution of the sence or absence of WAVE or WASH proteins in cer- WASP family members in the different branches of the tain species by searching for the presence of the other eukaryotic tree and to identify and characterise new members of the WAVE and WASH complexes we subfamilies. It is very important to analyse as many gen- would have had to perform an exhaustive analysis of all omes as possible because even closely related species members of the complexes as well as the NHS proteins. might contain different sets of the WASP subfamilies This is, however, out of the scope of this analysis, and and subfamily members. For example, the Entamoeba we therefore rely on our experience in WASP protein species Entamoeba histolytica and Entamoeba dispar family identification to approve their presence or both contain one WASH protein, and three WAWH absence. and five WAML homologs, while the close relative Having identified all WASP family homologs in all Entamoeba invadens has lost WASH but instead sequenced organisms, the presence or absence of the encodes three WAWH and seven WAML homologs. various homologs in the respective species can now be Another example represents the green algae species plotted on the tree of the eukaryotes (Figure 12). As sequenced by today. For the Ostreococcus species a basis for the tree of the eukaryotes we choose the taxon- WASH homolog could be identified, while WASP family omy as available at NCBI [42] with many modifications homologs could not be identified in Volvox carteri and based on recently published phylogenetic analyses. Espe- Chlamydomonas reinhardtii although both have ARP2/ cially the grouping of taxa that emerged close to the ori- 3, and the Micromonas and Chlorella species do not gin of the eukaryotes remains highly debated. According have the ARP2/3 complex anymore and thus do not to most of the recent phylogenetic analyses, the Alveo- encode any WASP family protein. Therefore, it is not lata, Stramenopiles, and Rhizaria form the superfamily sufficient to only analyse a few sample genomes of each SAR [43,44]. The placement of the Haptophyceae to the of the main eukaryotic kingdoms as it has been done in SAR is still highly debated and there are several analyses previous analyses [22,39]. Of course, sequencing of in favour [45-47] as well as in contrast [43,44,48-50] to further genomes might also lead to a revision of the this grouping. Also, some analyses group the red algae conclusions drawn below. from the Rhodophyta branch to the Viridiplantae For species for which we were not able to identify any [44,50,51] and others support their independence WASP family protein homolog we searched for all actin [45,48]. The phylogeny of the supergroup Excavata is and actin-related protein genes. If these species also did the least understood because only a few species of this not contain members of the ARP2 and ARP3 subfami- branch have been completely sequenced so far. While lies the absence of ARP2/3-complex activating proteins the grouping of the Heterolobosea, Trichomonada, and was confirmed. However, we identified ARP2 and ARP3 Eugelenozoa into the Excavata is found in most analyses, in some species for which we were not able to find the grouping of the Diplomonadida as separate phyllum WASP family protein members yet. Based on the com- or as part of the Excavata is still highly debated parison of about 2,200 homologs of actin-related pro- although most recent analyses support the Excavata teins the identification of ARP2 and ARP3 homologs is superkingdom [43,48,49]. Also, we placed the nematodes straightforward (data not shown). This is in contrast to as sister group to the arthropods forming together the the search for homologs of the other four members of Ecdysozoa clade [52,53]. The other sequenced Protosto- the pentameric WAVE complex. Each of the other pro- mia species form the Lophotrochozoa branch [52]. teins of the WAVE complex is part of a protein family Our analysis shows that the most ancient eukaryote whosemembersarealsoincorporatedintotheWASH must have contained a WASP, a WAVE, and a WASH complex [23]. In addition, the WAVE complex subunits homolog, because homologs of these subfamilies have also interact with the NHS (Nance-Horan syndrome) been found in all major eukaryotic branches (Figure 12). proteins [37]. The Nance-Horan syndrome is, like the Some species contain extended repertoires of WASP Kollmar et al. BMC Research Notes 2012, 5:88 Page 16 of 23 http://www.biomedcentral.com/1756-0500/5/88

Figure 14 Figure 13 Figures 15/16 Cyanidioschyzon merolae Galdieria sulphuraria

Fungi/Metazoa Trichomonas vaginalis Aureococcus anophageferens

WAVE WAVE WAVE WAVE WASH Streptophyta

Amoebozoa Chlorophyta WASH Phytophthora ramorum WAVE WAVE WAVE WAVE WASP WASH Blastocystis hominis WAVE WAVE WAVE WASP ? ? ? WASH Ectocarpus siliculosus WAVE OomycetesBlastocystis Trichomonada WASH Naegleria gruberi Pelagophyceae WAVE WAVE Phaeophyceae WASH WASP WASP Rhodophyta Thalassiosira pseudonana ARP2/3 WASP Bacillariophyta WAVE WASP WASP ARP2/3 Bigelowiella natans WAVE WASH WASH WASP WASP

Heterolobosea Viridiplantae Stramenopiles Rhizaria Plasmodium faliciparum Trypanosoma cruzi Apicomplexa Aconoidasia Leishmania major ARP2/3 Coccidia Toxoplasma gondii 2 Alveolata Ciliophora WASH Euglenozoa Tetrahymena thermopila 3 WASH WASP WASP Excavata Giardia lamblia Diplomonadida ARP2/3 Chromalveolata WASP Emiliania huxleyi 1 Haptophyceae

Eukaryota WAVE WASH WASP Figure 12 Evolution of the WASP family proteins with respect to the species evolution. Schematic representation of the most widely accepted eukaryotic tree of life. Branch lengths are arbitrary. The WASP family protein inventories of certain taxa and specific species have been plotted to the tree with WASP subfamilies given in colour-coded boxes. White boxes in certain lineages or species indicate the loss of the ARP2/ 3 complex. Blastocystis hominis contains the ARP2/3 complex and is thus expected to contain an activator protein although none of the WASP family members could be identified. The numbers on the arrows refer to alternative placing of the respective taxa: 1: The monophyly of the Diplomonadida (instead of grouping them to the superkingdom Excavata) is supported by [54]. 2: The monophyly of the Rhodophyta is supported by [45,48]. 3: Grouping the Haptophyceae to the Chromalveolata is supported by [45-47]. family proteins like Trichomonas vaginalis encoding ele- very similar sequences to the Picea WASH in cDNA ven WAVE homologs. In contrast, the Euglenozoa con- data of another Coniferophyta, Pinus taeda,andthe tain only a WASH homolog, while the Stramenopiles Cycadophyta Cycas rumphii. N-terminal fragments of encode either a single WASH or a single WAVE pro- WASH have been identified in the genomes of the tein, and the Haptophyceae Emiliania huxleyi contains Liliopsida Sorghum bicolour, Oryza species, Phoenix dac- only a WASP homolog. The ARP2/3 complex, and thus tylifera,andSetaria italica. These fragments comprise WASP family proteins, is absent in Diplomonoda, Rho- the N-terminal about 200 residues, and no further dophyta, Bacillariophyta, and Apicomplexa. The Strame- WASH sequence could be identified in spite of extensive nopiles Blastocystis hominis contains the ARP2/3 manual inspection of the genomic DNA sequences. complex, but WASP family proteins could not be found Because the C-termini of these fragments are not con- in its genome. Here, proteins from class II NPFs might served and because homologous sequence to these frag- regulate the ARP2/3 complex. ments could not be identified in other Liliopsida WASP proteins have never been identified in plant genomes it is most probable that these fragments are genomes [39,55], and we were also not able to find any pseudogenes in the process of being disbanded. The WASP homolog in plants (Figure 13). Thus, the ances- analysis also shows that WASH has been lost in the last tor of the plants must have lost the WASP homolog. common ancestor of the eudicotyledons branch. However, we identified WASH homologs in green algae Although several good assemblies of genomes of green of the Ostreococcus species and some land plants that algae are available, WASH homologs (and the ARP2/3 separated very early in the evolution of the viridiplantae, complex) could only be identified in the Ostreococcus like the moss Physcomitrella patens and the fern Selagi- species (Figure 13). The Chlorella and Micromonas spe- nella moellendorfii. Based on these WASH proteins we ciesdonotencodeARP2andARP3anditisthusnot searched in all other available plant genomes for homo- surprising that they also do not contain WASH. How- logs. A full-length WASH homolog has only been iden- ever, the Chlorophyceae species Volvox carteri and tified in cDNA data of Picea glauca and fragments of Chlamydomonas reinhardtii do contain ARP2 and ARP3 Kollmar et al. BMC Research Notes 2012, 5:88 Page 17 of 23 http://www.biomedcentral.com/1756-0500/5/88

Populus trichocarpa WAVE1 WAVE2 Manihot esculenta, Ricinus communis WAVE1 WAVE2 Prunus persica WAVE1 WAVE2 Glycine max WAVE1 WAVE2 Salicaceae WAVE1 WAVE2 WAVE2 WAVE1 WAVE2 WAVE2 Cucumis sativus Rosales Euphorbiaceae WAVE1 WAVE2 Malpighiales Phaseoleae Medicago truncatula Carica papaya Cucurbitales Trifolieae WAVE1 WAVE2 WAVE1 WAVE2 Caricaceae Fabales Vitis vinifera fabids WAVE1 WAVE2 Arabidopsis thaliana Brassicaceae malvids Zea mays WAVE1 WAVE2 rosids incertae sedisWASH WAVE1 WAVE2 WAVE2 WAVE2 WAVE1 WAVE2 Sorghum bicolor WAVE1 rosids F Mimulus guttatus WAVE1 WAVE2 WAVE2 WAVE2 WASH Lamiales WAVE1 WAVE2 Brachypodium distachyon WASH eudicotyledons PACCAD clade WAVE1 WAVE2 WAVE2 WASH BEP clade Oryza sativa Liliopsida F Picea WAVE1 WAVE2 WAVE2 WAVE2 WASH Coniferophyta only EST data available WASH Magnoliophyta Volvox Selaginella moellendorfii Spermatophyta ? ? ? WASH WAVE WAVE Lycopodiophyta Chlamydomonas

Physcomitrella patens WASH Chlorella WAVE WAVE WASH Bryophyta Tracheophytina WASH WAVE WAVE WAVE TrebouxiophyceaeARP2/3 Chlorophyceae WASH Micromonas WAVE WAVE ARP2/3 WAVE Prasinophyceae Ostreococcus WASP WASH Figure 13 Evolution of the WASP family proteins in algae and plants. Schematic representation of the phylogenetic tree of the sequenced algae and plant species. Branch lengths are arbitrary. The WAVE and WASH protein inventories of the species have been plotted to the tree with WASH and WAVE families and classes given in colour-coded boxes. Loss of the ARP2/3 complex and WASP family proteins in certain lineages or species is indicated by white boxes. Volvox carteri and Chlamydomonas reinhardtii contain the ARP2/3 complex and are thus expected to contain an actin nucleation promoting protein although none of the WASP family members could be identified. The red “F” indicates that only a fragment of the WASH protein in Sorghum and Oryza is available. The genome assemblies of these species are of high quality and do not contain any gaps downstream of the WASH fragments. Because the found N-terminal fragments are highly similar to the respective regions of the WASH homologs of Picea, Selaginella, and Physcomitrella but homologous sequence to the rest of the WASH proteins could not be identified in the genome sequences these WASH proteins are most probably pseudogenes in the process of being disbanded. but WASH or WAVE homologs were not found. In of these classes, like those in Arabidopsis and Populus, these species the ARP2/3 complex must be activated by have been derived by species-specific duplications. another type of WASP family protein that has not been The amoebae have a rich repertoire of WASP family recognized so far. In all genomes of land plants several proteins (Figure 14). The Dictyosteliida and Acantha- WAVE homologs have been identified. While the moeba castellanii contain WASP, WAVE, and WASH WAVE homologs of Physcomitrella (containing a sur- homologs, and also homologs of the newly identified prisingly high number of seven WAVEs) and Selaginella WAWH proteins. In addition, most amoebae species consistently group together in the phylogenetic tree of sequenced so far contain several species-specific gene the WAVE proteins, the WAVE homologs of the Mag- duplications. In contrast, the Entamoeba species have noliophyta clearly group into two major classes. Variants lost WASP and WAVE, but instead encode five or more Kollmar et al. BMC Research Notes 2012, 5:88 Page 18 of 23 http://www.biomedcentral.com/1756-0500/5/88

Acytostelium subglobosum Acanthamoeba castellanii Entamoeba dispar WASH WASH WAVE WAVE WAWH WASP WASP WASP Entamoeba histolytica WASP WASP WASP WASP WASP WAVE WAWH WASH WAML WAML WAML WAML WAML Polysphondylium pallidum WAWH WAWH WAWH WASH WASP WASP WASH WAVE WAWH Entamoeba invadens WAML WAML WAML Dictyostelium fasciculatum WAML WAML WAML WAML WAVE WASP WASP WAWH WAWH WAWH WASH WASH WAWH Centramoebida WASH

WASP Dictyostelium purpureum Dictyosteliida Entamoeba Dictyostelium discoideum WAVE WAML WASH WASP WAVE Thecamonas trahens WAWH WAWH Amoebozoa WASH WASP WAVE WAWH Apusozoa

Figure 14 Evolution of the WASP family proteins in amoebae and Apusozoa. Schematic representation of the phylogenetic tree of the sequenced amoebae and Apusozoa species. Branch lengths are arbitrary. The WASP family protein inventories of the species have been plotted to the tree with WASP families given in colour-coded boxes. White boxes indicate losses of WASP family proteins in the Entamoeba lineage.

WAML homologs. WAML proteins contain an N-term- might be necessary to specifically form the distinct cel- inal IMD (IRSp53-MIM homology domain) domain, a lular architectures. The WAWH homologs in Enta- WAID domain, a proline-rich region, and a C-terminal moeba could be responsible for ARP2/3 dependent VCA domain. The VCA domain of WAML might acti- processes not located at the cell surface. vate the ARP2/3 complex in a similar way as the VCA All fungi that have been sequenced so far have lost the domains of WASP and WAVE protein. Because the WASH homolog (Figure 15). In addition, the Dikarya Rho-GTPase binding domain is missing in WAML there (containing the Ascomycotes and Basidiomycotes) and must be a different way to release the VCA domain Mucoromycotina also lost the WAVE homolog. The from the intramolecular binding to the WAID domain. ancestors of the Blastocladiomycota and Mucoromyco- What might be the function of the IMD domain in tina have duplicated the WASP gene independently of WAML proteins? N-WASP is known to be essential for each other. Some of the Basidiomycotes also encode an the formation of distinct cellular architectures such as S-WASP (short WASP, ending shortly behind the endocytic vesicles, filopodia and podosomes/invadopodia WAID domain) suggesting that the S-WASP must have [7]. These structures are formed by the action of been invented very early in Basidiomycote evolution and recently identified classes of membrane-deforming pro- subsequently been lost in most Basidiomycotes. The teins, which bind to phospholipids and deform mem- Microsporidia do not contain an ARP2/3 complex and branes to curved surfaces [56,57]. These proteins group thus do not encode any protein of the WASP family. into three subfamilies containing either a BAR domain, The missing WASP protein family complexity in Asco- an EFC/F-BAR domain, or an RCB/IMD domain [58]. mycotes and Basidiomycotes compared to the other The SH3 domains of these membrane-deforming pro- fungi lineages, plants, amoebae, and metazoans might be teins can interact with the WASP and WAVE proteins, compensated by other proteins. For example, coronins recruit these to the membrane and thus shape unique [61,62] and the class-1 myosins [63,64] contain Ascomy- membrane-cytoskeleton architectures. In addition, IMD cote- and Basidiomycote-specific additions of CA domains but not BAR domains have also been shown to domains to their protein domain organisation through bind to and bundle actin filaments [32,59,60]. The which they can bind and regulate the ARP2/3 complex. WAML proteins might represent a fusion of the mem- Most of the invertebrate species analyzed here contain brane-deforming function of the BAR/IMD proteins, the one of each of the WASP, WAVE, and WASH homo- actin-bundling function of the IMD proteins, and the logs (Figure 16). In a few cases, one of these proteins ARP2/3 activating function of the WASP/WAVE pro- has been duplicated in a species-specific manner or in teins. The high number of different WAML homologs recently separated branches. Surprisingly, the WHAMM Kollmar et al. BMC Research Notes 2012, 5:88 Page 19 of 23 http://www.biomedcentral.com/1756-0500/5/88

Metazoa Microsporidia Batrachochytrium dendrobatidis WASP WAVE

Dikarya (Ascomycota, Basidiomycota) WAVE WASP some Basidiomycotes encode 2 WASP homologs and 1 or 2 WASP S-WASPs ARP2/3 Chytridiomycota WAVE Allomyces macrogynus WASP WASP WAVE

Blastocladiomycota Mucor circinelloides Mucoromycotina WASH WAVE WASP WASP WASP Fungi Phycomyces blakesleeanus WASP WASP

Rhizopus arrhizus WASP WASP Figure 15 Evolution of the WASP family proteins in fungi. Schematic representation of the phylogenetic tree of representative fungi branches and species. Branch lengths are arbitrary. The WASP family protein inventories of certain branches and some specific species have been plotted to the tree with WASP families given in colour-coded boxes. White boxes indicate losses of the ARP2/3 complex and WASP family proteins in the respective lineages. protein has been invented very early in metazoan evolu- of the CA domains these WASP homologs should not tion, at the onset of the eumetazoans, which is based on be able to bind and activate the ARP2/3 complex. Simi- the identification of a WHAMM homolog in the Hydro- lar to WASP, the encode two homologs of the zoa Clytia hemisphaerica. However, most of the inverte- WHAMM proteins, a class-1 and a class-2 WHAMM, brate species sequenced so far have subsequently lost and several fish contain further duplicates of the class-2 the WHAMM homolog. In addition, the Trematoda WHAMM proteins. Thus, during or shortly after the Schistosoma mansoni and all sequenced Cestoda have two whole-genome duplications at the onset of the ver- lost the WASH homolog. At the origin of the verte- tebrates, the WASH duplicates and two of each of the brates two whole-genome duplication events happened WASP and WHAMM duplicates have been lost. This is [27] followed by another whole-genome duplication in contrast to the WAVE proteins. The vertebrate event in the Actinopterygii branch [28,65]. These dupli- WAVE homologs can be divided into four classes, of cations are, at least in part, represented in the WASP which the class-1 and class-2 WAVE proteins are more family protein inventory of the sequenced vertebrate closely related to each other as are the class-3 and class- species. WASH is the only WASP family protein that 4 WAVE proteins than these two groups of classes are has not been duplicated, although some WASH pseudo- related. This could best be explained by two subsequent genes could be identified in primate genomes. WASP gene-duplications. The fish have further WAVE dupli- has been duplicated resulting in one of each WASP and cates in some classes that are the result of their addi- N-WASP in tetrapods, and several further duplications tional whole-genome duplication. The Eutheria have lost of both in fish. Some of the sequenced fish in addition the class-4 WAVEs. Instead, many Eutheria encode one contain a short WASP (S-WASP) that is characterised to seven further WAVE homologs, class-5 WAVEs. by a very short poly-proline region and the absence of Most of them are, however, most likely pseudogenes the basic and the CA domains. Because of the absence because of frame-shifts, in-frame stop-codons, missing Kollmar et al. BMC Research Notes 2012, 5:88 Page 20 of 23 http://www.biomedcentral.com/1756-0500/5/88

Gallus gallus Mammalia Monodelphis domestica WAVE1 WAVE2 WAVE3 WAVE4 WAVE1 WAVE2 WAVE3 WAVE5 WAVE1 WAVE2 WAVE3 WAVE4 WHAMM1 WHAMM2 WASH WHAMM1 WHAMM2 N-WASP WASH WHAMM1 WHAMM2 WASH WASP N-WASP Anolis carolinensis WASP WASP N-WASP Aves WAVE1 WAVE2 WAVE3 WAVE4 Eutheria Gasterosteus aculeatus WHAMM1 WHAMM2 WASH WAVE4 Metatheria WAVE1 WAVE2 WAVE3 WAVE3 WAVE4 WASP N-WASP WAWH Sauropsida WHAMM1 WHAMM2 WHAMM2 WASH Mammalia WASP N-WASP N-WASP Xenopus tropicalis Brachydanio rerio WAVE1 WAVE2 WAVE3 WAVE4 Amphibia Amniota WAVE1 WAVE2 WAVE3 WAVE3 WAVE4 WAVE4 WHAMM1 WHAMM2 WASH WHAMM1 WHAMM2 WASH WASP N-WASP Actinopterygii Tetrapoda S-WASP WASP WASP N-WASP N-WASP S-WASP WASP WASH WAVE Branchiostoma floridae Oikopleura dioica Vertebrata Hexapoda N-WASP WHAMM WASP WASH WAVE WASP WAVE Tunicata WASH Cephalochordata WASP WASH WAVE Pancrustacea WHAMM Strongylocentrotus purpuratus WHAMM WASP WASH Chordata WAVE WAVE Echinodermata Ixodes scapularis WHAMM Hemichordata Saccoglossus kowalevski WHAMM WASP WASH WAVE Arachnida Arthropoda WHAMM WASP WASP WASH WAVE

Nematoda Deuterostomia Schmidtea mediterranea WHAMM Ecdysozoa WASP WASH WAVE aria WASP WASP WASH WAVE Turbell Schistosoma mansoni Lottia gigantea Mollusca Protostomia Trematoda WASH WASP WAVE WHAMM WASP WASH WAVE Lophotrochozoa Cestoda Echinococcus multilocularis Annelida Coelomata Acoelomata WASH Capitella sp. WASP WAVE WHAMM WHAMM WASP WASH WAVE WHAMM Nematostella vectensis Anthozoa WHAMM WASP WASH WAVE Helobdella robusta Cnidaria Hydrozoa WASP WASH WAVE WAVE Clytia hemisphaerica (only EST data) WASP WHAMM

Trichoplax adhearens Eumetazoa Amphimedon queenslandica Placozoa WASP WASP WASH WAVE WHAMM Porifera WASP WASH WAVE WAVE

Metazoa Monosiga brevicollis Choanoflagellida WASP WASH WAVE Capsaspora owczarzaki Fungi/Metazoa incertae sedis Holozoa WASP WASH WAVE Figure 16 Evolution of the WASP family proteins in metazoa. Schematic representation of the phylogenetic tree of representative sequenced metazoan species. Branch lengths are arbitrary. The WASP family protein inventories of the species have been plotted to the tree with WASP families given in colour-coded boxes. White boxes indicate losses of WASP family proteins in the respective lineages and species. promoters and stop-codons, and because they would be terminal domains like Ena/VASP (N-terminal WH1 encoded by a single exon in contrast to all other verte- domain), NHS (N-terminal WHD domain), and MIM brate WAVE homologs that are encoded by several (N-terminal IMD domain). Single transcripts are most exons. In addition, most species have completely lost probably a characteristic of the WASP family proteins this WAVE variant. The rat WAVE5A is the only class- because extensive cDNA data is already available. Given 5 WAVE supported by cDNA data. Interestingly, the the recent identification of two new WASP protein sub- lizard Anolis carolinensis also encodes a WAWH homo- families, WASH and WHAMM, and of the two new log based on its domain composition. subfamilies described here, WAML and WAWH, and in the prospect of many upcoming genome sequences the Conclusions WASP protein family is expected to grow further. We could show that most eukaryotes encode a rich repertoire of WASP family proteins to regulate the Methods ARP2/3 dependent remodelling of the actin cytoskele- Identification and annotation of the WASP family genes ton. Surprisingly, alternative splice variants could be WASP family genes have been identified in TBLASTN identified for only a single WASP homolog from the sea searches of the completed or almost completed genomes hare Aplysia californica. This is in strong contrast to of 529 organisms starting with the protein sequences of other actin-binding proteins that have the same N- the human homologs. Species outside the metazoans Kollmar et al. BMC Research Notes 2012, 5:88 Page 21 of 23 http://www.biomedcentral.com/1756-0500/5/88

that were expected to encode very divergent homologs Computing and visualizing the phylogenetic trees were searched with many different homologs and every The phylogenetic trees were built based on the manually single hit showing homology to either the C-terminal constructed and maintained multiple sequence align- domains (WH2, central, and acidic domains), the GBD ments of the concatenated WASP protein family charac- domain, or the N-terminal domains was analysed in teristic N-terminal and VCA (without linker) domains detail. During this search, the new WASP subfamilies using two different methods: 1. Maximum likelihood WAWH and WAML were identified based on the (ML) using the PROTMIXBLOSUM62 amino acid homology of their GBD domains to those from WASP model and bootstrapping (1,000 replicates) using proteins. All hits were manually analysed at the genomic RAxML [68]. 2. Posterior probabilities were generated DNA level. The correct coding sequences were identi- using MrBayes v3.1.2 [69]. Two independent runs with fied with the help of the multiple sequence alignment of 4,000,000 generations, four chains, and a random start- the respective protein family. As the amount of ing tree were computed using the mixed amino-acid sequences increased (especially the number of sequences option. From the 1,000th generation MrBayes exclu- of taxa with few representatives), many of the initially sively used the JTT model [70] in both runs. Trees were predicted sequences were reanalysed to correctly identify sampled every 1,000th generation and the first 100,000 all exon borders. Where possible, EST data has been of the trees were discarded as “burn-in” before generat- analysed to help in the annotation process. In addition ing a consensus tree. Phylogenetic trees were visualized to the analysis of these large-scale sequencing projects, with FigTree (http://tree.bio.ed.ac.uk/software/figtree). all WASP family genes in the “nr” database at NCBI have been collected and reanalysed. All sequence related Availability of supporting data data (names, corresponding species, GenBank ID’s, alter- The data sets supporting the results of this article are native names, corresponding reference publications, included within the article (and its additional files). domain predictions, and sequences) and references to genome sequencing centers are available through the Additional material CyMoBase (http://www.cymobase.org, [66]). Additional file 1: Zip archive of the phylogenetic trees and Generating the multiple sequence alignment sequence alignments. The file includes all phylogenetic trees of the WASP family proteins. The sequence alignments of the proteins are ThemultiplesequencealignmentsoftheWASPfamily included in fasta format. proteins have been built and extended during the pro- cess of annotating and assembling new sequences. The initial alignments have been generated from the first 10 Abbreviations non-validated sequences obtained from NCBI using the CRIB: Cdc42/Rac interactive binding; GDB: GTPase binding domain; JMY: ClustalW software with standard settings [67]. During Junction-; ediating and regulatory protein; NHS: Nance-Horan syndrome; the following correction of the sequences (removing VASP: Vasodilator-stimulated phosphoprotein; WAHD: WASH homology domain; WAID: WASP autoinhibitory domain; WAML: WASP and MIM like; wrongly annotated sequences and filling gaps) the align- WASH: Wiskott-Aldrich syndrome protein and Scar homolog; WASP: Wiskott- ment has been adjusted manually. Subsequently, every Aldrich syndrome protein; WAVE: WASP-family verprolin-homologous newly predicted sequence has been preliminary aligned protein; WAWH: WASP without WH1 domain; WH1: WASP homology domain 1; WH2: WASP homology domain 2; WHAMM: WASP homolog associated to its supposed closest relative using ClustalW, the with actin, membranes, and microtubules; WHD: WAVE homology domain, aligned sequence added to the multiple sequence align- WIP: WASP-interacting protein; WMD: WHAMM membrane interaction ment of the respective WASP family, and the alignment domain. adjusted manually during the subsequent sequence vali- Acknowledgements dation process. Still, many gaps in sequences derived We thank the many sequencing centers and funding agencies for making from low-coverage genomes remained. In those cases, unpublished sequence data available. Also, we want to thank Dr. Florian Odronitz and Björn Hammesfahr for their help with maintaining and the integrity of the exons surrounding the gaps has been developing CyMoBase. maintained (gaps in the genomic sequence are reflected as gaps in the multiple sequence alignment). The unique Authors’ contributions MK assembled and annotated sequences, did all data analysis and wrote the poly-proline regions are completely divergent in manuscript. DL assembled part of the WASP sequences. SE assembled part sequence and length and could therefore only be aligned of the WAVE sequences. All authors read and approved the final version of manually. The VCA domains are interrupted by linkers the manuscript. of different lengths, especially in WASH proteins, and Competing interests were therefore also aligned manually. The sequence The authors declare that they have no competing interests. alignments of the WASP family proteins can be Received: 3 November 2011 Accepted: 8 February 2012 obtained from CyMoBase, and are also included in Published: 8 February 2012 Additional file 1. Kollmar et al. BMC Research Notes 2012, 5:88 Page 22 of 23 http://www.biomedcentral.com/1756-0500/5/88

References 28. Steinke D, Hoegg S, Brinkmann H, Meyer A: Three rounds (1R/2R/3R) of 1. Le Clainche C, Carlier MF: Regulation of actin assembly associated with genome duplications and the evolution of the glycolytic pathway in protrusion and adhesion in cell migration. Physiol Rev 2008, 88(2):489-513. vertebrates. BMC Biol 2006, 4:16. 2. Chhabra ES, Higgs HN: The many faces of actin: matching assembly 29. Padrick SB, Rosen MK: Physical mechanisms of signal integration by factors with cellular structures. Nat Cell Biol 2007, 9(10):1110-1121. WASP family proteins. Annu Rev Biochem 2010, 79:707-735. 3. Firat-Karalar EN, Welch MD: New mechanisms and functions of actin 30. Rottner K, Hanisch J, Campellone KG: WASH, WHAMM and JMY: regulation nucleation. Curr Opin Cell Biol 2011, 23(1):4-13. of Arp2/3 complex and beyond. Trends Cell Biol 2010, 20(11):650-661. 4. Campellone KG, Welch MD: A nucleator arms race: cellular control of 31. Dominguez R: Actin filament nucleation and elongation factors-structure- actin assembly. Nat Rev Mol Cell Biol 2010, 11(4):237-251. function relationships. Crit Rev Biochem Mol Biol 2009, 44(6):351-366. 5. Pollard TD: Regulation of actin filament assembly by Arp2/3 complex 32. Lee SH, Kerff F, Chereau D, Ferron F, Klug A, Dominguez R: Structural basis and formins. Annu Rev Biophys Biomol Struct 2007, 36:451-477. for the actin-binding function of missing-in-metastasis. Structure 2007, 6. Chesarone MA, Goode BL: Actin nucleation and elongation factors: 15(2):145-155. mechanisms and interplay. Curr Opin Cell Biol 2009, 21(1):28-37. 33. Dominguez R: The beta-thymosin/WH2 fold: multifunctionality and 7. Takenawa T, Suetsugu S: The WASP-WAVE protein network: connecting structure. Ann N Y Acad Sci 2007, 1112:86-94. the membrane to the cytoskeleton. Nat Rev Mol Cell Biol 2007, 8(1):37-48. 34. Le Page Y, Demay F, Salbert G: A neural-specific splicing event generates 8. Goley ED, Welch MD: The ARP2/3 complex: an actin nucleator comes of an active form of the Wiskott-Aldrich syndrome protein. EMBO Rep 2004, age. Nat Rev Mol Cell Biol 2006, 7(10):713-726. 5(9):895-900. 9. Ochs HD, Filipovich AH, Veys P, Cowan MJ, Kapoor N: Wiskott-Aldrich 35. Volkman BF, Prehoda KE, Scott JA, Peterson FC, Lim WA: Structure of the syndrome: diagnosis, clinical and laboratory manifestations, and N-WASP EVH1 domain-WIP complex: insight into the molecular basis of treatment. Biol Blood Marrow Transplant 2008, 15(1 Suppl):84-90. Wiskott-Aldrich Syndrome. Cell 2002, 111(4):565-576. 10. Notarangelo LD, Miao CH, Ochs HD: Wiskott-Aldrich syndrome. Curr Opin 36. Shiraishi-Yamaguchi Y, Furuichi T: The Homer family proteins. Genome Biol Hematol 2008, 15(1):30-36. 2007, 8(2):206. 11. Kirchhausen T, Rosen FS: Disease mechanism: unravelling Wiskott-Aldrich 37. Brooks SP, Coccia M, Tang HR, Kanuga N, Machesky LM, Bailly M, syndrome. Curr Biol 1996, 6(6):676-678. Cheetham ME, Hardcastle AJ: The Nance-Horan syndrome protein 12. Bear JE, Rawls JF, Saxe CL: SCAR, a WASP-related protein, isolated as a encodes a functional WAVE homology domain (WHD) and is important suppressor of receptor defects in late Dictyostelium development. J Cell for co-ordinating actin remodelling and maintaining cell morphology. Biol 1998, 142(5):1325-1335. Hum Mol Genet 2010, 19(12):2421-2432. 13. Miki H, Suetsugu S, Takenawa T: WAVE, a novel WASP-family protein 38. Millard TH, Sharp SJ, Machesky LM: Signalling to actin assembly via the involved in actin reorganization induced by Rac. EMBO J 1998, WASP (Wiskott-Aldrich syndrome protein)-family proteins and the Arp2/ 17(23):6932-6941. 3 complex. Biochem J 2004, 380(Pt 1):1-17. 14. Kim AS, Kakalis LT, Abdul-Manan N, Liu GA, Rosen MK: Autoinhibition and 39. Kurisu S, Takenawa T: The WASP and WAVE family proteins. Genome Biol activation mechanisms of the Wiskott-Aldrich syndrome protein. Nature 2009, 10(6):226. 2000, 404(6774):151-158. 40. Burdon KP, McKay JD, Sale MM, Russell-Eggitt IM, Mackey DA, Wirth MG, 15. de la Fuente MA, Sasahara Y, Calamito M, Anton IM, Elkhal A, Gallego MD, Elder JE, Nicoll A, Clarke MP, FitzGerald LM, et al: Mutations in a novel Suresh K, Siminovitch K, Ochs HD, Anderson KC, et al: WIP is a chaperone gene, NHS, cause the pleiotropic effects of Nance-Horan syndrome, for Wiskott-Aldrich syndrome protein (WASP). Proc Natl Acad Sci USA including severe congenital cataract, dental anomalies, and mental 2007, 104(3):926-931. retardation. Am J Hum Genet 2003, 73(5):1120-1130. 16. Ramesh N, Geha R: Recent advances in the biology of WASP and WIP. 41. Innocenti M, Zucconi A, Disanza A, Frittoli E, Areces LB, Steffen A, Stradal TE, Immunol Res 2009, 44(1-3):99-111. Di Fiore PP, Carlier MF, Scita G: Abi1 is essential for the formation and 17. Stovold CF, Millard TH, Machesky LM: Inclusion of Scar/WAVE3 in a similar activation of a WAVE2 signalling complex. Nat Cell Biol 2004, 6(4):319-327. complex to Scar/WAVE1 and 2. BMC Cell Biol 2005, 6(1):11. 42. Sayers EW, Barrett T, Benson DA, Bolton E, Bryant SH, Canese K, 18. Eden S, Rohatgi R, Podtelejnikov AV, Mann M, Kirschner MW: Mechanism of Chetvernin V, Church DM, DiCuccio M, Federhen S, et al: Database regulation of WAVE1-induced actin nucleation by Rac1 and Nck. Nature resources of the National Center for Biotechnology Information. Nucleic 2002, 418(6899):790-793. Acids Res 2011, , 39 Database: D38-D51. 19. Linardopoulou EV, Parghi SS, Friedman C, Osborn GE, Parkhurst SM, 43. Parfrey LW, Grant J, Tekle YI, Lasek-Nesselquist E, Morrison HG, Sogin ML, Trask BJ: Human subtelomeric WASH genes encode a new subclass of Patterson DJ, Katz LA: Broadly sampled multigene analyses yield a well- the WASP family. PLoS Genet 2007, 3(12):e237. resolved eukaryotic tree of life. Syst Biol 2010, 59(5):518-533. 20. Campellone KG, Webb NJ, Znameroski EA, Welch MD: WHAMM is an Arp2/ 44. Burki F, Shalchian-Tabrizi K, Minge M, Skjaeveland A, Nikolaev SI, 3 complex activator that binds microtubules and functions in ER to Jakobsen KS, Pawlowski J: Phylogenomics reshuffles the eukaryotic Golgi transport. Cell 2008, 134(1):148-161. supergroups. PLoS One 2007, 2(8):e790. 21. Zuchero JB, Coutts AS, Quinlan ME, Thangue NB, Mullins RD: p53-cofactor 45. Nozaki H, Maruyama S, Matsuzaki M, Nakada T, Kato S, Misawa K: JMY is a multifunctional actin nucleation factor. Nat Cell Biol 2009, Phylogenetic positions of Glaucophyta, green plants (Archaeplastida) 11(4):451-459. and Haptophyta (Chromalveolata) as deduced from slowly evolving 22. Veltman DM, Insall RH: WASP family proteins: their evolution and its nuclear genes. Mol Phylogenet Evol 2009, 53(3):872-880. physiological implications. Mol Biol Cell 2010, 21(16):2880-2893. 46. Keeling PJ: Chromalveolates and the evolution of plastids by secondary 23. Jia D, Gomez TS, Metlagel Z, Umetani J, Otwinowski Z, Rosen MK, endosymbiosis. J Eukaryot Microbiol 2009, 56(1):1-8. Billadeau DD: WASH and WAVE actin regulators of the Wiskott-Aldrich 47. Hackett JD, Yoon HS, Li S, Reyes-Prieto A, Rummele SE, Bhattacharya D: syndrome protein (WASP) family are controlled by analogous Phylogenomic analysis supports the monophyly of cryptophytes and structurally related complexes. Proc Natl Acad Sci USA 2010, haptophytes and the association of rhizaria with chromalveolates. Mol 107(23):10442-10447. Biol Evol 2007, 24(8):1702-1713. 24. Odronitz F, Becker S, Kollmar M: Reconstructing the phylogeny of 21 48. Reeb VC, Peglar MT, Yoon HS, Bai JR, Wu M, Shiu P, Grafenberg JL, Reyes- completely sequenced arthropod species based on their motor proteins. Prieto A, Rummele SE, Gross J, et al: Interrelationships of chromalveolates BMC Genomics 2009, 10:173. within a broadly sampled tree of photosynthetic protists. Mol Phylogenet 25. Odronitz F, Kollmar M: Drawing the tree of eukaryotic life based on the Evol 2009, 53(1):202-211. analysis of 2,269 manually annotated myosins from 328 species. Genome 49. Hampl V, Hug L, Leigh JW, Dacks JB, Lang BF, Simpson AG, Roger AJ: Biol 2007, 8(9):R196. Phylogenomic analyses support the monophyly of Excavata and resolve 26. Kurtzman CP: Phylogeny of the ascomycetous yeasts and the renaming relationships among eukaryotic “supergroups”. Proc Natl Acad Sci USA of Pichia anomala to Wickerhamomyces anomalus. Antonie Van 2009, 106(10):3859-3864. Leeuwenhoek 2011, 99(1):13-23. 50. Burki F, Shalchian-Tabrizi K, Pawlowski J: Phylogenomics reveals a new 27. Van de Peer Y, Maere S, Meyer A: 2R or not 2R is not the question ‘megagroup’ including most photosynthetic eukaryotes. Biol Lett 2008, anymore. Nat Rev Genet 2010, 11(2):166. 4(4):366-369. Kollmar et al. BMC Research Notes 2012, 5:88 Page 23 of 23 http://www.biomedcentral.com/1756-0500/5/88

51. Keeling PJ: The endosymbiotic origin, diversification and fate of plastids. Philos Trans R Soc Lond B Biol Sci 2010, 365(1541):729-748. 52. Paps J, Baguna J, Riutort M: Bilaterian phylogeny: a broad sampling of 13 nuclear genes provides a new Lophotrochozoa phylogeny and supports a paraphyletic basal acoelomorpha. Mol Biol Evol 2009, 26(10):2397-2406. 53. Bourlat SJ, Nielsen C, Economou AD, Telford MJ: Testing the new phylogeny: a phylum level molecular analysis of the animal kingdom. Mol Phylogenet Evol 2008, 49(1):23-31. 54. Simpson AG, Inagaki Y, Roger AJ: Comprehensive multigene phylogenies of excavate protists reveal the evolutionary positions of “primitive” eukaryotes. Mol Biol Evol 2006, 23(3):615-625. 55. Deeks MJ, Kaloriti D, Davies B, Malho R, Hussey PJ: Arabidopsis NAP1 is essential for Arp2/3-dependent trichome morphogenesis. Curr Biol 2004, 14(15):1410-1414. 56. Peter BJ, Kent HM, Mills IG, Vallis Y, Butler PJ, Evans PR, McMahon HT: BAR domains as sensors of membrane curvature: the amphiphysin BAR structure. Science 2004, 303(5657):495-499. 57. Itoh T, De Camilli P: BAR, F-BAR (EFC) and ENTH/ANTH domains in the regulation of membrane-cytosol interfaces and membrane curvature. Biochim Biophys Acta 2006, 1761(8):897-912. 58. Frost A, Unger VM, De Camilli P: The BAR domain superfamily: membrane-molding macromolecules. Cell 2009, 137(2):191-196. 59. Millard TH, Bompard G, Heung MY, Dafforn TR, Scott DJ, Machesky LM, Futterer K: Structural basis of filopodia formation induced by the IRSp53/ MIM homology domain of human IRSp53. EMBO J 2005, 24(2):240-250. 60. Yamagishi A, Masuda M, Ohki T, Onishi H, Mochizuki N: A novel actin bundling/filopodium-forming domain conserved in insulin receptor tyrosine kinase substrate p53 and missing in metastasis protein. J Biol Chem 2004, 279(15):14929-14936. 61. Liu SL, Needham KM, May JR, Nolen BJ: Mechanism of a concentration- dependent switch between activation and inhibition of Arp2/3 complex by coronin. J Biol Chem 2011, 286(19):17039-17046. 62. Eckert C, Hammesfahr B, Kollmar M: A holistic phylogeny of the coronin gene family reveals an ancient origin of the tandem-coronin, defines a new subfamily, and predicts protein function. BMC Evol Biol 2011, 11:268. 63. Galletta BJ, Chuang DY, Cooper JA: Distinct roles for Arp2/3 regulators in actin assembly and endocytosis. PLoS Biol 2008, 6:(1):e1. 64. Sirotkin V, Beltzner CC, Marchand JB, Pollard TD: Interactions of WASp, myosin-I, and verprolin with Arp2/3 complex during actin patch assembly in fission yeast. J Cell Biol 2005, 170(4):637-648. 65. Jaillon O, Aury JM, Brunet F, Petit JL, Stange-Thomann N, Mauceli E, Bouneau L, Fischer C, Ozouf-Costaz C, Bernot A, et al: Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature 2004, 431(7011):946-957. 66. Odronitz F, Kollmar M: Pfarao: a web application for protein family analysis customized for cytoskeletal and motor proteins (CyMoBase). BMC Genomics 2006, 7:300. 67. Thompson JD, Gibson TJ, Higgins DG: Multiple sequence alignment using ClustalW and ClustalX. Curr Protoc Bioinformatics 2002, Chapter 2, Unit 2.3. 68. Stamatakis A, Hoover P, Rougemont J: A rapid bootstrap algorithm for the RAxML Web servers. Syst Biol 2008, 57(5):758-771. 69. Ronquist F, Huelsenbeck JP: MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 2003, 19(12):1572-1574. 70. Jones DT, Taylor WR, Thornton JM: The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci 1992, 8(3):275-282.

doi:10.1186/1756-0500-5-88 Cite this article as: Kollmar et al.: Evolution of the eukaryotic ARP2/3 activators of the WASP family: WASP, WAVE, WASH, and WHAMM, and Submit your next manuscript to BioMed Central the proposed new family members WAWH and WAML. BMC Research Notes 2012 5:88. and take full advantage of:

• Convenient online submission • Thorough peer review • No space constraints or color figure charges • Immediate publication on acceptance • Inclusion in PubMed, CAS, Scopus and Google Scholar • Research which is freely available for redistribution

Submit your manuscript at www.biomedcentral.com/submit