<<

The Evolutionary Landscape of Dbl-Like RhoGEF Families: Adapting Eukaryotic Cells to Environmental Signals Philippe Fort, Anne Blangy

To cite this version:

Philippe Fort, Anne Blangy. The Evolutionary Landscape of Dbl-Like RhoGEF Families: Adapting Eukaryotic Cells to Environmental Signals. Genome Biology and Evolution, Society for Molecular Biology and Evolution, 2017, 9 (6), pp.1471-1486. 10.1093/gbe/evx100. hal-02267212

HAL Id: hal-02267212 https://hal.archives-ouvertes.fr/hal-02267212 Submitted on 24 Jun 2020

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Distributed under a Creative Commons Attribution-NonCommercial 4.0 International License

The Evolutionary Landscape of Dbl-Like RhoGEF Families: Adapting Eukaryotic Cells to Environmental Signals

Philippe Fort1,2,* and Anne Blangy1,2
1CRBM, Université of Montpellier, France
2CNRS, UMR5237, Montpellier, France

*Corresponding author: E-mail: [email protected].
Accepted: May 24, 2017

Abstract

The dynamics of cell morphology in eukaryotes is largely controlled by small GTPases of the Rho family. Rho GTPases are activated by guanine nucleotide exchange factors (RhoGEFs), of which diffuse B-cell lymphoma (Dbl)-like members form the largest family. Here, we surveyed Dbl-like sequences from 175 eukaryotic genomes and illuminate how the Dbl family evolved in all eukaryotic supergroups. By combining probabilistic phylogenetic approaches and functional analysis, we show that the human Dbl-like family is made of 71 members, structured into 20 subfamilies. The 71 members were already present in ancestral jawed vertebrates, but several members were subsequently lost in specific clades, up to 12% in birds. The jawed vertebrate repertoire was established from two rounds of duplications that occurred between tunicates, cyclostomes, and jawed vertebrates. Duplicated members showed distinct tissue distributions, conserved at least in Amniotes. All 20 subfamilies have members in Deuterostomia and Protostomia. Nineteen subfamilies are present in Porifera, the first phylum that diverged in Metazoa, 14 in Choanoflagellida and Filasterea, single-celled organisms closely related to Metazoa, and three in Fungi, the sister clade to Metazoa. Other eukaryotic supergroups show an extraordinary variability of Dbl-like repertoires as a result of repeated and independent gain and loss events. Last, we observed that in Metazoa, the number of Dbl-like RhoGEFs varies in proportion to cell signaling complexity. Overall, our analysis supports the conclusion that Dbl-like RhoGEFs were present at the origin of eukaryotes and evolved as highly adaptive cell signaling mediators.

Key words: Dbl, guanine nucleotide exchange factors, Rho GTPases, cell signaling.

Introduction

The Ras-like superfamily is made of 167 proteins in human, which are distributed into five major families (Arf/Sar, Rab, Ran, Ras, Rho) and are highly conserved across evolution. Orthologs generally share 65–85% amino-acid sequence similarity, even between distantly related clades (Rojas et al. 2012). The extreme conservation is in keeping with their roles in basic cellular functions such as endo/exocytosis, F-actin dynamics, vesicular and nucleo-cytoplasmic trafficking. Ras-like proteins biochemically act as binary signaling switches that rely on structural changes between their GDP-bound and GTP-bound conformations (Bosco et al. 2009; Raimondi et al. 2011). Ras-like GTPases undergo activation and inactivation steps. Activation is promoted by guanine nucleotide exchange factors (GEFs), which reduce affinity of the GTPase for nucleotides thereby allowing entry of GTP, which is more abundant than GDP in the cytosol. When bound to GTP, Ras-like proteins activate a set of downstream effector proteins that mediate their cellular effects. Inactivation of Ras-like GTPases is controlled by GTPase activating proteins (GAPs), which stimulate intrinsic GTPase activity and thus favor the inactive GDP-bound form.

Members of the Rho family control F-actin dependent reorganization of the cell membrane and associated intracellular macromolecular scaffolds. Consequently, they play major roles in cell adhesion, polarity and locomotion processes. The human genome encodes 20 Rho GTPases, including the well-studied Rac1, Cdc42, and RhoA (Boureux et al. 2007), as well as 82 RhoGEFs. The RhoGEFs can be divided into two families. There are 11 DOCK (Dedicator Of CytoKinesis)-related proteins (Meller et al. 2005) and the remaining 71 form the Dbl-like family, due to their similarity to the Dbl (diffuse B-cell lymphoma) protein, which is an oncogenic protein that activates the Cdc42 Rho GTPase (Eva and Aaronson 1985; Hart et al. 1991). Dbl-like RhoGEFs all share a 170–190 amino acid Dbl Homology (DH) domain, which is responsible for the guanine nucleotide exchange activity on Rho GTPases (Jaiswal et al. 2013; Rossman et al. 2005). Humans also have

© The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected]

Genome Biol. Evol. 9(6):1471–1486. doi:10.1093/gbe/evx100 Advance Access publication May 25, 2017

70 RhoGAPs (Amin et al. 2016; Tcherkezian and Lamarche-Vane 2007) and over 70 effectors (Bustelo et al. 2007). The Rho signaling module is thus a complex regulatory network that includes over 240 proteins.

Only a few Dbl-like members have been functionally studied in model organisms like Drosophila (Sone et al. 1997), Caenorhabditis elegans (Steven et al. 2005), yeast (Hart et al. 1991) and Dictyostelium discoideum (Vlahou and Rivero 2006) since the late 1990s. Most Dbl-like RhoGEFs have a high diversity of functional domains, in contrast with DOCK RhoGEFs that have a conserved core organization, made of DHR1/C2 and DHR2 domains, either alone or associated with single SH3 or PH domains (Meller et al. 2005). In addition to the DH domain, Dbl RhoGEFs contain domains that either mediate interaction with membranes, proteins or phosphorylated amino acids (e.g., C1, SH2, SH3, PDZ, PH domains) or that have diverse enzymatic activities (e.g., kinases, phosphatases, GEF or GAP).

The physiological importance of cell adhesion and locomotion, combined with the complexity of the Rho signaling network, raises the issue of when this network emerged and how it evolved in eukaryotic taxa, in particular in relation with multi-cellularity. We previously reported that the Rho GTPase family was already present as Rac-like proteins in the Last Eukaryotic Common Ancestor (LECA; Boureux et al. 2007), that is, 1.7–2.3 billion years ago (Hedges et al. 2004; Parfrey et al. 2011). The complexity of the Rho family remained at a low level in unicellular eukaryotes, fungi and ancestral metazoa, which have a minimal Rho family repertoire, comprising just Rac, Cdc42, and RhoA. The Rho family expanded in Metazoa (i.e., around 700 million years ago [Ma]), as a consequence of duplications and lateral gene transfers (LGTs). In contrast, little is known about the evolutionary history of the Dbl-like RhoGEF family, in particular how members are related to each other within and between eukaryotic clades and how the family evolved in terms of diversity, gain/loss events, and domain organization.

Here we performed a comprehensive analysis of Dbl-like RhoGEF sequences from all eukaryotic supergroups. In most eukaryotic clades, several species were examined, thus reducing the impact of incomplete assemblies on RhoGEF identification. Using annotation and phylogenetic tools, we trace the history of Dbl-like RhoGEF proteins back from extant species to the LECA and reveal a much higher plasticity of Dbl-like repertoires compared with Rho families.

Materials and Methods

Genomes and Annotated Sequences

Most sequences were retrieved from the NCBI annotated databases (nr and EST, http://www.ncbi.nlm.nih.gov), using NCBI PHI-BLAST as well as BLAST and Annotation search tools available in the Geneious 9.1.6 software package (Biomatters, http://www.geneious.com/). For specific searches, additional genome browsers were used (see supplementary table S6, Supplementary Material online). Protein sequences derived from genomes lacking annotations were annotated by searching Pfam, CDD or SMART domain databases using the InterProScan tool, integrated in the Geneious software. The InterProScan tool is freely available on the InterPro resource (http://www.ebi.ac.uk/interpro/).

Sequence Alignments

Amino acid sequences were aligned using MAFFT v7.017, available in the Geneious 9.1.6 package (Katoh et al. 2002). For human Dbl RhoGEFs, alignments were confirmed by using MUSCLE (Edgar 2004) and Promals3D (Pei et al. 2008; http://prodata.swmed.edu/promals3d/promals3d.php), using the ARHGEF7 (PDB: 1by1) and ARHGEF3 (PDB: 2z0q) X-ray 3D structures as guides. MSAs were manually edited for minor corrections and processed by BMGE (Block Mapping and Gathering with Entropy, Criscuolo and Gribaldo 2010) with a 0.6 cut-off value.

Phylogenetic Analyses

Phylogenetic trees were estimated by two probabilistic methods, that is, maximum likelihood (ML; PhyML, Guindon and Gascuel 2003) and Bayesian analysis (MrBayes, Ronquist et al. 2012), as implemented in Geneious. ML returns the topology that maximizes the likelihood function and estimates node support (i.e., robustness of the topology) by nonparametric bootstrapping. MrBayes samples trees according to their posterior probability (PP) and directly measures node support. Models for amino acid substitution were chosen using ProtTest (Abascal et al. 2005). In most analyses, the best-fitting model was LG + I + G. PhyML was set up using the gamma shape and proportion of invariable site parameters produced by ProtTest. ML trees were optimized for topology, length and rate and were generated using the best of nearest-neighbor interchange and subtree-pruning-regrafting tree search algorithms, with 100 bootstrap replicates.

MrBayes consensus trees were generated after two independent runs of four Markov chains for 1,100,000 generations sampled every 200 generations, with sampled trees from the first 100,000 generations discarded as “burn-in”. At the end of each run, average standard deviations of split frequencies were below 0.01 and the estimated sample sizes (ESS) were above 200 for all sampled parameters. Minimum ESS values were 787.01 (fig. 1A), 247.52 (fig. 3A), 313.87 (fig. 3B), 298.59 (fig. 3C), 347.39 (fig. 3D), 428.16 (fig. 6C left panel), and 591.28 (fig. 6C right panel). For each analysis, 50% majority-rule consensus trees and associated clade PPs were computed from sample trees. Trees were visualized and exported as PDF files with FigTree (v1.4.2, http://tree.bio.ed.ac.uk/software/figtree/) then assembled in Adobe Illustrator. PP values below 0.95 are considered unreliable for topological reconstruction


[Figure 1. (A) Cladogram of the DH domains of the human Dbl-like RhoGEFs; marked nodes have PP > 0.95 and BS > 75. (B) Domain organization of the 20 subfamilies (scale bar: 300 a.a.): 1. DNMBP; 2. ARHGEF7; 3. VAV; 4. TIAM; 5. ARHGEF4; 6. SOS; 7. PLEKHG1; 8. TRIO; 9. NGEF; 10. PLEKHG5; 11. ITSN; 12. NET1; 13. ARHGEF1; 14. FGD; 15. FARP; 16. ECT2; 17. ARHGEF17; 18. RASGRF; 19. BCR; 20. ALS2.]

FIG.1.—The 71 human Dbl-like RhoGEFs cluster into 20 structurally related subfamilies. (A) Phylogenetic cladogram of DH domains. The tree was deduced from multiple sequence alignment and processed by PhyML and MrBayes analysis. RhoGEFs subfamilies were delineated from node supports (PP: Posterior probability, BS: Bootstrap proportion). Only highly supported nodes (PP > 0.95 and BS > 75) are indicated (red circle). Numbers indicate subfamilies. (B) Subfamilies cluster members with similar functional domain organization. For each subfamily, the structural domains associated with the catalytic DH domain are indicated. All proteins are drawn at the same scale, except OBSCN. Domains typical of subfamilies are in red. BAR: Bin, Amphiphysin, Rvs; SH3: Src homology 3; CH: Calponin homology; PH: Pleckstrin homology; C1: N-terminal region of PKC; SH2: Src homology 2; UBQ: Ubiquitin; DEP: Dishevelled, Egl10, Pleckstrin; SPEC: Spectrin homology; IG: Immunoglobulin-like; FN: Fibronectin-like; EH: Eps15 homology; C2: Ca2+ binding domain of PKC; PDZ: PSD95, Dlg1, Zo-1; RGS: Regulator of G protein Signaling; FYVE: Fab1, YOTB, ZK632.12, Vac1, EEA1; FERM: Four-point-one, Ezrin, Radixin, Moesin; FA: FERM Adjacent; BRCT: BRCA1 C-Terminus; RCC1: Regulator of Chromosome Condensation 1; MORN: Membrane Occupation and Recognition Nexus.
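The double support criterion used in this legend (PP > 0.95 and BS > 75) can be expressed as a small filter. The following Python sketch is only an illustration: the `Node` class, the node labels, and the support values are hypothetical and are not taken from the published trees.

```python
from dataclasses import dataclass

# Thresholds quoted in the Figure 1 legend.
PP_MIN = 0.95  # Bayesian posterior probability
BS_MIN = 75    # ML bootstrap proportion, in percent


@dataclass
class Node:
    label: str  # hypothetical node name, for illustration only
    pp: float   # posterior probability from the Bayesian analysis
    bs: float   # bootstrap proportion from the ML analysis


def is_highly_supported(node: Node) -> bool:
    """Return True when both supports strictly exceed the legend's cut-offs."""
    return node.pp > PP_MIN and node.bs > BS_MIN


# Made-up support values, not read from the paper's trees.
nodes = [
    Node("node_a", 0.99, 92),  # passes both criteria
    Node("node_b", 0.97, 60),  # bootstrap too low
    Node("node_c", 0.80, 88),  # posterior probability too low
]

supported = [n.label for n in nodes if is_highly_supported(n)]
print(supported)
```

Requiring both criteria is conservative: a node flagged by only one of the two methods is left unmarked.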

(Erixon et al. 2003). Divergence times between taxa were collected from the TimeTree database (Hedges et al. 2006) (http://www.timetree.org/).

Species and Tissue-Wise Gene Expression Analysis

We used mRNA-seq data sets, generated from tissues of various species, as described in (Brawand et al. 2011) and (Barbosa-Morais et al. 2012). As comparison of gene expression levels between species relied on orthology relationships, we established a RhoGEF orthology table (see supplementary table S5, Supplementary Material online) by using Ensembl orthologous gene data (Flicek et al. 2014), which we further refined by phylogenetic analysis. For each species, RhoGEF data were retrieved from global mRNA-seq data and expressed as Reads Per Kilobase of transcript per Million reads mapped (RPKM). RhoGEF mRNA-seq data were then clustered using Cluster 3.0 (http://bonsai.hgc.jp/mdehoon/software/cluster/; de Hoon et al. 2004). Mean centered log10 (RPKM) values were normalized and hierarchically clustered (Euclidian distance, complete method). Cluster heatmaps were created with Java Treeview 1.1.6r4 (Saldanha 2004). For analysis of tissue-specificity RhoGEF expression across species, the Spearman correlation was applied to log2 (RPKM) in the nine species studied. Correlation heatmaps were drawn with Excel 14.0.0.

Results

The Human Dbl RhoGEF Repertoire Was Already Set Up at the Onset of Vertebrates

We mined the available Dbl-like RhoGEF sequences in the human genome and found the whole family includes 71


members (see supplementary table S1, Supplementary Material online). This represents 73 DH domains, since Trio and Kalirin each have two domains arranged in tandem. The amino acid sequences of the 73 DH domains are significantly divergent, sharing only 25 ± 8% identity, suggesting an ancient origin and/or a rapid evolution rate. To gain further insight into Dbl-like RhoGEF ontogeny, we deduced phylogenies by combining MrBayes and maximum-likelihood, two site-based phylogenetic approaches, on a multiple sequence alignment (MSA) of the 73 DH domains. The resulting tree topology has internal branches that are strongly supported by both analytic methods (fig. 1A). Globally, DH domains appear structured into 20 clusters or subfamilies, with just two isolated members (ARHGEF33 and ARHGEF39). Most of the shallow branches (close to the periphery of the tree) contain two members, as a likely result of the whole genome duplication that occurred at the onset of vertebrate radiation. Deeper branches connect more than two RhoGEFs (ten for TRIO, seven for ARHGEF28, six for NGEF and FGD), indicating that these groups resulted from more ancient duplications. The clustering based on DH domain sequence is further supported by the similar functional domain organization shared by members of each cluster (fig. 1B). Ontogeny of the TRIO cluster is complex, because OBSCN, TRIO and KALRN have gained kinases and immunoglobulin (IG)/fibronectin (FN) domains by recombination (see supplementary fig. S1, Supplementary Material online).

Except ARHGEF33 and members of clusters 1 and 17, all Dbl-like RhoGEFs have a PH domain adjacent to the DH domain, suggesting that the two domains represent the ancestral architecture. In support of this, tree topologies deduced from phylogenetic analysis of DH/PH domains (see supplementary fig. S2A, Supplementary Material online) and DH domain (fig. 1A) are highly similar and identify the same clusters. PH domains appear less conserved, because PH-only based topology, although similar, has lower internal node supports for clusters 8, 14–15, and 9–12, and does not group cluster 13 with clusters 9–12 (see supplementary fig. S2B, Supplementary Material online). Note also that ECT2 and ECT2-like, which form a DH-based cluster with low support, failed to group when PH sequences are included in the analysis.

To gain insight in the timing of duplications detected by phylogenetic analyses, we looked for orthologs of human Dbl-like RhoGEFs across Vertebrates (fig. 2). Orthology was calculated by reciprocal BLAST analysis and comparison of structural domains. In all cases, orthologous members had unambiguous BLAST E-values, making it unnecessary to perform phylogenetic analysis. In Primates and Glires (Rodents and Lagomorpha) we identified the full set of 71 human orthologs (fig. 2 and see supplementary table S1, Supplementary Material online). However, all rodent species examined lacked PLEKHG4B and several rodent species like mice lacked MCF2L2 and PLEKHG7 (see supplementary table S2, Supplementary Material online). Very little is known about the physiological functions associated with these genes, except a moderate association of PLEKHG4B and MCF2L2 polymorphisms with type 2 diabetes and associated comorbidity (Raffield et al. 2015; Zheng et al. 2009).

The mammalian Laurasiatheria and Afrotheria share the same RhoGEF repertoire as Primates and Glires (fig. 2, see supplementary table S1, Supplementary Material online). Analysis of two Xenarthra genomes (armadillo and sloth) revealed a nearly complete set of RhoGEFs, missing only PLEKHG4B. Finally, merging analyses of the gray short-tailed opossum and Tasmanian devil genomes (Marsupials) recapitulated the 71 RhoGEF set, whereas the platypus genome (Monotremata) only lacks ALS2CL. We also identified a RhoGEF that is closely related to the drosophila GEF64C in marsupials (Bashaw et al. 2001; see supplementary table S1, Supplementary Material online). This RhoGEF was not found in other mammals or in platypus.

The RhoGEF repertoire has thus remained stable from the onset of Mammals, that is, 160–290 Ma (dos Reis et al. 2012). Such long-term stability is indicative of strong positive selection of all members. Of the 71 widely shared RhoGEFs, three were lost in several rodent species of the Murinae sub-family, suggesting they became neutral in this clade. Note that Murinae have gained a competitive advantage during middle Miocene and currently constitute the largest extant mammalian subfamily (Tiphaine et al. 2013).

In nonmammalian Amniotes, there have been greater RhoGEF losses. The chicken genome (Sauria/Archelosauria/Dinosauria) lacked nine RhoGEFs (i.e., 12% of the repertoire; see supplementary table S2, Supplementary Material online). All nine were absent in all the 48 bird genomes available. Three of the missing RhoGEFs (ARHGEF15, PLEKHG6, and FGD1) were also missing in Crocodylia, the sister taxon of birds, while turtles (Testudines) only lacked PLEKHG6. As bony fishes and coelacanth have 72 RhoGEFs (the canonical 71 RhoGEFs plus GEF64C), losses detected in Sauria are specific to this clade and have occurred sequentially (see supplementary table S2, Supplementary Material online). Amphibia lack FGD3 and PLEKHG6 and cartilaginous fishes lack ARHGEF40 and ARHGEF11 (fig. 2, see supplementary table S2, Supplementary Material online). The absence of GEF64C in placentals can be considered as a specific loss as it is present in other vertebrate clades.

We next analyzed the Dbl-like RhoGEF repertoire in two jawless fish genomes (Lampreys, Agnatha). Merging member lists from the two genomes produced a set of 31 RhoGEF proteins. These 31 RhoGEFs include members of all the 20 mammalian RhoGEF subfamilies except ITSN (figs. 2 and 3A). Our phylogenetic analysis identified 21 vertebrate orthologs and 10 cases in which lamprey sequences branched at the roots of vertebrate clusters (orange dots in fig. 3A). This indicates that duplications took place between Agnatha and Gnathostomata, increasing the copy number from 10 to 25.
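The repertoire counts quoted in this section can be checked with simple bookkeeping. The Python sketch below only restates numbers given in the text (71 canonical members plus GEF64C, 31 lamprey RhoGEFs split into 21 direct orthologs and 10 root-branching sequences that correspond to 25 vertebrate members, nine losses in chicken); the variable names are ours, and the derived totals match the figures of 46, 26, and 57 that the text goes on to state.

```python
# Counts quoted in the text; variable names are illustrative only.
CANONICAL = 71                     # human Dbl-like RhoGEFs
VERTEBRATE_TOTAL = CANONICAL + 1   # canonical set plus GEF64C
LAMPREY_FOUND = 31                 # merged from the two lamprey genomes
LAMPREY_ORTHOLOGS = 21             # lamprey members with a direct vertebrate ortholog
LAMPREY_AT_ROOT = 10               # lamprey members branching at cluster roots
ROOT_DESCENDANTS = 25              # vertebrate members derived from those 10

# Sanity check: the two lamprey categories add up to the 31 members found.
assert LAMPREY_ORTHOLOGS + LAMPREY_AT_ROOT == LAMPREY_FOUND

# Vertebrate RhoGEFs that have a lamprey homolog.
covered = LAMPREY_ORTHOLOGS + ROOT_DESCENDANTS

# Upper bound on members missing or lost in lamprey genomes.
missing_at_most = VERTEBRATE_TOTAL - covered

# Upper bound on the ancestral Agnatha repertoire.
ancestral_at_most = LAMPREY_FOUND + missing_at_most

# Share of the canonical repertoire absent from the chicken genome.
bird_loss_fraction = 9 / CANONICAL

print(covered, missing_at_most, ancestral_at_most)
print(f"{bird_loss_fraction:.1%}")
```

The last value comes out slightly above the 12% reported in the text, which appears to be a rounded-down figure.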


FIG.2.—Conservation of the human Dbl-like RhoGEF repertoire across Metazoa. Human RhoGEF orthologs were searched in genomes of species covering the major Metazoa clades, as indicated on the top. In most clades, three or more species were examined and orthology was deduced from reciprocal BLAST scores (“Vertebrates”, from mammals to bony fish) and by phylogenetic analysis (shark, lampreys, Ambulacraria, , Porifera, and nonmetazoan Filozoa). Members that were not found are indicated by an x box. Vertical bars show duplications that were deduced from phylogenetic analyses. Dashed vertical bars indicate duplications that cannot be precisely dated. The color code for Dbl-like subfamilies is the same as in figure 1A.
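The reciprocal BLAST criterion mentioned in this legend is commonly implemented as a reciprocal-best-hit test. The sketch below is a minimal Python illustration under stated assumptions: it ranks hits by a single score per query/subject pair, and the gene identifiers and scores are invented for the example. The authors also compared structural domains, which is not modeled here.

```python
def best_hits(scores):
    """Map each query to its top-scoring subject.

    scores: dict mapping (query, subject) to a similarity score,
    where a higher score means a better hit (an assumption of this sketch).
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a and b are each other's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Invented scores for illustration; these are not real BLAST results.
human_vs_mouse = {
    ("hs_VAV1", "mm_Vav1"): 900,
    ("hs_VAV1", "mm_Vav2"): 500,
    ("hs_VAV2", "mm_Vav2"): 880,
    ("hs_VAV2", "mm_Vav1"): 510,
    ("hs_PLEKHG4B", "mm_Plekhg4"): 300,
}
mouse_vs_human = {
    ("mm_Vav1", "hs_VAV1"): 905,
    ("mm_Vav2", "hs_VAV2"): 870,
    ("mm_Plekhg4", "hs_PLEKHG4"): 700,
    ("mm_Plekhg4", "hs_PLEKHG4B"): 310,
}

orthologs = reciprocal_best_hits(human_vs_mouse, mouse_vs_human)
print(orthologs)
```

In this toy data set the third human query has no reciprocal partner, which is how a missing ortholog would show up as an "x box" in such a survey.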


[Figure 3. Four phylogenetic trees of DH domains, panels A to D; marked nodes have PP > 0.95 and BS > 75. Panel legends: (A) Human, Cartilaginous fish, Lamprey, No lamprey ortholog; (B) Human, Tunicates, Cephalochordates, No prochordate ortholog; (C) Human, Cnidaria, Porifera, No ortholog; (D) Human, Choanoflagellida, Filasterea, No Filozoa ortholog.]

FIG.3.—Phylogenetic analyses of Dbl-like RhoGEFs in Metazoa. (A) Clustering of human and early vertebrate RhoGEFs. Hs: Homo sapiens; Cartilaginous fishes: Cm: Callorhinchus milii; Le: Leucoraja erinacea; Rt: Rhincodon typus; Lamprey: Pm: Petromyzon marinus. Lamprey members (names in orange and colored branches) at the roots of subfamilies or clusters are figured by an orange dot. No ortholog to ECT2L (blue dot) was found in Cm, Le, Rt or Pm genomes. (B) Clustering of human and prochordate RhoGEFs. Tunicates (names in green, blue branches): Ci: Ciona intestinalis; Hr: Halocynthia roretzi; Pm: Phallusia mammillata; Cephalochordate (names in orange, red branches): Bf: Branchiostoma floridae. (C) Clustering of human and early metazoan RhoGEFs. Cnidaria (green): Hv: Hydra vulgaris; Nv: Nematostella vectensis. Porifera (orange): Lc: Leucosolenia complicata. (D) Clustering of human and nonmetazoan Filozoa RhoGEFs. Choanoflagellida (green): Mb: Monosiga brevicollis; Sr: Salpingoeca rosetta. Filasterea (orange): Co: Capsaspora owczarzaki. DH domain amino acid sequences were aligned and analyzed by PhyML and MrBayes methods. Highly supported nodes are figured by a red circle. Names and accessions are listed in supplementary tables S1 and S3, Supplementary Material online. Color codes for RhoGEF subfamilies are as in figure 1A.
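For the MrBayes analyses behind these trees, the sampling settings given in Materials and Methods (two runs, 1,100,000 generations, one sample every 200 generations, first 100,000 generations discarded) determine how many trees enter each consensus. The Python sketch below works through that arithmetic and shows how a clade frequency relates to the 50% majority rule. It is a simplified illustration: it ignores the starting tree at generation 0, and the clade count in the example is invented.

```python
# Settings quoted in Materials and Methods.
GENERATIONS = 1_100_000
SAMPLE_EVERY = 200
BURN_IN_GENERATIONS = 100_000
RUNS = 2

samples_per_run = GENERATIONS // SAMPLE_EVERY           # trees sampled per run
burn_in_samples = BURN_IN_GENERATIONS // SAMPLE_EVERY   # trees discarded per run
kept_per_run = samples_per_run - burn_in_samples        # trees retained per run
kept_total = kept_per_run * RUNS                        # trees pooled over both runs


def in_majority_rule_consensus(trees_with_clade, total_trees):
    """Return (clade frequency, whether it passes the 50% majority rule).

    The frequency of a clade among the retained trees is used as its
    posterior probability estimate.
    """
    pp = trees_with_clade / total_trees
    return pp, pp > 0.5


print(samples_per_run, burn_in_samples, kept_per_run, kept_total)

# Invented example: a clade present in 9,700 of the pooled trees.
print(in_majority_rule_consensus(9_700, kept_total))
```

A clade can therefore appear in the consensus tree with a PP well below 0.95, which is why the authors apply the stricter 0.95 cut-off when marking reliable nodes.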


The 31 RhoGEFs in extant Agnatha thus cover homologs to 46 vertebrate RhoGEFs. Since the Vertebrate repertoire is 72, including GEF64C, at most 26 RhoGEFs can be considered as missing or lost in lamprey genomes. This implies that ancestral Agnatha might have had at most 57 Dbl-like members.

We extended Dbl-like RhoGEF analysis to Tunicates, the sister clade to Vertebrates, and to Cephalochordates (lancelet). Tunicates and Cephalochordates (Prochordates) have a bilateral body plan, a notochord, a dorsal neural tube, pharyngeal slits, a postanal tail and an endostyle (Holland 2005). By combining four tunicate species, we identified 25 RhoGEFs, including members of most of the 20 mammalian subfamilies (figs. 2 and 3B, see supplementary table S1, Supplementary Material online). All species lacked ALS2, ARHGEF33, ARHGEF39, OBSCN, and RASGRF. Phylogenetic analysis showed that the vast majority of Tunicate and Cephalochordate DH domains branched at the roots of mammalian shallow clusters (fig. 3B). This indicates that widespread RhoGEF duplications took place between Tunicates and Vertebrates as a result of whole genome duplications that occurred between Tunicates and Agnatha (Smith et al. 2013). The timing of duplications cannot be precisely determined for members of the ARHGEF4, ITSN, ARHGEF1, FGD, and ARHGEF17 subfamilies, since they are absent in the two jawless fish genomes (dotted lines, fig. 2).

Thus, the ancestral vertebrate repertoire was established from two rounds of duplications: one that occurred between Tunicates and Agnatha, that is, >547 Ma, which produced 57 RhoGEFs; the second, between Agnatha and bony fishes, that is, 485 Ma, produced 15 additional RhoGEFs.

Inferring Subfunctionalization from Tissue Distribution

We next examined how tissue distribution of Dbl-like RhoGEF expression correlates with their ontogeny. We analyzed RhoGEF expression in high-throughput RNA sequencing (RNA-Seq) data, from brain, cerebellum, heart, kidney, liver and testes, of man, chimp, gorilla, orangutan, macaque, mouse, opossum, platypus, and chicken (Barbosa-Morais et al. 2012; Brawand et al. 2011). Hierarchical clustering of expression values shows that most RhoGEFs are differentially expressed (fig. 4). RhoGEFs grouped into a small number of well-supported coregulated clusters, based on their distribution across tissues and species. The three brain + cerebellum (B + C) sub-clusters have Pearson correlations ranging from 0.72 to 0.89. In the brain cluster (B), brain-specific expression of NGEF, KALRN and RASGRF2 is highly correlated (Pearson's r = 0.7) and conserved from primates to chicken.

The tree topology of hierarchical clustering did not correlate with that of the phylogenetic analysis. This argues in favor of subfunctionalization (i.e., differential expression of the two copies), a process that promotes conservation of functional duplicate gene copies (Krakauer and Nowak 1999). Indeed, most coregulated clusters include RhoGEFs from different subfamilies, as shown in figure 4A. This is illustrated by KALRN and RASGRF2, which are mostly expressed in the brain, whereas their respective paralogs TRIO and RASGRF1 are highly expressed in the cerebellum. Conservation of RhoGEF expression across species and tissues was further estimated by Pearson correlation analysis from gene expression profiles (fig. 4B). In all species, correlation values were highly significant (>0.6), even for chicken, despite its higher divergence time (≈310 Myr). These data indicate that the last round of duplications within RhoGEF subfamilies was rapidly followed by subfunctionalization events and that tissue expression profiles of RhoGEFs were likely set up around 630 million years ago in early vertebrates.

Overall, this study shows that the ancestral vertebrate repertoire was established from two rounds of duplications that occurred between Tunicates and bony fishes, and since then, it has remained mostly unchanged. Since duplications were followed rapidly by subfunctionalization events, it is, therefore, implicit that major tissue-specific and Rho-controlled regulatory pathways were already set up in early Vertebrates. The Dbl-like repertoire is nearly conserved between mammals and fishes, two clades with widely different physiology. One may thus infer that the Dbl-like members have been strongly selected for their roles in basic vertebrate-specific functions. This may be more complex, as evolution of several vertebrate clades has been associated with Dbl-like RhoGEF losses: GEF64C was lost in placental mammals; PLEKHG4B and MCF2L2, two members of the TRIO subfamily, and PLEKHG7 were lost in Rodents; nine Dbl-like RhoGEF members were lost in Birds. The losses of members that are otherwise highly conserved across Vertebrates indicate that the selective pressure exerted on tissue-specificity or expression of Rho-controlled pathways may considerably change during clade specialization.

Early Metazoans Had Most Vertebrate Dbl-like RhoGEF Subfamilies

We next examined the RhoGEF repertoires in Ambulacraria (Echinoderms and Hemichordates) and Protostomia (fig. 2, see supplementary table S3, Supplementary Material online). Ambulacraria have 30 RhoGEFs (30 in Hemichordates, 28 in Echinoderms), including members of all the 20 vertebrate subfamilies except the PREX branch. Protostomes also have members of the 20 vertebrate subfamilies, but all lack the PREX and NGEF branches of the ARHGEF4 and NGEF subfamilies, respectively (fig. 2). However, Dbl-like repertoires greatly differ within clades. In Lophotrochozoa, Mollusks have the complete Protostome repertoire whereas Platyhelminthes (flatworms) lack six subfamilies (fig. 2, see supplementary table S3, Supplementary Material online). In Arthropods, Crustaceans and Hymenoptera (Insecta) lack ECT2L (fig. 2, see supplementary tables S2 and S3, Supplementary Material online). All other Insecta clades


[Figure 4 image: (A) heatmap of RhoGEF mRNA expression in brain, cerebellum, heart, kidney, liver, and testis across nine vertebrate species, with RhoGEF names, subfamily numbers, and tissue cluster labels (B, C, B+C, K); (B) symmetrical heat map of Pearson correlations across the same tissues and species, color scale 0.00 to 1.00.]

FIG.4.—Conservation of tissue specific expression of Dbl-like RhoGEFs in vertebrates. (A) Heatmap of RhoGEF mRNA expression in six tissues across nine vertebrate species. Orthologous RhoGEF values were extracted from global RNA-seq data. Mean centered log10(RPKM) values were normalized and hierarchically clustered (Euclidean distance, complete method). Colors of RhoGEF names and sub-families correspond to those delineated in figure 1A. Red and green frames illustrate closely related members expressed in same or distinct tissues, respectively. Red and green dots indicate clusters with Pearson correlations of >0.7 and >0.5, respectively. B: Brain; C: Cerebellum; K: Kidney. Hsap: Homo sapiens; Ptro: Pan troglodytes; Ggor: Gorilla gorilla; Mmul: Macaca mulatta; Pabe: Pongo abelii; Mmus: Mus musculus; Mdom: Monodelphis domestica; Oana: Ornithorhynchus anatinus; Ggal: Gallus gallus. (B) Symmetrical heat map of Pearson correlations from RhoGEF gene mRNA expression (log RPKM values) for the six tissues and nine species examined.

examined lack additional RhoGEFs; all lack DNMBP and NET1, while Diptera and Lepidoptera (Mecopterida) have also lost ITSN. According to holometabola phylogeny (Peters et al. 2014), DNMBP and NET1 were lost in Aparaglossata (Coleoptera, Lepidoptera, and Diptera), then ITSN in Mecopterida (Lepidoptera and Diptera; see supplementary table S2, Supplementary Material online). The absence of DNMBP and NET1 in Hemiptera can be considered as independent losses. Last, Nematodes lack five Dbl-like proteins as compared with other Ecdysozoa: NET1, BCR, GEF64C, RasGRF, and ARHGEF10 (see supplementary table S2, Supplementary Material online).

Thus, the Dbl-like RhoGEF repertoire of ancestral Protostomia is very similar to that of Ambulacraria. However, with the exception of Mollusks, several RhoGEF subfamilies were lost in other Protostomia clades (up to five in Lepidoptera and Nematodes). In addition to vertebrate orthologs, three specific RhoGEFs were found in most Protostomia clades: PsGEF (Higuchi et al. 2009) and Tag-52, specific to Protostomia, and RhoGEF64C, conserved in Protostomia and Deuterostomia Vertebrates then lost in placentals (see supplementary tables S1 and S2, Supplementary Material online).

In non metazoans, we identified 28, 22, and 26 RhoGEF members in Cnidaria, and Porifera genomes, respectively. Phylogenetic analyses placed Cnidaria and Porifera DH amino acid sequences at the roots of most vertebrate subfamilies. Only orthologs to PREX and ARHGEF28 were missing in Cnidaria and Porifera, respectively (fig. 3C). Since Porifera is a monophyletic group that diverged first in Metazoa, this suggests that ancestral Metazoa had at least 19 of the 20 vertebrate subfamilies (fig. 3C, see supplementary table S3, Supplementary Material online). This would imply that the metabolic pathways controlled by RhoGEF subfamilies were already set up at the onset of Metazoa.

Evolution from single-celled ancestors to metazoa is hypothesized to have involved a colony-forming transition stage, formed by cells that were similar to the extant Choanoflagellates (Monosiga brevicollis, Salpingoeca rosetta; Nielsen 2008). Furthermore, Choanoflagellates and the Filasterea Capsaspora owczarzaki are closely related to metazoans (King et al. 2008; Suga et al. 2013). Together with Metazoans, these single-celled lineages form the Filozoa group (Torruella et al. 2012). We identified 13 Dbl-like RhoGEFs in M. brevicollis and S. rosetta, and 12


in C. owczarzaki (fig. 2, see supplementary table S3, Supplementary Material online). Phylogenetic analyses of DH amino acid sequences showed that Choanoflagellates and Filasterea RhoGEFs clustered with Vertebrate subfamilies (fig. 3D). The clustering is further supported by the striking conservation of functional domains associated to the DH domains (see supplementary fig. S3, Supplementary Material online). Only six subfamilies have no homolog: TRIO, FARP, ALS2, RasGRF, BCR, and ARHGEF10. We also deduced that two ancestral duplications took place between Choanoflagellida and Porifera, which produced two branches in the ARHGEF1 and FGD subgroups in Porifera (fig. 2). Concerning the ARHGEF1/ARHGEF2 duplication, the ancestral architecture was likely (PDZ)/RGS/C1/DH/PH, as observed in C. owczarzaki. After duplication, one copy lost the N-terminal PDZ/RGS, likely by truncation (see supplementary fig. S3, Supplementary Material online).

Thus, a large fraction of the vertebrate RhoGEF repertoire is present in extant unicellular organisms that are closely related to metazoans, strongly suggesting it was also the case of the last common ancestor of Filozoa >1,000 Ma (Hedges et al. 2004; Parfrey et al. 2011). The transition to multi-cellularity has thus been associated with emergence of a restricted number of novel RhoGEFs with new functional domains, such as the SEC14 and spectrin domains in TRIO or the Ras guanine nucleotide exchange domain in RasGRF. The repertoire of RhoGEFs then remained fairly stable in metazoans until the vertebrate transition, at which time whole genome duplications generated a 2–3-fold increase in RhoGEF members and further subfunctionalization.

The many expansion and loss events observed across and within Metazoan clades suggest that sizes of Dbl-like repertoires reflect the diversity of external stimuli cells respond to. We tested this hypothesis by considering the number of cell types in an organism as a proxy of its cell signaling complexity (Carroll 2001; McCarthy and Enquist 2005; Valentine et al. 1994; Hedges et al. 2004). The number of RhoGEFs varies linearly with the number of cell types, with a steep slope (0.38 ± 0.05, P < 10⁻⁴; fig. 5). In contrast, the number of Rho GTPases activated by Dbl-like RhoGEFs varied with a much lower slope (0.035 ± 0.007, P = 8 × 10⁻⁴). This supports the notion that Dbl-like RhoGEF and not Rho repertoires vary in proportion to cell signaling complexity.

[Figure 5 image: dot plot of the number of RhoGEFs and the number of Rhos (y axes) against the number of cell types (x axis, 0 to 150) for Choanoflagellida, Porifera, Cnidaria, Arthropods, Tunicates, Agnatha, Fishes, Amphibia, Squamates, and Primates; regression slopes s = 0.38 (RhoGEFs) and s = 0.035 (Rhos).]

FIG.5.—Sizes of Dbl-like families in Metazoa correlate with numbers of cell types. Dot plot showing the relationship between the numbers of cell types in different metazoan species (Schad et al. 2011; Hedges et al. 2004; Valentine et al. 1994) and the numbers of RhoGEFs (circles) and their target Rho GTPases (squares) (calculated from Metazoa data in supplementary tables S1 and S3, Supplementary Material online). Data from the following species were used: Primates: H. sapiens; Squamates: A. carolinensis; Amphibia: X. tropicalis; Fishes: Danio rerio; Agnatha: P. marinus; Tunicates: C. intestinalis; Arthropods: D. melanogaster, A. gambiae; Cnidaria: H. magnipapillata; Porifera: A. queenslandica; Choanoflagellida: M. brevicollis. S indicates the slopes of regression lines, whose confidence intervals are indicated by dashed lines.

Multiple Independent Dbl-like Expansion and Reduction Events in Amorphea/Unikonta

The observation that 14 Dbl-like RhoGEF Vertebrate subfamilies were already present at the root of Metazoa prompted us to look at sister clades of Metazoa (fig. 6). Metazoa and Fungi form the monophyletic eukaryotic supergroup known as Opisthokonta (Baldauf and Palmer 1993), and they diverged around 1,300 Ma (Hedges et al. 2006; Parfrey et al. 2011). Opisthokonta, Breviatea, and Apusomonads form the Obazoa supergroup (Brown et al. 2013). Obazoa and its sister group Amoebozoa form the Amorphea supergroup (previously known as Unikonta; Adl et al. 2012), and they diverged 1,480 Ma (Hedges et al. 2006; Parfrey et al. 2011). A common feature of Amorphea is the presence of a single cilium or flagellum associated with a unique centriole, whereas the species of other eukaryotic clades have two centrioles and two flagella/cilia, that is, the ancestral state of all eukaryotes (Roger and Simpson 2009).

In Fungi, the number of extant species surpasses 100,000 and metagenomic data suggest there may actually be several million (O'Brien et al. 2005). Fungi are structured into a few phyla, among which Chytridiomycota, Glomeromycota, and Mucoromycotina diverged between 812 and 1,300 Ma whereas the Ascomycota and Basidiomycota (the two main phyla of Dikarya) diverged later (662–772 Ma; Floudas et al. 2012; Parfrey et al. 2011). We examined 37 fungal genomes from various clades and found a high variability in the number of Dbl-like members, ranging from 35 to 39 in Mucoromycotina, 11 to 21 in Chytridiomycota, Glomeromycota, and Basidiomycota, to only 5 to 8 in Ascomycota (fig. 7A, see supplementary table S4, Supplementary Material online). The Ascomycota Pezizomycetes morels and truffles (PP in fig. 7A), which develop fruiting bodies, have the same Dbl-like repertoire as Saccharomycetaceae yeasts (SS in fig. 7A), which do not. Thus, the higher Dbl-like content in Basidiomycota as compared with Ascomycota is probably not associated with the ability of Basidiomycota to develop fruiting bodies.
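The slope comparison reported for figure 5 (RhoGEF number versus number of cell types) rests on a linear regression. The text does not state which fitting procedure was used, so the following is only a minimal sketch assuming ordinary least squares; the function name `linear_fit` and the numbers in the usage lines are illustrative placeholders chosen to lie on an exact line, not values taken from the paper or its supplementary tables.

```python
import math


def linear_fit(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, slope_standard_error).
    Assumption: OLS is used here for illustration only; the source
    does not specify the regression method behind figure 5.
    """
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("need at least three paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares, then the standard error of the slope.
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    slope_se = math.sqrt(rss / (n - 2) / sxx)
    return slope, intercept, slope_se


# PLACEHOLDER numbers (hypothetical, NOT the paper's data):
# four points lying exactly on y = 10 + 0.4 * x.
cell_types = [10, 50, 100, 150]
gene_counts = [14, 30, 50, 70]
slope, intercept, slope_se = linear_fit(cell_types, gene_counts)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

Running the same function once on RhoGEF counts and once on Rho GTPase counts for the species listed in the figure 5 legend would give the two slopes that the text contrasts.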


[Figure 6 image: schematic phylogeny of eukaryotic supergroups (Amorphea: Metazoa, Choanoflagellida, Fungi, Amoebozoa; Bikonta: Archaeplastida, S.A.R., Cryptophyta, Haptophyta, Excavates; with Archaea as outgroup), annotated with Dbl-like expansion, duplication, and loss events and with the number of vertebrate subfamilies present (20/20, 19/20, 14/20, 3/20), drawn on a timeline in million years spanning the Archean, Proterozoic, and Phanerozoic.]

FIG.6.—Summary of Dbl-like RhoGEF gain and loss events across supergroups. A global view of the phylogeny of supergroups and taxa examined is presented, with the timeline of eukaryote emergence. Gain (in blue) and loss (in red) events that have built the repertoires of extant species are indicated. 1: In Holometabola and Paraneoptera, and 2: In Chlamydomonales. The number of subfamilies, as defined in figure 1, is indicated for Metazoa. Green dots indicate nodes important for inferring the expansion of the Dbl family in Metazoa.

In most fungal phyla and species examined, we identified eight types of Dbl-like proteins. Seven display specific structural domains associated with the DH (fig. 7B). These are Cdc24 (CH and PB1), Rom (DEP and CNH), DNMBP (BAR), Tus1 (CNH), FGD (FYVE), ITSN (EH, SH3), and LRR (Leucine Rich Repeats). Rom and Cdc24 are present in all examined phyla, DNMBP is only missing in Saccharomycetales (true yeasts) and FGD, ITSN, and LRR are missing in Ascomycota. In addition to these multi-domain Dbl-like proteins, all examined genomes encoded RhoGEFs with DH/PH or DH domains only, like Fusl in Agaricus bisporus and Fus2p in Saccharomyces cerevisiae. Phylogenetic DH analysis clustered fungal Dbl-like proteins from multiple clades together and members of each cluster have the same functional domains (fig. 7C). Note that Mucoromycotina and Ascomycota have Dbl-like families with similar levels of diversity, although the former taxon encodes 6–8-fold more members than the latter (fig. 7A). This is consistent with multiploidy, which is suspected to have occurred in Mucoromycotina (Albertin and Marullo 2012).

Phylogenetic analysis of DH domains identified three clusters grouping fungal and metazoan RhoGEFs (FGD, ITSN, and DNMBP, fig. 7C, red frames), all of which are active on Cdc42 in mammals. The clustering is in agreement with the presence of the FYVE, BAR, EH, and SH3 domains (fig. 7B). All three groups are well supported by posterior probabilities (PPs) (>0.95) but have only moderate to low maximum likelihood (ML) bootstrap values, suggesting that fungal and vertebrate DH sequences have reached too high a proportion of saturated sites to be clustered with confident support.
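The two-criterion node support used for figure 7C (Bayesian posterior probability above 0.95, combined with a bootstrap value above 60 or above 40) can be written as a small decision rule. This is a hedged sketch: the thresholds come from the figure 7 legend, but the function name `node_support` and the labels returned for nodes that miss a threshold are hypothetical and are not part of the authors' pipeline.

```python
def node_support(posterior_probability, bootstrap):
    """Classify a tree node by combined Bayesian and ML support.

    Thresholds follow the figure 7C legend: posterior probability
    above 0.95, with bootstrap above 60 (red circle) or above 40
    (yellow circle). The other two labels are illustrative names.
    """
    if posterior_probability <= 0.95:
        return "not highlighted"
    if bootstrap > 60:
        return "red"
    if bootstrap > 40:
        return "yellow"
    # Well supported by MrBayes but weakly by PhyML, the situation the
    # text attributes to saturated sites in fungal and vertebrate DH.
    return "posterior only"


# Hypothetical support values, for illustration only.
for pp, bs in [(0.99, 75), (0.97, 45), (0.98, 20), (0.80, 90)]:
    print(pp, bs, node_support(pp, bs))
```

Keeping the posterior probability test first mirrors the legend, where bootstrap values only subdivide nodes that already pass the 0.95 cutoff.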


[Figure 7 image: panels A, B, and C.]

FIG.7.—The Dbl-like RhoGEF family in Fungi. (A) RhoGEFs were searched in genomes of species (in italics) distributed in the various Fungi phyla (in bold). RhoGEFs were classified according to their family (Dbl-like or DOCK) and their structural domain organization (see B). Ascomycota: : PE: ; PL: ; PO: Orbiliomycetes; PP: Pezizomycetes; PS: . Ascomycota Saccharomycetales: SD: Debaryomycetaceae; SS: Saccharomycetaceae. T: . Basidiomycota A: Agaricales; C: Agaricomycotina Corticiales; M: ; P: Pucciniomycotina ; U: . (B) Eight types of RhoGEFs were identified in Fungi, based on the presence of functional domains. CH: Calponin homology, PB1: Phox/Bem1, DEP: Dishevelled/Egl10/Pleckstrin, CNH: Citron/Nik1 homology, BAR: Bin/Amphiphysin/Rvs, FYVE: Fab1/YOTB/ZK632.12/Vac1/EEA1, EH: Eps15 homology, SH3: Src homology 3, LRR: Leucine Rich Repeats. Three fungal RhoGEFs (DNMBP, FGD, ITSN) share similar functional domain organization with human RhoGEFs. (C) PhyML and MrBayes phylogenetic analysis of DH domains from Fungi of different clades, as indicated by the color code on the top left, excluding (left tree) or including (right tree) human DH sequences (in black). Nodes supported by posterior probabilities above 0.95 are indicated by red and yellow circles, with bootstrap BS values >60 or >40, respectively. The domain organization of each cluster is shown.

In summary, Dbl-like repertoires in Fungi are highly heterogeneous, ranging from 5 to 39 in number, and this complexity appears unrelated to the ability to form fruiting bodies. Only three fungal Dbl-like RhoGEFs have metazoan orthologs (FGD, ITSN, and DNMBP), as supported both by the phylogenetic clustering of their DH domains and by the similar organization of structural domains. Nothing is known about the cellular roles of these RhoGEFs, due to their absence in yeast biological models.

Apusomonads are heterotrophic flagellates that live in soils, freshwater and marine habitats and which have an organic shell over the dorsal cell surface, called a theca. We identified 24 Dbl-like RhoGEFs in the genome of Thecamonas trahens, of which 10 contain only DH or DH/PH domains (see supplementary fig. S4A and table S4, Supplementary Material online). Other T. trahens RhoGEFs have DH/PH domains associated with additional domains, only two of which are classically found in Metazoa or Fungi (SH3 and MORN). The other domains were not observed in Fungi: the enzymatic Ras-like and ArfGAP, and the protein interacting motifs IQ (calmodulin-binding), LIM (Lin-11, Isl1, and Mec-3), SAM (Sterile Alpha Motif), ARM (Armadillo), and ANK (Ankyrin). Note also that the domains of the three RhoGEFs that are common to all Opisthokonts (i.e., BAR for DNMBP, FYVE for FGD and EH for ITSN) are absent from T. trahens Dbl-like RhoGEFs (see supplementary fig. S4, Supplementary Material online).

Amoebozoa is the major phylum that regroups amoeba. Amoebozoa are defined as unicellular eukaryotes that move with highly dynamic pseudopodia. Amoebozoa are sub-divided into Conosa, which have a complex microtubule network at flagellate stages, and Lobosa, which do not (Cavalier-Smith et al. 2015). Among Conosa, the Mycetozoa class contains the true slime-molds, which are social amoebae that develop a multicellular fruiting body upon starvation. Previous work reported the presence of 45 genes for Dbl-like RhoGEFs in D. discoideum (Mycetozoa/Conosa; Vlahou and Rivero 2006). To get a more robust view of Dbl-like repertoires in Mycetozoa, we examined four additional Conosa species, namely Dictyostelium purpureum, Dictyostelium fasciculatum, Acytostelium subglobosum and Polysphondylium pallidum, which encoded respectively 45, 44, 64, and 44 Dbl-like RhoGEFs (see supplementary


table S4, Supplementary Material online). Phylogenetic analysis of their DH domains identified 44 robust clusters from each species (see supplementary fig. S5A, Supplementary Material online). This indicates that the Dbl-like repertoire in Mycetozoa has remained stable for the last 600 Myr (Fiz-Palacios et al. 2013). All Dbl-like RhoGEFs that grouped in clusters had the same associated functional domains. Several of these domains were also found in RhoGEFs, like the BAR and CH domains (DNMBP, VAV, and ARHGEF6, fig. 1B) or LRR in Fungi (fig. 7B). Several domains that are not found in Opisthokonts, like ANK, ARM, ArfGAP and IQ, are present in T. trahens RhoGEFs (see supplementary fig. S4, Supplementary Material online). Note that all D. discoideum DH domains are equally distantly related to metazoan DH, which did not allow any specific orthology to be detected.

We next identified the Dbl-like members in four Entamoeba species (E. histolytica, E. invadens, E. dispar, and E. nuttalli). Entamoeba also belong to Conosa and diverged from Mycetozoa around 1,500 Ma (Parfrey et al. 2011). The Entamoeba genus is made of amitochondriate and morphologically similar species, most of which are intestinal parasites (Stensvold et al. 2011). We identified 62 Dbl-like proteins in E. his, 95 in E. inv, 63 in E. dis and 60 in E. nut (see supplementary table S4, Supplementary Material online). Interestingly, although the numbers of Dbl-like members were higher in Entamoeba than in Mycetozoa, the structural diversity associated with their DH domains was much lower: Entamoeba Dbl-like RhoGEFs lack the ARM, BAR, and IQ domains (see supplementary fig. S4, Supplementary Material online). Phylogenetic analysis of the DH domains distributed them into 26 clusters, in which proteins have a similar domain organization (see supplementary fig. S5B, Supplementary Material online). It is probable that multiple Dbl-like members were lost in ancestral Entamoeba after the split with Mycetozoa, reducing their repertoire from 45 down to 26. This was followed by duplications leading to over 60 members. The four species examined diverged after the duplications.

Finally we examined the genome of Acanthamoeba castellanii, which belongs to Lobosa (Clarke et al. 2013), and identified 108 Dbl-like RhoGEFs (see supplementary fig. S4A, Supplementary Material online). Phylogenetic analysis showed that the 108 DH domains distributed into 22 clusters and 33 single sequences (see supplementary fig. S5C, Supplementary Material online). Although analysis of additional species is needed to get a more comprehensive view of the Dbl-like family in Lobosa, this nevertheless suggests that the number of independent members is of the same order in Lobosa and Mycetozoa. This implies that the number of Dbl-like members in ancestral Amoebozoa was in the same range. 74 of the 108 A. castellanii Dbl-like RhoGEFs have only DH or DH/PH domains (see supplementary fig. S4A, Supplementary Material online). Among the domains associated with DH/PH in the 34 other RhoGEFs, most are classically found in Metazoan RhoGEFs (CH, BAR, FYVE, RhoGAP, SH3, C1, C2, F-BOX, or MORN). However, A. castellanii Dbl-like members do not contain any of the kinase, RasGAP, LRR, and Myosin domains found in Mycetozoa. Neither do they contain the ArfGAP or RasGEF domains, which are found in RhoGEFs of Entamoeba and Mycetozoa (see supplementary fig. S4, Supplementary Material online).

Thus, the repertoires of Dbl-like RhoGEFs in Amorphea clades are highly variable. Their numbers range from 15 to 72 in Metazoa, 5 to 35 in Fungi and from 46 to 108 in Amoebozoa. In the three clades, we observed multiple and independent loss and expansion events (fig. 6), which may be directly linked to complexity of cell signaling. In addition, their DH-associated domains are mostly different. The only domains common to Amorphea clades are CH and SH3, two protein–protein interaction domains, FYVE/PHD, which targets cell membranes, and RhoGAP, a negative regulator of Rho signaling. This suggests that they were likely present in ancestral Amorphea and have prominent roles in basal Rho signaling.

Contrasting Evolutionary Repertoires of RhoGEF Families in Bikonta

We next examined the presence of Dbl RhoGEFs in Bikonta eukaryotes (fig. 6). Bikonta are clustered into three major supergroups whose relative positions are still debated (Derelle et al. 2015; Adl et al. 2012; He et al. 2014). The Bikonta are divided into Archaeplastida, SAR and Excavates. The Archaeplastida are further divided into two supergroups, the Viridiplantae (land plants and green algae) and Rhodophyta (red algae). The SAR supergroup is divided into Stramenopiles (e.g., Phytophthora infestans, which causes potato blight), Alveolates (e.g., the apicomplexan Plasmodium falciparum or the ciliate Paramecium tetraurelia), and Rhizaria (e.g., the amoeba-like Reticulomyxa filosa, a model system for motility and organelle transport analysis, Ashkin et al. 1990). The Excavates supergroup is divided into Euglens (e.g., the parasites Trypanosoma or Leishmania), Diplomonads (e.g., the intestinal parasite Giardia intestinalis), Heterolobosea (Lee 2010), and Parabasalids (e.g., the sexually transmitted infection Trichomonas vaginalis). Haptophyta and Cryptophyta are phylogenetically groups, although they have been proposed to be related to SAR and Rhodophyceae (Parfrey et al. 2011; Reeb et al. 2009). To gain a deeper insight in the Bikonta, we also looked, in each species, at the presence of Rho and DOCK proteins, as well as RopGEF, exchange factors that are active on Rac-like proteins in plants and characterized by a PRONE domain (Berken et al. 2005).

We identified Dbl RhoGEFs in all Bikonta supergroups (see supplementary fig. S6 and table S4, Supplementary Material online). However, the presence of Dbl-like proteins is variable within each supergroup and between phyla. Two clades have


no Rho or RhoGEFs at all (i.e., Chlamydomonadales in Chlorophyta or Apicomplexa in Alveolates). The absence of Rho signaling may be associated with the fact that Chlamydomonadales and Apicomplexa use gliding as a particular mode of locomotion, which depends on actin and myosin class XIV but does not require membrane dynamics like amoeboid motion does (Heintzelman 2006). In Archaeplastida, Viridiplantae do not encode Dbl-like proteins, although the exchange domain of SWAP70 orthologs has been proposed to be structurally related to DH (Yamaguchi et al. 2012). Viridiplantae encode Rac-like GTPases (Valster et al. 2000) and GEFs of the DOCK and RopGEF families (see supplementary fig. S6, Supplementary Material online). In Rhodophyta, the sister clade of Viridiplantae, the Florideophyceae Chondrus crispus (Irish moss, a multicellular red alga living in North Atlantic) encodes a Rho signaling set similar to other Viridiplantae. In contrast, the Cyanidiophyceae Galdieria sulphuraria and Cyanidioschyzon merolae, unicellular and extremophilic red algae living in acidic hot sulfur springs (Rothschild and Mancinelli 2001), encode Dbl-like proteins but no DOCKs. These two unicellular red algae species thus express both Dbl-RhoGEFs and RopGEFs. This is also the case of the Haptophyta Emiliania huxleyi, a phytoplanktonic coccolithophore that has two Rac GTPases, four Dbl-like members and one RopGEF but has no DOCK protein. In the current phylogenetic view, Dbl-like genes were lost twice in Archaeplastida, once in the Rhodophyta clades (but not in Cyanidiophyceae, the first clade that diverged from Archaeplastida, Verbruggen et al. 2010) and once at the onset of Viridiplantae. Alternately, a single Dbl-like loss may be invoked if Cyanidiophyceae had gained Dbl-like genes by LGT, which they are prone to (Qiu et al. 2013).

In the Excavates and SAR supergroups, the presence of genes for Dbl-like RhoGEFs is heterogeneous. A high copy number of Dbl, DOCK and Rho GTPases is found in Heterolobosea (Naegleria gruberi), Parabasalia (T. vaginalis) and Rhizaria (R. filosa and Plasmodiophora brassicae). These protists are not Amoebozoa yet they adopt a dynamic amoeba-like morphology, suggesting that independent amplification of Rho components has enabled the acquisition of the amoeboid phenotype.

In contrast to these amoeboid protists, Euglenozoa and Diplomonadida do not encode Dbl-like proteins but they do encode unique DOCK proteins. Unexpectedly, we even detected a DOCK protein in ten species that do not encode Rac proteins (boxed in supplementary fig. S6, Supplementary Material online). DOCK proteins of the ten species have a canonical domain structure (N-terminal lipid binding C2 and C-terminal catalytic DHR2 domains, see supplementary fig. S7A, Supplementary Material online) and are highly conserved at the amino acid level (70% overall similarity between Trypanosoma and Leishmania, 53% between Bodonidae and Trypanosomatidae). DOCK protein catalytic DHR2 domains are 40–45% similar between human and euglenozoan sequences, irrespective of the presence or absence of Rac proteins (see supplementary fig. S7B, Supplementary Material online). Although some cellular functions of DOCK do not require its RacGEF activity (Ogawa et al. 2014), the presence of an apparently normal DHR2 domain in organisms that do not encode any Rac protein suggests that DHR2 domains may have activities in addition to regulating Rac.

In summary, most Bikonta clades encode RhoGEFs of the Dbl and DOCK families, indicating that, together with Rac, they were part of the basic Rho signaling module in LECA. However, this module experienced independent loss or expansion events between and within taxa, varying from a total loss in Apicomplexa to acquisition of over 30 Dbl-like and Rac members in amoeba-like protists.

Discussion

By examining the genomes of 175 species covering all eukaryotic supergroups and spanning over 1.7 billion years of evolution, we show here that the Dbl RhoGEF family is present in all eukaryotic supergroups, implying it was already present in the LECA. However, this family has experienced many independent expansion or loss episodes in branches of the same clades (fig. 6). This plasticity suggests that most of RhoGEF diversity is not related to basic cellular metabolism but may rather reflect the diversity of external stimuli cells respond to. This hypothesis is supported by the tissue-specificity of RhoGEF mRNA expression in Vertebrates (fig. 4) and by the steepness of the linear relationship between the numbers of RhoGEFs and cell types in Metazoa (fig. 5). This is not the case for Rho GTPases activated by Dbl RhoGEFs (Jaiswal et al. 2013), as their expression is ubiquitous in human tissues (Boureux et al. 2007) and their numbers vary in proportion to cell types in Metazoa with a much lower slope than RhoGEFs (fig. 5). On the one hand, the remarkable conservation of 14 of the 20 vertebrate Dbl subfamilies as far away as in Choanoflagellida/Filasterea unicellular organisms might thus have enabled the early emergence of the basic repertoire of receptors and signaling in eukaryotes. On the other hand, the many RhoGEF loss events observed in subclades may reflect a decreased diversity of signaling after their adaptation to specific ecological niches. For example, birds lost 12% of the vertebrate Dbl-like repertoire and this may be directly or indirectly associated with establishment of particular features, such as exceptionally enlarged and diversified muscles or atypical thermogenesis. Such features are thought to be driven by gene loss rather than gain (Newman et al. 2013). The loss of ARHGEF19 and ARHGEF25 in birds may have been instrumental, as these genes control myogenesis and adipogenesis in mice (Horii et al. 2009; Bryan et al. 2005). The loss of PLEKHG2 might also have been instrumental in the evolution of bird-specific features, since PLEKHG2 is involved in insulin-stimulated GLUT4-mediated glucose uptake in L6 rat

Genome Biol. Evol. 1471–1486 doi:10.1093/gbe/evx100 Advance Access publication May 25, 2017 1483 Fort and Blangy GBE

myoblasts (Sato et al. 2014), and this process no longer occurs in birds (Seki et al. 2003). Establishment of particular features due to Dbl-like RhoGEF losses may also concern other sub-clades, such as mice in mammals, nematodes and flies in Ecdysozoa, or yeasts in Fungi, all of which had faster evolution rates. The finding that these organisms lost several RhoGEFs implies that their respective physiology might be adapted to a reduced signaling diversity. This must be taken into account when addressing functional roles of RhoGEFs in these organisms, which are widely used as biological models.

Decreased signaling diversity is also consistent with the Dbl-like repertoire being smaller in Entamoeba than in Mycetozoa, since, as parasites of the intestines, Entamoeba inhabit a more constant environment than the free-living Mycetozoa. Besides, high numbers of Dbl-like RhoGEFs are observed in species of distinct clades that all share the amoeboid phenotype, that is, Amoebozoa, Heterolobosea, Parabasalia, and Rhizaria (see supplementary figs. S4 and S6, Supplementary Material online). Independent Dbl-like expansion events thus occurred at least four times in amoeboid protists, which supports the notion that Dbl-like RhoGEFs are functionally involved in acquisition of the amoeboid phenotype. Amoebae produce different types of pseudopodia, which they use to move and to feed on bacteria. Amoebae express different suites of genes when they encounter different bacterial species (Nasser et al. 2013), suggesting that distinct types of membrane receptors are involved, and such receptor diversity in amoebae may have been enabled by the high number of Dbl-like proteins. Thus, despite their ancient origin in Eukaryotes and their overall conservation in Metazoa, Dbl RhoGEFs appear to be a dynamic family, which can adapt its size to the level of cell signaling diversity.

We also show here that Dbl-like and DOCK RhoGEF families were both present in the LECA. This implies that these two families, which are both active on the same Rac-like proteins, have distinct properties. One straightforward difference is that DOCKs have functions beyond activating Rac GTPases (Ogawa et al. 2014). This may explain the unexpected finding that several Euglenozoa and Stramenopiles species encode DOCK proteins but no GTPase of the Rho family. However, another striking difference between the Dbl-like and DOCK RhoGEFs concerns the high diversity of domains associated with the tandem DH/PH in Dbl-like RhoGEFs; DOCK proteins contain a C2 calcium-binding domain and a DHR2 catalytic domain, either alone or associated with single SH3 or PH domains (Meller et al. 2005). In this respect, Dbl-like RhoGEFs have a much higher capacity to evolve, by reshuffling various functional domains with the catalytic DH domain. Although a few domain combinations appear conserved in the various clades studied (e.g., FGD, DNMBP or ITSN in Amorphea), the general trend is a high heterogeneity between eukaryotic clades, and even within clades. Domain reshuffling is also supported by phylogenetic analyses, in which Dbl-like RhoGEFs with similar DH domains have different auxiliary domains, like AKAP13, ARHGEF1, ITSN, PLEKHG5, or NGEF. Domain reshuffling has also occurred in the TRIO family, in which OBSCN, TRIO, and KALRN have gained kinase and IG/FN domains from SPEG kinases and Titin. Proteins with the same domain organization as RhoGEFs but lacking the DH domains also add indirect support to reshuffling.

In conclusion, this study establishes that the family of Dbl-like RhoGEFs had a highly complex pattern of evolution and underwent repeated expansion and reduction events. Given that Dbl-like family complexity reflects the diversity of cell signaling, this family of Rho regulators constitutes an adaptive toolbox whose requirement in eukaryotic cell physiology has greatly varied depending on species biology.

Supplementary Material

Supplementary data are available at Genome Biology and Evolution online.

Acknowledgments

We thank J. Piette and P. Lemaire for giving access to tunicate transcriptomic data and P. Labbé for help with R scripts. We also thank CRBM members for fruitful discussions and Julian Venables for help in writing the manuscript. This work was supported by institutional grants from the Centre National de la Recherche Scientifique and from Montpellier Université.

Literature Cited

Abascal F, Zardoya R, Posada D. 2005. ProtTest: selection of best-fit models of protein evolution. Bioinformatics 21:2104–2105.
Adl SM, et al. 2012. The revised classification of eukaryotes. J Eukaryot Microbiol. 59:429–514.
Albertin W, Marullo P. 2012. Polyploidy in fungi: evolution after whole-genome duplication. Proc Biol Sci. 279:2497–2509.
Amin E, et al. 2016. Deciphering the molecular and functional basis of RHOGAP family proteins: a systematic approach toward selective inactivation of RHO family proteins. J Biol Chem. 291:20353–20371.
Ashkin A, Schütze K, Dziedzic JM, Euteneuer U, Schliwa M. 1990. Force generation of organelle transport measured in vivo by an infrared laser trap. Nature. 348:346–348.
Baldauf SL, Palmer JD. 1993. Animals and fungi are each other’s closest relatives: congruent evidence from multiple proteins. Proc Natl Acad Sci U S A. 90:11558.
Barbosa-Morais NL, et al. 2012. The evolutionary landscape of alternative splicing in vertebrate species. Science 338:1587–1593.
Bashaw GJ, Hu H, Nobes CD, Goodman CS. 2001. A novel Dbl family RhoGEF promotes Rho-dependent axon attraction to the central nervous system midline in Drosophila and overcomes Robo repulsion. J Cell Biol. 155:1117–1122.
Berken A, Thomas C, Wittinghofer A. 2005. A new family of RhoGEFs activates the Rop molecular switch in plants. Nature 436:1176–1180.
Bosco EE, Mulloy JC, Zheng Y. 2009. Rac1 GTPase: a ‘Rac’ of all trades. Cell Mol Life Sci. 66:370–374.
Boureux A, Vignal E, Faure S, Fort P. 2007. Evolution of the Rho family of ras-like GTPases in eukaryotes. Mol Biol Evol. 24:203–216.


Brawand D, et al. 2011. The evolution of gene expression levels in mammalian organs. Nature 478:343–348.
Brown MW, et al. 2013. Phylogenomics demonstrates that breviate flagellates are related to opisthokonts and apusomonads. Proc R Soc Lond B Biol Sci. 280:20131755.
Bryan BA, et al. 2005. Modulation of muscle regeneration, myogenesis, and adipogenesis by the Rho family guanine nucleotide exchange factor GEFT. Mol Cell Biol. 25:11089–11101.
Bustelo XR, Sauzeau V, Berenjeno IM. 2007. GTP-binding proteins of the Rho/Rac family: regulation, effectors and functions in vivo. BioEssays News Rev Mol Cell Dev Biol. 29:356.
Carroll SB. 2001. Chance and necessity: the evolution of morphological complexity and diversity. Nature 409:1102–1109.
Cavalier-Smith T, et al. 2015. Multigene phylogeny resolves deep branching of Amoebozoa. Mol Phylogenet Evol. 83:293–304.
Clarke M, et al. 2013. Genome of Acanthamoeba castellanii highlights extensive lateral gene transfer and early evolution of tyrosine kinase signaling. Genome Biol. 14:R11.
Criscuolo A, Gribaldo S. 2010. BMGE (Block Mapping and Gathering with Entropy): a new software for selection of phylogenetic informative regions from multiple sequence alignments. BMC Evol Biol. 10:210.
Derelle R, et al. 2015. Bacterial proteins pinpoint a single eukaryotic root. Proc Natl Acad Sci U S A. 112:E693–E699.
de Hoon MJ, Imoto S, Nolan J, Miyano S. 2004. Open source clustering software. Bioinformatics 20:1453–1454.
dos Reis M, et al. 2012. Phylogenomic datasets provide both precision and accuracy in estimating the timescale of placental mammal phylogeny. Proc Biol Sci. 279:3491–3500.
Edgar RC. 2004. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics. 5:113.
Erixon P, Svennblad B, Britton T, Oxelman B. 2003. Reliability of Bayesian posterior probabilities and bootstrap frequencies in phylogenetics. Syst Biol. 52:665–673.
Eva A, Aaronson SA. 1985. Isolation of a new human oncogene from a diffuse B-cell lymphoma. Nature 316:273–275.
Fiz-Palacios O, et al. 2013. Did terrestrial diversification of amoebas (Amoebozoa) occur in synchrony with land plants? PLoS ONE. 8:e74374.
Flicek P, et al. 2014. Ensembl 2014. Nucleic Acids Res. 42:D749–D755.
Floudas D, et al. 2012. The Paleozoic origin of enzymatic lignin decomposition reconstructed from 31 fungal genomes. Science 336:1715–1719.
Guindon S, Gascuel O. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol. 52:696–704.
Hart MJ, Eva A, Evans T, Aaronson SA, Cerione RA. 1991. Catalysis of guanine nucleotide exchange on the CDC42Hs protein by the dbl oncogene product. Nature 354:311–314.
He D, et al. 2014. An alternative root for the eukaryote tree of life. Curr Biol. 24:465–470.
Hedges SB, Blair JE, Venturi ML, Shoe JL. 2004. A molecular timescale of eukaryote evolution and the rise of complex multicellular life. BMC Evol Biol. 4:2.
Hedges SB, Dudley J, Kumar S. 2006. TimeTree: a public knowledge-base of divergence times among organisms. Bioinformatics 22:2971–2972.
Heintzelman MB. 2006. Cellular and molecular mechanics of gliding locomotion in eukaryotes. In: Kwang, J, editor. International review of cytology. Vol. 251. Academic Press. pp. 79–129. [accessed 2016 Sep 6]. Available from: http://www.sciencedirect.com/science/article/pii/S0074769606510034.
Higuchi N, Kohno K, Kadowaki T. 2009. Specific retention of the protostome-specific PsGEF may parallel with the evolution of mushroom bodies in insect and lophotrochozoan brains. BMC Biol. 7:21.
Holland ND. 2005. Chordates. Curr Biol CB. 15:R911–R914.
Horii T, Morita S, Kimura M, Hatada I. 2009. Epigenetic regulation of adipocyte differentiation by a Rho guanine nucleotide exchange factor, WGEF. PLoS ONE. 4:e5809.
Jaiswal M, Dvorsky R, Ahmadian MR. 2013. Deciphering the molecular and functional basis of Dbl family proteins: a novel systematic approach toward classification of selective activation of the Rho family proteins. J Biol Chem. 288:4486–4500.
Katoh K, Misawa K, Kuma K, Miyata T. 2002. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30:3059–3066.
King N, et al. 2008. The genome of the choanoflagellate Monosiga brevicollis and the origin of metazoans. Nature 451:783–788.
Krakauer DC, Nowak MA. 1999. Evolutionary preservation of redundant duplicated genes. Semin Cell Dev Biol. 10:555–559.
Lee J. 2010. De novo formation of basal bodies during cellular differentiation of Naegleria gruberi: progress and hypotheses. Semin Cell Dev Biol. 21:156–162.
McCarthy MC, Enquist BJ. 2005. Organismal size, metabolism and the evolution of complexity in metazoans. Evol Ecol Res. 7:681–696.
Meller N, Merlot S, Guda C. 2005. CZH proteins: a new family of Rho-GEFs. J Cell Sci. 118:4937–4946.
Nasser W, et al. 2013. Bacterial discrimination by Dictyostelid amoebae reveals the complexity of ancient interspecies interactions. Curr Biol CB. 23:862–872.
Newman SA, Mezentseva NV, Badyaev AV. 2013. Gene loss, thermogenesis, and the origin of birds. Ann N Y Acad Sci. 1289:36–47.
Nielsen C. 2008. Six major steps in animal evolution: are we derived sponge larvae? Evol Dev. 10:241–257.
O’Brien HE, Parrent JL, Jackson JA, Moncalvo JM, Vilgalys R. 2005. Fungal community analysis by large-scale sequencing of environmental samples. Appl Environ Microbiol. 71:5544–5550.
Ogawa K, et al. 2014. DOCK5 functions as a key signaling adaptor that links FcεRI signals to microtubule dynamics during mast cell degranulation. J Exp Med. 211:1407–1419.
Parfrey LW, Lahr DJ, Knoll AH, Katz LA. 2011. Estimating the timing of early eukaryotic diversification with multigene molecular clocks. Proc Natl Acad Sci U S A. 108:13624–13629.
Pei J, Kim B-H, Grishin NV. 2008. PROMALS3D: a tool for multiple protein sequence and structure alignments. Nucleic Acids Res. 36:2295–2300.
Peters RS, et al. 2014. The evolutionary history of holometabolous insects inferred from transcriptome-based phylogeny and comprehensive morphological data. BMC Evol Biol. 14:52.
Qiu H, Yoon HS, Bhattacharya D. 2013. Algal endosymbionts as vectors of horizontal gene transfer in photosynthetic eukaryotes. Front Plant Sci. 4:366.
Raffield LM, et al. 2015. Heritability and genetic association analysis of neuroimaging measures in the Diabetes Heart Study. Neurobiol Aging. 36:1602.e7–1615.
Raimondi F, Portella G, Orozco M, Fanelli F. 2011. Nucleotide binding switches the information flow in ras GTPases. PLoS Comput Biol. 7:e1001098.
Reeb VC, et al. 2009. Interrelationships of chromalveolates within a broadly sampled tree of photosynthetic protists. Mol Phylogenet Evol. 53:202–211.
Roger AJ, Simpson AGB. 2009. Evolution: revisiting the root of the eukaryote tree. Curr Biol CB. 19:R165–R167.
Rojas AM, Fuentes G, Rausell A, Valencia A. 2012. The Ras protein superfamily: evolutionary tree and role of conserved amino acids. J Cell Biol. 196:189–201.
Ronquist F, et al. 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol. 61:539–542.
Rossman KL, Der CJ, Sondek J. 2005. GEF means go: turning on RHO GTPases with guanine nucleotide-exchange factors. Nat Rev Mol Cell Biol. 6:167–180.


Rothschild LJ, Mancinelli RL. 2001. Life in extreme environments. Nature 409:1092–1101.
Saldanha AJ. 2004. Java Treeview – extensible visualization of microarray data. Bioinformatics 20:3246–3248.
Sato K, et al. 2014. PLEKHG2/FLJ00018, a Rho family-specific guanine nucleotide exchange factor, is tyrosine phosphorylated via the EphB2/cSrc signaling pathway. Cell Signal. 26:691–696.
Schad E, Tompa P, Hegyi H. 2011. The relationship between proteome size, structural disorder and organism complexity. Genome Biol. 12:R120.
Seki Y, Sato K, Kono T, Abe H, Akiba Y. 2003. Broiler chickens (Ross strain) lack insulin-responsive glucose transporter GLUT4 and have GLUT8 cDNA. Gen Comp Endocrinol. 133:80–87.
Smith JJ, et al. 2013. Sequencing of the sea lamprey (Petromyzon marinus) genome provides insights into vertebrate evolution. Nat Genet. 45:415–421, 421e1–2.
Sone M, et al. 1997. Still life, a protein in synaptic terminals of Drosophila homologous to GDP-GTP exchangers. Science 275:543–547.
Stensvold CR, et al. 2011. Increased sampling reveals novel lineages of Entamoeba: consequences of genetic diversity and host specificity for taxonomy and molecular detection. Protist 162:525–541.
Steven R, Zhang L, Culotti J, Pawson T. 2005. The UNC-73/Trio RhoGEF-2 domain is required in separate isoforms for the regulation of pharynx pumping and normal neurotransmission in C. elegans. Genes Dev. 19:2016–2029.
Suga H, et al. 2013. The Capsaspora genome reveals a complex unicellular prehistory of animals. Nat Commun. 4:2325.
Tcherkezian J, Lamarche-Vane N. 2007. Current knowledge of the large RhoGAP family of proteins. Biol Cell Auspices Eur Cell Biol Organ. 99:67–86.
Tiphaine C, et al. 2013. Correlated changes in occlusal pattern and diet in stem Murinae during the onset of the radiation of Old World rats and mice. Evol Int J Org Evol. 67:3323–3338.
Torruella G, et al. 2012. Phylogenetic relationships within the Opisthokonta based on phylogenomic analyses of conserved single-copy protein domains. Mol Biol Evol. 29:531–544.
Valentine JW, Collins AG, Meyer CP. 1994. Morphological complexity increase in metazoans. Paleobiology 20:131–142.
Valster AH, Hepler PK, Chernoff J. 2000. Plant GTPases: the Rhos in bloom. Trends Cell Biol. 10:141–146.
Verbruggen H, et al. 2010. Data mining approach identifies research priorities and data requirements for resolving the red algal tree of life. BMC Evol Biol. 10:16.
Vlahou G, Rivero F. 2006. Rho GTPase signaling in Dictyostelium discoideum: insights from the genome. Eur J Cell Biol. 85:947–959.
Yamaguchi K, et al. 2012. SWAP70 functions as a Rac/Rop guanine nucleotide-exchange factor in rice. Plant J. 70:389–397.
Zheng Q, et al. 2009. Family-based association study of the MCF2L2 gene and polycystic ovary syndrome. Gynecol Obstet Invest. 68:171–173.

Associate editor: Richard Cordaux
